Research and implementation of visual question and answer system based on deep learning

[1] Jin, Z., Zhang, Y., Wu, F., et al. (2022). An algorithmic model of artificial intelligence under the combination of data-driven and knowledge-guided. Journal of Electronics and Information. http://kns.cnki.net/kcms/detail/11.4494.TN.20220901.1012.006.html

[2] Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual Question Answering. In: Proceedings of the International Conference on Computer Vision. Santiago, Chile: IEEE, 2425-2433.

[3] Chen, X., Fang, H., Lin, T.-Y., et al. (2015). Microsoft COCO Captions: Data Collection and Evaluation Server. arXiv preprint arXiv:1504.00325.

[4] Wang, Y., Zhu, M., Xu, C., et al. (2022). Visual quizzing using image description and knowledge graph enhanced representation. Journal of Tsinghua University (Natural Science Edition), 62(05), 900-907. https://doi.org/10.16511/j.cnki.qhdxxb.2022.21.010

[5] Huang, T. W., Yang, Y. L., Yang, X. (2021). A review of deep learning-based visual question and answer research (in English). Journal of Central South University, 28(03), 728-746.

[6] Zhang, B., Li, L., Cha, Z., et al. (2022). An active learning method for visual question and answer based on cross-modal contrast learning. Journal of Computer Science, 45(08), 1730-1745.

[7] Malinowski, M., Fritz, M. (2014). A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In: Proceedings of the 28th Conference and Workshop on Neural Information Processing Systems. Montreal, Quebec, Canada: NIPS, 1682-1690.

[8] Du, P. F., Li, S. Y., Gao, Y. L. (2021). A review of research on multimodal visual language representation learning. Journal of Software, 32(02), 327-348. https://doi.org/10.13328/j.cnki.jos.006125

[9] Wang, Y., Zhuo, Y., Wu, Y., et al. (2018). Question and answer algorithm for image fragmentation information based on deep neural network. Computer Research and Development, 55(12), 2600-2610.

[10] Kumar, A., Irsoy, O., Ondruska, P., et al. (2015). Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In: Proceedings of the 33rd International Conference on Machine Learning. Lille, France: ICML, 1378-1387.

[11] Wang, H. J., Zhong, Z. F., Zhang, M. (2005). Research and simulation of an algorithm to determine the area where military RAUs are located. Systems Engineering and Electronics Technology, 04, 715-717+743.

[12] Andreas, J., Rohrbach, M., Darrell, T., et al. (2015). Deep Compositional Question Answering with Neural Module Networks. arXiv preprint arXiv:1511.02799.

[13] Andreas, J., Rohrbach, M., Darrell, T., et al. (2016). Learning to Compose Neural Networks for Question Answering. In: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. USA: HLT-NAACL, 1545-1554.

[14] Noh, H., Han, B. (2016). Training Recurrent Answering Units with Joint Loss Minimization for VQA. arXiv preprint arXiv:1606.03647.

[15] Wang, Y. Q., Wu, F., Wang, C. H. Y., et al. (2019). A new dynamic memory network for visual question and answer. Computer Applications Research, 37(10), 1-5.

[16] Yu, J., Wang, L., Yu, Z. (2018). Research on visual question and answer technology. Computer Research and Development, 55(09), 1946-1958.

[17] Su, Z., Zhu, C., Dong, Y., et al. (2018). Learning Visual Knowledge Memory Networks for Visual Question Answering. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 7736-7745.

[18] Kim, J. H., Jun, J., Zhang, B. T. (2018). Bilinear attention networks. Advances in Neural Information Processing Systems, 31.

[19] Nguyen, D. K., Okatani, T. (2019). Multi-task learning of hierarchical vision-language representation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10492-10501.

[20] Guo, J., Hu, G., Xu, W., et al. (2017). Hierarchical content importance-based video quality assessment for HEVC encoded videos transmitted over LTE networks. Journal of Visual Communication and Image Representation, 43, 50-60.

[21] Wang, X., Chen, Q. H., Sun, Q., et al. (2022). A visual question-and-answer approach based on relational reasoning and gating mechanism. Journal of Zhejiang University (Engineering Edition), 56(01), 36-46.

[22] Nguyen, T. V., Zhao, Q., Yan, S. (2018). Attentive systems: a survey. International Journal of Computer Vision, 126(1), 86-110.

[23] Wang, Q. D., Cheng, K. (2022). Monocular image depth estimation based on dense connectivity. Journal of Huazhong University of Science and Technology (Natural Science Edition), 1-8. DOI: 10.13245/j.hust.229472

[24] Lianhui, L., Jun, L., Shaoquan, Z. (2021). Hyperspectral image classification method based on 3D Octave convolution and Bi-RNN attention network. Journal of Photonics, 50(09), 284-296.

[25] Hochreiter, S., Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.

[26] Pan, X. D., Zhang, Q., Yang, M. (2022). Deep learning training data leakage induction based on neuronal activation pattern control. Computer Research and Development, 1-15. http://kns.cnki.net/kcms/detail/11.1777.TP.20220831.1228.014.html

[27] Wang, T.-Y., Chen, H., Wang, G., et al. (2022). An EEG sleep staging model using wavelet transform and bidirectional long short-term memory network. Journal of Xi'an Jiaotong University, 09, 1-8. http://kns.cnki.net/kcms/detail/61.1069.T.20220606.1545.002.html

[28] Monk, D. W., Lv, F., et al. (2021). Traffic flow prediction in irregular areas based on multi-map convolutional networks and gated cyclic units (in English). Frontiers of Information Technology & Electronic Engineering, 22(09), 1179-1194.

[29] Zhang, Y., Gao, X., He, L., et al. (2020). Objective video quality assessment combining transfer learning with CNN. IEEE Transactions on Neural Networks and Learning Systems, 31(8), 2716-2730.

eISSN: 2444-8656
Language: English

Publication frequency: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Published online: 02 June 2023

Received: 02 July 2022

Accepted: 19 October 2022

DOI: https://doi.org/10.2478/amns.2023.1.00182

Keywords: Deep learning, Modal data, Recurrent neural networks, Visual inference networks, Question-and-answer prediction models

© 2023 Kunming Wu, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
