Open access

Research and implementation of visual question and answer system based on deep learning

02 Jun 2023


Jin, Z., Zhang, Y., Wu, F., et al. (2022). An algorithmic model of artificial intelligence under the combination of data-driven and knowledge-guided. Journal of Electronics and Information. http://kns.cnki.net/kcms/detail/11.4494.TN.20220901.1012.006.html

Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual Question Answering. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2425-2433.

Chen, X., Fang, H., Lin, T.-Y., et al. (2015). Microsoft COCO Captions: Data Collection and Evaluation Server. arXiv preprint arXiv:1504.00325.

Wang, Y., Zhu, M., Xu, C., et al. (2022). Visual quizzing using image description and knowledge graph enhanced representation. Journal of Tsinghua University (Natural Science Edition), 62(05), 900-907. https://doi.org/10.16511/j.cnki.qhdxxb.2022.21.010

Huang, T. W., Yang, Y. L., Yang, X. (2021). A review of deep learning-based visual question and answer research (in English). Journal of Central South University, 28(03), 728-746.

Zhang, B., Li, L., Cha, Z., et al. (2022). An active learning method for visual question and answer based on cross-modal contrast learning. Journal of Computer Science, 45(08), 1730-1745.

Malinowski, M., Fritz, M. (2014). A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In: Proceedings of the 28th Conference and Workshop on Neural Information Processing Systems. Montreal, Quebec, Canada: NIPS, 1682-1690.

Du, P. F., Li, S. Y., Gao, Y. L. (2021). A review of research on multimodal visual language representation learning. Journal of Software, 32(02), 327-348. https://doi.org/10.13328/j.cnki.jos.006125

Wang, Y., Zhuo, Y., Wu, Y., et al. (2018). Question and answer algorithm for image fragmentation information based on deep neural network. Computer Research and Development, 55(12), 2600-2610.

Kumar, A., Irsoy, O., Ondruska, P., et al. (2016). Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ICML, 1378-1387.

Wang, H. J., Zhong, Z. F., Zhang, M. (2005). Research and simulation of an algorithm to determine the area where military RAUs are located. Systems Engineering and Electronics Technology, 04, 715-717+743.

Andreas, J., Rohrbach, M., Darrell, T., et al. (2015). Deep Compositional Question Answering with Neural Module Networks. arXiv preprint arXiv:1511.02799.

Andreas, J., Rohrbach, M., Darrell, T., et al. (2016). Learning to Compose Neural Networks for Question Answering. In: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. USA: HLT-NAACL, 1545-1554.

Noh, H., Han, B. (2016). Training Recurrent Answering Units with Joint Loss Minimization for VQA. arXiv preprint arXiv:1606.03647.

Wang, Y. Q., Wu, F., Wang, C. H. Y., et al. (2019). A new dynamic memory network for visual question and answer. Computer Applications Research, 37(10), 1-5.

Yu, J., Wang, L., Yu, Z. (2018). Research on visual question and answer technology. Computer Research and Development, 55(09), 1946-1958.

Su, Z., Zhu, C., Dong, Y., et al. (2018). Learning Visual Knowledge Memory Networks for Visual Question Answering. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 7736-7745.

Kim, J. H., Jun, J., Zhang, B. T. (2018). Bilinear attention networks. Advances in Neural Information Processing Systems, 31.

Nguyen, D. K., Okatani, T. (2019). Multi-task learning of hierarchical vision-language representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10492-10501.

Guo, J., Hu, G., Xu, W., et al. (2017). Hierarchical content importance-based video quality assessment for HEVC encoded videos transmitted over LTE networks. Journal of Visual Communication and Image Representation, 43, 50-60.

Wang, X., Chen, Q. H., Sun, Q., et al. (2022). A visual question-and-answer approach based on relational reasoning and gating mechanism. Journal of Zhejiang University (Engineering Edition), 56(01), 36-46.

Nguyen, T. V., Zhao, Q., Yan, S. (2018). Attentive systems: A survey. International Journal of Computer Vision, 126(1), 86-110.

Wang, Q. D., Cheng, K. (2022). Monocular image depth estimation based on dense connectivity. Journal of Huazhong University of Science and Technology (Natural Science Edition), 1-8. https://doi.org/10.13245/j.hust.229472

Lianhui, L., Jun, L., Shaoquan, Z. (2021). Hyperspectral image classification method based on 3D Octave convolution and Bi-RNN attention network. Journal of Photonics, 50(09), 284-296.

Hochreiter, S., Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.

Pan, X. D., Zhang, Q., Yang, M. (2022). Deep learning training data leakage induction based on neuronal activation pattern control. Computer Research and Development, 1-15. http://kns.cnki.net/kcms/detail/11.1777.TP.20220831.1228.014.html

Wang, T. Y., Chen, H., Wang, G., et al. (2022). An EEG sleep staging model using wavelet transform and bidirectional long short-term memory network. Journal of Xi'an Jiaotong University, 09, 1-8. http://kns.cnki.net/kcms/detail/61.1069.T.20220606.1545.002.html

Monk, D. W., Lv, F., et al. (2021). Traffic flow prediction in irregular areas based on multi-map convolutional networks and gated cyclic units (in English). Frontiers of Information Technology & Electronic Engineering, 22(09), 1179-1194.

Zhang, Y., Gao, X., He, L., et al. (2020). Objective video quality assessment combining transfer learning with CNN. IEEE Transactions on Neural Networks and Learning Systems, 31(8), 2716-2730.

eISSN: 2444-8656
Language: English
Publication timeframe: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics