This work is licensed under the Creative Commons Attribution 4.0 International License.
Jin, Z., Zhang, Y., Wu, F. et al. (2022). An algorithmic model of artificial intelligence under the combination of data-driven and knowledge-guided. Journal of Electronics and Information: http://kns.cnki.net/kcms/detail/11.4494.TN.20220901.1012.006.htmlSearch in Google Scholar
Stanislaw, A., Aishwarya, A., Jiasen L., et al. (2015). VQA: Visual Question Answering. In: Proceedings of the International Conference on Computer Vision. Santiago, Chile: IEEE, 2425-2433.Search in Google Scholar
Xinlei, C., Hao, F., Tsung-Yi, L., et al. (2015). Microsoft COCO Captions: Data Collection and Evaluation Server. arXiv preprint arXiv: 1504.00325.Search in Google Scholar
Wang, Y., Zhu, M., Xu, C. et al. (2022). Visual quizzing using image description and knowledge graph enhanced representation. Journal of Tsinghua University (Natural Science Edition), 62(05), 900-907. https://doi.org/10.16511/j.cnki.qhdxxb.2022.21.010Search in Google Scholar
Huang, T. W., Yang, Y. L., Yang, X. (2021). A review of deep learning-based visual question and answer research (in English). Journal of Central South University, 28(03), 728-746.Search in Google Scholar
Zhang, B., Li, L., Cha Z. et al. (2022). An active learning method for visual question and answer based on cross-modal contrast learning. Journal of Computer Science, 45(08), 1730-1745.Search in Google Scholar
Malinowski, M., Fritz, M. (2014). A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In: Proceedings of the 28th Conference and Workshop on Neural Information Processing Systems. Montreal, Quebec, Canada: NIPS, 1682-1690.]Search in Google Scholar
Du, P. F., Li, S. Y., Gao, Y. L. (2021). A review of research on multimodal visual language representation learning. Journal of Software, 32(02), 327-348. https://doi.org/10.13328/j.cnki.jos.006125Search in Google Scholar
Wang, Y., Zhuo, Y., Wu, Y. et al. (2018). Question and answer algorithm for image fragmentation information based on deep neural network.Computer Research and Development, 55(12), 2600-2610.Search in Google Scholar
Kumar, A., Irsoy, O., Ondruska, P. et al. (2015). Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In: Proceedings of the 33rd International Conference on Machine Learning. Lille, France: ICML, 1378-1387.]Search in Google Scholar
Wang, H. J, Zhong, Z. F., Zhang, M. (2005). Research and simulation of an algorithm to determine the area where military RAUs are located. Systems Engineering and Electronics Technology, 04, 715-717+743.Search in Google Scholar
Andreas, J., Rohrbach, M., Darrell, T. et al. (2015). Deep Compositional Question Answering with Neural Module Networks. arXiv preprint arXiv: 1511.02799.Search in Google Scholar
Andreas, J., Rohrbach, M., Darrell, T. et al. (2016). Learning to Compose Neural Networks for Question Answering. in: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. USA: HLT-NAACL, 1545-1554.Search in Google Scholar
Hyeonwoo, No., Bohyung, H. (2016). Training Recurrent Answering Units with Joint Loss Minimization for VQA. arXiv preprint arXiv: 1606.03647.Search in Google Scholar
Wang, Y. Q., Wu, F., Wang, C. H. Y., et al. (2019). A new dynamic memory network for visual question and answer. Computer Applications Research, 37(10), 1-5.Search in Google Scholar
Yu, J., Wang, L., Yu, Z. (2018). Research on visual question and answer technology. Computer Research and Development, 55(09), 1946-1958.Search in Google Scholar
Zhou, S., Chen, Z., Yinpeng, D., et al. (2018). Learning Visual Knowledge Memory Networks for Visual Question Answering. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 7736-7745.Search in Google Scholar
Kim, J. H., Jun, J., Zhang, B. T. (2018). Bilinear attention networks. Advances in Neural Information Processing Systems, 31.Search in Google Scholar
Nguyen, D. K., Okatani, T. (2019). Multi-task learning of hierarchical vision-languagere presentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10492-10501.Search in Google Scholar
Guo, J., Hu, G., Xu, W., et al. (2017). Hierarchical content importance-based video quality assessment for HEVC encoded videos transmitted over LTE networks. Journal of Visual Communication and Image Representation, 43, 50-60.Search in Google Scholar
Wang, X., Chen, Qiao-hong, Sun, Q., et al. (2022). A visual question-and-answer approach based on relational reasoning and gating mechanism. Journal of Zhejiang University (Engineering Edition), 56(01), 36-46.Search in Google Scholar
Nguyen, T. V., Zhao, Q., Yan, S. (2018). Attentive systems: a survey. International Journalof Computer Vision, 126(1), 86-110.Search in Google Scholar
Wang, Q. D., Cheng, K. (2022). Monocular image depth estimation based on dense connectivity. Journal of Huazhong University of Science and Technology (Natural Science Edition):1-8 DOI: 10.13245/j.hust.229472.Search in Google Scholar
Lianhui, L., Jun, L., Shaoquan, Z. (2021). Hyper spectral image classification method based on 3D Octave convolution and Bi-RNN attention network. Journal of Photonics, 50(09), 284-296.Search in Google Scholar
Hochreiter, S., Schmidhuber, J. (1997). Long Short-Term Memory. neural Computation, 9(8), 1735-1780.Search in Google Scholar
Pan, X. D., Zhang, Q., Yang, M. (2022). Deep learning training data leakage induction based on neuronal activation pattern control. Computer Research and Development:1-15. http://kns.cnki.net/kcms/detail/11.1777.TP.20220831.1228.014.htmlSearch in Google Scholar
Wang, T-Y., Chen, H., Wang, G,. et al. (2022). An EEG sleep staging model using wavelet transform and bidirectional long- and short-term memory network. Journal of Xi'an Jiaotong University, 09, 1-8. http://kns.cnki.net/kcms/detail/61.1069.T.20220606.1545.002.htmlSearch in Google Scholar
Monk, D. W., Lv, F., et al. (2021). Traffic flow prediction in irregular areas based on multi-map convolutional networks and gated cyclic units (in English). Frontiers of Information Technology & Electronic Engineering, 22(09), 1179-1194.Search in Google Scholar
Zhang, Y., Gao, X., He, L., et al. (2020). Objective video quality assessment combiningtransfer learning with CNN. IEEE Transactions on Neural Networks and Learning Systems, 31(8), 2716-2730.Search in Google Scholar