Research and implementation of visual question and answer system based on deep learning

With the development and improvement of deep learning technology, its application and practice in modal data (image, speech, and text) has been achieved tremendously. In this paper, based on the neural network modality class model in deep learning, we analyze its adaptation to the visual question and answer system, propose a visual question and answer model based on the gated attention mechanism, and construct a question, and answer prediction mechanism adapted to the recurrent neural network transfer model. To address the problem of low accuracy of the model on complex problems, the inference network module is built using visual inference so that the model can extract complex problem features to improve the inference capability of the model. By predicting answers through semantic information about text and visual elements in images, correlations across modalities, and inference, advances in natural language processing and computational vision have led to improved answer accuracy in deep learning-based visual quiz models. Multiple sets of experiments show that models with deep learning inference capabilities answer complex questions with significantly higher accuracy than other existing methods.

eISSN:: 2444-8656
Język:: Angielski

Częstotliwość wydawania:: Volume Open
Dziedziny czasopisma:: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Kanał RSS czasopisma

Research and implementation of visual question and answer system based on deep learning

Data publikacji: 02 cze 2023

Zakres stron: -

Otrzymano: 02 lip 2022

Przyjęty: 19 paź 2022

DOI: https://doi.org/10.2478/amns.2023.1.00182

Słowa kluczoweDeep learning, Modal data, Recurrent neural networks, Visual inference networks, Question-and-answer prediction models

© 2023 Kunming Wu, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Słowa kluczowe
Deep learning, Modal data, Recurrent neural networks, Visual inference networks, Question-and-answer prediction models