Exploring Semantic Understanding and Generative Modeling in Speech-Text Multimodal Data Fusion

Language: English

Frequency: once a year
Journal subjects: Life Sciences; Life Sciences, other; Mathematics; Applied Mathematics; General Mathematics; Physics; Physics, other

Haitao Yu, Xuqiang Wang, Yifan Sun, Yifan Yang, Yan Sun

Published online: 11 Nov. 2024

Received: 17 June 2024

Accepted: 07 Oct. 2024

DOI: https://doi.org/10.2478/amns-2024-3156

Keywords: Semantic understanding, Speech dataset, Multimodal data fusion, Attention mechanism, Word embedding

© 2024 Haitao Yu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
