Open access

Exploring Semantic Understanding and Generative Modeling in Speech-Text Multimodal Data Fusion

11 November 2024
