Robust speech recognition based on deep learning for sports game review

eISSN: 2444-8656
Language: English

Frequency: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Published online: 28 Apr 2023

Received: 06 Apr 2022

Accepted: 07 Sep 2022

DOI: https://doi.org/10.2478/amns.2023.1.00075

Keywords: Deep learning model, Generative adversarial network (GAN) algorithm, Robust speech recognition, Speech feature vector, Loss function

© 2023 Min Liu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
