Running Automatic Speech Recognition (ASR) Model in the Context of Real Time Data Streaming Architecture

Language: English

Publication frequency: 1 issue per year
Journal subjects: Economics, Political Economics, Political Economics, other, Business Management, Business Management, other, Industrial Chemistry, Energy Production and Conversion

Robert Cristian Necula

Pavel-Cristian Craciun

Published online: 24 July 2025

Page range: 1282-1293

DOI: https://doi.org/10.2478/picbe-2025-0101

Keywords: real time, automatic speech recognition (ASR), whispers, multi-language, knowledge distillation, natural language processing

© 2025 Robert Cristian Necula et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
