Running Automatic Speech Recognition (ASR) Model in the Context of Real Time Data Streaming Architecture

Given the huge volume of data generated in recent years and the need to process audio streams within real-time data processing flows, this paper explores, identifies and experiments with Automatic Speech Recognition (ASR) models for low-latency applications across different contexts, using specific data sets. Our work investigates whether such models can be deployed in a real-time audio streaming workflow. To that end, we compare multiple sizes of the Whisper model on multiple data sets and analyze the results. Additionally, we tested the models on English and Romanian. For the multi-language context, future work should evaluate models fine-tuned for a specific language and compare them with the results of multi-language models. This paper contributes to the field by providing evidence, comparisons, statistics and accuracy levels for using ASR models in low-latency applications, by exploring multiple versions of the Whisper model, and by helping researchers dimension environments for real-time infrastructure and extend streaming capabilities with Artificial Intelligence models. The paper can also serve IT professionals who run critical systems in different industries as a guide to the techniques for running ASR models in production.
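The abstract weighs two quantities when sizing a Whisper model for a stream: transcription accuracy and whether the model keeps pace with incoming audio. The following is a minimal sketch under stated assumptions. It assumes accuracy is scored as word error rate (WER) and latency as real-time factor (RTF). The abstract names neither metric, and the function names and the 0.8 headroom threshold are hypothetical. The sketch does not call Whisper itself. The processing time would come from timing whichever model size is under evaluation.

```python
"""Hypothetical helpers for comparing ASR model sizes on a streaming workload.

Assumptions (not from the paper): accuracy is scored as word error rate (WER)
and latency as real-time factor (RTF); all names here are illustrative.
"""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the word-level edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of an extra word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[len(hyp)] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF below 1.0 means the model transcribes faster than audio arrives."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_stream(processing_seconds: float, audio_seconds: float,
                         headroom: float = 0.8) -> bool:
    """Hypothetical sizing rule: require RTF below a safety margin."""
    return real_time_factor(processing_seconds, audio_seconds) < headroom


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))
    # A 3-second chunk processed in 1.5 seconds gives RTF 0.5.
    print(keeps_up_with_stream(1.5, 3.0))
```

Running the script prints `0.333` and `True`. The headroom margin is a design choice: a model whose RTF sits just under 1.0 falls behind as soon as load spikes. A stricter threshold reserves capacity for bursts when dimensioning the infrastructure.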

Language: English

Publication frequency: once a year
Journal subjects: Business and Economics, Political Economy, Political Economy, other, Business Management, Enterprise Management, other, Industrial Chemistry, Energy Production and Processing



Robert Cristian Necula

Pavel-Cristian Craciun

Published: 24 July 2025

Page range: 1282-1293

DOI: https://doi.org/10.2478/picbe-2025-0101

Keywords: real time, automatic speech recognition (ASR), Whisper, multi-language, knowledge distillation, natural language processing

© 2025 Robert Cristian Necula et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
