Open Access

Running Automatic Speech Recognition (ASR) Model in the Context of Real Time Data Streaming Architecture

24 Jul 2025


Given the large volume of data generated in recent years and the need to process audio streams in real-time data processing flows, this paper explores, identifies, and experiments with Automatic Speech Recognition (ASR) models for low-latency applications across different contexts, using specific data sets. Our work examines whether such models can be deployed in a real-time audio streaming workflow; to that end, we compare multiple sizes of the Whisper model on multiple data sets and analyze the results. Additionally, we tested the models on English and Romanian. For future work in the multi-language context, it is important to evaluate models fine-tuned for a specific language and compare them with the results of multi-language models. This paper contributes to the field by providing evidence, comparisons, statistics, and accuracy levels for using ASR models in low-latency applications, by exploring multiple versions of the Whisper model, and by helping researchers dimension environments for real-time infrastructure and extend streaming capabilities with Artificial Intelligence models. Equally, the paper can be used by IT professionals running critical systems in different industries, as it highlights the techniques to use when running ASR models in production.