Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Automatic speech recognition (ASR) is essential for developing intelligent systems capable of accurately processing human speech, particularly in low-resource languages. This study addresses the challenges faced by ASR systems in Indian languages, where data and resources are limited. The authors propose a novel three-step methodology that combines data augmentation and semi-supervised learning to enhance ASR performance. First, an enhanced long short-term memory (LSTM) network is used to train a baseline model with limited labeled data. Next, synthetic data is generated and combined with original recordings to refine the ASR model. Finally, semi-supervised training further boosts accuracy. Evaluations demonstrate significant improvements over existing models for Hindi, Marathi, and Odia languages.
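The augmentation and semi-supervised steps described above can be illustrated with a minimal sketch. The abstract does not say which augmentation techniques or which pseudo-label selection rule the authors use, so everything here is an assumption: additive noise and speed perturbation stand in for "synthetic data", a simple confidence threshold stands in for the semi-supervised selection step, and the function names `add_noise`, `speed_perturb`, and `select_pseudo_labels` are hypothetical. The LSTM acoustic model itself is not shown.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Mix white Gaussian noise into `wave` at a target signal-to-noise ratio (dB).

    Assumed augmentation; the source does not name a specific technique.
    """
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def speed_perturb(wave, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the audio.

    Assumed augmentation; the source does not name a specific technique.
    """
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)


def select_pseudo_labels(hypotheses, threshold=0.9):
    """Keep (utterance_id, transcript) pairs whose confidence meets the threshold.

    `hypotheses` is a list of (utterance_id, transcript, confidence) tuples,
    as a baseline model might emit for unlabeled audio (hypothetical format).
    """
    return [(utt, text) for utt, text, conf in hypotheses if conf >= threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One second of a 440 Hz tone at 16 kHz as a stand-in for a recording.
    t = np.arange(16000) / 16000.0
    wave = np.sin(2 * np.pi * 440.0 * t)

    augmented = [add_noise(wave, snr_db=10, rng=rng), speed_perturb(wave, 1.25)]
    print([len(a) for a in augmented])

    decoded = [("utt1", "namaste", 0.95), ("utt2", "dhanyavaad", 0.42)]
    print(select_pseudo_labels(decoded, threshold=0.9))
```

In a pipeline of this shape, the augmented copies would be added to the labeled training set before retraining, and the selected pseudo-labeled utterances would be added in a later round. The threshold value of 0.9 is illustrative only.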

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Engineering, Introductions and Overviews, Engineering, other

Tripti Choudhary

Vishal Goyal

Atul Bansal

Article Category: Research Article

Published Online: Mar 04, 2025

Received: Nov 20, 2024

DOI: https://doi.org/10.2478/ijssis-2025-0009

Keywords: Automatic Speech Recognition, Data Augmentation, Semi-supervised learning, Low-resource ASR

© 2025 Tripti Choudhary et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.