Open Access

Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Mar 04, 2025


Figure 1:

The proposed methodology for the LSTM-based Transformer. LSTM, long short-term memory.

Figure 2:

LSTM block structure. LSTM, long short-term memory.
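The LSTM block in Figure 2 follows the standard gate equations. As a minimal illustration (the weight shapes and toy sizes below are assumptions for demonstration, not the paper's configuration), a single LSTM cell step can be sketched in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    x: input (d_in,); h_prev, c_prev: previous hidden/cell state (d_h,).
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,), gates stacked [i, f, g, o].
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * d_h:1 * d_h])    # input gate
    f = sigmoid(z[1 * d_h:2 * d_h])    # forget gate
    g = np.tanh(z[2 * d_h:3 * d_h])    # candidate cell state
    o = sigmoid(z[3 * d_h:4 * d_h])    # output gate
    c = f * c_prev + i * g             # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```

Because the output gate is bounded in (0, 1) and tanh in (-1, 1), the hidden state `h` is always bounded in (-1, 1), which is what makes stacking LSTM blocks under a Transformer numerically stable.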

Figure 3:

Overview of the data augmentation strategy and training pipeline for the proposed model.
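The paper's exact augmentation recipe is the one shown in Figure 3. For illustration only, two waveform-level augmentations commonly used in low-resource ASR are speed perturbation (resampling) and additive noise at a target SNR; the factors and SNR below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample a waveform by linear interpolation to change its speed.
    factor > 1 speeds up (shorter signal); factor < 1 slows it down."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    sig_pow = np.mean(wave ** 2)
    noise_pow = sig_pow / (10 ** (snr_db / 10))
    return wave + rng.normal(scale=np.sqrt(noise_pow), size=len(wave))

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 16000))  # toy 1 s signal at 16 kHz
fast = speed_perturb(clean, 1.1)        # ~10% faster, so a shorter signal
noisy = add_noise(clean, snr_db=20, rng=rng)
```

Each perturbed copy keeps its original transcript, so augmentation multiplies the effective amount of transcribed training data without any new annotation cost.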

Figure 4:

Diagram of the proposed framework combining multilingual supervised training and semi-supervised training for Indian languages using untranscribed data. The proposed enhanced LSTM-Transformer architecture trains the supervised model (left); synthetic data from a TTS system is amalgamated (middle); semi-supervised training uses both transcribed and untranscribed data (right). LSTM, long short-term memory; TTS, text-to-speech.
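The right branch of Figure 4 trains on untranscribed audio by decoding it and treating confident hypotheses as labels. A minimal sketch of confidence-based pseudo-label selection (the threshold, the `model(utt) -> (hypothesis, confidence)` interface, and the toy utterances are assumptions, not the paper's implementation):

```python
def select_pseudo_labels(utterances, model, threshold=0.9):
    """Decode untranscribed utterances and keep only confident hypotheses.
    `model(utt)` is assumed to return (hypothesis_text, confidence in [0, 1])."""
    selected = []
    for utt in utterances:
        hyp, conf = model(utt)
        if conf >= threshold:
            selected.append((utt, hyp))  # the hypothesis becomes the label
    return selected

# Toy stand-in for an ASR model: fixed outputs keyed by utterance id.
fake_outputs = {
    "utt1": ("namaste duniya", 0.95),
    "utt2": ("???", 0.40),
    "utt3": ("shubha sakala", 0.92),
}
model = lambda utt: fake_outputs[utt]
pseudo = select_pseudo_labels(["utt1", "utt2", "utt3"], model)
```

The selected (audio, pseudo-label) pairs are then mixed with the transcribed and TTS-synthesized data for a further round of supervised training, which is the pattern the ablation rows in the results table below follow.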

Figure 5:

Comparative results.

Figure 6:

WER on Odia, Hindi, and Marathi using (A) TDNN, (B) Transformer, and (C) the proposed LSTM-Transformer. TDNN, time-delay neural network; WER, word error rate.

Dataset details for each language

                      Hindi                  Marathi                Odia
                      Train   Test   Val.   Train   Test   Val.   Train   Test   Val.
Size in hours         95.05   5.55   5.49   93.89   5.0    0.67   94.54   5.49   4.66
Channel compression   3GP     3GP    3GP    3GP     3GP    M4A    M4A     M4A    M4A
Unique sentences      4506    386    316    2543    200    120    820     65     124
# Speakers            59      19     18     31                    31
Words in vocabulary   6092    1681   1359   3245    547    350    1584    224    334

WER (%) for Indian languages

                                       Hindi              Marathi            Odia
Model                                  w/o LM   with LM   w/o LM   with LM   w/o LM   with LM
Proposed LSTM-Transformer (baseline)   14.1     11.4      12.7     10.6      24.6     21.3
+ Synthetic data augmentation          13.5     10.9      12.3     10.2      24.2     21.0
+ Semi-supervised training             13.2     10.5      12.1     9.8       23.9     20.6
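The WER values above are standard word error rates: (substitutions + deletions + insertions) divided by the number of reference words, computed via word-level Levenshtein distance. A self-contained reference implementation (the example sentences are illustrative):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))   # 0.0 (exact match)
print(wer("the cat sat", "the bat sat"))   # one substitution out of three words
```

Lower is better, so each ablation row (synthetic data, then semi-supervised training) improving on the baseline indicates a genuine reduction in recognition errors rather than a change of metric.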
Language:
English
Publication timeframe:
Once per year
Journal Subjects:
Engineering, Introductions and Overviews, Engineering, other