Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR

Choudhary, Tripti; Goyal, Vishal; Bansal, Atul

Open Access

Mar 04, 2025

International Journal on Smart Sensing and Intelligent Systems

Volume 18 (2025): Issue 1 (January 2025)

Article Category: Research Article

Published Online: Mar 04, 2025

Received: Nov 20, 2024

DOI: https://doi.org/10.2478/ijssis-2025-0009

Keywords
Automatic Speech Recognition, Data Augmentation, Semi-supervised learning, Low-resource ASR

© 2025 Tripti Choudhary et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The proposed methodology for the LSTM-based transformer. LSTM, long short-term memory.

LSTM block structure. LSTM, long short-term memory.

Overview of the data augmentation strategy and training pipeline for the proposed model.

Diagram of the proposed framework, which combines multilingual supervised training with semi-supervised training on untranscribed data for Indian languages. Left: supervised training with the proposed enhanced LSTM-Transformer architecture. Middle: addition of synthetic data from a TTS system. Right: semi-supervised training with both transcribed and untranscribed data. LSTM, long short-term memory; TTS, text-to-speech.

WER on Odia, Hindi, and Marathi using (A) TDNN, (B) Transformer, and (C) the proposed LSTM Transformer.

Dataset details for each language

	Hindi Train	Hindi Test	Hindi Val.	Marathi Train	Marathi Test	Marathi Val.	Odia Train	Odia Test	Odia Val.
Size (hours)	95.05	5.55	5.49	93.89	5.0	0.67	94.54	5.49	4.66
Channel compression	3GP	3GP	3GP	3GP	3GP	M4A	M4A	M4A	M4A
Unique sentences	4506	386	316	2543	200	120	820	65	124
Number of speakers	59	19	18	31	31	–	–	–	–
Words in vocabulary	6092	1681	1359	3245	547	350	1584	224	334

WER (%) for Indian languages

Model	Hindi w/o LM	Hindi with LM	Marathi w/o LM	Marathi with LM	Odia w/o LM	Odia with LM
Proposed LSTM transformer (Baseline)	14.1	11.4	12.7	10.6	24.6	21.3
+ Synthetic data augmentation	13.5	10.9	12.3	10.2	24.2	21.0
+ Semi-supervised training	13.2	10.5	12.1	9.8	23.9	20.6
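
The absolute WER values in the table above can be easier to compare across languages as relative reductions. The sketch below is not part of the article: it copies the baseline row and the final (semi-supervised) row from the table and applies the standard relative-reduction formula, 100 × (before − after) / before. The dictionary layout and the function name `relative_reduction` are illustrative assumptions.

```python
# WER (%) values copied from the table above.
# Keys are (language, language-model setting); this layout is an assumption
# made for illustration, not something defined by the article.
baseline = {
    ("Hindi", "w/o LM"): 14.1, ("Hindi", "with LM"): 11.4,
    ("Marathi", "w/o LM"): 12.7, ("Marathi", "with LM"): 10.6,
    ("Odia", "w/o LM"): 24.6, ("Odia", "with LM"): 21.3,
}
semi_supervised = {
    ("Hindi", "w/o LM"): 13.2, ("Hindi", "with LM"): 10.5,
    ("Marathi", "w/o LM"): 12.1, ("Marathi", "with LM"): 9.8,
    ("Odia", "w/o LM"): 23.9, ("Odia", "with LM"): 20.6,
}


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


# Relative reduction of the final row over the baseline row, per column.
reductions = {
    key: relative_reduction(baseline[key], semi_supervised[key])
    for key in baseline
}

for (language, lm_setting), value in reductions.items():
    print(f"{language:8s} {lm_setting:8s} {value:.1f}%")
```

Under these numbers, the relative gains are largest for Hindi and Marathi with the language model and smallest for Odia, which also has the highest absolute WER in every column.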

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Engineering, Introductions and Overviews, Engineering, other
