Enhanced LSTM network with semi-supervised learning and data augmentation for low-resource ASR
About this article
Article Category: Research Article
Published Online: Mar 04, 2025
Received: Nov 20, 2024
DOI: https://doi.org/10.2478/ijssis-2025-0009
© 2025 Tripti Choudhary et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Dataset details for each language
Attribute | Hindi | | | Marathi | | | Odia | | |
---|---|---|---|---|---|---|---|---|---|
Size (hours) | 95.05 | 5.55 | 5.49 | 93.89 | 5.0 | 0.67 | 94.54 | 5.49 | 4.66 |
Channel compression | 3GP | 3GP | 3GP | 3GP | 3GP | M4A | M4A | M4A | M4A |
Unique sentences | 4506 | 386 | 316 | 2543 | 200 | 120 | 820 | 65 | 124 |
Speakers | 59 | 19 | 18 | 31 | 31 | – | – | – | – |
Words in vocabulary | 6092 | 1681 | 1359 | 3245 | 547 | 350 | 1584 | 224 | 334 |
WER (%) for Indian languages
Model | Hindi (w/o LM) | Hindi (with LM) | Marathi (w/o LM) | Marathi (with LM) | Odia (w/o LM) | Odia (with LM) |
---|---|---|---|---|---|---|
Proposed LSTM transformer (Baseline) | 14.1 | 11.4 | 12.7 | 10.6 | 24.6 | 21.3 |
+ Synthetic data augmentation | 13.5 | 10.9 | 12.3 | 10.2 | 24.2 | 21.0 |
+ Semi-supervised training | 13.2 | 10.5 | 12.1 | 9.8 | 23.9 | 20.6 |
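
The absolute WER differences in the table above are small, so it can help to express them as relative reductions against the baseline row. The following is a minimal sketch of that arithmetic, not part of the article: the WER values are copied from the table, while the names `COLUMNS`, `BASELINE`, `AUGMENTED`, `SEMI_SUPERVISED`, `relative_reduction`, and `reductions` are hypothetical and chosen only for illustration.

```python
# WER (%) values copied from the table above, in column order.
# All identifiers here are illustrative assumptions, not from the article.
COLUMNS = [
    "Hindi w/o LM", "Hindi with LM",
    "Marathi w/o LM", "Marathi with LM",
    "Odia w/o LM", "Odia with LM",
]
BASELINE = [14.1, 11.4, 12.7, 10.6, 24.6, 21.3]
AUGMENTED = [13.5, 10.9, 12.3, 10.2, 24.2, 21.0]
SEMI_SUPERVISED = [13.2, 10.5, 12.1, 9.8, 23.9, 20.6]


def relative_reduction(base_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent: 100 * (base - new) / base."""
    return 100.0 * (base_wer - new_wer) / base_wer


def reductions(base: list[float], new: list[float]) -> dict[str, float]:
    """Map each table column to its relative WER reduction versus the baseline."""
    return {
        col: relative_reduction(b, n)
        for col, b, n in zip(COLUMNS, base, new)
    }


if __name__ == "__main__":
    rows = [
        ("+ Synthetic data augmentation", AUGMENTED),
        ("+ Semi-supervised training", SEMI_SUPERVISED),
    ]
    for name, row in rows:
        print(name)
        for col, value in reductions(BASELINE, row).items():
            print(f"  {col}: {value:.1f}% relative reduction")
```

For example, the Hindi result with LM moves from 11.4 to 10.5, which is an absolute drop of 0.9 points and a relative reduction of roughly 7.9%.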