A multi-threaded approach for improved and faster accent transcription of chemical terms
Sonali Kothari et al.
Apr 25, 2025
About this article
Article Category: Research article
Published Online: Apr 25, 2025
Received: Feb 05, 2025
DOI: https://doi.org/10.2478/ijssis-2025-0016
© 2025 Sonali Kothari et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Comparative results (in seconds)
audio 001 | 38.15 | 44.80 | 40.83
audio 002 | 70.97 | 79.53 | 79.83
audio 003 | 80.69 | 87.78 | 82.72
audio 004 | 54.86 | 62.19 | 59.21
audio 005 | 33.25 | 38.09 | 39.40
audio 006 | 40.93 | 58.66 | 53.68
audio 007 | 48.13 | 53.85 | 51.81
audio 008 | 33.49 | 38.68 | 35.13
audio 009 | 33.94 | 38.55 | 33.82
audio 010 | 48.95 | 54.28 | 50.15
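For readers who want to reproduce per-file comparisons of this kind, the timings can be collected with a simple wall-clock harness around each engine's transcription call. The sketch below is illustrative only: the engine names and `transcribe` callables are placeholders for the ASR back ends being compared, not the benchmarking code used in this study.

```python
# Illustrative timing harness (not the authors' benchmarking code).
# Each entry in `engines` is assumed to be a callable that takes an audio
# path and returns the transcribed text; replace it with real engine calls.
import time

def benchmark(engines, audio_files):
    """Return {audio_file: {engine_name: elapsed_seconds}}."""
    results = {}
    for path in audio_files:
        results[path] = {}
        for name, transcribe in engines.items():
            start = time.perf_counter()
            transcribe(path)  # full transcription of the file
            results[path][name] = round(time.perf_counter() - start, 2)
    return results

# Hypothetical usage (engine names and file names are placeholders):
# engines = {"engine_a": transcribe_a, "engine_b": transcribe_b}
# print(benchmark(engines, ["audio_001.wav", "audio_002.wav"]))
```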
Performance of existing AER systems over Indian accents
Indian Accent Support | Strong (multilingual model trained on diverse accents) | Varies (depends on fine-tuned dataset) | Good (Google has extensive Indian English training data)
Regional Variants (Hindi-English, Tamil-English, etc.) | Handles code-switching well | Requires specific fine-tuning for mixed languages | Decent but struggles with heavy accents
Noise Robustness | Strong (performs well in real-world noisy environments) | Moderate (depends on fine-tuned model) | Good (handles background noise effectively)
Spoken Speed Adaptability | Good (handles fast speech well) | Varies (pre-trained models sometimes struggle) | Good (adjusts well to fast-paced speech)
First meaningful transcription time (in seconds)
audio 001 | 38.15 | 44.80 | 3.00
audio 002 | 70.97 | 79.53 | 5.05
audio 003 | 80.69 | 87.78 | 4.33
audio 004 | 54.86 | 62.19 | 4.35
audio 005 | 33.25 | 38.09 | 2.87
audio 006 | 40.93 | 58.66 | 6.10
audio 007 | 48.13 | 53.85 | 3.05
audio 008 | 33.49 | 38.68 | 2.73
audio 009 | 33.94 | 38.55 | 2.51
audio 010 | 48.95 | 54.28 | 3.40
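The gap between total transcription time and first meaningful transcription time is what a chunked, multi-threaded pipeline provides: the opening seconds of audio can be transcribed and surfaced while later chunks are still being processed. The sketch below illustrates only that threading pattern; the chunk length, worker count, and the dummy `split_audio`/`transcribe_chunk` helpers are assumptions standing in for the actual splitter and ASR engine, not this paper's implementation.

```python
# Illustrative sketch of chunked, multi-threaded transcription (not the
# authors' implementation). The "ASR engine" here is a dummy that just sleeps;
# replace split_audio/transcribe_chunk with a real splitter and model call.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

CHUNK_SECONDS = 30   # assumed chunk length
NUM_WORKERS = 4      # assumed number of worker threads

def split_audio(duration_s, chunk_s=CHUNK_SECONDS):
    """Return (start, end) second ranges; a real version would slice the waveform."""
    return [(s, min(s + chunk_s, duration_s)) for s in range(0, duration_s, chunk_s)]

def transcribe_chunk(chunk):
    """Dummy stand-in for the ASR call: pretend each chunk takes ~1 s to decode."""
    time.sleep(1.0)
    return f"<text for {chunk[0]}-{chunk[1]} s>"

def transcribe_parallel(duration_s):
    chunks = split_audio(duration_s)
    texts = [None] * len(chunks)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        futures = {pool.submit(transcribe_chunk, c): i for i, c in enumerate(chunks)}
        for fut in as_completed(futures):
            i = futures[fut]
            texts[i] = fut.result()
            if i == 0:
                # "First meaningful transcription": the opening chunk is ready
                # within seconds, long before the remaining chunks finish.
                print("first chunk ready:", texts[0])
    return " ".join(texts)  # full transcript, reassembled in chunk order

print(transcribe_parallel(duration_s=180))
```

Whether threads or processes pay off depends on how much of the ASR back end runs in native code outside Python's GIL; a process pool is the usual fallback when it does not.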
Performance of existing AER systems for chemical term recognition
Chemical Terms Recognition | Limited (depends on general training data, not domain-specific) | Can be fine-tuned for better accuracy | Good (Google’s general corpus covers some scientific terms)
Adaptability to Scientific Jargon | Poor without custom fine-tuning | Can be trained on specialized datasets | Better but not perfect
Handling of Long & Complex Terms | Struggles with rare chemical names | Can be improved with domain-specific training | Sometimes recognizes common scientific terms but struggles with rare ones
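Because general-purpose models struggle with rare chemical names, a common remedy (not necessarily the one adopted here) is to post-correct the raw transcript against a chemical-term lexicon. The sketch below uses simple fuzzy string matching; the lexicon and similarity cutoff are illustrative assumptions.

```python
# Illustrative post-correction of ASR output against a chemical-term lexicon.
# The lexicon is a tiny hypothetical example; a real system would load a full
# vocabulary (e.g. IUPAC names) and tune the similarity cutoff.
from difflib import get_close_matches

CHEMICAL_LEXICON = ["benzene", "ethanol", "toluene", "acetone"]

def correct_chemical_terms(transcript, lexicon=CHEMICAL_LEXICON, cutoff=0.8):
    corrected = []
    for word in transcript.split():
        # Snap near-miss tokens (e.g. "benzine") to the closest lexicon entry.
        match = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

print(correct_chemical_terms("mix benzine with ethanol"))
# -> "mix benzene with ethanol"
```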
Stress testing (hours)
long audio01 | 1.144 | 1.299 | 1.144
long audio02 | 3.027 | 3.363 | 3.029