This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Hinsvark, Arthur, et al. "Accented Speech Recognition: A Survey." arXiv:2104.10747, 2 June 2021. DOI: 10.48550/arXiv.2104.10747.
Droua-Hamdani, G., Selouani, S. A., and Boudraa, M. "Speaker-Independent ASR for Modern Standard Arabic: Effect of Regional Accents." International Journal of Speech Technology, vol. 15, no. 4, 2012, pp. 487–493. DOI: 10.1007/s10772-012-9146-4.
Vergyri, Dimitra, Lori Lamel, and Jean-Luc Gauvain. "Automatic Speech Recognition of Multiple Accented English Data." Proc. Interspeech, 2010, pp. 1652–1655. DOI: 10.21437/Interspeech.2010-477.
Lin, Zhaofeng, Tanvina Patel, and Odette Scharenborg. "Improving Whispered Speech Recognition Performance Using Pseudo-Whispered Based Data Augmentation." 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, 2023.
Chang, Jungwon, and Hosung Nam. "Exploring the Feasibility of Fine-Tuning Large-Scale Speech Recognition Models for Domain-Specific Applications: A Case Study on Whisper Model and KsponSpeech Dataset." Phonetics and Speech Sciences, vol. 15, no. 3, 2023, pp. 83–88.
Hock, Kia Siang, and Li Lingxia. "Automated Processing of Massive Audio/Video Content Using FFmpeg." Code4Lib Journal, no. 23, 2014.
Swain, M. C., and Cole, J. M. "ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature." Journal of Chemical Information and Modeling, vol. 56, no. 10, 2016, pp. 1894–1904. DOI: 10.1021/acs.jcim.6b00207.
Kothari, S., Chiwhane, S., Satya, R., Ansari, M. A., Mehta, S., Naranatt, P., and Karthikeyan, M. "Fine-Tuning ASR Model Performance on Indian Regional Accents for Accurate Chemical Term Prediction in Audio." International Journal of Intelligent Systems and Applications in Engineering, vol. 11, no. 4, 2023, pp. 485–494. https://ijisae.org/index.php/IJISAE/article/view/3583.
Phillips, S., and Rogers, A. International Journal of Parallel Programming, vol. 27, no. 4, 1999, pp. 257–288. DOI: 10.1023/a:1018741730355.
Dong, Qianqian, et al. "Learning When to Translate for Streaming Speech." arXiv:2109.07368, 15 Sept. 2021. DOI: 10.48550/arXiv.2109.07368. Accessed 4 Feb. 2024.
Krallinger, M., Rabal, O., Leitner, F., et al. "The CHEMDNER Corpus of Chemicals and Drugs and Its Annotation Principles." Journal of Cheminformatics, vol. 7, suppl. 1, S2, 2015. DOI: 10.1186/1758-2946-7-S1-S2.
Chong, Jike, Gerald Friedland, Adam Janin, Nelson Morgan, and Chris Oei. "Opportunities and Challenges of Parallelizing Speech Recognition." 2010, pp. 2–2.
Saito, Takashi. "A Framework of Human-Based Speech Transcription with a Speech Chunking Front-End." 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), IEEE, 2015.
Jorge, Javier, et al. "Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models." IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, 2021, pp. 148–161.
Perero-Codosero, Juan M., et al. "Exploring Open-Source Deep Learning ASR for Speech-to-Text TV Program Transcription." Proc. IberSPEECH, 2018.
Radford, A., et al. "Robust Speech Recognition via Large-Scale Weak Supervision." arXiv preprint arXiv:2212.04356, 2022. https://arxiv.org/abs/2212.04356.
Baevski, A., H. Zhou, A. Mohamed, and M. Auli. "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations." Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 12449–12460. https://arxiv.org/abs/2006.11477.
Google Cloud. "Speech-to-Text API." 2023. https://cloud.google.com/speech-to-text.
Jyothi, P., and M. Hasegawa-Johnson. "Acoustic Model Adaptation for Indian English Speech Recognition." Proc. Interspeech, 2015, pp. 1565–1569.
Gupta, A., P. K. Ghosh, and H. A. Murthy. "Automatic Speech Recognition for Indian Accents: A Survey." IEEE Access, vol. 10, 2022, pp. 59347–59365. DOI: 10.1109/ACCESS.2022.3179123.
Manjunath, S., and K. R. Ramakrishnan. "Domain-Specific Speech Recognition: Challenges and Solutions." IEEE Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp. 431–445.
Rao, A., R. Patel, and M. S. Deshpande. "Performance Evaluation of ASR Systems for Indian Accents Using Deep Learning Techniques." 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), IEEE, 2021, pp. 678–683.
Setty, S., S. R. Patil, and N. Gupta. "Speech Recognition for Chemistry Terminology Using Deep Learning." IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, 2022, pp. 3456–3465.