This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in Neural Information Processing Systems, 33:12449–12460, 2020.
Anton Ragni, Kate M Knill, Shakti P Rath, and Mark JF Gales. Data augmentation for low resource languages. In INTERSPEECH 2014: 15th Annual Conference of the International Speech Communication Association, pages 810–814. International Speech Communication Association (ISCA), 2014.
Shiyu Zhou, Shuang Xu, and Bo Xu. Multilingual end-to-end speech recognition with a single transformer on low-resource languages. arXiv preprint arXiv:1806.05059, 2018.
Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, and Bo Xu. Applying wav2vec 2.0 to speech recognition in various low-resource languages. arXiv preprint arXiv:2012.12121, 2020.
Satwinder Singh, Ruili Wang, and Feng Hou. Improved meta learning for low resource speech recognition. In ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4798–4802. IEEE, 2022.
Ankit Kumar and Rajesh Kumar Aggarwal. A hybrid CNN-LiGRU acoustic modeling using raw waveform SincNet for Hindi ASR. Computer Science, 21(4), 2020.
A Kumar, T Choudhary, M Dua, and M Sabharwal. Hybrid end-to-end architecture for Hindi speech recognition system. In Proceedings of the International Conference on Paradigms of Communication, Computing and Data Sciences: PCCDS 2021, pages 267–276. Springer, 2022.
Ankit Kumar and Rajesh K Aggarwal. An investigation of multilingual TDNN-BLSTM acoustic modeling for Hindi speech recognition. International Journal of Sensors Wireless Communications and Control, 12(1):19–31, 2022.
Ali Bou Nassif, Ismail Shahin, Imtinan Attili, Mohammad Azzeh, and Khaled Shaalan. Speech recognition using deep neural networks: A systematic review. IEEE Access, 7:19143–19165, 2019.
Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, and Martijn Wieling. Making more of little data: Improving low-resource automatic speech recognition using data augmentation. arXiv preprint arXiv:2305.10951, 2023.
Ankit Kumar and Rajesh Kumar Aggarwal. An exploration of semi-supervised and language-adversarial transfer learning using hybrid acoustic model for Hindi speech recognition. Journal of Reliable Intelligent Environments, 8(2):117–132, 2022.
Jacob Kahn, Ann Lee, and Awni Hannun. Self-training for end-to-end speech recognition. In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7084–7088. IEEE, 2020.
Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, and Takaaki Hori. Momentum pseudo-labeling for semi-supervised speech recognition. arXiv preprint arXiv:2106.08922, 2021.
Daniel S Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, and Quoc V Le. Improved noisy student training for automatic speech recognition. arXiv preprint arXiv:2005.09629, 2020.
Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, and Yonghong Yan. Alternative pseudo-labeling for semi-supervised automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
Julia Mainzinger. Fine-tuning ASR models for very low-resource languages: A study on Mvskoke. Master's thesis, University of Washington, 2024.
Robert Jimerson, Zoey Liu, and Emily Prud'hommeaux. An (unhelpful) guide to selecting the best ASR architecture for your under-resourced language. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1008–1016, 2023.
Shiyue Zhang, Ben Frey, and Mohit Bansal. How can NLP help revitalize endangered languages? A case study and roadmap for the Cherokee language. arXiv preprint arXiv:2204.11909, 2022.
Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, et al. Scaling speech technology to 1,000+ languages. Journal of Machine Learning Research, 25(97):1–52, 2024.
Marieke Meelen, Alexander O'Neill, and Rolando Coto-Solano. End-to-end speech recognition for endangered languages of Nepal. In Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 83–93, 2024.
Panji Arisaputra, Alif Tri Handoyo, and Amalia Zahra. XLS-R deep learning model for multilingual ASR on low-resource languages: Indonesian, Javanese, and Sundanese. arXiv preprint arXiv:2401.06832, 2024.
Siqing Qin, Longbiao Wang, Sheng Li, Jianwu Dang, and Lixin Pan. Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling. EURASIP Journal on Audio, Speech, and Music Processing, 2022(1):2, 2022.
Kaushal Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, and Mitesh M Khapra. Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. In ICASSP 2023–2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
Zoey Liu, Justin Spence, and Emily Prud'hommeaux. Studying the impact of language model size for low-resource ASR. In Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 77–83, 2023.
Gueorgui Pironkov, Sean UN Wood, and Stéphane Dupont. Hybrid-task learning for robust automatic speech recognition. Computer Speech & Language, 64:101103, 2020.
Mohamed Tamazin, Ahmed Gouda, and Mohamed Khedr. Enhanced automatic speech recognition system based on enhancing power-normalized cepstral coefficients. Applied Sciences, 9(10):2166, 2019.
Syed Shahnawazuddin, KT Deepak, Gayadhar Pradhan, and Rohit Sinha. Enhancing noise and pitch robustness of children's ASR. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5225–5229. IEEE, 2017.
Jiri Malek, Jindrich Zdansky, and Petr Cerva. Robust automatic recognition of speech with background music. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5210–5214. IEEE, 2017.
Sheng-Chieh Lee, Jhing-Fa Wang, and Miao-Hia Chen. Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. Sensors, 18(7):2068, 2018.
Satyender Jaglan, Sanjeev Kumar Dhull, and Krishna Kant Singh. Tertiary wavelet model based automatic epilepsy classification system. International Journal of Intelligent Unmanned Systems, 11(1):166–181, 2023.
Yuzong Liu and Katrin Kirchhoff. Graph-based semisupervised learning for acoustic modeling in automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(11):1946–1956, 2016.
Michael I Mandel and Jon Barker. Multichannel spatial clustering for robust far-field automatic speech recognition in mismatched conditions. In INTERSPEECH, pages 1991–1995, 2016.
Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, and Hiroshi G Okuno. Automatic speech recognition for mixed dialect utterances by mixing dialect language models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2):373–382, 2015.
Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, and Li Liu. Deep learning for scene classification: A survey. arXiv preprint arXiv:2101.10531, 2021.
Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, and Vivek Raghavan. Vakyansh: ASR toolkit for low resource Indic languages. arXiv preprint arXiv:2203.16512, 2022.
Jaehyeon Kim, Sungwon Kim, Jungil Kong, and Sungroh Yoon. Glow-TTS: A generative flow for text-to-speech via monotonic alignment search. Advances in Neural Information Processing Systems, 33:8067–8077, 2020.
Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems, 33:17022–17033, 2020.
Lori Lamel, Jean-Luc Gauvain, and Gilles Adda. Lightly supervised and unsupervised acoustic model training. Computer Speech & Language, 16(1):115–129, 2002.
Ho Yin Chan and Phil Woodland. Improving broadcast news transcription by lightly supervised discriminative training. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages I–737. IEEE, 2004.
Vimal Manohar, Hossein Hadian, Daniel Povey, and Sanjeev Khudanpur. Semi-supervised training of acoustic models using lattice-free MMI. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4844–4848. IEEE, 2018.
Thiago Fraga-Silva, Jean-Luc Gauvain, and Lori Lamel. Lattice-based unsupervised acoustic model training. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4656–4659. IEEE, 2011.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30:5998–6008, 2017.