Volume 15 (2020): Issue 2 (December 2020)

Journal Details
License: Open Access
Format: Journal
eISSN: 1857-8462
First Published: 01 Jul 2005
Publication timeframe: 2 times per year
Languages: English

Automatic Speech Recognition: A Comprehensive Survey

Published Online: 06 Apr 2021
Volume & Issue: Volume 15 (2020) - Issue 2 (December 2020)
Page range: 86 - 112
Abstract

Speech recognition is an interdisciplinary subfield of natural language processing (NLP) that facilitates the recognition and translation of spoken language into text by machines. Speech recognition plays an important role in digital transformation. It is widely used in areas such as education, industry, and healthcare, and has recently been applied in many Internet of Things and machine learning applications. Speech recognition remains one of the most challenging problems in computer science. Despite extensive research in this domain, no optimal method for speech recognition has yet been found. This is because natural languages are characterized by many attributes, and every language has its own particular features. The aim of this research is to provide a comprehensive understanding of the various techniques in the speech recognition domain through a systematic literature review of existing work. We introduce the most significant and relevant techniques, which may suggest directions for future research.
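To make the transcription step concrete: one widely used technique in the end-to-end ASR literature this survey covers is connectionist temporal classification (CTC), in which an acoustic model emits one label (or a special blank symbol) per audio frame, and decoding collapses repeated labels and strips blanks to produce text. The following is a minimal illustrative sketch in Python; the blank symbol, the character labels, and the input sequence are hypothetical stand-ins for real model outputs, not material from the survey itself.

    # Minimal sketch of greedy CTC decoding (illustrative only; the "_"
    # blank token and the character labels are hypothetical model outputs).
    BLANK = "_"

    def greedy_ctc_decode(frame_labels):
        """Collapse consecutive repeated labels, then drop blanks."""
        collapsed = []
        prev = None
        for label in frame_labels:
            if label != prev:          # keep only the first label of each run
                collapsed.append(label)
            prev = label
        return "".join(l for l in collapsed if l != BLANK)

    # Per-frame argmax labels from a (hypothetical) acoustic model; the
    # blank between the two "l" runs is what preserves the double letter.
    print(greedy_ctc_decode(["h", "h", "_", "e", "l", "l", "_", "l", "o"]))  # -> "hello"

The blank symbol is the design choice that lets CTC distinguish a genuinely doubled letter (separated by a blank) from one letter held across several frames (collapsed as a repeat).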

