Open Access

An Intelligent Framework for Person Identification Using Voice Recognition and Audio Data Classification


References

[1] D.S. Park, W. Chan, Y. Zhang, C.C. Chiu, B. Zoph, E.D. Cubuk, and Q.V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” arXiv preprint arXiv:1904.08779, 2019. https://doi.org/10.48550/arXiv.1904.08779

[2] T. Fukuda, O. Ichikawa, and M. Nishimura, “Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition,” Speech Communication, vol. 98, pp. 95–103, Apr. 2018. https://doi.org/10.1016/j.specom.2018.01.008

[3] M. Wickert, “Real-time digital signal processing using pyaudio_helper and the ipywidgets,” in Proceedings of the 17th Python in Science Conference, Austin, TX, USA, Jul. 2018, pp. 9–15. https://doi.org/10.25080/Majora-4af1f417-00e

[4] A. Srivastava and S. Maheshwari, “Signal denoising and multiresolution analysis by discrete wavelet transform,” Innovative Trends in Applied Physical, Chemical, Mathematical Sciences and Emerging Energy Technology for Sustainable Development, 2015.

[5] J.P. Dron and F. Bolaers, “Improvement of the sensitivity of the scalar indicators (crest factor, kurtosis) using a de-noising method by spectral subtraction: application to the detection of defects in ball bearings,” Journal of Sound and Vibration, vol. 270, no. 1–2, pp. 61–73, Feb. 2004. https://doi.org/10.1016/S0022-460X(03)00483-8

[6] E. Eban, A. Jansen, and S. Chaudhuri, “Filtering wind noises in video content,” U.S. Patent Application 15/826,622, Mar. 22, 2018.

[7] B.B. Ali, W. Wojcik, O. Mamyrbayev, M. Turdalyuly, and N. Mekebayev, “Speech recognizer-based non-uniform spectral compression for robust MFCC feature extraction,” Przeglad Elektrotechniczny, vol. 94, no. 6, pp. 90–93, Jun. 2018. https://doi.org/10.15199/48.2018.06.17

[8] Ç.P. Dautov and M.S. Özerdem, “Wavelet transform and signal denoising using Wavelet method,” in 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, May 2018, pp. 1–4. https://doi.org/10.1109/SIU.2018.8404418

[9] R. Liu, L.O. Hall, K.W. Bowyer, D.B. Goldgof, R. Gatenby, and K.B. Ahmed, “Synthetic minority image over-sampling technique: How to improve AUC for glioblastoma patient survival prediction,” in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, Oct. 2017, pp. 1357–1362. https://doi.org/10.1109/SMC.2017.8122802

[10] R. Lotfian and C. Busso, “Over-sampling emotional speech data based on subjective evaluations provided by multiple individuals,” IEEE Transactions on Affective Computing, vol. 12, no. 4, pp. 870–882, Feb. 2019. https://doi.org/10.1109/TAFFC.2019.2901465

[11] I. Khan, A. Ullah, and S.M. Emad, “Robust feature extraction techniques in speech recognition: A comparative analysis,” KIET Journal of Computing and Information Sciences, vol. 2, no. 2, p. 11, 2019.

[12] E. Mulyanto, E.M. Yuniarno, and M.H. Purnomo, “Adding an emotions filter to Javanese text-to-speech system,” in 2018 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM), Surabaya, Indonesia, Nov. 2018, pp. 142–146. https://doi.org/10.1109/CENIM.2018.8711229

[13] H. Liao, G. Pundak, O. Siohan, M.K. Carroll, N. Coccaro, Q.M. Jiang, T.N. Sainath, A. Senior, F. Beaufays, and M. Bacchiani, “Large vocabulary automatic speech recognition for children,” in Sixteenth Annual Conference of the International Speech Communication Association, Dresden, Germany, Sep. 2015, pp. 1611–1615. https://doi.org/10.21437/Interspeech.2015-373

[14] K.E. Kafoori and S.M. Ahadi, “Robust recognition of noisy speech through partial imputation of missing data,” Circuits, Systems, and Signal Processing, vol. 37, no. 4, pp. 1625–1648, Apr. 2018. https://doi.org/10.1007/s00034-017-0616-4

[15] H.F.C. Chuctaya, R.N.M. Mercado, and J.J.G. Gaona, “Isolated automatic speech recognition of Quechua numbers using MFCC, DTW and KNN,” International Journal of Advanced Computer Science and Applications, vol. 9, no. 10, pp. 24–29, 2018. https://doi.org/10.14569/IJACSA.2018.091003

[16] A. Winursito, R. Hidayat, A. Bejo, and M.N.Y. Utomo, “Feature data reduction of MFCC using PCA and SVD in speech recognition system,” in 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Shah Alam, Malaysia, Jul. 2018, pp. 1–6. https://doi.org/10.1109/ICSCEE.2018.8538414

[17] L.N. Thu, A. Win, and H.N. Oo, “A review for reduction of noise by wavelet transform in audio signals,” International Research Journal of Engineering and Technology (IRJET), vol. 6, no. 5, May 2019.

[18] Y. Luo and N. Mesgarani, “TasNet: Time-domain audio separation network for real-time, single-channel speech separation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 696–700. https://doi.org/10.1109/ICASSP.2018.8462116

[19] E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, “SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory,” Knowledge and Information Systems, vol. 33, no. 2, pp. 245–265, Nov. 2012. https://doi.org/10.1007/s10115-011-0465-6

[20] L. Abdi and S. Hashemi, “To combat multi-class imbalanced problems by means of over-sampling techniques,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 1, pp. 238–251, Jul. 2015. https://doi.org/10.1109/TKDE.2015.2458858

[21] A.E. Martin, “A compositional neural architecture for language,” Journal of Cognitive Neuroscience, vol. 32, no. 8, pp. 1407–1427, Aug. 2020. https://doi.org/10.1162/jocn_a_01552

[22] S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and G. Widmer, “Madmom: A new Python audio and music signal processing library,” in Proceedings of the 24th ACM International Conference on Multimedia, Oct. 2016, pp. 1174–1178. https://doi.org/10.1145/2964284.2973795

[23] Z. Wang, “Audio processing method and apparatus based on artificial intelligence,” Baidu Online Network Technology (Beijing) Co., Ltd., U.S. Patent Application 10/192163, 2019.

[24] J.P. Cunningham and Z. Ghahramani, “Linear dimensionality reduction: Survey, insights, and generalizations,” The Journal of Machine Learning Research, vol. 16, no. 1, pp. 2859–2900, 2015. https://stat.columbia.edu/~cunningham/pdf/CunninghamJMLR2015.pdf

[25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015. https://doi.org/10.1038/nature14539

[26] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, Mar. 2016, pp. 4960–4964. https://doi.org/10.1109/ICASSP.2016.7472621

[27] J.K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” in Advances in Neural Information Processing Systems, Proceedings of the Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada, 2015, pp. 577–585. https://proceedings.neurips.cc/paper/2015/file/1068c6e4c8051cfd4e9ea8072e3189e2-Paper.pdf

[28] V.Z. Këpuska and H.A. Elharati, “Robust speech recognition system using conventional and hybrid features of MFCC, LPCC, PLP, RASTA-PLP and hidden Markov model classifier in noisy conditions,” Journal of Computer and Communications, vol. 3, no. 6, pp. 1–9, Jun. 2015. https://doi.org/10.4236/jcc.2015.36001

[29] Ç.P. Dautov and M.S. Özerdem, “Wavelet transform and signal denoising using Wavelet method,” in 2018 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey, May 2018, pp. 1–4. https://doi.org/10.1109/SIU.2018.8404418

[30] N.V. Chawla, K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002. https://doi.org/10.1613/jair.953

[31] Z. Tüske, R. Schlüter, and H. Ney, “Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, Apr. 2018, pp. 4859–4863. https://doi.org/10.1109/ICASSP.2018.8461871

[32] R. Shadiev, T.T. Wu, A. Sun, and Y.M. Huang, “Applications of speech-to-text recognition and computer-aided translation for facilitating cross-cultural learning through a learning activity: issues and their solutions,” Educational Technology Research and Development, vol. 66, no. 1, pp. 191–214, Feb. 2018. https://doi.org/10.1007/s11423-017-9556-8

eISSN: 2255-8691
Language: English
Publication timeframe: 2 times per year
Journal subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development