Performance analysis of speech enhancement using spectral gating with U-Net

eISSN: 1339-309X
Language: English

Frequency: 6 times per year
Journal subjects: Engineering, Introductions and Overviews, other

Published online: 21 Oct. 2023

Pages: 365 - 373

Received: 26 Jul. 2023

DOI: https://doi.org/10.2478/jee-2023-0044

Keywords: speech enhancement, spectral gating, deep neural network, U-Net, optimizers

© 2023 Jharna Agrawal et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
