Dynamic Voice Parameter Modifications in Speech Signals

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Engineering, Electrical Engineering, Automation, Mechanical Engineering, Production Technology, Process Engineering and Industrial Engineering

Filip Cristian George

Neghină Mihai

Published Online: Dec 30, 2021

Page range: 1 - 10

DOI: https://doi.org/10.2478/aucts-2021-0001

Keywords: Speech, Voice, Volume, Duration, Pitch, Timbre, Formants, Amplification, Attenuation, Phase Vocoder, Time-scaling, Pitch-shifting, Interpolation, Spectral envelope, Cepstrum

© 2021 Filip Cristian George et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
