Volume 10 (2010): Issue 3 (June 2010)
Journal Details
Format: Journal
eISSN: 1335-8871
First Published: 07 Mar 2008
Publication timeframe: 6 times per year
Languages: English
Access type: Open Access

An Experiment with Evaluation of Emotional Speech Conversion by Spectrograms

Published Online: 16 Jun 2010
Volume & Issue: Volume 10 (2010) - Issue 3 (June 2010)
Page range: 72 - 77

The spectrogram is a useful tool for visual comparison of the quality of different types of emotional synthetic speech. This paper focuses on applying this method to the evaluation of sentences after spectral and prosodic modifications for emotional speech conversion. Experiments with spectrogram evaluation for several male and female speakers and four emotional states (joy, sadness, anger, and a neutral state) are also described.
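The spectrograms used in such a visual comparison come from standard short-time Fourier analysis of the speech signal. The sketch below is a minimal illustration of that step only, assuming `scipy` and a synthetic test tone in place of real original and emotionally converted utterances; it is not the authors' actual tool chain.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a speech sample: a tone with a slowly
# rising fundamental (a real evaluation would load the original
# and the emotionally converted utterances instead).
fs = 16000                          # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * (120 + 60 * t) * t)

# Short-time spectral analysis with frame sizes typical for speech:
# 32 ms windows (512 samples at 16 kHz) with 75 % overlap.
f, times, Sxx = spectrogram(signal, fs=fs, nperseg=512, noverlap=384)

# Log-magnitude spectrogram in dB, the form usually inspected visually.
Sxx_db = 10 * np.log10(Sxx + 1e-12)
print(Sxx_db.shape)                 # (frequency bins, time frames)
```

Comparing `Sxx_db` for a neutral utterance and its converted version (e.g. plotted side by side) makes differences in formant positions and spectral energy distribution directly visible, which is the basis of the evaluation described here.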
