Journal Details
Format: Journal
eISSN: 1335-8871
First Published: 07 Mar 2008
Publication timeframe: 6 times per year
Languages: English
Access type: Open Access

Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

Published Online: 03 Sep 2009
Volume & Issue: Volume 9 (2009) - Issue 4 (August 2009)
Page range: 95 - 104

The paper addresses how microintonation and spectral properties are reflected in acted emotional speech of male and female speakers. The microintonation component of speech melody is analyzed with respect to its spectral and statistical parameters. According to psychological research on emotional speech, different emotions are accompanied by different spectral noise. We control the amount of this noise by the spectral flatness, according to which high-frequency noise is mixed into voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger), with a neutral state for comparison. The calculated histograms of the spectral flatness distribution are compared visually and modelled by the Gamma probability distribution. Histograms of the cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. The statistical results show good correlation between male and female voices for all emotional states, which were portrayed by several Czech and Slovak professional actors.
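The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition of spectral flatness as the ratio of the geometric to the arithmetic mean of the power spectrum, population-form skewness and excess kurtosis, and a simple method-of-moments Gamma fit; the fitting procedure used in the paper is not stated here. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum |X[k]|^2 for k = 0 .. N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Assumed definition: values near 1 indicate a noise-like (flat) spectrum,
    values near 0 a tonal one. eps avoids log(0) for empty bins.
    """
    p = [v + eps for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


def skewness_and_kurtosis(values):
    """Skewness and excess kurtosis from population central moments.

    Assumes the values are not all identical (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def gamma_moment_fit(values):
    """Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean ** 2 / var, var / mean


# Illustrative frames: a unit impulse has a flat spectrum, a sine is tonal.
impulse = [1.0] + [0.0] * 63
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
flat_impulse = spectral_flatness(impulse)
flat_tone = spectral_flatness(tone)

skew, kurt = skewness_and_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
shape, scale = gamma_moment_fit([1.0, 2.0, 3.0, 4.0, 5.0])
```

In such a setup, per-frame flatness values would be collected for each emotion into a histogram, and the fitted shape and scale would parameterize the Gamma curve drawn over it. The same skewness and kurtosis function would be applied to the cepstral coefficient values.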
