Volume 71 (2020): Issue 2 (April 2020)

Journal Details
License: Open Access
Format: Journal
eISSN: 1339-309X
First Published: 07 Jun 2011
Publication timeframe: 6 times per year
Languages: English

Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations

Published Online: 13 May 2020
Volume & Issue: Volume 71 (2020) - Issue 2 (April 2020)
Page range: 78 - 86
Received: 01 Oct 2019
