Open Access

Using the Bag-of-Audio-Words approach for emotion recognition

ABOUT THIS ARTICLE

References

[1] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, Proceedings of SODA, 2007, pp. 1027–1035, https://dl.acm.org/doi/10.5555/1283383.1283494

[2] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, R. C. Williamson, Estimating the support of a high-dimensional distribution, Neural Computation 13 (2001), 1443–1471, https://doi.org/10.1162/089976601750264965

[3] F. Burkhardt, M. Ballegooy, K. Engelbrecht, T. Polzehl, J. Stegmann, Emotion detection in dialog systems: applications, strategies and challenges, Proceedings of ACII, 2009, pp. 985–989, https://doi.org/10.1109/ACII.2009.5349498

[4] C. Chang, C. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 1–27, https://doi.org/10.1145/1961189.1961199

[5] G. Csurka, F. Perronnin, Fisher vectors: beyond bag-of-visual-words image representations, Computer Vision, Imaging and Computer Graphics. Theory and Applications 229 (2011), 28–42, https://doi.org/10.1007/978-3-642-25382-9_2

[6] L. Deng, J. Droppo, A. Acero, A Bayesian approach to speech feature enhancement using the dynamic cepstral prior, Proceedings of ICASSP 1 (2002), 829–832, https://doi.org/10.1109/ICASSP.2002.5743867

[7] F. Eyben, M. Wöllmer, B. Schuller, openSMILE: the Munich versatile and fast open-source audio feature extractor, Proceedings of ACM Multimedia, 2010, pp. 1459–1462, https://doi.org/10.1145/1873951.1874246

[8] G. Gosztolya, Using the Fisher vector representation for audio-based emotion recognition, Acta Polytechnica Hungarica 17 (2020), 7–23, https://doi.org/10.12700/APH.17.6.2020.6.1

[9] T. Grósz, L. Tóth, A comparison of deep neural network training methods for large vocabulary speech recognition, Proceedings of TSD, 2013, pp. 36–43, https://doi.org/10.1007/978-3-642-40585-3_6

[10] M. Hossain, G. Muhammad, Cloud-assisted speech and face recognition framework for health monitoring, Mobile Networks and Applications 20 (2015), 391–399, https://doi.org/10.1007/s11036-015-0586-3

[11] D. Issa, M. F. Demirci, A. Yazici, Speech emotion recognition with deep convolutional neural networks, Biomedical Signal Processing and Control 59 (2020), 101894, https://doi.org/10.1016/j.bspc.2020.101894

[12] J. James, L. Tian, C. I. Watson, An open source emotional speech corpus for human robot interaction applications, Proceedings of Interspeech, 2018, pp. 2768–2772, https://doi.org/10.21437/Interspeech.2018-1349

[13] C. Jones, J. Sutherland, Acoustic emotion recognition for affective computer gaming, in: Affect and Emotion in Human-Computer Interaction, Springer, 2008, https://doi.org/10.1007/978-3-540-85099-1

[14] F. Metze, A. Batliner, F. Eyben, T. Polzehl, B. Schuller, S. Steidl, Emotion recognition using imperfect speech recognition, Proceedings of Interspeech, 2010, pp. 478–481, https://doi.org/10.21437/Interspeech.2010-202

[15] S. Pancoast, M. Akbacak, Bag-of-audio-words approach for multimedia event classification, Proceedings of Interspeech, 2012, pp. 2105–2108, https://doi.org/10.21437/Interspeech.2012-561

[16] S. Pancoast, M. Akbacak, Softening quantization in bag-of-audio-words, Proceedings of ICASSP, 2014, pp. 1370–1374, https://doi.org/10.1109/ICASSP.2014.6853821

[17] J. Přibil, A. Přibilová, J. Matoušek, GMM-based speaker age and gender classification in Czech and Slovak, Journal of Electrical Engineering 68 (2017), 3–12, https://doi.org/10.1515/jee-2017-0001

[18] R. Pappagari, T. Wang, J. Villalba, N. Chen, N. Dehak, X-vectors meet emotions: a study on dependencies between emotion and speaker recognition, Proceedings of ICASSP, 2020, pp. 7169–7173, https://doi.org/10.1109/ICASSP40776.2020.9054317

[19] S. Rawat, P. Schulam, S. Burger, D. Ding, Y. Wang, F. Metze, Robust audio-codebooks for large-scale event detection in consumer videos, Proceedings of Interspeech, 2013, pp. 2929–2933, https://doi.org/10.21437/Interspeech.2013-654

[20] M. Schmitt, F. Ringeval, B. Schuller, At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech, Proceedings of Interspeech, 2016, pp. 495–499, https://doi.org/10.21437/Interspeech.2016-1124

[21] M. Schmitt, B. Schuller, openXBOW – Introducing the Passau open-source cross-modal bag-of-words toolkit, Journal of Machine Learning Research 18 (2017), 1–5, http://jmlr.org/papers/v18/17-113.html

[22] B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer, F. Ringeval, et al., The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism, Proceedings of Interspeech, 2013, pp. 148–152, https://doi.org/10.21437/Interspeech.2013-56

[23] D. Sztahó, V. Imre, K. Vicsi, Automatic classification of emotions in spontaneous speech, Proceedings of COST 2102, 2011, pp. 229–239, https://doi.org/10.1007/978-3-642-25775-9_23

[24] M. Swain, A. Routray, P. Kabisatpathy, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology 21 (2018), 93–120, https://doi.org/10.1007/s10772-018-9491-z

[25] K. Vicsi, D. Sztahó, Recognition of emotions on the basis of different levels of speech segments, Journal of Advanced Computational Intelligence and Intelligent Informatics 16 (2012), 335–340, https://doi.org/10.20965/jaciii.2012.p0335

[26] L. Vidrascu, L. Devillers, Detection of real-life emotions in call centers, Proceedings of Interspeech, 2005, pp. 1841–1844, https://doi.org/10.21437/Interspeech.2005-582

[27] T. Zhang, J. Wu, Speech emotion recognition with i-vector feature and RNN model, Proceedings of ChinaSIP, 2015, pp. 524–528, https://doi.org/10.1109/ChinaSIP.2015.7230458

[28] Y. Zhang, R. Jin, Z. Zhou, Understanding bag-of-words model: a statistical framework, International Journal of Machine Learning and Cybernetics 1 (2010), 43–52, https://doi.org/10.1007/s13042-010-0001-0

eISSN:
2066-7760
Language:
English
Publication frequency:
2 times per year
Journal subjects:
Computer Sciences, other