Open Access

A Bimodal Deep Model to Capture Emotions from Music Tracks

Mar 18, 2025


L. Smietanka and T. Maka, “Interpreting convolutional layers in DNN model based on time–frequency representation of emotional speech,” Journal of Artificial Intelligence and Soft Computing Research, vol. 14, no. 1, pp. 5–23, Jan. 2024, doi: 10.2478/jaiscr-2024-0001.

S. Sheykhivand, Z. Mousavi, T. Y. Rezaii, and A. Farzamnia, “Recognizing Emotions Evoked by Music Using CNN-LSTM Networks on EEG Signals,” IEEE Access, vol. 8, pp. 139332–139345, 2020, doi: 10.1109/ACCESS.2020.3011882.

Y. Takahashi, T. Hochin, and H. Nomiya, “Relationship between Mental States with Strong Emotion Aroused by Music Pieces and Their Feature Values,” in Proc. 2014 IIAI 3rd International Conference on Advanced Applied Informatics, 2014, pp. 718–725, doi: 10.1109/IIAIAAI.2014.147.

P. A. Wood and S. K. Semwal, “On exploring the connection between music classification and evoking emotion,” in Proc. 2015 International Conference on Collaboration Technologies and Systems (CTS), 2015, pp. 474–476, doi: 10.1109/CTS.2015.7210471.

M. Agapaki, E. A. Pinkerton, and E. Papatzikis, “Music and neuroscience research for mental health, cognition, and development: Ways forward,” Frontiers in Psychology, vol. 13, 2022, doi: 10.3389/fpsyg.2022.976883.

Y. Song, S. Dixon, M. Pearce, and A. Halpern, “Perceived and Induced Emotion Responses to Popular Music: Categorical and Dimensional Models,” Music Perception: An Interdisciplinary Journal, vol. 33, pp. 472–492, Apr. 2016, doi: 10.1525/mp.2016.33.4.472.

Y. Yuan, “Emotion of Music: Extraction and Composing,” Journal of Education, Humanities and Social Sciences, vol. 13, pp. 422–428, May 2023, doi: 10.54097/ehss.v13i.8207.

S. A. Sujeesha, J. B. Mala, and R. Rajeev, “Automatic music mood classification using multi-modal attention framework,” Engineering Applications of Artificial Intelligence, vol. 128, p. 107355, 2024, doi: 10.1016/j.engappai.2023.107355.

M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, “Music recommender systems,” in Recommender Systems Handbook, Springer, 2015, pp. 453–492.

MorphCast Technology. Available: https://www.morphcast.com. Accessed: November 2024.

S. Zhao, G. Jia, J. Yang, G. Ding, and K. Keutzer, “Emotion Recognition From Multiple Modalities: Fundamentals and methodologies,” IEEE Signal Processing Magazine, vol. 38, no. 6, pp. 59–73, Nov. 2021, doi: 10.1109/msp.2021.3106895.

T. Li, “Music emotion recognition using deep convolutional neural networks,” Journal of Computational Methods in Science and Engineering, vol. 24, no. 4–5, pp. 3063–3078, 2024, doi: 10.3233/JCM-247551.

P. L. Louro, H. Redinho, R. Malheiro, R. P. Paiva, and R. Panda, “A comparison study of deep learning methodologies for music emotion recognition,” Sensors, vol. 24, no. 7, p. 2201, 2024, doi: 10.3390/s24072201.

M. Blaszke, G. Korvel, and B. Kostek, “Exploring neural networks for musical instrument identification in polyphonic audio,” IEEE Intelligent Systems, pp. 1–11, 2024, doi: 10.1109/mis.2024.3392586.

M. Barata and P. Coelho, “Music Streaming Services: Understanding the drivers of customer purchase and intention to recommend,” Heliyon, vol. 7, p. e07783, Aug. 2021, doi: 10.1016/j.heliyon.2021.e07783.

J. Webster, “The promise of personalization: Exploring how music streaming platforms are shaping the performance of class identities and distinction,” New Media & Society, p. 146144482110278, Jul. 2021, doi: 10.1177/14614448211027863.

E. Schmidt, D. Turnbull, and Y. Kim, “Feature selection for content-based, time-varying musical emotion regression,” in Proc. ACM SIGMM Int. Conf. on Multimedia Information Retrieval, Mar. 2010, pp. 267–274, doi: 10.1145/1743384.1743431.

Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, I.-B. Liao, Y.-C. Ho, and H. H. Chen, “Toward Multimodal Music Emotion Classification,” in Advances in Multimedia Information Processing - PCM 2008, 2008, pp. 70–79.

T. Ciborowski, S. Reginis, D. Weber, A. Kurowski, and B. Kostek, “Classifying Emotions in Film Music—A Deep Learning Approach,” Electronics, vol. 10, no. 23, p. 2955, Nov. 2021, doi: 10.3390/electronics10232955.

X. Han, F. Chen, and J. Ban, “Music Emotion Recognition Based on a Neural Network with an Inception-GRU Residual Structure,” Electronics, vol. 12, no. 4, p. 978, Feb. 2023, doi: 10.3390/electronics12040978.

Y. J. Liao, W. C. Wang, S.-J. Ruan, Y. H. Lee, and S. C. Chen, “A Music Playback Algorithm Based on Residual-Inception Blocks for Music Emotion Classification and Physiological Information,” Sensors, vol. 22, no. 3, p. 777, Jan. 2022, doi: 10.3390/s22030777.

R. Sarkar, S. Choudhury, S. Dutta, A. Roy, and S. K. Saha, “Recognition of emotion in music based on deep convolutional neural network,” Multimedia Tools and Applications, vol. 79, pp. 765–783, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:254866914.

S. Giammusso, M. Guerriero, P. Lisena, E. Palumbo, and R. Troncy, “Predicting the emotion of playlists using track lyrics,” in Proc. International Society for Music Information Retrieval Conference (ISMIR), Late-Breaking Session, 2017.

Y. Agrawal, R. Shanker, and V. Alluri, “Transformer-based approach towards music emotion recognition from lyrics,” in Advances in Information Retrieval (ECIR 2021), Lecture Notes in Computer Science, vol. 12657, Springer, 2021, doi: 10.1007/978-3-030-72240-1_12.

D. Han, Y. Kong, J. Han, and G. Wang, “A survey of music emotion recognition,” Frontiers of Computer Science, vol. 16, Dec. 2022, doi: 10.1007/s11704-021-0569-4.

T. Baltrušaitis, C. Ahuja, and L.-P. Morency, “Multimodal Machine Learning: A Survey and Taxonomy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423–443, Feb. 2019, doi: 10.1109/TPAMI.2018.2798607.

R. Delbouys, R. Hennequin, F. Piccoli, J. Royo-Letelier, and M. Moussallam, “Music Mood Detection Based on Audio and Lyrics with Deep Neural Net,” in Proc. ISMIR 2018, doi: 10.48550/arXiv.1809.07276.

I. A. P. Santana et al., “Music4all: A new music database and its applications,” in Proc. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020, pp. 399–404, doi: 10.1109/IWSSIP48289.2020.9145170.

E. Çano and M. Morisio, “Moodylyrics: A sentiment annotated lyrics dataset,” in Proc. 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence, 2017, pp. 118–124, doi: 10.1145/3059336.3059340.

E. Çano and M. Morisio, “Music mood dataset creation based on Last.fm tags,” in Proc. 2017 International Conference on Artificial Intelligence and Applications, Vienna, Austria, 2017, pp. 15–26, doi: 10.5121/csit.2017.70603.

R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford University Press, 1989.

J. Russell, “A Circumplex Model of Affect,” Journal of Personality and Social Psychology, vol. 39, pp. 1161–1178, Dec. 1980, doi: 10.1037/h0077714.

Social music service - Last.fm. Available: https://www.last.fm/. Accessed: November 2024.

Genius - Song Lyrics & Knowledge. Available: https://genius.com/. Accessed: November 2024.

YouTube. Available: https://www.youtube.com. Accessed: November 2024.

M. Sakowicz and J. Tobolewski, “Development and study of an algorithm for the automatic labeling of musical pieces in the context of emotion evoked,” M.Sc. thesis, Gdansk University of Technology and Universitat Politècnica de Catalunya (co-supervised by B. Kostek and J. Turmo), 2023.

Genius and Spotify partnering. Available: https://genius.com/a/genius-and-spotify-together. Accessed: November 2024.

Pafy library. Available: https://pypi.org/project/pafy/. Accessed: November 2024.

Moviepy library. Available: https://pypi.org/project/moviepy/. Accessed: November 2024.

M. Honnibal and I. Montani, “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing,” 2017. Available: https://github.com/explosion/spaCy. Accessed: November 2024.

P. N. Johnson-Laird and K. Oatley, “Emotions, Simulation, and Abstract Art,” Art & Perception, vol. 9, no. 3, pp. 260–292, 2021, doi: 10.1163/22134913-bja10029.

P. N. Johnson-Laird and K. Oatley, “How poetry evokes emotions,” Acta Psychologica, vol. 224, p. 103506, 2022, doi: 10.1016/j.actpsy.2022.103506.

J. Pennington, R. Socher, and C. Manning, “GloVe: Global Vectors for Word Representation,” in Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Oct. 2014, pp. 1532–1543, doi: 10.3115/v1/D14-1162.

SpaCy - pre-trained pipeline for English. Available: https://spacy.io/models/en#en_core_web_lg. Accessed: November 2024.

S. Loria, “Textblob Documentation,” Release 0.15, vol. 2, 2018. Available: https://textblob.readthedocs.io/en/dev/. Accessed: November 2024.

F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825–2830, 2011. Available: http://jmlr.org/papers/v12/pedregosa11a.html. Accessed: November 2024.

Guns N’ Roses, “Paradise City” lyrics. Available: https://genius.com/Guns-n-roses-paradise-city-lyrics.

FastText - text classification tutorial. Available: https://fasttext.cc/docs/en/supervised-tutorial.html. Accessed: November 2024.

T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45, doi: 10.18653/v1/2020.emnlp-demos.6.

XLNet (base-sized model). Available: https://huggingface.co/xlnet-base-cased. Accessed: November 2024.

Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” Advances in Neural Information Processing Systems, vol. 32, 2019, doi: 10.48550/arXiv.1906.08237.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2818–2826, doi: 10.1109/CVPR.2016.308.

K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. 3rd International Conference on Learning Representations (ICLR 2015), 2015, pp. 1–14, doi: 10.48550/arXiv.1409.1556.

Librosa library. Available: https://librosa.org/. Accessed: November 2024.

F. Chollet et al., “Keras,” 2015. Available: https://github.com/fchollet/keras. Accessed: November 2024.

TensorFlow library. Available: https://www.tensorflow.org/. Accessed: November 2024.

S. C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, and M. Lungren, “Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines,” npj Digital Medicine, vol. 3, 2020, doi: 10.1038/s41746-020-00341-z.

A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019, pp. 8024–8035.

Combining two deep learning models. Available: https://control.com/technical-articles/combining-two-deep-learning-models/. Accessed: November 2024.

Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver, “TFMAP: Optimizing MAP for top-n context-aware recommendation,” in Proc. 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, Aug. 2012, pp. 155–164, doi: 10.1145/2348283.2348308.

K. Pyrovolakis, P. K. Tzouveli, and G. Stamou, “Multi-Modal Song Mood Detection with Deep Learning,” Sensors, vol. 22, no. 3, p. 1065, 2022, doi: 10.3390/s22031065.

E. N. Shaday, V. J. L. Engel, and H. Heryanto, “Application of the Bidirectional Long Short-Term Memory Method with Comparison of Word2Vec, GloVe, and FastText for Emotion Classification in Song Lyrics,” Procedia Computer Science, vol. 245, pp. 137–146, 2024, doi: 10.1016/j.procs.2024.10.237.

Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Databases and Data Mining