Published online: 03 Jul 2025
Pages: 84 - 99
Received: 19 Jun 2024
Accepted: 04 Jan 2025
DOI: https://doi.org/10.2478/crdj-2025-0006
© 2025 Karlo Borovčak et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The intersection of artificial intelligence and music technology is creating new possibilities for cultural preservation and innovation. This study builds on that technology by optimising deep learning models for accurate instrument classification, thereby contributing to advances in music recognition, database organisation, and educational transcription tasks. Using the IRMAS dataset, we evaluated several neural network architectures, including DenseNet121, ResNet-50, and ConvNeXt, trained on log-Mel spectrograms of segmented audio clips to capture the unique acoustic features of each instrument. Results indicate that DenseNet121 achieved the highest classification accuracy, outperforming the other models in precision, recall, and F1-score. However, the models struggled to recognise instruments with fewer training samples, such as the clarinet and cello, underscoring the importance of balanced datasets. While data augmentation techniques only partially addressed this class imbalance, the findings offer valuable insights into designing robust music processing systems, highlighting areas for improvement in feature extraction and data handling. This study contributes to the development of AI-driven tools in music, offering potential benefits for cultural and educational growth.
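As a rough illustration of the pipeline the abstract describes, the sketch below converts a fixed-length audio excerpt into a log-Mel spectrogram and feeds it to a DenseNet121 classifier resized for the 11 IRMAS instrument classes. The sample rate, FFT and hop sizes, Mel-band count, and the file path are illustrative assumptions rather than the settings reported in the paper; only the overall approach (log-Mel features plus a CNN backbone) comes from the abstract.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

# The 11 instrument codes used by the IRMAS dataset; its training
# excerpts are short fixed-length clips (3 s).
IRMAS_CLASSES = ["cel", "cla", "flu", "gac", "gel", "org",
                 "pia", "sax", "tru", "vio", "voi"]

def log_mel_spectrogram(path, sr=22050, n_mels=128, duration=3.0):
    """Load a fixed-length clip and convert it to a log-Mel spectrogram.

    All parameter values here are assumed defaults, not the paper's.
    """
    y, sr = librosa.load(path, sr=sr, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=512, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)

def build_model(num_classes=len(IRMAS_CLASSES)):
    """DenseNet121 with its classifier head resized for instrument labels."""
    model = models.densenet121(weights=None)
    model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    return model

# Example inference on one excerpt ("irmas_sample.wav" is a hypothetical path).
spec = log_mel_spectrogram("irmas_sample.wav")
x = torch.from_numpy(spec).float().unsqueeze(0)  # (1, n_mels, frames)
x = x.repeat(3, 1, 1).unsqueeze(0)               # tile to 3 channels for DenseNet
model = build_model().eval()
with torch.no_grad():
    logits = model(x)
print(IRMAS_CLASSES[logits.argmax(dim=1).item()])
```

Tiling the single-channel spectrogram to three channels is one common way to reuse image backbones for audio; replacing the first convolution with a single-channel one is an equally valid alternative.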