Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining

Shuo Xu

Yuefu Zhang

Xin An

Sainan Pi

Article category: Research Papers

Published online: 27 May 2024

Pages: 81-103

Received: 05 Nov 2023

Accepted: 26 Feb 2024

DOI: https://doi.org/10.2478/jdis-2024-0014

Keywords: Multi-label classification, Real-world datasets, Hierarchical structure, Classification system, Label correlation, Machine learning

© 2024 Shuo Xu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
