Albanian Text Classification: Bag of Words Model and Word Analogies

eISSN: 1847-9375
Language: English

Publication frequency: 2 times per year
Journal subjects: Business and Economics, Business Management, Management, Organization, Corporate Governance, other, Mathematics and Statistics for Economists, Mathematics

Published online: 09 May 2019

Pages: 74-87

Received: 01 Dec 2017

Accepted: 22 Feb 2018

DOI: https://doi.org/10.2478/bsrj-2019-0006

Keywords: data mining, text classification, news articles, machine learning

© 2019 Arbana Kadriu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
