Volume 117 (2020): Issue 1 (January 2020)
Journal Details
Format: Journal
eISSN: 2353-737X
First Published: 20 May 2020
Publication timeframe: 1 time per year
Languages: English
Access type: Open Access

An evaluation of machine learning and latent semantic analysis in text sentiment classification

Published Online: 01 Oct 2020
Volume & Issue: Volume 117 (2020) - Issue 1 (January 2020)
Page range: -
Received: 20 Jun 2020
Accepted: 22 Sep 2020
Abstract

In this paper, we compare the following machine learning methods as classifiers for sentiment analysis: k-nearest neighbours (kNN), artificial neural network (ANN), support vector machine (SVM), and random forest. We used a dataset of 5,000 movie reviews, of which 2,500 were labelled positive and 2,500 negative. We selected 5,189 words that influence sentence sentiment. The dataset was prepared using a term-document matrix (TDM) and classical multidimensional scaling (MDS). This is the first time that TDM and MDS have been used to select text features for sentiment analysis. We also examined different parameters of each classifier, such as the kernel type for SVM and the neighbour count for kNN. All calculations were performed in the R language, in RStudio v 3.5.2. Our work can be reproduced because all of our datasets and source code are public.
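The pipeline described in the abstract (term counts, classical MDS, then a classifier with a tunable parameter) can be illustrated with a minimal sketch. This is not the authors' code: the paper used R, while this sketch uses plain Python as a stand-in. The six-review corpus, the two-word vocabulary, the power-iteration eigen solver, and the leave-one-out evaluation are all assumptions chosen to keep the example small and self-contained.

```python
import math
from collections import Counter

def term_document_matrix(docs, vocabulary):
    """Count vocabulary terms per document (stored one row per document)."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([float(counts[term]) for term in vocabulary])
    return rows

def squared_distances(rows):
    """Pairwise squared Euclidean distances between document vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in rows] for u in rows]

def classical_mds(d2, dims=2, iters=500):
    """Classical (Torgerson) MDS: double-centre D^2, keep the top eigenpairs.

    Assumes Euclidean input distances, so the centred matrix has no
    negative eigenvalues. Eigenpairs come from power iteration with
    deflation, a simple substitute for a library eigen solver.
    """
    n = len(d2)
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # B = -1/2 * J * D^2 * J, where J is the centring matrix
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [1.0 / (i + 1 + d) for i in range(n)]  # arbitrary start vector
        eigenvalue = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][d] = scale * v[i]
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]  # deflate this eigenpair
    return coords

def knn_loo_accuracy(points, labels, k):
    """Leave-one-out accuracy of a majority-vote kNN classifier."""
    correct = 0
    for i, p in enumerate(points):
        others = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )
        votes = Counter(label for _, label in others[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(points)

# Hypothetical toy corpus; the real study used 5,000 reviews and 5,189 words.
docs = ["good good good", "good good film", "good good bad",
        "bad bad bad", "bad bad film", "good bad bad"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
vocabulary = ["good", "bad"]

tdm = term_document_matrix(docs, vocabulary)
d2 = squared_distances(tdm)
coords = classical_mds(d2, dims=2)

for k in (1, 3):
    print(f"k={k}: leave-one-out accuracy = {knn_loo_accuracy(coords, labels, k):.2f}")
```

Because the toy vectors are two-dimensional, a two-dimensional MDS embedding preserves every pairwise distance here. On a real corpus the embedding would have far fewer dimensions than the vocabulary, so distances would only be approximated, and the neighbour count would need to be tuned on held-out data.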

References

Agarwal, B., & Mittal, N. (2016). Machine Learning Approach for Sentiment Analysis. In Prominent Feature Extraction for Sentiment Analysis (pp. 21–45). Springer, Cham. https://doi.org/10.1007/978-3-319-25343-5_3

Andrew L. Maas, R. E. (2011). Learning Word Vectors for Sentiment Analysis. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–150.

Borg, I., & Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media.

Burrell, J. (2016). How the machine ‘thinks’: Understanding. Big Data & Society, 1–12. https://doi.org/10.1177/2053951715622512

Cox, M. A., & Cox, T. F. (2008). Multidimensional scaling. In Handbook of Data Visualization (pp. 315–347). Berlin, Heidelberg: Springer.

D. Tang, F. W. (2014). Learning Sentiment-Specific Word Embedding. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1, 1555–1565.

Dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 69–78.

Jifara, W., Jiang, F., Rho, S., Cheng, M., & Liu, S. (2019). Medical image denoising using convolutional neural network: a residual learning approach. The Journal of Supercomputing, 704–718. https://doi.org/10.1007/s11227-017-2080-0

Krouska, A., Troussas, C., & Virvou, M. (2016). The effect of preprocessing techniques on Twitter sentiment analysis. 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA), IEEE, 1–6. https://doi.org/10.1109/IISA.2016.7785373

Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical approach. Psychometrika. https://doi.org/10.1007/BF02289694

Kruskal, J. B. (1978). Multidimensional scaling. Sage. https://doi.org/10.4135/9781412985130

Mattila, M., & Salman, H. (2018). Analysing Social Media Marketing on Twitter using Sentiment Analysis. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229787 (access: 20/06/2020).

Miazga, J., & Hachaj, T. (n.d.). Datasets and source code used in this article. Retrieved from https://github.com/JusMia/sentimentanalysis_ML (August 20, 2020).

Ramteke, J., Shah, S., Godhia, D., & Shaikh, A. (2016). Election result prediction using Twitter sentiment analysis. 2016 International Conference on Inventive Computation Technologies (ICICT), Vol. 1, IEEE, 1–5. https://doi.org/10.1109/INVENTIVE.2016.7823280

Salminen, J., Yoganathan, V., Corporan, J., Jansen, B. J., & Jung, S.-G. (2019). Machine learning approach to auto-tagging online content for content marketing efficiency: A comparative analysis between methods and content type. Journal of Business Research, 203–217. https://doi.org/10.1016/j.jbusres.2019.04.018

Santra, A. K. (2012). Genetic Algorithm and Confusion Matrix for Document Clustering. International Journal of Computer Science Issues (IJCSI), 9(1), 322–328.

Sebastiani, F. (2002). Consiglio Nazionale Delle Ricerche. Machine learning in automated text categorization. ACM Computing Surveys, 34, 1–47. https://doi.org/10.1145/505282.505283

Shimodaira, H., Noma, K.-I., Nakai, M., & Sagayama, S. (2002). Dynamic Time-Alignment Kernel in Support Vector Machine. Advances in Neural Information Processing Systems, 21–928.

Soucy, P. &. (2005, July). Beyond TFIDF weighting for text categorization in the vector space model. IJCAI, 5, 1130–1135.

Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications, 57, 117–126. https://doi.org/10.1016/j.eswa.2016.03.028

Trsteniak, B., Mikac, S., & Donko, D. (2014). KNN with TF-IDF based Framework for Text Categorization. Procedia Engineering, 69, 1356–1364. https://doi.org/10.1016/j.proeng.2014.03.129

Wang, X., Zhang, C., Ji, Y., Sun, L., Wu, L., & Bao, Z. (2013). A depression detection model based on sentiment analysis in micro-blog social network. Pacific-Asia Conference on Knowledge Discovery and Data Mining, 201–213. https://doi.org/10.1007/978-3-642-40319-4_18

Yan, B. Y. (2017). Microblog sentiment classification using parallel SVM in apache spark. 2017 IEEE International Congress on Big Data (BigData Congress), IEEE, 282–288. https://doi.org/10.1109/BigDataCongress.2017.43
