Decision-Making Enhancement in a Big Data Environment: Application of the K-Means Algorithm to Mixed Data

Big data research has become an important discipline in information systems research. However, the flood of data generated on the Internet is increasingly unstructured and non-numeric, taking the form of images and text. Research therefore indicates a growing need for more efficient algorithms that handle mixed data in big data settings to support effective decision making. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes on big data platforms. We first present an algorithm that handles the problem of mixed data. We then implement the algorithm on big data platforms and demonstrate its functionality in a detailed end-to-end case study. This provides a solid basis for more targeted profiling for decision making and research using big data. Consequently, decision makers will be able to treat mixed data (numerical and categorical) to explain and predict phenomena in the big data ecosystem. The case study demonstrates the procedure's capabilities and advantages, which improve the decision-making process by targeting organizations' business requirements to specific clusters or profiles based on the enhancement outcomes.
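
As an illustration of the general idea only, the sketch below shows one common way to make classical K-means applicable to mixed data: numeric attributes are min-max scaled and categorical attributes are one-hot encoded before clustering. This is an assumption made for illustration and is not necessarily the procedure presented in the paper; the records, attribute names (`age`, `spend`, `channel`), and helper functions are hypothetical, and the code runs on a single machine rather than on a big data platform.

```python
import random


def encode_mixed(records, numeric_keys, categorical_keys):
    """Convert mixed records to numeric vectors.

    Numeric attributes are min-max scaled to [0, 1]; categorical
    attributes are one-hot encoded (an illustrative choice).
    """
    bounds = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in numeric_keys}
    levels = {k: sorted({r[k] for r in records}) for k in categorical_keys}
    vectors = []
    for r in records:
        vec = []
        for k in numeric_keys:
            lo, hi = bounds[k]
            vec.append((r[k] - lo) / (hi - lo) if hi > lo else 0.0)
        for k in categorical_keys:
            vec.extend(1.0 if r[k] == level else 0.0 for level in levels[k])
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical customer records mixing numeric and categorical attributes.
customers = [
    {"age": 22, "spend": 150, "channel": "web"},
    {"age": 25, "spend": 180, "channel": "web"},
    {"age": 24, "spend": 160, "channel": "web"},
    {"age": 58, "spend": 900, "channel": "store"},
    {"age": 61, "spend": 950, "channel": "store"},
    {"age": 63, "spend": 980, "channel": "store"},
]
vectors = encode_mixed(customers, ["age", "spend"], ["channel"])
labels, centroids = kmeans(vectors, k=2)
```

In this toy data the two groups are well separated, so the younger web customers and the older in-store customers end up in different clusters. How categorical attributes are weighted relative to numeric ones is a design choice that strongly affects the resulting profiles on real data.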

Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Databases and Data Mining

Oded Koren

Carina Antonia Hallin

Nir Perel

Dror Bendet

Published Online: Aug 30, 2019

Page range: 293 - 302

Received: May 08, 2019

Accepted: Jul 25, 2019

DOI: https://doi.org/10.2478/jaiscr-2019-0010

Keywords: Big data, mixed data, Hadoop, K-means, decision making

© 2019 Oded Koren et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
