Open Access

Applications and Challenges of Statistics in Large-Scale Data Mining

About this article

As mathematical statistics evolves, it is being applied across an increasingly wide range of fields. This study examines specific challenges that arise when statistical methods are applied to data mining. Combining theoretical frameworks with practical examples, it explores how statistical methods are used in data mining. The K-Means clustering algorithm is improved in two ways: the initial cluster centers are optimized, and a Gini index-based weighting scheme is integrated. The improved algorithm is then applied to behavioral data from university students to segment them into behavioral groups. Multiple linear regression is also used to examine variables related to student performance and to build a predictive model of academic achievement. The analysis identifies eight consumer behavior groups and nine academic effort groups, which allows students to be classified. The variables show varying degrees of correlation with student performance, all statistically significant (p < 0.05). Two are singled out: total time spent on the Internet is negatively correlated with performance (-0.074), whereas grades from the previous semester are positively correlated (0.593). The predictive model forecasts student grades with an accuracy exceeding 80%. Although the convergence of data mining and mathematical statistics presents challenges, it also offers substantial opportunities for advancing the field.
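
The abstract does not spell out how the initial centers are optimized or how the Gini index enters the weighting, so the following is only a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes that each feature's weight is proportional to the Gini coefficient of its min-max scaled values, and it uses a farthest-point (maximin) rule as a stand-in for the center optimization. All function names are hypothetical.

```python
import numpy as np

def gini_index(values):
    """Gini coefficient of a non-negative 1-D array (0 means perfectly even)."""
    v = np.sort(np.asarray(values, dtype=float))
    n, total = v.size, v.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ v / (n * total))

def gini_feature_weights(X):
    """ASSUMPTION: weight each feature by the Gini coefficient of its scaled values."""
    lo, span = X.min(axis=0), X.max(axis=0) - X.min(axis=0)
    scaled = (X - lo) / np.where(span > 0, span, 1.0)
    g = np.array([gini_index(col) for col in scaled.T])
    if g.sum() == 0:
        # Fall back to equal weights when every feature is constant.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return g / g.sum()

def weighted_sq_dist(X, c, weights):
    """Weighted squared Euclidean distance from every row of X to point c."""
    return ((X - c) ** 2 * weights).sum(axis=1)

def maximin_init(X, k, weights):
    """ASSUMPTION: farthest-point initialization as a stand-in for the paper's method."""
    # Start from the sample closest to the overall mean.
    centers = [X[np.argmin(weighted_sq_dist(X, X.mean(axis=0), weights))]]
    while len(centers) < k:
        # Pick the sample whose nearest chosen center is farthest away.
        nearest = np.min([weighted_sq_dist(X, c, weights) for c in centers], axis=0)
        centers.append(X[np.argmax(nearest)])
    return np.array(centers, dtype=float)

def weighted_kmeans(X, k, weights, n_iter=100):
    """Lloyd iterations using the feature-weighted distance."""
    X = np.asarray(X, dtype=float)
    centers = maximin_init(X, k, weights)
    for _ in range(n_iter):
        dists = np.stack([weighted_sq_dist(X, c, weights) for c in centers], axis=1)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

A call such as `weighted_kmeans(X, k=8, weights=gini_feature_weights(X))` would return one cluster label per student row together with the cluster centers. The value of `k` and the choice of behavioral features depend on the data actually used.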

eISSN:
2444-8656
Language:
English
Publication frequency:
Volume Open
Journal subjects:
Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics