Performance Analysis of Data Balancing Methods for Churn Prediction
Published online: 24 July 2025
Page range: 944 - 957
DOI: https://doi.org/10.2478/picbe-2025-0074
Keywords
© 2025 Yanka Aleksandrova et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This study evaluates the influence of various data balancing techniques on the performance of machine learning models for churn prediction across multiple imbalanced datasets. The proposed approach consists of data preparation, application of data balancing techniques to the training data, model training with hyperparameter optimization using genetic algorithms, and comparative performance evaluation of the trained models. Six balancing techniques are evaluated: Random Undersampling, Random Oversampling, SMOTE, SMOTEENN, KMeansSMOTE, and ADASYN. The machine learning algorithms chosen are ensemble methods: Random Forest, Gradient Boosting Machines (GBM), and XGBoost. Results indicate that XGBoost consistently outperforms the other models, particularly when combined with SMOTE and SMOTEENN, achieving the highest sensitivity, F1 score, and overall performance. Random Forest also shows strong predictive capability, especially in correctly classifying loyal customers. SMOTE and SMOTEENN stand out as the most effective data balancing techniques, significantly improving model sensitivity: SMOTE performs particularly well with XGBoost and GBM, while SMOTEENN improves Random Forest's ability to detect churners. The findings highlight the importance of selecting the appropriate algorithm and balancing technique based on dataset characteristics, business requirements, and the objectives of customer retention strategies.
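The workflow described in the abstract (balance only the training split, train an ensemble model, then compare sensitivity and F1 on the untouched test set) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes scikit-learn, imbalanced-learn, and xgboost are available, uses a hypothetical binary `churn` target column, and omits the genetic-algorithm hyperparameter search used in the study.

```python
# Illustrative sketch only; library choices, the "churn" column name, and the
# fixed hyperparameters are assumptions, not the study's exact setup.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, f1_score
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN
from xgboost import XGBClassifier


def evaluate_balancing(df: pd.DataFrame, target: str = "churn") -> dict:
    X, y = df.drop(columns=[target]), df[target]
    # Stratified split so the test set keeps the original class imbalance.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    results = {}
    # Balancing is applied to the training data only, never to the test set.
    for name, sampler in [("SMOTE", SMOTE(random_state=42)),
                          ("SMOTEENN", SMOTEENN(random_state=42))]:
        X_bal, y_bal = sampler.fit_resample(X_train, y_train)
        # The study tunes hyperparameters with genetic algorithms; default
        # settings are used here for brevity.
        model = XGBClassifier(eval_metric="logloss", random_state=42)
        model.fit(X_bal, y_bal)
        y_pred = model.predict(X_test)
        results[name] = {
            "sensitivity": recall_score(y_test, y_pred),  # recall on churners
            "f1": f1_score(y_test, y_pred),
        }
    return results
```

Keeping the test set at its natural class distribution is the key design choice: it ensures that the reported sensitivity and F1 scores reflect performance on realistically imbalanced data rather than on artificially balanced samples.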