Volume 2 (2020): Issue 1 (December 2020)
Journal Details
First Published
20 Oct 2019
Publication timeframe
1 time per year
Access type: Open Access

Churn prepaid customers classified by HyperOpt techniques

Published Online: 31 May 2021
Page range: 107 - 119

The telecommunications industry is a representative sector of a country’s economy. In this industry, customers play a central role in maintaining stable revenue, and customer churn is one of the most important concerns for large companies. This attention stems from the direct effect of churn on the revenues of large telecommunications companies, which are constantly looking for ways to predict which customers will leave. The aim of our paper is to identify customers at risk of churn using modern data mining techniques that are widely used in business. Of the nine techniques tested, we select the one with the highest performance as the churn prediction model. The effectiveness of the model is tested and evaluated by the F1-score. The model developed in the paper uses machine learning techniques on the Python platform, exploring a wide range of algorithms: logistic regression, a method that balances the analyzed data set (Balanced Random Forest), supervised learning methods (K-Nearest Neighbors, Naive Bayes), and optimization packages (LightGBM, CatBoost, AdaBoost, RUSBoost, Stochastic Gradient Descent). The techniques cover a diverse range of methods and are compared in terms of performance. In this study, RUSBoost proves to be the best churn prediction model for telecom customers and has the lowest loss of all the tested techniques.
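Because churners are usually a small minority of customers, the F1-score used for evaluation can differ noticeably from plain accuracy. The following is a minimal sketch of how the F1-score for the churn class can be computed. The function name `f1_score_binary` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1-score of the positive (churn) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # churners caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # churners missed
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (not data from the paper): 1 = churner, 0 = non-churner.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

score = f1_score_binary(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case, accuracy is higher than the F1-score because the many correctly classified non-churners dominate it, which is one reason the F1-score is a common choice for imbalanced churn data.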


