Volume 15 (2020): Issue 3 (September 2020)
Journal Details
Format: Journal
First Published: 30 Mar 2015
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Will they repay their debt? Identification of borrowers likely to be charged off

Published Online: 08 Oct 2020
Page range: 393 - 409

The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones in order to mitigate risks for both lenders and platforms. A rapidly growing body of literature provides several comparisons between such models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate because of both its good performance and its high explainability. The present paper compares four pairs of models (fitted on imbalanced and on under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric because it balances the interests of the lenders and those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important predictors positively related to the risk of charge-off. At the other end of the spectrum, by far the strongest impact on charge-off probability is that of the FICO score. The final number of features retained by the two logistic regression models differs considerably because, although both use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression applies a stronger regularization. The analysis was performed using Python (numpy, pandas, sklearn and imblearn).
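As a rough illustration of two ideas named in the abstract, the sketch below shows random under-sampling of the majority class and the F1 score computed from predictions. It is not the authors' code: the paper reports using sklearn and imblearn, whereas this version uses only the Python standard library, and the function names `undersample` and `f1_score` as well as the toy labels are assumptions made for the example. One reading of the metric choice is that false positives (good borrowers flagged as risky) and false negatives (missed charge-offs) are penalized together, since F1 is the harmonic mean of precision and recall.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): 1 = charged off, 0 = fully paid.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
rows = list(range(len(labels)))  # stand-ins for feature vectors
bal_rows, bal_labels = undersample(rows, labels)

# Hypothetical predictions: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)
```

With the toy predictions above, precision and recall are both 2/3, so the F1 score is also 2/3. In practice, under-sampling would be applied to the training split only, so that evaluation still reflects the original class imbalance.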

