Robust Hybrid Data-Level Approach for Handling Skewed Fat-Tailed Distributed Datasets and Diverse Features in Financial Credit Risk

Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Science, Artificial Intelligence, Software Development

Keith R Musara

Edmore Ranganai

Charles Chimedza

Florence Matarise

Sheunesu Munyira

Published online: 10 Jun 2025

Pages: 229–270

Received: 31 Aug 2024

Accepted: 13 Feb 2025

DOI: https://doi.org/10.2478/fcds-2025-0009

Keywords: Skewed fat-tailed distributed datasets, Diverse features, Noisy instances

© 2025 Keith R Musara et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
