Early Warning System for Debt Group Migration: The Case of One Commercial Bank in Vietnam

Nguyen, Quoc Hung; Trinh, Hoang Viet; Phuong, Truong Viet; Ly, Truong Thi Minh

Open Access

Early Warning System for Debt Group Migration: The Case of One Commercial Bank in Vietnam

,

,

and

Sep 10, 2024

Foundations of Management

Volume 16 (2024): Issue 1 (January 2024)

About this article

Cite

Share

Download Cover

Published Online: Sep 10, 2024

Page range: 195 - 216

DOI: https://doi.org/10.2478/fman-2024-0012

Keywords
machine learning models, debt group migration, B score, C score, model parameters tuning

© 2024 Quoc Hung Nguyena et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The lifecycle of a customer’s credit relationship with a bank. (Source: Author’s own research)

A decision tree classifier (Source: Author’s illustration)

A random forest classifier (Source: Author’s illustration)

A random forest classifier (Source: Author’s own research)

ROC and AUROC (Source: Author’s illustration)

Logistic regression results (Source: Author’s own calculation)

Support vector machine results (Source: Author’s own calculation)

Decision tree results. (Source: Author’s own calculation)

MCC versus F-Recall and F-Precision (Source: Author’s own calculation)

The results of the evaluation criteria for the B Score model (Source: Author’s own calculation)

Dataset	Criteria (%)	Tuned by MCC				Tuned by F-Recall
Dataset	Criteria (%)	LR	SVM	DT	RF	LR	SVM	DT	RF
Train	Accuracy	64.62	83.22	96.29	99.06	64.62	47.05	50.31	96.11
	Recall	65.19	86.98	98.20	99.97	65.19	91.53	90.05	99.10
	Precision	47.59	69.86	91.29	97.27	47.59	37.73	39.19	90.15
	F1	55.02	77.48	94.62	98.60	55.02	53.44	54.61	94.41
	F-Recall	60.70	82.91	96.74	99.42	60.70	71.22	71.49	97.17
	F-Precision	50.31	72.72	92.59	97.80	50.31	42.76	44.18	91.81
	MCC	27.92	65.34	91.94	97.92	27.92	19.60	22.82	91.68
Validation	Accuracy	64.66	70.58	74.09	81.84	64.66	46.62	50.41	79.85
	Recall	64.98	61.58	66.95	67.45	64.98	91.48	87.83	70.78
	Precision	47.62	55.08	59.79	75.26	47.62	37.52	39.02	69.20
	F1	54.96	58.15	63.17	71.14	54.96	53.22	54.04	69.98
	F-Recall	60.56	60.16	65.39	68.88	60.56	71.04	70.26	70.46
	F-Precision	50.31	56.27	61.10	73.56	50.31	42.54	43.90	69.51
	MCC	27.89	35.71	43.45	58.13	27.89	18.94	21.29	54.83

Matrix for different types of alerts (Source: Reihart, et al_, 2010)

-	The event occurred	The event did not occur
There is a warning signal	A	B
There is no warning signal	C	D

Summary of parameters of models (Source: Author’s compilation)

Model	Parameter	Description
LG	None	The baseline model is a linear regression model combined with the sigmoid (logit) activation function, so no tuning is required
SVM	Kernel function	The activation function used to transform data into a different feature space for linear separation includes Linear, Polynomial, Sigmoid, and RBF
	C	The coefficient for balancing the weight between distance and noise
	d	The degree parameter when using the Polynomial kernel, which takes a natural number value
	γ	The gamma parameter for Polynomial, Sigmoid, and RBF kernels, which takes a non-negative value
	r	The intercept for the Polynomial and Sigmoid kernels
DT	Depth	It is necessary to limit the depth of the DT to avoid overfitting and reduce computational cost
DT	Number of leaf nodes	It is necessary to limit the number of leaf nodes of the DT to avoid overfitting and reduce computational cost
RF	Depth	It is necessary to limit the depth of each DT to avoid overfitting and reduce computational cost
	Number of leaf nodes	It is necessary to limit the number of leaf nodes of each DT to avoid overfitting and reduce computational cost
	Number of DTs	The number of DTs in Random Forest Classifier (RF) needs to be considered for computational cost when the number is too high

Confusion matrix (Source: Author’s illustration)

Target variable	Predicted: 1	Predicted: 0	Total
Actual: 1	TP: True positives	FN: False negatives	P
Actual: 0	FP: False positives	TN: True negatives	N
Total	$\hat{P}$ ${\rm{\hat P}}$	$\hat{N}$ ${\rm{\hat N}}$	P + N

Early warning system deployment for B Score customers (Source: Author’s own calculation)

Criteria (%)	Tuned by MCC	Tuned by F-Recall
Criteria (%)	RF (best)	SVM (best)	RF (second best)
Accuracy	81.84 → 78.67	46.62 → 36.86	79.85 → 76.56
Recall	67.45 → 29.01	91.48 → 91.53	70.78 → 38.14
Precision	75.26 → 42.84	37.52 → 22.44	69.20 → 39.37
F-Recall	68.88 → 31.01	71.04 → 56.65	70.46 → 38.38

The results of the evaluation criteria for the C Score model (Source: Author’s own calculation)

Customer	Model	Selection	Parameters
B Score	Best	RF tuned by MCC	n_estimators = 100;max_depth = 20;max_leaf_node = None
	Best	SVM tuned by F-Recall	kernel = ‘sigmoid’;C = 0.1;gamma = 0.01
	Second best	RF tuned by F-Recall	n_estimators = 100;max_depth = 16;max_leaf_node = None
C Score	Best	SVM tuned by MCC	kernel = ‘poly’;degree = 4;C = 0.01gamma = 0.1
C Score	Best	SVM tuned by F-Precision	kernel = ‘poly’;degree = 4;C = 0.1gamma = 0.01

Model tuning parameters in Scikit-learn (Source: Author’s own research)

Model	Parameter	Parameters in Scikit-learn	Range of values for tuning
LG	None	None	None
SVM	Kernel function	kernel: accepts a value from ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’. The default value is ‘rbf’	‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’
	C	C: data type is float; the default value is 1	0.01, 0.1, 1, 10
	d	degree: data type is integer; the default value is 3	2, 3, 4, 5 (this is applicable only when kernel is set to ‘poly’)
	γ	Gamma: accepts a value from ‘scale’, ‘auto’. The default value is ‘scale’. It can also be specified as a non-negative float	0.01, 0.1, 1, 10 (not applicable when kernel is set to ‘linear’)
DT	Depth	max_depth: data type is integer or none. The default value is none, which means the tree is expanded until the maximum depth is reached	The range from 2 to 21 (with a step size of 2) and none
DT	Number of leaf nodes	max_leaf_nodes: data type is integer or None. The default value is none, which means an unlimited number of leaf nodes will be developed, regardless of max_depth	The range from 2 to 21 (with a step size of 2) and none
RF	Depth	Similar to DT	Similar to DT
	Number of leaf nodes	Similar to DT	Similar to DT
	Number of DTs	n_estimators, data type is integer. The default value is 100	10, 50, 100

Early warning system deployment for C Score customers (Source: Author’s own calculation)

Criteria (%)	Tuned by MCC	Tuned by F-Precision
Criteria (%)	SVM (best)	SVM (best)
Accuracy	70.78 → 61.94	71.60 → 65.54
Recall	53.57 → 54.48	45.24 → 50.00
Precision (*)	58.44 → 40.33	62.30 → 43.79
F-Precision (*)	57.40 → 42.54	57.93 → 44.91

Language:: English

Publication timeframe:: 1 times per year
Journal Subjects:: Computer Sciences, Business Computing, Business and Economics, Business Management, Management Accounting, Financial Controlling, Cost Calculation, Investment

Journal RSS Feed

Early Warning System for Debt Group Migration: The Case of One Commercial Bank in Vietnam

Published Online: Sep 10, 2024

Page range: 195 - 216

DOI: https://doi.org/10.2478/fman-2024-0012

Keywords
machine learning models, debt group migration, B score, C score, model parameters tuning

© 2024 Quoc Hung Nguyena et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.

The results of the evaluation criteria for the B Score model (Source: Author’s own calculation)

Matrix for different types of alerts (Source: Reihart, et al_, 2010)

Summary of parameters of models (Source: Author’s compilation)

Confusion matrix (Source: Author’s illustration)

Early warning system deployment for B Score customers (Source: Author’s own calculation)

The results of the evaluation criteria for the C Score model (Source: Author’s own calculation)

Model tuning parameters in Scikit-learn (Source: Author’s own research)

Early warning system deployment for C Score customers (Source: Author’s own calculation)

Early Warning System for Debt Group Migration: The Case of One Commercial Bank in Vietnam

Quoc Hung Nguyen

Hoang Viet Trinh

Truong Viet Phuong

Truong Thi Minh Ly

Published Online: Sep 10, 2024

Page range: 195 - 216

DOI: https://doi.org/10.2478/fman-2024-0012

Keywordsmachine learning models, debt group migration, B score, C score, model parameters tuning

© 2024 Quoc Hung Nguyena et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.

The results of the evaluation criteria for the B Score model (Source: Author’s own calculation)

Matrix for different types of alerts (Source: Reihart, et al_, 2010)

Summary of parameters of models (Source: Author’s compilation)

Confusion matrix (Source: Author’s illustration)

Early warning system deployment for B Score customers (Source: Author’s own calculation)

The results of the evaluation criteria for the C Score model (Source: Author’s own calculation)

Model tuning parameters in Scikit-learn (Source: Author’s own research)

Early warning system deployment for C Score customers (Source: Author’s own calculation)

Keywords
machine learning models, debt group migration, B score, C score, model parameters tuning