Application of Logical Regression Function Model in Credit Business of Commercial Banks

In recent years, with the reform of China’s financial system, the opening of the financial industry to the outside world, the continuous entry of foreign banks, the emergence of financial innovation and the diversification of banking business, China’s banking industry is facing a huge challenge to participate in international competition. In this situation, China’s commercial banks should learn from the international advanced credit risk management experience, research the credit risk management technology suitable for China’s banks, to narrow the gap with other countries’ banking industry and improve the overall competitiveness of China’s banking industry.

Because credit risk is asymmetrically distributed and the relevant data are hard to collect, credit risk is very difficult to measure, which is why research on credit risk management models lags behind research on market risk management models. In recent years, research on credit risk measurement models by Western banks and other institutions has improved this situation. These measurement models have had a great impact on banks’ credit risk management, moving it from qualitative to quantitative analysis. However, current research on credit risk measurement in China mostly applies the methods proposed in the West and evaluates their advantages and disadvantages, which provides a useful reference for the credit risk practice of commercial banks.

This paper studies credit risk management in commercial banks. Based on a comparative study of traditional and modern credit risk measurement methods, it analyses enterprise and individual credit risk using the Logistic Regression model. We try to establish a credit risk measurement method suitable for China’s banks and put forward management suggestions to improve the quality of the credit assets of China’s commercial banks and thereby reduce credit risk, which is of practical significance.

China’s commercial banks face many forms of financial risk in the process of business operation and management, and credit risk is always the main form of bank risk. It not only has an important impact on the operational safety of commercial banks but, through the domino effect, also affects the stability of a country’s whole financial system and even the stable and coordinated development of the global economy. A risk assessment system therefore becomes the key to solving the problem. Xie et al. reviewed the credit risk indicators of supply chain finance by the method of literature induction and constructed a credit risk assessment index system for the online supply chain finance of commercial banks [1]. With the development of technology, the application of the Logistic Regression model has emerged as the need of the hour.

The Logistic Regression model has been developing steadily in credit risk research. Lan et al. built a more scientific credit risk evaluation system through stepwise regression based on the actual data of rural banks in China and constructed a calculation model of default probability with the Logistic Regression model [2]. As et al. put forward the methodological hypothesis of the Logistic Regression model and illustrated how to use it to evaluate family financial decisions [3]. Chen et al. adopted the Logistic Regression method and took the borrower’s default status as the explained variable to establish a credit default risk assessment model for borrowers [4]. Assef et al. used artificial neural networks (ANNs), especially multi-layer perceptrons (MLP) and radial basis functions (RBF), as well as Logistic Regression (LR) statistical models, to analyse the bank credit status of legal persons (non-default, default and temporary default) and so assist analysts in this field in making decisions [5]. Teles et al. compared the efficiency of logistic regression and linear regression in predicting whether a credit business needs recovery [6]. Jun et al. used bank business data and a machine learning model to realise a credit anti-fraud prediction model based on a logistic regression algorithm [7]. Using the multiple logistic regression method, Zhou et al. discussed the factors affecting online consumer financial credit and countermeasures to improve its classification efficiency, providing theoretical support for improving its risk control system [8]. Xia et al. proposed a credit rating method combining the XGBoost algorithm with the Logistic Group-Lasso model [9]. Hirk et al. used two different link functions to evaluate the creditworthiness of a company by assuming a multivariate normal distribution and a multivariate logistic distribution for the latent variables underlying the ordinal outcomes [10].

By using the credit data provided by commercial banks and combining it with the Logistic Regression model, this paper makes an empirical study on the credit risk of enterprise loans and points out the factors that affect the credit risk of enterprise loans.

Research Methods

According to the information provided by the five-level classification information system for enterprise loans of a commercial bank in the province, this paper randomly selects the data of 150 enterprises as the forecast sample and uses it to estimate the Logistic model. To facilitate the research, this paper classifies the normal and concerned enterprises as good customers, and the substandard, suspicious and loss-related enterprises as bad customers. Of the 150 forecast samples, 103 are good customers and 47 are bad customers.

According to the information provided by the enterprises to the bank, this paper selects quantitative indicators reflecting the solvency, operational capability, profitability and development capability of the enterprise, together with qualitative indicators of the quality of the enterprise’s managers and its management level, as the independent variables of the model. The details of these indicators are shown in Table 1.

Table 1

Properties of independent variables

	The independent variables	Nature	Definitions or thresholds
Short-term solvency indicators	Current ratio X1	Continuous variables	Current assets ÷ Current liabilities
	Quick ratio X2	Continuous variables	Quick assets ÷ Current liabilities
	Cash flow liability ratio X3	Continuous variables	Net operating cash flow ÷ Current liabilities
Indicators of long-term solvency	Asset-liability ratio X4	Continuous variables	Total liabilities ÷ Total assets
	Equity ratio X5	Continuous variables	Liabilities ÷ Owner’s equity
	Shareholders’ equity ratio X6	Continuous variables	Owner’s equity ÷ assets
	Interest coverage multiple X7	Continuous variables	(Total profit + interest expense) ÷ Interest charges
Operational capability index	Accounts receivable turnover X8	Continuous variables	Net income from main business ÷ Average balance of accounts receivable
	Inventory turnover X9	Continuous variables	Main business cost ÷ Average balance of inventory
	Shareholder’s equity turnover X10	Continuous variables	Main business revenue ÷ Average balance of shareholders’ equity
	Total asset turnover X11	Continuous variables	Main business revenue ÷ Average balance of assets
Profitability index	Net profit rate on sales X12	Continuous variables	Net profit ÷ Main business revenue
	Return on assets X13	Continuous variables	Net profit ÷ Average balance of assets
	Return on shareholders’ equity X14	Continuous variables	Net profit ÷ Average balance of shareholders’ equity
Development Capability Indicators	Growth rate of net assets X15	Continuous variables	(Owners’ equity at year end - Owners’ equity at the beginning of the year) ÷ Owners’ equity at the beginning of the year
	Growth rate of total assets X16	Continuous variables	(Total assets at year end - Total assets at the beginning of the year) ÷ Total assets at the beginning of the year
	Main business revenue growth rate X17	Continuous variables	(Main business revenue at year end - Main business revenue at the beginning of the year) ÷ Main business revenue at the beginning of the year
	Growth rate of after-tax profits X18	Continuous variables	(Net profit at year end - Net profit at the beginning of the year) ÷ Net profit at the beginning of the year
Qualitative indicators	Quality of key management personnel X19	Ordered class variables	“The leaders of the enterprise have rich management experience, strong management ability, remarkable historical performance and good personal and social reputation”=5; “Corporate leaders have strong management ability and good management experience”=4; “Business leaders have strong management ability and certain management experience”=3; “Business leaders have general management ability and experience”=2; “other”=1
Qualitative indicators	Enterprise management level X20	Ordered class variables	“The property right is clear, the organization structure is perfect, the property system is sound”=3; “general”=2; “other”=1
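The solvency definitions in Table 1 translate directly into arithmetic on statement figures. The following is a minimal sketch of X1 to X6 only; the `Statement` class, its field names and the example figures are hypothetical and are not taken from the bank data used in this paper.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical year-end figures for one enterprise (same currency unit)."""
    current_assets: float
    quick_assets: float
    current_liabilities: float
    total_liabilities: float
    total_assets: float
    owners_equity: float
    net_operating_cash_flow: float


def solvency_indicators(s: Statement) -> dict:
    """Return the solvency indicators X1-X6 as defined in Table 1."""
    return {
        "X1_current_ratio": s.current_assets / s.current_liabilities,
        "X2_quick_ratio": s.quick_assets / s.current_liabilities,
        "X3_cash_flow_liability_ratio": s.net_operating_cash_flow / s.current_liabilities,
        "X4_asset_liability_ratio": s.total_liabilities / s.total_assets,
        "X5_equity_ratio": s.total_liabilities / s.owners_equity,
        "X6_shareholders_equity_ratio": s.owners_equity / s.total_assets,
    }


# Invented figures, for illustration only.
example = Statement(
    current_assets=500.0, quick_assets=300.0, current_liabilities=250.0,
    total_liabilities=600.0, total_assets=1000.0, owners_equity=400.0,
    net_operating_cash_flow=100.0,
)
indicators = solvency_indicators(example)
```

The remaining quantitative indicators follow the same pattern, except that the turnover and growth indicators also need beginning-of-year or average balances.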

Results analysis and discussion

3.1 Establishment of model

The purpose of this paper is to establish a Logistic model that judges the credit risk of an enterprise, expressed as the probability P, from the financial data provided by the enterprise and from some qualitative indicators that measure the nature of the enterprise. The cut-off point selected is 0.5. When P is greater than 0.5, the default probability of the enterprise is small, the enterprise is considered a good customer, and the explained variable is coded y=1. When P is less than 0.5, the default probability of the enterprise is large, the enterprise is considered a bad customer, and the explained variable is coded y=0. The explanatory variables are the 20 indicators in Table 1. The Logistic Regression model thus established is as follows:

(1)

\operatorname{Ln} \frac{P}{1-P}=\alpha+\beta_{1} X_{1}+\beta_{2} X_{2}+\ldots+\beta_{19} X_{19}+\beta_{20} X_{20}

Where P is the probability that the enterprise is a good customer. The Enter method was adopted for variable selection, and SPSS 13.0 was used for the Logistic Regression analysis of the 150 forecast samples. The results are shown in Table 2:

Table 2

Block 1: Method=Enter

		B	S.E.	Wald	df	Sig	Exp(B)
Step 1a	X1	634.012	444.323	2.036	1	.154	2E+275
	X2	-976.358	654.229	2.227	1	.136	.000
	X3	9.901	10.752	.848	1	.357	19959.659
	X4	-6.869	20.664	.110	1	.740	.001
	X5	-.700	.468	2.237	1	.135	.496
	X6	15.918	17.997	.782	1	.376	8188569
	X7	.012	.023	.258	1	.612	1.012
	X8	-.365	3.199	.013	1	.909	.694
	X9	-4.079	12.464	.107	1	.743	.017
	X10	-.051	.195	.069	1	.792	.950
	X11	421.878	335.115	1.585	1	.208	2E+183
	X12	5.475	9.537	.330	1	.566	238.710
	X13	21.546	13.779	2.445	1	.118	2E+009
	X14	2.252	4.017	.314	1	.575	9.509
	X15	-.022	.156	.020	1	.887	.978
	X16	17.818	9.660	3.402	1	.065	5E+007
	X17	-2.940	3.013	.952	1	.329	.053
	X18	-.069	.064	1.173	1	.279	.933
	X19	4.692	2.145	4.783	1	.029	109.041
	X20	1.156	.945	1.495	1	.221	3.176
	Constant	-14.618	18.057	.655	1	.418	.000

Because all independent variables are forced into the model, the above table shows that the Wald value of each variable is small and that almost all of the corresponding probability values (Sig) are greater than the significance level of 0.05; only X19 falls below it. For the other variables the null hypothesis cannot be rejected, and their linear relationship with Logit P is not significant. Because the model contains insignificant explanatory variables, it is necessary to screen the variables and build a new model.
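The Wald, Sig and Exp(B) columns of Table 2 can be recomputed from B and S.E. alone. The sketch below assumes the usual definitions, Wald = (B / S.E.)² referred to a χ² distribution with 1 degree of freedom and Exp(B) = e^B; the function name `wald_test` is illustrative, and the inputs are the rounded X19 values printed in the table, so the results should agree with the reported ones only up to rounding.

```python
import math


def wald_test(b: float, se: float) -> tuple:
    """Return (Wald statistic, p-value with 1 df, odds ratio Exp(B))."""
    wald = (b / se) ** 2
    # Upper-tail probability of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value, math.exp(b)


# X19 (quality of key management personnel) as reported in Table 2.
wald, p_value, odds_ratio = wald_test(b=4.692, se=2.145)
print(f"Wald={wald:.3f}, Sig={p_value:.3f}, Exp(B)={odds_ratio:.3f}")
```

A variable is significant at the 0.05 level when its Wald value exceeds 3.841, which is equivalent to Sig being below 0.05.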

3.2 Variable Screening

Here, the Forward (LR) stepwise screening strategy is adopted: variables are entered into the equation according to the score test statistic and removed from the equation according to the likelihood ratio chi-square based on maximum likelihood estimation. The entry criterion is set at 0.05 and the removal criterion at 0.1; that is, a variable already in the model is automatically removed when the probability value of its removal statistic exceeds 0.1. The running results are as follows (Table 3).

Table 3

Block 1: Method=Forward Stepwise

		B	S.E.	Wald	df	Sig	Exp(B)
Step 1a	X19	1.331	.238	31.148	1	.000	3.783
	Constant	-2.179	.498	19.163	1	.000	.113
Step 2b	X6	8.078	1.784	20.498	1	.000	3222.223
	X19	1.505	.326	21.337	1	.000	4.502
	Constant	-5.303	1.058	25.140	1	.000	.005
Step 3c	X6	8.612	1.933	19.840	1	.000	5496.420
	X13	2.575	1.045	6.069	1	.014	13.132
	X19	1.565	.344	20.679	1	.000	4.781
	Constant	-5.623	1.131	24.707	1	.000	.004
Step 4d	X6	8.742	2.100	17.327	1	.000	6261.249
	X13	3.053	1.147	7.092	1	.008	21.181
	X19	1.414	.346	16.708	1	.000	4.113
	X20	1.407	.476	8.726	1	.003	4.084
	Constant	-8.125	1.714	22.461	1	.000	.000
Step 5e	X3	9.582	3.431	7.799	1	.005	14507.807
	X6	10.112	2.551	15.718	1	.000	24542.686
	X13	2.996	1.296	5.343	1	.021	20.014
	X19	1.442	.367	15.439	1	.000	4.228
	X20	1.807	.565	10.215	1	.001	6.090
	Constant	-10.309	2.299	20.112	1	.000	.000
Step 6f	X3	9.069	4.046	5.025	1	.025	6684.620
	X6	11.634	3.498	11.058	1	.001	112847.4
	X13	3.484	1.995	3.051	1	.081	32.594
	X16	6.081	2.339	6.760	1	.009	437.644
	X19	1.600	.439	13.263	1	.000	4.952
	X20	1.441	.567	6.466	1	.011	4.226
	Constant	-10.717	2.602	16.962	1	.000	.000
Step 7f	X3	9.476	3.823	6.145	1	.013	13046.413
	X6	11.407	3.426	11.087	1	.001	89911.947
	X16	6.297	2.295	7.530	1	.006	543.072
	X19	1.569	.427	13.501	1	.000	4.803
	X20	1.409	.552	6.519	1	.011	4.092
	Constant	-10.587	2.567	17.006	1	.000	.000
Step 8g	X3	10.142	4.362	5.405	1	.020	25399.128
	X5	-.276	.177	2.437	1	.119	.759
	X6	10.080	3.100	10.573	1	.001	23855.343
	X16	8.276	2.703	9.377	1	.002	3927.887
	X19	2.111	.596	12.547	1	.000	8.259
	X20	1.404	.590	5.664	1	.017	4.071
	Constant	-10.562	2.787	14.365	1	.000	.000

Table 3 shows the variables in the model after each screening step. The model went through eight steps of variable screening, and the independent variables finally included are the cash flow liability ratio X3, equity ratio (property right ratio) X5, shareholders’ equity ratio X6, growth rate of total assets X16, quality of key management personnel X19 and enterprise management level X20. Of these six, only the equity ratio has a Wald value (2.437) below 3.841, the critical value of the χ² distribution with 1 degree of freedom at the 0.05 significance level, so it is not significant by the Wald test. For the other five variables the Wald statistic is significant and the corresponding probability value is less than 0.05, which shows that the cash flow liability ratio, shareholders’ equity ratio, growth rate of total assets, quality of key management personnel and enterprise management level all have a significant linear relationship with Logit P. Whether the equity ratio contributes significantly to enterprise default is tested at the end of this section.

Table 4 shows the step, block and model chi-square values and the corresponding probability values after each step of variable screening. The model chi-square value in the final step is 147.042, and the corresponding probability value is reported as .000, which is less than the significance level of 0.05. This shows that the independent variables of the model explain the dependent variable well, and that the information they provide helps to predict the occurrence of the event.

Table 5 shows the indicators that reflect the goodness of fit of the model. After the screening of variables, the Nagelkerke R² statistic of the final model is 0.878, which is close to 1. The Hosmer-Lemeshow statistic is 2.185, which is less than 15.51, the critical value of the χ² distribution with 8 degrees of freedom at the 0.05 significance level, indicating that the statistic is not significant. Therefore, the final model fits the data well.

Table 5

Model Summary

Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	129.593a	.316	.444
2	77.879b	.515	.724
3	74.269c	.527	.740
4	63.951c	.558	.785
5	53.695d	.588	.826
6	45.133d	.610	.858
7	46.403d	.607	.853
8	39.481d	.625	.878

Hosmer-Lemeshow Test

Step	Chi-square	df	sig
1	7.222	3	.065
2	9.985	8	.266
3	5.324	8	.722
4	1.488	8	.993
5	1.293	8	.996
6	2.442	8	.964
7	2.630	8	.955
8	2.185	8	.975
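The final-step entries of Table 5 can be cross-checked from figures reported elsewhere in the paper. The sketch below assumes the standard definitions: the intercept-only -2 log likelihood follows from the 103 good and 47 bad customers, Cox & Snell R² = 1 - exp(-χ²/n), and Nagelkerke R² rescales it by its maximum attainable value. The Hosmer-Lemeshow significance uses the closed-form upper tail of the χ² distribution, which holds only for even degrees of freedom. All function names are illustrative.

```python
import math


def null_deviance(n_good: int, n_bad: int) -> float:
    """-2 log likelihood of the intercept-only model."""
    n = n_good + n_bad
    return -2.0 * (n_good * math.log(n_good / n) + n_bad * math.log(n_bad / n))


def pseudo_r2(model_chi2: float, d0: float, n: int) -> tuple:
    """Cox & Snell and Nagelkerke R-squared from the model chi-square."""
    cox_snell = 1.0 - math.exp(-model_chi2 / n)
    max_cox_snell = 1.0 - math.exp(-d0 / n)
    return cox_snell, cox_snell / max_cox_snell


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


d0 = null_deviance(n_good=103, n_bad=47)
cox_snell, nagelkerke = pseudo_r2(model_chi2=147.042, d0=d0, n=150)
hl_sig = chi2_sf_even_df(2.185, df=8)
print(f"Cox & Snell={cox_snell:.3f}, Nagelkerke={nagelkerke:.3f}, HL sig={hl_sig:.3f}")
```

As a consistency check, the intercept-only -2 log likelihood should equal the final -2 log likelihood plus the final model chi-square, 39.481 + 147.042.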

Table 6

Omnibus tests of Model Coefficients

		Chi-square	df	sig
Step 1	Step	140.119	5	.000
	Block	140.119	5	.000
	Model	140.119	5	.000

χ²(model with equity ratio) - χ²(model without equity ratio) = 147.042 - 140.119 = 6.923

Table 7

Prediction of the model when the cut-off point is 0.5

	Predicted		Percentage
Observed	Bad customer	Good customer	Correct
Bad customer	18	6	75.0
Good customer	4	55	93.2
Overall percentage			88.0

Table 8

The prediction of the model when the cut-off point is 0.7

	Predicted		Percentage
Observed	Bad customer	Good customer	Correct
Bad customer	21	3	87.5
Good customer	6	53	89.8
Overall percentage			89.2

The equity ratio (property right ratio) index is now tested. When it is not included in the model, the model chi-square value is 140.119 with 5 degrees of freedom (Table 6), compared with 147.042 with 6 degrees of freedom when it is included.

The difference in degrees of freedom is 6-5=1, and the difference between the two model chi-square values, 6.923, is greater than 3.84, the critical value of the χ² distribution with 1 degree of freedom at the 0.05 significance level. Therefore, we reject the null hypothesis that the equity ratio has no significant effect on whether an enterprise defaults; that is, the equity ratio has a significant effect on whether an enterprise defaults.
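The comparison above is a likelihood ratio test between two nested models. A minimal sketch, assuming the difference of the two model chi-square values follows a χ² distribution with 1 degree of freedom under the null hypothesis; the function name is illustrative.

```python
import math


def likelihood_ratio_test(chi2_full: float, chi2_reduced: float) -> tuple:
    """Compare nested models that differ by one parameter (1 degree of freedom)."""
    lr = chi2_full - chi2_reduced
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square upper tail, 1 df
    return lr, p_value


# Model chi-square with and without the equity ratio X5.
lr, p_value = likelihood_ratio_test(chi2_full=147.042, chi2_reduced=140.119)
reject_null = lr > 3.841  # critical value at the 0.05 level with 1 df
print(f"LR={lr:.3f}, p={p_value:.3f}, reject null: {reject_null}")
```

This is why X5 stays in the model even though its Wald test is not significant: the two tests can disagree in small samples, and here the likelihood ratio criterion is the one applied.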

Thus, the estimated Logistic model is as follows:

(2)

\operatorname{Ln} \frac{P}{1-P}=-10.562+10.142 X_{3}-0.276 X_{5}+10.080 X_{6}+8.276 X_{16}+2.111 X_{19}+1.404 X_{20}

Where P is the probability that the enterprise does not default. The independent variables are, in order, the cash flow liability ratio, equity ratio, shareholders’ equity ratio, growth rate of total assets, quality of key management personnel and enterprise management level.
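To show how the estimated model is applied, the sketch below scores one enterprise and classifies it with a cut-off point. The coefficients are the Step 8 estimates reported in Table 3, with the X20 coefficient taken as +1.404 to match its Exp(B) of 4.071. The enterprise values are hypothetical and invented for illustration; they are not drawn from the sample.

```python
import math

# Step 8 estimates from Table 3; X20 is taken as +1.404, matching Exp(B) = 4.071.
INTERCEPT = -10.562
COEFFICIENTS = {"X3": 10.142, "X5": -0.276, "X6": 10.080,
                "X16": 8.276, "X19": 2.111, "X20": 1.404}


def good_customer_probability(indicators: dict) -> float:
    """Invert the logit: P = 1 / (1 + exp(-(alpha + sum of beta_i * X_i)))."""
    logit = INTERCEPT + sum(COEFFICIENTS[k] * indicators[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))


def classify(p: float, cutoff: float = 0.5) -> int:
    """Return 1 (good customer) if P exceeds the cut-off point, else 0 (bad customer)."""
    return 1 if p > cutoff else 0


# Hypothetical enterprise, values invented for illustration only.
enterprise = {"X3": 0.30, "X5": 1.50, "X6": 0.40, "X16": 0.10, "X19": 3, "X20": 2}
p = good_customer_probability(enterprise)
label = classify(p)
print(f"P(good customer)={p:.4f}, y={label}")
```

Raising the cut-off point, for example `classify(p, cutoff=0.7)`, makes the rule stricter about labelling an enterprise as a good customer.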

3.3 Model test

The estimated model is tested with 83 randomly selected test samples, and the cut-off point is still 0.5. The test results are shown in Table 7.

As can be seen from Table 7, among the 24 actual bad customers, the model correctly identified 18 and wrongly identified 6, a correct rate of 75%. Among the 59 good customers, the model correctly identified 55 and wrongly identified 4, a correct rate of 93.2%. The overall prediction accuracy of the model is 88%. If the cut-off point is set at 0.7, a customer whose predicted probability is greater than 0.7 is assigned the predicted value 1 and considered a good customer, while a customer whose predicted probability is less than 0.7 is considered a bad customer with a high probability of default. The results of testing the estimated model with this cut-off point are shown in Table 8.

Table 4

Omnibus tests of Model Coefficients

		Chi-square	df	sig
Step 1	Step	56.930	1	.000
	Block	56.930	1	.000
	Model	56.930	1	.000
Step 2	Step	51.714	1	.000
	Block	108.644	2	.000
	Model	108.644	2	.000
Step 3	Step	3.609	1	.057
	Block	112.253	3	.000
	Model	112.253	3	.000
Step 4	Step	10.319	1	.001
	Block	122.572	4	.000
	Model	122.572	4	.000
Step 5	Step	10.256	1	.001
	Block	132.828	5	.000
	Model	132.828	5	.000
Step 6	Step	8.562	1	.003
	Block	141.390	6	.000
	Model	141.390	6	.000
Step 7	Step	-1.271	1	.250
	Block	140.119	5	.000
	Model	140.119	5	.000
Step 8	Step	6.922	1	.009
	Block	147.042	6	.000
	Model	147.042	6	.000

The prediction accuracy of the model when the cut-off point is 0.5

As Table 8 shows, when the cut-off point is set to 0.7, among the 24 actual bad customers, the model correctly identifies 21 and wrongly identifies 3, a correct rate of 87.5%. Among the 59 good customers, the model correctly identifies 53 and wrongly identifies 6, a correct rate of 89.8%. The overall prediction accuracy of the model is 89.2%.

The prediction accuracy of the model when the cut-off point is 0.7

In summary, when the cut-off point is 0.5, the overall prediction accuracy of the estimated model is 88%. When the cut-off point is 0.7, the overall prediction accuracy is 89.2%. The prediction result of the model is satisfactory.
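The percentages quoted for the two cut-off points follow directly from the counts in Tables 7 and 8. A minimal sketch of that arithmetic; the function and argument names are illustrative.

```python
def classification_rates(bad_as_bad: int, bad_as_good: int,
                         good_as_bad: int, good_as_good: int) -> tuple:
    """Return (bad-customer rate, good-customer rate, overall rate) in percent."""
    n_bad = bad_as_bad + bad_as_good
    n_good = good_as_bad + good_as_good
    bad_rate = 100.0 * bad_as_bad / n_bad
    good_rate = 100.0 * good_as_good / n_good
    overall = 100.0 * (bad_as_bad + good_as_good) / (n_bad + n_good)
    return bad_rate, good_rate, overall


# Counts from Table 7 (cut-off point 0.5) and Table 8 (cut-off point 0.7).
rates_05 = classification_rates(18, 6, 4, 55)
rates_07 = classification_rates(21, 3, 6, 53)
for cutoff, rates in ((0.5, rates_05), (0.7, rates_07)):
    print(cutoff, [round(r, 1) for r in rates])
```

Moving the cut-off point from 0.5 to 0.7 trades two correctly classified good customers for three additional correctly identified bad customers, which is the costlier error for a lender to miss.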

Conclusion

In this paper, the Logistic model is used to quantitatively analyse the loan credit risks of natural persons and enterprises and to identify the factors that affect the credit risks of customers. For natural person loans, the credit risk factors are gender, educational level, health status, monthly savings, household registration, type of loan and loan term. For corporate loans, the credit risk factors are four quantitative indicators (cash flow liability ratio, equity ratio, shareholders’ equity ratio and growth rate of total assets) and two qualitative indicators (quality of key management personnel and enterprise management level). The estimated Logistic model was tested with confirmation samples. When the cut-off point is set to 0.5, the overall correct rate of credit risk measurement for natural persons and enterprises is 84.9% and 88%, respectively. When the cut-off point is set at 0.7, the overall accuracy for enterprises is 89.2%. In general, the results of credit risk measurement of bank customers by the Logistic model are quite satisfactory. The Logistic Regression model is easy to understand and efficient, so it is worth popularising and putting into practice in commercial banks in China.

eISSN: 2444-8656
Language: English

Publication timeframe: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Published online: 27 Dec 2021

Pages: 513 - 522

Received: 16 Jun 2021

Accepted: 24 Sep 2021

DOI: https://doi.org/10.2478/amns.2021.1.00088

Keywords
Credit risk, Credit risk of commercial bank, Logistic Regression Model

© 2021 Ying Wei published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
