This paper focuses on the credit risk management of commercial banks in China and puts forward a quantitative model suited to their current credit risk management needs, the Logistic Regression model. Taking a commercial bank as an example, it applies the model in an empirical study of the credit risk of enterprises. The estimated Logistic model was tested with confirmation samples. The results show that when the cut-off point is set to 0.5, the overall correct classification rates of the model for natural persons and for enterprises reach 84.9% and 88%, respectively. When the cut-off point is set at 0.7, the overall accuracy for enterprises is 89.2%. In general, the results of credit risk measurement of bank customers by the Logistic model are quite satisfactory. The Logistic Regression model is easy to understand and efficient, so it is worth popularising and putting into practice in commercial banks in China.

#### Keywords

- Credit risk
- Credit risk of commercial bank
- Logistic Regression Model

#### MSC 2010

- 03D99

In recent years, with the reform of China’s financial system, the opening of the financial industry to the outside world, the continuing entry of foreign banks, the emergence of financial innovation and the diversification of banking business, China’s banking industry has faced the major challenge of participating in international competition. In this situation, China’s commercial banks should learn from advanced international experience in credit risk management and research credit risk management techniques suited to China’s banks, in order to narrow the gap with the banking industries of other countries and improve the overall competitiveness of China’s banking industry.

Because credit risk is asymmetrically distributed and data are difficult to collect, credit risk is very difficult to measure, which is also why research on credit risk management models lags behind research on market risk management models. In recent years, research on credit risk measurement models by western banks and other institutions has improved this situation. These measurement models have had a great impact on banks’ credit risk management, enabling a leap from qualitative to quantitative management. However, current research on credit risk measurement in China mostly applies the various methods proposed in the West and evaluates their advantages and disadvantages, thereby providing a reference for the credit risk practice of commercial banks.

This paper studies the credit risk management of commercial banks on the basis of a comparative study of traditional and modern credit risk measurement methods, and analyses enterprise and individual credit risk using the Logistic Regression model. We try to establish a credit risk measurement method suitable for China’s banks and put forward suggestions for management to improve the quality of the credit assets of China’s commercial banks and thereby reduce credit risk, which is of practical significance.

China’s commercial banks face many forms of financial risk in the course of business operation and management, and credit risk has always been the main form of bank risk. It not only has an important impact on the operational safety of commercial banks but, through the domino effect, also affects the stability of a country’s whole financial system and even the stability and coordinated development of the global economy. A risk assessment system is therefore the key to solving the problem. Xie et al. reviewed the credit risk indicators of supply chain finance by literature induction and constructed a credit risk assessment index system for the online supply chain finance of commercial banks [1]. With the development of technology, the application of the Logistic Regression model has emerged to meet this need.

The Logistic Regression model has been developing steadily in credit risk research. Lan et al. built a more scientific credit risk evaluation system through stepwise regression based on the actual data of rural banks in China and constructed a calculation model of default probability with the Logistic Regression model [2]. As et al. put forward the methodological hypothesis of the Logistic Regression Model and illustrated how to use this research method to evaluate family financial decisions [3]. Chen et al. adopted the Logistic Regression method and took the borrower’s default situation as the explanatory variable to establish the borrower’s credit default risk assessment model [4]. Assef et al. used artificial neural networks (ANNs), especially multi-layer perceptrons (MLP) and radial basis functions (RBF), as well as Logistic regression (LR) statistical models to analyse the bank credit status of legal persons (non-default, default and temporary default) to assist analysts in this field in making decisions [5]. Teles et al. compared the efficiency of logistic regression and linear regression in predicting whether credit business needs recovery [6]. Jun et al. used bank business data and a machine learning model to realise a credit anti-fraud prediction model based on a logistic regression algorithm [7]. Using the multiple logistic regression method, Zhou et al. discussed the factors affecting online consumer financial credit and countermeasures to improve its classification efficiency, providing theoretical support for improving its risk control system [8]. Xia et al. proposed a credit rating method combining the XGBoost algorithm with the Logistic Group-Lasso model [9]. Hirk et al. used two different link functions to evaluate the creditworthiness of a company by assuming a multivariate normal distribution and a multivariate logistic distribution for the latent variables underlying the ordinal outcomes [10].

By using the credit data provided by commercial banks and combining it with the Logistic Regression model, this paper makes an empirical study on the credit risk of enterprise loans and points out the factors that affect the credit risk of enterprise loans.

According to the information provided by the five-level loan classification information system for enterprise loans of a commercial bank in the province, this paper randomly selects enterprise data as the forecast sample and uses it to estimate the Logistic model. To facilitate the research, this paper classifies enterprises in the normal and special-mention (concerned) categories as good customers, and enterprises in the substandard, doubtful and loss categories as bad customers. Of the 150 forecast samples, 103 are good customers and 47 are bad customers.

According to the information provided by the enterprises to the bank, this paper selects quantitative indicators reflecting the solvency, operating capability, profitability and development capability of the enterprise, together with qualitative indicators of the quality of the enterprise’s managers and its management level, as the independent variables of the model. The details of these indicators are shown in Table 1.

Table 1. Properties of independent variables

Category | Independent variable | Nature | Definition or threshold
---|---|---|---
Short-term solvency indicators | Current ratio X1 | Continuous variable | Current assets ÷ Current liabilities
Short-term solvency indicators | Quick ratio X2 | Continuous variable | Quick assets ÷ Current liabilities
Short-term solvency indicators | Cash flow liability ratio X3 | Continuous variable | Net operating cash flow ÷ Current liabilities
Long-term solvency indicators | Asset-liability ratio X4 | Continuous variable | Total liabilities ÷ Total assets
Long-term solvency indicators | Equity ratio (property right ratio) X5 | Continuous variable | Liabilities ÷ Owners’ equity
Long-term solvency indicators | Shareholders’ equity ratio X6 | Continuous variable | Owners’ equity ÷ Assets
Long-term solvency indicators | Interest coverage multiple X7 | Continuous variable | (Total profit + Interest expense) ÷ Interest expense
Operational capability indicators | Accounts receivable turnover X8 | Continuous variable | Net income from main business ÷ Average balance of accounts receivable
Operational capability indicators | Inventory turnover X9 | Continuous variable | Main business cost ÷ Average balance of inventory
Operational capability indicators | Shareholders’ equity turnover X10 | Continuous variable | Main business revenue ÷ Average balance of shareholders’ equity
Operational capability indicators | Total asset turnover X11 | Continuous variable | Main business revenue ÷ Average balance of assets
Profitability indicators | Net profit rate on sales X12 | Continuous variable | Net profit ÷ Main business revenue
Profitability indicators | Return on assets X13 | Continuous variable | Net profit ÷ Average balance of assets
Profitability indicators | Return on shareholders’ equity X14 | Continuous variable | Net profit ÷ Average balance of shareholders’ equity
Development capability indicators | Growth rate of net assets X15 | Continuous variable | (Owners’ equity at year-end − Owners’ equity at the beginning of the year) ÷ Owners’ equity at the beginning of the year
Development capability indicators | Growth rate of total assets X16 | Continuous variable | (Total assets at year-end − Total assets at the beginning of the year) ÷ Total assets at the beginning of the year
Development capability indicators | Main business revenue growth rate X17 | Continuous variable | (Main business revenue at year-end − Main business revenue at the beginning of the year) ÷ Main business revenue at the beginning of the year
Development capability indicators | Growth rate of after-tax profits X18 | Continuous variable | (Net profit at year-end − Net profit at the beginning of the year) ÷ Net profit at the beginning of the year
Qualitative indicators | Quality of key management personnel X19 | Ordinal variable | “The leaders of the enterprise have rich management experience, strong management ability, remarkable historical performance and a good personal social reputation” = 5; “Corporate leaders have strong management ability and good management experience” = 4; “Business leaders have strong management ability and certain management experience” = 3; “Business leaders have general management ability and experience” = 2; “other” = 1
Qualitative indicators | Enterprise management level X20 | Ordinal variable | “The property right is clear, the organization structure is perfect, the property system is sound” = 3; “general” = 2; “other” = 1
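The solvency definitions in Table 1 can be illustrated with a minimal Python sketch. The `BalanceSheet` class, its field names and the example figures are hypothetical and are not taken from the bank's data; the sketch also assumes that owners' equity equals total assets minus total liabilities.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical container for the inputs of indicators X1 to X6."""
    current_assets: float
    quick_assets: float
    current_liabilities: float
    total_liabilities: float
    total_assets: float
    net_operating_cash_flow: float

    @property
    def owners_equity(self) -> float:
        # Assumption: owners' equity = total assets - total liabilities.
        return self.total_assets - self.total_liabilities


def solvency_ratios(bs: BalanceSheet) -> dict:
    """Compute the short-term and long-term solvency ratios X1 to X6 of Table 1."""
    return {
        "X1_current_ratio": bs.current_assets / bs.current_liabilities,
        "X2_quick_ratio": bs.quick_assets / bs.current_liabilities,
        "X3_cash_flow_liability_ratio": bs.net_operating_cash_flow / bs.current_liabilities,
        "X4_asset_liability_ratio": bs.total_liabilities / bs.total_assets,
        "X5_equity_ratio": bs.total_liabilities / bs.owners_equity,
        "X6_shareholders_equity_ratio": bs.owners_equity / bs.total_assets,
    }


# Illustrative figures only (any consistent currency unit).
example = BalanceSheet(
    current_assets=500.0,
    quick_assets=300.0,
    current_liabilities=250.0,
    total_liabilities=600.0,
    total_assets=1000.0,
    net_operating_cash_flow=75.0,
)
ratios = solvency_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Note that X5 and X6 are functionally related under this assumption: a liabilities-to-equity ratio of 1.5 implies an equity-to-assets ratio of 0.4.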

The purpose of this paper is to establish a Logistic model that judges the credit risk of an enterprise, expressed as a probability P, from the financial data provided by the enterprise and some qualitative indicators of the nature of the enterprise. The cut-off point selected is 0.5: when P is greater than 0.5, the default probability of the enterprise is small, the enterprise is considered a good customer, and the explained variable is coded y=1; when P is less than 0.5, the default probability of the enterprise is large, the enterprise is considered a bad customer, and the explained variable is coded y=0. The explanatory variables are the 20 indicators in Table 1. The Logistic Regression model thus established is as follows:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \sum_{i=1}^{20} \beta_i X_i$$

where P is the probability that the enterprise is a good customer. The Enter method was adopted for variable selection, and SPSS 13.0 was used for the Logistic Regression analysis of the 150 forecast samples. The results are shown in Table 2:

Table 2. Block 1: Method=Enter

Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B)
---|---|---|---|---|---|---|---
1a | X1 | 634.012 | 444.323 | 2.036 | 1 | .154 | 2E+275
1a | X2 | -976.358 | 654.229 | 2.227 | 1 | .136 | .000
1a | X3 | 9.901 | 10.752 | .848 | 1 | .357 | 19959.659
1a | X4 | -6.869 | 20.664 | .110 | 1 | .740 | .001
1a | X5 | -.700 | .468 | 2.237 | 1 | .135 | .496
1a | X6 | 15.918 | 17.997 | .782 | 1 | .376 | 8188569
1a | X7 | .012 | .023 | .258 | 1 | .612 | 1.012
1a | X8 | -.365 | 3.199 | .013 | 1 | .909 | .694
1a | X9 | -4.079 | 12.464 | .107 | 1 | .743 | .017
1a | X10 | -.051 | .195 | .069 | 1 | .792 | .950
1a | X11 | 421.878 | 335.115 | 1.585 | 1 | .208 | 2E+183
1a | X12 | 5.475 | 9.537 | .330 | 1 | .566 | 238.710
1a | X13 | 21.546 | 13.779 | 2.445 | 1 | .118 | 2E+009
1a | X14 | 2.252 | 4.017 | .314 | 1 | .575 | 9.509
1a | X15 | -.022 | .156 | .020 | 1 | .887 | .978
1a | X16 | 17.818 | 9.660 | 3.402 | 1 | .065 | 5E+007
1a | X17 | -2.940 | 3.013 | .952 | 1 | .329 | .053
1a | X18 | -.069 | .064 | 1.173 | 1 | .279 | .933
1a | X19 | 4.692 | 2.145 | 4.783 | 1 | .029 | 109.041
1a | X20 | 1.156 | .945 | 1.495 | 1 | .221 | 3.176
1a | Constant | -14.618 | 18.057 | .655 | 1 | .418 | .000

Because all independent variables are forced into the model, the table above shows that the Wald value of each variable is small and that almost all of the corresponding probability values exceed the significance level of 0.05. The null hypothesis therefore cannot be rejected for these variables, and their linear relationship with Logit P is not significant. Because the model contains insignificant explanatory variables, the variables need to be screened and a new model built.
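The Wald, Sig. and Exp(B) columns of the Enter-method table can be recomputed from B and S.E. alone. The following is a minimal sketch, assuming the usual single-coefficient Wald statistic (B / S.E.)² compared against a chi-square distribution with one degree of freedom; the function name `wald_test` is illustrative and is not part of SPSS.

```python
import math


def wald_test(b: float, se: float):
    """Return (Wald statistic, p-value, odds ratio) for one coefficient."""
    wald = (b / se) ** 2
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    odds_ratio = math.exp(b)  # the Exp(B) column
    return wald, p_value, odds_ratio


# X19 (quality of key management personnel) in the Enter-method table:
# B = 4.692, S.E. = 2.145.
wald, p_value, odds_ratio = wald_test(4.692, 2.145)
print(f"Wald = {wald:.3f}, Sig. = {p_value:.3f}, Exp(B) = {odds_ratio:.3f}")
```

Small differences from the printed table are expected, because the published B and S.E. are rounded to three decimals.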

Here, the Forward (LR) stepwise screening strategy is adopted: variables are entered into the equation according to the score test statistic and removed from the equation according to the likelihood-ratio chi-square based on maximum likelihood estimation. The entry criterion for an independent variable is set at 0.05 and the removal criterion at 0.1; that is, a variable is automatically removed from the model when the probability value of its removal test statistic exceeds 0.1. The results are shown in Table 3.

Table 3. Block 1: Method=Forward Stepwise

Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B)
---|---|---|---|---|---|---|---
1a | X19 | 1.331 | .238 | 31.148 | 1 | .000 | 3.783
1a | Constant | -2.179 | .498 | 19.163 | 1 | .000 | .113
2b | X6 | 8.078 | 1.784 | 20.498 | 1 | .000 | 3222.223
2b | X19 | 1.505 | .326 | 21.337 | 1 | .000 | 4.502
2b | Constant | -5.303 | 1.058 | 25.140 | 1 | .000 | .005
3c | X6 | 8.612 | 1.933 | 19.840 | 1 | .000 | 5496.420
3c | X13 | 2.575 | 1.045 | 6.069 | 1 | .014 | 13.132
3c | X19 | 1.565 | .344 | 20.679 | 1 | .000 | 4.781
3c | Constant | -5.623 | 1.131 | 24.707 | 1 | .000 | .004
4d | X6 | 8.742 | 2.100 | 17.327 | 1 | .000 | 6261.249
4d | X13 | 3.053 | 1.147 | 7.092 | 1 | .008 | 21.181
4d | X19 | 1.414 | .346 | 16.708 | 1 | .000 | 4.113
4d | X20 | 1.407 | .476 | 8.726 | 1 | .003 | 4.084
4d | Constant | -8.125 | 1.714 | 22.461 | 1 | .000 | .000
5e | X3 | 9.582 | 3.431 | 7.799 | 1 | .005 | 14507.807
5e | X6 | 10.112 | 2.551 | 15.718 | 1 | .000 | 24542.686
5e | X13 | 2.996 | 1.296 | 5.343 | 1 | .021 | 20.014
5e | X19 | 1.442 | .367 | 15.439 | 1 | .000 | 4.228
5e | X20 | 1.807 | .565 | 10.215 | 1 | .001 | 6.090
5e | Constant | -10.309 | 2.299 | 20.112 | 1 | .000 | .000
6f | X3 | 9.069 | 4.046 | 5.025 | 1 | .025 | 6684.620
6f | X6 | 11.634 | 3.498 | 11.058 | 1 | .001 | 112847.4
6f | X13 | 3.484 | 1.995 | 3.051 | 1 | .081 | 32.594
6f | X16 | 6.081 | 2.339 | 6.760 | 1 | .009 | 437.644
6f | X19 | 1.600 | .439 | 13.263 | 1 | .000 | 4.952
6f | X20 | 1.441 | .567 | 6.466 | 1 | .011 | 4.226
6f | Constant | -10.717 | 2.602 | 16.962 | 1 | .000 | .000
7f | X3 | 9.476 | 3.823 | 6.145 | 1 | .013 | 13046.413
7f | X6 | 11.407 | 3.426 | 11.087 | 1 | .001 | 89911.947
7f | X16 | 6.297 | 2.295 | 7.530 | 1 | .006 | 543.072
7f | X19 | 1.569 | .427 | 13.501 | 1 | .000 | 4.803
7f | X20 | 1.409 | .552 | 6.519 | 1 | .011 | 4.092
7f | Constant | -10.587 | 2.567 | 17.006 | 1 | .000 | .000
8g | X3 | 10.142 | 4.362 | 5.405 | 1 | .020 | 25399.128
8g | X5 | -.276 | .177 | 2.437 | 1 | .119 | .759
8g | X6 | 10.080 | 3.100 | 10.573 | 1 | .001 | 23855.343
8g | X16 | 8.276 | 2.703 | 9.377 | 1 | .002 | 3927.887
8g | X19 | 2.111 | .596 | 12.547 | 1 | .000 | 8.259
8g | X20 | 1.404 | .590 | 5.664 | 1 | .017 | 4.071
8g | Constant | -10.562 | 2.787 | 14.365 | 1 | .000 | .000

Table 3 shows the variables retained at each step. The model went through eight steps of variable screening, and the independent variables finally included are the cash flow liability ratio X3, the property right ratio (equity ratio) X5, the shareholders’ equity ratio X6, the growth rate of total assets X16, the quality of key management personnel X19 and the enterprise management level X20. Of these six, only the property right ratio has a Wald value (2.437) below 3.841, the critical chi-square value at the 0.05 significance level with one degree of freedom. For the other five variables the Wald statistics are significant, with probability values below 0.05, which shows that the cash flow liability ratio, the shareholders’ equity ratio, the growth rate of total assets, the quality of key management personnel and the enterprise management level each have a significant linear relationship with Logit P. Whether the property right ratio contributes significantly to enterprise default is tested at the end of this analysis.

The omnibus test results show the step, block and model chi-square values and the corresponding probability values after each step of variable screening. The model chi-square value in the final step is 147.042, and the corresponding probability value is reported as .000, which is less than the significance level of 0.05. This shows that the independent variables of the model jointly explain the dependent variable well, and that the information they provide helps to predict the occurrence of the event.
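The step chi-square values of the omnibus tests can be cross-checked against the −2 log-likelihood values of the Model Summary, since each step chi-square is the drop in −2 log likelihood from the previous step. The sketch below is only a consistency check on the published figures; the null-model value is not printed in the paper and is derived here as an assumption (step-1 value plus the step-1 chi-square).

```python
import math

# -2 log likelihood after steps 1..8 (Model Summary table).
neg2ll = [129.593, 77.879, 74.269, 63.951, 53.695, 45.133, 46.403, 39.481]
# Step chi-square values reported in the omnibus tests for steps 1..8.
reported_step_chi2 = [56.930, 51.714, 3.609, 10.319, 10.256, 8.562, -1.271, 6.922]

# Assumption: the intercept-only -2LL is recovered from the step-1 figures,
# so the step-1 entry below matches by construction.
null_neg2ll = neg2ll[0] + reported_step_chi2[0]

previous = [null_neg2ll] + neg2ll[:-1]
step_chi2 = [prev - cur for prev, cur in zip(previous, neg2ll)]


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


for step, (computed, reported) in enumerate(zip(step_chi2, reported_step_chi2), start=1):
    print(f"step {step}: computed {computed:8.3f}  reported {reported:8.3f}")

model_chi2 = sum(step_chi2)                # cumulative improvement over the null model
p_final_step = chi2_sf_df1(step_chi2[-1])  # significance of the step-8 change
print(f"model chi-square = {model_chi2:.3f}, step-8 p-value = {p_final_step:.4f}")
```

A negative step value (step 7) corresponds to a variable being removed, which raises −2 log likelihood slightly.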

Table 5 shows the indicators that reflect the goodness of fit of the model. After variable screening, the Nagelkerke R² of the final model is 0.878, which is close to 1. The Hosmer-Lemeshow statistic is 2.185, which is less than 15.51, the critical value of the χ² distribution with 8 degrees of freedom at the 0.05 significance level, so the statistic is not significant. Therefore, the final model fits the data well.

Table 5. Model Summary

Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
---|---|---|---
1 | 129.593a | .316 | .444
2 | 77.879b | .515 | .724
3 | 74.269c | .527 | .740
4 | 63.951c | .558 | .785
5 | 53.695d | .588 | .826
6 | 45.133d | .610 | .858
7 | 46.403d | .607 | .853
8 | 39.481d | .625 | .878
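The two pseudo-R² columns of the Model Summary follow from the −2 log-likelihood values and the sample size. The sketch below uses the standard Cox & Snell and Nagelkerke definitions with n = 150; the intercept-only value 186.523 is not printed in the paper and is an assumption derived from the final −2 log likelihood plus the final model chi-square (39.481 + 147.042).

```python
import math


def pseudo_r2(neg2ll_null: float, neg2ll_model: float, n: int):
    """Return (Cox & Snell R^2, Nagelkerke R^2) from -2 log-likelihood values."""
    cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # upper bound of Cox & Snell
    return cox_snell, cox_snell / max_cox_snell


N_SAMPLES = 150                  # forecast samples used for estimation
NEG2LL_NULL = 39.481 + 147.042   # assumption: derived, not reported directly

cs_final, nk_final = pseudo_r2(NEG2LL_NULL, 39.481, N_SAMPLES)   # step 8
cs_first, nk_first = pseudo_r2(NEG2LL_NULL, 129.593, N_SAMPLES)  # step 1
print(f"step 8: Cox & Snell = {cs_final:.3f}, Nagelkerke = {nk_final:.3f}")
print(f"step 1: Cox & Snell = {cs_first:.3f}, Nagelkerke = {nk_first:.3f}")
```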

Omnibus tests of Model Coefficients (model without the property right ratio)

Step | Test | Chi-square | df | Sig.
---|---|---|---|---
Step 1 | Step | 140.119 | 5 | .000
Step 1 | Block | 140.119 | 5 | .000
Step 1 | Model | 140.119 | 5 | .000

$$\chi^2(\text{model with equity ratio}) - \chi^2(\text{model without equity ratio}) = 147.042 - 140.119 = 6.923$$

Prediction of the model when the cut-off point is 0.5

Observed | Predicted: Bad customer | Predicted: Good customer | Percentage correct
---|---|---|---
Bad customer | 18 | 6 | 75.0
Good customer | 4 | 55 | 93.2
Overall percentage | | | 88.0

Prediction of the model when the cut-off point is 0.7

Observed | Predicted: Bad customer | Predicted: Good customer | Percentage correct
---|---|---|---
Bad customer | 21 | 3 | 87.5
Good customer | 6 | 53 | 89.8
Overall percentage | | | 89.2

The property right ratio index is tested. When the property right ratio index is not included in the model, the Chi-square value of the model is as follows:

The difference in degrees of freedom between the two models is 6-5=1. The difference between the model chi-square values of the two models, 6.923, is greater than 3.84, the critical chi-square value with one degree of freedom at the 0.05 significance level. Therefore, we reject the null hypothesis that the property right ratio has no significant effect on whether an enterprise defaults; that is, the property right ratio has a significant effect on whether an enterprise defaults.

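Thus, the estimated Logistic model, with coefficients taken from Step 8 of Table 3, is as follows:

$$\ln\left(\frac{P}{1-P}\right) = -10.562 + 10.142X_3 - 0.276X_5 + 10.080X_6 + 8.276X_{16} + 2.111X_{19} + 1.404X_{20}$$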

Where P is the probability that the enterprise does not default. The independent variables are cash flow liability ratio, property right ratio, shareholder equity ratio, total assets growth rate, main management personnel quality and enterprise management level.
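To make the use of the estimated model concrete, the following minimal sketch scores an enterprise with the Step 8 coefficients of the forward stepwise table and applies a cut-off point. The two enterprises and the function names are hypothetical; the coefficient of X20 is taken as +1.404, the sign that agrees with its reported Exp(B) of 4.071, which is an assumption of this sketch.

```python
import math

# Step 8 coefficients of the forward stepwise table.
# Assumption: X20 enters with a positive sign, consistent with Exp(B) = 4.071.
CONSTANT = -10.562
COEFFICIENTS = {
    "X3": 10.142,   # cash flow liability ratio
    "X5": -0.276,   # property right (equity) ratio
    "X6": 10.080,   # shareholders' equity ratio
    "X16": 8.276,   # growth rate of total assets
    "X19": 2.111,   # quality of key management personnel (1..5)
    "X20": 1.404,   # enterprise management level (1..3)
}


def probability_good(indicators: dict) -> float:
    """Probability P that the enterprise does not default."""
    logit = CONSTANT + sum(COEFFICIENTS[k] * indicators[k] for k in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))


def classify(indicators: dict, cut_off: float = 0.5) -> int:
    """Return 1 (good customer) if P exceeds the cut-off point, else 0 (bad customer)."""
    return 1 if probability_good(indicators) > cut_off else 0


# Hypothetical enterprises, for illustration only.
strong = {"X3": 0.3, "X5": 1.5, "X6": 0.4, "X16": 0.1, "X19": 3, "X20": 2}
weak = {"X3": 0.0, "X5": 4.0, "X6": 0.2, "X16": 0.0, "X19": 1, "X20": 1}

for name, firm in (("strong", strong), ("weak", weak)):
    print(name, round(probability_good(firm), 4), classify(firm), classify(firm, cut_off=0.7))
```

Raising the cut-off point from 0.5 to 0.7 can only move enterprises from the good class to the bad class, which is why it trades errors on good customers for fewer missed bad customers.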

The estimated model is tested with 83 randomly selected confirmation samples, with the cut-off point still at 0.5. The test results are shown in the table for the cut-off point of 0.5.

As can be seen from the table, among the 24 actual bad customers, the model correctly identified 18 and wrongly identified 6, a correct rate of 75%. Among the 59 good customers, the model correctly identified 55 and wrongly identified 4, a correct rate of 93.2%. The overall prediction accuracy of the model is 88%. If the cut-off point is set at 0.7, a customer whose predicted probability is greater than 0.7 is assigned the classified value 1 of the explained variable and is considered a good customer, while a customer whose predicted probability is less than 0.7 is considered a bad customer with a high probability of default. The results of testing the estimated model with this cut-off point are shown in the table for the cut-off point of 0.7.

Omnibus tests of Model Coefficients (forward stepwise screening, Steps 1 to 8)

Step | Test | Chi-square | df | Sig.
---|---|---|---|---
Step 1 | Step | 56.930 | 1 | .000
Step 1 | Block | 56.930 | 1 | .000
Step 1 | Model | 56.930 | 1 | .000
Step 2 | Step | 51.714 | 1 | .000
Step 2 | Block | 108.644 | 2 | .000
Step 2 | Model | 108.644 | 2 | .000
Step 3 | Step | 3.609 | 1 | .057
Step 3 | Block | 112.253 | 3 | .000
Step 3 | Model | 112.253 | 3 | .000
Step 4 | Step | 10.319 | 1 | .001
Step 4 | Block | 122.572 | 4 | .000
Step 4 | Model | 122.572 | 4 | .000
Step 5 | Step | 10.256 | 1 | .001
Step 5 | Block | 132.828 | 5 | .000
Step 5 | Model | 132.828 | 5 | .000
Step 6 | Step | 8.562 | 1 | .003
Step 6 | Block | 141.390 | 6 | .000
Step 6 | Model | 141.390 | 6 | .000
Step 7 | Step | -1.271 | 1 | .250
Step 7 | Block | 140.119 | 5 | .000
Step 7 | Model | 140.119 | 5 | .000
Step 8 | Step | 6.922 | 1 | .009
Step 8 | Block | 147.042 | 6 | .000
Step 8 | Model | 147.042 | 6 | .000

When the cut-off point is set to 0.7, among the 24 actual bad customers, the model correctly identifies 21 and wrongly identifies 3, a correct rate of 87.5%. Among the 59 good customers, the model correctly identifies 53 and wrongly identifies 6, a correct rate of 89.8%. The overall prediction accuracy of the model is 89.2%.

In summary, when the cut-off point is 0.5, the overall prediction accuracy of the estimated model is 88%. When the cut-off point is 0.7, the overall prediction accuracy is 89.2%. The prediction result of the model is satisfactory.
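The percentages in the two classification tables can be reproduced from the cell counts. The sketch below is a plain arithmetic check; the function and variable names are illustrative.

```python
def classification_rates(bad_correct: int, bad_wrong: int,
                         good_wrong: int, good_correct: int) -> dict:
    """Per-class and overall percentage correct, rounded to one decimal."""
    n_bad = bad_correct + bad_wrong
    n_good = good_wrong + good_correct
    total = n_bad + n_good
    return {
        "bad_correct_pct": round(100.0 * bad_correct / n_bad, 1),
        "good_correct_pct": round(100.0 * good_correct / n_good, 1),
        "overall_pct": round(100.0 * (bad_correct + good_correct) / total, 1),
    }


# Cell counts from the classification tables (83 confirmation samples).
at_05 = classification_rates(bad_correct=18, bad_wrong=6, good_wrong=4, good_correct=55)
at_07 = classification_rates(bad_correct=21, bad_wrong=3, good_wrong=6, good_correct=53)
print("cut-off 0.5:", at_05)
print("cut-off 0.7:", at_07)
```

Moving the cut-off point from 0.5 to 0.7 raises the share of bad customers caught (75.0% to 87.5%) at the cost of misclassifying two more good customers (93.2% to 89.8%), which matters when a missed default costs the bank more than a rejected good loan.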

In this paper, the Logistic model is used to analyse quantitatively the loan credit risk of natural persons and enterprises and to identify the factors that affect customers’ credit risk. For natural-person loans, the factors are gender, educational level, health status, monthly savings, household registration, type of loan and loan term. For enterprise loans, the factors are four quantitative indicators (the cash flow liability ratio, the equity ratio, the shareholders’ equity ratio and the growth rate of total assets) and two qualitative indicators (the quality of key management personnel and the enterprise management level). The estimated Logistic model was tested with confirmation samples. When the cut-off point is set to 0.5, the overall correct rate of credit risk measurement for natural persons and enterprises is 84.9% and 88%, respectively. When the cut-off point is set at 0.7, the overall accuracy for enterprises is 89.2%. In general, the results of credit risk measurement of bank customers by the Logistic model are quite satisfactory. The Logistic Regression model is easy to understand and efficient, so it is worth popularising and putting into practice in commercial banks in China.

