Open Access

Application of Logical Regression Function Model in Credit Business of Commercial Banks



Introduction

In recent years, with the reform of China’s financial system, the opening of the financial industry to the outside world, the continuous entry of foreign banks, the emergence of financial innovation and the diversification of banking business, China’s banking industry is facing a huge challenge to participate in international competition. In this situation, China’s commercial banks should learn from the international advanced credit risk management experience, research the credit risk management technology suitable for China’s banks, to narrow the gap with other countries’ banking industry and improve the overall competitiveness of China’s banking industry.

Because credit risk is asymmetrically distributed and the relevant data are difficult to collect, credit risk is very hard to measure, which is also why research on credit risk management models lags behind research on market risk management models. In recent years, research on credit risk measurement models by western banks and other institutions has improved this situation. These measurement models have had a great impact on banks' credit risk management, moving it from qualitative to quantitative analysis. However, current research on credit risk measurement in China mostly applies the methods proposed in the West and evaluates their advantages and disadvantages, which provides a useful reference for the credit risk practice of commercial banks.

This research paper deals with the commercial banks’ credit risk management based on a comparative study of the traditional and modern credit risk measurement methods and analyses enterprise and individual credit risk using the Logistic Regression model. We try to establish a suitable credit risk measurement method for China’s banks, and put forward some suggestions for management to improve our country’s commercial bank credit assets quality, and thereby reduce credit risk, which has a practical significance.

China’s commercial banks face many forms of financial risk in the process of business operation and management, and credit risk is always the main form of bank risk. It not only has an important impact on the operational safety of commercial banks but also, through the domino effect, affects the stability of a country’s whole financial system and even the stability and coordinated development of the global economy. A risk assessment system is therefore the key to solving the problem. Xie et al. reviewed the credit risk indicators of supply chain finance with the method of literature induction and constructed a credit risk assessment index system for the online supply chain finance of commercial banks [1]. With the development of technology, the application of the Logistic Regression model has become the need of the hour.

The Logistic Regression model has been developing steadily in credit risk research. Lan et al. built a more scientific credit risk evaluation system through stepwise regression based on actual data from rural banks in China and constructed a default probability model with the Logistic Regression model [2]. As et al. put forward the methodological assumptions of the Logistic Regression model and illustrated how to use it to evaluate family financial decisions [3]. Chen et al. adopted the Logistic Regression method and took the borrower’s default status as the explained variable to establish a credit default risk assessment model for borrowers [4]. Assef et al. used artificial neural networks (ANNs), specifically multi-layer perceptrons (MLP) and radial basis functions (RBF), as well as Logistic Regression (LR) statistical models to analyse the bank credit status of legal persons (non-default, default and temporary default) and to assist analysts in this field in making decisions [5]. Teles et al. compared the efficiency of logistic regression and linear regression in predicting whether a credit operation needs recovery [6]. Jun et al. used bank business data and a machine learning model to realise a credit anti-fraud prediction model based on a logistic regression algorithm [7]. Using the multiple logistic regression method, Zhou et al. discussed the factors affecting online consumer financial credit and countermeasures to improve its classification efficiency, providing theoretical support for improving its risk control system [8]. Xia et al. proposed a credit rating method combining the XGBoost algorithm with the Logistic Group-Lasso model [9]. Hirk et al. used two different link functions to evaluate the creditworthiness of companies, assuming a multivariate normal distribution and a multivariate logistic distribution for the latent variables underlying the ordinal outcomes [10].

By using the credit data provided by commercial banks and combining it with the Logistic Regression model, this paper makes an empirical study on the credit risk of enterprise loans and points out the factors that affect the credit risk of enterprise loans.

Research Methods

Using the five-level loan classification information system for enterprise loans of a commercial bank in the province, this paper randomly selects enterprise loan data as the forecast sample and uses it to estimate the Logistic model. To facilitate the research, enterprises classified as normal or special mention are treated as good customers, and enterprises classified as substandard, doubtful or loss are treated as bad customers. Of the 150 forecast samples, 103 are good customers and 47 are bad customers.

From the information provided by the enterprises to the bank, this paper selects as independent variables the quantitative indicators reflecting the solvency, operational capability, profitability and development capability of the enterprise, together with two qualitative indicators: the quality of the enterprise’s key management personnel and the enterprise’s management level. The details of these indicators are shown in Table 1.

Table 1. Properties of independent variables

Category | Independent variable | Symbol | Nature | Definition or threshold
Short-term solvency indicators | Current ratio | X1 | Continuous variable | Current assets ÷ Current liabilities
Short-term solvency indicators | Quick ratio | X2 | Continuous variable | Quick assets ÷ Current liabilities
Short-term solvency indicators | Cash flow liability ratio | X3 | Continuous variable | Net operating cash flow ÷ Current liabilities
Long-term solvency indicators | Asset-liability ratio | X4 | Continuous variable | Total liabilities ÷ Total assets
Long-term solvency indicators | Equity ratio | X5 | Continuous variable | Liabilities ÷ Owner’s equity
Long-term solvency indicators | Shareholders’ equity ratio | X6 | Continuous variable | Owner’s equity ÷ Assets
Long-term solvency indicators | Interest coverage multiple | X7 | Continuous variable | (Total profit + Interest expense) ÷ Interest expense
Operational capability indicators | Accounts receivable turnover | X8 | Continuous variable | Net income from main business ÷ Average balance of accounts receivable
Operational capability indicators | Inventory turnover | X9 | Continuous variable | Main business cost ÷ Average inventory balance
Operational capability indicators | Shareholders’ equity turnover | X10 | Continuous variable | Main business revenue ÷ Average balance of shareholders’ equity
Operational capability indicators | Total asset turnover | X11 | Continuous variable | Main business revenue ÷ Average balance of assets
Profitability indicators | Net profit rate on sales | X12 | Continuous variable | Net profit ÷ Main business revenue
Profitability indicators | Return on assets | X13 | Continuous variable | Net profit ÷ Average balance of assets
Profitability indicators | Return on shareholders’ equity | X14 | Continuous variable | Net profit ÷ Average balance of shareholders’ equity
Development capability indicators | Growth rate of net assets | X15 | Continuous variable | (Owners’ equity at year-end − Owners’ equity at the beginning of the year) ÷ Owners’ equity at the beginning of the year
Development capability indicators | Growth rate of total assets | X16 | Continuous variable | (Total assets at year-end − Total assets at the beginning of the year) ÷ Total assets at the beginning of the year
Development capability indicators | Main business revenue growth rate | X17 | Continuous variable | (Main business revenue at year-end − Main business revenue at the beginning of the year) ÷ Main business revenue at the beginning of the year
Development capability indicators | Growth rate of after-tax profits | X18 | Continuous variable | (Net profit at year-end − Net profit at the beginning of the year) ÷ Net profit at the beginning of the year
Qualitative indicators | Quality of key management personnel | X19 | Ordered categorical variable | “The leaders of the enterprise have rich management experience, strong management ability, remarkable historical performance and a good personal social reputation”=5; “Corporate leaders have strong management ability and good management experience”=4; “Business leaders have strong management ability and certain management experience”=3; “Business leaders have general management ability and experience”=2; “other”=1
Qualitative indicators | Enterprise management level | X20 | Ordered categorical variable | “The property right is clear, the organization structure is perfect, the property system is sound”=3; “general”=2; “other”=1

Results analysis and discussion
Establishment of model

The purpose of this paper is to establish a Logistic model that judges the credit risk of an enterprise from the financial data it provides and from qualitative indicators of the nature of the enterprise. Credit risk is represented by the probability P. The cut-off point selected is 0.5: when P is greater than 0.5, the default probability of the enterprise is small, the enterprise is considered a good customer and the explained variable is coded y=1; when P is less than 0.5, the default probability of the enterprise is large, the enterprise is considered a bad customer and the explained variable is coded y=0. The explanatory variables are the 20 indicators in Table 1. The Logistic Regression model thus established is as follows:

$$\ln \frac{P}{1-P}=\alpha+\beta_{1} X_{1}+\beta_{2} X_{2}+\ldots+\beta_{19} X_{19}+\beta_{20} X_{20}$$

Where P is the probability that the enterprise is a good customer. The Enter method was adopted for variable selection, and SPSS 13.0 was used for the Logistic Regression analysis of the 150 forecast samples. The results are shown in Table 2:

Table 2. Block 1: Method=Enter

Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B)
1a | X1 | 634.012 | 444.323 | 2.036 | 1 | .154 | 2E+275
1a | X2 | -976.358 | 654.229 | 2.227 | 1 | .136 | .000
1a | X3 | 9.901 | 10.752 | .848 | 1 | .357 | 19959.659
1a | X4 | -6.869 | 20.664 | .110 | 1 | .740 | .001
1a | X5 | -.700 | .468 | 2.237 | 1 | .135 | .496
1a | X6 | 15.918 | 17.997 | .782 | 1 | .376 | 8188569
1a | X7 | .012 | .023 | .258 | 1 | .612 | 1.012
1a | X8 | -.365 | 3.199 | .013 | 1 | .909 | .694
1a | X9 | -4.079 | 12.464 | .107 | 1 | .743 | .017
1a | X10 | -.051 | .195 | .069 | 1 | .792 | .950
1a | X11 | 421.878 | 335.115 | 1.585 | 1 | .208 | 2E+183
1a | X12 | 5.475 | 9.537 | .330 | 1 | .566 | 238.710
1a | X13 | 21.546 | 13.779 | 2.445 | 1 | .118 | 2E+009
1a | X14 | 2.252 | 4.017 | .314 | 1 | .575 | 9.509
1a | X15 | -.022 | .156 | .020 | 1 | .887 | .978
1a | X16 | 17.818 | 9.660 | 3.402 | 1 | .065 | 5E+007
1a | X17 | -2.940 | 3.013 | .952 | 1 | .329 | .053
1a | X18 | -.069 | .064 | 1.173 | 1 | .279 | .933
1a | X19 | 4.692 | 2.145 | 4.783 | 1 | .029 | 109.041
1a | X20 | 1.156 | .945 | 1.495 | 1 | .221 | 3.176
1a | Constant | -14.618 | 18.057 | .655 | 1 | .418 | .000

Because the Enter method forces all independent variables into the model, the Wald value of most variables in the table above is small, and the corresponding probability value is almost always greater than the significance level of 0.05. For these variables the null hypothesis cannot be rejected, so their linear relationship with Logit P is not significant. Because the model contains insignificant explanatory variables, it is necessary to screen the variables and build a new model.
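The Wald, Sig. and Exp(B) columns of the Enter-method table can be checked by hand. The sketch below assumes the usual single-coefficient definitions (Wald = (B / S.E.)², compared with a chi-square distribution with 1 degree of freedom, and Exp(B) = e^B); the function name `wald_test` is illustrative and not part of the original analysis. It uses the X19 row (B = 4.692, S.E. = 2.145) as input.

```python
import math

def wald_test(b, se):
    """Return (Wald statistic, p-value, odds ratio) for one coefficient.

    Assumes the standard 1-df Wald test: W = (B / S.E.)^2.
    """
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(Z^2 > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    odds_ratio = math.exp(b)
    return wald, p_value, odds_ratio

# X19 row of the Enter-method table
wald, p_value, odds_ratio = wald_test(4.692, 2.145)
print(f"Wald={wald:.3f}  Sig.={p_value:.3f}  Exp(B)={odds_ratio:.3f}")
```

Small differences from the tabulated values are expected because B and S.E. are themselves rounded to three decimals in the table.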

Variable Screening

Here, the Forward (LR) stepwise screening strategy is adopted: variables enter the equation according to the score test statistic and are removed from the equation according to the likelihood-ratio chi-square under maximum likelihood estimation. The threshold for adding an independent variable is set at 0.05 and the threshold for removing one is set at 0.1; that is, a variable is automatically excluded from the model when the probability value corresponding to its likelihood-ratio statistic exceeds 0.1. The results are as follows (Table 3).

Table 3. Block 1: Method=Forward Stepwise

Step | Variable | B | S.E. | Wald | df | Sig. | Exp(B)
1a | X19 | 1.331 | .238 | 31.148 | 1 | .000 | 3.783
1a | Constant | -2.179 | .498 | 19.163 | 1 | .000 | .113
2b | X6 | 8.078 | 1.784 | 20.498 | 1 | .000 | 3222.223
2b | X19 | 1.505 | .326 | 21.337 | 1 | .000 | 4.502
2b | Constant | -5.303 | 1.058 | 25.140 | 1 | .000 | .005
3c | X6 | 8.612 | 1.933 | 19.840 | 1 | .000 | 5496.420
3c | X13 | 2.575 | 1.045 | 6.069 | 1 | .014 | 13.132
3c | X19 | 1.565 | .344 | 20.679 | 1 | .000 | 4.781
3c | Constant | -5.623 | 1.131 | 24.707 | 1 | .000 | .004
4d | X6 | 8.742 | 2.100 | 17.327 | 1 | .000 | 6261.249
4d | X13 | 3.053 | 1.147 | 7.092 | 1 | .008 | 21.181
4d | X19 | 1.414 | .346 | 16.708 | 1 | .000 | 4.113
4d | X20 | 1.407 | .476 | 8.726 | 1 | .003 | 4.084
4d | Constant | -8.125 | 1.714 | 22.461 | 1 | .000 | .000
5e | X3 | 9.582 | 3.431 | 7.799 | 1 | .005 | 14507.807
5e | X6 | 10.112 | 2.551 | 15.718 | 1 | .000 | 24542.686
5e | X13 | 2.996 | 1.296 | 5.343 | 1 | .021 | 20.014
5e | X19 | 1.442 | .367 | 15.439 | 1 | .000 | 4.228
5e | X20 | 1.807 | .565 | 10.215 | 1 | .001 | 6.090
5e | Constant | -10.309 | 2.299 | 20.112 | 1 | .000 | .000
6f | X3 | 9.069 | 4.046 | 5.025 | 1 | .025 | 6684.620
6f | X6 | 11.634 | 3.498 | 11.058 | 1 | .001 | 112847.4
6f | X13 | 3.484 | 1.995 | 3.051 | 1 | .081 | 32.594
6f | X16 | 6.081 | 2.339 | 6.760 | 1 | .009 | 437.644
6f | X19 | 1.600 | .439 | 13.263 | 1 | .000 | 4.952
6f | X20 | 1.441 | .567 | 6.466 | 1 | .011 | 4.226
6f | Constant | -10.717 | 2.602 | 16.962 | 1 | .000 | .000
7f | X3 | 9.476 | 3.823 | 6.145 | 1 | .013 | 13046.413
7f | X6 | 11.407 | 3.426 | 11.087 | 1 | .001 | 89911.947
7f | X16 | 6.297 | 2.295 | 7.530 | 1 | .006 | 543.072
7f | X19 | 1.569 | .427 | 13.501 | 1 | .000 | 4.803
7f | X20 | 1.409 | .552 | 6.519 | 1 | .011 | 4.092
7f | Constant | -10.587 | 2.567 | 17.006 | 1 | .000 | .000
8g | X3 | 10.142 | 4.362 | 5.405 | 1 | .020 | 25399.128
8g | X5 | -.276 | .177 | 2.437 | 1 | .119 | .759
8g | X6 | 10.080 | 3.100 | 10.573 | 1 | .001 | 23855.343
8g | X16 | 8.276 | 2.703 | 9.377 | 1 | .002 | 3927.887
8g | X19 | 2.111 | .596 | 12.547 | 1 | .000 | 8.259
8g | X20 | 1.404 | .590 | 5.664 | 1 | .017 | 4.071
8g | Constant | -10.562 | 2.787 | 14.365 | 1 | .000 | .000

Table 3 lists the variables retained at each screening step. The model went through eight steps of variable screening, and the independent variables finally included are the cash flow liability ratio X3, equity ratio X5, shareholders’ equity ratio X6, growth rate of total assets X16, quality of key management personnel X19 and enterprise management level X20. Of these six, only the equity ratio has a Wald value (2.437) below 3.841, the critical value of the chi-square distribution with 1 degree of freedom at the 0.05 significance level. For the other five variables the Wald statistics are significant, with probability values below 0.05, which shows that the cash flow liability ratio, shareholders’ equity ratio, growth rate of total assets, quality of key management personnel and enterprise management level each have a significant linear relationship with Logit P. Whether the equity ratio contributes significantly to explaining enterprise default is tested at the end of this section.

The omnibus tests of model coefficients report the step, block and model chi-square values and the corresponding probability values after each step of variable screening. The model chi-square value in the final step is 147.042 and the corresponding probability value is reported as .000, which is less than the significance level of 0.05. This shows that the independent variables of the model explain the dependent variable well and that the information they provide helps to predict the occurrence of the event.

Table 5 shows the indicators that reflect the goodness of fit of the model. After variable screening, the Nagelkerke R² of the final model is 0.878, which is close to 1. The Hosmer-Lemeshow statistic is 2.185, which is less than 15.51, the critical value of the χ² distribution with 8 degrees of freedom at the 0.05 significance level, so the statistic is not significant. Therefore, the final model fits the data well.
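The two pseudo-R² values reported for the final step can be reproduced from figures given elsewhere in the paper. The sketch below assumes the standard definitions (Cox & Snell R² = 1 − exp(−χ²/n) and Nagelkerke R² = Cox & Snell R² divided by its maximum, 1 − exp(−(−2LL₀)/n)), and it assumes that the null-model −2 log likelihood equals the final −2 log likelihood plus the model chi-square. The function name `pseudo_r2` is illustrative.

```python
import math

def pseudo_r2(neg2ll_model, model_chi_square, n):
    """Return (Cox & Snell R^2, Nagelkerke R^2) under the standard definitions."""
    # Assumption: -2LL of the intercept-only model = -2LL(model) + model chi-square
    neg2ll_null = neg2ll_model + model_chi_square
    cox_snell = 1.0 - math.exp(-model_chi_square / n)
    # Largest value Cox & Snell R^2 can reach for this sample
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell, cox_snell / max_cox_snell

# Step 8: -2LL = 39.481, model chi-square = 147.042, n = 150 forecast samples
cox_snell, nagelkerke = pseudo_r2(39.481, 147.042, 150)
print(f"Cox & Snell R2={cox_snell:.3f}  Nagelkerke R2={nagelkerke:.3f}")
```

Under these assumptions the Step 1 figures (−2LL = 129.593, model chi-square = 56.930) imply the same null-model −2 log likelihood of 186.523, which is a useful consistency check on the tables.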

Model Summary

Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
1 | 129.593a | .316 | .444
2 | 77.879b | .515 | .724
3 | 74.269c | .527 | .740
4 | 63.951c | .558 | .785
5 | 53.695d | .588 | .826
6 | 45.133d | .610 | .858
7 | 46.403d | .607 | .853
8 | 39.481d | .625 | .878

Hosmer-Lemeshow Test

Step | Chi-square | df | Sig.
1 | 7.222 | 3 | .065
2 | 9.985 | 8 | .266
3 | 5.324 | 8 | .722
4 | 1.488 | 8 | .993
5 | 1.293 | 8 | .996
6 | 2.442 | 8 | .964
7 | 2.630 | 8 | .955
8 | 2.185 | 8 | .975

Omnibus tests of Model Coefficients

Step | Test | Chi-square | df | Sig.
1 | Step | 140.119 | 5 | .000
1 | Block | 140.119 | 5 | .000
1 | Model | 140.119 | 5 | .000

χ² (model with equity ratio) − χ² (model without equity ratio) = 147.042 − 140.119 = 6.923

Prediction of the model when the cut-off point is 0.5

Observed | Predicted: Bad customer | Predicted: Good customer | Percentage correct
Bad customer | 18 | 6 | 75.0
Good customer | 4 | 55 | 93.2
Overall percentage | | | 88.0

Prediction of the model when the cut-off point is 0.7

Observed | Predicted: Bad customer | Predicted: Good customer | Percentage correct
Bad customer | 21 | 3 | 87.5
Good customer | 6 | 53 | 89.8
Overall percentage | | | 89.2

The equity ratio is now tested. When the equity ratio is not included in the model, the chi-square value of the model is as follows:

The difference in degrees of freedom between the two models is 6-5=1. The difference between the model chi-square values of the two models is greater than 3.84, the critical value of the chi-square distribution with 1 degree of freedom at the 0.05 significance level. Therefore, we reject the null hypothesis that the equity ratio has no significant effect on whether an enterprise defaults; that is, the equity ratio has a significant effect on whether an enterprise defaults.
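The comparison above is a likelihood-ratio test between two nested models. As a minimal sketch, the p-value for a chi-square difference with 1 degree of freedom can be computed with the standard library alone; the function name `likelihood_ratio_test_1df` is illustrative, and the inputs are the two model chi-square values reported in the paper.

```python
import math

def likelihood_ratio_test_1df(chi2_full, chi2_reduced):
    """Return (chi-square difference, p-value) for nested models differing by 1 df."""
    stat = chi2_full - chi2_reduced
    # Upper tail of a chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Model with the equity ratio (6 variables) vs. model without it (5 variables)
stat, p_value = likelihood_ratio_test_1df(147.042, 140.119)
print(f"difference={stat:.3f}  p={p_value:.3f}  reject H0 at 0.05: {stat > 3.841}")
```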

Thus, the estimated Logistic model is as follows:

$$\ln \frac{P}{1-P}=-10.562+10.142 X_{3}-0.276 X_{5}+10.080 X_{6}+8.276 X_{16}+2.111 X_{19}+1.404 X_{20}$$

Where P is the probability that the enterprise does not default. The independent variables are, in the order in which they appear in the equation, the cash flow liability ratio, equity ratio, shareholders’ equity ratio, growth rate of total assets, quality of key management personnel and enterprise management level.
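To make the use of the estimated model concrete, the sketch below turns the logit into a probability and applies a cut-off. The coefficients are those of Step 8 of the stepwise table, with the equity-ratio coefficient taken as -0.276 (the value consistent with its Exp(B) of .759). The two enterprises and all of their indicator values are hypothetical and serve only as an illustration; the function names are likewise illustrative.

```python
import math

# Step 8 coefficients; keys follow the variable symbols of Table 1
INTERCEPT = -10.562
COEFFICIENTS = {
    "X3": 10.142,   # cash flow liability ratio
    "X5": -0.276,   # equity ratio
    "X6": 10.080,   # shareholders' equity ratio
    "X16": 8.276,   # growth rate of total assets
    "X19": 2.111,   # quality of key management personnel (1-5)
    "X20": 1.404,   # enterprise management level (1-3)
}

def logit(indicators):
    """Linear predictor ln(P / (1 - P)) for one enterprise."""
    return INTERCEPT + sum(COEFFICIENTS[k] * indicators[k] for k in COEFFICIENTS)

def good_customer_probability(indicators):
    """Probability P that the enterprise does not default."""
    return 1.0 / (1.0 + math.exp(-logit(indicators)))

def classify(probability, cutoff=0.5):
    """Return 1 (good customer) if P exceeds the cut-off, else 0 (bad customer)."""
    return 1 if probability > cutoff else 0

# Hypothetical enterprises, not taken from the paper's sample
strong = {"X3": 0.2, "X5": 1.5, "X6": 0.4, "X16": 0.1, "X19": 3, "X20": 2}
weak = {"X3": 0.0, "X5": 0.0, "X6": 0.0, "X16": 0.0, "X19": 1, "X20": 1}

for name, firm in (("strong", strong), ("weak", weak)):
    p = good_customer_probability(firm)
    print(f"{name}: logit={logit(firm):.3f}  P={p:.4f}  y={classify(p)}")
```

Changing the `cutoff` argument to 0.7 reproduces the stricter rule discussed in the model test.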

Model test

The estimated model is tested with 83 randomly selected prediction samples, and the cut-off point is still 0.5. The test results are as follows.

As can be seen from the table, among the 24 actual bad customers, the model correctly identified 18 and wrongly identified 6, a correct rate of 75%. Among the 59 good customers, the model correctly identified 55 and wrongly identified 4, a correct rate of 93.2%. The overall prediction accuracy of the model is 88%. If the cut-off point is set to 0.7, a customer whose predicted probability is greater than 0.7 is assigned a predicted value of 1 for the explained variable and is considered a good customer, while a customer whose predicted probability is less than 0.7 is considered a bad customer with a high probability of default. The results of testing the estimated model with this cut-off are shown in the table below.

Omnibus tests of Model Coefficients

Step | Test | Chi-square | df | Sig.
1 | Step | 56.930 | 1 | .000
1 | Block | 56.930 | 1 | .000
1 | Model | 56.930 | 1 | .000
2 | Step | 51.714 | 1 | .000
2 | Block | 108.644 | 2 | .000
2 | Model | 108.644 | 2 | .000
3 | Step | 3.609 | 1 | .057
3 | Block | 112.253 | 3 | .000
3 | Model | 112.253 | 3 | .000
4 | Step | 10.319 | 1 | .001
4 | Block | 122.572 | 4 | .000
4 | Model | 122.572 | 4 | .000
5 | Step | 10.256 | 1 | .001
5 | Block | 132.828 | 5 | .000
5 | Model | 132.828 | 5 | .000
6 | Step | 8.562 | 1 | .003
6 | Block | 141.390 | 6 | .000
6 | Model | 141.390 | 6 | .000
7 | Step | -1.271 | 1 | .250
7 | Block | 140.119 | 5 | .000
7 | Model | 140.119 | 5 | .000
8 | Step | 6.922 | 1 | .009
8 | Block | 147.042 | 6 | .000
8 | Model | 147.042 | 6 | .000

Fig. 1

The prediction accuracy of the model when the cut-off point is 0.5

When the cut-off point is set to 0.7, among the 24 actual bad customers, the model correctly identifies 21 and wrongly identifies 3, a correct rate of 87.5%. Among the 59 good customers, the model correctly identifies 53 and wrongly identifies 6, a correct rate of 89.8%. The overall prediction accuracy of the model is 89.2%.

Fig. 2

The prediction accuracy of the model when the cut-off point is 0.7

In summary, when the cut-off point is 0.5, the overall prediction accuracy of the estimated model is 88%. When the cut-off point is 0.7, the overall prediction accuracy is 89.2%. The prediction result of the model is satisfactory.
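The percentages quoted for the two cut-off points follow directly from the classification counts. A minimal sketch of that arithmetic is given below; the function name `classification_rates` is illustrative, and the counts are those reported for the 83 test samples.

```python
def classification_rates(bad_as_bad, bad_as_good, good_as_bad, good_as_good):
    """Return (% of bad customers correct, % of good customers correct, overall %)."""
    bad_total = bad_as_bad + bad_as_good
    good_total = good_as_bad + good_as_good
    bad_rate = 100.0 * bad_as_bad / bad_total
    good_rate = 100.0 * good_as_good / good_total
    overall = 100.0 * (bad_as_bad + good_as_good) / (bad_total + good_total)
    return bad_rate, good_rate, overall

# Counts: (bad predicted bad, bad predicted good, good predicted bad, good predicted good)
rates_05 = classification_rates(18, 6, 4, 55)   # cut-off 0.5
rates_07 = classification_rates(21, 3, 6, 53)   # cut-off 0.7

for cutoff, rates in ((0.5, rates_05), (0.7, rates_07)):
    print(f"cut-off {cutoff}: bad={rates[0]:.1f}%  good={rates[1]:.1f}%  overall={rates[2]:.1f}%")
```

Raising the cut-off from 0.5 to 0.7 catches three more bad customers at the cost of misclassifying two more good customers, which is why the overall rate changes only slightly.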

Conclusion

In this paper, the Logistic model is used to quantitatively analyse the loan credit risk of natural persons and enterprises and to identify the factors that affect customers’ credit risk. For natural person loans, the credit risk factors are gender, educational level, health status, monthly savings, household registration, type of loan and loan term. For enterprise loans, the credit risk factors are four quantitative indicators (cash flow liability ratio, equity ratio, shareholders’ equity ratio and growth rate of total assets) and two qualitative indicators (quality of key management personnel and enterprise management level). The estimated Logistic model was tested with confirmation samples. When the cut-off point is set to 0.5, the overall correct rate of credit risk measurement for natural persons and enterprises is 84.9% and 88%, respectively. When the cut-off point is set to 0.7, the overall accuracy for enterprises is 89.2%. In general, the results of measuring the credit risk of bank customers with the Logistic model are quite satisfactory. The Logistic Regression model is easy to understand and efficient, so it is worth popularising and putting into practice in commercial banks in China.

eISSN: 2444-8656
Language: English
Publication timeframe: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics