In the Internet era, the convenience of new Internet wealth management products compared with homogeneous commercial bank wealth management products has made many consumers more willing to transfer funds into them. Among the bank's past customer groups, a small number of people trust the traditional bank wealth management business more. These customers are loyal customers of the bank, but the bank cannot obtain sufficient income from these customers alone. After a bank launches a new wealth management product, predicting whether the customer is willing to buy is the core of the competition among banks for wealth management products. Customer-oriented product marketing has become a general trend [1]. Classifying the target customers of financial products to achieve precise marketing of personal wealth management products will qualitatively improve the competitiveness and brand image of the bank.

In the traditional consulting industry, when a bank has customer selection data, the usual approach is to run a simple correlation analysis between the customers' selection results and the chosen indicators, and to use the correlation results to judge whether each indicator is related to the selection result. However, the selection results are discrete and some indicators are not numerical, so judging directly from correlations is clearly unreasonable [2]. Moreover, correlation analysis can only indicate whether a connection exists; it cannot classify customers.

Therefore, to determine accurately and quickly which financial products customers will choose, more scientific and reasonable methods are usually considered, such as classification algorithms from data mining and discrete choice measurement models, which perform better on choice data. In previous studies, many scholars have used data mining algorithms to study classification problems but have neglected the application of traditional discrete choice measurement models to related problems. This paper therefore uses a combined forecasting model to help banks classify the customers of financial products.

In forecasting practice, different forecasting methods are typically applied to the same problem. According to system theory, the object of prediction can be a complex social or economic system composed of interrelated and mutually restrictive elements [3]. A single prediction model generally provides effective information from only one angle and ignores the information available from other angles. The shortcomings of a single forecasting method therefore lie in its insufficient information sources, and a single model is also affected by its chosen functional form. Simply discarding methods with larger prediction errors loses useful information and should be avoided. A more scientific approach is to combine different single forecasting methods into a combined forecasting method. From the perspective of information utilisation, combined forecasting can comprehensively use the information provided by the individual forecasting methods to obtain a suitable weighted average combined forecasting model [4]. The combined prediction method can therefore reduce the incompleteness of information and improve prediction accuracy more effectively.

The American scholar Schmitt used a combined forecasting method to predict the population of 37 cities in the United States, which improved the accuracy of the forecast [5]. Bates and Granger conducted systematic research on the combined forecasting method, and their results attracted the attention of forecasting scholars [6]. In the 1970s, combination forecasting received growing attention from forecasters, and a series of papers on the topic was published. In 1989, the authoritative academic journal –

Among data mining classification algorithms, decision tree algorithms and naive Bayes classification algorithms are the most commonly used. The decision tree algorithm is easy to understand and widely applicable, and it can produce feasible and effective results on large amounts of data in a relatively short time. The naive Bayes algorithm requires few parameters and is not sensitive to missing data. Among discrete choice measurement models, the traditional binary logit model is chosen because the data in this study are binary choice data.

The research data come from the UCI (University of California, Irvine) database, whose data sets are standard test sets. The bank marketing classification data are selected. The data set comes from fixed deposit marketing campaigns launched by a financial institution in Portugal, conducted mainly through telephone interviews. The survey covers 17 variables, including age, occupation, marital status, education level and annual account balance. The last variable records whether the fixed deposit product has been ordered and is the target variable of this research.

The bank data set is used; it is a random 10% sample of the bankfull data set, with a total of 4,521 records. The descriptive analysis of each variable is performed in the R language. Since y is the target variable, its counts and frequencies are tabulated separately. The number and frequency of the target variable are shown in Table 1:

Number and frequency of target variables

y | no | yes
---|---|---
Number | 4000 | 521
Frequency | 0.87629 | 0.12371

Of the 4,521 records, 3,000 are selected as the training set for model fitting, and the remaining 1,521 form the test set. The decision tree C5.0 model obtained is shown in Figure 1.

It can be ascertained from Figure 1 that the three variables of poutcome, duration and month have a great influence on the selection result of y. Therefore, we focus on the content of these three variables. In the entire decision tree model, we found that there are also variables – namely, marital, contact, job, age and day – that play a decision-making role. Therefore, we can start from the variables in the decision tree and judge whether the customer will choose the financial product according to results of the decision tree C5.0 model.

To verify the prediction accuracy of the decision tree C5.0 model, the fitted model is applied to the previously selected test data as a new sample, and the predictions are compared with the real results to obtain the prediction error, which illustrates the accuracy of the model. Table 2 shows the forecast results of the decision tree C5.0 model in R.

Forecast results of decision tree C5.0 model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1265 | 80
No | Proportion | 0.8317 | 0.0526
Yes | Count | 101 | 75
Yes | Proportion | 0.0664 | 0.0493

Table 2 shows that the probability of correct prediction is 0.8810 and the probability of wrong prediction is 0.1190. The decision tree model therefore fits well and can be used for customer classification of financial products.
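The accuracy and misjudgement probability follow directly from the four cells of the confusion matrix. The sketch below recomputes them from the counts in Table 2; the function name `confusion_summary` and the argument names are illustrative assumptions, not part of the original analysis, which was run in R.

```python
def confusion_summary(tn, fp, fn, tp):
    """Return (accuracy, misjudgement probability) from 2x2 confusion counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    return accuracy, 1.0 - accuracy


# Counts read from Table 2 (rows: actual value, columns: predicted value).
accuracy, misjudgement = confusion_summary(tn=1265, fp=80, fn=101, tp=75)
print(f"accuracy={accuracy:.4f}, misjudgement={misjudgement:.4f}")
```

The same function can be applied to the confusion counts reported for the other models.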

The naive Bayes classification algorithm is based on Bayes’ theorem and the assumption of conditional independence of features. Compared with the decision tree, the naive Bayes algorithm considers the prior probability and belongs to an active learning algorithm model [15]. Like the decision tree model, the naive Bayes model can handle a wide range of data and places no strict requirements on data type. The naive Bayes algorithm is also a supervised method, so the model is built on the training set and its accuracy is tested on the testing set [16].

First, the original data set is randomly divided into a training set and a testing set. To keep the size of the data sets from influencing the comparison between models, the same allocation as for the decision tree C5.0 model is used: 3,000 records in the training set and 1,521 in the testing set. The training set is named banktrain.
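A random 3,000/1,521 partition of this kind can be sketched as follows. This is a minimal illustration using only the standard library; the function name `split_indices` and the seed are assumptions, and the actual split used in the study (performed in R) is not reproduced here.

```python
import random


def split_indices(n_rows, n_train, seed=0):
    """Randomly partition row indices 0..n_rows-1 into training and testing lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(range(n_rows))
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# 4,521 records in total, 3,000 of them for training.
train_idx, test_idx = split_indices(n_rows=4521, n_train=3000)
print(len(train_idx), len(test_idx))
```

Reusing the same index lists for every model keeps the comparison between classifiers on an equal footing.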

The analysis of the results of the model is carried out in the R language, and the conditional probability of each variable is obtained. Tables 3 and 4 show the conditional probability results of the variables job and marital, respectively.

Conditional probability of variable job with respect to y

job | Admin | Blue-collar | Entrepreneur | Housemaid | Management | Retired
---|---|---|---|---|---|---
no | 0.124747 | 0.236519 | 0.05242427 | 0.039275 | 0.222744 | 0.059312
yes | 0.107587 | 0.137071 | 0.03633403 | 0.033877 | 0.262378 | 0.110044

job | Self-employed | Services | Student | Technician | Unemployed | Unknown
---|---|---|---|---|---|---
no | 0.044537 | 0.103084 | 0.02230744 | 0.17572 | 0.035457 | 0.014793
yes | 0.045275 | 0.08213 | 0.04281803 | 0.177953 | 0.035447 | 0.020705

Conditional probability of variable marital with respect to y

marital | Divorced | Married | Single
---|---|---|---
no | 0.105948 | 0.628804 | 0.2581084
yes | 0.134266 | 0.544586 | 0.2964283
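To illustrate how such conditional probabilities are combined, the sketch below scores one hypothetical customer (a student whose marital status falls in the largest category) using only the two variables tabulated above. Several points are assumptions made for illustration: the class priors are taken from the frequencies in Table 1 rather than from the training set, the marital column with values 0.628804 and 0.544586 is taken to be the married category, and the function name `naive_bayes_posterior` is invented here. The full model multiplies the conditional probabilities of all variables.

```python
def naive_bayes_posterior(priors, likelihoods):
    """Multiply prior and per-feature conditional probabilities, then normalise."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for p in likelihoods[label]:
            score *= p              # conditional independence assumption
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Assumed priors: class frequencies as printed in Table 1.
priors = {"no": 0.87629, "yes": 0.12371}
# P(job = student | y) from Table 3 and P(marital = married | y) from Table 4
# (the married column is an assumption about the table layout).
likelihoods = {
    "no": [0.02230744, 0.628804],
    "yes": [0.04281803, 0.544586],
}
posterior = naive_bayes_posterior(priors, likelihoods)
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

With only these two features the sketch still predicts "no", which reflects how strongly the prior favours the majority class.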

After considering the conditional probabilities of all variables in the training set, the establishment of naive Bayes model is completed. Next, 1,521 pieces of data in the test set are used to test the naive Bayes classifier and observe the test results. The predicted value is compared with the true value to get the prediction error result of the naive Bayes model (Table 5).

The prediction error of the naive Bayes model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1312 | 121
No | Proportion | 0.8626 | 0.0796
Yes | Count | 36 | 52
Yes | Proportion | 0.0237 | 0.0341

Table 5 shows that the probability of correct prediction is 0.8968 and the probability of wrong prediction is 0.1032. The naive Bayes model therefore fits well and can be used to classify whether the customer will buy this wealth management product.

In the decision tree C5.0 model and the naive Bayes model, all variables are selected for model fitting. The records in the banklogittrain data are randomly sampled, and each record holds the specific information of one customer. To maintain the integrity of the data and the particularity of each individual, duplicate records have already been removed from the data set provided by the database. We use 0–1 dummy variable coding to eliminate collinearity within the same category. Therefore, all variables are also selected to fit the logit model.
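The 0–1 dummy coding mentioned above can be sketched as follows. One level of each categorical variable is left out as the baseline so that the dummy columns are not perfectly collinear. The function name `dummy_code` is hypothetical, and treating "divorced" as the baseline of marital is an assumption inferred from the fact that only marital_married and marital_single appear among the fitted parameters.

```python
def dummy_code(values, prefix, baseline):
    """0-1 code a categorical column, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in values
    ]


# Three example customers; the baseline level "divorced" gets all zeros.
rows = dummy_code(["married", "single", "divorced"],
                  prefix="marital", baseline="divorced")
for row in rows:
    print(row)
```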

We model the data with the traditional binary logit model. In SAS, the parameters are estimated by maximum likelihood. Since only six variables in the original data set are numerical, the correlation among these six variables is considered. We calculate their correlation coefficient matrix, as shown in Table 6.

Correlation coefficient matrix

Variable | age | balance | duration | campaign | pdays | previous
---|---|---|---|---|---|---
age | 1 | | | | |
balance | 0.09807 | 1 | | | |
duration | −0.0119 | −0.01248 | 1 | | |
campaign | −0.0057 | −0.00179 | −0.06178 | 1 | |
pdays | −0.00167 | −0.00114 | 0.0095 | −0.07743 | 1 |
previous | −0.00415 | 0.0316 | 0.02304 | −0.04988 | 0.46122 | 1

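Table 6 shows that none of the correlation coefficients exceeds 0.5 in absolute value; the largest is 0.46122, between pdays and previous. This suggests that there is no serious multicollinearity among the numerical variables, and thus the data can be modelled directly.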
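The screening of the correlation matrix can be expressed as a short check. The sketch below stores the lower-triangle values from Table 6 and looks for any pair whose absolute correlation exceeds the 0.5 cut-off; the helper names `strongest_pair` and `exceeds` are assumptions introduced for illustration.

```python
# Lower-triangle entries of the correlation matrix, as reported in Table 6.
correlations = {
    ("balance", "age"): 0.09807,
    ("duration", "age"): -0.0119,
    ("duration", "balance"): -0.01248,
    ("campaign", "age"): -0.0057,
    ("campaign", "balance"): -0.00179,
    ("campaign", "duration"): -0.06178,
    ("pdays", "age"): -0.00167,
    ("pdays", "balance"): -0.00114,
    ("pdays", "duration"): 0.0095,
    ("pdays", "campaign"): -0.07743,
    ("previous", "age"): -0.00415,
    ("previous", "balance"): 0.0316,
    ("previous", "duration"): 0.02304,
    ("previous", "campaign"): -0.04988,
    ("previous", "pdays"): 0.46122,
}


def strongest_pair(corr):
    """Return the variable pair with the largest absolute correlation."""
    pair = max(corr, key=lambda key: abs(corr[key]))
    return pair, corr[pair]


def exceeds(corr, threshold=0.5):
    """List the pairs whose absolute correlation is above the threshold."""
    return [pair for pair, r in corr.items() if abs(r) > threshold]


print(strongest_pair(correlations))
print(exceeds(correlations))
```

Pairwise correlations are only a first screen; they do not rule out collinearity that involves three or more variables.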

The training set is coded according to the coding rules, and a new training set, suitable for the logit model and named banklogittrain, is formulated. The banklogittrain data set is input into SAS software to establish a standard binary logit model. Table 7 shows the estimated model parameters.

Parameter estimation table of binary logit model

Parameter | df | Estimate | Standard error | Wald chi-square | Pr > chi-square
---|---|---|---|---|---
Intercept | 1 | −2.314 | 0.8532 | 6.473 | 0.009168
Age | 1 | −0.0026 | 0.0052 | 0.0169 | 0.903568
job_unemployed | 1 | −1.4098 | 0.6654 | 3.8488 | 0.048068
job_admin | 1 | −0.5077 | 0.5393 | 0.7791 | 0.376468
job_blue | 1 | −1.136 | 0.5337 | 3.7477 | 0.051168
job_ent | 1 | −1.1421 | 0.6166 | 2.9252 | 0.085568
job_house | 1 | −1.2781 | 0.6543 | 3.2745 | 0.068668
job_management | 1 | −0.7238 | 0.5164 | 1.6568 | 0.196568
job_retired | 1 | 0.0871 | 0.5514 | 0.0124 | 0.920068
job_self | 1 | −0.5975 | 0.5811 | 0.9299 | 0.333768
job_services | 1 | −0.7431 | 0.5557 | 1.5302 | 0.214668
job_stu | 1 | 0.1575 | 0.6266 | 0.0385 | 0.848368
job_tech | 1 | −0.7891 | 0.5221 | 1.9194 | 0.164368
marital_married | 1 | −0.5847 | 0.1337 | 9.2815 | 0.000468
marital_single | 1 | −0.4646 | 0.1735 | 4.1719 | 0.039368
education_primary | 1 | 0.1574 | 0.3349 | 0.1089 | 0.742768
education_secondary | 1 | 0.1036 | 0.297 | 0.0476 | 0.830468
education_tertiary | 1 | 0.4284 | 0.3136 | 1.1181 | 0.289068
default | 1 | 0.2302 | 0.5353 | 0.1172 | 0.733368
balance | 1 | −0.0251 | 0.0063 | 0.4723 | 0.491368
housing | 1 | −0.5106 | 0.0866 | 12.2303 | 0.000019
loan | 1 | −0.8545 | 0.1784 | 12.8871 | 0.000028
contact_cell | 1 | 1.1527 | 0.1621 | 24.172 | 0.000045
contact_phone | 1 | 1.2957 | 0.2712 | 14.0878 | 0.000162
day | 1 | 0.02871 | 0.0023 | 0.0656 | 0.800268
month | 1 | 0.0492 | 0.0004 | 0.7136 | 0.397368
duration | 1 | 0.03083 | 0.0026 | 291.6091 | 0.000017
campaign | 1 | −0.0218 | 0.0024 | 2.4094 | 0.118968
pdays | 1 | −0.0264 | 0.0015 | 0.0091 | 0.935068
previous | 1 | −0.0011 | 0.0462 | 0.3194 | 0.571768
poutcome_unknown | 1 | −0.9279 | 0.3301 | 5.7796 | 0.014368
poutcome_failure | 1 | −0.4032 | 0.2436 | 1.917 | 0.164668
poutcome_success | 1 | 1.8464 | 0.3088 | 23.4616 | 0.000026

In the variable job, all types except retired and student have negative coefficients for purchasing financial products. We interpret this to mean that the retired and student populations are more interested in financial products than other populations because they face comparatively less pressure in life. In the marital variable, the married and single categories have negative coefficients, so only divorced people are inclined towards financial products, possibly because they are more concerned about the appreciation of property. In the education variable, the higher the level of education, the higher the acceptance of financial products. Among the variables default, balance, housing and loan, the default coefficient is positive, that is, a customer whose credit card is in arrears is more likely to choose this financial product. In the contact variable, the last contact has a positive effect on the selection. In the poutcome variable, that is, the result of the previous selection, customers who chose this wealth management product last time are more inclined to choose it again.
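The sign-based reading of the coefficients can be made more concrete by converting them to odds ratios, since in a logit model a one-unit increase in a variable multiplies the odds of purchase by the exponential of its coefficient. The sketch below is an illustrative calculation on a few estimates from Table 7; the function names are assumptions, and the intercept-only probability describes a customer with every explanatory variable at zero, which is a reference point rather than a realistic customer.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the variable."""
    return math.exp(beta)


def logistic(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Selected estimates from Table 7.
or_success = odds_ratio(1.8464)    # poutcome_success
or_housing = odds_ratio(-0.5106)   # housing
p_reference = logistic(-2.314)     # intercept only, all explanatory variables at 0

print(f"{or_success:.2f} {or_housing:.2f} {p_reference:.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient, so the qualitative conclusions are unchanged.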

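Then, the model is:

$$\ln\frac{P(y = 1)}{1 - P(y = 1)} = \beta_0 + \sum_{k=1}^{32} \beta_k x_k,$$

where $\beta_0$ is the intercept and $\beta_k$ is the coefficient of the $k$th explanatory variable $x_k$, with the estimates given in Table 7. The global test results of the model are shown in Table 8.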

Test results of binary logit model (test of the global null hypothesis: BETA = 0)

Test | Chi-square | df | Pr > Chi-square
---|---|---|---
Likelihood ratio | 662.5263 | 32 | <0.0001
Score | 768.684 | 32 | <0.0001
Wald | 395.6258 | 32 | <0.0001

Then, 1,521 pieces of data in the test set are used to test the binary logit model and observe the test results. The result of the probability of misjudgement can be obtained from the comparison between the predicted value and the true value (Table 9).

The prediction results of the binary logit model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Proportion | 0.8376 | 0.0907
Yes | Proportion | 0.0322 | 0.0394

Table 9 shows that the probability of correct prediction is 0.8771 and the probability of wrong prediction is 0.1229. The binary logit model therefore fits well and can be used to classify whether the customer will choose the financial product.

Combination prediction is to combine the prediction results of different models, comprehensively use the information of the prediction results of the previously established model and obtain the comprehensive prediction results in an appropriate weighted average manner [17]. There are three classification selection models established for the data set of the same customer's choice of wealth management products, namely the decision tree C5.0 model, the naive Bayes model and the binary logit classification model. After the three models are established, the accuracy and misjudgement probability of the three models are reckoned, as shown in Table 10.

Model results of three classification models

Model | Accuracy | Probability of misjudgement
---|---|---
Decision Tree C5.0 Algorithm | 0.8810 | 0.1190
Naive Bayes Classification Algorithm | 0.8968 | 0.1032
Binary logit model | 0.8771 | 0.1229

Among the three models, the naive Bayes model has the highest accuracy, reaching 89.68%. The main problem addressed in this section is how to determine the weights of the three classification models in the combined model through a suitable weight determination method, so as to obtain a suitable combination forecasting model.

In the general method of solving the problem of combined forecasting, a comprehensive weight method is usually used to combine the conclusions of each single model. In considering the weight, it is usually necessary to set a certain target index such that the target index is the smallest or the largest to determine the weight [18]. The following are several commonly used weight determination methods.

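Generally speaking, we need to minimise the sum of squares of the probability of misjudgement of the combined model [19]. The objective function is

$$\min J = \sum_{i=1}^{m} \sum_{j=1}^{m} w_i w_j E_{ij} = W^{\tau} E W, \qquad \text{s.t. } R^{\tau} W = 1,$$

where $w_j$ is the weight of the $j$th model, $W = (w_1, \ldots, w_m)^{\tau}$ is the weight vector, $E = (E_{ij})_{m \times m}$ with $E_{ij} = e_i^{\tau} e_j$ is the error information matrix built from the model errors $e_j$, and $R = (1, \ldots, 1)^{\tau}$.

When solving for $w_j$, the optimal weight vector and the minimum error are

$$W_0 = \frac{E^{-1} R}{R^{\tau} E^{-1} R}, \qquad J_0 = \frac{1}{R^{\tau} E^{-1} R}.$$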
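The closed-form least squares weights can be obtained with a Lagrange multiplier. The derivation below is a sketch under the standard formulation of the problem, in which $W$ is the weight vector, $E$ is a symmetric and invertible error information matrix and $R$ is a vector of ones; these symbol names are assumptions used for illustration.

$$
\begin{aligned}
L(W, \lambda) &= W^{\tau} E W - 2\lambda \left( R^{\tau} W - 1 \right), \\
\frac{\partial L}{\partial W} &= 2 E W - 2 \lambda R = 0 \;\Rightarrow\; W = \lambda E^{-1} R, \\
R^{\tau} W = 1 &\;\Rightarrow\; \lambda = \frac{1}{R^{\tau} E^{-1} R}, \\
W_0 &= \frac{E^{-1} R}{R^{\tau} E^{-1} R}, \qquad J_0 = W_0^{\tau} E W_0 = \frac{1}{R^{\tau} E^{-1} R}.
\end{aligned}
$$

Nothing in this solution forces the entries of $E^{-1} R$ to share the same sign, which is why some least squares weights can turn out negative.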

Under the least squares criterion, the weight may be negative, which is often inconsistent with the actual situation. Therefore, under the condition that the constraint weight is a positive value, according to the characteristics of the model in this article, the following four positive weight determination methods are adopted.

Arithmetic average method, also called equal weight average method, is a weighting method that equalises the weights of all models. The characteristic of the arithmetic weighted average method is that all models are regarded as equally important and given the same weight. Usually, it is used when the importance of each model is not understood and there is not much difference between the misjudgement probability of each model [22]. This method is simple to calculate, and the weight automatically meets the non-negative condition. Therefore, it is widely used in various fields in real life.

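The reciprocal variance method starts from the sum of squared errors of each model and computes the weight from the reciprocal of each model's sum of squared residuals [23]. A higher weight is assigned to a model with a small error. Since the sum of squared residuals is always positive, the weight is also always positive. The weight calculation formula is:

$$w_j = \frac{D_j^{-1}}{\sum_{k=1}^{m} D_k^{-1}}, \qquad j = 1, \ldots, m, \tag{9}$$

where $D_j$ is the sum of squared residuals of the $j$th model.

The weight of the reciprocal mean square method [24] is determined as shown in Eq. (10):

$$w_j = \frac{D_j^{-1/2}}{\sum_{k=1}^{m} D_k^{-1/2}}, \qquad j = 1, \ldots, m. \tag{10}$$

The weighting formula of the simple weighting method is shown in Eq. (11):

$$w_j = \frac{j}{\sum_{k=1}^{m} k} = \frac{2j}{m(m+1)}, \qquad j = 1, \ldots, m. \tag{11}$$

In this formula, $j$ is the rank of the model when the models are sorted in descending order of error, so the model with the smallest error receives the largest weight.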
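The four positive weighting schemes can be sketched in a few lines. The code below is an illustration rather than the original computation: it assumes that each model's overall misjudgement probability stands in for its error measure, that the reciprocal mean square method uses the inverse square root of that error, and that the simple weighting method ranks models from largest to smallest error; all function names are hypothetical. Under these assumptions the simple weights come out as 1/3, 1/2 and 1/6 for the decision tree, naive Bayes and logit models, while the reciprocal variance weights may differ slightly from the published rounded values.

```python
def normalise(raw):
    """Scale non-negative raw scores so that they sum to 1."""
    total = sum(raw)
    return [value / total for value in raw]


def equal_weights(m):
    return [1.0 / m] * m


def reciprocal_variance_weights(errors):
    return normalise([1.0 / e for e in errors])


def reciprocal_mean_square_weights(errors):
    return normalise([e ** -0.5 for e in errors])


def simple_weights(errors):
    """Rank 1 goes to the largest error, rank m to the smallest; weight = rank / sum of ranks."""
    order = sorted(range(len(errors)), key=lambda j: errors[j], reverse=True)
    ranks = [0] * len(errors)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return normalise(ranks)


# Misjudgement probabilities: decision tree C5.0, naive Bayes, binary logit.
errors = [0.1190, 0.1032, 0.1229]
w_equal = equal_weights(len(errors))
w_rv = reciprocal_variance_weights(errors)
w_rms = reciprocal_mean_square_weights(errors)
w_simple = simple_weights(errors)
for name, w in [("equal", w_equal), ("reciprocal variance", w_rv),
                ("reciprocal mean square", w_rms), ("simple", w_simple)]:
    print(name, [round(x, 2) for x in w])
```

All four schemes produce non-negative weights that sum to 1 by construction, which is the property the least squares solution lacks.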

Since the data are originally classification data, the scalar in the least squares principle is adapted to the model features. Because a misjudgement probability cannot be computed for a single record (the outcome of a single customer's choice), the misjudgement probability of a single record in the original method is replaced directly by the misjudgement probability of the overall model. That is, $E_{ij} = e_i^{\tau} e_j$ is evaluated with the overall misjudgement probabilities $e_i$ and $e_j$. Solving for the weight $W_0$ and the minimum error $J_0$ ($J_0$ is the minimum error, $W_0$ is the weight) gives the combined model:

$$f = w_1 \times f_1 + 2.33 \times f_2 - 5.33 \times f_3.$$

Among them, $f_1$, $f_2$ and $f_3$ are the results of the decision tree C5.0 algorithm, naive Bayes model and binary logit model, respectively, and $w_1$ is the weight of the decision tree result. The probability of misjudgement is 0.51% (Table 18).

However, in the weights calculated by the least squares method, the weight of the binary logit model is negative, which does not conform to the actual situation. Therefore, weighting methods that constrain the weights to be positive should be considered in order to find a local optimal model [27].

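As introduced above, the arithmetic averaging method treats all model results as equally important. The results of the three models have been discussed above and are now weighted. The weight of each model is therefore set to 1/3, that is, $f = \frac{1}{3} f_1 + \frac{1}{3} f_2 + \frac{1}{3} f_3$, where $f_1$, $f_2$ and $f_3$ are the results of the decision tree C5.0 algorithm, naive Bayes model and binary logit model, respectively.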

Since the model result is a 0–1 variable, the result judgement rule is: when

The results of the three models are weighted and judged according to the above-mentioned judgement rules, and the results obtained are shown in Table 11.

Forecast results of arithmetic average weighted model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1315 | 39
No | Proportion | 0.8646 | 0.0256
Yes | Count | 114 | 53
Yes | Proportion | 0.0750 | 0.0348

Table 11 shows that the accuracy of the arithmetic average weighted model is 0.8994 (89.94%), and the probability of misjudgement is 10.06%. Compared with the highest accuracy of a single model (89.68%), the accuracy is increased by 0.26 percentage points. Therefore, so far, the arithmetic average weighted model can be regarded as a local optimal model.
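The mechanics of combining three 0–1 model outputs into one 0–1 decision can be sketched as a weighted sum followed by a cut-off. The sketch below is only an illustration: the function name `combine` is hypothetical, and the cut-off of 0.5 is an assumption made here, not the judgement rule of the study, so the code is not expected to reproduce the reported tables.

```python
def combine(predictions, weights, threshold=0.5):
    """Weighted sum of 0-1 model outputs, mapped back to a 0-1 decision.

    The threshold of 0.5 is an assumed cut-off used for illustration only.
    """
    score = sum(w * p for w, p in zip(weights, predictions))
    return int(score >= threshold)


equal = [1 / 3, 1 / 3, 1 / 3]
# Outputs ordered as (decision tree C5.0, naive Bayes, binary logit).
two_votes = combine((1, 1, 0), equal)   # two of three models say "yes"
one_vote = combine((1, 0, 0), equal)    # only one model says "yes"
print(two_votes, one_vote)
```

With equal weights and this assumed cut-off the rule reduces to a majority vote; other weights or cut-offs change which combinations of model outputs lead to a "yes".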

The reciprocal variance method is to assign a higher weight to a model with a lower probability of misjudgement [28]. The weight calculation rule is as in Eq. (9), and the weight of each model is calculated by this rule. Table 12 shows the weights of reciprocal variance of each model.

Weights of reciprocal variance

Model | Weight
---|---
Decision Tree C5.0 Algorithm | 0.33
Naive Bayes Classification Algorithm | 0.36
Binary logit model | 0.31

Therefore, the combined model is: $f = 0.33 \times f_1 + 0.36 \times f_2 + 0.31 \times f_3$.

Since the model result is a 0–1 variable, the result judgement rule is given. When

The three model results are weighted and judged according to the above-mentioned judgement rules. The results obtained after sorting and judgement are shown in Table 13.

The prediction results of the reciprocal variance weighted model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1317 | 34
No | Proportion | 0.8659 | 0.0224
Yes | Count | 122 | 48
Yes | Proportion | 0.0802 | 0.0315

Table 13 shows that the correct rate of the reciprocal variance weighted model is 89.74%, and the probability of misjudgement is 10.26%. Compared with the highest correct rate of a single model (89.68%), the accuracy is slightly improved, by 0.06 percentage points.

The calculation formula of the model based on the reciprocal mean square method is shown in Eq. (10). In this paper, the weights are calculated from each model's misjudgement probability, as shown in Table 14.

Reciprocal mean square weight

Model | Weight
---|---
Decision Tree C5.0 Algorithm | 0.33
Naive Bayes Classification Algorithm | 0.35
Binary logit model | 0.32

Therefore, the combined model is: $f = 0.33 \times f_1 + 0.35 \times f_2 + 0.32 \times f_3$.

Similarly, since the model result is a 0–1 variable, the result judgement rule is given. When

The prediction results of the reciprocal mean square weighted model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1317 | 34
No | Proportion | 0.8659 | 0.0224
Yes | Count | 122 | 48
Yes | Proportion | 0.0802 | 0.0315

It can be found that the results are consistent with the reciprocal variance weighted model. The reason may be that their calculation methods are similar, and thus the weight difference is not big, resulting in consistent prediction results.

The weighting formula of the simple weighting method is shown in Eq. (11). We calculated the model weights according to the calculation rule of this formula, and the results are shown in Table 16.

Simple weighted weights

Model | Weight
---|---
Decision Tree C5.0 Algorithm | 1/3
Naive Bayes Classification Algorithm | 1/2
Binary logit model | 1/6

Therefore, the combined model is: $f = \frac{1}{3} f_1 + \frac{1}{2} f_2 + \frac{1}{6} f_3$.

Since the model result is a 0–1 variable, the result judgement rule is given. When

The calculated weights are used to weight the model, and the accuracy and probability of misjudgement obtained are shown in Table 17.

Prediction results of the simple weighted model

Actual value | Item | Predicted no | Predicted yes
---|---|---|---
No | Count | 1320 | 32
No | Proportion | 0.8678 | 0.0210
Yes | Count | 126 | 43
Yes | Proportion | 0.0828 | 0.0283

Table 17 shows that the correct rate of the simple weighted model is 89.61%, and the probability of misjudgement is 10.38%. Compared with the highest correct rate of a single model (89.68%), the accuracy is slightly reduced, by 0.07 percentage points. Therefore, the simple weighting model does not improve on the accuracy of the best single model.

In the above analysis, five commonly used weighting methods are used to weight the model. When the sign of the weights is not restricted, the least squares method is used, and the weight is calculated with the minimum sum of squares of the probability of misjudgement as the objective function. When the weights are restricted to be positive, the arithmetic average weighting method, reciprocal variance weighting method, reciprocal mean square weighting method and simple weighting method are used to calculate the weights of the model [29]. Table 18 presents the final model conclusions obtained using the five weighting methods.

Comparison of results of each weighted model

Weighting method | Probability of misjudgement | Improved accuracy
---|---|---
Least squares method | 0.51% | 9.81%
Arithmetic mean weighting | 10.06% | 0.26%
Reciprocal variance weighting | 10.26% | 0.06%
Reciprocal mean square weighted | 10.26% | 0.06%
Simple weighting | 10.38% | −0.06%

Note: The improved accuracy is compared with the accuracy of the best single model (0.8968).
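The improved accuracy column can be reproduced as the difference between the misjudgement probability of the best single model (0.1032) and that of each weighted model. The sketch below performs this arithmetic on the values listed in the comparison table; the variable names are assumptions introduced for illustration.

```python
# Misjudgement probability of the best single model (naive Bayes).
best_single_error = 0.1032

# Misjudgement probabilities of the weighted models, as listed in the table.
combined_errors = {
    "Least squares method": 0.0051,
    "Arithmetic mean weighting": 0.1006,
    "Reciprocal variance weighting": 0.1026,
    "Reciprocal mean square weighted": 0.1026,
    "Simple weighting": 0.1038,
}

# A positive value means the combined model beats the best single model.
improvement = {
    name: round(best_single_error - error, 4)
    for name, error in combined_errors.items()
}
for name, gain in improvement.items():
    print(f"{name}: {gain:+.2%}")
```

A lower misjudgement probability is equivalent to a higher accuracy, so the sign of each difference shows directly whether weighting helped.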

Among them, the weight result calculated by the least square method shows that the weight of the binary logit model is negative, which does not conform to the actual situation of the empirical data. Therefore, the results of the least squares weighted model will not be adopted.

In the case of positive weights, the accuracy of the model based on the simple weighting method is lower than that of the best single model, and thus its results will not be adopted. The arithmetic mean weighted model, the reciprocal variance weighted model and the reciprocal mean square weighted model all have relatively high accuracy, and among them the arithmetic mean weighted model has the highest prediction accuracy. Therefore, the final combination model is: $f = \frac{1}{3} f_1 + \frac{1}{3} f_2 + \frac{1}{3} f_3$, where $f_1$, $f_2$ and $f_3$ are the results of the decision tree C5.0 algorithm, naive Bayes model and binary logit model, respectively.

The result judgement rule is: when

This research studies the classification of financial product customers of a financial institution in Portugal. The decision tree C5.0 algorithm, the naive Bayes classification algorithm and the binary logit model are each used to carry out empirical single-model research on financial product customer classification. In the empirical study of the single models, the naive Bayes model is the best, with an accuracy of 89.68%. Then, the least squares method, arithmetic average method, reciprocal variance method, reciprocal mean square method and simple weighting method are used for weight calculation, giving five new combination models. The empirical analysis of the five combination models shows that the least squares weighting method produces a negative weight, which does not conform to the actual situation, so that combined model is not adopted. The accuracy of the simple weighting model is lower than that of the best single model, so its result is not adopted either. Comparing the arithmetic mean weighted model with the reciprocal variance weighted model and the reciprocal mean square model, the misjudgement probability of the arithmetic mean weighted model is 10.06%, which is less than the misjudgement probability (10.26%) of the other two. The arithmetic mean weighted model is therefore the best of the three, with an accuracy of 89.94%. Compared with the best single model, the accuracy of the combined model is increased by 0.26 percentage points. Based on the results, the arithmetic average weighted model gives the best results and is an ideal combined forecasting model.

The contribution of this paper is that it not only applies the data mining algorithm to the customer's financial product selection problem but also applies the discrete choice measurement model to the problem. The idea, which is to comprehensively consider the results of the classification algorithm and the discrete selection model, and assign appropriate weights to each model to form a new combined model, has a certain degree of innovation. However, in the weight calculation of the combined forecasting model, only five commonly used weight calculation methods are selected. Therefore, the optimal combination model obtained in this paper is only a local optimal model that is superior to the three individual classification methods. The model weights are established on the bank data set. If the model is applied to other data sets, the results may vary greatly.

