To describe the risk dependence structure and correlation between financial variables accurately, support sound financial risk assessment and provide a basis for financial decision-making, the basic theory of the Copula function is first established and a mixed Copula model is constructed. The mixed Copula model is then nested in a hidden Markov model (HMM) to analyse the risk dependence among the banking, insurance, securities and trust industries, and a Copula–GARCH model is constructed for an empirical analysis of investment portfolios. Finally, a deep learning Markov model is adopted to predict the financial index. The results show that the HMM-based mixed Copula model fits better than both the single Copula models and the plain mixed Copula model. The empirical results show that, among the four major financial industries in China, the banking and insurance industries have strong interdependence and a high probability of risk contagion. The investment failure rates at the 95%, 97.5% and 99% confidence levels calculated by the Copula–GARCH model are 4.53%, 2.17% and 1.08%, respectively. Moreover, the errors of the deep learning Markov model in predicting the stock prices of Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) and China Ping An Insurance (sh601318) are 2.56%, 2.98% and 3.56%, respectively. These findings indicate that the four major financial industries in China have strong interdependence and risk contagion, so that macro or systemic risks may arise, and that the deep learning Markov model can be adopted to predict stock prices.

With the deepening of global economic integration, the financial markets of various countries and regions have become more closely related, and their correlation has become more obvious [1]. However, economic globalisation has also made financial crises more widespread: a crisis in one region or country often triggers a global financial crisis, as the subprime crisis in the United States did [2, 3]. Therefore, understanding and capturing the correlation between financial markets is of great significance for effectively avoiding the spread of financial crises. Traditional correlation studies of financial variables often use methods such as the Pearson correlation coefficient, the Spearman correlation and the Granger causality test, which have great limitations [4,5,6]. As a result, the Copula function has gradually become one of the main methods for exploring the correlation between financial variables. Yet the actual financial market is complex and changeable, and there are close correlations among various enterprises and fields, so exploring correlations in financial markets with a single Copula function can lead to large errors [7, 8]. Scholars have therefore formed mixed Copula functions as linear combinations of Copula functions with different tail characteristics, which can describe the tail correlation between financial variables accurately [9, 10]. Since there are many hidden states behind the yield rates of financial variables, special methods are required to infer these hidden states. The hidden Markov model (HMM) was first applied in speech recognition and similar fields and was later applied in finance and other fields [11, 12]. However, there are few studies on how to measure portfolio risk by combining the Copula function with HMM.

To fill this gap, the basic concept of the mixed Copula model is analysed first; the mixed Copula model is then nested in the HMM framework and applied to the analysis and measurement of portfolio risk dependence; and the deep Markov model is adopted to predict the stock risk. The results of this study aim to provide a theoretical basis for exploring the risk dependence structure among financial variables in China so as to conduct accurate risk assessment.

The model based on Copula function can be adopted to study the correlation between variables and their characteristics. The binary Copula function is defined as follows.

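Assuming that a function $C:[0,1]^2\to[0,1]$ satisfies $C(u,0)=C(0,v)=0$, $C(u,1)=u$ and $C(1,v)=v$ for all $u,v\in[0,1]$, and that $C(u_2,v_2)-C(u_2,v_1)-C(u_1,v_2)+C(u_1,v_1)\ge 0$ whenever $u_1\le u_2$ and $v_1\le v_2$, then $C$ is a binary Copula function.

Suppose that there are two unary distribution functions $F_1(x_1)=u$ and $F_2(x_2)=v$; then their joint distribution function $H$ can be written as

$$H(x_1,x_2)=C\bigl(F_1(x_1),F_2(x_2)\bigr)=C(u,v).$$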

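The Archimedean Copula family contains multiple Copula functions, each of which can be expressed through its generator function [13].

Suppose there is a function class $\Omega$ of generators $\varphi:[0,1]\to[0,\infty]$ that are continuous and strictly decreasing with $\varphi(1)=0$. For $\varphi\in\Omega$, the Archimedean Copula function $C:[0,1]^{d}\to[0,1]$ can be defined as follows.

$$C(u_1,\dots,u_d)=\varphi^{-1}\bigl(\varphi(u_1)+\dots+\varphi(u_d)\bigr),$$

where the inverse $\varphi^{-1}(\cdot)$ satisfies complete monotonicity.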

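The common Archimedean Copula functions include the Gumbel Copula function, the Clayton Copula function and the Frank Copula function [14, 15]. The mathematical expressions of these three functions are as follows, respectively.

$$C_1(u,v;\theta_1)=\exp\Bigl\{-\bigl[(-\ln u)^{1/\theta_1}+(-\ln v)^{1/\theta_1}\bigr]^{\theta_1}\Bigr\} \qquad (3)$$

$$C_2(u,v;\theta_2)=\bigl(u^{-\theta_2}+v^{-\theta_2}-1\bigr)^{-1/\theta_2} \qquad (4)$$

$$C_3(u,v;\theta_3)=-\frac{1}{\theta_3}\ln\left(1+\frac{(e^{-\theta_3 u}-1)(e^{-\theta_3 v}-1)}{e^{-\theta_3}-1}\right) \qquad (5)$$

In Eq. (3), $(-\ln u)^{1/\theta_1}$ is the generator of the Gumbel Copula function, $\theta_1\in(0,1]$, and the closer $\theta_1$ is to 0, the closer the variables are to complete correlation; in Eq. (4), $u^{-\theta_2}-1$ is the generator of the Clayton Copula function, $\theta_2\in(0,\infty)$, and the closer $\theta_2$ is to $\infty$, the closer the variables are to complete correlation; in Eq. (5), $-\ln\frac{e^{-\theta_3 u}-1}{e^{-\theta_3}-1}$ is the generator of the Frank Copula function, $\theta_3\in(0,\infty)$, and the closer $\theta_3$ is to $\infty$, the closer the variables are to complete correlation.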

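Then the Gumbel Copula, Clayton Copula and Frank Copula functions are combined to build a mixed Copula function, and the weight values of these three functions are $w_1$, $w_2$, $w_3$. The mixed Copula function can be expressed as follows.

$$C_M(u,v;\Theta)=\sum_{k=1}^{3}w_k\,C_k(u,v;\theta_k) \qquad (6)$$

where $C_1$, $C_2$ and $C_3$ are the Gumbel, Clayton and Frank Copula functions with parameters $\theta_1$, $\theta_2$ and $\theta_3$, $w_k\ge 0$, $\sum_{k=1}^{3}w_k=1$, and $\Theta=\{\theta_1,\theta_2,\theta_3,w_1,w_2,w_3\}$ is the parameter set.

Then the density function of the mixed Copula function is as follows.

$$c_M(u,v;\Theta)=\sum_{k=1}^{3}w_k\,c_k(u,v;\theta_k),\qquad c_k(u,v;\theta_k)=\frac{\partial^2 C_k(u,v;\theta_k)}{\partial u\,\partial v}.$$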
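The weighted combination of the Gumbel, Clayton and Frank Copula functions can be illustrated with a minimal Python sketch. It assumes the closed forms implied by the generators and parameter ranges quoted in the text; the function names, the weights and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def gumbel_cdf(u, v, theta):
    # Gumbel copula with generator (-ln u)^(1/theta), theta in (0, 1].
    s = (-math.log(u)) ** (1.0 / theta) + (-math.log(v)) ** (1.0 / theta)
    return math.exp(-(s ** theta))

def clayton_cdf(u, v, theta):
    # Clayton copula with generator u^(-theta) - 1, theta > 0.
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def frank_cdf(u, v, theta):
    # Frank copula, theta > 0; expm1/log1p keep small values accurate.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def mixed_cdf(u, v, thetas, weights):
    # Weighted sum of the three copulas; weights must form a convex combination.
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    parts = (
        gumbel_cdf(u, v, thetas[0]),
        clayton_cdf(u, v, thetas[1]),
        frank_cdf(u, v, thetas[2]),
    )
    return sum(w * c for w, c in zip(weights, parts))

# Illustrative (hypothetical) parameters, not the estimates reported in the paper.
value = mixed_cdf(0.3, 0.6, thetas=(0.6, 1.1, 4.5), weights=(0.5, 0.3, 0.2))
print(round(value, 4))
```

Because each component is a valid Copula function and the weights form a convex combination, the mixed value always lies between the Fréchet bounds max(u + v − 1, 0) and min(u, v).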

The state in HMM is hidden and can be inferred from the observation sequence and the transition probability matrix between states. The four parameters in HMM are the number of states, the initial probability distribution, the transition probability matrix and the density function of the observation sequence [16]. It is assumed that there is a hidden Markov series $\{X_h\}$ with observation sequence $\{Y_h\}$ and state-dependent density $f(y_h\mid X_h)$, which satisfies the following two properties.

$$P(X_h=x_h\mid X_1=x_1,\dots,X_{h-1}=x_{h-1})=P(X_h=x_h\mid X_{h-1}=x_{h-1}),$$

$$P(Y_h=y_h\mid X_1=x_1,\dots,X_h=x_h,\,Y_1=y_1,\dots,Y_{h-1}=y_{h-1})=P(Y_h=y_h\mid X_h=x_h).$$
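The two properties above are what make the likelihood of an observation sequence computable by a forward recursion. The following Python sketch shows a scaled forward algorithm under stated assumptions: the state-dependent densities are plain normal densities standing in for the mixed Copula density used in the paper, and all names and numbers (`hmm_log_likelihood`, the transition matrix, the observations) are hypothetical.

```python
import math

def normal_pdf(y, mean, std):
    # Stand-in observation density; the paper uses a mixed Copula density instead.
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def hmm_log_likelihood(obs, init, trans, densities):
    """Log-likelihood of obs under an HMM via the scaled forward recursion.

    init[i]      : initial probability of state i
    trans[i][j]  : probability of moving from state i to state j
    densities[i] : callable giving the observation density in state i
    """
    m = len(init)
    alpha = [init[i] * densities[i](obs[0]) for i in range(m)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Markov property: propagate one step, then weight by the new observation.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(m)) * densities[j](obs[t])
                for j in range(m)
            ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik

# Hypothetical two-state example: a calm state and a volatile state.
densities = [lambda y: normal_pdf(y, 0.0, 1.0), lambda y: normal_pdf(y, 0.0, 3.0)]
ll = hmm_log_likelihood(
    [0.2, -0.1, 2.5, -3.0],
    init=[0.5, 0.5],
    trans=[[0.95, 0.05], [0.10, 0.90]],
    densities=densities,
)
print(ll)
```

A log-likelihood of this kind is the objective that a numerical optimiser would maximise over the model parameters.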

According to the above two properties, the theory of conditional probability is combined with the theory of probability in mathematical statistics to carry out the derivation of

Assuming that the matrix of order

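According to Sklar's theorem [17], the joint density function in state $i$ can be expressed as $f_i(y_1,y_2)=c_M\bigl(F_1(y_1),F_2(y_2);\Theta_i\bigr)\,f_1(y_1)\,f_2(y_2)$, where $F_1$, $F_2$ and $f_1$, $f_2$ are the marginal distribution and density functions, $c_M$ is the density of the mixed Copula function and $\Theta_i=\{\theta_1,\theta_2,\theta_3,w_1,w_2,w_3\}$ is the parameter set.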

If it is the parameter set, _{i}_{ij}_{in}_{in}_{l}

The L-BFGS-B algorithm is adopted to obtain the estimated value after

To construct the portfolio model with the Copula function, the marginal distribution of each variable must be determined first, and an appropriate Copula function is then selected to describe the dependence structure among the marginal distributions. The generalised autoregressive conditional heteroskedasticity (GARCH) model is adopted to model the marginal distribution of the variables [20]. The Copula function then combines the marginal distribution functions of the different variables, and the resulting joint distribution function is the portfolio function of the financial assets. Assuming that the financial time series are $\{R_{1t}\}$ and $\{R_{2t}\}$, the mathematical expression of the GARCH(1,1) model of the financial time series is as follows.

$$R_{it}=\mu_i+\varepsilon_{it},\qquad \varepsilon_{it}=\sigma_{it}z_{it},\qquad \sigma_{it}^2=\omega_i+\alpha_i\varepsilon_{i,t-1}^2+\beta_i\sigma_{i,t-1}^2,\qquad i=1,2,$$

where $\mu_i$ is the conditional mean, $\sigma_{it}^2$ is the conditional variance, $z_{it}$ is an independent standardised innovation, and $\omega_i>0$, $\alpha_i\ge 0$, $\beta_i\ge 0$.
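The GARCH(1,1) variance recursion can be sketched in a few lines of Python. This is an illustration only: it assumes the standard GARCH(1,1) form with a constant mean, starts the recursion at the unconditional variance (which requires alpha + beta < 1), and uses hypothetical parameter values rather than the estimates reported later in the paper.

```python
def garch11_variance(returns, mu, omega, alpha, beta):
    """Conditional variances sigma_t^2 for a GARCH(1,1) model with constant mean mu.

    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2,
    where eps_t = r_t - mu. The first value is the unconditional variance.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        eps = r - mu
        sigma2.append(omega + alpha * eps * eps + beta * sigma2[-1])
    return sigma2

# Hypothetical inputs chosen so the arithmetic is easy to follow by hand.
variances = garch11_variance([1.0, 2.0, 0.0], mu=0.0, omega=0.1, alpha=0.2, beta=0.7)
print(variances)
```

Dividing each residual by the square root of its conditional variance gives the standardised residuals whose distribution (normal or t) is then used as the marginal distribution.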

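The Copula–GARCH model contains the binary normal Copula and t-Copula models, whose mathematical expressions are as follows.

$$C_N(u,v;\rho)=\Phi_\rho\bigl(\Phi^{-1}(u),\Phi^{-1}(v)\bigr),$$

$$C_t(u,v;\rho,\nu)=T_{\rho,\nu}\bigl(t_\nu^{-1}(u),t_\nu^{-1}(v)\bigr),$$

where $\Phi_\rho$ and $T_{\rho,\nu}$ are the bivariate standard normal and Student t distribution functions with correlation coefficient $\rho$ (and $\nu$ degrees of freedom), and $\Phi^{-1}$ and $t_\nu^{-1}$ are the corresponding univariate quantile functions. The yield of the portfolio is $R_t=\lambda_1R_{1,t}+\lambda_2R_{2,t}$, where $\lambda_1$ and $\lambda_2$ are the portfolio weights.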

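VaR satisfies the following equation.

$$P\bigl(R_t\le -\mathrm{VaR}_\alpha\bigr)=1-\alpha \qquad (19)$$

where $\alpha$ is the confidence level.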

In practice, Eq. (19) is difficult to solve analytically, so Monte Carlo simulation is used to approximate the solution; the simulation steps are shown in Figure 2.
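A Monte Carlo estimate of portfolio VaR can be sketched as follows. The sketch does not reproduce the steps of Figure 2; it assumes, purely for illustration, normal marginal distributions joined by a normal Copula (that is, bivariate normal yields), and the function name, means, volatilities and correlation are hypothetical.

```python
import math
import random

def simulate_portfolio_var(weights, mus, sigmas, rho, level, n_sims=5000, seed=0):
    """Empirical VaR of a two-asset portfolio from simulated correlated normal yields."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the second shock with the first (Cholesky factor of a 2x2 matrix).
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        r = weights[0] * (mus[0] + sigmas[0] * z1) + weights[1] * (mus[1] + sigmas[1] * z2)
        losses.append(-r)  # a loss is a negative portfolio yield
    losses.sort()
    # VaR at the given level is the empirical quantile of the simulated losses.
    idx = min(n_sims - 1, int(math.ceil(level * n_sims)) - 1)
    return losses[idx]

# Hypothetical inputs: equal weights, zero means, 2% volatility, correlation 0.5.
var95 = simulate_portfolio_var((0.5, 0.5), (0.0, 0.0), (0.02, 0.02), 0.5, 0.95)
var99 = simulate_portfolio_var((0.5, 0.5), (0.0, 0.0), (0.02, 0.02), 0.5, 0.99)
print(var95, var99)
```

With other marginal distributions or Copula functions, only the sampling step inside the loop would change; the quantile step stays the same.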

A recurrent neural network (RNN) is used for prediction; the RNN is also referred to as a deep learning Markov model, because its properties conform to those of a generalised Markov model [22, 23]. The basic structure of the RNN model is shown in Figure 3.

The neurons in the hidden layer in the RNN are affected by the neurons in the upper layer. When

After the parameter gradient of weight matrix in RNN is determined, the stochastic gradient descent algorithm is adopted to train RNN. Then the RNN prediction results are evaluated by calculating the error rate and the average error rate. The calculation equations of error rate

The bank (A), insurance (B), securities (C) and trust (D) in the SWS index (

Descriptive statistics show that the kurtosis values of the logarithmic yields of the four industry index series are 9.668, 6.268, 6.842 and 6.557, respectively, all of which exceed the kurtosis of 3 of the normal distribution. The series are therefore sharply peaked and do not conform to the normal distribution. The skewness values of the index logarithmic yields of the four industries are 0.031, 0.082, 0.043 and −0.741, respectively, which indicates that sequences A, B and C are right-skewed, whereas D is left-skewed. The

The GARCH model is adopted to model the index logarithmic yield series of bank (A), insurance (B), securities (C) and trust (D), and the fitted model serves as the marginal distribution function. After multiple verifications, GARCH(1,1) is found to be the most effective specification for all the yield rate sequences. Therefore, normal distribution and

Table 1. Descriptive statistical results of the yield sequence.

A | 0.026 | 1.578 | −1.826 | 1.552 | 0.031 | 9.668 | 3,511.921 | 0.000 |
B | 0.041 | 2.065 | −2.052 | 1.938 | 0.082 | 6.268 | 897.682 | 0.000 |
C | 0.009 | 2.559 | −2.597 | 2.356 | 0.043 | 6.842 | 1,123.972 | 0.000 |
D | 0.007 | 2.113 | −2.273 | 2.054 | −0.741 | 6.557 | 1,250.747 | 0.000 |
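Skewness and kurtosis of the kind discussed above can be computed directly from the moments of a series. The Python sketch below also computes the Jarque–Bera statistic, a common normality test built from these two quantities; whether the paper used exactly this test is an assumption, and the function name and sample data are hypothetical.

```python
def moment_statistics(xs):
    """Return (skewness, kurtosis, Jarque-Bera statistic) using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    # Jarque-Bera: large values indicate departure from normality.
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

# Small symmetric sample, used only to show the arithmetic.
skew, kurt, jb = moment_statistics([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, kurt, jb)
```

A kurtosis well above 3, as reported for all four industry series, inflates the second term of the statistic and leads to rejection of normality.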

Table 2. Estimation of marginal distribution parameters of the index logarithmic yield sequences based on the GARCH model.

A | Normal | 0.002 | 7.226 × e^{−6} | 0.052 | 0.960 | 5.311 | −5.442 |
T | 0.002 | 8.039 × e^{−6} | 0.043 | 0.970 | −5.559 | ||
B | Normal | 0.002 | 1.322 × e^{−6} | 0.062 | 0.950 | 5.825 | −5.297 |
T | 0.002 | 1.264 × e^{−6} | 0.037 | 0.952 | −5.384 | ||
C | Normal | 0.001 | 2.339 × e^{−6} | 0.038 | 0.970 | 5.797 | −5.338 |
T | 0.001 | 2.407 × e^{−6} | 0.051 | 0.963 | −5.458 | ||
D | Normal | 0.002 | 3.389 × e^{−6} | 0.036 | 0.953 | 5.265 | −5.165 |
T | 0.002 | 3.667 × e^{−6} | 0.055 | 0.954 | −5.364 |

AIC, Akaike information criterion; GARCH, generalised autoregressive conditional heteroskedasticity.

In order to eliminate autocorrelation, autoregressive moving average (ARMA) model [26] is used for data processing in this study. Compared with the GARCH model, the generalised autoregressive score (GAS) model can make full use of the density function, so it is adopted to overcome the heteroscedasticity in the data [27]. Assuming the yield is ^{2} satisfies GAS. The density function of _{t} is the conditional score of the density function; _{t}_{t}

The GAS model is adopted to determine the marginal distribution of each industry sequence, and the resulting parameter estimates of the GAS model are shown in Table 3. As shown in Table 3 and Figure 5, sequences A, B and C all obey the Laplace distribution, which confirms that the sequences of these three industries have a certain skewness, consistent with the analysis results of the GARCH model. The distribution of industry D differs from that of the other industries, and it follows the skewed t distribution.

Table 3. Estimation of marginal distribution parameters of the index logarithmic yield sequences based on the GAS model.

A | Laplace distribution | 0.065 | 0.989 | 3.677 × 10^{3} |
B | Laplace distribution | 0.043 | 0.995 | 4.223 × 10^{3} |
C | Laplace distribution | 0.038 | 0.996 | 4.019 × 10^{3} |
D | Skewed t distribution | 0.039 | 0.997 | 3.567 × 10^{3} |

GAS, generalised autoregressive score.

Furthermore, the histogram results of the probability integral transform (PIT) values of the edge distribution in the 90% confidence interval of these four industry sequences are calculated. As shown in Figure 6, in the transformation process, the

The uniformly distributed variables obtained from the above PIT are substituted into the expressions of the three single Copula functions (Gumbel, Clayton and Frank) and the mixed Copula function, i.e., Eqs 3–6. The parameters of these four functions are estimated separately, and the iteration results of the parameters are shown in Figure 7. The expectation-maximisation (EM) algorithm is adopted for parameter iteration, and the parameters of the different functions gradually become stable after 10 iterations.
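The weight-update part of an EM iteration for a mixture model can be sketched as below. This is a simplified illustration under stated assumptions: the component densities are treated as fixed and precomputed at each observation, so only the mixture weights are updated, whereas a full EM for the mixed Copula function would also update the Copula parameters. The function name and inputs are hypothetical.

```python
def em_mixture_weights(component_densities, n_iter=50):
    """Estimate mixture weights by EM with fixed component densities.

    component_densities[k][t] is the density of component k at observation t.
    """
    k_count = len(component_densities)
    t_count = len(component_densities[0])
    weights = [1.0 / k_count] * k_count
    for _ in range(n_iter):
        totals = [0.0] * k_count
        for t in range(t_count):
            mix = sum(weights[k] * component_densities[k][t] for k in range(k_count))
            for k in range(k_count):
                # E-step: posterior probability that observation t came from component k.
                totals[k] += weights[k] * component_densities[k][t] / mix
        # M-step: the new weight is the average posterior probability.
        weights = [total / t_count for total in totals]
    return weights

# Hypothetical densities: component 0 explains every observation twice as well.
weights = em_mixture_weights([[2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]])
print(weights)
```

In this toy input the first weight converges towards 1, which mirrors how the iteration shifts weight towards the Copula component that explains the data best.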

Therefore, the parameter estimates of the three single Copula functions are taken as the initial values of the parameters of the mixed Copula function, and the resulting parameter estimates of the single and mixed Copula functions are shown in Table 4. Model fit is judged by the principle of maximum log-likelihood and minimum Akaike information criterion (AIC) and Bayesian information criterion (BIC) [28]. Among the single and mixed Copula functions, the mixed Copula function has the largest log-likelihood and the smallest AIC and BIC values, which indicates that the mixed Copula model constructed in this study fits better than the single Copula models.
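The selection principle can be made concrete with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters and n the sample size. The Python sketch below uses these textbook formulas; the candidate names, log-likelihoods, parameter counts and sample size are hypothetical and are not the values of Table 4.

```python
import math

def aic(log_likelihood, n_params):
    # Akaike information criterion: penalises each parameter by 2.
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    # Bayesian information criterion: penalty grows with the sample size.
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def best_model(candidates, n_obs):
    """Return the name of the candidate with the smallest BIC.

    candidates maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: bic(candidates[name][0], candidates[name][1], n_obs))

# Hypothetical candidates: a one-parameter model and a richer five-parameter model.
candidates = {"single": (100.0, 1), "mixed": (120.0, 5)}
print(aic(100.0, 3), bic(100.0, 3, 50), best_model(candidates, n_obs=500))
```

A richer model is preferred only when its gain in log-likelihood outweighs the penalty for its extra parameters.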

Table 4. Parameter estimates for single and mixed Copula functions.

Gumbel Copula | 1.711 | — | 1720.59 | −3432.65 | −3433.29 |
Clayton Copula | 1.121 | — | 1593.28 | −3087.24 | −3177.52 |
Frank Copula | 4.552 | — | 1633.74 | −3006.88 | −3014.87 |
Mixed Copula | 1.538 | 0.439 | 1957.33 | −3926.01 | −3800.65 |
HMM-mixed Copula | — | — | 2059.63 | −4088.39 | −4010.98 |

AIC, Akaike information criterion; BIC, Bayesian information criterion; HMM, hidden Markov model.

As shown in Figure 8, the relevant parameters in the HMM-based mixed Copula model gradually become stable after 50 iterations. The final parameters of the HMM-based mixed Copula model are $\theta_1=1.208$, $\theta_2=0.923$, $\theta_3=8.926$, $w_1=0.505$, $w_2=0.482$, $w_3=0.025$ in state 1 and $\theta_1=2.153$, $\theta_2=2.101$, $\theta_3=18.220$, $w_1=0.553$, $w_2=0.339$, $w_3=0.087$ in state 2, each with its own initial distribution.

As shown in Table 4, the log-likelihood of the HMM-based mixed Copula model is 2059.63, with AIC = −4088.39 and BIC = −4010.98. State 1 is a low-dependence state, while state 2 is a high-dependence state. Analysis of the state probabilities of the HMM-based Copula model shows that state 2 has the higher probability (97.55%), which indicates that the four industries are in a highly dependent state. Among all the models in Table 4, the HMM-based mixed Copula model proposed in this study has the largest log-likelihood and the lowest AIC and BIC values, which indicates that nesting the mixed Copula function into the HMM gives the best fit. This is consistent with the research results of Chun et al. (2015) [29].

Subsequently, the dynamic transition diagram of states 1 and 2 for the sample from 2015 to 2019 is obtained, as shown in Figure 9A. The two dependence states alternate at a certain frequency, yet the switching is not regular and mainly depends on the degree of interdependence between industries. Studies show that an increase in interdependence leads to risk contagion, and the probability of risk contagion is very high during such periods. As shown in the low-dependence probability graph (Figure 9B) and the high-dependence probability graph (Figure 9C), the probability graphs are basically consistent with the trend of the dynamic state transition graph. Therefore, when the industries are in state 2 (the high-dependence state), there is a very strong dependence relationship between them, which may cause macro or systemic risks from the perspective of dependency and risk contagion [30].

The interdependence of the industries under the two states is calculated, and the results are shown in Table 5. In both states, the tail dependence between A (banking) and B (insurance) is the strongest, which indicates that these two industries are the most sensitive to risk shocks and the most susceptible to the impact of extreme events.

Table 5. Tail dependence of various industries in different states.

A | 1 | — | 0.431 | 0.112 | |
2 | — | 0.210 | 0.153 | ||
B | 1 | — | 0.311 | 0.241 | |
2 | — | 0.132 | 0.112 | ||
C | 1 | 0.257 | 0.210 | — | 0.413 |
2 | 0.162 | 0.132 | — | 0.215 | |
D | 1 | 0.206 | 0.232 | 0.241 | — |
2 | 0.103 | 0.134 | 0.213 | — |
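Tail dependence coefficients of a mixed Copula function follow from those of its components. The Python sketch below uses the standard results for the Gumbel parameterisation with $\theta_1\in(0,1]$ (upper tail coefficient $2-2^{\theta_1}$), the Clayton Copula (lower tail coefficient $2^{-1/\theta_2}$) and the Frank Copula (no tail dependence), weighted by the mixture weights. It is not known whether Table 5 was computed in exactly this way; the function names and the parameter values are hypothetical.

```python
def gumbel_upper_tail(theta):
    # Gumbel copula with theta in (0, 1]: theta = 1 means independence.
    return 2.0 - 2.0 ** theta

def clayton_lower_tail(theta):
    # Clayton copula with theta > 0.
    return 2.0 ** (-1.0 / theta)

def mixed_tail_dependence(thetas, weights):
    """Return (lower, upper) tail dependence of a Gumbel/Clayton/Frank mixture.

    Only the Gumbel part contributes to the upper tail and only the Clayton
    part to the lower tail; the Frank copula has no tail dependence.
    """
    upper = weights[0] * gumbel_upper_tail(thetas[0])
    lower = weights[1] * clayton_lower_tail(thetas[1])
    return lower, upper

# Hypothetical parameters, not the estimates reported in the paper.
lower, upper = mixed_tail_dependence(thetas=(0.5, 1.0, 4.0), weights=(0.5, 0.3, 0.2))
print(lower, upper)
```

The linearity in the weights means that a state with more weight on the Gumbel or Clayton component, or with stronger component parameters, shows stronger tail dependence.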

Assume that the investor invests with weights [0.5, 0.5] and that the investment amount is 1 RMB. Monte Carlo simulation with 5,000 runs is adopted to measure the portfolio VaR at the 95%, 97.5% and 99% confidence levels. The results show that the investment failure rate is 4.53% at the 95% confidence level, 2.17% at the 97.5% confidence level and 1.08% at the 99% confidence level. These failure rates are close to the rates of 5%, 2.5% and 1% expected at the respective confidence levels, which indicates that the model constructed in this study can be used in the measurement of portfolio VaR and can provide good theoretical guidance for investors.
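A failure rate of this kind is usually obtained by counting how often the realised loss exceeds the VaR forecast. The short Python sketch below assumes that definition; the function name and the yield series are hypothetical.

```python
def failure_rate(returns, var):
    """Share of periods in which the realised loss exceeds the VaR forecast.

    returns : realised portfolio yields
    var     : VaR expressed as a positive loss threshold
    """
    exceedances = sum(1 for r in returns if -r > var)
    return exceedances / len(returns)

# Hypothetical yields: two of the five losses exceed a VaR of 0.02.
rate = failure_rate([0.01, -0.03, 0.00, -0.01, -0.05], var=0.02)
print(rate)
```

For a well-calibrated model, the failure rate should be close to one minus the confidence level.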

The Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) and China Ping An Insurance (Sh601318) are taken as the research objects to explore the effect of deep learning Markov model in stock price forecasting. The number of neurons in the hidden layer (

Subsequently, the model is applied to the stock price forecast of Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) [31] and China Ping An Insurance (sh601318), as shown in Figure 11. The trend of each stock price predicted by the deep learning Markov model is basically consistent with the actual trend. Moreover, the errors of the deep learning Markov model in predicting the stock prices of Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) and China Ping An Insurance (sh601318) are 2.56%, 2.98% [32] and 3.56%, respectively. This indicates that the deep learning Markov model proposed in this study can be used to predict stock prices with high accuracy, which is consistent with the research results of Tingwei et al. (2018) [33].
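Prediction errors expressed as percentages are commonly computed as relative errors averaged over the forecast period. The Python sketch below assumes that definition, which may differ from the exact formula used in the paper; the function names and price values are hypothetical.

```python
def error_rates(actual, predicted):
    # Relative error of each prediction with respect to the actual price.
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]

def average_error_rate(actual, predicted):
    # Mean of the relative errors over all predictions.
    rates = error_rates(actual, predicted)
    return sum(rates) / len(rates)

# Hypothetical prices used only to show the arithmetic.
rates = error_rates([10.0, 20.0], [11.0, 19.0])
avg = average_error_rate([10.0, 20.0], [11.0, 19.0])
print(rates, avg)
```

Multiplying the average by 100 gives a percentage comparable in form to the figures quoted above.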

HMM is combined with the mixed Copula model to analyse the dependence and risk contagion among the financial sub-industries. The dependence between the financial sub-industries is found to be strong, and risk contagion is higher in the high-dependence state. The deep learning Markov model achieves high accuracy in stock price prediction. However, the deep learning Markov model is used here only to predict stock prices, and further study should address the prediction of investment risks. In conclusion, the results of this study provide a theoretical basis for risk assessment among financial sub-industries and for accurate financial decision-making.

