Adoption of deep learning Markov model combined with copula function in portfolio risk measurement

With the deepening of global economic integration, the financial markets of various countries and regions have become more closely related, and their correlation has become more obvious [1]. However, the economic globalisation has made the financial crisis more widespread, and financial crises in the local regions or certain countries often trigger global financial crises, such as the subprime crisis in the United States [2, 3]. Therefore, understanding and seizing the correlation between financial markets is of great significance for effectively avoiding the spread of financial crisis. In traditional correlation studies of financial variables, methods such as Pearson correlation coefficient, Spearman correlation and Granger causality test are often used, which are with great limitations [4,5,6]. Therefore, the Copula function has gradually become one of the main methods to explore the correlation between financial variables. Yet the situation is complex and changeable in the actual financial market, and there are close correlations among various enterprises or fields. The single Copula function adopted to explore correlations in financial markets can lead to large errors [7, 8]. Moreover, existing scholars can accurately describe the tail correlation between financial variables by capturing the linear combination of Copula functions with tail characteristics to form a mixed Copula function [9, 10]. Since there are many hidden states behind the yield rate in financial variables, special methods are required to speculate the hidden state. Hidden Markov model (HMM) is first applied in speech recognition and other fields, and then applied in finance and other fields [11, 12]. However, there are few studies on how to measure the portfolio risk by combining the Copula function with HMM.

To fill in the gap, the basic concept of the hybrid Copula model is analysed first, the hybrid Copula model is then nested in the HMM framework and applied to portfolio risk dependency and metric analysis, and the deep Markov model is adopted to predict the stock risk. To sum up, the results of this study aim to provide a theoretical basis for exploring the risk-dependent structure among financial variables in China so as to conduct accurate risk assessment.

Methodologies

2.1

Related concepts of copula functions

The model based on Copula function can be adopted to study the correlation between variables and their characteristics. The binary Copula function is defined as follows.

Definition 1

Assuming that C is a binary Copula containing two variables, and it has the following characteristics: (a), C's domain I is [0,1]; (b) there are arbitrary u and v in the domain I of C and C (u,1) = u,C (1,v) = v (c) the zero-base surface C has the property of 2D increasing.

Suppose that there are two unary distribution functions F (x₁) = u and F (x₂) = v and the two functions are continuous and uniformly distributed [0,1]. Then C (u,v) is a binary distribution function of Copula, which satisfies the condition of 0 ≤ C (u,v) ≤ 1 in the interval (u, v).

The Archimedes Copula function contains multiple Copula functions, which can be expressed by their generative meta-functions [13].

Definition 2

Suppose there is function Ω, which can be defined as follows. (1) $Ω = {φ () : [0, \infty) \to [0, 1] | φ (0) = 1, φ (\infty) = 0, {(- 1)}^{j} φ^{(j)} \geq 0, j = 1, 2 \dots d - 2}$ \Omega = \left\{ {\varphi \left( {} \right):\left[ {0,\infty } \right) \to \left[ {0,1} \right]\left| {\varphi \left( 0 \right) = 1,\varphi \left( \infty \right) = 0,{{\left( { - 1} \right)}^j}{\varphi ^{\left( j \right)}} \ge 0,j = 1,2 \cdots d - 2} \right.} \right\} Then Archimedes Copula function C, [0,1]^d → [0,1] can be defined as follows. (2) $C (u_{1}, u_{2}, \dots, u_{d}) = φ^{- 1} [φ (u_{1}) + \dots + φ (u_{d})]$ C\left( {{u_1},{u_2}, \cdots ,{u_d}} \right) = {\varphi ^{ - 1}}\left[ {\varphi \left( {{u_1}} \right) + \cdots + \varphi \left( {{u_d}} \right)} \right] In Eq. (2), φ () ∈ Ω is a generator in the Archimedes Copula function, whose inverse function φ⁻¹ () satisfies complete monotonicity ${(- 1)}^{k} \frac{\partial^{k}}{\partial u^{k}} φ^{- 1} (u) \geq 0$ {\left( { - 1} \right)^k}{{{\partial ^k}} \over {\partial {u^k}}}{\varphi ^{ - 1}}\left( u \right) \ge 0 .

The common Archimedes Copula functions include the Gumbel Copula function, the Clayton Copula function and the Frank Copula function [14, 15]. The mathematical expressions of these three functions are as follows, respectively. (3) $C (u_{1}, u_{2}, \dots, u_{d}; θ_{1}) = exp {- {[\sum_{k = 1}^{d} {(- ln u_{k})}^{\frac{1}{θ_{1}}}]}^{θ_{1}}}$ C\left( {{u_1},{u_2}, \cdots ,{u_d};{\theta _1}} \right) = \exp \left\{ { - {{\left[ {\sum\limits_{k = 1}^d {{\left( { - \ln {u_k}} \right)}^{{1 \over {{\theta _1}}}}}} \right]}^{{\theta _1}}}} \right\} (4) $C (u_{1}, u_{2}, \dots, u_{d}; θ_{2}) = {(\sum_{k = 1}^{d} u_{_{k}}^{- θ_{2}} - d + 1)}^{- \frac{1}{θ_{2}}}$ C\left( {{u_1},{u_2}, \cdots ,{u_d};{\theta _2}} \right) = {\left( {\sum\limits_{k = 1}^d u_{_k}^{ - {\theta _2}} - d + 1} \right)^{ - {1 \over {{\theta _2}}}}} (5) $C (u_{1}, u_{2}, \dots, u_{d}; θ_{3}) = \frac{1}{θ_{3}} ln (1 + \frac{\prod_{k = 1}^{d} (e^{- θ_{3} u_{k}} - 1)}{{(e^{- θ_{3}} - 1)}^{d - 1}})$ C\left( {{u_1},{u_2}, \cdots ,{u_d};{\theta _3}} \right) = {1 \over {{\theta _3}}}\ln \left( {1 + {{\prod\limits_{k = 1}^d \left( {{e^{ - {\theta _3}{u_k}}} - 1} \right)} \over {{{\left( {{e^{ - {\theta _3}}} - 1} \right)}^{d - 1}}}}} \right) In Eq. (3), ${(- ln u)}^{\frac{1}{θ_{1}}}$ {\left( { - \ln u} \right)^{{1 \over {{\theta _1}}}}} is the generator of Gumbel Copula function, θ₁ ∈ (0,1], and the closer it approaches to 0, the closer it approaches to complete correlation; in Eq. (4), u^−θ₂ − 1 is the generator of Clayton Copula function, θ₂ ∈ (0,∞), and the closer it approaches to ∞, the closer it approaches to complete correlation; $ln \frac{e^{- θ_{3} u} - 1}{e^{- θ_{3}} - 1}$ \ln {{{e^{ - {\theta _3}u}} - 1} \over {{e^{ - {\theta _3}}} - 1}} is the generator of the Frank Copula function, θ₃ ∈ (0,∞), and the closer it approaches to ∞, the closer it approaches to complete correlation.

Then the Gumbel Copula, Clayton Copula and Frank Copula functions are combined to build a mixed Copula function, and the weight values of these three functions are w₁,w₂,w₃. The mixed Copula function can be expressed as follows. (6) $C_{m} (u_{1}, u_{2}, \dots, u_{d}; ψ) = \sum_{k = 1}^{3} w_{k} C_{k} (u_{1}, u_{2}, \dots, u_{d}; θ_{k})$ {C_m}\left( {{u_1},{u_2}, \cdots ,{u_d};\psi } \right) = \sum\limits_{k = 1}^3 {w_k}{C_k}\left( {{u_1},{u_2}, \cdots ,{u_d};{\theta _k}} \right) In the above equation, 0 ≤ w_k ≤ 1, and are all uniform variables in [0,1]; ψ = {θ₁,θ₂,θ₃,w₁,w₂,w₃} is the parameter set.

Then the density function of the mixed Copula function is as follows. (7) $C_{m} (u_{1}, u_{2}, \dots, u_{d}; ψ) = \frac{\partial^{d} C_{m} (u_{1}, u_{2}, \dots, u_{d}; ψ)}{\partial u_{1} \partial u_{2} \dots \partial u_{d}} = \sum_{k = 1}^{3} w_{k} C_{k} (u_{1}, u_{2}, \dots, u_{d}; θ_{k})$ {C_m}\left( {{u_1},{u_2}, \cdots ,{u_d};\psi } \right) = {{{\partial ^d}{C_m}\left( {{u_1},{u_2}, \cdots ,{u_d};\psi } \right)} \over {\partial {u_1}\partial {u_2} \cdots \partial {u_d}}} = \sum\limits_{k = 1}^3 {w_k}{C_k}\left( {{u_1},{u_2}, \cdots ,{u_d};{\theta _k}} \right)

2.2

Construction of dynamic hybrid Copula Model based on HMM

The state in HMM is hidden, which can be obtained by the observation sequence and transition probability matrix between states. The four parameters in HMM can be described as the number of states, distribution of initial probability, transition probability matrix and density function of observation sequence [16]. It is assumed that there are HM series {Xh} with m states, and d dimensional continuous observation series {Yh} has the conditional probability density function P_i (y_h) = P(Y_h = y_h |X_h = i) when the state is i. Therefore, the unconditional probability vector of the initial probability distribution is as follows. (8) $π_{1 * m} = [P (X_{1} = 1), P (X_{1} = 2), \dots, P (X_{1} = m)]$ {\pi _{1*m}} = \left[ {P\left( {{X_1} = 1} \right),P\left( {{X_1} = 2} \right), \cdots ,P\left( {{X_1} = m} \right)} \right] Assuming that the observed sequence is {Yh} and the state sequence is {Xh}, the hidden Markov chain needs to meet the following conditions.

P(X_h = x_h |X₁ = x₁,⋯ ,X_h−1 = x_h−1) = P(X_h = x_h |X_h−1 = x_h−1),

P(Y_h = y_h |X₁ = x₁,⋯ ,X_h = x_h,Y₁ = y₁,⋯ ,Y_h−1 = y_h−1) = P(Y_h = y_h |Y_h = y_h).

According to the above two properties, the theory of conditional probability is combined with the theory of probability in mathematical statistics to carry out the derivation of $P (Y_{1}, \dots, Y_{h}, X_{1}, \dots, X_{h}) : P (Y_{1}, \dots, Y_{h}, X_{1}, \dots, X_{h}) = \prod_{h = 1}^{H} p_{x_{h}} (y_{h}) \prod_{h = 1}^{H - 1} a_{{xx}_{h + 1}} π_{x_{1}} .$ P\left( {{Y_1}, \cdots ,{Y_h},{X_1}, \cdots ,{X_h}} \right):\;P\left( {{Y_1}, \cdots ,{Y_h},{X_1}, \cdots ,{X_h}} \right) = \prod\limits_{h = 1}^H {p_{{x_h}}}\left( {{y_h}} \right)\prod\limits_{h = 1}^{H - 1} {a_{x{x_{h + 1}}}}{\pi _{{x_1}}}. Based on the HMM model, the Copula model is nested in the framework to obtain the dynamic hybrid Copula model. The construction process is as follows.

Assuming that the matrix of order H × D is Y, the matrix formed by the d dimensional observation sequence at time h is {Yh}; the matrix of order H × m order is X, the matrix formed by m state sequences at time h is {Xh}; the matrix of order H × n order is K, the matrix formed by the n-dimensional mixed Copula function branch sequence at the time h is {Rh}.

According to Sklar theorem [17], the mathematical expression of the joint density function f_i of d dimension in the i state at time h is as follows. (9) $f_{i} (y_{h 1}, \dots, y_{hd}) = c_{i} (F_{1} (y_{h 1}; η_{1}) \dots, F_{d} (y_{hd}; η_{d}); ψ) \prod_{l = 1}^{d} f_{l} (y_{hl}; η_{l})$ {f_i}\left( {{y_{h1}}, \cdots ,{y_{hd}}} \right) = {c_i}\left( {{F_1}\left( {{y_{h1}};{\eta _1}} \right) \cdots ,{F_d}\left( {{y_{hd}};{\eta _d}} \right);\psi } \right)\prod\limits_{l = 1}^d {f_l}\left( {{y_{hl}};{\eta _l}} \right) In Eq. (9), c_i () is the mixed Copula density function of i state; η is the parameter vector of edge distribution function and ψ = {θ₁,θ₂,θ₃,w₁,w₂,w₃} is the parameter set.

If it is the parameter set, ζ = {π_i,a_ij,w_in,θ_in,η_l} is the dependent parameter of the nth function of i state, then the mathematical expression of the HMM likelihood function of d dimension is as follows. (10) $L (ζ, Y, X, R) = P (Y, X, R | ζ) = π_{x_{1}} \prod_{h = 1}^{H - 1} a_{x_{h} x_{h + 1}} \prod_{h = 1}^{H} (\prod_{l = 1}^{d} f (y_{hl}; η_{l}) w_{x_{h} K_{h}} c_{x_{h} K_{h}} (u_{h}))$ L\left( {\zeta ,Y,X,R} \right) = P\left( {Y,X,R\left| \zeta \right.} \right) = {\pi _{{x_1}}}\prod\limits_{h = 1}^{H - 1} {a_{{x_h}{x_{h + 1}}}}\prod\limits_{h = 1}^H \left( {\prod\limits_{l = 1}^d f\left( {{y_{hl}};{\eta _l}} \right){w_{{x_h}{K_h}}}{c_{{x_h}{K_h}}}\left( {{u_h}} \right)} \right) Observation sequence and parameter value vector ζ are known, and then the mathematical expressions for the conditional expectation Q of the completely likelihood function that is at K step of iteration is as follows. (11) $\begin{array}{l} Q (ζ, ζ_{(r)}) & = \sum_{{X, R}} P (Y, X, R | Y; ζ_{(r)}) log P (Y, X, R | ζ) = \sum_{i = 1}^{m} log π_{i} γ_{i}^{(k)} (1) + \sum_{i = 1}^{m} \sum_{j = 1}^{m} \sum_{h = 1}^{H - 1} log a_{ij} ε_{ij}^{(k)} (h) \\ + \sum_{i = 1}^{m} \sum_{h = 1}^{H} γ_{i}^{(k)} (h) log (\prod_{l = 1}^{d} f_{l} (y_{hl}; η_{l})) \\ + \sum_{i = 1}^{m} \sum_{h = 1}^{H} \sum_{v = 1}^{3} γ_{i}^{(k)} (t) (log c_{iv} (u_{h}, θ_{iv} | ψ^{(k)}) + log (w_{iv} | ψ^{(k)})) \frac{w_{iv}}{\sum_{v = 1}^{3} w_{iv} c_{iv}} \end{array}$ \matrix{ {Q\left( {\zeta ,{\zeta _{\left( r \right)}}} \right)} \hfill & { = \sum\limits_{\left\{ {X,R} \right\}} P\left( {Y,X,R\left| {Y;{\zeta _{\left( r \right)}}} \right.} \right)\log P\left( {Y,X,R\left| \zeta \right.} \right) = \sum\limits_{i = 1}^m \log {\pi _i}\gamma _i^{\left( k \right)}\left( 1 \right) + \sum\limits_{i = 1}^m \sum\limits_{j = 1}^m \sum\limits_{h = 1}^{H - 1} \log {a_{ij}}\varepsilon _{ij}^{\left( k \right)}\left( h \right)} \hfill \cr {} \hfill & { + \sum\limits_{i = 1}^m \sum\limits_{h = 1}^H \gamma _i^{\left( k \right)}\left( h \right)\log \left( {\prod\limits_{l = 1}^d {f_l}\left( {{y_{hl}};{\eta _l}} \right)} \right)} \hfill \cr {} \hfill & { + \sum\limits_{i = 1}^m \sum\limits_{h = 1}^H \sum\limits_{v = 1}^3 \gamma _i^{\left( k \right)}\left( t \right)\left( {\log {c_{iv}}\left( {{u_h},{\theta _{iv}}\left| {{\psi ^{\left( k \right)}}} \right.} \right) + \log \left( {{w_{iv}}\left| {{\psi ^{\left( k \right)}}} \right.} \right)} \right){{{w_{iv}}} \over {\sum\limits_{v = 1}^3 {w_{iv}}{c_{iv}}}}} \hfill \cr } In the above equation, the conditional expectation Q function consists of five parts. Therefore, it needs to iterate to step K to maximise the Q function, which needs K steps to maximise each part. If the weights of the initial distribution, transition matrix and Copula functions are conditional, the Lagrange multiplication can be adopted to maximise the parameters and then the maximum likelihood function is adopted to estimate the edge distribution. In Eq. (11), the dependent parameters are multivariate nonlinear mathematical expressions with boundary conditions. Taking the optimisation of Autoencoder model in deep learning as an example [18], the L-BFGS can solve similar problems with gradient descent function, SGD and other functions, as shown in Figure 1. But in most cases, the L-BFGS function converges faster and has less memory overhead than other algorithms [19].

Model optimisation based on different functions.

The L-BFGS-B algorithm is adopted to obtain the estimated value after k+1 iteration, and the mathematical expression of the final estimated value is as follows. (12) $π_{i} = γ_{i}^{(k)} (1)$ {\pi _i} = \gamma _i^{\left( k \right)}\left( 1 \right) (13) $a_{ij} = \frac{\sum_{h = 1}^{H - 1} ε_{ij}^{(k)} (h)}{\sum_{j = 1}^{m} \sum_{h = 1}^{H - 1} ε_{ij}^{(k)} (h)}$ {a_{ij}} = {{\sum\limits_{h = 1}^{H - 1} \varepsilon _{ij}^{\left( k \right)}\left( h \right)} \over {\sum\limits_{j = 1}^m \sum\limits_{h = 1}^{H - 1} \varepsilon _{ij}^{\left( k \right)}\left( h \right)}} (14) $w_{iv} = \frac{\sum_{h = 1}^{H} γ_{i}^{(k)} (t) \frac{w_{iv}^{(k)} c_{iv} (u_{h}, θ_{iv}^{(k)})}{\sum_{v = 1}^{3} w_{iv}^{(k)} c_{iv} (u_{h}, θ_{iv}^{(k)})}}{\sum_{h = 1}^{H} γ_{i}^{(k)} (t)}$ {w_{iv}} = {{\sum\limits_{h = 1}^H \gamma _i^{\left( k \right)}\left( t \right){{w_{iv}^{\left( k \right)}{c_{iv}}\left( {{u_h},\theta _{iv}^{\left( k \right)}} \right)} \over {\sum\limits_{v = 1}^3 w_{iv}^{\left( k \right)}{c_{iv}}\left( {{u_h},\theta _{iv}^{\left( k \right)}} \right)}}} \over {\sum\limits_{h = 1}^H \gamma _i^{\left( k \right)}\left( t \right)}}

2.3

Construction of the portfolio model based on Copula function and the risk measurement

To construct the portfolio model with Copula function, it is necessary to determine the edge distribution of Copula and determine the edge distribution function. The appropriate Copula function is selected to describe the dependence structure relation of multi-variable edge distribution. The generalised autoregressive conditional heteroskedast (GARCH) model is adopted to generalise the marginal distribution of variables [20]. Copula function is adopted to combine edge distribution functions of different variables, and the resulting joint distribution function is the portfolio function of financial assets. Assuming that the financial time series are {R1t} and {R2t}, the mathematical expression of GARCH (1,1) model of the financial time series is as follows. (15) ${\begin{array}{l} r_{it} = μ_{i} + ε_{it} \\ ε_{it} = η_{it} σ_{it}, η_{it} \overset{i, t, d}{~} F_{i} () \\ σ_{_{it}}^{2} = ω_{i} + α_{i} ε_{_{it - 1}}^{2} + β_{i} σ_{_{it - 1}}^{2} \\ (η_{1 t}, η_{2 t}) | F_{t - 1} \sim C_{t} (F_{1} (η_{1 t}), F_{2} (η_{2 t}) | F_{t - 1}) \end{array}$ \left\{ {\matrix{ {{r_{it}} = {\mu _i} + {\varepsilon _{it}}} \hfill \cr {{\varepsilon _{it}} = {\eta _{it}}{\sigma _{it}},\quad {\eta _{it}}\mathop \sim\limits^{i,t,d} {F_i}\left( {} \right)} \hfill \cr {\sigma _{_{it}}^2 = {\omega _i} + {\alpha _i}\varepsilon _{_{it - 1}}^2 + {\beta _i}\sigma _{_{it - 1}}^2} \hfill \cr {\left( {{\eta _{1t}},{\eta _{2t}}} \right)\left| {{F_{t - 1}} \sim {C_t}\left( {{F_1}\left( {{\eta _{1t}}} \right),{F_2}\left( {{\eta _{2t}}} \right)\left| {{F_{t - 1}}} \right.} \right)} \right.} \hfill \cr } } \right. where r_it is the yield rate, η_it obeys F_i () distribution, α is the coefficient of ARCH and β is the coefficient of GARCH.

It contains binary normal Copula and T-Copula models, whose mathematical expressions are as follows. (16) $C (u, v; p) = \int_{- \infty}^{φ^{- 1} (u)} \int_{- \infty}^{φ^{- 1} (v)} \frac{1}{2 π \sqrt{1 - p^{2}}} exp [- \frac{s^{2} - 2 pst + t^{2}}{2 (1 - p^{2})}] dsdt$ C\left( {u,v;p} \right) = \int_{ - \infty }^{{\varphi ^{ - 1}}\left( u \right)} \int_{ - \infty }^{{\varphi ^{ - 1}}\left( v \right)} {1 \over {2\pi \sqrt {1 - {p^2}} }}\exp \left[ { - {{{s^2} - 2pst + {t^2}} \over {2\left( {1 - {p^2}} \right)}}} \right]dsdt (17) $C (u, v; p, k) = \int_{- \infty}^{t_{k}^{- 1} (u)} \int_{- \infty}^{t_{k}^{- 1} (v)} \frac{1}{2 π \sqrt{1 - p^{2}}} {[1 + \frac{s^{2} - 2 pst + t^{2}}{k (1 - p^{2})}]}^{\frac{- (k + 2)}{2}} dsdt$ C\left( {u,v;p,k} \right) = \int_{ - \infty }^{t_k^{ - 1}\left( u \right)} \int_{ - \infty }^{t_k^{ - 1}\left( v \right)} {1 \over {2\pi \sqrt {1 - {p^2}} }}{\left[ {1 + {{{s^2} - 2pst + {t^2}} \over {k\left( {1 - {p^2}} \right)}}} \right]^{{{ - \left( {k + 2} \right)} \over 2}}}dsdt Then Monte Carlo simulation is adopted to simulate the value at risk (VaR) of the investment portfolio [21]. Assuming that the initial total investment is fixed and the time range is demarcated as 1 day, the total assets of the X and Y portfolios of the two types of investments at time t is V_t = n₁P_1,t + n₂P_2,t, where n is the proportion of these two types of investments in the total portfolio, and P is the value of these two types of investments at time t. And the logarithmic yield on the portfolio is as follows. (18) $R_{t + 1} = log (\frac{n_{1} P_{1, t}}{n_{1} P_{1, t} + n_{2} P_{2, t}} e^{X} + \frac{n_{1} P_{1, t}}{n_{1} P_{1, t} + n_{2} P_{2, t}} e^{Y}) = log (λ_{1} e^{X} + λ_{2} e^{Y})$ {R_{t + 1}} = \log \left( {{{{n_1}{P_{1,t}}} \over {{n_1}{P_{1,t}} + {n_2}{P_{2,t}}}}{e^X} + {{{n_1}{P_{1,t}}} \over {{n_1}{P_{1,t}} + {n_2}{P_{2,t}}}}{e^Y}} \right) = \log \left( {{\lambda _1}{e^X} + {\lambda _2}{e^Y}} \right) λ is the weight of the two types of investment in the portfolio, the sum of which is 1.

VaR satisfies the following equation. (19) $1 - α = P (R \leq - r) = \int_{- \infty}^{- r - log (- λ_{t})} dx \int_{- \infty}^{log (e^{- r} λ_{2} - λ_{1} λ_{2} e^{y})} c_{t} f_{t} (x | F_{t - 1}) g_{t} (x | F_{t - 1}) dy$ 1 - \alpha = P\left( {R \le - r} \right) = \int_{ - \infty }^{ - r - \log \left( { - {\lambda _t}} \right)} dx\int_{ - \infty }^{\log \left( {{e^{ - r}}{\lambda _2} - {\lambda _1}{\lambda _2}{e^y}} \right)} {c_t}{f_t}\left( {x\left| {{F_{t - 1}}} \right.} \right){g_t}\left( {x\left| {{F_{t - 1}}} \right.} \right)dy α is the confidence level.

In practice, the solution of Eq. (19) is more difficult, so Monte Carlo simulation can be considered for solution simulation; the simulation steps are shown in Figure 2.

Solving process of portfolio VaR. VaR, value at risk.

2.4

Portfolio risk prediction based on deep learning Markov model

Recurrent neural network (RNN) is used for prediction; RNN is also known as a deep learning Markov model, as its properties conform to the specific generalised Markov model [22, 23]. The basic structure of RNN model is shown in Figure 3.

Basic structure of RNN model. RNN, recurrent neural network.

The neurons in the hidden layer in the RNN are affected by the neurons in the upper layer. When t = 0, there is no state of the hidden layer neurons at the previous moment. Different from other models, the parameters in the RNN model are shared with each other, which makes the RNN model carry out the same processing at different moments, which reduces the workload of parameter learning for the model [24]. At time t, the gradient of the error function E in the cyclic layer weight matrix W with respect to RNN is as follows. (20) $\frac{\partial E}{\partial W_{t}} = \frac{\partial E}{\partial r_{t}} \frac{\partial r_{t}}{\partial W_{t}} = [\begin{matrix} δ_{1}^{t} h_{1}^{t - 1} & δ_{1}^{t} h_{2}^{t - 1} & \dots & δ_{1}^{t} h_{n}^{t - 1} \\ δ_{2}^{t} h_{1}^{t - 1} & δ_{2}^{t} h_{2}^{t - 1} & \dots & δ_{2}^{t} h_{n}^{t - 1} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ δ_{n}^{t} h_{1}^{t - 1} & δ_{n}^{t} h_{2}^{t - 1} & \dots & δ_{n}^{t} h_{n}^{t - 1} \end{matrix}]$ {{\partial E} \over {\partial {W_t}}} = {{\partial E} \over {\partial {r_t}}}{{\partial {r_t}} \over {\partial {W_t}}} = \left[ {\matrix{ {\delta _1^th_1^{t - 1}} & {\delta _1^th_2^{t - 1}} & \cdots & {\delta _1^th_n^{t - 1}} \cr {\delta _2^th_1^{t - 1}} & {\delta _2^th_2^{t - 1}} & \cdots & {\delta _2^th_n^{t - 1}} \cr \vdots & \vdots & \vdots & \vdots \cr {\delta _n^th_1^{t - 1}} & {\delta _n^th_2^{t - 1}} & \cdots & {\delta _n^th_n^{t - 1}} \cr } } \right] Then the gradient of the error function E in the weight matrix W in the cyclic layer is the sum $\frac{\partial E}{\partial W} = \sum_{i = 1}^{t} \frac{\partial E}{\partial W_{t}}$ {{\partial E} \over {\partial W}} = \sum\limits_{i = 1}^t {{\partial E} \over {\partial {W_t}}} of the gradients at each moment. Similarly, the gradient of the error function E to the weight matrix U can be obtained as $\frac{\partial E}{\partial U} = \sum_{i = 1}^{t} \frac{\partial E}{\partial U_{t}}$ {{\partial E} \over {\partial U}} = \sum\limits_{i = 1}^t {{\partial E} \over {\partial {U_t}}} .

After the parameter gradient of weight matrix in RNN is determined, the stochastic gradient descent algorithm is adopted to train RNN. Then the RNN prediction results are evaluated by calculating the error rate and the average error rate. The calculation equations of error rate e and average error rate at time t are as follows. (21) $e = \frac{| y^{'} - y |}{y} * 100 %$ e = {{\left| {{y^\prime } - y} \right|} \over y}*100\% (22) $\bar{e} = \frac{1}{N} \sum_{t = 1}^{N} \frac{| y^{'} - y |}{y} * 100 %$ \overline e = {1 \over N}\sum\limits_{t = 1}^N {{\left| {{y^\prime } - y} \right|} \over y}*100\% Y is the prediction results, y is the actual results and N is the data amount.

Results and discussion

3.1

Data preprocessing and edge distribution determination

The bank (A), insurance (B), securities (C) and trust (D) in the SWS index (http://www.swsindex.com/) are taken as research objects. The closing prices of solstice from 2 January 2015 to 31 December 2019 are collected as the research object. As all of the data are time series, data preprocessing is required to ensure the stability of data before building the model by these data [25]. In this study, the logarithmic rate of yield r of various industries is selected for empirical analysis. The descriptive statistics of data of various industries are conducted, and the results are shown in Figure 4. Compared with the standard normal distribution, the distribution of exponential logarithmic yields of various industries has a more obvious kurtosis and skewness.

Descriptive statistical results of the exponential logarithmic yield of each industry. (A) banking industry; (B) insurance industry; (C) securities industry; (D) trust industry).

Through descriptive statistics, the kurtosis values of logarithmic yields of these four industries series index are, respectively, 9.668, 6.268, 6.842 and 6.557, which all exceed the peak value 3 of normal distribution. As they have high peak value, they do not conform to the normal distribution, and the yield rates have the characteristics of peak distribution. According to the analysis of skewness value, the skewness value of the exponential logarithmic yield of these four industries is, respectively, 0.031, 0.082, 0.043 and −0.741, which indicates that except for D, all sequences A, B and C are right-biased. The P values are all <0.05 after Jarque Bera statistical test, indicating that they do not conform to the initial hypothesis of normal distribution.

The GARCH model is adopted to construct the exponential logarithmic yield series models of bank (A), insurance (B), securities (C) and trust (D). The final constructed model is the edge distribution function. After multiple verifications, it is found that GARCH (1,1) is the most effective method to verify all the yield rate sequence models. Therefore, normal distribution and T distribution are used in this study to simulate the residuals of yield series. The GARCH (1,1) models of normal distribution and t-distribution are constructed by MATLAB. The parameters of each model are estimated respectively. The final estimation results of edge distribution are shown in Table 2. Akaike information criterion (AIC) value can reflect the optimal benign of statistical model fitting. The estimation results of edge distribution parameters obtained from Table 1 show that the AIC value of normal distribution is greater than that of T-distribution, which indicates the fitting effect of edge distribution of exponential logarithmic yield rates of A, B, C and D industry sequences based on T distribution is obviously better than that of normal distribution. As A, B, C, and D exponential logarithmic yield sequence have the feature of thick tail, it indicates that the T distribution has an obvious advantage in describing the yield sequence with peak and thick tail distribution feature.

Table 1

Descriptive statistical results of the yield sequence.

	Mean value	Maximum	Minimum	Standard deviation	Skewness	Kurtosis	J_B statistic	P value

A	0.026	1.578	−1.826	1.552	0.031	9.668	3,511.921	0.000
B	0.041	2.065	−2.052	1.938	0.082	6.268	897.682	0.000
C	0.009	2.559	−2.597	2.356	0.043	6.842	1,123.972	0.000
D	0.007	2.113	−2.273	2.054	−0.741	6.557	1,250.747	0.000

Table 2

Estimation of edge distribution parameters of sequential exponential logarithmic yield sequence based on GARCH model.

	Distribution	μ	ω	α	β	θ	AIC

A	Normal	0.002	7.226 × e⁻⁶	0.052	0.960	5.311	−5.442
A	T	0.002	8.039 × e⁻⁶	0.043	0.970	5.311	−5.559
B	Normal	0.002	1.322 × e⁻⁶	0.062	0.950	5.825	−5.297
B	T	0.002	1.264 × e⁻⁶	0.037	0.952	5.825	−5.384
C	Normal	0.001	2.339 × e⁻⁶	0.038	0.970	5.797	−5.338
C	T	0.001	2.407 × e⁻⁶	0.051	0.963	5.797	−5.458
D	Normal	0.002	3.389 × e⁻⁶	0.036	0.953	5.265	−5.165
D	T	0.002	3.667 × e⁻⁶	0.055	0.954	5.265	−5.364

AIC, Akaike information criterion; GARCH, generalised autoregressive conditional heteroskedast.

In order to eliminate autocorrelation, autoregressive moving average (ARMA) model [26] is used for data processing in this study. Compared with the GARCH model, the generalised autoregressive score (GAS) model can make full use of the density function, so it is adopted to overcome the heteroscedasticity in the data [27]. Assuming the yield is y, and y = u + σɛ, where u is the mean value equation, which satisfies ARMA model; σ is the conditional standard deviation, and the fluctuation in f = σ² satisfies GAS. The density function of y is p(y| f, F), where F is the information set. Then the mathematical expression of GAS (1,1) model is as follows. (23) ${\begin{array}{l} f_{t + 1} = w + α s_{t} + β f_{t} \\ s_{t} = S_{t} (f_{t}) \nabla_{t} (y_{t}, f_{t}) \\ S_{t} = E_{t - 1} {[\nabla_{t}, \nabla_{_{t}}^{'}]}^{- γ} \\ \nabla_{t} = \frac{\partial ln p (y_{t} | f_{t}, F_{t})}{\partial f_{t}} \end{array}$ \left\{ {\matrix{ {{f_{t + 1}} = w + \alpha {s_t} + \beta {f_t}} \hfill \cr {{s_t} = {S_t}\left( {{f_t}} \right){\nabla _t}\left( {{y_t},{f_t}} \right)} \hfill \cr {{S_t} = {E_{t - 1}}{{\left[ {{\nabla _t},\nabla _{_t}^\prime } \right]}^{ - \gamma }}} \hfill \cr {{\nabla _t} = {{\partial \ln p\left( {{y_t}\left| {{f_t},{F_t}} \right.} \right)} \over {\partial {f_t}}}} \hfill \cr } } \right. In Eq. (23), ∇_t is the conditional score of the density function; s_t is the scale score of the observation sequence; S_t is the scalar; w,α,β are constant coefficients, the recursive direction of f is controlled by α, β < 1 and the persistence of recursion is controlled by it as γ = 0.

The GAS model is adopted to determine the sequence edge distribution of each industry, and the parameter estimation value of the GAS model is finally obtained, as shown in Table 3. As shown in Table 3 and Figure 5A, 5B and 5C all obey the Laplace distribution, which confirms that the sequences of the three industries have certain skewness, and it is the same as the analysis results of the GARCH model. The distribution of D industry is different from that of other industries, and it belongs to partial t-distribution. As shown in Table 3, the α values of these four industry sequences are all positive, which means that these sequences recourse in the fastest ascending algorithm. However, β > 0.95, which is close to 1, indicating that the fluctuation process of these four industry sequences has a long duration.

Distribution characteristics of different industry sequences.

Table 3

Estimation of edge distribution parameters of sequence exponential logarithmic rate of yield based on GAS model.

	Distribution	α	β	w

A	Laplace distribution	0.065	0.989	3.677 × 10³
B	Laplace distribution	0.043	0.995	4.223 × 10³
C	Laplace distribution	0.038	0.996	4.019 × 10³
D	Partial t distribution	0.039	0.997	3.567 × 10³

GAS, generalised autoregressive score.

Furthermore, the histogram results of the probability integral transform (PIT) values of the edge distribution in the 90% confidence interval of these four industry sequences are calculated. As shown in Figure 6, in the transformation process, the P value of A (banking) is manifested as P < 0.05 at the 3rd and 4th moments, while the P value of C (securities industry) is manifested as P < 0.05 at the final histogram. And the final KS test results (P values) of PIT values of all industries are >0.05, which indicates that the edge distributions fitted in this study also obey uniform distribution after PIT.

PIT statistical results. (A) banking industry; (B) insurance industry; (C) securities industry; (D) trust industry. PIT, probability integral transform.

3.2

Empirical study of portfolio risk measurement based on HMM-mixed Copula function

The uniformly distributed variables obtained from the above PIT are calculated in the mathematical expressions of three single Gumbel Copula, Clayton Copula, Frank Copula and mixed Copula functions, which are Eqs 3–6. The parameters of the Gumbel Copula, Clayton Copula, Frank Copula, and mixed Copula functions are estimated, respectively, and the iteration results of parameters in different functions are shown in Figure 7. The expectation-maximisation (EM) algorithm is adopted for parameter iteration, and the parameters of different functions gradually become stable after 10 times of iteration.

Iteration results for different function parameters.

Therefore, the parameter estimates of the three single Copula functions are taken as the initial values of the parameters of the mixed Copula function, and the obtained parameter estimates of the single Copula and mixed Copula functions are shown in Table 4. According to the principle of maximum log-likelihood function and minimum AIC and Bayesian information criterion (BIC) [28], the log-likelihood function of the mixed Copula function is the largest and its AIC and BIC values are the smallest, which indicates the mixed Copula model constructed in this study has better effects compared with the single Copula model.

Table 4

Parameter estimates for single and mixed Copula functions.

Function	θ	w	Log-likelihood function	AIC	BIC

Gumbel Copula	1.711	—	1720.59	−3432.65	−3433.29
Clayton Copula	1.121	—	1593.28	−3087.24	−3177.52
Frank Copula	4.552	—	1633.74	−3006.88	−3014.87
Mixed Copula	1.538	0.439	1957.33	−3926.01	−3800.65
HMM-mixed Copula	—	—	2059.63	−4088.39	−4010.98

AIC, Akaike information criterion; BIC, Bayesian information criterion; HMM, hidden Markov model.

As shown in Figure 8, the relevant parameters in the HMM-based mixed Copula model gradually become stable after 50 iterations, and the parameters of the HMM-based mixed Copula model in the final state 1 are θ₁ = 1.208, θ₂ = 0.923, θ₃ = 8.926, w₁ = 0.505, w₂ = 0.482, w₃ = 0.025 and initial distribution π = 0.026. The parameters of the HMM-based mixed Copula model in the final state 2 are θ₁ = 2.153, θ₂ = 2.101, θ₃ = 18.220, w₁ = 0.553, w₂ = 0.339, w₃ = 0.087 and initial distribution π = 0.977.

Iteration results of parameters of different function. (A) state 1; (B) state 2.

As shown in Table 4, the log-likelihood function of the mixed Copula model based on HMM is 2059.63, AIC = −4088.39, and BIC = −4010.98. State 1 is a low dependent state, while state 2 is a high dependent state. After analysing the probability of Copula model based on HMM in these two states, the probability of state 2 is the highest (97.55%), which indicates that these four industries are in a highly dependent state. As shown in Table 4, the mixed Copula model based on HMM proposed in this study has the largest logarithmic likelihood function and the lowest AIC and BIC values, which indicates that the model effect of nesting mixed Copula functions into HMM is optimal, which is consistent with the research results of Chun et al. (2015) [29].

Subsequently, the dynamic transition diagrams of states 1 and 2 of the sample from 2015 to 2019 are obtained, as shown in Figure 9A. These two dependent states will change at a certain frequency, yet the evaluation of this transformation is not regular and mainly depends on the degree of interdependence between industries. Studies show that the increase in interdependence will lead to the occurrence of risk infection, and the probability of risk infection is very high during this period. As shown in the low dependence probability graph (Figure 9B) and the high dependence probability graph (Figure 9C), the probability graph is basically consistent with the trend of the dynamic transition graph of states. Therefore, when an industry is in state 2 (high dependence state), there is a very strong dependence relationship between industries, which may cause macro or systemic risks from the perspective of dependency and risk contagion [30].

Transition and state conditional probability of mixed Copula model based on HMM. (A) the dynamic transition graph of the state; (B) the probability graph of low dependence; and (C) the probability graph of high dependence.

The interdependence of industries under these two states is calculated, and the results are shown in Table 5. Under different states, the tail dependence between A (banking) and B (insurance) is the strongest, which indicates that they have the most sensitivity to the impact of risks, and are more susceptible to be impacted by the own risks of extreme events.

Table 5

Tail dependence of various industries in different states.

	State	A	B	C	D

A	1	—	0.443	0.431	0.112
A	2	—	0.223	0.210	0.153
B	1	0.293	—	0.311	0.241
B	2	0.286	—	0.132	0.112
C	1	0.257	0.210	—	0.413
	2	0.162	0.132	—	0.215
D	1	0.206	0.232	0.241	—
	2	0.103	0.134	0.213	—

Assuming that the investor invests according to the weight of [0.5, 0.5] and the investment amount is 1 RMB. Monte Carlo simulation is adopted to measure the VaR value of various fields for 5,000 times. The VaR values under 95%, 97.5% and 99% confidence intervals are, respectively, explored. The results show that when the confidence interval is 95%, the investment failure rate is 4.53%; when the confidence interval is 97.5%, the investment failure rate is 2.17%; when the confidence interval is 99%, the investment failure rate is 1.08%. All these indicate that the model constructed in this study can be used in the measurement of portfolio VaR and can provide good theoretical guidance for investors.

3.3

Stock price prediction based on deep learning Markov model

The Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) and China Ping An Insurance (Sh601318) are taken as the research objects to explore the effect of deep learning Markov model in stock price forecasting. The number of neurons in the hidden layer (N), the coverage time of input data (t), the width of input data (d) and the training time (t) in RNN on the average prediction error of the model are compared, as shown in Figure 10. As shown in Figure 10A, with the increase in training time, the increase of input data coverage time will reduce the average prediction error of model prediction. When t = 5 and T = 16, the average prediction error is the smallest (2.89%). As shown in Figure 10B, with the increase in the number of neurons in the hidden layer of the model, the increase of input data width will also reduce the average prediction error of model prediction. When d = 5 and N = 60, the average prediction error is the smallest (2.89%). Therefore, t = 5, d = 5, T = 16 and N = 60 are selected for subsequent experiments.

Influence of different parameter values on average prediction error of the model. (A) The influence curve of training duration and input data coverage and (B) the influence curve between the number of neurons in the hidden layer and the width of input data.

Subsequently, it is applied to the stock price forecast of Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519) [31], and China Ping An Insurance (Sh601318), as shown in Figure 11. The variation trend of each stock price predicted by deep learning Markov model is basically consistent with the actual situation. Moreover, the errors of deep learning Markov model in stock price prediction of Shanghai Pudong Development Bank (sh600000), Guizhou Moutai (sh600519)V and China Ping An Insurance (Sh601318) are 2.56%, 2.98% [32] and 3.56%, respectively, which indicates that the deep learning Markov model proposed in this study can be used in the prediction of stock prices and has a high prediction accuracy, which is consistent with the research results of Tingwei et al. (2018) [33].

Stock price prediction results based on deep learning Markov model.

Conclusion

HMM is combined with the hybrid Copula model to analyse the dependence and risk contagion of the financial sub-industry. It is found that the dependence between financial sub-industries is strong, and has higher risk contagion in the state of high dependence. The deep learning Markov model applied in the stock price prediction is with high accuracy. However, only the deep learning Markov model is adopted to predict the stock prices, and further study should be carried out on the prediction of investment risks. In conclusion, the results of this study provide a theoretical basis for risk assessment among financial sub-industries and accurate financial decisions making.

eISSN:: 2444-8656
Language:: English

Publication timeframe:: Volume Open
Journal Subjects:: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Journal RSS Feed

Adoption of deep learning Markov model combined with copula function in portfolio risk measurement

Published Online: Dec 13, 2021

Page range: 901 - 916

Received: Jun 17, 2021

Accepted: Sep 24, 2021

DOI: https://doi.org/10.2478/amns.2021.2.00112

Keywords
mixed Copula model, HMM, financial index, risk contagion, investment failure rate

© 2021 Liangxiong Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Fig. 9

Fig. 10

Fig. 11

Adoption of deep learning Markov model combined with copula function in portfolio risk measurement

Published Online: Dec 13, 2021

Page range: 901 - 916

Received: Jun 17, 2021

Accepted: Sep 24, 2021

DOI: https://doi.org/10.2478/amns.2021.2.00112

Keywordsmixed Copula model, HMM, financial index, risk contagion, investment failure rate

© 2021 Liangxiong Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Fig. 8

Fig. 9

Fig. 10

Fig. 11

Keywords
mixed Copula model, HMM, financial index, risk contagion, investment failure rate