A study of the dynamic portfolio adjustment model based on the Markov decision process

With the rapid development of China's economy, the domestic fund market has grown quickly. The number of funds of all types, private and public alike, has increased sharply, giving investors an ever wider range of choices [1]. However, fund performance in this large market is uneven: some funds post strong results while others suffer serious declines [2]. Even the performance of strong funds often lacks persistence, and some funds are extremely volatile. Finding a strategy that delivers high return at low risk is therefore the primary concern of investors [3]. Portfolio investment addresses this concern by spreading capital across a variety of securities in suitable proportions, so that the risks of the individual securities partly offset one another and overall risk is reduced while the return is preserved.

The Markov decision process is a rapidly developing framework for studying sequential decision optimization in controlled stochastic dynamical systems [4]. Caccia considered the time-varying volatility and serial dependence of financial time series, solved the discrete-time mean-variance hedging problem when asset returns follow a multivariate autoregressive hidden Markov model, and showed how to obtain the optimal hedging strategy [5]. Liu et al. applied the hidden Markov model to stock market forecasting. They compared four variants, GMM-HMM, XGB-HMM, GMM-HMM+LSTM, and XGB-HMM+LSTM, analyzed the advantages and disadvantages of each on the experimental results, and used the best one for a stock market timing strategy [6]. The dynamic portfolio optimization problem is fundamental in finance and has been studied for more than 50 years. In this problem, investors adjust the ratio of risky assets (e.g., stocks) to risk-free assets (e.g., bonds) over time to maximize their wealth [7]. In traditional research, the problem is first formulated as a stochastic control problem and then solved using the dynamic programming principle and the Hamilton-Jacobi-Bellman (HJB) equation [8-9]. Methods in this category are often based on continuous-time models that allow infinitesimally small trades over infinitesimally short periods, which is often not feasible in practice [10].

To address these issues, this paper studies a dynamic portfolio adjustment model based on the Markov decision process, with the aim of providing investors with investment strategies that combine high return with low risk. By incorporating a Markov regime-switching mechanism, the paper constructs a dynamic asset allocation model that determines the optimal asset weights under different market states with the help of the conditional capital asset pricing model. The paper further constructs a state transfer matrix that describes the transfer probabilities between market states, so that investors can identify the current market state, predict likely future states, and adjust their asset allocation in a more forward-looking way.

2

Asset allocation based on Markov models

2.1

Markov process

A stochastic process without after-effect is called a Markov process. “Without after-effect” means that, given the state of the stochastic process {X(t),t∈T} at the moment t₀, the state of the process after t₀ is independent of its states before t₀ [11]. Time in a Markov process can be continuous or discrete, and a discrete Markov process is called a Markov chain.

Let {X(t),t=1,2,...} be a stochastic process with discrete state space Ω = {1,2,...,N}. The process is a Markov chain if, for any states i₁,i₂,...,i_m, j∈Ω, any m nonnegative integers n₁ < n₂ < ... < n_m, and any natural number k ≥ 1, the conditional probabilities satisfy

$P\{X(n_{m} + k) = j \mid X(n_{1}) = i_{1}, X(n_{2}) = i_{2}, \dots, X(n_{m}) = i_{m}\} = P\{X(n_{m} + k) = j \mid X(n_{m}) = i_{m}\}$ (1)

(Transfer probability) The conditional probability on the right-hand side of (1) is denoted by $p_{ij}^{(k)}(n) = P\{X(n + k) = j \mid X(n) = i\}$ (2)

Obviously, $p_{ij}^{(k)}(n) \geq 0, \quad i, j = 1, 2, \dots, N$ (3) and $\sum_{j = 1}^{N} p_{ij}^{(k)}(n) = 1, \quad i = 1, 2, \dots, N$ (4)

(Transfer matrix) Let the state transfer process of a system be a homogeneous Markov chain [12-13] with state space Ω={1,2,...,N}, so that the state transfer probability $p_{ij}^{(k)} = p_{ij}^{(k)}(n) \geq 0$ does not depend on n. The k-step state transfer matrix of the system is then $P^{(k)} = \begin{pmatrix} p_{11}^{(k)} & p_{12}^{(k)} & \dots & p_{1N}^{(k)} \\ p_{21}^{(k)} & p_{22}^{(k)} & \dots & p_{2N}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1}^{(k)} & p_{N2}^{(k)} & \dots & p_{NN}^{(k)} \end{pmatrix}$ (5)

(Properties of transfer matrices) The k-step transfer probability matrix P^(k) is the k-th power of the one-step transfer probability matrix P [14]:

$P^{(k)} = P^{k}$ (6)
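As a small numerical illustration of the relation between the k-step and one-step matrices, the sketch below raises a one-step transfer matrix to the k-th power using plain Python lists. The three-state matrix `P` and the helper names `mat_mul` and `mat_pow` are illustrative assumptions and do not come from the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, k):
    """Return the k-step matrix P^(k) = P^k for an integer k >= 1."""
    result = p
    for _ in range(k - 1):
        result = mat_mul(result, p)
    return result


# Hypothetical one-step matrix: row i holds the probabilities p_ij
# of moving from state i to state j, so every row sums to one.
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]

P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)
# Each row of P^(k) is still a probability distribution.
row_sums = [sum(row) for row in P3]
```

Repeated multiplication is enough for a sketch; for large k, exponentiation by squaring would need fewer multiplications.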

If the limit of the state transfer probability p_ij^(k) of the Markov chain {X(t),t=1,2,...} exists and depends only on j, that is, $\lim_{k \to \infty} p_{ij}^{(k)} = p_{j}, \quad i, j \in \Omega$ (7)

then the Markov chain is said to be ergodic.

From the above analysis, once the state transfer matrix P of a Markov chain and the state of the system at the initial moment t = 0 are known, the state of the system at any later moment can be inferred [15]: $x_{1} = P x_{0}$ (8), $x_{2} = P x_{1} = P (P x_{0}) = P^{2} x_{0} = P^{(2)} x_{0}$ (9), $x_{3} = P x_{2} = P (P^{(2)} x_{0}) = P^{(3)} x_{0}$ (10), and in general $x_{k} = P x_{k - 1} = P^{(k)} x_{0}$ (11)

It is therefore necessary to study the behavior of P^(k) as k increases.

(Stationary distribution) Let x=(x₁, x₂,...,x_N)ᵀ be a nonzero column vector and P be the state transfer probability matrix. The vector x is said to be a stationary distribution of the Markov chain if x = Px [16].

For an ergodic Markov chain, let the limit of its state transfer probability p_ij^(k) be $\lim_{k \to \infty} p_{ij}^{(k)} = x_{j}, \quad i, j \in \Omega$ (12)

By the identity satisfied by the state transfer matrix,

$P^{k} = P \cdot P^{k - 1}$ (13)

Taking the limit k → ∞ on both sides of the above equation gives $\begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{N} \end{pmatrix} = P \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{N} \end{pmatrix}$ (14)

That is, the limiting distribution of an ergodic Markov chain is the unique solution of the system of equations $x_{i} = \sum_{j = 1}^{N} p_{ij} x_{j}, \quad i = 1, 2, \dots, N$ (15)

that satisfies the conditions x_i ≥ 0 (i = 1,2,…,N) and $\sum_{i = 1}^{N} x_{i} = 1$ [17].
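The limiting distribution can be approximated numerically by applying the one-step update repeatedly until the distribution stops changing. The sketch below is a minimal illustration under stated assumptions: the two-state matrix `P` is hypothetical, its rows sum to one as in condition (4), and the distribution is stored as a row vector updated by x ← xP, which is the row-vector counterpart of the column form written in the text. The function names `step` and `stationary` are illustrative.

```python
def step(dist, p):
    """One update of the state distribution: new_j = sum_i dist_i * p_ij."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def stationary(p, tol=1e-12, max_iter=10_000):
    """Iterate the one-step update until the distribution stops changing."""
    n = len(p)
    dist = [1.0 / n] * n  # for an ergodic chain the starting point does not matter
    for _ in range(max_iter):
        new = step(dist, p)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Hypothetical two-state chain; each row sums to one.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

pi = stationary(P)  # approaches (5/6, 1/6) for this matrix
```

For this matrix the balance condition 0.1·π₁ = 0.5·π₂ together with π₁ + π₂ = 1 gives π = (5/6, 1/6), which the iteration reproduces.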

2.2

Dynamic asset allocation based on Markov mechanism transformation

The Markov mechanism transformation model, i.e., the Markov regime-switching (MRS) model, is used to identify the states of the stock market, and assets are then allocated dynamically according to the identified state. In this paper, the CAPM model is used to determine the weights of the assets in a portfolio [18-19]. The basic CAPM model is introduced first. For a security j, the CAPM model is $E(r_{j}) = r_{f} + β_{j} [E(r_{m} - r_{f})]$ (16) where r_f is the one-period risk-free rate, r_m is the return on the market portfolio, and β_j measures the systematic risk of the asset.
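To make the CAPM relation concrete, the following sketch estimates a sample beta as the ratio of covariance to variance and then evaluates the expected return. All numbers are made up for illustration, and the helper names `mean`, `beta`, and `capm_expected_return` are assumptions rather than part of the paper.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def beta(asset, market):
    """Sample beta = cov(asset, market) / var(market)."""
    ma, mm = mean(asset), mean(market)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var


def capm_expected_return(r_f, beta_j, expected_market_excess):
    """E(r_j) = r_f + beta_j * E(r_m - r_f)."""
    return r_f + beta_j * expected_market_excess


# Hypothetical excess returns; the asset moves 1.5 times as much as the market.
market = [0.02, -0.01, 0.03, 0.00]
asset = [0.03, -0.015, 0.045, 0.0]

b = beta(asset, market)                         # close to 1.5
expected = capm_expected_return(0.002, b, 0.008)  # close to 0.014
```

The covariance and variance use the same normalization, so the choice between dividing by n or n - 1 cancels in the ratio.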

The traditional unconditional CAPM model does not consider changes in market conditions and treats β_j as a constant. In contrast, the conditional CAPM model takes the economic environment into account and assumes that the β coefficient of an asset is time-varying, i.e., β_j changes with the economic environment. In this paper, the state of the stock market in different periods is used as the conditional information variable of the model, which gives: $E[(r_{it} - r_{ft}) \mid Z_{t}] = \frac{\operatorname{cov}[r_{it}, r_{mt} \mid Z_{t}]}{\operatorname{var}[r_{mt} \mid Z_{t}]} E[(r_{mt} - r_{ft}) \mid Z_{t}]$ (17) where Z_t represents the state of the stock market at moment t, identified from the smoothed probability that the stock market index return is in state j at moment t.

The stock market is considered to be in a given state at moment t when the probability of that state at moment t is greater than 0.5, that is, $p^{*}(S_{t} = j \mid I_{r}; Θ) = \begin{cases} 1, & p(S_{t} = j \mid I_{r}) > 0.5 \\ 0, & \text{otherwise} \end{cases}$ (18)

With Z_t as the conditioning information variable of the conditional CAPM model, the parameters of the CAPM model can be estimated separately for each stock market state.
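The two steps just described, assigning each period to a state with the 0.5 rule and estimating the CAPM parameters per state, can be sketched as follows. The smoothed probabilities and returns are invented for illustration, and `classify` and `conditional_betas` are hypothetical helper names; the paper does not describe its implementation.

```python
def classify(probabilities, threshold=0.5):
    """Return the index of the state whose probability exceeds the threshold, else None."""
    for state, p in enumerate(probabilities):
        if p > threshold:
            return state
    return None


def conditional_betas(asset_excess, market_excess, states):
    """Estimate beta = cov / var separately on the periods assigned to each state."""
    betas = {}
    for s in sorted(set(states) - {None}):
        a = [x for x, st in zip(asset_excess, states) if st == s]
        m = [x for x, st in zip(market_excess, states) if st == s]
        ma, mm = sum(a) / len(a), sum(m) / len(m)
        cov = sum((x - ma) * (y - mm) for x, y in zip(a, m))
        var = sum((y - mm) ** 2 for y in m)
        betas[s] = cov / var
    return betas


# Hypothetical smoothed probabilities for three states over seven periods.
smoothed = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.3, 0.6, 0.1],
    [0.4, 0.4, 0.2],  # no state exceeds 0.5, so this period stays unclassified
]
states = [classify(p) for p in smoothed]

market = [0.01, 0.02, 0.03, -0.02, -0.04, 0.01, 0.00]
asset = [0.008, 0.016, 0.024, -0.028, -0.056, 0.014, 0.0]
betas = conditional_betas(asset, market, states)  # about {0: 0.8, 1: 1.4}
```

With three states it is possible that no probability exceeds 0.5; the sketch leaves such periods unassigned, which is one of several reasonable choices.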

In this paper, the mean-variance optimization method is used to calculate the asset allocation weights under three different stock market states.

In the first step, the estimated CAPM model is used to compute the residual series of each asset class, which represents the unsystematic risk of that asset class.

In the second step, the variance-covariance matrix of the residual series is computed for each market state.

In the third step, a quadratic programming routine is used to compute the optimal weights w.

The asset allocation minimizes the variance of the whole portfolio subject to the condition that the average return of the portfolio is not less than the average return of the broader market over the same period [20]. The problem is: $\begin{aligned} \min_{w} \quad & \frac{1}{2} w^{'} V_{S_{t}} w \\ \text{s.t.} \quad & \sum_{i = 1}^{k} w_{i} = w^{'} \mathbf{1} = 1, \quad 0 \leq w_{i} \leq 1, \\ & w^{'} R \geq μ \end{aligned}$ (19) where V_{S_t} is the variance-covariance matrix in state S_t, R is the vector of average asset returns, and μ is the average return of the broader market.
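The structure of this constrained minimization can be illustrated without a quadratic programming solver. The sketch below uses a brute-force grid search over long-only weights of three assets as a stand-in for the solver; it is not the method used in the paper. The covariance matrix, mean returns, and target return are hypothetical, as are the function names.

```python
def portfolio_variance(w, cov):
    """Quadratic form w' V w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


def min_variance_weights(cov, mean_returns, target, steps=100):
    """Grid search over three long-only weights that sum to one."""
    best_w, best_var = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            ret = sum(wi * ri for wi, ri in zip(w, mean_returns))
            if ret < target:  # enforce the return constraint w'R >= mu
                continue
            var = portfolio_variance(w, cov)
            if var < best_var:
                best_w, best_var = w, var
    return best_w, best_var


# Hypothetical inputs for one market state.
cov = [
    [0.04, 0.00, 0.00],
    [0.00, 0.09, 0.00],
    [0.00, 0.00, 0.01],
]
mean_returns = [0.08, 0.12, 0.03]

best_w, best_var = min_variance_weights(cov, mean_returns, target=0.06)
achieved = sum(wi * ri for wi, ri in zip(best_w, mean_returns))
```

A grid search scales poorly with the number of assets, so a real application would pass the same objective and constraints to a quadratic programming routine.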

In summary, the dynamic asset allocation method based on the Markov mechanism transformation is given by $\begin{cases} y_{t} = μ_{S_{t}} + \sum_{i = 1}^{k} ϕ_{i} (y_{t - i} - μ_{S_{t - i}}) + ε_{t} \\ E(r_{j}) = r_{f} + β_{j} [E({\hat{r}}_{m} - r_{f})] \\ \min \frac{1}{2} w^{'} V_{S_{t}} w, \quad \text{s.t.} \ \sum_{i = 1}^{k} w_{i} = 1, \ w^{'} R \geq μ \end{cases}$ (20)

which fully determines the asset allocation weights w for each market state.

2.3

Experimental analysis

In this paper, seven indicators commonly used to evaluate the investment performance of funds are selected to assess the experimental results: annualized return, cumulative return, annualized volatility, Sharpe ratio, Calmar ratio, maximum drawdown, and Sortino ratio. The names and meanings of the indicators are shown in Table 1.

Table 1.

Indicator names and significance of indicators

Name of Indicator	Indicator Meaning
Annualized Return	Return of the portfolio expressed per year
Cumulative Return	Portfolio return at the end of the trading phase
Annualized Volatility	A measure of the uncertainty of the rate of return
Sharpe Ratio	A classic indicator combining return and risk
Calmar Ratio	Describes the relationship between return and maximum drawdown
Maximum Drawdown	Describes the portfolio's largest peak-to-trough decline
Sortino Ratio	Describes the excess return earned per unit of downside risk
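The indicators in Table 1 can be computed from a series of periodic returns. The sketch below uses common textbook definitions, which are assumptions here because the paper does not state its exact conventions (for example the risk-free rate, the annualization factor, or the downside target). The function name `performance` and the sample returns are illustrative.

```python
import math
import statistics


def performance(returns, periods_per_year=252, risk_free=0.0):
    """Compute common performance indicators from periodic simple returns."""
    n = len(returns)
    wealth, path = 1.0, []
    for r in returns:  # wealth path starting from 1.0
        wealth *= 1.0 + r
        path.append(wealth)
    cumulative = wealth - 1.0
    annualized = wealth ** (periods_per_year / n) - 1.0
    volatility = statistics.stdev(returns) * math.sqrt(periods_per_year)

    # Maximum drawdown: largest decline from a running peak (negative number).
    peak, max_dd = 1.0, 0.0
    for value in path:
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)

    # Downside deviation counts only negative returns (target of zero).
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n)
    downside *= math.sqrt(periods_per_year)

    return {
        "cumulative_return": cumulative,
        "annualized_return": annualized,
        "annualized_volatility": volatility,
        "sharpe": (annualized - risk_free) / volatility,
        "calmar": annualized / abs(max_dd) if max_dd < 0 else float("inf"),
        "max_drawdown": max_dd,
        "sortino": (annualized - risk_free) / downside if downside > 0 else float("inf"),
    }


# Four hypothetical quarterly returns, i.e. exactly one year of data.
sample = [0.10, -0.20, 0.05, 0.10]
stats = performance(sample, periods_per_year=4)
```

Under this definition the return-to-drawdown ratio for the figures reported in Table 2 is 41.03 / 18.70 ≈ 2.19, which matches the tabulated value; the conventions behind the other indicators cannot be confirmed from the text.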

1)

Experimental group 1

Data from August 28, 2017, to January 15, 2020, were used for backtesting, and performance was compared with the portfolio adjustment trading strategy designed in previous studies (the control group). The experimental results are shown in Table 2 and Figure 1. The returns of the improved portfolio adjustment strategy designed in this paper are clearly better than those of the control group. The annualized return of the improved strategy is 41.03%, about twice the control group's 21.00%. Its cumulative return is 120.34%, about 2.2 times the control group's 54.96%. Its Sharpe ratio is 1.52, higher than the control group's 1.09. Its maximum drawdown of -18.70% is slightly better than the control group's -18.98%. The results therefore indicate that the proposed portfolio adjustment strategy, which introduces a risk diversification mechanism, balances risk and return better than the portfolio adjustment trading strategy designed in previous studies.

Table 2.

Experimental results of Markov-based investment strategies-experimental group 1

Name of Indicator	Result
Annualized Return	41.03%
Cumulative Return	120.34%
Annualized Volatility	24.73%
Sharpe Ratio	1.52
Calmar Ratio	2.19
Maximum Drawdown	-18.70%
Sortino Ratio	2.53

2)

Experimental Group 2

The data from January 2019 to December 2020 are used for back testing and the performance is compared with the CSI 300 index. The results of the experiment are shown in Table 3 and Figure 2.

Table 3.

Experimental results of Markov-based investment strategies - Experimental Group 2

Name of Indicator	Result
Annualized Return	46.81%
Cumulative Return	93.62%
Annualized Volatility	21.76%
Sharpe Ratio	2.20
Calmar Ratio	3.27
Maximum Drawdown	-18.00%
Sortino Ratio	3.35

The return of the portfolio adjustment strategy based on stock prices is clearly better than that of the CSI 300 index. As Table 3 shows, the annualized return of the strategy is 46.81%, higher than the 37.75% of the CSI 300 index. Its cumulative return is 93.62%, higher than the index's 75.49%, and its Sharpe ratio is 2.20, higher than the index's 1.35. Its maximum drawdown of -18.00% is slightly better than the index's -19.16%. The results therefore indicate that the Markov-based portfolio adjustment strategy proposed in this paper balances risk and return better than holding the CSI 300 index.

However, Figure 2 shows that in July 2020 the cumulative return of the CSI 300 index rose sharply, while the cumulative return of the portfolio adjustment strategy increased by less. This indicates that, when the market signaled a rapid rise, the agent failed to seize the opportunity in time by buying the stocks with the largest gains. In addition to price information, technical indicators should be added to the model to further improve the return of the strategy and capture bull-market opportunities.

Following Hamilton's definition, a period is considered to be in a given state if the probability of being in that state is greater than 0.5. The smoothed probability plots for the three states are shown in Figure 3.

Based on the transfer probability matrix and the estimation results of the model, the return of the next period can be predicted. If the market is in state 1 at moment t, the expected rate of return at moment t+1 is $E_{(1)} = μ_{1} p_{11} + μ_{2} p_{12} + μ_{3} p_{13}$, and the other states follow by analogy.
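The one-step-ahead forecast is a probability-weighted average of the state means, taken over the row of the transfer matrix that belongs to the current state. A minimal sketch with hypothetical state means and a hypothetical three-state matrix (the function name `expected_next_return` is also an assumption):

```python
def expected_next_return(current_state, state_means, transition):
    """E_(i) = sum_j mu_j * p_ij, using the row of the current state (0-based index)."""
    row = transition[current_state]
    return sum(mu * p for mu, p in zip(state_means, row))


# Hypothetical mean returns of the three states and a hypothetical transfer matrix.
mu = [0.012, 0.003, -0.015]
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]

e1 = expected_next_return(0, mu, P)  # state 1 of the text is index 0 here
forecasts = [expected_next_return(i, mu, P) for i in range(3)]
```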

3

Portfolio adjustment based on MRS asset allocation modeling

3.1

Trading strategies for asset allocation models

Because of its complexity and vulnerability to external shocks, the financial market passes through a variety of states, such as high volatility and low volatility, and the risks and returns of assets change considerably from one state to another. However, most current quantitative strategies do not consider the impact of market state changes on asset returns. Portfolio strategies in particular rarely introduce a market state variable, although portfolio theory implies that the optimal portfolio differs across states [21]. It is therefore necessary to identify the market state effectively and to compute the optimal asset portfolio for each state. The core idea of the strategy is twofold. First, the MRS model is used to identify market state transitions and the points in time at which they occur. Second, the value of the portfolio is increased by capturing the opportunities created by these transitions.

The previous section examined the Markov state transition model, the mean-variance model, and the MRS-based asset allocation model. The MRS model describes market state transitions through a transfer probability matrix and, combined with the asset allocation model, yields the optimal mean-variance asset allocation weights for each state [22].

Based on the above conclusions, this paper constructs the following trading strategy:

1)

Select the underlying assets: 10 groups of underlying assets, plus a benchmark market and a risk-free rate, for a total of 12 data sets at monthly frequency;

2)

Import the data into the modeling program, divide the data into a training group and a test group, and obtain the relevant parameters and weights through rolling calculation;

3)

Based on the weights obtained, calculate the annualized return, Sharpe ratio, and other metrics of the strategy, and compare them with the returns of the benchmark and related markets to assess the effectiveness of the strategy.
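The rolling calculation in step 2 can be organized as a sequence of training and test windows that move forward through the sample. The paper does not give the window lengths, so the values below (36 months of training, 12 months of testing, 60 observations) and the function name `rolling_windows` are purely illustrative assumptions.

```python
def rolling_windows(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that roll forward by one test block."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Hypothetical sample of 60 monthly observations.
windows = list(rolling_windows(n_obs=60, train_size=36, test_size=12))
for train, test in windows:
    # Parameters and weights would be estimated on `train` and evaluated on `test`.
    pass
```

Each test block starts right after its training block, so no test observation is used for the estimation that precedes it.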

3.2

Dynamic programming

Dynamic programming is a theory and method for solving optimization problems with multi-stage decision-making processes. It is a simple mathematical method that is now widely used in engineering, mathematics, and social sciences. The main concepts and ideas of dynamic programming are as follows:

A stage (phase) is one of the steps into which the decision process of a problem is divided [23]. For example, in a multi-period portfolio choice, the number of stages is the number of periods over which investment decisions are made. For convenience, k denotes the stage index, numbered in increasing order.

The state is the situation faced at the beginning of a phase. It is a critical parameter in dynamic programming, reflecting both the outcome of the decisions made in previous phases and the starting point for the decision made in the current phase. Here, x_k is used to describe the state at stage k.

A decision is the choice that the decision maker makes among the alternatives available at the beginning of a phase, given the state at that phase [24].

In this paper, D_k(x_k) denotes the set of decisions allowed when the state at stage k is x_k, and u_k(x_k) denotes the decision taken when the state at stage k is x_k, so that $u_{k}(x_{k}) \in D_{k}(x_{k})$ (21)

The sequence of decisions made in all phases of a dynamic programming problem is called a strategy. The strategy for a dynamic programming problem with n stages can be written as $u_{1}(x_{1}), u_{2}(x_{2}), \dots, u_{n}(x_{n})$ (22)

The sequence of decisions from the beginning of a given stage to the end of the process is called a sub-process or sub-strategy of the problem. The sub-strategy starting at stage k is $u_{k}(x_{k}), u_{k + 1}(x_{k + 1}), \dots, u_{n}(x_{n})$ (23)

Given a state value x_k, once the value of the decision variable u_k(x_k) is determined, the value of the state variable x_{k+1} in the next stage is also determined [25]. This rule for passing from a state value in one stage to a state value in the next is called the state transfer law. The next-stage state variable x_{k+1} is a function of the current state x_k and the decision variable u_k(x_k): $x_{k + 1} = T_{k}(x_{k}, u_{k}(x_{k}))$ (24)

The state transfer law is also called the state transfer equation.

Indicator functions are divided into stage indicator functions and process indicator functions. A stage indicator function measures the effectiveness of the decision taken at a given stage from a given state, and is denoted by v_k(x_k, u_k).

The indicator function of a process is the value of the benefit, measured by a predetermined criterion, that is obtained when a certain strategy is followed from the state x_k (k=1,2,...,n) to the end of the process [24]. This value depends both on the state value x_k and on the strategy chosen from x_k onward, and is denoted in this paper as $V_{k, n}(x_{k}, u_{k}, x_{k + 1}, u_{k + 1}, \dots, x_{n})$ (25)

The indicator function of a process is in turn a function of the indicator functions of the stages it contains. When the state x_k is given, the value of the indicator function depends only on the selected strategy. The optimal indicator function is the value of the indicator function obtained by selecting the optimal strategy from a given state; it is the benefit measure of the corresponding optimal sub-strategy. If the benefit of the optimal sub-strategy starting from the state x_k is written as f_k(x_k), then $f_{k}(x_{k}) = \operatorname{opt} V_{k, n}$ (26) where opt denotes optimization (maximization or minimization, depending on the problem).

A model solved by dynamic programming methods must have the following components:

the division of the stages;

the state variable x_k for each stage;

the decision variable u_k for each stage;

the set of allowed decisions D_k(x_k);

the state transfer equation: x_{k+1} = T_k(x_k, u_k(x_k));

the recurrence relation: f_k(x_k) = opt_{u_k∈D_k(x_k)} V_{k,n}(x_k, u_k, x_{k+1}, u_{k+1}, ⋯, x_n);

the boundary condition f_{n+1}(x_{n+1}).

In general, the key to dynamic programming lies in writing the basic recurrence relation and the boundary conditions correctly. To do so, one divides the process into several interrelated stages, selects appropriate state and decision variables, and defines the optimal value function, thereby reducing a large problem to a family of sub-problems of the same type, which are then solved one by one. Starting from the boundary conditions, the optimization proceeds recursively: the solution of each sub-problem uses the optimal results of the sub-problems solved before it, and the optimal solution of the last sub-problem is the optimal solution of the whole problem.
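The recursion from the boundary condition back to the first stage can be sketched for a small problem. The example assumes an additive indicator function, f_k(x) = max over u of [v_k(x, u) + f_{k+1}(T_k(x, u))], which is one common special case and not necessarily the form used in the paper. The toy problem (spreading 3 units of capital over 3 stages with a square-root stage reward) and all function names are hypothetical.

```python
import math


def backward_induction(n_stages, states, decisions, transition, reward, terminal):
    """Solve f_k(x) = max_u [v_k(x, u) + f_{k+1}(T_k(x, u))] from the last stage back."""
    f = {x: terminal(x) for x in states}  # boundary condition f_{n+1}
    policy = []
    for k in range(n_stages, 0, -1):
        f_k, u_k = {}, {}
        for x in states:
            best_u, best_val = None, float("-inf")
            for u in decisions(k, x):  # allowed decision set D_k(x)
                val = reward(k, x, u) + f[transition(k, x, u)]
                if val > best_val:
                    best_u, best_val = u, val
            f_k[x], u_k[x] = best_val, best_u
        f = f_k
        policy.insert(0, u_k)  # policy[0] holds the stage-1 decisions
    return f, policy


# Toy problem: x is the capital left, u is the amount used in the current stage.
f1, policy = backward_induction(
    n_stages=3,
    states=range(4),
    decisions=lambda k, x: range(x + 1),
    transition=lambda k, x, u: x - u,
    reward=lambda k, x, u: math.sqrt(u),
    terminal=lambda x: 0.0,
)
```

Because the square-root reward is concave, the optimal strategy uses one unit per stage, giving f_1(3) = 3 instead of √3 from spending everything at once.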

In addition, in a multi-stage decision process, dynamic programming separates the current stage from the future stages while considering the current and future results together. The decision at each stage is therefore chosen from a global perspective and generally differs from the decision that would be optimal for that stage alone. Because the initial state is known and the decision at each stage is a function of the state at that stage, the optimal strategy can be recovered step by step through the successive state transfers, which determines the optimal solution.

4

Empirical analysis

4.1

Experimental setup and model construction

In this study, publicly available market data are obtained from Yahoo Finance, and the simulation uses daily adjusted closing prices and volumes. Data from March 31, 2011, to December 31, 2022, are selected as the sample. To reduce the risk of overfitting, the state transition model was trained and tested on data from March 31, 2011, to December 31, 2015. Historical country-level market capitalization data were obtained from the Federal Reserve Bank of St. Louis' “Gross Stock Market Value to GDP Ratio” and “Gross Domestic Product Tables”.

Following the theoretical model and the available data, the data are imported into the modeling program. The first step is to select the benchmark market and the risk-free interest rate, determine from the market data which state the global market is currently in, and estimate the parameters for each state. The modeling yields estimates of the state transfer probabilities P and Q and of μ₁^w, μ₂^w, σ₁^w, and σ₂^w, as shown in Table 4.

Table 4.

Estimates of market parameters

	P	Q	μ₁^w	μ₂^w	σ₁^w	σ₂^w
Estimated value	0.98	0.98	1.11	0.07	3.51	6.80
Standard deviation	0.06	0.17	0.31	0.66	0.02	0.07

Table 4 shows the estimation results of the two-state Markov state transition model. The returns and volatility of the market portfolio differ between the states: state 1 corresponds to high return and low volatility, while state 2 corresponds to low return and high volatility. The economic implication is that when the market is highly volatile, i.e., when the economy is subject to large shocks, stock market returns are markedly lower. In state 1 the global excess return is expected to be 1.11% per month with a volatility of only 3.51, whereas in an unstable market such as state 2 the global excess return is expected to be only 0.07% per month with a volatility of 6.80, almost twice that of state 1.
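The persistence implied by the estimates in Table 4 can be worked out directly. The sketch assumes that P and Q are the probabilities of remaining in state 1 and state 2, respectively, from one month to the next (the usual convention for a two-state regime-switching model, which the paper does not spell out). Under that assumption the expected duration of a state is 1 / (1 - stay probability), and the long-run probability of state 1 is (1 - Q) / (2 - P - Q). The function name `regime_summary` is illustrative.

```python
def regime_summary(p_stay_1, p_stay_2):
    """Expected durations and long-run probabilities of a two-state Markov chain."""
    duration_1 = 1.0 / (1.0 - p_stay_1)
    duration_2 = 1.0 / (1.0 - p_stay_2)
    denom = 2.0 - p_stay_1 - p_stay_2
    long_run_1 = (1.0 - p_stay_2) / denom
    return duration_1, duration_2, long_run_1, 1.0 - long_run_1


# Point estimates from Table 4 (P = Q = 0.98), read as staying probabilities.
d1, d2, pi1, pi2 = regime_summary(0.98, 0.98)

# Long-run mean excess return in % per month, using the state means 1.11 and 0.07.
long_run_mean = pi1 * 1.11 + pi2 * 0.07
```

With these inputs each state lasts about 50 months on average, both states are equally likely in the long run, and the long-run mean excess return is about 0.59% per month.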

4.2

Portfolio Construction and risk-return Estimation

Successful multi-period portfolio optimization relies on accurate risk and return estimates to generate trading decisions. Portfolios are constructed from U.S.-listed ETFs covering several regions (all ETFs are denominated in U.S. dollars), as detailed in Figure 5. Figure 6 shows the performance of the selected regional ETFs from 2011 to 2017.

To instantiate the MRS-based portfolio model, the prior weights and return assumptions used in the Bayesian approach are needed. Finding the equilibrium weights (prior weights) requires historical national market capitalization data. The market capitalization data used in this paper are derived from the St. Louis Federal Reserve's “Stock Market Capitalization as a Percentage of Gross Domestic Product” and “Gross Domestic Product” tables. Market capitalization is obtained by multiplying the ratio of stock market capitalization to GDP by GDP. Equilibrium weights are defined as the portfolio weights proportional to market capitalization.
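The construction of the equilibrium weights reduces to two arithmetic steps: recover each market's capitalization from its capitalization-to-GDP percentage and its GDP, then normalize. The region labels and figures below are invented for illustration, and `equilibrium_weights` is a hypothetical helper name.

```python
def equilibrium_weights(cap_to_gdp_pct, gdp):
    """Market-capitalization weights from cap-to-GDP percentages and GDP levels."""
    caps = {k: cap_to_gdp_pct[k] / 100.0 * gdp[k] for k in cap_to_gdp_pct}
    total = sum(caps.values())
    return {k: v / total for k, v in caps.items()}


# Hypothetical inputs: percentage of GDP and GDP in the same currency unit.
cap_to_gdp_pct = {"A": 150.0, "B": 100.0, "C": 50.0}
gdp = {"A": 20.0, "B": 10.0, "C": 10.0}

weights = equilibrium_weights(cap_to_gdp_pct, gdp)
# Capitalizations are 30, 10 and 5, so the weights are 2/3, 2/9 and 1/9.
```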

Before generating risk and return estimates with the MRS-based portfolio model, it is necessary to check whether doing so leads to superior results. In this paper, the efficient frontier, i.e., the set of portfolios that achieve the highest expected return for a given level of risk, is computed under both methods. No views are added to the portfolio model during the simulation. To make the estimates more responsive, estimated returns and covariances are computed from asset returns with exponentially decaying weights, and the trading period is 2016 to 2020. The hyperparameters γ^risk and γ^trade are chosen from the following values: $γ^{risk} = 0.001, 0.01, 0.1, 1, 10, 100$ (27) $γ^{trade} = 1, 2, 3, 4, 5$ (28)

This gives a total of 30 backtests. The results, shown in Figure 7, indicate that the Fama-French 5-factor (FF5) model performs better at lower levels of risk but with higher volatility, while the MRS-based portfolio model (without views) performs better at higher levels of risk. Both methods yield excess returns that outperform the market at the same level of risk. Overall, since potentially better-performing views can also be incorporated in the future, it is reasonable to generate risk and return estimates using the MRS-based portfolio model.
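The backtest grid and the exponentially decayed return estimate can be sketched as follows. The hyperparameter values are those listed in the text; the decay factor of 0.94 and the helper name `exp_weighted_mean` are assumptions, since the paper does not report how the decay is parameterized.

```python
from itertools import product

# Hyperparameter values as listed in the text.
gamma_risk = [0.001, 0.01, 0.1, 1, 10, 100]
gamma_trade = [1, 2, 3, 4, 5]

# Every (gamma_risk, gamma_trade) pair corresponds to one backtest.
grid = list(product(gamma_risk, gamma_trade))


def exp_weighted_mean(values, decay=0.94):
    """Mean with exponentially decaying weights; the latest value has weight 1."""
    n = len(values)
    weights = [decay ** (n - 1 - t) for t in range(n)]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical return series: the most recent observations dominate the estimate.
recent_heavy = exp_weighted_mean([0.0, 1.0], decay=0.5)
```

A smaller decay factor makes the estimate react faster to recent returns at the cost of more noise.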

4.3

Portfolio Performance Analysis

To verify the performance of the MRS portfolio model proposed in this paper on portfolio management, this section analyzes the performance of the experimental models using the evaluation metrics. The backtesting results of each model for Portfolio 1 and Portfolio 2 are shown in Figures 8 and 9, respectively, with all results rounded to two decimal places.

Figures 8 and 9 show that in Portfolio 1 the MRS portfolio model proposed in this paper achieves the best results in annualized return (AR), Sharpe ratio, and Sortino ratio, which are 8.33%, 0.18, and 0.26 higher than those of the PG model, respectively. In Portfolio 2, the AR, Sharpe ratio, and Sortino ratio of the MRS portfolio model are higher than those of the PG model by 11.23%, 0.10, and 0.55, respectively, and its AR and Sortino ratio are the best among all models. The results for the two portfolios show that the MRS portfolio model is stronger in overall profitability and in profitability per unit of downside risk. On the maximum drawdown (MDD) metric, however, the UCRP strategy achieves the best result, indicating that the UCRP strategy carries relatively less risk.

The experimental results therefore show that both data augmentation and partially observable Markov decision process modeling can enhance the returns of portfolio management. Partially observable Markov decision process modeling brings a larger return enhancement than data augmentation, whereas data augmentation is more effective at reducing the maximum drawdown.

It is also noted that, in both portfolios, the eight MRS-based portfolio models above obtain better results than the UCRP strategy and the CSI 300 Index on both AR and Sortino ratio, further illustrating the effectiveness of the Markov decision process methodology in solving portfolio management problems.

To check whether the weights produced by the MRS portfolio model are diversified, this paper visualizes the portfolio weights of the 20 assets in the Portfolio 1 backtesting experiment, as shown in Figures 10 and 11. The figures show that the MRS portfolio model does allocate funds to different assets and that the portfolio weights are adjusted over time to adapt to the latest market situation. The weights of some equity assets are adjusted frequently, while those of the risk-free asset and some other equity assets are adjusted less often. A possible reason is that the reward function of the MRS portfolio model includes transaction costs, so that frequent changes in the portfolio weights incur higher costs. The model therefore keeps the current weights of certain assets unchanged most of the time to reduce transaction costs.
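The trade-off between rebalancing and transaction costs can be illustrated with a simple proportional cost on turnover. The paper does not give the form of its reward function, so the linear cost, the cost rate of 0.1%, and the helper names `turnover` and `net_return` are assumptions made only for this sketch.

```python
def turnover(old_weights, new_weights):
    """Sum of absolute weight changes across assets."""
    return sum(abs(n - o) for o, n in zip(old_weights, new_weights))


def net_return(gross_return, old_weights, new_weights, cost_rate=0.001):
    """Gross return minus a proportional cost on the traded fraction of the portfolio."""
    return gross_return - cost_rate * turnover(old_weights, new_weights)


# Hypothetical two-asset rebalance from 50/50 to 70/30.
old = [0.5, 0.5]
new = [0.7, 0.3]

traded = turnover(old, new)               # 0.4 of the portfolio changes hands
after_cost = net_return(0.01, old, new)   # 0.01 - 0.001 * 0.4
unchanged = net_return(0.01, old, old)    # no trading, no cost
```

Under such a penalty, a weight change is only worthwhile when the expected gain exceeds the cost of the trade, which is consistent with the infrequent adjustments observed for some assets.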

5

Conclusion

This paper proposes a dynamic portfolio adjustment strategy based on a Markov decision process, the core purpose of which is to design an investment program that can balance high return and low risk for investors. By integrating the Markov mechanism, this paper constructs a dynamic asset allocation model that accurately calculates the asset allocation ratio under different market conditions with the help of the conditional capital asset pricing model. A state transfer matrix is also built to characterize the transition probabilities between different market states. Based on the current known market state, the investor can predict the possible future market conditions and make more accurate and rational asset allocation adjustments accordingly. The conclusions are as follows.

1)

The annualized return of the MRS portfolio adjustment strategy is 46.81%, higher than the CSI 300 index's 37.75%. Its cumulative return is 93.62%, higher than the index's 75.49%, and its Sharpe ratio is 2.20, higher than the index's 1.35. Its maximum drawdown is -18.00%, slightly better than the index's -19.16%.

2)

The Fama-French 5-factor (FF5) model performs better at lower levels of risk but is more volatile, while the MRS-based portfolio model (without views) performs better at higher levels of risk. Both methods yield excess returns that outperform the market at the same level of risk. Overall, since potentially outperforming views can be incorporated in the future, it is reasonable to use the MRS-based portfolio model to generate risk and return estimates.

3)

In Portfolio 1, the MRS portfolio model proposed in this paper achieves the best results in AR, Sharpe ratio, and Sortino ratio, which are 8.33%, 0.18, and 0.26 higher than those of the PG model, respectively. In Portfolio 2, the AR, Sharpe ratio, and Sortino ratio of the MRS portfolio model are 11.23%, 0.10, and 0.55 higher than those of the PG model, respectively, and its AR and Sortino ratio are the best among all models.


Jiazhuo Wang

Xiaohui Yang

Published online: 27 Feb 2025

Received: 20 Sept 2024

Accepted: 21 Jan 2025

DOI: https://doi.org/10.2478/amns-2025-0117

Keywords: Markov decision process, State transfer matrix, Transfer probability, Asset weights, Investment portfolio

© 2025 Jiazhuo Wang et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
