Journal Details
License
Format
Journal
eISSN
2444-8656
First Published
01 Jan 2016
Publication timeframe
2 times per year
Languages
English
Open Access

Prediction and Analysis of ChiNext Stock Price Based on Linear and Non-linear Composite Model

Published Online: 15 Jul 2022
Volume & Issue: AHEAD OF PRINT
Page range: -
Received: 07 Feb 2022
Accepted: 09 Apr 2022
Introduction

After more than two decades of development, the Chinese capital market has become increasingly mature, and stock investment has become an important part of people's daily lives. As the stock price is affected by many factors, its changing law has complex nonlinearity and randomness [1]. Therefore, establishing an effective stock price prediction method has extremely important theoretical significance and application value.

Scholars in China and abroad have studied stock price forecasting from different perspectives and proposed various forecasting methods. The earliest work used autoregressive (AR) models, autoregressive moving average (ARMA) models, autoregressive integrated moving average (ARIMA) models, regression analysis, gray forecasting, and Markov chains. However, stock prices fluctuate violently and are noisy, so they exhibit complex nonlinearity and uncertainty, and these methods struggle to reveal their inherent laws. Later, scholars proposed the autoregressive conditional heteroscedasticity (ARCH) model, the generalized autoregressive conditional heteroscedasticity (GARCH) model, artificial neural networks (ANN), fuzzy neural networks (FNN), BP neural networks, RBF neural networks, support vector regression (SVR), and so on [2]. Currently, most stock price forecasts rely on a single method. Still, a single forecasting method cannot capture all the useful information in complex, nonlinear stock price data, so the accuracy of the forecast results is relatively low. To address the difficulty a single model has in fully using the available information, some scholars have proposed a stock price prediction method (CAR-SVM) that combines support vector machines (SVM) with an autoregressive model (CAR) and verified its effectiveness through a case study. However, the accuracy of a combined prediction often depends heavily on the accuracy of each single model, and a high correlation between the single-model predictions significantly reduces the accuracy of the combination. In addition, the choice of single models and the determination of the combination weights also affect the prediction accuracy to a large extent.

This article introduces error correction into stock price forecasting research and proposes improving the accuracy of stock price prediction by predicting and correcting the forecast error [3]. Firstly, the generalized autoregressive conditional heteroscedasticity (GARCH) model is used to make a preliminary forecast of stock prices. Then, a regression model is introduced to analyze and fit the unexplained part of the GARCH residual sequence and predict the future residuals. Finally, the predicted error is used to correct the preliminary forecast of the stock price to obtain the corrected forecast [4]. When selecting the regression model variables, correlation analysis is used to screen relevant influencing factors, and principal component analysis is used to extract the main information and reduce the dimensionality of the independent variables. The analysis of the Shanghai Composite Index sample data shows that the corrected prediction accuracy is significantly improved, which verifies the model's effectiveness.

Comprehensive modeling
Model introduction

The GARCH model is the generalized autoregressive conditional heteroscedasticity model. It was proposed by Bollerslev, who relaxed some of the constraints of the ARCH model. Since its introduction, the GARCH model has been widely used in the financial field [5]. Suppose the random variable $y_t$ can be expressed in the following form:

$$\begin{aligned} y_t &= a_0 + a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_p x_{t-p} + u_t \\ \sigma_t^2 &= \varpi_0 + \varpi_1 u_{t-1}^2 + \varpi_2 u_{t-2}^2 + \cdots + \varpi_q u_{t-q}^2 \end{aligned} \tag{1}$$

where $\sigma_t^2$ is the conditional variance. Then $u_t$ is said to follow an ARCH process of order q. Adding lagged conditional variance terms to formula (1) gives the generalized autoregressive conditional heteroscedasticity (GARCH) model:

$$\sigma_t^2 = \beta_0 + \beta_1 u_{t-1}^2 + \beta_2 u_{t-2}^2 + \cdots + \beta_q u_{t-q}^2 + \chi_1 \sigma_{t-1}^2 + \cdots + \chi_p \sigma_{t-p}^2 \tag{2}$$

Here $\beta_1 u_{t-1}^2 + \beta_2 u_{t-2}^2 + \cdots + \beta_q u_{t-q}^2$ is called the ARCH term and $\chi_1 \sigma_{t-1}^2 + \cdots + \chi_p \sigma_{t-p}^2$ is called the GARCH term. The coefficients must satisfy β0 > 0, β1, …, βq ≥ 0, and χ1, …, χp ≥ 0.
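The GARCH variance recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `garch_variance`, the start-up value `initial_variance`, and the numbers in the usage lines are assumptions made for the example.

```python
def garch_variance(residuals, beta0, betas, chis, initial_variance):
    """Conditional variances sigma_t^2 from the GARCH recursion.

    betas[i-1] multiplies u_{t-i}^2 (ARCH terms) and chis[j-1] multiplies
    sigma_{t-j}^2 (GARCH terms). Lags that fall before the start of the
    sample are replaced by initial_variance (an assumption of this sketch).
    """
    if beta0 <= 0 or any(b < 0 for b in betas) or any(c < 0 for c in chis):
        raise ValueError("need beta0 > 0 and non-negative ARCH/GARCH coefficients")
    variances = []
    for t in range(len(residuals)):
        value = beta0
        for i, beta in enumerate(betas, start=1):
            lagged_sq = residuals[t - i] ** 2 if t - i >= 0 else initial_variance
            value += beta * lagged_sq
        for j, chi in enumerate(chis, start=1):
            lagged_var = variances[t - j] if t - j >= 0 else initial_variance
            value += chi * lagged_var
        variances.append(value)
    return variances


# Hypothetical residuals and coefficients, chosen only to make the arithmetic easy.
example = garch_variance([0.01, -0.02, 0.015], beta0=1e-5, betas=[0.1],
                         chis=[0.8], initial_variance=1e-4)
```

With these illustrative inputs, the large residual at the second position raises the third conditional variance, which is the volatility clustering the GARCH term is meant to capture.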

Modeling principle

First, the GARCH model is used to predict the future stock price. Because of the limitations of the GARCH model, its residual sequence necessarily contains components that the model does not explain. Next, a regression model is fitted to the residual sequence and used to predict future residuals [6]. Finally, the preliminary GARCH predictions are corrected with these predicted residuals. To select the independent variables of the regression model, correlation analysis is used to screen candidate variables, and principal component analysis is used to remove the redundant information between them.

Modeling steps

Modeling in this paper includes three steps: (1) GARCH modeling on historical data to fit and predict the stock price sequence $\hat y_t$. (2) Regression modeling to fit and predict the GARCH residual sequence $\hat \varepsilon_t$. (3) Comprehensive modeling, which corrects the GARCH prediction $\hat y_t$ with the regression prediction $\hat \varepsilon_t$. The corrected prediction value is:

$$\hat y_t^\prime = \hat y_t + \hat \varepsilon_t \tag{3}$$
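The three steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the residual is taken as actual minus forecast so that adding the predicted residual moves the forecast toward the actual value, and the mean of past residuals stands in for the paper's regression model. All function names and numbers are hypothetical.

```python
def residual_sequence(actual, forecast):
    """Step 2 input: residuals of the preliminary forecast (assumed actual - forecast)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return [a - f for a, f in zip(actual, forecast)]


def mean_residual_predictor(past_residuals, horizon):
    """Stand-in residual model: repeats the mean of the past residuals.

    The paper uses a principal component regression here; this placeholder
    only keeps the sketch self-contained.
    """
    mean = sum(past_residuals) / len(past_residuals)
    return [mean] * horizon


def corrected_forecast(preliminary, predicted_residuals):
    """Step 3: corrected value = preliminary forecast + predicted residual."""
    if len(preliminary) != len(predicted_residuals):
        raise ValueError("one predicted residual is needed per forecast")
    return [y + e for y, e in zip(preliminary, predicted_residuals)]


# Hypothetical numbers: two in-sample days, one out-of-sample day.
past = residual_sequence(actual=[10.0, 11.0], forecast=[9.5, 10.5])
future = corrected_forecast([12.0], mean_residual_predictor(past, horizon=1))
```

Keeping the residual model as a separate function makes it easy to swap the placeholder for any other predictor without touching the correction step.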

The specific modeling and prediction process of the method proposed in this paper is shown in Figure 1.

Figure 1

GARCH model comprehensive modeling flow chart

Example analysis
Data source and preprocessing

This article selects the closing price of each trading day of the Shanghai Composite Index from October 09, 2016, to October 20, 2020, as a sample to analyze the latest development and changes of the index, giving a total of 984 valid sample data. First, the sample data are used to construct a GARCH model and predict the stock price of the following 17 trading days, which gives $\hat y_t$, t = 1, 2, ⋯, 17. Then the prediction errors $\varepsilon_t$, t = 1, 2, ⋯, 12, are used to build an error prediction model, which predicts the next 5 errors $\hat \varepsilon_t$, t = 1, 2, ⋯, 5. Formula (3) then gives the final predicted values. The closing index results from the game between the parties to the transactions of that day [7]. Let the daily closing price be $k_t$ and the logarithmic return be $CF_t$. Since the GARCH model is better suited to stationary time series, the closing price sequence $k_t$ must be transformed into a relatively stationary return sequence. The conversion formula is:

$$CF_t = \ln\left(\frac{k_t}{k_{t-1}}\right) \tag{4}$$
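The logarithmic return transformation can be written directly from its definition. The sketch below is illustrative; the function name `log_returns` is an assumption, and the three closing values are the first three actual values reported in the paper's GARCH forecast table, used here only as sample input.

```python
import math


def log_returns(closes):
    """Log return ln(k_t / k_{t-1}) for each consecutive pair of closing prices."""
    if any(price <= 0 for price in closes):
        raise ValueError("closing prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]


sample_closes = [2317.275, 2370.333, 2409.672]
sample_returns = log_returns(sample_closes)
```

A series of n closing prices yields n - 1 returns, and the returns add up to the log of the last price over the first, which is one reason log returns are convenient for time series work.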

Model parameter selection

Table 1. CFt sequence descriptive statistics.

| Statistic | Value |
|---|---|
| N | 984 |
| Mean | −0.00091 |
| Std Deviation | 0.019474 |
| Maximum | 0.1255 |
| Minimum | −0.0842 |
| Skewness | 0.188468 |
| Kurtosis | 6.830535 |
| Jarque-Bera | 607.4183 |
| Probability | 0 |

It can be seen from Table 1 that the kurtosis of the log return series CFt is 6.830535, significantly greater than the kurtosis of 3 of the normal distribution. The skewness is 0.188468. The Jarque-Bera (J-B) statistic is 607.4183, and its P-value is close to zero. The series therefore does not follow a normal distribution and shows the characteristic "sharp peak and fat tails."
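The Jarque-Bera statistic can be recomputed from the sample size, skewness, and kurtosis alone, using the standard formula JB = n/6 · (S² + (K − 3)²/4). The sketch below applies that formula to the reported values; the function name is an assumption of this example.

```python
def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness, and (non-excess) kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Values reported for the CFt series.
jb = jarque_bera(n=984, skewness=0.188468, kurtosis=6.830535)
```

The result agrees with the reported 607.4183 to within rounding. Under normality the statistic is asymptotically chi-square with 2 degrees of freedom, whose 5% critical value is about 5.99, so normality is rejected by a wide margin.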

It can be seen from Table 2 that the ADF test statistic of the series ɛt is −28.79395. It is less than the critical value −3.436782 at the 1% significance level, and therefore also less than the other two critical values. The series ɛt can thus be regarded as a stationary time series [8]. Next, the autocorrelation and partial autocorrelation of the data series are tested. Combined with the minimum AIC criterion, this leads to the conclusion that ARMA(2,2) gives the best fit. Based on this model, the plot of the squared residuals of the daily logarithmic return of the Shanghai Composite Index shows volatility clustering, which suggests that an ARCH effect may be present. An ARCH-LM test is therefore performed on the squared residuals. The test statistic is LM = T·R² = 54.32103, and the F statistic is 5.688537. The probability values P are both zero. The results show that the ARCH effect in the residuals is very significant, so the residuals of the ARMA(2,2) model exhibit autoregressive conditional heteroscedasticity.
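The LM = T·R² statistic comes from an auxiliary regression of the squared residuals on their own lags. The sketch below covers only the one-lag case, where R² equals the squared correlation between u_t² and u_{t−1}². The lag order used in the paper is not stated, so the single lag, the function name, and the sample residuals are assumptions of this example.

```python
import math


def arch_lm_one_lag(residuals):
    """ARCH-LM statistic T * R^2 for the regression of u_t^2 on a constant and u_{t-1}^2."""
    squared = [u * u for u in residuals]
    y, x = squared[1:], squared[:-1]
    n_obs = len(y)
    if n_obs < 3:
        raise ValueError("need at least four residuals")
    mean_x, mean_y = sum(x) / n_obs, sum(y) / n_obs
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("squared residuals have no variation")
    r_squared = sxy ** 2 / (sxx * syy)  # simple regression: R^2 = squared correlation
    return n_obs * r_squared


# Hypothetical residuals whose squares double each period (1, 2, 4, 8, 16).
lm_example = arch_lm_one_lag([1.0, math.sqrt(2.0), 2.0, math.sqrt(8.0), 4.0])
```

Under the null hypothesis of no ARCH effect, the statistic with q lags is asymptotically chi-square with q degrees of freedom, so a one-lag statistic above roughly 3.84 would reject at the 5% level.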

Table 2. Stationarity (ADF) test results.

| Variable | ADF statistic | Significance level | Critical value |
|---|---|---|---|
| Lnɛi | −28.79395 | 1% | −3.436782 |
| | | 5% | −2.864269 |
| | | 10% | −2.568275 |

Figure 2

The square of the residual plot

The parameters of the GARCH model are estimated and substituted into the GARCH regression equations [9]. Mean equation:

$$CF_t = -0.000530 - 0.520544\,CF_{t-1} + 0.212419\,CF_{t-2} + 0.539072\,u_{t-1} - 0.222265\,u_{t-2}$$

Variance equation:

$$\sigma_t^2 = 2.40 \times 10^{-6} + 0.047477\,u_{t-1}^2 + 0.946233\,\sigma_{t-1}^2$$

The coefficients of ARCH(1) (i.e., β1) and GARCH(1) (i.e., χ1) in the variance equation are both positive. They satisfy the parameter constraints, and β1 + χ1 = 0.99371 < 1 [10]. ARCH-LM tests are then applied to the residuals of the equation at lags 1, 10, and 20. Neither the F statistic nor the Obs*R-squared statistic is significant. This shows that no ARCH effect remains in the standardized residuals, and it indicates that the variance equation is adequately specified. We use the above GARCH model to make preliminary predictions, and the prediction results ŷt are shown in Table 3.
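The reported coefficients allow a quick check of persistence and of the long-run variance implied by a GARCH(1,1) variance equation, ω / (1 − β1 − χ1). The sketch below reads 2.40E−06 as the constant, 0.047477 as β1, and 0.946233 as χ1, following the text; the variable and function names are assumptions of this example.

```python
import math

OMEGA = 2.40e-06   # constant term of the variance equation
BETA1 = 0.047477   # ARCH(1) coefficient
CHI1 = 0.946233    # GARCH(1) coefficient

persistence = BETA1 + CHI1
long_run_variance = OMEGA / (1.0 - persistence)
long_run_std = math.sqrt(long_run_variance)


def next_variance(previous_residual, previous_variance):
    """One-step-ahead conditional variance from the GARCH(1,1) recursion."""
    return OMEGA + BETA1 * previous_residual ** 2 + CHI1 * previous_variance
```

The implied long-run standard deviation is about 0.0195, close to the sample standard deviation 0.019474 of the CFt series, and a persistence just below 1 means that volatility shocks decay slowly.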

Screening and processing of regression variables

We take the opening price, highest price, lowest price, transaction volume, and transaction value of the Shanghai Composite Index on the corresponding dates, transform them as in formula (4), and denote them by X1, X2, X3, X4, and X5, respectively. We then analyze their correlation with the fitting error sequence. It turns out that only X1, X2, and X3 are correlated with the fitting error [11]. Principal component analysis is then used to extract the information in these influencing factors and reduce the dimensionality of the independent variables. The variance contributions of the independent variables and principal components are shown in Figure 3. The first two principal components account for 97% of the variability, and the first principal component alone accounts for nearly 80%. Therefore, the first and second principal components are selected as the final independent variables of the regression model and denoted by Z1 and Z2, respectively.

Figure 3

Variance contribution graph of independent variables and principal components

Table 4 shows the principal component loadings. The first principal component is positively correlated with each of the independent variables [12]. The second principal component is negatively correlated with the independent variables X1 and X3 and positively correlated with the independent variable X2. The GARCH residuals are modeled accordingly. It can be seen from Table 3 that the available residual information (that is, the dependent variable $\hat \varepsilon_t$ of the regression equation) covers the 12 trading days from October 21, 2020, to November 07, 2020.

Table 3. GARCH forecast results.

| Date | Actual value | GARCH forecast (ŷt) | Error/% |
|---|---|---|---|
| 2020/10/21 | 2317.275 | 2328.8085 | 0.50 |
| 2020/10/24 | 2370.333 | 2317.3251 | −2.24 |
| 2020/10/25 | 2409.672 | 2368.1104 | −1.72 |
| 2020/10/26 | 2427.48 | 2409.1882 | −0.75 |
| 2020/10/27 | 2435.614 | 2425.5667 | −0.41 |
| 2020/10/28 | 2473.41 | 2434.8203 | −1.56 |
| 2020/10/31 | 2468.25 | 2471.70 | 0.14 |
| 2020/11/1 | 2470.019 | 2467.2555 | −0.11 |
| 2020/11/2 | 2504.11 | 2468.4613 | −1.42 |
| 2020/11/3 | 2508.09 | 2502.9791 | −0.20 |
| 2020/11/4 | 2528.294 | 2506.6032 | −0.86 |
| 2020/11/7 | 2509.799 | 2527.08 | 0.69 |
| 2020/11/8 | 2503.836 | 2508.3703 | 0.18 |
| 2020/11/9 | 2524.919 | 2502.5857 | −0.88 |
| 2020/11/10 | 2479.54 | 2523.5189 | 1.77 |
| 2020/11/11 | 2481.08 | 2478.2691 | −0.11 |
| 2020/11/14 | 2528.714 | 2479.7309 | −1.94 |
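The error column of the GARCH forecast table is consistent with a percentage error measured as (forecast − actual) / actual. The paper does not state this formula, so the sketch below infers it from the listed numbers and checks it on the first three rows; the function name is an assumption.

```python
def percent_error(actual, forecast):
    """Forecast error as a percentage of the actual value (inferred sign convention)."""
    return (forecast - actual) / actual * 100.0


# (date, actual value, GARCH forecast) for the first three rows of the table.
first_rows = [
    ("2020/10/21", 2317.275, 2328.8085),
    ("2020/10/24", 2370.333, 2317.3251),
    ("2020/10/25", 2409.672, 2368.1104),
]
first_errors = [round(percent_error(a, f), 2) for _, a, f in first_rows]
```

A positive value therefore means that the GARCH forecast overshoots the actual closing value.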

Table 4. Principal component loading matrix.

| Independent variable | 1st principal component | 2nd principal component | 3rd principal component |
|---|---|---|---|
| X1 | 0.617 | −0.3035 | −0.7261 |
| X2 | 0.5031 | 0.8616 | 0.0674 |
| X3 | 0.6052 | −0.4069 | 0.6842 |
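Principal component scores are linear combinations of the input variables weighted by the loadings. The sketch below builds the scores from the loading matrix and checks that the loading vectors are approximately orthonormal, as PCA loadings should be. It assumes X1, X2, and X3 have already been centered or standardized in the same way as in the paper, which the text does not specify; all names are hypothetical.

```python
# Loadings of X1, X2, X3 on the 1st, 2nd, and 3rd principal components.
LOADINGS = {
    "X1": (0.617, -0.3035, -0.7261),
    "X2": (0.5031, 0.8616, 0.0674),
    "X3": (0.6052, -0.4069, 0.6842),
}


def component_scores(x1, x2, x3):
    """Scores (Z1, Z2, Z3) for one observation of the preprocessed variables."""
    values = {"X1": x1, "X2": x2, "X3": x3}
    return tuple(
        sum(LOADINGS[name][k] * values[name] for name in LOADINGS)
        for k in range(3)
    )


def loading_dot(k, m):
    """Dot product between the loading vectors of components k and m."""
    return sum(LOADINGS[name][k] * LOADINGS[name][m] for name in LOADINGS)


norms = [loading_dot(k, k) for k in range(3)]
cross = [loading_dot(0, 1), loading_dot(0, 2), loading_dot(1, 2)]
z1, z2, z3 = component_scores(1.0, 1.0, 1.0)
```

Only Z1 and Z2 are carried forward into the regression, since the first two components account for most of the variability.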

We use the principal component loading matrix to construct the first and second principal components, and then use EViews 7.0 to establish the regression equation based on these 12 sets of data as:

$$\begin{aligned} \hat \varepsilon_t = {} & -0.003632 - 0.278330\,Z1_t - 0.985463\,Z2_t \\ & (-1.673414) \quad (-2.704689) \quad (-4.020866) \end{aligned}$$

The t statistics of the estimated coefficients are given in parentheses. The t statistics of the principal component regression coefficients are significant [13], with corresponding probability values Prob. < 0.05. The regression equation has R² = 0.7197 and adjusted $\bar R^2 = 0.6396$, which indicates a reasonably good fit.
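The fitted equation can be applied directly to new principal component scores, and the reported t statistics can be compared with a critical value. The sketch below assumes 12 observations and 3 estimated coefficients, hence 9 degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.262. The function name and the sample scores are hypothetical.

```python
INTERCEPT, COEF_Z1, COEF_Z2 = -0.003632, -0.278330, -0.985463
T_STATS = {"intercept": -1.673414, "Z1": -2.704689, "Z2": -4.020866}
T_CRITICAL_9_DF = 2.262  # two-sided 5% critical value, 9 degrees of freedom


def predict_residual(z1, z2):
    """Predicted GARCH residual from the two principal component scores."""
    return INTERCEPT + COEF_Z1 * z1 + COEF_Z2 * z2


significant = {name: abs(t) > T_CRITICAL_9_DF for name, t in T_STATS.items()}
example_residual = predict_residual(0.01, -0.005)  # illustrative scores only
```

Under these assumptions both principal component coefficients pass the 5% test, while the intercept does not.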

We substitute the principal component analysis results (the first and second principal components) into the above equation to predict the GARCH residuals from November 8, 2020, to November 14, 2020.

Analysis of calibration results

Correct the GARCH prediction according to the predicted residual value to get the final predicted value. It can be seen from Table 5 that the corrected error is significantly reduced [14]. We can judge that the regression model has a certain contribution to improving the accuracy of stock price prediction modeling.

Table 5. Regression equation residual prediction results.

| Date | Actual value | GARCH residual/% | Regression predicted residual/% | After correction/% |
|---|---|---|---|---|
| 2020/11/8 | 2503.836 | 0.18 | 0.34 | −0.16 |
| 2020/11/9 | 2524.919 | −0.88 | −0.23 | −0.65 |
| 2020/11/10 | 2479.536 | 1.77 | 0.63 | 1.14 |
| 2020/11/11 | 2481.084 | −0.32 | −0.11 | −0.21 |
| 2020/11/14 | 2528.714 | −1.94 | −1.01 | −0.93 |
| MAPE/% | | 0.64 | | 0.35 |
| MSPE/% | | 0.38 | | 0.20 |

The calculation results in the above table show that the corrected mean absolute percentage error (MAPE) and mean square percentage error (MSPE) are 0.35% and 0.20%, lower than the 0.64% and 0.38% of the GARCH forecast before correction. The results indicate that the regression model predicts the residuals well, which reduces the error of the GARCH model and compensates for its lack of external factors. The comprehensive model therefore captures both the volatility of stock prices and the influence of external factors on them, and its prediction accuracy is higher than that of the GARCH model alone.

Conclusion

This paper introduces the idea of error correction into the study of stock price forecasting to establish a stock price forecasting model that integrates GARCH and regression methods. Firstly, a generalized autoregressive conditional heteroscedasticity prediction model is established. Secondly, an error regression prediction model is constructed using the error sequence as a sample, and the subsequent points of the sequence are predicted. Finally, the error forecast value is used to correct the preliminary forecast value of the stock price to obtain the corrected stock price forecast value. The analysis of the Shanghai Composite Index sample data shows that the corrected prediction accuracy has been significantly improved, which verifies the model's effectiveness.

References

[1] Hong, H., Xu, S., & Lee, C. C. Investor herding in the China stock market: an examination of CHINEXT. Rom J Econ Forecast., 2020; 23(4): 47–61.

[2] Houssein, E. H., Dirar, M., Hussain, K., & Mohamed, W. M. Assess deep learning models for Egyptian exchange prediction using nonlinear artificial neural networks. Neural Computing and Applications., 2021; 33(11): 5965–5987. doi: 10.1007/s00521-020-05374-9

[3] Mallikarjuna, M., & Rao, R. P. Evaluation of forecasting methods from selected stock market returns. Financial Innovation., 2019; 5(1): 1–16. doi: 10.1186/s40854-019-0157-x

[4] Gururaj, V., Shriya, V. R., & Ashwini, K. Stock market prediction using linear regression and support vector machines. Int J Appl Eng Res., 2019; 14(8): 1931–1934.

[5] Hoque, M. E., & Zaidi, M. A. S. Global and country-specific geopolitical risk uncertainty and stock return of fragile emerging economies. Borsa Istanbul Review., 2020; 20(3): 197–213. doi: 10.1016/j.bir.2020.05.001

[6] Uddin, M. A., Hoque, M. E., & Ali, M. H. International economic policy uncertainty and stock market returns of Bangladesh: evidence from linear and nonlinear model. Quantitative Finance and Economics., 2020; 4(2): 236–251. doi: 10.3934/QFE.2020011

[7] Yuanyuan, C., Rui, W., Bin, Z. & Griffith, W. Temporal association rules discovery algorithm based on improved index tree. Applied Mathematics and Nonlinear Sciences., 2021; 6(1): 115–128. doi: 10.2478/amns.2021.1.00016

[8] Shirakol, S., Kalyanshetti, M. & Hosamani, S. QSPR Analysis of certain Distance Based Topological Indices. Applied Mathematics and Nonlinear Sciences., 2019; 4(2): 371–386. doi: 10.2478/AMNS.2019.2.00032

[9] Adekoya, O. B. Persistence and efficiency of OECD stock markets: linear and nonlinear fractional integration approaches. Empirical Economics., 2021; 61(3): 1415–1433. doi: 10.1007/s00181-020-01913-4

[10] Wu, W., Lee, C. C., Xing, W., & Ho, S. J. The impact of the COVID-19 outbreak on Chinese-listed tourism stocks. Financial Innovation., 2021; 7(1): 1–18. doi: 10.1186/s40854-021-00240-6

[11] Senarathne, C. W. The information flow interpretation of margin debt value data: Evidence from New York Stock Exchange. Applied Economics Journal., 2019; 26(1): 45–70.

[12] Senarathne, C. W., & Jianguo, W. The relationship between human behavior and stock returns in efficient stock markets: the mood-effect under a cultural perspective. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu., 2019; 63(2): 207–220. doi: 10.15611/pn.2019.2.18

[13] Pang, X., Zhou, Y., Wang, P., Lin, W., & Chang, V. An innovative neural network approach for stock market prediction. The Journal of Supercomputing., 2020; 76(3): 2098–2118. doi: 10.1007/s11227-017-2228-y

[14] Nikou, M., Mansourfar, G., & Bagherzadeh, J. Stock price prediction using DEEP learning algorithm and its comparison with machine learning algorithms. Intelligent Systems in Accounting, Finance and Management., 2019; 26(4): 164–174. doi: 10.1002/isaf.1459
