Back propagation mathematical model for stock price prediction


Introduction

Price prediction in equity markets is of great practical and theoretical interest. On the one hand, relatively accurate prediction can bring large profits to investors; many market participants, especially institutional ones, spend considerable time and money collecting and analysing relevant information before making investment decisions. On the other hand, researchers often use the question of whether the price can be forecast as a tool to check market efficiency, and they invent, apply or adjust different models to improve predictive power. Finding a good method to forecast stock prices more accurately will remain an enduring topic in both academia and the financial industry. Equity price prediction is regarded as a challenging task in financial time series prediction because the stock market is essentially dynamic, nonlinear, complicated, nonparametric and chaotic in nature [1]. In addition, many macroeconomic factors, such as political events, company policies, general economic conditions, commodity price indices, interest rates, investor expectations, institutional investors' choices and the psychology of investors, also influence the price [2].

In this paper, we apply five artificial intelligence (AI) models to the prediction problem. Among AI models, the back propagation neural network (BPNN), radial basis function neural network (RBFNN), general regression neural network (GRNN), support vector machine regression (SVMR) and least squares support vector machine regression (LS-SVMR) are the most widely used and mature methods. The BPNN has been used successfully in many fields, such as engineering [3], power forecasting [4], time series forecasting [5], stock index forecasting [6] and stock price variation prediction [7]. BPNN is also useful in the economic field: Lu and Bai [8] proposed a hybrid forecasting model [wavelet denoising-based back propagation (BP)], which first decomposes the original data into multiple layers by wavelet transform and then builds a BPNN model on the low-frequency signal of each layer to predict the Shanghai Composite Index (SCI) closing price. The RBFNN is a feed-forward neural network with a simple structure and a single hidden layer. Müller et al. [9] applied RBFNN as a tool for nonlinear pattern recognition to correct the estimation error of linear models when predicting two stock series on the Shanghai and Shenzhen stock exchanges. In Osuna's study [10], the author demonstrated RBFNN's effectiveness in financial time series forecasting. RBFNN was proposed to overcome the main drawback of BPNN, namely its tendency to fall into local minima during training. RBFNN has also been used in various forecasting areas with good performance and has demonstrated advantages over BPNN in some applications [11]. The GRNN, put forward by Specht [12], has shown its effectiveness in pattern recognition [13], stock price prediction [14] and groundwater level prediction [15]. Tan et al. [16] showed the forecasting ability of GRNN in the prediction of closing stock prices; however, their research lacks a comparison with other data mining models, which is also a limitation of the other references cited in this paper. The support vector machine (SVM), first developed by Vapnik [17], is based on statistical learning theory.
Owing to its successful performance in classification tasks [18] and regression tasks, especially in time series prediction and finance-related applications, SVM has drawn significant attention and has been studied intensively. By using the structural risk minimisation principle to turn the solving process into a convex quadratic programming problem, SVM achieves better generalisation performance; moreover, its solution is unique and globally optimal. The LS-SVMR, also based on the structural risk minimisation principle, is able to approximate any nonlinear system. As a reformulation of the SVM algorithm, LS-SVMR overcomes the drawbacks of local optima and overfitting found in traditional machine learning algorithms. To the best of our knowledge, few studies in the literature compare the effectiveness of the five algorithms reviewed above. In this study, we present such a comparison by using real price data to evaluate the performance of BPNN, RBFNN, GRNN, SVMR and LS-SVMR in predicting stock prices. The remainder of this paper is organised as follows. Section 2 introduces the five models. Section 3 describes the data, the choice of hyper-parameters and the empirical results, and the final section concludes.

Methodology
BP neural networks

A neural network generally contains one input layer, one or more hidden layers and one output layer. Supposing that the total number of layers is $L$, we use $l$ to index a single layer: $l = 1$ corresponds to the input layer, $l = 2, \ldots, L-1$ to the hidden layers and $l = L$ to the output layer. For example, Figure 1 shows a neural network containing only one hidden layer, which means $L = 3$. Each layer contains one or more neurons; in Figure 1, the input layer contains three neurons, the hidden layer contains four neurons and the output layer contains only one neuron.

Fig. 1

Structure of a three-layer neural network

We use $w^l_{jk}$ to denote the weight for the connection from the $k$-th neuron in the $(l-1)$-th layer to the $j$-th neuron in the $l$-th layer. For illustration, some weights are listed on the arrows in Figure 1; in our notation, $w^2_{11} = 2$, $w^2_{43} = 4$, $w^3_{11} = 2$, $w^3_{12} = 3$, $w^3_{13} = 4$ and $w^3_{14} = 5$. Explicitly, we use $b^l_j$ for the bias of the $j$-th neuron in the $l$-th layer, and $a^l_j$ for the activation of the $j$-th neuron in the $l$-th layer. With these notations, the activation $a^l_j$ of the $j$-th neuron in the $l$-th layer is related to the activations in the $(l-1)$-th layer by the equation
$$a^l_j = \sigma\Big(\sum_k w^l_{jk}\, a^{l-1}_k + b^l_j\Big), \qquad (2.1)$$
where the sigmoid function is defined as $\sigma(z) = \dfrac{1}{1 + \exp(-z)}$.

To rewrite this expression in matrix form, we define a weight matrix $w^l$ for each layer $l$. The entries of $w^l$ are just the weights connecting to the $l$-th layer of neurons; that is, the entry in the $j$-th row and $k$-th column is $w^l_{jk}$. Similarly, for each layer $l$ we define a bias vector $b^l$ whose components are the biases $b^l_j$, one component for each neuron in the $l$-th layer. Finally, we define an activation vector $a^l$ whose components are the activations $a^l_j$.

With these notations in mind, (2.1) can be rewritten in the compact vectorised form
$$a^l = \sigma\big(w^l a^{l-1} + b^l\big).$$

Let $z^l$ be the weighted input to the neurons in layer $l$, that is,
$$z^l = w^l a^{l-1} + b^l.$$
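As an illustration, the forward pass defined by the two preceding equations can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the network parameters and the input vector below are random placeholders, not the values of Figure 1.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: z^l = w^l a^{l-1} + b^l and a^l = sigma(z^l) for l = 2, ..., L."""
    a = x
    for w, b in zip(weights, biases):
        z = w @ a + b        # weighted input z^l
        a = sigmoid(z)       # activation a^l
    return a

# A 3-4-1 network (L = 3), as in Figure 1, with random placeholder parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [rng.normal(size=4), rng.normal(size=1)]

x = np.array([0.2, 0.5, 0.1])    # activations of the three input neurons
print(forward(x, weights, biases))
```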

The cost function is defined by the quadratic form
$$C = \frac{1}{2n}\sum_x \|y(x) - a^L(x)\|^2 \overset{\text{def}}{=} \frac{1}{n}\sum_x C_x,$$
where $n$ is the number of training samples and $C_x = \frac{1}{2}\|y(x) - a^L(x)\|^2$ is the cost of a single sample $x$.

We recall that the Hadamard product $s \odot t$ of two vectors $s$ and $t$ of the same length is defined by $(s \odot t)_j = s_j t_j$. The intermediate errors and the gradients are computed by the backpropagation equations
$$\delta^L = \nabla_a C \odot \sigma'(z^L), \qquad \delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l), \quad 2 \le l \le L-1,$$
$$\frac{\partial C}{\partial b^l_j} = \delta^l_j, \qquad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j,$$
or, in matrix form,
$$\frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial w^l} = \delta^l \,(a^{l-1})^T.$$

With learning rate $\eta$, the weights and biases are updated by
$$w^l_{jk} \to w^l_{jk} - \eta\,\frac{\partial C}{\partial w^l_{jk}}, \qquad b^l_j \to b^l_j - \eta\,\frac{\partial C}{\partial b^l_j}.$$
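The following sketch puts the backpropagation equations and the update rule together for a single training sample with quadratic cost $C_x = \frac{1}{2}\|y - a^L\|^2$. It is an illustrative NumPy version rather than the code used for the experiments in this paper; the network shape, the sample and the learning rate are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_step(x, y, weights, biases, eta=0.01):
    """One gradient-descent step on C_x = 1/2 ||y - a^L||^2 for a single sample (x, y)."""
    # Forward pass, storing the weighted inputs z^l and activations a^l.
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)

    # Output error: delta^L = grad_a C (Hadamard) sigma'(z^L), with grad_a C = a^L - y.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grads_w = [np.outer(delta, activations[-2])]
    grads_b = [delta]

    # Back-propagate: delta^l = ((w^{l+1})^T delta^{l+1}) (Hadamard) sigma'(z^l).
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grads_w.insert(0, np.outer(delta, activations[-l - 1]))
        grads_b.insert(0, delta)

    # Gradient-descent update: w -> w - eta dC/dw, b -> b - eta dC/db.
    new_weights = [w - eta * gw for w, gw in zip(weights, grads_w)]
    new_biases = [b - eta * gb for b, gb in zip(biases, grads_b)]
    return new_weights, new_biases

# Placeholder 3-4-1 network and one placeholder sample.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [rng.normal(size=4), rng.normal(size=1)]
weights, biases = backprop_step(np.array([0.2, 0.5, 0.1]), np.array([0.4]), weights, biases)
```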

Owing to the additivity of the cost over samples, we can adopt stochastic gradient descent to speed up learning. To make this precise, stochastic gradient descent works by randomly picking out a small number $m$ of training inputs. We label these random training inputs $X_1, X_2, \ldots, X_m$ and refer to them as a mini-batch. Provided the mini-batch size $m$ is large enough, we expect the average value of the $\nabla C_{X_j}$ to be roughly equal to the average over all $\nabla C_x$, that is,
$$\frac{1}{m}\sum_{j=1}^m \nabla C_{X_j} \approx \frac{1}{n}\sum_x \nabla C_x = \nabla C.$$

Supposing that $w_k$ and $b_l$ denote the weights and biases, respectively, in our neural network, stochastic gradient descent picks out a randomly chosen mini-batch of training inputs and trains with those, giving the updates
$$w_k \to w_k' = w_k - \frac{\eta}{m}\sum_{j=1}^m \frac{\partial C_{X_j}}{\partial w_k}, \qquad b_l \to b_l' = b_l - \frac{\eta}{m}\sum_{j=1}^m \frac{\partial C_{X_j}}{\partial b_l}.$$

Here the sums are over all the training examples Xj in the current mini-batch.
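A schematic implementation of the mini-batch update is sketched below. The per-sample gradient function and the toy linear model are purely illustrative placeholders (they are not the BP network of this paper); the point is only the random mini-batch selection and the averaged update.

```python
import numpy as np

def sgd(params, data, grad_cost, eta=0.01, batch_size=10, epochs=5, seed=0):
    """Stochastic gradient descent: update the parameters with the average gradient
    over a randomly chosen mini-batch X_1, ..., X_m instead of the full sample."""
    rng = np.random.default_rng(seed)
    x, y = data
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # The mini-batch average of grad C_{X_j} approximates grad C.
            grads = [grad_cost(params, x[i], y[i]) for i in batch]
            params = params - eta * np.mean(grads, axis=0)
    return params

# Toy example: fit y = a*x + b by SGD on a quadratic cost (params = [a, b]).
def grad_cost(params, xi, yi):
    residual = params[0] * xi + params[1] - yi
    return np.array([residual * xi, residual])

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200)
y = 2.0 * x + 0.5 + rng.normal(scale=0.05, size=200)
print(sgd(np.zeros(2), (x, y), grad_cost, eta=0.5, batch_size=10, epochs=50))
```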

Radial basis function (RBF) networks

RBF networks typically have three layers: an input layer, a hidden layer with a nonlinear RBF activation function and a linear output layer. Suppose that the hidden layer has $I$ neurons, and that the $i$-th neuron is centred at $c_i$ with preferred value $w_i$. The input is modelled as a vector of real numbers $x \in \mathbb{R}^n$, and the output of the network is then a scalar function of the input vector, $\varphi: \mathbb{R}^n \to \mathbb{R}$, given by
$$\varphi(x) = \frac{\sum_{i=1}^I w_i\, \rho(\|x - c_i\|^2)}{\sum_{i=1}^I \rho(\|x - c_i\|^2)},$$
where $\rho(\|x - c_i\|^2) = \exp(-\beta_i \|x - c_i\|^2)$. It is emphasised that the output $\varphi(x)$ for a given input $x$ is the weighted average of the $w_i$ with weights $\rho(\|x - c_i\|^2)$. The cost function is defined by the quadratic form
$$C = \frac{1}{2n}\sum_x \|y(x) - \varphi(x)\|^2 = \frac{1}{n}\sum_x C_x.$$

The parameters $w_i$, $c_i$ and $\beta_i$ are selected by minimising $C$.
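To make the normalised RBF output and the associated cost concrete, here is a small NumPy sketch; the centres, preferred values, inputs and $\beta_i$ are placeholder values, and the fitting of these parameters by minimising $C$ is not shown.

```python
import numpy as np

def rbf_predict(x, centers, w, beta):
    """Normalised RBF output: weighted average of w_i with weights rho(||x - c_i||^2)."""
    rho = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    return np.sum(w * rho) / np.sum(rho)

def cost(X, y, centers, w, beta):
    """Quadratic cost C = 1/(2n) sum_x ||y(x) - phi(x)||^2."""
    preds = np.array([rbf_predict(x, centers, w, beta) for x in X])
    return 0.5 * np.mean((y - preds) ** 2)

# Illustrative data: I = 5 centres in R^3 with preferred values w_i and widths beta_i.
rng = np.random.default_rng(0)
centers = rng.uniform(0, 1, size=(5, 3))
w = rng.uniform(0, 1, size=5)
X = rng.uniform(0, 1, size=(20, 3))
y = X.sum(axis=1) / 3.0
print(cost(X, y, centers, w, beta=np.full(5, 2.0)))
```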

General regression neural networks

GRNN belongs essentially to the family of radial basis neural networks and was suggested by D.F. Specht in 1991. Recalling the framework of the RBF network, the number of neurons in the hidden layer is now the same as the sample size $n$ of the training data. Moreover, the centre of the $i$-th neuron is just the $i$-th sample $x_i$, and the preferred value $w_i$ is set to the desired output $y_i = y(x_i)$. The output for a new input $x$ is then
$$\varphi(x) = \frac{\sum_{i=1}^n y_i\, \rho(\|x - x_i\|^2)}{\sum_{i=1}^n \rho(\|x - x_i\|^2)},$$
where $\rho(\|x - x_i\|^2) = \exp(-\beta \|x - x_i\|^2)$. It may be emphasised that the output for the input $x$ is just the weighted average of the $y_i$ with weights $\rho(\|x - x_i\|^2)$. We remark that GRNN directly produces a predicted value without a training process; one only needs to select a suitable smoothing parameter $\beta$ to implement GRNN.

If the number of neurons in the hidden layer is kept at the sample size $n$ of the training data for every prediction, we call the model a static GRNN. If, instead, we add a neuron for each new observation as it arrives, we call the model a dynamic GRNN. We choose a dynamic GRNN in the current paper since it has stronger predictive power: a dynamic GRNN has a long memory and is updated in a timely manner.
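A minimal sketch of a dynamic GRNN in NumPy is given below; it is illustrative only (the paper's implementation is not this code), and the class name, the toy price series and the value of $\beta$ are placeholders.

```python
import numpy as np

class DynamicGRNN:
    """GRNN: the centres are the training samples x_i and the preferred values are y_i.
    The dynamic variant absorbs every new observation before the next prediction."""

    def __init__(self, beta):
        self.beta = beta            # smoothing parameter
        self.X, self.y = [], []

    def add(self, x, y):
        self.X.append(np.asarray(x))
        self.y.append(y)

    def predict(self, x):
        X = np.array(self.X)
        y = np.array(self.y)
        rho = np.exp(-self.beta * np.sum((X - x) ** 2, axis=1))
        return np.sum(y * rho) / np.sum(rho)   # weighted average of the y_i

# One-step-ahead forecasting on a toy price series, mimicking the walk-forward use.
prices = np.linspace(2.0, 5.0, 60) + np.random.default_rng(0).normal(scale=0.05, size=60)
model = DynamicGRNN(beta=20.0)
for i in range(len(prices) - 4):
    model.add(prices[i:i + 3], prices[i + 3])   # x_i = (S_i, S_{i+1}, S_{i+2}), y_i = S_{i+3}
print(model.predict(prices[-4:-1]))             # forecast of the last price
```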

Support vector regression

A version of SVM for regression (SVR) was proposed in 1996 by Vladimir N. Vapnik et al. The model aims to find a linear function of $x$ to predict $y$, namely,
$$f(x) = w \cdot x + b. \qquad (2.5)$$

Let $\varepsilon$ be the error tolerance. One then wants to find optimal $w$ and $b$ such that
$$\sum_{i=1}^n |f(x_i) - y_i| \le \varepsilon.$$

A modified problem is to solve
$$\min \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad \|Xw + bI_n - Y\| \le \varepsilon.$$

Here $Y = (y_1, \cdots, y_n)^T$, $X$ is the matrix whose $i$-th row is $x_i^T$ (so that the $i$-th column of $X^T$ is just $x_i$) and $I_n$ denotes the $n$-dimensional vector of ones.

The linear predictor (2.5) cannot reveal a possible nonlinear relation between $x$ and $y$, which is its main limitation. For a general (thus possibly nonlinear) function $\phi: \mathbb{R}^m \to \mathbb{R}^m$, with $m$ being the dimension of $x$, one may adopt the predictor
$$f(x) = w^T \phi(x) + b.$$

Different choices of $\phi$ correspond to different kernel functions in applications; the next subsection lists several commonly used kernels.
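The SVMR experiments in this paper were run with standard R packages; purely as an illustration of the formulation above, $\varepsilon$-SVR with a chosen kernel can also be fitted with scikit-learn, as in the sketch below. The data, kernel choice and parameter values here are placeholders, not those used for the stock data.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y depends nonlinearly on x in R^3.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# epsilon-SVR with an RBF kernel; "linear", "poly" and "sigmoid" kernels
# correspond to the other choices of phi discussed in the next subsection.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X[:160], y[:160])
pred = model.predict(X[160:])
print(np.mean((pred - y[160:]) ** 2))   # out-of-sample MSE
```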

Least squares SVM

For some function $\phi: \mathbb{R}^m \to \mathbb{R}^m$, suppose that we instead use $f(x) = w^T \phi(x) + b$ to predict $y(x)$. Replacing the inequality constraint by an equality constraint, the least squares SVMR reads
$$\min \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\|\xi\|^2 \quad \text{s.t.} \quad \phi(X)w + bI_n - Y = \xi.$$

Here $\phi(X) = (\phi(x_1), \cdots, \phi(x_n))^T$. Compared to SVM, LS-SVM has a lower computational cost.

The solution of the LS-SVM regression is obtained from the Lagrangian
$$L(w, b, \xi, \lambda) = \frac{1}{2}\|w\|^2 + \frac{\gamma}{2}\|\xi\|^2 - \lambda^T\big(\phi(X)w + bI_n - Y - \xi\big).$$
Setting the partial derivatives to zero gives the optimality conditions
$$\begin{cases} \dfrac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \phi(X)^T \lambda = \sum_{i=1}^n \lambda_i \phi(x_i), \\ \dfrac{\partial L}{\partial b} = 0 \;\Rightarrow\; I_n^T \lambda = \sum_{i=1}^n \lambda_i = 0, \\ \dfrac{\partial L}{\partial \xi} = 0 \;\Rightarrow\; \lambda = -\gamma \xi, \\ \dfrac{\partial L}{\partial \lambda} = 0 \;\Rightarrow\; \phi(X)w + bI_n - Y = \xi. \end{cases}$$
Eliminating $w$ and $\xi$ leads to the linear system
$$\begin{pmatrix} 0 & I_n^T \\ I_n & \Omega + \gamma^{-1} I_{nn} \end{pmatrix} \begin{pmatrix} b \\ \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ Y \end{pmatrix},$$
where $I_{nn}$ is the $n \times n$ identity matrix and $\Omega$ is the $n \times n$ matrix defined by $\Omega_{ij} = \phi(x_i)^T \phi(x_j) = K(x_i, x_j)$. Once this system is solved, the predicted value $f(x)$ for an input $x$ is given by
$$f(x) = \lambda^T \phi(X)\phi(x) + b = \sum_{i=1}^n \lambda_i K(x_i, x) + b.$$

The kernel function $K(x_i, x_j)$ can take many forms:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$;

Polynomial kernel of degree $d$: $K(x_i, x_j) = \big(1 + x_i^T x_j / c\big)^d$;

RBF kernel: $K(x_i, x_j) = \exp\big(-\|x_i - x_j\|^2 / \sigma^2\big)$;

MLP kernel: $K(x_i, x_j) = \tanh\big(k\, x_i^T x_j + \theta\big)$;

where $d$, $c$, $\sigma$, $k$, $\theta$ are constants. We remark that the linear kernel corresponds to the linear function $\phi(x) = x$. The most commonly used kernel is the RBF kernel.
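Because the LS-SVM solution reduces to one linear system, it can be sketched directly in NumPy with the RBF kernel, following the equations of this section. The sketch below uses toy data and placeholder values of $\gamma$ and $\sigma$; it is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / sigma ** 2)

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system for (b, lambda)."""
    n = len(y)
    Omega = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # I_n^T
    A[1:, 0] = 1.0                      # I_n
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, lambda

def lssvm_predict(X_train, b, lam, X_new, sigma=1.0):
    """f(x) = sum_i lambda_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ lam + b

# Toy usage with illustrative hyper-parameters.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))
y = X @ np.array([0.5, -0.2, 0.3]) + 0.05 * rng.normal(size=100)
b, lam = lssvm_fit(X, y)
print(np.mean((lssvm_predict(X, b, lam, X) - y) ** 2))   # in-sample MSE
```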

Data and analysis
Data description

In this work, we study the weekly adjusted closing prices of three individual stocks: Bank of China (601988), Vanke A (000002) and Guizhou Maotai (600519). Each price series has 427 observations, ranging from 3 January 2006 to 11 March 2018. As usual, we split the whole data set into a training set (80%) and a test set (20%).

We intentionally selected these three stocks because they differ greatly in price scale. As shown in Table 1, the price of Bank of China stays roughly within 2–5 RMB, Vanke A (000002) lies approximately in the range 5–40 RMB and Guizhou Maotai spans a wide range of 80–800 RMB. In fact, Guizhou Maotai ranks first in price per share among all stocks listed on the only two stock exchanges of mainland China: Shanghai and Shenzhen.

Table 1. Price range (RMB)

Name            Bank of China   Vanke A   Guizhou Maotai
Lowest price    2.00            5.65      81.13
Highest price   5.01            40.04     788.42

Let us use $\{S_i\}_{1 \le i \le 427}$ to denote the price time series. We use three previous periods to predict the price of the next period. More precisely, we set $x_i = (S_i, S_{i+1}, S_{i+2})$ and $y_i = S_{i+3}$ for $1 \le i \le 424$, and regard $(x_i, y_i)$ as one sample; that is, for an input $x_i$, the desired output is $y_i$. It may be emphasised that we use weekly data, so the information contained in the three lagged prices is assumed to be effective within roughly one month.
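The sample construction and the 80/20 split described above can be sketched as follows; the synthetic series below merely stands in for the actual stock prices.

```python
import numpy as np

def make_samples(prices, window=3):
    """Build (x_i, y_i) pairs with x_i = (S_i, ..., S_{i+window-1}) and y_i = S_{i+window}."""
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    y = np.array([prices[i + window] for i in range(len(prices) - window)])
    return X, y

# Synthetic stand-in for a weekly adjusted closing price series of length 427.
rng = np.random.default_rng(0)
prices = 3.0 + np.cumsum(rng.normal(scale=0.05, size=427))

X, y = make_samples(prices)          # 424 samples, as in the text
split = int(0.8 * len(y))            # 80% training, 20% test
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
print(X.shape, X_train.shape, X_test.shape)
```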

Hyper-parameters

We adopt a neural network with three layers, that is, with only one hidden layer. The input layer has three neurons, and the output layer has a single neuron which represents the predicted value. To determine the number of neurons $m$ in the hidden layer, we apply, as a rule of thumb, the formula
$$m = \sqrt{0.43\,ln + 0.12\,l^2 + 2.54\,n + 0.77\,l + 0.35} + 0.51,$$
where $l$ is the number of neurons in the output layer and $n$ is the number of neurons in the input layer. With $l = 1$ and $n = 3$, we take $m = 3$ after rounding to an integer. The learning rate $\eta = 0.01$ was chosen after extensive testing.

For the implementation of RBF, SVMR and LS-SVMR, we use standard R packages. When applying GRNN, we choose β = 20, 0.5, 0.0005 for Bank of China, Vanke A and Guizhou Maotai, respectively, which are representative of the different price scales of the three stocks.

Results

Table 2 shows the performance of the five neural network models. From these results, we can see that all five models have some predictive power. Even the worst one, GRNN, has a MAPE not exceeding 5%, which is very satisfactory considering that we are forecasting the stock price itself rather than its volatility.

Table 2. Results of the five methods

Stock            Criterion   BP      RBF     GRNN     SVMR    LS-SVMR
Bank of China    MSE         0.009   0.014   0.02     0.012   0.018
                 MAPE        0.019   0.025   0.024    0.023   0.028
Vanke A          MSE         2.976   4.686   6.036    3.422   5.472
                 MAPE        0.049   0.065   0.067    0.059   0.072
Guizhou Maotai   MSE         395.1   740.1   1103.6   407.4   405.5
                 MAPE        0.026   0.036   0.048    0.029   0.027

BP, back propagation; GRNN, general regression neural network; LS-SVMR, least squares support vector machine regression; RBF, radial basis function; SVMR, support vector machine regression.
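For clarity, the two error criteria reported in Tables 2–4 are taken to be the usual definitions of MSE and MAPE (the formulas are not spelled out in the text); a minimal sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

def mape(y_true, y_pred):
    """Mean absolute percentage error (reported as a fraction, e.g. 0.019 = 1.9%)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true))

# Placeholder arrays, only to show the calling convention.
print(mse([2.1, 2.3], [2.0, 2.4]), mape([2.1, 2.3], [2.0, 2.4]))
```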

Across all three stocks, and in terms of both MSE and MAPE, the BP neural network outperforms the other four models. One may refer to Figure 2 in the next subsection for a more intuitive view of the prediction accuracy for Bank of China using the BP method. SVMR ranks second consistently across the three stocks; however, in terms of both MSE and MAPE, the errors of SVMR exceed those of BP by at least 10%. Moreover, in the prediction of Bank of China and Vanke A, BP surpasses SVMR by at least 20% under both criteria.

Fig. 2

Forecast of Bank of China

We cannot tell which of RBF and LS-SVMR is better. As shown in Table 2, for Bank of China and Vanke A, RBF is more accurate than LS-SVMR, while for Guizhou Maotai, LS-SVMR performs better. Overall, they share a similar level of prediction accuracy. Finally, GRNN behaves the worst consistently across the three stocks.

One possible explanation for the superior performance of BP over the other methods is that the latter four models all involve the most commonly used kernel function, $\exp(-|x|^2)$. To check whether this guess is correct, we apply the second-best model, SVMR, with other kernels. Table 3 gives the results for four different kernels.

Table 3. Comparison of four kernels in SVMR

Stock            Criterion   Linear   Polynomial   Sigmoid   RBF
Bank of China    MSE         0.01     0.01         0.011     0.012
                 MAPE        0.019    0.02         0.021     0.023
Vanke A          MSE         2.993    3.292        3.515     3.422
                 MAPE        0.05     0.053        0.055     0.059
Guizhou Maotai   MSE         395.6    403.5        405.7     407.4
                 MAPE        0.027    0.028        0.028     0.029

RBF, radial basis function; SVMR, support vector machine regression.

We have two remarks on Table 3. On the one hand, the linear kernel is the best in this prediction task and consistently outperforms the other three kernels. Although the RBF kernel is the default in many packages because of its flexibility across data sources, it is not the best choice here; we should therefore try other kernels for comparison when doing similar prediction projects. On the other hand, BP still surpasses SVMR with the linear kernel, even though the advantage is no longer obvious. They share a similar prediction error, possibly because both involve a weighted average, which captures some linear relation in the network.

Two more discussions: stability of BP and market inefficiency

When implementing the BP algorithm, the weights need to be initialised randomly, which may cause instability in the results. To show that the results of BP are stable, we train the neural network 100 times and compute the mean and standard deviation of its errors (a simple script for this check is sketched after Table 4). Table 4 dispels this concern, since the standard deviations are extremely small compared to the scale of their corresponding means. In other words, the result of every single experiment is reliable.

Table 4. Results of 100 repeated experiments

Stock            Criterion   Mean    Std.
Bank of China    MSE         0.009   4.8×10⁻⁵
                 MAPE        0.019   0.0001
Vanke A          MSE         2.976   0.0067
                 MAPE        0.049   0.0001
Guizhou Maotai   MSE         395.1   1.3728
                 MAPE        0.026   7.7×10⁻⁵
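The stability check itself is straightforward to script. The sketch below assumes a hypothetical train_and_evaluate routine that trains the BP network from a fresh random initialisation and returns its test-set error; the routine shown is only a stand-in, not the actual experiment.

```python
import numpy as np

def stability_check(train_and_evaluate, n_runs=100):
    """Repeat training with different random initialisations and summarise the error."""
    errors = np.array([train_and_evaluate(seed) for seed in range(n_runs)])
    return errors.mean(), errors.std()

# Stand-in for the real routine: it should build, train and evaluate the BP network
# using the given random seed, and return the resulting test-set MSE (or MAPE).
def train_and_evaluate(seed):
    rng = np.random.default_rng(seed)
    return float(rng.uniform(0.0, 1.0))   # placeholder error value

print(stability_check(train_and_evaluate))
```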

Figure 2 plots the observed and predicted prices of Bank of China. It can be seen clearly that the predicted values fit the observed ones well. Also, the turning points are forecast in a timely manner: when there is a trend in the actual price, the predicted values follow accordingly and closely.

At first glance, the network seems to need at least one period to react to and assimilate new information, so that the predicted values appear to lag the observed values by one period; this, however, turns out to be a false appearance. More precisely, let $y_t$ denote the observed price and $\hat{y}_t$ the predicted price. Figure 2 suggests that $\hat{y}_{t+1} \approx y_t$, which would mean that the best prediction is simply the price of the previous period; in other words, the stock price process would be Markovian. If this were true, the market would be efficient and thus unpredictable.

To show that the market is actually inefficient, we take the difference $e_t = y_t - \hat{y}_{t+1}$ and plot the series $\{e_t\}$ in Figure 3. It may be emphasised that in Figure 3 the error is not centred at 0; it is in fact biased towards negative values. In other words, on average $y_t < \hat{y}_{t+1}$, which contradicts market efficiency.
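The lag-one error check described above amounts to a one-line computation once the observed and predicted series are available; in the sketch below, both arrays are placeholders standing in for the test-set prices and the BP forecasts.

```python
import numpy as np

def lag_one_error(y_obs, y_pred):
    """e_t = y_t - yhat_{t+1}: compare today's observation with tomorrow's prediction."""
    return y_obs[:-1] - y_pred[1:]

# Placeholder arrays standing in for the observed and predicted test-set prices.
rng = np.random.default_rng(0)
y_obs = 3.0 + np.cumsum(rng.normal(scale=0.05, size=85))
y_pred = y_obs + rng.normal(scale=0.02, size=85)

e = lag_one_error(y_obs, y_pred)
print(e.mean())    # a mean clearly below 0 would contradict the lag-one explanation
```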

Fig. 3

Lag one error

Conclusion

In this work, we have demonstrated that all five neural network models are able to extract meaningful information from past prices. Judging by the forecast accuracy on three unrelated stocks, we find that BP surpasses the other four models consistently and robustly. Also, by running the algorithm many times and checking the standard deviation, the stability of BP is confirmed. Based on our trials with different kernels, we advise readers not to take the default kernel for granted but to also examine other kernels when carrying out similar prediction tasks. Finally, the analysis of the lag-one error series provides evidence against the market efficiency hypothesis for these stocks. In future research, we will investigate more involved neural networks to extend the current tentative work.
