Recent years have witnessed a huge growth in the popularity of machine learning and its rapid development. Newly developed algorithms have been used to solve many difficult problems in various fields of science and to produce solutions facilitating many areas of life. Therefore, applying such methods to improve the process of strategy adjustment seemed a natural choice.
The main aim of this study was to formulate and analyse machine learning methods tailored to the specifics of strategy parameter optimization. The most important difficulties are the sensitivity of strategy performance to small parameter changes and the numerous local extrema distributed irregularly over the solution space. The methods were designed to shorten the computation time significantly without a substantial loss of strategy quality. Their efficiency was compared on three different pairs of assets for a moving average crossover system. The considered algorithms, namely the Extended Hill Climbing, the Grid Method and the Differential Evolution Method, are based on well-known machine learning methods or on the intuitive idea of using observations from previous steps to improve the next ones.
The machine learning methods discussed in this paper were designed to select strategy parameters that maximize strategy performance, measured by the specified optimization criterion. The methods operated on in-sample data containing 16 years of daily prices, and their results were verified on 4 years of out-of-sample data. In the first case, a strategy was trading on the
The main hypothesis verified in this paper is that the machine learning methods yield results equal to or only slightly worse than the highest criterion value obtained by the Exhaustive Search (brute-force approach), while requiring significantly less computation time than checking all the points of the solution space. An additional research question is whether the strategies obtained by the machine learning methods carry a lower risk of overfitting than the strategies resulting from the Exhaustive Search procedure.
The distributions of the optimization criterion and of the computation time over 1000 executions of the different methods were compared and presented along with the Exhaustive Search results. The quality of the fit was assessed on in-sample data and, in order to test the tendency to overfit, on additional out-of-sample data. Let us emphasise that the purpose of this paper is not to design the most profitable strategy, but to compare the efficiency of different machine learning methods with the Exhaustive Search (brute force). The simulations for the different sets of assets were executed in the same framework, implemented for the purpose of this research.
Basic machine learning methods have serious disadvantages. For instance, the well-known Hill Climbing returns a local extremum, with no guarantee of reaching the global one. On its own, the algorithm is inadequate for global search, but it can be used as the main component of more complex and efficient global optimization methods.
Since machine learning methods have proved their value by solving many complicated problems, it was reasonable to expect satisfying results when applying them to strategy optimization. The initial intuition was that the machine learning methods would return results slightly worse than the optimal one, but in a disproportionately shorter time than checking all the possibilities to find the best one (the Exhaustive Search).
Moreover, it was expected that the machine learning methods would be less likely to overfit a strategy than the Exhaustive Search. The discussed methods were based on the assumption that the conditional expected value of the optimization criterion is usually higher for points surrounded by points with high values of this criterion. Therefore, low regularity of the solution space could be a real obstacle to the methods' performance. There was no reason to assume even a moderate level of regularity, so the machine learning methods would probably fail to find optimal points that are not located in a high-valued neighbourhood. That property could reduce the overfitting risk, because a parameter vector surrounded by vectors with similar strategy performance usually has a better chance of being profitable in the future than one from a less stable region.
The paper is structured as follows. The second chapter contains the literature review. The third part explains the machine learning methods used in this paper, as well as the trading assumptions and basic terms. The fourth chapter is devoted to the data description, while the fifth contains efficiency tests of the considered machine learning methods, with special focus on the distributions of the optimization criterion and of the computation time. The summary of results and conclusions are included in the last part.
Machine learning methods have been developed for decades, even before the term was coined in the 1950s (Samuel, 1959). Nevertheless, interest in the field has increased in recent years thanks to the technical possibility of applying artificial intelligence in various fields of science and life. The phenomenon of learning from the computational viewpoint was discussed by Valiant (1984). The natural human ability to learn and adapt was presented there in terms of information selection and an automatic adjustment process that results in modifications of the algorithm.
This approach is followed by many modern machine learning methods, and it is close to the general idea of classic statistical modelling, where including new data changes the model's properties. Traditional statistical and econometric models usually assume that the data are produced by a stochastic process from a specified class. The fitting procedure aims at finding the process that best matches the actual data, whereas machine learning methods are often based on iterated improvements without a specified model form. The differences between these two approaches, called data models and algorithmic models respectively, are widely discussed in Breiman (2001). The field of machine learning contains a great variety of algorithms and methods used to solve a wide range of problems. Some methods have strong mathematical foundations, for instance methods based on Markov Chain Monte Carlo (Neal, 1993), while others, such as Hill Climbing or evolutionary methods, follow a heuristic approach (Juels and Wattenberg, 1994). Commonly used methods and algorithms with applications to scientific problems are discussed by Hastie
Algorithmic strategies are widely used in financial markets, but most of them are not discussed in papers because of their proprietary character. Nevertheless, some types of quantitative strategies are widely known and therefore discussed in books and papers. Strategies based on technical analysis indicators, such as the simple moving average crossover method considered in this paper, are analysed for specific cases in Gunasekarage and Power (2001). Since machine learning methods started to gain popularity as a tool for solving problems in various fields, there have been numerous attempts to use them in trading strategies. Beyond commercial usage, many academic papers describing strategies with logic based on machine learning have been published. For example group of researchers at Sanković
More recent research was conducted by Ritter (2017), who used Q-learning with a suitable reward function to handle the risk-averse case and tested the strategy in simulated trading. Dunis and Nathani (2007) presented quantitative strategies based on neural networks, such as the Multilayer Perceptron (MLP) and Higher Order Neural Networks (HONN), and on the K-Nearest Neighbours method. The authors showed that these methods can be used effectively to generate excess returns from trading gold and silver. The comparison between the performance of the machine learning methods and linear ARMA-type models not only led to a better strategy but also revealed the presence of nonlinearities in the considered time series.
The application of machine learning methods to predicting future prices is becoming increasingly popular. Shen, Jiang and Zhang (2012) presented a forecasting model for stock market indexes based on Support Vector Machines and tested a trading system built on the resulting predictions. A similar approach was followed by Choudhry and Garg (2008), who introduced a hybrid machine learning system combining support vector machines with a genetic algorithm to predict stock prices. Machine learning methods were also used for prediction by Patel
Differential evolution, one of the methods considered in this paper, was developed by Storn and Price (1997) and discussed in subsequent papers, such as Price
All the statistics used in the optimization criterion were determined by the equity line and could be easily calculated based on it. Therefore, the calculations of net profits and losses (PnLs) were the most complex component of the strategy evaluation procedure. The system was based on the technology called
where:
and
The main problem was to find the best investment strategy within the specified class of strategies, which follow the crossover approach of two simple moving averages. The behaviour of each strategy was fully determined by a vector of parameters from the four-dimensional space Ψ, each parameter standing for a different moving average window width. More specifically, every strategy from Ψ was parametrized by a vector
where:
The optimization criterion was based on typical descriptive statistics used by traders: the annualized rate of return (ARC), the annualized standard deviation (ASD) and the maximum drawdown (MDD). The criterion was determined by the following formula:
The construction of the optimization criterion
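Since all the statistics are computed from the equity line, a minimal Python sketch of the criterion may help. The table legends give OC = 100 * (%ARC * %ARC) / (%ASD * %MDD); the definitions of ARC, ASD and MDD below (252 trading days per year, compound annual return, sample standard deviation of daily simple returns, largest peak-to-trough fall) are standard conventions assumed here rather than taken from the paper. One further assumption: the tables report negative OC values for negative returns (for example OC = -1.62 for %ARC = -0.62, %ASD = 3.74, %MDD = 6.34), so the sketch multiplies ARC by |ARC| to keep the sign. The criterion is undefined when ASD or MDD equals zero.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # assumed annualisation factor for daily data


def arc(equity: Sequence[float]) -> float:
    """Annualized compound rate of return in %, one equity value per trading day."""
    years = (len(equity) - 1) / TRADING_DAYS
    return 100 * ((equity[-1] / equity[0]) ** (1 / years) - 1)


def asd(equity: Sequence[float]) -> float:
    """Annualized standard deviation of daily simple returns in %."""
    returns = [b / a - 1 for a, b in zip(equity, equity[1:])]
    return 100 * statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def mdd(equity: Sequence[float]) -> float:
    """Maximum drawdown in %: the largest fall from a running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return 100 * worst


def optimization_criterion(arc_pct: float, asd_pct: float, mdd_pct: float) -> float:
    """OC = 100 * ARC * |ARC| / (ASD * MDD); the sign of ARC is kept (assumption)."""
    return 100 * arc_pct * abs(arc_pct) / (asd_pct * mdd_pct)
```

Plugging in the in-sample Exhaustive Search statistics for SPXDAX (4.27, 5.17, 4.53) gives roughly 77.85, close to the reported 77.79; the gap comes from the rounding of the table inputs.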
Conditions on the financial markets were different during the tested time period, from
The available strategies were fully determined by four parameters standing for the widths of the moving averages. Consequently, the strategy was optimized over the parameter (solution) space Ψ composed of vectors of four numbers from the set {1, 5, 10, ..., 100}, i.e. $\Psi = \{1, 5, 10, \ldots, 100\}^4$.
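A short sketch of Ψ and of one crossover signal makes the size of the search problem concrete. Two assumptions are made explicit: {1, 5, 10, ..., 100} is read as 1 followed by the multiples of 5 (21 values, hence 21^4 = 194,481 parameter vectors), and each asset's position is taken as the sign of SMA(k_a) - SMA(k_b) for its pair of widths. Under that reading, k_a < k_b gives a trend-following rule and k_a > k_b a contrarian one. The paper's exact signal definition is not reproduced in this section, so `crossover_position` is a hypothetical helper.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple

# Assumed reading of {1, 5, 10, ..., 100}: 1 followed by the multiples of 5.
WIDTHS = [1] + list(range(5, 101, 5))
PSI_SIZE = len(WIDTHS) ** 4  # four parameters: (k1, k2) and (k1.2, k2.2)


def parameter_space() -> Iterator[Tuple[int, ...]]:
    """Yield every parameter vector of Psi."""
    return product(WIDTHS, repeat=4)


def sma(prices: Sequence[float], width: int, t: int) -> float:
    """Simple moving average of the `width` prices ending at index t."""
    if t < width - 1:
        raise ValueError("not enough history for this window")
    return sum(prices[t - width + 1 : t + 1]) / width


def crossover_position(prices: Sequence[float], k_a: int, k_b: int, t: int) -> int:
    """Hypothetical signal: +1 if SMA(k_a) > SMA(k_b), -1 if lower, 0 if equal.

    With k_a < k_b the rule follows the trend; with k_a > k_b it is contrarian.
    """
    diff = sma(prices, k_a, t) - sma(prices, k_b, t)
    return (diff > 0) - (diff < 0)
```

A width of 1 makes the "average" equal to the price itself, which is why 1 is a useful member of the grid.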
The problem of selecting the best parameters of a trading strategy could be parametrized and reformulated in terms of optimization. The optimization criterion (
The solution space Ψ is discrete; thus, the applicability of algorithms based on steps of decreasing size was strictly limited. Moreover, the function being optimized had no simple analytical formula. Consequently, there was no way to apply popular gradient-based methods. The
Additionally, the performance of automatic strategies is usually sensitive to parameter changes; even a subtle difference can severely affect the results. Consequently, one can expect multiple local extrema scattered over the parameter space and large differences in criterion values between neighbouring points. The high sensitivity of the optimization criterion (objective function) to the parameters was crucial for the efficiency of machine learning and led to the selection of more complex methods adjusted to the specifics of the problem. Although the optimization criterion was unstable, some level of regularity was necessary for the machine learning methods to work. The machine learning algorithms tended to select points (candidate solutions) surrounded by others with high criterion values, which could improve the stability of the results and reduce the overfitting risk.
The machine learning methods presented in this paper are based on well-known concepts. The main effort was to design methods built on these algorithms but able to handle an atypical problem that is hard for the basic versions to solve. Although the presented methods could lower the overfitting risk, the paper focuses on improving the strategy selection procedure in terms of time; hence, no features aimed at reducing the overfitting risk are discussed.
The basic
where
The number of walks required to obtain satisfying results is random. Thus, fixing it in advance could make the results unstable: the optimization criteria obtained in independent optimizations could differ significantly. The extended method sets the number of walks dynamically, depending on the efficiency of the previous walks. The algorithm starts twice as many new walks if the previous set of walks improved the optimization criterion, which suggests that the results can still be improved. This improves the stability of the results at the expense of the stability of the computation time. The execution time can be much longer for some starting points, but, on the other hand, ‘bad’ starting points should not affect the final results. In this paper, the initial number of walks is equal to 10.
Metaparameters:
Set iterationsNumber = initialIterationsNumber / k.
Set bestValue = -Inf.
Set bestPoint = NULL.
While TRUE {
Set iterationsNumber = Round(iterationsNumber * k).
Set bestValuePackage = -Inf.
Set bestPointPackage = NULL.
// Current walks set (package) of size equal to iterationsNumber.
For j = 1 to iterationsNumber {
Draw starting point x from uniform distribution over the parameters space.
Set bestValueWalk = optimizedFunction(x).
Set bestPointWalk = x.
Set step = initialStep.
// Single walk.
While TRUE {
Set previousBestValueWalk = bestValueWalk.
// Checking neighbours.
For every parameter space dimension i {
Set neighbourUp = x + step * unitVector_i.
If neighbourUp is element of parameters space {
Set currentFunctionValue = optimizedFunction(neighbourUp).
If currentFunctionValue > bestValueWalk {
Set bestValueWalk = currentFunctionValue.
Set bestPointWalk = neighbourUp.
Break for loop.
}
}
Set neighbourDown = x - step * unitVector_i.
If neighbourDown is element of parameters space {
Set currentFunctionValue = optimizedFunction(neighbourDown).
If currentFunctionValue > bestValueWalk {
Set bestValueWalk = currentFunctionValue.
Set bestPointWalk = neighbourDown.
Break for loop.
}
}
}
// Move to the better neighbour, or reduce the step size if none was found.
If bestValueWalk > previousBestValueWalk {
Set x = bestPointWalk.
} else {
Set step = RoundDown(step / k).
If step == 0
Stop walk by breaking current while loop.
}
}
// Update best value and point in package if needed.
If bestValueWalk > bestValuePackage {
Set bestValuePackage = bestValueWalk.
Set bestPointPackage = bestPointWalk.
}
}
// Stop algorithm if no improvement in the previous package.
If bestValuePackage > bestValue {
Set bestValue = bestValuePackage.
Set bestPoint = bestPointPackage.
} else {
Break while loop and return bestPoint and bestValue.
}
}
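The walk procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' R implementation: the objective `f`, the integer box {0, ..., n}^dim and the default meta-parameters `initial_step = 8` and `k = 2` are assumptions; only the initial number of walks (10) and the doubling of walks after an improving package come from the text. A small cache is added so that revisited points are not re-evaluated, which matters when every evaluation is a full backtest.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[int, ...]


def extended_hill_climbing(
    f: Callable[[Point], float],
    n: int,                       # every coordinate ranges over {0, ..., n}
    dim: int,
    initial_walks: int = 10,
    initial_step: int = 8,        # assumed value
    k: int = 2,                   # walks multiplier and step divisor
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[Point], float]:
    rng = rng or random.Random()
    cache: Dict[Point, float] = {}

    def value(p: Point) -> float:
        # Points outside the parameter space are never accepted.
        if any(c < 0 or c > n for c in p):
            return float("-inf")
        if p not in cache:
            cache[p] = f(p)       # memoise: each point is evaluated at most once
        return cache[p]

    best_point: Optional[Point] = None
    best_value = float("-inf")
    walks = initial_walks
    while True:
        package_point, package_value = None, float("-inf")
        for _ in range(walks):
            x = tuple(rng.randint(0, n) for _ in range(dim))
            x_value, step = value(x), initial_step
            while step > 0:
                moved = False
                # First-improvement search over the 2 * dim axis neighbours.
                for i in range(dim):
                    for sign in (1, -1):
                        nb = tuple(c + sign * step if j == i else c for j, c in enumerate(x))
                        if value(nb) > x_value:
                            x, x_value, moved = nb, value(nb), True
                            break
                    if moved:
                        break
                if not moved:
                    step //= k    # RoundDown(step / k); a step of 0 ends the walk
            if x_value > package_value:
                package_point, package_value = x, x_value
        if package_value > best_value:
            best_point, best_value = package_point, package_value
            walks *= k            # improvement: start k times more walks next time
        else:
            return best_point, best_value
```

For example, on the toy concave objective -((x - 3)^2 + (y - 7)^2) over {0, ..., 10}^2 every walk ends at (3, 7), so the second package brings no improvement and the search returns that point.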
The second machine learning method, called
The search can be tuned by setting different meta-parameters, such as the number of starting points or the interspace between parameters in the initial grid. There is a natural trade-off between the method’s accuracy and the computation time, because the computation time is approximately proportional to the number of evaluations. Setting the meta-parameters makes it easy and effective to balance precision against time. Another big advantage is the deterministic nature of the method: there is no uncertainty about its results or computation time, which could be observed for random methods, such as the
Throughout the paper, the number of starting points is always equal to
For the sake of simplicity, the algorithm is given for the parameter space mapped into $\{0, 1, \ldots, N\}^{dim}$, where dim is the number of parameters. Therefore,
Set bestValues = {-Inf, -Inf, ..., -Inf} as vector of size numberOfSubgrids.
Set bestPoints = {NA, ..., NA} as list of numberOfSubgrids points
(vectors of size dim).
// Check points from the initial grid.
Set gridSize = parameterSpaceWidth / firstInterspace + 1.
For j1 = 0 to gridSize - 1 {
For j2 = 0 to gridSize - 1 {
...
For jdim = 0 to gridSize - 1 {
// Check point from a current grid.
Set currentPoint = firstInterspace *
(j1 * unitVector_1 + j2 * unitVector_2 + ... +
jdim * unitVector_dim).
Set currentValue = optimizedFunction(currentPoint).
If currentValue > bestValues[numberOfSubgrids] and currentPoint is element of parameters space {
// Overwrite the lowest stored value (last in bestValues) and re-sort.
Set bestValues[numberOfSubgrids] = currentValue.
Set bestPoints[numberOfSubgrids] = currentPoint.
Sort descending bestValues.
Permute bestPoints accordingly.
}
}
...
}
}
// Check subgrids centered at the bestPoints.
Set interspace = initialInterspace.
While interspace >= minimumInterspace {
For i = 1 to numberOfSubgrids {
Set center = bestPoints[i].
For j1 = -(gridSize - 1) / 2 to (gridSize - 1) / 2 {
For j2 = -(gridSize - 1) / 2 to (gridSize - 1) / 2 {
...
For jdim = -(gridSize - 1) / 2 to (gridSize - 1) / 2 {
// Check point from a current grid.
Set currentPoint = center + interspace *
(j1 * unitVector_1 + j2 * unitVector_2 + ... +
jdim * unitVector_dim).
Set currentValue = optimizedFunction(currentPoint).
If currentValue > bestValues[i] and currentPoint is element of parameters space {
Set bestValues[i] = currentValue.
Set bestPoints[i] = currentPoint.
}
}
...
}
}
}
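// RoundUp never falls below 1, so stop after the subgrids with interspace 1.
If interspace == 1
Break while loop.
Set interspace = RoundUp(interspace / k).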
}
Return Max(bestValues) and corresponding point from bestPoints.
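A Python sketch of the Grid Method pseudocode, under the same assumptions as before (integer box {0, ..., n}^dim, a user-supplied objective `f`, memoised evaluations). Parameter names mirror the pseudocode's meta-parameters (`first_interspace`, `n_subgrids`, `initial_interspace`, `min_interspace`, `k`); any default values are illustrative only. The interspace is divided by k with rounding up; because rounding up never goes below 1, the sketch stops explicitly after the pass with interspace 1, since otherwise the loop would not terminate for `min_interspace = 1`.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, ...]


def grid_method(
    f: Callable[[Point], float],
    n: int,                       # every coordinate ranges over {0, ..., n}
    dim: int,
    first_interspace: int,
    n_subgrids: int,
    initial_interspace: int,
    min_interspace: int = 1,
    k: int = 2,
) -> Tuple[Point, float]:
    cache: Dict[Point, float] = {}

    def value(p: Point) -> float:
        # Points outside the parameter space are never accepted.
        if any(c < 0 or c > n for c in p):
            return float("-inf")
        if p not in cache:
            cache[p] = f(p)
        return cache[p]

    # Coarse initial grid; keep the n_subgrids best points as subgrid centres.
    axis = range(0, n + 1, first_interspace)
    scored = sorted(((value(p), p) for p in itertools.product(axis, repeat=dim)), reverse=True)
    best: List[Tuple[float, Point]] = scored[:n_subgrids]
    half = (len(axis) - 1) // 2   # (gridSize - 1) / 2 in the pseudocode

    interspace = initial_interspace
    while interspace >= min_interspace:
        for i in range(len(best)):
            center = best[i][1]   # the centre stays fixed while its subgrid is scanned
            for offsets in itertools.product(range(-half, half + 1), repeat=dim):
                p = tuple(c + interspace * o for c, o in zip(center, offsets))
                if value(p) > best[i][0]:
                    best[i] = (value(p), p)
        if interspace == 1:
            break
        interspace = -(-interspace // k)  # RoundUp(interspace / k)

    top_value, top_point = max(best)
    return top_point, top_value
```

The run is fully deterministic: the same inputs always evaluate the same points, which is the property the text highlights when comparing the Grid Method with the random methods.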
The R implementation of the procedure
Differential evolution operates on continuous spaces of real numbers; therefore, it is not directly suited to selecting integer parameters. However, a discrete space can be extended to a continuous one in several ways. The optimization criterion function
The extended function ÕC simply returns the value of OC at the rounded parameter values, with the additional rule that parameters rounded to 0 are replaced by 1.
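To illustrate how a rounding wrapper in the spirit of ÕC plugs into differential evolution, here is a minimal DE/rand/1/bin sketch in pure Python. It is not the paper's R implementation, whose settings are not given in this section: the population size, F = 0.8, CR = 0.9, the number of generations and the clipping of trial vectors to the bounds are illustrative choices. "Rounded" is interpreted here as rounding to the nearest multiple of 5 (the spacing of Ψ), which would explain the rule mapping 0 to 1; the helper name `to_grid` is hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def to_grid(x: Sequence[float], spacing: int = 5) -> Tuple[int, ...]:
    """Round to the nearest multiple of `spacing` (Python rounds halves to even); 0 becomes 1."""
    rounded = (int(round(v / spacing)) * spacing for v in x)
    return tuple(1 if r == 0 else r for r in rounded)


def differential_evolution(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    F: float = 0.8,               # differential weight
    CR: float = 0.9,              # crossover probability
    generations: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Maximise f over a box with the classic DE/rand/1/bin scheme."""
    rng = rng or random.Random()
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate is always mutated
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                trial.append(min(max(v, lo), hi))  # clip back into the box
            trial_fit = f(trial)
            if trial_fit >= fit[i]:  # greedy selection; ties accepted to cross plateaus
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=lambda j: fit[j])
    return pop[best], fit[best]


if __name__ == "__main__":
    # Hypothetical stand-in for OC on the grid, maximal at (100, 35, 45, 85).
    target = (100, 35, 45, 85)

    def oc_grid(p: Tuple[int, ...]) -> float:
        return -sum((a - b) ** 2 for a, b in zip(p, target))

    x, v = differential_evolution(lambda z: oc_grid(to_grid(z)), [(1, 100)] * 4, rng=random.Random(1))
    print(to_grid(x), v)
```

Because the wrapped objective is piecewise constant, accepting ties (`>=`) lets the population drift across flat regions instead of stalling on them.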
The strategies selected by the different methods were analysed and compared with the optimal strategy, which maximizes the optimization criterion in the in-sample period. In every case, the optimal strategy was found by the Exhaustive Search (brute-force) algorithm, which checks all possible combinations of parameters and selects the one with the highest criterion value. This approach always yields the highest possible criterion value, but it is very time-consuming. The main purpose of using machine learning methods instead of the Exhaustive Search was to obtain a significantly lower computation time without a loss of quality. Therefore, the difference in computation time reflects how much the information learned in previous steps improves the efficiency of the further search. Moreover, the Exhaustive Search is treated as a benchmark owing to its simplicity, intuitive character and widespread use.
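For comparison, the Exhaustive Search benchmark is a single loop over every point of the grid. A minimal sketch, assuming an objective `f` that maps a parameter tuple to its criterion value (a hypothetical stand-in for the full backtest):

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, ...]


def exhaustive_search(
    f: Callable[[Point], float], axis: Sequence[int], dim: int
) -> Tuple[Optional[Point], float]:
    """Evaluate f at every point of axis^dim and return the best point and value."""
    best_point: Optional[Point] = None
    best_value = float("-inf")
    for point in product(axis, repeat=dim):
        value = f(point)
        if value > best_value:  # strict '>' keeps the first maximiser found
            best_point, best_value = point, value
    return best_point, best_value
```

Its cost is exactly one evaluation per grid point, so its running time grows with the size of the grid regardless of the shape of the objective, which is what the learning methods try to exploit.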
The main goal of machine learning methods was to find the parameters’ vector
Every considered portfolio was composed of two securities of the same kind. The first pair contained the futures contracts on two important and highly correlated market indexes – American
The machine learning methods searched for the strategy that was optimal in the in-sample period, from the beginning of 1998 to the end of 2013. The strategies were validated on out-of-sample data from the beginning of 2014 to the end of 2017. All strategies operated on daily data, taking a position on each trading day. The in-sample period was long enough to ensure that different market trends were included in all the time series. On the other hand, the length of the out-of-sample period allowed the strategies to be properly validated and the level of overfitting to be assessed.
The most rapid growth of value was observed for
The returns of
The descriptive statistics of the considered assets
Statistic | SPX (in-sample) | DAX (in-sample) | AAPL (in-sample) | MSFT (in-sample) | HGF (in-sample) | CBF (in-sample) | SPX (out-of-sample) | DAX (out-of-sample) | AAPL (out-of-sample) | MSFT (out-of-sample) | HGF (out-of-sample) | CBF (out-of-sample) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
%ARC | 3.92 | 4.79 | 35.22 | 6.32 | 9.41 | 12.14 | 9.67 | 8.07 | 22.68 | 25.70 | -0.62 | -11.01 |
%ASD | 20.39 | 24.97 | 46.69 | 33.06 | 28.48 | 34.54 | 11.94 | 18.37 | 22.27 | 21.43 | 19.25 | 33.07 |
IR | 0.19 | 0.19 | 0.75 | 0.19 | 0.33 | 0.35 | 0.81 | 0.44 | 1.02 | 1.2 | -0.03 | -0.33 |
%MDD | 56.78 | 72.68 | 43.80 | 71.65 | 68.37 | 73.48 | 14.16 | 29.27 | 30.45 | 18.05 | 42.47 | 75.83 |
%ARC - annualized rate of return (%), %ASD - annualized standard deviation (%), %MDD - maximum drawdown of capital (%), IR - information ratio calculated as %ARC / %ASD, OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD), SPX - S&P500 Index, DAX - Deutscher Aktienindex, AAPL - Apple Inc. stock, MSFT - Microsoft Corp. stock, HGF - High Grade Copper Futures, CBF - Crude Oil Brent Futures. The statistics have been calculated on daily data for the in-sample period from the beginning of 1998 to the end of 2013 and for the out-of-sample period from the beginning of 2014 to the end of 2017.
The methods described above were tested on three pairs of assets by running the whole optimization process on the in-sample data. The Extended Hill Climbing and the Differential Evolution Method were executed
All statistics and graphs referring to the methods’ performance on that pair of assets were denoted by the acronym
All the machine learning methods selected the same median strategy, which differed from the one resulting from the Exhaustive Search procedure. Nevertheless, all the methods used a contrarian approach on
Parameters and statistics of the median strategies resulting from all the methods for SPXDAX
Statistic | ES (in-sample) | EHC (in-sample) | GM (in-sample) | DEM (in-sample) | ES (out-of-sample) | EHC (out-of-sample) | GM (out-of-sample) | DEM (out-of-sample) |
---|---|---|---|---|---|---|---|---|
k1 | 60.00 | 100.00 | 100.00 | 100.00 | 60.00 | 100.00 | 100.00 | 100.00 |
k2 | 45.00 | 35.00 | 35.00 | 35.00 | 45.00 | 35.00 | 35.00 | 35.00 |
k1.2 | 65.00 | 45.00 | 45.00 | 45.00 | 65.00 | 45.00 | 45.00 | 45.00 |
k2.2 | 75.00 | 85.00 | 85.00 | 85.00 | 75.00 | 85.00 | 85.00 | 85.00 |
%ARC | 4.27 | 3.92 | 3.92 | 3.92 | -0.03 | -0.62 | -0.62 | -0.62 |
%ASD | 5.17 | 4.63 | 4.63 | 4.63 | 4.02 | 3.74 | 3.74 | 3.74 |
IR | 0.83 | 0.85 | 0.85 | 0.85 | -0.01 | -0.17 | -0.17 | -0.17 |
%MDD | 4.53 | 4.30 | 4.30 | 4.30 | 7.20 | 6.34 | 6.34 | 6.34 |
OC | 77.79 | 77.16 | 77.16 | 77.16 | 0.00 | -1.62 | -1.62 | -1.62 |
%ARC - annualized rate of return (%), %ASD - annualized standard deviation (%), %MDD - maximum drawdown of capital (%), IR - information ratio calculated as %ARC / %ASD, OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD), k1, k2, k1.2, k2.2 - strategy parameters, width of the moving averages’ windows. The statistics of the equity lines have been calculated for the strategy working on daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading has been simulated from the beginning of 1998 to the end of 2013 (in-sample) and from the beginning of 2014 to the end of 2017 (out-of-sample), with the assumption of a fee equal to 0.25% of the position value.
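The trading assumptions in this legend (20% of capital per asset, rebalancing every 5 trading days, a 0.25% fee on the traded value) can be turned into an equity line roughly as follows. This is a simplified sketch, not the paper's PnL engine: it assumes that signals in {-1, 0, 1} are acted upon only on rebalancing days, that trades happen at the daily close, that short positions mirror long ones, and that there are no borrowing costs or slippage.

```python
from typing import List, Sequence


def simulate_equity(
    prices_a: Sequence[float],
    prices_b: Sequence[float],
    signal_a: Sequence[int],
    signal_b: Sequence[int],
    capital: float = 1000.0,
    weight: float = 0.2,          # fraction of equity allocated to each asset
    rebalance_every: int = 5,     # trading days between rebalances
    fee: float = 0.0025,          # proportional fee on the traded value
) -> List[float]:
    """Daily equity line of a two-asset long/short strategy (simplified sketch)."""
    prices = (prices_a, prices_b)
    signals = (signal_a, signal_b)
    units = [0.0, 0.0]            # signed number of units held in each asset
    equity = capital
    curve: List[float] = []
    for t in range(len(prices_a)):
        if t > 0:
            # Mark to market the positions held since the previous close.
            equity += sum(u * (p[t] - p[t - 1]) for u, p in zip(units, prices))
        if t % rebalance_every == 0:
            base = equity         # size both legs on the same pre-fee equity
            for i in range(2):
                target = signals[i][t] * weight * base / prices[i][t]
                equity -= fee * abs(target - units[i]) * prices[i][t]
                units[i] = target
        curve.append(equity)
    return curve
```

The statistics of the previous sketches (ARC, ASD, MDD) can then be computed directly on the returned list.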
The equity lines of the strategies selected by all the methods for SPXDAX - in-sample
SPX - S&P500 Index, DAX - Deutscher Aktienindex, ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized in order to have initial value equal to 1000. The equity line has been calculated for the strategy working on daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
The strategy components could hedge each other, reducing the portfolio risk and producing a smoother equity line (more stable profits). Both strategies followed opposite approaches when trading the two similar assets. The strategy contained the contrarian part, operating on
The empirical distributions of the reached criterion and computation time of
The histograms of the reached optimization criterion and the execution time of EHC for SPXDAX – in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from the sample of 1000 independent algorithm executions. The strategies have been working on the daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
The histograms of the reached optimization criterion and the execution time of DEM for SPXDAX - in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from the sample of 1000 independent algorithm executions. The strategies have been working on the daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
of them reached the highest criterion as well. Therefore, the results demonstrate both the high efficiency and the stability of the methods (Tab. 3). The median computation time of the Extended Hill Climbing procedure is equal to
The summary of the reached optimization criterion and the execution time of methods for SPXDAX – in-sample
Statistic | ES OC | ES Time [sec] | EHC OC | EHC Time [sec] | GM OC | GM Time [sec] | DEM OC | DEM Time [sec] |
---|---|---|---|---|---|---|---|---|
Min | 77.79 | 35562.17 | 65.58 | 11.87 | 77.16 | 128.04 | 71.93 | 13.11 |
1st Qu. | 77.79 | 35562.17 | 74.39 | 13.93 | 77.16 | 128.04 | 77.16 | 24.84 |
Median | 77.79 | 35562.17 | 77.16 | 30.97 | 77.16 | 128.04 | 77.16 | 31.15 |
Mean | 77.79 | 35562.17 | 75.94 | 43.1 | 77.16 | 128.04 | 77.34 | 42.73 |
3rd Qu. | 77.79 | 35562.17 | 77.16 | 65.32 | 77.16 | 128.04 | 77.79 | 61.5 |
Max | 77.79 | 35562.17 | 77.79 | 569.39 | 77.16 | 128.04 | 77.79 | 141.08 |
SD | 0.00 | 0.00 | 2.61 | 48.77 | 0.00 | 0.00 | 0.36 | 24.06 |
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated for the strategy working on daily frequency, investing 20% of the capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
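The time savings can be read off the summary table. The short calculation below assumes that the rows follow the order of R's summary() (minimum, first quartile, median, mean, third quartile, maximum) followed by the standard deviation, so the third row holds the medians, and that the Exhaustive Search evaluates each of the 21^4 = 194,481 points of Ψ once (reading {1, 5, 10, ..., 100} as 1 plus the 20 multiples of 5).

```python
# Median values from the SPXDAX in-sample summary table (row order assumed as stated above).
ES_TIME = 35562.17   # seconds
ES_OC = 77.79
MEDIANS = {"EHC": (77.16, 30.97), "GM": (77.16, 128.04), "DEM": (77.16, 31.15)}
PSI_SIZE = 21 ** 4   # assumed size of Psi: 21 admissible widths per parameter


def speedup(method_time: float, reference_time: float = ES_TIME) -> float:
    """How many times faster a method is than the Exhaustive Search."""
    return reference_time / method_time


def relative_gap(oc: float, reference_oc: float = ES_OC) -> float:
    """Relative shortfall of a method's OC versus the Exhaustive Search optimum."""
    return (reference_oc - oc) / reference_oc


if __name__ == "__main__":
    print(f"ES time per evaluation: {ES_TIME / PSI_SIZE:.3f} s")
    for name, (oc, seconds) in MEDIANS.items():
        print(f"{name}: {speedup(seconds):.0f}x faster, OC gap {100 * relative_gap(oc):.2f}%")
```

Under these assumptions the Exhaustive Search spends about 0.18 s per evaluation, the median runs are roughly 1148 times (EHC), 278 times (GM) and 1142 times (DEM) faster, and all three median criteria lie about 0.81% below the optimum.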
The Grid Method yielded the second-best strategy, exactly the same as the median strategy from
Most of the
As expected, the out-of-sample performance of the strategies was worse than in the in-sample period. The strategies obtained by the Exhaustive Search and by all the considered machine learning methods were ineffective in the out-of-sample period, with returns close to zero at the end of the time horizon.
The equity lines of the strategies selected by all the methods for SPXDAX – out-of-sample
ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized in order to have initial value equal to 1000. The equity line has been calculated for the strategy working on daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 2014 to the end of 2017 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
The strategy was optimized for the stocks of high-tech companies
All the considered methods selected exactly the same strategy. The simple moving average crossover approach was highly effective owing to the enormous growth of the Apple stock. That strategy had a large return in the in-sample period,
The equity line of the strategy selected by all the methods for AAPLMSFT – in-sample
AAPL - Apple Inc. stock, MSFT - Microsoft Corp. stock, ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized in order to have initial value equal to 1000. The equity line has been calculated for the strategy working on daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
The histograms of the reached optimization criterion and the execution time of EHC for AAPLMSFT – in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from the sample of 1000 independent algorithm executions. The strategies have been working on the daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of fee equal to 0.25% of the position value.
Parameters and statistics of the median strategies resulting from all the methods for AAPLMSFT
Statistic | ES (in-sample) | EHC (in-sample) | GM (in-sample) | DEM (in-sample) | ES (out-of-sample) | EHC (out-of-sample) | GM (out-of-sample) | DEM (out-of-sample) |
---|---|---|---|---|---|---|---|---|
k1 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 | 50.00 |
k2 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 |
k1.2 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 |
k2.2 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 | 40.00 |
%ARC | 17.79 | 17.79 | 17.79 | 17.79 | 1.37 | 1.37 | 1.37 | 1.37 |
%ASD | 11.13 | 11.13 | 11.13 | 11.13 | 5.68 | 5.68 | 5.68 | 5.68 |
IR | 1.60 | 1.60 | 1.60 | 1.60 | 0.24 | 0.24 | 0.24 | 0.24 |
%MDD | 7.71 | 7.71 | 7.71 | 7.71 | 9.54 | 9.54 | 9.54 | 9.54 |
OC | 368.83 | 368.83 | 368.83 | 368.83 | 3.49 | 3.49 | 3.49 | 3.49 |
%ARC - annualized rate of return (%), %ASD - annualized standard deviation (%), %MDD - maximum drawdown of capital (%), IR - information ratio calculated as %ARC / %ASD, OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD), k1, k2, k1.2, k2.2 - strategy parameters, width of the moving averages’ windows. The statistics of the equity line have been calculated for the strategy working on daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading has been simulated from the beginning of 1998 to the end of 2013 (in-sample) and from the beginning of 2014 to the end of 2017 (out-of-sample), with the assumption of a fee equal to 0.25% of the position value.
The summary of the reached optimization criterion and the execution time of methods for AAPLMSFT – in-sample
Statistic | ES OC | ES Time [sec] | EHC OC | EHC Time [sec] | GM OC | GM Time [sec] | DEM OC | DEM Time [sec] |
---|---|---|---|---|---|---|---|---|
Min | 368.83 | 32821.18 | 301.34 | 11.96 | 368.83 | 150.66 | 274.97 | 11.82 |
1st Qu. | 368.83 | 32821.18 | 368.83 | 14 | 368.83 | 150.66 | 368.83 | 19.67 |
Median | 368.83 | 32821.18 | 368.83 | 18.18 | 368.83 | 150.66 | 368.83 | 22.19 |
Mean | 368.83 | 32821.18 | 367.16 | 27.4 | 368.83 | 150.66 | 368.55 | 22.71 |
3rd Qu. | 368.83 | 32821.18 | 368.83 | 32.97 | 368.83 | 150.66 | 368.83 | 25.27 |
Max | 368.83 | 32821.18 | 368.83 | 174.06 | 368.83 | 150.66 | 368.83 | 45.8 |
SD | 0.00 | 0.00 | 5.51 | 18.71 | 0.00 | 0.00 | 5.14 | 4.52 |
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from the sample of 1000 independent algorithm executions. The strategies have been working on the daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of fee equal to 0.25% of the position value.
The histograms of the reached optimization criterion and the execution time of DEM for AAPLMSFT – in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from the sample of 1000 independent algorithm executions. The strategies have been working on the daily frequency, investing 20% of capital in position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, with the assumption of a fee equal to 0.25% of the position value.
All the considered machine learning methods selected the same strategy as the Exhaustive Search procedure. That strategy, optimal in the in-sample period in terms of the optimization criterion, resulted in annualized returns at the level of
The last pair of assets considered was composed of two commodity futures contracts. Finding the optimal strategy was harder than for the previous pairs. The main reasons for the difficulties were the difference in the commodities' behaviour between the two periods and a weaker statistical relationship between them. The statistics and graphs for that case were always denoted by
The equity lines of the strategies selected by all the methods for AAPLMSFT – out-of-sample
AAPL - Apple Inc. stock, MSFT - Microsoft Corp. stock, ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized to an initial value of 1000. The equity line has been calculated for the strategy working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 2014 to the end of 2017 has been simulated, assuming a fee equal to 0.25% of the position value.
The parameters and statistics of the median strategies resulting from all the methods for HGFCBF
Parameter/statistic | ES (in-sample) | EHC (in-sample) | GM (in-sample) | DEM (in-sample) | ES (out-of-sample) | EHC (out-of-sample) | GM (out-of-sample) | DEM (out-of-sample)
---|---|---|---|---|---|---|---|---
k1 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00 | 60.00
k2 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00 | 75.00
k1.2 | 50.00 | 30.00 | 30.00 | 30.00 | 50.00 | 30.00 | 30.00 | 30.00
k2.2 | 25.00 | 95.00 | 95.00 | 95.00 | 25.00 | 95.00 | 95.00 | 95.00
%ARC | 8.18 | 9.53 | 9.53 | 9.53 | -1.59 | 6.60 | 6.60 | 7.38
%ASD | 8.16 | 9.83 | 9.83 | 9.83 | 7.09 | 8.17 | 8.17 | 8.07
IR | 1.00 | 0.97 | 0.97 | 0.97 | -0.22 | 0.81 | 0.81 | 0.91
%MDD | 7.51 | 9.52 | 9.52 | 9.52 | 15.86 | 12.16 | 12.16 | 12.16
OC | 109.11 | 97.04 | 97.04 | 97.04 | -2.24 | 43.81 | 43.81 | 55.49
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The empirical statistics have been calculated from a sample of 1000 independent algorithm executions. The strategies have been working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading has been simulated from the beginning of 1998 to the end of 2013 (in-sample) and from the beginning of 2014 to the end of 2017 (out-of-sample), assuming a fee equal to 0.25% of the position value.
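The derived rows of the HGFCBF table can be checked against the definitions IR = %ARC / %ASD and OC = 100 · %ARC² / (%ASD · %MDD). The short check below recomputes them from the rounded %ARC, %ASD and %MDD values copied from the table, so small discrepancies in the second decimal place are expected. For the ES out-of-sample column the table reports OC = -2.24, whereas the literal formula would give about +2.25; the sketch therefore assumes that the sign of %ARC is retained (%ARC·|%ARC|), which reproduces the reported value. The helper names are hypothetical.

```python
def information_ratio(arc, asd):
    """IR = %ARC / %ASD."""
    return arc / asd


def optimization_criterion(arc, asd, mdd):
    # assumption: the sign of %ARC is kept, so a losing strategy gets a negative OC
    return 100.0 * arc * abs(arc) / (asd * mdd)


# (%ARC, %ASD, %MDD, reported IR, reported OC) copied from the HGFCBF table above
TABLE = {
    "ES in-sample": (8.18, 8.16, 7.51, 1.00, 109.11),
    "EHC in-sample": (9.53, 9.83, 9.52, 0.97, 97.04),
    "ES out-of-sample": (-1.59, 7.09, 15.86, -0.22, -2.24),
    "EHC out-of-sample": (6.60, 8.17, 12.16, 0.81, 43.81),
    "DEM out-of-sample": (7.38, 8.07, 12.16, 0.91, 55.49),
}

for name, (arc, asd, mdd, ir_rep, oc_rep) in TABLE.items():
    ir = information_ratio(arc, asd)
    oc = optimization_criterion(arc, asd, mdd)
    print(f"{name:18s} IR {ir:6.2f} (table {ir_rep:6.2f})  OC {oc:7.2f} (table {oc_rep:7.2f})")
```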
The Exhaustive Search selected the strategy with an annualized return equal to
The equity lines of the strategies selected by all the methods for HGFCBF – in-sample
HGF - High Grade Copper Futures, CBF - Crude Oil Brent Futures, ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized to an initial value of 1000. The equity line has been calculated for the strategy working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
The histograms of the optimization criterion reached and the execution time of EHC for HGFCBF – in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from a sample of 1000 independent algorithm executions. The strategies have been working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
The conclusions from computing
The summary of the optimization criterion reached and the execution time of the methods for HGFCBF – in-sample
Statistic | ES OC | ES Time [sec] | EHC OC | EHC Time [sec] | GM OC | GM Time [sec] | DEM OC | DEM Time [sec]
---|---|---|---|---|---|---|---|---
Min | 109.11 | 42193.57 | 77.29 | 12.29 | 97.04 | 113.75 | 97.04 | 9.76
Q1 | 109.11 | 42193.57 | 93.62 | 14.73 | 97.04 | 113.75 | 97.04 | 19.92
Median | 109.11 | 42193.57 | 97.04 | 33.09 | 97.04 | 113.75 | 97.04 | 23.08
Mean | 109.11 | 42193.57 | 97.82 | 42.69 | 97.04 | 113.75 | 99.01 | 27.34
Q3 | 109.11 | 42193.57 | 109.11 | 36.85 | 97.04 | 113.75 | 97.04 | 29.15
Max | 109.11 | 42193.57 | 109.11 | 622.17 | 97.04 | 113.75 | 109.11 | 110.93
SD | 0.00 | 0.00 | 8.59 | 51.28 | 0.00 | 0.00 | 4.46 | 12.5
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from a sample of 1000 independent algorithm executions. The strategies have been working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
The
The median of
The histograms of the optimization criterion reached and the execution time of DEM for HGFCBF – in-sample
OC - optimization criterion calculated as 100 * (%ARC * %ARC) / (%ASD * %MDD). The optimization criterion has been calculated from a sample of 1000 independent algorithm executions. The strategies have been working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
All the considered machine learning methods finally selected the same strategy, which was slightly worse than the optimal one in the in-sample period but significantly better in the out-of-sample period. The annualized returns were about
The equity lines of the strategies selected by the different methods for HGFCBF – out-of-sample
HGF - High Grade Copper Futures, CBF - Crude Oil Brent Futures, ES, EHC, GM, DEM - equity line of the median strategy resulting from the Exhaustive Search, Extended Hill Climbing, Grid Method and Differential Evolution Method, respectively. Prices of both assets have been normalized to an initial value of 1000. The equity line has been calculated for the strategy working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 2014 to the end of 2017 has been simulated, assuming a fee equal to 0.25% of the position value.
Throughout the paper, three machine learning optimization methods, adjusted to the specifics of the problem, were discussed. The performance of each method was tested on three problems of selecting trading strategy parameters over the period from the beginning of 1998 to the end of 2017. The machine learning algorithms solved the problems in a significantly shorter time than the Exhaustive Search procedure, with no significant difference in the quality of the results.
As noted before, the machine learning methods gave results similar to the optimal ones obtained by the Exhaustive Search procedure. The critical difference was the computation time. Checking all possible parameter combinations took several hours (about 9 hours for AAPLMSFT and almost 12 hours for HGFCBF), whereas EHC and DEM typically produced comparable results in under a minute and GM in about two to two and a half minutes. This time advantage would become critical for more complex problems, for instance those with a larger parameter space. The relative time difference was significant, for instance,
Mean and median optimization criterion reached by the different methods, relative to the ES method, in percent – in-sample
Case | ES | Grid | EHC median | DEM median | EHC mean | DEM mean
---|---|---|---|---|---|---
SPXDAX | 100 | 99.19 | 99.19 | 99.19 | 97.62 | 99.42
AAPLMSFT | 100 | 100.00 | 100.00 | 100.00 | 99.55 | 99.92
HGFCBF | 100 | 88.94 | 88.94 | 88.94 | 89.65 | 90.74
SPXDAX - case of trading on S&P500 Index and Deutscher Aktienindex, AAPLMSFT - case of trading on Apple Inc. and Microsoft Corp. stocks, HGFCBF - case of trading on High-Grade Copper and Crude Oil Brent futures. ES - the Exhaustive Search, Grid - the Grid Method, EHC - the Extended Hill Climbing, DEM - the Differential Evolution Method. The simulations have been performed for the strategy working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
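The percentages in the table above are each method's OC divided by the ES value for the same case. The sketch below reproduces the AAPLMSFT and HGFCBF rows from the in-sample summary tables reported earlier in this section (the SPXDAX inputs are not part of this section); `relative_to_es` is a hypothetical helper.

```python
def relative_to_es(value, es_value):
    """Express a method's result as a percentage of the Exhaustive Search result."""
    return 100.0 * value / es_value


# In-sample OC values copied from the AAPLMSFT and HGFCBF summary tables
ES_OC = {"AAPLMSFT": 368.83, "HGFCBF": 109.11}
METHOD_OC = {
    "AAPLMSFT": {"Grid": 368.83, "EHC median": 368.83, "DEM median": 368.83,
                 "EHC mean": 367.16, "DEM mean": 368.55},
    "HGFCBF": {"Grid": 97.04, "EHC median": 97.04, "DEM median": 97.04,
               "EHC mean": 97.82, "DEM mean": 99.01},
}

relative = {
    case: {m: round(relative_to_es(v, ES_OC[case]), 2) for m, v in methods.items()}
    for case, methods in METHOD_OC.items()
}
print(relative)
```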
Mean and median computation time of the methods, relative to the ES method, in percent
Case | ES | Grid | EHC median | DEM median | EHC mean | DEM mean
---|---|---|---|---|---|---
SPXDAX | 100 | 0.35 | 0.08 | 0.09 | 0.12 | 0.12
AAPLMSFT | 100 | 0.46 | 0.06 | 0.07 | 0.08 | 0.07
HGFCBF | 100 | 0.27 | 0.08 | 0.05 | 0.10 | 0.06
SPXDAX - case of trading on S&P500 Index and Deutscher Aktienindex, AAPLMSFT - case of trading on Apple Inc. and Microsoft Corp. stocks, HGFCBF - case of trading on High-Grade Copper and Crude Oil Brent futures. ES - the Exhaustive Search, Grid - the Grid Method, EHC - the Extended Hill Climbing, DEM - the Differential Evolution Method. The simulations have been performed for the strategy working at a daily frequency, investing 20% of capital in a position on each asset with rebalancing every 5 trading days. Trading from the beginning of 1998 to the end of 2013 has been simulated, assuming a fee equal to 0.25% of the position value.
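The time percentages can be reproduced in the same way from the execution times in the in-sample summary tables, and inverting them gives the speed-up factor relative to ES. The sketch below covers the AAPLMSFT and HGFCBF cases only, since the SPXDAX timings are not part of this section.

```python
# Execution times in seconds copied from the AAPLMSFT and HGFCBF summary tables
ES_TIME = {"AAPLMSFT": 32821.18, "HGFCBF": 42193.57}
METHOD_TIME = {
    "AAPLMSFT": {"Grid": 150.66, "EHC median": 18.18, "DEM median": 22.19,
                 "EHC mean": 27.40, "DEM mean": 22.71},
    "HGFCBF": {"Grid": 113.75, "EHC median": 33.09, "DEM median": 23.08,
               "EHC mean": 42.69, "DEM mean": 27.34},
}

for case, methods in METHOD_TIME.items():
    es = ES_TIME[case]
    for method, t in methods.items():
        share = 100.0 * t / es  # percent of the ES time, as in the table
        speedup = es / t        # how many times faster than ES
        print(f"{case:9s} {method:11s} {share:5.2f}%  {speedup:7.0f}x faster")
```

In these two cases GM runs roughly 200 to 370 times faster than ES, while EHC and DEM run roughly 1,000 to 1,800 times faster.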
The first box plot (Fig. 13) presents the optimization criterion across the samples. There are almost no significant differences between the results of the tested methods. The second box plot (Fig. 14) presents the computation time across the samples. The
The boxplot of the optimization criterion of the strategies selected by the machine learning methods, as a percentage of the global maximum found by the Exhaustive Search
The samples were denoted by the algorithm acronym and the number of the trading case, so
The boxplot of the empirical distribution of the machine learning methods' computation time
The samples were denoted by the algorithm acronym and the number of the trading case, so
Three machine learning methods (EHC, GM and DEM) were implemented and tested on the problem of optimizing a simple moving-average crossover strategy. The machine learning methods were heuristic searches based on simple algorithms commonly used for similar problems. The methods were adjusted to the specifics of the considered problem, such as the discreteness of the parameters or the low regularity of the solution space.
The machine learning methods were compared based on the value of the optimization criterion, which combines the annualized rate of return of the strategy with two risk measures: the annualized standard deviation and the maximum drawdown. All the statistics were calculated for simulated trading over the period from the beginning of 1998 to the end of 2013. The optimization criterion of the selected strategies and the computation time required to complete the whole search process were compared with those of the Exhaustive Search. The considered strategies were traded on the specified pairs of assets and were tested separately on
The strategies were compared in terms of the optimization criterion, which is based on the annualized returns and includes risk metrics such as the annualized standard deviation of returns and the maximum drawdown of the equity line. Applying such an approach in the optimization process led to the selection of more stable strategies. Using the maximum drawdown component eliminated the strategies generating all their profits in one short period of time. That approach significantly reduced the risk of overfitting caused by adjusting the strategy to a few extreme past market situations.
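The exact EHC procedure is not reproduced in this section. Purely to illustrate the kind of heuristic search meant here, the sketch below is a plain steepest-ascent hill climbing with random restarts over a discrete parameter grid; it is not the authors' Extended Hill Climbing. It moves one parameter by one grid step at a time, caches evaluations because each one stands for a full backtest, and returns the number of distinct evaluations so that the cost can be compared with the size of the grid an exhaustive search would scan. The function `hill_climb`, the window grid and the toy objective are hypothetical.

```python
import random


def hill_climb(objective, grid, restarts=10, seed=None):
    """Maximize `objective` over a discrete grid by steepest-ascent hill climbing
    with random restarts (restarts must be at least 1).

    grid: one sorted list of allowed values per parameter.
    objective: callable taking a tuple of parameter values (e.g. a backtest's OC).
    Returns (best parameters, best value, number of distinct evaluations).
    """
    rng = random.Random(seed)
    cache = {}

    def score(idx):
        # memoize: in practice every evaluation is a full strategy backtest
        if idx not in cache:
            cache[idx] = objective(tuple(g[i] for g, i in zip(grid, idx)))
        return cache[idx]

    best_idx, best_val = None, float("-inf")
    for _ in range(restarts):
        current = tuple(rng.randrange(len(g)) for g in grid)
        while True:
            # neighbours differ from `current` by one grid step in one parameter
            neighbours = [
                current[:d] + (current[d] + step,) + current[d + 1:]
                for d in range(len(grid))
                for step in (-1, 1)
                if 0 <= current[d] + step < len(grid[d])
            ]
            candidate = max(neighbours, key=score, default=current)
            if score(candidate) <= score(current):
                break  # no better neighbour: local maximum reached
            current = candidate
        if score(current) > best_val:
            best_idx, best_val = current, score(current)
    return tuple(g[i] for g, i in zip(grid, best_idx)), best_val, len(cache)


# usage with a hypothetical unimodal objective over two window lengths
windows = list(range(5, 101, 5))
toy = lambda p: -((p[0] - 30) ** 2) - (p[1] - 70) ** 2
params, value, evals = hill_climb(toy, [windows, windows], restarts=5, seed=1)
print(params, value, evals, "evaluations out of", len(windows) ** 2)
```

On a real OC surface with many irregular local extrema, the random restarts are what give the search a chance to escape poor local maxima.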
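The effect described above can be illustrated with two hypothetical equity lines that start and end at the same value over one year, so that their %ARC is identical. One grows steadily with small pullbacks; the other drifts down by 15% and then earns everything in a single jump. The toy sketch assumes 252 trading days per year and the statistic definitions from the table notes (with the sign of %ARC retained); the data are synthetic, not taken from the paper.

```python
import math
import statistics

DAYS = 252  # assumption: one year of daily observations


def max_drawdown_pct(equity):
    """Largest percentage drop of the equity line from its running peak."""
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        mdd = max(mdd, 100.0 * (peak - value) / peak)
    return mdd


def oc_from_equity(equity, days_per_year=DAYS):
    """Return (OC, %ARC, %MDD) using the definitions from the table notes."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    years = len(rets) / days_per_year
    arc = 100.0 * ((equity[-1] / equity[0]) ** (1.0 / years) - 1.0)
    asd = 100.0 * statistics.stdev(rets) * math.sqrt(days_per_year)
    mdd = max_drawdown_pct(equity)
    return 100.0 * arc * abs(arc) / (asd * mdd), arc, mdd


# Line A: +0.1% on most days, -0.05% on every fifth day
steady = [100.0]
for day in range(1, DAYS + 1):
    steady.append(steady[-1] * (0.9995 if day % 5 == 0 else 1.001))

# Line B: slow 15% decline, then a single jump to the same final value as A
burst = [100.0 - 15.0 * i / (DAYS - 1) for i in range(DAYS)] + [steady[-1]]

for name, line in (("steady", steady), ("burst", burst)):
    oc, arc, mdd = oc_from_equity(line)
    print(f"{name:6s} %ARC={arc:6.2f} %MDD={mdd:6.2f} OC={oc:12.2f}")
```

Both lines have the same %ARC, but the steady one obtains a far higher OC because both its volatility and its drawdown are much smaller.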
The first method, called the
The second implemented ML method was a purely deterministic algorithm, called the
The last method, called
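The details of the Differential Evolution Method are not reproduced in this section. For orientation only, the classic DE/rand/1/bin scheme, adapted to integer strategy parameters by rounding and clipping each trial vector, can be sketched as follows. The population size, mutation factor `f`, crossover rate `cr` and the helper names are illustrative assumptions, not the settings used in the study.

```python
import random


def differential_evolution_int(objective, bounds, pop_size=20, generations=50,
                               f=0.8, cr=0.9, seed=None):
    """Maximize `objective` over integer vectors with inclusive (low, high) bounds
    using DE/rand/1/bin; trial vectors are rounded and clipped to the bounds."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs a population of at least 4")
    rng = random.Random(seed)
    dim = len(bounds)

    def to_grid(x):
        # round each coordinate to an integer and clip it into its bounds
        return tuple(min(hi, max(lo, round(v))) for v, (lo, hi) in zip(x, bounds))

    pop = [tuple(rng.randint(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct members, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = to_grid([
                pop[a][d] + f * (pop[b][d] - pop[c][d])
                if d == j_rand or rng.random() < cr else pop[i][d]
                for d in range(dim)
            ])
            trial_fit = objective(trial)
            if trial_fit >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# usage with a hypothetical objective over two window lengths
def toy(p):
    return -((p[0] - 30) ** 2) - (p[1] - 70) ** 2


print(differential_evolution_int(toy, [(5, 100), (5, 100)], seed=7))
```

Rounding after mutation keeps every evaluated point on the integer grid, which matters because strategy parameters such as window widths are discrete.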
The performance of the strategies was better in the in-sample period than in the out-of-sample period. Although the main goal was to introduce and compare optimization methods, it is worth pointing out the difference between the in-sample and out-of-sample performance of the strategies. The strategies optimized in the in-sample period by the different methods incurred losses in the out-of-sample period in two out of three cases (
The situation was slightly different for the out-of-sample results of commodity futures trading
To sum up, the presented results seem to be consistent with the main hypothesis. The machine learning methods required much less time than the Exhaustive Search and produced similar results in the considered cases. Consequently, the main hypothesis was not rejected: the machine learning methods reached only a slightly worse in-sample optimization criterion, in a significantly lower execution time. The additional research question, whether the machine learning methods lead to a lower risk of overfitting, could not be answered based on the results presented in this paper. In two scenarios, the machine learning methods selected strategies very similar to the optimal one. In the last case, however, the methods selected a strategy that was worse in the in-sample period, yet it generated a profit in the out-of-sample period, while the one obtained by the ES resulted in a loss of the invested capital. The overfitting reduction was observed in only one case, so no firm conclusions can be drawn from it.