The support vector method was developed by Vladimir Vapnik in 1995 and was first applied to text classification by Joachims in 1998. In its original form, the algorithm solved the problem of distinguishing objects of two classes. The method has gained immense popularity due to its high efficiency, and many researchers have used it in their work on text classification. The approach proposed by Vapnik for determining which of two predefined classes a sample belongs to adheres to the principle of structural risk minimization.
The results of text classification using the support vector method are among the best in comparison with other machine learning approaches. However, the learning speed of this algorithm is one of the lowest: the SVM method requires a large amount of memory and places a significant computational load on the computer during training. Summing up, its simplicity combined with state-of-the-art performance on many learning problems (classification, regression, and novelty detection) has contributed to the popularity of SVM.
SVM is also quite a popular algorithm for building trading systems. It has been used mostly to predict whether a stock or index price will go up or down. However, hardly any papers reveal how SVM performs on the cryptocurrency market, where asset price volatility is much higher than on traditional markets.
The goal of this paper is to apply the SVM algorithm to build an investment strategy for the cryptocurrency market and investigate its profitability. The research hypothesis is that the strategy based on the SVM algorithm is able to outperform the benchmark strategies in terms of return-risk relation. Similar to Ślepaczuk
The main idea and methodology concepts were adopted from the research paper ‘Nonlinear support vector machines can systematically identify stocks with high and low future returns’ by Huerta
SVM is implemented to build a trading strategy in the following way. The training set is a tail set, defined as the group of coins whose volatility-adjusted price change is in the highest or the lowest quantile, for example, the highest and the lowest 25 coins. Each coin is represented by a set of technical features. A classifier is trained on historical tail sets and tested on the current data. The classifier is chosen to be a nonlinear support vector machine. The SVM is trained once per reallocation period: if the portfolio is reallocated once per week, the SVM is trained once per week accordingly. The portfolio is formed by ranking coins using the classifier output. The highest-ranked coins are used for long positions, and the lowest-ranked can potentially be used for short sales. The data cover the period from 01/01/2015 to 08/01/2018.
The structure of the paper can be summarized as follows. After the literature review in the theoretical part, a short introduction to support vector machines is provided, covering three related concepts: the maximal margin classifier, the support vector classifier, and the support vector machine. The second section focuses on data, methodology, and strategy implementation. The third part provides empirical results in comparison with benchmark investment strategies. Sensitivity analysis is the final part of the thesis.
The theoretical background for SVM is based mainly on James
There are many methods in machine learning (e.g., neural networks) that might work as well as SVM, but the simplicity of the mathematical functions and the theory that frames training of the model as a convex optimization problem (Boyd and Vandenberghe, 2004) make SVMs a good option. An important trait of convex optimization problems is the guarantee that there is only one optimal model to fit the data.
In the previous literature, the application of SVMs to financial data has mostly been dedicated to predicting the future direction of a stock price index. For example, the study of Kim (2003) examines the feasibility of applying SVM in financial forecasting by comparing it with back-propagation neural networks and case-based reasoning. The experimental results showed that SVM provides a promising alternative for stock market prediction.
Another example is the paper of Van Gestel
Additional examples of SVM regression for futures index prediction are found in Tay and Cao (2001, 2002) and Cao and Tay (2003), where it was also shown that SVMs provide a promising alternative to neural networks for financial time series forecasting. As demonstrated in their experiments, the SVM forecasts significantly outperformed the BP network on the CME-SP, CBOT-US, CBOT-BO and MATIF-CAC40 futures and were slightly better on the EUREX-BUND.
In the work of Huang
Kim (2003) also used SVM as a classification method to predict the direction of the market’s movement. The paper emphasized the importance of the meta parameter assumptions and how the prediction performances of SVMs are sensitive to the value of these parameters.
Huang
The most related work that contributed to the idea of this paper is Huerta
Another paper that contributed to the methodological and strategy implementation part is the work of Kość
Regarding the application of SVMs to the cryptocurrency market, research is far less extensive than that for the stock market. For example, Chen
Many of the published papers using SVMs for financial data mention the influence of meta-parameter selection on the performance of the model. It is a key issue to make sure that the selection of meta parameters is free from forward-looking bias; in order to avoid this widespread problem, the meta parameters must be chosen based only on historical information.
The choice between linear and nonlinear SVMs has always been an important point of discussion. Linear SVMs are fast to train and run, but they tend to underperform on complex datasets with many training examples. As was proven in Huerta
The support vector machine (SVM) is one of the most popular algorithms for classification. It is based on the assumption that in a multidimensional space there exists a hyperplane that can separate the data into classes.
SVM is a generalization of a simple classifier called the maximal margin classifier. This linear classifier can only be applied to data sets that are linearly separable; unfortunately, most data sets have classes that cannot be separated by a linear boundary. The maximal margin classifier was therefore extended to the support vector classifier, which can be applied to a wider range of data sets. Support vector machines are a further extension of the support vector classifier and can be applied to data with non-linear class boundaries.
If we deal with non-linear class boundaries, the problem can be solved by enlarging the feature space using quadratic, cubic or higher-order polynomial functions of the features. For example, rather than applying a support vector classifier with
The optimization problem takes the following form: Maximize
So, SVM is an extension of the support vector classifier obtained by enlarging the feature space using kernels, a particularly efficient computational approach. A kernel is a function that quantifies the similarity of a pair of observations. For example, one may consider the following expression:
which simply returns the support vector classifier. Equation (2.3) is called a linear kernel because the support vector classifier is linear in the predictors; the linear kernel essentially quantifies the similarity of two observations using the Pearson correlation. However, it is possible to choose another form for (2.3). For example, one may substitute every output of
The expression stated above is a polynomial kernel with degree
where
K(x, xi) is the kernel function, which takes two vectors as inputs and produces a single scalar value.
The polynomial kernel outlined in (2.4) is just one instance of a possible non-linear kernel. There are ample alternatives; one very widespread alternative is the radial kernel, which has the following form:
In (2.6), γ (gamma) is a tuning meta parameter that is a positive constant. Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’.
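A minimal sketch of the radial kernel in (2.6) may make the role of gamma concrete (the function name is an assumption; it is not part of the thesis):

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """Radial kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2), gamma > 0."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```

Identical observations give a similarity of 1, and a larger gamma makes the similarity decay faster with distance, which is exactly the 'close' influence described above.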
Fig. 2.1 demonstrates two examples of non-linear data where SVM is run with a polynomial kernel and with a radial kernel. In both cases, either kernel is able to capture the decision boundary; however, the SVM with a radial kernel demonstrates a far better fit.
The advantages of the SVM model are as follows: it is very efficient when groups are fully or almost fully separable, can work very well when there are more independent variables than observations, can be adjusted to work well with unbalanced datasets, has only a few parameters to tune, and is partially immune to outliers. The disadvantages are that SVM is very slow for a large number of observations and is not the most efficient when separation is low. The type of SVM used in this paper is the non-linear radial kernel.
The data were downloaded from
As of 05/08/2018, the number of all listed cryptos totalled 1732, while as of 01/12/2017 there were ~1500. Thus, despite the current downturn in the cryptocurrency market, the number of cryptocurrencies is constantly growing.
The total market capitalization of the cryptocurrency market reached its peak in January 2018, totalling almost 800 billion USD, which is equal to ~2.5% of the S&P 500 total market cap. Estimates are based on the data provided by www.coinmarketcap and the World Bank: data.worldbank.org/indicator/CM.MKT.TRAD.CD. The USA stock market cap for 2017 equals 32.121 trillion USD.
As can be noticed in Fig. 3.1, until March 2017 the market was dominated by Bitcoin (90% of the total market cap of the cryptocurrency market). In 2017–2018, Bitcoin lost its position and allowed other assets to share the market. As of 05/08/2018, the share of Bitcoin comprised 47.44% of the total market, Ethereum 16.21%, Ripple 6.66%, and others 19.69%.
The 14-day moving average of the volume was calculated for each asset, and those which did not meet the filter threshold of 100 USD were excluded from further usage. This filtering ensures that the investment portfolio meets a minimum liquidity requirement. An additional filter is applied regarding the price history of an asset: only those cryptocurrencies that have 91 days or more of close-price history qualify for the data set.
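The two filters can be sketched as follows (a Python illustration under the stated thresholds; the data layout and all names are assumptions, as the thesis' code is written in R):

```python
def liquidity_filter(volumes, closes, vol_threshold=100.0, min_history=91):
    """Apply the liquidity and price-history filters described above.

    volumes: dict mapping asset -> list of daily USD volumes (most recent last)
    closes:  dict mapping asset -> list of daily close prices
    Keeps assets whose latest 14-day moving average of volume exceeds
    vol_threshold and which have at least min_history days of close prices.
    """
    kept = []
    for asset, vol in volumes.items():
        if len(vol) < 14:
            continue                            # not enough data for the 14-day MA
        ma14 = sum(vol[-14:]) / 14.0            # 14-day moving average of volume
        if ma14 > vol_threshold and len(closes.get(asset, [])) >= min_history:
            kept.append(asset)
    return kept
```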
After that, a ranked set of the 100 cryptocurrencies with the largest market cap was created. This set will be referred to further in the thesis as the Top100.
The starting date for the simulation period was set to 01/10/2014 so as to ensure the availability of at least 3 months of historical data to create a training set and start running the strategy on 01/01/2015. The Top100 ranking of the 100 largest cryptocurrencies was calculated for each day of the simulation. From the set of 1438 unique assets, only 475 qualified to enter the Top100 ranking for at least one day while also satisfying the condition that the 14-day moving average of daily volume be higher than 100 USD.
The following measures were used to provide descriptive statistics for the data; these measures were further applied in the evaluation of the portfolios' efficiency:
– ARC (the annualized rate of change) of the price series P;
– ASD (the annualized standard deviation of daily returns);
– MDD (the maximum drawdown coefficient);
– IR1, IR2 (the information ratio coefficients), which quantify the risk-weighted gain.
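The exact formulas (3.1)–(3.5) are not reproduced in the text, so the sketch below assumes standard definitions: ARC as the compound annual growth rate (365-day years, since crypto trades daily), ASD as the daily standard deviation scaled by sqrt(365), MDD as the maximum peak-to-trough loss, IR1 = ARC/ASD, and IR2 = ARC^2 * sign(ARC) / (ASD * MDD). These assumptions appear consistent with the IR values reported in Tab. 3.1.

```python
import numpy as np

def performance_stats(equity):
    """Performance measures for a daily equity curve (assumed reconstruction
    of (3.1)-(3.5); see the lead-in for the definitions used)."""
    equity = np.asarray(equity, dtype=float)
    t = len(equity) - 1                                  # number of daily periods
    ret = equity[1:] / equity[:-1] - 1                   # discrete daily returns
    arc = (equity[-1] / equity[0]) ** (365.0 / t) - 1    # annualized rate of change
    asd = ret.std(ddof=0) * np.sqrt(365.0)               # annualized std. deviation
    peak = np.maximum.accumulate(equity)
    mdd = ((peak - equity) / peak).max()                 # maximum drawdown
    ir1 = arc / asd
    ir2 = arc ** 2 * np.sign(arc) / (asd * mdd) if mdd > 0 else np.nan
    return {"ARC": arc, "ASD": asd, "MDD": mdd, "IR1": ir1, "IR2": ir2}
```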
Tab. 3.1 presents descriptive statistics for the 10 largest and 10 smallest cryptocurrencies as of 01/08/2018. As can be seen from Tab. 3.1, the values of %ARC, %ASD, IR1, and IR2 differ hugely across the cryptocurrencies. Quite a few new assets, such as loom-network, cybermiles, nuls, bibox-token, and odem, appeared just a couple of months earlier but had already taken positions in the Top100. For example, odem behaves as an investment 'star', demonstrating an abnormal return and the lowest drawdown. This crypto was built on the Ethereum blockchain and stands for On-Demand Education Marketplace.
Descriptive statistics for the 10 largest and 10 smallest cryptocurrencies by MarketCap in TOP100 as of 01-08-2018
The largest 10 cryptocurrencies in TOP100 as of 01-08-2018

Name | %ARC | %ASD | %MDD | IR1 | IR2 | Date of start | Volume, mUSD | MarketCap, USD
---|---|---|---|---|---|---|---|---
bitcoin | 118 | 75.8 | 69.7 | 1.6 | 2.6 | 01-10-2014 | 1888 | 43839225862 |
ethereum | 437.7 | 145.2 | 84.3 | 3 | 15.7 | 20-08-2015 | 323 | 17124399552 |
ripple | 226.7 | 164.6 | 87.1 | 1.4 | 3.6 | 01-10-2014 | 499 | 13468236361 |
bitcoin-cash | 83.5 | 198.5 | 84.4 | 0.4 | 0.4 | 05-08-2017 | 699 | 6672807179 |
eos | 516 | 253 | 87.9 | 2 | 12 | 14-07-2017 | 78 | 5220519698 |
stellar | 228.6 | 178.8 | 82.6 | 1.3 | 3.5 | 01-10-2014 | 301 | 4634665748 |
litecoin | 111.1 | 119.2 | 79.1 | 0.9 | 1.3 | 01-10-2014 | 80 | 3709789644 |
cardano | 698.6 | 263.8 | 89.3 | 2.6 | 20.7 | 14-10-2017 | 32 | 2624893338 |
iota | 48.4 | 188.4 | 82.9 | 0.3 | 0.1 | 26-06-2017 | 3059 | 2460207729 |
tether | -5.4 | 45.7 | 49.9 | -0.1 | 0 | 15-03-2015 | 140 | 2233258238 |
loom-network | 649.7 | 217.4 | 80 | 3 | 24.3 | 21-04-2018 | 2.8 | 98040413 |
gas | 365.8 | 265.4 | 89.6 | 1.4 | 5.6 | 09-08-2017 | 2.6 | 91875052 |
tenx | -97.6 | 247 | 99.2 | -0.4 | -0.4 | 10-07-2017 | 8.1 | 91154578 |
nxt | 34.2 | 158.1 | 95.6 | 0.2 | 0.1 | 01-10-2014 | 2.8 | 90165499 |
cybermiles | -44.8 | 202.2 | 87.4 | -0.2 | -0.1 | 04-05-2018 | 7.1 | 88375828 |
nuls | 5083.5 | 382.2 | 79.1 | 13.3 | 854.8 | 22-03-2018 | 4.1 | 88067102 |
byteball | 160 | 236.8 | 91.2 | 0.7 | 1.2 | 09-01-2017 | 0.56 | 86950232 |
bibox-token | 53.2 | 265.7 | 87.9 | 0.2 | 0.1 | 08-06-2018 | 67.5 | 83456610 |
odem | 89471 | 229.6 | 40.7 | 389.7 | 856638 | 01-08-2018 | 0.135 | 82906522 |
electroneum | -92.5 | 243 | 95.3 | -0.4 | -0.4 | 15-11-2017 | 0.552 | 81946739 |
Legend: %ARC – annualized rate of return as in (3.1), %ASD – annualized standard deviation in percent as in (3.2), %MDD – maximum drawdown of capital in percent as in (3.3), IR1, IR2 – information ratios as in (3.4) and (3.5) accordingly, 'Date of start' – date of first appearance in Top100.
The values for the maximum drawdown are also relatively large. If odem is not taken into consideration, the other cryptocurrencies in the set have %MDD in the range from 50% to 99%. For comparison, the S&P500 index noted only a ~14% drawdown over the same simulation horizon (see Tab. 4.1 in the next section).
One of the key issues in this investigation is how to form the tail sets that constitute the positive and negative classes of the training data. The chosen metric is the return divided by a volatility estimate, which creates an ordered list of coins by volatility-adjusted return. The volatility is estimated here by an exponential moving average:
The volatility-adjusted return is the ratio of the daily return and the estimation of the volatility according to formula 3.6:
where Rt is a daily return,
It was shown in Huffman
In order to run the calculations for each reallocation period, the volatility-adjusted returns over the 3-month period (91 days) were calculated for the assets in the Top100 for each day. The long data were filtered to ensure the presence of 91 days of history. Then, for each day in this 91-day range, discrete daily returns were calculated. Using these daily returns, the volatility estimate (an exponential moving average) was computed according to the algorithm presented in (3.6).
As can be seen, the first sigma is 0. Lambda was set to 0.94, the value most frequently used for exponential moving average computations. Then the volatility-adjusted returns are calculated according to (3.7). As a result, the volatility-adjusted returns over the 3-month period (91 days) are obtained for each day and each coin in the Top100.
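Since formulas (3.6) and (3.7) are not reproduced above, the following sketch assumes the RiskMetrics-style recursion that matches the description (sigma_0 = 0, lambda = 0.94); whether the return is divided by sigma of the previous or the current day is likewise an assumption:

```python
import numpy as np

def vol_adjusted_returns(returns, lam=0.94):
    """Volatility-adjusted daily returns (assumed reconstruction of (3.6)-(3.7)).

    EWMA variance update: sigma_t^2 = lam * sigma_{t-1}^2 + (1 - lam) * R_t^2,
    with sigma_0 = 0 and lam = 0.94 as stated in the text. Each return is
    divided by the previous day's volatility estimate; undefined values
    (while sigma is still zero) are reported as NaN.
    """
    returns = np.asarray(returns, dtype=float)
    var = 0.0                                    # sigma_0 = 0
    adj = np.full(returns.shape, np.nan)
    for t, r in enumerate(returns):
        sigma_prev = np.sqrt(var)
        if sigma_prev > 0:
            adj[t] = r / sigma_prev              # volatility-adjusted return
        var = lam * var + (1.0 - lam) * r ** 2   # EWMA variance update
    return adj
```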
For illustration purposes, the distribution of volatility-adjusted returns with 3-month history from 01/10/2014 to 31/12/2014 is shown in Fig. 3.2. As an example, the positive tail set contains the B most positive volatility-adjusted returns and the negative tail set the B most negative; B can be 5, 10 or 15 coins in the data set. The values for the training size will be provided later in this chapter as part of the sensitivity analysis. The vertical lines represent the cut-offs indicating how many assets will be used for SVM training; the + and − regions are the ones used for that. In such a way, to train the SVM for 01/01/2015, 3-month data with volatility-adjusted returns from 01/10/2014 to 31/12/2014 will be used; for 02/01/2015, data from 02/10/2014 to 01/01/2015.
Further in the paper, TS denotes the length of the training set, in other words, the number of volatility-adjusted returns used in designing the training data sets.
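The tail-set construction can be sketched as follows (a minimal Python illustration; the thesis' implementation is in R, and the function and argument names are assumptions):

```python
def tail_sets(vol_adj_returns, ts):
    """Form the two training classes from volatility-adjusted returns.

    vol_adj_returns: dict mapping coin -> volatility-adjusted return
    ts: tail size, e.g. 25 coins per tail
    Returns the '+' class (ts highest values) and the '-' class (ts lowest).
    """
    ranked = sorted(vol_adj_returns, key=vol_adj_returns.get, reverse=True)
    return ranked[:ts], ranked[-ts:]
```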
Each coin on day
Technical indicators used for the creation of feature set to train SVMs
Feature | Full name | Parameters |
---|---|---|
MOM, n days | Momentum for close prices, n days | n = 10 days |
ΔV, n days | Volume change n days | n = 10 days |
RSI | Relative Strength Index | n = 10 days |
FI | Force Index | N/A |
Williams %R | Williams Percent Range | n = 10 days |
PSAR | Parabolic stop and reversal system | Acceleration factor by default set to 2% increasing by 2% with a maximum of 20% |
Note: the table contains the list of six technical features used for running SVM and parameters set for calculations purposes.
Momentum has been one of the well-recognized phenomena described in the academic literature; see Jegadeesh et al. (2012) and Rouwenhorst (2002). They stated that stocks with high (low) returns over periods of three to 12 months keep having high (low) returns over subsequent three- to 12-month periods. Therefore, momentum for the close prices of cryptocurrencies was included in the feature set.
Another feature, the volume change, is an indicator capturing underreactions and overreactions in stock price movements. If a price movement happens with large volume, the price change is more significant than if it occurs with low volume; see Chordia et al. (2002). To capture this effect, the percentage change of the daily trading volume was included in the feature data set.
RSI is a momentum indicator that captures the magnitude of recent price changes in order to estimate whether the market is overbought or oversold. RSI is calculated as follows:
where RS is the average gain of up-trending periods during a certain time period divided by the average loss of down-trending periods during the same period. In such a way, RSI provides a relative estimation of the strength of an asset's recent price performance. RSI outcomes range from 0 to 100. The default time period for relating up-trending to down-trending periods is two weeks. The traditional interpretation is that an RSI of 70 or above shows that an asset is getting overbought or overvalued, while an RSI of 30 or below points to oversold or undervalued conditions. Sudden significant price changes may produce false buy or sell signals; therefore, it is better to use RSI with amendments to its application or together with other reliable technical indicators.
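A minimal sketch of the calculation with simple n-period averages (the thesis uses TTR's R implementation, which applies smoothing, so values may differ slightly; the standard identity RSI = 100 - 100 / (1 + RS) is assumed):

```python
import numpy as np

def rsi(close, n=10):
    """Relative Strength Index over the last n periods (n = 10 as in Tab. 3.2)."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)[-n:]             # last n daily price changes
    gains = delta[delta > 0].sum() / n      # average gain of up periods
    losses = -delta[delta < 0].sum() / n    # average loss of down periods
    if losses == 0:
        return 100.0                        # no down moves: maximally overbought
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)
```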
FI captures the market power behind a change in the asset price. FI's value may be either positive or negative, depending on the upward or downward change in the asset price. The three inputs required for the formula are the close price, the open price, and the trading volume. Analysts often use FI along with a moving average to make predictions of an asset's future performance. The formula for FI is as follows:
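Since the FI formula itself is not reproduced above, the sketch below is an assumption based on the three inputs named in the text; note that the classic Force Index uses the previous close rather than the open:

```python
def force_index(close, open_, volume):
    """Force Index sketch: price change over the day scaled by volume.
    FI = (Close - Open) * Volume is assumed from the inputs named in the text;
    its sign follows the direction of the price move."""
    return (close - open_) * volume
```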
Williams%R is a type of momentum indicator that ranges between 0 and –100 and measures overbought and oversold market conditions. The Williams%R is commonly applied to define entry and exit points for trading. It can be treated as a technical analysis oscillator. It compares an asset’s close price to the high-low range over a specific period, by default over 14 days. The formula to calculate this indicator is as follows:
The Williams%R became popular because of its ability to signal market swings one or two periods in the future. Predictions of market reversals are very valuable for market participants. An asset is considered overbought when the indicator is above –20 and oversold when the indicator is below –80; overbought and oversold periods can persist should the price keep rising or falling.
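A sketch of the calculation with n = 10 as in Tab. 3.2 (the standard formula %R = (HighestHigh - Close) / (HighestHigh - LowestLow) * -100 is assumed):

```python
import numpy as np

def williams_r(high, low, close, n=10):
    """Williams %R over the last n periods, ranging from 0 (close at the
    period high) to -100 (close at the period low)."""
    hh = float(np.max(high[-n:]))           # highest high of the window
    ll = float(np.min(low[-n:]))            # lowest low of the window
    return (hh - close) / (hh - ll) * -100.0
```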
The parabolic SAR is a technical indicator applied to define the price direction of an asset, as well as to measure how the price direction is changing. It is generally used to put trailing price stops; thus, it may be treated as a stop-loss system.
PSAR is calculated independently for each trend in the price. When the price is in an uptrend, PSAR goes below the price and converges upwards towards it. By the same logic, on a downtrend, PSAR goes above the price and converges downwards. At each step within a trend, PSAR is calculated one period in advance. PSAR value for tomorrow is calculated using data available today. PSAR values are calculated as follows:
where EP is the highest high of the current uptrend or the lowest low of the current downtrend, updated each time a new EP is achieved; AF starts at the default of 2% and increases by 2% each time a new EP is achieved, with a maximum of 20%.
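The recursion described above, SAR(t+1) = SAR(t) + AF * (EP - SAR(t)), can be sketched for a single uptrend (a simplification; complete implementations such as TTR's SAR, used in the thesis, also handle trend reversals):

```python
def psar_uptrend(high, sar0, af0=0.02, af_step=0.02, af_max=0.20):
    """Parabolic SAR for one uptrend (simplified sketch).

    high: series of daily highs; sar0: initial SAR below the price.
    Each step applies SAR_{t+1} = SAR_t + AF * (EP - SAR_t), where EP is the
    highest high so far and AF grows by af_step at each new EP, capped at af_max.
    """
    sar, ep, af = sar0, high[0], af0
    out = [sar]
    for h in high[1:]:
        sar = sar + af * (ep - sar)         # tomorrow's SAR from today's data
        if h > ep:                          # new extreme point reached
            ep = h
            af = min(af + af_step, af_max)  # accelerate, capped at af_max
        out.append(sar)
    return out
```

In an uptrend the SAR stays below the price and converges upwards towards it, as the text describes.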
In the work of Pistole (2010), moving average rules, the RSI method and the PSAR technique were compared with the buy&hold strategy for the S&P500 index. Interestingly, the PSAR indicator was far more successful in positioning long or short on the S&P500 index: results revealed that by applying PSAR as the buy and sell signal, the strategy significantly outperforms the market, as well as buy&hold and the other strategies discussed in that paper.
The six technical indicators mentioned above were included in the feature set mostly on the basis of the academic literature. All the indicators were calculated in R; RSI, SAR, and Williams%R were implemented using the corresponding built-in functions from the TTR package.
The data sets that should be prepared in advance are long data containing all the information available and the Top100 market cap ranking. These data sets will be used for running the SVM strategy. A step-by-step loop was written in R to implement the strategy.
Volatility-adjusted returns were calculated for the assets in Top100 for the 3 previous months for each day, more specifically, over the period from date
The length of the training set is introduced as the parameter TS, which denotes the number of volatility-adjusted returns used to construct the training set. For example, TS at the level of 25 means that the 25 coins with the highest values and the 25 coins with the lowest values of volatility-adjusted returns are taken to form the training data. The assets with the highest values form the class + and the assets with the lowest values form the class −. Then, for these 50 coins, the six technical features (MOM, ΔV, FI, PSAR, Will%R, RSI) are calculated for each day over the period of the last 91 days. The period over which the training set is constructed to perform one SVM test on the reallocation day (date[
When the training set is ready, SVM is applied in order to tune the meta parameters C (cost) and γ (gamma). The best C and γ are used later to test the SVM for the assets on the reallocation day. The choice of meta parameters is explained in section 3.7.
The first reallocation day is assumed to be 01/01/2015. The reallocation day is the day on which the portfolio composition is changed. The change depends on the SVM output. The assets with the greatest values of the SVM output are included in the portfolio on the reallocation day. SVM output is the value provided by the function as in (2.5).
On the reallocation day, we liquidate the assets that are not recommended by SVM output and they leave the portfolio. The assets that remain in the portfolio have their weights reallocated. The time period which is between two sequential reallocation days is called the reallocation period (RE).
Then the testing set is prepared. SVM is tested on the data on the date[
To summarize, the training protocol is such that SVM is trained over tail sets for the period of time [
In order to estimate the portfolio performance, the following methodology was used. The gross rate of return
where
The weights are being calculated according to the following formula
It is assumed that
The portfolio composition changes on the reallocation day.
In order to understand the logic of the formula for the portfolio turnover ratio, three cases should be considered: assets leave the ranking, assets enter the portfolio, and assets remain in the portfolio but with new weights. To account for these changes in the portfolio composition, the turnover ratio was calculated as follows:
The above value can range from zero (the composition of the portfolio is unchanged compared to the previous reallocation day) to 200% (the composition of the portfolio is entirely changed; all assets left the portfolio and new ones entered).
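A sketch consistent with the stated 0–200% range (an assumed reconstruction, as the turnover formula itself is not reproduced above): the turnover is taken as the sum of absolute weight changes over all assets involved.

```python
def turnover(weights_prev, weights_new):
    """Portfolio turnover ratio in percent across a reallocation day.

    weights_prev, weights_new: dicts mapping asset -> portfolio weight.
    Assets that leave or enter contribute their full weight; assets that stay
    contribute the absolute change of their weight.
    """
    assets = set(weights_prev) | set(weights_new)
    return 100.0 * sum(abs(weights_new.get(a, 0.0) - weights_prev.get(a, 0.0))
                       for a in assets)
```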
Transaction costs on the cryptocurrency markets can be between 0.2% and 2.0% of the transaction value, depending on the asset's type and liquidity. Transaction costs for the base and the benchmark portfolios were assumed to be 1%. To provide a fair calculation of the total portfolio reallocation cost, the product of the portfolio turnover ratio
The implementation of the portfolio loop in R was done with the package PerformanceAnalytics, whose vignette gives simple examples of computing portfolio returns using asset prices as well as a weights framework.
To construct the benchmark portfolios, the Top100 market cap ranking was used. To estimate the efficiency of the main SVM strategy, similarly as in Kość
The benchmark equally weighted portfolio (further denoted as EqW) is constructed as an investment with equal weights in all cryptocurrencies qualified for the Top100 on the reallocation day. For the base case, the reallocation period is set to one week and transaction costs constitute 1% of the portfolio value. These assumptions are the same as for the SVM strategy.
The benchmark market-cap weighted portfolio (further denoted as McW) is built as an investment with market-cap weights in cryptocurrencies, which are qualified for Top100 on the reallocation day. It means that on reallocation day, the investment in an asset takes the weight as the ratio of the market capitalization of an asset to the whole market capitalization for all the assets in Top100. The reallocation period and the transaction costs are the same as for the equally-weighted portfolio.
The buy&hold strategy is also considered as a benchmark. To get full insight into the performance of all the portfolios, two buy&hold strategies were run: one for bitcoin (BTC B&H) and one for the S&P500 index (S&P B&H). It is worth noting that the buy&hold strategy on the S&P500 index is a widely used benchmark for comparing portfolios. Both buy&hold strategies have the same simulation period as the former benchmark strategies.
The C (cost) and γ (gamma) values are the meta-parameters of the SVM model. One of the problems that frequently arises is overfitting, which is quite common in machine learning (Cawley
High values of C cause the cost of misclassification to be large; therefore, the SVM is forced to classify the input data more severely and the problem of overfitting may arise. Small values of C mean lower variance and higher bias: they make the cost of misclassification low, thus providing more 'space' for the model to make a mistake by misclassifying a case. The objective is to find the balance between 'not too severe' and 'not too relaxed'.
When the value of gamma is small, the constraint on the model is too strong and it cannot capture the complexity or curvature of the input data. In other words, gamma determines how strong the influence of a single training observation is.
As in a standard classification problem, the dataset is divided into mutually exclusive training and testing sets. Further, in order to tune the meta parameters, the training set is split again into a (second) training set and a validation set. The visualization of the data set splits is presented in Fig. 3.3. Only the second training set and the validation set are used to perform the tuning.
The partition of the training set into training and validation sets is performed with the help of a sampling method; potentially, there can be a number of partitions into validation and training sets over which the tuning is performed.
In the package e1071
As SVM here solves a standard classification task with only two parameters, the grid search method is quite effective. To conduct the grid search over the parameters, the function tune.svm() from the package e1071 was used. A sequence of parameters for cost and gamma was created as the vector (0.5, 1, 2, 4). Each pair of parameters from this sequence is tested, and the values of cost and gamma providing the lowest prediction error on the validation subsets are chosen.
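The thesis performs this tuning with tune.svm() in R; the generic logic can be sketched in Python as follows (train_fn and error_fn are hypothetical stand-ins for SVM fitting and validation-error scoring, not part of the thesis' code):

```python
from itertools import product

def grid_search(train_fn, error_fn, train, valid,
                costs=(0.5, 1, 2, 4), gammas=(0.5, 1, 2, 4)):
    """Pick the (cost, gamma) pair minimizing validation error, mirroring the
    logic behind e1071's tune.svm() over the grid stated in the text."""
    best = None
    for c, g in product(costs, gammas):
        model = train_fn(train, c, g)        # fit a candidate SVM
        err = error_fn(model, valid)         # error on the validation set
        if best is None or err < best[0]:
            best = (err, {"cost": c, "gamma": g})
    return best[1]
```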
In such a way, every time SVM is tested on the reallocation date[
This section summarizes step-by-step actions to implement the SVM strategy.
Here are the one-time actions conducted before running the loop to generate strategy results:
Web-scraping: the data were scraped from the website
Filtering: the 14-day moving average of the volume was calculated for each asset, and those that did not meet the filter threshold of 100 USD were excluded from further usage. Additionally, only those cryptocurrencies that have 91 days or more of history for the open price qualified to constitute the long data set.
Top100 ranking: a ranked set of 100 cryptocurrencies by the largest market cap was created for each day for the whole long data set.
A step-by-step loop written in R to implement the strategy is run as many times as the number of reallocation days in the period from 01/01/2015 to 01/08/2018. Step-by-step actions in the loop are as follows:
Preparation of the training set: class for SVM. On date[
Preparation of the training set: technical features. We select the assets with the highest and the lowest volatility-adjusted returns from the spectrum defined by the %TS assumption and then calculate for them the six technical features (MOM, ΔV, FI, PSAR, Will%R, RSI) for each day over the period of the last 91 days.
Meta-parameters tuning: SVM is applied to the training set in order to tune the meta parameters C (cost) and γ (gamma). The best C and γ are used later to test SVM to predict the class for the assets in the testing set on the reallocation day.
Preparation of the testing set: On date[
SVM is run using prepared training and testing sets. The output of the SVM for 100 assets from Top100 on the date[
‘Buy’ candidates are then kept in the portfolio for the reallocation period (for example, it is 1 week for the base case).
Calculation of the net portfolio value for one reallocation period taking into consideration transaction costs.
Once the loop is finished, performance statistics are calculated for the net value of the portfolio. In such a way, in every reallocation period, a fresh SVM model is trained and tested with its own optimal meta parameters.
The parameters that are deemed to be fixed in the model are as follows:
the number of periods over which volatility-adjusted returns are calculated from daily returns
lambda λ (set to 0.94) used to calculate the exponential moving average of returns
length of historical data taken to calculate technical features described in Tab. 3.2
meta-parameters C and γ
length of the training data, set to 91 days
the long-positions-only assumption.
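The volatility adjustment with λ = 0.94 presumably follows the standard RiskMetrics-style exponentially weighted scheme; a sketch under that assumption (the exact normalization used by the author is not stated):

```python
import numpy as np

def vol_adjusted_return(returns, lam=0.94):
    """Divide the most recent daily return by its exponentially
    weighted volatility (RiskMetrics-style recursion, lambda = 0.94).
    'returns' is a 1-D array of daily returns."""
    var = returns[0] ** 2
    for r in returns[1:]:
        # EWMA update: old variance decays by lambda, new squared
        # return enters with weight (1 - lambda)
        var = lam * var + (1.0 - lam) * r ** 2
    return returns[-1] / np.sqrt(var)
```

With λ = 0.94 the effective memory of the variance estimate is roughly 1/(1 − λ) ≈ 17 trading days.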
Four parameters were chosen to participate in the sensitivity analysis. These parameters are:
the number of cryptocurrencies kept in the portfolio (N)
reallocation period (RE)
the percentage value of the transaction costs (%TC)
training data size (%TS).
The performance statistics for SVM and benchmark strategies are presented in Tab. 4.1.
Descriptive statistics of the SVM strategy compared with the benchmark strategies
Strategy | N | RE | %TC | V | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---|---|---|---|---
S&P B&H | - | - | - | - | 13.6 | 15.5 | 14.2 | 0.9 | 0.8 | -
BTC B&H | - | - | - | - | 147.4 | 76.8 | 69.7 | 1.9 | 4.1 | -
EqW | 100 | 1w | 1 | 100 | 425.8 | 96.2 | 81.7 | 4.4 | 23.1 | 10.8
McW | 100 | 1w | 1 | 100 | 141.9 | 74.9 | 73.1 | 1.9 | 3.7 | 6.3
SVM | 25 | 1w | 1 | 100 | 173.6 | 103.1 | 83.1 | 1.7 | 3.5 | 143.7
Legend: McW – market cap weighted strategy, EqW – equally weighted strategy, N – number of currencies to be invested/used to construct portfolio, RE – the width of the reallocation period between the portfolio reallocation days, %TC – the total transaction costs taken as the percentage of the total transaction value of the portfolio, V – the threshold value (USD) of the 14-day moving average of daily volume, %ARC − annualized rate of return, %ASD − annualized standard deviation in percent, %MDD − maximum drawdown of capital in percent, IR1, IR2 − information ratios, %MT – the mean portfolio turnover ratio in percent.
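The text does not define IR1 and IR2 explicitly, but every row of the tables is consistent with the definitions used in related work (e.g. Ślepaczuk): IR1 = %ARC/%ASD and IR2 = %ARC² · sign(%ARC)/(%ASD · %MDD). A quick check under that inferred assumption:

```python
def ir1(arc, asd):
    # IR1: annualized return per unit of annualized volatility
    return arc / asd

def ir2(arc, asd, mdd):
    # IR2: IR1 additionally penalized by maximum drawdown,
    # keeping the sign of the annualized return
    return arc * abs(arc) / (asd * mdd)

# Reproducing the BTC B&H row: %ARC = 147.4, %ASD = 76.8, %MDD = 69.7
print(round(ir1(147.4, 76.8), 1))        # 1.9
print(round(ir2(147.4, 76.8, 69.7), 1))  # 4.1
```

The same formulas reproduce the reported ratios of the other rows (e.g. EqW: 425.8/96.2 ≈ 4.4 and 425.8²/(96.2 · 81.7) ≈ 23.1).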
The buy&hold strategy on bitcoin (BTC B&H) demonstrates a more than 10 times larger %ARC than buy&hold on the S&P500 index (S&P B&H); however, both the risk and the maximum drawdown, in terms of %ASD and %MDD, are approximately 5 times larger. The resulting values of IR1 and IR2 are, respectively, 2 and 5 times larger for BTC B&H.
%ARC of BTC B&H is higher than that of the market cap weighted strategy (McW), although the difference is relatively small. This can be explained intuitively, as the predominant part of the McW portfolio consists of bitcoin.
The values of the information ratios IR1/IR2 are also comparable, which means that the amount of return per unit of risk is about the same, again because bitcoin dominates the McW portfolio.
EqW outperforms all the other benchmark strategies and also the SVM strategy: it gives the highest values of %ARC and IR1/IR2, demonstrating abnormal returns. Sorting the portfolios by IR1, EqW comes first, with an IR1 more than twice that of SVM and of the other benchmark strategies. The remaining strategies rank as follows by IR1: BTC B&H, McW and SVM, which takes fourth place, outperforming only S&P B&H.
The main hypothesis that the investment strategy based on the SVM algorithm outperforms benchmark strategies can be rejected based on these results. The SVM portfolio with long positions only ranked fourth according to IR1, after EqW, BTC B&H and McW. Additionally, it is the riskiest one according to %ASD and %MDD, meaning that the SVM algorithm selects relatively volatile cryptocurrencies. Moreover, the mean portfolio turnover of the SVM strategy is over 13 times larger than that of the EqW strategy (143.7% vs 10.8%), which leads to high transaction costs and, consequently, a lower net portfolio value. Overall, one may invest equal weights in the Top100 cryptocurrencies, incur no additional costs of implementing a more sophisticated strategy, and still obtain abnormal returns on the cryptocurrency market in comparison with simple B&H or more sophisticated strategies.
Plots of the equity lines and drawdowns for the SVM strategy in comparison to the benchmark strategies can be found in Fig. 4.1 and Fig. 4.2, respectively.
As can be seen in Fig. 4.2, SVM and EqW strategies reach the ‘deepest’ drawdown if compared to other strategies. Conversely, S&P B&H demonstrates the most stable returns.
The research questions of this study were formulated around the sensitivity analysis, namely how sensitive are the results of portfolio performance to the model parameters. The sensitivity analysis of SVM strategy is performed for the following four parameters:
Number of cryptocurrencies kept in the portfolio N = 5, 10, 15, 20, 25, VAR. VAR means that any number (between 0 and 100) of cryptocurrencies selected by the SVM output as ‘buy’ candidates is included in the portfolio; this number varies from one reallocation period to another.
Reallocation period RE: 3d (3 days), 1w (7 days), 1m (30 days).
Percentage value of the transaction costs TC: 0.5%, 1.0%, 2.0%.
Training data size TS: ~ 25%, ~ 50%, ~100%.
The parameters that were set as fixed are the following:
Length of historical data taken to calculate technical features: 10d (10 days).
Lambda λ used to calculate exponential moving average for returns: 0.94.
Meta-parameters C and γ are chosen for each reallocation period via the tuning algorithm; the candidate values for both parameters are (0.5, 1, 2, 4). A different set of optimal parameters may therefore be selected on each reallocation day, and this is not controlled further. The choice of meta-parameters is described in section 2.6.
Length of training data: 3 months (91 days).
Long-positions-only assumption.
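The tuning step can be sketched with scikit-learn's grid search as a stand-in for the author's R tuning routine (the 4 × 4 grid matches the sequence given above; the cross-validation scheme is an assumption):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def tune_svm(X_train, y_train):
    """Pick C and gamma from the grid (0.5, 1, 2, 4) by cross-validated
    accuracy; the search is re-done from scratch on every reallocation
    day, so the chosen pair can differ between periods."""
    grid = {"C": [0.5, 1, 2, 4], "gamma": [0.5, 1, 2, 4]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3)
    search.fit(X_train, y_train)
    return search.best_params_["C"], search.best_params_["gamma"]
```

Because only 16 (C, γ) pairs are evaluated, the selected pair may change if the grid is widened, which is exactly the robustness concern raised in the conclusions.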
The fixed parameters are kept constant according to the author's assumptions. Only four parameters participate in the sensitivity analysis: the reallocation period RE, the percentage value of the transaction costs TC, the number of cryptocurrencies kept in the portfolio N and the training data size TS. Descriptive statistics for the SVM strategy and the performance of the portfolios are presented in Tab. 4.2. At the end of this table, we additionally attach the best-selected set of parameters for the SVM strategy, i.e. the parameter set that yielded the highest IR1.
Descriptive statistics for SVM strategy (sensitivity analysis). Descriptive statistics for the benchmark strategies have been placed above for convenient comparison.
Benchmark strategies:

Name | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---
S&P B&H | 13.6 | 15.5 | 14.2 | 0.9 | 0.8 | -
BTC B&H | 147.4 | 76.8 | 69.7 | 1.9 | 4.1 | 6.3
EqW | 425.8 | 96.2 | 81.7 | 4.4 | 23.1 | 10.8
McW | 141.9 | 74.9 | 73.1 | 1.9 | 3.7 | 6.3

SVM strategy (the last row is the best-selected set of parameters):

N | Position | %TS | RE | %TC | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---|---|---|---|---
25 | long only | 50 | 3d | 1 | 19.4 | 108.7 | 90.6 | 0.2 | 0.0 | 115.3
25 | long only | 50 | 1m | 1 | 224.2 | 101.5 | 86.0 | 2.2 | 5.8 | 148.8
5 | long only | 50 | 1w | 1 | -21.8 | 142.2 | 95.1 | -0.2 | 0.0 | 189.3
10 | long only | 50 | 1w | 1 | 89.3 | 131.7 | 85.0 | 0.7 | 0.7 | 176.8
15 | long only | 50 | 1w | 1 | 207.2 | 115.7 | 82.0 | 1.8 | 4.5 | 166.2
20 | long only | 50 | 1w | 1 | 215.9 | 110.0 | 82.3 | 2.0 | 5.1 | 154.3
VAR | long only | 50 | 1w | 1 | 326.4 | 92.6 | 57.6 | 3.5 | 20.0 | 105.6
25 | long only | | 1w | 1 | 177.9 | 103.3 | 85.1 | 1.7 | 3.6 | 144.3
25 | long only | | 1w | 1 | 210.6 | 103.6 | 85.5 | 2.0 | 5.0 | 160.5
25 | long only | 50 | 1w | 0.5 | 368.8 | 110.2 | 76.5 | 3.3 | 16.1 | 155.4
25 | long only | 50 | 1w | 2 | 29.6 | 110.9 | 88.1 | 0.3 | 0.1 | 154.9
VAR | long only | 50 | 1m | 1 | 392.43 | 88.97 | 53.45 | 4.41 | 32.38 | 105.9
Legend: McW – market cap weighted strategy, EqW – equally weighted strategy, N – number of currencies to be invested/used to construct portfolio, %TS – training data size, RE – the width of the reallocation period between the portfolio reallocation days, %TC – the total transaction costs taken as the percentage of the total transaction value of the portfolio, %ARC − annualized rate of return, %ASD − annualized standard deviation in percent, %MDD − maximum drawdown of capital in percent, IR1, IR2 − information ratios, %MT – the mean portfolio turnover ratio in percent.
The base case for the SVM strategy is presented in Tab. 4.2 together with benchmark strategies. As a reminder, the parameters for the base case are as follows: N = 25, RE = 1w, TC = 1%, TS ~ 50%.
If the reallocation period is changed from 1 week to 3 days, the performance becomes much worse: IR1 and %ARC drop several times. Such poor performance can be explained by high transaction costs, even though %MT per reallocation is lower for RE = 3d. With a 3-day reallocation period, the composition of the portfolio changes more dynamically than with the 7-day reallocation, and the total turnover, and hence transaction costs, is 1.87 (= 2.33 × 115.3/143.7) times higher, where 2.33 = 7/3 is how much more often we reallocate. A similar situation, but in the opposite direction, occurs when we change the reallocation period from 1w to 1m: the results significantly improve in terms of %ARC, IR1 and IR2. Thus, the length of the reallocation period significantly impacts the portfolio performance.
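The 1.87 figure decomposes into two effects — reallocating 7/3 ≈ 2.33 times more often, partly offset by the lower per-reallocation turnover (115.3% vs 143.7%):

```python
# Relative total turnover (a proxy for total transaction costs) of
# 3-day vs 7-day reallocation, using the %MT figures from the tables
freq_ratio = 7 / 3                 # ~2.33 times more reallocations
turnover_ratio = 115.3 / 143.7     # lower turnover per reallocation
print(round(freq_ratio * turnover_ratio, 2))  # 1.87
```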
Analysing the sensitivity to parameter N, the worst performance is observed when we keep only 5 coins in the portfolio during a reallocation period. The lower the number of coins in the portfolio, the higher the portfolio turnover. Consequently, as N changes from VAR to 5, with correspondingly higher %MT, the statistics show decreasing %ARC, IR1 and IR2 and increasing %ASD and %MDD. The best results are observed for the varying number of cryptocurrencies in the portfolio (meaning that all coins recommended by the SVM output for buying are kept in the portfolio). Overall, the statistics are very sensitive to the parameter N.
Changing the training size (%TS) through 25%, 50% and 100% (for example, 50% corresponds to 25 coins from class + and 25 coins from class −) does not have a significant impact on the performance of our strategies, although the best results are observed for the smallest training size.
The performance of the portfolios heavily depends on the magnitude of the transaction costs, which is rather straightforward. For %TC equal to 0.5%, the annual return is substantially higher than for %TC equal to 1%.
The common finding of the sensitivity analysis is therefore that the shorter the reallocation period and the lower the number of cryptocurrencies, the lower the performance of the strategy measured by IR1 and IR2. The performance of the portfolios depends heavily on the magnitude of transaction costs and, to a lesser extent, on the training size. Addressing the research questions stated at the beginning, the strategy results are significantly sensitive to three of the four chosen parameters, so the model does not provide robust results.
Fig. 4.3 and Fig. 4.4 present the equity lines for the SVM strategies with varying reallocation period RE and number of assets N kept in the portfolio, respectively.
Equity lines with varying transaction costs %TC and length of the training set TS are presented in Fig. 4.5 and Fig. 4.6, respectively.
The main aim of this paper was to apply the SVM algorithm to build an investment strategy for the cryptocurrency market and investigate its profitability. The research hypothesis was that the strategy based on the SVM algorithm is able to outperform the benchmark strategies in terms of the return-risk relation. The results of this investigation were reported for the period between 2015-01-01 and 2018-08-01. The main hypothesis that the investment strategy based on the SVM algorithm outperforms benchmark strategies is rejected based on the IR1 values.
The main methodology concepts were based on the research paper ‘Nonlinear support vector machines can systematically identify stocks with high and low future returns’ by Huerta
SVM was implemented to build a trading strategy in the following way. The training set is a tail set, defined as a group of coins whose volatility-adjusted price change is in the highest or the lowest quintile. Each asset is represented by a set of six technical features. The SVM is trained on historical tail sets and tested on current data. The classifier is chosen to be a nonlinear support vector machine, trained and tested once per reallocation period. The portfolio is formed by ranking coins using the SVM output, and the highest-ranked coins are used for long positions.
Our results show that the EqW portfolio outperforms all the other benchmark strategies and also the SVM strategy. It gives the highest values of IR1 and IR2, demonstrating abnormal returns. The SVM strategy ranked fourth, outperforming only the S&P B&H strategy. Therefore, the main hypothesis stated at the beginning of this paper is rejected based on the IR1 values.
The SVM strategy has not demonstrated abnormal returns. Moreover, the results are not stable, and the algorithm itself does not provide robust outcomes: the performance of the portfolio is extremely sensitive to the parameters. In this study, only the influence of four parameters has been checked. The performance is highly sensitive to the number of assets kept in the portfolio; if we include only those assets recommended by the SVM output (N = VAR), the results get closer to the best strategy, EqW. The magnitude of transaction costs and the length of the reallocation period heavily impact the performance statistics as well. Only the size of the training set does not have any significant impact on the outcome.
It is important to note that quite a large number of the parameters that are deemed fixed in our analysis might influence the final results of the portfolio performance. In particular, the choice of the meta-parameters C and γ plays a very important role. The strategy can produce notably different figures because the meta-parameters are chosen by grid search over a fixed set of candidate values: the available computing power does not allow estimating the optimal parameters over a broader grid, so the cost and gamma could differ if the analysis were run over a broader set of possible values. Since applying SVM requires setting quite a large number of parameters, the model is very prone to overfitting. Therefore, the length of historical data taken to calculate technical features, the lambda λ used to calculate the exponential moving average of returns, and the length of the training data, all of which were fixed parameters in the model, can influence the final results of our analysis.
‘Buy’ candidates for the portfolio are defined by the SVM output based on the rule that assets are included in the portfolio if their returns are predicted to grow. This implies that the decision is guided mainly by a momentum rule: we invest in those assets whose returns are predicted to grow. As was shown in the paper by Kość