The support vector method was developed by Vladimir Vapnik in 1995 and was first applied to text classification by Joachims in 1998. In its original form, the algorithm solved the problem of distinguishing objects of two classes. The method has gained immense popularity due to its high efficiency, and many researchers have used it in their work on text classification. The approach proposed by Vapnik for determining which of two predefined classes a sample belongs to adheres to the principle of structural risk minimization.
The results of text classification using the support vector method are among the best in comparison with other machine learning approaches. However, the learning speed of this algorithm is one of the lowest: the SVM method requires a large amount of memory and places a significant computational load on the computer during training. Summing up, its simplicity combined with state-of-the-art performance on many learning problems (classification, regression, and novelty detection) has contributed to the popularity of SVM.
SVM is also quite a popular algorithm for building trading systems. It has been used mostly to predict whether a stock or index price will go up or down. However, hardly any papers reveal how SVM performs on the cryptocurrency market, where asset price volatility is much higher than on traditional markets.
The goal of this paper is to apply the SVM algorithm to build an investment strategy for the cryptocurrency market and investigate its profitability. The research hypothesis is that the strategy based on the SVM algorithm is able to outperform the benchmark strategies in terms of return-risk relation. Similar to Ślepaczuk
The main idea and methodology concepts were adopted from the research paper ‘Nonlinear support vector machines can systematically identify stocks with high and low future returns’ by Huerta
SVM is implemented to build a trading strategy in the following way. The training set is a tail set, defined as the group of coins whose volatility-adjusted price change is in the highest or the lowest quantile, for example, the highest and the lowest 25 coins. Each coin is represented by a set of technical features. A classifier is trained on historical tail sets and tested on the current data. The classifier is chosen to be a nonlinear support vector machine. The SVM is trained once per reallocation period: if the portfolio is reallocated once per week, the SVM is trained once per week accordingly. The portfolio is formed by ranking coins using the classifier output. The highest-ranked coins are used for long positions, and the lowest-ranked can potentially be used for short sales. The data cover the period from 01/01/2015 to 08/01/2018.
The structure of the paper can be summarized as follows. After the literature review in the theoretical part, a short introduction to support vector machines is provided, covering three related concepts: the maximal margin classifier, the support vector classifier, and the support vector machine. The second section focuses on data, methodology, and strategy implementation. The third part provides empirical results in comparison with benchmark investment strategies. Sensitivity analysis is the final part of the thesis.
The theoretical background for SVM is based mainly on James
There are many methods in machine learning (e.g., neural networks) that might work as well as SVM, but the simplicity of the mathematical functions and the theory that frames training of the model as a convex optimization problem (Boyd and Vandenberghe, 2004) make SVMs a good option. An important trait of convex optimization problems is the guarantee that there is only one optimal model to fit the data.
In the previous literature, the application of SVMs to financial data has mostly been dedicated to predicting the future direction of a stock price index. For example, the study of Kim (2003) examines the feasibility of applying SVM in financial forecasting by comparing it with back-propagation neural networks and case-based reasoning. The experimental results showed that SVM provides a promising alternative for stock market prediction.
Another example is the paper of Van Gestel
Additional examples of SVM regression for futures index prediction are found in Tay and Cao (2001, 2002) and Cao and Tay (2003), where it was also shown that SVMs provide a promising alternative to neural networks for financial time series forecasting. As demonstrated in their experiments, the SVM forecasts significantly outperformed the BP network on the CME-SP, CBOT-US, CBOT-BO and MATIF-CAC40 futures and were slightly better on the EUREX-BUND.
In the work of Huang
Kim (2003) also used SVM as a classification method to predict the direction of the market’s movement. The paper emphasized the importance of the meta parameter assumptions and how the prediction performances of SVMs are sensitive to the value of these parameters.
Huang
The most related work that contributed to the idea of this paper is Huerta
Another paper that contributed to the methodological and strategy implementation part is the work of Kość
Regarding the application of SVMs to the cryptocurrency market, research is far less extensive than that for the stock market. For example, Chen
Many of the published papers using SVMs for financial data mention the influence of meta-parameter selection on the performance of the model. It is a key issue to make sure that the selection of meta parameters is free from forward-looking bias; in order to avoid this widespread problem, the meta parameters must be chosen based only on historical information.
The choice between linear and nonlinear SVMs has always been an important point of discussion. Linear SVMs are fast to train and run, but they tend to underperform on complex datasets with many training examples. As was proven in Huerta
The support vector machine (SVM) is one of the most popular algorithms for classification. It is based on the assumption that in a multidimensional space there exists a hyperplane that can separate the data into classes.
SVM is a generalization of a simple classifier called the maximal margin classifier. This linear classifier can only be applied to data sets that are linearly separable; unfortunately, most data sets have classes that cannot be separated by a linear boundary. The maximal margin classifier was therefore extended to the support vector classifier, which can be applied to a wider range of data sets. Support vector machines are a further extension of the support vector classifier and can be applied to data with non-linear class boundaries.
If we deal with non-linear class boundaries, the problem can be solved by enlarging the feature space using quadratic, cubic or higher-order polynomial functions of the features. For example, rather than applying a support vector classifier with
The optimization problem takes the following form: Maximize
So, SVM is an extension of the support vector classifier obtained by enlarging the feature space using kernels, a particularly efficient computational approach. A kernel is a function that quantifies the similarity of a pair of observations. For example, one may consider the following expression:
which simply returns the support vector classifier. Equation (2.3) is called a linear kernel because the support vector classifier is linear in the predictors; the linear kernel essentially quantifies the similarity of two observations using the Pearson correlation. However, it is possible to choose another form for (2.3). For example, one may substitute every output of
The expression stated above is a polynomial kernel with degree
where
K(x, xi) is the kernel function, which takes two vectors as inputs and produces a single scalar value.
The polynomial kernel outlined in (2.4) is just one instance of a possible non-linear kernel. There are ample alternatives; one very widespread alternative is the radial kernel, which has the following form:
In (2.6), γ (gamma) is a tuning meta parameter that is a positive constant. Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’.
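A minimal sketch of the radial kernel in (2.6) may make the role of gamma concrete (the function name is an assumption; it is not part of the thesis):

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """Radial kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2), gamma > 0."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```

Identical observations give a similarity of 1, and a larger gamma makes the similarity decay faster with distance, which is exactly the 'close' influence described above.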
Fig. 2.1 demonstrates two examples of non-linear data where SVM is run with a polynomial kernel and with a radial kernel. In both cases, either kernel is able to capture the decision boundary; however, the SVM with a radial kernel demonstrates a far better fit.
The advantages of the SVM model are as follows: it is very efficient when groups are fully or almost fully separable, can work very well when there are more independent variables than observations, can be adjusted to work well with unbalanced datasets, has only a few parameters to tune, and is partially immune to outliers. The disadvantages are that SVM is very slow for a large number of observations and is not the most efficient when separation is low. The type of SVM used in this paper is the non-linear radial kernel.
The data were downloaded from
As of 05/08/2018, the number of all listed cryptos totalled 1732, while as of 01/12/2017 there were ~1500. Thus, despite the current downturn in the cryptocurrency market, the number of cryptocurrencies is constantly growing.
The total market capitalization of the cryptocurrency market reached its peak in January 2018, totalling almost 800 billion USD, which is equal to ~2.5% of the S&P 500 total market cap. Estimates are based on the data provided by www.coinmarketcap and the World Bank: data.worldbank.org/indicator/CM.MKT.TRAD.CD. The USA stock market cap for 2017 equals 32.121 trillion USD.
As can be noticed in Fig. 3.1, until March 2017 the market was dominated by Bitcoin (90% of the total market cap of the cryptocurrency market). In 2017–2018, Bitcoin lost its position and allowed other assets to share the market. As of 05/08/2018, the share of Bitcoin comprised 47.44% of the total market, Ethereum 16.21%, Ripple 6.66%, and others 19.69%.
The 14-day moving average of the volume was calculated for each asset, and those which did not meet the filter threshold of 100 USD were excluded from further usage. This filtering ensures that the investment portfolio meets a minimum liquidity requirement. An additional filter is applied regarding the price history of an asset: only those cryptocurrencies that have 91 days or more of close-price history qualify for the data set.
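The two filters can be sketched as follows (a Python illustration under the stated thresholds; the data layout and all names are assumptions, as the thesis' code is written in R):

```python
def liquidity_filter(volumes, closes, vol_threshold=100.0, min_history=91):
    """Apply the liquidity and price-history filters described above.

    volumes: dict mapping asset -> list of daily USD volumes (most recent last)
    closes:  dict mapping asset -> list of daily close prices
    Keeps assets whose latest 14-day moving average of volume exceeds
    vol_threshold and which have at least min_history days of close prices.
    """
    kept = []
    for asset, vol in volumes.items():
        if len(vol) < 14:
            continue                            # not enough data for the 14-day MA
        ma14 = sum(vol[-14:]) / 14.0            # 14-day moving average of volume
        if ma14 > vol_threshold and len(closes.get(asset, [])) >= min_history:
            kept.append(asset)
    return kept
```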
After that, a ranked set of the 100 cryptocurrencies with the largest market cap was created. This set will be referred to further in the thesis as the Top100.
The starting date for the simulation period was set to 01/10/2014 so as to ensure the availability of at least 3 months of historical data to create a training set and start running the strategy on 01/01/2015. The Top100 ranking of the 100 largest cryptocurrencies was calculated for each day of the simulation. From the set of 1438 unique assets, only 475 qualified to enter the Top100 ranking for at least one day while also satisfying the condition that the 14-day moving average of daily volume be higher than 100 USD.
The following measures were used to provide descriptive statistics for the data; these measures were further applied in the evaluation of the portfolios' efficiency:
– ARC (the annualized rate of change) of the price series P;
– ASD (the annualized standard deviation of daily returns);
– MDD (the maximum drawdown coefficient);
– IR1, IR2 (the information ratio coefficients), which quantify the risk-weighted gain.
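The exact formulas (3.1)–(3.5) are not reproduced in the text, so the sketch below assumes standard definitions: ARC as the compound annual growth rate (365-day years, since crypto trades daily), ASD as the daily standard deviation scaled by sqrt(365), MDD as the maximum peak-to-trough loss, IR1 = ARC/ASD, and IR2 = ARC^2 * sign(ARC) / (ASD * MDD). These assumptions appear consistent with the IR values reported in Tab. 3.1.

```python
import numpy as np

def performance_stats(equity):
    """Performance measures for a daily equity curve (assumed reconstruction
    of (3.1)-(3.5); see the lead-in for the definitions used)."""
    equity = np.asarray(equity, dtype=float)
    t = len(equity) - 1                                  # number of daily periods
    ret = equity[1:] / equity[:-1] - 1                   # discrete daily returns
    arc = (equity[-1] / equity[0]) ** (365.0 / t) - 1    # annualized rate of change
    asd = ret.std(ddof=0) * np.sqrt(365.0)               # annualized std. deviation
    peak = np.maximum.accumulate(equity)
    mdd = ((peak - equity) / peak).max()                 # maximum drawdown
    ir1 = arc / asd
    ir2 = arc ** 2 * np.sign(arc) / (asd * mdd) if mdd > 0 else np.nan
    return {"ARC": arc, "ASD": asd, "MDD": mdd, "IR1": ir1, "IR2": ir2}
```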
Tab. 3.1 presents descriptive statistics for the 10 largest and 10 smallest cryptocurrencies as of 01/08/2018. As can be seen from Tab. 3.1, the values of %ARC, %ASD, IR1, and IR2 differ hugely across the cryptocurrencies. Quite a few new assets, such as loom-network, cybermiles, nuls, bibox-token, and odem, appeared just a couple of months earlier but had already taken positions in the Top100. For example, odem behaves as an investment 'star', demonstrating an abnormal return and the lowest drawdown. This crypto was built on the Ethereum blockchain and stands for On-Demand Education Marketplace.
Descriptive statistics for the 10 largest and 10 smallest cryptocurrencies by MarketCap in TOP100 as of 01-08-2018
The largest 10 cryptocurrencies in TOP100 as of 01-08-2018

Name | %ARC | %ASD | %MDD | IR1 | IR2 | Date of start | Volume, mUSD | MarketCap, USD
---|---|---|---|---|---|---|---|---
bitcoin | 118 | 75.8 | 69.7 | 1.6 | 2.6 | 01-10-2014 | 1888 | 43839225862 |
ethereum | 437.7 | 145.2 | 84.3 | 3 | 15.7 | 20-08-2015 | 323 | 17124399552 |
ripple | 226.7 | 164.6 | 87.1 | 1.4 | 3.6 | 01-10-2014 | 499 | 13468236361 |
bitcoin-cash | 83.5 | 198.5 | 84.4 | 0.4 | 0.4 | 05-08-2017 | 699 | 6672807179 |
eos | 516 | 253 | 87.9 | 2 | 12 | 14-07-2017 | 78 | 5220519698 |
stellar | 228.6 | 178.8 | 82.6 | 1.3 | 3.5 | 01-10-2014 | 301 | 4634665748 |
litecoin | 111.1 | 119.2 | 79.1 | 0.9 | 1.3 | 01-10-2014 | 80 | 3709789644 |
cardano | 698.6 | 263.8 | 89.3 | 2.6 | 20.7 | 14-10-2017 | 32 | 2624893338 |
iota | 48.4 | 188.4 | 82.9 | 0.3 | 0.1 | 26-06-2017 | 3059 | 2460207729 |
tether | -5.4 | 45.7 | 49.9 | -0.1 | 0 | 15-03-2015 | 140 | 2233258238 |
loom-network | 649.7 | 217.4 | 80 | 3 | 24.3 | 21-04-2018 | 2.8 | 98040413 |
gas | 365.8 | 265.4 | 89.6 | 1.4 | 5.6 | 09-08-2017 | 2.6 | 91875052 |
tenx | -97.6 | 247 | 99.2 | -0.4 | -0.4 | 10-07-2017 | 8.1 | 91154578 |
nxt | 34.2 | 158.1 | 95.6 | 0.2 | 0.1 | 01-10-2014 | 2.8 | 90165499 |
cybermiles | -44.8 | 202.2 | 87.4 | -0.2 | -0.1 | 04-05-2018 | 7.1 | 88375828 |
nuls | 5083.5 | 382.2 | 79.1 | 13.3 | 854.8 | 22-03-2018 | 4.1 | 88067102 |
byteball | 160 | 236.8 | 91.2 | 0.7 | 1.2 | 09-01-2017 | 0.56 | 86950232 |
bibox-token | 53.2 | 265.7 | 87.9 | 0.2 | 0.1 | 08-06-2018 | 67.5 | 83456610 |
odem | 89471 | 229.6 | 40.7 | 389.7 | 856638 | 01-08-2018 | 0.135 | 82906522 |
electroneum | -92.5 | 243 | 95.3 | -0.4 | -0.4 | 15-11-2017 | 0.552 | 81946739 |
Legend: %ARC – annualized rate of return as in (3.1), %ASD – annualized standard deviation in percent as in (3.2), %MDD – maximum drawdown of capital in percent as in (3.3), IR1, IR2 – information ratios as in (3.4) and (3.5) accordingly, 'Date of start' – date of first appearance in Top100.
The values for the maximum drawdown are also relatively large. If odem is not taken into consideration, the other cryptocurrencies in the set have %MDD in the range from 50% to 99%. For comparison, the S&P500 index noted only a ~14% drawdown over the same simulation horizon (see Tab. 4.1 in the next section).
One of the key issues in this investigation is how to form the tail sets that constitute the positive and negative classes of the training data. The chosen metric is the return divided by a volatility estimate, which creates an ordered list of coins by volatility-adjusted return. The volatility is estimated here by an exponential moving average:
The volatility-adjusted return is the ratio of the daily return and the estimation of the volatility according to formula 3.6:
where Rt is a daily return,
It was shown in Huffman
In order to run the calculations for each reallocation period, the volatility-adjusted returns over the 3-month period (91 days) were calculated for the assets in the Top100 for each day. The long data were filtered to ensure the presence of 91 days of history. Then, for each day in this 91-day range, discrete daily returns were calculated. Using these daily returns, the volatility estimate (an exponential moving average) was computed according to the algorithm presented in (3.6).
As can be seen, the first sigma is 0. Lambda was set to 0.94, the value most frequently used for exponential moving average computations. Then the volatility-adjusted returns are calculated according to (3.7). As a result, the volatility-adjusted returns over the 3-month period (91 days) are obtained for each day and each coin in the Top100.
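Since formulas (3.6) and (3.7) are not reproduced above, the following sketch assumes the RiskMetrics-style recursion that matches the description (sigma_0 = 0, lambda = 0.94); whether the return is divided by sigma of the previous or the current day is likewise an assumption:

```python
import numpy as np

def vol_adjusted_returns(returns, lam=0.94):
    """Volatility-adjusted daily returns (assumed reconstruction of (3.6)-(3.7)).

    EWMA variance update: sigma_t^2 = lam * sigma_{t-1}^2 + (1 - lam) * R_t^2,
    with sigma_0 = 0 and lam = 0.94 as stated in the text. Each return is
    divided by the previous day's volatility estimate; undefined values
    (while sigma is still zero) are reported as NaN.
    """
    returns = np.asarray(returns, dtype=float)
    var = 0.0                                    # sigma_0 = 0
    adj = np.full(returns.shape, np.nan)
    for t, r in enumerate(returns):
        sigma_prev = np.sqrt(var)
        if sigma_prev > 0:
            adj[t] = r / sigma_prev              # volatility-adjusted return
        var = lam * var + (1.0 - lam) * r ** 2   # EWMA variance update
    return adj
```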
For illustration purposes, the distribution of volatility-adjusted returns with 3-month history from 01/10/2014 to 31/12/2014 is shown in Fig. 3.2. As an example, the positive tail set contains the B most positive volatility-adjusted returns and the negative tail set the B most negative; B can be 5, 10 or 15 coins in the data set. The values for the training size will be provided later in this chapter as part of the sensitivity analysis. The vertical lines represent the cut-offs indicating how many assets will be used for SVM training; the + and − regions are the ones used for that. In such a way, to train the SVM for 01/01/2015, 3-month data with volatility-adjusted returns from 01/10/2014 to 31/12/2014 will be used; for 02/01/2015, data from 02/10/2014 to 01/01/2015.
Further in the paper, TS denotes the length of the training set, in other words, the number of volatility-adjusted returns used in designing the training data sets.
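The tail-set construction can be sketched as follows (a minimal Python illustration; the thesis' implementation is in R, and the function and argument names are assumptions):

```python
def tail_sets(vol_adj_returns, ts):
    """Form the two training classes from volatility-adjusted returns.

    vol_adj_returns: dict mapping coin -> volatility-adjusted return
    ts: tail size, e.g. 25 coins per tail
    Returns the '+' class (ts highest values) and the '-' class (ts lowest).
    """
    ranked = sorted(vol_adj_returns, key=vol_adj_returns.get, reverse=True)
    return ranked[:ts], ranked[-ts:]
```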
Each coin on day
Technical indicators used for the creation of feature set to train SVMs
Feature | Full name | Parameters |
---|---|---|
MOM, n days | Momentum for close prices, n days | n = 10 days |
ΔV, n days | Volume change n days | n = 10 days |
RSI | Relative Strength Index | n = 10 days |
FI | Force Index | N/A |
Williams %R | Williams Percent Range | n = 10 days |
PSAR | Parabolic stop and reversal system | Acceleration factor by default set to 2% increasing by 2% with a maximum of 20% |
Note: the table contains the list of six technical features used for running SVM and parameters set for calculations purposes.
Momentum has been one of the well-recognized phenomena described in the academic literature; see Jegadeesh et al. (2012) and Rouwenhorst (2002). They stated that stocks with high (low) returns over periods of three to 12 months keep having high (low) returns over subsequent three- to 12-month periods. Therefore, momentum for the close prices of cryptocurrencies was included in the feature set.
Another feature, the volume change, is an indicator capturing underreactions and overreactions in stock price movements. If a price movement happens with large volume, the price change is more significant than if it occurs with low volume; see Chordia et al. (2002). To capture this effect, the percentage change of the daily trading volume was included in the feature data set.
RSI is a momentum indicator that captures the magnitude of recent price changes in order to estimate whether the market is overbought or oversold. RSI is calculated as follows:
where RS is the average gain of up-trending periods during a certain time period divided by the average loss of down-trending periods during the same period. In such a way, RSI provides a relative estimation of the strength of an asset's recent price performance. RSI outcomes range from 0 to 100. The default time period for relating up-trending to down-trending periods is two weeks. The traditional interpretation is that an RSI of 70 or above shows that an asset is getting overbought or overvalued, while an RSI of 30 or below points to oversold or undervalued conditions. Sudden significant price changes may produce false buy or sell signals; therefore, it is better to use RSI with amendments to its application or together with other reliable technical indicators.
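A minimal sketch of the calculation with simple n-period averages (the thesis uses TTR's R implementation, which applies smoothing, so values may differ slightly; the standard identity RSI = 100 - 100 / (1 + RS) is assumed):

```python
import numpy as np

def rsi(close, n=10):
    """Relative Strength Index over the last n periods (n = 10 as in Tab. 3.2)."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)[-n:]             # last n daily price changes
    gains = delta[delta > 0].sum() / n      # average gain of up periods
    losses = -delta[delta < 0].sum() / n    # average loss of down periods
    if losses == 0:
        return 100.0                        # no down moves: maximally overbought
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)
```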
FI captures the market power behind a change in the asset price. FI's value may be either positive or negative, depending on the upward or downward change in the asset price. The three inputs required for the formula are the close price, the open price, and the trading volume. Analysts often use FI along with a moving average to make predictions of an asset's future performance. The formula for FI is as follows:
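Since the FI formula itself is not reproduced above, the sketch below is an assumption based on the three inputs named in the text; note that the classic Force Index uses the previous close rather than the open:

```python
def force_index(close, open_, volume):
    """Force Index sketch: price change over the day scaled by volume.
    FI = (Close - Open) * Volume is assumed from the inputs named in the text;
    its sign follows the direction of the price move."""
    return (close - open_) * volume
```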
Williams%R is a type of momentum indicator that ranges between 0 and –100 and measures overbought and oversold market conditions. The Williams%R is commonly applied to define entry and exit points for trading. It can be treated as a technical analysis oscillator. It compares an asset’s close price to the high-low range over a specific period, by default over 14 days. The formula to calculate this indicator is as follows:
The Williams%R became popular because of its ability to signal market swings one or two periods in the future. Predictions of market reversals are very valuable for market participants. An asset is considered overbought when the indicator is above –20 and oversold when the indicator is below –80; overbought and oversold periods can persist should the price keep rising or falling.
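A sketch of the calculation with n = 10 as in Tab. 3.2 (the standard formula %R = (HighestHigh - Close) / (HighestHigh - LowestLow) * -100 is assumed):

```python
import numpy as np

def williams_r(high, low, close, n=10):
    """Williams %R over the last n periods, ranging from 0 (close at the
    period high) to -100 (close at the period low)."""
    hh = float(np.max(high[-n:]))           # highest high of the window
    ll = float(np.min(low[-n:]))            # lowest low of the window
    return (hh - close) / (hh - ll) * -100.0
```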
The parabolic SAR is a technical indicator applied to define the price direction of an asset, as well as to measure how the price direction is changing. It is generally used to put trailing price stops; thus, it may be treated as a stop-loss system.
PSAR is calculated independently for each trend in the price. When the price is in an uptrend, PSAR goes below the price and converges upwards towards it. By the same logic, on a downtrend, PSAR goes above the price and converges downwards. At each step within a trend, PSAR is calculated one period in advance. PSAR value for tomorrow is calculated using data available today. PSAR values are calculated as follows:
where EP is the highest high of the current uptrend or the lowest low of the current downtrend, updated each time a new EP is achieved; AF starts at the default of 2% and increases by 2% each time a new EP is achieved, with a maximum of 20%.
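The recursion described above, SAR(t+1) = SAR(t) + AF * (EP - SAR(t)), can be sketched for a single uptrend (a simplification; complete implementations such as TTR's SAR, used in the thesis, also handle trend reversals):

```python
def psar_uptrend(high, sar0, af0=0.02, af_step=0.02, af_max=0.20):
    """Parabolic SAR for one uptrend (simplified sketch).

    high: series of daily highs; sar0: initial SAR below the price.
    Each step applies SAR_{t+1} = SAR_t + AF * (EP - SAR_t), where EP is the
    highest high so far and AF grows by af_step at each new EP, capped at af_max.
    """
    sar, ep, af = sar0, high[0], af0
    out = [sar]
    for h in high[1:]:
        sar = sar + af * (ep - sar)         # tomorrow's SAR from today's data
        if h > ep:                          # new extreme point reached
            ep = h
            af = min(af + af_step, af_max)  # accelerate, capped at af_max
        out.append(sar)
    return out
```

In an uptrend the SAR stays below the price and converges upwards towards it, as the text describes.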
In the work of Pistole (2010), moving average rules, the RSI method and the PSAR technique were compared with the buy&hold strategy for the S&P500 index. Interestingly, the PSAR indicator was far more successful in positioning long or short on the S&P500 index: results revealed that by applying PSAR as the buy and sell signal, the strategy significantly outperforms the market, as well as buy&hold and the other strategies discussed in that paper.
The six technical indicators mentioned above were included in the feature set mostly on the basis of the academic literature. All the indicators were calculated in R; RSI, SAR, and Williams%R were implemented using the corresponding built-in functions from the TTR package.
The data sets that should be prepared in advance are long data containing all the information available and the Top100 market cap ranking. These data sets will be used for running the SVM strategy. A step-by-step loop was written in R to implement the strategy.
Volatility-adjusted returns were calculated for the assets in Top100 for the 3 previous months for each day, more specifically, over the period from date
The length of the training set is introduced as the parameter TS, which denotes the number of volatility-adjusted returns used to construct the training set. For example, TS at the level of 25 means that the 25 coins with the highest values and the 25 coins with the lowest values of volatility-adjusted returns are taken to form the training data. The assets with the highest values form the class + and the assets with the lowest values form the class −. Then, for these 50 coins, the six technical features (MOM, ΔV, FI, PSAR, Will%R, RSI) are calculated for each day over the period of the last 91 days. The period over which the training set is constructed to perform one SVM test on the reallocation day (date[
When the training set is ready, SVM is applied in order to tune the meta parameters C (cost) and γ (gamma). The best C and γ are used later to test the SVM for the assets on the reallocation day. The choice of meta parameters is explained in section 3.7.
The first reallocation day is assumed to be 01/01/2015. The reallocation day is the day on which the portfolio composition is changed. The change depends on the SVM output. The assets with the greatest values of the SVM output are included in the portfolio on the reallocation day. SVM output is the value provided by the function as in (2.5).
On the reallocation day, we liquidate the assets that are not recommended by SVM output and they leave the portfolio. The assets that remain in the portfolio have their weights reallocated. The time period which is between two sequential reallocation days is called the reallocation period (RE).
Then the testing set is prepared. SVM is tested on the data on the date[
To summarize, the training protocol is such that SVM is trained over tail sets for the period of time [
In order to estimate the portfolio performance, the following methodology was used. The gross rate of return
where
The weights are being calculated according to the following formula
It is assumed that
The portfolio composition changes on the reallocation day.
In order to understand the logic of the formula for the portfolio turnover ratio, three cases should be considered: assets leave the ranking, assets enter the portfolio, and assets remain in the portfolio but with new weights. To account for these changes in the portfolio composition, the turnover ratio was calculated as follows:
The above value can range from zero (the composition of the portfolio is unchanged compared to the previous reallocation day) to 200% (the composition of the portfolio is entirely changed; all assets left the portfolio and new ones entered).
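A sketch consistent with the stated 0–200% range (an assumed reconstruction, as the turnover formula itself is not reproduced above): the turnover is taken as the sum of absolute weight changes over all assets involved.

```python
def turnover(weights_prev, weights_new):
    """Portfolio turnover ratio in percent across a reallocation day.

    weights_prev, weights_new: dicts mapping asset -> portfolio weight.
    Assets that leave or enter contribute their full weight; assets that stay
    contribute the absolute change of their weight.
    """
    assets = set(weights_prev) | set(weights_new)
    return 100.0 * sum(abs(weights_new.get(a, 0.0) - weights_prev.get(a, 0.0))
                       for a in assets)
```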
Transaction costs on the cryptocurrency markets can be between 0.2% and 2.0% of the transaction value, depending on the asset's type and liquidity. Transaction costs for the base and the benchmark portfolios were assumed to be 1%. To provide a fair calculation of the total portfolio reallocation cost, the product of the portfolio turnover ratio
The implementation of the portfolio loop in R was done with the package PerformanceAnalytics, whose vignette gives simple examples of computing portfolio returns using asset prices as well as a weights framework.
To construct the benchmark portfolios, the Top100 market cap ranking was used. To estimate the efficiency of the main SVM strategy, similarly as in Kość
The benchmark equally weighted portfolio (further denoted as EqW) is constructed as an investment with equal weights in all cryptocurrencies qualified for the Top100 on the reallocation day. For the base case, the reallocation period is set to one week and transaction costs constitute 1% of the portfolio value. These assumptions are the same as for the SVM strategy.
The benchmark market-cap weighted portfolio (further denoted as McW) is built as an investment with market-cap weights in cryptocurrencies, which are qualified for Top100 on the reallocation day. It means that on reallocation day, the investment in an asset takes the weight as the ratio of the market capitalization of an asset to the whole market capitalization for all the assets in Top100. The reallocation period and the transaction costs are the same as for the equally-weighted portfolio.
The buy&hold strategy is also considered as a benchmark. To get full insight into the performance of all the portfolios, two buy&hold strategies were run: one for bitcoin (BTC B&H) and one for the S&P500 index (S&P B&H). It is worth noting that the buy&hold strategy on the S&P500 index is a widely used benchmark for comparing portfolios. Both buy&hold strategies have the same simulation period as the former benchmark strategies.
The C (cost) and γ (gamma) values are the meta-parameters of the SVM model. One of the problems that frequently arises is overfitting, which is quite common in machine learning (Cawley
High values of C cause the cost of misclassification to be large; therefore, the SVM is forced to classify the input data more severely and the problem of overfitting may arise. Small values of C mean lower variance and higher bias: they make the cost of misclassification low, thus providing more 'space' for the model to make a mistake by misclassifying a case. The objective is to find the balance between 'not too severe' and 'not too relaxed'.
When the value of gamma is small, the constraint on the model is too strong and it cannot capture the complexity or curvature of the input data. In other words, gamma determines how strong the influence of a single training observation is.
As in a standard classification problem, the dataset is divided into mutually exclusive training and testing sets. Further, in order to tune the meta parameters, the training set is split again into a (second) training set and a validation set. The visualization of the data set splits is presented in Fig. 3.3. Only the second training set and the validation set are used to perform the tuning.
The partition of the training set into training and validation sets is performed with the help of a sampling method; potentially, there can be a number of partitions into validation and training sets over which the tuning is performed.
In the package e1071
As SVM here solves a standard classification task with only two parameters, the grid search method is quite effective. To conduct the grid search over the parameters, the function tune.svm() from the package e1071 was used. A sequence of parameters for cost and gamma was created as the vector (0.5, 1, 2, 4). Each pair of parameters from this sequence is tested, and the values of cost and gamma providing the lowest prediction error on the validation subsets are chosen.
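The thesis performs this tuning with tune.svm() in R; the generic logic can be sketched in Python as follows (train_fn and error_fn are hypothetical stand-ins for SVM fitting and validation-error scoring, not part of the thesis' code):

```python
from itertools import product

def grid_search(train_fn, error_fn, train, valid,
                costs=(0.5, 1, 2, 4), gammas=(0.5, 1, 2, 4)):
    """Pick the (cost, gamma) pair minimizing validation error, mirroring the
    logic behind e1071's tune.svm() over the grid stated in the text."""
    best = None
    for c, g in product(costs, gammas):
        model = train_fn(train, c, g)        # fit a candidate SVM
        err = error_fn(model, valid)         # error on the validation set
        if best is None or err < best[0]:
            best = (err, {"cost": c, "gamma": g})
    return best[1]
```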
In such a way, every time SVM is tested on the reallocation date[
This section summarizes step-by-step actions to implement the SVM strategy.
Here are the one-time actions conducted before running the loop to generate strategy results:
Web-scraping: the data were scraped from the website
Filtering: the 14-day moving average of the volume was calculated for each asset, and those that did not meet the filter threshold of 100 USD were excluded from further usage. Additionally, only those cryptocurrencies that have 91 days or more of history for the open price qualified to constitute the long data set.
Top100 ranking: a ranked set of 100 cryptocurrencies by the largest market cap was created for each day for the whole long data set.
A step-by-step loop written in R to implement the strategy is run as many times as the number of reallocation days in the period from 01/01/2015 to 01/08/2018. Step-by-step actions in the loop are as follows:
Preparation of the training set: class for SVM. On date[
Preparation of the training set: technical features. We select the assets with the highest and the lowest volatility-adjusted returns from the spectrum defined by the %TS assumption and then calculate for them the six technical features (MOM, ΔV, FI, PSAR, Will%R, RSI) for each day over the period of the last 91 days.
Meta-parameters tuning: SVM is applied to the training set in order to tune the meta parameters C (cost) and γ (gamma). The best C and γ are used later to test SVM to predict the class for the assets in the testing set on the reallocation day.
Preparation of the testing set: On date[
SVM is run using prepared training and testing sets. The output of the SVM for 100 assets from Top100 on the date[
‘Buy’ candidates are then kept in the portfolio for the reallocation period (for example, it is 1 week for the base case).
Calculation of the net portfolio value for one reallocation period taking into consideration transaction costs.
Once the loop is finished, performance statistics are calculated for the net value of the portfolio. In such a way, in every reallocation period, a fresh SVM model is trained and tested with its own optimal meta parameters.
The parameters that are deemed to be fixed in the model are as follows:
the number of periods over which volatility-adjusted returns are calculated from daily returns
lambda λ (set to 0.94) used to calculate the exponential moving average of returns
length of historical data taken to calculate technical features described in Tab. 3.2
meta-parameters C and γ
length of the training data, set to 91 days
the long-positions-only assumption.
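The volatility adjustment with λ = 0.94 presumably follows the standard RiskMetrics-style exponentially weighted scheme; a sketch under that assumption (the exact normalization used by the author is not stated):

```python
import numpy as np

def vol_adjusted_return(returns, lam=0.94):
    """Divide the most recent daily return by its exponentially
    weighted volatility (RiskMetrics-style recursion, lambda = 0.94).
    'returns' is a 1-D array of daily returns."""
    var = returns[0] ** 2
    for r in returns[1:]:
        # EWMA update: old variance decays by lambda, new squared
        # return enters with weight (1 - lambda)
        var = lam * var + (1.0 - lam) * r ** 2
    return returns[-1] / np.sqrt(var)
```

With λ = 0.94 the effective memory of the variance estimate is roughly 1/(1 − λ) ≈ 17 trading days.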
Four parameters were chosen to participate in the sensitivity analysis. These parameters are:
the number of cryptocurrencies kept in the portfolio (N)
reallocation period (RE)
the percentage value of the transaction costs (%TC)
training data size (%TS).
The performance statistics for SVM and benchmark strategies are presented in Tab. 4.1.
Descriptive statistics of the SVM strategy compared with the benchmark strategies
Strategy | N | RE | %TC | V | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---|---|---|---|---
S&P B&H | - | - | - | - | 13.6 | 15.5 | 14.2 | 0.9 | 0.8 | -
BTC B&H | - | - | - | - | 147.4 | 76.8 | 69.7 | 1.9 | 4.1 | -
EqW | 100 | 1w | 1 | 100 | 425.8 | 96.2 | 81.7 | 4.4 | 23.1 | 10.8
McW | 100 | 1w | 1 | 100 | 141.9 | 74.9 | 73.1 | 1.9 | 3.7 | 6.3
SVM | 25 | 1w | 1 | 100 | 173.6 | 103.1 | 83.1 | 1.7 | 3.5 | 143.7
Legend: McW – market cap weighted strategy, EqW – equally weighted strategy, N – number of currencies to be invested/used to construct portfolio, RE – the width of the reallocation period between the portfolio reallocation days, %TC – the total transaction costs taken as the percentage of the total transaction value of the portfolio, V – the threshold value (USD) of the 14-day moving average of daily volume, %ARC − annualized rate of return, %ASD − annualized standard deviation in percent, %MDD − maximum drawdown of capital in percent, IR1, IR2 − information ratios, %MT – the mean portfolio turnover ratio in percent.
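The text does not define IR1 and IR2 explicitly, but every row of the tables is consistent with the definitions used in related work (e.g. Ślepaczuk): IR1 = %ARC/%ASD and IR2 = %ARC² · sign(%ARC)/(%ASD · %MDD). A quick check under that inferred assumption:

```python
def ir1(arc, asd):
    # IR1: annualized return per unit of annualized volatility
    return arc / asd

def ir2(arc, asd, mdd):
    # IR2: IR1 additionally penalized by maximum drawdown,
    # keeping the sign of the annualized return
    return arc * abs(arc) / (asd * mdd)

# Reproducing the BTC B&H row: %ARC = 147.4, %ASD = 76.8, %MDD = 69.7
print(round(ir1(147.4, 76.8), 1))        # 1.9
print(round(ir2(147.4, 76.8, 69.7), 1))  # 4.1
```

The same formulas reproduce the reported ratios of the other rows (e.g. EqW: 425.8/96.2 ≈ 4.4 and 425.8²/(96.2 · 81.7) ≈ 23.1).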
The buy&hold strategy on bitcoin (BTC B&H) demonstrates a more than 10 times larger %ARC than buy&hold on the S&P500 index (S&P B&H); however, both the risk and the maximum drawdown, in terms of %ASD and %MDD, are approximately 5 times larger. The resulting values of IR1 and IR2 are, respectively, 2 and 5 times larger for BTC B&H.
%ARC of BTC B&H is higher than that of the market cap weighted strategy (McW), although the difference is relatively small. This can be explained intuitively, as the predominant part of the McW portfolio consists of bitcoin.
The values of the information ratios IR1/IR2 are also comparable, which means that the amount of return per unit of risk is about the same, again because bitcoin dominates the McW portfolio.
EqW outperforms all the other benchmark strategies and also the SVM strategy: it gives the highest values of %ARC and IR1/IR2, demonstrating abnormal returns. Sorting the portfolios by IR1, EqW comes first, with an IR1 more than twice that of SVM and of the other benchmark strategies. The remaining strategies rank as follows by IR1: BTC B&H, McW and SVM, which takes fourth place, outperforming only S&P B&H.
The main hypothesis that the investment strategy based on the SVM algorithm outperforms benchmark strategies can be rejected based on these results. The SVM portfolio with long positions only ranked fourth according to IR1, after EqW, BTC B&H and McW. Additionally, it is the riskiest one according to %ASD and %MDD, meaning that the SVM algorithm selects relatively volatile cryptocurrencies. Moreover, the mean portfolio turnover of the SVM strategy is over 13 times larger than that of the EqW strategy (143.7% vs 10.8%), which leads to high transaction costs and, consequently, a lower net portfolio value. Overall, one may invest equal weights in the Top100 cryptocurrencies, incur no additional costs of implementing a more sophisticated strategy, and still obtain abnormal returns on the cryptocurrency market in comparison with simple B&H or more sophisticated strategies.
Plots of the equity lines and drawdowns for the SVM strategy in comparison to the benchmark strategies can be found in Fig. 4.1 and Fig. 4.2, respectively.
As can be seen in Fig. 4.2, SVM and EqW strategies reach the ‘deepest’ drawdown if compared to other strategies. Conversely, S&P B&H demonstrates the most stable returns.
The research questions of this study were formulated around the sensitivity analysis, namely how sensitive are the results of portfolio performance to the model parameters. The sensitivity analysis of SVM strategy is performed for the following four parameters:
Number of cryptocurrencies kept in the portfolio N = 5, 10, 15, 20, 25, VAR. VAR means that any number (between 0 and 100) of cryptocurrencies selected by the SVM output as ‘buy’ candidates is included in the portfolio; this number varies from one reallocation period to another.
Reallocation period RE: 3d (3 days), 1w (7 days), 1m (30 days).
Percentage value of the transaction costs TC: 0.5%, 1.0%, 2.0%.
Training data size TS: ~ 25%, ~ 50%, ~100%.
The parameters that were set as fixed are the following:
Length of historical data taken to calculate technical features: 10d (10 days).
Lambda λ used to calculate exponential moving average for returns: 0.94.
Meta-parameters C and γ are chosen for each reallocation period via the tuning algorithm; the candidate values for both parameters are (0.5, 1, 2, 4). A different set of optimal parameters may therefore be selected on each reallocation day, and this is not controlled further. The choice of meta-parameters is described in section 2.6.
Length of training data: 3 months (91 days).
Long-positions-only assumption.
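The tuning step can be sketched with scikit-learn's grid search as a stand-in for the author's R tuning routine (the 4 × 4 grid matches the sequence given above; the cross-validation scheme is an assumption):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def tune_svm(X_train, y_train):
    """Pick C and gamma from the grid (0.5, 1, 2, 4) by cross-validated
    accuracy; the search is re-done from scratch on every reallocation
    day, so the chosen pair can differ between periods."""
    grid = {"C": [0.5, 1, 2, 4], "gamma": [0.5, 1, 2, 4]}
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3)
    search.fit(X_train, y_train)
    return search.best_params_["C"], search.best_params_["gamma"]
```

Because only 16 (C, γ) pairs are evaluated, the selected pair may change if the grid is widened, which is exactly the robustness concern raised in the conclusions.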
The fixed parameters are kept constant according to the author's assumptions. Only four parameters participate in the sensitivity analysis: the reallocation period RE, the percentage value of the transaction costs TC, the number of cryptocurrencies kept in the portfolio N and the training data size TS. Descriptive statistics for the SVM strategy and the performance of the portfolios are presented in Tab. 4.2. At the end of this table, we additionally attach the best-selected set of parameters for the SVM strategy, i.e. the parameter set that yielded the highest IR1.
Descriptive statistics for SVM strategy (sensitivity analysis). Descriptive statistics for the benchmark strategies have been placed above for convenient comparison.
Benchmark strategies:

Name | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---
S&P B&H | 13.6 | 15.5 | 14.2 | 0.9 | 0.8 | -
BTC B&H | 147.4 | 76.8 | 69.7 | 1.9 | 4.1 | 6.3
EqW | 425.8 | 96.2 | 81.7 | 4.4 | 23.1 | 10.8
McW | 141.9 | 74.9 | 73.1 | 1.9 | 3.7 | 6.3

SVM strategy (the last row is the best-selected set of parameters):

N | Position | %TS | RE | %TC | %ARC | %ASD | %MDD | IR1 | IR2 | %MT
---|---|---|---|---|---|---|---|---|---|---
25 | long only | 50 | 3d | 1 | 19.4 | 108.7 | 90.6 | 0.2 | 0.0 | 115.3
25 | long only | 50 | 1m | 1 | 224.2 | 101.5 | 86.0 | 2.2 | 5.8 | 148.8
5 | long only | 50 | 1w | 1 | -21.8 | 142.2 | 95.1 | -0.2 | 0.0 | 189.3
10 | long only | 50 | 1w | 1 | 89.3 | 131.7 | 85.0 | 0.7 | 0.7 | 176.8
15 | long only | 50 | 1w | 1 | 207.2 | 115.7 | 82.0 | 1.8 | 4.5 | 166.2
20 | long only | 50 | 1w | 1 | 215.9 | 110.0 | 82.3 | 2.0 | 5.1 | 154.3
VAR | long only | 50 | 1w | 1 | 326.4 | 92.6 | 57.6 | 3.5 | 20.0 | 105.6
25 | long only | | 1w | 1 | 177.9 | 103.3 | 85.1 | 1.7 | 3.6 | 144.3
25 | long only | | 1w | 1 | 210.6 | 103.6 | 85.5 | 2.0 | 5.0 | 160.5
25 | long only | 50 | 1w | 0.5 | 368.8 | 110.2 | 76.5 | 3.3 | 16.1 | 155.4
25 | long only | 50 | 1w | 2 | 29.6 | 110.9 | 88.1 | 0.3 | 0.1 | 154.9
VAR | long only | 50 | 1m | 1 | 392.43 | 88.97 | 53.45 | 4.41 | 32.38 | 105.9
Legend: McW – market cap weighted strategy, EqW – equally weighted strategy, N – number of currencies to be invested/used to construct portfolio, %TS – training data size, RE – the width of the reallocation period between the portfolio reallocation days, %TC – the total transaction costs taken as the percentage of the total transaction value of the portfolio, %ARC − annualized rate of return, %ASD − annualized standard deviation in percent, %MDD − maximum drawdown of capital in percent, IR1, IR2 − information ratios, %MT – the mean portfolio turnover ratio in percent.
The base case for the SVM strategy is presented in Tab. 4.2 together with benchmark strategies. As a reminder, the parameters for the base case are as follows: N = 25, RE = 1w, TC = 1%, TS ~ 50%.
If the reallocation period is changed from 1 week to 3 days, the performance becomes much worse: IR1 and %ARC drop several times. Such poor performance can be explained by high transaction costs, even though %MT per reallocation is lower for RE = 3d. With a 3-day reallocation period, the composition of the portfolio changes more dynamically than with the 7-day reallocation, and the total turnover, and hence transaction costs, is 1.87 (= 2.33 × 115.3/143.7) times higher, where 2.33 = 7/3 is how much more often we reallocate. A similar situation, but in the opposite direction, occurs when we change the reallocation period from 1w to 1m: the results significantly improve in terms of %ARC, IR1 and IR2. Thus, the length of the reallocation period significantly impacts the portfolio performance.
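The 1.87 figure decomposes into two effects — reallocating 7/3 ≈ 2.33 times more often, partly offset by the lower per-reallocation turnover (115.3% vs 143.7%):

```python
# Relative total turnover (a proxy for total transaction costs) of
# 3-day vs 7-day reallocation, using the %MT figures from the tables
freq_ratio = 7 / 3                 # ~2.33 times more reallocations
turnover_ratio = 115.3 / 143.7     # lower turnover per reallocation
print(round(freq_ratio * turnover_ratio, 2))  # 1.87
```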
Analysing the sensitivity to parameter N, the worst performance is observed when we keep only 5 coins in the portfolio during a reallocation period. The lower the number of coins in the portfolio, the higher the portfolio turnover. Consequently, as N changes from VAR to 5, with correspondingly higher %MT, the statistics show decreasing %ARC, IR1 and IR2 and increasing %ASD and %MDD. The best results are observed for the varying number of cryptocurrencies in the portfolio (meaning that all coins recommended by the SVM output for buying are kept in the portfolio). Overall, the statistics are very sensitive to the parameter N.
Changing the training size (%TS) through 25%, 50% and 100% (for example, 50% corresponds to 25 coins from class + and 25 coins from class −) does not have a significant impact on the performance of our strategies, although the best results are observed for the smallest training size.
The performance of the portfolios heavily depends on the magnitude of the transaction costs, which is rather straightforward. For %TC equal to 0.5%, the annual return is substantially higher than for %TC equal to 1%.
The common finding of the sensitivity analysis is therefore that the shorter the reallocation period and the lower the number of cryptocurrencies, the lower the performance of the strategy measured by IR1 and IR2. The performance of the portfolios depends heavily on the magnitude of transaction costs and, to a lesser extent, on the training size. Addressing the research questions stated at the beginning, the strategy results are significantly sensitive to three of the four chosen parameters, so the model does not provide robust results.
Fig. 4.3 and Fig. 4.4 present the equity lines for the SVM strategies with varying reallocation period RE and number of assets N kept in the portfolio, respectively.
Equity lines with varying transaction costs %TC and length of the training set TS are presented in Fig. 4.5 and Fig. 4.6, respectively.
The main aim of this paper was to apply the SVM algorithm to build an investment strategy for the cryptocurrency market and investigate its profitability. The research hypothesis was that the strategy based on the SVM algorithm is able to outperform the benchmark strategies in terms of the return-risk relation. The results of this investigation were reported for the period between 2015-01-01 and 2018-08-01. The main hypothesis that the investment strategy based on the SVM algorithm outperforms benchmark strategies is rejected based on the IR1 values.
The main methodology concepts were based on the research paper ‘Nonlinear support vector machines can systematically identify stocks with high and low future returns’ by Huerta
SVM was implemented to build a trading strategy in the following way. The training set is a tail set, defined as a group of coins whose volatility-adjusted price change is in the highest or the lowest quintile. Each asset is represented by a set of six technical features. The SVM is trained on historical tail sets and tested on current data. The classifier is chosen to be a nonlinear support vector machine, trained and tested once per reallocation period. The portfolio is formed by ranking coins using the SVM output, and the highest-ranked coins are used for long positions.
Our results show that the EqW portfolio outperforms all the other benchmark strategies and also the SVM strategy. It gives the highest values of IR1 and IR2, demonstrating abnormal returns. The SVM strategy ranked fourth, outperforming only the S&P B&H strategy. Therefore, the main hypothesis stated at the beginning of this paper is rejected based on the IR1 values.
The SVM strategy has not demonstrated abnormal returns. Moreover, the results are not stable, and the algorithm itself does not provide robust outcomes: the performance of the portfolio is extremely sensitive to the parameters. In this study, only the influence of four parameters has been checked. The performance is highly sensitive to the number of assets kept in the portfolio; if we include only those assets recommended by the SVM output (N = VAR), the results get closer to the best strategy, EqW. The magnitude of transaction costs and the length of the reallocation period heavily impact the performance statistics as well. Only the size of the training set does not have any significant impact on the outcome.
It is important to note that quite a large number of the parameters that are deemed fixed in our analysis might influence the final results of the portfolio performance. In particular, the choice of the meta-parameters C and γ plays a very important role. The strategy can produce notably different figures because the meta-parameters are chosen by grid search over a fixed set of candidate values: the available computing power does not allow estimating the optimal parameters over a broader grid, so the cost and gamma could differ if the analysis were run over a broader set of possible values. Since applying SVM requires setting quite a large number of parameters, the model is very prone to overfitting. Therefore, the length of historical data taken to calculate technical features, the lambda λ used to calculate the exponential moving average of returns, and the length of the training data, all of which were fixed parameters in the model, can influence the final results of our analysis.
‘Buy’ candidates for the portfolio are defined by the SVM output based on the rule that assets are included in the portfolio if their returns are predicted to grow. This implies that the decision is guided mainly by a momentum rule: we invest in those assets whose returns are predicted to grow. As was shown in the paper by Kość