Introduction

Quantitative trading accounts for about 70% of total trading volume in a mature stock market. In China, the share of quantitative trading is growing rapidly as the market matures. A quantitative trading platform is the key system for running stock backtests and executing trades. Because investment trading and information technology are relatively independent disciplines, researchers who design trading strategies can rarely develop and maintain a complete quantitative trading system on their own. A quantitative trading platform is therefore indispensable for validating and deploying the proposed strategies.

Quantitative trading platforms fall into two types: private and public. A private platform is developed by a fund or private-equity firm for internal use only and has relatively specialised functions. A public platform is open to the public: users upload their own strategies, run backtests and execute trades. Public platforms profit mainly by providing data and computing resources to users.

Quantitative trading platforms attract users with two advantages: data quality and computing power. Computing power in particular is critical to user experience and execution efficiency. Strategy backtesting is a core function of a trading platform, and it requires access to large amounts of historical time-series data, which is time-consuming. The execution efficiency of strategy backtesting is therefore an essential factor in user experience.

A cache is a critical device for improving data access efficiency. Backtests usually use the latest time-series data, so placing the frequently used data in the cache can greatly reduce access time. However, caching quantitative trading data raises two issues. (1) Which data should be chosen and cached? Time-series data have their own characteristics: a user typically accesses a consecutive range of data, the latest data are accessed frequently, and different accesses have containment or overlapping relationships. Moreover, users of a quantitative trading platform fall into different types, such as paying VIP users, common users and experienced users, with different expected backtesting performance and different user experience functions over time. The data to cache therefore depend on both the access characteristics of the backtested data and the users' experience demands. (2) How should the cached data be placed into multi-level caches? In stock backtesting, data are placed into storage devices such as memory and discs as serialised intermediate objects, which improves access efficiency. There are many caching devices, such as memory, network memory, solid-state discs and discs, with different read-write rates and capacities: the faster the access rate, the smaller the capacity. These devices form a multi-level caching system of advanced (fast) and low-level (slow) caches. The proposed cache optimisation strategy fits all kinds of storage devices and no longer distinguishes between internal and external memory. A multi-level cache placement method for quantitative time-series data therefore needs to be studied.

To address these problems, we study multi-level cache management on a quantitative trading platform. We propose a cache-oriented data merging approach and a cached data placement algorithm sensitive to user experience loss, improving both user experience and access rates. The paper is organised as follows. Section 2 introduces related work. The caching framework of the quantitative trading platform is presented in Section 3. Section 4 gives the cached data generation method based on data merging. In Section 5, a multi-level cache placement approach optimising user experience is designed. Sections 6 and 7 present the performance tests and conclusions, respectively.

Related works

With the increasing share of quantitative trading in stock trading, the design and optimisation of quantitative trading platforms have attracted attention. However, most works focus on trading strategies and trading data mining [1,2,3], paying less attention to backtesting performance and user experience. Data access is an important part of platform backtesting and can draw on existing methods for optimising user access experience. Current approaches include experience value quantification [4], data placement [5,6,7], task scheduling [8,9,10] and cache management [11,12,13]. For cache management, double exponential smoothing has been used to predict the probability distribution of user data being accessed again and then to determine a caching priority for each kind of user [14]. Performance indicators of each kind of user can be dynamically monitored and their caching spaces adjusted according to real-time states [15]. A cache extension approach in a cloud computing environment provides an isolated cache management model for each kind of user, which maintains a good access experience [16].

However, the data cache of quantitative trading has its own characteristics and demands, such as containment and overlapping relationships among accessed data and a multi-level cache hierarchy, which the above works do not cover. In this paper, our work focuses on cache management of quantitative data.

User experience function and caching framework

User experience on the quantitative platform is the optimisation objective of our cache management. In this section, user access experience is quantified and a framework of cache management is introduced.

User experience function

User experience and the return time of backtesting results are negatively correlated. For convenience of notation and computation, we assume that a large experience value means a bad user experience, so the experience value increases with time. Since there are different types of users, each kind of user has a distinct experience function. For example, we can assume that the experience value of paying users increases quickly with time, while that of free users grows slowly or even stays unchanged. User experience functions are agreed with users, and the main forms are shown in Figure 1. Let f(t) be the user experience function, where t is the return time of the backtesting results. In Figure 1(a), it is a linear function, f(t) = a · t, where a is the slope. In Figure 1(b), it is a step function, f(t) = \begin{cases} c_1 & 0 < t \le t_1 \\ c_2 & t_1 < t \le t_2 \\ c_3 & t_2 < t \end{cases}, where c_1, c_2 and c_3 are constants, and t_1 and t_2 are segmentation times. In Figure 1(c), it combines linear and constant segments, f(t) = \begin{cases} a t & 0 < t \le t_1 \\ a t_1 & t_1 < t \le t_2 \\ c_3 & t_2 < t \end{cases}. In Figure 1(d) and (e), the combinations are, respectively, f(t) = \begin{cases} c_1 & 0 < t \le t_1 \\ c_2 & t_1 < t \le t_2 \\ c_2 + a(t - t_2) & t_2 < t \end{cases} and f(t) = \begin{cases} c_1 & 0 < t \le t_1 \\ c_1 + \frac{c_3 - c_1}{t_2 - t_1}(t - t_1) & t_1 < t \le t_2 \\ c_3 & t_2 < t \end{cases}, where (c_3 - c_1)/(t_2 - t_1) is the slope of the middle segment in Figure 1(e). In Figure 1(f), it is a non-linear function, f(t) = c · a^t with a > 1, where parameter a adjusts the growth rate of the experience value and parameter c controls the initial value. The experience value grows quickly when t is large.

Fig. 1

User experience function.
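To make the functions above concrete, the following Python sketch implements three of the forms in Figure 1; the parameter values are illustrative assumptions, not values from the paper.

def linear(t, a=1.0):
    # Figure 1(a): f(t) = a * t, where a is the slope.
    return a * t

def step(t, c1=1.0, c2=2.0, c3=3.0, t1=10.0, t2=20.0):
    # Figure 1(b): constant experience value on each time segment.
    if t <= t1:
        return c1
    if t <= t2:
        return c2
    return c3

def exponential(t, a=1.1, c=1.0):
    # Figure 1(f): f(t) = c * a**t with a > 1; grows fast for large t.
    return c * a ** t

# Experience values for a backtest returning after 15 time units.
print(linear(15), step(15), exponential(15))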

Several common user experience functions are given above, and the platform can use other functions as well. Different parameters are set to produce distinct experience loss rates for different user categories. The experience loss rate denotes how fast user experience declines over time and quantifies the effect of the backtesting return time on the experience value. These functions provide the basis for multi-level cache management.

Different users have different demands on query time. Moreover, different weights can be applied to the user experience functions according to user importance, which unifies the optimisation objective. In addition, user experience is related to the profit of the service provider, so globally optimising user experience is equivalent to maximising the provider's profit.

The framework of multi-level cache management

The framework of multi-level cache management on the quantitative trading platform is shown in Figure 2; it manages cached data periodically. The platform predicts the access volume of different data during the next period and adjusts the cached data on the multi-level caching devices according to the predicted access volume and the user experience functions. When a user accesses data on the platform, the cached data are tried first; on a cache miss, the data are fetched from the database.

Fig. 2

The framework of multi-level cache management.

The cached data are stored on storage media as the results of software functions, i.e. they are stored in memory or on disc as objects or serialised objects. The cacheable property of such a function is that it always returns the same result for the same input parameters, just like functions wrapped by the 'lru_cache' decorator in Python.
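As a minimal illustration of this cacheable property, the sketch below wraps a data access function with Python's functools.lru_cache; fetch_from_database is a hypothetical stand-in for the platform's data layer.

from functools import lru_cache

def fetch_from_database(t_begin, t_end, stock):
    # Hypothetical placeholder for the platform's real data access layer.
    return [(stock, t) for t in range(t_begin, t_end)]

@lru_cache(maxsize=1024)
def get_data(t_begin, t_end, stock):
    # The body runs only on a cache miss; a repeated call with the same
    # arguments returns the previously computed object from memory.
    return fetch_from_database(t_begin, t_end, stock)

print(get_data(1, 5, '600519') is get_data(1, 5, '600519'))  # True: cached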

In principle, caching is realised by assigning storage space to cached data. For the convenience of adding, deleting and merging operations, a fixed allocation unit is set for each kind of caching device. Since the value and capacity of different caching devices differ, the unit sizes are set differently per device. Generally, the unit size of a device is proportional to its total storage space and inversely proportional to the frequency of additions and deletions on the device.

The generation of cached data

Before data are placed into the multi-level cache, the cached data and their value need to be determined. Access volume is an important factor in data value. In Section 4.1, we introduce the method for predicting access volume in the next period. When users access time-series data of the same stock, different accesses often return overlapping data, so merging the overlapping data in the cache saves caching space. Section 4.2 presents the merging approach for time-series data.

Access volume prediction in future period

In Figure 2, the access volume recording module records user access volumes of different data in each period. Based on these historical data, the access volume of the next period can be predicted using statistics or machine learning algorithms.

When performing backtesting, time-series data are generally accessed by the function getData(tb, te, s), where tb is the start time, te is the end time and s identifies the stock data. After the time-series data are fetched, filter operations are further executed. Observe that the triple < tb, te, s > uniquely determines a piece of cached data. If nb,e,s denotes the access volume of the data determined by a triple and hb,e,s is the space occupied by the data, a piece of data on the platform is formalised as < tb, te, s, nb,e,s, hb,e,s >.
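For illustration, this five-tuple can be represented as a small Python record; the field names below are our own and merely mirror the notation above.

from dataclasses import dataclass

@dataclass
class CacheItem:
    t_begin: int   # start time t_b
    t_end: int     # end time t_e
    stock: str     # stock identifier s
    volume: int    # access volume n_{b,e,s}
    size: float    # occupied space h_{b,e,s}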

Access volumes of different cached data in the next period can be predicted from historical data, commonly by historical statistics or by machine learning. Historical statistics take the average or a weighted average as the prediction. Let q_t be the access volume to be predicted; then q_t = \sum_{i \in [1,n]} \alpha_i q_{t-i} with \sum_{i \in [1,n]} \alpha_i = 1, where q_{t-i} is the historical access volume of the ith period before the current period, obtained from the statistics of the previous n periods, and \alpha_i is the weight of each period. Machine learning can use RNN, LSTM or GRU models; the prediction approach itself is not the focus of this paper.
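A minimal sketch of the weighted-average predictor, assuming n = 3 and illustrative weights that favour recent periods:

def predict_access_volume(history, weights=(0.5, 0.3, 0.2)):
    # history[0] is the most recent period's access volume;
    # the weights alpha_i must sum to 1.
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(a * q for a, q in zip(weights, history))

print(predict_access_volume([120, 100, 80]))  # 0.5*120 + 0.3*100 + 0.2*80 = 106.0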

Time-series data merging

Merging the overlapping parts of data saves cache space. Assume there are two pieces of data e1 = < t1, t3, s, n1,3,s, h1,3,s > and e2 = < t2, t4, s, n2,4,s, h2,4,s >, where t1 < t2 < t3 < t4. The cached data determined by < t1, t3, s > and < t2, t4, s > have an overlapping part, shown as the shaded area in Figure 3. Clearly, merging saves space compared with caching the two pieces separately. Without loss of generality, let e3 denote the item after merging e1 and e2, formalised as e3 = < t1, t4, s, n1,4,s, h1,4,s >, where n1,4,s = n1,3,s + n2,4,s and h1,4,s = h1,3,s + h2,4,s − hoverlap, with hoverlap the size of the overlapping data.

Fig. 3

Time-series data merging.

Overlap between data is determined as follows. (1) All data to be accessed are segmented and stored logically. (2) Let si be the identifier of a piece of stock data; a logical hash index is constructed for si. (3) The logical hash index links to the physical addresses on the different levels of caching devices. (4) Based on the hash indexes and their links, it is easy to determine whether data on the same level overlap.

Since the determination of overlapping data is based on the hash index, its time cost is small. When cached data are found to overlap, the redundant data on the same level of caching device are deleted directly.
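A minimal sketch of such a logical index, assuming a plain dictionary from stock identifier to interval records; the structure and names are illustrative rather than the platform's actual implementation.

index = {}  # stock identifier -> list of (t_begin, t_end, level)

def add_interval(stock, t_begin, t_end, level):
    index.setdefault(stock, []).append((t_begin, t_end, level))

def find_overlaps(stock, t_begin, t_end, level):
    # Two intervals overlap iff each starts before the other ends.
    return [iv for iv in index.get(stock, [])
            if iv[2] == level and iv[0] < t_end and t_begin < iv[1]]

add_interval('600519', 1, 3, level=0)
print(find_overlaps('600519', 2, 4, level=0))  # [(1, 3, 0)]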

Merging changes the value of cached data. Value is positively correlated with access volume and negatively correlated with data size. Ignoring other factors, the value of cached data is computed as p_{b,e,s} = n_{b,e,s} / h_{b,e,s}. After merging, the value may increase or decrease. Consider the following example with two data segments: n1,3,s = 6, h1,3,s = 6, p1,3,s = 1 and n2,4,s = 12, h2,4,s = 6, p2,4,s = 2. If merging yields the space h1,4,s = 10, then p1,4,s = (n1,3,s + n2,4,s)/h1,4,s = 1.8, which is lower than p2,4,s. If the merged space is h1,4,s = 6, then p1,4,s = (n1,3,s + n2,4,s)/h1,4,s = 3, which is higher than both p1,3,s and p2,4,s.

p_{b,e,s} determines whether data are placed into the cache and into which level. If p1,3,s > p1,4,s, i.e. the value p1,4,s of the merged data < t1, t4, s > is smaller than the value p1,3,s of the original data < t1, t3, s >, merging may prevent data that could otherwise have been cached from being cached. Therefore, our merging algorithm follows a non-decreasing value rule: the value after merging must not be smaller than the value of any piece of data before merging.
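The following sketch reproduces the worked example and the non-decreasing value rule; items are plain (t_b, t_e, s, n, h) tuples and the merged size h_{1,4,s} is supplied directly, as in the example above.

def merge(e1, e2, h_merged):
    # Merge two overlapping items of the same stock; h_merged is
    # h1 + h2 - h_overlap, supplied directly here for simplicity.
    t1, _, s, n1, _ = e1
    _, t4, _, n2, _ = e2
    return (t1, t4, s, n1 + n2, h_merged)

def value(item):
    # Caching value p_{b,e,s} = n_{b,e,s} / h_{b,e,s}.
    return item[3] / item[4]

e1 = (1, 3, 's', 6, 6)    # p_{1,3,s} = 1
e2 = (2, 4, 's', 12, 6)   # p_{2,4,s} = 2
print(value(merge(e1, e2, 10)))  # 1.8 < p_{2,4,s}: the rule rejects this merge
print(value(merge(e1, e2, 6)))   # 3.0 >= both values: the merge is accepted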

Based on these considerations, we introduce our merging algorithm for cached stock data, called the value-first cached data merging algorithm. It first tries to merge the data with the highest values, places the merged data into the cache, and continues merging until no data can be merged. Its concrete steps are shown in Algorithm 1, which gives the merging method for a single stock. Since data of different stocks do not overlap, merging for other stocks also uses Algorithm 1.

Cached Data Merging Operation (CDMO)

Input: C, P. C is the set of cached data for a stock s, formalised as C = {ci | i ∈ [0, n)}, where i is an integer. Let pi be the value of ci; then P is formalised as P = {pi | i ∈ [0, n)}.
Output: new values of C and P after merging.

1. O ← P
2. WHILE O ≠ Ø DO
3.   pk = max{pi | pi ∈ O}, O ← O − {pk}
4.   Q ← O
5.   WHILE Q ≠ Ø DO
6.     pj = max{pi | pi ∈ Q}, Q ← Q − {pj}
7.     IF ck and cj overlap
8.       merge ck and cj into the new data cl with value pl
9.       IF pl ≥ pk
10.        C ← C − {ck, cj} + {cl}
11.        P ← P − {pk, pj} + {pl}
12.        RETURN CDMO(C, P) // recursively call Algorithm 1
13.  END WHILE
14. END WHILE
15. RETURN C, P
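A Python sketch of Algorithm 1 under simplifying assumptions: items are (t_b, t_e, s, n, h) tuples for one stock, the data density is one unit of space per unit of time (so h_overlap equals the length of the time overlap), and a restart after each successful merge stands in for the recursive call.

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def merge(a, b):
    # h_{1,4} = h1 + h2 - h_overlap under the unit-density assumption.
    h_overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return (min(a[0], b[0]), max(a[1], b[1]), a[2],
            a[3] + b[3], a[4] + b[4] - h_overlap)

def value(item):
    return item[3] / item[4]

def cdmo(items):
    # Value-first merging with the non-decreasing value rule.
    merged_something = True
    while merged_something:
        merged_something = False
        by_value = sorted(items, key=value, reverse=True)
        for a in by_value:                  # pk: highest remaining value
            for b in by_value:              # pj: candidates in value order
                if a is b or not overlaps(a, b):
                    continue
                m = merge(a, b)
                if value(m) >= value(a):    # pl >= pk: accept the merge
                    items = [x for x in items if x is not a and x is not b]
                    items.append(m)
                    merged_something = True
                    break
            if merged_something:
                break
    return items

print(cdmo([(1, 3, 's', 6, 2), (2, 4, 's', 12, 2)]))  # one merged item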

When a user accesses a set of time-series data that are cached in merged form, the requested data need to be restored. Our method extracts the required range in memory. Taking Python as an example, the cached data are first converted into a DataFrame and the selection is then performed. Since this interception is simple and done in memory, the time difference between accessing merged data and original data is negligible.
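A small sketch of this restoration step, assuming the cached object converts to a pandas DataFrame with a 'time' column (column names are illustrative):

import pandas as pd

merged = pd.DataFrame({'time': [1, 2, 3, 4],
                       'close': [10.0, 10.5, 10.2, 10.8]})

def restore(df, t_begin, t_end):
    # Select the originally requested interval [t_begin, t_end] in memory.
    return df[(df['time'] >= t_begin) & (df['time'] <= t_end)]

print(restore(merged, 2, 3))  # only the rows the user actually asked for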

Multi-level cache placement algorithm

As shown in Figure 2, after the set of cached data C is prepared, the data should be placed into the multi-level cache to improve user access experience. Next, we discuss the cache placement algorithm.

We assume that ei denotes a caching device, wi denotes its capacity and E = {ei, …} denotes the set of devices. Devices with high access rates serve as advanced caches and those with low rates as low-level caches.

We next propose a multi-level cache placement algorithm oriented to user experience. It aims to optimise user experience and evaluates the value of cached data based on the user experience functions, data size and access frequency. Data with high caching value are preferentially placed into the advanced cache. The value of cached data cj is computed as x_j = \frac{\sum f(t_{j,e_{i+1}}) - \sum f(t_{j,e_i})}{h_j}, where x_j is the estimated value of c_j, t_{j,e_i} is the query time if c_j is placed on e_i, f(t_{j,e_i}) is the corresponding user experience value, \sum f(t_{j,e_i}) is the sum of the experience values over all accesses if c_j is placed only on e_i, h_j is the size of c_j and e_{i+1} denotes the caching device one level below e_i. Note that different users have distinct user experience functions.
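A minimal sketch of this computation; the experience function and per-access query times are assumed inputs here.

def placement_value(f, times_on_lower, times_on_upper, size):
    # x_j: experience loss avoided per unit of space by promoting c_j
    # from device e_{i+1} (times_on_lower) to device e_i (times_on_upper).
    return (sum(f(t) for t in times_on_lower)
            - sum(f(t) for t in times_on_upper)) / size

# Example: 3 accesses, linear experience f(t) = t, 10 time units per
# access on the lower device versus 1 on the upper, data size h_j = 2.
print(placement_value(lambda t: t, [10, 10, 10], [1, 1, 1], 2))  # 13.5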

The proposed algorithm is shown in Algorithm 2. It lets each level of caching device improve the user experience as much as possible. Data that improve user experience most are placed into the advanced cache, while data with small improvements are placed into the low-level cache or not cached at all.

Multi-level Cache Placement Algorithm (MCP)

Input: C, E, W, where W is the set of device capacities, i.e., W = {wi | ei ∈ E}

1. E ← sort(E) //sort devices in decreasing order of read-write rate.
2. FOREACH ei in E
3.   FOREACH cj in C
4.     Compute value xj of cj
5.     XX ∪ {xj} //put xj into set of data value
6.   END FOREACH
7.   repeatedly take the largest xj in X and place the corresponding cj into ei until wi runs out; update C, i.e. remove the placed cj from C; X ← Ø
8. END FOREACH
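A Python sketch of Algorithm 2 under simplifying assumptions: x_j is supplied by a caller-provided function (standing in for the computation above), and capacity accounting is greedy per device.

def mcp(items, devices, value_of):
    # items: list of (name, size); devices: list of (name, capacity)
    # sorted fastest first; value_of(item, device_name) returns x_j.
    placement = {}
    remaining = list(items)
    for dev_name, capacity in devices:
        used = 0.0
        # Prefer the highest-value data for the faster device.
        for item in sorted(remaining,
                           key=lambda it: value_of(it, dev_name),
                           reverse=True):
            name, size = item
            if used + size <= capacity:
                placement[name] = dev_name
                used += size
        remaining = [it for it in remaining if it[0] not in placement]
    return placement  # anything left in `remaining` is not cached

items = [('c1', 2), ('c2', 3), ('c3', 4)]
devices = [('memory', 4), ('ssd', 40)]
toy_value = lambda it, dev: it[1]  # illustrative stand-in for x_j
print(mcp(items, devices, toy_value))  # {'c3': 'memory', 'c2': 'ssd', 'c1': 'ssd'}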
Performance evaluation

In this section, we evaluate the user experience performance of the proposed cache merging and placement algorithms. The evaluation environment is set up with different combinations of devices and device capacities. The comparison indicators are the average user experience value, the average access rate and the degree of user experience improvement.

The effect of cache capacity on user experience

In these experiments, we test the user experience of the cached data merging operation (CDMO) combined with multi-level cache placement (MCP), denoted CDMO-MCP, for different cache capacities. CDMO-MCP first uses Algorithm 1 to merge cached data and then places the data using Algorithm 2. Two kinds of users are introduced: advanced and common, whose user experience functions are those of Figure 1(a) and (e), respectively. The indicator is the average user experience value. Our proposed MCP algorithm is compared with two classical algorithms, Least Recently Used (LRU) and Least Frequently Used (LFU), adapted to the multi-level setting as follows. (1) Data with high priority are first placed into the advanced cache. (2) Data evicted from the advanced cache are placed into the next lower level of cache. (3) Data evicted from an upper level are likewise placed into the next lower level, and so on.

We use three caching media: memory, solid-state disc (SSD) and disc, with capacities in the ratio 1:10:100. In Figure 4, the horizontal axis shows the memory capacity; the capacities of the other caching devices follow from the ratio. MCP performs substantially better than LRU and LFU in user experience, especially for small cache capacities.

Fig. 4

The effect of cache capacity on user experience.

The effect of caching device on user experience

In this section, the effect of different cache combinations on user experience is validated. We set three combinations of caching devices: memory only, memory-SSD and memory-SSD-disc. As shown in Figure 5, MCP is more than 50% better than the other algorithms in average user experience value, regardless of the cache combination. The main reason is that the proposed MCP algorithm explicitly optimises user experience. MCP improves user experience by 61% over LFU in the memory-only environment, by 75% in the memory-SSD environment and by 81% in the memory-SSD-disc environment. The experiments show that MCP optimises better when there are more levels of caching devices.

Fig. 5

The effect of caching device on user experience. LRU, least recently used; LFU, least frequently used.

The performance validation of merging algorithm

In this section, we test the CDMO algorithm with four combinations of algorithms. CDMO-MCP first uses CDMO to merge data and then uses MCP to place them. NoMerge-MCP uses MCP directly without merging. CDMO-LRU first uses CDMO and then executes LRU. NoMerge-LRU uses LRU directly without merging. The horizontal axis is the memory capacity, and the capacities of memory, SSD and disc are in the ratio 1:10:100. Figure 6 shows that both CDMO-MCP and CDMO-LRU perform better than their counterparts without merging, especially when the cache capacity is small. In addition, NoMerge-MCP achieves a better user experience than CDMO-LRU, since MCP plays the dominant role in optimising user experience.

Fig. 6

The performance of the merging algorithm. LRU, least recently used.

The performance comparison of different users

Here, we compare the user experience and access rate of MCP for different users. Advanced and common users are set up with the user experience functions of Section 6.1, and the comparison baseline is LRU. Figure 7 shows that common users have a better experience than advanced users under MCP, while LRU shows little difference between user types. This is because MCP is driven by the user experience functions. For the access rate, MCP also shows a larger difference between the two kinds of users than LRU, and MCP gives common users a slower access rate than LRU, because advanced users take up more resources.

Fig. 7

The performance comparison of different users.

Conclusions and future work

In this paper, we propose a cached data merging method and a multi-level cache placement algorithm that aim to optimise user experience. Experiments show that they perform well in improving user experience on a quantitative trading platform, with the multi-level cache placement algorithm playing the main role. The approach optimises the experience value for all kinds of users, especially advanced users. Since quantitative trading platforms serve massive numbers of users and provide backtesting and trading services in a distributed environment, our future work will focus on cache management for quantitative trading platforms in distributed computing environments.
