Andrés L Suárez-Cetrulo, Alejandro Cervantes, David Quintana.
Abstract
In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature and the likelihood of drastic structural or concept changes in the markets. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results in predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest for evolving data streams, adding on top a mechanism to store and handle a shared collection of inactive trees, called concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and is able to achieve competitive results.
Keywords: adaptive classifiers; concept drift; ensemble methods; recurrent concepts; stock price direction prediction
Year: 2019 PMID: 33266741 PMCID: PMC7514129 DOI: 10.3390/e21010025
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. RCARF structure.
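The decision strategy described in the abstract (on drift, replace an active tree with the best available alternative, taken either from the concept history or from the background tree) can be illustrated with a minimal sketch. The names `Tree`, `ConceptHistory` and `react_to_drift` are hypothetical, "best" is assumed here to mean lowest recent error, and archiving the retired tree in the history is also an assumption; none of this is taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """Stand-in for a decision tree; only its recent error rate is modelled."""
    name: str
    recent_error: float  # assumed selection criterion, in [0, 1]


@dataclass
class ConceptHistory:
    """Shared collection of inactive trees kept for possible reuse."""
    trees: List[Tree] = field(default_factory=list)


def react_to_drift(active: Tree, background: Tree, history: ConceptHistory) -> Tree:
    """Pick the replacement for `active` once a drift has been signalled.

    Candidates are every stored tree plus the freshly trained background tree.
    The candidate with the lowest recent error wins (assumed criterion).
    """
    candidates = history.trees + [background]
    best_idx = min(range(len(candidates)), key=lambda i: candidates[i].recent_error)
    best = candidates[best_idx]
    # A recurring tree that is reactivated leaves the history.
    if best_idx < len(history.trees):
        history.trees.pop(best_idx)
    # Assumption: the retired active tree is archived for later reuse.
    history.trees.append(active)
    return best


history = ConceptHistory([Tree("old_a", 0.40), Tree("old_b", 0.20)])
chosen = react_to_drift(Tree("active", 0.50), Tree("background", 0.30), history)
print(chosen.name, [t.name for t in history.trees])
```

In this toy run the stored tree `old_b` beats the background tree, so the drift is resolved by a recurring concept rather than by a newly trained model.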
Selected technical indicators. Formulas as reported in Kara et al. [37], applied to second-level data. Exponential and simple moving averages for 5 and 20 s were added as extra features.
| Name of Indicators | Formulas |
|---|---|
| Simple n-second moving average (5, 10, 20) | |
| Weighted n-second moving average (5, 10, 20) | |
| Momentum | |
| Stochastic K% | |
| Stochastic D% | |
| RSI (Relative Strength Index) | |
| MACD (Moving average convergence divergence) | |
| Larry William’s R% | |
| A/D (Accumulation/Distribution) Oscillator | |
| CCI (Commodity Channel Index) | |
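$C_t$ is the closing price, $L_t$ the low price and $H_t$ the high price at time $t$; EMA is the exponential moving average, $EMA(k)_t = EMA(k)_{t-1} + \alpha \, (C_t - EMA(k)_{t-1})$; $\alpha$ is the smoothing factor, $\alpha = 2/(1+k)$; $k$ is the time period of the $k$-second exponential moving average; $LL_t$ and $HH_t$ are the lowest low and highest high in the last $t$ seconds, respectively; $M_t = (H_t + L_t + C_t)/3$; $SM_t = \left(\sum_{i=1}^{n} M_{t-i+1}\right)/n$; $D_t = \left(\sum_{i=1}^{n} \lvert M_{t-i+1} - SM_t \rvert\right)/n$; $Up_t$ is the upward price change and $Dw_t$ the downward price change at time $t$; $n$ is the period, in seconds, used to compute the technical indicator.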
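A few of the indicators listed above can be sketched in plain Python over a series of second-level closing prices. The definitions used here are the common textbook forms (linear weights for the weighted average, smoothing factor 2/(1+k) for the exponential average); they are assumed, not verified, to match the formulas of Kara et al. [37], and the function names are illustrative only.

```python
from typing import Sequence


def sma(close: Sequence[float], n: int) -> float:
    """Simple moving average of the last n closing prices."""
    window = close[-n:]
    return sum(window) / n


def wma(close: Sequence[float], n: int) -> float:
    """Linearly weighted moving average: the newest price gets weight n."""
    window = close[-n:]
    weights = range(1, n + 1)  # oldest -> 1, newest -> n
    return sum(w * c for w, c in zip(weights, window)) / sum(weights)


def momentum(close: Sequence[float], n: int) -> float:
    """Price change over the last n steps: latest close minus close n steps back."""
    return close[-1] - close[-1 - n]


def ema(close: Sequence[float], k: int) -> float:
    """Exponential moving average with smoothing factor 2 / (1 + k).

    Seeded with the first observation (an assumption; other seeds are common).
    """
    alpha = 2.0 / (1 + k)
    value = close[0]
    for c in close[1:]:
        value = value + alpha * (c - value)
    return value


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # toy data, one price per second
print(sma(prices, 5), wma(prices, 5), momentum(prices, 5), ema(prices, 5))
```

Each function expects at least `n` (or `n + 1` for `momentum`) observations; input validation is omitted to keep the sketch short.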
Sensitivity parameters for the ADWIN change detector in ARF and RCARF.
| Configuration | Warning threshold ($\delta_w$) | Drift threshold ($\delta_d$) |
|---|---|---|
| ARF | 0.0001 | 0.00001 |
| ARF | 0.01 | 0.001 |
| RCARF, ARF | 0.3 | 0.15 |
| RCD | 0.15 | |
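To give some intuition for these sensitivity values: in ADWIN, the parameter δ controls how large a difference between the means of two sub-windows must be before a change is declared, and larger δ means a smaller threshold and therefore a more reactive detector. The sketch below evaluates the Hoeffding-style cut threshold from the original ADWIN description for the value pairs in the table. It assumes the two columns are the warning and drift settings, uses an arbitrary split of 500 + 500 examples, and is not how production implementations compute the bound (they typically use a tighter variance-based version), so the numbers are illustrative only.

```python
import math


def adwin_epsilon_cut(n0: int, n1: int, delta: float) -> float:
    """Hoeffding-style ADWIN threshold for sub-windows of sizes n0 and n1.

    A change is flagged when |mean(W0) - mean(W1)| >= epsilon_cut.
    """
    n = n0 + n1
    m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic-mean style effective size
    delta_prime = delta / n           # correction for testing many split points
    return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))


# (first column, second column) as listed in the table; row labels are descriptive only.
configs = {
    "ARF, row 1": (0.0001, 0.00001),
    "ARF, row 2": (0.01, 0.001),
    "RCARF / ARF, row 3": (0.3, 0.15),
}

for label, (d_first, d_second) in configs.items():
    e_first = adwin_epsilon_cut(500, 500, d_first)
    e_second = adwin_epsilon_cut(500, 500, d_second)
    print(f"{label}: eps(first)={e_first:.4f}  eps(second)={e_second:.4f}")
```

Within each row the first value yields the smaller threshold, which is consistent with a warning firing before a drift; across rows, the 0.3 / 0.15 setting is the most sensitive of the three.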
Global comparison. Accumulated error (%) for all algorithms on the whole dataset, sorted from best to worst result. Main descriptive statistics over 20 runs. Differences are significant at .
| Algorithm | Mean | Median | Var. | Max. | Min. |
|---|---|---|---|---|---|
| RCARF | 34.7533 | 34.7538 | 0.0002 | 34.7791 | 34.7285 |
| ARF | 34.8008 | 34.8007 | 0.0002 | 34.8362 | 34.7769 |
| ARF | 34.8309 | 34.8335 | 0.0003 | 34.8591 | 34.7902 |
| RCD | 35.0469 | 35.0469 | 0.0000 | 35.0469 | 35.0469 |
| ARF | 35.1104 | 35.1114 | 0.0002 | 35.1392 | 35.0881 |
| DWM | 35.2364 | 35.2364 | 0.0000 | 35.2364 | 35.2364 |
| AHOEFT | 35.4661 | 35.4661 | 0.0000 | 35.4661 | 35.4661 |
Internal statistics for RCARF on the whole dataset over 20 runs. # Drifts, number of total drifts during the execution (both recurring and background); Drifts per tree, number of total drifts during the execution (both recurring and background) divided by the ensemble size; # F. Warnings, number of active warnings at the end of the execution; # Trees, number of decision trees in the concept history at the end of the execution.
| Statistic | Mean | Median | Var. | Max. | Min. |
|---|---|---|---|---|---|
| # Drifts | 3411.1500 | 3411.5000 | 3120.2395 | 3518 | 3279 |
| Drifts per tree | 85.2788 | 85.2875 | 1.9501 | 88 | 82 |
| # F. Warnings | 13.6000 | 14.0000 | 9.2000 | 19 | 9 |
| # Trees | 118.2500 | 119.0000 | 46.0921 | 130 | 106 |
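Since "Drifts per tree" is defined as the total number of drifts divided by the ensemble size, the two rows can be cross-checked. The short calculation below infers the ensemble size from the reported means; the resulting value is an inference from the table, not a figure stated in this excerpt. It then verifies that the median and variance scale accordingly (the variance by the square of the ensemble size).

```python
# Values copied from the table above.
drifts = {"mean": 3411.15, "median": 3411.5, "var": 3120.2395}
per_tree = {"mean": 85.2788, "median": 85.2875, "var": 1.9501}

# Inferred, not reported: ratio of the two means, rounded to an integer.
ensemble_size = round(drifts["mean"] / per_tree["mean"])

# Dividing a random variable by a constant c divides its mean and median by c
# and its variance by c squared.
checks = {
    "mean": drifts["mean"] / ensemble_size,
    "median": drifts["median"] / ensemble_size,
    "var": drifts["var"] / ensemble_size ** 2,
}

print("inferred ensemble size:", ensemble_size)
for key, value in checks.items():
    print(f"{key}: recomputed={value:.4f} reported={per_tree[key]:.4f}")
```

All three recomputed values agree with the reported row to within rounding, so the two rows are internally consistent.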
Figure 2. Sample run of RCARF on a single test for the first trading day. Error measured on windows of 500 examples. Red dotted vertical lines mark drifts to background trees, and blue dashed vertical lines mark drifts to recurring trees.
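The error curve in such a plot can be obtained by splitting the stream of predictions into consecutive windows and computing the misclassification rate in each. The following sketch assumes non-overlapping windows and keeps a shorter final window if the stream length is not a multiple of the window size; both choices are assumptions, and `windowed_error` is an illustrative name.

```python
from typing import List, Sequence


def windowed_error(y_true: Sequence[int], y_pred: Sequence[int],
                   window: int = 500) -> List[float]:
    """Misclassification rate (%) on consecutive, non-overlapping windows."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = []
    for start in range(0, len(y_true), window):
        truth = y_true[start:start + window]
        preds = y_pred[start:start + window]
        wrong = sum(1 for t, p in zip(truth, preds) if t != p)
        errors.append(100.0 * wrong / len(truth))
    return errors


# Toy stream of price directions (1 = up, 0 = down) with window size 2.
print(windowed_error([1, 0, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], window=2))
# -> [50.0, 50.0, 0.0]
```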
Figure 3. Algorithm comparison. Average error measured on windows of 1000 examples for an example period of time. For RCARF and the three ARF configurations, we show the average result of 20 runs.