Delilah Donick, Sandro Claudio Lera.
Abstract
Conventionally, random forests are built from "greedy" decision trees, each of which considers only one split at a time during its construction. The sub-optimality of greedy implementations is well known, yet mainstream adoption of more sophisticated tree-building algorithms has been lacking. We examine under what circumstances an implementation of less greedy decision trees actually yields outperformance. To this end, a "stepwise lookahead" variation of the random forest algorithm is presented for its ability to better uncover binary feature interdependencies. In contrast to the greedy approach, the decision trees included in this random forest algorithm each simultaneously consider three split nodes in tiers of depth two. It is demonstrated on synthetic data and financial price time series that the lookahead version significantly outperforms the greedy one when (a) certain non-linear relationships between feature pairs are present and (b) the signal-to-noise ratio is particularly low. A long-short trading strategy for copper futures is then backtested by training both greedy and stepwise lookahead random forests to predict the signs of daily price returns. The resulting superior performance of the lookahead algorithm is at least partially explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators. More generally, across all examined datasets, when no such relationships between features are present, performance across random forests is similar. Given its enhanced ability to understand the feature interdependencies present in complex systems, this lookahead variation is a useful extension to the toolkit of data scientists, in particular for financial machine learning, where conditions (a) and (b) are typically met.
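The difference between a greedy split and a depth-two lookahead can be illustrated with a minimal sketch. This is not the authors' implementation: the exhaustive search over a fixed threshold grid, the standard binary Gini impurity, and all function names (`greedy_split`, `lookahead_split`, and so on) are assumptions made for illustration. On a noise-free XOR lattice every single split looks equally useless to the greedy criterion, whereas scoring the root together with its two child splits singles out the 0.5 threshold.

```python
def gini(labels):
    """Binary Gini impurity 1 - p^2 - (1 - p)^2 (standard form, assumed here)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def node_gini(rows):
    return gini([y for _, y in rows])


def split(rows, feature, value):
    """Send rows with x[feature] >= value to the right, the rest to the left."""
    left = [(x, y) for x, y in rows if x[feature] < value]
    right = [(x, y) for x, y in rows if x[feature] >= value]
    return left, right


def greedy_split(rows, n_features, grid):
    """Best single split: minimise the size-weighted Gini of the two children."""
    best = None
    for f in range(n_features):
        for v in grid:
            left, right = split(rows, f, v)
            if not left or not right:
                continue
            score = (len(left) * node_gini(left)
                     + len(right) * node_gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best


def lookahead_split(rows, n_features, grid):
    """Score each root split by the impurity left after also splitting both children."""
    best = None
    for f in range(n_features):
        for v in grid:
            left, right = split(rows, f, v)
            if not left or not right:
                continue
            total = 0.0
            for child in (left, right):
                sub = greedy_split(child, n_features, grid)
                child_score = sub[0] if sub else node_gini(child)
                total += len(child) / len(rows) * child_score
            if best is None or total < best[0]:
                best = (total, f, v)
    return best


# Noise-free XOR lattice: 10 x 10 points, label 1 when exactly one feature is >= 0.5.
coords = [(i + 0.5) / 10 for i in range(10)]
data = [((a, b), int((a >= 0.5) != (b >= 0.5))) for a in coords for b in coords]
grid = [i / 10 for i in range(1, 10)]

greedy = greedy_split(data, 2, grid)        # (score, feature, value)
lookahead = lookahead_split(data, 2, grid)  # (score, feature, value)
print("greedy root:", greedy)
print("lookahead root:", lookahead)
```

In this sketch the three split nodes of a depth-two tier are evaluated jointly, which is what lets the root be chosen for what it enables one level down. The cost is a search that is roughly quadratic in the number of candidate splits instead of linear.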
Year: 2021 PMID: 33927260 PMCID: PMC8085031 DOI: 10.1038/s41598-021-88571-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1 (a) This DT of depth 2 has three split nodes, each with its own split feature and split feature value, and four resulting leaf nodes. The green “+” (red “−”) within the nodes represent training samples that belong to the “+” (“−”) class. All datapoints are contained in the top (root) node. If a datapoint's value for the root's split feature is greater than or equal to the root's split feature value, it travels down the right branch into the right-hand split node. If not, it travels down the left branch into the left-hand split node. This process is then repeated so that the samples are further separated into the four leaf nodes. With each split, the samples are separated so that the resulting nodes are each more homogeneous than the previous one. This can be observed qualitatively by looking at the ratio of “+”s to “−”s, or quantitatively by the Gini-scores (1) for the nodes at each level of the DT. (b) We show 100 datapoints that follow an XOR-like pattern. Each datapoint is plotted by its values for the two features. The green “+”s and the red “−”s denote the associated class labels. All datapoints whose two feature values are both greater than or both less than 0.5 belong to the “−” class. All others are part of the “+” class. The darker, dashed lines (one per feature) together represent the optimal splits and a perfect in-sample classification of the data. The lighter, dotted lines represent suboptimal splits which, if selected, would result in a higher cumulative Gini-score (4).
Figure 2 The top plot shows the prediction accuracy for different classifiers as a function of the signal-to-noise level defined in (5). Towards the left side (signal dominates) and the right side (pure noise), the lookahead random forest (LRF) is not distinguishably better than the greedy variants (GDT or XGB). However, there is a transition regime (blue shaded background) of lower signal-to-noise ratio where the LRF outperforms, since the GDTs that underlie the other algorithms fail to regularly identify the relevant features. This is observed in the bottom plot, where the relative importance of the individual features is shown on the y-axis. The LRF correctly identifies the XOR-like features much more consistently than the other methods, thereby accounting for its outperformance. The individual GDT underperforms all of the ensemble methods.
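The routing rule and the per-level Gini comparison described in the caption can be sketched as follows. The tree below is a hypothetical one (both tiers split at 0.5, and the leaf names `leaf_1` to `leaf_4` are invented for illustration), the data are a regular 10 x 10 lattice standing in for the 100 plotted points, and the Gini impurity is assumed to take its standard binary form, which may differ in detail from the paper's Eq. (1).

```python
def gini(n_pos, n_neg):
    """Gini impurity of a node from its class counts (standard form, assumed)."""
    n = n_pos + n_neg
    if n == 0:
        return 0.0
    p = n_pos / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


# Hypothetical depth-2 tree: the root splits on feature 0, both children on feature 1.
TREE = {
    "feature": 0, "value": 0.5,
    "left": {"feature": 1, "value": 0.5, "left": "leaf_1", "right": "leaf_2"},
    "right": {"feature": 1, "value": 0.5, "left": "leaf_3", "right": "leaf_4"},
}


def route(node, x):
    """Go right when the feature value is >= the split value, otherwise left."""
    while isinstance(node, dict):
        go_right = x[node["feature"]] >= node["value"]
        node = node["right"] if go_right else node["left"]
    return node


# XOR-like labels: "-" when both features lie on the same side of 0.5, else "+".
coords = [(i + 0.5) / 10 for i in range(10)]
points = [((a, b), "+" if (a >= 0.5) != (b >= 0.5) else "-")
          for a in coords for b in coords]

# Count the classes that end up in each leaf.
counts = {}
for x, label in points:
    leaf = route(TREE, x)
    pos, neg = counts.get(leaf, (0, 0))
    counts[leaf] = (pos + 1, neg) if label == "+" else (pos, neg + 1)

n_pos = sum(1 for _, label in points if label == "+")
root_gini = gini(n_pos, len(points) - n_pos)
leaf_gini = {leaf: gini(*c) for leaf, c in counts.items()}
print("root:", root_gini, "leaves:", leaf_gini)
```

The root holds an even class mix, while each leaf of this particular tree is pure, which is the quantitative counterpart of the homogeneity argument in panel (a).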
Accuracy scores for the prediction of the signs of daily commodity returns.
| Asset | LRF % Accuracy | LRF p-value | GRF % Accuracy | GRF p-value | GDT % Accuracy | GDT p-value | XGBoost % Accuracy | XGBoost p-value | % Majority |
|---|---|---|---|---|---|---|---|---|---|
| Cocoa |  | 0.000 | 51.9 | 0.343 | 50.9 | 0.826 | 49.4 | 1.000 | 51.6 |
| Coffee | 49.0 | 0.997 | 52.2 | 0.078 | 49.7 | 0.973 | 50.7 | 0.700 | 51.1 |
| Copper |  | 0.039 |  | 0.041 |  | 0.000 |  | 0.000 | 51.8 |
| Corn |  | 0.005 | 52.3 | 0.082 | 47.8 | 1.000 |  | 0.043 | 51.3 |
| Cotton | 50.8 | 0.471 | 48.6 | 0.998 | 48.3 | 0.996 | 48.1 | 1.000 | 50.7 |
| Crude oil | 50.1 | 0.995 | 51.3 | 0.819 | 45.4 | 1.000 | 52.6 | 0.182 | 51.9 |
| Gold | 50.1 | 1.000 | 51.8 | 0.982 | 51.3 | 0.995 | 51.2 | 0.998 | 53.3 |
| Heating oil | 48.6 | 0.999 | 51.9 | 0.079 | 46.8 | 1.000 | 50.7 | 0.588 | 50.9 |
| Natural gas | 49.3 | 0.974 | 50.8 | 0.494 | 48.9 | 0.995 | 51.1 | 0.221 | 50.7 |
| Oats | 50.1 | 1.000 | 50.0 | 1.000 | 49.9 | 1.000 | 51.1 | 1.000 | 54.3 |
| Palladium | 47.5 | 1.000 | 48.2 | 1.000 | 48.2 | 1.000 | 43.6 | 1.000 | 53.4 |
| Platinum | 45.3 | 1.000 | 50.0 | 1.000 | 50.3 | 1.000 | 48.5 | 1.000 | 53.5 |
| Rice |  | 0.000 |  | 0.000 | 50.0 | 0.875 | 51.9 | 0.097 | 50.8 |
| Silver |  | 0.007 |  | 0.000 | 48.0 | 1.000 | 52.6 | 0.925 | 53.7 |
| Soybean | 50.4 | 1.000 | 49.9 | 1.000 | 46.3 | 1.000 | 46.0 | 1.000 | 53.1 |
| Sugar | 50.7 | 0.280 | 49.8 | 0.720 | 48.0 | 0.999 | 46.9 | 1.000 | 50.3 |
Numbers in bold are accuracies that are statistically significant, in the sense that their p-value is below 5%. The p-value is calculated from a binomial test that compares the strategy accuracy against the naive strategy of always guessing the majority class (rightmost column).
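A binomial test of this kind can be sketched as below. The one-sided alternative (accuracy greater than the majority-class rate) is an assumption, suggested by the p-values near 1 for accuracies below the majority rate. The function names and the sample size `n_days = 1000` are hypothetical, since the number of test days is not stated here, so the printed values are not expected to reproduce the table.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def accuracy_p_value(accuracy, majority_rate, n_days):
    """One-sided p-value that `accuracy` beats always guessing the majority class."""
    n_correct = round(accuracy * n_days)
    return binom_upper_tail(n_correct, n_days, majority_rate)


# Hypothetical sample size; accuracies and majority rates are given as fractions.
n_days = 1000
print(accuracy_p_value(0.490, 0.511, n_days))  # accuracy below the majority rate
print(accuracy_p_value(0.560, 0.511, n_days))  # accuracy well above the majority rate
```

Because the null hypothesis uses the majority-class rate and not 50%, an accuracy above one half can still yield a p-value close to 1 when the classes are imbalanced, as for several assets in the table.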
Figure 3 The top plot shows the cumulative returns from 2012 through 2020 for the different trading strategies. The black line represents the benchmark “buy and hold” strategy for copper futures. The solid green (purple) line represents the returns of the optimized LRF (GRF) trading strategy based on the 8 technical indicators; both the strategy and the indicators are detailed in SI Appendix. The legend shows the compound annual growth rate (CAGR), annualized Sharpe ratio, success rate (SR), maximum drawdown (MDD), and the fraction of time that a long (L) or short (S) position is held. The LRF clearly outperforms the GRF across various metrics. The shaded background colors depict one of the rolling windows for in-sample (IS) training, cross-validation (CV) and out-of-sample (OS) prediction. These windows are rolled in steps of 75 (trading) days. The bottom right heatmap visualizes the “probabilistic” classification (fraction of DTs that predict a positive return) of an LRF trained on just two features (cf. axis labels). This structure is akin to the XOR pattern of Fig. 1. The dashed green (purple) line represents the returns of the optimized LRF (GRF) trading strategy detailed in SI Appendix, based on only the two XOR-like technical indicators. The outperformance of the LRF is yet more pronounced in this case.
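The legend metrics can be computed from daily returns and daily positions along the following lines. This is a sketch under stated assumptions, not the procedure of the SI Appendix: it assumes 252 trading days per year, a zero risk-free rate for the Sharpe ratio, simple daily compounding, and a success rate defined as the fraction of active days on which the position had the right sign. The function name `strategy_metrics` and the four-day example are hypothetical.

```python
import math
from statistics import mean, stdev


def strategy_metrics(returns, positions, periods_per_year=252):
    """Summary statistics of a long-short strategy (+1 long, -1 short, 0 flat)."""
    strat = [p * r for p, r in zip(positions, returns)]

    # Compound the daily strategy returns into an equity curve starting at 1.
    equity, level = [], 1.0
    for s in strat:
        level *= 1.0 + s
        equity.append(level)

    years = len(strat) / periods_per_year
    cagr = equity[-1] ** (1.0 / years) - 1.0
    sharpe = mean(strat) / stdev(strat) * math.sqrt(periods_per_year)

    # Maximum drawdown: largest relative drop from a running peak.
    peak, mdd = 1.0, 0.0
    for level in equity:
        peak = max(peak, level)
        mdd = max(mdd, 1.0 - level / peak)

    active = [s for p, s in zip(positions, strat) if p != 0]
    success_rate = sum(1 for s in active if s > 0) / len(active)
    n = len(positions)
    return {
        "CAGR": cagr,
        "Sharpe": sharpe,
        "SR": success_rate,
        "MDD": mdd,
        "L": sum(1 for p in positions if p > 0) / n,
        "S": sum(1 for p in positions if p < 0) / n,
    }


# Hypothetical four-day example: daily returns of the asset and predicted signs.
daily_returns = [0.01, -0.02, 0.03, -0.01]
predicted_signs = [1, -1, 1, 1]
metrics = strategy_metrics(daily_returns, predicted_signs)
print(metrics)
```

A position of +1 or -1 here would correspond to the predicted sign of the next daily return; how the forest's vote fraction is turned into a position is left to the SI Appendix.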