Abstract
Stock trend prediction is a challenging task due to the market's noise, and machine learning techniques have recently been successful in coping with this challenge. In this research, we create a novel framework for stock prediction, Dynamic Advisor-Based Ensemble (dynABE). dynABE explores domain-specific areas based on the companies of interest, diversifies the feature set by creating different "advisors" that each handle a different area, follows an effective model-ensemble procedure for each advisor, and combines the advisors in a second-level ensemble through an online update strategy we developed. dynABE adapts robustly to price-pattern changes of the market during the active trading period, without needing to retrain the entire model. We test dynABE on three cobalt-related companies, and it achieves a best-case misclassification error of 31.12% and an annualized absolute return of 359.55% with zero maximum drawdown. dynABE also consistently outperforms the baseline models of support vector machine, neural network, and random forest in all case studies.
Year: 2019 PMID: 30794608 PMCID: PMC6386270 DOI: 10.1371/journal.pone.0212487
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of related works.
Works are grouped by the machine learning models they utilize.
| Author(s) | General Approach | Dataset: Features | Dataset: Target | Machine Learning Model |
|---|---|---|---|---|
| Fama | General Indicators | Dividend yield | NYSE portfolio stock returns | Linear regression |
| Pesaran | General Indicators | Dividend yield, interest rates, inflation rates, and industrial production index | S&P 500 and Dow Jones Industrial portfolio stock returns | Linear regression |
| M.-C. Lee (2009) | General Indicators | Futures contracts, spot indices, and the previous day’s NASDAQ index | NASDAQ index | Support vector machine |
| Schumaker and Chen (2009) | Sentiment Analysis | Financial news articles | S&P 500 companies | Support vector machine |
| Hagenau | Sentiment Analysis | Corporate announcement news | Selected companies | Support vector machine |
| Saad | General Indicators | Historical prices | Selected companies | Time delay, recurrent, and probabilistic neural networks |
| Tsang | General Indicators | Historical prices | HSBC stock trend | 3-layer neural network |
| Tsai | General Indicators | Financial and macroeconomic indices | TSE listed companies | 3-layer neural network |
| Ding | Sentiment Analysis | News events | S&P 500 index and individual companies | Convolutional Neural Network |
| Nelson | General Indicators | Historical prices and technical indicators | Companies listed in the IBovespa index | Recurrent (LSTM) Neural Network |
| Das | Sentiment Analysis | Twitter and other streaming data | Google, Microsoft, and Apple | Recurrent (LSTM) Neural Network |
| Chen and Wei (2018) | Intercorrelation of Corporations | Corporate information | 2988 companies listed in the “tushare” API | Convolutional Neural Network |
| Patel | General Indicators | Technical indicators | Two stock prices and two stock indices | Comparison between ANN, SVM, random forest, and naïve-Bayes |
| Ballings | General Indicators | Financial indices, corporate information, and economic indicators | Stock trend of 5767 listed European companies | Comparison between logistic regression, neural networks, k-nearest neighbor, SVM, random forest, and AdaBoost |
Fig 1 Overview of dynABE.
Fig 2 dynABE’s ensemble learning framework for one advisor.
Fig 3 The training process of stacking.
We show only the training of the first fold as an example, highlighted in red.
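The stacking procedure in Fig 3 follows the standard K-fold recipe: each base model's out-of-fold predictions form one feature column for a second-level meta-learner. A minimal scikit-learn sketch, with placeholder base learners standing in for the paper's actual models (which include linear/logistic regression, SVM, XGBoost, and rotation forest):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def stack_train(X, y, base_factories, meta_model, n_folds=5):
    """K-fold stacking: each base model's out-of-fold (OOF) predictions
    become one feature column for the meta-model."""
    oof = np.zeros((len(X), len(base_factories)))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(X):
        for j, make_model in enumerate(base_factories):
            model = make_model()
            model.fit(X[train_idx], y[train_idx])
            oof[val_idx, j] = model.predict(X[val_idx])
    meta_model.fit(oof, y)  # train the meta-learner on OOF predictions
    # refit each base model on the full training set for prediction time
    fitted = [make_model().fit(X, y) for make_model in base_factories]
    return fitted, meta_model

def stack_predict(X, fitted, meta_model):
    """Feed the refitted base models' predictions to the meta-model."""
    meta_features = np.column_stack([m.predict(X) for m in fitted])
    return meta_model.predict(meta_features)
```

Training only on out-of-fold predictions (rather than in-sample ones) is what keeps the meta-learner from simply rewarding the base model that overfits the most.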
Fig 4 Stabilizing prediction results through bootstrap aggregation.
We show the processing of sample 1 as an example; the same process is repeated 10 times in practice.
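The bootstrap-aggregation step of Fig 4 can be sketched generically as follows: train on several bootstrap resamples of the training set and majority-vote the binary up/down predictions. The `fit_predict` callback is a placeholder for whichever classifier is being stabilized:

```python
import numpy as np

def bagged_vote(train_X, train_y, test_X, fit_predict, n_bags=10, seed=0):
    """Bootstrap aggregation: train on n_bags bootstrap resamples of the
    training set and majority-vote the binary predictions."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(test_X))
    for _ in range(n_bags):
        # sample training indices with replacement (a bootstrap resample)
        idx = rng.integers(0, len(train_X), size=len(train_X))
        votes += fit_predict(train_X[idx], train_y[idx], test_X)
    return (votes >= n_bags / 2).astype(int)  # majority vote
```

Averaging over resamples reduces the variance of a single fit, which is what "stabilizing" refers to here.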
Fig 5 Available agents to be passed to online update.
Misclassification errors of each advisor for all three companies during the validation period.
The best performance of each advisor is bolded.
| Company | Classifier | Advisor 1 Error | Advisor 2 Error | Advisor 3 Error |
|---|---|---|---|---|
| Jinchuan | Linear Regression | (average) 36.33% | (average) 43.04% | (average) 39.74% |
| | Logistic Regression | (average) 39.00% | (average) 40.29% | (average) 39.45% |
| | SVM | (average) 35.72% | (average) 40.47% | (average) 38.79% |
| | XGBoost | (average) 34.86% | (average) 36.35% | |
| | Rotation forest | (average) 40.00% | (average) 41.84% | (average) 39.13% |
| | Logistic Stacking | 35.43% | 33.86% | 36.75% |
| | XGBoost Stacking | 34.38% | 33.86% | 35.70% |
| | Rotation Forest Stacking | 35.43% | 38.06% | 38.85% |
| | Averaged Stacking | 35.70% | 34.12% | 36.75% |
| Sumitomo | Linear Regression | (average) 34.03% | (average) 44.22% | (average) 40.06% |
| | Logistic Regression | (average) 36.06% | (average) 43.72% | (average) 38.28% |
| | SVM | (average) 35.75% | (average) 43.75% | (average) 40.17% |
| | XGBoost | (average) 36.81% | (average) 35.22% | |
| | Rotation forest | (average) 35.89% | (average) 43.89% | (average) 39.78% |
| | Logistic Stacking | 31.67% | 44.17% | 35.00% |
| | XGBoost Stacking | 32.22% | 45.28% | 36.11% |
| | Rotation Forest Stacking | 32.50% | 43.33% | 34.72% |
| | Averaged Stacking | 31.94% | 43.89% | 34.17% |
| Zijin | Linear Regression | (average) 43.14% | (average) 43.72% | (average) 42.31% |
| | Logistic Regression | (average) 41.86% | (average) 43.25% | (average) 42.36% |
| | SVM | (average) 43.06% | (average) 43.28% | (average) 41.31% |
| | XGBoost | (average) 41.81% | (average) 44.61% | (average) 42.19% |
| | Rotation forest | (average) 44.58% | (average) 43.00% | |
| | Logistic Stacking | 42.78% | 41.67% | 42.50% |
| | XGBoost Stacking | 42.50% | 42.50% | 40.83% |
| | Rotation Forest Stacking | 42.50% | 43.89% | 40.00% |
| | Averaged Stacking | 41.94% | 42.78% | 41.11% |
Online update experiments with Jinchuan.
Here we show common hyperparameter combinations and their effects on the online update’s misclassification error. We then grid-search over the validation set and present the optimal combinations found.
| Update Frequency | Diversity Bias | Error |
|---|---|---|
| 3 | 0 | 33.86% |
| 3 | 1 | 33.86% |
| 3 | 10 | 33.60% |
| 5 | 0 | 33.51% |
| 5 | 1 | 33.24% |
| 5 | 10 | 32.71% |
| 10 | 0 | 32.88% |
| 10 | 1 | 32.35% |
| 10 | 10 | 32.35% |
| (grid search) 5 | (grid search) 5 | 31.12% |
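The grid search over the two online-update hyperparameters is a plain exhaustive search. A generic sketch, where the `validate` callback (a placeholder, not from the paper) stands in for one run of the online update over the validation period:

```python
import itertools

def grid_search_online(validate, update_freqs, diversity_biases):
    """Try every (update frequency, diversity bias) pair and keep the one
    with the lowest validation misclassification error."""
    best_pair = min(itertools.product(update_freqs, diversity_biases),
                    key=lambda pair: validate(*pair))
    return best_pair, validate(*best_pair)
```

Because the search only reads the validation error, it can explore ranges beyond the "common" combinations in the table, which is how optima such as (40, 31) for Zijin can emerge.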
Online update experiments with Zijin.
| Update Frequency | Diversity Bias | Error |
|---|---|---|
| 3 | 0 | 42.58% |
| 3 | 1 | 42.58% |
| 3 | 10 | 40.34% |
| 5 | 0 | 42.54% |
| 5 | 1 | 41.41% |
| 5 | 10 | 40.28% |
| 10 | 0 | 42.29% |
| 10 | 1 | 41.14% |
| 10 | 10 | 40.86% |
| (grid search) 40 | (grid search) 31 | 37.19% |
Fig 6 Online update hyperparameter grid search for Jinchuan.
Bigger and brighter bubbles represent higher accuracies.
Fig 8 Online update hyperparameter grid search for Zijin.
The bubbles are denser for Zijin because its optimal hyperparameter combination lies outside the usual range, so its tuning range is also larger.
Comparison of stacking and online update errors.
The best performance of each company is bolded.
| Company | Advisor 1 Stacking Error | Advisor 2 Stacking Error | Advisor 3 Stacking Error | Online Update Error |
|---|---|---|---|---|
| Jinchuan | 35.43% (Logistic Stk.) | 33.86% (Logistic Stk.) | 36.75% (Logistic Stk.) | 31.12% |
| | 34.38% (XGBoost Stk.) | 33.86% (XGBoost Stk.) | 35.70% (XGBoost Stk.) | |
| | 35.43% (Rot. Forest Stk.) | 38.06% (Rot. Forest Stk.) | 38.85% (Rot. Forest Stk.) | |
| | 35.70% (Averaged Stk.) | 34.12% (Averaged Stk.) | 36.75% (Averaged Stk.) | |
| Sumitomo | 31.67% (Logistic Stk.) | 44.17% (Logistic Stk.) | 35.00% (Logistic Stk.) | 31.61% |
| | 32.22% (XGBoost Stk.) | 45.28% (XGBoost Stk.) | 36.11% (XGBoost Stk.) | |
| | 32.50% (Rot. Forest Stk.) | 43.33% (Rot. Forest Stk.) | 34.72% (Rot. Forest Stk.) | |
| | 31.94% (Averaged Stk.) | 43.89% (Averaged Stk.) | 34.17% (Averaged Stk.) | |
| Zijin | 42.78% (Logistic Stk.) | 41.67% (Logistic Stk.) | 42.50% (Logistic Stk.) | 37.19% |
| | 42.50% (XGBoost Stk.) | 42.50% (XGBoost Stk.) | 40.83% (XGBoost Stk.) | |
| | 42.50% (Rot. Forest Stk.) | 43.89% (Rot. Forest Stk.) | 40.00% (Rot. Forest Stk.) | |
| | 41.94% (Averaged Stk.) | 42.78% (Averaged Stk.) | 41.11% (Averaged Stk.) | |
Comparison between baseline models and dynABE on the validation set.
Here we use misclassification errors as the evaluation metric. The best baseline performances are italicized, and the best overall performances are bolded.
| Company | Support Vector Machine | 3-layer Neural Network | Random Forest | dynABE |
|---|---|---|---|---|
| Jinchuan | 37.53% | 35.43% | | 31.12% |
| Sumitomo | 41.39% | 44.44% | | 31.61% |
| Zijin | 43.61% | 40.28% | | 37.19% |
Fig 9 Advisor weight history of Jinchuan.
Different advisors are shown in different colors, as indicated in the legend. The x-axis shows epochs, where each epoch corresponds to one weight update; the y-axis shows the weights. A higher weight means that an advisor plays a more important role during online update.
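The weight dynamics in Figs 9–11 come from the online update. The paper's exact update rule is not reproduced here; the following is a generic multiplicative-weights sketch in which advisors that predicted correctly are up-weighted, while the diversity bias pulls the weights back toward uniform so no single advisor dominates, matching the roles the two hyperparameters play in the tables above:

```python
import numpy as np

def update_advisor_weights(weights, correct, diversity_bias=0.0, lr=0.5):
    """One update epoch: up-weight advisors whose recent predictions were
    correct (multiplicative weights), then add a uniform pull so a nonzero
    diversity bias keeps the ensemble from collapsing onto one advisor."""
    correct = np.asarray(correct, dtype=float)
    # correct advisors get factor e^{+lr}, incorrect ones e^{-lr}
    w = np.asarray(weights) * np.exp(lr * (2.0 * correct - 1.0))
    w = w + diversity_bias * w.mean()  # additive pull toward uniformity
    return w / w.sum()                 # renormalize to a distribution
```

Run once per update-frequency window, this produces weight trajectories of the kind plotted in the figures: sharp when the diversity bias is low, flatter when it is high.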
Fig 10 Advisor weight history of Sumitomo.
Fig 11 Advisor weight history of Zijin.
Fig 12 Accuracy history of Jinchuan.
The x-axis shows dates; the y-axis shows the cumulative accuracy up to each day.
Fig 14 Accuracy history of Zijin.
Fig 15 Absolute returns of the trading strategy and the stock price for Jinchuan.
Exact returns on the last day of the trading period are labeled at the end of each trend line. The weight-initialization period at the beginning is plotted as a flat black line.
Fig 17 Absolute returns of the trading strategy and the stock price for Zijin.
Evaluations of trading strategies.
Returns are annualized using 250 trading days per year.
| Company | Annualized Absolute Return (%) | Annualized Excess Return to Stock (%) | Annualized Excess Return to Index (%) | Sharpe Ratio | Maximum Drawdown |
|---|---|---|---|---|---|
| Jinchuan | 254.704 | 140.255 | 253.840 | 2.08089 | 0.930976 |
| Sumitomo | 359.549 | 343.168 | 358.643 | 2.15309 | 0 |
| Zijin | 77.2329 | 76.0122 | 76.2205 | 2.16598 | 0 |
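The metrics above can be computed from a daily return series. A sketch under the paper's 250-trading-day convention; the zero risk-free rate in the Sharpe ratio is an assumption (the paper does not state one):

```python
import numpy as np

def annualized_return(daily_returns, trading_days=250):
    """Compound the daily returns, then rescale to a 250-day year."""
    total = np.prod(1.0 + np.asarray(daily_returns))
    years = len(daily_returns) / trading_days
    return total ** (1.0 / years) - 1.0

def sharpe_ratio(daily_returns, trading_days=250):
    """Annualized mean over volatility of daily returns (risk-free rate
    assumed to be zero)."""
    r = np.asarray(daily_returns)
    return np.sqrt(trading_days) * r.mean() / r.std()

def max_drawdown(equity_curve):
    """Largest peak-to-trough fractional decline of the equity curve;
    0 means the curve never fell below a previous peak."""
    e = np.asarray(equity_curve, dtype=float)
    running_peak = np.maximum.accumulate(e)
    return ((running_peak - e) / running_peak).max()
```

A maximum drawdown of 0, as reported for Sumitomo and Zijin, means the strategy's cumulative equity curve never dipped below any earlier peak during the trading period.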
Online update experiments with Sumitomo.
| Update Frequency | Diversity Bias | Error |
|---|---|---|
| 3 | 0 | 31.93% |
| 3 | 1 | 31.93% |
| 3 | 10 | 33.61% |
| 5 | 0 | 32.39% |
| 5 | 1 | 31.83% |
| 5 | 10 | 34.08% |
| 10 | 0 | 31.71% |
| 10 | 1 | 32.29% |
| 10 | 10 | 34.00% |
| (grid search) 12 | (grid search) 0 | 31.61% |