Kenniy Olorunnimbe, Herna Viktor.
Abstract
The widespread usage of machine learning in different mainstream contexts has made deep learning the technique of choice in various domains, including finance. This systematic survey explores various scenarios employing deep learning in financial markets, especially the stock market. A key requirement for our methodology is its focus on research papers involving backtesting. That is, we consider whether the experimentation mode is sufficient for market practitioners to consider the work in a real-world use case. Works meeting this requirement are distributed across seven distinct specializations. Most studies focus on trade strategy, price prediction, and portfolio management, with a limited number considering market simulation, stock selection, hedging strategy, and risk management. We also recognize that domain-specific metrics such as "returns" and "volatility" appear most important for accurately representing model performance across specializations. Our study demonstrates that, although there have been some improvements in reproducibility, substantial work remains to be done regarding model explainability. Accordingly, we suggest several future directions, such as improving trust by creating reproducible, explainable, and accountable models and emphasizing prediction of longer-term horizons, potentially via the utilization of supplementary data, which continues to represent a significant unresolved challenge.
Keywords: Backtesting; Deep learning; Financial market; Machine learning; Neural network; Practice and application; Quantitative analysis; Stock market
Year: 2022 PMID: 35791405 PMCID: PMC9245389 DOI: 10.1007/s10462-022-10226-0
Source DB: PubMed Journal: Artif Intell Rev ISSN: 0269-2821 Impact factor: 9.588
A sample trade message for Apple Inc. (AAPL)
| Ticker symbol | AAPL |
|---|---|
| Name | Apple Inc. |
| Last trade price | 289.80 |
| Last trade timestamp | 1577480401 |
| Last trade volume | 35447203 |
| Exchange | NASDAQ |
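
The trade message above maps naturally onto a small record type. The sketch below is illustrative only: the `TradeMessage` class and its field names are hypothetical, and it assumes the last trade timestamp is a Unix epoch expressed in seconds (UTC), which the table does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TradeMessage:
    """Hypothetical container mirroring the fields of the sample trade message."""
    ticker: str
    name: str
    last_price: float
    last_timestamp: int  # assumption: Unix epoch seconds, UTC
    last_volume: int
    exchange: str

    def traded_at(self) -> datetime:
        """Convert the epoch timestamp into a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.last_timestamp, tz=timezone.utc)


msg = TradeMessage("AAPL", "Apple Inc.", 289.80, 1577480401, 35447203, "NASDAQ")
print(msg.traded_at().isoformat())  # 2019-12-27T21:00:01+00:00
```

Under that assumption the timestamp falls one second after 21:00 UTC, which is consistent with a message recorded just after the regular US trading session ends.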
Representative attributes by data types
| Market data attributes | Fundamental data attributes | Alternative data attributes |
|---|---|---|
| open price, high price, low price, close price, volume | revenue, earnings per share, market capitalization, dividend, average volume, shares outstanding, next earning date | google trends, news, texts, tweets, satellite imagery |
Intraday time bar for ticker IBM
| Date | Time | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 20160128 | 10:00 | 122.17 | 122.27 | 122.09 | 122.09 | 4,934 |
| 20160128 | 11:00 | 121.42 | 121.60 | 121.38 | 121.52 | 12,254 |
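
A time bar such as the ones above summarizes all trades in an interval by their first, highest, lowest, and last prices plus the total volume. The following is a minimal sketch of that aggregation; the `Tick` and `Bar` types and the tick values are made up for illustration (the ticks are chosen so that the result matches the 10:00 row), and the input is assumed to be ordered by time.

```python
from typing import NamedTuple, Sequence


class Tick(NamedTuple):
    """One trade: price and number of shares (hypothetical layout)."""
    price: float
    volume: int


class Bar(NamedTuple):
    """Open-high-low-close-volume summary of one interval."""
    open: float
    high: float
    low: float
    close: float
    volume: int


def build_bar(ticks: Sequence[Tick]) -> Bar:
    """Aggregate time-ordered ticks from one interval into a single bar."""
    if not ticks:
        raise ValueError("cannot build a bar from an empty interval")
    prices = [t.price for t in ticks]
    return Bar(
        open=prices[0],        # first trade in the interval
        high=max(prices),
        low=min(prices),
        close=prices[-1],      # last trade in the interval
        volume=sum(t.volume for t in ticks),
    )


# Made-up ticks that reproduce the 10:00 IBM bar.
ticks = [Tick(122.17, 1500), Tick(122.27, 2000), Tick(122.09, 1434)]
bar = build_bar(ticks)
print(bar)
```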
Characteristics of data in survey
| Source | Type | Frequency | Free | Library |
|---|---|---|---|---|
| | Market, fundamental | Interday | Y | Investpy |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Market, fundamental | Interday, intraday | Y | yfinance |
| | Market | Interday* | Y | Kaggle-api |
| | Market, fundamental | Interday, intraday | N | Tws-api |
| | Taiwan market | Interday, intraday | Y | N/A |
| pypi.org/project/tushare | China market, fundamental | Interday, intraday | Y | Tushare |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Market, fundamental | Interday, intraday | N | N/A |
| | China market, fundamental | Interday, intraday | N | N/A |
| | Nordic market | Intraday* | Y | N/A |
| | UK market | Interday | N | N/A |
| | Market, fundamental | Interday, intraday | N | N/A |
| | Taiwan market, fundamental | Interday, intraday | N | N/A |
| | China market, fundamental | Interday, intraday | N | Jqdatasdk |
*Subject to availability
a WRDS: Compustat daily updates
Characteristics of public data sources
| Source | Market data attributes | Fundamental data attributes | Frequency |
|---|---|---|---|
| investing (a) | Open price, high price, low price, close price, volume | Revenue, earnings per share, market capitalization, dividend, average volume, ratio, beta, shares outstanding, next earning date | Daily, weekly, monthly |
| y-finance (b) | Open price, high price, low price, close price, volume | Major holders, institutional holders, mutual fund holders, dividends, splits, actions, calendar, earnings, quarterly earnings, financials, quarterly financials, balance sheet, quarterly balance sheet, cashflow, quarterly cashflow, sustainability, shares outstanding | 1 min, 2 min, 5 min, 15 min, 30 min, 60 min, 90 min, 1 h, 1 day, 5 days, 1 week, 1 month, 3 months |
| taifex (c) | Open bid, high bid, low bid, last bid, volume, best bid, best ask, historical high, historical low | Not available | Daily |
| kaggle (d) | Open price, high price, low price, close price, volume | Not available | Daily |
| tushare (e) | Open price, high price, low price, close price, volume | Account receivable turn day, account receivable turnover, business income, current asset days, current asset turnover, earnings per share, earnings per share (year over year), fixed assets, gross profit rate, inventory days, inventory turnover, liquid assets, net profit ratio, net profits, outstanding, profits (year over year), report date, reserved, reserved per share, return on equity, time to market, total assets | Daily |
| etsin (f) | Open price, high price, low price, close price, volume | Not available | Daily |
(a) https://github.com/alvarobartt/investpy
(b) https://github.com/ranaroussi/yfinance
(c) https://www.taifex.com.tw/enl/eng3/totalTableDate
(d) https://www.kaggle.com/datasets, https://github.com/kaggle/kaggle-api
(e) https://github.com/waditu/tushare
(f) https://etsin.fairdata.fi/
Fig. 1 Survey structure
Fig. 3 Candlestick & bar charts
Fig. 4 Model of a typical neuron (Castro 2006)
Fig. 5 Supervision-based learning technique
Fig. 6 Learning technique based on data availability
Fig. 7 Taxonomy of deep learning architecture used in stock market applications
Fig. 8 n-layer feed-forward neural network (Castro 2006)
Fig. 9 RNN (Goodfellow et al. 2016)
Fig. 10 LSTM & GRU (Goodfellow et al. 2016)
Fig. 11 Architecture of a convolutional neural network (Goodfellow et al. 2016)
Fig. 12 A simple Autoencoder (Goodfellow et al. 2016)
Fig. 13 Reinforcement Learning (François-Lavet et al. 2018)
Fig. 2 Intraday tick time series showing trade price and volume within the trading hours, across 2 days (Investing.com 2013)
Fig. 14 Time-series for the same value of
Fig. 15 Backtesting strategies
Machine learning evaluation metrics
| Evaluation | Description | Formula |
|---|---|---|
| Accuracy | The percentage of the correctly predicted classes. | |
| Error rate | The percentage of incorrectly predicted classes. Also computed as | |
| Recall | Ratio of actual positive classes that are correctly predicted; also known as measure of completeness or sensitivity. | |
| Precision | Ratio of positive predictions that are correct; also known as measure of exactness. | |
| F-score | Harmonic mean of recall and precision. | |
| Weighted F-score | Weighted measure of recall and precision. | |
| Mean absolute error (MAE) | Average of the absolute difference between the predicted values and the actual values. | |
| Mean absolute percentage error (MAPE) | Average of the percentage errors. | |
| Mean square error (MSE) | Average of the squared difference between the predicted values and the actual values. | |
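
The formula column of the table above did not survive extraction. For orientation, the standard textbook definitions of these metrics are given below. The notation is an assumption rather than the authors' own: $TP$, $TN$, $FP$, and $FN$ are the true/false positive/negative counts, $y_i$ is an actual value, $\hat{y}_i$ the corresponding prediction, $n$ the number of samples, and $\beta$ the weight given to recall relative to precision.

```latex
\begin{align*}
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Error rate} &= \frac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy} \\
\text{Recall}     &= \frac{TP}{TP + FN} \\
\text{Precision}  &= \frac{TP}{TP + FP} \\
F                 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
F_{\beta}         &= \frac{(1 + \beta^{2}) \cdot \text{Precision} \cdot \text{Recall}}{\beta^{2} \cdot \text{Precision} + \text{Recall}} \\
\text{MAE}        &= \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \\
\text{MAPE}       &= \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \\
\text{MSE}        &= \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^{2}
\end{align*}
```

Setting $\beta = 1$ in $F_{\beta}$ recovers the plain F-score.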
Financial evaluation metrics
| Evaluation | Description | Formula |
|---|---|---|
| Returns | Total amount gained or lost within a specific investment period, typically measured as a percentage of the original investment known as Rate of Returns (RoR) (Kenton | |
| Compound annual growth rate (CAGR) | The RoR for investment over a number of years, with returns re-invested yearly (Murphy |
| Volatility | Degree of variation in asset or total portfolio value (Investopedia | |
| Sharpe ratio | Measures performance in comparison with a risk-free asset, with adjustments for volatility or total risk (Hargrave | |
| Sortino ratio | A modification of the Sharpe Ratio that differentiates harmful volatility from overall volatility (Kenton | |
| Maximum drawdown (MDD) | Measures the decline of a return from a peak before a new peak that is at least equal to the old peak is achieved (Hayes | |
| Calmar ratio | Risk-adjusted returns (Will Kenton | |
| Value-at-risk (VaR) threshold | Estimate (as threshold) of maximum loss for an investment over time (Harper | |
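
The formulas for the financial metrics above are also missing from the extracted table. The sketch below shows one common way to compute most of them from a series of portfolio values. It is not the authors' implementation: the function names are hypothetical, annualization assumes 252 trading periods per year, volatility uses the sample standard deviation, the risk-free rate is per period, and VaR is left out because it depends on a modeling choice (historical, parametric, or simulated).

```python
import math
import statistics
from typing import List, Sequence


def total_return(values: Sequence[float]) -> float:
    """Rate of return over the whole period, as a fraction of the initial value."""
    return values[-1] / values[0] - 1.0


def cagr(values: Sequence[float], years: float) -> float:
    """Compound annual growth rate over `years` years."""
    return (values[-1] / values[0]) ** (1.0 / years) - 1.0


def period_returns(values: Sequence[float]) -> List[float]:
    """Simple returns between consecutive observations."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def volatility(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized sample standard deviation of per-period returns."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized mean excess return per unit of total volatility."""
    excess = [r - risk_free for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))


def sortino_ratio(returns: Sequence[float], target: float = 0.0,
                  periods_per_year: int = 252) -> float:
    """Like Sharpe, but penalizes only returns below `target`.

    Raises ZeroDivisionError if no return falls below the target.
    """
    downside = [min(0.0, r - target) for r in returns]
    downside_dev = math.sqrt(sum(d * d for d in downside) / len(returns))
    return ((statistics.mean(returns) - target) / downside_dev
            * math.sqrt(periods_per_year))


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def calmar_ratio(values: Sequence[float], years: float) -> float:
    """CAGR divided by maximum drawdown (undefined if there is no drawdown)."""
    return cagr(values, years) / max_drawdown(values)


# Made-up portfolio values observed over two years.
values = [100.0, 110.0, 99.0, 121.0]
print(round(total_return(values), 4))   # 0.21
print(round(max_drawdown(values), 4))   # 0.1
```

For the made-up series, the portfolio gains 21% overall, compounds at roughly 10% per year, and suffers a 10% drawdown when it falls from 110 to 99, so its Calmar ratio is approximately 1.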
Quantifying papers by publication and year of publication
| Publisher | Count | Year | Count |
|---|---|---|---|
| IEEE | 9 | 2018 | 6 |
| arXiv | 8 | 2019 | 10 |
| SSRN | 5 | 2020 | 19 |
| Elsevier | 3 | | |
| ACM | 2 | | |
| MDPI | 2 | | |
| Springer | 2 | | |
| IOP Publishing | 1 | | |
| Wiley | 1 | | |
| IJCAI | 1 | | |
| Institutional Investor Journals | 1 | | |
Quantifying the architectures and fields considered by publications surveyed
Summary of publications
| Specialization | References | Architecture | Market(s) | Dataset source | Reproducible |
|---|---|---|---|---|---|
| Trade strategy | Wang et al. ( | DRL, LSTM | China, US | wind, wrds | No |
| | Li et al. ( | DRL | US | kaggle | No |
| | Théate and Ernst ( | DRL, FFNN | Asia, US, Europe | unspecified | Yes |
| | Zhang et al. ( | DRL | US | pinnacle | No |
| | Chakole and Kurhekar ( | DRL, FFNN | US, India | yahoo | No |
| | Wu et al. ( | DRL, LSTM | China | tushare | No |
| | Hu et al. ( | Autoencoder, CNN | UK | unspecified | No |
| | Lei et al. ( | CNN | China | tushare | No |
| | Chen et al. ( | CNN | Taiwan | apex | No |
| | Wu et al. ( | LSTM | Taiwan | tfe | No |
| | Koshiyama et al. ( | Autoencoder, LSTM | Global | bloomberg | Yes |
| | Sun et al. ( | LSTM | US | ibkr | No |
| | Silva et al. ( | LSTM | Unspecified | unspecified | No |
| | Wang et al. ( | LSTM | China | joinquant | No |
| | Chalvatzis and Hristu-Varsakelis ( | LSTM | US | unspecified | No |
| Price prediction | Wang et al. ( | Conv-LSTM, RNN | China, US | ibkr | No |
| | Zhang et al. ( | CNN, LSTM | UK, Nordic | lse, etsin | No |
| | Zhao et al. ( | Autoencoder, CNN, LSTM | US | unspecified | No |
| | Zhang et al. ( | Autoencoder, CNN, LSTM | China | unspecified | No |
| | Fang et al. ( | LSTM | China | private | No |
| | Baek and Kim ( | LSTM | US | yahoo | Yes |
| | Wang et al. ( | CNN | US | unspecified | No |
| | Zhang et al. ( | Autoencoder | China | unspecified | No |
| Portfolio management | Liang et al. ( | DRL | China | investing, wind | Yes |
| | Park et al. ( | DRL | Korea, US | investing, yahoo | No |
| | Guo et al. ( | DRL, CNN | China | unspecified | Yes |
| | Wang and Wang ( | FFNN | US | bloomberg | No |
| Market simulation | Maeda et al. ( | DRL, LSTM, CNN | Simulated | none | No |
| | Buehler et al. ( | Autoencoder | US | unspecified | Yes |
| | Raman and Leidner ( | DRL | US | trkd | No |
| Stock selection | Zhang et al. ( | FFNN | China | unspecified | No |
| | Amel-Zadeh et al. ( | RNN, FFNN | US | wrds | No |
| | Yang et al. ( | CNN, LSTM | China | unspecified | No |
| Risk management | Arimond et al. ( | CNN, FFNN, LSTM, RNN | EU, UK, US | refinitive | No |
| Hedging strategy | Ruf and Wang ( | FFNN | EU, US | optionm, datashop | Yes |
investing investing.com, wrds wrds-www.wharton.upenn.edu, bloomberg bloomberg.com, yahoo finance.yahoo.com, kaggle kaggle.com, ibkr interactivebrokers.com, tfe taifex.com.tw, tushare pypi.org/project/tushare, optionm optionmetrics.com, refinitive refinitiv.com, datashop datashop.deutsche-boerse.com, trkd trkd.thomsonreuters.com, wind wind.com.cn, etsin etsin.fairdata.fi, lse londonstockexchange.com, pinnacle pinnacledata2.com, apex apex.com.tw, joinquant joinquant.com
Quantifying evaluation measures used in different specializations
| Evaluation | TS | PP | MS | SS | PM | RM | HS |
|---|---|---|---|---|---|---|---|
| Returns | 13 | 8 | 2 | 2 | 4 | 1 | 0 |
| MDD | 8 | 2 | 1 | 2 | 0 | 0 | 0 |
| Sharpe ratio | 7 | 3 | 1 | 1 | 3 | 0 | 0 |
| Sortino ratio | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Calmar ratio | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Accuracy | 3 | 1 | 0 | 2 | 0 | 0 | 0 |
| Volatility | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Recall | 2 | 1 | 1 | 1 | 0 | 0 | 0 |
| Precision | 2 | 2 | 1 | 1 | 0 | 0 | 0 |
| F-score | 2 | 1 | 1 | 1 | 0 | 0 | 0 |
| VaR threshold | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| MAE | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| MAPE | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| MSE | 1 | 3 | 0 | 0 | 0 | 0 | 1 |
TS trade strategy, PP price prediction, MS market simulation, SS stock selection, PM portfolio management, RM risk management, HS hedging strategy
Highlights of and problems with publications reviewed
| Ref. | Highlights/pros | Problems/cons |
|---|---|---|
| Wang et al. ( | Clear implementation of DRL and LSTM with adequate historical data and extensive evaluation metrics | Examples of interpretability should be provided for more than one timeframe |
| Li et al. ( | DRL hybrid with Adaboost ensemble that provides good performance | Discussion included ML evaluations that were not presented |
| Théate and Ernst ( | Extensive evaluation criteria with adequate consideration for trading cost | Details of backtesting not provided |
| Zhang et al. ( | Includes tests across a large number of financial instruments and evaluation measures | Unclear how or why cross-validation was combined with the backtesting approach employed to control overfitting |
| Chakole and Kurhekar ( | Focus on market trends using extensive financial and ML evaluation metrics and incorporating transaction costs | Interpretability insights needed to provide context for the good performance |
| Wu et al. ( | Extensive evaluation metrics and well-defined backtesting strategy | Lacking conversation regarding interpretability |
| Hu et al. ( | Uses chart representations of financial data as DL input, producing a good performance | Numerical representation of the same data is missing, precluding a balanced comparison. Furthermore, there is no discussion of model interpretability |
| Lei et al. ( | ResNet used to improve the effectiveness of moving average indicators in terms of financial and ML evaluation metrics | Minimal explanation for the model’s performance |
| Chen et al. ( | Uses a significant amount of high-frequency trading data as input image in a pair trading setup using CNN | Minimal evaluation of returns and no evaluation results presented. Furthermore, there are no comparisons with the raw numerical data |
| Wu et al. ( | Uses a high-frequency trading technique to predict profitability on daily options trading using LSTM. Provides extensive details regarding the backtesting approach | No baseline comparison provided |
| Koshiyama et al. ( | Uses LSTM encoder-decoder to transfer trends across 58 different global markets with impressive results across multiple financial and ML evaluation metrics | No interpretation of the model’s operation or details of the featured transfer |
| Sun et al. ( | Predicts futures market movement using LSTM across multiple criteria, including simulated live trading. Additionally, multiple well backtested models are generated with different parameters and time windows | Lack of clarity regarding why the presented model’s accuracy is worse than chance. A baseline comparison with financial metrics could provide that clarity |
| Silva et al. ( | Clear presentation of the strategy, and evaluation across multiple financial and ML criteria | No insights or explanations regarding the output of the LSTM model employed |
| Wang et al. ( | Combines LSTM with market indicators in a novel manner with promising results | The presented evaluation is unclear, and no insights are offered regarding the model’s performance |
| Chalvatzis and Hristu-Varsakelis ( | Nine DL ML models are combined with LSTM in an ensemble; a well-formalized trading strategy, training, and testing conducted using a practical rolling windows approach and a complete set of evaluation criteria | Although the general discussion regarding evaluation is extensive, it does not provide insights into the model’s performance in relation to the input features |
| Wang et al. ( | Convolutional LSTM enables price prediction with improved performance while controlling for overfitting | Lack of discussion regarding explainability |
| Zhang et al. ( | Combines LSTM and CNN to capture spatial structure in LOB and features sufficient backtesting | Given the approach to backtesting, there is no indication of whether multiple models have been created or the same model is updated |
| Zhao et al. ( | Uses market charts as input for a CAE that serves as LSTM input | Approach to backtesting unclear due to the unusual data split across training, validation, and test sets. Furthermore, lacks sufficient baseline comparisons and offers no discussion regarding model explainability |
| Zhang et al. ( | Combines LSTM with Autoencoder and CNN for improved predictive results across financial and ML metrics | Insufficient discussion regarding model explainability |
| Fang et al. ( | Regression model is combined with LSTM for better predictive performance | Concludes that the results are not stable for backtested data |
| Baek and Kim ( | Uses LSTM for data augmentation, specifically targeting controlling overfitting. Provides extensive results across ML, financial, and statistical criteria and discusses model performance | No justification for why the work only considered price data, rendering the provided model’s explanation less complete |
| Wang et al. ( | Uses one-dimensional CNN for price prediction, demonstrating better generalization than SVM and FFNN | Argues against a buy-and-hold baseline, but evaluation results supporting that argument are needed as evidence. More discussion regarding model explainability is also needed |
| Zhang et al. ( | Provides good evaluation results for the use of an Autoencoder for feature reduction in an ensemble learning setup | Lack of clarity regarding the backtesting strategy and the data splits. Furthermore, no discussion provided regarding model explainability |
| Liang et al. ( | Early attempt at using DRL in the financial market featuring sufficient backtesting and evaluation of results | No discussion regarding model explainability. Furthermore, the paper concludes that the results are unfavorable |
| Park et al. ( | Uses Q Learning to derive trading strategies in a simulated feature space to gain experience beyond the available data. Impressive performance in comparison to the baseline | No discussion concerning insights into the model’s decisions |
| Guo et al. ( | Ensemble of portfolio management using the existing state-of-the-art strategy with DRL to provide a vast improvement on returns | Lack of discussion regarding model explainability, a necessity for insights into the vastly improved performance |
| Wang and Wang ( | Uses ResNet to address overfitting problems when presented with noisy financial data. Sufficient backtesting results provided across statistical and financial metrics | No insights into model performance provided to help understand the data features contributing to the performance |
| Maeda et al. ( | Uses DRL and LSTM to simulate market data, enabling the creation of theoretical market conditions with impressive results for returns compared to the baseline | Lack of discussion regarding model explainability |
| Buehler et al. ( | Provides a good overview of the theories involved in generative financial data modeling | Although there is an argument that no value is derived from using more data, it is worth investigating including a comparison with more kinds of real data based on multiple market scenarios |
| Raman and Leidner ( | Simulates up to a year of market data using only 6 weeks of real market data; simulated data is used with DRL for test trading decisions with sufficient baseline comparisons | No financial evaluation metrics for the simulated trades. Furthermore, performance implications of a longer time frame for input and simulated data would be useful |
| Zhang et al. ( | Combines LSTM with boosting ensembles to identify key market features and reduce overfitting | No comparisons with traditional feature reduction methods such as PCA and no discussion of model explainability |
| Amel-Zadeh et al. ( | Using only fundamental data, compares RNN and FFNN models with non-DL algorithms; the non-DL methods outperform the DL models | No information on the completeness of the input data, namely, lagged dates, and no insights into the model’s performance |
| Yang et al. ( | Compares CNN with LSTM, with features derived from profit indicators | No comparisons with other baseline strategies and no discussion of model explainability |
| Arimond et al. ( | Specifically targeted at using CNN and LSTM to estimate VaR with a focus on future research potentials | Fails to formally present the baseline evaluation results or provide an explanation or suggestions regarding potential model performance |
| Ruf and Wang ( | Uses FFNN to predict a derived metric that it uses in a hedging strategy with promising results | No consideration of the state of model or discussion of insights from the model output |
| | | Predicted positive | Predicted negative | Total |
|---|---|---|---|---|
| Actual | Positive | | | |
| | Negative | | | |
| | Total | | | |
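
The layout above is that of a binary confusion matrix whose cell values were lost in extraction. As a hedged illustration of how such a matrix is filled and used, the sketch below counts the four outcomes for made-up labels and derives accuracy, recall, precision, and F-score from them. The function names and the encoding (1 for positive, 0 for negative) are assumptions for the example.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = Counter(zip(actual, predicted))  # keys are (actual, predicted)
    return {
        "TP": pairs[(1, 1)],
        "FN": pairs[(1, 0)],
        "FP": pairs[(0, 1)],
        "TN": pairs[(0, 0)],
    }


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive standard metrics from the counts.

    Assumes at least one actual positive and one positive prediction,
    otherwise the divisions below are undefined.
    """
    total = c["TP"] + c["FN"] + c["FP"] + c["TN"]
    recall = c["TP"] / (c["TP"] + c["FN"])
    precision = c["TP"] / (c["TP"] + c["FP"])
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "recall": recall,
        "precision": precision,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Made-up labels, e.g. 1 = "price went up", 0 = "price did not go up".
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
metrics = classification_metrics(counts)
print(counts)   # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
print(metrics)
```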

| Abbreviation | Definition |
|---|---|
| AI | Artificial intelligence |
| ANN | Artificial neural networks |
| API | Application Programming Interface |
| CAE | Convolutional autoencoder |
| CAGR | Compound annual growth rate |
| CBOE | Chicago Board Options Exchange |
| CNN | Convolutional neural network |
| DBN | Deep belief networks |
| DL | Deep learning |
| DQN | Deep Q-Network |
| DRL | Deep reinforcement learning |
| EAIML | Ethical AI & machine learning |
| FFNN | Feed-forward neural networks |
| FIX | Financial information exchange |
| GRU | Gated recurrent unit |
| GTP | GPRS tunnelling protocol |
| HMM | Hidden Markov model |
| IID | Independent and identically distributed |
| KLD | Kullback-Leibler divergence |
| LOB | Limit order book |
| LSTM | Long short-term memory |
| MACD | Moving average convergence/divergence |
| MAE | Mean absolute error |
| MAPE | Mean absolute percentage error |
| MDD | Maximum drawdown |
| MDP | Markov decision process |
| ML | Machine learning |
| MSE | Mean square error |
| NLP | Natural language processing |
| OHLC | Open-high-low-close |
| PCA | Principal Component Analysis |
| PReLU | Parametric ReLU |
| RBM | Restricted Boltzmann machine |
| RL | Reinforcement learning |
| RNN | Recurrent neural network |
| ReLU | Rectified linear unit |
| ResNet | Residual network |
| RoR | Rate of returns |
| SVM | Support vector machines |
| TDQN | Trading deep Q-network |
| VIX | Volatility index |
| VWAP | Volume-weighted average price |
| VaR | Value-at-risk |
| WRDS | Wharton Research Data Services |
| XAI | Explainable AI |