Abstract
Despite active research on trading systems based on reinforcement learning, both the methods and their performance still need improvement. This study proposes a new action-specialized expert ensemble method consisting of expert models, each designed specifically for one reinforcement learning action: buy, hold, or sell. The models are constructed by examining and defining different reward values that correlate with each action under specific conditions, so that each expert model reflects a distinct investment behavior. To verify the performance of this technique, profits of the proposed system are compared with those of single trading systems and common ensemble systems. To verify robustness and account for the extension of the discrete action space, we compared and analyzed how profits change relative to the three-action results when the action space is extended. Furthermore, we checked sensitivity to three different reward functions: profit, Sharpe ratio, and Sortino ratio. All experiments were conducted with S&P500, Hang Seng Index, and Eurostoxx50 data. The model was 39.1% and 21.6% more efficient than the single and common ensemble models, respectively. When the 3-action space was extended to 11- and 21-action spaces, the cumulative returns increased by 427.2% and 856.7%, respectively. Results on reward functions indicated that our models are well trained; the Sharpe and Sortino ratio rewards performed better than the profit-only reward, as in the single-model cases, and the Sortino ratio was slightly better than the Sharpe ratio.
Year: 2020 PMID: 32716945 PMCID: PMC7384672 DOI: 10.1371/journal.pone.0236178
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of trading system studies using reinforcement learning.
| Authors (year) | State | Action | Reward | Data description (Training:Test) | Method |
|---|---|---|---|---|---|
| Saud Almahdi, Steve Y. Yang (2017) | 104 (weekly 2 years) | 3 (-1, 0, 1) | SR STR CR | 5 Funds Jan. 2011–Dec. 2015 (6:4) | RRL |
| Yang Wang et al. (2017) | 200 (daily delta price) | 3 (-1, 0, 1) | Long-term return (100 days) | 2 Index Jan. 2001–Dec. 2015 (4:11) Online-learning | DQN |
| Gyeeun Jeong, Ha Young Kim (2019) | 200 (daily close price) | 3 (-1, 0, 1) + 1 (# of shares) | Long-term return (200 days) | 4 Index Jan. 1987–Dec. 2017 Jan. 2008–Dec. 2017 Apr. 1991–Dec. 2017 Jul. 1997–Dec. 2017 (approx. 4:1:5) Online-learning | DQN + Extra networks |
| John Moody, Matthew Saffell (2001) | 84 (monthly price) | 3 (-1, 0, 1) | Profit DSR DDR | S&P500, T-Bill Jan. 1950–Dec. 1994 (4:5) | RRL |
| Huang, Chien-Yi (2018) | 198 (Time 3, Market 12x16, Position 3) | 3 (-1, 0, 1) | Log return | 12 Currency Jan. 2012–Dec. 2017 (tick) Online-learning | DRQN |
| Yue Deng et al. (2017) | 150 (minute price, 50x3) | 3 (-1, 0, 1) | SR TP | IF future, silver, sugar Jan. 2014–Sep. 2015 (1:5) Jan. 2014–Jan. 2015 (2:5) Online-learning | FDRNN + DRL |
| Parag C. Pendharkar et al. (2018) | 4 (yearly asset statement) | 5 (0:10, 2.5:7.5, 5:5, 7.5:2.5, 10:0) | Profit DSR DDR | S&P500, T-note, AGG 1976–2016 (26:15) | SARSA Q-learning |
| Authors of the present study | 200 (daily close price) | 3 (-1, 0, 1); 11 (±5~±1, 0); 21 (±10~±1, 0) | Long-term return (100 days); Sharpe ratio; Sortino ratio | 3 Index Jan. 1987–Dec. 2017 (21:10) Online-learning | DQN + Expert Ensemble |
SR: Sharpe ratio, STR: Sterling ratio, CR: Calmar ratio, DSR: differential Sharpe ratio, DDR: downside deviation ratio, TP: total profit, RRL: recurrent reinforcement learning, DQN: deep Q-network, DRQN: deep recurrent Q-network, FDRNN: fuzzy deep recurrent neural network, DRL: deep reinforcement learning, SARSA: on-policy reinforcement learning algorithm
Fig 1. Interaction of agent and environment in reinforcement learning.
Predetermined positive constants of the expert model by profit interval.
| Buy expert: range of profit | Buy expert: predetermined positive constant | Hold expert: range of profit | Hold expert: predetermined positive constant | Sell expert: range of profit | Sell expert: predetermined positive constant |
|---|---|---|---|---|---|
| -∞ to 0.3% | 1 | -∞ to -0.3% | 1 | -∞ to -5% | 10 |
| 0.3% to 1% | 3 | -0.3% to 0.3% | 7 | -5% to -3% | 7 |
| 1% to 2% | 5 | 0.3% to ∞ | 1 | -3% to -2% | 6 |
| 2% to 3% | 6 | | | -2% to -1% | 5 |
| 3% to 5% | 7 | | | -1% to -0.3% | 3 |
| 5% to ∞ | 10 | | | -0.3% to ∞ | 1 |
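The interval table above can be read as a step function from profit to a constant, one per expert. The sketch below is only an illustration of that lookup: the function name `expert_constant`, the dictionary layout, and the choice to treat each interval as closed on its lower bound are assumptions, since the table does not say how boundary values are handled or how the constant enters the reward.

```python
from bisect import bisect_right

# Interval edges (in percent) and constants copied from the table above.
# ASSUMPTION: a profit exactly on an edge falls into the higher interval.
EXPERT_TABLE = {
    "buy": ([0.3, 1.0, 2.0, 3.0, 5.0], [1, 3, 5, 6, 7, 10]),
    "hold": ([-0.3, 0.3], [1, 7, 1]),
    "sell": ([-5.0, -3.0, -2.0, -1.0, -0.3], [10, 7, 6, 5, 3, 1]),
}


def expert_constant(expert: str, profit_pct: float) -> int:
    """Return the predetermined positive constant for a profit given in percent."""
    edges, constants = EXPERT_TABLE[expert]
    # bisect_right counts how many edges are <= profit_pct,
    # which is the index of the interval the profit falls into.
    return constants[bisect_right(edges, profit_pct)]


if __name__ == "__main__":
    for expert, profit in [("buy", 0.5), ("hold", 0.0), ("sell", -4.0)]:
        print(expert, profit, expert_constant(expert, profit))
```

Read this way, the buy expert weights large gains most heavily, the sell expert weights large losses most heavily, and the hold expert weights near-zero moves.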
Fig 2. Process of common ensemble model and our proposed model.
Description of data sets’ periods.
| Index | Training period | Training # of data | Test period | Test # of data |
|---|---|---|---|---|
| S&P500 | Jan 2, 1987–Dec 29, 2006 | 5040 | Jan 3, 2007–Dec 29, 2017 | 2767 |
| Hang Seng Index (HSI) | Jan 2, 1987–Dec 29, 2006 | 4935 | Jan 2, 2007–Dec 29, 2017 | 2710 |
| Eurostoxx50 | Jan 1, 1987–Dec 29, 2006 | 5151 | Jan 3, 2007–Dec 29, 2017 | 2745 |
Fig 3. Movements of the three indices in the training and test periods (Buy and Hold strategy).
Fig 4. Profit distribution during the training period and data balance for buy, hold, and sell.
Descriptive statistics of each index data set.
| Statistic | S&P500 Train | S&P500 Test | S&P500 Total | HSI Train | HSI Test | HSI Total | STOXX50 Train | STOXX50 Test | STOXX50 Total |
|---|---|---|---|---|---|---|---|---|---|
| Observations | 5039 | 2767 | 7806 | 4934 | 2710 | 7644 | 5150 | 2745 | 7895 |
| Mean | 0.0004 | 0.0003 | 0.0004 | 0.0006 | 0.0003 | 0.0005 | 0.0004 | 0.0001 | 0.0003 |
| Standard deviation | 0.0107 | 0.0126 | 0.0114 | 0.0168 | 0.0159 | 0.0165 | 0.0122 | 0.0150 | 0.0132 |
| Minimum | -0.2047 | -0.0904 | -0.2047 | -0.3333 | -0.1270 | -0.3333 | -0.0793 | -0.0862 | -0.0862 |
| 25th percentile | -0.0046 | -0.0040 | -0.0044 | -0.0065 | -0.0068 | -0.0066 | -0.0051 | -0.0069 | -0.0058 |
| Median | 0.0005 | 0.0006 | 0.0006 | 0.0007 | 0.0005 | 0.0006 | 0.0007 | 0.0001 | 0.0005 |
| 75th percentile | 0.0056 | 0.0055 | 0.0056 | 0.0083 | 0.0079 | 0.0082 | 0.0063 | 0.0072 | 0.0066 |
| Maximum | 0.0910 | 0.1158 | 0.1158 | 0.1882 | 0.1435 | 0.1882 | 0.0733 | 0.1100 | 0.1100 |
| Skewness | -1.4796 | -0.1033 | -0.8337 | -1.9444 | 0.2875 | -1.2299 | -0.1703 | 0.1163 | -0.0354 |
| Kurtosis | 31.7070 | 11.1981 | 21.7192 | 45.6856 | 9.4156 | 34.5147 | 5.3473 | 5.8628 | 5.9805 |
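The training counts in the table above are one less than the number of training data points listed for each index, which is what one would expect if the statistics describe daily returns computed from consecutive prices. A minimal sketch of such a computation follows; it assumes simple (not log) returns, a sample standard deviation, and moment-based skewness and non-excess kurtosis, none of which the source confirms. The function names are hypothetical.

```python
import math


def simple_returns(prices):
    """Daily simple returns; n prices yield n - 1 returns (ASSUMPTION: not log returns)."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def describe(returns):
    """Count, mean, spread, extremes and shape of a return series (needs >= 2 distinct values)."""
    n = len(returns)
    mean = sum(returns) / n
    # Central moments with a 1/n divisor, used for skewness and kurtosis.
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return {
        "n": n,
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),  # sample standard deviation
        "min": min(returns),
        "max": max(returns),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # ASSUMPTION: non-excess (normal distribution = 3)
    }


if __name__ == "__main__":
    toy_prices = [100.0, 110.0, 99.0, 108.9]  # made-up prices, not index data
    print(describe(simple_returns(toy_prices)))
```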
Fig 5. Q-Q plots for each index.
Fig 6. Model training and the ensemble process in the training and test phases.
Fig 7. The entire experimental process.
Comparison of top five models’ average profits of single models with different reward functions.
| Index | # of action | Reward function | Window 0 | Window 15 | Window 20 | Window 25 | Window 30 | Window 35 | Window 50 | Cross average |
|---|---|---|---|---|---|---|---|---|---|---|
| S&P500 | 3 | Profit | 3.516 ±0.131 | - | - | - | - | - | - | - |
| S&P500 | 3 | Sharpe | - | 3.133 ±0.050 | 3.192 ±0.098 | 3.159 ±0.035 | 3.178 ±0.146 | 3.203 ±0.094 | 3.242 ±0.047 | 3.184 ±0.034 |
| S&P500 | 3 | Sortino | - | 3.133 ±0.103 | 3.212 ±0.127 | 3.156 ±0.144 | 3.153 ±0.063 | 3.258 ±0.110 | 3.247 ±0.068 | 3.193 ±0.048 |
| S&P500 | 11 | Profit | 8.958 ±0.584 | - | - | - | - | - | - | - |
| S&P500 | 11 | Sharpe | - | 10.369 ±0.252 | 10.654 ±0.787 | 10.402 ±0.337 | 11.907 ±0.420 | 10.328 ±0.674 | 10.274 ±0.241 | 10.656 ±0.572 |
| S&P500 | 11 | Sortino | - | 10.272 ±0.588 | 10.455 ±0.402 | 10.647 ±0.634 | 10.351 ±0.057 | 10.849 ±0.560 | 10.516 ±0.337 | 10.515 ±0.191 |
| S&P500 | 21 | Profit | 20.71 ±0.424 | - | - | - | - | - | - | - |
| S&P500 | 21 | Sharpe | - | 20.13 ±0.341 | 19.992 ±0.362 | 20.006 ±0.974 | 20.001 ±0.621 | 19.476 ±0.252 | 20.583 ±0.385 | 20.031 ±0.323 |
| S&P500 | 21 | Sortino | - | 20.056 ±0.472 | 20.119 ±0.386 | 20.098 ±0.937 | 20.223 ±0.614 | 19.898 ±0.231 | 22.584 ±0.758 | 20.496 ±0.939 |
| HSI | 3 | Profit | 3.22 ±0.032 | - | - | - | - | - | - | - |
| HSI | 3 | Sharpe | - | 3.576 ±0.032 | 3.486 ±0.096 | 3.426 ±0.057 | 3.443 ±0.116 | 3.537 ±0.064 | 3.488 ±0.092 | 3.493 ±0.051 |
| HSI | 3 | Sortino | - | 3.281 ±0.129 | 3.574 ±0.116 | 3.503 ±0.062 | 3.525 ±0.157 | 3.667 ±0.069 | 3.554 ±0.088 | 3.517 ±0.118 |
| HSI | 11 | Profit | 10.952 ±0.167 | - | - | - | - | - | - | - |
| HSI | 11 | Sharpe | - | 11.968 ±0.508 | 12.117 ±0.437 | 12.175 ±0.305 | 11.828 ±0.459 | 11.411 ±0.427 | 11.598 ±0.617 | 11.85 ±0.273 |
| HSI | 11 | Sortino | - | 11.446 ±0.330 | 11.75 ±0.194 | 12.156 ±0.363 | 11.319 ±0.251 | 11.801 ±0.271 | 11.846 ±0.544 | 11.72 ±0.274 |
| HSI | 21 | Profit | 21.436 ±1.165 | - | - | - | - | - | - | - |
| HSI | 21 | Sharpe | - | 23.514 ±0.270 | 22.393 ±0.460 | 22.269 ±0.224 | 22.476 ±1.008 | 22.924 ±0.511 | 23.129 ±0.631 | 22.784 ±0.445 |
| HSI | 21 | Sortino | - | 23.313 ±0.654 | 22.31 ±0.362 | 23.267 ±0.300 | 23.29 ±0.363 | 22.532 ±0.867 | 23.456 ±0.322 | 23.028 ±0.438 |
| Eurostoxx50 | 3 | Profit | 3.379 ±0.093 | - | - | - | - | - | - | - |
| Eurostoxx50 | 3 | Sharpe | - | 3.456 ±0.096 | 3.415 ±0.118 | 3.425 ±0.174 | 3.553 ±0.202 | 3.319 ±0.028 | 3.284 ±0.025 | 3.409 ±0.088 |
| Eurostoxx50 | 3 | Sortino | - | 3.556 ±0.205 | 3.265 ±0.112 | 3.389 ±0.017 | 3.314 ±0.053 | 3.327 ±0.105 | 3.415 ±0.125 | 3.378 ±0.094 |
| Eurostoxx50 | 11 | Profit | 9.068 ±0.127 | - | - | - | - | - | - | - |
| Eurostoxx50 | 11 | Sharpe | - | 10.883 ±0.408 | 11.942 ±0.685 | 10.947 ±0.179 | 10.906 ±0.088 | 11.233 ±0.851 | 11.645 ±0.687 | 11.259 ±0.404 |
| Eurostoxx50 | 11 | Sortino | - | 11.323 ±0.370 | 10.591 ±0.110 | 10.704 ±0.364 | 12.412 ±0.167 | 11.219 ±0.554 | 11.497 ±0.559 | 11.291 ±0.597 |
| Eurostoxx50 | 21 | Profit | 22.99 ±0.740 | - | - | - | - | - | - | - |
| Eurostoxx50 | 21 | Sharpe | - | 20.649 ±0.405 | 23.793 ±1.162 | 21.978 ±0.301 | 22.241 ±1.184 | 20.651 ±0.272 | 21.049 ±0.536 | 21.727 ±1.109 |
| Eurostoxx50 | 21 | Sortino | - | 21.258 ±1.055 | 22.624 ±0.694 | 21.778 ±0.424 | 23.255 ±0.843 | 22.325 ±0.850 | 21.098 ±0.455 | 22.056 ±0.760 |
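The Sharpe and Sortino rewards in the table above are evaluated over a window of recent returns (15 to 50 steps). The sketch below shows the textbook form of both ratios over such a window. It is an illustration under stated assumptions, not the authors' reward code: the risk-free rate and the Sortino target are taken as zero, no annualization is applied, and the function names are hypothetical.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    std = math.sqrt(var)
    return mean / std if std > 0 else 0.0


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation below the target."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    # Only returns below the target contribute to the risk term.
    downside = math.sqrt(sum(min(x, 0.0) ** 2 for x in excess) / len(excess))
    return mean / downside if downside > 0 else 0.0


def windowed_reward(returns, window, ratio):
    """Apply a ratio to the most recent `window` returns (ASSUMPTION: trailing window)."""
    return ratio(returns[-window:])


if __name__ == "__main__":
    toy_returns = [0.01, -0.02, 0.03, -0.01]  # made-up returns
    print(windowed_reward(toy_returns, 4, sharpe_ratio))
    print(windowed_reward(toy_returns, 4, sortino_ratio))
```

Because the Sortino ratio penalizes only downside moves, it scores a series with large gains and small losses higher than the Sharpe ratio does, which is the usual motivation for comparing the two as reward functions.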
Comparison of top 2 action-specialized expert models and single models.
| Index | # of action | # of model | Single Profit | Profit Expert: Buy | Profit Expert: Hold | Profit Expert: Sell | Single Sortino | Sortino Expert: Buy | Sortino Expert: Hold | Sortino Expert: Sell |
|---|---|---|---|---|---|---|---|---|---|---|
| S&P500 | 3 | 1 | 3.774 | 3.657 | 3.709 | 3.089 | 3.242 | 3.563 | 3.521 | 3.305 |
| S&P500 | 3 | 2 | 3.492 | 3.473 | 3.252 | 2.913 | 3.217 | 3.481 | 3.396 | 3.269 |
| S&P500 | 11 | 1 | 9.899 | 11.376 | 11.056 | 9.297 | 10.43 | 11.395 | 9.907 | 9.612 |
| S&P500 | 11 | 2 | 9.066 | 11.049 | 10.624 | 9.189 | 10.393 | 11.154 | 9.848 | 9.569 |
| S&P500 | 21 | 1 | 21.482 | 22.266 | 19.377 | 22.296 | 21.237 | 19.314 | 19.491 | 22.047 |
| S&P500 | 21 | 2 | 20.85 | 21.488 | 18.59 | 21.502 | 20.43 | 18.939 | 19.056 | 20.322 |
| HSI | 3 | 1 | 3.269 | 3.625 | 3.433 | 3.71 | 3.555 | 3.767 | 3.552 | 3.498 |
| HSI | 3 | 2 | 3.232 | 3.436 | 3.281 | 3.483 | 3.504 | 3.426 | 3.378 | 3.204 |
| HSI | 11 | 1 | 11.196 | 10.421 | 10.775 | 10.771 | 11.686 | 10.17 | 10.901 | 11.972 |
| HSI | 11 | 2 | 11.108 | 10.102 | 10.401 | 10.624 | 11.494 | 10.1 | 10.804 | 11.521 |
| HSI | 21 | 1 | 22.947 | 19.18 | 21.034 | 20.011 | 23.892 | 24.41 | 23.438 | 22.718 |
| HSI | 21 | 2 | 22.764 | 18.943 | 20.921 | 19.813 | 23.427 | 23.84 | 23.099 | 22.066 |
| Eurostoxx50 | 3 | 1 | 3.479 | 3.467 | 3.549 | 3.209 | 3.413 | 3.573 | 3.296 | 3.302 |
| Eurostoxx50 | 3 | 2 | 3.473 | 3.363 | 3.411 | 3.19 | 3.323 | 3.384 | 3.255 | 3.174 |
| Eurostoxx50 | 11 | 1 | 9.291 | 9.837 | 9.921 | 9.945 | 12.628 | 9.883 | 10.857 | 10.807 |
| Eurostoxx50 | 11 | 2 | 9.09 | 9.575 | 9.796 | 9.783 | 12.593 | 9.462 | 10.561 | 10.801 |
| Eurostoxx50 | 21 | 1 | 24.395 | 24.544 | 21.074 | 20.831 | 22.729 | 23.951 | 21.558 | 22.225 |
| Eurostoxx50 | 21 | 2 | 23.038 | 23.48 | 20.919 | 20.332 | 22.229 | 22.287 | 20.646 | 21.477 |
Cumulative profits of top five models on S&P500.
| # of action | # of model | Profit | PE | EPE | Sortino | SE | ESE |
|---|---|---|---|---|---|---|---|
| 3 | 1 | 3.774 | 4.093 | 4.394 | 3.242 | 4.142 | 4.499 |
| 3 | 2 | 3.492 | 4.016 | 4.36 | 3.217 | 3.781 | 4.488 |
| 3 | 3 | 3.446 | 4.003 | 4.221 | 3.104 | 3.757 | 4.471 |
| 3 | 4 | 3.438 | 3.85 | 4.153 | 3.102 | 3.669 | 4.443 |
| 3 | 5 | 3.43 | 3.792 | 4.106 | 3.1 | 3.655 | 4.423 |
| 3 | Avg | 3.516 ±0.131 | 3.951 ±0.112 | 4.247 ±0.113 | 3.153 ±0.063 | 3.801 ±0.177 | 4.465 ±0.028 |
| 11 | 1 | 9.899 | 10.145 | 17.059 | 10.43 | 13.02 | 16.591 |
| 11 | 2 | 9.066 | 9.942 | 16.602 | 10.393 | 12.589 | 15.829 |
| 11 | 3 | 9.06 | 9.912 | 16.178 | 10.346 | 12.197 | 15.821 |
| 11 | 4 | 8.64 | 9.878 | 16.083 | 10.32 | 12.038 | 15.795 |
| 11 | 5 | 8.123 | 9.736 | 15.97 | 10.267 | 11.997 | 15.68 |
| 11 | Avg | 8.958 ±0.584 | 9.923 ±0.132 | 16.378 ±0.402 | 10.351 ±0.057 | 12.368 ±0.387 | 15.943 ±0.328 |
| 21 | 1 | 21.482 | 24.564 | 28.346 | 21.237 | 23.457 | 26.926 |
| 21 | 2 | 20.85 | 24.431 | 26.926 | 20.43 | 23.267 | 26.573 |
| 21 | 3 | 20.46 | 24.331 | 26.837 | 20.267 | 21.88 | 26.292 |
| 21 | 4 | 20.401 | 23.835 | 26.576 | 19.659 | 21.843 | 26.258 |
| 21 | 5 | 20.359 | 23.626 | 26.499 | 19.522 | 21.458 | 26.14 |
| 21 | Avg | 20.71 ±0.424 | 24.158 ±0.363 | 27.037 ±0.673 | 20.223 ±0.613 | 22.381 ±0.817 | 26.438 ±0.282 |
PE: profit ensemble (common ensemble), EPE: expert profit ensemble, SE: Sortino ensemble (common ensemble), ESE: expert Sortino ensemble
Cumulative profits of top five models on Eurostoxx50.
| # of action | # of model | Profit | PE | EPE | Sortino | SE | ESE |
|---|---|---|---|---|---|---|---|
| 3 | 1 | 3.479 | 3.768 | 4.348 | 3.413 | 3.776 | 4.124 |
| 3 | 2 | 3.473 | 3.747 | 4.278 | 3.323 | 3.657 | 4.112 |
| 3 | 3 | 3.398 | 3.717 | 4.265 | 3.283 | 3.577 | 4.103 |
| 3 | 4 | 3.298 | 3.635 | 4.256 | 3.281 | 3.571 | 4.099 |
| 3 | 5 | 3.248 | 3.582 | 4.194 | 3.271 | 3.566 | 4.006 |
| 3 | Avg | 3.379 ±0.093 | 3.69 ±0.070 | 4.268 ±0.049 | 3.314 ±0.052 | 3.629 ±0.081 | 4.089 ±0.042 |
| 11 | 1 | 9.291 | 10.025 | 16.419 | 12.628 | 15.001 | 16.686 |
| 11 | 2 | 9.09 | 9.866 | 15.746 | 12.593 | 14.262 | 16.669 |
| 11 | 3 | 9.064 | 9.681 | 15.628 | 12.319 | 14.203 | 15.736 |
| 11 | 4 | 8.97 | 9.613 | 15.566 | 12.306 | 14.126 | 15.673 |
| 11 | 5 | 8.925 | 9.581 | 15.207 | 12.213 | 13.871 | 15.399 |
| 11 | Avg | 9.068 ±0.127 | 9.753 ±0.168 | 15.713 ±0.396 | 12.412 ±0.167 | 14.293 ±0.378 | 16.033 ±0.539 |
| 21 | 1 | 24.395 | 27.238 | 34.863 | 22.729 | 27.526 | 30.05 |
| 21 | 2 | 23.038 | 24.646 | 32.15 | 22.229 | 26.785 | 29.418 |
| 21 | 3 | 22.629 | 24.57 | 31.965 | 22.209 | 26.781 | 29.129 |
| 21 | 4 | 22.575 | 24.391 | 31.959 | 21.747 | 26.766 | 29.051 |
| 21 | 5 | 22.311 | 24.304 | 31.69 | 21.669 | 26.743 | 27.763 |
| 21 | Avg | 22.99 ±0.740 | 25.03 ±1.110 | 32.525 ±1.178 | 22.117 ±0.383 | 26.92 ±0.303 | 29.082 ±0.747 |
Fig 8. Performance of DQN, common ensemble, and our proposed model with two reward functions on S&P500.
Fig 10. Performance of DQN, common ensemble, and our proposed model with two reward functions on Eurostoxx50.
Average profit and increasing rate with common and expert ensemble.
| Index | # of action | Row | Profit | PE | EPE | Sortino | SE | ESE |
|---|---|---|---|---|---|---|---|---|
| S&P500 | 3 | Avg | 3.516 ±0.131 | 3.951 ±0.112 | 4.247 ±0.113 | 3.153 ±0.063 | 3.801 ±0.177 | 4.465 ±0.028 |
| S&P500 | 3 | % | - | 12.4 | 20.8 | - | 20.5 | 41.6 |
| S&P500 | 11 | Avg | 8.958 ±0.584 | 9.923 ±0.132 | 16.378 ±0.402 | 10.351 ±0.057 | 12.368 ±0.387 | 15.943 ±0.328 |
| S&P500 | 11 | % | - | 10.8 | 82.8 | - | 19.5 | 54.0 |
| S&P500 | 21 | Avg | 20.71 ±0.424 | 24.158 ±0.363 | 27.037 ±0.673 | 20.223 ±0.613 | 22.381 ±0.817 | 26.438 ±0.282 |
| S&P500 | 21 | % | - | 16.6 | 30.5 | - | 10.7 | 30.7 |
| HSI | 3 | Avg | 3.22 ±0.032 | 3.72 ±0.065 | 4.59 ±0.089 | 3.432 ±0.082 | 3.875 ±0.079 | 4.323 ±0.102 |
| HSI | 3 | % | - | 15.5 | 42.6 | - | 12.9 | 25.9 |
| HSI | 11 | Avg | 10.952 ±0.167 | 13.902 ±0.110 | 16.566 ±0.808 | 11.319 ±0.251 | 14.212 ±0.222 | 17.172 ±0.357 |
| HSI | 11 | % | - | 26.9 | 51.3 | - | 25.6 | 51.7 |
| HSI | 21 | Avg | 21.436 ±1.164 | 23.388 ±0.952 | 24.989 ±1.501 | 23.29 ±0.362 | 25.675 ±0.442 | 30.399 ±0.767 |
| HSI | 21 | % | - | 9.1 | 16.6 | - | 10.2 | 30.5 |
| Eurostoxx50 | 3 | Avg | 3.379 ±0.093 | 3.69 ±0.070 | 4.268 ±0.049 | 3.314 ±0.052 | 3.629 ±0.081 | 4.089 ±0.042 |
| Eurostoxx50 | 3 | % | - | 9.2 | 26.3 | - | 9.5 | 23.4 |
| Eurostoxx50 | 11 | Avg | 9.068 ±0.127 | 9.753 ±0.168 | 15.713 ±0.396 | 12.412 ±0.167 | 14.293 ±0.378 | 16.033 ±0.539 |
| Eurostoxx50 | 11 | % | - | 7.6 | 73.3 | - | 15.2 | 29.2 |
| Eurostoxx50 | 21 | Avg | 22.99 ±0.740 | 25.03 ±1.110 | 32.525 ±1.178 | 22.117 ±0.383 | 26.92 ±0.303 | 29.082 ±0.747 |
| Eurostoxx50 | 21 | % | - | 8.9 | 41.5 | - | 21.7 | 31.5 |
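The "%" rows in the table above appear to be the relative gain of each ensemble's average profit over the matching single model (PE and EPE against Profit, SE and ESE against Sortino). A small check of that reading, using the 3-action S&P500 averages from the table, is sketched below; the function name `pct_increase` is hypothetical, and individual entries may differ in the last digit if the published figures were computed from unrounded averages.

```python
def pct_increase(single: float, ensemble: float) -> float:
    """Relative gain of the ensemble over the single model, in percent."""
    return (ensemble - single) / single * 100.0


if __name__ == "__main__":
    single_profit = 3.516  # S&P500, 3 actions, single Profit model (table above)
    print(round(pct_increase(single_profit, 3.951), 1))  # PE
    print(round(pct_increase(single_profit, 4.247), 1))  # EPE
```

With these inputs the two printed values are 12.4 and 20.8, matching the PE and EPE entries for that row.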
Increasing rate of profit by extended action on each index.
| Index | # of action | Row | Profit | PE | EPE | Sortino | SE | ESE |
|---|---|---|---|---|---|---|---|---|
| S&P500 | 3 | Avg | 3.516 ±0.131 | 3.951 ±0.112 | 4.247 ±0.113 | 3.153 ±0.063 | 3.801 ±0.177 | 4.465 ±0.028 |
| S&P500 | 11 | Avg | 8.958 ±0.584 | 9.923 ±0.132 | 16.378 ±0.402 | 10.351 ±0.057 | 12.368 ±0.387 | 15.943 ±0.328 |
| S&P500 | 11 | % | 316.3 | 302.4 | 473.6 | 434.3 | 405.9 | 431.3 |
| S&P500 | 21 | Avg | 20.71 ±0.424 | 24.158 ±0.363 | 27.037 ±0.673 | 20.223 ±0.613 | 22.381 ±0.817 | 26.438 ±0.282 |
| S&P500 | 21 | % | 783.4 | 784.8 | 801.9 | 892.8 | 763.4 | 734.2 |
| HSI | 3 | Avg | 3.22 ±0.032 | 3.72 ±0.065 | 4.59 ±0.089 | 3.432 ±0.082 | 3.875 ±0.079 | 4.323 ±0.102 |
| HSI | 11 | Avg | 10.952 ±0.167 | 13.902 ±0.110 | 16.566 ±0.808 | 11.319 ±0.251 | 14.212 ±0.222 | 17.172 ±0.357 |
| HSI | 11 | % | 448.3 | 474.4 | 433.6 | 424.2 | 459.5 | 486.7 |
| HSI | 21 | Avg | 21.436 ±1.164 | 23.388 ±0.952 | 24.989 ±1.501 | 23.29 ±0.362 | 25.675 ±0.442 | 30.399 ±0.767 |
| HSI | 21 | % | 920.5 | 823.2 | 668.1 | 916.3 | 858.2 | 884.8 |
| Eurostoxx50 | 3 | Avg | 3.379 ±0.093 | 3.69 ±0.070 | 4.268 ±0.049 | 3.314 ±0.052 | 3.629 ±0.081 | 4.089 ±0.042 |
| Eurostoxx50 | 11 | Avg | 9.068 ±0.127 | 9.753 ±0.168 | 15.713 ±0.396 | 12.412 ±0.167 | 14.293 ±0.378 | 16.033 ±0.539 |
| Eurostoxx50 | 11 | % | 339.1 | 325.4 | 450.2 | 493.1 | 505.5 | 486.7 |
| Eurostoxx50 | 21 | Avg | 22.99 ±0.740 | 25.03 ±1.110 | 32.525 ±1.178 | 22.117 ±0.383 | 26.92 ±0.303 | 29.082 ±0.747 |
| Eurostoxx50 | 21 | % | 924.2 | 893.4 | 964.7 | 912.5 | 985.8 | 909.2 |
Fig 11. Average performance of extended action and multi shares on three stock indices.
Fig 12. The detailed actions of the best expert ensemble models of each action on S&P500.
Fig 14. The detailed actions of the best expert ensemble models of each action on Eurostoxx50.
The time and space complexity comparisons for our proposed algorithm and previous methods.
| Method | Time complexity | Space complexity |
|---|---|---|
| DQN | ||
| DRQN | ||
| Common Ensemble | ||
| Proposed method |
Fig 15. Results of the trade-off between training period length and performance on S&P500.
Fig 16. Trade-off between training time costs and performance.
Fig 17. Comparison of our proposed model’s performance with other algorithms’ performance on each index in 3-action space.
Two-sample t-test of our proposed model against DQN, DRQN, and the common ensemble.
| # of action | 2-samples | S&P500 t | S&P500 p-value | Hang Seng Index t | Hang Seng Index p-value | Eurostoxx50 t | Eurostoxx50 p-value |
|---|---|---|---|---|---|---|---|
| 3 | ESE / SE | 11.42 | 7.8E–30 | 8.46 | 3.6E–17 | 8.68 | 5.3E–18 |
| 3 | ESE / DQN | 40.20 | 2.9E–301 | 18.44 | 2.6E–73 | 21.01 | 7.2E–94 |
| 3 | ESE / DRQN | 39.54 | 9.7E–296 | 20.57 | 3.2E–90 | 22.05 | 5.9E–103 |
| 11 | ESE / SE | 27.48 | 2.1E–154 | 17.25 | 9.4E–65 | 14.42 | 3.3E–46 |
| 11 | ESE / DQN | 41.00 | 1.2E–302 | 38.94 | 1.8E–282 | 25.53 | 7.1E–135 |
| 21 | ESE / SE | 10.92 | 2.0E–27 | 11.70 | 3.3E–31 | 5.65 | 1.7E–08 |
| 21 | ESE / DQN | 30.51 | 2.7E–185 | 16.44 | 4.9E–59 | 12.42 | 7.8E–35 |
ESE: expert Sortino ensemble, SE: Sortino ensemble (common ensemble)
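For readers who want to see how a two-sample t statistic of this kind is formed, a minimal sketch follows. The table does not state which variant of the test was used or what the two samples were, so the sketch assumes Welch's unequal-variance form and uses made-up numbers; `welch_t` is a hypothetical helper, and the p-value step (which needs the t distribution) is left out.

```python
import math


def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference of means over the combined standard error."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    # Unbiased sample variances of each group.
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)


if __name__ == "__main__":
    # Made-up samples standing in for two models' results.
    print(welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

A positive t means the first sample's mean is higher; larger samples shrink the standard error, which is how very small p-values such as those in the table can arise.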
Cumulative profits of top five models on HSI.
| # of action | # of model | Profit | PE | EPE | Sortino | SE | ESE |
|---|---|---|---|---|---|---|---|
| 3 | 1 | 3.269 | 3.833 | 4.68 | 3.555 | 4.02 | 4.516 |
| 3 | 2 | 3.232 | 3.74 | 4.662 | 3.504 | 3.888 | 4.319 |
| 3 | 3 | 3.216 | 3.699 | 4.644 | 3.383 | 3.855 | 4.302 |
| 3 | 4 | 3.215 | 3.684 | 4.501 | 3.376 | 3.813 | 4.246 |
| 3 | 5 | 3.17 | 3.643 | 4.465 | 3.345 | 3.8 | 4.231 |
| 3 | Avg | 3.22 ±0.032 | 3.72 ±0.065 | 4.59 ±0.089 | 3.432 ±0.082 | 3.875 ±0.079 | 4.323 ±0.102 |
| 11 | 1 | 11.196 | 14.05 | 18.102 | 11.686 | 14.462 | 17.875 |
| 11 | 2 | 11.108 | 13.985 | 16.62 | 11.494 | 14.45 | 17.106 |
| 11 | 3 | 10.847 | 13.92 | 16.22 | 11.32 | 14.158 | 16.997 |
| 11 | 4 | 10.833 | 13.783 | 16.028 | 11.08 | 14.117 | 16.963 |
| 11 | 5 | 10.778 | 13.771 | 15.862 | 11.014 | 13.872 | 16.921 |
| 11 | Avg | 10.952 ±0.167 | 13.902 ±0.110 | 16.566 ±0.808 | 11.319 ±0.251 | 14.212 ±0.222 | 17.172 ±0.357 |
| 21 | 1 | 22.947 | 24.277 | 26.835 | 23.892 | 26.355 | 31.558 |
| 21 | 2 | 22.764 | 24.265 | 26.342 | 23.427 | 25.929 | 31 |
| 21 | 3 | 20.655 | 23.934 | 25.141 | 23.226 | 25.582 | 30.158 |
| 21 | 4 | 20.431 | 22.249 | 23.752 | 23.099 | 25.465 | 29.759 |
| 21 | 5 | 20.382 | 22.215 | 22.873 | 22.805 | 25.046 | 29.521 |
| 21 | Avg | 21.436 ±1.164 | 23.388 ±0.952 | 24.989 ±1.501 | 23.29 ±0.362 | 25.675 ±0.442 | 30.399 ±0.767 |