Qun Zhang, Lijun Yang, Feng Zhou.
Abstract
The recent availability of enormous amounts of both data and computing power has created new opportunities for predictive modeling. This paper builds an analytical framework based on multiple sources of data, including daily trading data, online news, derivative technical indicators, and time-frequency features decomposed from closing prices. We also provide a real-life demonstration of how to combine and capitalize on all available information to predict the stock price of BGI Genomics. Moreover, we apply a long short-term memory (LSTM) network equipped with an attention mechanism to identify long-term temporal dependencies and adaptively highlight key features. We further examine the learning capabilities of the network for specific tasks, including forecasting the next day's price direction and closing price and developing trading strategies, and we compare its statistical accuracy and trading performance with those of methods based on logistic regression, support vector machine, gradient boosting decision trees, and the original LSTM model. The experimental results for BGI Genomics demonstrate that the attention-enhanced LSTM model remarkably improves prediction performance through multi-source heterogeneous information fusion, highlighting the significance of online news and time-frequency features, as well as exemplifying and validating our proposed framework.
Keywords: Attention mechanism; Heterogeneous information fusion; Long short-term memory network; Machine learning; Stock price prediction
Year: 2020 PMID: 33106709 PMCID: PMC7577284 DOI: 10.1016/j.ins.2020.10.023
Source DB: PubMed Journal: Inf Sci (N Y) ISSN: 0020-0255 Impact factor: 6.795
Fig. 1. Results of decomposing the closing price of BGI Genomics with the ITD method, from July 14, 2017 to July 21, 2020. First column: the top panel is the original signal (i.e., the closing price time series); the remaining panels are the PRCs and the residual. Second column: the top panel is the original signal, and the remaining five panels are the corresponding IAs of the PRCs. Last column: the top panel is the original signal, and the remaining five panels are the corresponding IFs of the PRCs.
Fig. 2. Graphical illustration of the structure of LSTM.
Fig. 3. Graphical illustration of the architecture of attention-enhanced LSTM.
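The attention step named in Fig. 3 can be illustrated with a minimal sketch. It assumes dot-product scoring against a single query vector, followed by a softmax over time steps; the scoring function actually used in the paper is not recoverable from this record, and the names `attention_pool`, `hidden`, and `query` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step's hidden state by its dot-product score with `query`.

    hidden_states: list of T vectors, e.g. the per-step outputs of an LSTM.
    query: vector of the same length (a learned parameter in a real model;
           fixed by hand here, which is an assumption of this sketch).
    Returns (weights, context); context is the weighted sum of hidden states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Toy input: three time steps with two-dimensional hidden states (made-up numbers).
hidden = [[0.1, 0.0], [0.5, 0.2], [0.9, 0.4]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden, query)
```

The weights sum to one, so the context vector is a convex combination of the hidden states; time steps with higher scores contribute more to the final representation.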
Fig. 4. Graphical illustration of the architecture of the proposed framework.
Summary statistics of features derived from trading data, news data, and time–frequency data.
| Source | Feature | Number | Mean | Std. | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|---|
| Trading data | | 674 | 100.0351 | 49.3612 | 47.0000 | 61.2025 | 74.5500 | 144.0000 | 250.0000 |
| | | 674 | 100.0259 | 49.3478 | 47.0000 | 61.2025 | 74.5500 | 144.0000 | 250.0000 |
| | | 674 | 102.5368 | 51.1283 | 49.4400 | 62.5150 | 76.2950 | 148.0825 | 261.9900 |
| | | 674 | 97.9735 | 48.0490 | 46.5200 | 60.4925 | 73.0150 | 139.0100 | 238.0000 |
| | | 674 | 100.1665 | 49.5963 | 48.1100 | 61.3075 | 74.5900 | 144.2725 | 257.0200 |
| | | 674 | 100.1845 | 49.6217 | 48.1100 | 61.3075 | 74.5900 | 144.2725 | 257.0200 |
| | | 674 | 0.0480 | 3.3926 | −10.0039 | −1.7664 | 0.1190 | 1.6739 | 10.0065 |
| | | 674 | | | | | | | |
| | | 674 | 100.2719 | 49.5554 | 48.3543 | 61.3935 | 74.5721 | 143.6681 | 250.2272 |
| | | 674 | | | | | | | |
| | | 674 | | | | | | | |
| News data | | 674 | 0.6466 | 0.1838 | 0.0598 | 0.5000 | 0.6320 | 0.8056 | 0.9569 |
| | | 674 | 0.2208 | 0.1420 | 0.0000 | 0.1345 | 0.2638 | 0.3349 | 0.4745 |
| | | 674 | 0.5002 | 0.1802 | 0.0626 | 0.3828 | 0.5000 | 0.6167 | 0.9430 |
| | | 674 | 0.2089 | 0.1287 | 0.0000 | 0.1239 | 0.2519 | 0.3071 | 0.4129 |
| Time–frequency data | | 674 | 0.9106 | 2.1894 | 0 | 0 | 0 | 0.7688 | 18.1000 |
| | | 674 | 2.3299 | 3.8932 | 0 | 0.2591 | 0.9933 | 2.7375 | 32.2280 |
| | | 674 | 3.9086 | 5.1597 | 0 | 0.5178 | 1.9992 | 4.8761 | 36.3820 |
| | | 674 | 8.6015 | 11.2630 | 0.0004 | 0.8611 | 3.9426 | 10.2983 | 48.1040 |
| | | 674 | 14.8793 | 20.6166 | 0.0002 | 1.2688 | 4.8367 | 17.2045 | 68.9720 |
| | | 674 | 1.0837 | 1.7107 | 0 | 0 | 0 | 1.5708 | 4.7124 |
| | | 674 | 2.7827 | 1.7889 | 0 | 1.5708 | 1.5708 | 4.7124 | 4.7124 |
| | | 674 | 3.0321 | 1.5856 | 0 | 1.5708 | 1.5708 | 4.7124 | 4.7124 |
| | | 674 | 3.0484 | 1.5692 | 1.5708 | 1.5708 | 1.5708 | 4.7124 | 4.7124 |
| | | 674 | 3.0717 | 1.5704 | 1.5708 | 1.5708 | 1.5708 | 4.7124 | 4.7124 |
| | | 674 | 0.2182 | 2.3614 | −18.1000 | 0 | 0 | 0 | 16.9500 |
| | | 674 | 0.4006 | 4.5202 | −12.5750 | −1.0150 | 0 | 0.9438 | 32.2280 |
| | | 674 | 0.8625 | 6.4169 | −33.1050 | −1.4625 | 0.0531 | 2.4146 | 36.3820 |
| | | 674 | −0.2290 | 14.1739 | −48.1040 | −4.3403 | 0.1215 | 3.4171 | 41.5710 |
| | | 674 | −9.4688 | 23.6003 | −68.9720 | −15.6863 | 0.0389 | 3.4262 | 29.2630 |
| | | 674 | 108.3830 | 52.0642 | 57.2340 | 63.5660 | 73.2605 | 164.9075 | 212.4000 |
Summary statistics of Alpha 101 technical indicators.
| Feature | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| alpha101_001 | 674 | 0.4903 | 0.2801 | 0.0761 | 0.2332 | 0.5726 | 0.8317 | 0.8317 |
| alpha101_002 | 674 | −0.1032 | 0.4681 | −0.9301 | −0.5027 | −0.1298 | 0.2407 | 0.9520 |
| alpha101_003 | 674 | −0.2031 | 0.4235 | −0.9159 | −0.5350 | −0.2568 | 0.0866 | 0.9113 |
| alpha101_004 | 674 | −5.0000 | 2.9679 | −9.0000 | −8.0000 | −5.0000 | −2.0000 | −1.0000 |
| alpha101_005 | 674 | −0.2383 | 0.2331 | −0.9986 | −0.3308 | −0.1449 | −0.0725 | −0.0014 |
| alpha101_006 | 674 | −0.1928 | 0.4217 | −0.9581 | −0.5280 | −0.2456 | 0.1047 | 0.8588 |
| alpha101_008 | 674 | −0.5055 | 0.2822 | −1.0000 | −0.7475 | −0.5077 | −0.2665 | −0.0014 |
| alpha101_009 | 674 | 0.0984 | 4.2992 | −22.0600 | −1.2250 | −0.0650 | 1.5175 | 25.7000 |
| alpha101_010 | 674 | 0.1703 | 4.2970 | −22.0600 | −1.2250 | 0 | 1.5175 | 25.7000 |
| alpha101_011 | 674 | 0.4842 | 0.3282 | 0.0021 | 0.2112 | 0.4405 | 0.6847 | 1.5865 |
| alpha101_012 | 674 | −0.6060 | 4.2574 | −25.7000 | −1.9875 | −0.3600 | 0.7800 | 22.7800 |
| alpha101_013 | 674 | −0.5019 | 0.2918 | −1.0000 | −0.7569 | −0.5111 | −0.2431 | −0.0014 |
| alpha101_014 | 674 | −0.0984 | 0.2518 | −0.8778 | −0.2466 | −0.0655 | 0.0270 | 0.8040 |
| alpha101_015 | 674 | −1.5024 | 0.5887 | −2.9433 | −1.9125 | −1.5041 | −1.0975 | −0.1438 |
| alpha101_016 | 674 | −0.5045 | 0.2892 | −1.0000 | −0.7555 | −0.5069 | −0.2611 | −0.0014 |
| alpha101_018 | 674 | −0.4929 | 0.2834 | −1.0000 | −0.7315 | −0.4965 | −0.2462 | −0.0014 |
| alpha101_020 | 674 | −0.1681 | 0.2042 | −0.9835 | −0.2261 | −0.0929 | −0.0256 | 0.0000 |
| alpha101_021 | 674 | 0.1246 | 0.9929 | −1.0000 | −1.0000 | 1.0000 | 1.0000 | 1.0000 |
| alpha101_022 | 674 | −0.0071 | 0.3986 | −1.5397 | −0.1496 | 0.0009 | 0.1414 | 1.3134 |
| alpha101_023 | 674 | −1.0250 | 4.7441 | −39.4100 | −0.7625 | 0 | 0 | 36.9500 |
| alpha101_024 | 674 | −5.2481 | 10.2483 | −38.9500 | −10.4775 | −4.3400 | 0 | 66.2100 |
| alpha101_025 | 674 | 0.5018 | 0.2833 | 0.0028 | 0.2610 | 0.5156 | 0.7447 | 1.0000 |
| alpha101_027 | 674 | 1.0000 | 0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| alpha101_029 | 674 | 3.3382 | 1.4471 | 1.0056 | 2.1426 | 3.2976 | 4.4937 | 5.9750 |
| alpha101_030 | 674 | 0.1286 | 0.0794 | 0.0104 | 0.0713 | 0.1217 | 0.1768 | 0.5018 |
| alpha101_033 | 674 | 0.5066 | 0.2873 | 0.0041 | 0.2555 | 0.5021 | 0.7569 | 1.0000 |
| alpha101_034 | 674 | 0.5103 | 0.2873 | 0.0014 | 0.2628 | 0.5186 | 0.7579 | 1.0000 |
| alpha101_038 | 674 | −0.2380 | 0.2174 | −0.9023 | −0.3456 | −0.1654 | −0.0689 | −0.0009 |
| alpha101_040 | 674 | −0.1874 | 0.2621 | −0.8739 | −0.3468 | −0.1395 | −0.0280 | 0.5930 |
| alpha101_041 | 674 | −0.0497 | 0.6592 | −3.4969 | −0.2506 | −0.0219 | 0.1704 | 4.0343 |
| alpha101_042 | 674 | 2.1348 | 3.8876 | 0.0014 | 0.5722 | 1.1578 | 1.9646 | 43.0000 |
| alpha101_044 | 674 | −0.4280 | 0.4949 | −0.9984 | −0.8268 | −0.6227 | −0.1033 | 0.9412 |
| alpha101_046 | 674 | 0.1301 | 1.3917 | −6.8900 | −1.0000 | 1.0000 | 1.0000 | 19.6000 |
| alpha101_047 | 674 | 0.6566 | 1.1763 | −0.9893 | −0.0473 | 0.3844 | 1.0322 | 11.4341 |
| alpha101_049 | 674 | 0.0788 | 3.1977 | −22.0600 | −0.4800 | 1.0000 | 1.0000 | 22.7800 |
| alpha101_050 | 674 | −0.7495 | 0.2111 | −0.9986 | −0.9376 | −0.7982 | −0.6030 | −0.0236 |
| alpha101_051 | 674 | 0.0943 | 3.1974 | −22.0600 | −0.4425 | 1.0000 | 1.0000 | 22.7800 |
| alpha101_053 | 674 | 275.9593 | 21514.6395 | −238698.0000 | −2.4702 | 0.0049 | 2.5857 | 238700.3418 |
| alpha101_054 | 674 | −0.4262 | 0.2606 | −1.0000 | −0.6431 | −0.4401 | −0.1988 | 0 |
| alpha101_055 | 674 | −0.2572 | 0.4937 | −0.9841 | −0.6738 | −0.3605 | 0.1138 | 0.9852 |
| alpha101_056 | 674 | −0.2405 | 0.2196 | −0.9285 | −0.3718 | −0.1661 | −0.0683 | −0.0005 |
| alpha101_057 | 674 | 0.4911 | 6.7141 | −62.9900 | −1.0648 | 0.3137 | 1.8087 | 40.5214 |
| alpha101_060 | 674 | −0.0014 | 0.0017 | −0.0052 | −0.0026 | −0.0016 | −0.0001 | 0.0024 |
| alpha101_062 | 674 | −0.4852 | 0.5002 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha101_064 | 674 | −0.4407 | 0.4968 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha101_065 | 674 | −0.4392 | 0.4967 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha101_068 | 674 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| alpha101_072 | 674 | 4.4331 | 25.2251 | 0.0057 | 0.4547 | 0.9736 | 1.8481 | 493.0000 |
| alpha101_073 | 674 | −9.1320 | 5.0477 | −17.0000 | −13.0000 | −9.0000 | −5.0000 | −1.0000 |
| alpha101_074 | 674 | −0.4970 | 0.5004 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha101_077 | 674 | 0.3226 | 0.2273 | 0.0014 | 0.1341 | 0.2848 | 0.4769 | 0.9862 |
| alpha101_081 | 674 | −0.5208 | 0.4999 | −1.0000 | −1.0000 | −1.0000 | 0 | 0 |
| alpha101_083 | 672 | 0.8673 | 11.0008 | −70.2675 | −1.6709 | 0.2267 | 3.5335 | 71.8665 |
| alpha101_086 | 674 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| alpha101_088 | 674 | 0.5076 | 0.2868 | 0.0041 | 0.2583 | 0.5179 | 0.7555 | 1.0000 |
| alpha101_092 | 674 | 4.1365 | 2.5270 | 1.0000 | 1.0000 | 4.0000 | 7.0000 | 7.0000 |
| alpha101_096 | 674 | −7.3242 | 1.5951 | −13.0000 | −8.0000 | −7.0000 | −7.0000 | −1.0000 |
| alpha101_098 | 674 | −0.0070 | 0.4435 | −0.9738 | −0.3419 | 0.0186 | 0.3378 | 0.9159 |
| alpha101_099 | 674 | −0.5000 | 0.5004 | −1.0000 | −1.0000 | −0.5000 | 0 | 0 |
| alpha101_101 | 674 | −0.0154 | 0.5445 | −0.9999 | −0.4810 | 0 | 0.4635 | 1.0000 |
Summary statistics of Alpha 191 technical indicators.
| Feature | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| alpha191_001 | 674 | −0.1461 | 0.5081 | −0.9890 | −0.5766 | −0.1890 | 0.2514 | 0.9464 |
| alpha191_003 | 674 | 0.6696 | 12.2021 | −68.5500 | −4.6275 | 0.0050 | 4.7550 | 60.5400 |
| alpha191_004 | 674 | −0.1543 | 0.9888 | −1.0000 | −1.0000 | −1.0000 | 1.0000 | 1.0000 |
| alpha191_006 | 674 | −0.4907 | 0.2500 | −0.7510 | −0.7510 | −0.2510 | −0.2510 | −0.2510 |
| alpha191_007 | 674 | 0.4842 | 0.3282 | 0.0021 | 0.2112 | 0.4405 | 0.6847 | 1.5865 |
| alpha191_008 | 674 | 0.5200 | 0.2798 | 0.0014 | 0.2833 | 0.5250 | 0.7583 | 1.0000 |
| alpha191_009 | 674 | 100.2593 | 49.3075 | 51.5595 | 61.2718 | 74.6724 | 146.0503 | 234.5470 |
| alpha191_012 | 674 | −0.2462 | 0.2343 | −0.9875 | −0.3390 | −0.1685 | −0.0681 | −0.0001 |
| alpha191_013 | 674 | −0.0497 | 0.6592 | −3.4969 | −0.2506 | −0.0219 | 0.1704 | 4.0343 |
| alpha191_014 | 674 | 0.0001 | 10.0993 | −62.7200 | −3.9275 | −0.1250 | 3.2725 | 46.5200 |
| alpha191_015 | 674 | −0.0008 | 0.0161 | −0.1000 | −0.0060 | 0 | 0.0055 | 0.1000 |
| alpha191_016 | 674 | −0.7495 | 0.2111 | −0.9986 | −0.9376 | −0.7982 | −0.6030 | −0.0236 |
| alpha191_018 | 674 | 1.0031 | 0.0797 | 0.7560 | 0.9574 | 0.9981 | 1.0416 | 1.3036 |
| alpha191_019 | 674 | −0.0001 | 0.0735 | −0.2440 | −0.0426 | −0.0019 | 0.0400 | 0.2329 |
| alpha191_020 | 674 | 0.3897 | 8.7536 | −25.8970 | −4.9171 | −0.5164 | 4.4594 | 37.8756 |
| alpha191_021 | 674 | −0.0690 | 2.4703 | −11.1087 | −1.0000 | −0.2720 | 1.1663 | 9.1747 |
| alpha191_022 | 674 | −98.9511 | 48.1224 | −211.2363 | −144 | −74.6083 | −60.4823 | −54.9735 |
| alpha191_023 | 674 | 51.8650 | 7.5712 | 30.1150 | 46.8357 | 52.2175 | 55.7542 | 79.5787 |
| alpha191_024 | 674 | 0.0544 | 6.2956 | −22.5832 | −2.7434 | −0.3185 | 2.2654 | 28.7783 |
| alpha191_027 | 674 | 3.9642 | 62.3864 | −153.0914 | −33.9152 | −6.9047 | 42.5038 | 192.7562 |
| alpha191_028 | 674 | 46.0056 | 35.7541 | −17.6066 | 12.8207 | 41.8047 | 77.5327 | 119.6743 |
| alpha191_029 | 674 | |||||||
| alpha191_031 | 674 | 0.1035 | 6.7481 | −18.8284 | −3.9601 | −0.5696 | 3.6819 | 23.4269 |
| alpha191_032 | 674 | −2.0000 | 0.5887 | −2.9433 | −2.0000 | −2.0000 | −1.0000 | −0.1438 |
| alpha191_034 | 674 | 1.0034 | 0.0670 | 0.8102 | 0.9645 | 1.0057 | 1.0412 | 1.2320 |
| alpha191_036 | 674 | 0.4968 | 0.2890 | 0.0014 | 0.2465 | 0.4951 | 0.7462 | 1.0000 |
| alpha191_037 | 674 | −216.8244 | 5068.2919 | −23652.8933 | −1701 | 35.9658 | 1789.3365 | 27750.8772 |
| alpha191_038 | 674 | −1.0000 | 4.7441 | −39.4100 | −0.7625 | 0 | 0 | 36.9500 |
| alpha191_041 | 674 | −0.4808 | 0.1678 | −1.0000 | −0.4162 | −0.4162 | −0.4162 | −0.4162 |
| alpha191_042 | 674 | −0.1874 | 0.2621 | −0.8739 | −0.3468 | −0.1395 | −0.0280 | 0.5930 |
| alpha191_043 | 674 | |||||||
| alpha191_047 | 674 | 21.1994 | 15.0507 | −18.2237 | 9.7817 | 22.4605 | 33.2687 | 52.3685 |
| alpha191_048 | 674 | −0.1243 | 0.0834 | −0.5074 | −0.1697 | −0.1145 | −0.0588 | −0.0061 |
| alpha191_049 | 674 | 0.4986 | 0.2187 | 0.0536 | 0.3254 | 0.5138 | 0.6783 | 0.9289 |
| alpha191_050 | 674 | 0.0028 | 0.4374 | −0.8578 | −0.3566 | −0.0277 | 0.3492 | 0.8929 |
| alpha191_051 | 674 | 0.5014 | 0.2187 | 0.0711 | 0.3217 | 0.4862 | 0.6746 | 0.9464 |
| alpha191_053 | 674 | 52.3739 | 13.4198 | 16.6667 | 41.6667 | 50.0000 | 58.3333 | 91.6667 |
| alpha191_054 | 674 | −0.4849 | 0.2803 | −1.0000 | −0.7224 | −0.4816 | −0.2436 | −0.0057 |
| alpha191_055 | 674 | −68.8866 | 455.1920 | −1718.0416 | −263.6958 | −69.9731 | 118.9330 | 2016.6866 |
| alpha191_057 | 674 | 46.1301 | 24.1279 | 5.5400 | 24.7380 | 42.9472 | 68.7471 | 92.6738 |
| alpha191_058 | 674 | 52.4332 | 10.4298 | 25.0000 | 45.0000 | 55.0000 | 60.0000 | 85.0000 |
| alpha191_059 | 674 | 2.7471 | 22.2861 | −59.3400 | −9.5325 | 1.2700 | 10.1375 | 111.3100 |
| alpha191_062 | 674 | −0.4280 | 0.4949 | −0.9984 | −0.8268 | −0.6227 | −0.1033 | 0.9412 |
| alpha191_063 | 674 | 49.2838 | 19.2573 | 6.2514 | 34.9882 | 47.4137 | 63.1272 | 92.2590 |
| alpha191_065 | 674 | 1.0017 | 0.0437 | 0.8517 | 0.9797 | 1.0011 | 1.0231 | 1.2011 |
| alpha191_066 | 674 | 0.0219 | 4.3617 | −16.7405 | −2.2564 | −0.1134 | 2.0674 | 17.4130 |
| alpha191_067 | 674 | 49.9596 | 11.3874 | 25.3152 | 41.5746 | 48.5643 | 58.0425 | 78.7689 |
| alpha191_068 | 674 | 0 | ||||||
| alpha191_069 | 674 | −0.3201 | 0.2244 | −0.8354 | −0.4851 | −0.3351 | −0.1517 | 0.2583 |
| alpha191_070 | 674 | |||||||
| alpha191_072 | 674 | 53.2241 | 11.3734 | 26.7890 | 44.6575 | 52.2874 | 62.0824 | 79.2427 |
| alpha191_076 | 674 | 0.4841 | 0.1535 | 0.2715 | 0.4182 | 0.4610 | 0.5140 | 1.6130 |
| alpha191_079 | 674 | 49.6429 | 14.5054 | 14.6352 | 39.0670 | 48.7588 | 59.1861 | 85.2442 |
| alpha191_080 | 674 | 19.5966 | 99.4696 | −83.3537 | −30.2964 | −3.2805 | 31.7205 | 1200.5022 |
| alpha191_081 | 674 | |||||||
| alpha191_082 | 674 | 53.0468 | 10.3792 | 24.2015 | 45.4165 | 52.3005 | 60.6852 | 76.0180 |
| alpha191_083 | 674 | −0.5045 | 0.2892 | −1.0000 | −0.7555 | −0.5069 | −0.2611 | −0.0014 |
| alpha191_084 | 674 | |||||||
| alpha191_086 | 674 | 0.1301 | 1.3917 | −6.8900 | −1.0000 | 1.0000 | 1.0000 | 19.6000 |
| alpha191_088 | 674 | 1.6152 | 17.4095 | −34.5442 | −9.4947 | −0.6739 | 11.9001 | 58.6935 |
| alpha191_089 | 674 | −0.0944 | 2.8494 | −10.7743 | −2.0000 | 0.0514 | 1.4320 | 10.8588 |
| alpha191_090 | 674 | −0.4981 | 0.2895 | −0.9986 | −0.7458 | −0.4986 | −0.2486 | −0.0014 |
| alpha191_093 | 674 | 18.9129 | 13.8422 | 2.7100 | 7.8750 | 15.2600 | 26.4000 | 67.8400 |
| alpha191_095 | 674 | |||||||
| alpha191_096 | 674 | 46.3085 | 27.2698 | 5.1796 | 23.2278 | 39.2274 | 67.0904 | 112.4911 |
| alpha191_097 | 674 | |||||||
| alpha191_098 | 674 | −5.2481 | 10.2483 | −38.9500 | −10.0000 | −4.3400 | 0 | 66.2100 |
| alpha191_099 | 674 | −0.5019 | 0.2918 | −1.0000 | −0.7569 | −0.5111 | −0.2431 | −0.0014 |
| alpha191_100 | 674 | |||||||
| alpha191_101 | 674 | −0.4926 | 0.5003 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha191_102 | 674 | 49.1942 | 10.5188 | 31.7694 | 41.5690 | 46.9827 | 54.4488 | 91.3050 |
| alpha191_103 | 674 | 44.2285 | 35.2438 | 0 | 10.0000 | 40.0000 | 80.0000 | 95.0000 |
| alpha191_104 | 674 | −0.0071 | 0.3986 | −1.5397 | −0.1496 | 0.0009 | 0.1414 | 1.3134 |
| alpha191_105 | 674 | −0.2031 | 0.4235 | −0.9159 | −0.5350 | −0.2568 | 0.0866 | 0.9113 |
| alpha191_106 | 674 | 0.2569 | 19.8327 | −62.4900 | −9.0125 | −0.4450 | 9.0675 | 95.0600 |
| alpha191_107 | 674 | −0.1681 | 0.2042 | −0.9835 | −0.2261 | −0.0929 | −0.0256 | 0.0000 |
| alpha191_109 | 674 | 0.9971 | 0.1312 | 0.6941 | 0.9052 | 0.9790 | 1.0682 | 1.6409 |
| alpha191_110 | 674 | 123.0288 | 63.2200 | 34.9001 | 79.7922 | 104.5444 | 148.5177 | 378.9331 |
| alpha191_111 | 674 | |||||||
| alpha191_112 | 674 | −2.0000 | 38.9669 | −80.7871 | −30.7496 | −5.8300 | 31.4155 | 81.8079 |
| alpha191_114 | 672 | 0.8673 | 11.0008 | −70.2675 | −2.0000 | 0.2267 | 3.5335 | 71.8665 |
| alpha191_116 | 674 | −0.1299 | 1.2192 | −4.2179 | −0.9192 | −0.2821 | 0.5463 | 3.6647 |
| alpha191_118 | 674 | 126.1728 | 49.3493 | 40.8141 | 88.3101 | 119.7050 | 155.0957 | 325.9664 |
| alpha191_120 | 674 | 2.1348 | 3.8876 | 0.0014 | 0.5722 | 1.1578 | 1.9646 | 43.0000 |
| alpha191_122 | 674 | 0.0000 | 0.0016 | −0.0036 | −0.0009 | 0.0000 | 0.0010 | 0.0053 |
| alpha191_123 | 674 | −0.5000 | 0.5004 | −1.0000 | −1.0000 | −0.5000 | 0 | 0 |
| alpha191_126 | 674 | 100.2256 | 49.5704 | 48.2833 | 61.3275 | 74.5467 | 143.1742 | 250.3000 |
| alpha191_128 | 674 | −20.3023 | 113.4759 | −748.4376 | −58.6368 | 15.9909 | 48.1163 | 94.4449 |
| alpha191_129 | 674 | −16.0000 | 13.7424 | −81.3600 | −20.0000 | −11.0000 | −6.2450 | −1.0000 |
| alpha191_132 | 674 | |||||||
| alpha191_133 | 674 | −4.4955 | 62.7920 | −95.0000 | −65.0000 | −20.0000 | 60.0000 | 95.0000 |
| alpha191_134 | 674 | |||||||
| alpha191_135 | 674 | 1.0579 | 0.2446 | 0.7656 | 0.9429 | 1.0030 | 1.1051 | 2.6757 |
| alpha191_136 | 674 | −0.0984 | 0.2518 | −0.8778 | −0.2466 | −0.0655 | 0.0270 | 0.8040 |
| alpha191_137 | 674 | 382.1074 | 1046.7995 | 1.2376 | 23.5833 | 78.6708 | 316.1064 | 12999.0600 |
| alpha191_139 | 674 | −0.1928 | 0.4217 | −0.9581 | −0.5280 | −0.2456 | 0.1047 | 0.8588 |
| alpha191_144 | 674 | 0 | 0 | 0.0000 | 0 | 0 | 0 | 0 |
| alpha191_148 | 674 | −0.4377 | 0.4965 | −1.0000 | −1.0000 | 0 | 0 | 0 |
| alpha191_150 | 674 | |||||||
| alpha191_151 | 674 | 1.2090 | 16.0095 | −27.5168 | −6.4558 | −0.7298 | 6.7053 | 64.2788 |
| alpha191_155 | 674 | |||||||
| alpha191_156 | 674 | −0.7181 | 0.1761 | −1.0000 | −0.8610 | −0.7318 | −0.6022 | −0.1650 |
| alpha191_157 | 674 | 3.5016 | 1.4254 | 1.0049 | 2.2632 | 3.4910 | 4.7184 | 5.9930 |
| alpha191_158 | 674 | 2.4363 | 7.9814 | −21.6317 | −1.0000 | 0.9246 | 4.1404 | 52.4837 |
| alpha191_160 | 674 | 3.0890 | 2.1346 | 0.6624 | 1.5520 | 2.2150 | 4.2154 | 9.5648 |
| alpha191_161 | 674 | 4.8465 | 3.5734 | 0.9925 | 2.1642 | 3.5758 | 6.7469 | 19.3108 |
| alpha191_162 | 674 | −1.0000 | 0 | −1.0000 | −1.0000 | −1.0000 | −1.0000 | −1.0000 |
| alpha191_163 | 674 | 0.5018 | 0.2833 | 0.0028 | 0.2610 | 0.5156 | 0.7447 | 1.0000 |
| alpha191_164 | 674 | 0.0030 | 0.0119 | 0 | 0.0000 | 0.0000 | 0.0008 | 0.1509 |
| alpha191_167 | 674 | 15.7092 | 15.0387 | 1.3500 | 5.9025 | 9.9400 | 19.2700 | 97.5100 |
| alpha191_168 | 674 | −1.0000 | 0.5404 | −7.2381 | −1.0000 | −0.8881 | −0.6906 | −0.3760 |
| alpha191_170 | 674 | −0.2523 | 0.3540 | −0.9930 | −0.4873 | −0.2550 | −0.0361 | 2.2602 |
| alpha191_172 | 674 | 38.2084 | 21.2174 | 5.0479 | 19.8982 | 34.0559 | 54.7267 | 88.8475 |
| alpha191_173 | 674 | 104.8184 | 51.7254 | 47.8337 | 65.5069 | 78.1889 | 144.6624 | 271.6095 |
| alpha191_174 | 674 | 3.4200 | 2.5809 | 0.7963 | 1.4970 | 2.4375 | 4.3806 | 11.6159 |
| alpha191_175 | 674 | 4.8385 | 3.7097 | 0.9050 | 2.1317 | 3.4008 | 6.7717 | 23.2633 |
| alpha191_176 | 674 | 0.2572 | 0.4937 | −0.9852 | −0.1138 | 0.3605 | 0.6738 | 0.9841 |
| alpha191_177 | 674 | 39.7329 | 34.7962 | 0 | 5.0000 | 30.0000 | 75.0000 | 95.0000 |
| alpha191_178 | 674 | |||||||
| alpha191_180 | 672 | −40.7500 | 60.0000 | |||||
| alpha191_182 | 674 | 0.3447 | 0.1257 | 0.0500 | 0.2500 | 0.3500 | 0.4000 | 0.7000 |
| alpha191_185 | 674 | 0.4974 | 0.2767 | 0.0028 | 0.2610 | 0.4986 | 0.7348 | 0.9821 |
| alpha191_187 | 674 | 51.3542 | 43.3030 | 11.1100 | 21.5250 | 34.2750 | 60.9200 | 231.9500 |
| alpha191_188 | 674 | −2.0000 | 36.4273 | −100.0000 | −27.7794 | −7.3642 | 17.2499 | 204.3086 |
| alpha191_189 | 674 | 3.4204 | 3.3909 | 0.3789 | 1.2220 | 2.2408 | 4.2592 | 23.1947 |
Fig. 5. Schematic of the LSTM-Attention prediction model.
Metrics of classification performance.
| Metric | Expression | Metric | Expression |
|---|---|---|---|
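Because the expressions in this table did not survive, the sketch below gives the textbook confusion-matrix definitions of accuracy, precision, recall, and F-measure. Only F-measure is named in the surviving captions; that the other three metrics are accuracy, precision, and recall is an assumption. The confusion counts in the example are hypothetical (the source reports none): they were chosen because they reproduce the four values listed for the first LR row of the results table (0.5072, 0.5111, 0.6571, 0.5750) when those columns are read as accuracy, recall, precision, and F-measure.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts, not taken from the paper: 69 test days, 45 of them "up".
m = classification_metrics(tp=23, fp=12, tn=12, fn=22)
rounded = {name: round(value, 4) for name, value in m.items()}
```

With these counts, `rounded` holds accuracy 0.5072, precision 0.6571, recall 0.5111, and F-measure 0.575.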
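Metrics of regression performance, where $y_t$ and $\hat{y}_t$ are the actual and predicted values at time $t$, respectively, and $N$ is the number of data sample points.
| Metric | Expression | Metric | Expression |
|---|---|---|---|
| MAE | $\frac{1}{N}\sum_{t=1}^{N}\lvert y_t-\hat{y}_t\rvert$ | MSE | $\frac{1}{N}\sum_{t=1}^{N}(y_t-\hat{y}_t)^2$ |
| RMSE | $\sqrt{\frac{1}{N}\sum_{t=1}^{N}(y_t-\hat{y}_t)^2}$ | MAPE | $\frac{1}{N}\sum_{t=1}^{N}\left\lvert\frac{y_t-\hat{y}_t}{y_t}\right\rvert$ |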
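The four regression metrics can be computed as in the following sketch, which uses the standard definitions of MAE, MSE, RMSE, and MAPE. MAPE is returned as a fraction rather than a percentage, which is an assumption that matches the magnitude of the MAPE values reported in the results (e.g., 0.0882). The function name and the sample prices are illustrative only.

```python
import math

def regression_metrics(actual, predicted):
    """MAE, MSE, RMSE, and MAPE for two equal-length sequences of prices."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE as a fraction; actual values must be non-zero (true for prices).
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Made-up closing prices and predictions, for illustration only.
actual = [100.0, 110.0, 120.0]
predicted = [98.0, 113.0, 120.0]
metrics = regression_metrics(actual, predicted)
```

RMSE is by construction the square root of MSE, which gives a quick consistency check on any reported pair of values.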
Fig. 6. Graphical illustration of the long/short trading strategy.
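A generic long/short rule driven by next-day direction predictions can be sketched as follows. The exact rules depicted in Fig. 6 are not recoverable from this record, so the position logic (long on a predicted rise, short otherwise) and the way the transaction cost is charged (a flat fraction each time the position changes, including the initial entry) are assumptions, as are the function names.

```python
def long_short_returns(predicted_up, daily_returns, cost=0.0):
    """Per-day strategy returns for a simple long/short rule.

    predicted_up[i]: True if a price rise is predicted for day i.
    daily_returns[i]: realized simple return of the stock on day i.
    cost: assumed flat fee deducted whenever the position changes.
    """
    position = 0  # 0 = flat, +1 = long, -1 = short
    strategy = []
    for up, r in zip(predicted_up, daily_returns):
        target = 1 if up else -1
        fee = cost if target != position else 0.0
        strategy.append(target * r - fee)
        position = target
    return strategy

def accumulate(returns):
    """Compound per-day returns into a wealth curve starting from 1.0."""
    wealth = 1.0
    curve = []
    for r in returns:
        wealth *= 1.0 + r
        curve.append(wealth)
    return curve

# Made-up predictions and returns, for illustration only.
preds = [True, True, False, True]
rets = [0.02, -0.01, -0.03, 0.01]
strat = long_short_returns(preds, rets)
strat_with_cost = long_short_returns(preds, rets, cost=0.001)
curve = accumulate(strat)
```

In this toy run the position changes three times (entry, switch to short, switch back to long), so the version with costs gives up three fees relative to the cost-free version.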
Metrics of trading strategy performance, defined in terms of the yearly return, the accumulated return until date t over a period, and the corresponding mean and standard deviation of the return.
| Metric | Expression | Metric | Expression |
|---|---|---|---|
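The metric names and expressions in this table were lost, so the sketch below only illustrates three quantities commonly used for this purpose: maximum drawdown, the ratio of accumulated return to maximum drawdown, and an annualized Sharpe-type ratio built from the mean and standard deviation of returns. That these correspond to the paper's metrics is an assumption; the zero risk-free rate, the 252-day annualization, and all function names are likewise assumptions.

```python
import math

def max_drawdown(curve):
    """Largest peak-to-trough decline of a wealth curve, as a fraction of the peak."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def return_over_drawdown(curve):
    """Accumulated return divided by maximum drawdown (infinite if no drawdown)."""
    total_return = curve[-1] / curve[0] - 1.0
    mdd = max_drawdown(curve)
    return float("inf") if mdd == 0 else total_return / mdd

def sharpe_ratio(returns, periods_per_year=252):
    """Mean over sample standard deviation of returns, annualized; risk-free rate assumed zero."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(var)
    return float("inf") if std == 0 else mean / std * math.sqrt(periods_per_year)

# Made-up wealth curves, for illustration only.
curve = [1.00, 1.10, 0.99, 1.20]
mdd = max_drawdown(curve)
ratio = return_over_drawdown(curve)
never_falls = return_over_drawdown([1.0, 1.1, 1.2])
```

A curve that never falls has zero drawdown and an infinite ratio, which is the pattern shown by the Ex post rows of the trading results (0 and Inf).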
Accuracy, recall, precision, and F-measure metrics for the testing dataset, derived using the LR, SVM, and GBDT models and different sources of input data. The three panels correspond to different feature window sizes.
| Model | Trading data | News data | Time–frequency | Alpha 101 | Alpha 191 | ||||
|---|---|---|---|---|---|---|---|---|---|
| LR | ✓ | 0.5072 | 0.5111 | 0.6571 | 0.5750 | ||||
| ✓ | ✓ | 0.5652 | 0.5556 | 0.7143 | 0.6250 | ||||
| ✓ | ✓ | 0.7105 | |||||||
| ✓ | ✓ | 0.5362 | 0.3778 | 0.8095 | 0.5152 | ||||
| ✓ | ✓ | 0.4638 | 0.2000 | 0.3273 | |||||
| SVM | ✓ | 0.4058 | 0.1111 | 0.1961 | |||||
| ✓ | ✓ | 0.4782 | 0.4222 | 0.6552 | 0.5135 | ||||
| ✓ | ✓ | 0.5217 | 0.4889 | 0.6875 | 0.5714 | ||||
| ✓ | ✓ | 0.6600 | |||||||
| ✓ | ✓ | 0.4782 | 0.3111 | 0.7368 | 0.4375 | ||||
| GBDT | ✓ | 0.4202 | 0.2667 | 0.6316 | 0.3750 | ||||
| ✓ | ✓ | 0.4638 | 0.4667 | 0.6176 | 0.5316 | ||||
| ✓ | ✓ | 0.5942 | 0.6809 | ||||||
| ✓ | ✓ | 0.5556 | 0.6494 | ||||||
| ✓ | ✓ | 0.5507 | 0.5553 | 0.7059 | 0.6076 | ||||
| LR | ✓ | 0.5882 | 0.5454 | 0.7500 | 0.6316 | ||||
| ✓ | ✓ | 0.5527 | 0.6389 | ||||||
| ✓ | ✓ | 0.5882 | 0.6667 | ||||||
| ✓ | ✓ | 0.6029 | 0.5000 | 0.8148 | 0.6197 | ||||
| ✓ | ✓ | 0.5147 | 0.4773 | 0.6774 | 0.5600 | ||||
| SVM | ✓ | 0.4264 | 0.1364 | 0.2353 | |||||
| ✓ | ✓ | 0.5588 | 0.4772 | 0.7500 | 0.5833 | ||||
| ✓ | ✓ | 0.6500 | |||||||
| ✓ | ✓ | 0.5294 | 0.7045 | 0.6200 | 0.6596 | ||||
| ✓ | ✓ | 0.3823 | 0.0909 | 0.6667 | 0.1600 | ||||
| ✓ | 0.5714 | 0.5538 | 0.6545 | 0.6000 | |||||
| GBDT | ✓ | ✓ | 0.4706 | 0.4773 | 0.6176 | 0.5385 | |||
| ✓ | ✓ | 0.5147 | 0.5909 | 0.6341 | 0.6118 | ||||
| ✓ | ✓ | 0.6406 | |||||||
| ✓ | ✓ | 0.4706 | 0.2500 | 0.3793 | |||||
| LR | ✓ | 0.5672 | 0.7500 | 0.5915 | |||||
| ✓ | ✓ | 0.5522 | 0.3953 | 0.8095 | 0.5313 | ||||
| ✓ | ✓ | 0.4179 | 0.2326 | 0.6250 | 0.3390 | ||||
| ✓ | ✓ | 0.4651 | |||||||
| ✓ | ✓ | 0.5075 | 0.3721 | 0.7273 | 0.4923 | ||||
| SVM | ✓ | 0.4029 | 0.0930 | 0.1667 | |||||
| ✓ | ✓ | 0.4030 | 0.0930 | 0.1667 | |||||
| ✓ | ✓ | 0.5224 | 0.5116 | 0.6667 | 0.5789 | ||||
| ✓ | ✓ | 0.4627 | 0.3488 | 0.6522 | 0.4545 | ||||
| ✓ | ✓ | 0.6667 | |||||||
| GBDT | ✓ | 0.3881 | 0.2558 | 0.5500 | 0.3492 | ||||
| ✓ | ✓ | 0.3731 | 0.2558 | 0.5238 | 0.3438 | ||||
| ✓ | ✓ | 0.3881 | 0.2093 | 0.5625 | 0.3051 | ||||
| ✓ | ✓ | 0.4030 | 0.4186 | 0.5455 | 0.4737 | ||||
| ✓ | ✓ | ||||||||
Accuracy, recall, precision, and F-measure metrics for the testing dataset, derived from the LR, SVM, and GBDT models using different combinations of features and PCA methods.
| Model | Trading data | Alpha 101 | Alpha 191 | Reduced dimension | ||||
|---|---|---|---|---|---|---|---|---|
| LR | ✓ | ✓ | PCA(2) | 0.5797 | 0.5556 | 0.7353 | 0.6329 | |
| ✓ | ✓ | PCA(3) | 0.5797 | 0.6818 | ||||
| ✓ | ✓ | PCA(4) | 0.5333 | 0.6400 | ||||
| ✓ | ✓ | PCA(5) | 0.5942 | 0.6000 | 0.7297 | 0.6585 | ||
| ✓ | ✓ | PCA(2) | ||||||
| ✓ | ✓ | PCA(3) | 0.5652 | 0.7143 | 0.6250 | |||
| ✓ | ✓ | PCA(4) | 0.5507 | 0.4889 | 0.7333 | 0.5867 | ||
| ✓ | ✓ | PCA(5) | ||||||
| SVM | ✓ | ✓ | PCA(2) | 0.4928 | 0.5333 | 0.6316 | 0.5783 | |
| ✓ | ✓ | PCA(3) | 0.4493 | 0.3111 | 0.6667 | 0.4242 | ||
| ✓ | ✓ | PCA(4) | 0.5362 | 0.6383 | ||||
| ✓ | ✓ | PCA(5) | 0.6222 | |||||
| ✓ | ✓ | PCA(2) | 0.3913 | 0.0889 | 0.1600 | |||
| ✓ | ✓ | PCA(3) | 0.4783 | 0.3556 | 0.6956 | 0.4706 | ||
| ✓ | ✓ | PCA(4) | 0.5778 | 0.7429 | 0.6500 | |||
| ✓ | ✓ | PCA(5) | 0.6809 | |||||
| GBDT | ✓ | ✓ | PCA(2) | 0.4638 | 0.3333 | 0.6818 | 0.4478 | |
| ✓ | ✓ | PCA(3) | 0.4058 | 0.3111 | 0.5833 | 0.4058 | ||
| ✓ | ✓ | PCA(4) | ||||||
| ✓ | ✓ | PCA(5) | 0.5797 | 0.6667 | 0.6818 | 0.6742 | ||
| ✓ | ✓ | PCA(2) | 0.4928 | 0.5556 | 0.6250 | 0.5882 | ||
| ✓ | ✓ | PCA(3) | 0.4928 | 0.5556 | 0.6250 | 0.5882 | ||
| ✓ | ✓ | PCA(4) | 0.4927 | 0.5556 | 0.6250 | 0.5882 | ||
| ✓ | ✓ | PCA(5) |
Fig. 7. Relative importance of features, based on how many times each feature is used to split data in the GBDT model.
Average estimates and standard deviations of the accuracy, recall, precision, and F-measure metrics from ten repetitions of experiments on the testing dataset, using the LR, SVM, GBDT, LSTM, and LSTM-Attention models to predict price direction.
| Model | ||||
|---|---|---|---|---|
| LR | ||||
| SVM | ||||
| GBDT | ||||
| LSTM | ||||
| LSTM-Attention |
MAE, MSE, RMSE, and MAPE for the differences between the real closing prices and the average estimates from ten repetitions of experiments on the testing dataset, using the SVM, GBDT, LSTM, and LSTM-Attention models.
| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| SVM | 10.6655 | 159.0070 | 12.6098 | 0.0882 |
| GBDT | 11.3953 | 200.9968 | 14.1773 | 0.0892 |
| LSTM | 7.2207 | 83.8763 | 9.1584 | 0.0578 |
| LSTM-Attention |
Fig. 8. Average estimates from ten repetitions of experiments for the SVM, GBDT, LSTM, and LSTM-Attention models, with the corresponding 95.44% confidence intervals (±2 standard deviations), compared with the real closing price during the testing period.
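The band shown in Fig. 8 can be computed per time step as in the sketch below. It assumes the interval is the mean plus or minus two standard deviations across repetitions, which is what a 95.44% interval corresponds to under normality. Using the sample (n−1) standard deviation is also an assumption, and the function name and input numbers are illustrative.

```python
import statistics

def mean_band(runs, k=2.0):
    """Per-time-step mean and mean ± k standard deviations across repeated runs.

    runs: list of R prediction series, each of length T.
    Returns a list of T tuples (mean, lower, upper).
    """
    band = []
    for values in zip(*runs):  # values = the R predictions for one time step
        mu = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        band.append((mu, mu - k * sd, mu + k * sd))
    return band

# Three made-up repetitions over two test days (the paper uses ten repetitions).
runs = [[100.0, 102.0], [104.0, 102.0], [102.0, 102.0]]
band = mean_band(runs)
```

When all repetitions agree at a time step, the band collapses to a single point there, as on the second day in this example.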
Performance metrics, including Trading Days, of the long/short strategy using the LR, SVM, GBDT, LSTM, and LSTM-Attention models, together with those of the buy-and-hold strategy, denoted Benchmark, and of the ex post trading strategy, denoted Ex post, for the testing dataset.
| Model | | | | | | Trading Days |
|---|---|---|---|---|---|---|
| Ex post | 12.62 | 328.49 | 0 | Inf | 18 | 68 |
| Benchmark | 4.05 | 12.24 | 0.18 | 68.00 | 1 | 68 |
| LR | 5.52 | 6.66 | | 222.00 | 14 | 68 |
| SVM | 5.69 | 16.69 | 0.14 | 119.21 | 4 | 68 |
| GBDT | 4.81 | 2.33 | 0.06 | 38.83 | 14 | 68 |
| LSTM | 6.42 | 18.78 | 0.06 | | 10 | 68 |
| LSTM-Attention | | | 0.10 | 269.60 | 6 | 68 |
| Ex post | 11.68 | 217.66 | 0 | Inf | 18 | 68 |
| Benchmark | 4.01 | 11.93 | 0.18 | 66.28 | 1 | 68 |
| LR | 4.70 | 4.47 | | 89.40 | 14 | 68 |
| SVM | 5.13 | 8.94 | 0.09 | 99.33 | 7 | 68 |
| GBDT | 3.65 | 1.41 | 0.07 | 20.14 | 14 | 68 |
| LSTM | | 11.72 | 0.06 | | 12 | 68 |
| LSTM-Attention | 5.88 | | 0.11 | 145.91 | 5 | 68 |
Fig. 9. Time evolution of the trading performance and the buy and sell signals using the LR, SVM, GBDT, LSTM, and LSTM-Attention prediction models. Top panels of each part: time evolution of the trading performance in the testing period based on the predictions of individual models, for the strategy without transaction costs (red line), the strategy with transaction costs (blue line), and, for comparison, the buy-and-hold strategy (benchmark strategy, black line). Middle panels: buy and sell signals of the prediction models' trading strategies without transaction costs. Bottom panels: buy and sell signals of the prediction models' trading strategies with transaction costs.
1: Set the PRCs
2:
3: Decompose the subsequence
4: Compute the residual of the subsequence
5: Obtain the time–frequency features at time
6:

1: Compute the outputs
2: Calculate the output
3: Compute the output
4: Similar to step 3, obtain the output
5: Calculate the loss