Arsalan Dezhkam, Mohammad Taghi Manzuri, Ahmad Aghapour, Afshin Karimi, Ali Rabiee, Shervin Manzuri Shalmani.
Abstract
Financial time series have been studied extensively over the past decades; however, the advent of machine learning and deep neural networks has opened new horizons for applying supercomputing techniques to extract more insight from the underlying patterns of price data. This paper presents a tri-state labeling approach that classifies the underlying patterns in price data into up, down, and no-action classes. The introduction of a no-action state in our novel approach alleviates the burden of denoising the dataset as a preprocessing task. The performance of our labeling algorithm is evaluated with machine learning and deep learning models. The framework is augmented with the Bayesian optimization technique for selecting the best tuning values of the hyperparameters. The price trend prediction module generates the required trading signals. The results show that the average annualized Sharpe ratio, used as the trading performance metric, is about 2.823, indicating that the framework produces excellent cumulative returns.
Keywords: Classification; Deep learning; Feature engineering; Financial time series; Machine learning; Trend prediction
Year: 2022 PMID: 36196451 PMCID: PMC9521884 DOI: 10.1007/s11227-022-04834-4
Source DB: PubMed Journal: J Supercomput ISSN: 0920-8542 Impact factor: 2.557
Fig. 1 a Purging overlap in the training set; b embargo of post-test train observations [28]
Fig. 2 RNN architecture
Fig. 3 LSTM architecture
Fig. 4 GRU cell. The forget and input gates of the LSTM are integrated into the update gate in the GRU model
Fig. 5 Confusion matrix. Source: [28]
Fig. 6 Trend prediction and trading framework pipeline
Fig. 7 A snapshot of the trading system using predicted labels. Buy and sell positions are taken upon receiving the appropriate signal. The first ‘1’ indicates a buy position and the next ‘− 1’ the sell position; ‘0’ labels indicate a volatile market
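A minimal sketch (not the authors' trading engine) of how such a tri-state label stream can drive a long-only position, assuming the convention above: enter on ‘1’, exit on ‘− 1’, and hold through ‘0’ (no-action) labels.

```python
# Hedged sketch: convert tri-state labels (+1 buy, -1 sell, 0 no action)
# into a long-only position series. A position opened on +1 is held until
# the next -1; 0 labels leave the current state unchanged.
def labels_to_positions(labels):
    positions = []
    holding = False
    for lab in labels:
        if lab == 1 and not holding:
            holding = True      # buy signal: open position
        elif lab == -1 and holding:
            holding = False     # sell signal: close position
        positions.append(1 if holding else 0)
    return positions
```

For example, `labels_to_positions([0, 1, 0, 0, -1, 0])` yields `[0, 1, 1, 1, 0, 0]`: the position opened at the ‘1’ persists through the ‘0’ labels until the ‘− 1’ closes it.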
Descriptive statistics for the datasets; the statistics are computed on the log-return series
| Stock | Min | Max | Mean | Std | Skewness | Kurtosis | Jarque_Bera test | Beta | Observations |
|---|---|---|---|---|---|---|---|---|---|
| CLX | − 0.1777 | 0.1246 | 0.0005 | 0.015 | − 0.3623 | 11.1595 | (40,634.0, 0.0) | 0.17 | 7809 |
| WMT | − 0.1074 | 0.1107 | 0.0004 | 0.016 | 0.1196 | 5.1631 | (8677.52, 0.0) | 0.52 | 7808 |
| M | − 0.2244 | 0.1921 | 0.0002 | 0.0274 | − 0.0893 | 7.4364 | (17,299.2, 0.0) | 2.09 | 7515 |
| AAPL | − 0.7312 | 0.2869 | 0.0008 | 0.028 | − 2.2186 | 65.8689 | (1,416,089.97, 0.0) | 1.2 | 7808 |
| STX | − 0.2807 | 0.2458 | 0.0006 | 0.0292 | − 0.6957 | 11.4471 | (26,417.66, 0.0) | 1.06 | 4779 |
| AMD | − 0.4769 | 0.4206 | 0.0005 | 0.039 | − 0.3329 | 10.4935 | (35,916.69, 0.0) | 1.93 | 7808 |
The values in parentheses in the Jarque–Bera column are the chi-square statistic and the p value, respectively
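The table's statistics are computed on log returns, and the Jarque–Bera statistic combines the sample skewness and kurtosis. A self-contained sketch of both computations (standard formulas, not the authors' code):

```python
import math

def log_returns(prices):
    # r_t = ln(P_t / P_{t-1}), the series the table's statistics describe
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def jarque_bera(returns):
    # JB = n/6 * (S^2 + (K - 3)^2 / 4), S = skewness, K = kurtosis;
    # large JB (small p value) rejects normality, as for every stock above
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
```

The uniformly near-zero p values in the table reflect the heavy tails (excess kurtosis) typical of daily equity returns.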
Fig. 8 Automatic labeling of the AMD and CLX time series using the proposed algorithm. a, b are the price time series for AMD and CLX, respectively, while c, d show their continuous trend labeling. The vertical axis in (a, b) is in US dollars. In the labeling diagrams, up-trends are shown by 1 and down-trends by − 1, while 0 stands for no deterministic trend due to high volatility between the up and down regimes. For both time series, the threshold value is set to 0.05
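One plausible reading of such threshold-based tri-state labeling (a hedged sketch; the paper's exact algorithm may differ) is to compare the forward return over a lookahead window to ±threshold:

```python
# Hedged sketch of threshold-based tri-state labeling: a bar gets +1 / -1
# when the forward return over `horizon` bars exceeds +thresh / falls
# below -thresh, and 0 (no action) when the move stays inside the band.
def tri_state_labels(prices, thresh=0.05, horizon=1):
    labels = []
    for i in range(len(prices) - horizon):
        fwd = prices[i + horizon] / prices[i] - 1.0
        if fwd > thresh:
            labels.append(1)
        elif fwd < -thresh:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

With `thresh=0.05` as in Fig. 8, `tri_state_labels([100, 110, 104, 104.5])` gives `[1, -1, 0]`: the +10% move is an up-trend, the −5.5% move a down-trend, and the final +0.5% drift falls in the no-action band.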
Fig. 9 Combinatorial embargoed purged K-fold CV. The blue and red bars indicate the training and validation sets, respectively. The target labels are extracted from the price data using the proposed framework: green bars are + 1, red bars are − 1, and fluctuating periods are marked with yellow bars. The price data used as the input feature vector have been cross-validated with 8 training and 2 validation folds. Source: research simulations
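A minimal sketch of a single purged and embargoed train/validation split in the spirit of [28] (the combinatorial variant enumerates validation-fold combinations; the `purge` and `embargo` widths here are illustrative):

```python
# Hedged sketch: build one purged + embargoed split for time-series CV.
# `purge` bars immediately before the validation fold and `embargo` bars
# immediately after it are dropped from training to limit label leakage
# from overlapping observation windows.
def purged_split(n, val_start, val_end, purge=5, embargo=5):
    val_idx = list(range(val_start, val_end))
    train_idx = [i for i in range(n)
                 if i < val_start - purge or i >= val_end + embargo]
    return train_idx, val_idx
```

Repeating this for every choice of validation folds (8 training, 2 validation as in Fig. 9) yields the combinatorial scheme.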
Selected hyperparameters for optimization
| Name | Description | Range |
|---|---|---|
| units | Specifies the number of units in each dense layer | 24, 32, 48, 64 |
| Dropout | Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs | 0.2, 0.25, 0.3, 0.35, 0.4 |
| learning rate (l.r.) | How quickly the network updates its parameters | [0.001, 0.1] |
| objective | Learning objective function | multi:softprob, multi:softmax |
| n_estimators | The number of trees in our ensemble. Equivalent to the number of boosting rounds. The value must be an integer greater than 0. Default is 100 | 50–1000 |
| max_depth | The maximum depth per tree. A deeper tree might increase the performance, but also the complexity and chances to overfit. The value must be an integer greater than 0. Default is 6 | 3–18 |
| sampling_method | Used only by gpu_hist tree method | 'uniform,' 'gradient_based' |
| colsample_bytree | The fraction of columns to be randomly sampled for each tree. It might help reduce overfitting. The value must be between 0 and 1. Default is 1 | 0.1–0.99 |
| learning_rate | The step size at each iteration, while the model optimizes toward its objective | 0.01–0.2 |
| reg_alpha | L1 regularization on the weights (lasso regression). When working with a large number of features, it might improve training speed. The value must be non-negative. Default is 0 | 0.00001–0.01 |
| reg_lambda | L2 regularization on the weights (ridge regression). It might help reduce overfitting. The value must be non-negative. Default is 1 | 0.00001–0.01 |
| gamma | A pseudo-regularization parameter (Lagrangian multiplier) whose effect depends on the other parameters; the higher gamma is, the stronger the regularization. The value must be non-negative. Default is 0 | 0.1–0.99 |
| kernel | Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will be used | poly, rbf, sigmoid |
| C | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty | [0.01, 1] |
| gamma | Kernel coefficient for ‘rbf,’ ‘poly,’ and ‘sigmoid’ | 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| degree | Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels | 2, 3, 4 |
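As a simpler stand-in for the paper's Bayesian optimization (which models the objective surface to pick promising points, rather than sampling blindly), the SVM search space above can be explored with random search; `evaluate` is a hypothetical scoring callback standing in for a cross-validated model fit:

```python
import random

# NOT the paper's Bayesian optimization -- a plain random-search sketch
# over the SVM hyperparameter ranges listed in the table above.
# `evaluate(params) -> score` is a hypothetical callback (e.g. CV accuracy).
def random_search(evaluate, n_trials=25, seed=0):
    random.seed(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {
            "kernel": random.choice(["poly", "rbf", "sigmoid"]),
            "C": random.uniform(0.01, 1.0),          # range [0.01, 1]
            "gamma": random.choice(range(1, 10)),    # 1..9
            "degree": random.choice([2, 3, 4]),      # used by 'poly' only
        }
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Bayesian optimization typically reaches a comparable best score in far fewer trials, which is why the paper adopts it for the 24-trial budgets shown in the tables.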
Hyperparameter optimization of recurrent neural networks for AMD stock
| LSTM | | | | | GRU | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | Dropout | Units | l.r | Accuracy | F1 | Dropout | Units | l.r |
| 0.88571 | 0.87882 | 0.2 | 48 | 0.00102 | |||||
| 0.9033 | 0.89778 | 0.4 | 64 | 0.00165 | 0.91209 | 0.90491 | 0.35 | 32 | 0.00923 |
| 0.9033 | 0.90412 | 0.3 | 48 | 0.00421 | 0.88352 | 0.87358 | 0.4 | 64 | 0.00503 |
| 0.88791 | 0.88588 | 0.3 | 24 | 0.00146 | 0.91209 | 0.90408 | 0.2 | 64 | 0.00274 |
| 0.9033 | 0.89748 | 0.35 | 64 | 0.00168 | 0.90549 | 0.90176 | 0.2 | 64 | 0.00813 |
| 0.89451 | 0.89012 | 0.4 | 64 | 0.03355 | 0.90769 | 0.89919 | 0.25 | 48 | 0.00127 |
| 0.8989 | 0.89013 | 0.25 | 64 | 0.00108 | 0.9033 | 0.89012 | 0.4 | 48 | 0.00242 |
| 0.87033 | 0.8709 | 0.3 | 32 | 0.00252 | 0.89451 | 0.87688 | 0.3 | 24 | 0.00343 |
| 0.88132 | 0.88102 | 0.25 | 48 | 0.02173 | 0.89451 | 0.88433 | 0.35 | 48 | 0.00101 |
| 0.89451 | 0.87768 | 0.4 | 64 | 0.02368 | 0.82857 | 0.75089 | 0.35 | 48 | 0.01142 |
| 0.9011 | 0.89575 | 0.3 | 32 | 0.00979 | 0.8967 | 0.89996 | 0.25 | 32 | 0.00372 |
| 0.90549 | 0.89988 | 0.2 | 48 | 0.00117 | 0.82857 | 0.75089 | 0.35 | 48 | 0.00211 |
| 0.90989 | 0.90269 | 0.2 | 48 | 0.02547 | 0.90549 | 0.89981 | 0.35 | 48 | 0.00935 |
| 0.89231 | 0.88145 | 0.25 | 32 | 0.0182 | 0.89451 | 0.8785 | 0.3 | 24 | 0.00259 |
| 0.88571 | 0.87547 | 0.2 | 32 | 0.0246 | 0.82857 | 0.75089 | 0.2 | 64 | 0.00112 |
| 0.9011 | 0.89999 | 0.4 | 48 | 0.00131 | 0.91429 | 0.90919 | 0.2 | 32 | 0.00339 |
| 0.88791 | 0.87228 | 0.3 | 32 | 0.00776 | 0.87692 | 0.85466 | 0.35 | 64 | 0.00402 |
| 0.8967 | 0.88018 | 0.35 | 48 | 0.00194 | 0.89011 | 0.87852 | 0.2 | 24 | 0.00425 |
| 0.86813 | 0.84878 | 0.3 | 24 | 0.02053 | 0.9033 | 0.89882 | 0.2 | 24 | 0.0072 |
| 0.89011 | 0.88068 | 0.2 | 64 | 0.00106 | 0.9033 | 0.89793 | 0.25 | 24 | 0.00611 |
| 0.88791 | 0.88289 | 0.4 | 24 | 0.00166 | |||||
| 0.90769 | 0.89961 | 0.3 | 48 | 0.00132 | 0.89231 | 0.87961 | 0.35 | 64 | 0.00233 |
| 0.90769 | 0.90329 | 0.4 | 48 | 0.02314 | 0.91429 | 0.91066 | 0.35 | 32 | 0.00394 |
| 0.89231 | 0.88607 | 0.3 | 32 | 0.03195 | 0.89231 | 0.8795 | 0.25 | 64 | 0.01113 |
| 0.86813 | 0.87035 | 0.3 | 24 | 0.03019 | 0.89451 | 0.88695 | 0.25 | 32 | 0.00204 |
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization of recurrent neural networks for CLX stock
| LSTM | | | | | GRU | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | Dropout | Units | l.r | Accuracy | F1 | Dropout | Units | l.r |
| 0.85403 | 0.83438 | 0.25 | 32 | 0.00305 | 0.91068 | 0.90667 | 0.2 | 64 | 0.00107 |
| 0.90196 | 0.8878 | 0.4 | 32 | 0.00264 | |||||
| 0.90414 | 0.89121 | 0.25 | 64 | 0.00269 | 0.87582 | 0.8568 | 0.4 | 32 | 0.00169 |
| 0.90196 | 0.89477 | 0.25 | 24 | 0.02077 | 0.9085 | 0.89966 | 0.2 | 32 | 0.00167 |
| 0.89107 | 0.87656 | 0.25 | 32 | 0.0029 | 0.91068 | 0.90265 | 0.2 | 32 | 0.00216 |
| 0.84096 | 0.79521 | 0.4 | 32 | 0.03533 | 0.90414 | 0.89104 | 0.2 | 32 | 0.00171 |
| 0.89542 | 0.88591 | 0.25 | 64 | 0.0021 | 0.9085 | 0.8975 | 0.25 | 64 | 0.00103 |
| 0.90285 | 0.90176 | 0.25 | 24 | 0.00335 | 0.88671 | 0.87622 | 0.25 | 48 | 0.00232 |
| 0.88017 | 0.86976 | 0.2 | 48 | 0.02179 | 0.89107 | 0.88824 | 0.25 | 24 | 0.00234 |
| 0.9085 | 0.90066 | 0.2 | 48 | 0.0129 | 0.81481 | 0.73167 | 0.3 | 48 | 0.0058 |
| 0.90196 | 0.89181 | 0.2 | 64 | 0.00236 | |||||
| 0.89978 | 0.88549 | 0.2 | 32 | 0.02723 | 0.90632 | 0.89653 | 0.35 | 32 | 0.00499 |
| 0.90414 | 0.89392 | 0.4 | 32 | 0.0026 | 0.81481 | 0.73167 | 0.4 | 64 | 0.00142 |
| 0.88453 | 0.87549 | 0.3 | 32 | 0.01627 | 0.89325 | 0.87644 | 0.35 | 32 | 0.00179 |
| 0.8976 | 0.88662 | 0.2 | 24 | 0.01153 | 0.88453 | 0.87057 | 0.3 | 32 | 0.00222 |
| 0.90196 | 0.89527 | 0.25 | 24 | 0.00406 | 0.91068 | 0.90066 | 0.25 | 48 | 0.00128 |
| 0.85839 | 0.83964 | 0.25 | 32 | 0.00245 | 0.90414 | 0.89726 | 0.2 | 32 | 0.00153 |
| 0.91503 | 0.90515 | 0.25 | 32 | 0.0027 | 0.90414 | 0.89101 | 0.25 | 48 | 0.00117 |
| 0.90632 | 0.89613 | 0.3 | 64 | 0.00358 | 0.84532 | 0.7925 | 0.2 | 32 | 0.00182 |
| 0.85621 | 0.82603 | 0.2 | 48 | 0.00223 | 0.90196 | 0.89281 | 0.2 | 48 | 0.00202 |
| 0.87364 | 0.86368 | 0.3 | 64 | 0.00437 | 0.9085 | 0.89956 | 0.2 | 32 | 0.00118 |
| 0.87582 | 0.8611 | 0.2 | 64 | 0.03391 | 0.92157 | 0.91621 | 0.2 | 32 | 0.00148 |
| 0.91068 | 0.90329 | 0.4 | 24 | 0.00407 | 0.89978 | 0.88628 | 0.4 | 64 | 0.00109 |
| 0.89542 | 0.89053 | 0.3 | 24 | 0.01018 | 0.89325 | 0.88721 | 0.25 | 32 | 0.00436 |
| 0.85839 | 0.83285 | 0.2 | 24 | 0.00366 | 0.91503 | 0.90694 | 0.25 | 24 | 0.00354 |
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization of recurrent neural networks for M stock
| LSTM | | | | | GRU | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | Dropout | Units | l.r | Accuracy | F1 | Dropout | Units | l.r |
| 0.86822 | 0.84763 | 0.3 | 24 | 0.00467 | 0.86047 | 0.86183 | 0.35 | 24 | 0.00199 |
| 0.83915 | 0.76576 | 0.2 | 32 | 0.01753 | 0.87597 | 0.85821 | 0.3 | 64 | 0.01056 |
| 0.8469 | 0.78649 | 0.35 | 48 | 0.00597 | 0.83915 | 0.76576 | 0.35 | 24 | 0.03173 |
| 0.8469 | 0.78338 | 0.4 | 32 | 0.01301 | 0.8469 | 0.83781 | 0.25 | 48 | 0.01737 |
| 0.87209 | 0.86186 | 0.4 | 32 | 0.0162 | 0.81395 | 0.8018 | 0.25 | 24 | 0.01998 |
| 0.8469 | 0.84341 | 0.3 | 24 | 0.00853 | 0.85078 | 0.84806 | 0.25 | 32 | 0.01251 |
| 0.85078 | 0.8074 | 0.35 | 48 | 0.01066 | 0.85078 | 0.85066 | 0.2 | 64 | 0.00283 |
| 0.78295 | 0.80698 | 0.35 | 48 | 0.02937 | 0.87597 | 0.87195 | 0.2 | 64 | 0.00781 |
| 0.85853 | 0.85415 | 0.4 | 32 | 0.00521 | 0.86434 | 0.85796 | 0.25 | 32 | 0.01924 |
| 0.87209 | 0.84343 | 0.35 | 64 | 0.01607 | 0.83333 | 0.83791 | 0.2 | 48 | 0.00374 |
| 0.85465 | 0.84037 | 0.35 | 48 | 0.01052 | 0.86047 | 0.85982 | 0.35 | 24 | 0.00381 |
| 0.83527 | 0.78557 | 0.2 | 64 | 0.01132 | 0.87016 | 0.86487 | 0.35 | 24 | 0.00274 |
| 0.86434 | 0.84591 | 0.4 | 64 | 0.01385 | 0.83915 | 0.76576 | 0.35 | 64 | 0.01673 |
| 0.83915 | 0.76576 | 0.2 | 32 | 0.00811 | 0.8469 | 0.78338 | 0.35 | 32 | 0.01054 |
| 0.85078 | 0.84877 | 0.3 | 24 | 0.01377 | 0.85465 | 0.85722 | 0.2 | 64 | 0.00661 |
| 0.82752 | 0.79296 | 0.35 | 24 | 0.00533 | 0.85271 | 0.85549 | 0.25 | 32 | 0.00704 |
| 0.83915 | 0.76576 | 0.4 | 64 | 0.00476 | 0.87403 | 0.86877 | 0.2 | 24 | 0.0026 |
| 0.85078 | 0.8462 | 0.4 | 48 | 0.00704 | 0.83915 | 0.76576 | 0.25 | 48 | 0.01857 |
| 0.83915 | 0.76576 | 0.2 | 24 | 0.01712 | 0.83333 | 0.84691 | 0.3 | 32 | 0.0135 |
| 0.83915 | 0.76576 | 0.35 | 24 | 0.01963 | 0.85853 | 0.84222 | 0.2 | 48 | 0.00768 |
| 0.85271 | 0.80062 | 0.3 | 24 | 0.02079 | 0.84884 | 0.84699 | 0.2 | 32 | 0.00239 |
| 0.87582 | 0.8611 | 0.2 | 64 | 0.03391 | |||||
| 0.8469 | 0.81967 | 0.4 | 24 | 0.01267 | |||||
| 0.89542 | 0.89053 | 0.3 | 24 | 0.01018 | 0.83915 | 0.76576 | 0.25 | 32 | 0.00389 |
| 0.85839 | 0.83285 | 0.2 | 24 | 0.00366 | 0.86434 | 0.82619 | 0.3 | 48 | 0.0185 |
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization for SVM classification
| SVC | | | | | | |
|---|---|---|---|---|---|---|
| Accuracy | F1 | C | Kernel | Degree | gamma | Time (s) |
| 0.88791 | 0.86462 | 0.97 | rbf | 4 | 6.35122 | 34.08338 |
| 0.88791 | 0.86267 | 0.74 | rbf | 4 | 4.61141 | 34.47854 |
| 0.83516 | 0.76597 | 0.6 | rbf | 3 | 4.1091 | 34.66601 |
| 0.89011 | 0.86669 | 0.92 | rbf | 2 | 8.8002 | 33.99457 |
| 0.88791 | 0.86374 | 0.69 | rbf | 2 | 5.24261 | 33.50706 |
| 0.89011 | 0.86669 | 0.7 | rbf | 4 | 5.92944 | 33.40672 |
| 0.89011 | 0.86665 | 0.97 | rbf | 4 | 2.56692 | 35.21639 |
| 0.88791 | 0.86271 | 0.56 | rbf | 4 | 4.81323 | 34.29156 |
| 0.89451 | 0.87525 | 1 | rbf | 3 | 8.67797 | 32.86624 |
| 0.89011 | 0.86665 | 0.99 | rbf | 4 | 8.42303 | 33.2351 |
| 0.85714 | 0.80872 | 0.85 | rbf | 4 | 4.69465 | 34.28548 |
| 0.88571 | 0.86038 | 0.94 | rbf | 3 | 2.38 | 34.4737 |
| 0.84615 | 0.78804 | 1 | rbf | 4 | 7.12775 | 34.51734 |
| 0.89231 | 0.87146 | 0.95 | rbf | 4 | 8.49369 | 33.89479 |
| 0.88571 | 0.85976 | 0.92 | rbf | 2 | 5.8358 | 34.21499 |
| 0.89011 | 0.86665 | 0.88 | rbf | 2 | 6.29374 | 32.85993 |
| 0.81538 | 0.8109 | 0.9 | rbf | 4 | 3.55488 | 32.62426 |
| 0.88352 | 0.85675 | 0.69 | rbf | 3 | 3.60119 | 34.24339 |
| 0.88571 | 0.8619 | 0.92 | rbf | 3 | 6.67293 | 33.9439 |
| 0.86374 | 0.82297 | 0.89 | rbf | 4 | 6.7315 | 33.09472 |
| 0.86593 | 0.82634 | 0.88 | rbf | 3 | 7.93163 | 34.81558 |
| 0.84396 | 0.78431 | 0.92 | rbf | 3 | 8.93898 | 33.51085 |
| 0.88791 | 0.86271 | 1 | rbf | 4 | 8.9172 | 34.22566 |
| 0.89011 | 0.86669 | 0.59 | rbf | 4 | 3.71744 | 34.04539 |
The values are reported for AMD stock
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization for XGBoost
| XGBoost | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s) |
| 0.8637 | 0.8215 | 18 | 1.1051 | 41 | 0.3018 | 0.7678 | 10 | gradient_based | 696 | multi:softmax | 63.4289 |
| 0.8637 | 0.8289 | 11 | 3.969 | 40 | 0.6491 | 0.5095 | 2 | gradient_based | 781 | multi:softprob | 55.0986 |
| 0.8747 | 0.8458 | 15 | 3.4775 | 70 | 0.37 | 0.9825 | 7 | uniform | 355 | multi:softmax | 50.6152 |
| 0.8879 | 0.8697 | 17 | 2.8832 | 42 | 0.8175 | 0.8962 | 10 | uniform | 454 | multi: softmax | 51.8069 |
| 0.8703 | 0.8411 | 12 | 1.4598 | 87 | 0.9679 | 0.8299 | 10 | gradient_based | 318 | multi:softprob | 56.6415 |
| 0.8725 | 0.8442 | 7 | 1.2186 | 41 | 0.0171 | 0.9797 | 7 | uniform | 967 | multi:softmax | 50.2201 |
| 0.8725 | 0.8439 | 18 | 8.1591 | 43 | 0.2704 | 0.7073 | 6 | gradient_based | 486 | multi:softmax | 65.4552 |
| 0.8703 | 0.8409 | 14 | 3.0756 | 42 | 0.1931 | 0.8954 | 10 | gradient_based | 502 | multi:softprob | 57.5294 |
| 0.8813 | 0.8576 | 15 | 2.7026 | 56 | 0.7277 | 0.7741 | 7 | uniform | 423 | multi:softprob | 55.8899 |
| 0.8791 | 0.8537 | 11 | 1.0769 | 40 | 0.531 | 0.9712 | 9 | uniform | 667 | multi:softmax | 60.1367 |
| 0.8879 | 0.8683 | 15 | 1.2257 | 47 | 0.5404 | 0.6589 | 8 | gradient_based | 273 | multi:softmax | 59.3627 |
| 0.8747 | 0.8463 | 18 | 1.2111 | 43 | 0.241 | 0.5436 | 10 | uniform | 450 | multi:softmax | 65.7168 |
| 0.8835 | 0.8603 | 11 | 2.2105 | 62 | 0.7019 | 0.986 | 7 | gradient_based | 682 | multi:softprob | 42.5147 |
| 0.8857 | 0.8644 | 14 | 1.0571 | 40 | 0.498 | 0.6547 | 10 | gradient_based | 683 | multi:softmax | 65.4253 |
| 0.8791 | 0.8536 | 18 | 7.089 | 40 | 0.9822 | 0.7151 | 3 | uniform | 990 | multi:softprob | 83.399 |
| 0.8593 | 0.818 | 14 | 1.0921 | 40 | 0.2623 | 0.6218 | 0 | uniform | 611 | multi:softmax | 66.2685 |
| 0.8791 | 0.8526 | 9 | 2.2144 | 41 | 0.0674 | 0.6766 | 10 | gradient_based | 501 | multi:softmax | 58.5607 |
| 0.8615 | 0.8276 | 14 | 1.3054 | 58 | 0.2625 | 0.5855 | 2 | gradient_based | 613 | multi:softprob | 49.1764 |
| 0.8791 | 0.8537 | 4 | 1.9456 | 59 | 0.7058 | 0.976 | 4 | uniform | 351 | multi:softprob | 50.5104 |
| 0.8769 | 0.851 | 16 | 3.8586 | 54 | 0.4508 | 0.8698 | 3 | uniform | 478 | multi:softmax | 74.8684 |
| 0.8791 | 0.8529 | 7 | 1.7158 | 40 | 0.9097 | 0.6642 | 4 | gradient_based | 786 | multi:softprob | 45.9424 |
| 0.8769 | 0.851 | 4 | 4.0713 | 41 | 0.0022 | 0.9273 | 6 | uniform | 574 | multi:softmax | 43.6041 |
| 0.8769 | 0.851 | 8 | 5.3256 | 41 | 0.6899 | 0.8038 | 1 | uniform | 948 | multi:softmax | 47.8001 |
| 0.8725 | 0.8429 | 14 | 7.1177 | 42 | 0.6682 | 0.8005 | 8 | gradient_based | 334 | multi:softmax | 63.6851 |
The values are reported for AMD stock
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization for XGBoost
| XGBoost | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s) |
| 0.817 | 0.7741 | 6 | 5.931 | 41 | 0.3727 | 0.7007 | 0 | gradient_based | 372 | multi:softmax | 24.0615 |
| 0.7974 | 0.7075 | 7 | 2.7059 | 113 | 0.0845 | 0.9575 | 9 | uniform | 198 | multi:softprob | 28.5653 |
| 0.8366 | 0.812 | 18 | 1.3742 | 41 | 0.4262 | 0.9842 | 3 | gradient_based | 191 | multi:softprob | 37.0794 |
| 0.8105 | 0.7604 | 18 | 6.8847 | 40 | 0.6793 | 0.6817 | 7 | uniform | 305 | multi:softprob | 30.7957 |
| 0.8279 | 0.7962 | 9 | 1.0936 | 40 | 0.5142 | 0.7101 | 3 | uniform | 299 | multi:softprob | 30.1246 |
| 0.8366 | 0.8141 | 18 | 2.1214 | 44 | 0.4773 | 0.815 | 5 | uniform | 359 | multi:softmax | 33.8445 |
| 0.8061 | 0.7334 | 5 | 5.6337 | 50 | 0.8566 | 0.7466 | 3 | uniform | 295 | multi:softmax | 22.1428 |
| 0.8214 | 0.7825 | 10 | 4.2535 | 42 | 0.0339 | 0.7193 | 2 | uniform | 328 | multi:softmax | 31.4881 |
| 0.8844 | 0.8699 | 16 | 2.4258 | 40 | 0.0621 | 0.9548 | 8 | uniform | 210 | multi:softprob | 27.5668 |
| 0.8148 | 0.7685 | 10 | 4.2891 | 50 | 0.0452 | 0.6634 | 4 | uniform | 344 | multi:softmax | 24.2424 |
| 0.8301 | 0.7982 | 12 | 4.4569 | 41 | 0.9636 | 0.9874 | 6 | uniform | 283 | multi:softprob | 31.7568 |
| 0.7974 | 0.7075 | 13 | 7.0061 | 41 | 0.4065 | 0.9421 | 4 | gradient_based | 298 | multi:softprob | 32.9238 |
| 0.8279 | 0.7947 | 12 | 8.8421 | 59 | 0.3582 | 0.5612 | 6 | gradient_based | 304 | multi:softmax | 30.7019 |
| 0.8366 | 0.8148 | 11 | 2.5667 | 40 | 0.3796 | 0.976 | 2 | uniform | 266 | multi:softprob | 30.2273 |
| 0.8126 | 0.7601 | 6 | 4.0282 | 44 | 0.6599 | 0.4823 | 1 | uniform | 399 | multi:softmax | 23.8804 |
| 0.8061 | 0.7539 | 7 | 1.6378 | 40 | 0.4469 | 0.3806 | 0 | gradient_based | 386 | multi:softprob | 23.8486 |
| 0.8279 | 0.7895 | 4 | 1.8386 | 53 | 0.7457 | 0.9378 | 1 | gradient_based | 249 | multi:softmax | 25.9997 |
| 0.8622 | 0.8535 | 18 | 8.976 | 40 | 0.0084 | 0.8736 | 10 | uniform | 265 | multi:softmax | 30.6596 |
| 0.8912 | 0.875 | 13 | 3.3235 | 40 | 0.1619 | 0.9776 | 6 | uniform | 323 | multi:softprob | 25.5677 |
| 0.8061 | 0.7328 | 5 | 1.6994 | 46 | 0.7916 | 0.983 | 10 | uniform | 181 | multi:softmax | 23.8973 |
| 0.8301 | 0.7949 | 4 | 6.3871 | 41 | 0.9904 | 0.8966 | 1 | uniform | 327 | multi:softprob | 23.898 |
| 0.8192 | 0.7805 | 12 | 2.8584 | 40 | 0.9966 | 0.9889 | 8 | uniform | 400 | multi:softprob | 33.197 |
| 0.8279 | 0.7932 | 7 | 4.7116 | 94 | 0.7571 | 0.9516 | 3 | gradient_based | 240 | multi:softmax | 28.2843 |
| 0.7974 | 0.7307 | 14 | 7.9455 | 42 | 0.5891 | 0.9898 | 10 | uniform | 274 | multi:softprob | 32.7641 |
The values are reported for CLX stock
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization for XGBoost
| XGBoost | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | F1 | max_depth | gamma | reg_alpha | reg_lambda | colsample_bytree | min_child_weight | sampling_method | n_estimators | objective | Time (s) |
| 0.8574 | 0.8011 | 14 | 2.8253 | 43 | 0.1571 | 0.8231 | 13 | uniform | 520 | multi:softmax | 22.5147 |
| 0.8438 | 0.7722 | 6 | 4.4761 | 41 | 0.707 | 0.983 | 10 | gradient_based | 698 | multi:softprob | 22.8963 |
| 0.8438 | 0.7722 | 8 | 1.0467 | 44 | 0.4469 | 0.8078 | 13 | gradient_based | 667 | multi:softmax | 24.2549 |
| 0.8574 | 0.8011 | 8 | 2.2016 | 73 | 0.2234 | 0.8758 | 20 | gradient_based | 644 | multi:softmax | 23.5862 |
| 0.8438 | 0.7722 | 13 | 1.0282 | 70 | 0.2868 | 0.9875 | 20 | uniform | 589 | multi:softprob | 21.5718 |
| 0.8555 | 0.7975 | 6 | 5.8813 | 68 | 0.4518 | 0.417 | 8 | uniform | 694 | multi:softmax | 20.2554 |
| 0.8438 | 0.7722 | 14 | 4.6023 | 73 | 0.9216 | 0.2635 | 16 | gradient_based | 257 | multi:softprob | 18.0163 |
| 0.8574 | 0.8011 | 9 | 2.6645 | 74 | 0.0395 | 0.391 | 14 | gradient_based | 568 | multi:softprob | 20.9973 |
| 0.8633 | 0.8111 | 13 | 3.233 | 87 | 0.6541 | 0.8851 | 2 | gradient_based | 288 | multi:softmax | 28.0208 |
| 0.8574 | 0.8011 | 12 | 2.0782 | 73 | 0.3246 | 0.2529 | 1 | gradient_based | 626 | multi:softmax | 19.2962 |
| 0.8438 | 0.7722 | 7 | 2.0008 | 79 | 0.3725 | 0.9389 | 20 | uniform | 352 | multi:softprob | 23.9215 |
| 0.8438 | 0.7722 | 18 | 3.8969 | 40 | 0.764 | 0.981 | 10 | uniform | 369 | multi:softmax | 21.9027 |
| 0.8438 | 0.7722 | 18 | 3.1102 | 40 | 0.7071 | 0.8959 | 8 | gradient_based | 288 | multi:softmax | 29.7171 |
| 0.8438 | 0.7722 | 15 | 4.6847 | 87 | 0.0333 | 0.843 | 20 | gradient_based | 749 | multi:softprob | 21.9142 |
| 0.8438 | 0.7722 | 12 | 1.0602 | 62 | 0.2624 | 0.7675 | 5 | gradient_based | 751 | multi:softmax | 25.447 |
| 0.8594 | 0.8045 | 13 | 1.3293 | 41 | 0.6298 | 0.7549 | 5 | gradient_based | 523 | multi:softprob | 23.9069 |
| 0.8574 | 0.8011 | 7 | 7.2568 | 62 | 0.5774 | 0.8555 | 3 | uniform | 702 | multi:softmax | 25.7756 |
| 0.8574 | 0.8011 | 15 | 2.2939 | 60 | 0.7819 | 0.5265 | 0 | uniform | 615 | multi:softmax | 18.968 |
| 0.8438 | 0.7722 | 14 | 4.0672 | 61 | 0.4329 | 0.9167 | 16 | gradient_based | 896 | multi:softprob | 22.0706 |
| 0.8438 | 0.7722 | 7 | 1.0534 | 79 | 0.4175 | 0.76 | 20 | gradient_based | 527 | multi:softmax | 22.7151 |
| 0.8633 | 0.8111 | 6 | 1.3747 | 67 | 0.9937 | 0.6049 | 9 | gradient_based | 700 | multi:softprob | 21.7321 |
| 0.8594 | 0.8045 | 17 | 4.4469 | 87 | 0.3973 | 0.662 | 20 | uniform | 258 | multi:softmax | 22.6639 |
| 0.8652 | 0.8142 | 11 | 1.0097 | 40 | 0.3281 | 0.8083 | 11 | gradient_based | 343 | multi:softmax | 27.3033 |
| 0.8438 | 0.7722 | 16 | 1.1164 | 44 | 0.8867 | 0.9735 | 20 | gradient_based | 944 | multi:softprob | 22.6539 |
The values are reported for M stock
Best tuning values and the respective performance metrics are shown in bold
Hyperparameter optimization for SVM classification
| SVC | | | | | | |
|---|---|---|---|---|---|---|
| Accuracy | F1 | C | Kernel | Degree | gamma | Time (s) |
| 0.54773 | 0.42636 | 0.96 | rbf | 2 | 8.96451 | 68.55751 |
| 0.58864 | 0.51771 | 0.6 | rbf | 3 | 3.23122 | 61.5789 |
| 0.58182 | 0.50193 | 1 | rbf | 4 | 8.94659 | 63.0378 |
| 0.61591 | 0.56178 | 0.75 | rbf | 4 | 5.24323 | 60.19616 |
| 0.69318 | 0.66328 | 0.44 | rbf | 4 | 8.99035 | 63.81414 |
| 0.70909 | 0.68182 | 0.98 | rbf | 2 | 8.99531 | 62.34934 |
| 0.70455 | 0.67627 | 0.4 | rbf | 4 | 8.71733 | 62.89979 |
| 0.87115 | 0.85123 | 0.99 | rbf | 3 | 7.43513 | 64.88739 |
| 0.71136 | 0.68461 | 0.97 | rbf | 4 | 8.86893 | 62.36296 |
| 0.68636 | 0.65382 | 0.44 | rbf | 2 | 8.9409 | 62.44242 |
| 0.52045 | 0.35631 | 0.69 | rbf | 3 | 8.8603 | 63.15386 |
| 0.53409 | 0.39643 | 0.8 | rbf | 3 | 8.83122 | 60.07399 |
| 0.70227 | 0.67503 | 0.93 | rbf | 2 | 8.81325 | 60.41432 |
| 0.70909 | 0.68228 | 0.84 | rbf | 3 | 8.87425 | 62.6169 |
| 0.71136 | 0.68461 | 0.96 | rbf | 3 | 8.6247 | 64.30415 |
| 0.70455 | 0.67738 | 0.68 | rbf | 2 | 8.98146 | 63.31783 |
| 0.70909 | 0.68228 | 1 | rbf | 4 | 7.34932 | 60.10106 |
| 0.69318 | 0.66289 | 0.82 | rbf | 4 | 3.2027 | 62.91097 |
| 0.68864 | 0.66082 | 0.68 | rbf | 4 | 8.98826 | 63.06187 |
| 0.70227 | 0.67419 | 0.71 | rbf | 3 | 8.27532 | 62.57946 |
| 0.70455 | 0.67712 | 1 | rbf | 3 | 8.57879 | 60.85747 |
| 0.59318 | 0.51773 | 0.98 | rbf | 3 | 8.96061 | 63.29737 |
| 0.625 | 0.60501 | 0.7 | rbf | 4 | 7.19191 | 60.17934 |
| 0.69091 | 0.66149 | 0.71 | rbf | 4 | 8.82945 | 61.37867 |
The values are reported for CLX stock
Best tuning values and the respective performance metrics are shown in bold
Classification metrics report for CLX stock
| Class label | BB-XGBoost | | | | BB-SVM | | | |
|---|---|---|---|---|---|---|---|---|
| | Precision | Recall | F1-score | Support | Precision | Recall | F1-score | Support |
| − 1 | 0.870 | 0.455 | 0.597 | 44 | 0.941 | 0.364 | 0.525 | 44 |
| 0 | 0.914 | 0.992 | 0.951 | 374 | 0.894 | 0.997 | 0.943 | 374 |
| 1 | 0.767 | 0.561 | 0.648 | 41 | 0.720 | 0.439 | 0.545 | 41 |
| Summary | ||||||||
| Accuracy | | | 0.902 | 459 | | | 0.887 | 459 |
| Macro avg | 0.850 | 0.669 | 0.732 | 459 | 0.852 | 0.600 | 0.671 | 459 |
| Weighted avg | 0.896 | 0.902 | 0.890 | 459 | 0.883 | 0.887 | 0.867 | 459 |
The threshold and window size are 0.05 and 11, respectively
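The per-class figures in the table follow the standard definitions of precision, recall, F1, and support for the tri-state labels. A self-contained sketch of the computation (standard formulas, not the authors' code):

```python
# Compute precision / recall / F1 / support for one class of a
# multi-class (here tri-state: -1, 0, 1) prediction, as reported above.
def class_metrics(y_true, y_pred, cls):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = sum(1 for t in y_true if t == cls)
    return precision, recall, f1, support
```

The macro average in the table is the unweighted mean of these per-class values; the weighted average weights each class by its support.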
Performance comparison of LSTM, GRU, XGBoost, and SVM
| Stock | Algorithm | SR | MDD | Time (s) | Stock | Algorithm | SR | MDD | Time (s) |
|---|---|---|---|---|---|---|---|---|---|
| AMD | LSTM | 2.66 | 7.22 | 211.84 | STX | LSTM | 1.94 | 2.16 | 127.58 |
| | GRU | 2.44 | 2.32 | 241.81 | | GRU | 1.96 | 17.27 | 145.10 |
| | XGBoost | | | | | XGBoost | | | |
| | SVM | 2.56 | 7.22 | 2.565 | | SVM | 2.16 | 2.16 | 0.873 |
| WMT | LSTM | 1.28 | 7.38 | 213.89 | M | LSTM | 0.73 | 2.98 | 327.65 |
| | GRU | − 1.98 | 8.97 | 236.885 | | GRU | 2.39 | 1.79 | 325.11 |
| | XGBoost | | | | | XGBoost | | | |
| | SVM | 1.37 | 7.71 | 3.498 | | SVM | 2.34 | 2.61 | 1.925 |
| AAPL | LSTM | 2.61 | 10.37 | 215.11 | CLX | LSTM | 0.81 | 2.76 | 218.7 |
| | GRU | 2.52 | 10.38 | 238.48 | | GRU | 2.72 | 2.76 | 237.30 |
| | XGBoost | | | | | XGBoost | | | |
| | SVM | 2.57 | 6.25 | 2.179 | | SVM | 1.04 | 2.56 | 1.302 |
Best tuning values and the respective performance metrics are shown in bold
Average performance comparison
| Proposed framework | Comparison frameworks | ||
|---|---|---|---|
| Algorithm | SR | Algorithm | SR |
| LSTM | 1.672 | 2D-CNNpred [ | 2.257 |
| GRU | 1.675 | 3D-CNNpred [ | 2.243 |
| XGBoost | | DPGRGT [ | 0.642 |
| SVM | 2.007 | L&Mc (News) [ | 1.235 |
| | | L&Mc (News and Price) [ | 0.756 |
The Sharpe ratios of all studies are reported as annualized rates
Fig. 10 RoR diagrams showing how much return is produced by each learning model. The horizontal axis represents the date and the vertical axis the RoR percentage. The charts show the RoR from January 2020 to November 2021
Fig. 11 Drawdown comparison between the classification algorithms used within the proposed framework. XGBoost has smaller drawdowns during the back-testing period
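The two back-testing metrics reported throughout, annualized Sharpe ratio and maximum drawdown, have standard definitions; a minimal sketch (assuming daily returns, 252 trading days per year, and a zero risk-free rate):

```python
import math

# Annualized Sharpe ratio: mean / std of per-period returns, scaled by
# sqrt(periods-per-year). Zero risk-free rate assumed for simplicity.
def sharpe_ratio(daily_returns, periods=252):
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods)

# Maximum drawdown: largest peak-to-trough decline of the equity curve,
# as a fraction of the running peak.
def max_drawdown(equity_curve):
    peak, mdd = equity_curve[0], 0.0
    for v in equity_curve:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd
```

For example, an equity curve of `[100, 120, 90, 110]` has a maximum drawdown of 0.25 (the fall from the 120 peak to 90).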