| Literature DB >> 33286613 |
Deep Learning for Stock Market Prediction
M Nabipour 1, P Nayyeri 2, H Jabani 3, A Mosavi 4,5, E Salwana 6, Shahab S 7.
Abstract
The prediction of stock group values has always been attractive and challenging for shareholders due to its inherent dynamics, non-linearity, and complex nature. This paper concentrates on the future prediction of stock market groups. Four groups, named diversified financials, petroleum, non-metallic minerals, and basic metals, from the Tehran stock exchange were chosen for experimental evaluations. Data were collected for the groups based on 10 years of historical records. The value predictions are created for 1, 2, 5, 10, 15, 20, and 30 days in advance. Various machine learning algorithms were utilized for prediction of future values of stock market groups: we employed decision tree, bagging, random forest, adaptive boosting (Adaboost), gradient boosting, and eXtreme gradient boosting (XGBoost), as well as artificial neural networks (ANN), recurrent neural networks (RNN), and long short-term memory (LSTM). Ten technical indicators were selected as the inputs into each of the prediction models. Finally, the results of the predictions were presented for each technique based on four error metrics. Among all the algorithms used in this paper, LSTM shows the most accurate results with the highest model-fitting ability. In addition, for tree-based models, there is often intense competition among Adaboost, Gradient Boosting, and XGBoost.
Keywords: LSTM; business intelligence; deep learning; economics; finance; financial forecast; information economics; information science; long short-term memory; machine learning; regression analysis; stock market; stock market prediction; tree-based methods
Year: 2020 PMID: 33286613 PMCID: PMC7517440 DOI: 10.3390/e22080840
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The evolution of tree-based methods.
Figure 2. Schematic illustration of a decision tree.
Figure 3. Schematic illustration of the random forest.
Figure 4. Schematic illustration of artificial neural networks.
Figure 5. An illustration of the relationship between inputs and output for an ANN.
Figure 6. An illustration of a recurrent network.
Selected technical indicators (n is 10 here).
| Indicator | Formula |
|---|---|
| Simple n-day moving average | $\frac{C_t + C_{t-1} + \dots + C_{t-(n-1)}}{n}$ |
| Weighted 14-day moving average | $\frac{n C_t + (n-1) C_{t-1} + \dots + C_{t-(n-1)}}{n + (n-1) + \dots + 1}$ |
| Momentum | $C_t - C_{t-(n-1)}$ |
| Stochastic K% | $\frac{C_t - LL_n}{HH_n - LL_n} \times 100$ |
| Stochastic D% | $\frac{\sum_{i=0}^{n-1} K_{t-i}\%}{n}$ |
| Relative strength index (RSI) | $100 - \frac{100}{1 + \left(\sum_{i=0}^{n-1} UP_{t-i}/n\right) \big/ \left(\sum_{i=0}^{n-1} DW_{t-i}/n\right)}$ |
| Signal | $\mathrm{Signal}(n)_t = \mathrm{MACD}_t \times \frac{2}{n+1} + \mathrm{Signal}(n)_{t-1} \times \left(1 - \frac{2}{n+1}\right)$ |
| Larry William's R% | $\frac{HH_n - C_t}{HH_n - LL_n} \times 100$ |
| Accumulation/Distribution (A/D) oscillator | $\frac{H_t - C_{t-1}}{H_t - L_t}$ |
| CCI (Commodity channel index) | $\frac{M_t - SM_t}{0.015\, D_t}$ |

While:
- $n$ is the number of days
- $C_t$ is the closing price at time $t$
- $L_t$ and $H_t$ are the low price and high price at time $t$, respectively
- $LL_n$ and $HH_n$ are the lowest low and the highest high over the last $n$ days, respectively
- $UP_t$ and $DW_t$ mean upward price change and downward price change at time $t$, respectively
- $\mathrm{EMA}(K)_t = \mathrm{EMA}(K)_{t-1} \times \left(1 - \frac{2}{K+1}\right) + C_t \times \frac{2}{K+1}$
- Moving average convergence divergence: $\mathrm{MACD}_t = \mathrm{EMA}(12)_t - \mathrm{EMA}(26)_t$
- $M_t = \frac{H_t + L_t + C_t}{3}$
- $SM_t = \frac{\sum_{i=1}^{n} M_{t-i+1}}{n}$
- $D_t = \frac{\sum_{i=1}^{n} \left|M_{t-i+1} - SM_t\right|}{n}$
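For readers reproducing these inputs, the sketch below computes most of the listed indicators with pandas. It is a minimal sketch under assumed column names ('High', 'Low', 'Close'); it is not code from the paper, and the MACD/Signal recursion is omitted for brevity.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Add a subset of the paper's indicators to a daily price DataFrame.

    Column names 'High', 'Low', 'Close' are our assumption for illustration.
    """
    c, h, l = df["Close"], df["High"], df["Low"]
    w = np.arange(1, n + 1)  # weights 1..n, largest on the most recent day

    df["SMA"] = c.rolling(n).mean()
    df["WMA"] = c.rolling(n).apply(lambda x: np.dot(x, w) / w.sum(), raw=True)
    df["MOM"] = c - c.shift(n - 1)

    ll, hh = l.rolling(n).min(), h.rolling(n).max()  # lowest low / highest high
    df["STCK"] = 100 * (c - ll) / (hh - ll)
    df["STCD"] = df["STCK"].rolling(n).mean()
    df["LWR"] = 100 * (hh - c) / (hh - ll)

    diff = c.diff()
    up = diff.clip(lower=0).rolling(n).mean()     # average upward change
    dw = (-diff.clip(upper=0)).rolling(n).mean()  # average downward change
    df["RSI"] = 100 - 100 / (1 + up / dw)

    df["ADO"] = (h - c.shift(1)) / (h - l)        # A/D oscillator

    m = (h + l + c) / 3                           # typical price
    sm = m.rolling(n).mean()
    d = m.rolling(n).apply(lambda x: np.abs(x - x.mean()).mean(), raw=True)
    df["CCI"] = (m - sm) / (0.015 * d)
    return df
```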
Summary statistics of indicators.
| Feature | Max | Min | Mean | Standard Deviation |
|---|---|---|---|---|
| Diversified Financials | | | | |
| SMA | 6969.46 | 227.5 | 1471.201 | 1196.926 |
| WMA | 3672.226 | 119.1419 | 772.5263 | 630.0753 |
| MOM | 970.8 | −1017.8 | 21.77033 | 126.5205 |
| STCK | 99.93224 | 0.159245 | 53.38083 | 19.18339 |
| STCD | 96.9948 | 14.31843 | 53.34332 | 15.28929 |
| RSI | 68.96463 | 27.21497 | 50.18898 | 6.471652 |
| SIG | 310.5154 | −58.4724 | 16.64652 | 51.62368 |
| LWR | 99.84076 | 0.06776 | 46.61917 | 19.18339 |
| ADO | 0.99986 | 0.000682 | 0.504808 | 0.238426 |
| CCI | 270.5349 | −265.544 | 14.68813 | 101.8721 |
| Basic Metals | | | | |
| SMA | 322,111.5 | 7976.93 | 69,284.11 | 60,220.95 |
| WMA | 169,013.9 | 4179.439 | 36,381.48 | 31,677.51 |
| MOM | 39,393.8 | −20,653.8 | 1030.265 | 4457.872 |
| STCK | 98.47765 | 1.028891 | 54.64576 | 16.41241 |
| STCD | 90.93235 | 12.94656 | 54.64294 | 13.25043 |
| RSI | 72.18141 | 27.34428 | 49.8294 | 6.113667 |
| SIG | 12,417.1 | −4019.14 | 803.5174 | 2155.701 |
| LWR | 98.97111 | 1.522349 | 45.36526 | 16.43646 |
| ADO | 0.999141 | 0.00097 | 0.498722 | 0.234644 |
| CCI | 264.6937 | −242.589 | 23.4683 | 99.14922 |
| Non-metallic Minerals | | | | |
| SMA | 15,393.62 | 134.15 | 1872.483 | 2410.316 |
| WMA | 8081.05 | 69.72762 | 985.1065 | 1272.247 |
| MOM | 1726.5 | −2998.3 | 49.21097 | 264.0393 |
| STCK | 100.00 | 0.154268 | 54.71477 | 20.2825 |
| STCD | 96.7883 | 13.15626 | 54.68918 | 16.37712 |
| RSI | 70.89401 | 24.07408 | 49.67247 | 6.449379 |
| SIG | 848.558 | −127.47 | 37.36441 | 123.9744 |
| LWR | 99.84573 | −2.66648 | 45.28523 | 20.2825 |
| ADO | 0.998941 | 0.00036 | 0.501229 | 0.238008 |
| CCI | 296.651 | −253.214 | 20.06145 | 101.9735 |
| Petroleum | | | | |
| SMA | 1,349,138 | 16,056.48 | 243,334.2 | 262,509.8 |
| WMA | 707,796.4 | 8580.536 | 127,839.1 | 138,101 |
| MOM | 227,794 | −136,467 | 4352.208 | 26,797.25 |
| STCK | 100.00 | 0.253489 | 53.78946 | 22.0595 |
| STCD | 95.93565 | 2.539517 | 53.83312 | 17.46646 |
| RSI | 75.05218 | 23.26627 | 50.02778 | 6.838486 |
| SIG | 71830.91 | −33132 | 3411.408 | 11,537.98 |
| LWR | 99.74651 | −1.8345 | 46.23697 | 22.02162 |
| ADO | 0.999933 | 0.000288 | 0.498381 | 0.239229 |
| CCI | 286.7812 | −284.298 | 14.79592 | 101.8417 |
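A table like the one above can be regenerated directly from an indicator DataFrame for each stock group; a minimal pandas sketch (column layout assumed, not taken from the paper):

```python
import pandas as pd

def summary_stats(indicators: pd.DataFrame) -> pd.DataFrame:
    """Max/Min/Mean/Std per feature for one stock group.

    'indicators' has one column per feature (SMA, WMA, ...); names are ours.
    """
    stats = indicators.agg(["max", "min", "mean", "std"]).T
    stats.columns = ["Max", "Min", "Mean", "Standard Deviation"]
    return stats
```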
Tree-based models parameters.
| Model | Parameters | Value(s) |
|---|---|---|
| Decision Tree | Number of Trees (ntrees) | 1 |
| Bagging | Number of Trees (ntrees) | 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 |
| | Max Depth | 10 |
| Random Forest | Number of Trees (ntrees) | 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 |
| | Max Depth | 10 |
| Adaboost | Number of Trees (ntrees) | 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 |
| | Max Depth | 10 |
| | Learning Rate | 0.1 |
| Gradient Boosting | Number of Trees (ntrees) | 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 |
| | Max Depth | 10 |
| | Learning Rate | 0.1 |
| XGBoost | Number of Trees (ntrees) | 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 |
| | Max Depth | 10 |
| | Learning Rate | 0.1 |
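The grid in this table maps naturally onto scikit-learn estimators; below is a minimal sketch of one grid point. It is our construction, not the authors' code, and it assumes scikit-learn >= 1.2 (for the 'estimator' argument name), the xgboost package, and depth-10 base learners for Bagging/Adaboost as one reading of the table.

```python
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor  # assumes the xgboost package is installed

NTREES_GRID = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500]

def base():
    """Depth-10 base learner for the ensembles (our reading of Max Depth)."""
    return DecisionTreeRegressor(max_depth=10)

def make_models(ntrees: int) -> dict:
    """One model per table row for a single ntrees grid point."""
    return {
        "Decision Tree": DecisionTreeRegressor(),  # ntrees fixed at 1
        "Bagging": BaggingRegressor(estimator=base(), n_estimators=ntrees),
        "Random Forest": RandomForestRegressor(n_estimators=ntrees, max_depth=10),
        "Adaboost": AdaBoostRegressor(estimator=base(), n_estimators=ntrees,
                                      learning_rate=0.1),
        "Gradient Boosting": GradientBoostingRegressor(n_estimators=ntrees,
                                                       max_depth=10,
                                                       learning_rate=0.1),
        "XGBoost": XGBRegressor(n_estimators=ntrees, max_depth=10,
                                learning_rate=0.1),
    }
```

In this setup the result tables below report, per model, the ntrees value from the grid that scored best on the held-out data.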
Neural network-based models parameters.
| Model | Parameters | Value(s) |
|---|---|---|
| Artificial neural networks (ANN) | Number of Neurons | 500 |
| | Activation Function | ReLU |
| | Optimizer | Adam |
| | Learning Rate | 0.01 |
| | Epochs | 100, 200, 500, 1000 |
| Recurrent neural network (RNN) | Number of Neurons | 500 |
| | Activation Function | tanh |
| | Optimizer | Adam |
| | Learning Rate | 0.0001 |
| | Training Days (ndays) | 1, 2, 5, 10, 20, 30 |
| | Epochs (w.r.t. ndays) | 100, 200, 300, 500, 800, 1000 |
| Long short-term memory (LSTM) | Number of Neurons | 200 |
| | Activation Function | tanh |
| | Optimizer | Adam |
| | Learning Rate | 0.0005 |
| | Training Days (ndays) | 1, 2, 5, 10, 20, 30 |
| | Epochs (w.r.t. ndays) | 50, 50, 70, 100, 200, 300 |
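For the LSTM row, a hedged Keras sketch wired to the listed values (200 units, tanh, Adam at 0.0005). The single-layer layout, MSE loss, and input shape are our assumptions; the record only fixes the hyperparameter values in the table.

```python
from tensorflow import keras

# Epochs paired with the look-back window (ndays), per the table above.
EPOCHS_BY_NDAYS = {1: 50, 2: 50, 5: 70, 10: 100, 20: 200, 30: 300}

def build_lstm(ndays: int, n_features: int = 10) -> keras.Model:
    """200-unit tanh LSTM trained with Adam at lr = 0.0005."""
    model = keras.Sequential([
        keras.layers.Input(shape=(ndays, n_features)),  # ndays steps x 10 indicators
        keras.layers.LSTM(200, activation="tanh"),
        keras.layers.Dense(1),  # next value of the group index
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-4), loss="mse")
    return model

# Usage with hypothetical arrays X, y:
#   build_lstm(10).fit(X, y, epochs=EPOCHS_BY_NDAYS[10])
```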
Diversified financials one day ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 1.29 | 23.05 | 0.0235 | 4948.07 |
| Bagging | ntrees = 400 | 0.92 | 15.80 | 0.0142 | 1403.24 |
| Random Forest | ntrees = 300 | 0.92 | 15.51 | 0.0141 | 1290.91 |
| Adaboost | ntrees = 250 | 0.91 | 15.09 | 0.0132 | 912.51 |
| Gradient Boosting | ntrees = 300 | 1.02 | 19.19 | 0.0203 | 4312.09 |
| XGBoost | … | … | … | … | … |
| ANN | epochs = 1000 | 1.01 | 16.07 | 0.0146 | 1107.02 |
| RNN | ndays = 1 | 1.59 | 14.70 | 0.0242 | 362.26 |
| LSTM | … | … | … | … | … |
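The four error measures in this and the following result tables can be computed as below. Treating rRMSE as RMSE divided by the mean observed value is a common convention and our assumption here, since the record does not define it.

```python
import numpy as np

def error_measures(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAPE (in %), MAE, rRMSE, and MSE, as reported in the tables."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MAPE": float(100 * np.mean(np.abs(err / y_true))),
        "MAE": float(np.mean(np.abs(err))),
        "rRMSE": float(np.sqrt(mse) / np.mean(y_true)),  # assumed normalization
        "MSE": mse,
    }
```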
Diversified financials two days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 1.52 | 25.93 | 0.0250 | 2893.88 |
| Bagging | … | … | … | … | … |
| Random Forest | ntrees = 500 | 1.12 | 18.39 | 0.0171 | 1322.25 |
| Adaboost | ntrees = 400 | 1.11 | 19.56 | 0.0164 | 1687.85 |
| Gradient Boosting | ntrees = 300 | 1.14 | 19.51 | 0.0199 | 1781.31 |
| XGBoost | ntrees = 150 | 1.14 | 19.81 | 0.0162 | 1724.65 |
| ANN | epochs = 1000 | 1.41 | 23.35 | 0.0208 | 2614.08 |
| RNN | ndays = 10 | 1.66 | 14.75 | 0.0243 | 423.14 |
| LSTM | … | … | … | … | … |
Diversified financials five days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 1.66 | 28.94 | 0.0298 | 4715.42 |
| Bagging | ntrees = 150 | 1.45 | 24.00 | 0.0215 | 2146.32 |
| Random Forest | ntrees = 500 | 1.47 | 24.46 | 0.0216 | 2317.71 |
| Adaboost | ntrees = 400 | 1.39 | 23.91 | 0.0198 | 2494.78 |
| Gradient Boosting | … | … | … | … | … |
| XGBoost | ntrees = 300 | 1.45 | 24.12 | 0.0202 | 2056.23 |
| ANN | epochs = 1000 | 2.27 | 39.69 | 0.0322 | 7156.56 |
| RNN | ndays = 10 | 1.77 | 15.21 | 0.0263 | 468.32 |
| LSTM | … | … | … | … | … |
Diversified financials 10 days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 2.09 | 34.00 | 0.0382 | 5129.32 |
| Bagging | ntrees = 250 | 1.88 | 31.47 | 0.0283 | 3219.30 |
| Random Forest | ntrees = 300 | 1.86 | 31.36 | 0.0279 | 3246.80 |
| Adaboost | … | … | … | … | … |
| Gradient Boosting | ntrees = 500 | 1.74 | 28.00 | 0.0322 | 3356.01 |
| XGBoost | ntrees = 500 | 1.77 | 31.07 | 0.0257 | 3600.53 |
| ANN | epochs = 1000 | 4.12 | 65.38 | 0.0556 | 18,866.04 |
| RNN | ndays = 5 | 1.91 | 16.98 | 0.0280 | 528.71 |
| LSTM | … | … | … | … | … |
Diversified financials 15 days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 2.28 | 41.29 | 0.0451 | 11,051.93 |
| Bagging | ntrees = 100 | 2.24 | 37.61 | 0.0349 | 4997.20 |
| Random Forest | ntrees = 50 | 2.24 | 37.28 | 0.0349 | 4755.32 |
| Adaboost | … | … | … | … | … |
| Gradient Boosting | ntrees = 200 | 1.97 | 35.95 | 0.0390 | 8759.44 |
| XGBoost | ntrees = 500 | 2.03 | 35.37 | 0.0305 | 5534.65 |
| ANN | epochs = 1000 | 5.05 | 85.46 | 0.0696 | 29,483.87 |
| RNN | ndays = 10 | 1.95 | 19.09 | 0.0307 | 644.50 |
| LSTM | … | … | … | … | … |
Diversified financials 20 days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 2.80 | 49.12 | 0.0571 | 14,227.06 |
| Bagging | ntrees = 100 | 2.56 | 42.43 | 0.0388 | 5916.19 |
| Random Forest | ntrees = 450 | 2.57 | 42.66 | 0.0393 | 6008.33 |
| Adaboost | … | … | … | … | … |
| Gradient Boosting | ntrees = 350 | 2.17 | 39.10 | 0.0385 | 8573.37 |
| XGBoost | ntrees = 500 | 2.30 | 39.30 | 0.0358 | 6406.16 |
| ANN | epochs = 1000 | 5.66 | 126.69 | 0.0790 | 42,701.88 |
| RNN | ndays = 20 | 1.96 | 19.47 | 0.0314 | 668.82 |
| LSTM | … | … | … | … | … |
Diversified financials 30 days ahead.
| Prediction Models | Parameters | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|---|
| Decision Tree | ntrees = 1 | 2.83 | 48.39 | 0.0587 | 12,924.43 |
| Bagging | ntrees = 350 | 3.21 | 54.37 | 0.0467 | 8803.66 |
| Random Forest | ntrees = 50 | 3.18 | 54.06 | 0.0465 | 8799.45 |
| Adaboost | … | … | … | … | … |
| Gradient Boosting | ntrees = 500 | 2.54 | 43.59 | 0.0485 | 9354.03 |
| XGBoost | ntrees = 400 | 2.48 | 42.85 | 0.0378 | 6306.78 |
| ANN | epochs = 1000 | 7.48 | 126.69 | 0.0994 | 54,940.25 |
| RNN | ndays = 20 | 2.11 | 20.20 | 0.0322 | 1355.35 |
| LSTM | … | … | … | … | … |
Average performance for diversified financials.
| Prediction Models | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|
| Decision Tree | 2.07 | 35.82 | 0.0396 | 7984.30 |
| Bagging | 1.91 | 32.00 | 0.0288 | 3973.92 |
| Random Forest | 1.91 | 31.96 | 0.0288 | 3962.97 |
| Adaboost | … | … | … | … |
| Gradient Boosting | 1.70 | 29.91 | 0.0318 | 5662.68 |
| XGBoost | 1.72 | 29.63 | 0.0255 | 3776.28 |
| ANN | 3.86 | 69.05 | 0.0530 | 22,409.96 |
| RNN | 1.85 | 17.20 | 0.0281 | 635.85 |
| LSTM | … | … | … | … |
Average performance for petroleum.
| Prediction Models | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|
| Decision Tree | 2.70 | 7613.54 | 0.0528 | 502,831,775.59 |
| Bagging | 2.62 | 6640.41 | 0.0397 | 212,982,692.85 |
| Random Forest | 2.62 | 6649.18 | 0.0400 | 212,239,589.62 |
| Adaboost | … | … | … | … |
| Gradient Boosting | 2.26 | 6402.08 | 0.0403 | 305,274,334.62 |
| XGBoost | 2.33 | 5947.22 | 0.0363 | 175,385,973.35 |
| ANN | 5.52 | 14,045.78 | 0.0753 | 1,123,371,989.92 |
| RNN | 3.40 | 4097.20 | 0.0596 | 57,606,535.91 |
| LSTM | … | … | … | … |
Average performance for non-metallic minerals.
| Prediction Models | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|
| Decision Tree | 2.18 | 52.75 | 0.0456 | 22,287.11 |
| Bagging | 2.12 | 47.88 | 0.0331 | 13,333.59 |
| Random Forest | 2.12 | 47.89 | 0.0331 | 13,045.77 |
| Adaboost | 1.84 | … | … | 11,798.23 |
| Gradient Boosting | … | 43.26 | 0.0339 | 15,155.18 |
| XGBoost | 1.86 | 42.15 | 0.0312 | … |
| ANN | 4.67 | 100.28 | 0.0662 | 98,705.75 |
| RNN | 5.23 | 44.18 | 0.0875 | 9227.55 |
| LSTM | … | … | … | … |
Average performance for basic metals.
| Prediction Models | MAPE | MAE | rRMSE | MSE |
|---|---|---|---|---|
| Decision Tree | 1.41 | 1159.46 | 0.0274 | 11,082,872.18 |
| Bagging | 1.36 | 1046.64 | 0.0207 | 5,314,782.99 |
| Random Forest | 1.36 | 1043.30 | 0.0207 | 5,192,173.88 |
| Adaboost | 1.18 | … | … | … |
| Gradient Boosting | … | 960.52 | 0.0212 | 7,029,319.85 |
| XGBoost | 1.21 | 963.42 | … | 4,619,506.50 |
| ANN | 3.17 | 2441.71 | 0.0420 | 31,250,640.68 |
| RNN | 1.48 | 663.45 | 0.0238 | 1,434,974.44 |
| LSTM | … | … | … | … |
Average runtime per sample for all models.
Tree-based:
| Models | Decision Tree | Bagging | Random Forest | Adaboost | Gradient Boosting | XGBoost |
|---|---|---|---|---|---|---|
| Average runtime per sample (ms) | 0.009 | 1.399 | 1.316 | 1.308 | 1.483 | 2.373 |

ANN-based:
| Models | ANN | RNN | LSTM |
|---|---|---|---|
| Average runtime per sample (ms) | 20.088 | 20.630 | 80.902 |
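Per-sample runtimes of this kind can be measured with a simple wall-clock harness; the sketch below is our construction, not the authors' benchmark, and assumes a fitted model exposing a `predict` method.

```python
import time

def runtime_per_sample_ms(model, X) -> float:
    """Average prediction wall-clock time per sample, in milliseconds."""
    start = time.perf_counter()
    model.predict(X)  # batch prediction over all test samples
    return 1000 * (time.perf_counter() - start) / len(X)
```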
Figure 7. Performance of XGBoost for five-days-ahead prediction of diversified financials.