Pin Lv, Qinjuan Wu, Jia Xu, Yating Shu.
Abstract
The stock index is an important indicator of stock market fluctuation and guides investors' decision-making, which has made it the object of much research. However, the stock market is affected by uncertainty and volatility, making accurate prediction a challenging task. We propose a new stock index forecasting model based on time series decomposition and a hybrid model. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) decomposes the stock index into a series of Intrinsic Mode Functions (IMFs) with different feature scales and a trend term. The Augmented Dickey-Fuller (ADF) test judges the stationarity of each IMF and of the trend term. An Autoregressive Moving Average (ARMA) model is used on the stationary time series, and a Long Short-Term Memory (LSTM) model extracts abstract features of the non-stationary time series. The predictions for the individual series are then recombined to obtain the final predicted value. Experiments are conducted on four stock index time series, and the results show that the predictions of the proposed model are closer to the real values than those of seven reference models, giving it good reference value for quantitative investment.
Keywords: ADF; ARMA; CEEMDAN; LSTM; hybrid model; stock index forecasting
Year: 2022 PMID: 35205442 PMCID: PMC8871263 DOI: 10.3390/e24020146
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Stock market index forecasting model.
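The workflow described in the abstract (decompose, test each component for stationarity, forecast each component with a linear or a neural model, then recombine) can be sketched as a small routing function. This is a minimal illustration, not the authors' implementation: the names `hybrid_forecast`, `adf_pvalue`, `forecast_stationary` and `forecast_nonstationary` are hypothetical, the 0.05 threshold is an assumed significance level, and recombination is assumed to be a plain sum of the component forecasts.

```python
from typing import Callable, Sequence

Series = Sequence[float]


def hybrid_forecast(
    components: Sequence[Series],
    adf_pvalue: Callable[[Series], float],
    forecast_stationary: Callable[[Series], float],
    forecast_nonstationary: Callable[[Series], float],
    alpha: float = 0.05,  # assumed significance level, not taken from the source
) -> float:
    """Forecast one step ahead for each component and sum the results.

    `components` stands for the IMFs plus the trend term produced by a
    decomposition such as CEEMDAN. All callables are placeholders supplied
    by the caller (e.g. an ARMA fit and an LSTM fit).
    """
    total = 0.0
    for component in components:
        if adf_pvalue(component) < alpha:
            # Unit root rejected: treat as stationary, use the linear model.
            total += forecast_stationary(component)
        else:
            # Otherwise treat as non-stationary, use the non-linear model.
            total += forecast_nonstationary(component)
    return total


# Toy usage with stand-in functions (purely illustrative).
components = [[0.1, -0.2, 0.1, -0.1], [10.0, 11.0, 12.0, 13.0]]
fake_pvalue = lambda s: 0.01 if abs(sum(s)) < 1 else 0.9
last_value = lambda s: s[-1]
linear_extrapolation = lambda s: s[-1] + (s[-1] - s[-2])

prediction = hybrid_forecast(
    components, fake_pvalue, last_value, linear_extrapolation
)
```

In a real pipeline the stand-ins would be replaced by an actual ADF test and by fitted ARMA and LSTM models; keeping them as parameters makes the routing logic easy to test on its own.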
Descriptive statistics of closing indices.
| Index | Count | Mean | Max | Min | Standard Deviation | ADF Test |
|---|---|---|---|---|---|---|
| DAX | 3300 | 9118.21 | 13,789.00 | 3666.41 | 2722.52 | 0.79 |
| HSI | 3219 | 23,206.70 | 33,154.12 | 11,015.84 | 3660.60 | 0.11 |
| S&P500 | 3273 | 1915.40 | 3702.25 | 676.53 | 713.03 | 0.99 |
| SSE | 3163 | 2846.43 | 5497.90 | 1706.70 | 586.51 | 0.01 |
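As a reading aid for the ADF Test column, the sketch below classifies each index under two assumptions that the table itself does not state: that the column reports p-values, and that 0.05 is the significance level. The helper name `looks_stationary` is hypothetical.

```python
# Values copied from the "ADF Test" column of the table above.
adf_column = {"DAX": 0.79, "HSI": 0.11, "S&P500": 0.99, "SSE": 0.01}

ALPHA = 0.05  # assumed significance level


def looks_stationary(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when the unit-root null hypothesis would be rejected."""
    return p_value < alpha


verdicts = {name: looks_stationary(p) for name, p in adf_column.items()}
for name, stationary in verdicts.items():
    print(f"{name}: {'stationary' if stationary else 'non-stationary'} at alpha={ALPHA}")
```

If the column instead reports test statistics, the comparison would have to be made against critical values rather than against a significance level.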
Figure 2. Daily closing index series of four financial markets. (a) DAX. (b) HSI. (c) S&P500. (d) SSE.
Figure 3. LSTM network architecture.
Details of the parameters of the CAL model.
| Parameter | Meaning | Value |
|---|---|---|
| Input layer | Number of input layer nodes | 128 |
| Hidden layer 1 | Number of first hidden layer nodes | 64 |
| Hidden layer 2 | Number of second hidden layer nodes | 16 |
| Output layer | Number of output layer nodes | 1 |
| Batch size | Number of samples passed through the network at one time | 128 |
| Optimization algorithm | Optimizer used for training | Adam |
| Loss function | Objective minimized during training | MSE |
| Epochs | Number of training epochs | 200 |
| Timesteps | Input time steps | 10 |
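The Timesteps value of 10 suggests that each training sample consists of 10 consecutive observations. A minimal sketch of that windowing follows, assuming a one-step-ahead target (the table does not state the forecast horizon); the function name `make_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], timesteps: int = 10
) -> Tuple[List[List[float]], List[float]]:
    """Turn a 1-D series into (input window, next value) training pairs."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - timesteps):
        # Window of `timesteps` consecutive values...
        inputs.append(list(series[start:start + timesteps]))
        # ...paired with the value that immediately follows it (assumed target).
        targets.append(series[start + timesteps])
    return inputs, targets


# Toy usage: 15 observations give 5 windows of length 10.
toy_series = [float(i) for i in range(15)]
X, y = make_windows(toy_series, timesteps=10)
```

Each window would then be fed to the network in batches of 128, per the Batch size row.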
Figure 4. SSE decomposition results.
Contrastive experiments.
| Model | Comparison Purpose of Model Settings |
|---|---|
| LSTM | Comparison to single deep learning model |
| ARIMA | Comparison to single linear model |
| GRU | Comparison to other single non-linear model |
| Bi-LSTM | Comparison to improved deep learning model |
| EMD-ARMA-LSTM | Evaluation of CEEMDAN and EMD |
| ARIMA-ANN | Comparison of CAL to hybrid models |
| CEEMDAN-LSTM | Comparison of CAL to stock forecasting model |
Figure 5. SSE comparison of sequence prediction results.
Figure 6. SSE error changes between real and predicted values.
Prediction results of different models in DAX.
| Model | MAE | RMSE | MAPE (%) | R |
|---|---|---|---|---|
| LSTM | 167.0816 | 224.5003 | 1.4006 | 0.9570 |
| ARIMA | 136.0422 | 206.5253 | 1.1633 | 0.9650 |
| GRU | 153.5215 | 216.7465 | 1.2982 | 0.9608 |
| Bi-LSTM | 138.0041 | 209.2315 | 1.1768 | 0.9641 |
| ARIMA-ANN | 140.4099 | 211.9800 | 1.1966 | 0.9630 |
| CEEMDAN-LSTM | 97.2277 | 128.2331 | 0.8106 | 0.9866 |
| EMD-ARMA-LSTM | 127.1255 | 191.0622 | 1.0771 | 0.9687 |
| CAL | 72.3340 | 101.8321 | 0.6099 | 0.9915 |
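For reference, the error measures in the column headers can be computed as in the sketch below. The formulas for MAE, RMSE and MAPE are the standard ones; the column headed R is assumed here to be the coefficient of determination, which this record does not confirm, so `r_squared` would need to be swapped out if it is in fact a correlation coefficient.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data, not taken from the paper.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE (%)": mape(actual, predicted),
    "R^2": r_squared(actual, predicted),
}
```

Lower MAE, RMSE and MAPE and a higher value in the last column indicate a closer fit.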
Prediction results of different models in HSI.
| Model | MAE | RMSE | MAPE (%) | R |
|---|---|---|---|---|
| LSTM | 257.7703 | 347.1944 | 1.0197 | 0.9454 |
| ARIMA | 250.9188 | 345.3399 | 0.995 | 0.9470 |
| GRU | 256.1635 | 345.9382 | 1.0134 | 0.9451 |
| Bi-LSTM | 258.2292 | 353.4523 | 1.0249 | 0.9450 |
| ARIMA-ANN | 249.1046 | 344.5775 | 0.9882 | 0.9469 |
| CEEMDAN-LSTM | 127.0750 | 168.3214 | 0.5023 | 0.9879 |
| EMD-ARMA-LSTM | 181.7516 | 235.1773 | 0.7187 | 0.9751 |
| CAL | 120.8184 | 159.8226 | 0.4789 | 0.9885 |
Prediction results of different models in S&P500.
| Model | MAE | RMSE | MAPE (%) | R |
|---|---|---|---|---|
| LSTM | 33.4958 | 53.4345 | 1.1207 | 0.9595 |
| ARIMA | 34.1031 | 54.8336 | 1.1411 | 0.9598 |
| GRU | 43.3137 | 63.2251 | 1.4416 | 0.9469 |
| Bi-LSTM | 33.5198 | 53.4177 | 1.1262 | 0.9610 |
| ARIMA-ANN | 33.7170 | 53.6489 | 1.125 | 0.9608 |
| CEEMDAN-LSTM | 21.1496 | 30.1187 | 0.6964 | 0.9878 |
| EMD-ARMA-LSTM | 22.1886 | 33.4485 | 0.7334 | 0.9843 |
| CAL | 17.1362 | 26.1373 | 0.5645 | 0.9910 |
Prediction results of different models in SSE.
| Model | MAE | RMSE | MAPE (%) | R |
|---|---|---|---|---|
| LSTM | 38.3486 | 47.9563 | 1.2468 | 0.9475 |
| ARIMA | 25.1019 | 36.9815 | 0.819 | 0.9690 |
| GRU | 31.8217 | 43.1568 | 1.0355 | 0.9599 |
| Bi-LSTM | 31.8026 | 42.7439 | 1.0382 | 0.9596 |
| ARIMA-ANN | 25.6976 | 37.4014 | 0.8383 | 0.9686 |
| CEEMDAN-LSTM | 14.3562 | 19.6741 | 0.4681 | 0.9913 |
| EMD-ARMA-LSTM | 19.5074 | 28.5532 | 0.6382 | 0.9814 |
| CAL | 14.0294 | 19.9246 | 0.459 | 0.9911 |
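To put the four result tables side by side, the sketch below computes the relative MAE reduction of CAL against CEEMDAN-LSTM, the reference model with the lowest MAE on every index. The MAE figures are copied from the tables above; the helper name `relative_reduction` is hypothetical.

```python
# MAE values copied from the four result tables: (CEEMDAN-LSTM, CAL).
mae_by_index = {
    "DAX": (97.2277, 72.3340),
    "HSI": (127.0750, 120.8184),
    "S&P500": (21.1496, 17.1362),
    "SSE": (14.3562, 14.0294),
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


mae_reduction = {
    index: relative_reduction(baseline, cal)
    for index, (baseline, cal) in mae_by_index.items()
}
for index, pct in mae_reduction.items():
    print(f"{index}: CAL lowers MAE by {pct:.2f}% relative to CEEMDAN-LSTM")
```

The margin is widest on DAX and narrowest on SSE, where the tables also show CAL with a slightly higher RMSE (19.9246 against 19.6741) and a slightly lower R (0.9911 against 0.9913) than CEEMDAN-LSTM.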
The regression parameters and diagnostics results.
| Model | Parameter | Estimate | SE | t | p-value |
|---|---|---|---|---|---|
| DAX | Slope | 0.9909 | 0.005 | 196.519 | 0.000 |
| | Intercept | 104.2845 | 62.836 | 1.660 | 0.098 |
| HSI | Slope | 1.0012 | 0.006 | 167.616 | 0.000 |
| | Intercept | −12.2083 | 153.286 | −0.080 | 0.937 |
| S&P500 | Slope | 0.9844 | 0.005 | 192.819 | 0.000 |
| | Intercept | 44.7296 | 16.195 | 2.762 | 0.006 |
| SSE | Slope | 0.9913 | 0.005 | 187.342 | 0.000 |
| | Intercept | 28.5226 | 16.263 | 1.754 | 0.080 |
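In the regression table above, the third numeric value in each row matches the estimate divided by its standard error, which is how a t-statistic is formed. The sketch below checks this for the four rows whose estimates are 104.2845, −12.2083, 44.7296 and 28.5226; reading these as intercept rows is an inference from their magnitudes, since the parameter symbols did not survive in this record. The rows with estimates near 1 are left out because their standard errors are rounded to a single significant digit, which makes the ratio too coarse to reproduce.

```python
# (estimate, standard error, reported third value) copied from the table.
rows = {
    "DAX": (104.2845, 62.836, 1.660),
    "HSI": (-12.2083, 153.286, -0.080),
    "S&P500": (44.7296, 16.195, 2.762),
    "SSE": (28.5226, 16.263, 1.754),
}


def t_statistic(estimate: float, std_error: float) -> float:
    """t-statistic for the null hypothesis that the parameter equals zero."""
    return estimate / std_error


recomputed = {name: t_statistic(est, se) for name, (est, se, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name}: recomputed t = {recomputed[name]:.3f}, reported = {reported:.3f}")
```

The recomputed ratios agree with the reported values to three decimal places, which supports reading the last two columns as a t-statistic and its p-value.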
Figure 7. Linear regression analysis. (a) DAX. (b) HSI. (c) S&P500. (d) SSE.