Weixin Sun, Heli Chen, Feng Liu, Yong Wang.
Abstract
Crude oil is the most important energy source in the world, and fluctuations in oil prices can significantly influence investors, companies, and governments. However, crude oil prices exhibit numerous characteristics, including randomness, sudden structural changes, intrinsic nonlinearity, volatility, and chaotic behavior, which make accurate forecasting of crude oil prices a difficult and challenging task. In this paper, a hybrid prediction model for crude oil futures prices is proposed, the accuracy and robustness of which are demonstrated via controlled experiments and sensitivity analysis. The study uses a new data denoising method for data processing to improve the accuracy and stability of crude oil price predictions. Furthermore, a chaotic time-series prediction method, shallow neural networks, linear prediction models, and deep learning methods are adopted as submodels. Interval forecasts with narrow widths and high prediction accuracy are derived by introducing a confidence interval adjustment coefficient. The simulation experiments indicate that the proposed hybrid prediction model achieves higher accuracy and efficiency, as well as better forecasting robustness, than the control models. In summary, the proposed forecasting framework can derive accurate point and interval forecasts and provides a valuable reference for the price forecasting of crude oil futures.
Keywords: Chaotic time-series prediction method; Crude oil futures price; Hybrid prediction model; Interval forecasts; Multiobjective slime mold algorithm
Year: 2022 PMID: 35755829 PMCID: PMC9211054 DOI: 10.1007/s10479-022-04781-6
Source DB: PubMed Journal: Ann Oper Res ISSN: 0254-5330 Impact factor: 4.820
Summary of available studies
| Type | Subtype | Model | References | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Single model | Linear prediction model | VAR | Mirmirani and Cheng Li | These models have low complexity, fast computational speed, and effective linear time-series prediction ability | Their nonlinear time-series prediction ability is poor |
| | | ARIMA | He et al. | | |
| | | GARCH | Agnolucci | | |
| | Shallow neural networks | MLP | Guotai et al. | These models fit simple functions well and run fast | Their ability to fit complex functions is limited |
| | | BPNN | Mingming and Jinliang | | |
| | | SVM | Guo et al. | | |
| | | ELM | Wang, Du, et al. | | |
| | Deep learning models | DBN | Zhang and Ci | Deep learning can approximate complex functions and is highly adaptive | These models require heavy computation and long running times |
| | | LSTM | Zhang et al. | | |
| | | BiLSTM | Li et al. | | |
| Hybrid model | Single objective | GA | Yang et al. | Fast convergence and global optimization search | Only one objective can be achieved at a time |
| | | PSO | Ribeiro et al. | | |
| | | ALO | Reddy et al. (2018) | | |
| | | FLA | He et al. | | |
| | | WOA | Lin and Zhang | | |
| | Multiple objectives | MOALO | Wang, Du, et al. | Multiple objectives can be achieved simultaneously | A good balance of convergence and diversity has not yet been achieved |
| | | MOWOA | Wang et al. | | |
| | | MOSMA | Li, Chen, et al. | | |
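The last rows of the table contrast single-objective optimizers with multiobjective ones such as MOALO, MOWOA, and MOSMA. What lets a multiobjective optimizer pursue several goals at once is a Pareto-dominance archive of nondominated solutions. The sketch below is a generic Python illustration of that archive-update step, not the paper's MOSMA implementation; the two objectives in the example are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Insert candidate into a nondominated archive, dropping points it dominates."""
    if any(dominates(old, candidate) for old in archive):
        return archive                                  # candidate is dominated: reject
    kept = [old for old in archive if not dominates(candidate, old)]
    kept.append(candidate)
    return kept

# Two objectives, e.g. (forecast error, interval width), both minimized
archive = []
for point in [(1.0, 5.0), (2.0, 3.0), (0.5, 6.0), (1.5, 2.0), (3.0, 4.0)]:
    archive = update_archive(archive, point)
# (2.0, 3.0) is removed once (1.5, 2.0) arrives, and (3.0, 4.0) is rejected,
# leaving three mutually nondominated points in the archive
```

A real MOSMA additionally bounds the archive size (the paper uses 100) and uses crowding information to decide which nondominated points to discard.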
Fig. 1 Framework of the proposed TVF_EMD_MOSMA
Detailed description of the studied data (data from https://cn.investing.com/commodities/crude-oil-historical-data)
| Type | Dataset | Date | Number | Maximum ($) | Minimum ($) | Mean ($) | Std ($) | Kurtosis | Skewness |
|---|---|---|---|---|---|---|---|---|---|
| WTI | All samples | 2010/01/04–2021/09/30 | 3053 | 113.9300 | −37.6300 | 68.8658 | 22.4751 | 1.9931 | 0.1061 |
| | Training samples | 2010/01/04–2020/09/30 | 2790 | 113.9300 | −37.6300 | 69.7469 | 23.0555 | 1.9098 | 0.0329 |
| | Testing samples | 2020/10/01–2021/09/30 | 263 | 76.2500 | 35.7900 | 59.5181 | 11.3902 | 2.0058 | −0.5233 |
| Brent | All samples | 2010/01/04–2021/09/30 | 3034 | 126.6500 | 19.3300 | 76.1399 | 26.1419 | 1.7798 | 0.2194 |
| | Training samples | 2010/01/04–2020/09/30 | 2776 | 126.6500 | 19.3300 | 77.4311 | 26.7414 | 1.6886 | 0.1232 |
| | Testing samples | 2020/10/01–2021/09/30 | 258 | 79.5300 | 37.4600 | 62.2462 | 11.4875 | 2.0842 | −0.6089 |
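The statistics in the table can be reproduced from raw prices with central moments. The helper below is a plain-Python sketch; the paper does not state its estimator conventions, so population moments and non-excess kurtosis (which is 3 for a normal sample and matches the near-2 values reported) are assumed, and the five prices in the example are made up.

```python
import math

def describe(prices):
    """Max/min/mean/std plus skewness and (raw) kurtosis, as in the data table."""
    n = len(prices)
    mean = sum(prices) / n
    m2 = sum((p - mean) ** 2 for p in prices) / n   # population variance
    m3 = sum((p - mean) ** 3 for p in prices) / n
    m4 = sum((p - mean) ** 4 for p in prices) / n
    return {
        "Maximum": max(prices), "Minimum": min(prices),
        "Mean": mean, "Std": math.sqrt(m2),
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2,                   # equals 3.0 for a normal sample
    }

stats = describe([35.79, 48.52, 59.52, 66.10, 76.25])  # illustrative prices only
```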
Description of the metrics used by the model
| Type | Metric | Full name | Calculation formula |
|---|---|---|---|
| PF | MAPE | Mean absolute percentage error | $\frac{1}{n}\sum_{t=1}^{n}\lvert (y_t-\hat{y}_t)/y_t\rvert\times 100\%$ |
| | RMSE | Root mean square error | $\sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t-\hat{y}_t)^2}$ |
| | MAE | Mean absolute error | $\frac{1}{n}\sum_{t=1}^{n}\lvert y_t-\hat{y}_t\rvert$ |
| | MdAPE | Median absolute percentage error | $\operatorname{median}\big(\lvert (y_t-\hat{y}_t)/y_t\rvert\times 100\%\big)$ |
| IF | FICP | Prediction interval coverage probability | $\frac{1}{n}\sum_{t=1}^{n}c_t$, $c_t=\mathbf{1}(L_t\le y_t\le U_t)$ |
| | AIS | Average interval score | $\frac{1}{n}\sum_{t=1}^{n}s_t$, $s_t=-2\alpha(U_t-L_t)-4(L_t-y_t)\mathbf{1}(y_t<L_t)-4(y_t-U_t)\mathbf{1}(y_t>U_t)$ |
| | FINAW | Prediction interval normalized average width | $\frac{1}{nR}\sum_{t=1}^{n}(U_t-L_t)$, $R=\max(y)-\min(y)$ |
$y_t$ and $\hat{y}_t$ denote the actual and predicted COFPs, respectively, and $L_t$ and $U_t$ are the lower and upper bounds of the confidence interval at the confidence level $1-\alpha$. MAPE and MdAPE reflect the predictive accuracy of the model (the smaller the value, the higher the accuracy); RMSE and MAE reflect its predictive robustness (the smaller the value, the higher the robustness); FICP reflects the accuracy of the IF (the larger the value, the higher the accuracy); FINAW measures the width of the prediction interval (the smaller the value, the higher the accuracy); and AIS is a composite indicator of the accuracy and stability of the interval forecast (the larger the value, the higher the accuracy).
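As a concrete illustration of these definitions, the sketch below computes the four PF metrics and the three IF metrics in plain Python. The exact AIS formula was not recoverable from this record, so the widely used negatively oriented interval score (a width penalty plus coverage penalties, averaged) is assumed; the sample data are made up.

```python
import math
import statistics

def point_metrics(y, yhat):
    """MAPE/MdAPE (accuracy) and MAE/RMSE (robustness); smaller is better."""
    ape = [abs(a - f) / abs(a) * 100 for a, f in zip(y, yhat)]
    err = [a - f for a, f in zip(y, yhat)]
    return {
        "MAPE": sum(ape) / len(ape),
        "MdAPE": statistics.median(ape),
        "MAE": sum(abs(e) for e in err) / len(err),
        "RMSE": math.sqrt(sum(e * e for e in err) / len(err)),
    }

def interval_metrics(y, lower, upper, alpha=0.05):
    """FICP (coverage), FINAW (normalized width) and a negatively oriented AIS."""
    n = len(y)
    r = max(y) - min(y)                       # normalization range for FINAW
    cover = sum(l <= a <= u for a, l, u in zip(y, lower, upper))
    score = 0.0
    for a, l, u in zip(y, lower, upper):
        s = -2 * alpha * (u - l)              # width penalty
        if a < l:
            s -= 4 * (l - a)                  # penalty for missing below
        elif a > u:
            s -= 4 * (a - u)                  # penalty for missing above
        score += s
    width = sum(u - l for l, u in zip(lower, upper))
    return {"FICP": cover / n, "FINAW": width / (n * r), "AIS": score / n}

pm = point_metrics([100, 102, 98], [101, 100, 99])
im = interval_metrics([100, 102, 98], [99, 101, 99], [103, 104, 101])
```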
Parameter settings of the model and optimization algorithm
| Method | Parameter | Symbol | Value | Reason |
|---|---|---|---|---|
| TVF_EMD | Bandwidth threshold | – | 0.08 | Common value |
| | Max number of IMFs | – | 50 | Preset |
| | B-spline order | – | 26 | Common value |
| BPNN | Number of hidden layer nodes | – | 10, 15, 10 | Trial and error |
| | Number of iterations | – | 10,000 | Trial and error |
| | Learning rate | – | 0.1 | Trial and error |
| | Training precision requirement | – | 0.0001 | Preset |
| LSTM | Number of hidden layer nodes | – | 50 | Trial and error |
| | Number of iterations | – | 1000 | Trial and error |
| | Learning rate | – | 0.001 | Trial and error |
| Volterra | Time delay | – | 21 (WTI), 14 (Brent) | The MI method |
| | Embedding dimension | – | 10 (WTI), 11 (Brent) | The FNN method |
| | Order | – | 3 | Common value |
| ELM | Number of iterations | – | 10,000 | Preset |
| | Number of hidden layer nodes | – | 5 | Trial and error |
| BiLSTM | Variable learning rate | – | 0.001 (0.5) | Trial and error |
| | Number of iterations | – | 3000 | Trial and error |
| | Number of hidden layer nodes | – | 100 | Trial and error |
| FLA, SMA | Number of iterations | – | 100 | Trial and error |
| | Population size | – | 200 | Trial and error |
| MOALO, MOWOA, MOSMA | Population size | – | 200 | Trial and error |
| | Archive size | – | 100 | Trial and error |
| | Number of iterations | – | 100 | Trial and error |
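The Volterra submodel's time delay is chosen by the MI method, i.e., the first local minimum of the mutual information between the series and a lagged copy of itself. The sketch below is a minimal histogram-based version; the bin count and the demo sine series are illustrative assumptions, not the paper's settings.

```python
import math

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information I(x; y) in nats."""
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    w = (hi - lo) / bins or 1.0
    idx = lambda v: min(int((v - lo) / w), bins - 1)
    n = len(x)
    pxy, px, py = {}, {}, {}
    for a, b in zip(x, y):
        i, j = idx(a), idx(b)
        pxy[i, j] = pxy.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    # sum p(i,j) * log( p(i,j) / (p(i) p(j)) ), with p = count / n
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def first_minimum_delay(series, max_lag=30):
    """Pick the first local minimum of I(x_t; x_{t+tau}) as the embedding delay."""
    mi = [mutual_information(series[:-t], series[t:]) for t in range(1, max_lag + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] < mi[t + 1]:
            return t + 1                      # mi[0] corresponds to lag 1
    return len(mi)

series = [math.sin(0.3 * i) for i in range(200)]  # toy series, not COFP data
delay = first_minimum_delay(series)
```

The embedding dimension in the same table would then be picked by the false nearest neighbor (FNN) method, which is omitted here.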
Fig. 2 Results of Experiment I
PF performance of TVF_EMD_MOSMA and benchmark models
| Model | MAPE (WCOFP) | MAPE (BCOFP) | MdAPE (WCOFP) | MdAPE (BCOFP) | MAE (WCOFP) | MAE (BCOFP) | RMSE (WCOFP) | RMSE (BCOFP) |
|---|---|---|---|---|---|---|---|---|
| Volterra | 1.9240 | 2.0442 | 1.7847 | 1.7114 | 1.3266 | 1.4964 | 1.5559 | 1.8056 |
| BPNN | 1.6181 | 1.5095 | 1.3078 | 1.1053 | 0.9470 | 0.9187 | 1.2434 | 1.2259 |
| LSTM | 2.2360 | 2.6364 | 1.6559 | 1.8858 | 1.2266 | 1.4416 | 1.6239 | 2.1953 |
| BiLSTM | 1.9095 | 1.5275 | 1.6279 | 1.1342 | 1.1372 | 0.9336 | 1.4324 | 1.2803 |
| ARIMA | 2.1412 | 1.5358 | 1.6513 | 1.2029 | 1.2295 | 0.9259 | 1.5927 | 1.2467 |
| ELM | 1.6058 | 1.5823 | 1.3558 | 1.2901 | 0.9392 | 0.9756 | 1.2365 | 1.2750 |
| TVF_EMD_Volterra | 1.2246 | 1.2122 | 0.8981 | 0.8980 | 0.8654 | 0.8565 | 1.1264 | 1.1004 |
| TVF_EMD_BiLSTM | 1.2033 | 1.1705 | 0.9863 | 0.8619 | 0.7063 | 0.7225 | 0.9160 | 0.9825 |
| TVF_EMD_ARIMA | 1.0259 | 0.9916 | 0.7460 | 0.7342 | 0.5951 | 0.6039 | 0.8046 | 0.8117 |
| TVF_EMD_ELM | 0.9317 | 1.0481 | 0.7231 | 0.9022 | 0.5431 | 0.6413 | 0.7114 | 0.8209 |
Bold is used in the table to highlight the results of the TVF_EMD_MOSMA model proposed in this paper
The PF performance of TVF_EMD_MOSMA and the control models
| Model | MAPE (WCOFP) | MAPE (BCOFP) | MdAPE (WCOFP) | MdAPE (BCOFP) | MAE (WCOFP) | MAE (BCOFP) | RMSE (WCOFP) | RMSE (BCOFP) |
|---|---|---|---|---|---|---|---|---|
| EMD_MOSMA | 1.0895 | 1.4658 | 0.7828 | 1.1880 | 0.7730 | 1.0298 | 0.9996 | 1.2920 |
| REMD_MOSMA | 1.1222 | 1.1112 | 0.8565 | 0.9461 | 0.7701 | 0.7985 | 0.9789 | 1.0035 |
| TVF_EMD_FLA | 1.0441 | 1.4731 | 0.9211 | 1.0938 | 0.7351 | 1.0483 | 0.9102 | 1.5689 |
| TVF_EMD_SMA | 0.8556 | 1.2238 | 0.7289 | 0.9758 | 0.6005 | 0.8636 | 0.8229 | 1.1538 |
| TVF_EMD_MOALO | 0.8476 | 0.8401 | 0.7517 | 0.6798 | 0.5952 | 0.5958 | 0.7435 | 0.7526 |
| TVF_EMD_MOWOA | 1.0890 | 1.4657 | 0.7823 | 1.1819 | 0.7728 | 1.0295 | 0.9990 | 1.2919 |
Bold is used in the table to highlight the results of the TVF_EMD_MOSMA model proposed in this paper
Fig. 3 Results of Experiment II
Results of fitting the distribution functions of WCOFP_E and BCOFP_E
| Data | Metric | Gumbel | GEV | Gamma |
|---|---|---|---|---|
| WCOFP_E | RMSE | 0.2910 | 0.7220 | 0.2510 |
| | $R^2$ | 0.9740 | 0.9250 | 0.9770 |
| BCOFP_E | RMSE | 0.3140 | 1.1060 | 0.1240 |
| | $R^2$ | 0.9630 | 0.8430 | 0.9870 |
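The table scores candidate error distributions (Gumbel, GEV, Gamma) by how closely each fitted CDF tracks the empirical CDF of the prediction errors. The sketch below does this for the Gumbel case. It uses closed-form moment estimators for brevity, whereas the paper lists MLE among its abbreviations and presumably fits by maximum likelihood; the plotting positions are also an assumption.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_cdf(x, mu, beta):
    """CDF of the Gumbel (type-I extreme value) distribution."""
    return math.exp(-math.exp(-(x - mu) / beta))

def fit_gumbel_moments(errors):
    """Moment estimators for Gumbel location/scale (the paper likely uses MLE)."""
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def cdf_fit_rmse(errors, mu, beta):
    """RMSE between the empirical CDF (plotting positions) and the fitted CDF."""
    xs = sorted(errors)
    n = len(xs)
    sq = [(gumbel_cdf(x, mu, beta) - (i + 1) / (n + 1)) ** 2
          for i, x in enumerate(xs)]
    return math.sqrt(sum(sq) / n)
```

A lower CDF-fit RMSE (and higher $R^2$) indicates a better-fitting distribution, which is how the table selects Gamma for both error series.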
IF performance of the proposed TVF_EMD_MOSMA
| Data | Confidence level (%) | AIS | FICP | FINAW |
|---|---|---|---|---|
| BCOFP | 90 | −0.5943 | 0.9008 | 0.1423 |
| | 95 | −0.3441 | 0.9542 | 0.1676 |
| | 99 | −0.0944 | 0.9924 | 0.2706 |
| WCOFP | 90 | −0.6363 | 0.9043 | 0.1638 |
| | 95 | −0.4012 | 0.9468 | 0.1718 |
| | 99 | −0.0868 | 0.9894 | 0.3112 |
Results of the sensitivity analysis
| Data | Pattern | | | | |
|---|---|---|---|---|---|
| WCOFP | | 0.0004 | 0.0025 | 0.0002 | 0.0001 |
| | | 0.0004 | 0.0059 | 0.0002 | 0.0001 |
| BCOFP | | 0.0002 | 0.0022 | 0.0001 | 0.0004 |
| | | 0.0001 | 0.0008 | 0.0001 | 0.0000 |
Results of the two comparison metrics
| Model | WCOFP | WCOFP | BCOFP | BCOFP |
|---|---|---|---|---|
| Volterra | 60.8226 | 60.0070 | 59.4903 | 60.8264 |
| BPNN | 53.4161 | 43.9726 | 45.1408 | 36.1938 |
| LSTM | 66.2892 | 56.7463 | 68.5897 | 59.3378 |
| BiLSTM | 60.5251 | 53.3454 | 45.7872 | 37.2133 |
| ARIMA | 64.7967 | 56.8469 | 46.0802 | 36.6911 |
| ELM | 53.0586 | 43.5107 | 47.6648 | 39.9139 |
| TVF_EMD_Volterra | 38.4473 | 38.6899 | 31.6862 | 31.5614 |
| TVF_EMD_BiLSTM | 37.3578 | 24.8855 | 29.2525 | 18.8610 |
| TVF_EMD_ARIMA | 26.5256 | 10.8402 | 16.4859 | 2.9233 |
| TVF_EMD_ELM | 19.0994 | 2.3140 | 20.9895 | 8.5942 |
| EMD_MOSMA | 30.8147 | 31.3629 | 43.5053 | 43.0763 |
| REMD_MOSMA | 32.8307 | 31.1044 | 25.4770 | 26.5874 |
| TVF_EMD_FLA | 27.8064 | 27.8241 | 43.7852 | 44.0809 |
| TVF_EMD_SMA | 11.9011 | 11.6462 | 32.3337 | 32.1214 |
| TVF_EMD_MOALO | 11.0696 | 10.8594 | 1.4284 | 1.6113 |
| TVF_EMD_MOWOA | 30.7829 | 31.3451 | 43.5014 | 43.0597 |
| Average | 39.0965 | 33.4563 | 37.5749 | 32.6658 |
Evaluation results of multiple models’ forecasting efficiency
| Model | WCOFP | WCOFP | BCOFP | BCOFP |
|---|---|---|---|---|
| TVF_EMD_MOALO | 0.9915 | 0.9850 | 0.9916 | 0.9851 |
| TVF_EMD_SMA | 0.9914 | 0.9832 | 0.9878 | 0.9768 |
| TVF_EMD_ELM | 0.9907 | 0.9825 | 0.9908 | 0.9827 |
| TVF_EMD_ARIMA | 0.9898 | 0.9804 | 0.9901 | 0.9812 |
| TVF_EMD_FLA | 0.9896 | 0.9820 | 0.9853 | 0.9696 |
| TVF_EMD_MOWOA | 0.9891 | 0.9803 | 0.9853 | 0.9741 |
| EMD_MOSMA | 0.9891 | 0.9802 | 0.9853 | 0.9741 |
| REMD_MOSMA | 0.9888 | 0.9799 | 0.9889 | 0.9805 |
| TVF_EMD_BiLSTM | 0.9880 | 0.9779 | 0.9883 | 0.9776 |
| TVF_EMD_Volterra | 0.9878 | 0.9776 | 0.9879 | 0.9782 |
| ELM | 0.9839 | 0.9704 | 0.9842 | 0.9710 |
| BPNN | 0.9838 | 0.9704 | 0.9849 | 0.9718 |
| BiLSTM | 0.9809 | 0.9667 | 0.9847 | 0.9704 |
| Volterra | 0.9808 | 0.9688 | 0.9796 | 0.9661 |
| ARIMA | 0.9786 | 0.9610 | 0.9846 | 0.9710 |
| LSTM | 0.9776 | 0.9559 | 0.9736 | 0.9367 |
List of abbreviations
| Abbreviation | Full name |
|---|---|
| AIS | Average interval score |
| ALO | Ant lion optimization algorithm |
| ARIMA | Autoregressive integrated moving average model |
| BiLSTM | Bidirectional long short-term memory networks |
| BCOFP | Brent crude oil futures price |
| BPNN | Back propagation neural network |
| COFP | Crude oil futures price |
| COFP_E | COFP prediction error series |
| COP | Crude oil price |
| CIAC | Confidence interval adjustment coefficient |
| DBN | Deep belief network |
| EMD | Empirical mode decomposition |
| ELM | Extreme learning machine |
| FICP | Prediction interval coverage probability |
| FLA | Frog leaping algorithm |
| FINAW | Prediction interval normalized average width |
| FNN | False nearest neighbor method |
| GARCH | Generalized autoregressive conditional heteroskedasticity model |
| GEV | Generalized extreme value |
| IF | Interval forecasts |
| LSTM | Long short-term memory network |
| MdAPE | Median absolute percentage error |
| MAE | Mean absolute error |
| MLE | Maximum likelihood estimate |
| MAPE | Mean absolute percentage error |
| MI | Mutual information method |
| ML | Machine learning model |
| MLP | Multi-Layer Perceptron |
| MOALO | Multiobjective ant lion optimization algorithm |
| MOSMA | Multiobjective slime mold algorithm |
| MOWOA | Multiobjective whale optimization algorithm |
| PF | Point forecasts |
| PSO | Particle swarm optimization |
| RMSE | Root mean square error |
| REMD | Recursive empirical mode decomposition |
| GA | Genetic algorithm |
| SMA | Slime mold algorithm |
| SVM | Support vector machine |
| TVF_EMD | Time varying filtering for empirical mode decomposition |
| VAR | Vector autoregression model |
| WCOFP | West Texas Intermediate crude oil futures price |
| WOA | Whale optimization algorithm |