Kais Tissaoui, Taha Zaghdoudi, Abdelaziz Hakimi, Mariem Nsaibi
Abstract
This study examines the forecasting power of the gas price and uncertainty indices for crude oil prices. The complex characteristics of crude oil prices, such as nonlinearity, time variation, and non-stationarity, motivate the use of a recently proposed machine-learning tool, eXtreme Gradient Boosting (XGBoost). This tool is benchmarked against SVM and ARIMAX(p,d,q) models to assess the complex relationships between crude oil prices and their predictors. Empirical evidence shows that machine-learning models, such as SVM and XGBoost, outperform traditional models such as ARIMAX in providing accurate forecasts of crude oil prices. Performance assessment reveals that the XGBoost model displays superior prediction capacity over the SVM model in terms of accuracy and convergence. The superior performance of XGBoost is due to its lower complexity and cost, high accuracy, and rapid processing time. The feature importance analysis conducted with the Shapley additive explanations (SHAP) method highlights that the different uncertainty indexes and the gas price display a significant ability to forecast future WTI crude prices. Additionally, the SHAP values suggest that the oil implied volatility captures valuable forecasting information from gas prices and other uncertainty indices that affect the WTI crude oil price.
Keywords: Complex relationship; Crude oil price; Gas price; Shapley additive explanation method; Uncertainty indexes; eXtreme Gradient Boosting
Year: 2022 PMID: 36157277 PMCID: PMC9483467 DOI: 10.1007/s10614-022-10305-y
Source DB: PubMed Journal: Comput Econ ISSN: 0927-7099 Impact factor: 1.741
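The evaluation protocol the paper describes — predicting WTI crude prices from the gas price and uncertainty indices, then scoring out-of-sample forecasts by RMSE and MAE — can be sketched as follows. This is a minimal illustration with synthetic data and an ordinary least-squares stand-in model; the paper's actual ARIMAX, SVM, and XGBoost configurations are not reproduced here, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the paper's predictors: gas price, OVX, VIX,
# and EPU (real data would come from market and uncertainty-index sources).
X = rng.normal(size=(n, 4))
# Synthetic target standing in for the (log) WTI crude price.
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=n)

# Chronological train/test split: never shuffle a time series before scoring.
split = int(0.8 * n)
X_tr, X_te = X[:split], X[split:]
y_tr, y_te = y[:split], y[split:]

# Illustrative least-squares fit; the paper fits ARIMAX, SVM and XGBoost.
coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
pred = X_te @ coef

rmse = float(np.sqrt(np.mean((y_te - pred) ** 2)))
mae = float(np.mean(np.abs(y_te - pred)))
```

Any candidate model can be dropped into this skeleton; only the fit/predict step changes, while the chronological split and the two error metrics stay fixed.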
Contemporary studies on crude oil price prediction
| References | Sample period | Method(s) | Main results |
|---|---|---|---|
| Huang and Deng | January 1994–July 2018: daily and monthly frequency | Variational mode decomposition (VMD); long short-term memory (LSTM) network; ARIMA model; genetic algorithm-optimized SVM | The prediction results show the superiority of the variational mode decomposition (VMD) and the long short-term memory (LSTM) network over the ARIMA model and the SVM optimized by the genetic algorithm |
| Abdollahi | June 2015–4 April 2016: daily frequency | Hybrid model; complete ensemble empirical mode decomposition; support vector machine; particle swarm optimization; Markov-switching generalized autoregressive conditional heteroskedasticity | The results show the predominance of the proposed hybrid model over its counterparts in terms of goodness of fit |
| Hao et al. | January 1986–May 2018: monthly frequency | Regression models with regularization constraints | Findings show that the proposed models generate accurate forecasts of the crude oil price |
| Rubaszek | January 1984–March 2018: quarterly frequency | Dynamic stochastic general equilibrium; vector autoregression; random walk models | The dynamic stochastic general equilibrium model yields more accurate forecasts than the vector autoregression and random walk models |
| Liu et al. | January 2004–December 2018: monthly frequency | Hierarchical shrinkage model; autoregression models; multivariate models | The performance measures show the dominance of the hierarchical shrinkage model over the competing models in terms of prediction accuracy |
| Wu et al. | January 1986–February 2018: daily frequency | ICEEMDAN-SCA-RVFL (improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), sine cosine algorithm (SCA), and random vector functional link (RVFL) neural network); back-propagation neural network; ARIMA; ensemble empirical mode decomposition | The results indicate that forecast accuracy is higher with ICEEMDAN-SCA-RVFL than with the back-propagation neural network, ARIMA, and the ensemble empirical mode decomposition |
| Ben Jabeur et al. | January 2010–April 2020: daily frequency | LightGBM, CatBoost, XGBoost, random forest (RF), and neural network models | The main results confirm the superiority of RF and LightGBM over traditional models |
| Li et al. | January 2010–December 2019: daily frequency | Variational mode decomposition (VMD) and random sparse Bayesian learning (RSBL, SBL-based prediction with random lags and random samples) | The proposed VMD-RSBL system is significantly more efficient than many state-of-the-art systems |
| Ahmad et al. | February 1989–October 2019: daily frequency | Median ensemble empirical mode decomposition and group method of data handling | The performance measures show that the new hybrid model (median ensemble empirical mode decomposition with the group method of data handling) outperforms traditional models such as empirical mode decomposition, artificial neural network, and ARIMA models |
Descriptive statistics
| | Mean | Median | Max | Min | SD | Skewness | Kurtosis | Jarque–Bera | Observations |
|---|---|---|---|---|---|---|---|---|---|
| WTI | 1.8269 | 1.8387 | 2.16229 | 0.94987 | 0.1555 | − 0.5600 | 3.510 | 226.2934 | 3584 |
| GP | 0.5412 | 0.5145 | 1.37767 | 0.12385 | 0.1767 | 0.79062 | 3.818 | 473.3466 | 3584 |
| OVX | 1.5481 | 1.5320 | 2.51208 | 1.16136 | 0.1591 | 0.83223 | 5.531 | 1370.591 | 3584 |
| VIX | 1.2683 | 1.2422 | 1.91745 | 0.96094 | 0.1670 | 0.88404 | 3.821 | 567.6611 | 3584 |
| EPU | 1.9978 | 1.9880 | 2.90722 | 0.52113 | 0.2720 | 0.03562 | 3.479 | 35.14986 | 3584 |
The table reports descriptive statistics, including the mean, median, standard deviation (SD), skewness, kurtosis, minimum (Min.), maximum (Max.), Jarque–Bera (JB) statistic, and number of observations (Obs.) of the daily innovations of the gas price and uncertainty indexes. The sample period extends from May 10, 2007 to August 09, 2021
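The statistics in the table can be reproduced for any series with a few lines of NumPy. The Jarque–Bera statistic used here is JB = n/6 · (S² + (K − 3)²/4), where S is skewness and K is (non-excess) kurtosis, so JB is near zero for normal data and grows with departures from normality. The sample below is synthetic, for illustration only.

```python
import numpy as np

def describe(x):
    """Descriptive statistics matching the table's columns."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x.mean()
    sd = x.std(ddof=0)        # population SD, as in the usual JB definition
    z = (x - m) / sd
    skew = float(np.mean(z ** 3))
    kurt = float(np.mean(z ** 4))   # non-excess kurtosis; normal data give 3
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return {"Mean": m, "Median": float(np.median(x)), "Max": x.max(),
            "Min": x.min(), "SD": sd, "Skewness": skew,
            "Kurtosis": kurt, "Jarque-Bera": jb, "Observations": n}

rng = np.random.default_rng(1)
stats = describe(rng.normal(size=3584))   # same sample size as the table
```

For a normal sample this gives skewness near 0, kurtosis near 3, and a small JB value; the large JB values in the table are thus evidence against normality of the series.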
Fig. 1 Correlation matrix
Variance inflation factor (VIF)
| Variables | VIF |
|---|---|
| lovx | 1.492684 |
| lvix | 1.019923 |
| lepu | 1.012898 |
| lgp | 1.570897 |
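The VIF for predictor j is 1/(1 − R²ⱼ), where R²ⱼ comes from regressing predictor j on the remaining predictors (with an intercept). A minimal NumPy version, run on synthetic data with mild collinearity injected to mimic the lgp/lovx values in the table:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    n, k = X.shape
    out = []
    for j in range(k):
        yj = X[:, j]
        # Regress column j on the other columns plus an intercept.
        Z = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        beta, *_ = np.linalg.lstsq(Z, yj, rcond=None)
        resid = yj - Z @ beta
        r2 = 1.0 - resid.var() / yj.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))   # near-independent columns => VIF near 1
X[:, 3] += 0.7 * X[:, 0]         # induce mild collinearity in one column

v = vif(X)
```

With all VIFs well below the conventional threshold of 5 (or 10), the table indicates no serious multicollinearity among the predictors.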
Fig. 2 Autocorrelation functions of the time series. a Autocorrelation of WTI. b Autocorrelation of OVX. c Autocorrelation of VIX. d Autocorrelation of EPU. e Autocorrelation of gas
Fig. 3 Autocorrelation functions of the first-differenced time series. a Autocorrelation of WTI. b Autocorrelation of OVX. c Autocorrelation of VIX. d Autocorrelation of EPU. e Autocorrelation of gas
Fig. 4 Plot of the crude oil price forecast
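The pattern contrasted in Figs. 2 and 3 — slowly decaying autocorrelations in levels that vanish after first differencing — is the standard signature of a unit root, and motivates the "d" in ARIMAX(p,d,q). It can be checked with a simple sample-ACF computation; a synthetic random walk stands in for the price level here.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(size=3584))  # random walk, like a log-price level
diff = np.diff(level)                      # first difference restores stationarity

acf_level = acf(level, 1)   # close to 1: strong persistence in levels
acf_diff = acf(diff, 1)     # close to 0: differenced series is white-noise-like
```

A lag-1 autocorrelation near 1 in levels and near 0 after differencing is exactly what the figures show for WTI and its predictors.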
Prediction assessment of the candidate models
| Models | RMSE | MAE |
|---|---|---|
| ARIMAX(p,d,q) | 0.2066 | 0.1599 |
| SVM | 0.1001 | 0.0768 |
| XGBoost | 0.0581 | 0.0392 |
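The two metrics in the table are RMSE = √(mean((y − ŷ)²)) and MAE = mean(|y − ŷ|); MAE never exceeds RMSE, and a growing gap between them signals a few large errors. A minimal implementation, with illustrative numbers that are not the paper's:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error: penalizes large errors quadratically."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def mae(y, yhat):
    """Mean absolute error: weights all errors linearly."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

# Hypothetical log-price forecasts, for illustration only.
y_true = [1.80, 1.85, 1.90]
y_pred = [1.78, 1.88, 1.86]
```

On both metrics, lower is better, which is why the table ranks XGBoost (0.0581 / 0.0392) ahead of SVM and ARIMAX.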
Fig. 5 Residual convergence
Fig. 6 SVM model: feature importance
Fig. 7 XGBoost model: feature importance
Fig. 8 SVM model: RMSE loss after permutations
Fig. 9 XGBoost model: RMSE loss after permutations
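Figs. 8–9 report permutation importance: the increase in RMSE when one feature at a time is shuffled, breaking its link to the target while leaving the fitted model untouched. A generic sketch of the idea, with a least-squares model and synthetic data as stand-ins for the paper's fitted SVM and XGBoost models:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 4))   # stand-ins for the GP, OVX, VIX, EPU features
# Synthetic target: feature 0 matters most, feature 1 a little, 2 and 3 not at all.
y = 1.0 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * rng.normal(size=n)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(M):
    return M @ coef

base = np.sqrt(np.mean((y - predict(X)) ** 2))

def rmse_loss_after_permutation(j):
    """RMSE increase when feature j is shuffled; larger = more important."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return float(np.sqrt(np.mean((y - predict(Xp)) ** 2)) - base)

losses = [rmse_loss_after_permutation(j) for j in range(4)]
```

Features whose permutation barely moves the RMSE contribute little to the forecast; in the paper's figures, this ranking is what identifies OVX and the gas price as the most informative predictors of WTI.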