Binrong Wu, Lin Wang, Sirui Wang, Yu-Rong Zeng.
Abstract
Accurate oil market forecasting plays an important role in the theory and application of oil supply chain management for profit maximization and risk minimization. However, the coronavirus disease 2019 (COVID-19) pandemic has compelled governments worldwide to impose restrictions that shut down most social and economic activities. These closures have increased oil market volatility and pose a major challenge to oil market forecasting. Fortunately, social media information can closely reflect oil market factors and exogenous factors, such as conflicts and political instability. Accordingly, this study collected a large volume of online oil news and used a convolutional neural network to extract relevant information automatically. Oil markets are divided into four categories: oil price, oil production, oil consumption, and oil inventory. A total of 16,794; 9,139; 8,314; and 8,548 news headlines were collected for the four respective cases. Experimental results indicate that social media information contributes to the forecasting of oil price, oil production, and oil consumption. The mean absolute percentage errors are 0.0717, 0.0144, and 0.0168 for oil price, production, and consumption prediction, respectively, during the COVID-19 pandemic. Marketers must consider the impact of social media information on the oil or similar markets, especially during the COVID-19 outbreak.
Keywords: COVID-19 pandemic; Deep learning; Social media information; Text mining; Time series forecasting
Year: 2021 PMID: 34629690 PMCID: PMC8486164 DOI: 10.1016/j.energy.2021.120403
Source DB: PubMed Journal: Energy (Oxf) ISSN: 0360-5442 Impact factor: 7.147
Fig. 1. System design of this study.
Summary of recent studies for oil market forecasting.
| Classification | Methods | Influencing factors | Forecasting object or areas |
|---|---|---|---|
| Oil price forecasting | Vector Trend Forecasting Method (VTFM) [ | Historical oil price data | Brent crude oil spot price |
| | A semi-heterogeneous approach [ | Historical oil price data | West Texas Intermediate (WTI) crude oil price |
| | Intelligent model search engine [ | CPI, IPI, USI, BTI, CU, LBR, SP, JPU, and CHU | Brent oil price |
| | Convolutional neural network and Latent Dirichlet Allocation (LDA) topic model [ | Online oil news, financial market data, and oil price data | West Texas Intermediate (WTI) crude oil |
| | Convolutional neural network with Variational mode decomposition [ | Google Trends and online media information | West Texas Intermediate (WTI) crude oil price |
| Oil production forecasting | Combining the nonlinear metabolism grey model with Auto-Regressive Integrated Moving Average (ARIMA) [ | Historical production data | U.S. shale oil production |
| | Ensemble empirical mode decomposition with Long Short-Term Memory [ | Historical production data | Two actual oilfields from China |
| Oil consumption forecasting | AdaBoost ensemble technology [ | Historical consumption data | Oil consumption of China |
| | GM (1,1) model [ | Historical consumption data | Global oil consumption |
| | NMGM (1, 1, α) [ | Historical consumption data | Oil consumption of China |
| | LogR, DT, BPNN, and SVM [ | Google Trends and historical consumption data | Global oil consumption |
| Oil inventory forecasting | – | – | – |
Note: Studies rarely investigate the implementation of oil inventory prediction.
Fig. 2. Structure of the text CNN.
Fig. 3. Structure of BPNN.
Fig. 4. Time series of U.S. monthly oil price, production, consumption, and stocks.
Fig. 5. Monthly oil price, production, consumption, and inventory in 2018, 2019, and 2020.
Fig. 6. Train, validation, and test period of oil market prediction model.
Number of news using different keywords.
| Type | Keywords | The number of news in CNN train period | The number of news in CNN test period | Total numbers |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| | Crude oil price | 2674 | 5006 | 7680 |
| | WTI | Not enough news | – | 860 |
| Domestic news | American oil | 622 | 839 | 1461 |
Fig. 7. Word cloud in the news corpus with keywords “American oil”.
Fig. 8. Word cloud in the news corpus with keywords “Crude oil”.
Fig. 9. Word cloud in the news corpus with keywords “Crude oil price”.
CNN classification results of different datasets.
| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | | 0.70 | 0.58 | 0.63 |
| Crude oil price | 0.61 | 0.65 | 0.37 | 0.47 |
| American oil | 0.60 | 0.61 | 0.57 | 0.59 |
Note: The accuracy, precision, recall, and F-measure of CNN classification are evaluated as follows: Accuracy = (TP + TN)/(TP + FP + TN + FN); Precision = TP/(TP + FP); Recall = TP/(TP + FN); F-measure = 2 × Precision × Recall/(Precision + Recall), where TP (true positive) is the number of positive cases which are classified as positive; FP (false positive) is the number of negative cases which are classified as positive; TN (true negative) is the number of negative cases which are classified as negative; and FN (false negative) is the number of positive cases which are classified as negative. The precision, recall, and F-measure in the table are all micro averages.
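The four formulas in the note can be sketched as a small Python helper. This is only an illustration: the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f_measure) from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Ratios with an empty denominator
    are reported as 0.0 (an assumption made for this sketch).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure


# Hypothetical confusion counts, chosen only to exercise the formulas.
acc, prec, rec, f1 = classification_metrics(tp=70, fp=30, tn=60, fn=40)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F={f1:.2f}")
```

The F-measure formula can also be checked against the table: for the "Crude oil price" row, a precision of 0.65 and a recall of 0.37 give 2 × 0.65 × 0.37/(0.65 + 0.37) ≈ 0.47, which matches the reported F-measure.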
Fig. 10. Time series of oil price and the three CNN values.
Fig. 11. Time series of crude oil price and keywords “Crude oil”.
Descriptive statistics of the CNN values, historical oil price, DJI, GDP, and oil price datasets.
| | CNN | Historical oil price | DJI | GDP | Oil price |
|---|---|---|---|---|---|
| Mean | 0.5099 | 52.5055 | 25139.47 | 18627.88 | 51.9398 |
| Median | 0.5112 | 54.8 | 25396.24 | 18696.71 | 54.8 |
| Std. Dev. | 0.0046 | 16.0557 | 304.6084 | 88.0079 | 16.5658 |
| Skewness | −0.3799 | −1.3844 | −0.3707 | −1.6744 | −1.2559 |
| Kurtosis | 0.2140 | 1.6120 | −0.2720 | 4.6609 | 1.0222 |
| Jarque-Bera | 0.8936 | 14.5199∗∗∗ | 1.0959 | 43.3310∗∗∗ | 10.6770∗∗ |
| Augmented Dickey-Fuller | −4.8209∗∗∗ | −1.3072 | −2.1983 | −1.8139 | −1.4594 |
Note: ∗, ∗∗, and ∗∗∗ represent significance at the 10%, 5%, and 1% levels, respectively.
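For readers unfamiliar with the Jarque-Bera row, the statistic combines skewness S and excess kurtosis K as JB = n/6 × (S² + K²/4). The sketch below computes it with population (biased) moment estimators. It assumes the kurtosis row reports excess kurtosis; the source does not state which estimators or software were used, so the reported values may differ slightly. The function name `jarque_bera` and the sample data are hypothetical, and the Augmented Dickey-Fuller test is not covered here.

```python
def jarque_bera(values):
    """Return (skewness, excess_kurtosis, jb) using biased moment estimators."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return skewness, excess_kurtosis, jb


# Hypothetical sample, not data from the study.
s, k, jb = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"skewness={s:.4f} excess kurtosis={k:.4f} JB={jb:.4f}")
```

Under the null hypothesis of normality, JB is asymptotically chi-squared with two degrees of freedom, so a large value (marked with asterisks in the table) indicates departure from normality.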
Performance comparison of different techniques.
| Model | Inputs | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 46.78% | 11.1041 | 8.4084 | – |
| | 2 | 9.56% | 2.2285 | 1.6749 | 79.56% |
| | 3 | 28.51% | 6.6743 | 5.3789 | 39.06% |
| | 4 | | | | 84.67% |
| MLR | 1 | 76.27% | 14.5121 | 12.8406 | – |
| | 2 | 76.19% | 14.6423 | 12.7107 | 0.10% |
| | 3 | 64.45% | 15.4351 | 12.0035 | 15.50% |
| | 4 | 56.28% | 12.9765 | 10.4790 | 26.21% |
| SVM | 1 | 36.17% | 8.9761 | 7.4284 | – |
| | 2 | 26.97% | 7.1991 | 6.4495 | 19.20% |
| | 3 | 46.59% | 14.4929 | 10.1242 | −28.81% |
| | 4 | 23.60% | 5.2307 | 4.0115 | 34.75% |
| LSTM | 1 | 54.43% | 14.5938 | 11.1827 | – |
| | 2 | 32.53% | 9.7595 | 8.1046 | 40.24% |
| | 3 | 44.40% | 8.2702 | 6.9034 | 18.43% |
| | 4 | 31.88% | 6.7175 | 5.2260 | 41.43% |
| RNN | 1 | 58.92% | 12.3870 | 11.2355 | – |
| | 2 | 22.53% | 4.7991 | 4.4857 | 61.76% |
| | 3 | 50.54% | 15.0736 | 11.4218 | 14.22% |
| | 4 | 17.02% | 4.7706 | 3.3586 | 71.11% |
Note: “1” means that only historical data are used for prediction. “2” means that historical data and text features are used. “3” means that historical data and financial data are used. “4” means that historical data, financial data, and text features are all used. IR(MAPE) is the improvement rate of MAPE from “1” to “2”, “3”, or “4”. The grid search method is used to determine the parameters of the adopted algorithms [37]. Appendix C lists the final parameter values of these forecasting models in all examples.
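The error measures in this table can be sketched in Python using their standard definitions. The improvement rate is written here as the relative reduction in MAPE against the historical-data-only baseline; this is an assumption, since the page does not spell out the formula, but it reproduces the reported BPNN values. All function names and the sample series are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def improvement_rate(mape_base, mape_new):
    """Assumed IR(MAPE): relative MAPE reduction versus the baseline, in percent."""
    return 100.0 * (mape_base - mape_new) / mape_base


# Hypothetical series, used only to exercise the functions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))

# MAPE values of BPNN with input sets "1" and "2" from the table above.
print(round(improvement_rate(46.78, 9.56), 2))
```

With the BPNN MAPE values for input sets "1" and "3" (46.78% and 28.51%), the same function returns about 39.06%, in line with the table.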
Fig. 12. Forecasting performances of different predictive factors using BPNN.
Fig. 13. Ranking of the mean impact value for oil price prediction using BPNN.
Number of news using different keywords.
| Type | Keywords | The number of news in CNN train period | The number of news in CNN test period | Total numbers |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| | American oil production | Not enough news | – | 885 |
CNN classification results.
| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | | 0.70 | 0.70 | 0.70 |
| American oil | 0.66 | 0.66 | 0.68 | 0.67 |
Fig. 14. Time series of oil production and the two CNN values.
Descriptive statistics of CNN values, historical oil production, and oil production datasets.
| | CNN | Historical oil production | Oil production |
|---|---|---|---|
| Mean | 0.3409 | 11135.46 | 11172.07 |
| Median | 0.3311 | 11423.45 | 11423.45 |
| Std. Dev. | 0.0803 | 1220.75 | 1179.87 |
| Skewness | 0.2133 | −0.2150 | −0.2034 |
| Kurtosis | 0.3729 | −1.3088 | −1.2607 |
| Jarque-Bera | 0.3358 | 3.0895∗ | 2.8797 |
| Augmented Dickey-Fuller | −3.1838∗∗ | −2.0042 | −1.8434 |
Note: The descriptive statistics of DJI and GDP are shown in Table 4.
Performance comparison of different models.
| Model | Inputs | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 4.40% | 616.4990 | 475.5733 | – |
| | 2 | 3.94% | 625.7280 | 428.3586 | 10.45% |
| | 3 | 2.24% | 294.0954 | 254.1759 | 49.09% |
| | 4 | 1.69% | 262.4757 | 190.5080 | 61.59% |
| MLR | 1 | 5.10% | 847.4493 | 545.5473 | – |
| | 2 | 5.37% | 907.5902 | 574.5642 | −5.29% |
| | 3 | 16.70% | 2417.1 | 1828.2 | −227.45% |
| | 4 | 12.18% | 1659.3 | 1345.4 | −138.82% |
| SVM | 1 | 3.59% | 504.9028 | 398.3374 | – |
| | 2 | 4.16% | 609.1765 | 738.3157 | −15.88% |
| | 3 | 2.04% | 293.7686 | 230.5830 | 43.18% |
| | 4 | | | | 59.89% |
| LSTM | 1 | 4.23% | 738.3157 | 447.6391 | – |
| | 2 | 4.28% | 589.7330 | 485.2420 | −1.18% |
| | 3 | 3.40% | 477.3984 | 369.1028 | 19.62% |
| | 4 | 2.93% | 649.7016 | 339.4954 | 30.73% |
| RNN | 1 | 3.84% | 500.2649 | 424.0437 | – |
| | 2 | 3.61% | 627.1218 | 387.3984 | 5.99% |
| | 3 | 3.37% | 691.9704 | 394.2134 | 12.24% |
| | 4 | 3.06% | 433.1777 | 337.4889 | 20.31% |
Fig. 15. Forecasting performances of different predictive factors using SVM.
Fig. 16. Ranking of the mean impact value for oil production prediction.
Number of news using different keywords.
| Type | Keywords | The number of news in CNN train period | The number of news in CNN test period | Total numbers |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| | American oil consumption | Not enough news | – | 60 |
Classification performance of the CNN model.
| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | 0.60 | 0.63 | 0.52 | 0.57 |
| American oil | | 0.66 | 0.56 | 0.61 |
Fig. 17. Time series of oil consumption and the two CNN values.
Fig. 18. Time series of oil consumption and CNN values with keywords “American oil”.
Descriptive statistics of CNN values, historical oil consumption, and oil consumption datasets.
| | CNN | Historical oil consumption | Oil consumption |
|---|---|---|---|
| Mean | 0.4803 | 5084.385 | 5092.151 |
| Median | 0.4841 | 45.8462 | 5120.261 |
| Std. Dev. | 0.0128 | 289.9571 | 45.2494 |
| Skewness | −0.6523 | −0.2681 | −0.337 |
| Kurtosis | 0.5070 | 0.2633 | 0.4568 |
| Jarque-Bera | 2.7751 | 0.4555 | 0.8095 |
| Augmented Dickey-Fuller | −2.0265 | −3.8996∗∗∗ | −3.8380∗∗∗ |
Performance comparison of different models.
| Model | Inputs | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 4.08% | 364.7069 | 182.6500 | – |
| | 3 | 2.98% | 176.9245 | 144.6125 | 26.96% |
| | 4 | 2.93% | 191.2712 | 136.6095 | 28.19% |
| MLR | 1 | 5.61% | 355.8126 | 258.9670 | – |
| | 2 | 4.97% | 339.5771 | 228.4414 | 11.41% |
| | 3 | 80.80% | 4330.1 | 3994.4 | −1340.29% |
| | 4 | 27.59% | 1670.8 | 1283.2 | −391.80% |
| SVM | 1 | 4.12% | 329.7182 | 184.5873 | – |
| | 2 | 3.24% | 174.3819 | 155.5488 | 21.36% |
| | 3 | 2.85% | 152.4080 | 137.7221 | 30.83% |
| | 4 | 2.78% | 156.5058 | 134.2022 | 32.52% |
| LSTM | 1 | 3.53% | 316.4694 | 157.4753 | – |
| | 2 | 2.55% | 135.8281 | 123.2782 | 27.76% |
| | 3 | 3.45% | 194.4310 | 161.6923 | 2.27% |
| | 4 | 3.86% | 277.9089 | 181.9194 | −9.35% |
| RNN | 1 | 3.48% | 314.4836 | 154.8826 | – |
| | 2 | 2.72% | 183.2028 | 129.0973 | 21.84% |
| | 3 | 1.91% | 110.1180 | 91.6264 | 45.11% |
| | 4 | 3.41% | 198.0660 | 160.7579 | 2.01% |
Fig. 19. Forecasting performances of different predictive factors using BPNN.
Fig. 20. Ranking of the mean impact value for oil consumption prediction.
Number of news using different keywords.
| Type | Keywords | The number of news in CNN train period | The number of news in CNN test period | Total numbers |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| | American oil inventory | Not enough news | – | 294 |
Results of CNN classification.
| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | | 0.75 | 0.73 | 0.74 |
| American oil | 0.69 | 0.71 | 0.62 | 0.66 |
Fig. 21. Time series of oil inventory and the two CNN values.
Descriptive statistics of CNN values, historical oil inventory, and oil inventory datasets.
| | CNN | Historical oil inventory | Oil inventory |
|---|---|---|---|
| Mean | 0.4320 | 1941.638 | 1943.099 |
| Median | 0.4617 | 1923.667 | 1923.667 |
| Std. Dev. | 0.1337 | 10.1227 | 66.5930 |
| Skewness | −0.2266 | 1.1259 | 1.1397 |
| Kurtosis | −0.9941 | 0.8091 | 0.6487 |
| Jarque-Bera | 2.0521 | 8.3552∗∗ | 8.3145∗∗ |
| Augmented Dickey-Fuller | −4.2322∗∗∗ | −1.1972 | −1.6843 |
Performance comparison of different models.
| Model | Inputs | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 1.52% | 40.0705 | 30.6790 | – |
| | 2 | 2.24% | 52.0491 | 45.9824 | −47.37% |
| | 3 | 1.51% | 44.1664 | 30.2314 | 0.66% |
| | 4 | 1.66% | 39.5310 | 33.9962 | −9.21% |
| MLR | 1 | 1.60% | 38.7095 | 32.4300 | – |
| | 2 | 1.73% | 40.9421 | 35.1122 | −8.1% |
| | 3 | 2.77% | 67.0007 | 56.1855 | −73.13% |
| | 4 | 2.59% | 59.2417 | 52.7921 | −61.88% |
| SVM | 1 | 1.43% | 39.3432 | 28.7629 | – |
| | 2 | 1.43% | 39.3953 | 26.4642 | 0 |
| | 3 | 2.00% | 56.6477 | 41.6142 | −39.86% |
| | 4 | 2.38% | 52.1609 | 48.6861 | −66.43% |
| LSTM | 1 | 1.31% | 39.8103 | 26.4642 | – |
| | 2 | | | | 12.98% |
| | 3 | 1.69% | 43.0887 | 34.5260 | −29.01% |
| | 4 | 1.55% | 40.4426 | 31.5797 | −18.32% |
| RNN | 1 | 1.37% | 40.0492 | 27.5696 | – |
| | 2 | 1.58% | 36.1341 | 31.5475 | −15.33% |
| | 3 | 2.10% | 54.8052 | 43.2982 | −53.28% |
| | 4 | 1.66% | 51.7864 | 33.1418 | −21.68% |
Fig. 22. Forecasting performances of different predictive factors using LSTM.