
Forecasting the U.S. oil markets based on social media information during the COVID-19 pandemic.

Binrong Wu1, Lin Wang1, Sirui Wang1, Yu-Rong Zeng2.   

Abstract

Accurate oil market forecasting plays an important role in the theory and application of oil supply chain management for profit maximization and risk minimization. However, the coronavirus disease 2019 (COVID-19) has compelled governments worldwide to impose restrictions, consequently forcing the closure of most social and economic activities. The latter leads to the volatility of the oil markets and poses a huge challenge to oil market forecasting. Fortunately, the social media information can finely reflect oil market factors and exogenous factors, such as conflicts and political instability. Accordingly, this study collected vast online oil news and used convolutional neural network to extract relevant information automatically. Oil markets are divided into four categories: oil price, oil production, oil consumption, and oil inventory. A total of 16,794; 9,139; 8,314; and 8,548 news headlines were collected in four respective cases. Experimental results indicate that social media information contributes to the forecasting of oil price, oil production and oil consumption. The mean absolute percentage errors are respectively 0.0717, 0.0144 and 0.0168 for the oil price, production, and consumption prediction during the COVID-19 pandemic. Marketers must consider the impact of social media information on the oil or similar markets, especially during the COVID-19 outbreak.
© 2021 Elsevier Ltd. All rights reserved.


Keywords:  COVID-19 pandemic; Deep learning; Social media information; Text mining; Time series forecasting

Year:  2021        PMID: 34629690      PMCID: PMC8486164          DOI: 10.1016/j.energy.2021.120403

Source DB:  PubMed          Journal:  Energy (Oxf)        ISSN: 0360-5442            Impact factor:   7.147


World Health Organization; Coronavirus disease 2019; International Energy Agency; Vector autoregressive; Convolutional neural network; Backpropagation neural networks; Support vector machines; Multiple linear regression; Recurrent neural network; Long short-term memory; United States; Natural Language Processing; West Texas Intermediate; United States of America; Hannan–Quinn information criterion; Gross Domestic Product; Term frequency-inverse document frequency; Barrels per day; American Petroleum Institute; Organization of Petroleum Exporting Countries; Liquefied Natural Gas; Mean absolute error; Mean absolute percentage error; Root mean square error; Augmented Dickey-Fuller test; Artificial intelligence; Improving rate; Mean Impact Value; Akaike information criterion; Bayesian information criterion; Final prediction error; Dow Jones Industrial Average

Introduction

Oil plays a leading role in energy resources, and oil supply chain plays an important role in the global economy [1]. It has profound significance to explore the theory and application of petroleum supply chain management in profit maximization and risk minimization [2]. However, the uncertainty surrounding the ongoing COVID-19 pandemic brings great challenges to forecasting the oil markets. The World Health Organization (WHO) declares the ongoing coronavirus pandemic as a global threat. Currently, no effective treatment is found for this respiratory disease, namely, the coronavirus disease 2019 (COVID-19). Since the end of February 2020, the new virus has spread across many countries. As of early 2021, more than 91 million cases have been reported worldwide with more than 1.9 million deaths. Governments worldwide have imposed restrictions to slow down the spread of the virus, consequently forcing the closure of most social and economic activities. Some of the measures are partial or complete blockades, including banning public gatherings, closing non-essential businesses, and even educational institutions [3] (International Energy Agency [4]). Owing to the impact of the COVID-19 pandemic, oil markets in many countries have shown a volatile trend. With the global economic downturn, the unpredictable COVID-19 pandemic has led to a sharp drop in oil demand [5]. In recent years, researchers have extensively incorporated Internet data into forecasting studies as explanatory variables [6]. The combination of Internet data and prediction models might facilitate improved forecasting performance [7]. Looking at related oil events can help forecast the oil markets. The challenging question is how to select and quantify these events. Fortunately, qualitative evidence, such as emergencies and politics, has been transformed into the quantitative analysis (e.g., time-series data) through text mining, social network, and sentiment analysis techniques [8]. 
In particular, text mining is useful for identifying ideas and extracting information [9]. For example, Zhang et al. [10] used text mining techniques to mine product innovation ideas from online reviews. Jeong, Yoon, and Lee [11] proposed a product opportunity mining approach for product planning. In this study, text extraction is used to convert unstructured text into a structured format. Drawing on major validated studies, Hemmatian and Sohrabi [12] summarized recent methods for opinion classification and aspect extraction and regarded the CNN model as the most promising text mining technique. CNN has also been widely used in other fields, including sentence modeling [13], image classification [14], and speech recognition [15]. Inspired by this research, the current study uses the CNN model to process online oil news. In addition, the volume of media information continues to grow and includes a wealth of qualitative information that can be used to improve the accuracy of oil market forecasting. As online news is more persuasive and less noisy, it is a more reliable source than other social media, such as Twitter and blogs [16]. Online oil news is therefore treated as qualitative data that can help improve the accuracy of oil market forecasting. Overall, the authors aim to present a novel oil market prediction methodology that focuses on improving forecasting performance by examining social media information. Oil markets are divided into four categories: oil price, oil production, oil consumption, and oil inventory. This research addresses the following three research questions. First, which type of news is more relevant to oil price, oil production, oil consumption, and oil inventory?
Second, to what extent can social media information improve the forecasting of oil price, oil production, oil consumption, and oil inventory during the COVID-19 pandemic? Third, how do different forecasting models perform in the oil market prediction task? Fig. 1 illustrates the general framework of the system design of this study. This study takes the U.S. oil markets as an example, as the U.S. is the leading economy in the world and its oil markets have been highly volatile as a result of the COVID-19 pandemic. Different keywords, such as “Crude oil” and “American oil,” were used to collect oil news headlines published on the popular oil website “Oilprice.com” from May 2012 to August 2020. The convolutional neural network (CNN) model is applied to extract textual information from these news headlines. The vector autoregressive (VAR) method is used to select the appropriate lag order of the CNN outputs and the historical oil data. Then, text features, financial market data, and historical oil data are input into several prediction techniques, namely, backpropagation neural networks (BPNN), support vector machines (SVM), multiple linear regression (MLR), recurrent neural network (RNN), and long short-term memory (LSTM). The results illustrate that online media information can facilitate crude oil price, oil production, and oil consumption forecasting, especially during the COVID-19 pandemic. However, using oil news might not help predict oil inventory.
Fig. 1

System design of this study.

The main contributions of this study are summarized as follows:

This study is the first to examine the predictive relationship between social media information and oil price, production, consumption, and inventory. The results of text mining indicate that the fluctuations of oil price, production, and inventory are influenced by various international events. By contrast, oil consumption is closely related to domestic events.

A deep learning algorithm, namely, CNN, was employed to extract textual information from social media information automatically. The results of the mean impact value (MIV) approach indicate that text features are important predictors, which illustrates the explanatory power of textual information in the forecasting of oil price, production, and consumption.

We proposed a novel oil price, production, and consumption forecasting methodology. The findings contribute specifically to theoretical insights for processing information, in that oil price, production, and consumption prediction obtained remarkable accuracy by considering social media information during the COVID-19 pandemic.

The rest of the paper is arranged as follows. Section 2 presents a literature review of the prediction of oil price, production, consumption, and inventory. Section 3 provides the methodology of this study, including the text mining technique and several forecasting techniques. Section 4 describes the data and empirical results of text mining and oil market forecasting. Finally, Section 5 provides the conclusion and future research.

Literature review

The research community focuses on oil markets with various forecasting techniques [17,18]. Studies on these topics are divided into four categories, namely, oil price forecasting, oil production forecasting, oil consumption forecasting, and oil inventory forecasting. Table 1 presents a summary of typical studies for oil market forecasting.
Table 1

Summary of recent studies for oil market forecasting.

Classification | Methods | Influencing factors | Forecasting object or areas
Oil price forecasting | Vector Trend Forecasting Method (VTFM) [20] | Historical oil price data | Brent crude oil spot price
Oil price forecasting | A semi-heterogeneous approach [21] | Historical oil price data | West Texas Intermediate (WTI) crude oil price
Oil price forecasting | Intelligent model search engine [22] | CPI, IPI, USI, BTI, CU, LBR, SP, JPU, and CHU | Brent oil price
Oil price forecasting | Convolutional neural network and Latent Dirichlet Allocation (LDA) topic model [23] | Online oil news, financial market data, and oil price data | West Texas Intermediate (WTI) crude oil
Oil price forecasting | Convolutional neural network with Variational mode decomposition [24] | Google Trends and online media information | West Texas Intermediate (WTI) crude oil price
Oil production forecasting | Combining the nonlinear metabolism grey model with Auto-Regressive Integrated Moving Average (ARIMA) [25] | Historical production data | U.S. shale oil production
Oil production forecasting | Ensemble empirical mode decomposition with Long Short-Term Memory [26] | Historical production data | Two actual oilfields from China
Oil consumption forecasting | AdaBoost ensemble technology [28] | Historical consumption data | Oil consumption of China
Oil consumption forecasting | GM (1,1) model [27] | Historical consumption data | Global oil consumption
Oil consumption forecasting | NMGM (1, 1, α) [29] | Historical consumption data | Oil consumption of China
Oil consumption forecasting | LogR, DT, BPNN, and SVM [30] | Google Trends and historical consumption data | Global oil consumption
Oil inventory forecasting | – | – | –

Note:Studies rarely investigate the implementation of oil inventory prediction.

Crude oil price fluctuations have a significant impact on the global economy. Because these fluctuations affect a country’s social stability and economic development, forecasting them can help governments make policies and reduce financial losses in the industrial sector [19]. Using historical statistical data, some studies have applied econometric techniques, intelligent algorithms, and various decomposition techniques to predict crude oil price. To illustrate, Zhao et al. [20] used the vector trend forecasting method (VTFM) to forecast Brent crude oil price. Wang, Li, and Hong [21] decomposed the oil price series using four decomposition techniques, namely, Variational Mode Decomposition, Empirical Mode Decomposition, Singular Spectral Analysis, and Wavelet Analysis; four different forecasting methods were then used to predict the components from each decomposition technique. They reconstructed the oil price forecasts and obtained better oil price forecasting performance. However, crude oil price movements are driven by a variety of factors, including oil market factors (e.g., oil demand, stocks, and supply) and exogenous factors (e.g., epidemics, political instability). Forecasting oil price with these related factors has become a research hotspot. For example, Bekiroglu et al. [22] proposed an intelligent model search engine to forecast Brent oil price using some financial indexes and oil data. Li, Shang, and Wang [23] presented a new oil price forecasting method using online media text mining. Wu et al. [24] combined Google Trends and social media information to forecast weekly oil prices. Drawing on these previous studies, we further processed online media news and compared various forecasting techniques to forecast oil price during the COVID-19 pandemic.
As for oil production forecasting, the research uses historical data to predict oil yields. For instance, Wang, Song, and Li [25] developed a hybridization model of the nonlinear grey model and linear ARIMA to forecast U.S. shale oil production. Liu, Liu, and Gu [26] proposed an ensemble empirical mode decomposition (EEMD) based on Long Short-Term Memory (LSTM) to forecast the production of two actual oilfields from China. Recently, oil consumption forecasting has been a hot field. For example, Yuan et al. [27] used the GM (1,1) model cluster to predict global oil consumption. Xiao et al. [28] proposed a new hybrid oil consumption forecasting model using selective ensemble, which could extract nonlinear subseries of oil consumption. Wang and Song [29] developed a new nonlinear-dynamic grey model, namely, NMGM (1, 1, α), to forecast China’s oil consumption. Yu et al. [30] found that the accuracy of oil consumption prediction can be significantly improved using Google Trends, which finely reflects related factors of oil markets. In comparison with previous work, we focused more on analyzing the role of media information in predicting oil consumption. Above all, many studies examined the oil market factors and exogenous factors to predict oil prices. However, research scarcely uses these factors to forecast oil production, consumption, and inventory. Influenced by COVID-19, oil markets have experienced drastic fluctuations, with which accurate prediction using historical data is impossible. Looking at oil market factors and exogenous factors may facilitate the forecasting of oil price, production, consumption, and inventory. Meanwhile, the online media information can finely reflect oil market factors and exogenous factors. Compared with previous studies, it is the first time for this study to apply online oil news to examine the predictive relationship with oil markets during the COVID-19 pandemic.

Methodology

Convolutional neural network for text classification

Fig. 2 illustrates the basic structure of the text CNN model. The first step in using CNN for text mining is to implement tokenization and filter punctuation and stop words. Stop words such as "the," "in," and "is" are generally considered useless, as they are common and they dramatically increase the size of the index without increasing the accuracy or recall.
Fig. 2

Structure of the text CNN.
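The preprocessing step (tokenization, punctuation filtering, and stop-word removal) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `tokenize`, the regular expression, and the stop-word list beyond "the," "in," and "is" are assumptions made for the example.

```python
import re

# Illustrative stop-word list (assumption); the text names only "the", "in", "is".
STOP_WORDS = {"the", "in", "is", "a", "an", "of", "to", "and"}


def tokenize(headline):
    """Lowercase a headline, drop punctuation, and remove stop words."""
    # Keep runs of letters and digits; everything else acts as a separator.
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return [w for w in words if w not in STOP_WORDS]


tokens = tokenize("The price of oil is falling, in the U.S.!")
print(tokens)
```

A production pipeline would normally rely on a curated stop-word list rather than the short set used here.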

Considering that documents differ in length, sequence padding is used to convert each document to the same length. Words beyond the fixed length are cut off, and “0” is appended to any document shorter than the specified length. Thereafter, we employed a word embedding model (word2vec) to convert each word into a unique vector. Words with similar meanings are closer to each other in Euclidean distance. In the text CNN, the width of the convolution kernel equals the dimension of the word vector. Each row of the input matrix represents a word, so the word is the minimum granularity of text during feature extraction. Given that the input of the CNN is a sentence and the correlation between adjacent words is high, the word order and its context are considered in the convolution. Convolution kernels of different heights produce feature vectors of different dimensions. In the pooling layer, 1-max pooling extracts the maximum value of each feature vector, and the pooled values of all feature vectors are then concatenated. To prevent overfitting, dropout is applied between the pooling layer and the fully connected layers. We use two fully connected layers. The first layer uses “relu” as the activation function, and the second layer employs a softmax activation function to obtain the probability of belonging to each class. The outputs of the CNN model denote the fluctuation of the monthly oil markets. The oil market movement is described in terms of $P_m$, where $P_m$ denotes the oil markets at the end of month $m$. The above structure of the text CNN is implemented with the Python library TensorFlow.
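The padding, convolution, and 1-max pooling steps described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the TensorFlow implementation used in the study: the function names are hypothetical, the kernel has no bias or activation, and the tiny two-dimensional "embeddings" are invented for the example.

```python
def pad_sequence(token_ids, max_len, pad_id=0):
    """Truncate to max_len, or append pad_id until the length is max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


def conv_feature_map(embedded, kernel):
    """Slide a kernel of height h over h consecutive word vectors.

    The kernel is as wide as the embedding dimension, so every window
    yields a single number (bias and activation omitted for brevity).
    """
    h = len(kernel)
    feature_map = []
    for start in range(len(embedded) - h + 1):
        window = embedded[start:start + h]
        total = sum(
            w * x
            for k_row, x_row in zip(kernel, window)
            for w, x in zip(k_row, x_row)
        )
        feature_map.append(total)
    return feature_map


def one_max_pool(feature_map):
    """Keep only the largest activation of one feature map."""
    return max(feature_map)


# Four "words", each a 2-dimensional vector (made-up values).
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # height 2, width = embedding dimension

feature_map = conv_feature_map(embedded, kernel)
print(pad_sequence([5, 3, 8], 5), feature_map, one_max_pool(feature_map))
```

In a full text CNN, one pooled value is produced per kernel, and these values are concatenated before the fully connected layers.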

Forecasting models

In this study, common and popular forecasting models, such as the backpropagation neural network, SVM, and multivariate linear regression, are implemented [18,24,31]. RNN and LSTM, two recent and popular deep learning models, are also considered to predict the oil markets [32,33]. All of the following prediction models perform one-step-ahead prediction.

Backpropagation neural network

BPNN, which belongs to supervised learning, is the most basic neural network. Its outputs are propagated forward, and errors are propagated backward. Fig. 3 illustrates a BPNN with a single hidden layer, where $x_1, x_2, \dots, x_n$ denote the input values and $y$ denotes the output value.
Fig. 3

Structure of BPNN.

Multivariate linear regression

When a linear relationship exists between multiple independent variables and a dependent variable, the regression analysis is multivariate linear regression (MLR). Let $y$ be the dependent variable and $x_1, x_2, \dots, x_k$ the independent variables. The multiple linear regression model is described as follows:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $\beta_0$ is a constant, $\beta_1, \dots, \beta_k$ are regression coefficients, and $\varepsilon$ denotes an error. Model parameters are calculated using the ordinary least squares (OLS) method.

Support vector machine

SVM is a linear classifier with the largest margin defined in the feature space. The SVM model can be expressed as the following constrained optimization problem. Given the training data $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input and $y_i$ is the output, the primal formulation is expressed as follows:

$$\min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i\left(w^{\top}\varphi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N$$

where $(x_i, y_i)$ is the training data, $w$ denotes the hyperplane vector, $C$ represents the regularization parameter, $b$ represents the bias, $\xi_i$ denotes the tolerable misclassification error, and $\varphi(\cdot)$ denotes the nonlinear mapping function, which corresponds to the Gaussian radial basis function kernel with variance $\sigma^2$.

Recurrent neural network and long short-term memory

RNNs denote a class of neural networks that process sequential data. A basic neural network only establishes weight connections between layers. An RNN also establishes weight connections between neurons in the same layer. Hochreiter and Schmidhuber [34] proposed LSTM, which is an RNN variant. LSTM extends the RNN architecture with a separate memory unit and a gating mechanism that controls the flow of information in the network. The gating mechanism consists of input gates, forget gates, and output gates.
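For reference, the LSTM gating mechanism is commonly written as below. This is the standard textbook formulation, given here as a sketch; it is not necessarily the exact variant used in this study, and the symbols ($W$, $U$, $b$ for weights and biases, $\sigma$ for the logistic function, $\odot$ for the element-wise product, $x_t$, $h_t$, $c_t$ for input, hidden state, and memory cell) are notation introduced for this illustration.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The forget gate decides how much of the previous memory cell is retained, the input gate decides how much of the candidate memory is written, and the output gate decides how much of the cell is exposed as the hidden state.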

Experiment study

In this section, the econometric methods are implemented in EViews 10 (Econometric Views), and the artificial intelligence models are implemented in Python 3.8. All computations were run on a computer with an Intel(R) Core(TM) i7-10700K CPU at 3.80 GHz, 32 GB of RAM, and Windows 10.

Data retrieval and descriptions

Fig. 4 presents the time series data in May 2017–August 2020 for monthly oil price (West Texas Intermediate, WTI), oil production, oil consumption by the industrial sector, and oil stocks, which were all selected from the U.S. Energy Information Administration (http://www.eia.gov). In this graphic representation, the oil markets have changed dramatically, especially in March, April, and May 2020.
Fig. 4

Time series of U.S. monthly oil price, production, consumption, and stocks.

As shown in Fig. 5, the coronavirus disease 2019 (COVID-19) pandemic and the oil price war between Russia and Saudi Arabia affected the monthly oil price in 2020, which was significantly lower than that in the same period of 2018 and 2019. In particular, from February to March 2020, the oil price fell by $25.08, a reduction of 55.57%. Meanwhile, the end of the oil price war and effective control of the COVID-19 pandemic initiated the rise of crude oil prices in May 2020.
Fig. 5

Monthly oil price, production, consumption, and inventory in 2018, 2019, and 2020.

With the development of shale gas in the U.S., total petroleum production capacity has continued to grow in recent years. However, the collapse in crude oil price caused by the oil price war and the decline in oil demand caused by the outbreak of the COVID-19 pandemic have sharply reduced petroleum production capacity in the U.S. In particular, from April to May 2020, oil production fell by 1,991.192 thousand barrels per day (BPD), a 16.58% drop. Crude oil production approached its lowest level in three years in May 2020. Fig. 5 also illustrates the evident reduction in oil consumption in April, following the lockdown measures adopted by the U.S. government. With the reopening policy announced in May 2020, total petroleum consumption by the industrial sector recovered rapidly. Uncertainty surrounding the ongoing COVID-19 pandemic could also lead to uncertainty in oil consumption. Meanwhile, petroleum stocks in 2020 are consistently higher than those in the same periods of 2018 and 2019. Under the influence of falling oil demand, petroleum stocks approached their highest level in recent years in June 2020. As mentioned above, the COVID-19 pandemic and the oil price war caused oil markets to fluctuate dramatically in 2020, and accurately predicting the oil markets has become a huge challenge. Since many studies show that stock markets and economic development affect the oil market, the Dow Jones Industrial Average (DJIA) and real Gross Domestic Product (GDP) for May 2017–August 2020 were collected as two predictive indicators that influence oil markets [35]. DJIA was obtained from the financial portal “Investing.com,” and monthly GDP was collected from “Macroadvisers.com.”
Meanwhile, this study utilized different keywords to collect oil news headlines released from the popular oil website “Oilprice.com,” from May 2012 to August 2020. Notably, the monthly oil news is consolidated into a sample, with a total of 100 observations. Fig. 6 illustrates train, validation, and test periods of the oil market prediction model. In the CNN model, the training period is May 2012–April 2017, including 60 monthly records. The test period is May 2017–Aug 2020, consisting of 40 monthly records. As an input variable of oil market forecasting, the output of the CNN model in the test period is used as the input of the train, validation, and test periods of the oil market prediction model. The training period for the oil market forecasting model is May 2017–April 2019, consisting of 24 monthly observations. The validation period is May 2019–December 2019, including 8 monthly observations. The test period is January 2020–August 2020, including 8 monthly observations. A rolling window is used to estimate the oil market prediction model.
Fig. 6

Train, validation, and test period of oil market prediction model.
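The chronological split of the 40 monthly records used by the oil market prediction model (24 for training, 8 for validation, 8 for testing) can be sketched as follows. The helper names `month_range` and `chronological_split` are hypothetical, and the sketch covers only the split itself, not the rolling-window estimation.

```python
def month_range(year, month, count):
    """Return `count` consecutive month labels such as '2017-05'."""
    labels = []
    for _ in range(count):
        labels.append(f"{year}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels


def chronological_split(records, n_train, n_val, n_test):
    """Split ordered records into train/validation/test without shuffling."""
    if n_train + n_val + n_test != len(records):
        raise ValueError("split sizes must add up to the number of records")
    train = records[:n_train]
    val = records[n_train:n_train + n_val]
    test = records[n_train + n_val:]
    return train, val, test


months = month_range(2017, 5, 40)  # May 2017 to August 2020
train, val, test = chronological_split(months, 24, 8, 8)
print(train[0], train[-1], val[0], val[-1], test[0], test[-1])
```

Keeping the records in time order matters here: shuffling before the split would leak information from the pandemic months into the training period.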


Oil market forecasting

Oil price forecasting

News selection and text mining

We collected four online oil news collections with the keywords “Crude oil,” “Crude oil price,” “American oil,” and “WTI.” Table 2 presents the number of news items in each collection. Notably, news with the keyword “WTI” is not available for every month between May 2012 and August 2020. Thus, we use the other three news collections in the subsequent analysis.
Table 2

Number of news using different keywords.

Type | Keywords | Number of news in CNN train period | Number of news in CNN test period | Total number
International news | Crude oil | 2408 | 4385 | 6793
International news | Crude oil price | 2674 | 5006 | 7680
International news | WTI | Not enough news | | 860
Domestic news | American oil | 622 | 839 | 1461

Figs. 7, 8, and 9 show word clouds of the top 100 words in the news collected with the keywords “American oil,” “Crude oil,” and “Crude oil price,” respectively. Words in each news collection are ranked by their term frequency-inverse document frequency (TF–IDF) weights.
Fig. 7

Word cloud in the news corpus with keywords “American oil”.

Fig. 8

Word cloud in the news corpus with keywords “Crude oil”.

Fig. 9

Word cloud in the news corpus with keywords “Crude oil price”.
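The TF–IDF weighting used to rank words for the word clouds can be sketched as follows. TF–IDF has several variants and the source does not say which one was used; this sketch assumes term frequency normalized by document length and an unsmoothed logarithmic inverse document frequency. The function name and the toy headlines are invented for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["crude", "oil", "price"],
    ["oil", "inventory", "build"],
    ["oil", "pipeline", "sanction"],
]
weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

Under this variant, a word that appears in every document (here "oil") receives a weight of zero, which is why corpus-wide terms do not dominate the ranking.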

As shown in Fig. 7, the top 20 words collected with the keywords “American oil” are as follows: “crude,” “API (American Petroleum Institute),” “report,” “build,” “inventory,” “price,” “U (USA),” “draw,” “Venezuela,” “sanction,” “gas,” “oil,” “pipeline,” “gasoline,” “Iran,” “surprise,” “energy,” “production,” “export,” and “shale.” In more detail, “crude,” “gas,” “oil,” “pipeline,” “gasoline,” “energy,” and “shale” show a close relationship with oil. Meanwhile, “report,” “build,” “inventory,” “price,” “draw,” “production,” and “export” may represent online oil news related to oil supply, oil demand, oil inventory, and oil price. Furthermore, “API (American Petroleum Institute),” “U (USA),” “Venezuela,” “sanction,” and “Iran” may reflect related political and international events. Fig. 8 shows the top 100 words collected with the keywords “Crude oil.” Compared with the words collected using the keywords “American oil,” these words refer to more countries and institutions, such as “U (USA),” “Aramco (Arabian-American Oil Company),” “OPEC (Organization of Petroleum Exporting Countries),” “Tesla,” “Venezuela,” “API (American Petroleum Institute),” “Iranian,” and “Saudi.” This suggests that the online oil news collected with the keywords “Crude oil” is more diverse and international. Fig. 9 illustrates that the top 100 words collected with the keywords “Crude oil price” are similar to those collected with the keywords “Crude oil.” Appendix A lists the final parameter values of the CNN model in all examples. Table 3 presents the CNN classification results for the different news collections on the test dataset. Table 3 shows that the news collection with the keywords “Crude oil” achieved the best performance. Fig. 10 shows the time series of oil price and the three CNN values. To show the connection more clearly, Fig. 11 shows the time series of oil price and the CNN values for the keywords “Crude oil.” Overall, the online oil news collected with the keywords “Crude oil” is the most appropriate dataset for predicting crude oil price. This finding indicates that the fluctuation of crude oil price is influenced by various international events.
Table 3

CNN classification results of different datasets.

Keywords | Accuracy | Precision | Recall | F-measure
Crude oil | 0.66 | 0.70 | 0.58 | 0.63
Crude oil price | 0.61 | 0.65 | 0.37 | 0.47
American oil | 0.60 | 0.61 | 0.57 | 0.59

Note: The accuracy, precision, recall, and F-measure of CNN classification are evaluated as follows: Accuracy = (TP + TN)/(TP + FP + TN + FN); Precision = TP/(TP + FP); Recall = TP/(TP + FN); F-measure = 2 ∗ Precision ∗ Recall/(Precision + Recall), where TP (true positive) is the number of positive cases which are classified as positive; FP (false positive) is the number of negative cases which are classified as positive; TN (true negative) is the number of negative cases which are classified as negative; and FN (false negative) is the number of positive cases which are classified as negative. The precision, recall, and F-measure in the table are all micro averages.
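The four classification measures in the note can be computed as in the sketch below. The confusion-matrix counts are hypothetical (the source reports only the resulting scores), and the function names are chosen for this example. The last lines recompute the F-measure from the precision and recall values reported in Table 3 as a consistency check.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F-measure) from confusion counts.

    tp: positive cases predicted positive; fp: negative cases predicted positive
    tn: negative cases predicted negative; fn: positive cases predicted negative
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_measure(precision, recall)


# Hypothetical counts for 20 monthly up/down predictions.
acc, prec, rec, f1 = classification_metrics(tp=7, fp=3, tn=6, fn=4)
print(acc, prec, rec, f1)

# Precision/recall pairs reported in Table 3.
for p, r in [(0.70, 0.58), (0.65, 0.37), (0.61, 0.57)]:
    print(round(f_measure(p, r), 2))
```

The recomputed values agree with the F-measure column of Table 3 after rounding to two decimals.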

Fig. 10

Time series of oil price and the three CNN values.

Fig. 11

Time series of crude oil price and keywords “Crude oil”.

Performance assessment

The performance of the forecasting model is estimated by three statistical criteria, namely, mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) [36]. These statistical criteria are the basis for evaluating the differences between actual and predicted values. MAPE, MAE, and RMSE are described as follows:

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{\hat{y}_t - y_t}{y_t}\right|, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|\hat{y}_t - y_t\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\hat{y}_t - y_t\right)^2}$$

where $N$ is the total number of months in the test dataset, $\hat{y}_t$ is the predicted value at month $t$, and $y_t$ is the actual value at month $t$.

Results and discussion

The lag order selection results of all examples are shown in Appendix B. The results demonstrate that the optimal lag orders for the text features extracted by CNN, historical data, DJI, and GDP are one, two, two, and two, respectively. Table 4 presents the descriptive statistics of the CNN values with keywords “Crude oil,” historical oil price, DJI, GDP, and oil price datasets. According to the Jarque-Bera test, the historical oil price, GDP, and oil price datasets all reject the null hypothesis of normal distribution. Based on the Augmented Dickey-Fuller (ADF) test, the historical oil price, DJI, GDP, and oil price datasets are shown to be non-stationary.
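The three error criteria named in the performance assessment (MAE, MAPE, and RMSE) can be sketched in a few lines of Python using their standard definitions. The function names and the sample values are illustrative assumptions, not data from the study; MAPE is returned as a fraction, so 0.075 corresponds to 7.5%.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up monthly prices and one-step-ahead forecasts.
actual = [50.0, 40.0, 20.0, 25.0]
predicted = [45.0, 44.0, 22.0, 25.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Because MAPE divides by the actual value, it weights errors more heavily when prices are low, which is relevant for the sharp price drop in spring 2020.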
Table 4

Descriptive statistics of the CNN values, historical oil price, DJI, GDP, and oil price datasets.

Statistic | CNN | Historical oil price | DJI | GDP | Oil price
Mean | 0.5099 | 52.5055 | 25139.47 | 18627.88 | 51.9398
Median | 0.5112 | 54.8 | 25396.24 | 18696.71 | 54.8
Std. Dev. | 0.004616.0557304.608488.007916.5658
Skewness | −0.3799 | −1.3844 | −0.3707 | −1.6744 | −1.2559
Kurtosis | 0.2140 | 1.6120 | −0.2720 | 4.6609 | 1.0222
Jarque-Bera | 0.8936 | 14.5199∗∗∗ | 1.0959 | 43.3310∗∗∗ | 10.6770∗∗
Augmented Dickey-Fuller | −4.8209∗∗∗ | −1.3072 | −2.1983 | −1.8139 | −1.4594

Note: ∗, ∗∗, ∗∗∗represent significance at the 10%, 5%, 1% levels, respectively.

The typical techniques and artificial intelligence (AI) models adopted in this study are compared to select the optimal method for forecasting oil price. Table 5 lists the performance comparison of different models using different inputs. The results show that the BPNN model obtains the best forecasting performance.
Table 5

Performance comparison of different techniques.

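| Model | Input | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 46.78% | 11.1041 | 8.4084 | |
| BPNN | 2 | 9.56% | 2.2285 | 1.6749 | 79.56% |
| BPNN | 3 | 28.51% | 6.6743 | 5.3789 | 39.06% |
| BPNN | 4 | 7.17% | 1.9194 | 1.5775 | 84.67% |
| MLR | 1 | 76.27% | 14.5121 | 12.8406 | |
| MLR | 2 | 76.19% | 14.6423 | 12.7107 | 0.10% |
| MLR | 3 | 64.45% | 15.4351 | 12.0035 | 15.50% |
| MLR | 4 | 56.28% | 12.9765 | 10.4790 | 26.21% |
| SVM | 1 | 36.17% | 8.9761 | 7.4284 | |
| SVM | 2 | 26.97% | 7.1991 | 6.4495 | 19.20% |
| SVM | 3 | 46.59% | 14.4929 | 10.1242 | −28.81% |
| SVM | 4 | 23.60% | 5.2307 | 4.0115 | 34.75% |
| LSTM | 1 | 54.43% | 14.5938 | 11.1827 | |
| LSTM | 2 | 32.53% | 9.7595 | 8.1046 | 40.24% |
| LSTM | 3 | 44.40% | 8.2702 | 6.9034 | 18.43% |
| LSTM | 4 | 31.88% | 6.7175 | 5.2260 | 41.43% |
| RNN | 1 | 58.92% | 12.3870 | 11.2355 | |
| RNN | 2 | 22.53% | 4.7991 | 4.4857 | 61.76% |
| RNN | 3 | 50.54% | 15.0736 | 11.4218 | 14.22% |
| RNN | 4 | 17.02% | 4.7706 | 3.3586 | 71.11% |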

Note: “1” means that only historical data are used for prediction. “2” means that historical data and text features are used. “3” means that historical data and financial data are used. “4” means that historical data, financial data, and text features are all used. IR(MAPE) is the improvement rate of MAPE from “1” to “2” (“3” or “4”). The grid search method is used to determine the parameters of the adopted algorithms [37]. Appendix C lists the final parameter values of these forecasting models in all examples.
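The improvement rate can be checked directly against the table. The sketch below assumes IR(MAPE) is the relative reduction in MAPE with respect to input set “1”, i.e. (MAPE_1 − MAPE_k)/MAPE_1; this formula is not spelled out in the note, but it reproduces the BPNN values reported for oil price. The function name is hypothetical.

```python
def improving_rate(baseline_mape, candidate_mape):
    """Relative MAPE reduction versus the historical-data-only baseline, in percent."""
    return (baseline_mape - candidate_mape) / baseline_mape * 100.0

# BPNN MAPE values (in percent) for input sets 1 to 4, taken from the oil price table.
bpnn_mape = {1: 46.78, 2: 9.56, 3: 28.51, 4: 7.17}

ir = {
    input_set: round(improving_rate(bpnn_mape[1], value), 2)
    for input_set, value in bpnn_mape.items()
    if input_set != 1
}
```

A negative value means that adding the extra inputs made the MAPE worse than the historical-data-only baseline.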

Furthermore, Fig. 12 shows the forecasting performances of different predictive factors using BPNN. As shown in Table 5 and Fig. 12, combining historical data, text features, and financial features yields better forecasting performance. In particular, in terms of IR(MAPE), the BPNN model with text features, financial features, and historical data improves on the BPNN model with only the historical oil price by 84.67%. In most scenarios, adding financial data or text features gives better forecasting performance than using the historical oil price alone. This finding suggests that online news and financial data may provide additional predictive information for oil price forecasting.
Fig. 12

Forecasting performances of different predictive factors using BPNN.

Evaluate the importance of each predictor using the MIV approach

The MIV approach is used to quantify the relative importance of each predictor. Each input variable is increased and decreased by 10% in turn to generate two new training sets. The difference between the two simulation results is averaged over the samples to obtain the MIV. Fig. 13 presents the ranking of the MIV for oil price prediction. The DJI (−2, −1, 0), GDP (−1, 0), text features (0, −1), and historical data (−2) are important factors. The historical oil price data (−1, 0) and GDP (−2) have little effect on the oil price prediction. This finding suggests that financial information and online news have more explanatory power than historical oil prices for oil price forecasting.
Fig. 13

Ranking of the mean impact value for oil price prediction using BPNN.
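The MIV procedure described above can be sketched as follows. This is a simplified illustration, not the authors' code: `toy_model` is a hypothetical stand-in for the trained BPNN, the sample rows are invented, and the sketch perturbs the inputs of an already fitted predictor rather than retraining on the two perturbed sets.

```python
def mean_impact_values(predict, samples, delta=0.10):
    """For each input, average predict(x with +delta) - predict(x with -delta) over samples."""
    n_features = len(samples[0])
    mivs = []
    for j in range(n_features):
        diffs = []
        for row in samples:
            up = list(row)
            down = list(row)
            up[j] = row[j] * (1.0 + delta)    # input j increased by 10%
            down[j] = row[j] * (1.0 - delta)  # input j decreased by 10%
            diffs.append(predict(up) - predict(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs

def toy_model(x):
    """Hypothetical fitted model: strong effect of x[0], weak of x[1], none of x[2]."""
    return 2.0 * x[0] - 0.5 * x[1] + 0.0 * x[2]

samples = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mivs = mean_impact_values(toy_model, samples)

# Rank predictors by the absolute size of their MIV, largest first.
ranking = sorted(range(len(mivs)), key=lambda j: abs(mivs[j]), reverse=True)
```

The sign of an MIV gives the direction of the effect and its absolute value gives the relative importance, which is what a ranking figure of this kind orders.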


Oil production forecasting

We collected three online oil news collections with the keywords “Crude oil,” “American oil,” and “American oil production.” Table 6 presents the number of news items in each collection. The collection for the keywords “American oil production” does not contain news for every month. The news collected with the keywords “Crude oil” reflects international oil events, and the news collected with “American oil” reflects oil events related to America.
Table 6

Number of news using different keywords.

| Type | Keywords | Number of news in CNN train period | Number of news in CNN test period | Total number |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| Domestic news | American oil production | Not enough news | Not enough news | 885 |

Table 7 presents the classification performance of the CNN model. The results show that the online oil news collection with the keywords “Crude oil” achieves better performance. Fig. 14 shows the time series of oil production and the two CNN values. The online oil news collection with the keywords “Crude oil” is therefore the appropriate dataset for predicting oil production.
Table 7

CNN classification results.

| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | 0.70 | 0.70 | 0.70 | 0.70 |
| American oil | 0.66 | 0.66 | 0.68 | 0.67 |

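The four classification scores follow the definitions given in the performance assessment note. A minimal sketch for a single binary confusion matrix is shown below; the counts are hypothetical and chosen only for illustration, and the paper reports micro averages, which this simplified binary version does not reproduce.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2.0 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts (not taken from the paper).
metrics = classification_metrics(tp=70, fp=30, tn=70, fn=30)
```

With these balanced counts every score equals 0.70; unbalanced error types would pull precision and recall apart.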
Fig. 14

Time series of oil production and the two CNN values.

Based on the VAR model, the optimal lag orders for text features, historical oil production data, DJI, and GDP are one, one, three, and three, respectively. Table 8 presents the descriptive statistics of the CNN values with keywords “Crude oil,” historical oil production, and oil production datasets. According to the Jarque-Bera test, only the historical oil production dataset rejects the null hypothesis of normal distribution. According to the ADF test, the historical oil production and oil production datasets are non-stationary.
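One way to turn the selected lag orders into model inputs is sketched below. It assumes that a lag order of p contributes the offsets 0, −1, ..., −p of a series, which matches the offsets listed in the MIV rankings; how offset 0 is aligned with the forecast target is not specified in this section, so the sketch builds only the feature side. The function name and the short series are hypothetical.

```python
def build_lagged_rows(series_by_name, lag_orders):
    """Build one feature dict per usable month from named series and their lag orders."""
    max_lag = max(lag_orders.values())
    n_months = len(next(iter(series_by_name.values())))
    rows = []
    for t in range(max_lag, n_months):
        row = {}
        for name, series in series_by_name.items():
            for lag in range(lag_orders[name] + 1):
                # The key mirrors the offset notation used in the text, e.g. "DJI(-2)".
                row[f"{name}({-lag})"] = series[t - lag]
        rows.append(row)
    return rows

# Hypothetical monthly series (illustrative values only).
series_by_name = {
    "text": [0.1, 0.2, 0.3, 0.4],
    "production": [10.0, 11.0, 12.0, 13.0],
    "DJI": [1.0, 2.0, 3.0, 4.0],
    "GDP": [5.0, 6.0, 7.0, 8.0],
}
# Lag orders reported for the oil production case.
lag_orders = {"text": 1, "production": 1, "DJI": 3, "GDP": 3}

rows = build_lagged_rows(series_by_name, lag_orders)
```

The first `max_lag` months are dropped because not every series has enough history there, so longer lag orders shorten the usable sample.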
Table 8

Descriptive statistics of CNN values, historical oil production, and oil production datasets.

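| Statistic | CNN | Historical oil production | Oil production |
|---|---|---|---|
| Mean | 0.3409 | 11135.46 | 11172.07 |
| Median | 0.3311 | 11423.45 | 11423.45 |
| Std. Dev. | 0.0803 | 1220.75 | 1179.87 |
| Skewness | 0.2133 | −0.2150 | −0.2034 |
| Kurtosis | 0.3729 | −1.3088 | −1.2607 |
| Jarque-Bera | 0.3358 | 3.0895∗ | 2.8797 |
| Augmented Dickey-Fuller | −3.1838∗∗ | −2.0042 | −1.8434 |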

Note: The descriptive statistics of DJI and GDP are shown in Table 4.

Table 9 lists the performance of the different models with different inputs. The best MAPE is achieved by the SVM model with all selected features. Fig. 15 shows the forecasting performances of different predictive factors using SVM. Fig. 16 shows the ranking of the mean impact value for oil production prediction. The MIV results show that GDP (−3, −2, −1, 0), DJI (−2, 0), text features (0), and historical data (0) are important factors, whereas DJI (−3, −1), text features (−1), and historical data (−1) contribute little to oil production forecasting. Online news exhibits some explanatory power for oil production forecasting, and GDP is the best predictor of oil production.
Table 9

Performance comparison of different models.

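| Model | Input | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 4.40% | 616.4990 | 475.5733 | |
| BPNN | 2 | 3.94% | 625.7280 | 428.3586 | 10.45% |
| BPNN | 3 | 2.24% | 294.0954 | 254.1759 | 49.09% |
| BPNN | 4 | 1.69% | 262.4757 | 190.5080 | 61.59% |
| MLR | 1 | 5.10% | 847.4493 | 545.5473 | |
| MLR | 2 | 5.37% | 907.5902 | 574.5642 | −5.29% |
| MLR | 3 | 16.70% | 2.4171e+03 | 1.8282e+03 | −227.45% |
| MLR | 4 | 12.18% | 1.6593e+03 | 1.3454e+03 | −138.82% |
| SVM | 1 | 3.59% | 504.9028 | 398.3374 | |
| SVM | 2 | 4.16% | 609.1765 | 738.3157 | −15.88% |
| SVM | 3 | 2.04% | 293.7686 | 230.5830 | 43.18% |
| SVM | 4 | 1.44% | 213.6692 | 163.0783 | 59.89% |
| LSTM | 1 | 4.23% | 738.3157 | 447.6391 | |
| LSTM | 2 | 4.28% | 589.7330 | 485.2420 | −1.18% |
| LSTM | 3 | 3.40% | 477.3984 | 369.1028 | 19.62% |
| LSTM | 4 | 2.93% | 649.7016 | 339.4954 | 30.73% |
| RNN | 1 | 3.84% | 500.2649 | 424.0437 | |
| RNN | 2 | 3.61% | 627.1218 | 387.3984 | 5.99% |
| RNN | 3 | 3.37% | 691.9704 | 394.2134 | 12.24% |
| RNN | 4 | 3.06% | 433.1777 | 337.4889 | 20.31% |
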
Fig. 15

Forecasting performances of different predictive factors using SVM.

Fig. 16

Ranking of the mean impact value for oil production prediction.


Oil consumption forecasting

We collected three online oil news collections with different keywords “Crude oil,” “American oil,” and “American oil consumption.” Table 10 shows the different news collections.
Table 10

Number of news using different keywords.

| Type | Keywords | Number of news in CNN train period | Number of news in CNN test period | Total number |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| Domestic news | American oil consumption | Not enough news | Not enough news | 60 |

Table 11 shows the classification results of the CNN model. The results show that the online oil news collection with the keywords “American oil” achieves better performance. This finding suggests that oil consumption is closely related to oil news or events concerning America. Fig. 17 illustrates the time series of oil consumption and the two CNN values. Fig. 18 shows the time series of oil consumption and the CNN values with keywords “American oil.”
Table 11

Classification performance of the CNN model.

| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | 0.60 | 0.63 | 0.52 | 0.57 |
| American oil | 0.64 | 0.66 | 0.56 | 0.61 |

Fig. 17

Time series of oil consumption and the two CNN values.

Fig. 18

Time series of oil consumption and CNN values with keywords “American oil”.

The results demonstrate that the appropriate lag orders for the text features with keywords “American oil,” historical oil consumption data, DJI, and GDP are one, two, one, and two, respectively. Table 12 shows the descriptive statistics of the CNN values with keywords “American oil,” historical oil consumption, and oil consumption datasets. According to the ADF test, the historical oil consumption and oil consumption datasets reject the unit-root null hypothesis at the 1% significance level and are therefore stationary, whereas the CNN values are non-stationary.
Table 12

Descriptive statistics of CNN values, historical oil consumption, and oil consumption datasets.

| Statistic | CNN | Historical oil consumption | Oil consumption |
|---|---|---|---|
| Mean | 0.4803 | 5084.385 | 5092.151 |
| Median | 0.4841 | 45.8462 | 5120.261 |
| Std. Dev. | 0.0128 | 289.957 | 145.2494 |
| Skewness | −0.6523 | −0.2681 | −0.337 |
| Kurtosis | 0.5070 | 0.2633 | 0.4568 |
| Jarque-Bera | 2.7751 | 0.4555 | 0.8095 |
| Augmented Dickey-Fuller | −2.0265 | −3.8996∗∗∗ | −3.8380∗∗∗ |

Table 13 lists the performance of the different models with different inputs. The best MAPE, RMSE, and MAE are obtained by the BPNN model with text features and historical data. Fig. 19 shows the forecasting performances of different predictive factors using BPNN. Fig. 20 shows the ranking of the mean impact value for oil consumption prediction. The MIV results show that the historical data (0, −2) and text features (0, −1) are important factors, whereas the historical data (−1) contribute little to oil consumption forecasting.
Table 13

Performance comparison of different models.

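| Model | Input | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 4.08% | 364.7069 | 182.6500 | |
| BPNN | 2 | 1.68% | 94.5647 | 80.3324 | 58.82% |
| BPNN | 3 | 2.98% | 176.9245 | 144.6125 | 26.96% |
| BPNN | 4 | 2.93% | 191.2712 | 136.6095 | 28.19% |
| MLR | 1 | 5.61% | 355.8126 | 258.9670 | |
| MLR | 2 | 4.97% | 339.5771 | 228.4414 | 11.41% |
| MLR | 3 | 80.80% | 4.3301e+03 | 3.9944e+03 | −1340.29% |
| MLR | 4 | 27.59% | 1.6708e+03 | 1.2832e+03 | −391.80% |
| SVM | 1 | 4.12% | 329.7182 | 184.5873 | |
| SVM | 2 | 3.24% | 174.3819 | 155.5488 | 21.36% |
| SVM | 3 | 2.85% | 152.4080 | 137.7221 | 30.83% |
| SVM | 4 | 2.78% | 156.5058 | 134.2022 | 32.52% |
| LSTM | 1 | 3.53% | 316.4694 | 157.4753 | |
| LSTM | 2 | 2.55% | 135.8281 | 123.2782 | 27.76% |
| LSTM | 3 | 3.45% | 194.4310 | 161.6923 | 2.27% |
| LSTM | 4 | 3.86% | 277.9089 | 181.9194 | −9.35% |
| RNN | 1 | 3.48% | 314.4836 | 154.8826 | |
| RNN | 2 | 2.72% | 183.2028 | 129.0973 | 21.84% |
| RNN | 3 | 1.91% | 110.1180 | 91.6264 | 45.11% |
| RNN | 4 | 3.41% | 198.0660 | 160.7579 | 2.01% |
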
Fig. 19

Forecasting performances of different predictive factors using BPNN.

Fig. 20

Ranking of the mean impact value for oil consumption prediction.

As shown in Table 13 and Fig. 19, combining historical data with text features gives better forecasting performance than using historical data alone. In terms of IR(MAPE), the BPNN model with text features and historical data improves on the BPNN model with only historical oil consumption by 58.82%. Adding text features to the historical data also improves performance over using only historical oil consumption in the other four methods. This finding suggests that online news may provide additional predictive information for oil consumption forecasting.

Oil inventory forecasting

Three online oil news collections with the keywords “Crude oil,” “American oil,” and “American oil inventory” are collected. Table 14 presents the number of news items in each collection. The collection for the keywords “American oil inventory” contains only 294 pieces of news and does not cover every month.
Table 14

Number of news using different keywords.

| Type | Keywords | Number of news in CNN train period | Number of news in CNN test period | Total number |
|---|---|---|---|---|
| International news | Crude oil | 2408 | 4385 | 6793 |
| Domestic news | American oil | 622 | 839 | 1461 |
| Domestic news | American oil inventory | Not enough news | Not enough news | 294 |

Table 15 presents the classification results of the CNN model. The results show that the online oil news collection with the keywords “Crude oil” achieves better performance. Fig. 21 shows the time series of oil inventory and the two CNN values.
Table 15

Results of CNN classification.

| Keywords | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Crude oil | 0.75 | 0.75 | 0.73 | 0.74 |
| American oil | 0.69 | 0.71 | 0.62 | 0.66 |

Fig. 21

Time series of oil inventory and the two CNN values.

The results of the VAR model demonstrate that the optimal lag orders for text features, historical oil inventory data, DJI, and GDP are two, two, one, and two, respectively. Table 16 presents the descriptive statistics of the CNN values with keywords “Crude oil,” historical oil inventory, and oil inventory datasets. According to the Jarque-Bera test, the historical oil inventory and oil inventory datasets both reject the null hypothesis of normal distribution. According to the ADF test, the CNN values reject the unit-root null hypothesis at the 1% significance level and are therefore stationary, whereas the historical oil inventory and oil inventory datasets are non-stationary.
Table 16

Descriptive statistics of CNN values, historical oil inventory, and oil inventory datasets.

| Statistic | CNN | Historical oil inventory | Oil inventory |
|---|---|---|---|
| Mean | 0.4320 | 1941.638 | 1943.099 |
| Median | 0.4617 | 1923.667 | 1923.667 |
| Std. Dev. | 0.1337 | 10.1227 | 66.5930 |
| Skewness | −0.2266 | 1.1259 | 1.1397 |
| Kurtosis | −0.9941 | 0.8091 | 0.6487 |
| Jarque-Bera | 2.0521 | 8.3552∗∗ | 8.3145∗∗ |
| Augmented Dickey-Fuller | −4.2322∗∗∗ | −1.1972 | −1.6843 |

Table 17 lists the performance of the different models with different inputs. The best MAPE is achieved by the LSTM model with historical data and text features. Fig. 22 shows the forecasting performances of different predictive factors using LSTM. However, the BPNN, MLR, and RNN models generally perform better when only historical data are used. In terms of IR(MAPE), adding text features or financial features yields a worse MAPE than using historical features alone in most cases. Overall, we conclude that oil news and financial features might not help forecast oil inventory.
Table 17

Performance comparison of different models.

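| Model | Input | MAPE | RMSE | MAE | IR(MAPE) |
|---|---|---|---|---|---|
| BPNN | 1 | 1.52% | 40.0705 | 30.6790 | |
| BPNN | 2 | 2.24% | 52.0491 | 45.9824 | −47.37% |
| BPNN | 3 | 1.51% | 44.1664 | 30.2314 | 0.66% |
| BPNN | 4 | 1.66% | 39.5310 | 33.9962 | −9.21% |
| MLR | 1 | 1.60% | 38.7095 | 32.4300 | |
| MLR | 2 | 1.73% | 40.9421 | 35.1122 | −8.1% |
| MLR | 3 | 2.77% | 67.0007 | 56.1855 | −73.13% |
| MLR | 4 | 2.59% | 59.2417 | 52.7921 | −61.88% |
| SVM | 1 | 1.43% | 39.3432 | 28.7629 | |
| SVM | 2 | 1.43% | 39.3953 | 26.4642 | 0 |
| SVM | 3 | 2.00% | 56.6477 | 41.6142 | −39.86% |
| SVM | 4 | 2.38% | 52.1609 | 48.6861 | −66.43% |
| LSTM | 1 | 1.31% | 39.8103 | 26.4642 | |
| LSTM | 2 | 1.14% | 30.4406 | 23.3398 | 12.98% |
| LSTM | 3 | 1.69% | 43.0887 | 34.5260 | −29.01% |
| LSTM | 4 | 1.55% | 40.4426 | 31.5797 | −18.32% |
| RNN | 1 | 1.37% | 40.0492 | 27.5696 | |
| RNN | 2 | 1.58% | 36.1341 | 31.5475 | −15.33% |
| RNN | 3 | 2.10% | 54.8052 | 43.2982 | −53.28% |
| RNN | 4 | 1.66% | 51.7864 | 33.1418 | −21.68% |
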
Fig. 22

Forecasting performances of different predictive factors using LSTM.


Results analysis and managerial implications

Result analysis of text mining

The results show that online oil news collections with the keywords “Crude oil” are conducive to oil price and oil production forecasting, and oil news collections with the keywords “American oil” facilitate consumption forecasting. This finding indicates that the fluctuations of crude oil price, oil production, and oil inventory are influenced by various international events, whereas oil consumption is influenced by domestic events. Classifying the news collected with the keywords “Crude oil” for oil inventory forecasting achieves 75% accuracy. However, the forecasting experiments demonstrate that using oil news might not help predict oil inventory.

News facilitates oil price, production, and consumption forecasting

Oil price, production, and consumption fluctuate violently during the COVID-19 pandemic, and using online media information yields better forecasting performance. Online media information covers recent events related to oil markets and the media sentiment toward short-term oil markets. The results illustrate that online media information can facilitate crude oil price, production, and consumption forecasting, especially during the COVID-19 pandemic. This finding suggests that news might help predict volatile data more than smooth or regular data.

Managerial implications

Affected by the coronavirus disease 2019 (COVID-19) pandemic and the oil price war between Russia and Saudi Arabia from February to March 2020, the oil price fell by a total of 55.57%. The collapse in crude oil price and the decline in oil demand caused by the outbreak of the COVID-19 pandemic sharply reduced petroleum production capacity in the U.S. Oil consumption dropped significantly after the U.S. government adopted lockdown measures. With the reopening policy released in May 2020, oil consumption recovered rapidly. Consequently, forecasting oil price, production, and consumption becomes challenging.

Fortunately, because social media messages contain explanations or analyses of the relevant restrictions or reopening policies, online oil news can help predict the large fluctuations in oil price, production, and consumption during the COVID-19 pandemic. The prediction results for oil price, production, and consumption have helpful implications for marketers. They help marketers deepen their understanding of the internal relationship between social media information and oil markets. Social media information can be used to estimate whether the news is positive or negative for the oil markets. An important positive correlation exists between social media information and the performance of oil markets. In other words, oil markets tend to perform better when social media information shows optimism about oil markets. By contrast, when social media information shows a negative sentiment toward oil markets, oil markets tend to underperform. Therefore, the recent mood of social media information should be considered when carrying out oil marketing activities. Marketers must consider the impact of social media information on the oil markets or other markets, especially during the COVID-19 pandemic.

Conclusion and future research

Owing to the impact of the COVID-19 pandemic, the oil markets have shown profound uncertainty and volatility, which pose a huge challenge to accurate forecasting. To overcome this challenge, this study uses qualitative information to forecast oil markets, as online media information can reflect a variety of oil-related social events or unexpected political events and plays an important role in the fluctuating trend of the oil markets. Accordingly, we use online news as a predictor of oil price, production, consumption, and inventory during the COVID-19 pandemic. The CNN, which is a deep learning model, is employed to extract online news text features automatically. MLR, BPNN, SVM, RNN, and LSTM are used as forecasting techniques. The results demonstrate that social media information contributes to the forecasting of oil price, production, and consumption. However, the forecasting accuracy of oil inventory is not improved by online news information.

The main contribution is the introduction of social media information to forecast oil price, production, and consumption. In particular, this study is the first attempt to use online oil news to forecast oil production and consumption, and excellent forecasting performance is achieved. The findings broaden the theoretical insights for forecasting methodology in that social media information might facilitate the prediction of unstable oil price, production, and consumption during the COVID-19 pandemic.

Some limitations and potential extensions are acknowledged. First, other public opinions on breaking news, press releases, and regulatory announcements can be accessed to distinguish sentiments and extract information. Second, other emerging text classification technologies, such as Graph Neural Network (GNN) and long short-term memory (LSTM), are worth trying [38,39]. Third, to explore the applicability of this study, online social information can also be used to forecast other markets affected by the pandemic, such as stock markets and gold markets. In future research, we will also use the proposed methodology to handle more forecasting problems under complex environments [40].

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.