Data on forecasting energy prices using machine learning.

Gabriel Paes Herrera¹,², Michel Constantino², Benjamin Miranda Tabak³, Hemerson Pistori²,⁴, Jen-Je Su¹, Athula Naranpanawa¹.

Abstract

This article contains the data related to the research article "Long-term forecast of energy commodities price using machine learning" (Herrera et al., 2019). The datasets contain monthly prices of six main energy commodities covering a period of nearly four decades. Four methods are applied: a hybridization of traditional econometric models, artificial neural networks, random forests, and the no-change method. The data are divided in an 80/20 ratio for training and testing, respectively, and RMSE, MAPE, and the M-DM test are used for performance evaluation. Other methods can be applied to the dataset, with the results presented here used as a benchmark.

Keywords: ANN; Coal; Natural gas; Oil

Year: 2019 PMID: 31312697 PMCID: PMC6610706 DOI: 10.1016/j.dib.2019.104122

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Data

The data include monthly prices of six energy commodities, reported in nominal U.S. dollars as period averages and not seasonally adjusted. The commodities were chosen according to their importance for the international energy market: Oil Brent, Oil WTI, Oil Dubai Fateh, Coal AU, Gas US, and Gas Russia. In 2017 the main global primary energy sources were oil (32%), coal (27%), and natural gas (22%) [2]. The description and summary statistics of each commodity are presented in Table 1. In addition, the data contain the log-return of each time series. Fig. 1 shows the price behavior and reveals the non-seasonality of the data in all six cases. We divided the data into two segments, the first 80% for training and the remaining 20% for testing, as suggested by Ref. [3].
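The log-return transformation and the chronological 80/20 split described above can be sketched as follows. This is a minimal Python illustration, not the authors' code (which was written in R). The function names and the sample prices are hypothetical, and the exact rounding rule used to place the split point is an assumption.

```python
import math


def log_returns(prices):
    """Log-return r_t = ln(p_t / p_{t-1}); the result is one element shorter."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def chronological_split(series, train_fraction=0.8):
    """Split a time series in order, without shuffling.

    Assumption: the cut point is rounded down to a whole observation.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical monthly prices, not taken from the dataset.
prices = [20.0, 22.0, 21.0, 25.0, 24.0, 26.0, 27.0, 30.0, 29.0, 31.0]

returns = log_returns(prices)
train, test = chronological_split(prices)
print(len(returns), len(train), len(test))
```

Keeping the observations in time order matters here: a random split would let the model see future prices during training.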

Table 1

Description and summary statistics.

Time series	Description	Period	Min.	Max.	Mean	Std. Dev.
Oil Brent	Crude Oil (petroleum). Dated Brent, light blend 38 API, fob U.K., US$/barrel.	Jan/1980–Jun/2017	9.56	133.90	41.946	30.927
Oil WTI	Crude Oil (petroleum). West Texas Intermediate 40 API, Midland Texas, US$/barrel.	Jan/1980–Jun/2017	11.31	133.93	41.309	27.720
Oil Dubai	Crude Oil (petroleum). Dubai medium Fateh 32 API, US$/barrel.	Jan/1980–Jun/2017	8.50	131.22	39.737	30.246
Coal AU	Australian thermal coal. 12,000 BTU/pound, FOB Newcastle/Port Kembla, US$/metric ton.	Jan/1980–Jun/2017	24.00	195.19	52.475	29.603
Gas US	Natural Gas spot price at the Henry Hub terminal in Louisiana, US$/Million Metric BTU.	Jan/1991–Jun/2017	1.14	13.63	3.875	2.260
Gas Russia	Russian Natural Gas border price in Germany, US$/Million Metric BTU.	Jan/1985–Jun/2017	1.44	16.02	5.097	3.510

Fig. 1

Historical price behavior.

Experimental design, materials, and methods

We performed and evaluated four different methods: a hybridization of traditional econometric models, artificial neural networks, random forests, and the no-change method, as described in [1]. The no-change method assumes that changes in an observed value are unpredictable, so the best forecast is simply the current observed value. It is conventional to compare the performance of models with the no-change method, as it is considered a natural benchmark [4].

The data provided with this article are the original values obtained from the data source, except for the log-returns of each commodity price. However, our analyses rely on a derived database, since the raw information must go through a few preprocessing steps before the machine learning techniques are applied. For the artificial neural network (ANN) we applied first differencing, generated lagged values from one to twelve, and then estimated the model. For the random forests (RF) we created lagged values from one to 72, which serve as predictor variables. These two transformations can be easily applied using any statistical software.

The hybrid model combines autoregressive integrated moving average (ARIMA); error, trend, and seasonality (ets); seasonal and trend decomposition using Loess (stl); exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components (tbats); and the Theta model. We tested the method's performance using equal weights for each individual model as well as optimal weights determined by an algorithm that uses non-rolling time series cross-validation to minimize the root mean square error (RMSE) and set the weight coefficients.

The ANN applied was a feedforward multi-layer perceptron (MLP). An iterative neural filter (INF) was used to capture the number and the period of the seasonalities present in the data and to determine the input vector. As stated by Ref. [5], there is no universally accepted methodology to guide the architecture specification of MLPs. Therefore, different combinations of numbers of hidden layers and nodes were tested to set the optimal structure.

The random forests method operates in three steps: sample fractions of the data, grow a randomized tree predictor on each sample, and then aggregate these predictors by averaging. Different combinations of the number of trees, the number of lagged variables, and the number of variables randomly sampled for splitting at each tree node were tested to set the best architecture.

The methods were evaluated using two statistical loss functions, the mean absolute percentage error (MAPE) and the root mean square error (RMSE). Additionally, to test whether the differences in performance among the methods are statistically significant, we used the modified Diebold-Mariano (M-DM) test as proposed by Ref. [6].
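Several of the building blocks named above (first differencing, lagged predictors, the no-change benchmark, an equal-weight combination, RMSE, and MAPE) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: all function names and sample values are hypothetical, and MAPE is expressed in percent and assumes that no actual value is zero.

```python
import math


def difference(series):
    """First differences: d_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def make_lagged(series, max_lag):
    """Return (X, y): each row of X holds lags 1..max_lag of the target y_t."""
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - k] for k in range(1, max_lag + 1)])
        y.append(series[t])
    return X, y


def no_change_forecast(history, horizon):
    """No-change benchmark: every forecast equals the last observed value."""
    return [history[-1]] * horizon


def equal_weight_combination(forecasts):
    """Average several forecast vectors point by point (equal weights)."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical values, not taken from the dataset.
train = [10.0, 12.0, 11.0, 13.0, 14.0]
test = [15.0, 16.0]

naive = no_change_forecast(train, len(test))
X, y = make_lagged(difference(train), max_lag=2)
print(rmse(test, naive), mape(test, naive))
```

With `max_lag=12` on the differenced series this gives the kind of input described for the ANN, and with `max_lag=72` on the series the kind described for the random forests. Any model's RMSE and MAPE can then be compared against the values obtained for `naive`.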

Specifications Table

Subject area	Economics
More specific subject area	Energy forecasting
Type of data	Tables, Figures and Excel file
How data was acquired	Primary data on historical prices of oil, coal and natural gas were obtained from the International Monetary Fund (IMF)
Data format	Raw, analyzed
Experimental factors	Four forecasting methods were compared using six time series with different sizes
Experimental features	Several parameters were tested for each method. The code was implemented in R
Data source location	International Monetary Fund – IMF, 720 19th street, Washington, D.C., United States of America.
Data accessibility	The data is included in this article
Related research article	G.P. Herrera, M. Constantino, B.M. Tabak, H. Pistori, J. Su, A. Naranpanawa, Long-term forecast of energy commodities price using machine learning, Energy. 179 (2019) 214–221. https://doi.org/10.1016/j.energy.2019.04.077

Value of the data

• The data cover a large period of nearly four decades, which provides enough observations to train and test machine learning algorithms.
• Different methods can be applied to the data and compared to the ones presented here.
• The data can be used to guide policy makers, investors, companies, and others involved in the international energy market.
