Data on forecasting energy prices using machine learning.

Gabriel Paes Herrera, Michel Constantino, Benjamin Miranda Tabak, Hemerson Pistori, Jen-Je Su, Athula Naranpanawa

Abstract

This article contains the data related to the research article "Long-term forecast of energy commodities price using machine learning" (Herrera et al., 2019). The datasets contain monthly prices of six main energy commodities covering a period of nearly four decades. Four methods are applied: a hybridization of traditional econometric models, artificial neural networks, random forests, and the no-change method. The data are divided in an 80-20% ratio for training and testing, respectively, and RMSE, MAPE, and the M-DM test are used for performance evaluation. Other methods can be applied to the dataset and benchmarked against these results.

Keywords:  ANN; Coal; Natural gas; Oil

Year:  2019        PMID: 31312697      PMCID: PMC6610706          DOI: 10.1016/j.dib.2019.104122

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Data

The data include monthly prices, reported in nominal U.S. dollars as period averages and not seasonally adjusted, of six energy commodities chosen for their importance to the international energy market: Oil Brent, Oil WTI, Oil Dubai Fateh, Coal AU, Gas US, and Gas Russia. In 2017 the global primary energy sources were oil (32%), coal (27%), and natural gas (22%) [2]. The description and summary statistics of each commodity are presented in Table 1. In addition, the data contain the log-return of each time series. Fig. 1 shows the price behavior and reveals the non-seasonality of the data in all six cases. We divided the data into two segments, the first 80% for training and the remaining 20% for testing, as suggested by Ref. [3].
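The two preparation steps mentioned above, the log-return and the chronological 80/20 split, can be sketched as follows. This is a minimal illustration, not the authors' code (which was written in R): it assumes the usual definition of the log-return, ln(p_t / p_{t-1}), and the function names and sample prices are hypothetical.

```python
import math


def log_returns(prices):
    """Return r_t = ln(p_t / p_{t-1}) for each consecutive pair of prices.

    Assumption: this is the standard log-return definition; the source
    does not spell out the formula it used.
    """
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def chronological_split(series, train_fraction=0.8):
    """Split a time series in order: the first part trains, the rest tests.

    No shuffling is done, so the test segment is always the most recent data.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Made-up prices for illustration only; these are not values from the dataset.
prices = [20.0, 22.0, 21.0, 25.0, 24.0, 26.0, 27.0, 30.0, 29.0, 31.0]

returns = log_returns(prices)
train, test = chronological_split(prices)
print(len(returns), len(train), len(test))
```

Keeping the split chronological matters for time series: a random split would let the model see observations that come after the ones it is asked to forecast.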
Table 1. Description and summary statistics.

| Time series | Description | Period | Min. | Max. | Mean | Std. Dev. |
|---|---|---|---|---|---|---|
| Oil Brent | Crude Oil (petroleum). Dated Brent, light blend 38 API, fob U.K., US$/barrel. | Jan/1980–Jun/2017 | 9.56 | 133.90 | 41.946 | 30.927 |
| Oil WTI | Crude Oil (petroleum). West Texas Intermediate 40 API, Midland Texas, US$/barrel. | Jan/1980–Jun/2017 | 11.31 | 133.93 | 41.309 | 27.720 |
| Oil Dubai | Crude Oil (petroleum). Dubai medium Fateh 32 API, US$/barrel. | Jan/1980–Jun/2017 | 8.50 | 131.22 | 39.737 | 30.246 |
| Coal AU | Australian thermal coal. 12,000 BTU/pound, FOB Newcastle/Port Kembla, US$/metric ton. | Jan/1980–Jun/2017 | 24.00 | 195.19 | 52.475 | 29.603 |
| Gas US | Natural Gas spot price at the Henry Hub terminal in Louisiana, US$/Million Metric BTU. | Jan/1991–Jun/2017 | 1.14 | 13.63 | 3.875 | 2.260 |
| Gas Russia | Russian Natural Gas border price in Germany, US$/Million Metric BTU. | Jan/1985–Jun/2017 | 1.44 | 16.02 | 5.097 | 3.510 |

Fig. 1

Historical prices behavior.

Experimental design, materials, and methods

We performed and evaluated four different methods: a hybridization of traditional econometric models, artificial neural networks, random forests, and the no-change method, as described in [1]. The no-change method assumes that changes in an observation's value are unpredictable, so the best forecast is simply the current observation value. It is conventional to compare the performance of models against the no-change method, as it is considered a natural benchmark [4].

The data provided with this article are the original values obtained from the data source, except for the log-returns of each commodity price. However, our analyses rely on a processed version of this database, since the raw information needs to go through a few steps before the machine learning techniques are applied. For the artificial neural network (ANN), we applied first differencing, generated lagged values from one to twelve, and then fitted the model. For the random forests (RF), we created lagged values from one to 72, which work as predictor variables. These two transformations can be easily applied using any statistical software.

The hybrid model combines the autoregressive integrated moving average (ARIMA) model; the error, trend, and seasonality (ets) model; seasonal and trend decomposition using Loess (stl); the exponential smoothing state space model with Box-Cox transformation, ARMA errors, trend and seasonal components (tbats); and the Theta model. We tested the method's performance using equal weights for each individual model as well as optimal weights determined by an algorithm that uses non-rolling time series cross-validation to minimize the root mean square error (RMSE) and set the weight coefficients.

The ANN applied was a feedforward multi-layer perceptron (MLP). An iterative neural filter (INF) was used to capture the number and the period of the seasonalities present in the data and to determine the input vector. As stated by Ref. [5], there is no universally accepted methodology to guide the architecture specification of MLPs. Therefore, different combinations of numbers of hidden layers and nodes were tested to set the optimal structure.

The random forests method operates in three steps: sample fractions of the data, grow a randomized tree predictor on each sample, and then aggregate these predictors by averaging. Different combinations of the number of trees, lagged variables, and number of variables randomly sampled for splitting at each tree node were tested to set the best architecture.

The methods were evaluated using two statistical loss functions: the mean absolute percentage error (MAPE) and the root mean square error (RMSE). Additionally, to test whether the differences in performance among the methods are statistically significant, we used the modified Diebold-Mariano (M-DM) test as proposed by Ref. [6].
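The preprocessing and evaluation steps described in this section can be illustrated with a short sketch. It is not the authors' R implementation: the function names and sample values are hypothetical, and the no-change forecast is assumed here to repeat the last training observation over the whole test horizon, which is one common reading of the method.

```python
import math


def first_difference(series):
    """Return the first differences x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def make_lagged(series, max_lag):
    """Build predictors and targets from lagged values.

    Each row of X holds lags 1..max_lag of the matching target in y
    (max_lag would be 12 for the ANN and 72 for the RF in the text).
    """
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - k] for k in range(1, max_lag + 1)])
        y.append(series[t])
    return X, y


def no_change_forecast(last_observed, horizon):
    """Assumed multi-step variant: repeat the last observed value."""
    return [last_observed] * horizon


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Made-up values for illustration only; these are not values from the dataset.
train = [10.0, 12.0, 11.0, 13.0]
test = [14.0, 12.0]

X, y = make_lagged(first_difference(train), max_lag=2)
forecast = no_change_forecast(train[-1], len(test))
print(X, y, rmse(test, forecast), mape(test, forecast))
```

Any candidate model can be scored with the same `rmse` and `mape` functions on the test segment and compared against the no-change benchmark. The M-DM significance test is not covered by this sketch.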

Specifications Table

Subject area:  Economics
More specific subject area:  Energy forecasting
Type of data:  Tables, figures, and Excel file
How data was acquired:  Primary data on historical prices of oil, coal, and natural gas were obtained from the International Monetary Fund (IMF)
Data format:  Raw, analyzed
Experimental factors:  Four forecasting methods were compared using six time series of different lengths
Experimental features:  Several parameters were tested for each method. The code was implemented in R
Data source location:  International Monetary Fund (IMF), 720 19th Street, Washington, D.C., United States of America
Data accessibility:  The data is included in this article
Related research article:  G.P. Herrera, M. Constantino, B.M. Tabak, H. Pistori, J. Su, A. Naranpanawa, Long-term forecast of energy commodities price using machine learning, Energy. 179 (2019) 214–221. https://doi.org/10.1016/j.energy.2019.04.077

Value of the data

The data cover a large period of nearly four decades, which provides enough observations to train and test machine learning algorithms.

Different methods can be applied to the data and compared to the ones presented here.

The data can be used to guide policy makers, investors, companies, and others involved in the international energy market.
