Spyros Makridakis, Evangelos Spiliotis, Vassilios Assimakopoulos.
Abstract
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
Year: 2018 PMID: 29584784 PMCID: PMC5870978 DOI: 10.1371/journal.pone.0194889
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
sMAPE across the 3003 time series of the M3 competition.
| Method | Horizon 1 | Horizon 2 | Horizon 3 | Horizon 4 | Horizon 5 | Horizon 6 | Horizon 8 | Horizon 12 | Horizon 15 | Horizon 18 | Average 1–4 | Average 1–6 | Average 1–8 | Average 1–12 | Average 1–15 | Average 1–18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Theta | 8.4 | 9.6 | 11.3 | 12.5 | 13.2 | 14.0 | 12.0 | 13.2 | 16.2 | 18.2 | 10.4 | 11.5 | 11.6 | 12.0 | 12.4 | 13.0 |
| Damped | 8.8 | 10.0 | 12.0 | 13.5 | 13.7 | 14.3 | 12.5 | 13.9 | 17.5 | 18.9 | 11.1 | 12.0 | 12.1 | 12.4 | 13.0 | 13.6 |
| Box-Jenkins | 9.2 | 10.4 | 12.2 | 13.9 | 14.0 | 14.8 | 13.0 | 14.1 | 17.8 | 19.3 | 11.4 | 12.4 | 12.5 | 12.8 | 13.4 | 14.0 |
| Single | 9.5 | 10.6 | 12.7 | 14.1 | 14.3 | 15.0 | 13.3 | 14.5 | 18.3 | 19.4 | 11.7 | 12.7 | 12.8 | 13.1 | 13.7 | 14.4 |
| Holt | 9.0 | 10.4 | 12.8 | 14.5 | 15.1 | 15.8 | 13.9 | 14.8 | 18.8 | 20.2 | 11.7 | 12.9 | 13.1 | 13.4 | 14.0 | 14.6 |
| Naive 2 | 10.5 | 11.3 | 13.6 | 15.1 | 15.1 | 15.9 | 14.5 | 16.0 | 19.3 | 20.7 | 12.6 | 13.6 | 13.8 | 14.2 | 14.8 | 15.5 |
NNs are defined with bold numbers.
sMAPE and ranks of errors on the complete dataset of the C-H-N study.
| Method | sMAPE(%) | MdRAE | MASE | AR | Rank: sMAPE(%) | Rank: MdRAE | Rank: MASE | Rank: AR |
|---|---|---|---|---|---|---|---|---|
| Theta | 14.89 | 0.88 | 1.13 | 17.8 | 2 | 3 | 1 | 2 |
| ForecastPro | 15.44 | 0.89 | 1.17 | 18.2 | 4 | 4 | 3 | 3 |
| DES | 15.90 | 0.94 | 1.17 | 18.9 | 5 | 14 | 3 | 6 |
| Comb S-H-D | 15.93 | 0.90 | 1.21 | 18.8 | 6 | 5 | 7 | 5 |
| Autobox | 15.95 | 0.93 | 1.18 | 19.2 | 7 | 11 | 5 | 7 |
| SES | 16.42 | 0.96 | 1.21 | 19.6 | 9 | 16 | 7 | 12 |
NNs are defined with bold numbers.
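The tables report accuracy as sMAPE and MASE. The sketch below shows how these two measures are commonly computed. It assumes the widely used definitions (sMAPE as the mean of 200·|y − f| / (|y| + |f|), MASE as the out-of-sample mean absolute error scaled by the in-sample error of a naive forecast). The function names and the lag parameter `m` are illustrative and not taken from the paper.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric MAPE in percent, averaged over the forecast horizon."""
    terms = [
        200.0 * abs(a - f) / (abs(a) + abs(f))  # undefined when a == f == 0
        for a, f in zip(actual, forecast)
    ]
    return sum(terms) / len(terms)


def mase(insample: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int = 1) -> float:
    """Mean absolute scaled error.

    The scale is the in-sample mean absolute error of the naive forecast
    y[t - m]; m = 1 is an assumption here (m = 12 would be the seasonal
    naive forecast for monthly data).
    """
    n = len(insample)
    scale = sum(abs(insample[t] - insample[t - m]) for t in range(m, n)) / (n - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


if __name__ == "__main__":
    history = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(smape([100.0], [300.0]))            # 100.0
    print(mase(history, [6.0, 7.0], [5.0, 5.0]))  # 1.5
```

A MASE below 1 means the forecasts are, on average, more accurate than the in-sample naive benchmark used for scaling.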
Forecasting performance (sMAPE) of the ML methods tested in the study of Ahmed et al.
| Rank | Method | sMAPE(%) |
|---|---|---|
| 1 | MLP | 8.34 |
| 2 | BNN | 8.58 |
| 3 | GP | 9.62 |
| 4 | GRNN | 10.33 |
| 5 | KNN | 10.34 |
| 6 | SVR | 10.40 |
| 7 | CART | 11.72 |
| 8 | RBF | 15.79 |
Forecasting performance of MLP for one-step-ahead forecasts having applied various preprocessing alternatives.
| Approach | sMAPE(%) | MASE | CC | MF |
|---|---|---|---|---|
| | 9.15 | 0.67 | 90.24 | 2.80 |
| | 8.99 | 0.67 | 88.04 | 3.06 |
| | 8.97 | 0.67 | 88.07 | 2.99 |
| | 10.43 | 0.65 | 87.05 | 2.67 |
| | 11.87 | 0.86 | 85.02 | 2.95 |
| | 8.16 | 0.57 | 93.31 | 2.10 |
| | | 0.56 | 88.54 | 2.16 |
| | 9.56 | 0.56 | 84.44 | |
| | 9.07 | 0.64 | 88.78 | 2.82 |
| | 8.39 | | | 2.11 |
The bold numbers highlight the best performing approach per metric.
Forecasting performance of the eight statistical methods included in the study for one-step-ahead forecasts using the original data.
| Method | sMAPE(%) | MASE | CC | MF |
|---|---|---|---|---|
| | 8.59 | 0.56 | 1.00 | 3.63 |
| | 7.36 | 0.49 | 1.53 | 2.37 |
| | 7.41 | 0.48 | 2.31 | 2.35 |
| | 7.30 | 0.48 | 3.96 | 2.34 |
| | 7.27 | 0.48 | 6.88 | 2.32 |
| | 7.31 | 0.48 | 5.84 | 2.34 |
| | 7.34 | 0.47 | 43.96 | 2.53 |
| | 7.19 | 0.47 | 34.07 | 2.28 |
Forecasting performance of the eight statistical methods included in the study for one-step-ahead forecasts having applied the Box-Cox transformation.
| Method | sMAPE(%) | MASE | CC | MF |
|---|---|---|---|---|
| | 8.58 | 0.56 | 1.28 | 3.66 |
| | 7.25 | 0.49 | 1.69 | 2.38 |
| | 7.32 | 0.48 | 2.45 | 2.35 |
| | 7.19 | 0.48 | 4.54 | 2.33 |
| | 7.20 | 0.48 | 7.23 | 2.32 |
| | 7.23 | 0.48 | 5.75 | 2.36 |
| | 7.19 | 0.47 | 46.56 | 2.59 |
| | 7.12 | 0.47 | 35.55 | 2.30 |
Forecasting performance of the ten ML methods included in the study for one-step-ahead forecasts having applied the most appropriate preprocessing alternative.
| Method | sMAPE(%) | MASE | CC | MF | sMAPE(%) reported by Ahmed et al. |
|---|---|---|---|---|---|
| | 8.39 | 0.55 | 83.49 | 2.11 | 8.34 |
| | 8.17 | 0.53 | 47.44 | 2.11 | 8.56 |
| | 9.57 | 0.71 | 146.11 | 1.66 | 15.79 |
| | 9.49 | 0.67 | 388.73 | 1.80 | 10.33 |
| | 11.49 | 0.80 | 12.01 | 3.30 | 10.34 |
| | 10.28 | 0.74 | 8.89 | 1.74 | 11.72 |
| | 8.88 | 0.61 | 9.79 | 2.11 | 10.40 |
| | 9.14 | 0.62 | 29.39 | 2.09 | 9.62 |
| | 9.48 | 0.54 | 23.18 | 1.98 | |
| | 11.67 | 0.72 | 48.66 | 1.84 | |
The corresponding accuracies of Ahmed and coauthors, from their Table 3 p. 611 are also shown for reasons of comparison.
Fig 1. The three possible multi-step-ahead forecasting approaches used by NNs.
(a) The iterative, (b) the direct and (c) the multi-neural network forecasting approach.
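The three approaches named in the caption can be sketched in a few lines. This is a minimal illustration under an assumed reading of the terms: the iterative approach reuses a single one-step model and feeds its forecasts back as inputs, the direct approach trains one model that outputs all h values at once, and the multi-model approach trains a separate model for each horizon. A nearest-neighbour regressor stands in for the neural network so that the example has no dependencies. All function names are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def make_pairs(series: Sequence[float], window: int, horizon: int) -> List[Pair]:
    """Build (input window, next `horizon` values) training pairs."""
    pairs = []
    for t in range(len(series) - window - horizon + 1):
        x = list(series[t:t + window])
        y = list(series[t + window:t + window + horizon])
        pairs.append((x, y))
    return pairs


def knn_predict(pairs: List[Pair], query: Sequence[float], k: int = 1) -> List[float]:
    """Average the targets of the k training windows closest to `query`."""
    ranked = sorted(
        pairs, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query))
    )
    nearest = [y for _, y in ranked[:k]]
    return [sum(col) / len(nearest) for col in zip(*nearest)]


def iterative_forecast(series: Sequence[float], window: int, h: int, k: int = 1) -> List[float]:
    """(a) One one-step model; each forecast is appended and reused as input."""
    pairs = make_pairs(series, window, 1)
    history = list(series)
    out = []
    for _ in range(h):
        nxt = knn_predict(pairs, history[-window:], k)[0]
        out.append(nxt)
        history.append(nxt)
    return out


def direct_forecast(series: Sequence[float], window: int, h: int, k: int = 1) -> List[float]:
    """(b) One model whose output is the whole h-step vector."""
    pairs = make_pairs(series, window, h)
    return knn_predict(pairs, list(series)[-window:], k)


def multi_model_forecast(series: Sequence[float], window: int, h: int, k: int = 1) -> List[float]:
    """(c) One separate model per horizon, each predicting a single step."""
    out = []
    for step in range(1, h + 1):
        # Keep only the value `step` periods ahead as the training target.
        pairs = [(x, [y[-1]]) for x, y in make_pairs(series, window, step)]
        out.append(knn_predict(pairs, list(series)[-window:], k)[0])
    return out


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0] * 6  # perfectly periodic toy series
    print(iterative_forecast(series, window=4, h=4))
    print(direct_forecast(series, window=4, h=4))
    print(multi_model_forecast(series, window=4, h=4))
```

On the periodic toy series all three approaches return the same continuation. On noisy data they generally differ, because the iterative approach compounds its own errors while the other two never see forecasts as inputs.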
Forecasting performance (sMAPE) of ML and Statistical methods across various horizons having applied the most appropriate preprocessing alternative.
| Method | Short | Medium | Long | Average | CC |
|---|---|---|---|---|---|
| | 9.53 | 12.34 | 15.00 | 12.29 | 245.58 |
| | 10.72 | 13.55 | 16.20 | 13.49 | 438.53 |
| | 9.53 | 12.69 | 16.08 | 12.77 | 4006.82 |
| | 9.39 | 12.08 | 14.80 | 12.09 | 141.91 |
| | 9.48 | 12.70 | 15.96 | 12.71 | 2046.49 |
| | 10.78 | 12.46 | 15.08 | 12.77 | |
| | 9.17 | 10.85 | 13.77 | 11.26 | 1.60 |
| | 9.07 | 11.18 | 14.29 | 11.51 | 1.75 |
| | 8.96 | 10.63 | 13.46 | 11.02 | 2.07 |
| | 8.95 | 10.57 | 13.38 | 10.97 | 2.65 |
| | 8.96 | | | | 1.70 |
| | | 11.08 | 13.84 | 11.28 | 73.50 |
| | 9.07 | 10.98 | 13.74 | 11.26 | 56.66 |
The bold numbers highlight the best performing method per forecasting horizon and computational complexity.
Forecasting performance (MASE) of ML and statistical methods across various horizons having applied the most appropriate preprocessing alternative.
| Method | Short | Medium | Long | Average | CC |
|---|---|---|---|---|---|
| 0.66 | 0.98 | 1.24 | 0.96 | 245.58 | |
| 0.76 | 1.10 | 1.38 | 1.08 | 438.53 | |
| 0.65 | 1.02 | 1.37 | 1.01 | 4006.82 | |
| 0.64 | 0.94 | 1.20 | 0.93 | 141.91 | |
| 0.65 | 1.02 | 1.35 | 1.01 | 2046.49 | |
| 0.76 | 1.05 | 1.35 | 1.05 | ||
| 0.67 | 0.96 | 1.29 | 0.97 | 1.60 | |
| 0.64 | 0.92 | 1.25 | 0.94 | 1.75 | |
| 0.64 | 0.91 | 1.21 | 0.92 | 2.07 | |
| 0.64 | 0.90 | 1.20 | 0.91 | 2.65 | |
| 0.64 | 0.90 | 1.70 | |||
| 0.89 | 1.17 | 73.50 | |||
| 0.64 | 0.92 | 1.21 | 0.92 | 56.66 |
The bold numbers highlight the best performing method per forecasting horizon and computational complexity.
Fig 2. Forecasting performance (sMAPE) of the ML and statistical methods included in the study.
The results are reported for one-step-ahead forecasts having applied the most appropriate preprocessing alternative.
Fig 3. Forecasting performance (sMAPE) versus model fitting.
The results are reported for one-step-ahead forecasts having applied the most appropriate preprocessing alternative.
Fig 4. Forecasting performance (sMAPE) versus computational complexity.
The results are reported for one-step-ahead forecasts having applied the most appropriate preprocessing alternative.
Features of various Artificial Intelligence (AI) applications.
| Type of Application | Rules are known and do not change | The environment is known and stable | Predictions can influence the future | Extent of Uncertainty (or amount of noise) | Examples |
|---|---|---|---|---|---|
| Games | Yes | Yes | No | None | Chess, GO |
| Image and speech recognition | Yes | Yes | No | Minimal (can be minimized by big data) | Face Recognition, Siri, Cortana, Google AI |
| Predictions based on the Law of large numbers | Yes | Yes | Minimally | Measurable (Normally distributed) | Forecasting the sales of beer, coffee, soft drinks, weather etc. |
| Autonomous Functions | Yes | Yes | No | Can be assessed and minimized | Self-Driving Vehicles |
| Strategy, Competition, Investments | No | No | Yes, often to a great extent | Cannot be measured (fat tails) | Decisions, Anticipations, Forecasts |
| Combinations of the above | It may be the ultimate challenge moving towards GAI (General AI) but also increasing the level of complexity and sophistication of algorithms | Eventually it can cover everything | |||