Yasminah Alali1, Fouzi Harrou2, Ying Sun1.
Abstract
This study aims to develop an assumption-free, data-driven model to accurately forecast the spread of COVID-19. To this end, we first employed Bayesian optimization to tune the Gaussian process regression (GPR) hyperparameters and obtain an efficient GPR-based model for forecasting the recovered and confirmed COVID-19 cases in two highly impacted countries, India and Brazil. However, machine learning models do not, by themselves, account for the time dependency in the COVID-19 data series. To alleviate this limitation, dynamic information was taken into account by introducing lagged measurements when constructing the investigated machine learning models. Additionally, we assessed the contribution of the incorporated features to the COVID-19 prediction using the Random Forest algorithm. The results reveal that the proposed dynamic machine learning models yield a significant improvement. The dynamic GPR also outperformed the other models (i.e., support vector regression, boosted trees, bagged trees, decision tree, Random Forest, and XGBoost), achieving an averaged mean absolute percentage error of around 0.1%. Finally, we provided the confidence level of the predicted results based on the dynamic GPR model and showed that the predictions are within the 95% confidence interval. This study presents a promising shallow and simple approach for predicting COVID-19 spread.
Year: 2022 PMID: 35165290 PMCID: PMC8844088 DOI: 10.1038/s41598-022-06218-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic presentation of the machine learning-based forecasting framework.
Figure 2. Number of (a) confirmed and (b) recovered COVID-19 cases from January 22, 2020, through June 12, 2021, in Brazil and India.
Summary statistics of the COVID-19 time-series dataset.
| Series | Q1 | Median | Mean | Q3 | Std | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Confirm India | 161736 | 6433806 | 7146174.35 | 10820333.5 | 7549037.149 | 1.169865194 | 3.997238807 |
| Confirm Brazil | 425029.5 | 4847092 | 5690061.496 | 9447165 | 5246461.833 | 0.61590352 | 2.19566924 |
| Recovered India | 69334.5 | 5389892 | 6454191.404 | 10516698.5 | 6870596.966 | 1.117660039 | 3.972183915 |
| Recovered Brazil | 172125.5 | 4299659 | 4965903.071 | 8412570 | 4689476.608 | 0.604219818 | 2.150912456 |
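The descriptive statistics in the table above can be computed along the following lines. This is a minimal sketch, not the authors' code: the function name `summarize`, the quartile interpolation rule, and the non-excess kurtosis convention (a normal distribution gives 3) are assumptions, so values may differ slightly from those produced by the tool used in the study.

```python
import statistics


def summarize(series):
    """Return Q1, median, mean, Q3, std, skewness and kurtosis of a numeric series."""
    n = len(series)
    mean = statistics.fmean(series)
    # Quartiles; the "inclusive" interpolation rule is an assumption.
    q1, median, q3 = statistics.quantiles(series, n=4, method="inclusive")
    std = statistics.stdev(series)  # sample standard deviation (n - 1 denominator)
    # Central moments with a 1/n denominator.
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return {
        "q1": q1,
        "median": median,
        "mean": mean,
        "q3": q3,
        "std": std,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis (assumed convention)
    }
```

A positive skewness, as reported for all four series, indicates a longer right tail, which is expected for cumulative counts that grow over time.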
Figure 3. Boxplots of the daily number of confirmed and recovered COVID-19 time-series datasets in India and Brazil.
Figure 4. Sample autocorrelation function of the confirmed and recovered COVID-19 time-series datasets in India and Brazil.
Figure 5. BO-based optimized GPR procedure.
Forecasting methods investigated in this study.
| Model approach | Model name | Model description | Kernel function |
|---|---|---|---|
| Support Vector Regression (SVR) | SVR_L | SVR with the Linear kernel | |
| | SVR_Q | SVR with the Quadratic kernel | |
| | SVR_C | SVR with the Cubic kernel | |
| | SVR_FG | SVR with the Fine Gaussian kernel | |
| | SVR_MG | SVR with the Medium Gaussian kernel | |
| | SVR_CG | SVR with the Cubic Gaussian kernel | |
| Gaussian Process Regression (GPR) | GP_RQ | GPR with the Rational Quadratic kernel | |
| | GP_SE | GPR with the Squared Exponential kernel | |
| | GP_M52 | GPR with the Matern 5/2 kernel | |
| | GP_Exp | GPR with the Exponential kernel | |
| Ensemble Learning (EL) | BST | Boosted Trees | |
| | BT | Bagged Trees | |
| | RF | Random Forest | |
| | XGBoost | eXtreme Gradient Boosting | |
| Optimised models | OSVR | Optimized SVR | |
| | OGPR | Optimized GPR | |
| | OEL | Optimized EL | |
in the GPR-based kernel function.
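For reference, the four GPR kernels named in the table above have the following standard textbook forms, written as functions of the distance `r` between two inputs. This is a sketch under assumptions: the parameter names `sigma_f` (signal standard deviation), `length` (length scale) and `alpha` (scale-mixture parameter) are chosen here for illustration, and the parameterization used in the study may differ.

```python
import math


def squared_exponential(r, sigma_f=1.0, length=1.0):
    """Squared exponential kernel: very smooth sample paths."""
    return sigma_f ** 2 * math.exp(-0.5 * (r / length) ** 2)


def exponential(r, sigma_f=1.0, length=1.0):
    """Exponential kernel: rough, non-differentiable sample paths."""
    return sigma_f ** 2 * math.exp(-r / length)


def matern52(r, sigma_f=1.0, length=1.0):
    """Matern 5/2 kernel: smoothness between exponential and squared exponential."""
    s = math.sqrt(5.0) * r / length
    return sigma_f ** 2 * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)


def rational_quadratic(r, sigma_f=1.0, length=1.0, alpha=1.0):
    """Rational quadratic kernel: a mixture of squared exponentials over length scales."""
    return sigma_f ** 2 * (1.0 + r ** 2 / (2.0 * alpha * length ** 2)) ** (-alpha)
```

All four kernels equal `sigma_f ** 2` at `r = 0` and decay as `r` grows; the rational quadratic kernel approaches the squared exponential kernel as `alpha` becomes large.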
Figure 6. Illustration of the forecasting framework.
Figure 7. Procedure for restructuring the univariate COVID-19 time series into input-output pairs.
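The restructuring step described in this caption can be sketched as a sliding window over the series. This is an illustrative sketch rather than the authors' implementation: the function name `make_lagged_pairs` and the parameter `n_lags` are hypothetical. With `n_lags=1` each input is the previous day's value; larger values add further lagged measurements as extra input features.

```python
def make_lagged_pairs(series, n_lags=1):
    """Turn a univariate series into (inputs, targets) for one-step-ahead forecasting."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))  # the n_lags values preceding time t
        targets.append(series[t])                  # the value to predict at time t
    return inputs, targets


# Usage with made-up cumulative counts (not data from the study).
X, y = make_lagged_pairs([10, 20, 30, 40, 50], n_lags=2)
```

Each row of `X` can then be fed to any regression model, with the matching entry of `y` as the training target.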
Hyperparameter search ranges and optimized hyperparameters obtained with the BO algorithm.
| Model | Hyperparameter | Search range | Optimized value |
|---|---|---|---|
| SVRO | Box constraint | 0.001-1000 | 1.7128 |
| | Kernel scale | 0.001-1000 | 1 |
| | Epsilon | 0.18495-18495.1816 | 1.3156 |
| | Kernel function | Gaussian, Linear, Quadratic, Cubic | Cubic |
| | Standardize data | true, false | true |
| GPRO | Sigma | 0.0001-1441.9316 | 1217.1288 |
| | Basis function | Constant, Zero, Linear | Linear |
| | Kernel function | Exponential, Matern 5/2, Rational Quadratic, Squared Exponential | Matern 5/2 |
| | Kernel scale | 0.498-498 | 493.0376 |
| | Standardize | true, false | false |
| ELO | Ensemble method | Bag, LSBoost | LSBoost |
| | Number of learners | 10-500 | 11 |
| | Learning rate | 0.001-1 | 0.98438 |
| | Minimum leaf size | 1-249 | 2 |
| | Number of predictors to sample | 1-2 | 2 |
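To make the search space concrete, the GPRO ranges from the table above can be written as a configuration and sampled. Note the substitution: the sketch below uses plain random search as a simple stand-in, not Bayesian optimization, which would instead fit a surrogate model to past evaluations to propose each new candidate. The names `GPR_SPACE`, `sample_config` and `random_search`, the log-scale sampling of the numeric ranges, and the `objective` callback are all assumptions made for illustration.

```python
import math
import random

# Search space copied from the GPRO rows of the table above.
GPR_SPACE = {
    "sigma": ("log", 0.0001, 1441.9316),
    "kernel_scale": ("log", 0.498, 498.0),
    "basis_function": ("choice", ["constant", "zero", "linear"]),
    "kernel_function": ("choice", ["exponential", "matern52",
                                   "rational_quadratic", "squared_exponential"]),
    "standardize": ("choice", [True, False]),
}


def sample_config(space, rng):
    """Draw one configuration; numeric ranges are sampled on a log scale (assumed)."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "log":
            _, low, high = spec
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[name] = rng.choice(spec[1])
    return config


def random_search(objective, space, n_trials=30, seed=0):
    """Return the sampled configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_config, best_score = None, math.inf
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = objective(config)  # e.g. a cross-validated forecasting error
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In practice `objective` would train a GPR model with the sampled settings and return its validation error; here it is left to the caller.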
Figure 8. Records and forecasts of (a) confirmed and (b) recovered COVID-19 cases in India for the testing period, using the fifteen machine learning methods.
Figure 9. Records and forecasts of (a) confirmed and (b) recovered COVID-19 cases in Brazil for the testing period, using the fifteen machine learning methods.
Statistical criteria obtained for the confirmed and recovered COVID-19 case forecasts in India.
| Series | Model | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Confirm India | SVRO | 22337053.113 | 22334295.732 | 38.960 |
| Confirm India | SVR | 1012357.846 | 1008215.980 | 3.365 |
| Confirm India | SVR | 1382637.913 | 1369677.347 | 4.967 |
| Confirm India | SVR | 6967701.735 | 6507706.629 | 30.360 |
| Confirm India | SVR | 759414.577 | 759392.528 | 2.697 |
| Confirm India | SVR | 2356188.304 | 2280681.541 | 8.581 |
| Confirm India | SVR | 1024262.932 | 1019713.136 | 3.401 |
| Confirm India | GPR | 37398.517 | 32479.864 | 0.112 |
| Confirm India | GPR | 36208.928 | 30442.130 | 0.105 |
| Confirm India | GPR | 14350.001 | 12258.416 | 0.072 |
| Confirm India | GPR | 972208.519 | 905902.811 | 3.233 |
| Confirm India | GPRO | 111506.899 | 108951.780 | 0.374 |
| Confirm India | BT | 2005011.686 | 1974055.703 | 7.325 |
| Confirm India | BS | 2779609.635 | 2757363.556 | 10.538 |
| Confirm India | ELO | 1625388.219 | 1587044.713 | 5.806 |
| Confirm India | RF | 956649.730 | 853385.714 | 3.053 |
| Confirm India | XGBoost | 874210.275 | 759823.714 | 2.709 |
| Recovered India | SVRO | 21100097.634 | 21087507.583 | 11.320 |
| Recovered India | SVR | 1125965.648 | 1107063.877 | 3.919 |
| Recovered India | SVR | 1771552.327 | 1718529.593 | 6.779 |
| Recovered India | SVR | 10306345.167 | 9339278.915 | 59.969 |
| Recovered India | SVR | 754472.349 | 754424.474 | 2.877 |
| Recovered India | SVR | 3670129.936 | 3373505.792 | 14.480 |
| Recovered India | SVR | 1179022.579 | 1155346.231 | 4.080 |
| Recovered India | GPR | 167795.963 | 143454.775 | 0.527 |
| Recovered India | GPR | 30214.921 | 23379.830 | 0.085 |
| Recovered India | GPR | 54374.745 | 48482.147 | 0.178 |
| Recovered India | GPR | 1524148.063 | 1336405.700 | 5.208 |
| Recovered India | GPRO | 58832.766 | 46691.520 | 0.052 |
| Recovered India | BT | 3078226.027 | 2990707.504 | 11.681 |
| Recovered India | BS | 3467540.137 | 3390087.093 | 14.359 |
| Recovered India | ELO | 1871290.946 | 1723538.715 | 6.127 |
| Recovered India | RF | 1688270.003 | 1522862.929 | 5.977 |
| Recovered India | XGBoost | 1496590.266 | 1307148.929 | 5.088 |
Statistical criteria obtained for the confirmed and recovered COVID-19 case forecasts in Brazil.
| Series | Model | RMSE | MAE | MAPE |
|---|---|---|---|---|
| Confirm Brazil | SVRO | 178495.629 | 176859.741 | 1.055 |
| Confirm Brazil | SVR | 859664.899 | 846897.020 | 4.749 |
| Confirm Brazil | SVR | 856493.084 | 849454.839 | 5.279 |
| Confirm Brazil | SVR | 2681791.796 | 2400829.100 | 17.153 |
| Confirm Brazil | SVR | 658423.358 | 657587.460 | 4.041 |
| Confirm Brazil | SVR | 1201574.167 | 1157539.469 | 7.350 |
| Confirm Brazil | SVR | 100175.382 | 92240.318 | 0.552 |
| Confirm Brazil | GPR | 22347.367 | 20542.504 | 0.122 |
| Confirm Brazil | GPR | 36548.399 | 29617.766 | 0.175 |
| Confirm Brazil | GPR | 25452.517 | 22951.994 | 0.136 |
| Confirm Brazil | GPR | 499989.497 | 426469.985 | 2.585 |
| Confirm Brazil | GPRO | 22821.043 | 21485.641 | 0.127 |
| Confirm Brazil | BT | 819117.717 | 776873.997 | 4.811 |
| Confirm Brazil | BS | 1414426.255 | 1390388.795 | 8.951 |
| Confirm Brazil | ELO | 1471998.737 | 1448916.717 | 3.363 |
| Confirm Brazil | RF | 503458.241 | 431334.642 | 2.615 |
| Confirm Brazil | XGBoost | 484044.544 | 408507.642 | 2.474 |
| Recovered Brazil | SVRO | 30627.182 | 30583.310 | 7.618 |
| Recovered Brazil | SVR | 167774.226 | 167591.124 | 4.806 |
| Recovered Brazil | SVR | 151929.129 | 151847.006 | 5.351 |
| Recovered Brazil | SVR | 259341.166 | 254940.374 | 18.745 |
| Recovered Brazil | SVR | 129890.658 | 129878.098 | 3.400 |
| Recovered Brazil | SVR | 181760.427 | 181092.640 | 7.876 |
| Recovered Brazil | SVR | 30362.953 | 30188.755 | 1.473 |
| Recovered Brazil | GPR | 1719.567 | 1558.378 | 0.241 |
| Recovered Brazil | GPR | 1656.806 | 1525.704 | 0.247 |
| Recovered Brazil | GPR | 1667.333 | 1508.890 | 0.188 |
| Recovered Brazil | GPR | 40549.196 | 37897.498 | 2.935 |
| Recovered Brazil | GPRO | 1667.329 | 1508.883 | 0.188 |
| Recovered Brazil | BT | 84573.683 | 83349.510 | 4.581 |
| Recovered Brazil | BS | 247302.171 | 246886.202 | 9.251 |
| Recovered Brazil | ELO | 36794.352 | 33885.945 | 3.489 |
| Recovered Brazil | RF | 26978.983 | 18038.854 | 2.452 |
| Recovered Brazil | XGBoost | 57643.591 | 36789.043 | 3.117 |
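The three criteria reported in the tables above follow their usual definitions, sketched below. The function names are illustrative, and MAPE is expressed in percent on the assumption that this matches the tables, where the best models score well below 1.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the data (number of cases)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Because RMSE and MAE scale with the magnitude of the series, MAPE is the more convenient criterion for comparing results across countries with different case counts.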
Figure 10. Heatmap of MAPE values obtained using the seventeen models.
Figure 11. Averaged MAPE per model.
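As a worked example of the per-model averaging, the snippet below takes the MAPE values of the uniquely labeled models from the India and Brazil tables above and averages them over the four series. Treating the average as an unweighted mean over the four series is an assumption; the six SVR and four GPR kernel variants are omitted because their rows are not individually labeled in the tables.

```python
from statistics import fmean

# MAPE (%) copied from the two tables above, in the order:
# confirmed India, recovered India, confirmed Brazil, recovered Brazil.
MAPE_BY_MODEL = {
    "SVRO": [38.960, 11.320, 1.055, 7.618],
    "GPRO": [0.374, 0.052, 0.127, 0.188],
    "ELO": [5.806, 6.127, 3.363, 3.489],
    "RF": [3.053, 5.977, 2.615, 2.452],
    "XGBoost": [2.709, 5.088, 2.474, 3.117],
}

# Unweighted mean over the four series (assumed averaging scheme).
averaged = {model: fmean(values) for model, values in MAPE_BY_MODEL.items()}

# Models ordered from lowest (best) to highest averaged MAPE.
ranking = sorted(averaged, key=averaged.get)
```

Under this assumption the optimized GPR averages about 0.185%, well below the next best of these five models, XGBoost, at about 3.35%.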
Figure 12. Process of dataset preparation for dynamic models.
Figure 13. Heatmap of MAPE values by method for (a) confirmed and (b) recovered COVID-19 time series in India.
Figure 14. Heatmap of MAPE values by method for (a) confirmed and (b) recovered COVID-19 time series in Brazil.
Figure 15. Averaged MAPE values per model.
Figure 16. RF-based feature importance for each time series.
Figure 17. One-step-ahead prediction boundaries for (a) confirmed cases and (b) recovered cases in India with the GPR model.
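Prediction boundaries of this kind are typically derived from the GPR predictive mean and standard deviation. The sketch below assumes a Gaussian predictive distribution, for which a 95% interval is the mean plus or minus 1.96 standard deviations; the function names and the `coverage` check are illustrative and not taken from the study.

```python
def prediction_interval(mean, std, z=1.96):
    """Lower and upper bounds of a symmetric interval; z=1.96 gives 95% for a Gaussian."""
    return mean - z * std, mean + z * std


def coverage(actual, means, stds, z=1.96):
    """Fraction of observed values that fall inside their prediction interval."""
    inside = 0
    for y, m, s in zip(actual, means, stds):
        low, high = prediction_interval(m, s, z)
        if low <= y <= high:
            inside += 1
    return inside / len(actual)
```

A coverage close to 1.0 on the test period would correspond to the statement that the predictions lie within the 95% confidence interval.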
Summary of different studies on COVID-19 spread prediction.
| Refs | Country | Model | Average MAPE (%) |
|---|---|---|---|
| Ceylan | Italy, Spain, and France | ARIMA | 5.59 |
| Ballı Serkan | Germany and USA | Random forest | 1.0042 |
| | | Linear Regression | 0.2228 |
| | | MLP | 0.5153 |
| | | SVM | 0.1162 |
| Nasution et al. | Jakarta | ARIMA | 20.51 |
| | | SES | 20.435 |
| | | HW | 47.415 |
| | | BATS | 33.945 |
| | | Prophet | 42.27 |
| | | PAR | 18.435 |
| Istaiteh et al. | China, Eritrea | ARIMA | 14.14 |
| | | ANN | 3.23 |
| | | LSTM | 4.14 |
| | | CNN | 3.13 |
| Shaikh et al. | India | Linear regression | 27.9 |
| | | Polynomial with 2 degrees | 13.3 |
| Acosta et al. | Brazil, Chile, Colombia, Mexico, Peru and the United States | SVM | 23.5 |
| | | MLP | 17 |
| Dairi et al. | Brazil, France, India, Mexico, Russia, Saudi Arabia, and the US | RBM | 18.452 |
| | | CNN | 20.763 |
| | | LSTM | 20.394 |
| | | GAN-DNN | 11.105 |
| | | GAN-GRU | 5.254 |
| | | LSTM-CNN | 3.718 |
| Omran et al. | Egypt, Saudi Arabia, Kuwait | Single-layer GRU | 3.0419 |
| | | Single-layer LSTM | 0.6203 |
| Kafieh et al. | Nine countries, including China, Spain, Italy, and the US | M-LSTM | 0.509 |
| | India and Brazil | | |