Khanita Duangchaemkarn, Waraporn Boonchieng, Phongtape Wiwatanadate, Varin Chouvatut.
Abstract
This study aims to identify and evaluate a robust and replicable public health predictive model that can be applied to COVID-19 time-series data, and to compare model performance across 7-day, 14-day, and 28-day forecast intervals. A seasonal autoregressive integrated moving average (SARIMA) model was developed and validated using a Thailand COVID-19 open dataset covering 1 December 2021 to 30 April 2022, during the Omicron variant outbreak. From the top five candidates, the SARIMA model with a non-statistically significant Ljung–Box test p-value, the lowest AIC, and the lowest RMSE was selected for validation. The selected models were validated using 7-day, 14-day, and 28-day forward-chaining cross-validation. The model performance metrics for each forecast interval were evaluated and compared. The case fatality rate and mortality rate of the COVID-19 Omicron variant were estimated from the best-performing model. The study highlights that the choice of forecast interval affects model performance.
Keywords: COVID-19; SARIMA; coronavirus; predictive modeling; seasonal ARIMA; time-series forecasting
Year: 2022 PMID: 35885836 PMCID: PMC9324558 DOI: 10.3390/healthcare10071310
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Graphical presentation of COVID-19 case statistics in Thailand showing the data points used in the study (from 1 December 2021 to 28 May 2022).
Figure 2. Model validation using the n-day forward-chaining cross-validation method.
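The forward-chaining scheme can be illustrated with a short sketch. It assumes an expanding training window that absorbs each forecast block before the next one is predicted, with the dates taken from the training and forecasted periods listed in the performance tables of this record. The function name `forward_chaining_blocks` is hypothetical and is not taken from the study.

```python
from datetime import date, timedelta

def forward_chaining_blocks(train_start, first_train_end, holdout_end, horizon_days):
    """Return (train_start, train_end, test_start, test_end) tuples for an expanding window."""
    blocks = []
    train_end = first_train_end
    # Keep adding blocks while a full forecast horizon still fits in the holdout period.
    while train_end + timedelta(days=horizon_days) <= holdout_end:
        test_start = train_end + timedelta(days=1)
        test_end = train_end + timedelta(days=horizon_days)
        blocks.append((train_start, train_end, test_start, test_end))
        # The training window grows to include the block that was just forecast.
        train_end = test_end
    return blocks

# Dates as listed in the validation tables (1 Dec 21 to 2 Apr 22 for the first training period).
START = date(2021, 12, 1)
FIRST_TRAIN_END = date(2022, 4, 2)
HOLDOUT_END = date(2022, 4, 30)

for horizon in (7, 14, 28):
    for block in forward_chaining_blocks(START, FIRST_TRAIN_END, HOLDOUT_END, horizon):
        print(f"{horizon}-day:", *(d.isoformat() for d in block))
```

Under these assumptions the 28-day holdout yields four 7-day blocks, two 14-day blocks, and one 28-day block.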
Model accuracy evaluation criteria using MAPE values.
| MAPE (%) | Interpretation |
|---|---|
| <10 | Highly accurate forecasting |
| 10–20 | Good forecasting |
| 20–50 | Reasonable forecasting |
| >50 | Inaccurate forecasting |
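As a small illustration, the bands in the table above can be encoded as a lookup function. The table's ranges share their endpoints (10, 20, 50), so this sketch assumes that a boundary value falls into the better category for 20 and 50, and that exactly 10 counts as "Good"; that boundary rule is an assumption, not something stated in the source. The example values are the average MAPE figures reported for daily confirmed cases in this record.

```python
def interpret_mape(mape_percent):
    """Map a MAPE value (in percent) to the interpretation bands of the table above."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    if mape_percent < 10:
        return "Highly accurate forecasting"
    if mape_percent <= 20:   # assumption: 20 is treated as "Good"
        return "Good forecasting"
    if mape_percent <= 50:   # assumption: 50 is treated as "Reasonable"
        return "Reasonable forecasting"
    return "Inaccurate forecasting"

# Average MAPE values reported for daily confirmed cases (7-day and 14-day averages,
# and the single 28-day block).
reported_mape = {"7-day": 17.69, "14-day": 24.30, "28-day": 55.08}
for interval, value in reported_mape.items():
    print(f"{interval}: {value:.2f}% -> {interpret_mape(value)}")
```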
Figure 3. Graphical presentation of COVID-19 confirmed cases in Thailand during the Omicron variant outbreak from 1 December 2021 to 30 April 2022.
Unit root test results for training set using augmented Dickey–Fuller (ADF) method.
| Dataset | Diff a | ADF Value | p-Value b |
|---|---|---|---|
| Daily confirmed cases | 0 | 0.31 | 0.98 |
| | 1 | −1.94 | 0.31 |
| Daily deaths | 0 | 0.79 | 0.99 |
| | 1 | −6.99 | <0.01 |
| Daily recovery cases | 0 | 0.55 | 0.99 |
| | 1 | −2.24 | 0.19 |

a Number of differencing; b p-value less than 0.05 indicates that the dataset is stationary.
Figure 4. The ACF and PACF diagnostic plots after first-order differencing of the dataset, with 95% CI.
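Differencing is the step that the d and D terms of a SARIMA specification describe; the candidate models in this record all use d = 1 and D = 1 with a seasonal period of 7. The sketch below shows both operations on a toy series. The series is hypothetical (a linear trend plus a repeating weekly pattern) and is not the study's data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical toy data: a linear trend plus a pattern that repeats every 7 days.
weekly_pattern = [0, 5, 3, 8, 2, 7, 1]
series = [10 * t + weekly_pattern[t % 7] for t in range(21)]

first_diff = difference(series, lag=1)         # d = 1 removes the linear trend
seasonal_diff = difference(first_diff, lag=7)  # D = 1 with s = 7 removes the weekly pattern

print(first_diff)
print(seasonal_diff)
```

On this toy series the seasonally differenced values are all zero, because the trend and the weekly pattern are exactly regular. Real case counts would leave a residual series whose ACF and PACF guide the choice of the remaining orders.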
Model training diagnostic results for the candidate models on the COVID-19 data.

| Model | AIC | RMSE | MAPE | Ljung–Box Test Score | Ljung–Box Test p-Value |
|---|---|---|---|---|---|
| Daily confirmed cases | | | | | |
| SARIMA(1, 1, 1),(0, 1, 1, 7) | 1336.32 | 1931.46 | 7.26% | 0.14 | 0.71 |
| SARIMA(1, 1, 1),(0, 1, 2, 7) | 1336.41 | 2230.31 | 8.18% | 0.18 | 0.68 |
| SARIMA(1, 1, 1),(1, 1, 1, 7) | 1336.42 | 2281.49 | 8.31% | 0.15 | 0.70 |
| SARIMA(0, 1, 1),(1, 1, 1, 7) | 1338.13 | 3185.02 | 11.28% | 0.82 | 0.37 |
| SARIMA(1, 1, 2),(0, 1, 1, 7) | 1338.29 | 1999.01 | 7.44% | 0.14 | 0.71 |
| Daily deaths | | | | | |
| SARIMA(0, 1, 2),(1, 1, 2, 7) | 756.04 | 8.93 | 5.87% | 0.07 | 0.80 |
| SARIMA(0, 1, 2),(1, 1, 1, 7) | 754.13 | 8.90 | 5.82% | 0.05 | 0.82 |
| SARIMA(1, 1, 1),(1, 1, 2, 7) | 756.57 | 9.27 | 6.22% | 0.18 | 0.68 |
| SARIMA(1, 1, 1),(1, 1, 1, 7) | 754.30 | 8.99 | 5.87% | 0.10 | 0.76 |
| SARIMA(1, 1, 2),(1, 1, 2, 7) | 758.35 | 9.01 | 5.91% | 0.07 | 0.79 |
| Daily recovered cases | | | | | |
| SARIMA(0, 1, 1),(0, 1, 1, 7) | 1410.43 | 3686.54 | 13.51% | 0.14 | 0.71 |
| SARIMA(0, 1, 2),(0, 1, 1, 7) | 1411.39 | 3799.29 | 13.90% | 0.49 | 0.48 |
| SARIMA(1, 1, 1),(0, 1, 1, 7) | 1411.87 | 3750.77 | 13.74% | 0.39 | 0.53 |
| SARIMA(0, 1, 1),(1, 1, 1, 7) | 1412.22 | 3312.99 | 12.13% | 0.09 | 0.76 |
| SARIMA(0, 1, 1),(0, 1, 2, 7) | 1412.23 | 3343.85 | 12.25% | 0.10 | 0.75 |
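The selection rule described in the abstract (a non-significant Ljung–Box p-value, then the lowest AIC and RMSE) can be sketched as a filter followed by a ranking. The values below are copied from the first five candidate rows of the table above, with the last column read as the Ljung–Box p-value. Ranking by AIC first and using RMSE only as a tie-breaker is an assumption of this sketch; the source does not say how the two criteria were combined, so the result is not necessarily the authors' selection.

```python
# (model, AIC, RMSE, Ljung-Box p-value), copied from the first five candidate rows.
candidates = [
    ("SARIMA(1, 1, 1),(0, 1, 1, 7)", 1336.32, 1931.46, 0.71),
    ("SARIMA(1, 1, 1),(0, 1, 2, 7)", 1336.41, 2230.31, 0.68),
    ("SARIMA(1, 1, 1),(1, 1, 1, 7)", 1336.42, 2281.49, 0.70),
    ("SARIMA(0, 1, 1),(1, 1, 1, 7)", 1338.13, 3185.02, 0.37),
    ("SARIMA(1, 1, 2),(0, 1, 1, 7)", 1338.29, 1999.01, 0.71),
]

def select_model(rows, alpha=0.05):
    """Keep models whose residuals pass the Ljung-Box test, then rank by (AIC, RMSE)."""
    # A p-value above alpha means no significant autocorrelation remains in the residuals.
    passing = [row for row in rows if row[3] > alpha]
    if not passing:
        raise ValueError("no candidate passes the Ljung-Box test")
    # Assumption: AIC is the primary criterion and RMSE breaks ties.
    return min(passing, key=lambda row: (row[1], row[2]))

best = select_model(candidates)
print(best[0])
```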
Figure 5. The standardized residuals plot (left) and a histogram with estimated density plot (right) for the selected models of daily COVID-19 confirmed cases (a,b), deaths (c,d), and recovered cases (e,f), within 1 standard deviation from the mean prediction value.
The final model performance evaluation on the COVID-19 daily confirmed cases.
| n-Day Forward | Block | Training Period | Forecasted Period | RMSE | MAE | MAPE (%) | U1 |
|---|---|---|---|---|---|---|---|
| 7-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–9 Apr 22 | 3097.87 | 2789.22 | 11.39 | 0.06 |
| | 2 | 1 Dec 21–9 Apr 22 | 10–16 Apr 22 | 2649.41 | 1733.54 | 8.66 | 0.06 |
| | 3 | 1 Dec 21–16 Apr 22 | 17–23 Apr 22 | 4936.84 | 4325.87 | 21.69 | 0.14 |
| | 4 | 1 Dec 21–23 Apr 22 | 24–30 Apr 22 | 4514.16 | 4099.44 | 29.02 | 0.13 |
| | Average | | | 3799.57 | 3237.02 | 17.69 | 0.10 |
| 14-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–16 Apr 22 | 5785.04 | 4890.97 | 22.30 | 0.11 |
| | 2 | 1 Dec 21–16 Apr 22 | 17–30 Apr 22 | 4779.98 | 4407.78 | 26.30 | 0.15 |
| | Average | | | 5282.51 | 4649.38 | 24.30 | 0.13 |
| 28-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–30 Apr 22 | 10,997.77 | 9429.39 | 55.08 | 0.22 |
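The four error measures in the table can be written out directly. The sketch below uses the standard definitions of RMSE, MAE, and MAPE, and assumes that U1 denotes Theil's U1 inequality coefficient (forecast RMSE divided by the sum of the root mean squares of the actual and forecast series, bounded between 0 and 1). The source does not give the formula it used, so that reading is an assumption, and the sample numbers are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def theil_u1(actual, forecast):
    """Theil's U1: 0 for a perfect forecast, approaching 1 for a very poor one."""
    rms_actual = math.sqrt(sum(a ** 2 for a in actual) / len(actual))
    rms_forecast = math.sqrt(sum(f ** 2 for f in forecast) / len(forecast))
    return rmse(actual, forecast) / (rms_actual + rms_forecast)

# Hypothetical observed and forecast values for a three-day block.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print(round(rmse(observed, predicted), 2))
print(round(mae(observed, predicted), 2))
print(round(mape(observed, predicted), 2))
print(round(theil_u1(observed, predicted), 3))
```

RMSE and MAE are in the units of the series, which is why the death counts and the case counts in this record sit on very different scales, whereas MAPE and U1 are scale-free and can be compared across series.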
Figure 6. Model validation results of daily COVID-19 confirmed cases (orange solid line), compared to the observed cases and 7-day interval forecast daily COVID-19 confirmed cases (red solid line) with 95% CI (blue shade).
The final model performance evaluation on the COVID-19 daily deaths.
| n-Day Forward | Block | Training Period | Forecasted Period | RMSE | MAE | MAPE (%) | U1 |
|---|---|---|---|---|---|---|---|
| 7-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–9 Apr 22 | | | | |
| | 2 | 1 Dec 21–9 Apr 22 | 10–16 Apr 22 | 11.45 | 9.59 | 8.27 | 0.05 |
| | 3 | 1 Dec 21–16 Apr 22 | 17–23 Apr 22 | 2.98 | 2.63 | 2.05 | 0.01 |
| | 4 | 1 Dec 21–23 Apr 22 | 24–30 Apr 22 | 13.68 | 13.40 | 10.70 | 0.05 |
| | Average | | | | | | |
| 14-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–16 Apr 22 | 7.19 | 5.35 | 5.02 | 0.03 |
| | 2 | 1 Dec 21–16 Apr 22 | 17–30 Apr 22 | 10.31 | 8.30 | 6.60 | 0.04 |
| | Average | | | | | | |
| 28-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–30 Apr 22 | | | | |
Figure 7. Model validation results of daily COVID-19-related deaths (orange solid line), compared to the observed cases and 7-day interval forecast daily COVID-19-related deaths (red solid line) with 95% CI (blue shade).
The final model validation results on the COVID-19 daily recovered cases.
| n-Day Forward | Block | Training Period | Forecasted Period | RMSE | MAE | MAPE (%) | U1 |
|---|---|---|---|---|---|---|---|
| 7-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–9 Apr 22 | 1999.88 | 1814.99 | 7.41 | 0.03 |
| | 2 | 1 Dec 21–9 Apr 22 | 10–16 Apr 22 | 748.85 | 663.08 | 2.57 | 0.01 |
| | 3 | 1 Dec 21–16 Apr 22 | 17–23 Apr 22 | 2523.33 | 1942.32 | 8.55 | 0.05 |
| | 4 | 1 Dec 21–23 Apr 22 | 24–30 Apr 22 | 4665.59 | 4235.41 | 22.55 | 0.10 |
| | Average | | | | | | |
| 14-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–16 Apr 22 | 2484.92 | 2296.17 | 9.15 | 0.04 |
| | 2 | 1 Dec 21–16 Apr 22 | 17–30 Apr 22 | 5238.33 | 4264.90 | 21.64 | 0.11 |
| | Average | | | | | | |
| 28-day interval | 1 | 1 Dec 21–2 Apr 22 | 3–30 Apr 22 | | | | |
Figure 8. Model validation results of daily COVID-19 recovered cases (orange solid line), compared to the observed cases and 7-day interval forecast daily COVID-19 recovered cases (red solid line) with 95% CI (blue shade).
Figure 9. A forecasting result of cumulative COVID-19 cases in Thailand compared to the observed cumulative confirmed cases with mean absolute percentage error.
Figure 10. A forecasting result of COVID-19 cumulative deaths in Thailand compared to the observed cumulative deaths with mean absolute percentage error.
Figure 11. A forecasting result of COVID-19 cumulative recovered cases in Thailand compared to the observed cumulative recovered cases with mean absolute percentage error.
Figure 12. The forecasted COVID-19 mortality analysis of Thailand: (a) COVID-19 mortality rate per 100,000 population; (b) COVID-19 case fatality rate.
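The two quantities in the figure differ only in their denominator, which the following sketch makes explicit. It uses the conventional definitions (deaths over confirmed cases for the case fatality rate, deaths over total population for the mortality rate); the source does not state its formulas, and the input numbers are hypothetical placeholders rather than Thailand's figures.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Deaths as a percentage of confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases

def mortality_rate_per_100k(deaths, population):
    """Deaths per 100,000 people in the whole population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * deaths / population

# Hypothetical cumulative counts, for illustration only.
deaths = 700
confirmed_cases = 350_000
population = 70_000_000

print(case_fatality_rate(deaths, confirmed_cases))     # percent of confirmed cases
print(mortality_rate_per_100k(deaths, population))     # per 100,000 population
```

When forecasts are used in place of observed counts, both inputs to the case fatality rate carry forecast error, so the error of the ratio depends on how the two errors move together.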