Sumit Mohan, Anil Kumar Solanki, Harish Kumar Taluja, Anuj Singh.
Abstract
BACKGROUND: Since January 2020, India has faced two waves of COVID-19; preparing for upcoming waves is the primary challenge for the public health sector and governments. Forecasting future cumulative confirmed cases is therefore important for planning and implementing control measures effectively.
Keywords: ARIMA; COVID-19; Natural language processing; Prophet; Sentiment analysis; Time series forecasting
Year: 2022 PMID: 35240374 PMCID: PMC8881817 DOI: 10.1016/j.compbiomed.2022.105354
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589
Vaccination progress as of February 13, 2022.
| Groups | 1st Dose | 2nd Dose | Precaution Dose |
|---|---|---|---|
| | 1,03,99,410 | 99,30,634 | 38,78,308 |
| | 1,84,05,152 | 1,73,74,818 | 53,58,037 |
| | 5,20,32,858 | 1,47,92,245 | N/A |
| | 54,80,44,294 | 42,63,39,386 | N/A |
| | 20,16,19,377 | 17,62,74,802 | N/A |
| | 12,58,81,409 | 10,98,24,107 | 79,94,610 |
Fig. 1. Comparison of active and recovered cases and deaths.
Fig. 2. Daily confirmed cases in India, March 2020–August 2021.
Summary of recent related works.
| Approach/Model | Country | Accuracy | Reference |
|---|---|---|---|
| Genetic programming/gene expression programming | Australia | Genetic programming better than other ML models | Salgotra et al., 2021 |
| Deep learning/ARIMA, LSTM, and SLSTM | India/Chennai | SLSTM better than LSTM and ARIMA | Devaraj et al., 2021 |
| ARIMA, KNN, RF, SVM, Holt-Winters, SARIMA, PR, decision trees | Bulgaria, Greece, Russia, China, Iran, Sweden, India, The Netherlands | Holt-Winters, SARIMA better than other ML models | Saba et al., 2021 |
| SARIMA, LSTM, ARIMA, and RF | Spain, India, USA, Worldwide | SARIMA and LSTM better than ARIMA and RF | Malki et al., 2021 |
| LR, SARIMAX, SSL, statistical SARIMAX | India, China, Brazil, USA | SSL better than others | Patil et al., 2021 |
| Uncertain time series forecasting | China | Better than traditional time series forecasting | Ye et al., 2021 |
| VARIMAX | Philippines | Able to forecast future cases with ordinary least squares algorithm | Jamdade et al., 2021 |
| ARIMA and SARIMA | Top 16 infected countries: Brazil, Chile, India, Colombia, Russia, Mexico, Iran, Peru, Bangladesh | SARIMA models outperform the ARIMA models | Arun et al., 2021 |
| ARIMA, Holt-Winters, TBATS, and Spline | USA, Italy | ARIMA and Holt-Winters better than TBATS and Spline | Gecili et al., 2021 |
| Epidemiology SIR with regression, ARIMA, and Prophet | Top 20 countries | SEIR better for long-term prediction, and POLY d(2) better for short periods | Furtado et al., 2021 |
| ARIMA | Bangladesh | ARIMA (0,2,1) and ARIMA (0,1,1) better than others | Kundu et al., 2021 |
| ARIMA | Egypt | ARIMA (2,1,2) and ARIMA (2,1,3) | Sabry et al., 2021 |
| ARIMA | India | ARIMA (2,2,2) | Roy et al., 2021 |
Fig. 3. The proposed methodology.
Fig. 4. Trend, seasonality, and residual of confirmed cases.
Fig. 5. Rolling mean and standard deviation of confirmed cases.
ADF test results for the original series, after log transformation, and after first-order differencing.
| Parameters | Original Series | After Log | After 1st-Order Differencing |
|---|---|---|---|
| Test Statistic | 0.898771 | −3.151168 | −2.592978 |
| p-value | |||
| # Lags Used | 17.000000 | 16.000000 | 19.000000 |
| No. of Observations | 542.000000 | 542.000000 | 538.000000 |
| Critical Value (1%) | −3.442473 | −3.442473 | −3.442563 |
| Critical Value (5%) | −2.866887 | −2.866887 | −2.866927 |
| Critical Value (10%) | −2.569618 | −2.569618 | −2.569639 |
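
The stationarity decision implied by the table can be checked mechanically: the ADF null hypothesis of a unit root is rejected at a given level when the test statistic lies below the corresponding critical value. The following minimal sketch applies that rule to the statistics reported above. The function name `adf_rejections` and the dictionary layout are illustrative assumptions, not part of the original study's code.

```python
# Test statistics and critical values copied from the ADF table above.
ADF_RESULTS = {
    "original": {
        "stat": 0.898771,
        "crit": {"1%": -3.442473, "5%": -2.866887, "10%": -2.569618},
    },
    "after_log": {
        "stat": -3.151168,
        "crit": {"1%": -3.442473, "5%": -2.866887, "10%": -2.569618},
    },
    "after_diff": {
        "stat": -2.592978,
        "crit": {"1%": -3.442563, "5%": -2.866927, "10%": -2.569639},
    },
}


def adf_rejections(stat, crit):
    """Return the significance levels at which the unit-root null is rejected.

    The null is rejected at a level when the test statistic is more
    negative than that level's critical value.
    """
    return [level for level, value in crit.items() if stat < value]


for name, res in ADF_RESULTS.items():
    levels = adf_rejections(res["stat"], res["crit"])
    print(name, "->", levels if levels else "unit root not rejected")
```

Under this rule the original series does not reject the unit-root null at any listed level, the log-transformed series rejects it at the 5% and 10% levels, and the differenced series rejects it only at the 10% level.
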
Fig. 6. Rolling mean and standard deviation after log transformation.
Fig. 7. ACF plot of confirmed cases.
Fig. 8. Forecasting by the proposed ARIMA (cumulative confirmed cases).
Fig. 9. Forecasting by the proposed ARIMA (daily confirmed cases).
SARIMAX results of the proposed ARIMA (1,2,2).
| Dep. Variable: | Confirmed | No. Observations: | 560 |
|---|---|---|---|
| Model: | ARIMA (1,2,2) | Log-Likelihood | −5,779.855 |
| Date: | Sat, September 11, 2021 | | |
| Time: | 18:17:47 | | |
| Sample: | 01-30-2020 to 08-11-2021 | | |
| Covariance Type: | opg | | |
Coefficients of the proposed ARIMA (1,2,2).
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ar.L1 | 0.9615 | 0.011 | 88.563 | 0.000 | 0.940 | 0.983 |
| ma.L1 | −1.0398 | 0.019 | −53.939 | 0.000 | −1.078 | −1.002 |
| ma.L2 | 0.1576 | 0.021 | 7.355 | 0.000 | 0.116 | 0.200 |
| sigma2 | 6.337e+07 | 6.81e-11 | 9.31e+17 | 0.000 | 6.34e+07 | 6.34e+07 |
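
To make the fitted coefficients concrete, the sketch below computes a one-step-ahead point forecast for an ARIMA(1,2,2) model by hand. It assumes the statsmodels sign convention, in which the twice-differenced series w_t = y_t − 2y_{t−1} + y_{t−2} follows w_t = φ·w_{t−1} + ε_t + θ1·ε_{t−1} + θ2·ε_{t−2} with no constant term. The function name, the input counts, and the residuals in the usage line are hypothetical and serve only as an illustration.

```python
# Coefficients from the table above (ar.L1, ma.L1, ma.L2).
PHI, THETA1, THETA2 = 0.9615, -1.0398, 0.1576


def arima_122_one_step(y, resid, phi=PHI, theta1=THETA1, theta2=THETA2):
    """One-step-ahead forecast of the next observation.

    y     : observed series, most recent value last (at least 3 values)
    resid : one-step residuals, most recent last (at least 2 values)
    """
    if len(y) < 3 or len(resid) < 2:
        raise ValueError("need at least 3 observations and 2 residuals")
    # Second difference at time t: w_t = y_t - 2*y_{t-1} + y_{t-2}
    w_t = y[-1] - 2 * y[-2] + y[-3]
    # ARMA(1,2) forecast of the next second difference
    w_next = phi * w_t + theta1 * resid[-1] + theta2 * resid[-2]
    # Undo the double differencing: y_{t+1} = w_{t+1} + 2*y_t - y_{t-1}
    return w_next + 2 * y[-1] - y[-2]


# Hypothetical cumulative counts and residuals, for illustration only.
example = arima_122_one_step([1000.0, 1100.0, 1220.0], [5.0, -3.0])
print(example)
```

With zero residuals and a perfectly linear history the second difference is zero, so the forecast simply continues the straight line; the AR and MA terms only adjust the forecast when the recent increments or errors deviate from that.
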
Summary of the proposed ARIMA (1,2,2).
| Ljung-Box (L1) (Q): | | Jarque-Bera (JB): | |
|---|---|---|---|
| Prob(Q): | 0.46 | Prob(JB): | 0.00 |
| Heteroskedasticity (H): | 483.22 | Skew: | −0.85 |
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 10.43 |
Fig. 10. Overall (monthly and weekly) trend.
Fig. 11. Prediction by the proposed Prophet model.
Fig. 12. Cross-validation plot (MSE).
Initial prediction by Prophet.
| trend | yhat_lower | yhat_upper | trend_lower | trend_upper | yhat |
|---|---|---|---|---|---|
| | 5.619166e+07 | 7.290385e+07 | 5.639183e+07 | 7.257264e+07 | 6.497644e+07 |
| | 5.832973e+07 | 8.029341e+07 | 5.925532e+07 | 7.896211e+07 | 6.960754e+07 |
| | 6.141334e+07 | 8.675533e+07 | 6.202543e+07 | 8.690896e+07 | 7.477018e+07 |
| | 6.430167e+07 | 9.485449e+07 | 6.436557e+07 | 9.495638e+07 | 7.972060e+07 |
| | 6.621694e+07 | 1.022151e+08 | 6.646909e+07 | 1.025106e+08 | 8.481441e+07 |
Fig. 13. Forecasting by the proposed Prophet model (cumulative confirmed cases).
Cross-Validation of the Prophet model from March 10, 2020, to August 11, 2021.
| ds | yhat | yhat_lower | yhat_upper | y | cutoff |
|---|---|---|---|---|---|
| | 17.105278 | 0.716286 | 34.057535 | 58 | 2020-03-09 |
| | 21.504746 | 4.469184 | 38.730697 | 60 | 2020-03-09 |
| | 22.921459 | 6.439937 | 40.124224 | 74 | 2020-03-09 |
| | 23.088273 | 6.289238 | 41.124438 | 81 | 2020-03-09 |
| | 23.755008 | 6.365515 | 40.079387 | 84 | 2020-03-09 |
| | 2.577697e+07 | 2.245349e+07 | 2.889221e+07 | 31895385 | 2021-05-13 |
| | 2.587800e+07 | 2.271847e+07 | 2.925299e+07 | 31934455 | 2021-05-13 |
| | 2.597733e+07 | 2.256805e+07 | 2.945662e+07 | 31969954 | 2021-05-13 |
| | 2.607027e+07 | 2.258676e+07 | 2.949502e+07 | 31998158 | 2021-05-13 |
| | 2.616913e+07 | 2.261766e+07 | 2.970208e+07 | 32036511 | 2021-05-13 |
Fig. 14. Forecasting by the proposed Prophet model (daily confirmed cases).
Final prediction (Prophet).
| | ds | yhat | yhat_lower | yhat_upper |
|---|---|---|---|---|
| 562 | 2021-11-01 | 4.977510e+07 | 4.667466e+07 | 5.331550e+07 |
| 563 | 2021-12-01 | 5.471227e+07 | 5.058744e+07 | 5.938837e+07 |
| 564 | 2022-01-01 | 5.987917e+07 | 5.438893e+07 | 6.664042e+07 |
| 565 | 2022-02-01 | 6.497644e+07 | 5.754582e+07 | 7.329056e+07 |
| 566 | 2022-03-01 | 6.960754e+07 | 6.045268e+07 | 8.018420e+07 |
Comparison of state-of-the-art models with the proposed models (Prophet and ARIMA).
| State-of-the-art models | | | | Proposed study (country: India) | | | | |
|---|---|---|---|---|---|---|---|---|
| Model | Country | Metric | Value | Horizon | MSE (×10¹²) | RMSE (×10⁶) | MAPE | MDAPE |
| ML RF | Worldwide | MAE | 368.82 | 9 days | 2.49 | 1.57 | 0.145 | 0.067 |
| ML KNN | India | MAE | 649.74 | 10 days | 2.74 | 1.65 | 0.154 | 0.073 |
| ARIMA | India | MAE | 47.42 | 11 days | 3.01 | 1.73 | 0.163 | 0.079 |
| ML RF | India | RMSE | 717.73 | 12 days | 3.29 | 1.81 | 0.171 | 0.087 |
| DL LSTM | USA | RMSE | 324.61 | 13 days | 3.59 | 1.89 | 0.179 | 0.093 |
| DL LSTM | Worldwide | RMSE | 307.58 | 14 days | 3.87 | 1.94 | 0.186 | 0.101 |
| Holt-Winters | India | MAE | 269.39 | 15 days | 4.16 | 2.01 | 0.192 | 0.107 |
| ARIMA | Spain | RMSE | 379.89 | 16 days | 4.41 | 2.09 | 0.201 | 0.115 |
| SARIMA | India | RMSE | 98.717 | 17 days | 4.67 | 2.17 | 0.208 | 0.122 |
| GBR | India | RMSE | 678.74 | 18 days | 4.94 | 2.24 | 0.208 | 0.129 |
Proposed Prophet model performance metrics (diagnostics).
| horizon | mse | rmse | mae | mape | mdape | coverage |
|---|---|---|---|---|---|---|
| 9 days | 2.491663e+12 | 1.578500e+06 | 622397.820887 | 0.146576 | 0.067136 | 0.040404 |
| 10 days | 2.748039e+12 | 1.657721e+06 | 662582.775243 | 0.154969 | 0.073924 | 0.037879 |
| 11 days | 3.017092e+12 | 1.736978e+06 | 703421.226818 | 0.163270 | 0.079725 | 0.037879 |
| 12 days | 3.298150e+12 | 1.816081e+06 | 744934.722980 | 0.171243 | 0.087448 | 0.037879 |
| 13 days | 3.593368e+12 | 1.895618e+06 | 787285.835527 | 0.179066 | 0.093846 | 0.037879 |
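
The columns in the table above follow the usual definitions of forecast error metrics. The sketch below implements those definitions in plain Python so that the relationships between them (for example, RMSE as the square root of MSE) are explicit. It is an illustrative reimplementation and not Prophet's own `performance_metrics` routine, which additionally averages over a rolling window of horizons. The function name `forecast_metrics` is an assumption, and the sample input is the first five rows of the cross-validation table (cutoff 2020-03-09).

```python
from math import sqrt
from statistics import mean, median


def forecast_metrics(y, yhat, lower, upper):
    """Compute MSE, RMSE, MAE, MAPE, MDAPE, and interval coverage.

    y, yhat      : actual and predicted values (y must be non-zero for MAPE)
    lower, upper : bounds of the prediction interval for each point
    """
    errors = [a - f for a, f in zip(y, yhat)]
    ape = [abs(e) / abs(a) for e, a in zip(errors, y)]
    inside = [1.0 if lo <= a <= hi else 0.0
              for a, lo, hi in zip(y, lower, upper)]
    mse = mean([e * e for e in errors])
    return {
        "mse": mse,
        "rmse": sqrt(mse),
        "mae": mean([abs(e) for e in errors]),
        "mape": mean(ape),
        "mdape": median(ape),
        "coverage": mean(inside),
    }


# First five rows of the cross-validation table.
Y = [58, 60, 74, 81, 84]
YHAT = [17.105278, 21.504746, 22.921459, 23.088273, 23.755008]
LOWER = [0.716286, 4.469184, 6.439937, 6.289238, 6.365515]
UPPER = [34.057535, 38.730697, 40.124224, 41.124438, 40.079387]

metrics = forecast_metrics(Y, YHAT, LOWER, UPPER)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

For these five early rows every actual value lies above its upper bound, so the coverage is 0.
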
Fig. 15. Comparison of NLP libraries.
Fig. 16. Word cloud for negative sentiments.