Jayanthi Devaraj, Rajvikram Madurai Elavarasan, Rishi Pugazhendhi, G M Shafiullah, Sumathi Ganesan, Ajay Kaarthic Jeysree, Irfan Ahmad Khan, Eklas Hossain.
Abstract
The ongoing COVID-19 pandemic poses a serious threat to global economic growth and, in turn, to society as a whole, since neither a curative drug nor a preventive vaccine has yet been discovered. The spread of COVID-19 is increasing day by day, putting human lives and the economy at risk. Given the sheer number of COVID-19 cases, the role of Artificial Intelligence (AI) is imperative in the current scenario. AI can be a powerful tool against this outbreak by predicting the number of cases in advance. Deep learning-based time series techniques are considered to predict worldwide COVID-19 cases in advance for short-term and medium-term dependencies with adaptive learning. Initially, data pre-processing and feature extraction are performed on the real-world COVID-19 dataset. Subsequently, the cumulative confirmed, death and recovered global cases are modelled with Auto-Regressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM), Stacked Long Short-Term Memory (SLSTM) and Prophet approaches. For long-term forecasting of COVID-19 cases, multivariate LSTM models are employed. The performance metrics are computed for all the models and the prediction results are subjected to comparative analysis to identify the most reliable model. The results show that the Stacked LSTM algorithm yields higher accuracy, with an error of less than 2%, than the other considered algorithms for the studied performance metrics. COVID-19 cases are also predicted and analyzed in detail at the country level for India and at the city level for Chennai. In addition, statistical hypothesis analysis and correlation analysis are carried out on the COVID-19 datasets, including features such as temperature, rainfall, population, total infected cases, area and population density during the months of May, June, July and August, to find the most suitable model. Further, the practical significance of predicting COVID-19 cases is elucidated in terms of assessing pandemic characteristics, scenario planning, optimization of models and supporting Sustainable Development Goals (SDGs).
Keywords: ARIMA; Artificial Intelligence (AI); COVID-19 Pandemic; Deep Learning; Long Short-Term Memory; Prophet; Stacked LSTM; Sustainable Development Goals (SDGs)
Year: 2021 PMID: 33462560 PMCID: PMC7806459 DOI: 10.1016/j.rinp.2021.103817
Source DB: PubMed Journal: Results Phys ISSN: 2211-3797 Impact factor: 4.476
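The abstract describes forecasting future case counts from past ones with LSTM-style models. As a minimal sketch of the data preparation such models typically need, the snippet below turns a series into supervised (window, target) pairs. The function name `make_windows`, the window length and the toy series are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n_lags: int, horizon: int = 1
                 ) -> Tuple[List[List[float]], List[float]]:
    """Split a series into (input window, target) pairs.

    Each input holds `n_lags` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        end = start + n_lags
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Hypothetical cumulative counts, for illustration only.
toy_series = [10, 12, 15, 19, 24, 30]
X, y = make_windows(toy_series, n_lags=3)
print(X)  # [[10, 12, 15], [12, 15, 19], [15, 19, 24]]
print(y)  # [19, 24, 30]
```

A longer `horizon` (for example 30, 60 or 90 steps) only shifts the target further ahead; the windows themselves are unchanged.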
Comparison of existing works on COVID-19 prediction methodologies with the proposed work.
| Ref. | Forecasting method (Learning Algorithm) | Forecasting horizon | Type of data and Sample size | Data source | Accuracy | Purpose of prediction |
|---|---|---|---|---|---|---|
| Proposed work | Comparative analysis of time series forecasting using ARIMA, LSTM, SLSTM and Prophet | 30, 60 and 90 days ahead prediction is done. | Global, country-specific and city-specific analysis data from 22nd Jan 2020 to 8th May 2020. Simulated dataset for seven cities for the months of May, June, July and August 2020. All countries data from January 2020 to September 2020. | Datasets were collected from Johns Hopkins University, World Weather Page and Wikipedia page. | SLSTM outperformed other models. In statistical analysis, ARIMA outperformed LSTM model. Overall, SLSTM model is better than other models. | i. Global, country-specific and city-specific cumulative COVID cases prediction is done. |
| Kırbaş et al. | ARIMA, Nonlinear Autoregression Neural Network (NARNN) and Long-Short Term Memory (LSTM) | 14 days ahead forecast | Cumulative confirmed cases data of 8 different European countries; the dataset is considered until 3 May 2020 | European Center for Disease Prevention and Control | MAPE values of LSTM model are better than the other models. | To model and predict the cumulative confirmed cases; the total increase rate of the countries was analyzed and compared. LSTM outperforms other models. |
| Arora et al. | Deep LSTM/Stacked LSTM, Convolutional LSTM and Bidirectional LSTM | Daily and weekly predictions | Confirmed cases in India. | Ministry of Health and Family Welfare | Bi-directional LSTM provides better results than the other models with less error. | Daily and weekly predictions of all states are done to explore the increase of positive cases. |
| Zeroual et al. | RNN (Recurrent Neural Network), LSTM, Bi-LSTM(Bi-directional), VAE (Variational AutoEncoder) | 17 days ahead forecast | Daily confirmed and recovered cases for six countries. | Center for Systems Science and Engineering (CSSE) at Johns Hopkins University | Based on the performance metrics, VAE outperformed other models in forecasting the pandemic. | To forecast the number of new COVID-19 cases and recovered cases. |
| Shahid et al. | ARIMA, support vector regression (SVR), long short-term memory (LSTM), Bi-LSTM | 48 days ahead forecast | 22 January 2020 to 27 June 2020. 158 samples of the number of confirmed cases, deaths and recovered cases. | Dataset is taken from Harvard University | Bi-LSTM outperforms other models with lower R2 score values. | To predict the number of confirmed, death and recovered cases in ten countries for better planning and management. |
| Chimmula and Zhang | LSTM | 14 days ahead forecast | Confirmed cases of Canada and Italy until 31 March 2020 | Johns Hopkins University and Canadian Health authority | 92% accuracy | To predict the number of confirmed cases of Canada and Italy and to compare the growth rate. |
| Alzahrani et al. | ARIMA, Autoregressive Moving Average (ARMA) | 1 month ahead forecast | Cumulative daily cases from | Daily and cumulative confirmed COVID-19 cases in Saudi Arabia were collected from Saudi Arabia Government website. | ARIMA performs better than ARMA, MA and AR. | To predict the daily reproduction of confirmed cases one month ahead. |
| Ogundokun et al. | Linear regression model | 8 days ahead forecast | March 31, 2020 to May 29, 2020 | NCDC website | 95% confidence interval | To predict the COVID-19 confirmed cases in Nigeria. |
| Ribeiro et al. | ARIMA, cubist regression (CUBIST), random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking-ensemble learning | 1,3 and 6 days ahead forecast | Cumulative confirmed cases in Brazil until April, 18 or 19 of 2020 | The dataset was collected from an application programming interface that retrieves the daily data about COVID-19 cases which are publicly available | Based on the performance metrics, SVR, and stacking-ensemble learning outperformed other models | To predict the cumulative confirmed cases in Brazil |
| Tomar and Gupta | LSTM | 30 days ahead forecast | Cumulative and daily dataset of COVID-19 cases in India | Center for Systems Science and Engineering (CSSE) at Johns Hopkins University | LSTM has got 90% accuracy in predicting COVID cases | To predict the number of confirmed and recovered cases using data-driven estimation method. |
| Car et al. | Multilayer Perceptron (MLP) artificial neural network (ANN) | 30 days ahead forecast | 22nd January 2020 to 12th March 2020 | Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL) | Higher accuracy for confirmed cases, with an R2 value of 0.986 | To predict the spread of pandemic world-wide. |
| Shastri et al. | LSTM, Stacked LSTM, Bi-directional LSTM and Convolutional LSTM | 30 days ahead forecast | India and USA- Confirmed cases data from | Datasets of India and USA are taken from the Ministry of Health and Family Welfare, Government of India and Centers for Disease Control and Prevention, U.S Department of Health and Human Services. | ConvLSTM outperforms stacked and bi-directional LSTM in confirmed and death cases. | To predict the COVID-19 confirmed and death cases one month ahead and to compare the accuracy of deep learning models |
| Hawas | Recurrent Neural Network (RNN) | 30 days and 40 days ahead forecast | Daily confirmed cases in Brazil | Center for Systems Science and Engineering (CSSE) at Johns Hopkins University | Achieved 60.17% accuracy. | To predict one month ahead confirmed cases and to take preventive measures. |
| Papastefanopoulos et al. | Six different forecasting methods are presented. ARIMA, the Holt-Winters additive model (HWAAS), TBAT, Facebook’s Prophet, Deep AR | 7 days ahead for the ten countries | Jan 2020 to April 2020 and the population of countries. | Novel Corona Virus 2019 Dataset and population-by-country dataset from kaggle.com | ARIMA and TBAT outperformed other models in forecasting the pandemic | To predict the future COVID-19 confirmed, death and recovered cases by considering the country population. |
Fig. 1. The architecture of LSTM.
Fig. 2. The architecture of Stacked LSTM.
Fig. 3. Architecture diagram of the predictive model for infected cases.
Fig. 4. Prediction of (a) confirmed, (b) death and (c) recovered cases using the ARIMA model.
Coefficient and error values for forecasted infected cases using the ARIMA model.
| | Coef | std err | z | p>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| (a) Coefficient and error values of confirmed cases using the ARIMA model | | | | | | |
| ar.L1 | 0.9715 | 0.032 | 30.236 | 0 | 0.908 | 1.034 |
| ma.L1 | −0.0876 | 0.159 | −0.552 | 0.581 | −0.398 | 0.233 |
| ar.S.L12 | −0.5586 | 0.114 | −4.892 | 0 | −0.782 | −0.335 |
| sigma2 | 7.60E+06 | 2.08E-10 | 3.65E+17 | 0 | 7.60E+07 | 7.60E+07 |
| (b) Coefficient and error values of death cases using the ARIMA model | | | | | | |
| ar.L1 | 0.9214 | 0.036 | 25.694 | 0 | 0.851 | 0.992 |
| ma.L1 | 0.0569 | 0.081 | 0.699 | 0.484 | −0.103 | 0.216 |
| ar.S.L12 | −0.6301 | 0.105 | −6.001 | 0 | −0.836 | −0.424 |
| sigma2 | 9.02E+07 | 9.67E+04 | 9.333 | 0 | 7.13E+05 | 1.09E+06 |
| (c) Coefficient and error values of recovered cases using the ARIMA model | | | | | | |
| ar.L1 | 0.9992 | 0.016 | 64.128 | 0 | 0.969 | 1.03 |
| ma.L1 | −0.7541 | 0.094 | −7.991 | 0 | −0.939 | −0.569 |
| ar.S.L12 | −0.7164 | 0.173 | −4.148 | 0 | −1.055 | −0.378 |
| sigma2 | 5.23E+07 | 2.87E-09 | 1.82E+16 | 0 | 5.23E+07 | 5.23E+07 |
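The columns of the coefficient table follow the usual layout of a regression summary: estimate, standard error, z statistic, two-sided p-value and a 95% interval. As a sketch of how these quantities relate, assuming a normal (Wald) approximation, the snippet below recomputes them for the `ar.S.L12` row of the death-cases panel. The helper name `wald_summary` is hypothetical, and the paper does not state which software produced the table.

```python
import math
from typing import Tuple


def wald_summary(coef: float, std_err: float, z_crit: float = 1.959964
                 ) -> Tuple[float, float, float, float]:
    """Return (z, two-sided p-value, CI lower, CI upper) under a normal approximation."""
    z = coef / std_err
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = z_crit * std_err
    return z, p_value, coef - half_width, coef + half_width


# ar.S.L12 row of the death-cases panel: coef = -0.6301, std err = 0.105.
z, p, lo, hi = wald_summary(-0.6301, 0.105)
print(round(z, 3), round(lo, 3), round(hi, 3))  # -6.001 -0.836 -0.424
```

Rows whose printed standard error is heavily rounded may not reproduce to the last digit, since the table's z values were presumably computed from unrounded inputs.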
Fig. 5. (a) LSTM loss vs epochs for confirmed cases, (b) LSTM loss vs epochs for death cases and (c) LSTM loss vs epochs for recovered cases.
Fig. 6. A plot of (a) confirmed, (b) death and (c) recovered cases using the LSTM model.
Fig. 7. Plot of (a) confirmed, (b) death and (c) recovered cases using the Stacked LSTM model.
Fig. 8. A plot of (a) confirmed, (b) death and (c) recovered cases using the Prophet model.
Fig. 9. Forecast components of the Prophet model in predicting (a) confirmed, (b) death and (c) recovered cases (until end of June 2020).
Performance Evaluation.
| Model | Predicted variable | RMSE | MAE | MAPE | R2 |
|---|---|---|---|---|---|
| ARIMA | Confirmed | 10078.36 | 8097.55 | 0.372 | 0.94 |
| ARIMA | Deaths | 1359.27 | 1067.46 | 0.742 | 0.938 |
| ARIMA | Recovered | 8806.29 | 6551.99 | 1.12 | 0.92 |
| LSTM | Confirmed | 10051.22 | 9201.02 | 0.37 | 0.964 |
| LSTM | Deaths | 1670.84 | 1366.66 | 0.53 | 0.97 |
| LSTM | Recovered | 14210.39 | 12370.8 | 1.07 | 0.95 |
| SLSTM | Confirmed | 9310.83 | 7218.97 | 0.2 | 1 |
| SLSTM | Deaths | 1219.35 | 1102.21 | 0.43 | 0.998 |
| SLSTM | Recovered | 13201.4 | 11675.9 | 0.9 | 0.92 |
| PROPHET | Confirmed | 11516.2 | 8154.4 | 0.39 | 0.92 |
| PROPHET | Deaths | 1348.71 | 1056.5 | 0.7 | 0.90 |
| PROPHET | Recovered | 24485.6 | 17435.5 | 1.2 | 0.88 |
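The four metrics reported in the performance table have standard textbook definitions. The sketch below implements them in plain Python on a small made-up series. Returning MAPE as a percentage is an assumption, since the units are not stated here, and the toy numbers are illustrative only.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration; every forecast is off by exactly 10.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(rmse(actual, predicted), mae(actual, predicted))
print(round(mape(actual, predicted), 3), round(r2(actual, predicted), 3))
```

RMSE and MAE are in the units of the series (cases), which is why their magnitudes differ so much between confirmed, death and recovered counts, while MAPE and R2 are scale-free.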
Comparison of Actual and Forecasted cases.
| Cases | Date | Observed value | ARIMA forecast | ARIMA error (%) | LSTM forecast | LSTM error (%) | SLSTM forecast | SLSTM error (%) | PROPHET forecast | PROPHET error (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Confirmed | 5/27/2020 | 5,700,405 | 5,813,095 | 1.94 | 5,802,626 | 1.79 | 5,717,622 | 0.3 | 5,429,379 | 4.75 |
| Confirmed | 5/28/2020 | 5,819,719 | 5,907,110 | 1.48 | 5,929,106 | 1.88 | 5,835,205 | 0.27 | 5,513,519 | 5.26 |
| Confirmed | 5/29/2020 | 5,940,890 | 6,058,147 | 1.94 | 6,005,139 | 1.08 | 5,956,332 | 0.26 | 5,598,714 | 5.76 |
| Confirmed | 5/30/2020 | 6,078,719 | 6,113,348 | 0.57 | 6,189,701 | 1.83 | 6,081,633 | 0.05 | 5,677,062 | 6.61 |
| Confirmed | 5/31/2020 | 6,186,277 | 6,324,107 | 2.18 | 6,218,889 | 0.53 | 6,211,114 | 0.4 | 5,756,597 | 6.93 |
| Deaths | 5/27/2020 | 359,038 | 383,378 | 6.78 | 374,025 | 4.17 | 352,072 | 1.94 | 386,796 | 4.75 |
| Deaths | 5/28/2020 | 363,749 | 387,903 | 6.64 | 379,073 | 4.2 | 356,317 | 0.04 | 392,737 | 5.26 |
| Deaths | 5/29/2020 | 368,496 | 393,712 | 6.84 | 383,378 | 4.04 | 360,513 | 2.17 | 398,760 | 5.76 |
| Deaths | 5/30/2020 | 372,662 | 400,116 | 7.37 | 387,903 | 4.09 | 364,671 | 2.14 | 404,615 | 6.61 |
| Deaths | 5/31/2020 | 375,555 | 405,840 | 8 | 393,712 | 4.83 | 368,793 | 1.8 | 409,812 | 6.93 |
| Recovered | 5/27/2020 | 2,346,232 | 2,263,404 | 3.53 | 2,423,004 | 3.27 | 2,334,784 | 0.49 | 1,976,908 | 15.7 |
| Recovered | 5/28/2020 | 2,413,089 | 2,316,140 | 4 | 2,537,655 | 5.16 | 2,387,862 | 0.05 | 2,015,088 | 16.4 |
| Recovered | 5/29/2020 | 2,490,435 | 2,367,520 | 4.94 | 2,658,750 | 6.73 | 2,441,177 | 1.97 | 2,051,215 | 17.6 |
| Recovered | 5/30/2020 | 2,560,888 | 2,431,633 | 5.05 | 2,786,033 | 8.79 | 2,494,732 | 2.6 | 2,084,831 | 18.5 |
| Recovered | 5/31/2020 | 2,637,208 | 2,485,323 | 5.8 | 2,916,419 | 0.5 | 2,548,528 | 3.36 | 2,118,968 | 19.6 |
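The error columns in the comparison table appear to be absolute percentage errors relative to the observed value. The sketch below applies that definition to two entries of the first row (27 May 2020, confirmed cases). The formula is an assumption about how the column was computed, and some table entries may differ slightly from it.

```python
def percent_error(observed: float, forecast: float) -> float:
    """Absolute forecast error as a percentage of the observed value."""
    return abs(forecast - observed) / abs(observed) * 100.0


# Observed and forecast confirmed cases for 5/27/2020, taken from the table.
observed = 5_700_405
forecasts = {"SLSTM": 5_717_622, "PROPHET": 5_429_379}
for name, value in forecasts.items():
    print(name, round(percent_error(observed, value), 2))
# SLSTM 0.3
# PROPHET 4.75
```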
Fig. 10. MAPE comparison of four models.
Fig. 11. Model-wise comparison of (a) confirmed, (b) death and (c) recovered cases.
Fig. 12. Predicted COVID-19 cases.
Fig. 13. Plot of error (a) in recovered cases and (b) in overall prediction.
Performance Evaluation for country-specific prediction.
| Model | Predicted variable | RMSE | MAE | MAPE |
|---|---|---|---|---|
| ARIMA | Confirmed | 194.76 | 70.94 | 2.08 |
| ARIMA | Deaths | 2324.9 | 1350.86 | 2.76 |
| ARIMA | Recovered | 2178.86 | 1305 | 1.21 |
| LSTM | Confirmed | 1167.56 | 979.03 | 0.43 |
| LSTM | Deaths | 992.84 | 866.66 | 1.9 |
| LSTM | Recovered | 1670.84 | 1366.66 | 2.4 |
| SLSTM | Confirmed | 274.22 | 920.02 | 0.3 |
| SLSTM | Deaths | 309.12 | 278.29 | 0.6 |
| SLSTM | Recovered | 1125.47 | 864 | 1.8 |
| PROPHET | Confirmed | 9970.33 | 7231.1 | 2.8 |
| PROPHET | Deaths | 1843.71 | 1389.5 | 3.7 |
| PROPHET | Recovered | 9310.83 | 7633.49 | 2.7 |
Fig. 14. Prediction of confirmed cases in India: (a) monthly comparison, (b) forecasted data up to August 2020 (dataset was taken till 20/08/2020).
Fig. 15. Prediction of death cases in India: (a) monthly comparison, (b) forecasted data up to August 2020 (dataset was taken till 20/08/2020).
Fig. 16. Prediction of recovered cases in India: (a) monthly comparison, (b) forecasted data up to August 2020 (dataset was taken till 20/08/2020).
Fig. 17. Prediction of confirmed cases using SLSTM in Chennai: (a) monthly comparison, (b) forecast up to mid-September 2020 (dataset was taken from 20/05/2020 till 20/08/2020).
Fig. 18. Prediction of death cases using SLSTM in Chennai: (a) monthly comparison, (b) forecast up to mid-September 2020 (dataset was taken from 20/05/2020 till 20/08/2020).
Fig. 19. Prediction of recovered cases using SLSTM in Chennai: (a) monthly comparison, (b) forecast up to mid-September 2020 (dataset was taken from 20/05/2020 till 20/08/2020).
Results of hypothesis testing.
| Time series data | Test Result | Hypothesis status |
|---|---|---|
| Total confirmed cases | Statistics: 0.512 | Fail to reject H0 |
| Total confirmed cases | Statistics: 0.306 | Fail to reject H0 |
| Total confirmed cases | Statistics: 0.058 | Fail to reject H0 |
| Total recovered cases | Statistics: −38.457 | Reject H0 |
| Total recovered cases | Statistics: 0.435 | Fail to reject H0 |
| Total recovered cases | Statistics: 0.290 | Fail to reject H0 |
| Total death cases | Statistics: −37.475 | Reject H0 |
| Total death cases | Statistics: 0.431 | Fail to reject H0 |
| Total death cases | Statistics: 0.374 | Fail to reject H0 |
Ranking of algorithms.
| Model | Average Ranking | Overall Rank |
|---|---|---|
| ARIMA | 1.9100 | 2 |
| LSTM | 2.1134 | 3 |
| SLSTM | 1.7531 | 1 |
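One common way to obtain an average ranking like the one above is to rank the models within each comparison (1 = lowest error) and then average the ranks per model. The sketch below does this with the MAPE values from the global performance table. It is an assumed procedure, not necessarily the one used in the paper, and it does not reproduce the published averages, which were presumably computed over a different set of comparisons. The function name `average_ranks` is hypothetical.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Rank models per case (1 = lowest error) and average; ties share the mean rank."""
    models = list(scores)
    n_cases = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in models}
    for i in range(n_cases):
        ordered = sorted(models, key=lambda m: scores[m][i])
        pos = 0
        while pos < len(ordered):
            # Extend `end` over any models tied with the one at `pos`.
            end = pos
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][i] == scores[ordered[pos]][i]):
                end += 1
            shared_rank = ((pos + 1) + (end + 1)) / 2.0
            for m in ordered[pos:end + 1]:
                totals[m] += shared_rank
            pos = end + 1
    return {m: totals[m] / n_cases for m in models}


# MAPE for confirmed, death and recovered cases, from the global performance table.
mape_scores = {
    "ARIMA": [0.372, 0.742, 1.12],
    "LSTM": [0.37, 0.53, 1.07],
    "SLSTM": [0.2, 0.43, 0.9],
}
print(average_ranks(mape_scores))  # {'ARIMA': 3.0, 'LSTM': 2.0, 'SLSTM': 1.0}
```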
Fig. 20. Ranking of algorithms.
Fig. 21. Correlation between COVID cases and external factors.
Fig. 22. (a) Chennai COVID-19 cases vs temperature, (b) normalized values for temperature and cases and (c) Chennai confirmed cases vs daily average temperature.
Fig. 23. Confirmed and recovered cases of India.
Fig. 24. Representation of practical significance for predicting COVID-19 infected cases.
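The correlation analysis between case counts and external factors such as temperature can be illustrated with the Pearson coefficient. The sketch below is a plain-Python version run on made-up numbers. The data, the variable names and the choice of Pearson (rather than another correlation measure) are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Hypothetical daily average temperature (deg C) and new cases, not real data.
temperature = [30.0, 31.0, 32.0, 33.0, 34.0]
new_cases = [500.0, 480.0, 470.0, 440.0, 430.0]
print(round(pearson_r(temperature, new_cases), 3))
```

A value near +1 or −1 indicates a strong linear association and a value near 0 a weak one; either way, correlation alone does not establish that the external factor drives the case counts.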