Shashank Reddy Vadyala, Sai Nethra Betgeri, Eric A Sherer, Amod Amritphale.
Abstract
COVID-19 is a pandemic disease that began to spread rapidly in the US, with the first case detected on January 19, 2020, in Washington State. From March 9, 2020, case counts increased quickly, reaching a total of 25,739 as of April 20, 2020. Although most people with coronavirus (81%, according to the U.S. Centers for Disease Control and Prevention (CDC)) will have mild symptoms or none at all, others may rely on a ventilator to breathe. SEIR models are broadly applicable to predicting population-level outcomes for a variety of diseases. However, many researchers use these models without validating the underlying hypotheses. Many also overfit the data by building models with too many predictor variables and too small a sample. Models developed this way are unlikely to withstand a validity check on a separate population or region, and without attempting such validation the researcher remains unaware that overfitting has occurred. In this paper, we present a combined algorithm that uses XGBoost, K-Means, and long short-term memory (LSTM) neural networks, with region-based similar-day feature selection, to construct a prediction model (K-Means-LSTM) for short-term forecasting of COVID-19 cases in the state of Louisiana, USA. A weighted k-means algorithm based on extreme gradient boosting is used to evaluate the similarity between the forecast days and past days. The results show that the K-Means-LSTM method is more accurate, with an RMSE of 601.20, compared with an RMSE of 3615.83 for the SEIR model.
Keywords: COVID-19; Coronavirus; Day level forecasting; Deep learning; Neural network; SEIR Model
Abbreviations: COVID-19, coronavirus disease; LSTM, long short-term memory; RMSE, root mean square error; SEIR, Susceptible Exposed to Infectious Recovered; USA, United States of America; WHO, World Health Organization
Year: 2021 PMID: 35083430 PMCID: PMC8378999 DOI: 10.1016/j.array.2021.100085
Source DB: PubMed Journal: Array (N Y) ISSN: 2590-0056
Pearson correlation coefficients between COVID-19 and weather variables.
| Weather variable | Pearson correlation coefficient |
|---|---|
| Temperature min °F | 0.189 |
| Temperature max °F | 0.245 |
| Temperature avg °F | 0.439 |
| Humidity (%) | 0.004 |
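As a reminder of how coefficients like those in the table above are obtained, the following is a minimal sketch of the Pearson correlation coefficient using only the Python standard library. The function name `pearson_r` and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values, NOT data from the study.
avg_temp_f = [60.0, 62.0, 65.0, 70.0, 72.0]
daily_cases = [10, 14, 13, 20, 25]
r = pearson_r(avg_temp_f, daily_cases)
print(f"Pearson r = {r:.3f}")
```

A value near 0, such as the humidity entry in the table, indicates almost no linear relationship, while values closer to 1 indicate a stronger positive linear relationship.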
Modeling assumptions and parameter setting.
| Symbol | Description | Estimated mean value | Data source |
|---|---|---|---|
| S | Susceptible individuals | – | |
| E | Exposed individuals in the latent period | – | |
| I | Infectious individuals | – | |
| R | Recovered individuals with immunity | 0.05 | [ |
| | Infection rate | 0.040 | [ |
| 1/γ | Average infectious period | 3.85 | [ |
| N | Total population size | Average population of the parish | [ |
| | Basic reproduction number | 4.83 | [ |
| 1/ξ | Average latent period | 1/5 | [ |
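To make the compartments in the table concrete, here is a minimal sketch of the textbook SEIR equations integrated with a simple Euler scheme. It is not the authors' implementation. Several choices are assumptions: the latent-period entry "1/5" is read as a rate of 1/5 per day, the transmission rate is derived as the basic reproduction number divided by the infectious period (the table's infection rate of 0.040 uses a parametrization that cannot be recovered here), and the population size and initial counts are hypothetical.

```python
def simulate_seir(n, e0, i0, recovered0, beta, sigma, gamma, days, steps_per_day=10):
    """Integrate the standard SEIR equations with Euler steps.

    dS/dt = -beta*S*I/N,  dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I,  dR/dt = gamma*I
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s = float(n - e0 - i0 - recovered0)
    e, i, r = float(e0), float(i0), float(recovered0)
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


gamma = 1 / 3.85      # from the table: average infectious period of 3.85
sigma = 1 / 5         # ASSUMPTION: "1/5" read as the rate of leaving E
beta = 4.83 * gamma   # ASSUMPTION: beta = R0 * gamma, with R0 = 4.83
history = simulate_seir(n=100_000, e0=0, i0=10, recovered0=0,  # hypothetical
                        beta=beta, sigma=sigma, gamma=gamma, days=60)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("day of peak infectious count:", peak_day)
```

The Euler scheme is chosen only for brevity; a dedicated ODE solver would be the more usual choice for anything beyond illustration.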
Fig. 1. Overall procedure of forecasting COVID-19 cases.
Features used for predicting new COVID-19 cases.
| Features | Description | Symbol |
|---|---|---|
| Date | Date | |
| Parish | Parish Name | |
| Population Density | Population Density in the parish | |
| Race | Total number of individuals of the race k | |
| Median Age | The median age in the parish | |
| Temperature min °F | The minimum temperature in the parish | |
| Temperature max °F | The maximum temperature in the parish | |
| Temperature avg °F | The average temperature in the parish | |
| Humidity (%) | Humidity in the parish | |
| Median Household income per year | Median household income per year in the parish | |
| Confirmed COVID-19 cases | COVID-19 incidence on the prediction day in the parish | |
| Risk Of COVID-19 | Probability of risk of COVID-19 in the parish | |
Fig. 2. XGBoost feature importance.
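The abstract describes a weighted k-means step in which extreme gradient boosting informs the similarity between forecast days and past days. One way to read that, offered as a minimal sketch and not as the authors' implementation, is to use feature-importance scores as per-feature weights in the Euclidean distance. The function names, sample points, and weights below are hypothetical, and features are assumed to have been scaled to a comparable range beforehand.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance with a non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def weighted_kmeans(points, weights, k, iterations=50, seed=0):
    """Lloyd's algorithm using the weighted distance above.

    Returns (labels, centroids). Initial centroids are k points drawn
    at random with a fixed seed so the result is reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical scaled day-level features and importance weights.
days = [[0.10, 0.90], [0.12, 0.10], [0.09, 0.50],
        [0.80, 0.15], [0.83, 0.85], [0.79, 0.50]]
importance = [0.9, 0.1]
labels, centroids = weighted_kmeans(days, importance, k=2)
print(labels)
```

Days that share a label with the forecast day would then be the "similar days" passed on to the forecasting model. Because each weight applies to a single feature, the plain mean still minimizes the weighted squared distance within a cluster, so the usual update step can be kept unchanged.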
Fig. 3. LSTM cell diagram.
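Since the diagram itself is not reproduced here, the following sketch walks through one time step of a standard LSTM cell. It uses a scalar input and a scalar hidden state to stay readable; real models use vectors and learned weight matrices. The parameter layout and the numeric values are hypothetical, and this is the textbook formulation, not necessarily the exact variant used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with scalar input and scalar hidden/cell state.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters, chosen only to make the example run.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.4, -0.2, 0.0),
    "output": (0.3, 0.2, 0.0),
    "candidate": (0.8, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.9]:  # a short, made-up input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` carries information across time steps, and the three sigmoid gates decide what is forgotten, written, and exposed at each step.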
Fig. 4. LSTM neural network model for daily COVID-19 case forecasting.
Parameters used in the hyperparameter search.
| Parameter | Selection type | Range |
|---|---|---|
| Learning rate | Log uniform | 1e-1 to 1e-7 |
| Hidden layers | Discrete numeric | 1 to 20 |
| Hidden state | Discrete numeric | 1 to 200 |
| Activation | Category | {ReLU, sigmoid, tanh} |
| Batch size | Discrete numeric | 1 to 10 |
| Dropout | Log uniform | 0 to 0.5 |
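The search space in the table can be illustrated with a short random-sampling sketch. The source does not say which tuning library or search strategy was used, so plain random sampling with the standard library is an assumption, as are the function and key names. One deviation is deliberate: the table lists dropout as log uniform over 0 to 0.5, but a log-uniform draw cannot include 0, so the sketch samples dropout uniformly instead.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one hyperparameter configuration from the tabulated ranges."""
    return {
        "learning_rate": sample_log_uniform(rng, 1e-7, 1e-1),
        "hidden_layers": rng.randint(1, 20),   # inclusive bounds
        "hidden_state": rng.randint(1, 200),
        "activation": rng.choice(["relu", "sigmoid", "tanh"]),
        "batch_size": rng.randint(1, 10),
        # ASSUMPTION: uniform instead of log uniform, because the
        # tabulated lower bound of 0 has no logarithm.
        "dropout": rng.uniform(0.0, 0.5),
    }


rng = random.Random(42)  # fixed seed for reproducibility
for trial in range(3):
    print(trial, sample_config(rng))
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which matters when the range spans 1e-7 to 1e-1.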
Fig. 6. Comparison between cumulative confirmed cases of COVID-19 and the K-Means-LSTM forecast. COVID-19: coronavirus disease.
Fig. 7. Cumulative data points for each parish in Louisiana.
Table 5(a): Forecast of the number of confirmed cases in Richland, Louisiana (data points = 65).
| Date | Real | K-Means-LSTM | SEIR |
|---|---|---|---|
| 5/7/2020 | 91 | 93 | 69 |
| 5/8/2020 | 95 | 94 | 75 |
| 5/9/2020 | 99 | 96 | 79 |
| 5/10/2020 | 100 | 102 | 83 |
| 5/11/2020 | 103 | 103 | 86 |
| 5/12/2020 | 103 | 106 | 91 |
| 5/13/2020 | 104 | 110 | 93 |
Table 5(b): Forecast of the number of confirmed cases in St. Martin, Louisiana (data points = 39).
| Date | Real | K-Means-LSTM | SEIR |
|---|---|---|---|
| 5/7/2020 | 254 | 209 | 129 |
| 5/8/2020 | 255 | 218 | 136 |
| 5/9/2020 | 257 | 221 | 149 |
| 5/10/2020 | 257 | 229 | 152 |
| 5/11/2020 | 260 | 232 | 163 |
| 5/12/2020 | 264 | 239 | 175 |
| 5/13/2020 | 276 | 241 | 192 |
Table 5(c): Forecast of the number of confirmed cases in Calcasieu, Louisiana (data points = 23).
| Date | Real | K-Means-LSTM | SEIR |
|---|---|---|---|
| 5/7/2020 | 478 | 402 | 326 |
| 5/8/2020 | 481 | 411 | 331 |
| 5/9/2020 | 498 | 422 | 341 |
| 5/10/2020 | 501 | 437 | 357 |
| 5/11/2020 | 508 | 461 | 369 |
| 5/12/2020 | 512 | 472 | 375 |
| 5/13/2020 | 537 | 506 | 381 |
Fig. 5. The cumulative number of confirmed cases of COVID-19 across Louisiana.
Experimental results in terms of RMSE.
| Region | K-Means-LSTM | SEIR |
|---|---|---|
| Louisiana | 601.20 | 3615.83 |
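For readers who want to see how an RMSE figure is computed, the sketch below applies the standard formula to the first seven rows of the forecast table (5/7/2020 to 5/13/2020), which the caption lists first as Richland. This covers only that seven-day excerpt, so it is not expected to reproduce the statewide values of 601.20 and 3615.83. The function name `rmse` is illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First seven rows of the forecast table, 5/7/2020 to 5/13/2020.
real = [91, 95, 99, 100, 103, 103, 104]
kmeans_lstm = [93, 94, 96, 102, 103, 106, 110]
seir = [69, 75, 79, 83, 86, 91, 93]

print(f"K-Means-LSTM RMSE: {rmse(real, kmeans_lstm):.2f}")  # 3.00
print(f"SEIR RMSE:         {rmse(real, seir):.2f}")         # 17.43
```

On this excerpt the K-Means-LSTM forecast stays within a few cases of the observed counts, while the SEIR forecast is consistently lower than the observed counts, which matches the direction of the statewide comparison.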