Daniela A Gomez-Cravioto1, Ramon E Diaz-Ramos1, Francisco J Cantu-Ortiz1, Hector G Ceballos1.
Abstract
To understand and approach the spread of the SARS-CoV-2 epidemic, machine learning offers fundamental tools. This study presents the use of machine learning techniques for projecting COVID-19 infections and deaths in Mexico. The research has three main objectives: first, to identify which function best fits the growth of the infected population in Mexico; second, to determine the feature importance of climate and mobility; third, to compare the results of a traditional time series statistical model with a modern machine learning approach. The motivation for this work is to support health care providers in their preparation and planning. Linear, polynomial, and generalized logistic regression models are compared to describe the growth of COVID-19 incidence in Mexico. Additionally, machine learning and time series techniques are used to identify feature importance and to forecast daily cases and fatalities. The study uses the publicly available data sets from the Johns Hopkins University of Medicine in conjunction with mobility rates obtained from Google's Mobility Reports and climate variables acquired from the Weather Online API. The results suggest that the logistic growth model best fits the pandemic's behavior, that climate and mobility variables are sufficiently correlated with the disease numbers, and that the long short-term memory (LSTM) network can be exploited for predicting daily cases. Given this, we propose a model to predict daily cases and fatalities for SARS-CoV-2 using time series data, mobility, and weather variables.
Keywords: Covid19; Data science; Recurrent neural networks; Time series forecasting
Year: 2021 PMID: 34104256 PMCID: PMC8175062 DOI: 10.1007/s12559-021-09885-y
Source DB: PubMed Journal: Cognit Comput ISSN: 1866-9956 Impact factor: 5.418
Statistical summary of confirmed cases and fatalities datasets
| | Confirmed Cases | Fatalities |
|---|---|---|
| mean | 5,757 | 380 |
| std | 49,991 | 3,414 |
| min | 0 | 0 |
| 25% | 0 | 0 |
| 50% | 28 | 0 |
| 75% | 547 | 0 |
| max | 1,699,176 | 100,417 |
Fig. 1 Accumulated worldwide COVID-19 confirmed cases since January 22, 2020
Data summary of five Latin America countries
| Country | Mexico | Chile | Brazil | Peru | Ecuador |
|---|---|---|---|---|---|
| Start | 2/28/20 | 3/3/20 | 2/26/20 | 3/6/20 | 3/1/20 |
| End | 5/21/20 | 5/21/20 | 5/21/20 | 5/21/20 | 5/21/20 |
| Accumulated Mean Cases | 16,153 | 16,337 | 77,318 | 32,060 | 13,775 |
| St. Dev. of Accumulated | 21,668 | 20,796 | 110,300 | 42,013 | 14,005 |
| Growth Factor of Daily New Cases | 19.08% | 17.24% | 17.92% | 19.93% | 12.09% |
| Mean Fatalities of Daily Accumulated | 2,098 | 228 | 6,441 | 1,095 | 1,020 |
| St. Dev Fatalities of Daily Accumulated | 2,471 | 221 | 7,503 | 1,191 | 1,102 |
| Growth Factor of Daily New Fatalities | 16.1% | 12.84% | 19.95% | 12.49% | 11.43% |
| Average Mortality Rate | 5.67% | 0.77% | 4.18% | 2.23% | 4.3% |
Fig. 2 Total number of confirmed cases (A) and fatalities (B) since the confirmation of the first case
Fig. 3 Accumulated number of confirmed cases (A) and logarithmic transformation of confirmed cases (B)
Fig. 4 Constructed model diagram
Fig. 5 Linear regression model for the logarithm of confirmed cases (A) and fatalities (B) in Mexico over the last 20 days
Growth models RMSE and BIC results with its corresponding coefficients
| Models | Confirmed Cases: Coef. | Confirmed Cases: RMSE | Confirmed Cases: BIC | Fatalities: Coef. | Fatalities: RMSE | Fatalities: BIC |
|---|---|---|---|---|---|---|
| Linear Regression | c1=0.05 b=6.52 | 2299.24 | 80.62 | c1=0.06 b=5.06 | 135.82 | 42.06 |
| Polynomial Regression | c1=-16.67 c2=1.69 c3=-0.06 c4=0.01 b=34.14 | 3781.91 | 294.26 | c1=-0.96 c2=-0.15 c3=0.03 c4=-0.01 b=7.43 | 179.98 | 147.84 |
| Sigmoid curve fitting | L=99,592 | 535.57 | 850.38 | L=11,036 | 102.72 | 480.33 |
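The RMSE and BIC columns above compare how closely each growth model tracks the observed series. As a rough illustration only, the sketch below computes both metrics for a fitted model. It assumes the common least-squares form of BIC, n·ln(RSS/n) + k·ln(n); the record does not state which BIC formula the authors used, so the values need not match the table. The function names and the toy data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def bic_least_squares(observed, predicted, n_params):
    """BIC assuming Gaussian errors: n * ln(RSS / n) + k * ln(n).

    This form is an assumption; the source does not give its BIC formula.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("perfect fit: ln(RSS / n) is undefined")
    return n * math.log(rss / n) + n_params * math.log(n)


# Toy, made-up series (not data from the study).
observed = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 43.0, 76.0]

print("RMSE:", rmse(observed, predicted))
print("BIC (2 parameters):", bic_least_squares(observed, predicted, n_params=2))
```

Under this form, a model with more coefficients pays a larger k·ln(n) penalty, which is one reason a polynomial fit can score worse on BIC than a simpler model with a similar error.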
Fig. 6 Polynomial regression model for Mexico confirmed cases (A) and fatalities (B)
Fig. 7 Sigmoid model for confirmed cases (A) and fatalities (B) of Mexico
Inflection point and limiting values of confirmed cases and deaths
| | Accumulated Cases | Accumulated Fatalities |
|---|---|---|
| Inflection Point | 49,796 (May 18th) | 5,518 (May 19th) |
| Limiting value | 99,592 (September 29th) | 11,036 (August 27th) |
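The inflection points in the table are exactly half of the limiting values (49,796 = 99,592 / 2 and 5,518 = 11,036 / 2), which is the behavior of a symmetric logistic curve. The sketch below checks this, assuming the standard three-parameter form L / (1 + exp(-k (t - t0))). Only L comes from the table; the growth rate `K` and midpoint `T0` are hypothetical placeholders, since the record does not report them.

```python
import math


def logistic(t, limit, k, t0):
    """Standard logistic curve with upper asymptote `limit` and midpoint `t0`."""
    return limit / (1.0 + math.exp(-k * (t - t0)))


# Limiting values L taken from the table above.
LIMIT_CASES = 99_592
LIMIT_FATALITIES = 11_036

# Hypothetical shape parameters; the source does not report them.
K = 0.1    # growth rate per day (assumed)
T0 = 80.0  # day index of the inflection point (assumed)

# At t = t0 the exponent is zero, so the curve sits at exactly L / 2.
inflection_cases = logistic(T0, LIMIT_CASES, K, T0)
inflection_fatalities = logistic(T0, LIMIT_FATALITIES, K, T0)

print(inflection_cases)       # 49796.0
print(inflection_fatalities)  # 5518.0
```

The result does not depend on the assumed `K` or `T0`: for this form of the curve, the value at the midpoint is always half the limit.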
Fig. 8 Non-normal distribution quantile–quantile scatter plot of confirmed cases
Time step new daily cases and fatalities feature selection coefficients
| Variables | ID | New cases Absolute Mean | Fatalities Absolute Mean |
|---|---|---|---|
| New log cases | var1 | 0.84 | 0.53 |
| Max tempC | var2 | 0.30 | 0.26 |
| Min tempC | var3 | 0.31 | 0.24 |
| UV Index | var4 | 0.23 | 0.19 |
| Humidity | var5 | 0.30 | 0.21 |
| PrecipMM | var6 | 0.33 | 0.26 |
| Pressure | var7 | 0.24 | 0.19 |
| Wind speed Kmph | var8 | 0.15 | 0.15 |
| Retail and Recreation | var9 | 0.32 | 0.34 |
| Grocery and Pharmacy | var10 | 0.38 | 0.32 |
| Parks | var11 | 0.35 | 0.35 |
| Transit stations | var12 | 0.40 | 0.35 |
| Workplaces | var13 | 0.21 | 0.35 |
| Residential | var14 | 0.21 | 0.33 |
Time series models metrics summary
| Model | RMSE | BIC |
|---|---|---|
| LSTM daily cases | 275.35 | 71.00 |
| LSTM daily fatalities | 31.91 | 45.14 |
| VAR daily cases | 630.3469 | 94.14 |
| VAR daily fatalities | 208.4456 | 78.65 |
Hyperparameters of LSTM
| Hyperparameter | Value |
|---|---|
| Hidden layers | 2 |
| Number of neurons in hidden layer 1 | 200 |
| Activation of hidden layer 1 | tanh |
| Number of neurons in hidden layer 2 | 100 |
| Activation of hidden layer 2 | tanh |
| Batch size | 100 |
| Epochs | 100 |
| Loss function | MSE |
| Optimizer | Adam |
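To give a sense of the size of the network these hyperparameters describe, the sketch below counts trainable weights for two stacked LSTM layers of 200 and 100 units. It uses the standard LSTM parameter count (four gates, each with input weights, recurrent weights, and a bias). The input width of 14 is an assumption based on the 14 variables (var1 to var14) in the feature selection table; the record does not state the actual input shape, and the output layer is not included.

```python
def lstm_param_count(n_inputs, n_units):
    """Trainable parameters of a standard LSTM layer.

    Each of the 4 gates has an (n_inputs x n_units) input matrix,
    an (n_units x n_units) recurrent matrix, and n_units biases.
    """
    return 4 * (n_units * (n_inputs + n_units) + n_units)


# Assumption: one input per variable var1..var14; not stated in the source.
N_FEATURES = 14

layer1_params = lstm_param_count(N_FEATURES, 200)  # hidden layer 1: 200 units
layer2_params = lstm_param_count(200, 100)         # hidden layer 2: 100 units

print(layer1_params)                  # 172000
print(layer2_params)                  # 120400
print(layer1_params + layer2_params)  # 292400
```

With these assumptions the two recurrent layers hold roughly 292 thousand weights.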