Abstract
The COVID-19 pandemic has been a global crisis affecting billions of people and causing substantial economic losses. Different approaches have been proposed to combat it, including both medical measures and technical innovations, e.g., artificial intelligence technologies to diagnose and predict COVID-19 cases. While much attention has been paid to the USA and China, little research has addressed less developed countries such as India. In this study, I analyze the COVID-19 epidemic in India using datasets collected from different sources. I build several machine learning models to predict the spread of COVID-19 with different combinations of input features, among which the Transformer proves to be the most accurate. I also find that the Facebook mobility dataset is the most useful for predicting the number of confirmed cases. However, the datasets from different sources are not very effective for predicting the number of deaths caused by COVID-19 infection.
Year: 2022 PMID: 35496053 PMCID: PMC9039780 DOI: 10.1155/2022/2601149
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.246
Figure 1. The COVID-19 India dataset read in CSV format.
Figure 2. A sample of the MoHFW Vaccination data.
Raw data sources.
| Raw data sources |
|---|
| COVID-19 India (a) |
| MoHFW Vaccination (b) |
| Indian census 2011 |
| Google mobility [ |
| Facebook mobility [ |
(a) Data source website: http://www.covid19india.org/. (b) Ministry of Health and Family Welfare: https://www.mohfw.gov.in/.
Raw dataset description.
| Raw data source | Item | Value or description |
|---|---|---|
| COVID-19 India | Level | State |
| | Storage format | JSON/CSV |
| MoHFW Vaccination | Level | State |
| | Storage format | |
| Indian census 2011 | Level | District |
| | Storage format | Website |
| | Variable | Description |
| | District | The name of the district |
| | State | The state to which the district belongs |
| | Population | The population of the district |
| | Growth | The population growth since the last census |
| | Sex ratio | The ratio between the numbers of males and females |
| | Literacy | The percentage of literacy in the state |
| Google mobility | Level | State |
| | Storage format | CSV |
| | Variable | Description |
| | Grocery and pharmacy | Mobility trends for places like grocery markets |
| | Parks | Mobility trends for parks |
| | Transit stations | Mobility trends for public transport hubs |
| | Retail and recreation | Mobility trends for retail and recreation |
| | Residential | Mobility trends for places of residence |
| | Workplaces | Mobility trends for places of work |
| Facebook mobility | Level | District |
| | Storage format | CSV |
| | Variable | Description |
| | all_day_bing_tiles_visited_relative_change | Positive or negative change in movement relative to baseline |
| | all_day_ratio_single_tile_users | Positive proportion of users staying put within a single location |
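The table above lists Facebook mobility at district level, while Google mobility is recorded at state level. One possible way to bring a district-level series to state level is to average it per state and date. The sketch below assumes that approach, which the source does not confirm; the records are made up and the field names (`state`, `date`, `visit`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_state(rows, value_key):
    """Average a district-level value for each (state, date) pair."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["state"], row["date"])].append(row[value_key])
    return {key: mean(values) for key, values in buckets.items()}


# Made-up illustrative records, not taken from the dataset.
district_rows = [
    {"state": "Kerala", "district": "Kollam", "date": "2021-05-01", "visit": -0.40},
    {"state": "Kerala", "district": "Kannur", "date": "2021-05-01", "visit": -0.20},
    {"state": "Goa", "district": "North Goa", "date": "2021-05-01", "visit": -0.10},
]

state_visit = aggregate_to_state(district_rows, "visit")
```

A population-weighted mean (using the census table) would be an alternative design; the unweighted mean is used here only to keep the sketch short.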
Figure 3. Procedure of data preprocessing.
Features explained.
| Variable | Description |
|---|---|
| dateymd | The date of the data record |
| confirmed | The daily number of confirmed cases |
| recovered | The daily number of recovered cases |
| deaths | The daily number of deaths |
| Dose1 | The cumulative share of the population that has received the first vaccine dose |
| Dose2 | The cumulative share of the population that has received the second vaccine dose |
| Grocery & pharmacy | Mobility trends for places like grocery markets |
| Parks | Mobility trends for parks |
| Transit stations | Mobility trends for public transport hubs |
| Retail & recreation | Mobility trends for retail and recreation |
| Residential | Mobility trends for places of residence |
| Workplaces | Mobility trends for places of work |
| visit | Positive or negative change in movement relative to baseline |
| staying | Positive proportion of users staying put within a single location |
Figure 4. The structure of a typical LSTM cell.
Figure 5. LSTM neural network; L represents the cell shown in Figure 4.
Figure 6. The architecture of the Transformer.
Figure 7. The architecture of the TCN.
Figure 8. The training process of DeepAR. L is the LSTM cell.
Figure 9. The prediction process of DeepAR.
Results of different models.
| Model | RMSE | Training time (hours) |
|---|---|---|
| LSTM | 584.74 | 2.33 |
| Transformer | 460.08 | 11.1 |
| Logistic regression | 971.89 | 0.03 |
| Multiple linear regression | 1885.12 | 0.01 |
| TCN | 1605.52 | 1.12 |
| DeepAR | 927.79 | 4.36 |
The explanation of feature selection models.
| Model | Description |
|---|---|
| Baseline (Transformer) | The Transformer model trained with all features |
| Transformer_without_dose | The Transformer model trained without the dose feature group (Dose1 and Dose2) |
| Transformer_without_google | The Transformer model trained without the Google mobility feature group (grocery and pharmacy, parks, transit stations, retail and recreation, residential, and workplaces) |
| Transformer_without_facebook | The Transformer model trained without the Facebook mobility feature group (visit and staying) |
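The four variants above amount to a leave-one-group-out ablation over the input features. A minimal sketch of how the input column lists could be assembled is shown here; the group keys, the helper `select_features`, and the assumption that `confirmed`, `recovered`, and `deaths` are always kept as inputs are illustrative and not confirmed by the source.

```python
# Feature groups, using the variable names from the feature table.
FEATURE_GROUPS = {
    "dose": ["Dose1", "Dose2"],
    "google": [
        "Grocery & pharmacy",
        "Parks",
        "Transit stations",
        "Retail & recreation",
        "Residential",
        "Workplaces",
    ],
    "facebook": ["visit", "staying"],
}

# Assumption: the case counts are always part of the input.
BASE_FEATURES = ["confirmed", "recovered", "deaths"]


def select_features(excluded_group=None):
    """Return the input columns, leaving out one feature group if requested."""
    columns = list(BASE_FEATURES)
    for group, names in FEATURE_GROUPS.items():
        if group != excluded_group:
            columns.extend(names)
    return columns


variants = {
    "Baseline (Transformer)": select_features(),
    "Transformer_without_dose": select_features("dose"),
    "Transformer_without_google": select_features("google"),
    "Transformer_without_facebook": select_features("facebook"),
}
```

Each entry in `variants` would then be used to slice the preprocessed data before training one model per variant.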
Transformer model with feature selection for predicting the number of confirmed cases.
| Model | RMSE |
|---|---|
| Baseline (transformer) | 460.08 |
| Transformer_without_dose | 426.47 |
| Transformer_without_google | 519.28 |
| Transformer_without_facebook | 1105.17 |
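To make the comparison in the table above easier to read, the error of each ablated model can be expressed as a percentage change relative to the baseline. The short sketch below does this arithmetic with the reported values; the variable names are illustrative.

```python
# Error values copied from the table for confirmed-case prediction.
errors = {
    "Baseline (Transformer)": 460.08,
    "Transformer_without_dose": 426.47,
    "Transformer_without_google": 519.28,
    "Transformer_without_facebook": 1105.17,
}

baseline = errors["Baseline (Transformer)"]

# Positive values mean the error grew when the feature group was removed.
relative_change = {
    name: (value - baseline) / baseline * 100.0
    for name, value in errors.items()
    if name != "Baseline (Transformer)"
}

for name, change in relative_change.items():
    print(f"{name}: {change:+.1f}%")
```

Removing the Facebook mobility group raises the error by roughly 140%, removing the Google mobility group raises it by about 13%, and removing the dose group lowers it by about 7%.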
Transformer model with feature selection for predicting the number of deaths.
| Model | RMSE |
|---|---|
| Baseline (transformer) | 103.80 |
| Transformer_without_dose | 31.22 |
| Transformer_without_google | 45.09 |
| Transformer_without_facebook | 26.89 |
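The RMSE reported above is the root mean square error, i.e., the square root of the mean squared difference between observed and predicted values. A minimal sketch of the computation follows; the function name and the sample numbers are made up for illustration and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up daily death counts and predictions, for illustration only.
observed = [100, 120, 90, 110]
forecast = [95, 130, 85, 100]
score = rmse(observed, forecast)
```

Because the errors are squared before averaging, a few large misses weigh more heavily on the score than many small ones.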