Charles Nicholson1,2, Lex Beattie2, Matthew Beattie2, Talayeh Razzaghi1, Sixia Chen3.
Abstract
COVID-19 is a global pandemic threatening the lives and livelihoods of millions of people across the world. Due to its novelty and quick spread, scientists have had difficulty creating accurate forecasts for this disease. In part, this is due to variation in human behavior and environmental factors that affect disease propagation. This is especially true for regionally specific predictive models, due to either limited case histories or other unique factors characterizing the region. This paper employs both supervised and unsupervised methods to identify the critical county-level demographic, mobility, weather, medical capacity, and health-related factors for studying COVID-19 propagation prior to the widespread availability of a vaccine. We use this feature subspace to aggregate counties into meaningful clusters to support more refined disease analysis efforts.
Year: 2022 PMID: 35476849 PMCID: PMC9045668 DOI: 10.1371/journal.pone.0267558
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Machine learning to enhance county-level COVID-19 analyses.
Fig 2. County-level weather data projected to 2 dimensions with PCA.
COVID-19 outcomes per county.
| Target variable | Description |
|---|---|
| | total positive COVID-19 cases per 1,000 capita |
| | total COVID-19 deaths per 1,000 capita |
| | 30-day average of new COVID-19 cases per day per 1,000 capita |
| | 30-day average of new COVID-19 deaths per day per 1,000 capita |
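The four outcomes above are simple transformations of county case and death counts. The following is a minimal sketch of how such values could be computed, assuming a daily series of cumulative counts and a county population as inputs. The function names `per_1000` and `avg_daily_new_per_1000`, the input format, and the example county are hypothetical and not taken from the paper.

```python
def per_1000(total, population):
    """Express a count per 1,000 residents."""
    return 1000.0 * total / population


def avg_daily_new_per_1000(cumulative, population, window=30):
    """Mean daily new count over the last `window` days, per 1,000 residents.

    `cumulative` holds one cumulative count per day (assumed input format).
    """
    if len(cumulative) < window + 1:
        raise ValueError("need at least window + 1 daily observations")
    recent = cumulative[-(window + 1):]
    # Daily new counts are first differences of the cumulative series.
    daily_new = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return per_1000(sum(daily_new) / window, population)


# Hypothetical county: 50,000 residents, 10 new cases per day for 40 days.
cumulative_cases = [10 * day for day in range(41)]
total_rate = per_1000(cumulative_cases[-1], 50_000)
recent_rate = avg_daily_new_per_1000(cumulative_cases, 50_000)
```

The same two functions apply unchanged to a cumulative death series.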
Model performance.
| Outcome | Metric | Supervised learning method | | | | |
|---|---|---|---|---|---|---|
| | | ENET | RF | CF | GBT | MARS |
| | RMSE | 11.2616 | | 10.2133 | 10.0319 | 10.9266 |
| | | 0.4428 | | 0.5510 | 0.5584 | 0.4808 |
| | RMSE | 0.4279 | | 0.4164 | 0.4173 | 0.4338 |
| | | 0.4070 | | 0.4389 | 0.4356 | 0.3897 |
| | RMSE | 0.1482 | | 0.1436 | 0.1425 | 0.1505 |
| | | 0.2852 | | 0.3423 | 0.3395 | 0.2659 |
| | RMSE | 0.0054 | 0.0054 | | 0.0054 | 0.0055 |
| | | 0.1096 | 0.1060 | | 0.1193 | 0.0919 |
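For readers unfamiliar with the RMSE rows, root mean squared error is the square root of the average squared difference between observed and predicted values, so lower is better. A minimal sketch follows; the observed and predicted numbers are made up for illustration and are not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical cases-per-1,000 values for four held-out counties.
observed = [12.0, 30.0, 45.0, 8.0]
predicted = [10.0, 33.0, 41.0, 9.0]
error = rmse(observed, predicted)
```

Because RMSE is in the units of the outcome, values are comparable across methods within one outcome but not across outcomes.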
Important variables for county-level COVID-19 modeling.
| Variable category | Name | Description |
|---|---|---|
| Race/ethnicity | NHWA | not Hispanic, White alone |
| | NHBA | not Hispanic, Black alone |
| | NHIA | not Hispanic, American Indian alone |
| | TOM | two or more races |
| Medical capacity | SNF-sites | specialized nursing facilities per capita |
| | health-insurance | ratio of insured to uninsured (for ages 40-64) |
| Health | pct-FairPoorHealth | percent reporting fair or poor health |
| | days-UnhealthyMental | self-reported mentally unhealthy days |
| | pct-Smokers | percent who smoke |
| Economics | median-income | median household income |
| | unemployment-rate | percent of labor force that is unemployed |
| Weather | PC1-wx | PC1 for weather data |
| | PC2-wx | PC2 for weather data |
| Education | pct-woHSdiploma | percent adults without HS diploma |
| | pct-4yr-degree+ | percent adults with 4 yr degree or higher |
| Age | under18 | population under 18 years of age |
| | over65 | population over 65 years of age |
| Gender | gender-ratio | ratio of males to females |
| Density | pop-density | population density (per square mile) |
| Politics | dem-rep-ratio | ratio of Democrats to Republicans |
Fig 3. Correlation plot for critical COVID-19 county-level variables.
Important variables by model.
| cases | deaths | case rate | death rate |
|---|---|---|---|
| PC1-wx (100) | PC1-wx (100) | days-UnhealthyMental (100) | PC1-wx (100) |
| NHWA (65) | NHBA (93) | unemployment-rate (47) | median-income (84) |
| pct-woHSdiploma (46) | NHWA (89) | NHIA (46) | days-UnhealthyMental (70) |
| pct-FairPoorHealth (34) | TOM (37) | SNF-sites (42) | NHBA (52) |
| gender-ratio (31) | pct-woHSdiploma (35) | PC1-wx (36) | SNF-sites (45) |
| NHBA (31) | pct-FairPoorHealth (31) | pct-Smokers (35) | pct-FairPoorHealth (44) |
| days-UnhealthyMental (30) | NHIA (28) | under18 (21) | health-insurance (39) |
| under18 (30) | gender-ratio (26) | dem-rep-ratio (21) | pct-4yr-degree+ (33) |
| pct-Smokers (24) | median-income (22) | PC2-wx (20) | pct-Smokers (29) |
| over65 (24) | pop-density (18) | health-insurance (19) | pct-woHSdiploma (23) |
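The numbers in parentheses read as relative importance scores, with the top-ranked variable in each column set to 100. A minimal sketch of that kind of rescaling is shown below, assuming non-negative raw importance scores from some fitted model. The helper `scale_importance` and the raw scores are hypothetical; the paper's actual importance measure is not specified in this excerpt.

```python
def scale_importance(raw, top_n=10):
    """Rescale raw importance scores so the largest is 100, then keep the top_n."""
    largest = max(raw.values())
    if largest <= 0:
        raise ValueError("at least one importance score must be positive")
    scaled = {name: round(100 * score / largest) for name, score in raw.items()}
    # Sort from most to least important; ties keep their input order.
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical raw scores for four variables.
raw_scores = {
    "PC1-wx": 0.40,
    "NHWA": 0.26,
    "pct-woHSdiploma": 0.184,
    "over65": 0.096,
}
ranking = scale_importance(raw_scores, top_n=3)
```

Relative scaling makes rankings comparable within a column, but a score of 100 for one outcome does not imply the same absolute importance as a score of 100 for another.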
Recommended number of clusters.
| Index name | PAM | HC | |
|---|---|---|---|
| Beale | 2 | . | 11 |
| DB | 15 | . | 12 |
| Silhouette | 2 | 2 | 2 |
| Marriot | 7 | . | 6 |
| Point-biserial | 4 | . | 7 |
| Gap statistic | 6 | 2 | 6 to 11 |
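Each index in the table scores candidate partitions and recommends the number of clusters that optimizes it. As one illustration, the silhouette index compares, for every point, the mean distance to its own cluster with the mean distance to the nearest other cluster. The sketch below is a plain-Python version on toy 2-D points, assuming Euclidean distance; the data and the function name `silhouette_score` are illustrative and do not reproduce the paper's computation.

```python
import math


def silhouette_score(points, labels):
    """Average silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the geometry
poor = silhouette_score(points, [0, 1, 0, 1])  # groups cut across it
```

Repeating the calculation for several candidate numbers of clusters and keeping the one with the highest average width gives a recommendation of the kind reported in the Silhouette row.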
Fig 4. Gap statistic for the hierarchical clustering.
Fig 5. Conterminous US with county-level cluster assignments.
Cluster profile.
| Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Number of counties | 570 | 791 | 350 | 319 | 411 | 24 | 70 | 21 | 550 |
| Scaled feature averages | | | | | | | | | |
| NHWA | | 0.48 | 0.73 | -0.97 | 0.08 | | -0.67 | | 0.69 |
| NHBA | | -0.35 | -0.40 | -0.41 | -0.16 | -0.53 | 0.34 | | -0.56 |
| NHIA | -0.18 | -0.11 | -0.19 | 0.52 | -0.20 | | -0.09 | -0.23 | -0.08 |
| TOM | -0.31 | -0.02 | -0.39 | | 0.34 | | -0.22 | 0.63 | -0.35 |
| SNF-sites | -0.21 | -0.07 | -0.05 | -0.35 | -0.53 | -0.53 | 0.18 | -0.66 | 0.98 |
| health-insurance | -0.70 | 0.21 | -0.18 | -0.69 | 0.99 | | -0.64 | 0.10 | 0.33 |
| pct-FairPoorHealth | 0.90 | -0.26 | 0.93 | 0.47 | -0.89 | | 0.59 | 0.20 | -0.92 |
| days-UnhealthyMental | 0.56 | 0.14 | | 0.10 | -0.63 | | -0.04 | -0.15 | |
| pct-Smokers | 0.49 | -0.01 | | -0.32 | -0.83 | | 0.40 | -0.20 | -0.66 |
| median-income | -0.60 | -0.07 | -0.81 | -0.06 | | | -0.66 | 0.62 | 0.20 |
| unemployment-rate | 0.45 | 0.18 | 0.56 | 0.15 | -0.56 | | -0.10 | 0.14 | -0.79 |
| PC1-wx | | 0.13 | -0.41 | 0.25 | 0.32 | | -0.68 | -0.29 | 0.92 |
| PC2-wx | 0.02 | -0.36 | -0.37 | | -0.17 | 0.66 | 0.66 | 0.13 | -0.20 |
| pct-woHSdiploma | 0.77 | -0.27 | 0.74 | 0.68 | -0.88 | 0.45 | | 0.20 | -0.77 |
| pct-4yr-degree+ | -0.42 | -0.12 | -0.76 | -0.22 | | -0.57 | -0.87 | | 0.04 |
| under18 | 0.13 | -0.40 | -0.27 | | -0.18 | | | -0.50 | 0.17 |
| over65 | -0.30 | 0.40 | 0.29 | -0.53 | -0.58 | | -0.45 | | 0.46 |
| gender-ratio | -0.27 | -0.09 | -0.19 | 0.17 | -0.23 | -0.16 | | -0.81 | 0.08 |
| pop-density | -0.05 | -0.06 | -0.11 | -0.10 | 0.20 | -0.14 | -0.14 | | -0.13 |
| dem-rep-ratio | 0.21 | -0.19 | -0.44 | 0.03 | 0.53 | 0.85 | -0.29 | | -0.34 |
| Scaled outcome averages | | | | | | | | | |
| | 0.79 | -0.46 | -0.10 | 0.15 | -0.38 | | | 0.36 | -0.15 |
| | 0.86 | -0.24 | -0.19 | -0.06 | -0.15 | 0.86 | 0.35 | | -0.41 |
| | -0.09 | -0.25 | 0.11 | 0.02 | -0.42 | | 0.24 | -0.54 | 0.61 |
| | 0.41 | -0.14 | 0.09 | -0.06 | -0.36 | 0.43 | 0.37 | -0.32 | -0.04 |
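A profile of this kind can be built by standardizing each feature across all counties and then averaging the standardized values within each cluster, so that positive entries mark clusters above the national average for that feature. The sketch below assumes z-score scaling with the population standard deviation; the exact scaling used in the paper is not stated in this excerpt, and the toy values and the helper `cluster_profile` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("feature has no variation")
    return [(v - mu) / sigma for v in values]


def cluster_profile(feature_values, labels):
    """Mean z-score of one feature within each cluster, rounded to 2 decimals."""
    groups = defaultdict(list)
    for label, z in zip(labels, zscores(feature_values)):
        groups[label].append(z)
    return {label: round(mean(zs), 2) for label, zs in sorted(groups.items())}


# Hypothetical median incomes (in $1,000s) for four counties in two clusters.
median_income = [40.0, 44.0, 60.0, 64.0]
cluster_labels = [1, 1, 2, 2]
profile = cluster_profile(median_income, cluster_labels)
```

Applying `cluster_profile` to every feature and outcome column in turn would yield one table row per variable.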
Fig 6. Cluster per capita COVID-19 case trends.