David McCoy, Whitney Mgbara, Nir Horvitz, Wayne M Getz, Alan Hubbard.
Abstract
COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set for analyzing per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation, and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, the CDC measures for limited English, and the proportion of Black and/or African-American individuals in a county were the most important features for per capita COVID-19 cases, both within a month after the pandemic started in a county and at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and the proportion of Black and/or African-American individuals in a county are the strongest predictors.
The methods predict that a 10% increase in public transportation use, with all other factors held fixed at their observed values, is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests that ethnicity has twice the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including vaccine distribution, could have the greatest impact if they are prioritized for the highest-risk communities.
Year: 2021 PMID: 34083563 PMCID: PMC8175420 DOI: 10.1038/s41598-021-90827-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Number of variables used from each source, with some examples; the complete list, with distributions, is given in the supplementary material.
| Source | N Var. | Var. Examples |
|---|---|---|
| USAFacts | 6 | COVID-19 outcome data, population |
| Bureau of Economic Analysis (BEA) | 1 | GDP |
| 5-Year American Community Survey (ACS), 2014–2018 | 14 | County percentages by Sex and Ethnicity, Employment, Household Income, use of Public Transportation |
| TIGER/Line Geodatabases | 7 | Latitude, longitude, land area |
| TIGER/Line Geodatabases; Federal Aviation Administration (FAA) | | Distance to Airports |
| Interactive Atlas of Heart Disease and Stroke (2014–2016) | 4 | Number of Hospitals, Stroke, Access to Parks |
| County Health Rankings and Roadmaps | 21 | Life Expectancy, Smoking, Obesity, Food Access, Mental Health, Physicians, Household Overcrowding, etc. |
| Centers for Medicare & Medicaid Services (CMS) | 15 | Drug Abuse, Hypertension, Hyperlipidemia, Osteoporosis, etc. |
| National Centers for Environmental Information | 1 | Precipitation |
| CDC’s Social Vulnerability Index (SVI) | 11 | Percentile over 65 or under 17, Minority Scores, Limited English, Low Income Housing Estimates, Number Institutionalized |
| Quarterly Census of Employment and Wages | 14 | Labor force types, farming/mining, private industry, education/healthcare etc. |
| MIT election lab | 1 | Calculated Proportion Voted Republican 2016 |
| | 6 | Google mobility to location type, Residence, Grocery, etc. |
Figure 2. Variable importance as indicated by the relative increase of mean-squared error when the block of variables is permuted.
Figure 3. Variable importance as indicated by the relative increase of mean-squared error when a single variable is permuted.
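The permutation measure named in the two captions above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the stand-in "fitted model" `predict` are all hypothetical, and "relative increase" is assumed here to mean (permuted MSE − baseline MSE) / baseline MSE. A block of variables is permuted jointly by moving whole rows of the block together; a single variable is simply a block of size one.

```python
import random
from statistics import fmean


def mse(predict, X, y):
    """Mean-squared error of a row-wise prediction function."""
    return fmean((predict(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(predict, X, y, block, n_repeats=20, seed=0):
    """Average relative MSE increase when the columns in `block` are shuffled.

    Assumption: relative increase = (permuted MSE - baseline MSE) / baseline MSE.
    The columns in `block` are shuffled jointly, so their mutual correlation
    is preserved while their link to the outcome is broken.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]  # copy so X itself is untouched
        for i, j in enumerate(order):
            for col in block:
                X_perm[i][col] = X[j][col]
        increases.append((mse(predict, X_perm, y) - baseline) / baseline)
    return fmean(increases)


# Hypothetical toy data: the outcome depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3 * row[0] + row[1] + data_rng.gauss(0, 0.5) for row in X]


def predict(row):
    """Stand-in for a fitted model; it ignores column 2 by construction."""
    return 3 * row[0] + row[1]


single = {col: permutation_importance(predict, X, y, [col]) for col in range(3)}
block_01 = permutation_importance(predict, X, y, [0, 1])
print(single, block_01)
```

In this toy setting the unused column has an importance of exactly zero, because shuffling it leaves every prediction unchanged, while the jointly permuted block captures the combined contribution of its members.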
Figure 1. COVID-19 heatmap visualization of the distribution of county-level data. The rows represent counties clustered by the dendrogram, and the columns are features of the counties, which are also clustered by similarity. Red coloration indicates higher values (up to a z-score of 4) and blue coloration indicates lower values (down to a z-score of −4). The column bars on the left are outcomes, categorized by quantiles. The sections marked C1, C2, and C3 show similarly high or low features of counties in this region, which have early COVID-19 appearance and high transmission and mortality. The figure was created using pheatmap version 1.0.12 [28].
Cross-validated SuperLearner risk across COVID-19 outcomes.
| COVID-19 outcome | Model risk (per capita) | Model risk (counts) | R-squared |
|---|---|---|---|
| Day of first case | NA | 159.58 | 0.75 |
| COVID-19 cases at day 25 | 4.22e−05 | 10539.64 | 0.59 |
| Total COVID-19 cases to date | 5.22e−05 | 13053.75 | 0.87 |
| All-cause deaths at day 100 | 2.80e−08 | 7.00 | 0.57 |
| All-cause deaths to date | 1.42e−07 | 35.52 | 0.59 |
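For readers unfamiliar with the quantities in this table, the following sketch shows one common way to compute a cross-validated risk (mean-squared error on held-out folds) and the corresponding R-squared, taken here as 1 − risk / variance of the outcome. This is an assumption about the definitions, not a statement of how the paper computed them, and a one-variable least-squares line stands in for the ensemble learner. All names and data are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns a prediction function."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_risk_and_r2(xs, ys, fit, k=5, seed=0):
    """K-fold cross-validated MSE and R-squared = 1 - MSE / Var(y)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint validation folds
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Score only on observations the model has not seen.
        sq_errors += [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
    risk = fmean(sq_errors)
    y_bar = fmean(ys)
    variance = fmean((v - y_bar) ** 2 for v in ys)
    return risk, 1 - risk / variance


# Hypothetical data: outcome = 2 * predictor + unit-variance noise.
rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

risk, r2 = cv_risk_and_r2(xs, ys, fit_line)
print(f"cross-validated risk: {risk:.3f}, R-squared: {r2:.3f}")
```

Because every squared error comes from a fold the model was not trained on, the resulting R-squared reflects out-of-sample fit rather than in-sample fit.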
Figure 4. Marginal predictions of day of first case (relative to index time) for different proportional reductions of total population size, for models adjusting for all other covariates, adjusting only for covariates not in the sub-category (see supplement Table S1), and unadjusted.
Figure 5. Marginal predictions of total cases and deaths (by July 14, 2020) for three of the most consistently important variables in predicting the count outcomes: CDC minority score, proportion of Black and/or African-American individuals, and a metric of public transportation use. The X-axis shows different proportional reductions of each of the three predictors; the Y-axis shows the marginal predicted total counts for models adjusting for all other covariates, adjusting only for covariates not in the sub-category (see supplement Table S1), and unadjusted. Black lines indicate the actual total numbers of COVID-19 cases and deaths.
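The marginal predictions plotted in Figures 4 and 5 can be read as follows: one predictor is scaled down by a given proportion in every county, all other covariates stay at their observed values, and the model's predictions are summed over counties. The sketch below illustrates that idea only. The linear `toy_predict` function, the three-county table, and all names are hypothetical stand-ins, not the fitted ensemble or the study data.

```python
def marginal_total(predict, X, col, reduction):
    """Sum of predictions after scaling column `col` by (1 - reduction).

    All other columns keep their observed values, so the result is the
    predicted total count under a proportional reduction of one predictor.
    """
    total = 0.0
    for row in X:
        modified = list(row)
        modified[col] = row[col] * (1 - reduction)
        total += predict(modified)
    return total


def toy_predict(row):
    """Hypothetical fitted model: predicted count for one county."""
    transit_share, other_factor = row
    return 10 + 50 * transit_share + 2 * other_factor


# Hypothetical counties: [public transportation share, another covariate].
counties = [[0.1, 5.0], [0.3, 2.0], [0.0, 7.0]]

# Predicted totals across a grid of proportional reductions of column 0.
curve = {
    r: marginal_total(toy_predict, counties, col=0, reduction=r)
    for r in (0.0, 0.25, 0.5)
}
for r, total in curve.items():
    print(f"reduction {r:.0%}: predicted total {total:.1f}")
```

A reduction of zero reproduces the model's prediction at the observed data, which is the natural reference point for comparing the rest of the curve.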