Lauren Hoskovec¹, Sheena Martenies², Tori L Burket³, Sheryl Magzamen⁴, Ander Wilson¹.
Abstract
Recent ecological analyses suggest air pollution exposure may increase susceptibility to and severity of coronavirus disease 2019 (COVID-19). Individual-level studies are needed to clarify the relationship between air pollution exposure and COVID-19 outcomes. We conduct an individual-level analysis of how long-term exposure to air pollution and weather relates to peak COVID-19 severity. We develop a Bayesian multinomial logistic regression model with a multiple imputation approach to impute partially missing health outcomes. Our approach is based on the stick-breaking representation of the multinomial distribution, which offers computational advantages but presents challenges in interpreting regression coefficients. We propose a novel inferential approach to address these challenges. In a simulation study, we demonstrate our method's ability to impute missing outcome data and improve estimation of regression coefficients compared to a complete case analysis. In our analysis of 55,273 COVID-19 cases in Denver, Colorado, increased annual exposure to fine particulate matter in the year prior to the pandemic was associated with increased risk of severe COVID-19 outcomes. We also found COVID-19 disease severity to be associated with interactions between exposures. Our individual-level analysis fills a gap in the literature and helps to elucidate the association between long-term exposure to air pollution and COVID-19 outcomes.
Keywords: Pólya‐gamma; SARS‐CoV‐2; categorical regression; multiple imputation
Year: 2022 PMID: 35945947 PMCID: PMC9353392 DOI: 10.1002/env.2751
Source DB: PubMed Journal: Environmetrics ISSN: 1099-095X Impact factor: 1.527
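The stick-breaking representation mentioned in the abstract can be illustrated with a short sketch. It assumes the standard stick-breaking logistic link, where each step is a binary logistic "break" of the probability mass that remains. The function names are hypothetical, and this is not the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def stick_breaking_probs(etas: list[float]) -> list[float]:
    """Map K-1 linear predictors to K category probabilities.

    Category k < K receives a fraction sigmoid(eta_k) of the stick left over
    after categories 1..k-1. The last category receives whatever remains and
    therefore has no linear predictor (hypothetical helper, for illustration).
    """
    probs = []
    remaining = 1.0
    for eta in etas:
        p = sigmoid(eta) * remaining  # break off a piece of the remaining stick
        probs.append(p)
        remaining -= p
    probs.append(remaining)  # last category takes the leftover mass
    return probs


# Six outcome categories need five linear predictors.
print(stick_breaking_probs([1.2, -0.3, 0.5, -1.0, 0.0]))
```

Because every step is a binary logistic regression, Pólya-gamma data augmentation can give each step's coefficients a Gaussian full conditional under a Gaussian prior. That is the usual source of the computational convenience. The cost is that each coefficient describes a conditional split rather than a contrast with one fixed reference category.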
Missing data pattern for the peak severity outcome categories in our analysis of the Denver, Colorado cohort (n = 55,273)
| Number of missing categories | Missing categories | Number of cases | % of cases |
|---|---|---|---|
| 0 | – | 20,872 | 37.8 |
| 2 | (Asymptomatic, symptomatic) | 2916 | 5.3 |
| 3 | (Symptomatic, hospitalized, ICU) | 59 | 0.1 |
| 4 | (Symptomatic, hospitalized, ICU, ventilator) | 8725 | 15.8 |
| 5 | (Asymptomatic, symptomatic, hospitalized, ICU, ventilator) | 22,701 | 41.1 |
Note: Cases with partially missing outcomes were missing between 2 and 5 outcome categories. The table shows the number and percent of cases with each missing outcome category pattern.
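This sketch reads each "missing categories" entry as the set of categories a case's true outcome could still fall in. Under that reading, one straightforward way to impute a partially missing outcome inside a sampler is to draw from the model's category probabilities, restricted to that set and renormalized. The reading itself is an assumption. The category order follows the paper's tables and Figure 1, and the function name is hypothetical.

```python
import random

# Category order as listed in the paper's tables and Figure 1.
CATEGORIES = ["symptomatic", "asymptomatic", "hospitalized", "ICU", "ventilator", "death"]


def impute_outcome(probs: list[float], candidates: set[str], rng: random.Random) -> str:
    """Draw one imputed category for a case known only to lie in `candidates`.

    `probs` holds the case's model probabilities for all six categories (for
    example, from the current MCMC iteration). Probabilities outside the
    candidate set are zeroed; random.choices renormalizes the remaining weights.
    """
    weights = [p if c in candidates else 0.0 for c, p in zip(CATEGORIES, probs)]
    if sum(weights) <= 0.0:
        raise ValueError("candidate categories have zero total probability")
    return rng.choices(CATEGORIES, weights=weights, k=1)[0]


rng = random.Random(2022)
probs = [0.76, 0.15, 0.06, 0.01, 0.01, 0.01]
# A case known to be either asymptomatic or symptomatic (the 2-category pattern).
print(impute_outcome(probs, {"asymptomatic", "symptomatic"}, rng))
```

A sampler would repeat the draw at every iteration, using probabilities from the current parameter values. This carries uncertainty about the missing outcomes into the regression coefficients, which is the general idea behind multiple imputation within a Bayesian model.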
Classification probabilities into each of the six outcome categories
| Outcome category | Real data | Data probabilities: signal | Data probabilities: null | Equal probabilities: signal | Equal probabilities: null |
|---|---|---|---|---|---|
| Symptomatic | 0.76 | 0.71 (0.64, 0.81) | 0.77 | 0.19 (0.09, 0.29) | 0.14 |
| Asymptomatic | 0.15 | 0.16 (0.08, 0.25) | 0.16 | 0.18 (0.09, 0.27) | 0.16 |
| Hospitalized | 0.06 | 0.06 (0.01, 0.14) | 0.05 | 0.15 (0.06, 0.24) | 0.19 |
| ICU | 0.01 | 0.03 (0.01, 0.09) | 0.01 | 0.16 (0.07, 0.26) | 0.19 |
| Ventilator | 0.01 | 0.02 (0.01, 0.05) | 0.01 | 0.14 (0.07, 0.25) | 0.14 |
| Death | 0.01 | 0.01 (0.01, 0.07) | 0.01 | 0.18 (0.07, 0.28) | 0.18 |
Note: The table shows the outcome probabilities for the complete cases of the real data (“real data”) and for the complete data in our simulation scenarios. Measures for the simulated data were taken from 500 simulated datasets. The table shows the mean (minimum, maximum) classification probabilities for scenarios with a signal, and the fixed classification probabilities for null scenarios. Classification probabilities for null scenarios did not differ among the simulated datasets. Probabilities are shown for both the “data probabilities” and “equal probabilities” simulation design settings.
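Each simulation setting fixes the category probabilities that generate the complete data. Under a stick-breaking model, a baseline case can be made to reproduce a target probability vector by inverting the transform to obtain the intercepts. Example targets are the real-data column above or a uniform vector of 1/6. This is an assumption about how such a design could be set up, not a description of the paper's simulation code, and the helper names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def stick_breaking_intercepts(target: list[float]) -> list[float]:
    """Intercepts whose stick-breaking transform reproduces `target`.

    For k < K, the k-th break must take the fraction target[k] / remaining of
    the stick that is left, so its intercept is the logit of that fraction.
    The last category needs no intercept.
    """
    if abs(sum(target) - 1.0) > 1e-8:
        raise ValueError("target probabilities must sum to 1")
    intercepts = []
    remaining = 1.0
    for p in target[:-1]:
        intercepts.append(logit(p / remaining))
        remaining -= p
    return intercepts


uniform = stick_breaking_intercepts([1 / 6] * 6)
real = stick_breaking_intercepts([0.76, 0.15, 0.06, 0.01, 0.01, 0.01])
print([round(a, 3) for a in uniform])
```

Even with equal target probabilities, the intercepts are not all equal. Each break acts on a shrinking remainder, so the required conditional fraction grows from 1/6 to 1/2.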
Simulation study results for the data probabilities setting
| Scenario | Missing data | Proposed RMSE | Proposed bias | Proposed width | Proposed cov | Complete case RMSE | Complete case bias | Complete case width | Complete case cov |
|---|---|---|---|---|---|---|---|---|---|
| Partially missing, signal | 0% | 0.35 | 0.00 | 0.87 | 0.95 | 0.35 | 0.00 | 0.87 | 0.95 |
|  | 20% | 0.38 | 0.00 | 0.92 | 0.94 | 0.39 | 0.00 | 0.96 | 0.95 |
|  | 50% | 0.43 | 0.00 | 1.02 | 0.94 | 0.46 | 0.00 | 1.16 | 0.95 |
|  | 80% | 0.51 | 0.00 | 1.20 | 0.93 | 0.63 | 0.00 | 1.62 | 0.96 |
| Fully missing, signal | 0% | 0.35 | 0.00 | 0.87 | 0.95 | 0.35 | 0.00 | 0.87 | 0.95 |
|  | 20% | 0.38 | 0.00 | 0.93 | 0.94 | 0.39 | 0.00 | 0.96 | 0.95 |
|  | 50% | 0.45 | 0.00 | 1.08 | 0.93 | 0.46 | 0.00 | 1.16 | 0.95 |
|  | 80% | 0.59 | 0.00 | 1.41 | 0.91 | 0.63 | 0.00 | 1.62 | 0.96 |
| Partially missing, null | 0% | 0.34 | 0.00 | 0.83 | 0.95 | 0.34 | 0.00 | 0.83 | 0.95 |
|  | 20% | 0.38 | 0.00 | 0.90 | 0.95 | 0.38 | 0.00 | 0.93 | 0.95 |
|  | 50% | 0.44 | 0.00 | 1.07 | 0.94 | 0.47 | 0.00 | 1.16 | 0.95 |
|  | 80% | 0.53 | 0.00 | 1.35 | 0.95 | 0.62 |  | 1.68 | 0.96 |
| Fully missing, null | 0% | 0.34 | 0.00 | 0.83 | 0.95 | 0.34 | 0.00 | 0.83 | 0.95 |
|  | 20% | 0.38 | 0.00 | 0.91 | 0.95 | 0.38 | 0.00 | 0.93 | 0.95 |
|  | 50% | 0.45 | 0.00 | 1.11 | 0.94 | 0.47 | 0.00 | 1.16 | 0.95 |
|  | 80% | 0.56 |  | 1.49 | 0.94 | 0.62 |  | 1.68 | 0.96 |
Note: The table shows the mean across 500 datasets for each measure in four simulation scenarios (“partially missing, signal,” “fully missing, signal,” “partially missing, null,” and “fully missing, null”). The measures are root mean squared error (RMSE), bias, 95% credible interval width (width), and coverage (cov) for exposure regression coefficients. The table shows results from our proposed method and the complete case analysis for missing data levels of 0%, 20%, 50%, and 80%.
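The four measures in these tables can be computed as shown below. The sketch pools all (dataset, coefficient) pairs into one set of lists. Averaging per-dataset values instead, which the note's "mean across 500 datasets" may indicate, would change RMSE slightly because the square root is not linear. The function name is hypothetical.

```python
import math


def summarize_coefficients(estimates, lowers, uppers, truth):
    """RMSE, bias, mean 95% interval width, and coverage for coefficient estimates.

    Each position i is one (simulated dataset, coefficient) pair: the posterior
    mean estimates[i], its 95% credible interval [lowers[i], uppers[i]], and the
    true value truth[i] used to simulate the data.
    """
    n = len(truth)
    errors = [e - t for e, t in zip(estimates, truth)]
    return {
        "rmse": math.sqrt(sum(err * err for err in errors) / n),
        "bias": sum(errors) / n,
        "width": sum(u - lo for lo, u in zip(lowers, uppers)) / n,
        # Coverage: share of intervals that contain the true value.
        "cov": sum(lo <= t <= u for lo, u, t in zip(lowers, uppers, truth)) / n,
    }


print(summarize_coefficients([1.0, 3.0], [0.0, 2.5], [2.0, 4.0], [2.0, 2.0]))
```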
Simulation study results for the equal probabilities setting
| Scenario | Missing data | Proposed RMSE | Proposed bias | Proposed width | Proposed cov | Complete case RMSE | Complete case bias | Complete case width | Complete case cov |
|---|---|---|---|---|---|---|---|---|---|
| Partially missing, signal | 0% | 0.15 | 0.00 | 0.40 | 0.94 | 0.15 | 0.00 | 0.40 | 0.94 |
|  | 20% | 0.16 | 0.00 | 0.42 | 0.94 | 0.17 | 0.00 | 0.45 | 0.95 |
|  | 50% | 0.18 | 0.00 | 0.47 | 0.93 | 0.21 | 0.00 | 0.56 | 0.95 |
|  | 80% | 0.21 | 0.00 | 0.55 | 0.93 | 0.32 | 0.00 | 0.86 | 0.95 |
| Fully missing, signal | 0% | 0.15 | 0.00 | 0.40 | 0.94 | 0.15 | 0.00 | 0.40 | 0.94 |
|  | 20% | 0.16 | 0.00 | 0.43 | 0.94 | 0.17 | 0.00 | 0.45 | 0.95 |
|  | 50% | 0.20 | 0.00 | 0.52 | 0.93 | 0.21 | 0.00 | 0.56 | 0.95 |
|  | 80% | 0.30 | 0.00 | 0.74 | 0.90 | 0.32 | 0.00 | 0.86 | 0.95 |
| Partially missing, null | 0% | 0.10 | 0.00 | 0.26 | 0.95 | 0.10 | 0.00 | 0.26 | 0.95 |
|  | 20% | 0.11 | 0.00 | 0.28 | 0.94 | 0.11 | 0.00 | 0.29 | 0.95 |
|  | 50% | 0.12 | 0.00 | 0.33 | 0.94 | 0.14 | 0.00 | 0.37 | 0.95 |
|  | 80% | 0.16 | 0.00 | 0.41 | 0.92 | 0.22 | 0.00 | 0.59 | 0.95 |
| Fully missing, null | 0% | 0.10 | 0.00 | 0.26 | 0.95 | 0.10 | 0.00 | 0.26 | 0.95 |
|  | 20% | 0.11 | 0.00 | 0.29 | 0.94 | 0.11 | 0.00 | 0.29 | 0.95 |
|  | 50% | 0.13 | 0.00 | 0.35 | 0.93 | 0.14 | 0.00 | 0.37 | 0.95 |
|  | 80% | 0.20 | 0.00 | 0.52 | 0.92 | 0.22 | 0.00 | 0.59 | 0.95 |
Note: The table shows the mean across 500 datasets for each measure in four simulation scenarios (“partially missing, signal,” “fully missing, signal,” “partially missing, null,” and “fully missing, null”). The measures are root mean squared error (RMSE), bias, 95% credible interval width (width), and coverage (cov) for exposure regression coefficients. The table shows results from our proposed method and the complete case analysis for missing data levels of 0%, 20%, 50%, and 80%.
Summary of imputation performance in the data probabilities setting
| Scenario | Measure | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Category 6 |
|---|---|---|---|---|---|---|---|
| Partially missing, signal | Precision | 0.92 | 0.69 | 0.54 | 0.43 | 0.31 | 0.31 |
|  | Recall | 0.92 | 0.69 | 0.53 | 0.43 | 0.31 | 0.31 |
| Fully missing, signal | Precision | 0.85 | 0.47 | 0.30 | 0.21 | 0.13 | 0.13 |
|  | Recall | 0.85 | 0.47 | 0.30 | 0.21 | 0.13 | 0.13 |
| Partially missing, null | Precision | 0.88 | 0.50 | 0.28 | 0.13 | 0.07 | 0.06 |
|  | Recall | 0.88 | 0.49 | 0.28 | 0.13 | 0.07 | 0.07 |
| Fully missing, null | Precision | 0.77 | 0.16 | 0.05 | 0.01 | 0.01 | 0.01 |
|  | Recall | 0.76 | 0.16 | 0.05 | 0.02 | 0.01 | 0.01 |
Note: Results are shown for 80% missing data and four simulation scenarios (“partially missing, signal,” “fully missing, signal,” “partially missing, null,” and “fully missing, null”). The table shows the mean across 500 datasets for precision and recall for each outcome category. Results for the other missing data levels (20% and 50%) were similar and are shown in the supplemental materials.
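Per-category precision and recall for the imputed outcomes could be computed as shown below. The sketch assumes the comparison is made over the cases whose outcomes were masked in the simulation, with their true simulated categories as ground truth. Categories are coded 1 to 6 as in the table, and the function name is hypothetical.

```python
def precision_recall(true_labels, imputed_labels, categories):
    """Per-category (precision, recall) of imputed outcomes.

    Precision: share of cases imputed as category c that truly are c.
    Recall: share of cases truly in category c that were imputed as c.
    A zero denominator yields NaN.
    """
    pairs = list(zip(true_labels, imputed_labels))
    result = {}
    for c in categories:
        tp = sum(t == c and i == c for t, i in pairs)
        n_imputed = sum(i == c for _, i in pairs)
        n_true = sum(t == c for t, _ in pairs)
        result[c] = (
            tp / n_imputed if n_imputed else float("nan"),
            tp / n_true if n_true else float("nan"),
        )
    return result


print(precision_recall([1, 1, 2, 3], [1, 2, 2, 3], categories=[1, 2, 3]))
```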
Summary of imputation performance in the equal probabilities setting
| Scenario | Measure | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Category 6 |
|---|---|---|---|---|---|---|---|
| Partially missing, signal | Precision | 0.71 | 0.67 | 0.63 | 0.63 | 0.59 | 0.62 |
|  | Recall | 0.71 | 0.68 | 0.63 | 0.63 | 0.58 | 0.61 |
| Fully missing, signal | Precision | 0.55 | 0.50 | 0.44 | 0.44 | 0.38 | 0.42 |
|  | Recall | 0.56 | 0.50 | 0.45 | 0.43 | 0.38 | 0.41 |
| Partially missing, null | Precision | 0.28 | 0.31 | 0.35 | 0.36 | 0.27 | 0.35 |
|  | Recall | 0.29 | 0.31 | 0.35 | 0.36 | 0.27 | 0.34 |
| Fully missing, null | Precision | 0.14 | 0.16 | 0.19 | 0.19 | 0.14 | 0.18 |
|  | Recall | 0.14 | 0.16 | 0.19 | 0.19 | 0.13 | 0.18 |
Note: Results are shown for 80% missing data and four simulation scenarios (“partially missing, signal,” “fully missing, signal,” “partially missing, null,” and “fully missing, null”). The table shows the mean across 500 datasets for precision and recall for each outcome category. Results for the other missing data levels (20% and 50%) were similar and are shown in the supplemental materials.
FIGURE 1. Results of the analysis of the Denver, Colorado COVID‐19 cohort from our proposed method (black circles) and the complete case analysis (blue triangles). The figure shows the posterior mean and 95% credible intervals for the estimated exponentiated category‐specific regression coefficients associated with main effects (top row) and pairwise interactions (bottom row). Exposures are PM2.5, ozone, and temperature. Categories are symptomatic (sympt.), asymptomatic (asympt.), hospitalized (hosp.), admitted to the ICU (ICU), and placed on a mechanical ventilator (vent). There are no regression coefficients for the death category because it is the last category in the stick‐breaking representation and therefore receives the remaining probability mass.
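Figure 1 plots exponentiated coefficients, and the stick-breaking form determines what each one means. This derivation assumes the standard stick-breaking logistic parameterization, in notation that is ours and not necessarily the paper's: $\sigma$ is the logistic function, $\eta_{ik}$ is the linear predictor for category $k$ of case $i$, and there are $K = 6$ categories in the order symptomatic, asymptomatic, hospitalized, ICU, ventilator, death. Under that assumption, the category probabilities and the odds each linear predictor governs are

$$
\pi_{ik} = \sigma(\eta_{ik}) \prod_{j<k} \bigl\{1 - \sigma(\eta_{ij})\bigr\}, \quad k = 1, \dots, K-1,
\qquad
\pi_{iK} = \prod_{j=1}^{K-1} \bigl\{1 - \sigma(\eta_{ij})\bigr\},
$$

$$
\sum_{l>k} \pi_{il} = \prod_{j \le k} \bigl\{1 - \sigma(\eta_{ij})\bigr\}
\quad\Longrightarrow\quad
\frac{\pi_{ik}}{\sum_{l>k} \pi_{il}} = \frac{\sigma(\eta_{ik})}{1 - \sigma(\eta_{ik})} = \exp(\eta_{ik}).
$$

Take $\eta_{ik} = \alpha_k + \mathbf{x}_i^\top \boldsymbol{\beta}_k$. An exponentiated main-effect coefficient $\exp(\beta_{kp})$ is then the multiplicative change, per unit of exposure $p$, in the odds of category $k$ versus all categories after it in this order. It is not a contrast with a single reference category. When exposure $p$ also enters interaction terms, this interpretation holds only with the interacting exposures fixed at zero.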
FIGURE 2. Results from the analysis of the Denver, Colorado COVID‐19 cohort using our proposed method. The figure shows the posterior mean (black line) and 95% credible interval (gray shaded area) of the estimated odds ratio (OR) for categories symptomatic, hospitalized, admitted to the ICU (ICU), placed on a mechanical ventilator (ventilator), and death, relative to asymptomatic. The OR was calculated as a function of annual average PM2.5 exposure (μg/m³) relative to the mean exposure, holding ozone and temperature at their 25th (a), 50th (b), and 75th (c) percentiles.
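The curves in Figure 2 can be sketched as a function of the model parameters. The sketch makes three assumptions: linear predictors with main effects and all pairwise interactions of the three exposures (matching the effects shown in Figure 1), a hypothetical coefficient layout, and the stick-breaking category order used in the figures. In a Bayesian fit, the function would presumably be evaluated for each posterior draw. The mean and the 2.5th and 97.5th percentiles of those values would then give the line and band.

```python
import math


def stick_breaking_probs(etas):
    """Stick-breaking category probabilities; the last category takes the remainder."""
    probs, remaining = [], 1.0
    for eta in etas:
        p = remaining / (1.0 + math.exp(-eta))  # sigmoid(eta) * remaining
        probs.append(p)
        remaining -= p
    probs.append(remaining)
    return probs


def linear_predictors(coefs, pm, ozone, temp):
    """One linear predictor per non-final category: main effects plus pairwise interactions.

    coefs[k] = (intercept, b_pm, b_ozone, b_temp, b_pm_ozone, b_pm_temp, b_ozone_temp),
    a hypothetical layout chosen for this sketch.
    """
    return [
        a + b1 * pm + b2 * ozone + b3 * temp
        + b12 * pm * ozone + b13 * pm * temp + b23 * ozone * temp
        for a, b1, b2, b3, b12, b13, b23 in coefs
    ]


def odds_ratio(coefs, k, ref, pm, pm_mean, ozone, temp):
    """Odds of category k versus category `ref` at PM2.5 = pm, relative to PM2.5 = pm_mean,
    with ozone and temperature held at the given values."""
    p_x = stick_breaking_probs(linear_predictors(coefs, pm, ozone, temp))
    p_0 = stick_breaking_probs(linear_predictors(coefs, pm_mean, ozone, temp))
    return (p_x[k] / p_x[ref]) / (p_0[k] / p_0[ref])


# Hypothetical coefficients for one posterior draw (five non-final categories).
# Indices follow the stick-breaking order: 0 = symptomatic, 1 = asymptomatic, 2 = hospitalized, ...
coefs = [(1.6, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0)] + [(-1.0, 0.2, 0.05, -0.05, 0.01, 0.0, 0.0)] * 4
print(odds_ratio(coefs, k=2, ref=1, pm=1.0, pm_mean=0.0, ozone=0.0, temp=0.0))
```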
FIGURE 3. Results from the analysis of the Denver, Colorado COVID‐19 cohort using our proposed method. The figure shows the posterior mean (black line) and 95% credible interval (gray shaded area) of the estimated incremental odds ratio (IOR) for categories symptomatic, hospitalized, admitted to the ICU (ICU), placed on a mechanical ventilator (ventilator), and death, relative to all less severe categories. The IOR was calculated as a function of annual average PM2.5 exposure (μg/m³) relative to the mean exposure, holding ozone and temperature at their 25th (a), 50th (b), and 75th (c) percentiles.
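The incremental odds ratio in Figure 3 can be sketched from two probability vectors computed from the fitted model. One vector is evaluated at a given PM2.5 level and the other at the mean level, both with ozone and temperature at the chosen percentile. The sketch reads "relative to all less severe categories" as the odds of category k against the pooled probability of strictly less severe outcomes, with asymptomatic as the least severe. That reading, the severity ordering, and the helper names are assumptions.

```python
# Stick-breaking order used in the figures (index -> category) and an assumed
# severity ranking in which asymptomatic is the least severe outcome.
CATEGORIES = ["symptomatic", "asymptomatic", "hospitalized", "ICU", "ventilator", "death"]
SEVERITY = {"asymptomatic": 0, "symptomatic": 1, "hospitalized": 2,
            "ICU": 3, "ventilator": 4, "death": 5}


def incremental_odds(probs, k):
    """Odds of category k versus all strictly less severe categories combined."""
    less_severe = sum(
        p for c, p in zip(CATEGORIES, probs) if SEVERITY[c] < SEVERITY[CATEGORIES[k]]
    )
    if less_severe == 0.0:
        raise ValueError("category has no less severe categories")
    return probs[k] / less_severe


def incremental_odds_ratio(probs_at_x, probs_at_mean, k):
    """IOR for category k: incremental odds at a PM2.5 level relative to the mean level."""
    return incremental_odds(probs_at_x, k) / incremental_odds(probs_at_mean, k)


probs_mean = [0.76, 0.15, 0.06, 0.01, 0.01, 0.01]
probs_high = [0.74, 0.14, 0.08, 0.02, 0.01, 0.01]  # hypothetical values at a higher PM2.5
print(incremental_odds_ratio(probs_high, probs_mean, k=2))  # hospitalized
```

No IOR is defined for asymptomatic, which has no less severe category. This matches its absence from the figure's category list.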