Dylan Clark-Boucher, Jonathan Boss, Maxwell Salvatore, Jennifer A Smith, Lars G Fritsche, Bhramar Mukherjee.
Abstract
Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth) and that combining predictors from both sources offered no consistent improvement compared to using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction for COVID-19 severity, but we did not have enough severe cases among the survey respondents to assess that hypothesis in our study.
Year: 2022 PMID: 35877617 PMCID: PMC9312965 DOI: 10.1371/journal.pone.0269017
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Processing of COVID-19 survey data and Michigan Genomics Initiative EHRs.
For the survey-based analysis, respondents to the COVID-19 survey were limited to those with relevant EHRs in the Michigan Genomics Initiative (MGI). The desired EHR variables were then merged with the survey data for those patients.
Fig 2. Modeling of survey-reported COVID-19 outcomes.
(A) Initially, each possible predictor was tested for each outcome individually, adjusting for covariates. (B) Next, different subsets of the data were used to run penalized multivariable models, allowing for comparison between groups of variables.
Fig 3. Internal validation by repeated 70/30 train/test data splitting on multiply imputed datasets.
We evaluated each model internally using data splitting. For each split of the data, we pooled the results of the 30 imputed datasets into a single AUC using Rubin’s Rules. The resulting 100 pooled AUCs were used to compute an empirical mean and confidence interval.
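The two-stage summary described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that pooling under Rubin's Rules reduces to averaging the per-imputation AUCs (the standard point estimate), and the function names `percentile` and `summarize_auc` as well as the simulated AUC values are hypothetical.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending-sorted list (0 <= q <= 100)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def summarize_auc(auc_by_split):
    """auc_by_split[s][m] is the test-set AUC for split s on imputed dataset m.

    Step 1: pool across imputations within each split (assumed here to be the
            Rubin's Rules point estimate, i.e. the mean over imputed datasets).
    Step 2: summarize the pooled AUCs across splits by their mean and their
            2.5th / 97.5th empirical percentiles.
    """
    pooled = sorted(statistics.fmean(row) for row in auc_by_split)
    return statistics.fmean(pooled), percentile(pooled, 2.5), percentile(pooled, 97.5)


# Simulated input only: 100 splits x 30 imputations of made-up AUC values.
rng = random.Random(0)
fake_aucs = [[rng.gauss(0.65, 0.02) for _ in range(30)] for _ in range(100)]
mean_auc, ci_low, ci_high = summarize_auc(fake_aucs)
print(f"mean AUC = {mean_auc:.3f}, 95% empirical CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Because the interval is read directly from the percentiles of the 100 pooled values, it makes no normality assumption about the distribution of AUCs across splits.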
Descriptive statistics of covariates across survey-reported COVID-19 outcomes.
| Variables | All (n = 7,054) | Tested (n = 842) | Diagnosed by physician or test (n = 78) | Self-diagnosed due to symptoms (n = 132) |
|---|---|---|---|---|
| Continuous variables, mean (SD) | | | | |
| Age (years) | 58.1 (14.7) | 56.6 (14.7) | 49.4 (14.8) | 54.9 (12.8) |
| BMI (kg/m²) | 29.2 (6.72) | 29.8 (6.82) | 30.0 (6.91) | 29.1 (6.13) |
| Categorical variables, n (%) | | | | |
| Female Sex | 4,223 (59.9%) | 542 (64.4%) | 50 (64.1%) | 89 (67.4%) |
| Essential Worker | 1,421 (20.1%) | 245 (29.1%) | 36 (46.2%) | 36 (27.3%) |
| Race / Ethnicity | | | | |
| | 158 (2.24%) | 37 (4.39%) | 6 (7.69%) | 4 (3.03%) |
| | 6,545 (92.8%) | 755 (89.7%) | 64 (82.1%) | 123 (93.2%) |
| | 261 (3.70%) | 41 (4.87%) | 6 (7.69%) | 4 (3.03%) |
| | 90 (1.28%) | 9 (1.07%) | 2 (2.56%) | 1 (0.75%) |
| Education | | | | |
| | 1,180 (16.7%) | 173 (20.6%) | 16 (20.5%) | 23 (17.4%) |
| | 1,128 (16.0%) | 152 (18.1%) | 14 (17.9%) | 20 (15.2%) |
| | 2,204 (31.2%) | 240 (28.6%) | 22 (28.2%) | 43 (32.6%) |
| | 2,510 (35.6%) | 275 (32.7%) | 25 (32.1%) | 45 (34.1%) |
| | 32 (0.45%) | 2 (0.24%) | 1 (1.28%) | 1 (0.76%) |
Abbreviations: BMI, Body Mass Index; NHB, non-Hispanic Black; NHW, non-Hispanic White.
Fig 4. Firth-corrected odds ratios for survey-reported COVID-19 outcomes, adjusted for covariates.
All odds ratios used Firth correction, were adjusted for age, sex, race/ethnicity, education, and essential worker status, and were combined across 30 multiply imputed datasets using Rubin’s Rules. Significance was determined using α = 0.05 for covariates and α = 0.05/184 ≈ 2.72 × 10⁻⁴ for predictors. For brevity, predictors are included in the figure only if they are statistically significant for at least one outcome.
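The arithmetic behind the significance threshold and the pooling step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function name `rubin_pool` is hypothetical, the three log odds ratios and variances are invented, and the pooling is the standard Rubin's Rules combination of a mean estimate with within- and between-imputation variance.

```python
import math
import statistics


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates (e.g. log odds ratios) and their squared
    standard errors with Rubin's Rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Bonferroni threshold from the caption: 0.05 divided across 184 predictors.
alpha_predictors = 0.05 / 184
print(f"Bonferroni alpha = {alpha_predictors:.3g}")

# Hypothetical log odds ratios and variances from three imputed datasets
# (the study used 30; three keeps the example readable).
log_or, total_var = rubin_pool([0.40, 0.50, 0.60], [0.04, 0.05, 0.06])
print(f"pooled OR = {math.exp(log_or):.2f}, pooled SE = {math.sqrt(total_var):.3f}")
```

Pooling is done on the log odds scale and exponentiated afterwards, which is the usual convention for odds ratios; the sketch does not reproduce the Firth correction itself.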
Area Under the Curve (AUC) and the corresponding 95% CI for the two COVID-19-related outcome prediction models.
| | | Mean AUC (95% Empirical CI)* | | | |
|---|---|---|---|---|---|
| Outcome Variable | Model Type | Covariates Only | Covariates + EHR Variables | Covariates + Survey Variables | All Variables |
| Tested for COVID-19 | Lasso | 0.582 | 0.593 | 0.646 | 0.645 |
| | Ridge Regression | 0.582 | 0.597 | 0.641 | 0.639 |
| | Elastic Net | 0.582 | 0.595 | 0.649 | 0.648 |
| Diagnosed with COVID-19 | Lasso | 0.694 | 0.694 | 0.798 | 0.798 |
| | Ridge Regression | 0.694 | 0.713 | 0.812 | 0.821 |
| | Elastic Net | 0.694 | 0.709 | 0.802 | 0.804 |
*Mean AUC is the average over 100 random train/test splits; the CI is bounded by the 2.5th and 97.5th percentiles of those 100 values. The "tested for COVID-19" outcome compares the tested population (1) to those not tested (0). The "diagnosed with COVID-19" outcome compares those diagnosed with COVID-19 by a physician or test (1) to those not diagnosed, not tested, and not self-diagnosed (0).
Data from Michigan Medicine COVID-19 Survey and Michigan Genomics Initiative. Sample size: n = 7,054 for testing outcome models, n = 6,159 for diagnosis models.
Elastic net regression variable selection in models of “received COVID-19 test” outcome.
| EHR Variable Models | Proportion of Times Selected | Survey Variable Models | Proportion of Times Selected | All Variable Models | Proportion of Times Selected |
|---|---|---|---|---|---|
| Comorbidity score | 0.99 | Q17. Ever Hospitalized with infection | 1.00 | Q17. Ever Hospitalized with infection | 1.00 |
| Kidney disease | 0.98 | Q36. Household member diagnosed with COVID-19 | 1.00 | Q36. Household member diagnosed with COVID-19 | 1.00 |
| Respiratory disease | 0.94 | Q147.1 Kidney disease | 1.00 | Q147.1 Kidney disease | 1.00 |
| Liver disease | 0.91 | Q68.1 Felt fatigued in past week | 1.00 | Q68.1 Felt fatigued in past week | 1.00 |
| Former smoker | 0.80 | Q70.1 Abdomen pain in past 6 months | 1.00 | Q70.1 Abdomen pain in past 6 months | 1.00 |
| | | Q70.3 Headaches in past 6 months | 1.00 | Q70.3 Headaches in past 6 months | 0.99 |
| | | Q13. No times gotten flu in past year | 0.99 | Q13. No times gotten flu in past year | 0.99 |
| | | Q125. Cardiovascular condition | 0.98 | Q125. Cardiovascular condition | 0.98 |
| | | Q146.2 COPD | 0.98 | Q146.2 COPD | 0.97 |
| | | Q147. Metabolic condition | 0.96 | Q125.7 Blood clotting disorder | 0.95 |
| | | Q125.7 Has cardiovascular condition | 0.95 | Q147. Metabolic condition | 0.94 |
| | | Q59.1 Police officer lives in home | 0.95 | Q59.1 Police officer lives in home | 0.94 |
| | | Q23.3 Concerned about losing job | 0.95 | Q114.1 Overall body pain at worst | 0.94 |
| | | Q71.1 Some difficulty doing chores | 0.94 | Q23.3 Concerned about losing job | 0.93 |
| | | Q71.1 Much difficulty doing chores | 0.94 | Q71.1 Much difficulty doing chores | 0.93 |
| | | Q114.1 Overall body pain at worst | 0.94 | Q71.1 Some difficulty doing chores | 0.93 |
| | | Q133.2 Benzodiazepine use has increased | 0.88 | Q133.2 Benzodiazepine use has increased | 0.87 |
| | | Q114.2 Overall body pain on average | 0.86 | Q68.3 Trouble waking up refreshed | 0.85 |
| | | Q46. Flu shot in past year | 0.86 | Q77. Poor sleep quality, past 7 days | 0.85 |
| | | Q77. Poor sleep quality, past 7 days | 0.86 | Q114.2 Overall body pain on average | 0.85 |
| | | Q68.3 Trouble waking up refreshed | 0.85 | Q133.1 Opioid use has increased | 0.83 |
| | | Q133.1 Opioid use has increased | 0.85 | Q46. Flu shot in past year | 0.83 |
| | | Q36.1. Lives alone | 0.83 | Q36.1. Lives alone | 0.81 |
| | | Q68.2 Memory trouble in past week | 0.81 | | |
The value shown is the proportion of times the variable was chosen across 3,000 fitted models, because models were fit on 100 train/test splits of each of 30 multiply imputed datasets (100 × 30 = 3,000). Only variables with a selection rate over 80% are included. Variable descriptions are available in the supplement (S1 Table). The "tested for COVID-19" outcome compares the tested population (1) to those not tested (0). All models included the six covariates age, sex, race/ethnicity, body mass index, education level, and essential worker status, which were neither subject to selection nor penalized. Data from Michigan Medicine COVID-19 Survey and Michigan Genomics Initiative. Sample size: 7,054.
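The selection proportion reported in this table can be computed along the following lines. This is a sketch, not the authors' code: it assumes each fitted model is available as a mapping from variable name to coefficient and that "selected" means a nonzero coefficient; the function name `selection_proportions` and the toy variable names are hypothetical.

```python
from collections import Counter


def selection_proportions(fitted_coefs, threshold=0.80):
    """Proportion of fitted models in which each variable has a nonzero coefficient.

    fitted_coefs: list of dicts {variable name: coefficient}, one dict per fit
                  (in the study design, 100 splits x 30 imputations = 3,000 fits).
    Returns only the variables whose proportion exceeds `threshold`.
    """
    n_fits = len(fitted_coefs)
    if n_fits == 0:
        raise ValueError("need at least one fitted model")
    counts = Counter()
    for coefs in fitted_coefs:
        for name, value in coefs.items():
            if value != 0.0:
                counts[name] += 1
    return {name: c / n_fits for name, c in counts.items() if c / n_fits > threshold}


# Toy input: 10 fits instead of 3,000, with made-up coefficients.
toy_fits = [{"household_case": 0.9, "lives_alone": 0.0}] * 9
toy_fits.append({"household_case": 0.0, "lives_alone": 0.2})
print(selection_proportions(toy_fits))
```

Covariates that are always included and never penalized would appear in every fit, so they are left out of a tally like this one.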
Elastic net regression variable selection in models of “diagnosed with COVID-19” outcome.
| EHR Variable Models | Proportion of Times Selected | Survey Variable Models | Proportion of Times Selected | All Variable Models | Proportion of Times Selected |
|---|---|---|---|---|---|
| Liver disease | 0.92 | Q36. Household member diagnosed with COVID-19 | 1.00 | Q36. Household member diagnosed with COVID-19 | 1.00 |
| Respiratory disease | 0.87 | Q85. Relative died from COVID-19 | 0.85 | Q85. Relative died from COVID-19 | 0.82 |
| | | Q70.1 Abdomen pain in past 6 months | 0.82 | Q81. Relative diagnosed with COVID-19 | 0.81 |
| | | Q81. Relative diagnosed with COVID-19 | 0.82 | Q70.1 Abdomen pain in past 6 months | 0.80 |
The value shown is the proportion of times the variable was chosen across 3,000 fitted models, because models were fit on 100 train/test splits of each of 30 multiply imputed datasets (100 × 30 = 3,000). Only variables with a selection rate over 80% are included. Variable descriptions are available in the supplement (S1 Table). The "diagnosed with COVID-19" outcome compares those diagnosed with COVID-19 by a physician or test (1) to those not diagnosed, not tested, and not self-diagnosed (0). All models included the six covariates age, sex, race/ethnicity, body mass index, education level, and essential worker status, which were neither subject to selection nor penalized. Data from Michigan Medicine COVID-19 Survey and Michigan Genomics Initiative. Sample size: 6,159.