Connor Davis, Michael Gao, Marshall Nichols, Ricardo Henao.
Abstract
Using structured elements from Electronic Health Records (EHR), we seek to: (i) build predictive models to stratify patients tested for COVID-19 by their likelihood of hospitalization, ICU admission, mechanical ventilation and inpatient mortality, and (ii) identify the most important EHR-based features driving the predictions. We leveraged EHR data from Duke University Health System patients who were tested for COVID-19 or hospitalized between March 11, 2020 and August 24, 2020, to build models that predict hospital admission within 4 weeks. Models were also created for ICU admission, need for mechanical ventilation and mortality following admission. Models were developed on a cohort of 86,355 patients with 112,392 outpatient COVID-19 tests or any-cause hospital admissions between March 11, 2020 and June 4, 2020. The four models considered resulted in AUROC=0.838 (CI: 0.832-0.844) and AP=0.272 (CI: 0.260-0.287) for hospital admissions, AUROC=0.847 (CI: 0.839-0.855) and AP=0.585 (CI: 0.565-0.603) for ICU admissions, AUROC=0.858 (CI: 0.846-0.871) and AP=0.434 (CI: 0.403-0.467) for mechanical ventilation, and AUROC=0.856 (CI: 0.842-0.872) and AP=0.243 (CI: 0.205-0.282) for inpatient mortality. Patient history abstracted from the EHR can potentially be used to stratify patients tested for COVID-19 in terms of utilization and mortality. The dominant EHR features for hospital admissions and inpatient outcomes are different. For the former, age, social indicators and previous utilization are the most important predictive features. For the latter, age and physiological summaries (pulse and blood pressure) are the main drivers.
Year: 2020 PMID: 33330887 PMCID: PMC7743096 DOI: 10.1101/2020.12.04.20244137
Source DB: PubMed Journal: medRxiv
Figure 1. Patient flow summary. Patients can be thought of as being in one of six states: outpatient, inpatient (admitted to the hospital), ICU admitted, ventilated, dead or discharged. Patients can transition between some of these states. For every state and possible transition, encounter counts for the DUHS cohort are provided. Dots indicate the state transitions for which the proposed models make predictions.
Outpatient cohort characteristics for patient encounters where a COVID-19 test was performed in an outpatient setting. ICU admissions, ventilation and mortality are for inpatient encounters within 4 weeks of COVID-19 testing. p-values were obtained using a Wilcoxon rank sum test for Age and a χ² test for all other comparisons.
| Population Characteristics | COVID-19 Negative | COVID-19 Positive | p-value |
|---|---|---|---|
| Age (IQR) | 49.76 (32.40, 64.37) | 39.85 (28.44, 52.93) | < 0.0001 |
| Female (%) | 41,504 (60.26) | 2,165 (56.41) | < 0.0001 |
| Race | | | < 0.0001 |
| African-American (%) | 14,797 (21.48) | 1,021 (26.60) | |
| White/Caucasian (%) | 43,938 (63.79) | 1,529 (39.84) | |
| Other (%) | 10,084 (14.64) | 1,275 (33.22) | |
| 4-Week Admission (%) | 5,640 (8.19) | 138 (3.59) | < 0.0001 |
| 4-Week ICU Admission (%) | 771 (1.12) | 27 (0.70) | 0.015 |
| 4-Week Ventilation (%) | 221 (0.32) | 15 (0.39) | 0.46 |
| 4-Week Mortality (%) | 33 (0.05) | 10 (0.26) | < 0.0001 |
Inpatient cohort characteristics for patient encounters in a hospitalization setting. LOS is short for length of stay (in days). p-values were obtained using a Wilcoxon rank sum test for Age and LOS and a χ² test for all other comparisons.
| Population Characteristics | COVID-19 Negative or Unknown | COVID-19 Positive | p-value |
|---|---|---|---|
| Age (IQR) | 61.14 (42.85, 72.54) | 60.02 (45.09, 71.85) | 0.6 |
| Female (%) | 21,280 (54.91) | 442 (47.99) | < 0.0001 |
| Race | | | < 0.0001 |
| African-American (%) | 12,532 (32.33) | 324 (35.18) | |
| White/Caucasian (%) | 22,082 (56.98) | 282 (30.62) | |
| Other (%) | 4,141 (10.68) | 315 (34.20) | |
| ICU Admission (%) | 6,276 (16.19) | 306 (33.22) | < 0.0001 |
| Ventilation (%) | 2,488 (6.42) | 173 (18.78) | < 0.0001 |
| Mortality (%) | 956 (2.47) | 119 (12.92) | < 0.0001 |
| LOS days (IQR) | 3.65 (2.15, 6.71) | 6.18 (3.22, 12.20) | < 0.0001 |
Figure 2. Classification performance metrics. ROC (left) and PRC (right) for the models considered (top to bottom): hospital admission, ICU admission, ventilation and inpatient mortality. Dashed lines represent no-skill (random) predictions. Classification thresholds, selected using Youden's method on the validation set of each model only, are denoted as dots.
Performance summary metrics for all models on the validation and test sets. We report AUROC (area under the ROC) and AP (average precision of the PRC). Figures in parentheses are 95% CIs estimated via bootstrapping.
| Model | Validation AUROC | Validation AP | Test AUROC | Test AP |
|---|---|---|---|---|
| Admission | 0.831 (0.820, 0.842) | 0.336 (0.313, 0.365) | 0.838 (0.832, 0.844) | 0.272 (0.260, 0.287) |
| ICU Admission | 0.846 (0.837, 0.856) | 0.625 (0.604, 0.649) | 0.847 (0.839, 0.855) | 0.585 (0.565, 0.603) |
| Ventilation | 0.845 (0.828, 0.863) | 0.330 (0.287, 0.373) | 0.858 (0.846, 0.871) | 0.434 (0.403, 0.467) |
| Inpatient Mortality | 0.853 (0.834, 0.874) | 0.200 (0.163, 0.251) | 0.856 (0.842, 0.872) | 0.243 (0.205, 0.282) |
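The record states only that the 95% CIs were "estimated via bootstrapping" and gives no further detail. As a rough illustration of how such intervals might be produced, the following is a minimal sketch that assumes a percentile bootstrap over encounters. The function names (`auroc`, `average_precision`, `bootstrap_ci`), the number of resamples and the synthetic data are all hypothetical and are not taken from the paper.

```python
import random

def auroc(y_true, y_score):
    """AUROC as P(random positive outranks random negative); ties count 1/2.
    O(n_pos * n_neg), which is fine for a sketch but slow for large cohorts."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """AP: mean of the precision at each rank where a positive is found.
    Tied scores are ranked in arbitrary order (a simplification)."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def bootstrap_ci(metric, y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed scheme): resample encounters with
    replacement, recompute the metric, take the alpha/2 and 1-alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[round(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi

# Synthetic stand-in data (NOT the DUHS cohort): 30 positives, 120 negatives.
rng = random.Random(42)
labels = [1] * 30 + [0] * 120
scores = [rng.gauss(1.0, 1.0) for _ in range(30)] + [rng.gauss(0.0, 1.0) for _ in range(120)]

for name, metric in [("AUROC", auroc), ("AP", average_precision)]:
    point = metric(labels, scores)
    lo, hi = bootstrap_ci(metric, labels, scores, n_boot=200)
    print(f"{name}: {point:.3f} ({lo:.3f}, {hi:.3f})")
```

Because AP depends on outcome prevalence while AUROC does not, AP values in the table are not directly comparable across models with different event rates.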
Confusion matrix summaries on the test set. Thresholds, τ, for each model were selected using Youden's method on the validation set. The reported metrics, all given as percentages, include: TPR (True Positive Rate, or sensitivity), TNR (True Negative Rate, or specificity), FPR (False Positive Rate), FNR (False Negative Rate), PPV (Positive Predictive Value, or precision), NPV (Negative Predictive Value) and Accuracy (overall agreement). Figures in parentheses are 95% CIs estimated via bootstrapping.
| Model | τ | TPR | TNR | FPR | FNR | PPV | NPV | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Admission | 0.101 | 73.5(72.0,74.9) | 78.1(77.8,78.5) | 21.8(21.5,22.2) | 26.5(25.1,28.0) | 19.6(18.9,20.3) | 97.6(97.4,97.6) | 77.8(77.5,78.2) |
| ICU Admission | 0.154 | 76.4(74.7,77.7) | 75.5(75.0,76.2) | 24.4(23.8,25.0) | 23.6(22.3,25.3) | 35.3(43.1,36.5) | 94.8(94.4,95.2) | 75.7(75.1,76.3) |
| Ventilation | 0.034 | 76.8(74.2,79.3) | 79.8(79.2,80.4) | 20.2(19.6,20.8) | 23.2(20.7,25.8) | 18.2(17.0,19.3) | 98.3(98.1,98.5) | 79.6(79.1,80.2) |
| Inpatient Mortality | 0.014 | 73.8(70.3,77.4) | 79.7(79.1,80.2) | 20.3(19.8,20.9) | 26.2(22.6,29.7) | 8.1(7.3,9.0) | 99.2(99.1,99.3) | 79.6(79.0,80.1) |
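To make the threshold selection and the reported metrics concrete, the sketch below picks τ by maximizing Youden's J = TPR + TNR − 1 on one set of scores and then computes the confusion-matrix summaries at that threshold. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, a score at or above τ is assumed to count as a positive prediction, and the four-point data set is a toy example.

```python
def youden_threshold(y_true, y_score):
    """Return the observed score that maximizes J = TPR + TNR - 1.
    Assumes both classes are present; the first maximizer wins on ties."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t

def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")

def confusion_metrics(y_true, y_score, t):
    """Confusion-matrix summaries (as fractions) at threshold t."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < t)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
    return {
        "TPR": _ratio(tp, tp + fn),
        "TNR": _ratio(tn, tn + fp),
        "FPR": _ratio(fp, tn + fp),
        "FNR": _ratio(fn, tp + fn),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "Accuracy": _ratio(tp + tn, tp + tn + fp + fn),
    }

# Toy data: in practice tau would be chosen on the validation set
# and the metrics computed on the held-out test set.
y_val, s_val = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
tau = youden_threshold(y_val, s_val)
metrics = confusion_metrics(y_val, s_val, tau)
print(tau, metrics)
```

Youden's method weights sensitivity and specificity equally and ignores prevalence, which is consistent with the pattern in the table: TPR and TNR sit in a similar range for every model, while PPV is low and NPV is high for the rarer outcomes.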
Figure 3. Predictor relative importance. The top-10 features by relative importance are shown for the outpatient model (Top) and inpatient models (Middle). Distribution summaries, stratified by COVID-19 testing (not tested, positive and negative), are also shown for features with high relative importance: max pulse (Bottom Left) and its missingness (Bottom Middle), and whether the individual is an alcohol user (Bottom Right).