| Literature DB >> 33973011 |
Peter D Sottile1, David Albers2, Peter E DeWitt2, Seth Russell3, J N Stroh4, David P Kao5, Bonnie Adrian6, Matthew E Levine7, Ryan Mooney8, Lenny Larchick8, Jean S Kutner9, Matthew K Wynia10,11, Jeffrey J Glasheen12, Tellen D Bennett2,13.
Abstract
OBJECTIVE: To rapidly develop, validate, and implement a novel real-time mortality score for the COVID-19 pandemic that improves upon sequential organ failure assessment (SOFA) for decision support for a Crisis Standards of Care team.
Keywords: COVID-19; crisis triage; decision support systems, clinical; machine learning; mortality prediction
Year: 2021 PMID: 33973011 PMCID: PMC8136054 DOI: 10.1093/jamia/ocab100
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Study data flow and cohort identification. A) Data flow through the EHR and research team, B) Retrospective cohort selection for model development, C) Prospective cohort selection for model evaluation and validation.
Prospective cohort characteristics and hospital course
(Row labels were not preserved in the source extraction.)

| Characteristic | All Encounters (N = 27 296) | COVID-19 Negative (N = 25 938) | COVID-19 Positive (N = 1358) |
|---|---|---|---|
| | 54.3 (20.4) | 54.2 (20.5) | 56.8 (18.4) |
| | 15 660 (57.4%) | 15 057 (58.0%) | 603 (44.4%) |
| | 20 430 (74.8%) | 19 848 (76.5%) | 582 (42.9%) |
| | 1964 (7.2%) | 1790 (6.9%) | 174 (12.8%) |
| | 4481 (16.4%) | 3901 (15.0%) | 580 (42.7%) |
| | 421 (1.5%) | 399 (1.5%) | 22 (1.6%) |
| | 22 496 (82.4%) | 21 755 (83.9%) | 741 (54.6%) |
| | 4398 (16.1%) | 3795 (14.6%) | 603 (44.4%) |
| | 402 (1.5%) | 388 (1.5%) | 14 (1.0%) |
| | 16 052 (58.8%) | 14 859 (57.3%) | 1193 (87.8%) |
| | 1398 (5.1%) | 1057 (4.1%) | 341 (25.1%) |
| | 1482 (5.4%) | 1382 (5.3%) | 100 (7.4%) |
| | 3.0 (2.0, 5.2) | 3.0 (1.9, 5.0) | 5.5 (3.0, 9.6) |
| | 717 (2.6%) | 551 (2.1%) | 166 (12.2%) |
| | 1480 (5.4%) | 1241 (4.8%) | 239 (17.6%) |
| | 8.4 (4.6, 15.1) | 7.7 (4.1, 13.3) | 15.2 (8.2, 21.0) |
| | 3.6 (1.6, 7.8) | 2.9 (1.4, 6.2) | 9.1 (5.3, 15.0) |
| | 1.8 (0.7, 5.7) | 1.4 (0.6, 3.9) | 7.5 (4.5, 12.6) |
| | 408 (27.6%) | 325 (26.2%) | 83 (34.7%) |
Note: Reported P values assess differences between COVID-19 negative and COVID-19 positive encounters.
Abbreviations: ICU, intensive care unit; IQR, interquartile range; SD, standard deviation.
Figure 2. Stacked model development. Primary analysis/production model: we used retrospective data to train the component models (40%) and the ensemble/stacked model (40%) and to assess (blue) the ensemble/stacked model (20%). This ensemble/stacked model was used to predict mortality for the whole prospective (red) and prospective COVID-19 (green) datasets. Sensitivity Analysis 1 (not shown): same workflow as the primary analysis, but the prospective data were used to train and test the models (same 40/40/20 split). The final model was used to predict the entire prospective COVID-19 dataset. Sensitivity Analysis 2 (not shown): same workflow as the primary analysis, but the prospective COVID-19 data were used to train and test the models (same 40/40/20 split). We fit multiple qSOFA (4), SOFA (2), and CURB-65 (2) component models in health system-guided attempts at parsimony. The different forms of qSOFA, SOFA, and CURB-65 are shown in Supplementary eTable 3. All 11 component models were fed into the model stacking process. The novel COVID-19 model included laboratory results reported to be associated with COVID-19 mortality, including D-dimer, LDH, absolute lymphocyte count, BUN, troponin, CK, ALT, and lactate (Supplementary eTable 2).
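The 40/40/20 stacking workflow in the caption can be sketched in code. Everything below is hypothetical: synthetic encounters, two stand-in component scores (`score_a`, `score_b`) in place of the fitted qSOFA/SOFA/CURB-65 component predictions, and a plain logistic-regression stacker fit by gradient descent. It illustrates the split-and-stack idea only, not the paper's actual models.

```python
import math
import random

random.seed(0)

# Hypothetical synthetic encounters: two component risk scores plus outcome.
def make_encounter():
    risk = random.random()
    return {
        "score_a": risk + random.gauss(0, 0.1),
        "score_b": risk + random.gauss(0, 0.2),
        "died": 1 if random.random() < risk ** 2 else 0,
    }

data = [make_encounter() for _ in range(1000)]

# 40% / 40% / 20% split, mirroring the retrospective design:
# component training / stacker training / held-out test.
n = len(data)
comp_train = data[: int(0.4 * n)]            # component models would be fit here
stack_train = data[int(0.4 * n): int(0.8 * n)]
test = data[int(0.8 * n):]

def components(e):
    """Component-model outputs fed into the stacker."""
    return [e["score_a"], e["score_b"]]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stacker: logistic regression over component outputs, fit by batch
# gradient descent (a sketch of the ensemble/stacking step).
w, b, lr = [0.0, 0.0], 0.0, 1.0
for _ in range(300):
    gw, gb = [0.0, 0.0], 0.0
    for e in stack_train:
        x = components(e)
        err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - e["died"]
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    m = len(stack_train)
    w = [w[0] - lr * gw[0] / m, w[1] - lr * gw[1] / m]
    b -= lr * gb / m

def predict_mortality(e):
    """Stacked-model mortality probability for one encounter."""
    x = components(e)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)
```

The held-out 20% (`test`) plays the role of the retrospective test set; the prospective cohorts would be scored with `predict_mortality` unchanged.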
Model area under the receiver operator curve and precision-recall curve for each of the component models and the final stacked model. Models were trained and validated on the initial retrospective cohort. The models were then validated on the prospective cohort and on the subset of patients with COVID-19. The AUROC and AUPRC for the retrospective cohort were based on a 20% holdout of the encounters for testing and evaluation. The prospective validation cohort reflects expected performance when running in a live EHR for both COVID-19 positive and negative patients. Bootstrapped 95% confidence intervals are shown for both AUROC and AUPRC.
(Model row labels were not preserved in the source extraction. The first cohort header is reconstructed from the caption: the retrospective AUROC/AUPRC were computed on a 20% holdout.)

| | Retrospective Test Cohort (20% holdout) | | Prospective Validation Cohort (N = 27 296) | | COVID-19 Positive Validation Cohort (N = 1358) | |
|---|---|---|---|---|---|---|
| Model | AUROC | AUPRC (baseline 0.07) | AUROC | AUPRC (baseline 0.03) | AUROC | AUPRC (baseline 0.12) |
| | 0.90 (0.89, 0.90) | 0.55 (0.55, 0.57) | 0.90 (0.89, 0.91) | 0.42 (0.38, 0.46) | 0.85 (0.82, 0.88) | 0.56 (0.48, 0.63) |
| | 0.83 (0.83, 0.84) | 0.35 (0.33, 0.36) | 0.84 (0.82, 0.86) | 0.26 (0.23, 0.29) | 0.79 (0.74, 0.83) | 0.43 (0.36, 0.51) |
| | 0.81 (0.81, 0.82) | 0.33 (0.31, 0.33) | 0.87 (0.86, 0.88) | 0.26 (0.23, 0.29) | 0.90 (0.87, 0.92) | 0.59 (0.52, 0.67) |
| | 0.85 (0.85, 0.86) | 0.51 (0.51, 0.54) | 0.88 (0.87, 0.90) | 0.40 (0.36, 0.44) | 0.86 (0.83, 0.89) | 0.60 (0.52, 0.67) |
| | 0.63 (0.63, 0.66) | 0.11 (0.11, 0.12) | 0.72 (0.70, 0.73) | 0.05 (0.05, 0.06) | 0.75 (0.71, 0.78) | 0.26 (0.21, 0.33) |
| | 0.83 (0.83, 0.84) | 0.45 (0.44, 0.46) | 0.88 (0.87, 0.90) | 0.33 (0.29, 0.36) | 0.91 (0.89, 0.93) | 0.61 (0.54, 0.68) |
| | 0.93 (0.93, 0.94) | 0.65 (0.65, 0.67) | 0.94 (0.93, 0.95) | 0.54 (0.50, 0.57) | 0.90 (0.87, 0.92) | 0.65 (0.59, 0.71) |
Abbreviations: ARDS, acute respiratory distress syndrome; AUPRC, area under the precision-recall curve; AUROC, area under the receiver operator curve; CCI, Charlson Comorbidity Index; CURB-65, a widely used pneumonia mortality score; qSOFA, quick sequential organ failure assessment; SOFA, sequential organ failure assessment.
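The table above reports rank-based discrimination (AUROC) with bootstrapped 95% confidence intervals. A minimal self-contained sketch of both computations, assuming plain Python lists of outcomes and predicted probabilities (the paper's actual implementation is not shown; function names here are illustrative):

```python
import random

def auroc(y, p):
    """Rank-based AUROC: probability a death outranks a survival,
    counting ties as half a win."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if pi > ni else 0.5 if pi == ni else 0.0
               for pi in pos for ni in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y, p, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC, resampling encounters
    with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(range(len(y)), k=len(y))
        ys, ps = [y[i] for i in idx], [p[i] for i in idx]
        if 0 < sum(ys) < len(ys):  # resample must contain both outcomes
            stats.append(auroc(ys, ps))
    stats.sort()
    return (stats[int(alpha / 2 * len(stats))],
            stats[int((1 - alpha / 2) * len(stats)) - 1])
```

An AUPRC bootstrap would follow the same resampling pattern with a precision-recall summary in place of `auroc`.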
Mortality model inputs
(Row labels were not preserved in the source extraction.)

| Model input | All encounters (N = 27 296) | COVID-19 negative (N = 25 938) | COVID-19 positive (N = 1358) |
|---|---|---|---|
| | 0.0 (0.0, 1.0) | 0.0 (0.0, 1.0) | 0.1 (0.0, 1.0) |
| | 2.0 (2.0, 4.0) | 2.0 (2.0, 3.0) | 3.0 (2.0, 5.0) |
| | 1.0 (0.1, 2.0) | 1.0 (0.1, 2.0) | 1.0 (0.0, 2.0) |
| | 1.0 (0.0, 3.0) | 1.0 (0.0, 3.0) | 1.0 (0.0, 2.0) |
| | 59 (0.2%) | 59 (0.2%) | 0 (0.0%) |
| | 396 (1.5%) | 392 (1.5%) | 4 (0.3%) |
| | 264 (1.0%) | 246 (0.9%) | 18 (1.3%) |
| | 2676 (9.8%) | 2503 (9.6%) | 173 (12.7%) |
| | 2486 (9.1%) | 2323 (9.0%) | 163 (12.0%) |
| | 0.7 ± 2.0 | 0.7 ± 2.0 | 0.6 ± 0.8 |
| | 7.4 ± 0.0 | 7.4 ± 0.0 | 7.4 ± 0.1 |
| | 335.7 ± 212.7 | 340.7 ± 215.8 | 239.6 ± 102.0 |
| | 94.7 ± 2.4 | 94.7 ± 2.4 | 93.4 ± 3.1 |
| | 405.0 ± 3,699.8 | 326.4 ± 2,440.3 | 1,906.2 ± 12,614.9 |
| | 229.1 ± 214.9 | 223.1 ± 207.4 | 343.5 ± 305.5 |
| | 1.4 ± 2.0 | 1.5 ± 2.0 | 1.3 ± 1.6 |
| | 19.4 ± 15.1 | 19.3 ± 14.9 | 21.2 ± 18.4 |
| | 0.5 ± 9.0 | 0.6 ± 9.2 | 0.2 ± 3.9 |
| | 173.7 ± 1,612.7 | 170.5 ± 1,567.2 | 235.4 ± 2,316.0 |
| | 21.1 ± 20.6 | 21.1 ± 21.0 | 20.9 ± 10.4 |
| | 1.0 ± 1.1 | 1.0 ± 1.1 | 1.2 ± 1.6 |
In this table, the summary measures for the covariates of each component model in the stacked model are calculated at a single point in time—the time of maximum SOFA score for each encounter.
Abbreviations: ALC, absolute lymphocyte count; ALT, alanine aminotransferase; BUN, blood urea nitrogen; CK, creatine kinase; FFP, fresh frozen plasma; GCS, Glasgow Coma Scale score; IQR, interquartile range; LDH, lactate dehydrogenase; PF, PaO2 to FiO2 ratio; PRBC, packed red blood cells; SD, standard deviation.
Figure 3. Stacked model receiver operator characteristic curves. The retrospective cohort was used for training and validation (in a 40%-40%-20% split). The prospective and COVID-19 positive cohorts were used to validate the retrospectively trained model.
Figure 4. Stacked model performance metrics across all potential probability thresholds. The purpose of the main stacked model was to create a patient list ranked by probability of mortality. If the model were to be used as part of a clinical decision support alert, a threshold on the estimated probability would be needed to define when an alert fires. Figure 4 shows common model performance metrics as a function of that threshold.
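The threshold sweep behind a figure like this can be illustrated with a small helper. `metrics_at_threshold` is a hypothetical name, and the alert rule (fire when the predicted probability meets or exceeds the threshold) is an assumption for illustration:

```python
def metrics_at_threshold(y_true, p_pred, threshold):
    """Alert-style performance metrics when an alert fires at
    predicted probability >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, p_pred):
        if p >= threshold:       # alert fires
            tp += (y == 1)
            fp += (y == 0)
        else:                    # no alert
            fn += (y == 1)
            tn += (y == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }

# Sweeping the threshold from 0 to 1 traces out trade-off curves
# analogous to those in the figure:
# sweep = [metrics_at_threshold(y, p, t / 100) for t in range(0, 101, 5)]
```

Raising the threshold trades sensitivity for specificity and positive predictive value, which is the trade-off a Crisis Standards of Care team would weigh when choosing an alert cut-point.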
Figure 5. Confidence intervals around point-wise predicted mortality. This figure shows the width of the 95% confidence interval (y-axis) around the stacked model's mortality probability estimates at each potential value of estimated probability. Confidence intervals were narrowest at the extremes of mortality probability (likely the most actionable, and therefore highest-stakes, predictions).
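A simplified way to see why intervals narrow at the extremes: for a probability estimate, the normal-approximation interval width scales with the square root of p(1 − p)/n, which peaks at p = 0.5. This is only an illustration of the shape; the paper reports bootstrap intervals, not this closed form.

```python
import math

def normal_ci_width(p_hat, n, z=1.96):
    """Width of a normal-approximation 95% CI around an estimated
    probability p_hat from n observations (illustrative stand-in
    for the bootstrap intervals in the figure)."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

# The variance term p(1 - p) peaks at p = 0.5, so intervals are widest
# for mid-range predictions and narrowest near 0 and 1.
```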
Figure 6. Average predicted mortality over the course of the hospitalization, stratified by actual mortality. This figure shows the smoothed average probability of mortality over the course of the hospitalization. On average, patients who died had much higher mortality probability estimates than those who survived, even shortly after admission.