Brandon C Cummings1, Sardar Ansari1, Jonathan R Motyka1, Guan Wang1, Richard P Medlin1, Steven L Kronick1, Karandeep Singh1,2,3,4, Pauline K Park1,5, Lena M Napolitano1,5, Robert P Dickson1,2,6, Michael R Mathis1,7, Michael W Sjoding1,2, Andrew J Admon1,2,4, Ross Blank1,7, Jakob I McSparron1,2, Kevin R Ward1,4,8, Christopher E Gillies1,4.
Abstract
BACKGROUND: COVID-19 has led to an unprecedented strain on health care facilities across the United States. Accurately identifying patients at an increased risk of deterioration may help hospitals manage their resources while improving the quality of patient care. Here, we present the results of an analytical model, Predicting Intensive Care Transfers and Other Unforeseen Events (PICTURE), to identify patients at high risk for imminent intensive care unit transfer, respiratory failure, or death, with the intention to improve the prediction of deterioration due to COVID-19.
Keywords: COVID-19; ICU; biomedical informatics; critical care; deterioration; informatics; intensive care unit; machine learning; mortality; prediction; predictive analytics
Year: 2021 PMID: 33818393 PMCID: PMC8061893 DOI: 10.2196/25066
Source DB: PubMed Journal: JMIR Med Inform
Study population.a
| Data set | Non–COVID-19: training (2014-2018) | Non–COVID-19: validation (2014-2018) | Non–COVID-19: testing (2019) | COVID-19: testing (2020) | P valueb |
| Encounters, n | 105,457 | 26,089 | 33,472 | 637 | N/Ac |
| Patients, n | 62,392 | 15,597 | 23,368 | 600 | N/A |
| Age (years), median (IQR) | 60.2 (46.5-70.8) | 60.4 (46.7-71.2) | 61.0 (47.0-71.5) | 61.8 (49.6-72.0) | .02 |
| Race, n (%) | | | | | |
| White | 86,522 (82.0) | 21,647 (83.0) | 27,036 (80.8) | 329 (51.6) | <.001 |
| Black | 12,344 (11.7) | 2861 (11.0) | 4214 (12.6) | 220 (34.5) | <.001 |
| Asian | 2145 (2.0) | 504 (1.9) | 686 (2.0) | 29 (4.6) | <.001 |
| Otherd | 4446 (4.2) | 1077 (4.1) | 1536 (4.6) | 59 (9.3) | <.001 |
| Female sex, n (%) | 53,225 (50.5) | 13,048 (50.0) | 16,760 (50.1) | 282 (44.3) | .003 |
| Event ratee, n (%) | 4236 (4.0) | 1007 (3.9) | 1337 (4.0) | 155 (24.3) | <.001 |
| Death | 920 (0.9) | 232 (0.9) | 277 (0.8) | 16 (2.5) | <.001 |
| ICUf transfer | 2979 (2.8) | 717 (2.7) | 1000 (3.0) | 139 (21.8) | <.001 |
| Mechanical ventilation | 1330 (1.3) | 299 (1.1) | 352 (1.1) | 49 (7.7) | <.001 |
| Cardiac arrestg | 143 (0.1) | 37 (0.1) | 56 (0.2) | N/A | N/A |
aPatients were subset into one of four study cohorts: a training set for learning model parameters, a validation set for model structure and hyperparameter tuning, a holdout test set for evaluation, and a final test set composed of patients testing positive for COVID-19. Values are based on individual hospital encounters.
bP values were calculated across the two test sets using a Mann-Whitney U test for continuous variables (age) and a chi-square test for categorical variables.
cN/A: not applicable.
dOther races comprising less than 1% of the population each were incorporated under the “Other” heading.
eThe event rate represents a composite outcome indicating that one of the following events occurred: death, ICU transfer, mechanical ventilation, and cardiac arrest. The individual frequencies of these adverse events are also reported and represent the number of cases where each particular outcome was the first to occur. Please see the section Outcomes for the procedure of calculating these targets.
fICU: intensive care unit.
gCardiac arrest was not used as a target in the COVID-19 positive population, as the manually adjudicated data is not yet available at the time of writing.
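As an illustration of the chi-square test named in footnote b, the following minimal sketch computes a Pearson chi-square statistic for a 2x2 table using only the Python standard library. It uses the female-sex counts from the 2019 and COVID-19 test sets in the table above. The function name `chi_square_2x2` is hypothetical, and the absence of a continuity correction is an assumption (the source does not say which variant was used), so the last digit may differ from the reported value.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Female vs non-female encounters: 2019 test set (row 1) and COVID-19 test set (row 2).
stat, p = chi_square_2x2(16760, 33472 - 16760, 282, 637 - 282)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```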
Figure 1. PICTURE training and validation framework. The electronic health record data is split into COVID-19 and non–COVID-19 patients. Encounters with an admission date between January 1, 2014, and December 31, 2018, were set aside for training (80%) and validation (20%) subsets. Encounters with an admission date between January 1 and December 31, 2019, were used as a non–COVID-19 test set. Encounters from 2020 that tested positive for COVID-19 were held out as a separate test set. In the case that a given patient has multiple encounters that overlap these boundaries, only the later encounters were considered to remove patient overlap between the cohorts. EDI: Epic Deterioration Index; NEWS: National Early Warning Score; PICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events; XGBoost: extreme gradient boosting.
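The date-based cohort assignment in Figure 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the names `cohort_for`, `assign`, and `ORDER` are hypothetical, "later encounters" is interpreted as keeping only the encounters in each patient's latest cohort, and the 80%/20% training and validation split is not modeled because the source does not describe how it was drawn.

```python
from datetime import date

# Cohorts ordered from earliest to latest study window (assumed ordering).
ORDER = ["train_or_validation", "non_covid_test", "covid_test"]

def cohort_for(admit, covid_positive):
    """Map one encounter to a cohort label using the date windows in Figure 1."""
    if covid_positive:
        return "covid_test" if admit.year == 2020 else None
    if date(2014, 1, 1) <= admit <= date(2018, 12, 31):
        return "train_or_validation"
    if date(2019, 1, 1) <= admit <= date(2019, 12, 31):
        return "non_covid_test"
    return None  # outside every study window

def assign(encounters):
    """encounters: list of (patient_id, admit_date, covid_positive).

    Keeps, for each patient, only the encounters that fall in that
    patient's latest cohort, so no patient appears in two cohorts.
    """
    labelled = [(pid, adm, cohort_for(adm, cov)) for pid, adm, cov in encounters]
    latest = {}
    for pid, _, cohort in labelled:
        if cohort is not None:
            prev = latest.get(pid)
            if prev is None or ORDER.index(cohort) > ORDER.index(prev):
                latest[pid] = cohort
    return [(pid, adm, c) for pid, adm, c in labelled
            if c is not None and c == latest[pid]]
```

For example, a patient with one encounter in 2017 and another in 2019 would contribute only the 2019 encounter, which lands in the non–COVID-19 test set.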
Figure 2. Alignment of PICTURE predictions to EDI scores. Although the PICTURE system outputs a prediction each time a new observation (eg, a new vital sign) is entered into the system, the EDI score is generated every 15 minutes. To give the EDI any potential advantage, PICTURE scores are aligned to EDI scores by selecting the most recent PICTURE score before each EDI prediction. In both cases, observations occurring from 30 minutes before the target onward are excluded (red). For the patients who did not experience an adverse event, the maximum score was calculated across the entire encounter. EDI: Epic Deterioration Index; PICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
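The alignment described in Figure 2 can be sketched as a "most recent value before" lookup. The snippet below is a minimal illustration, not the authors' implementation: `align_to_edi` is a hypothetical name, times are assumed to be minutes from admission, and the treatment of a score that lies exactly on the 30-minute boundary (excluded here) is an assumption.

```python
from bisect import bisect_left

def align_to_edi(picture, edi_times, event_time=None, gap=30):
    """Pair each EDI time with the latest PICTURE score strictly before it.

    picture: list of (time, score) sorted by time, in minutes.
    edi_times: sorted list of EDI prediction times, in minutes.
    EDI times at or after (event_time - gap) are dropped.
    """
    cutoff = float("inf") if event_time is None else event_time - gap
    times = [t for t, _ in picture]
    aligned = []
    for t_edi in edi_times:
        if t_edi >= cutoff:
            continue  # inside the excluded window before the event
        i = bisect_left(times, t_edi)  # count of PICTURE times < t_edi
        if i == 0:
            continue  # no PICTURE score exists yet
        aligned.append((t_edi, picture[i - 1][1]))
    return aligned
```

Because each selected PICTURE score precedes an EDI time that is itself outside the excluded window, the same exclusion automatically applies to the PICTURE scores.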
Evaluation of PICTURE (performance in a non–COVID-19 cohort).
| Granularity and analytic | AUROCa (95% CIb) | P valuec | AUPRCd (95% CI) | P value | Event rate (%) |
| Observation level | | <.001 | | <.001 | 1.01 |
| PICTUREe | 0.821 (0.810-0.832) | | 0.099 (0.085-0.110) | | |
| NEWSf,g | 0.753 (0.741-0.765) | | 0.058 (0.049-0.064) | | |
| Encounter level | | <.001 | | <.001 | 3.99 |
| PICTURE | 0.846 (0.834-0.858) | | 0.326 (0.301-0.351) | | |
| NEWS | 0.782 (0.768-0.795) | | 0.185 (0.165-0.203) | | |
aAUROC: area under the receiver operating characteristic curve.
b95% CIs were calculated using a block bootstrap with 1000 replicates. In the case of the observation level, this bootstrap was blocked on the encounter level.
cP values are calculated using the bootstrap method outlined in the section Performance Measures.
dAUPRC: area under the precision-recall curve.
ePICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
fNEWS: National Early Warning Score.
gNEWS is used as a baseline for comparison.
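Footnote b describes a block bootstrap in which observation-level resampling is blocked on the encounter. A minimal standard-library sketch of that idea follows. The function names are hypothetical, the AUROC is computed through the rank-sum identity, and the percentile interval is an assumption, since the source does not state how the interval was derived from the 1000 replicates.

```python
import random

def auroc(labels, scores):
    """AUROC via the Mann-Whitney rank-sum identity, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # undefined when only one class is present
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def block_bootstrap_ci(blocks, n_rep=1000, alpha=0.05, seed=0):
    """blocks: dict mapping encounter id -> list of (label, score) observations.

    Whole encounters are resampled with replacement, so observations from
    the same encounter always stay together.
    """
    rng = random.Random(seed)
    ids = list(blocks)
    stats = []
    for _ in range(n_rep):
        sample = [obs for eid in rng.choices(ids, k=len(ids)) for obs in blocks[eid]]
        value = auroc([y for y, _ in sample], [s for _, s in sample])
        if value is not None:
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

Resampling encounters rather than individual observations respects the correlation between repeated scores from the same patient stay, which would otherwise make the interval too narrow.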
Comparison of PICTURE and the EDI in a non–COVID-19 cohort.
| Granularity and analytic | AUROCa (95% CI) | P valueb | AUPRCc (95% CI) | P value | Event rate (%) |
| Observation level | | | | | 0.77 |
| PICTUREd | 0.819 (0.805-0.834) | vs EDIe: <.001; vs NEWSf: <.001 | 0.115 (0.096-0.130) | vs EDI: <.001; vs NEWS: <.001 | |
| EDI | 0.763 (0.746-0.781) | vs NEWS: .01 | 0.081 (0.066-0.094) | vs NEWS: <.001 | |
| NEWS | 0.745 (0.729-0.761) | N/Ag | 0.062 (0.051-0.072) | N/A | |
| Encounter level | | | | | 4.21 |
| PICTURE | 0.859 (0.846-0.873) | vs EDI: <.001; vs NEWS: <.001 | 0.368 (0.335-0.400) | vs EDI: <.001; vs NEWS: <.001 | |
| EDI | 0.803 (0.788-0.821) | vs NEWS: .15 | 0.274 (0.244-0.301) | vs NEWS: <.001 | |
| NEWS | 0.797 (0.781-0.814) | N/A | 0.229 (0.204-0.254) | N/A | |
aAUROC: area under the receiver operating characteristic curve.
bP values reflect the difference in AUROC or AUPRC.
cAUPRC: area under the precision-recall curve.
dPICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
eEDI: Epic Deterioration Index.
fNEWS: National Early Warning Score.
gN/A: not applicable.
Figure 3. Comparison of PICTURE and the EDI. Panel A: receiver operating characteristic (ROC) curves for the PICTURE, EDI, and NEWS models in the non–COVID-19 cohort. PICTURE area under the curve (AUC): 0.819; EDI AUC: 0.763; NEWS AUC: 0.745. Panel B: precision-recall (PR) curves for the same models in the non–COVID-19 cohort. PICTURE AUC: 0.115; EDI AUC: 0.081; NEWS AUC: 0.062. Panel C: ROC curves for the PICTURE, EDI, and NEWS models in the COVID-19 cohort. PICTURE AUC: 0.849; EDI AUC: 0.803; NEWS AUC: 0.746. Panel D: PR curves for the same models in the COVID-19 cohort. PICTURE AUC: 0.173; EDI AUC: 0.131; NEWS AUC: 0.098. All curves represent observation-level analysis. EDI: Epic Deterioration Index; FPR: false-positive rate; NEWS: National Early Warning Score; PICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events; TPR: true-positive rate.
Lead time analysis in non–COVID-19 cohort.a
| Lead time (hours) | AUROCb (95% CI): PICTUREd | AUROC (95% CI): EDIe | AUPRCc (95% CI): PICTURE | AUPRC (95% CI): EDI | Event rate (%) | Sample size, n |
| 0.5 | 0.859 (0.846-0.873) | 0.803 (0.787-0.820) | 0.368 (0.336-0.400) | 0.274 (0.244-0.302) | 4.21 | 21,636 |
| 1 | 0.850 (0.835-0.864) | 0.795 (0.778-0.811) | 0.346 (0.315-0.379) | 0.254 (0.227-0.280) | 4.18 | 21,636 |
| 2 | 0.838 (0.823-0.853) | 0.784 (0.767-0.802) | 0.321 (0.292-0.352) | 0.238 (0.210-0.265) | 4.14 | 21,622 |
| 6 | 0.825 (0.810-0.840) | 0.768 (0.750-0.787) | 0.280 (0.249-0.310) | 0.210 (0.184-0.237) | 3.92 | 21,572 |
| 12 | 0.817 (0.801-0.832) | 0.767 (0.749-0.786) | 0.247 (0.215-0.275) | 0.183 (0.159-0.207) | 3.67 | 21,515 |
| 24 | 0.808 (0.790-0.826) | 0.759 (0.740-0.779) | 0.205 (0.172-0.230) | 0.144 (0.121-0.164) | 3.24 | 21,419 |
aThe performance of the two models (encounter level) at various lead times was assessed by evaluating the maximum prediction score prior to x hours before the given event, with x ranging in progressively greater intervals from 0.5 to 24. On this cohort of non–COVID-19 patients, PICTURE consistently outperformed the EDI. At each level of censoring, the P value when comparing PICTURE to the EDI was <.001.
bAUROC: area under the receiver operating characteristic curve.
cAUPRC: area under the precision-recall curve.
dPICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
eEDI: Epic Deterioration Index.
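The lead time censoring described in footnote a can be illustrated with a short sketch: for each encounter, only scores recorded at least x hours before the event are kept, and their maximum becomes the encounter-level score. The function name `max_score_before` and the use of hours as the time unit are assumptions made for this example.

```python
def max_score_before(scores, event_time, lead_hours):
    """Encounter-level score after censoring the last `lead_hours` before the event.

    scores: list of (time_hours, score) observations for one encounter.
    event_time: time of the first adverse event, or None if no event occurred,
        in which case the maximum over the whole encounter is used.
    Returns None when no observation remains after censoring.
    """
    if event_time is None:
        eligible = [s for _, s in scores]
    else:
        cutoff = event_time - lead_hours
        eligible = [s for t, s in scores if t <= cutoff]
    return max(eligible) if eligible else None
```

Encounters left with no eligible score return `None` here. Dropping them would be one plausible reason the sample size in the table shrinks as the lead time grows, although the source does not state this explicitly.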
Comparison of PICTURE and the EDI in patients testing positive for COVID-19.
| Granularity and analytic | AUROCa (95% CI) | P value | AUPRCb (95% CI) | P value | Event rate (%) |
| Observation level | | | | | 3.20 |
| PICTUREc | 0.849 (0.820-0.878) | vs EDId: <.001; vs NEWSe: <.001 | 0.173 (0.116-0.211) | vs EDI: .002; vs NEWS: <.001 | |
| EDI | 0.803 (0.772-0.838) | vs NEWS: <.001 | 0.131 (0.087-0.163) | vs NEWS: .002 | |
| NEWS | 0.746 (0.708-0.783) | N/Af | 0.098 (0.066-0.122) | N/A | |
| Encounter level | | | | | 20.6 |
| PICTURE | 0.895 (0.868-0.928) | vs EDI: <.001; vs NEWS: <.001 | 0.665 (0.590-0.743) | vs EDI: <.001; vs NEWS: <.001 | |
| EDI | 0.802 (0.762-0.848) | vs NEWS: .05 | 0.510 (0.438-0.588) | vs NEWS: .02 | |
| NEWS | 0.773 (0.732-0.818) | N/A | 0.441 (0.364-0.510) | N/A | |
aAUROC: area under the receiver operating characteristic curve.
bAUPRC: area under the precision-recall curve.
cPICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
dEDI: Epic Deterioration Index.
eNEWS: National Early Warning Score.
fN/A: not applicable.
Lead time analysis in COVID-19 cohort.a
| Lead time (hours) | AUROCb (95% CI): PICTUREd | AUROC (95% CI): EDIe | AUPRCc (95% CI): PICTURE | AUPRC (95% CI): EDI | Event rate (%) | Sample size, n |
| 0.5 | 0.895 (0.867-0.926) | 0.802 (0.761-0.842) | 0.665 (0.586-0.739) | 0.510 (0.436-0.587) | 20.6 | 607 |
| 1 | 0.887 (0.860-0.918) | 0.793 (0.753-0.836) | 0.631 (0.553-0.710) | 0.491 (0.418-0.570) | 20.5 | 606 |
| 2 | 0.870 (0.840-0.901) | 0.794 (0.754-0.833) | 0.598 (0.518-0.675) | 0.478 (0.400-0.555) | 20.1 | 603 |
| 6 | 0.847 (0.813-0.885) | 0.769 (0.729-0.813) | 0.552 (0.474-0.639) | 0.435 (0.354-0.517) | 19.3 | 597 |
| 12 | 0.821 (0.783-0.863) | 0.752 (0.708-0.798) | 0.497 (0.411-0.577) | 0.403 (0.333-0.480) | 17.9 | 587 |
| 24 | 0.808 (0.767-0.856) | 0.740 (0.690-0.796) | 0.443f (0.344-0.529) | 0.370 (0.289-0.459) | 16.0 | 574 |
aThe performance of the two models (encounter level) at various lead times was again assessed by evaluating the maximum prediction score prior to x hours before the given event, with x ranging in progressively greater intervals from 0.5 to 24. On this cohort of COVID-19 patients, PICTURE consistently outperformed the EDI. At each level of censoring, the P value when comparing PICTURE to the EDI was <.001 unless otherwise marked.
bAUROC: area under the receiver operating characteristic curve.
cAUPRC: area under the precision-recall curve.
dPICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
eEDI: Epic Deterioration Index.
fP=.001.
Figure 4. Shapley summary plots. Panel A depicts an aggregated summary plot of the Shapley values from the 2019 test set, while panel B corresponds to COVID-19 positive patients. The 20 most influential features are ranked from top to bottom, and the distribution of Shapley values across all predictions is plotted. The magnitude of the Shapley value is displayed on the horizontal axis, while the value of the feature itself is represented by color. For example, a large amount of oxygen support over 24 hours (red) in panel A was associated with a highly positive influence on the model, while low to no oxygen support (blue) pushed the model back toward 0. BUN: blood urea nitrogen; GCS: Glasgow Coma Scale; INR: international normalized ratio; SHAP: Shapley; WBC: white blood cells.
Figure 5. Distribution of scores and calibration curve. Panel A presents a KDE of the distribution of PICTURE and EDI scores. In addition to raw PICTURE scores, logit-transformed scores are also included. Panel B depicts quantiles of PICTURE and EDI scores (0.1, 0.2, 0.3,...0.9) against observed risk. Neither PICTURE nor the EDI is calibrated as a probability, and as such, the use of set alarm thresholds may be useful to help alert clinicians when their patient is at an increased risk. EDI: Epic Deterioration Index; KDE: kernel density estimate; PICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
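One way to produce a quantile-based calibration view like panel B is to cut the scores at their deciles and compute the observed event rate in each bin. The sketch below is only an assumption about how such a plot might be built (the source does not describe the exact procedure), and `observed_risk_by_decile` is a hypothetical name.

```python
from bisect import bisect_right
from statistics import quantiles

def observed_risk_by_decile(scores, labels):
    """Return the nine decile cut points and the observed event rate per bin.

    scores: encounter-level scores; labels: 1 if an adverse event occurred, else 0.
    Bin k holds scores between cut point k-1 and cut point k.
    """
    cuts = quantiles(scores, n=10)  # nine cut points at the 0.1 ... 0.9 quantiles
    events = [0] * 10
    counts = [0] * 10
    for s, y in zip(scores, labels):
        b = bisect_right(cuts, s)
        counts[b] += 1
        events[b] += y
    risk = [e / c if c else None for e, c in zip(events, counts)]
    return cuts, risk
```

If a score were calibrated as a probability, the observed risk in each bin would track the score values themselves. A mismatch is what motivates choosing alert thresholds empirically.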
Alert thresholds and median lead time.a
| Score | Threshold source | Threshold value | Sensitivity | Specificity | PPVb | NPVc | WDRd | F1 scoree | Lead timef (h:min), median (IQR) |
| EDIg | Singh et al [11] | 64.8 | 0.448 | 0.917 | 0.583 | 0.865 | 1.71 | 0.507 | 32:26 (4:37-66:08) |
| PICTUREh | Align by sensitivity | 0.165 | N/Ai | 0.946 | 0.683 | 0.869 | 1.46 | 0.541 | 40:14 (7:51-67:50) |
| PICTURE | Align by specificity | 0.097 | 0.616 | N/A | 0.658 | 0.902 | 1.52 | 0.636 | 40:04 (7:44-91:00) |
| PICTURE | Align by PPV | 0.048 | 0.792 | 0.851 | N/A | 0.940 | N/A | 0.668 | 54:10 (29:26-115:50) |
| PICTURE | Align by NPV | 0.173 | 0.432 | 0.946 | 0.675 | N/A | 1.48 | 0.527 | 41:40 (7:31-68:30) |
aSensitivity, specificity, PPV, and NPV were calculated for the EDI at a threshold of 64.8 as suggested in Singh et al [11] and based on encounter-level performance. PICTURE thresholds were then aligned to match these statistics. The WDR is also calculated as 1 / PPV and represents the number of false alarms received for each true positive. This value is important in limiting alert fatigue for clinicians and indicates that PICTURE may yield as much as 17% fewer false alarms for each true positive.
bPPV: positive predictive value.
cNPV: negative predictive value.
dWDR: workup to detection ratio.
eF1 scores were calculated as the harmonic mean between PPV and sensitivity.
fLead times were determined using the intersection of true positives between PICTURE and the EDI, and were calculated as the time between a patient first crossing the threshold and their first deterioration event.
gEDI: Epic Deterioration Index.
hPICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events.
iN/A: not applicable.
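The threshold statistics in this table follow from a confusion matrix together with the two formulas given in the footnotes (WDR = 1 / PPV; F1 as the harmonic mean of PPV and sensitivity). The sketch below uses hypothetical function names and hypothetical counts. Note that 1 / PPV, as computed here, counts all alerts raised per true positive.

```python
def f1_score(ppv, sensitivity):
    """Harmonic mean of PPV and sensitivity."""
    return 2.0 * ppv * sensitivity / (ppv + sensitivity)

def alert_metrics(tp, fp, tn, fn):
    """Threshold statistics from encounter-level confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "wdr": 1.0 / ppv,  # workup to detection ratio
        "f1": f1_score(ppv, sensitivity),
    }

# Consistency check against the EDI row of the table (PPV 0.583, sensitivity 0.448).
print(round(1.0 / 0.583, 2), round(f1_score(0.583, 0.448), 3))
```

Applying the same two formulas to the reported PPV and sensitivity reproduces the WDR and F1 columns to within rounding of the published inputs, which is a quick way to check the table for transcription errors.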
Figure 6. Sample trajectory of one patient. Panel A depicts the PICTURE predictions over 27 hours before the patient is eventually transferred to an ICU level of care (green bar). Two possible alert thresholds are noted: one (red: 0.165) based on the EDI’s sensitivity at a threshold of 64.8 (as suggested by Singh et al [11]), while the other (yellow: 0.048) is based on the EDI’s PPV at this threshold. Note that PICTURE peaks above the sensitivity-based threshold approximately 11 hours in advance of the ICU transfer and then remains elevated over the PPV threshold until the transfer occurs. * and † represent the first time points that PICTURE crossed each threshold, referenced in Table 7. Panel B demonstrates the EDI over the same time range, with the threshold of 64.8 suggested by Singh et al [11]. The EDI did not identify this patient as being at risk. EDI: Epic Deterioration Index; ICU: intensive care unit; PICTURE: Predicting Intensive Care Transfers and Other Unforeseen Events; PPV: positive predictive value.
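The lead time shown in Figure 6 is the interval between the first threshold crossing and the event. A minimal sketch of that calculation follows. The name `lead_time_hours`, the toy trajectory, and the choice to count a score equal to the threshold as a crossing are all assumptions made for illustration.

```python
def lead_time_hours(trajectory, threshold, event_time):
    """Hours between the first score at or above `threshold` and the event.

    trajectory: list of (time_hours, score) for one encounter, in any order.
    Returns None if the score never reaches the threshold before the event.
    """
    crossings = [t for t, s in trajectory if s >= threshold and t <= event_time]
    if not crossings:
        return None
    return event_time - min(crossings)

# Toy trajectory (hypothetical values) with an ICU transfer at hour 27.
toy = [(0.0, 0.01), (14.5, 0.05), (16.0, 0.17), (20.0, 0.06)]
print(lead_time_hours(toy, 0.048, 27.0), lead_time_hours(toy, 0.165, 27.0))
```

A lower threshold can only be crossed at the same time or earlier than a higher one, so it yields a lead time at least as long, at the cost of more alerts.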
Sample Predicting Intensive Care Transfers and Other Unforeseen Events explanations.
| Rank and feature namea | Value | Median (IQR)b | Shapley score |
| PPVc-aligned threshold | | | |
| 1. Oxygen supplementation (rolling 24 h max) | 7 L/min | 2.0 (0.0-3.0) | 1.06 |
| 2. SpO2d (rolling 24 h min) | 85% | 92.0 (90.0-94.0) | 0.93 |
| 3. Respiratory rate | 26 bpm | 20.0 (18.0-20.0) | 0.76 |
| 4. Temperature | 39.1 °C | 36.9 (36.8-37.2) | 0.32 |
| 5. Protein level | 5.7 | 6.0 (5.6-6.4) | 0.13 |
| Sensitivity-aligned threshold | | | |
| 1. Oxygen supplementation (rolling 24 h max) | 35 L/min | 2.0 (0.0-3.0) | 1.93 |
| 2. SpO2 (rolling 24 h min) | 85% | 92.0 (90.0-94.0) | 1.09 |
| 3. Respiratory rate | 24 bpm | 20.0 (18.0-20.0) | 0.73 |
| 4. Heart ratee | 124 bpm | 83.0 (74.0-92.0) | 0.71 |
| 5. Temperature | 39.1 °C | 36.9 (36.8-37.2) | 0.32 |
aThe top 5 features corresponding to Predicting Intensive Care Transfers and Other Unforeseen Events predictions as the score crosses the PPV-aligned threshold and the sensitivity-aligned threshold, as noted in Figure 6. These predictions represent two possible points where a clinician could receive an alert that their patient is deteriorating. Such information could be shared alongside the prediction score to provide better clinical utility to health care providers. Note that oxygenation (supplemental oxygen, SpO2, and respiratory rate) and temperature play a dominant role in both cases.
bThe median and IQR are included for comparison, and are calculated using the COVID-19 data set.
cPPV: positive predictive value.
dSpO2: oxygen saturation as measured by pulse oximetry.
eHeart rate represented the primary difference between these two time points. When the Predicting Intensive Care Transfers and Other Unforeseen Events score first exceeded the PPV threshold 12.5 hours before the intensive care unit transfer, the heart rate remained at 65 bpm and was not among the top features as measured by Shapley. At 11 hours before the event, when the Predicting Intensive Care Transfers and Other Unforeseen Events score was at its highest, the heart rate had jumped to 124 bpm and was the fourth-most influential feature as measured by Shapley values.