Victor Alfonso Rodriguez1, Shreyas Bhave1, Ruijun Chen, Chao Pang1,2, George Hripcsak1, Soumitra Sengupta1, Noemie Elhadad1, Robert Green3, Jason Adelman4, Katherine Schlosser Metitiri5, Pierre Elias1, Holden Groves6, Sumit Mohan7, Karthik Natarajan1, Adler Perotte1.
Abstract
OBJECTIVE: Coronavirus disease 2019 (COVID-19) patients are at risk for resource-intensive outcomes including mechanical ventilation (MV), renal replacement therapy (RRT), and readmission. Accurate outcome prognostication could facilitate hospital resource allocation. We develop and validate predictive models for each outcome using retrospective electronic health record data for COVID-19 patients treated between March 2 and May 6, 2020.
Keywords: COVID-19; patient readmission; renal replacement therapy; respiration, artificial; supervised machine learning
Year: 2021 PMID: 33706377 PMCID: PMC7989331 DOI: 10.1093/jamia/ocab029
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Characteristics and target outcomes for patients with SARS-CoV-2–positive tests
| Characteristic | Development (CUIMC) (n = 2256) | Validation (Allen Hospital) (n = 855) | P value |
|---|---|---|---|
| Outcome | | | |
| Mechanical ventilation | 352 (15.60) | 60 (7.02) | <.0001 |
| Renal replacement therapy | 142 (6.29) | 20 (2.34) | <.0001 |
| Readmission | 193 (8.55) | 77 (9.01) | .7216 |
| Age | | | <.0001 |
| <18 y | 50 (2.22) | 0 (0) | |
| 18-30 y | 113 (5.01) | 25 (2.92) | |
| 30-60 y | 761 (33.73) | 242 (28.30) | |
| 60-80 y | 916 (40.60) | 378 (44.21) | |
| >80 y | 416 (18.44) | 210 (24.56) | |
| Sex | | | .8987 |
| Female | 1005 (44.55) | 375 (43.86) | |
| Male | 1250 (55.41) | 479 (56.02) | |
| Missing | 1 (0.04) | 1 (0.12) | |
| Race | | | .1275 |
| American Indian or Alaska Native | 3 (0.13) | 1 (0.12) | |
| Asian | 29 (1.29) | 5 (0.58) | |
| Black or African American | 455 (20.17) | 192 (22.46) | |
| Native Hawaiian or Other Pacific Islander | 10 (0.44) | 0 (0) | |
| White | 542 (24.02) | 196 (22.92) | |
| Missing | 1217 (53.95) | 461 (53.92) | |
| Ethnicity | | | .0003 |
| Hispanic or Latino | 1068 (47.34) | 439 (51.35) | |
| Not Hispanic or Latino | 605 (26.82) | 254 (29.71) | |
| Missing | 583 (25.84) | 162 (18.95) | |
| DNR/DNI | 331 (14.67) | 169 (19.76) | <.0001 |
| Died in hospital | 228 (10.11) | 150 (17.54) | <.0001 |
Values are n (%).
CUIMC: Columbia University Irving Medical Center; DNI: do not intubate; DNR: do not resuscitate; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2.
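The n (%) entries and the cohort comparison in the table above can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the percentages and runs an uncorrected Pearson chi-square test on a 2x2 table. The choice of test is an assumption: this record does not state which test produced the reported P values, so the sketch is only expected to agree in order of magnitude, not digit for digit. The function names are illustrative.

```python
import math


def percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def chi_square_2x2(pos_a, n_a, pos_b, n_b):
    """Pearson chi-square (no continuity correction) for two proportions.

    Assumption: the source does not say which test it used; this is
    only one plausible choice for comparing two cohorts.
    """
    a, b = pos_a, n_a - pos_a  # cohort A: with outcome, without outcome
    c, d = pos_b, n_b - pos_b  # cohort B: with outcome, without outcome
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Mechanical ventilation: 352 of 2256 (development) vs 60 of 855 (validation).
print(percent(352, 2256), percent(60, 855))
print(chi_square_2x2(352, 2256, 60, 855))
# Readmission: 193 of 2256 vs 77 of 855.
print(chi_square_2x2(193, 2256, 77, 855))
```

Under this assumed test, mechanical ventilation rates differ strongly between the two sites while readmission rates do not, which matches the direction of the reported P values.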
Performance metrics for all models and outcomes
| Outcome | Model | AUROC (Development) | AUPRC (Development) | AUROC (Validation) | AUPRC (Validation) |
|---|---|---|---|---|---|
| Mechanical ventilation | Logistic L1 | 0.869 (0.847-0.893) | 0.569 (0.510-0.624) | 0.741 (0.682-0.806) | 0.127 (0.052-0.157) |
| | Logistic EN | 0.878 (0.858-0.902) | 0.562 (0.501-0.616) | 0.738 (0.675-0.805) | 0.141 (0.046-0.183) |
| | GBT | 0.869 (0.848-0.891) | 0.613 (0.555-0.668) | 0.743 (0.682-0.812) | 0.137 (0.047-0.175) |
| Renal replacement therapy | Logistic L1 | 0.847 (0.815-0.882) | 0.381 (0.293-0.453) | 0.847 (0.772-0.936) | 0.325 (0.117-0.497) |
| | Logistic EN | 0.844 (0.812-0.881) | 0.378 (0.295-0.451) | 0.841 (0.759-0.931) | 0.314 (0.113-0.476) |
| | GBT | 0.837 (0.805-0.871) | 0.325 (0.242-0.385) | 0.829 (0.761-0.912) | 0.196 (0.009-0.312) |
| Readmission | Logistic L1 | 0.818 (0.789-0.847) | 0.293 (0.233-0.344) | 0.868 (0.823-0.917) | 0.505 (0.395-0.602) |
| | Logistic EN | 0.830 (0.803-0.858) | 0.307 (0.249-0.353) | 0.871 (0.830-0.917) | 0.504 (0.388-0.604) |
| | GBT | 0.838 (0.814-0.864) | 0.287 (0.233-0.323) | 0.869 (0.830-0.910) | 0.427 (0.321-0.509) |
AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; GBT: gradient boosted trees; Logistic EN: elastic-net logistic regression; Logistic L1: L1-penalized logistic regression.
The best performance for the given outcome according to the metric specified by the column heading.
Selected models are in bold for each outcome.
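For readers unfamiliar with the two metrics in the table above, here is a minimal sketch of how AUROC and AUPRC can be computed from labels and predicted scores. It is not the authors' evaluation code. It assumes binary 0/1 labels and estimates AUPRC with the common "average precision" summary. The function names and toy data are illustrative.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision: mean of the precision at each positive's rank.

    Assumption: scores are distinct; tied scores are processed one at a time.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores), average_precision(labels, scores))
```

A random classifier scores about 0.5 on AUROC regardless of prevalence, whereas its expected AUPRC is roughly the outcome prevalence, so AUPRC values are best read against each cohort's event rate.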
Figure 1. Receiver-operating characteristic (ROC) curves for ventilation, renal replacement therapy (RRT), and readmission. Curves are for each outcome’s selected model. Dark lines correspond to averages over all folds. Shaded areas correspond to 95% confidence intervals. AUC: area under the curve; AUROC: area under the receiver-operating characteristic curve.
Figure 2. Precision-recall curves for ventilation, renal replacement therapy (RRT), and readmission. Curves are for each outcome’s selected model. Dark lines correspond to averages over all folds. Shaded areas correspond to 95% confidence intervals. AUC: area under the curve; AUPRC: area under the precision-recall curve.
Figure 3. Calibration reliability curve for development and validation cohorts. The reliability curve shows how close each model is to a perfectly calibrated model. This plot is created by binning predicted probabilities and examining the true fraction of cases in each bin. The plot under each reliability curve shows the support (number of positives) in each bin. RRT: renal replacement therapy.
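The binning procedure described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: equal-width bins over [0, 1] and a default of 10 bins are assumed, since the caption does not specify either, and the function name is hypothetical.

```python
def reliability_curve(y_true, y_prob, n_bins=10):
    """Bin predicted probabilities and report the observed event fraction per bin.

    Assumption: equal-width bins on [0, 1]; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        bins[index].append((label, prob))

    rows = []
    for members in bins:
        if not members:
            continue
        count = len(members)
        positives = sum(label for label, _ in members)
        rows.append({
            "mean_predicted": sum(prob for _, prob in members) / count,
            "observed_fraction": positives / count,
            "count": count,
            "positives": positives,  # the "support" shown under each curve
        })
    return rows


for row in reliability_curve([0, 0, 1, 1, 1, 0], [0.1, 0.2, 0.8, 0.9, 0.7, 0.3], n_bins=2):
    print(row)
```

For a perfectly calibrated model, `observed_fraction` equals `mean_predicted` in every bin, which corresponds to the diagonal of the reliability plot.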
Figure 4. SHAP feature importances for ventilation, renal replacement therapy (RRT), and readmission. Each SHAP value plot displays a patient-level SHAP value as a point which lies on the horizontal axis and uses color to indicate whether the feature value for a patient was higher (red) or lower (blue) than average. SHAP values >0 indicate increased risk for a patient. SHAP values <0 indicate decreased risk. This SHAP plot allows for visualization of the distribution of effect sizes indicated by the spread of the points around 0 and shows the direction of the effect. As an example, a higher respiratory rate (red points are all >0) indicates higher risk for ventilation. The average of the absolute SHAP values (shown in parentheses for each feature) across all points shows the overall importance of the feature. aPTT: activated partial thromboplastin time; MCHC: mean corpuscular hemoglobin concentration; SpO2: oxygen saturation.
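To make the Figure 4 caption concrete, the sketch below computes patient-level SHAP values for a toy linear model and then the mean absolute SHAP value per feature. It relies on a known special case: for a linear model with features treated as independent, a feature's SHAP value is its weight times the feature's deviation from its mean. The weights, feature values, and function names are hypothetical, and this is not the authors' pipeline or the `shap` library's API.

```python
def linear_shap(weights, rows):
    """SHAP values for a linear model, assuming independent features.

    Each value is weight * (feature value - feature mean), so the sign
    shows the direction of the effect relative to the average patient.
    """
    n = len(rows)
    means = [sum(row[j] for row in rows) / n for j in range(len(weights))]
    return [
        [w * (x - m) for w, x, m in zip(weights, row, means)]
        for row in rows
    ]


def mean_abs_importance(shap_rows):
    """Overall importance of each feature: mean of absolute SHAP values."""
    n = len(shap_rows)
    n_features = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / n for j in range(n_features)]


# Hypothetical data: column 0 has a positive weight, column 1 a negative weight.
weights = [0.5, -0.2]
patients = [[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]]

values = linear_shap(weights, patients)
print(values)
print(mean_abs_importance(values))
```

In this toy example, the patient with the above-average value in column 0 gets a positive SHAP value for that feature (higher risk), mirroring the respiratory rate example in the caption.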