Elizabeth Mauer, Jihui Lee, Justin Choi, Hongzhe Zhang, Katherine L Hoffman, Imaani J Easthausen, Mangala Rajan, Mark G Weiner, Rainu Kaushal, Monika M Safford, Peter A D Steel, Samprit Banerjee.
Abstract
From early March through mid-May 2020, the COVID-19 pandemic overwhelmed hospitals in New York City. In anticipation of ventilator shortages and limited ICU bed capacity, hospital operations prioritized the development of prognostic tools to predict clinical deterioration. However, frontline physicians observed early on that some patients deteriorated unexpectedly after relatively stable periods, underscoring the uncertainty of clinical trajectories among hospitalized patients with COVID-19. Prediction tools that incorporate clinical variables at a single time point, usually hospital presentation, are suboptimal for patients with dynamic changes and evolving clinical trajectories. Therefore, our study team developed a machine-learning algorithm to predict clinical deterioration among hospitalized COVID-19 patients by extracting clinically meaningful features from complex longitudinal laboratory and vital sign values during the early period of hospitalization, with an emphasis on informative missingness. To incorporate the evolution of the disease and of clinical practice over the course of the pandemic, we used a time-dependent cross-validation strategy for model development. Finally, we validated our prediction model on an external validation cohort of COVID-19 patients drawn from a population demographically distinct from the training cohort. The main finding of our study is the identification of risk profiles of early, late and no clinical deterioration during the course of hospitalization. While risk prediction models that include simple predictors at ED presentation and clinical judgement are able to distinguish any deterioration from no deterioration, our methodology is able to isolate a particular risk group that remains stable initially but deteriorates at a later stage of hospitalization.
We demonstrate that using laboratory and vital sign data from the early period of hospitalization yields superior predictive performance compared with using data at presentation alone. Our results will allow efficient hospital resource allocation and will motivate research into understanding the late deterioration risk group.
Keywords: COVID-19; Deterioration; EMR; Intubation; Machine learning; Prediction
Year: 2021 PMID: 33933654 PMCID: PMC8084618 DOI: 10.1016/j.jbi.2021.103794
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
Features of labs and vitals for 24-hour and random index time.
| Feature | Detail | Labs (24-hr) | Labs (random) | Vitals (24-hr) | Vitals (random) |
|---|---|---|---|---|---|
| Missing indicator | | Y | Y | N | N |
| Trend | 1) Originally recorded values | Y | Y | Y | Y |
| Trend | 2) Number of values recorded per calendar day | N | Y | N | N |
| Trend | 3) Variance of values recorded per calendar day | N | Y | N | N |
| Trend | 4) (Lower/Upper) abnormal values | Y | Y | Y | Y |
| Clustering | LGMM | Y | Y | Y | N |
| Clustering | DTW + Hierarchical clustering | N | N | N | Y |
‘Y’ if the feature was calculated, ‘N’ if not.
Except SOFA score and variables used for defining SOFA score.
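As a rough illustration of the missing-indicator and trend features listed in the table, the sketch below summarizes one patient's recorded values for a single lab. It is not the authors' implementation: the function name `summarize_lab`, the output keys, and the reference-range bounds `lower` and `upper` are hypothetical, and counting values outside a reference range is only one plausible reading of "(Lower/Upper) abnormal values".

```python
from collections import defaultdict
from datetime import datetime
from statistics import variance


def summarize_lab(records, lower, upper):
    """Summarize one patient's values for one lab before the index time.

    records: list of (datetime, float) pairs; may be empty.
    lower, upper: hypothetical reference-range bounds for this lab.
    """
    # Missing indicator: 1 if the lab was never recorded for this patient.
    features = {"missing": int(len(records) == 0)}

    # Group the recorded values by calendar day.
    by_day = defaultdict(list)
    for timestamp, value in records:
        by_day[timestamp.date()].append(value)

    features["n_per_day"] = {day: len(vals) for day, vals in by_day.items()}
    # Sample variance needs at least two values; report None otherwise.
    features["var_per_day"] = {
        day: variance(vals) if len(vals) > 1 else None
        for day, vals in by_day.items()
    }
    # Assumed definition of abnormal values: outside the reference range.
    features["n_below"] = sum(1 for _, v in records if v < lower)
    features["n_above"] = sum(1 for _, v in records if v > upper)
    return features


example = [
    (datetime(2020, 4, 1, 8), 1.0),
    (datetime(2020, 4, 1, 20), 3.0),
    (datetime(2020, 4, 2, 8), 5.0),
]
summary = summarize_lab(example, lower=2.0, upper=4.0)
never_ordered = summarize_lab([], lower=2.0, upper=4.0)
```

Keeping the missing indicator as its own feature, rather than imputing a value, lets a model treat "this lab was never ordered" as a signal in itself, which is the idea behind informative missingness.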
Fig. 1. Time-dependent cross-validation scheme.
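The details of the scheme in Fig. 1 are not reproduced in this record. One common way to respect calendar time in cross-validation is forward chaining: order patients by admission time, cut them into consecutive blocks, and validate each block only against models trained on earlier blocks. The sketch below assumes that design; the function name `forward_chaining_folds` and the block count are hypothetical and may differ from what the authors used.

```python
def forward_chaining_folds(admit_times, n_blocks):
    """Return (train_indices, validation_indices) pairs ordered in time.

    admit_times: one sortable admission time per patient.
    n_blocks: number of consecutive time blocks (assumed, not from the paper).
    """
    # Patient indices ordered from earliest to latest admission.
    order = sorted(range(len(admit_times)), key=lambda i: admit_times[i])
    block_size = -(-len(order) // n_blocks)  # ceiling division
    blocks = [order[i:i + block_size] for i in range(0, len(order), block_size)]

    folds = []
    for k in range(1, len(blocks)):
        # Train on every block that precedes block k, validate on block k.
        train = [i for block in blocks[:k] for i in block]
        folds.append((train, blocks[k]))
    return folds


admit_days = [5, 1, 3, 2, 6, 4]  # toy admission days for six patients
folds = forward_chaining_folds(admit_days, n_blocks=3)
```

With this layout, no validation patient is admitted earlier than any patient used to train the corresponding model, so changes in the disease and in clinical practice over time show up as a realistic drop in performance instead of being averaged away.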
Patient characteristics at ED presentation: training (NYP-WCM) and validation (NYP-LMH) cohorts.
| Characteristic | NYP-WCM (N = 1045) | NYP-LMH (N = 292) | Overall (N = 1337) |
|---|---|---|---|
| Age | |||
| >=65 | 551 (52.7%) | 186 (63.7%) | 737 (55.1%) |
| Race | |||
| White | 400 (38.3%) | 61 (20.9%) | 461 (34.5%) |
| Black | 144 (13.8%) | 39 (13.4%) | 183 (13.7%) |
| Asian | 109 (10.4%) | 115 (39.4%) | 224 (16.8%) |
| Other | 220 (21.1%) | 47 (16.1%) | 267 (20.0%) |
| Not Specified | 172 (16.5%) | 30 (10.3%) | 202 (15.1%) |
| Sex | |||
| Male | 622 (59.5%) | 157 (53.8%) | 779 (58.3%) |
| BMI (kg/m^2) | |||
| <25 | 351 (33.6%) | 137 (46.9%) | 488 (36.5%) |
| 25 to <30 | 347 (33.2%) | 78 (26.7%) | 425 (31.8%) |
| >=30 | 332 (31.8%) | 69 (23.6%) | 401 (30.0%) |
| Missing | 15 (1.4%) | 8 (2.7%) | 23 (1.7%) |
| Active and/or former smoker/vaper | 295 (28.2%) | 86 (29.5%) | 381 (28.5%) |
| Required supplemental oxygen within the first 3 h of arrival | 576 (55.1%) | 142 (48.6%) | 718 (53.7%) |
| Diabetes Mellitus (DMI, DMII) | 321 (30.7%) | 96 (32.9%) | 417 (31.2%) |
| Hypertension (HTN) | 594 (56.8%) | 170 (58.2%) | 764 (57.1%) |
| Chronic Obstructive Pulmonary Disease (COPD) | 51 (4.9%) | 26 (8.9%) | 77 (5.8%) |
| Chronic Kidney Disease (CKD) | 51 (4.9%) | 16 (5.5%) | 67 (5.0%) |
| End Stage Renal Disease (ESRD) | 73 (7.0%) | 16 (5.5%) | 89 (6.7%) |
| Coronary Artery Disease (CAD) | 157 (15.0%) | 47 (16.1%) | 204 (15.3%) |
| Any Cancer | 85 (8.1%) | 13 (4.5%) | 98 (7.3%) |
| Any Immunosuppression | 35 (3.3%) | 0 (0%) | 35 (2.6%) |
| Fever | 734 (70.2%) | 168 (57.5%) | 902 (67.5%) |
| Cough | 724 (69.3%) | 191 (65.4%) | 915 (68.4%) |
| Diarrhea | 279 (26.7%) | 81 (27.7%) | 360 (26.9%) |
Fig. 2. Examples of LGMM clusters of trajectories: FiO2 (left), level of supplemental oxygen (center), respiratory rate (right). Red is Cluster 1 and blue is Cluster 2. Solid line: locally weighted scatterplot smoothing (LOESS) curve. Shades: 95% confidence interval.
Cross-validated C-indices: Training Cohort (NYP-WCM).
| Method | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| RSF | 0.698 | 0.933 | 0.943 |
| CART | 0.725 | 0.813 | 0.850 |
| Cox-Elastic Net | 0.724 | 0.839 | 0.894 |
| RSF | 0.690 | 0.920 | 0.927 |
| CART | 0.688 | 0.689 | 0.831 |
| Cox-Elastic Net | 0.687 | 0.792 | 0.868 |
| RSF | 0.648 | 0.852 | 0.891 |
| CART | 0.661 | 0.565 | 0.821 |
| Cox-Elastic Net | 0.660 | 0.708 | 0.845 |
‘hrs’=hours since index time, where index time is contingent on the model scenario.
‘CART’=Classification and Regression Tree for survival.
‘Cox-Elastic Net’=Cox proportional hazards regression with predictors chosen through elastic net regression.
‘Model 1’=predictors at ED presentation alone; index time defined at 24 h of hospitalization.
‘Model 2’=predictors at ED presentation including longitudinal laboratory and vital sign features extracted up to 24 h; index time defined at 24 h of hospitalization.
‘Model 3’=predictors at ED presentation including longitudinal laboratory and vital sign features extracted up to random index time; index time defined at random time as discussed in text.
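For readers unfamiliar with the metric in this table, the sketch below computes a concordance index (C-index) for right-censored data in the style of Harrell's estimator. Whether the authors used exactly this estimator is an assumption, and the function name `concordance_index` is illustrative. A pair of patients is treated as comparable when the one with the shorter observed time had the event; pairs with tied times are skipped for simplicity.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the higher-risk patient fails first.

    times: observed time to event or censoring.
    events: 1 if the event (deterioration) was observed, 0 if censored.
    risks: predicted risk scores; higher means earlier expected event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the entries in the table should be read.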
Comparative predictive performance of features (c-index).
| Method | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| RSF | 0.698 | 0.892 | 0.807 | 0.924 | 0.892 | 0.941 | 0.911 | 0.933 |
| RSF | 0.690 | 0.850 | 0.793 | 0.894 | 0.869 | 0.924 | 0.895 | 0.920 |
| RSF | 0.648 | 0.776 | 0.745 | 0.827 | 0.803 | 0.839 | 0.835 | 0.852 |
All models employed 24-hour index time.
‘RSF’=Random Survival Forest.
‘Model 1’=baseline features.
‘Model 2’=baseline features, missing lab indicators.
‘Model 3’=baseline features, trend features.
‘Model 4’=baseline features, cluster features.
‘Model 5’=baseline features, missing lab indicators, trend features.
‘Model 6’=baseline features, missing lab indicators, cluster features.
‘Model 7’=baseline features, trend features, cluster features.
‘Model 8’=baseline features, missing lab indicators, trend features, cluster features.
Fig. 3. Predictor importance from RSF on training cohort (NYP-WCM). The most important predictors, defined as those that together explain 70% of the total cumulative importance, are shown. ‘age_gt65’=≥65 years of age. For labs and vitals, predictors are labeled as ‘
Fig. 4. Kaplan-Meier estimates by risk profile: training cohort (NYP-WCM).
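The Kaplan-Meier estimate behind curves of this kind can be sketched in a few lines. The code below is a generic product-limit estimator on toy data, not the authors' analysis code; the function name `kaplan_meier` and the example values are hypothetical. In practice it would be applied separately to the patients in each risk profile.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: observed time to event or censoring.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their censoring time and then drop out without lowering the curve, which is why the estimate can be used when some patients are discharged or follow-up ends before any deterioration is observed.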
Patient Characteristics at ED Presentation by Risk Profile: Training Cohort (NYP-WCM).
| Characteristic | Early deterioration (N = 105) | Late deterioration (N = 626) | No deterioration (N = 314) |
|---|---|---|---|
| Age | |||
| >=65 | 68 (64.8%) | 369 (58.9%) | 114 (36.3%) |
| Race | |||
| White | 39 (37.1%) | 241 (38.5%) | 120 (38.2%) |
| Black | 12 (11.4%) | 89 (14.2%) | 43 (13.7%) |
| Asian | 7 (6.7%) | 72 (11.5%) | 30 (9.6%) |
| Other | 26 (24.8%) | 121 (19.3%) | 73 (23.2%) |
| Not Specified | 21 (20.0%) | 103 (16.5%) | 48 (15.3%) |
| Sex | |||
| Male | 63 (60.0%) | 395 (63.1%) | 164 (52.2%) |
| BMI (kg/m^2) | |||
| <25 | 34 (32.4%) | 228 (36.4%) | 89 (28.3%) |
| 25 to <30 | 29 (27.6%) | 202 (32.3%) | 116 (36.9%) |
| >=30 | 40 (38.1%) | 187 (29.9%) | 105 (33.4%) |
| Missing | 2 (1.9%) | 9 (1.4%) | 4 (1.3%) |
| Active and/or former smoker/vaper | 37 (35.2%) | 184 (29.4%) | 74 (23.6%) |
| Required supplemental oxygen within the first 3 h of arrival | 88 (83.8%) | 365 (58.3%) | 123 (39.2%) |
| Diabetes Mellitus (DMI, DMII) | 38 (36.2%) | 206 (32.9%) | 77 (24.5%) |
| Hypertension (HTN) | 67 (63.8%) | 390 (62.3%) | 137 (43.6%) |
| Chronic Obstructive Pulmonary Disease (COPD) | 8 (7.6%) | 36 (5.8%) | 7 (2.2%) |
| Chronic Kidney Disease (CKD) | 6 (5.7%) | 38 (6.1%) | 7 (2.2%) |
| End Stage Renal Disease (ESRD) | 4 (3.8%) | 55 (8.8%) | 14 (4.5%) |
| Coronary Artery Disease (CAD) | 24 (22.9%) | 108 (17.3%) | 25 (8.0%) |
| Any Cancer | 8 (7.6%) | 59 (9.4%) | 18 (5.7%) |
| Any Immunosuppression | 3 (2.9%) | 24 (3.8%) | 8 (2.5%) |
| Fever | 77 (73.3%) | 433 (69.2%) | 224 (71.3%) |
| Cough | 72 (68.6%) | 432 (69.0%) | 220 (70.1%) |
| Diarrhea | 29 (27.6%) | 161 (25.7%) | 89 (28.3%) |
| Nausea or vomiting | 16 (15.2%) | 103 (16.5%) | 83 (26.4%) |
Fig. 5. Kaplan-Meier estimates by risk profile: validation cohort (NYP-LMH).