Sulaiman Somani, Adam J Russak, Akhil Vaid, Jessica K De Freitas, Fayzan F Chaudhry, Ishan Paranjpe, Kipp W Johnson, Samuel J Lee, Riccardo Miotto, Felix Richter, Shan Zhao, Noam D Beckmann, Nidhi Naik, Arash Kia, Prem Timsina, Anuradha Lala, Manish Paranjpe, Eddye Golden, Matteo Danieletto, Manbir Singh, Dara Meyer, Paul F O'Reilly, Laura Huckins, Patricia Kovatch, Joseph Finkelstein, Robert M Freeman, Edgar Argulian, Andrew Kasarskis, Bethany Percha, Judith A Aberg, Emilia Bagiella, Carol R Horowitz, Barbara Murphy, Eric J Nestler, Eric E Schadt, Judy H Cho, Carlos Cordon-Cardo, Valentin Fuster, Dennis S Charney, David L Reich, Erwin P Bottinger, Matthew A Levin, Jagat Narula, Zahi A Fayad, Allan C Just, Alexander W Charney, Girish N Nadkarni, Benjamin S Glicksberg.
Abstract
BACKGROUND: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking.
Keywords: COVID-19; EHR; TRIPOD; clinical informatics; cohort; electronic health record; hospital; machine learning; mortality; performance; prediction
Year: 2020 PMID: 33027032 PMCID: PMC7652593 DOI: 10.2196/24018
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Study design and workflow. (A) Procedure for patient inclusion in our study. (B) Outcomes of interest. We trained the model on data taken at the time of admission to predict the likelihood of either mortality or critical event occurrence at 3, 5, 7, and 10 days. (C) Strategy and design of the experiments. Patient clinical data from Mount Sinai Hospital (MSH) before the temporal split (May 1) were used to train and internally validate our XGBoost model in comparison with other baseline models. We then tested the series of XGBoost models on unimputed data from patients at four other hospitals within the MSHS for external validation. h: hours; ICU: intensive care unit; lab: laboratory; MSB: Mount Sinai Brooklyn; MSHS: Mount Sinai Health System; MSM: Mount Sinai Morningside; MSQ: Mount Sinai Queens; MSW: Mount Sinai West; RT-PCR: reverse transcriptase–polymerase chain reaction; vitals: vital signs.
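The cohort assignment and outcome windows described in panels B and C can be sketched in a few lines of Python. This is an illustration only, not the study's code: the record fields (`hospital`, `admitted`, `days_to_event`), the helper names, the year 2020 for the May 1 cutoff, and the choice to treat admissions on May 1 itself as prospective are all assumptions made for the example.

```python
from datetime import date

# Assumed cutoff: the caption says "May 1"; the year is inferred from the
# publication year and is not stated in the caption itself.
TEMPORAL_SPLIT = date(2020, 5, 1)
HORIZONS_DAYS = (3, 5, 7, 10)  # prediction windows named in the caption


def assign_cohort(hospital, admitted):
    """Map one admission to a cohort label such as 'retrospective-MSH'.

    Admissions before the split are retrospective, later ones prospective.
    MSH is the development site; every other site is grouped as 'OH'.
    """
    site = "MSH" if hospital == "MSH" else "OH"
    period = "retrospective" if admitted < TEMPORAL_SPLIT else "prospective"
    return f"{period}-{site}"


def outcome_labels(days_to_event):
    """Binary label per horizon: 1 if the event occurred within that many days.

    `days_to_event` is None when no event was observed.
    """
    return {
        h: int(days_to_event is not None and days_to_event <= h)
        for h in HORIZONS_DAYS
    }


# Hypothetical records, for illustration only.
patients = [
    {"hospital": "MSH", "admitted": date(2020, 4, 2), "days_to_event": 4},
    {"hospital": "MSQ", "admitted": date(2020, 4, 20), "days_to_event": None},
    {"hospital": "MSH", "admitted": date(2020, 5, 6), "days_to_event": 9},
]

for p in patients:
    cohort = assign_cohort(p["hospital"], p["admitted"])
    print(cohort, outcome_labels(p["days_to_event"]))
```

Under this layout, only the `retrospective-MSH` group would be used for training and internal validation; the other three groups serve as external or prospective test sets.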
Demographic characteristics, clinical history, and vital signs of hospitalized patients with COVID-19 at baseline (N=4098).
| Characteristic on admission | Retrospective: MSH^a (n=1514) | Retrospective: OH^b (n=2201) | Prospective: MSH | Prospective: OH |
|---|---|---|---|---|
| Sex, n (%) | | | | |
| Male | 869 (57.4) | 1257 (57.1) | 104 (59.4) | 104 (50) |
| Female | 645 (42.6) | 944 (42.9) | 71 (40.6) | 104 (50) |
| Race, n (%) | | | | |
| Other | 639 (42.2) | 804 (36.5) | 80 (45.7) | 53 (25.5) |
| Caucasian | 354 (23.4) | 533 (24.2) | 43 (24.6) | 56 (26.9) |
| African American | 357 (23.6) | 688 (31.3) | 37 (21.1) | 79 (38) |
| Unknown | 80 (5.3) | 45 (2) | —^c | — |
| Asian | 77 (5.1) | 102 (4.6) | 10 (5.7) | 11 (5.3) |
| Pacific Islander | — | — | — | — |
| Ethnicity, n (%) | | | | |
| Non-Hispanic/Latino | 820 (54.2) | 1377 (62.6) | 98 (56) | 139 (66.8) |
| Hispanic/Latino | 421 (27.8) | 556 (25.3) | 50 (28.6) | 43 (20.7) |
| Unknown | 271 (17.9) | 236 (10.7) | 24 (13.7) | 26 (12.5) |
| Age, median (IQR) | 62.9 (50.7-73) | 69.6 (53.3-80) | 63.7 (51.2-73.8) | 69.8 (55.5-79.9) |
| Age group (years), n (%) | | | | |
| 18-30 | 64 (4.2) | 46 (2.1) | 16 (9.1) | — |
| 31-40 | 155 (10.2) | 113 (5.1) | 13 (7.4) | 12 (5.8) |
| 41-50 | 165 (10.9) | 160 (7.3) | 14 (8) | 17 (8.2) |
| 51-60 | 291 (19.2) | 341 (15.5) | 33 (18.9) | 35 (16.8) |
| 61-70 | 394 (30) | 517 (20) | 40 (20) | 39 (20) |
| 71-80 | 258 (17) | 522 (23.7) | 41 (23.4) | 52 (25) |
| 81-90 | 142 (9.4) | 396 (18) | 13 (7.4) | 38 (18.3) |
| ≥90 | 45 (3) | 106 (5) | — | — |
| Clinical history, n (%) | | | | |
| Hypertension | 64 (4.2) | 46 (2.1) | 63 (40) | 83 (40) |
| Atrial fibrillation | 155 (10.2) | 113 (5.1) | 13 (7) | 21 (10) |
| Coronary artery disease | 165 (10.9) | 160 (7.3) | 32 (20) | 41 (20) |
| Heart failure | 291 (19.2) | 341 (15.5) | 26 (10) | 30 (10) |
| Stroke | 394 (30) | 517 (20) | 16 (9) | 10 (5) |
| Chronic kidney disease | 258 (17) | 522 (23.7) | 32 (20) | 43 (20) |
| Diabetes | 142 (9.4) | 396 (18) | 40 (20) | 54 (30) |
| Asthma | 45 (3) | 106 (5) | 11 (6) | — |
| Chronic obstructive pulmonary disease | 64 (4.2) | 46 (2.1) | 13 (7) | 11 (5) |
| Cancer | 158 (10) | 124 (6) | 43 (20) | 14 (7) |
| Vital signs, median (IQR) | | | | |
| Heart rate (beats per minute) | 87 (77-97) | 86 (76-98) | 85 (74-97.5) | 82 (72.8-96) |
| Pulse oximetry (%) | 96 (94-97) | 96 (94-98) | 97 (95-98) | 97 (96-98) |
| Respiration rate (breaths per minute) | 20 (18-21) | 18 (18-20) | 18 (18-20) | 18 (18-20) |
| Temperature (°F) | 98.7 (98-99.9) | 98.5 (97.7-99.3) | 98.1 (97.5-98.6) | 97.9 (97.3-98.6) |
| Systolic blood pressure (mm Hg) | 125 (112-140) | 125 (111-140) | 122 (111.5-138) | 127 (112.8-141.2) |
| Diastolic blood pressure (mm Hg) | 69 (61-78) | 72 (64-80) | 70 (60.5-78.5) | 72 (64-82) |
| BMI (kg/m²) | 28.1 (24.4-32.8) | 27.5 (24.2-32.5) | 25.92 (21.9-30.4) | 27.7 (23.4-32.1) |

^a MSH: Mount Sinai Hospital.
^b OH: other hospitals.
^c —: Values with fewer than 10 patients per field are censored to protect patient privacy.
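The censoring rule in the last footnote is simple to express in code. The sketch below is a hypothetical helper, not the authors' implementation: the function name, the one-decimal percentage format, and the use of the group size as denominator are assumptions.

```python
CENSOR_THRESHOLD = 10      # from the footnote: fewer than 10 patients per field
CENSORED_MARK = "\u2014"   # the dash the table uses for censored cells


def format_cell(count, group_size):
    """Return an 'n (%)' table cell, or the censor mark for small counts."""
    if count < CENSOR_THRESHOLD:
        return CENSORED_MARK
    pct = 100 * count / group_size
    return f"{count} ({pct:.1f})"


# 869 of the 1514 retrospective MSH patients were male (values from the table).
print(format_cell(869, 1514))
# A hypothetical count below the threshold is replaced by the censor mark.
print(format_cell(7, 1514))
```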
Admission laboratory parameters of hospitalized patients with COVID-19 at baseline (N=4098), median (IQR).
| Laboratory parameters | Retrospective: MSH^a (n=1514) | Retrospective: OH^b (n=2201) | Prospective: MSH | Prospective: OH |
|---|---|---|---|---|
| Sodium (mEq/L) | 138 (135-140) | 139 (136-142) | 139 (136-141) | 139 (136-141) |
| Potassium (mEq/L) | 4 (3.6-4.5) | 4.3 (3.9-4.7) | 4 (3.7-4.4) | 4.3 (3.8-4.6) |
| Creatinine (mg/dL) | 0.91 (0.7-1.5) | 1.01 (0.8-1.7) | 0.89 (0.7-1.6) | 1.12 (0.7-2.1) |
| Lactate (mg/dL) | 1.8 (1.4-2.3) | 1.4 (1.1-2) | 1.8 (1.4-2.3) | 1.49 (1-1.9) |
| White blood cells (10^3/µL) | 7 (5-10.2) | 7.6 (5.5-10.9) | 7.3 (5.1-10.7) | 8.3 (6.3-11.9) |
| Lymphocyte percentage | NA (NA-NA) | 14.2 (8.6-21.3) | NA (NA-NA) | 14.7 (9.9-21.6) |
| Hemoglobin (mEq/L) | 12.2 (10.7-13.5) | 12.7 (11.1-13.9) | 10.5 (9.1-12.8) | 11.1 (9.2-12.8) |
| Red blood cell distribution width (%) | 4.2 (3.7-4.6) | 4.28 (3.8-4.7) | 3.69 (3.1-4.3) | 3.79 (3.2-4.5) |
| Platelets (n) | 220 (165-291) | 208 (158-281) | 224 (166.2-304) | 211 (149.2-285.2) |
| Alanine aminotransferase (units/L) | 30 (18-53) | 31 (19-54) | 26 (13.8-51) | 23 (14-36) |
| Aspartate aminotransferase (units/L) | 42 (28-66) | 45 (30-74) | 30 (20-50.5) | 30 (19-49) |
| Albumin (g/dL) | 2.9 (2.5-3.2) | 2.9 (2.5-3.2) | 2.9 (2.5-3.4) | 2.9 (2.3-3.3) |
| Total bilirubin (mg/dL) | 0.6 (0.4-0.8) | 0.6 (0.4-0.8) | 0.7 (0.4-1) | 0.5 (0.4-0.7) |
| Prothrombin time (s) | 14.5 (13.6-16) | 14.9 (13.9-16.5) | 14.8 (13.6-16.2) | 15.05 (13.7-17.6) |
| Partial thromboplastin time (s) | 32.9 (29.2-38.5) | 34.8 (30.3-41.5) | 32.6 (28.8-37.8) | 36.1 (31-45.9) |
| PCO2^c (mm Hg) | 42 (37-47) | 42 (37-53) | 44 (39-49) | 42 (37-48.5) |
| pH | 7.4 (7.3-7.4) | 7.36 (7.3-7.4) | 7.39 (7.4-7.4) | 7.36 (7.3-7.4) |
| C-reactive protein (mg/L) | 116.4 (57.1-199.5) | 132.4 (65.8-218.9) | 62.2 (17-148.9) | 73.7 (33.5-181.8) |
| Ferritin (ng/mL) | 800 (365-1916) | 906 (438-2056) | 485 (200.2-1031.5) | 690 (303.5-1470.2) |
| D-dimer (ng/mL) | 1.44 (0.8-3) | 2.42 (1.2-4.4) | 1.66 (0.9-3.1) | 1.97 (1.1-3.8) |
| Creatine phosphokinase (units/L) | 146 (70-488) | 220 (76.8-501.8) | 194.5 (93.2-290.8) | 271.5 (48.8-611.5) |
| Lactate dehydrogenase (units/L) | 423 (315-571) | 466.5 (356.2-652.2) | 334 (251.5-472) | 364 (266.8-487) |
| Troponin I (ng/mL) | 0.05 (0-0.2) | 0.064 (0-0.2) | 0.05 (0-0.1) | 0.0525 (0-0.1) |

^a MSH: Mount Sinai Hospital.
^b OH: other hospitals.
^c PCO2: partial pressure of carbon dioxide.
Figure 2. Comparison of the performance of the XGBoost and baseline models. Performance of the XGBoost classifier by ROC curves (left) and PR curves (right) on the unimputed data set (red) for mortality (top) and critical event (bottom) prediction versus the three baseline models: XGBoost classifier on the imputed data set (purple), LASSO (green), and LR (orange). LASSO: least absolute shrinkage and selection operator; LR: logistic regression; PRC: precision-recall curve; ROC: receiver operating characteristic; XGB: Extreme Gradient Boosting.
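ROC and precision-recall curves like those in Figure 2 are commonly summarized by the area under each curve. The following plain-Python sketch shows one way to compute both from binary labels (1 = event) and model scores. The function names and sample values are illustrative assumptions, not taken from the study, and a real analysis would normally use an established library implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Averages the precision observed at each true positive when cases are
    ranked by descending score. Tied scores are not handled specially.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical labels and scores for six patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))
print(round(average_precision(labels, scores), 3))
```

The precision-recall summary is the more informative of the two when events are rare, because it is not inflated by the large number of true negatives.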
Figure 3. Performance of the XGBoost classifier by ROC curves (left) and precision-recall curves (right) for mortality (top) and critical events (bottom) in validation experiments of generalizability and time. For generalizability, we show our XGBoost model from cross-validation on MSH and applied to all other hospitals. We also show the performance of the model on prospective patients who were unseen at the time of the original experiment at MSH and all other hospitals in the same time frame. Ext. Val.: external validation; Int. Val.: internal validation; MSH: Mount Sinai Hospital; OH: other hospitals; PRC: precision-recall curve; Prosp. Val.: prospective validation; ROC: receiver operating characteristic.
Figure 4. SHAP summary plots for critical event (A) and mortality (D) at 7 days showing the SHAP values for the 10 most important features for the respective XGBoost models. Features in the summary plots (y-axis) are organized by their mean absolute SHAP values (x-axis), which represent the importance of the features in driving the prediction of the classifiers for patients. (B) and (C) Dependency plots demonstrating how different values can affect the SHAP score and ultimately impact classifier decisions for LDH and glucose, respectively, for critical event prediction. (E) and (F) Dependency plots for age and C-reactive protein levels. Patients with missing values for a feature in the dependency plot are clustered in the shaded area to the left. LDH: lactate dehydrogenase; RDW: red cell distribution width; SHAP: SHapley Additive exPlanation.
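The feature ordering used in the summary plots, by mean absolute SHAP value, can be illustrated with a short sketch. It assumes the per-patient SHAP values have already been computed and are available as rows of numbers. The function name, the three feature names, and every value below are hypothetical.

```python
def rank_features_by_mean_abs_shap(shap_rows, feature_names, top_k=10):
    """Rank features by the mean absolute SHAP value across patients.

    `shap_rows` holds one row per patient and one column per feature.
    Returns (feature, importance) pairs, most important first.
    """
    n_patients = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_patients
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values for three patients and three features.
feature_names = ["age", "ldh", "glucose"]
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.40, -0.05],
    [0.10, -0.40, 0.02],
]

for name, value in rank_features_by_mean_abs_shap(shap_rows, feature_names):
    print(f"{name}: {value:.2f}")
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly in both directions, like `ldh` in this toy data, would otherwise average out to a small number and look unimportant.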