Elliot H Akama-Garren1, Jonathan X Li2,3. 1. Harvard Medical School, Boston, MA, 02115, USA. elliot_akama-garren@hms.harvard.edu. 2. Harvard Medical School, Boston, MA, 02115, USA. 3. Division of General Medicine, Beth Israel Deaconess Medical Center, Boston, MA, 02115, USA.
Abstract
There is currently limited clinical ability to identify COVID-19 patients at risk for severe outcomes. To unbiasedly identify metrics associated with severe outcomes in COVID-19 patients, we conducted a retrospective study of 835 COVID-19 positive patients at a single academic medical center between March 10, 2020 and October 13, 2020. As of December 1, 2020, 656 (79%) patients required hospitalization and 149 (18%) died. Unbiased comparisons of all clinical characteristics and mortality revealed that abnormal pH (OR 8.54, 95% CI 5.34-13.6), abnormal creatinine (OR 6.94, 95% CI 4.22-11.4), and abnormal PTT (OR 4.78, 95% CI 3.11-7.33) were most significantly associated with mortality. Correlation with ordinal severity scores confirmed these associations, in addition to associations between respiratory rate (Spearman's rho = -0.56), absolute neutrophil count (Spearman's rho = -0.5), and C-reactive protein (Spearman's rho = 0.59) with disease severity. Unsupervised principal component analysis and machine learning model classification of patient demographics, laboratory results, medications, comorbidities, signs and symptoms, and vitals are capable of separating patients on the basis of COVID-19 mortality (AUC 0.82). This retrospective analysis identifies laboratory and clinical metrics most relevant to predict COVID-19 severity.
As the number of COVID-19 deaths approaches 3.5 million worldwide as of May 11, 2021, there is increasing need to better understand what disease mechanisms and clinical correlates lead to poor outcomes. SARS-CoV-2 infection may result in a spectrum of severity ranging from asymptomatic disease to hospitalization requiring mechanical ventilation [1-7], making identification of patients at risk for severe COVID-19 at initial presentation imperative yet complex. Case series of hospitalized COVID-19 patients during the early pandemic identified key risk groups of severe COVID-19 [8-17], including patients with diabetes, obesity, chronic kidney disease, liver disease, and patients above 65 years old. Cytokine profiling [18] and multi-dimensional flow cytometry [19-22] have identified hematologic profiles associated with severe COVID-19. Over the course of the pandemic, these advances along with improvements in supportive care such as prone positioning [23-25] have led to reductions in disease mortality [26, 27].
Despite these advances, clinical prediction of COVID-19 prognosis at the time of initial presentation remains imperfect [28]. A better understanding of the clinical correlates of COVID-19 severity would improve prognostic and therapeutic approaches to disease assessment. With an accumulating number of SARS-CoV-2 positive patients with a range of clinical outcomes, we are increasingly able to perform unbiased analyses across more diverse multi-dimensional clinical metrics, in order to identify novel associations with COVID-19 severity. We sought to leverage these data to determine which clinical characteristics are most useful to predict COVID-19 severity. Here, we perform analyses of over 1,700 clinical metrics including laboratory results, vitals, demographics, medications, and disease outcomes in 835 COVID-19 positive patients to identify correlates of disease severity.
Methods
Study design
This study was conducted at the Beth Israel Deaconess Medical Center (BIDMC) in Boston. The BIDMC Institutional Review Board approved this retrospective cohort study (2020P000699) as minimal risk using data collected during routine clinical care and waived the requirement for informed consent. BIDMC patients who presented for care and with confirmed SARS-CoV-2 infection by positive result of nasopharyngeal sample polymerase chain reaction between March 10, 2020 and October 13, 2020, and who had available past medical history, were included.
Data were obtained from the BIDMC COVID-19 Observational Research Effort (CORE) Data Registry REDCap database and BIDMC InSIGHT CORE service. Laboratory values were obtained from inpatient data acquired over the course of an individual patient’s admission. When multiple laboratory draws were present over the course of a patient’s admission, mean, maximum, and minimum laboratory values for each test collected were calculated for each patient. Time to follow-up was determined by the number of days between the earliest COVID-19 test date and date of death or December 1, 2020, the final date of follow-up, if still alive. COVID-19 severity was graded by the NIH Ordinal Severity Scale. Patients were stratified into eight groups with lower scores corresponding to greater severity: (1) death, (2) invasive mechanical ventilation, (3) noninvasive ventilation, (4) supplemental oxygen, (5) no supplemental oxygen but requiring medical care, (6) no supplemental oxygen and not requiring medical care, (7) limitation in activities, or (8) no limitation in activities.
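The per-patient laboratory summaries and the follow-up interval described above can be illustrated with a short Python sketch. This is not the authors' pipeline (their analyses were performed in R); the function names, the tuple layout of the draws, and the example values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

CENSOR_DATE = date(2020, 12, 1)  # final date of follow-up stated in the text


def summarize_labs(draws):
    """Collapse repeated draws into per-patient, per-test mean/max/min.

    `draws` is an iterable of (patient_id, test_name, value) tuples
    (a hypothetical layout, not the registry's actual schema).
    """
    grouped = defaultdict(list)
    for patient_id, test_name, value in draws:
        grouped[(patient_id, test_name)].append(value)
    return {
        key: {"mean": mean(values), "max": max(values), "min": min(values)}
        for key, values in grouped.items()
    }


def follow_up_days(first_test, death=None, censor=CENSOR_DATE):
    """Days from the earliest positive test to death, or to censoring if alive."""
    end = death if death is not None else censor
    return (end - first_test).days


# Hypothetical draws for illustration only (not study data).
example_draws = [
    ("patient_a", "creatinine", 1.0),
    ("patient_a", "creatinine", 2.0),
    ("patient_a", "creatinine", 3.0),
    ("patient_b", "creatinine", 0.8),
]
lab_summary = summarize_labs(example_draws)
days_if_alive = follow_up_days(date(2020, 3, 10))
```

A patient tested on the first day of enrollment and still alive at censoring would contribute the longest possible follow-up under this definition.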
Principal component analysis (PCA)
Outcome metrics including mortality, hospitalization length and status, ICU length and status, ventilation and renal replacement therapy requirement, NIH Ordinal Severity Score, pathology results, and medications prescribed after COVID-19 diagnosis were excluded to allow for unsupervised PCA. Patients and metrics with missing data were excluded from analysis, and categorical factor variables were converted to dummy numerical variables. Data were scaled to unit variance and principal component analysis was performed using factoextra (version 1.0.7). The top two principal components were used for two-dimensional mapping of patient data and variable eigenvectors.
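As a rough illustration of the preprocessing described above, the following Python sketch converts a categorical variable to dummy columns and scales a numeric variable to unit variance. It is an assumption-laden stand-in, not the factoextra workflow used in the study: one dummy column per level and the sample (n − 1) standard deviation are choices made here for illustration, and the example values are hypothetical.

```python
from statistics import mean, stdev


def one_hot(values):
    """Convert a categorical column into 0/1 dummy columns, one per level."""
    levels = sorted(set(values))
    return {level: [1.0 if v == level else 0.0 for v in values] for level in levels}


def scale_unit_variance(column):
    """Center a numeric column and scale it to unit sample variance."""
    mu = mean(column)
    sd = stdev(column)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("a constant column cannot be scaled to unit variance")
    return [(x - mu) / sd for x in column]


# Hypothetical values for illustration only (not study data).
abo_dummies = one_hot(["A", "O", "A"])
scaled_crp = scale_unit_variance([10.0, 20.0, 30.0])
```

Scaling matters here because laboratory values are recorded in very different units; without it, metrics with large numeric ranges would dominate the principal components.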
Machine learning classification
Mortality status was added to the data set used for PCA to allow for construction of a supervised machine learning classifier. All machine learning analyses were performed in R (version 3.6.1). Training and test data sets were created using the createDataPartition function in caret (version 6.0), with 75% of patients allocated to the training data set. Training data were preprocessed by centering and scaling and training was performed using ten separate tenfold repeated cross-validations for resampling. A gradient boosting machine model [29, 30] was built using 100 trees, a tree complexity of 2, and a learning rate of 0.1 using the train function in caret. Training performance was measured using area under the ROC curve, and variable importance was calculated using the varImp function in caret. Model performance was tested on the test data set and evaluated using MLeval (version 0.3).
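The 75%/25% partition can be sketched in Python as a split stratified by outcome, so that both sets keep a similar proportion of deaths. This is a simplified analogue written for illustration, not a reimplementation of caret's createDataPartition; the function name, the rounding rule, the seed, and the example labels are all assumptions.

```python
import math
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split indices into train/test sets, sampling within each outcome class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = math.ceil(train_fraction * len(indices))  # assumed rounding rule
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical cohort for illustration only: 40 survivors and 12 deaths.
example_labels = ["alive"] * 40 + ["dead"] * 12
train_idx, test_idx = stratified_split(example_labels)
```

Stratifying by mortality keeps the minority class represented in the held-out set, which matters when deaths are a small fraction of the cohort.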
Statistical analysis
All statistical analyses were performed in R (version 3.6.1). Bar graphs and violin plots were created using ggpubr (version 0.4.0), correlation plots were created using corrplot (version 0.84), Kaplan–Meier plots were created using survminer (version 0.4.8) and survival (version 3.2–7), and scatter plots and forest plots were created using ggplot2 (version 3.3.0). Heatmaps and hierarchical clustering were performed using pheatmap (version 1.0.12). Volcano plots were generated using EnhancedVolcano (version 1.4.0), and significant differences (absolute logFC > 0.2 and P-val < 0.05) were highlighted in red. When data were missing, these patients were not included in a given univariate analysis, eliminating potential confounding due to the presence or absence of a given clinical metric. When multiple comparisons were made, p values were corrected by the Benjamini–Hochberg procedure and a false discovery rate < 0.05 was considered significant.
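For readers unfamiliar with the Benjamini–Hochberg correction, a minimal Python sketch of the adjustment is given here. It follows the standard step-up definition (each sorted p value is multiplied by m/rank and monotonicity is enforced from the largest p value downward); the function name and the example p values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values never increase
    # as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
```

Under the threshold used in this study, a metric would be called significant when its adjusted value falls below 0.05.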
Results
Demographics, comorbidities, and outcomes of COVID-19 patients
A total of 835 patients with PCR-confirmed SARS-CoV-2 infection were included (Table 1). The median age was 64 years (IQR, 50–76 years; range, 17–102 years) and 438 (52%) were female. Of these patients, 363 (43%) were white and 253 (30%) were black. Past medical history was available for 549 patients and among these patients, common comorbidities included hypertension (347; 63%), diabetes (224; 41%), obesity (167; 30%), chronic kidney disease (144; 26%), and cancer (131; 24%). Active prescriptions at time of COVID-19 diagnosis were available for 697 patients, and among these the most common categories of prescribed drugs included antihypertensive drugs (500; 72%), antihistamines (324; 46%), and antiglycemic drugs (241; 35%). Most patients had an elevated temperature (median Tmax 100; IQR 99–100) and were tachypneic (median 19; IQR 18–21) but had normal heart rates (median 85; IQR 76–94). As of December 1, 2020, 656 (79%) patients required hospitalization, 336 (40%) required supplemental oxygen, 133 (16%) required intensive care unit (ICU) stays, and 196 (23%) required mechanical ventilation. Among patients who were hospitalized the median total length of stay was 9 days (IQR, 5–18 days) and among patients treated in the ICU the median length of stay in the ICU was 8 days (IQR, 3–17 days). NIH Ordinal Scoring was available for 322 patients, and mean ordinal score was 3.7 (SD 1.7). Overall, 149 (18%) patients had died at the time of censoring.
Table 1
Demographics, comorbidities, and outcomes of COVID-19 patients
Characteristic | Overall (N = 835) | Alive (N = 686) | Dead (N = 149) | P value
Gender | | | | 0.117
  Female | 438 (52%) | 369 (54%) | 69 (46%) |
  Male | 397 (48%) | 317 (46%) | 80 (54%) |
Age | 64 (50–76) | 61 (47–73) | 73 (63–84) | < 0.001
Race | | | | 0.0919
  Native American | 1 (0%) | 1 (0%) | 0 (0%) |
  Asian | 31 (4%) | 23 (3%) | 8 (5%) |
  Black | 253 (30%) | 206 (30%) | 47 (32%) |
  Declined | 1 (0%) | 1 (0%) | 0 (0%) |
  Native Hawaiian | 2 (0%) | 1 (0%) | 1 (1%) |
  Other | 90 (11%) | 84 (12%) | 6 (4%) |
  Unknown | 94 (11%) | 73 (11%) | 21 (14%) |
  White | 363 (43%) | 297 (43%) | 66 (44%) |
ABO Type | | | | 0.487
  A | 71 (9%) | 43 (6%) | 28 (19%) |
  AB | 11 (1%) | 8 (1%) | 3 (2%) |
  B | 40 (5%) | 29 (4%) | 11 (7%) |
  O | 104 (12%) | 63 (9%) | 41 (28%) |
  Missing | 609 (72.9%) | 543 (79.2%) | 66 (44.3%) |
BMI | 29 (25–34) | 29 (25–34) | 30 (24–36) | 0.905
Comorbidities available | 549 (66%) | 459 (67%) | 90 (60%) |
  Hypertension | 347 (63%) | 278 (61%) | 69 (77%) | 0.00549
  Chronic kidney disease | 144 (26%) | 107 (23%) | 37 (41%) | < 0.001
  Diabetes | 224 (41%) | 172 (37%) | 52 (58%) | < 0.001
  Obesity | 167 (30%) | 136 (30%) | 31 (34%) | 0.434
  Rheumatologic disease | 127 (23%) | 100 (22%) | 27 (30%) | 0.12
  Autoimmune disease | 49 (9%) | 43 (9%) | 6 (7%) | 0.535
  Cancer | 131 (24%) | 98 (21%) | 33 (37%) | 0.00287
  Immunosuppressive disease | 128 (23%) | 103 (22%) | 25 (28%) | 0.338
  COPD | 72 (13%) | 54 (12%) | 18 (20%) | 0.0517
  Asthma | 81 (15%) | 66 (14%) | 15 (17%) | 0.691
  Coronary artery disease | 130 (24%) | 97 (21%) | 33 (37%) | 0.00241
  Cerebrovascular disease | 67 (12%) | 46 (10%) | 21 (23%) | < 0.001
Medications available | 697 (83%) | 568 (83%) | 129 (87%) |
  Corticosteroid | 179 (26%) | 140 (25%) | 39 (30%) | 0.231
  Calcineurin inhibitors | 16 (2%) | 12 (2%) | 4 (3%) | 0.726
  Antirheumatic therapy | 9 (1%) | 7 (1%) | 2 (2%) | 1
  Immunosuppressive therapy | 46 (7%) | 32 (6%) | 14 (11%) | 0.0361
  Chemotherapy | 26 (4%) | 19 (3%) | 7 (5%) | 0.385
  Antiglycemic therapy | 241 (35%) | 192 (34%) | 49 (38%) | 0.424
  Asthma therapy | 227 (33%) | 178 (31%) | 49 (38%) | 0.177
  Biologics | 1 (0%) | 1 (0%) | 0 (0%) | 1
  Osteoporosis therapy | 13 (2%) | 9 (2%) | 4 (3%) | 0.43
  Antihypertensive therapy | 500 (72%) | 392 (69%) | 108 (84%) | 0.00119
Labs | | | |
  Absolute lymphocyte count (10^6/mL) | 1.2 (0.83–1.6) | 1.2 (0.89–1.6) | 0.97 (0.67–1.3) | < 0.001
  C-Reactive protein (mg/L) | 94 (52–150) | 82 (42–130) | 140 (100–180) | < 0.001
  Creatinine (mg/dL) | 1.0 (0.73–1.7) | 0.91 (0.70–1.3) | 1.8 (1.1–2.8) | < 0.001
  Ferritin (ng/mL) | 680 (300–1500) | 570 (260–1200) | 1400 (570–2900) | < 0.001
  D-Dimer (ng/mL FEU) | 1300 (720–2700) | 1100 (650–2200) | 2400 (1200–4800) | < 0.001
  Creatine kinase (IU/L) | 150 (69–380) | 140 (65–360) | 170 (82–530) | 0.0455
  INR | 1.2 (1.1–1.4) | 1.2 (1.1–1.3) | 1.3 (1.2–1.5) | < 0.001
  Lactate dehydrogenase (IU/L) | 330 (260–430) | 320 (240–400) | 420 (310–560) | < 0.001
  pH | 7.1 (6.7–7.3) | 7.0 (6.5–7.3) | 7.2 (7.0–7.3) | 0.00105
  Platelet count (10^6/mL) | 230 (180–310) | 250 (190–320) | 190 (140–260) | < 0.001
  PT (s) | 13 (12–15) | 13 (12–15) | 14 (13–17) | < 0.001
  PTT (s) | 35 (30–55) | 33 (29–47) | 53 (35–70) | < 0.001
  Absolute neutrophil count (10^6/mL) | 5.5 (3.8–8.2) | 5.0 (3.6–7.3) | 7.8 (5.1–12) | < 0.001
  A1c (%) | 7.7 (6.4–9.3) | 7.6 (6.3–9.3) | 7.8 (7.2–8.8) | 0.629
Vitals | | | |
  Respiratory rate | 19 (18–21) | 19 (18–20) | 23 (20–25) | < 0.001
  Heart rate | 85 (76–94) | 84 (74–93) | 89 (82–98) | < 0.001
  Tmax | 100 (99–100) | 100 (99–100) | 100 (100–100) | < 0.001
  SBP (minimum) | 99 (91–110) | 99 (92–110) | 95 (84–110) | 0.0275
  DBP (minimum) | 57 (49–65) | 57 (50–65) | 54 (44–63) | 0.00543
Status | | | | < 0.001
  Inpatient | 656 (79%) | 510 (74%) | 146 (98%) |
  Outpatient | 179 (21%) | 176 (26%) | 3 (2%) |
Outcomes | | | |
  Supplemental O2 | 336 (40%) | 263 (38%) | 73 (49%) | 0.0208
  Mechanical ventilation | 196 (23%) | 106 (15%) | 90 (60%) | < 0.001
  Total encounters | 1.0 (1.0–1.0) | 1.0 (1.0–1.0) | 1.0 (1.0–1.0) | 0.261
  Length admission | 9.0 (5.0–18) | 8.0 (4.0–19) | 12 (7.0–17) | < 0.001
  Ordinal score | 4.0 (2.0–5.0) | 4.0 (4.0–5.0) | 1.0 (1.0–1.0) | < 0.001
  ICU admission | 133 (16%) | 87 (13%) | 46 (31%) | < 0.001
  ICU days | 8.0 (3.0–17) | 8.0 (2.0–19) | 9.0 (4.0–14) | < 0.001
Continuous data presented as median (IQR). P values computed by Chi-squared test for categorical data and Wilcoxon rank-sum test for continuous data
Clinical predictors of COVID-19 outcomes
To validate our ability to identify risk factors for COVID-19 severity, we compared mortality rates among currently recognized comorbidities for COVID-19 (Fig. 1A). In our cohort, hypertension (OR 2.14, 95% CI 1.27–3.60), chronic kidney disease (OR 2.30, 95% CI 1.44–3.68), cerebrovascular disease (OR 2.73, 95% CI 1.54–4.84), diabetes (OR 2.28, 95% CI 1.44–3.60), coronary artery disease (OR 2.16, 95% CI 1.33–3.50), and cancer (OR 2.13, 95% CI 1.32–3.45) were associated with COVID-19 mortality. Risks for hospitalization included hypertension (OR 2.42, 95% CI 1.64–3.57), male gender (OR 1.69, 95% CI 1.21–2.38), diabetes (OR 2.17, 95% CI 1.43–3.29), chronic kidney disease (OR 2.42, 95% CI 1.44–3.68), coronary artery disease (OR 2.59, 95% CI 1.51–4.42), and COPD (OR 3.63, 95% CI 1.65–7.96), whereas risks for ICU admission only included male gender (OR 2.17, 95% CI 1.42–3.31) and diabetes (OR 2.27, 95% CI 1.35–3.81). Notably, male gender was not significantly associated with mortality among COVID-19 patients in our cohort (OR 1.35, 95% CI 0.95–1.93).
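The odds ratios above can be approximately reproduced from the counts in Table 1. The Python sketch below computes an odds ratio with a Wald-type 95% confidence interval from a 2 × 2 table; the Wald interval is an assumption made here, since the text does not state which interval method was used, and the function name is hypothetical. The hypertension counts are taken from Table 1 among the 549 patients with available comorbidity data (69 of 90 deaths and 278 of 459 survivors had hypertension).

```python
import math


def odds_ratio_wald(exposed_dead, exposed_alive, unexposed_dead, unexposed_alive, z=1.96):
    """Odds ratio and Wald confidence interval from 2 x 2 counts."""
    odds_ratio = (exposed_dead * unexposed_alive) / (exposed_alive * unexposed_dead)
    # Standard error of the log odds ratio.
    se = math.sqrt(
        1 / exposed_dead + 1 / exposed_alive + 1 / unexposed_dead + 1 / unexposed_alive
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypertension counts from Table 1 (patients with comorbidity data available).
htn_or, htn_lower, htn_upper = odds_ratio_wald(
    exposed_dead=69,
    exposed_alive=278,
    unexposed_dead=90 - 69,
    unexposed_alive=459 - 278,
)
```

Under these assumptions the sketch gives an odds ratio close to the reported 2.14 with an interval close to 1.27–3.60; small differences in the upper bound may reflect a different interval method in the original analysis.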
Fig. 1
Univariate analyses identify key laboratory parameters associated with mortality in COVID-19 patients. A Forest plot comparing odds ratios of selected comorbidities with mortality, hospitalization, and ICU admission in COVID-19 patients. Horizontal lines indicate 95% CI. B Volcano plots of odds ratios of laboratory results, demographics, medications, comorbidities, and signs and symptoms with mortality, hospitalization, and ICU admission in COVID-19 patients. P values corrected for multiple comparisons by Benjamini–Hochberg procedure and significant metrics (P-adj < 0.05) indicated in red. C Heatmap of adjusted p values from Mann–Whitney U tests for continuous laboratory values and demographic information between patients requiring or not requiring ICU admission, supplemental oxygen, mechanical ventilation, hospitalization, and death. Metrics significantly altered between alive and dead patient cohorts are shown and arranged by increasing adjusted p value. D Violin plots of the most significantly altered clinical metrics between alive and dead patient cohorts. Mann–Whitney U test p value shown
In order to unbiasedly compare the relative association of clinical characteristics with COVID-19 outcomes, we calculated the odds ratios among binary categorical clinical metrics measured, including laboratory results, demographics, medications, comorbidities, and signs and symptoms (Fig. 1B). Mortality was most significantly associated with abnormal pH (OR 8.54, 95% CI 5.34–13.6), abnormal creatinine (OR 6.94, 95% CI 4.22–11.4), and abnormal PTT (OR 4.78, 95% CI 3.11–7.33). Hospitalization was most significantly associated with abnormal D-dimer (OR 8.87, 95% CI 4.18–18.8), NSAID use (OR 0.24, 95% CI 0.15–0.38), and abnormal C-reactive protein (OR 6.43, 95% CI 3.30–12.5), and ICU admission was associated with requiring supplemental oxygen at admission (OR 8.34, 95% CI 4.91–14.1), abnormal pH (OR 13.1, 95% CI 7.71–22.5), and abnormal PTT (OR 7.36, 95% CI 4.42–12.2).
We next sought to compare the relative association between continuous variables and COVID-19 outcomes. Mann–Whitney U tests between mortality and laboratory values and demographic information revealed that elevated creatinine was most significantly associated with mortality (average maximum creatinine 3.97 in dead vs 1.97 in alive, adjusted P-val < 2 × 10^−16) (Fig. 1C). Other significant associations with mortality included decreased albumin (average minimum albumin 2.50 in dead vs 3.22 in alive, adjusted P-val < 2 × 10^−16), decreased lymphocyte count (average minimum lymphocytes 7.53 in dead vs 13.56 in alive, adjusted P-val < 2 × 10^−16), elevated phosphate (average maximum phosphate 6.50 in dead vs 4.61 in alive, adjusted P-val < 2 × 10^−16), and older age (average age 71.9 years in dead vs 59.5 in alive, adjusted P-val = 8.6 × 10^−16) (Fig. 1D). Comparisons in hospitalization, ventilation, oxygen requirement, and ICU admission patient groups revealed similar associations between abnormal creatinine, albumin, lymphocytes, and phosphate and COVID-19 outcomes (Fig. 1C). These results suggest that laboratory abnormalities might be more informative in predicting outcomes from COVID-19 than patient demographic information including comorbidities.
To quantify and rank the effects of clinical metrics on time to death following COVID-19 diagnosis, we performed Kaplan–Meier analysis of patient survival using positive COVID-19 test date and date of death. Among the 149 (18%) patients who died, the median survival time after COVID-19 diagnosis was 13 days (IQR, 7–28 days) (Fig. 2A). Regression analysis of demographics, laboratory results, medications, comorbidities, and vitals against survival probability revealed that abnormal pH (HR 6.5, 95% CI 4.2–10), stratified age groups (HR 1.5, 95% CI 1.3–1.7), abnormal albumin (HR 3.6, 95% CI 2.4–5.5), and abnormal phosphate (HR 4.7, 95% CI 2.7–8.1) were most significantly associated with increased risk of COVID-19 death (Fig. 2B). These risks are greater than those associated with currently accepted comorbidities for severe COVID-19 in our cohort, such as hypertension (HR 2.0, 95% CI 1.2–3.3), diabetes (HR 2.1, 95% CI 1.4–3.3), and chronic kidney disease (HR 2.2, 95% CI 1.4–3.3) (Fig. 2C). Neither race (HR 0.99, 95% CI 0.92–1.1) nor gender (HR 1.3, 95% CI 0.91–1.7) was significantly associated with decreased survival following COVID-19 diagnosis in our cohort.
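The Kaplan–Meier estimate underlying this survival analysis can be sketched in a few lines of Python. This is a generic product-limit estimator written for illustration, not the survival/survminer code used in the study; the function name and the toy follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `times` are follow-up days; `events` are 1 if death was observed and
    0 if the patient was censored. Returns (time, survival) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for illustration only (not study data).
toy_times = [7, 13, 13, 28, 266, 266]
toy_events = [1, 1, 0, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients leave the risk set without lowering the survival estimate, which is why patients alive at the final follow-up date still contribute information to the curve.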
Fig. 2
Unbiased identification of metrics most associated with increased risk of dying following COVID-19 diagnosis. A Kaplan–Meier plot of patient survival following COVID-19 diagnosis. B Volcano plot of hazard ratios (HR) calculated from unbiased Cox regression analysis between all measured patient metrics and patient survival following COVID-19 diagnosis. P values were calculated using the Wald test statistic and corrected for multiple comparisons by Benjamini–Hochberg procedure. Significant metrics (P-adj < 0.05) indicated in red. C Kaplan–Meier plots of patient survival following COVID-19 diagnosis stratified by indicated patient demographic or laboratory result. Log rank test p value indicated on plots and 95% CI indicated by shading
Clinical correlates of COVID-19 severity
To examine associations between clinical metrics and COVID-19 severity beyond binary categorical outcomes, we measured the correlation of each metric with NIH ordinal severity scores and total length of stay per patient (Fig. 3A). Ordinal score was most significantly correlated with maximum respiratory rate (Spearman’s rho = −0.56), maximum absolute neutrophil count (Spearman’s rho = −0.5), maximum C-reactive protein (Spearman’s rho = −0.52), and minimum albumin (Spearman’s rho = 0.5) (Fig. 3B). The total length of admission was most significantly correlated with maximum temperature (Spearman’s rho = 0.62), maximum phosphate (Spearman’s rho = 0.60), minimum hemoglobin (Spearman’s rho = −0.58), and minimum systolic blood pressure (Spearman’s rho = −0.53) (Fig. 3C). These results confirm our previous findings, suggesting that hematologic laboratory results are not only indicative of mortality in COVID-19 patients, but are also correlated with disease severity. These results also quantify the relative association of vitals such as respiratory rate and temperature with COVID-19 severity.
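Spearman's rho, used for these correlations, is the Pearson correlation of the ranked values. The Python sketch below uses average ranks for ties, which suits an ordinal outcome such as the severity score; the function names and the example values are hypothetical and are not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: higher respiratory rate paired with lower ordinal score.
toy_resp_rate = [18, 20, 24, 30, 36]
toy_ordinal = [7, 5, 4, 2, 1]
toy_rho = spearman_rho(toy_resp_rate, toy_ordinal)
```

Because lower ordinal scores denote greater severity, a metric that rises with severity yields a negative rho, as in this toy example.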
Fig. 3
Correlation between continuous clinical metrics and COVID-19 severity. A Ranked order plots of Spearman correlation coefficients between all clinical metrics and NIH ordinal score and total length of admission. Selected significant associations indicated on plot. B–C Scatter plots of correlation of selected clinical metrics and NIH ordinal score (B) or length of admission (C). Spearman correlation coefficient and p value indicated on plot, and regression line and 95% confidence interval indicated in blue
To determine relationships between multiple categorical and numerical outcomes and metrics, we performed correlation analysis across patient demographics, selected laboratory results, medications, comorbidities, vitals, and outcomes including continuous metrics of COVID-19 severity (Fig. 4). In addition to the associations noted previously, this analysis revealed significant correlations between COVID-19 outcomes and clinical interventions such as ICU admission and mechanical ventilation. As expected, comorbidities were highly correlated with prescriptions for appropriate medications (e.g., diabetes and antiglycemic drugs) as well as corresponding laboratory results (e.g., chronic kidney disease and mean creatinine). Notably, comorbidities were more closely associated with corresponding medications than COVID-19 outcomes, whereas laboratory values and vitals were more closely associated with COVID-19 outcomes than corresponding comorbidities. Overall, this correlation analysis revealed the heterogeneity of COVID-19 patient presentation, and the relative utility of a spectrum of patient information in predicting COVID-19 severity.
Fig. 4
Correlation analysis reveals heterogeneity and associations among COVID-19 patient characteristics and outcomes. Correlation plot of Spearman correlation coefficients between indicated clinical metrics and measures of disease outcomes among COVID-19 patients. Matrix display order was determined by angular order of eigenvectors. *P < 0.05, **P < 0.01, ***P < 0.001
Principal component analysis and machine learning classification segregates COVID-19 patients by mortality
To determine whether COVID-19 patients can be stratified by severity based on clinical metrics typically present at admission to the emergency department, we performed unsupervised principal component analysis (PCA). We excluded metrics of COVID-19 outcomes and severity and metrics that would not be known at admission, such as pathology results and medications prescribed after COVID-19 diagnosis. Only patients for whom full demographic, laboratory, medication history, comorbidities, past medical history, and vitals were available were included, leaving 237 metrics across 209 patients. PCA distilled these 237 metrics into two dimensions, which were most defined by immunosuppression and anemia in Dimension 1, and by AST, LDH, ALT, and ferritin in Dimension 2 (Fig. 5A). The eigenvectors for mean AST and maximum ferritin were orthogonal to the eigenvector for immunosuppression (Fig. 5B), suggesting that these metrics capture independent meta-characteristics of COVID-19 patients.
Fig. 5
Multivariate analyses segregate COVID-19 patients by disease severity. A Bar plot indicating contributions of the top ten metrics to the top two principal components identified by unsupervised principal component analysis (PCA) of COVID-19 patients. B Biplot of principal component scores of COVID-19 patients (dots) and variable loadings (vectors). The top four metrics with the greatest contribution to variability are shown. C PCA plots of COVID-19 patients according to the top two principal components and colored according to the indicated metric. D Receiver operating characteristic curve (left) and calibration plot (right) to assess ability of a supervised gradient boosting machine model to classify COVID-19 patient mortality using demographic, laboratory, medication history, comorbidities, past medical history, and vitals. Classification performance assessed by area under the curve (AUC). E Ranked plot of the importance scores of top 20 clinical metrics in the machine learning classifier constructed in (D)
We next plotted the 209 patients present in our PCA in two-dimensional space. There was no clear distribution of COVID-19 patients in PCA space on the basis of demographic information such as gender, race, and age (Fig. 5C). However, when we visualized mortality, which was not a variable included in our PCA, there was a separation among COVID-19 patients in PCA space. Similar trajectories could be appreciated for COVID-19 severity and outcomes metrics, such as length of stay, mechanical ventilation requirement, and ordinal score (Fig. 5C). Trajectories of COVID-19 severity in PCA space were orthogonal to the eigenvector for immunosuppression, suggesting that although immunosuppression contributes to variability among COVID-19 patients, it likely does not contribute to disease severity.
Given our ability to segregate patients by COVID-19 severity using unsupervised PCA, we next sought to design a machine learning classifier to predict patient mortality. Using mortality in addition to the 237 variables used for PCA above, we partitioned our COVID-19 patient cohort into a training set of 157 patients and a test set of 52 patients. The training set of patients was used to build a supervised gradient boosting machine model to classify patient mortality. Our model achieved a sensitivity of 0.53 (95% CI 0.39–0.67), specificity of 0.88 (95% CI 0.81–0.93), and area under curve (AUC) for the ROC curve of 0.87 (95% CI 0.80–0.94) based on the training data (Fig. 5D). When applied to the test set, our model correctly identified 6 of 15 patients who died following COVID-19 diagnosis, achieving an accuracy of 0.77 (95% CI 0.63–0.87), a sensitivity of 0.40, specificity of 0.92, and AUC ROC of 0.82. Variable importance scores extracted from the gradient boosting machine model revealed that absolute neutrophil count, PTT, and patient age were the most contributory to model prediction (Fig. 5E). Together our PCA and machine learning classifier suggest that COVID-19 severity and outcomes can be correlated with clinical characteristics known at the time of admission and confirm the importance of laboratory data over demographic information in predicting disease outcome.
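The test-set figures can be checked against one another with a short Python sketch of the standard confusion-matrix definitions. Only the 6 correctly identified deaths out of 15 and the test-set size of 52 are reported in the text; the count of 34 correctly classified survivors (out of 52 − 15 = 37) is inferred here as the value consistent with the reported accuracy of 0.77, and should be read as an assumption rather than a reported result. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # deaths correctly identified
        "specificity": tn / (tn + fp),  # survivors correctly identified
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Reported: 6 of 15 deaths identified in a test set of 52 patients.
# Assumed (inferred, not reported): 34 of the 37 survivors classified correctly.
test_metrics = classification_metrics(tp=6, fn=15 - 6, tn=34, fp=37 - 34)
```

With these counts, identifying 6 of 15 deaths corresponds to a sensitivity of 0.40, and the inferred survivor counts give a specificity near 0.92 and an accuracy near 0.77.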
Discussion
Here, we unbiasedly profile over 1700 unique clinical metrics in 835 COVID-19 patients to identify correlates of disease outcomes and severity. We observed odds ratios for COVID-19 mortality similar to those previously reported for risk factors such as increased age [11, 17, 31–33], hypertension [12], diabetes [8, 11–13], and chronic kidney disease [16]. Univariate, correlation, and multivariate analyses revealed strong associations between key laboratory parameters and COVID-19 severity. Several of these associations have been previously reported, such as elevated creatinine [34], decreased lymphocyte count [19, 20], elevated CRP [34], decreased hemoglobin [20], abnormal pH [35], decreased albumin [36], and elevated PTT [20]. Notably, through unbiased comparisons across all clinical metrics, we observed that these laboratory abnormalities are more strongly associated with mortality in COVID-19 patients than patient age, gender, comorbidities, or prescribed medications.
As this was a retrospective cohort study of associations with COVID-19 outcomes, it remains unclear whether the metrics identified here predispose patients to worse outcomes or are a consequence of severe COVID-19 itself. Abnormal pH and increased respiratory rate in patients with severe COVID-19 are likely reflective of the eventual acute respiratory distress syndrome and tissue malperfusion experienced by these patients [5], whereas the elevated inflammatory markers we observed are characteristic of the systemic inflammation observed in some cases of severe COVID-19 [3, 37, 38]. Some laboratory perturbations such as prolonged PTT might reflect interventions employed preferentially in COVID-19 patients, such as anticoagulants. Other laboratory parameters such as decreased lymphocytes and albumin might represent a unique inflammatory phenotype that predisposes patients to severe COVID-19 [19].
Regardless of the root cause of the clinical associations we describe, we have identified key clinical metrics that may be obtained at emergency department admission to identify overall risk for COVID-19 mortality.
We observed a mortality rate of 18% and a hospitalization rate of 79%, in contrast to currently estimated case fatality rates of 0.9–7.2% [17, 33, 39, 40] for SARS-CoV-2. This is likely due to sampling bias, as only patients who sought care at an academic medical center, obtained a laboratory-confirmed COVID-19 diagnosis, and had available medication or past medical history were included. Alternatively, this might reflect the evolving mortality rate over the course of this pandemic, as our ability to diagnose and treat COVID-19 has improved over the past year [41]. Nevertheless, a range of clinical presentations and disease severity scores are represented in our patient cohort, including outpatients and patients with asymptomatic disease.
COVID-19 remains a great threat to society relative to other respiratory viral diseases due to its case fatality rate and its striking range of clinical presentations and severity [17, 42, 43]. This study offers an unbiased retrospective approach to identify potential associations with this fatality rate and spectrum of disease severity. Our data suggest that increased absolute neutrophil count, decreased albumin, and decreased lymphocytes are key correlates of severe COVID-19 and are clinical characteristics available at initial admission that might be informative of disease prognosis. By identifying which COVID-19 patients are most at risk for severe disease, we may be better able to provide early and targeted therapeutic interventions, thereby combatting the current pandemic in an orthogonal but complementary approach to the preventative approaches currently being pursued across the world.
Electronic supplementary material: Supplementary file 1 (DOCX, 23 kb).