Yue Gao1,2, Guang-Yao Cai1,2, Wei Fang3, Hua-Yi Li1,2, Si-Yuan Wang1,2, Lingxi Chen4, Yang Yu1,2, Dan Liu1,2, Sen Xu1,2, Peng-Fei Cui1,2, Shao-Qing Zeng1,2, Xin-Xia Feng5, Rui-Di Yu1,2, Ya Wang1,2, Yuan Yuan1,2, Xiao-Fei Jiao1,2, Jian-Hua Chi1,2, Jia-Hao Liu1,2, Ru-Yuan Li1,2, Xu Zheng1,2, Chun-Yan Song1,2, Ning Jin1,2, Wen-Jian Gong1,2, Xing-Yu Liu1,2, Lei Huang6, Xun Tian6, Lin Li7, Hui Xing7, Ding Ma1,2, Chun-Rui Li8, Fei Ye9, Qing-Lei Gao10,11.
Abstract
Soaring cases of coronavirus disease (COVID-19) are pummeling the global health system. Overwhelmed health facilities have endeavored to mitigate the pandemic, but COVID-19 mortality continues to rise. Here, we present a mortality risk prediction model for COVID-19 (MRPMC) that uses patients' clinical data on admission to stratify patients by mortality risk, which enables prediction of physiological deterioration and death up to 20 days in advance. This ensemble model is built from four machine learning methods: Logistic Regression, Support Vector Machine, Gradient Boosted Decision Tree, and Neural Network. We validate MRPMC in an internal validation cohort and two external validation cohorts, where it achieves an AUC of 0.9621 (95% CI: 0.9464–0.9778), 0.9760 (0.9613–0.9906), and 0.9246 (0.8763–0.9729), respectively. This model enables rapid and accurate mortality risk stratification of patients with COVID-19 and could help health systems respond more effectively to high-risk COVID-19 patients.
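The abstract describes MRPMC as an ensemble of four base learners but does not state here how their outputs are combined. The following is a minimal sketch, assuming simple soft voting (the unweighted mean of each model's predicted probability of death) and a hypothetical 0.5 cut-off for the high/low risk label. The base models are placeholder callables with constant outputs, not fitted models.

```python
from statistics import fmean
from typing import Callable, Sequence

# Each base model maps a feature vector to a predicted probability of death.
Model = Callable[[Sequence[float]], float]


def ensemble_probability(models: Sequence[Model], features: Sequence[float]) -> float:
    """Soft voting (assumed): the unweighted mean of the base models' probabilities."""
    return fmean(model(features) for model in models)


def stratify(probability: float, threshold: float = 0.5) -> str:
    """Assign a risk group; the 0.5 threshold is a hypothetical choice."""
    return "high" if probability >= threshold else "low"


# Placeholder stand-ins for fitted LR, SVM, GBDT and NN models (constant outputs).
base_models = [
    lambda x: 0.80,  # LR
    lambda x: 0.70,  # SVM
    lambda x: 0.60,  # GBDT
    lambda x: 0.90,  # NN
]

p = ensemble_probability(base_models, [0.0])
print(round(p, 2), stratify(p))  # 0.75 high
```

Soft voting keeps each learner's calibrated probability rather than a hard vote, so a weighted average or a stacked meta-model could be swapped in without changing the interface.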
Year: 2020 PMID: 33024092 PMCID: PMC7538910 DOI: 10.1038/s41467-020-18684-2
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Study design.
MRPMC mortality risk prediction model for COVID-19, SFT training cohort of Sino-French New City Campus of Tongji Hospital, SFV internal validation cohort of Sino-French New City Campus of Tongji Hospital, OV Optical Valley Campus of Tongji Hospital, CHWH The Central Hospital of Wuhan.
Baseline characteristics of individuals by cohort.
| Characteristics | SFT cohort (n = 621) | SFV cohort (n = 622) | OV cohort (n = 801) | CHWH cohort (n = 116) |
|---|---|---|---|---|
| Age, years | 62 (51–71) | 63 (51–70) | 63 (50–70) | 62.5 (55–72) |
| Sex | ||||
| Female | 306 (49.3%) | 311 (50.0%) | 427 (53.3%) | 53 (45.7%) |
| Male | 315 (50.7%) | 311 (50.0%) | 374 (46.7%) | 63 (54.3%) |
| Comorbidity number | 1 (0–2) | 1 (0–2) | 1 (0–2) | 2 (1–3) |
| Comorbidity | ||||
| Hypertension | 245 (39.5%) | 244 (39.2%) | 321 (40.3%) | 43 (37.1%) |
| Diabetes | 110 (17.7%) | 110 (17.7%) | 121 (15.2%) | 16 (13.8%) |
| CHD | 72 (11.6%) | 59 (9.5%) | 68 (8.5%) | 16 (13.8%) |
| CLD | 26 (4.2%) | 19 (3.1%) | 33 (4.1%) | 7 (6.0%) |
| Tumor | 22 (3.5%) | 21 (3.4%) | 20 (2.5%) | 51 (44.0%) |
| HBV | 16 (2.6%) | 13 (2.1%) | 24 (3.0%) | 1 (1.0%) |
| CKD | 13 (2.1%) | 8 (1.3%) | 11 (1.4%) | 1 (0.9%) |
| COPD | 4 (0.6%) | 7 (1.1%) | 7 (0.9%) | 1 (0.9%) |
| Fever | 533 (86.0%) | 527 (84.9%) | 584 (73.0%) | 71 (61.2%) |
| Temp (max) ≥ 39 °C | 169 (27.4%) | 194 (31.5%) | 158 (19.8%) | 16 (14.2%) |
| Cough | 450 (72.6%) | 436 (70.2%) | 601 (75.1%) | 63 (54.3%) |
| Dyspnea | 313 (50.5%) | 283 (45.6%) | 274 (34.2%) | 37 (31.9%) |
| Sputum | 233 (37.6%) | 228 (36.7%) | 344 (43.0%) | 32 (27.6%) |
| Fatigue | 253 (40.8%) | 233 (37.5%) | 250 (31.2%) | 43 (37.1%) |
| Diarrhea | 186 (30.0%) | 167 (26.9%) | 135 (16.9%) | 9 (7.8%) |
| Myalgia | 133 (21.5%) | 144 (23.2%) | 129 (16.1%) | 20 (17.2%) |
| Vomiting | 30 (4.8%) | 31 (5.0%) | 32 (4.0%) | 3 (2.6%) |
| Conscious at admission | 595 (95.8%) | 600 (96.5%) | 786 (98.1%) | 79 (68.1%) |
| Respiratory rate, per min | 20 (20–22) | 21 (20–24) | 21 (20–24) | 21 (20–24) |
| MAP, mmHg | 96.7 (88.7–104.7) | 97.2 (89.7–105.6) | 96.3 (87.7–106.7) | 93.3 (86.9–101.5) |
| SpO2, % | 95 (91–97) | 95 (91–97) | 96 (94–97) | 95.5 (93–97.3) |
| Vital status | ||||
| Death | 86 (13.8%) | 89 (14.3%) | 60 (7.5%) | 19 (16.4%) |
| Discharge | 535 (86.2%) | 533 (85.7%) | 741 (92.5%) | 97 (83.6%) |
| Follow-up, days | 23 (15–30) | 21 (15–29) | 19 (14–26) | 17 (12–24) |
Continuous variables are presented as median (interquartile range [IQR]) and categorical variables as count (percentage).
SFT cohort training cohort of Sino-French New City Campus of Tongji Hospital, SFV cohort internal validation cohort of Sino-French New City Campus of Tongji Hospital, OV cohort external validation cohort of Optical Valley Campus of Tongji Hospital, CHWH cohort external validation cohort of The Central Hospital of Wuhan, Follow-up time from admission to death or discharge, CHD coronary heart disease, CLD chronic liver disease, HBV hepatitis B virus, CKD chronic kidney disease, COPD chronic obstructive pulmonary disease, MAP mean arterial pressure.
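The presentation convention in the table footnote (median with IQR for continuous variables, count with percentage for categorical ones) can be reproduced with a short sketch. The quartile method is an assumption: `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as R's default (type 7), but the source does not say which convention the authors used, and other conventions give slightly different IQRs on small samples.

```python
from statistics import median, quantiles
from typing import Sequence


def median_iqr(values: Sequence[float]) -> str:
    """Format a continuous variable as 'median (Q1–Q3)'."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):g} ({q1:g}–{q3:g})"


def count_pct(flags: Sequence[bool]) -> str:
    """Format a binary variable as 'count (percent%)' over the non-missing entries."""
    count = sum(flags)
    return f"{count} ({100 * count / len(flags):.1f}%)"


# Female patients in the SFT cohort: 306 of 621, as in the table.
print(count_pct([True] * 306 + [False] * 315))  # 306 (49.3%)
print(median_iqr([1, 2, 3, 4, 5]))              # 3 (2–4)
```

Because `count_pct` divides by the number of non-missing entries, percentages for variables with missing data (for example, maximum temperature in the CHWH cohort) use a smaller denominator than the cohort size.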
Fig. 2 Feature selection by LASSO.
a LASSO variable trace profiles of the 34 features whose intracohort missing rates were less than 5%. The vertical dashed line marks the best lambda value, 0.014, chosen by tenfold cross-validation. b Feature coefficients of LASSO at the best lambda value of 0.014. High-risk (positive coefficient) and low-risk (negative coefficient) features are colored red and blue, respectively. Gray features with a coefficient of 0 were considered redundant and removed, leaving 14 features for downstream prognosis modeling. LASSO least absolute shrinkage and selection operator, BUN blood urea nitrogen, RR respiratory rate, COPD chronic obstructive pulmonary disease, Hb hemoglobin, WBC white blood cell count, Cr creatinine, GGT gamma-glutamyl transferase, TB total bilirubin, AST aspartate aminotransferase, ALT alanine transaminase, MAP mean arterial pressure, ALB albumin, SpO2 oxygen saturation, CKD chronic kidney disease.
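The caption relies on LASSO shrinking redundant coefficients to exactly zero. Below is a minimal cyclic coordinate-descent sketch of that mechanism. Its assumptions: it uses a squared-error loss on standardized features (a binary outcome such as death is more naturally fit with a logistic loss, which this sketch does not implement), it takes λ as given instead of selecting it by tenfold cross-validation, and the toy data are invented for illustration.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_iter: int = 100) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Columns are standardized (mean 0, variance 1) and y is centered, so the
    coefficients are on the standardized scale. Constant columns are not
    supported (their standard deviation is 0).
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Xb with b = 0
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mu) / sd for v in col])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Correlation of x_j with the partial residual (feature j added back).
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
    return beta


# Toy data: y depends only on feature 0; feature 1 is orthogonal noise.
X = [[i, s] for i, s in zip(range(1, 9), [1, -1, -1, 1, 1, -1, -1, 1])]
y = [2.0 * row[0] for row in X]
beta = lasso_cd(X, y, lam=0.1)
kept = [j for j, b in enumerate(beta) if b != 0.0]
print(kept)  # [0]
```

Features whose coefficient stays at exactly 0.0 play the role of the gray, removed features in panel b; raising λ drives more coefficients to zero.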
Fig. 3 Predictive performance of models across cohorts.
AUC to assess the performance of mortality risk prediction of models (LR, SVM, GBDT, NN, and MRPMC) in a SFV cohort, b OV cohort, and c CHWH cohort, respectively. Source data are provided as a Source Data file. Kaplan–Meier curves indicating overall survival of patients with high and low mortality risk in d SFV cohort, e OV cohort, and f CHWH cohort, respectively. The tick marks refer to censored patients. The dark red or blue line indicates the survival probability, and the light red or blue areas represent the 95% confidence interval of survival probability (p < 0.0001). AUC area under the receiver operating characteristics curve, SFV internal validation cohort of Sino-French New City Campus of Tongji Hospital, OV Optical Valley Campus of Tongji Hospital, CHWH The Central Hospital of Wuhan, LR logistic regression, SVM support vector machine, GBDT gradient boosted decision tree, NN neural network, MRPMC mortality risk prediction model for COVID-19.
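Fig. 3 reports two quantities that can be computed directly from per-patient outputs. The following is a minimal sketch, assuming predicted risk scores with binary labels (1 = death) for the AUC, and follow-up times with an event flag (1 = death, 0 = censored, assumed here to include discharge) for the Kaplan–Meier curve. The AUC uses its rank interpretation: the probability that a randomly chosen death is scored above a randomly chosen survivor, with ties counted as one half. Confidence bands and the log-rank p-value are not shown.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (death, survivor) pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct death time.

    events[i] is 1 for death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```

The pairwise AUC loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; a sort-based rank-sum formulation gives the same value in O(n log n).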
Performance for mortality risk prediction of models in validation cohorts.
| Model | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | F1 | Kappa | Brier |
|---|---|---|---|---|---|---|---|---|---|
| Internal validation cohort (SFV) | |||||||||
| MRPMC | 0.9621 (0.9464–0.9778) | 92.4% (90.1–94.4%) | 57.3% (46.4–67.7%) | 98.3% (96.8–99.2%) | 85.0% (73.4–92.9%) | 93.2% (90.8–95.2%) | 0.685 | 0.644 | 0.051 |
| SVM | 0.9594 (0.9424–0.9764) | 92.4% (90.1–94.4%) | 60.7% (49.8–70.9%) | 97.8% (96.1–98.8%) | 81.8% (70.4–90.2%) | 93.7% (91.4–95.6%) | 0.697 | 0.655 | 0.052 |
| GBDT | 0.9454 (0.9246–0.9662) | 91.5% (89.0–93.6%) | 60.7% (49.8–70.9%) | 96.6% (94.7–98.0%) | 75.0% (63.4–84.5%) | 93.6% (91.3–95.5%) | 0.696 | 0.643 | 0.066 |
| LR | 0.9614 (0.9456–0.9772) | 92.1% (89.7–94.1%) | 56.2% (45.3–66.7%) | 98.1% (96.6–99.1%) | 83.3% (71.5–91.7%) | 93.1% (90.6–95.0%) | 0.671 | 0.628 | 0.051 |
| NN | 0.9615 (0.9456–0.9774) | 92.1% (89.7–94.1%) | 51.7% (40.8–62.4%) | 98.9% (97.6–99.6%) | 88.5% (76.6–95.7%) | 92.5% (90.0–94.5%) | 0.653 | 0.612 | 0.051 |
| External validation cohort (OV) | |||||||||
| MRPMC | 0.9760 (0.9613–0.9906) | 95.5% (93.8–96.8%) | 45.0% (32.1–58.4%) | 99.6% (98.8–99.9%) | 90.0% (73.5–97.9%) | 95.7% (94.0–97.0%) | 0.600 | 0.579 | 0.029 |
| SVM | 0.9774 (0.9640–0.9908) | 95.8% (94.1–97.0%) | 50.0% (36.8–63.2%) | 99.5% (98.6–99.9%) | 88.2% (72.6–96.7%) | 96.1% (94.5–97.4%) | 0.638 | 0.618 | 0.028 |
| GBDT | 0.9536 (0.9279–0.9793) | 94.8% (93.0–96.2%) | 48.3% (35.2–61.6%) | 98.5% (97.4–99.3%) | 72.5% (56.1–85.4%) | 95.9% (94.3–97.2%) | 0.580 | 0.553 | 0.039 |
| LR | 0.9721 (0.9568–0.9875) | 95.4% (93.7–96.7%) | 45.0% (32.1–58.4%) | 99.5% (98.6–99.9%) | 87.1% (70.2–96.4%) | 95.7% (94.0–97.0%) | 0.593 | 0.572 | 0.031 |
| NN | 0.9754 (0.9602–0.9906) | 95.6% (94.0–96.9%) | 46.7% (33.7–60.0%) | 99.6% (98.8–99.9%) | 90.3% (74.3–98.0%) | 95.8% (94.2–97.1%) | 0.615 | 0.595 | 0.028 |
| External validation cohort (CHWH) | |||||||||
| MRPMC | 0.9246 (0.8763–0.9729) | 87.9% (80.6–93.2%) | 42.1% (20.3–66.5%) | 96.9% (91.2–99.4%) | 72.7% (39.0–94.0%) | 89.5% (82.0–94.7%) | 0.533 | 0.470 | 0.083 |
| SVM | 0.9067 (0.8482–0.9652) | 88.8% (81.6–93.9%) | 57.9% (33.5–79.8%) | 94.6% (88.4–98.3%) | 68.8% (41.3–89.0%) | 92.0% (84.8–96.5%) | 0.629 | 0.563 | 0.090 |
| GBDT | 0.9021 (0.8347–0.9694) | 87.9% (80.6–93.2%) | 31.6% (12.6–56.6%) | 99.0% (94.4–100.0%) | 85.7% (42.1–99.6%) | 88.1% (80.5–93.5%) | 0.462 | 0.410 | 0.089 |
| LR | 0.9213 (0.8710–0.9717) | 87.1% (79.6–92.6%) | 36.8% (16.3–61.6%) | 96.9% (91.2–99.4%) | 70.0% (34.8–93.3%) | 88.7% (81.1–94.0%) | 0.483 | 0.417 | 0.091 |
| NN | 0.9202 (0.8700–0.9705) | 88.8% (81.6–93.9%) | 47.4% (24.5–71.1%) | 96.9% (91.2–99.4%) | 75.0% (42.8–94.5%) | 90.4% (83.0–95.3%) | 0.581 | 0.520 | 0.083 |
SFV internal validation cohort of Sino-French New City Campus of Tongji Hospital, OV Optical Valley Campus of Tongji Hospital, CHWH The Central Hospital of Wuhan, MRPMC mortality risk prediction model for COVID-19, SVM support vector machine, GBDT gradient boosted decision tree, LR logistic regression, NN neural network, AUC area under the receiver operating characteristics curve, PPV positive predictive value, NPV negative predictive value, 95% CI 95% confidence interval.
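Every threshold-based column in the table follows from a 2×2 confusion matrix. As a check, the sketch below uses counts back-calculated from the MRPMC row for the SFV cohort: that cohort has 89 deaths and 533 discharges, and 57.3% sensitivity and 98.3% specificity imply 51 true positives and 524 true negatives. These counts reproduce the printed accuracy, sensitivity, specificity, PPV, NPV and F1. They give a kappa of 0.643 against the reported 0.644, a small gap that may reflect rounding or a different kappa implementation. The Brier score is the mean squared difference between predicted probability and outcome.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # deaths predicted high risk
    fp: int  # survivors predicted high risk
    tn: int  # survivors predicted low risk
    fn: int  # deaths predicted low risk

    def metrics(self) -> Dict[str, float]:
        n = self.tp + self.fp + self.tn + self.fn
        accuracy = (self.tp + self.tn) / n
        # Chance agreement for Cohen's kappa from the row and column margins.
        p_pos = (self.tp + self.fp) / n * (self.tp + self.fn) / n
        p_neg = (self.tn + self.fn) / n * (self.tn + self.fp) / n
        expected = p_pos + p_neg
        return {
            "accuracy": accuracy,
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "f1": 2 * self.tp / (2 * self.tp + self.fp + self.fn),
            "kappa": (accuracy - expected) / (1 - expected),
        }


def brier(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


m = Confusion(tp=51, fp=9, tn=524, fn=38).metrics()
print({k: round(v, 3) for k, v in m.items()})
```

The 95% confidence intervals in the table are not reproduced here; they require a binomial interval method that the source does not name.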
Fig. 4 Statistical analysis of features included in models.
a Heatmap of Spearman's correlation coefficients between the continuous features included in MRPMC. Colors encode the correlation coefficient: the redder the color, the stronger the positive monotonic relationship; the bluer the color, the stronger the negative monotonic relationship. Circle size represents the absolute value of the correlation coefficient, with larger circles indicating stronger correlations. The numbers in the lower triangle give the correlation coefficients. b Scaled importance rank of all features included in MRPMC for identifying COVID-19 patients at high mortality risk. Circle size represents relative importance, and circle color indicates the model in which that importance was measured. c Box and jitter plots showing the distribution of continuous features included in MRPMC in deceased patients (n = 254) and discharged patients (n = 1906). The center line represents the median of the feature. Box limits represent the upper and lower quartiles. Whiskers extend to 1.5 times the interquartile range. Gray points represent outliers. The medians [IQR] of the features shown in Fig. 4c are listed in Supplementary Table 4. A Wilcoxon test was used for univariate comparisons between groups, and a two-tailed p < 0.05 was considered statistically significant. ***p < 0.001. Source data are provided as a Source Data file. MRPMC mortality risk prediction model for COVID-19, ALB albumin, SpO2 oxygen saturation, BUN blood urea nitrogen, RR respiratory rate, LYM lymphocyte count, PLT platelet count, No. comorbidities number of comorbidities, CKD chronic kidney disease, IQR interquartile range.
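Panels a and c rest on rank statistics. The following is a minimal sketch, assuming tied values receive average ranks. Spearman's coefficient is computed as Pearson's correlation on those ranks. The two-group Wilcoxon (rank-sum) comparison uses a normal approximation without tie or continuity correction, so its p-values can differ slightly from those of statistical packages, which may use exact or corrected versions.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)  # undefined if either input is constant


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test, normal approximation, no tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, erfc(abs(z) / sqrt(2))  # p = 2 * (1 - Phi(|z|))


print(spearman([1, 2, 3, 4], [10, 20, 40, 30]))
print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

Because both statistics use only ranks, they are insensitive to the skewed, outlier-heavy laboratory distributions visible in panel c.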