Simin Li1, Yulan Lin2, Tong Zhu1, Mengjie Fan1, Shicheng Xu1, Weihao Qiu1, Can Chen1, Linfeng Li1, Yao Wang1, Jun Yan1, Justin Wong3, Lin Naing4, Shabei Xu5.
Abstract
This study aimed to predict the mortality of patients with coronavirus disease 2019 (COVID-19). We collected clinical data of COVID-19 patients between January 18 and March 29, 2020 in Wuhan, China. A gradient boosting decision tree (GBDT) model, a logistic regression (LR) model, and a simplified LR model (LR-5) were built to predict COVID-19 mortality. We evaluated the models by computing the area under the curve (AUC), accuracy, positive predictive value (PPV), and negative predictive value (NPV) under fivefold cross-validation. A total of 2924 patients were included in our evaluation, of whom 257 (8.8%) died and 2667 (91.2%) survived during hospitalization. Upon admission, there were 21 (0.7%) mild cases, 2051 (70.1%) moderate cases, 779 (26.6%) severe cases, and 73 (2.5%) critically severe cases. The GBDT model exhibited the highest fivefold AUC (0.941), followed by LR (0.928) and LR-5 (0.913). The diagnostic accuracies of GBDT, LR, and LR-5 were 0.889, 0.868, and 0.887, respectively. In particular, the GBDT model demonstrated the highest sensitivity (0.899) and specificity (0.889). The NPV of all three models exceeded 97%, while their PPVs were relatively low: 0.381 for LR, 0.402 for LR-5, and 0.432 for GBDT. Among severe and critically severe cases, the GBDT model also performed best, with a fivefold AUC of 0.918. In the external validation of the LR-5 model using 72 cases of COVID-19 from Brunei, leukomonocyte (%) showed the highest fivefold AUC (0.917), followed by urea (0.867), age (0.826), and SPO2 (0.704). The findings confirm that GBDT predicts mortality better than the LR models in confirmed cases of COVID-19. The relative performance of the models appears to be independent of disease severity.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00521-020-05592-1.
Keywords: COVID-19; China; Machine learning; Mortality; Prediction
Year: 2021 PMID: 33424133 PMCID: PMC7783503 DOI: 10.1007/s00521-020-05592-1
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
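The abstract reports NPVs above 97% alongside PPVs near 0.4. A minimal sketch of why this pattern can arise when deaths are rare is given below. It applies Bayes' rule to the sensitivity (0.899), specificity (0.889), and death rate (257 of 2924) quoted in the abstract. The function name `predictive_values` is illustrative, and this is not the authors' evaluation code: the reported PPV and NPV come from their cross-validated predictions, so the numbers need not match exactly.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All three inputs are proportions in [0, 1].
    """
    tp = sensitivity * prevalence                   # true positives per patient
    fn = (1.0 - sensitivity) * prevalence           # false negatives
    tn = specificity * (1.0 - prevalence)           # true negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Values quoted in the abstract for the GBDT model and the whole cohort.
prevalence = 257 / 2924
ppv, npv = predictive_values(0.899, 0.889, prevalence)
print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.3f}")
```

With a death rate under 9%, false positives drawn from the large surviving group outnumber true positives even at high specificity, which keeps the PPV well below the NPV.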
Fig. 1 The process of our model
Fig. 2 Statistical result of patients
Baseline characteristics of the patients on admission
| Features | Total (n = 2924) | Survival (n = 2667) | Death (n = 257) | P value | AUC |
|---|---|---|---|---|---|
| Age (years), median (IQR) | 61.876(49.737–69.539) | 60.703(48.381–68.692) | 69.577(62.709–78.333) | < 0.001 | 0.718 |
| Gender (%) | |||||
| Female | 1443 (49.4) | 1267 (47.5) | 176 (68.5) | < 0.001 | 0.605 |
| Male | 1481 (50.6) | 1400 (52.5) | 81 (31.5) | < 0.001 | 0.605 |
| Any comorbidity | 1263 (43.2) | 1108 (41.5) | 155 (60.3) | < 0.001 | 0.594 |
| Cardiovascular disease | 998 (34.1) | 878 (32.9) | 120 (46.7) | ||
| Coronary disease | 208 (7.1) | 173 (6.5) | 35 (13.6) | < 0.001 | 0.536 |
| Hypertension | 865 (29.6) | 764 (28.6) | 101 (39.3) | 0.001 | 0.553 |
| Cerebrovascular disease | 87 (3.0) | 70 (2.6) | 17 (6.6) | 0.001 | 0.520 |
| COPD | 35 (1.2) | 27 (1.0) | 8 (3.1) | 0.009 | 0.511 |
| Diabetes | 397 (13.6) | 358 (13.4) | 39 (15.2) | 0.445 | 0.509 |
| Malignancy | 70 (2.4) | 53 (2.0) | 17 (6.6) | < 0.001 | 0.523 |
| Infectious disease | 92 (3.1) | 78 (2.9) | 14 (5.4) | 0.037 | 0.513 |
| Tuberculosis | 52 (1.8) | 44 (1.6) | 8 (3.1) | 0.130 | 0.507 |
| CKD | 17 (0.6) | 12 (0.4) | 5 (1.9) | 0.013 | 0.507 |
| Hepatitis | 45 (1.5) | 40 (1.5) | 5 (1.9) | 0.591 | 0.502 |
| Mild | 21 (0.7) | 21 (0.8) | 0 (0.0) | 0.250 | 0.504 |
| Moderate | 2051 (70.1) | 1956 (73.3) | 95 (37.0) | < 0.001 | 0.682 |
| Severe | 779 (26.6) | 645 (24.2) | 134 (52.1) | < 0.001 | 0.640 |
| Critical | 73 (2.5) | 45 (1.7) | 28 (10.9) | < 0.001 | 0.546 |
| Fever | 1964 (67.2) | 1788 (67.0) | 176 (68.5) | 0.677 | 0.507 |
| Cough | 1510 (51.6) | 1381 (51.8) | 129 (50.2) | 0.648 | 0.508 |
| Pant | 42 (1.4) | 33 (1.2) | 9 (3.5) | 0.009 | 0.511 |
| Dyspnea | 962 (32.9) | 844 (31.6) | 118 (45.9) | < 0.001 | 0.571 |
| Dizzy | 63 (2.2) | 48 (1.8) | 15 (5.8) | < 0.001 | 0.520 |
| Pharyngalgia | 129 (4.4) | 128 (4.8) | 1 (0.4) | < 0.001 | 0.522 |
| Temperature (°C) | 36.8 (0.7) | 36.8 (0.7) | 37.0 (0.9) | < 0.001 | 0.585 |
| Pulse (beats/min) | 90.8 (22.0) | 90.4 (20.0) | 95.5 (27.5) | < 0.001 | 0.571 |
| RR (breaths/min) | 23.5 (2.0) | 23.4 (2.0) | 25.2 (10.0) | < 0.001 | 0.682 |
| SBP (mmHg) | 175.2 (24.0) | 179.1 (23.0) | 133.1 (26.0) | 0.134 | 0.522 |
| DBP (mmHg) | 81.0 (17.0) | 81.1 (16.0) | 80.3 (17.0) | 0.211 | 0.516 |
| SPO2 (%) | 95.4 (3.0) | 96.2 (2.0) | 87.1 (15.0) | < 0.001 | 0.729 |
| Laboratory test, median (IQR) | < 0.001 | ||||
| WBC (× 10^9/L) | 5.78(4.55–7.39) | 5.69(4.49–7.145) | 8.595(5.677–12.928) | < 0.001 | 0.721 |
| Neutrophil (× 10^9/L) | 3.73(2.67–5.28) | 3.58(2.62–4.945) | 7.465(4.5–11.622) | < 0.001 | 0.790 |
| Lymphocyte (× 10^9/L) | 1.22(0.81–1.68) | 1.29(0.89–1.73) | 0.585(0.42–0.8) | < 0.001 | 0.847 |
| NLR | 2.906(1.81–5.418) | 2.69(1.756–4.57) | 12.211(6.49–23.396) | < 0.001 | 0.883 |
| Platelets (× 10^9/L) | 222.0(170.0–284.0) | 225.0(176.0–289.0) | 152.0(112.0–222.0) | < 0.001 | 0.728 |
| ESR (mm/h) | 28.0(13.0–55.0) | 27.0(12.0–54.0) | 35.0(18.0–60.0) | 0.008 | 0.562 |
| LDH (U/L) | 241.0(192.5–328.0) | 233.0(189.0–305.0) | 485.0(363.0–639.0) | < 0.001 | 0.876 |
| CRP (mg/L) | 10.2(1.6–55.9) | 7.8(1.4–43.2) | 103.7(59.85–162.4) | < 0.001 | 0.873 |
| HDL-C (mmol/L) | 0.96(0.79–1.2) | 0.98(0.812–1.22) | 0.76(0.55–0.92) | < 0.001 | 0.743 |
| Procalcitonin (μg/L) | 0.06(0.04–0.12) | 0.06(0.04–0.09) | 0.245(0.13–0.712) | < 0.001 | 0.870 |
| Ferritin (ng/mL) | 473.0(233.675–915.2) | 421.7(213.7–792.35) | 1436.8(771.75–2444.5) | < 0.001 | 0.826 |
| Total bilirubin (μmol/L) | 8.85(6.6–12.1) | 8.6(6.4–11.7) | 12.0(8.7–17.6) | < 0.001 | 0.692 |
| ALT (U/L) | 22.0(14.0–38.0) | 22.0(14.0–37.0) | 24.0(17.25–42.0) | 0.001 | 0.562 |
| AST (U/L) | 25.0(18.0–36.0) | 24.0(18.0–34.0) | 41.0(29.0–58.0) | < 0.001 | 0.755 |
| Prealbumin (g/L) | 231.0(167.0–278.0) | 236.0(178.0–279.0) | 118.0(99.5–141.5) | < 0.001 | 0.843 |
| Albumin (g/L) | 36.7(32.6–40.85) | 37.4(33.4–41.3) | 31.3(28.2–34.2) | < 0.001 | 0.191 |
| BUN (mmol/L) | 4.5(3.5–5.8) | 4.4(3.4–5.5) | 8.3(5.5–12.775) | < 0.001 | 0.811 |
| Creatinine (μmol/L) | 68.0(56.0–83.0) | 67.0(56.0–81.0) | 86.5(67.0–110.75) | < 0.001 | 0.704 |
| eGFR (ml/min) | 93.4(79.3–104.0) | 94.3(81.9–104.9) | 73.2(48.7–90.6) | < 0.001 | 0.740 |
| TNF-α (pg/ml) | 8.1(6.5–10.5) | 7.9(6.4–10.0) | 11.45(9.025–18.975) | < 0.001 | 0.760 |
| IL-2R (pg/ml) | 405.0(281.0–649.0) | 381.0(277.0–581.0) | 1096.5(726.75–1717.0) | < 0.001 | 0.881 |
| IL-6 (pg/ml) | 6.03(2.76–22.525) | 5.025(2.63–18.362) | 59.69(23.16–122.0) | < 0.001 | 0.887 |
| IL-8 (pg/ml) | 10.9(7.6–18.075) | 10.4(7.325–16.65) | 23.95(13.55–52.35) | < 0.001 | 0.785 |
| IL-10 (pg/ml) | 8.6(6.3–13.4) | 7.9(6.1–11.6) | 14.6(9.525–25.5) | < 0.001 | 0.748 |
Continuous variables are expressed as medians with interquartile ranges (IQRs). ALT, alanine aminotransferase; AST, aspartate aminotransferase; COPD, chronic obstructive pulmonary disease; CKD, chronic kidney disease; WBC, white blood cell count; CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; NLR, neutrophil-to-lymphocyte ratio; LDH, lactic dehydrogenase; eGFR, estimated glomerular filtration rate; HDL-C, high-density lipoprotein cholesterol; SBP, systolic blood pressure; RR, respiratory rate; DBP, diastolic blood pressure; BUN, blood urea nitrogen; AUC, area under curve
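The AUC column of the baseline table scores each feature on its own. The source does not say how these values were computed, so the sketch below only illustrates the standard rank-based definition: the probability that a randomly chosen death has a higher value than a randomly chosen survivor, with ties counted as one half. The function name `rank_auc` and the six data points are made up for illustration and are not study data.

```python
def rank_auc(values, died):
    """Pairwise (Mann-Whitney) AUC of a single feature against a 0/1 outcome."""
    pos = [v for v, d in zip(values, died) if d]       # values in deaths
    neg = [v for v, d in zip(values, died) if not d]   # values in survivors
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up values (NOT study data): a marker that tends to be lower in deaths.
marker = [38.0, 41.0, 36.5, 30.0, 33.0, 37.0]
died = [0, 0, 0, 1, 1, 1]

auc = rank_auc(marker, died)
flipped = rank_auc([-v for v in marker], died)  # same marker, sign reversed
print(round(auc, 3), round(flipped, 3))
```

Under this definition, a feature that is lower in deaths gives an AUC below 0.5, and reversing its sign gives one minus that value. This is one possible reading of the albumin row (0.191, with a lower median among deaths). Other features that are also lower in deaths, such as lymphocyte count and platelets, are listed above 0.5, so the table may not use a single orientation throughout.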
Top ten features with highest predictive ability
| Feature no. | Feature added | P value | AUC on train | AUC on test |
|---|---|---|---|---|
| 1 | LDH | < 0.001 | 0.840 | 0.876 |
| 2 | BUN | < 0.001 | 0.882 | 0.877 |
| 3 | Lymphocyte (%) | < 0.001 | 0.895 | 0.903 |
| 4 | Age | < 0.001 | 0.903 | 0.911 |
| 5 | SPO2 | < 0.001 | 0.915 | 0.917 |
| 6 | Platelets | < 0.001 | 0.923 | 0.925 |
| 7 | CRP | < 0.001 | 0.930 | 0.921 |
| 8 | IL-10 | 0.001 | 0.932 | 0.930 |
| 9 | HDL-C | 0.005 | 0.934 | 0.932 |
| 10 | SaO2 | 0.005 | 0.935 | 0.931 |
LDH, lactic dehydrogenase; BUN, blood urea nitrogen; CRP, C-reactive protein; HDL-C, high-density lipoprotein cholesterol; AUC, area under curve
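The table above lists features in the order they were added, with the AUC after each addition. The source does not describe the selection procedure. One plausible reading is greedy forward selection, sketched below under that assumption. The names `forward_select` and `toy_score`, the feature names `feat_a` to `feat_c`, and the gain values are all hypothetical placeholders. In practice the scoring function would fit a model on the chosen feature subset and return its cross-validated AUC.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(candidates: Sequence[str],
                   score: Callable[[List[str]], float],
                   max_features: int) -> List[Tuple[str, float]]:
    """Greedily add the candidate that gives the highest score at each step.

    Returns a list of (feature added, score after adding it).
    """
    selected: List[str] = []
    remaining = list(candidates)
    history: List[Tuple[str, float]] = []
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, score(selected)))
    return history


# Hypothetical stand-in for "train a model on these features and return its AUC".
TOY_GAINS = {"feat_a": 0.30, "feat_b": 0.20, "feat_c": 0.10}


def toy_score(features: List[str]) -> float:
    return 0.5 + 0.5 * sum(TOY_GAINS[f] for f in features)


history = forward_select(list(TOY_GAINS), toy_score, max_features=2)
for step, (name, value) in enumerate(history, start=1):
    print(step, name, round(value, 3))
```

Keeping the scoring function as a parameter separates the search loop from the model, so the same loop could be reused with an LR or a GBDT scorer.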
Prediction accuracy of different models in different cohorts
| Metric | LR: Total | LR: Non-severe | LR: Severe | LR-5: Total | LR-5: Non-severe | LR-5: Severe | GBDT: Total | GBDT: Non-severe | GBDT: Severe |
|---|---|---|---|---|---|---|---|---|---|
| No. of included features | 152 | | | 5 | | | 83 | | |
| Total (death) | 2924(257) | 2072(95) | 852(162) | 2924(257) | 2072(95) | 852(162) | 2924(257) | 2072(95) | 852(162) |
| Threshold | 0.110 | 0.110 | 0.110 | 0.140 | 0.140 | 0.140 | 0.090 | 0.090 | 0.090 |
| Fivefold AUC | 0.928 | 0.924 | 0.891 | 0.913 | 0.895 | 0.887 | 0.941 | 0.932 | 0.918 |
| AUC on testing set | 0.928 | 0.946 | 0.855 | 0.915 | 0.902 | 0.864 | 0.939 | 0.940 | 0.897 |
| AUC on training set | 0.937 | 0.931 | 0.913 | 0.913 | 0.897 | 0.888 | 0.997 | 0.997 | 0.997 |
| Sensitivity (95% CI) | 0.878 | 0.933 | 0.714 | 0.898 | 0.952 | 0.711 | 0.899 | 0.940 | 0.774 |
| Specificity (95% CI) | 0.769 | 0.714 | 0.806 | 0.771 | 0.588 | 0.871 | 0.788 | 0.619 | 0.903 |
| Accuracy | 0.868 | 0.922 | 0.732 | 0.887 | 0.938 | 0.743 | 0.889 | 0.924 | 0.799 |
| Positive predictive value | 0.381 | 0.357 | 0.397 | 0.402 | 0.333 | 0.435 | 0.432 | 0.351 | 0.483 |
| Negative predictive value | 0.975 | 0.984 | 0.941 | 0.978 | 0.983 | 0.956 | 0.978 | 0.979 | 0.972 |
AUC, area under curve
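The table above reports one probability threshold per model (for example 0.090 for GBDT) together with sensitivity, specificity, accuracy, PPV, and NPV. A minimal sketch of how such metrics follow from predicted probabilities at a fixed cutoff is shown below. The function name `threshold_metrics`, the use of `>=` for the cutoff, and the ten toy predictions are assumptions for illustration. They are not taken from the study.

```python
def threshold_metrics(probs, died, threshold):
    """Confusion-matrix metrics for predicted death probabilities at a cutoff."""
    tp = fp = tn = fn = 0
    for p, d in zip(probs, died):
        predicted_death = p >= threshold  # assumption: ties count as positive
        if predicted_death and d:
            tp += 1
        elif predicted_death and not d:
            fp += 1
        elif not predicted_death and d:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up predictions (NOT study data): 7 survivors followed by 3 deaths.
probs = [0.02, 0.05, 0.08, 0.12, 0.30, 0.04, 0.07, 0.20, 0.60, 0.95]
died = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

metrics = threshold_metrics(probs, died, threshold=0.09)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A low cutoff like this flags most deaths at the cost of more false alarms among survivors, so sensitivity and NPV stay high while PPV drops.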