Muhammad E H Chowdhury, Tawsifur Rahman, Amith Khandakar, Somaya Al-Madeed, Susu M Zughaier, Suhail A R Doi, Hanadi Hassen, Mohammad T Islam.
Abstract
The COVID-19 pandemic has placed extreme pressure on global healthcare services. Fast, reliable, and early clinical assessment of disease severity can help in allocating and prioritizing resources to reduce mortality. To study the important blood biomarkers for predicting disease mortality, a retrospective study was conducted on a dataset made public by Yan et al. in [1] of 375 COVID-19 positive patients admitted to Tongji Hospital (China) from January 10 to February 18, 2020. Demographic and clinical characteristics and patient outcomes were investigated using machine learning tools to identify key biomarkers for predicting the mortality of individual patients. A nomogram was developed for predicting the mortality risk among COVID-19 patients. Lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age (LNLCA), all acquired at hospital admission, were identified as key predictors of death by a multi-tree XGBoost model. The areas under the curve (AUC) of the nomogram for the derivation and validation cohorts were 0.961 and 0.991, respectively. An integrated score (LNLCA) was calculated with the corresponding death probability. COVID-19 patients were divided into three subgroups (low-, moderate-, and high-risk) using LNLCA cutoff values of 10.4 and 12.65, with death probabilities of less than 5%, 5-50%, and above 50%, respectively. The prognostic model, nomogram, and LNLCA score can help in the early detection of high mortality risk in COVID-19 patients, which will help doctors improve patient stratification and management.
Keywords: COVID-19; Early warning tool; Machine learning; Predicting mortality risk; Prognostic model
Year: 2021 PMID: 33897907 PMCID: PMC8058759 DOI: 10.1007/s12559-020-09812-7
Source DB: PubMed Journal: Cognit Comput ISSN: 1866-9956 Impact factor: 4.890
Statistical analysis of the characteristics of the subjects’ data
| No. | Item | Survived | Death | Total | Method | Statistic | P value |
|---|---|---|---|---|---|---|---|
| 1 | Gender | | | | Chi-square test | χ² = 21.70 | < 0.00001 |
| | Male (%) | 98 (49%) | 126 (72%) | 224 (60%) | | | |
| | Female (%) | 103 (51%) | 48 (28%) | 151 (40%) | | | |
| 2 | Age | | | | Rank-sum test | Z = −11 | < 0.0001 |
| | N (missing) | 201 (0) | 174 (0) | 375 (0) | | | |
| | Mean ± SD | 50.2 ± 15 | 68.8 ± 11.8 | 58.8 ± 16.5 | | | |
| | Median | 51 | 69 | 62 | | | |
| | Q1, Q3 | 37, 62 | 62.2, 77 | 46, 70 | | | |
| | Min, max | 18, 88 | 19, 95 | 18, 95 | | | |
| 3 | Lactate dehydrogenase | | | | Rank-sum test | Z = −13.18 | < 0.0001 |
| | N (missing) | 193 (8) | 163 (11) | 356 (19) | | | |
| | Mean ± SD | 271 ± 102 | 642 ± 341 | 441 ± 305 | | | |
| | Median | 250 | 567 | 336 | | | |
| | Q1, Q3 | 203, 312 | 428, 762 | 239, 564 | | | |
| | Min, max | 119, 799 | 188, 1867 | 119, 1867 | | | |
| 4 | Neutrophils (%) | | | | Rank-sum test | Z = −12.88 | < 0.0001 |
| | N (missing) | 194 (7) | 162 (12) | 356 (19) | | | |
| | Mean ± SD | 65.7 ± 13.8 | 87 ± 9.86 | 75.4 ± 16.1 | | | |
| | Median | 66.2 | 89.5 | 77.5 | | | |
| | Q1, Q3 | 56.5, 75.4 | 83.2, 93.7 | 64.3, 89.2 | | | |
| | Min, max | 1.7, 95.1 | 18.2, 98.7 | 1.7, 98.7 | | | |
| 5 | Lymphocyte (%) | | | | Rank-sum test | Z = 11.97 | < 0.0001 |
| | N (missing) | 194 (7) | 162 (12) | 356 (19) | | | |
| | Mean ± SD | 24.8 ± 11.4 | 7.6 ± 6.22 | 17 ± 12.7 | | | |
| | Median | 23.8 | 5.8 | 14.4 | | | |
| | Q1, Q3 | 16.6, 33.5 | 3.3, 10.1 | 6.1, 25.2 | | | |
| | Min, max | 4.1, 60 | 0, 44.3 | 0, 60 | | | |
| 6 | High-sensitivity C-reactive protein | | | | Rank-sum test | Z = −11.93 | < 0.0001 |
| | N (missing) | 194 (7) | 159 (15) | 353 (22) | | | |
| | Mean ± SD | 36 ± 44 | 127 ± 75.5 | 77 ± 75.4 | | | |
| | Median | 19 | 114 | 53 | | | |
| | Q1, Q3 | 4, 50 | 62, 179 | 12, 118 | | | |
| | Min, max | 0, 237 | 4, 320 | 0, 320 | | | |
| 7 | Serum sodium | | | | Rank-sum test | Z = −1.57 | 0.12 |
| | N (missing) | 193 (8) | 161 (13) | 354 (21) | | | |
| | Mean ± SD | 138.9 ± 3.38 | 139.9 ± 8.37 | 139.3 ± 6.18 | | | |
| | Median | 139.2 | 138.9 | 139 | | | |
| | Q1, Q3 | 136.6, 141 | 135.8, 143 | 136.3, 142 | | | |
| | Min, max | 125, 146.4 | 115.4, 179 | 115.4, 179 | | | |
| 8 | Eosinophil (%) | | | | Rank-sum test | Z = 6.63 | < 0.0001 |
| | N (missing) | 194 (7) | 162 (12) | 356 (19) | | | |
| | Mean ± SD | 0.7 ± 0.941 | 0.11 ± 0.38 | 0.44 ± 0.79 | | | |
| | Median | 0.3 | 0.00 | 0.00 | | | |
| | Q1, Q3 | 0, 1.1 | 0.0, 0.0 | 0.00, 0.53 | | | |
| | Min, max | 0, 6.40 | 0, 3.70 | 0.00, 6.40 | | | |
| 9 | Serum chloride | | | | Rank-sum test | Z = −0.65 | 0.52 |
| | N (missing) | 193 (8) | 161 (13) | 354 (21) | | | |
| | Mean ± SD | 100.8 ± 3.8 | 101.5 ± 8.56 | 101.1 ± 6.42 | | | |
| | Median | 101.3 | 100.6 | 101.1 | | | |
| | Q1, Q3 | 98.8, 103.3 | 97.1, 105.5 | 97.9, 103.9 | | | |
| | Min, max | 85.6, 109.1 | 71.5, 140 | 71.5, 140 | | | |
| 10 | Monocyte (%) | | | | Rank-sum test | Z = 8.42 | < 0.0001 |
| | N (missing) | 194 (7) | 152 (12) | 356 (19) | | | |
| | Mean ± SD | 8.4 ± 3.15 | 5.1 ± 4.31 | 6.9 ± 4.08 | | | |
| | Median | 8.2 | 4 | 6.8 | | | |
| | Q1, Q3 | 6.6, 10.1 | 2.4, 6.3 | 3.8, 9.2 | | | |
| | Min, max | 0.7, 15.8 | 0.3, 35.2 | 0.3, 35.2 | | | |
| 11 | International standard ratio | | | | Rank-sum test | Z = −9.4 | < 0.0001 |
| | N (missing) | 189 (12) | 163 (11) | 352 (23) | | | |
| | Mean ± SD | 1.055 ± 0.086 | 1.37 (1.01) | 1.2 ± 0.709 | | | |
| | Median | 1.040 | 1.22 | 1.1 | | | |
| | Q1, Q3 | 1, 1.1 | 1.1, 1.37 | 1, 1.2 | | | |
| | Min, max | 0.84, 1.33 | 0.88, 13.48 | 0.8, 13.5 | | | |
| 12 | Activation of partial thromboplastin time | | | | Rank-sum test | Z = −1.2 | 0.23 |
| | N (missing) | 165 (36) | 133 (41) | 298 (77) | | | |
| | Mean ± SD | 40.1 ± 5.7 | 41.9 ± 11.4 | 41 ± 8.7 | | | |
| | Median | 39.9 | 39.4 | 40 | | | |
| | Q1, Q3 | 35.9, 43.5 | 35, 45.4 | 36, 44 | | | |
| | Min, max | 22, 56.9 | 25.3, 137 | 22, 137 | | | |
| 13 | Hypersensitive cardiac troponin I | | | | Rank-sum test | Z = −5.82 | < 0.0001 |
| | N (missing) | 141 (60) | 146 (28) | 287 (88) | | | |
| | Mean ± SD | 12 ± 53.3 | 1391 ± 5748 | 714 ± 414 | | | |
| | Median | 3 | 41 | 11 | | | |
| | Q1, Q3 | 2, 7 | 15, 271 | 3, 50 | | | |
| | Min, max | 2, 617 | 2, 50,000 | 2, 50,000 | | | |
| 14 | Brain natriuretic peptide precursor (NT-proBNP) | | | | Rank-sum test | Z = −3.87 | < 0.0001 |
| | N (missing) | 128 (73) | 139 (35) | 267 (108) | | | |
| | Mean ± SD | 1039 ± 6620 | 2806 ± 5906 | 1959 ± 6308 | | | |
| | Median | 65 | 827 | 271 | | | |
| | Q1, Q3 | 23, 178 | 362, 2402 | 68, 935 | | | |
| | Min, max | 5, 70,000 | 24, 45,850 | 5, 70,000 | | | |
| 15 | Albumin | | | | Rank-sum test | Z = 10.64 | < 0.0001 |
| | N (missing) | 193 (8) | 163 (11) | 356 (19) | | | |
| | Mean ± SD | 37.1 ± 4.53 | 30.3 ± 4.22 | 34 ± 5.57 | | | |
| | Median | 37.4 | 30.1 | 34.2 | | | |
| | Q1, Q3 | 34.2, 40.2 | 27.6, 33 | 29.9, 38.3 | | | |
| | Min, max | 22.6, 48.6 | 18.5, 40.9 | 18.5, 48.6 | | | |
| 16 | Mean corpuscular hemoglobin concentration | | | | Rank-sum test | Z = −2.27 | 0.023 |
| | N (missing) | 194 (7) | 162 (12) | 356 (19) | | | |
| | Mean ± SD | 343 ± 13.9 | 346 ± 18.7 | 345 ± 16.3 | | | |
| | Median | 344 | 346 | 345 | | | |
| | Q1, Q3 | 335, 351 | 337, 354 | 336, 352 | | | |
| | Min, max | 306, 416 | 299, 488 | 299, 488 | | | |
| 17 | Outcome (%) | 201 (54%) | 174 (46%) | 375 | | | |
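
The chi-square statistic reported for gender can be checked directly from the counts in the table. The following is a minimal sketch, assuming a Pearson chi-square test without continuity correction (the source does not state which variant was used); the function name `pearson_chi_square` is illustrative and not taken from the paper.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of outcome and gender.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: survived, death. Columns: male, female (counts from the Gender rows).
gender_counts = [[98, 103], [126, 48]]
chi2 = pearson_chi_square(gender_counts)
print(f"chi-square = {chi2:.2f}")
```

With these counts the statistic evaluates to about 21.70, which agrees with the tabulated value and suggests that no continuity correction was applied.
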
Fig. 1 Patients’ outcome tree with the initial condition of the patients at admission
Fig. 2 Comparison of the top-ranked 10 features identified using the Multi-Tree XGBoost algorithm from data imputed using MICE (top) and (−1) (bottom)
Fig. 3 Comparison of the receiver operating characteristic (ROC) plots for the top-ranked 1 to 10 features with data imputed using MICE (left) and (−1) (right); the feature selection and classification techniques were the same in both cases
Comparison of the average performance metrics and confusion matrices from five-fold cross-validation for the top 1 to 10 features using data imputed with (−1) (A) and MICE (B). Sensitivity, specificity, PLR, and NLR are weighted averages with 95% confidence intervals; TP, FN, FP, and TN are confusion-matrix counts
| Features | Sensitivity | Specificity | PLR | NLR | Death: TP | Death: FN | Survived: FP | Survived: TN |
|---|---|---|---|---|---|---|---|---|
| A (imputation using −1) | ||||||||
| Top 1 feature | 87 ± 3.92 | 87.4 ± 3.01 | 7.4 ± 4.1 | 0.15 ± 0.1 | 142 | 32 | 14 | 187 |
| Top 2 features | 88.04 ± 3.13 | 88 ± 3.5 | 8.1 ± 5.1 | 0.14 ± 0.08 | 148 | 26 | 17 | 184 |
| Top 3 features | 90 ± 3.8 | 88.9 ± 3.78 | 9.3 ± 6.9 | 0.12 ± 0.09 | 155 | 19 | 19 | 182 |
| Top 4 features | 90.5 ± 3.92 | 90.7 ± 3.72 | 11.8 ± 10.1 | 0.10 ± 0.09 | 157 | 17 | 18 | 183 |
| Top 5 features | 90.1 ± 3.6 | 90.03 ± 3.5 | 10.5 ± 7.9 | 0.11 ± 0.086 | 155 | 19 | 18 | 183 |
| Top 6 features | 90.08 ± 2.7 | 90 ± 2.4 | 9.63 ± 5.1 | 0.11 ± 0.06 | 154 | 20 | 19 | 182 |
| Top 7 features | 89.8 ± 2.3 | 90.16 ± 3.4 | 10.5 ± 7.5 | 0.12 ± 0.05 | 156 | 18 | 21 | 180 |
| Top 8 features | 89.3 ± 3.6 | 89.1 ± 3 | 8.96 ± 5.5 | 0.12 ± 0.08 | 155 | 19 | 21 | 180 |
| Top 9 features | 89.6 ± 3.2 | 88.9 ± 3.5 | 9.06 ± 6.2 | 0.11 ± 0.07 | 153 | 21 | 20 | 181 |
| Top 10 features | 89.01 ± 3.3 | 89.01 ± 4 | 9.46 ± 7.3 | 0.13 ± 0.083 | 154 | 20 | 21 | 180 |
| B (imputation using MICE) | ||||||||
| Top 1 feature | 88.2 ± 7.4 | 87.6 ± 3.5 | 7.91 ± 5.6 | 0.13 ± 0.17 | 143 | 31 | 13 | 188 |
| Top 2 features | 87.7 ± 4.4 | 87.01 ± 3.5 | 7.37 ± 4.6 | 0.14 ± 0.11 | 145 | 29 | 17 | 184 |
| Top 3 features | 87.1 ± 3.5 | 87 ± 4.1 | 7.53 ± 5.2 | 0.15 ± 0.09 | 148 | 26 | 22 | 179 |
| Top 4 features | 89.2 ± 2.8 | 89 ± 3.2 | 8.93 ± 5.6 | 0.12 ± 0.07 | 155 | 19 | 22 | 179 |
| Top 5 features | 92 ± 2.6 | 92 ± 3 | 13.52 ± 10.6 | 0.09 ± 0.06 | 160 | 14 | 16 | 185 |
| Top 6 features | 92.3 ± 2.45 | 92 ± 4.1 | 15.86 ± 16.5 | 0.085 ± 0.06 | 162 | 12 | 17 | 184 |
| Top 7 features | 90.2 ± 5 | 90.6 ± 3.5 | 11.37 ± 9.3 | 0.11 ± 0.12 | 158 | 16 | 22 | 179 |
| Top 8 features | 89.9 ± 4.8 | 90.2 ± 3.8 | 11.02 ± 9.3 | 0.11 ± 0.11 | 158 | 16 | 23 | 178 |
| Top 9 features | 89.2 ± 2.8 | 89.03 ± 3.2 | 8.97 ± 5.6 | 0.12 ± 0.07 | 155 | 19 | 22 | 179 |
| Top 10 features | 88 ± 3.4 | 89.6 ± 3.7 | 9.82 ± 7.5 | 0.14 ± 0.08 | 156 | 18 | 23 | 178 |
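
The relationship between the confusion-matrix counts and the four reported metrics can be sketched as follows, using the standard definitions with death as the positive class. The function name `diagnostic_metrics` is illustrative. Note that metrics computed from the pooled counts need not match the tabulated values exactly, since those are weighted averages over the five folds.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics from confusion-matrix counts (death = positive class)."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio (undefined if fp == 0)
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
    }


# Pooled counts for the top 5 features with MICE imputation (section B of the table).
metrics = diagnostic_metrics(tp=160, fn=14, fp=16, tn=185)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```
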
The logistic regression analysis to construct the nomogram for death prediction
| Outcome | Coef. | Std. err. | z | P value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| Lactate dehydrogenase | 0.0070514 | 0.0017099 | 4.12 | 0.000 | 0.0037001 | 0.0104027 |
| Neutrophils | − 0.0327053 | 0.0568836 | − 0.57 | 0.565 | − 0.1441951 | 0.0787845 |
| Lymphocyte | − 0.1624422 | 0.0806231 | − 2.01 | 0.044 | − 0.3204607 | − 0.0044238 |
| High-sensitivity CRP | 0.0110451 | 0.0043462 | 2.54 | 0.011 | 0.0025267 | 0.0195635 |
| Age | 0.0735038 | 0.0185211 | 3.97 | 0.000 | 0.0372032 | 0.1098045 |
| _cons | − 3.662636 | 5.65169 | − 0.65 | 0.517 | − 14.73975 | 7.414473 |
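
The coefficients above define a logistic model, so a death probability follows from the usual inverse-logit transform of the linear predictor. The sketch below assumes that each predictor enters in the same units as in the characteristics table (units are not stated in this extract) and that `_cons` is the intercept. The dictionary keys and the two example patients, built from the cohort medians, are illustrative only. How the nomogram rescales this linear predictor into LNLCA points is not given here and is not reproduced.

```python
import math

# Coefficients copied from the logistic regression table; key names are illustrative.
COEFFICIENTS = {
    "ldh": 0.0070514,
    "neutrophils_pct": -0.0327053,
    "lymphocyte_pct": -0.1624422,
    "hs_crp": 0.0110451,
    "age": 0.0735038,
}
INTERCEPT = -3.662636  # the _cons row


def death_probability(patient):
    """Inverse-logit of the linear predictor for one patient (dict of predictor values)."""
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical patients built from the medians of the survived and death groups.
survivor_like = {"ldh": 250, "neutrophils_pct": 66.2, "lymphocyte_pct": 23.8, "hs_crp": 19, "age": 51}
death_like = {"ldh": 567, "neutrophils_pct": 89.5, "lymphocyte_pct": 5.8, "hs_crp": 114, "age": 69}

p_survivor_like = death_probability(survivor_like)
p_death_like = death_probability(death_like)
print(f"{p_survivor_like:.3f} {p_death_like:.3f}")
```
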
Fig. 4 Calibration plot comparing predicted and actual death probability of patients with COVID-19. a Internal validation. b External validation
Fig. 5 Decision curve analysis comparing different models to predict the death probability of patients with COVID-19. The net benefit balances the mortality risk and potential harm from unnecessary over-intervention for patients with COVID-19
Fig. 6 Multivariate logistic regression-based nomogram to predict the probability of death. The nomogram was created using the following five predictors: lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age
LNLCA score from nomogram and corresponding death probability of COVID-19 patients
| Patient group | LNLCA score | Death probability |
|---|---|---|
| Low | 7.45 | 0.001 |
| Low | 9.2 | 0.01 |
| Low | 10.4 | 0.05 |
| Moderate | 10.95 | 0.1 |
| Moderate | 11.6 | 0.2 |
| Moderate | 11.99 | 0.3 |
| Moderate | 12.4 | 0.4 |
| Moderate | 12.65 | 0.5 |
| High | 12.95 | 0.6 |
| High | 13.3 | 0.7 |
| High | 13.7 | 0.8 |
| High | 14.3 | 0.9 |
| High | 14.8 | 0.95 |
| High | 16.2 | 0.99 |
| High | 17.85 | 0.999 |
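
The stratification by LNLCA score can be sketched as a simple threshold rule using the cutoffs 10.4 and 12.65. Treating a score exactly equal to a cutoff as belonging to the lower group is an assumption based on how the table groups its rows; the function name `risk_group` is illustrative.

```python
LOW_CUTOFF = 10.4    # score at a death probability of 0.05
HIGH_CUTOFF = 12.65  # score at a death probability of 0.5


def risk_group(lnlca_score):
    """Map an LNLCA score to a risk group.

    Assumption: scores equal to a cutoff fall in the lower group,
    matching the row grouping in the table.
    """
    if lnlca_score <= LOW_CUTOFF:
        return "low"
    if lnlca_score <= HIGH_CUTOFF:
        return "moderate"
    return "high"


for score in (9.2, 11.99, 14.3):
    print(score, risk_group(score))
```
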
Fig. 7 An example nomogram-based score to predict the probability of death of a COVID-19 patient from the test set (9 days before the actual outcome)
Association between different risk groups and actual outcome in the training cohort using Fisher exact probability test
| Risk category | Alive | Death | Overall |
|---|---|---|---|
| Low-risk | 83 (100.0%) | 0 (0%) | 83 (100.0%) |
| Moderate-risk | 41 (77.36%) | 12 (22.64%) | 53 (100.0%) |
| High-risk | 15 (11.9%) | 111 (88.1%) | 126 (100.0%) |
| Overall | 139 (53%) | 123 (47%) | 262 (100.0%) |
P value among the three groups is less than 0.001
P value of Low-risk group vs Moderate-risk group is less than 0.001
P value of Low-risk group vs High-risk group is less than 0.001
P value of Moderate-risk group vs High-risk group is less than 0.001
Association between different risk groups and actual outcome in the testing cohort using Fisher exact probability test
| Risk category | Alive | Death | Overall |
|---|---|---|---|
| Low-risk | 41 (100%) | 0 (0%) | 41 (100.0%) |
| Moderate-risk | 17 (77.27%) | 5 (22.73%) | 22 (100.0%) |
| High-risk | 3 (6%) | 47 (94%) | 50 (100.0%) |
| Overall | 61 (54%) | 52 (46%) | 113 (100.0%) |
P value among the three groups is less than 0.001
P value of Low-risk group vs Moderate-risk group is 0.0037
P value of Low-risk group vs High-risk group is less than 0.001
P value of Moderate-risk group vs High-risk group is less than 0.001
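
The pairwise comparisons can be illustrated with a two-sided Fisher exact test on a 2 x 2 table. The sketch below assumes the common two-sided definition (summing the probabilities of all tables that are no more likely than the observed one); the source does not state which definition was used, and the function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables at least as extreme as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Testing cohort, low-risk vs moderate-risk: rows are groups, columns are alive, death.
p_low_vs_moderate = fisher_exact_two_sided(41, 0, 17, 5)
print(f"P = {p_low_vs_moderate:.4f}")
```

With the testing-cohort counts this gives a P value of about 0.0037, in line with the low-risk vs moderate-risk comparison listed above.
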
Fig. 8 Predicted outcomes for the 52 test patients who died. The model was trained on data available at admission, and multiple samples from each patient were used to find the earliest time after admission at which the patient was predicted to be in the high-risk group. Note: “0” denotes the death event for each patient, and vertical lines represent the time of admission relative to death. The solid red line starts from the earliest time point at which death was predicted, and the dotted line represents the delay between admission and death prediction by the LNLCA model