Xuedi Ma, Michael Ng, Shuang Xu, Zhouming Xu, Hui Qiu, Yuwei Liu, Jiayou Lyu, Jiwen You, Peng Zhao, Shihao Wang, Yunfei Tang, Hao Cui, Changxiao Yu, Feng Wang, Fei Shao, Peng Sun, Ziren Tang.
Abstract
This study aimed to identify clinical features for predicting mortality risk in patients with coronavirus disease 2019 (COVID-19) using machine-learning methods. A retrospective study of inpatients with COVID-19 admitted in Wuhan from 15 January to 15 March 2020 is reported. Data on symptoms, comorbidities, demographics, vital signs, CT scan results and laboratory test results on admission were collected. Machine-learning methods (Random Forest and XGBoost) were used to rank clinical features by their importance for mortality risk. Multivariate logistic regression models were applied to identify clinical features with statistical significance. Based on 500 bootstrapped samples, the predictors of mortality were lactate dehydrogenase (LDH), C-reactive protein (CRP) and age. A multivariate logistic regression model was formed to predict mortality in 292 in-sample patients, with an area under the receiver operating characteristic curve (AUROC) of 0.9521, which was better than CURB-65 (AUROC of 0.8501) and the machine-learning-based model (AUROC of 0.4530). An out-sample data set of 13 patients was further tested and showed that our model (AUROC of 0.6061) was also better than CURB-65 (AUROC of 0.4608) and the machine-learning-based model (AUROC of 0.2292). LDH, CRP and age can be used to identify severe patients with COVID-19 on hospital admission.
Keywords: COVID-19; Random Forest; machine-learning methods; mortality risk; prognosis
Year: 2020 PMID: 32746957 PMCID: PMC7426607 DOI: 10.1017/S0950268820001727
Source DB: PubMed Journal: Epidemiol Infect ISSN: 0950-2688 Impact factor: 2.451
Baseline characteristics of the clinical variables of 305 patients with COVID-19
| Variable | Survivor (n = 245) | Non-survivor (n = 60) |
|---|---|---|
| Sex | ||
| Men | 109 (44.5%) | 40 (66.7%) |
| Women | 136 (55.5%) | 20 (33.3%) |
| Fever | ||
| Yes | 191 (78.0%) | 47 (78.3%) |
| No | 54 (22.0%) | 13 (21.7%) |
| Dizziness | ||
| Yes | 14 (5.7%) | 3 (5.0%) |
| No | 231 (94.3%) | 57 (95.0%) |
| Fatigue | ||
| Yes | 117 (47.8%) | 34 (56.7%) |
| No | 128 (52.2%) | 26 (43.3%) |
| Nausea | ||
| Yes | 24 (9.8%) | 4 (6.7%) |
| No | 221 (90.2%) | 56 (93.3%) |
| Diarrhea | ||
| Yes | 43 (17.6%) | 12 (20.0%) |
| No | 202 (82.4%) | 48 (80.0%) |
| Muscle pain | ||
| Yes | 49 (20.0%) | 9 (15.0%) |
| No | 196 (80.0%) | 51 (85.0%) |
| Confusion | ||
| Yes | 9 (3.7%) | 9 (15.0%) |
| No | 236 (96.3%) | 51 (85.0%) |
| Difficulty in breathing | ||
| Yes | 135 (55.1%) | 41 (68.3%) |
| No | 110 (44.9%) | 19 (31.7%) |
| Cough | ||
| Yes | 158 (64.5%) | 41 (68.3%) |
| No | 87 (35.5%) | 19 (31.7%) |
| Expectoration | ||
| Yes | 73 (29.8%) | 24 (40.0%) |
| No | 172 (70.2%) | 36 (60.0%) |
| CT: exudative lesion | ||
| Yes | 58 (23.7%) | 12 (20.0%) |
| No | 187 (76.3%) | 48 (80.0%) |
| CT: ground glass shadow | ||
| Yes | 52 (21.2%) | 5 (8.3%) |
| No | 193 (78.8%) | 55 (91.7%) |
| Hypertension | ||
| Yes | 41 (16.7%) | 16 (26.7%) |
| No | 204 (83.3%) | 44 (73.3%) |
| Accouchement | ||
| Yes | 15 (6.1%) | 0 (0.0%) |
| No | 230 (93.9%) | 60 (100.0%) |
Fig. 1. Flow chart of the study process.
Relative importance values by Random Forest and XGBoost
| Variable (Random Forest) | Relative importance by Random Forest (%) | Variable (XGBoost) | Relative importance by XGBoost (%) |
|---|---|---|---|
| LDH | 17.69 | LDH | 23.74 |
| CRP | 8.60 | Age | 12.10 |
| Lymphocyte count | 8.06 | Neutrophil | 7.09 |
| Age | 7.70 | Aspartate transaminase | 6.69 |
| Blood urea nitrogen | 6.69 | Lymphocyte count | 6.25 |
| Aspartate transaminase | 5.69 | CRP | 6.23 |
| Neutrophil | 5.37 | Blood urea nitrogen | 4.88 |
| White blood cell | 5.04 | White blood cell | 4.42 |
| Creatinine | 4.70 | Normal platelet | 3.61 |
| Normal platelet | 3.67 | Respiratory rate | 3.44 |
| Respiratory rate | 3.29 | Alanine aminotransferase | 3.13 |
| Monocytes | 3.24 | Systolic pressure | 3.12 |
| Systolic pressure | 3.21 | Creatinine | 3.07 |
| Total bilirubin | 3.02 | Heart rate | 2.32 |
| Heart rate | 2.82 | Total bilirubin | 2.18 |
| Diastolic pressure | 2.69 | Monocytes | 2.01 |
| Alanine aminotransferase | 2.41 | Diastolic pressure | 1.98 |
| Temperature | 1.92 | Temperature | 1.60 |
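As a rough illustration of how the two rankings can be compared, the sketch below finds the variables that appear among the top k of both methods. It uses only the six highest-ranked entries of each column, copied from the table above. The function name `shared_top` and the choice of k are illustrative assumptions, not part of the study's method.

```python
# Six highest-ranked variables per method, copied from the table above
# (relative importance in %). The remaining rows are omitted for brevity.
rf_importance = {
    "LDH": 17.69,
    "CRP": 8.60,
    "Lymphocyte count": 8.06,
    "Age": 7.70,
    "Blood urea nitrogen": 6.69,
    "Aspartate transaminase": 5.69,
}
xgb_importance = {
    "LDH": 23.74,
    "Age": 12.10,
    "Neutrophil": 7.09,
    "Aspartate transaminase": 6.69,
    "Lymphocyte count": 6.25,
    "CRP": 6.23,
}


def shared_top(a, b, k):
    """Return variables in the top k of both rankings, ordered as in `a`."""
    top_a = sorted(a, key=a.get, reverse=True)[:k]
    top_b = set(sorted(b, key=b.get, reverse=True)[:k])
    return [name for name in top_a if name in top_b]


print(shared_top(rf_importance, xgb_importance, 6))
```

With k = 6 the overlap is LDH, CRP, lymphocyte count, age and aspartate transaminase. According to the abstract, logistic regression then identified LDH, CRP and age as the predictors of mortality.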
Multivariate logistic regression of three selected variables for mortality prediction
| Variable | Coefficient | Standard error | 95% confidence interval |
|---|---|---|---|
| Constant | −10.5772 | 1.794 | (−14.093, −7.061) |
| LDH | 0.0076 | 0.0016 | (0.0045, 0.0106) |
| CRP | 0.0175 | 0.0060 | (0.0060, 0.0289) |
| Age | 0.0857 | 0.0215 | (0.0441, 0.1280) |
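To make the fitted model concrete, the following minimal sketch plugs the coefficients from the table above into the standard logistic function, p = 1 / (1 + exp(−z)), where z is the constant plus the weighted sum of LDH, CRP and age. The units (LDH in U/L, CRP in mg/L, age in years) are assumptions, since this record does not state them. The function name and the example patient are hypothetical.

```python
import math

# Coefficients copied from the table above.
COEF = {"const": -10.5772, "ldh": 0.0076, "crp": 0.0175, "age": 0.0857}


def mortality_probability(ldh, crp, age):
    """Predicted probability from the logistic model.

    Assumed units (not stated in the source): LDH in U/L,
    CRP in mg/L, age in years.
    """
    z = (COEF["const"]
         + COEF["ldh"] * ldh
         + COEF["crp"] * crp
         + COEF["age"] * age)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient, for illustration only.
p = mortality_probability(ldh=300, crp=50, age=70)
print(round(p, 3))
```

For this hypothetical patient the predicted probability is about 0.19. Because every coefficient is positive, raising any one of the three inputs while holding the others fixed raises the predicted risk.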
Fig. 2. ROC curves of the obtained mortality model, CURB-65 and the machine-learning-based model on XGBoost [13] for different data sets: (a) in-sample data set, (b) out-sample data set.
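The AUROC values used to compare the models can be read as the probability that a randomly chosen non-survivor receives a higher risk score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auroc` and the toy labels and scores are hypothetical and are not the study's data.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. non-survivor), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random ranking and values below 0.5 mean the scores rank survivors above non-survivors more often than not. With very few patients in a data set, the estimate rests on a small number of pairs and can shift substantially with a single patient.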