Yibai Xiong, Yan Ma, Lianguo Ruan, Dan Li, Cheng Lu, Luqi Huang.
Abstract
BACKGROUND: Coronavirus disease 2019 (COVID-19) continues to spread globally. Machine learning techniques have been used in disease diagnosis and in predicting treatment outcomes, with favorable performance. The present study aims to predict COVID-19 severity at admission using different machine learning techniques, including random forest (RF), support vector machine (SVM), and logistic regression (LR). The importance of individual features to COVID-19 severity was further identified.
Keywords: COVID-19; Logistic regression; Machine learning; Random Forest; Severity; Support vector machine
Year: 2022 PMID: 35177120 PMCID: PMC8851750 DOI: 10.1186/s40249-022-00946-4
Source DB: PubMed Journal: Infect Dis Poverty ISSN: 2049-9957 Impact factor: 4.520
Fig. 1 Flow chart of feature selection. WBC white blood cell, RBC red blood cell, ALB albumin, r-GT γ-glutamyl transpeptidase, CHE cholinesterase, HCT hematocrit, CK creatine kinase
Fig. 2 Heat map of the correlations between features. Color distinguishes positive (r > 0) from negative (r < 0) correlations, and color intensity is proportional to the correlation coefficient (r), so lower-intensity color indicates weaker correlation. The 23 features selected by machine learning techniques are only weakly correlated with one another (r < 0.4). The 23 features are C-CT, fever, MT, HR, SBP, HB, NLR, RDW-CV, IMG, IL-6, ESR, IBIL, ALP, CysC, LDH, Ca2+, AFU, AMS, RBP, Hs-cTn, MB, PT(s), and D-dimer. C-CT chest computed tomography, MT malignant tumor, HR heart rate, SBP systolic blood pressure, NLR neutrophil-to-lymphocyte ratio, HB hemoglobin concentration, RDW-CV red cell volume distribution width, LDH lactate dehydrogenase, IBIL indirect bilirubin, PT prothrombin time, ESR erythrocyte sedimentation rate, AFU α-fucosidase, RBP retinol-binding protein, IL-6 interleukin-6, Hs-cTn hypersensitive troponin, AMS amylase, CysC cystatin C, IMG immature granulocyte, ALP alkaline phosphatase, MB myoglobin
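The weak-correlation check described in the caption can be illustrated with a minimal sketch. It assumes Pearson correlation and a threshold applied to the absolute value of r; the source does not state which coefficient was used. The helper names and the toy values are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strongly_correlated_pairs(columns, threshold=0.4):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy values, NOT data from the study.
toy = {
    "NLR": [2.1, 3.4, 5.8, 2.9, 9.7, 4.2],
    "LDH": [210, 250, 340, 190, 450, 230],
    "HR": [88, 80, 99, 84, 87, 98],
}
flagged = strongly_correlated_pairs(toy)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

A feature set like the one in Fig. 2 would return an empty list from `strongly_correlated_pairs`, since every pair stays below the 0.4 threshold.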
Twenty-three features of COVID-19 patients in the cohort
| Variables | All cases | Non-severe cases | Severe cases | P value |
|---|---|---|---|---|
| Fever | 88 (30.7) | 57 (31.3) | 31 (29.5) | 0.751 |
| MT | 16 (5.6) | 11 (6.0) | 5 (4.8) | 0.648 |
| HR, beats/min | 88 (80–99) | 87 (80–98) | 88 (84–99) | 0.122 |
| SBP, mmHg | 127 (117–138) | 125 (115–136) | 131 (121–138) | 0.008 |
| Laboratory findings | ||||
| NLR | 3.28 (2.17–5.77) | 2.71 (1.70–4.15) | 5.73 (3.17–9.97) | < 0.001 |
| HB | 127.0 (117.0–136.0) | 128.0 (119.0–136.0) | 123 (114.0–136.0) | 0.109 |
| RDW-CV | 12.4 (11.9–12.9) | 12.3 (11.9–12.8) | 12.5 (11.9–13.0) | 0.032 |
| LDH | 258.0 (200.0–346.5) | 231.0 (184.7–276.5) | 343.0 (261.0–452.0) | < 0.001 |
| IBIL | 8.4 (6.1–11.6) | 8.9 (6.6–11.8) | 7.8 (5.6–10.80) | 0.048 |
| D-Dimer | 0.7 (0.4–1.5) | 0.5 (0.3–0.9) | 1.1 (0.6–1.5) | < 0.001 |
| PT | 11.3 (10.7–12.0) | 11.2 (10.6–11.7) | 11.7 (10.8–12.6) | < 0.001 |
| Ca2+ | 2.10 (1.99–2.17) | 2.09 (22.0–52.25) | 2.01 (1.93–2.10) | < 0.001 |
| ESR | 43.0 (26.0–58.0) | 40.5 (18.0–27.0) | 50.3 (35.0–65.5) | < 0.001 |
| AFU | 23.0 (19.0–27.0) | 23.0 (18.0–27.0) | 23.0 (20.0–27.0) | 0.508 |
| RBP | 26.4 (19.0–40.4) | 26.4 (20.6–40.5) | 26.4 (17.1–40.0) | 0.433 |
| IL-6 | 8.3 (6.5–11.7) | 8.2 (6.5–10.2) | 9.2 (7.2–12.9) | 0.001 |
| Hs-cTn | 3.6 (1.5–8.3) | 2.9 (1.1–5.8) | 6.2 (2.8–12.4) | < 0.001 |
| AMS | 62.0 (49.0–79.5) | 62.0 (49.3–74.0) | 68.0 (49.0–97.0) | 0.078 |
| CysC | 0.85 (0.75–1.03) | 0.83 (0.73–0.94) | 0.95 (0.78–1.16) | < 0.001 |
| IMG | 0.01 (0.01–0.04) | 0.01 (0.00–0.03) | 0.03 (0.01–0.11) | < 0.001 |
| ALP | 76.0 (59.5–94.0) | 76.0 (59.3–95.8) | 75.0 (60.0–93.0) | 0.806 |
| MB | 42.4 (30.7–64.5) | 39.1 (28.0–53.0) | 50.1 (38.8–87.3) | < 0.001 |
| C-CT^a | | | | < 0.001 |
| 0 | 63 (22.0%) | 63 (34.6%) | 0 (0%) | |
| 1 | 29 (10.1%) | 23 (12.6%) | 6 (5.7%) | |
| 2 | 195 (67.9%) | 96 (52.7%) | 99 (94.3%) | |
Data are presented as median (IQR) or n (%). P values were estimated by comparing variables between severe and non-severe cases
^a "0" denotes "no lung lesions", "1" denotes "unilateral lung lesions", and "2" denotes "bilateral lung lesions"
MT malignant tumor, HR heart rate, SBP systolic blood pressure, NLR neutrophil-to-lymphocyte ratio, HB hemoglobin concentration, RDW-CV red cell volume distribution width, LDH lactate dehydrogenase, IBIL indirect bilirubin, PT prothrombin time, ESR erythrocyte sedimentation rate, AFU α-fucosidase, RBP retinol-binding protein, IL-6 interleukin-6, Hs-cTn hypersensitive troponin, AMS amylase, CysC cystatin C, IMG immature granulocyte, ALP alkaline phosphatase, MB myoglobin, C-CT chest computed tomography
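As a worked illustration of how a categorical row such as C-CT can be compared between groups, the sketch below runs a Pearson chi-square test on the C-CT counts from the table. The source does not say which test produced the reported P values, so the choice of test is an assumption, and `chi_square_statistic` is a hypothetical helper.

```python
from math import exp


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# C-CT counts from the table: rows are scores 0, 1, 2;
# columns are non-severe and severe cases.
c_ct = [[63, 0], [23, 6], [96, 99]]
stat = chi_square_statistic(c_ct)

# With (3 - 1) * (2 - 1) = 2 degrees of freedom, the chi-square
# survival function has the closed form exp(-x / 2).
p_value = exp(-stat / 2)
print(f"chi2 = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

Under this assumed test the result agrees with the reported P < 0.001 for C-CT.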
Predictive performance for COVID-19 severity
| Model | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | Cut off | YI |
|---|---|---|---|---|---|---|
| RF | 0.970 | 96.7 | 69.5 | 84.5 | 0.62 | 0.662 |
| SVM | 0.948 | 93.9 | 79.0 | 88.5 | 0.795 | 0.729 |
| LR | 0.928 | 92.3 | 72.3 | 85.2 | 0.57 | 0.646 |
AUC area under curve, YI Youden index, RF random forest, SVM support vector machine, LR logistic regression
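The Youden index column can be reproduced from the sensitivity and specificity columns using the standard definition, YI = sensitivity + specificity - 1. The short sketch below applies it to the values reported in the table; the function and variable names are illustrative only.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1 (inputs as fractions)."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity (%) as reported in the table above.
reported = {
    "RF": (96.7, 69.5),
    "SVM": (93.9, 79.0),
    "LR": (92.3, 72.3),
}

youden = {
    model: round(youden_index(sens / 100, spec / 100), 3)
    for model, (sens, spec) in reported.items()
}
print(youden)  # {'RF': 0.662, 'SVM': 0.729, 'LR': 0.646}
```

The computed values match the YI column for all three models.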
Fig. 3 ROC curves of RF, SVM, and LR. ROC receiver operating characteristic, AUC area under curve, RF random forest, SVM support vector machine, LR logistic regression
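For readers unfamiliar with how ROC curves and AUC values like these are derived, here is a minimal sketch using only the Python standard library. The labels and scores are hypothetical toy values, not study data. Choosing the cut-off by maximizing the Youden index is a common convention that is assumed here, since the source does not state how its cut-offs were chosen.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A case is called positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(labels, scores):
    """Threshold that maximizes the Youden index (sensitivity + specificity - 1)."""
    return max(roc_points(labels, scores), key=lambda pt: pt[1] + pt[2] - 1)[0]


# Hypothetical labels (1 = severe) and predicted probabilities, NOT study data.
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]

print(f"AUC = {auc(labels, scores):.2f}")
print(f"Cut-off maximizing Youden index = {best_cutoff(labels, scores)}")
```

Plotting sensitivity against 1 - specificity for the tuples returned by `roc_points` gives the ROC curve for one model.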
Fig. 4 Feature importance: the top 23 features ranked by relative importance to COVID-19 severity by RF. C-CT chest computed tomography, MT malignant tumor, HR heart rate, SBP systolic blood pressure, NLR neutrophil-to-lymphocyte ratio, HB hemoglobin concentration, RDW-CV red cell volume distribution width, LDH lactate dehydrogenase, IBIL indirect bilirubin, PT prothrombin time, ESR erythrocyte sedimentation rate, AFU α-fucosidase, RBP retinol-binding protein, IL-6 interleukin-6, Hs-cTn hypersensitive troponin, AMS amylase, CysC cystatin C, IMG immature granulocyte, ALP alkaline phosphatase, MB myoglobin