Maria Elena Laino1, Elena Generali2,3, Tobia Tommasini1, Giovanni Angelotti1, Alessio Aghemo2,3, Antonio Desai2,3, Pierandrea Morandini1, Giulio G Stefanini2,4, Ana Lleo2,3, Antonio Voza2,5, Victor Savevski1.
Abstract
Introduction: Identifying SARS-CoV-2 patients at higher risk of mortality is crucial in the management of a pandemic. Artificial intelligence techniques make it possible to analyze large amounts of data and find hidden patterns. We aimed to develop and validate a mortality score at admission for COVID-19 based on high-level machine learning.
Material and methods: We conducted a retrospective cohort study of adult patients hospitalized for COVID-19 between March and December 2020. The primary outcome was in-hospital mortality. A machine learning approach based on vital parameters, laboratory values and demographic features was applied to develop different models. A feature importance analysis was then performed to reduce the number of variables included in the model and to develop a risk score with good overall performance, which was finally evaluated in terms of discrimination and calibration. All results underwent cross-validation.
Keywords: SARS; interleukin-6; pneumonia; troponin
Year: 2022 PMID: 35591841 PMCID: PMC9103632 DOI: 10.5114/aoms/144980
Source DB: PubMed Journal: Arch Med Sci ISSN: 1734-1922 Impact factor: 3.707
Demographic and clinical characteristics of patients admitted to hospital for COVID-19
| Parameter | Total | Training cohort | Test cohort |
|---|---|---|---|
| Male sex | 729 (64.1) | 572 (63.6) | 158 (66.4) |
| Age [years] | 70 (58–80) | 71 (59–80) | 70 (58–81) |
| Comorbidities: | |||
| Number of comorbidities: | |||
| 0 | 290 (25.5) | 228 (25.4) | 62 (26) |
| 1 | 351 (30.9) | 281 (31.2) | 70 (29.4) |
| ≥ 2 | 496 (43.6) | 390 (43.4) | 106 (44.5) |
| Hypertension | 581 (51.1) | 456 (50.8) | 125 (52.5) |
| Diabetes (type 1 and 2) | 214 (18.8) | 168 (18.7) | 46 (19.3) |
| Chronic cardiac disease | 291 (25.6) | 218 (24.3) | 73 (30.7) |
| Chronic pulmonary disease | 151 (13.3) | 125 (13.9) | 26 (10.9) |
| Chronic kidney disease | 108 (9.5) | 88 (9.8) | 20 (8.4) |
| Moderate or severe liver disease | 13 (1.1) | 9 (1) | 4 (1.7) |
| Malignant neoplasm | 197 (17.3) | 159 (17.7) | 38 (16) |
| Neurologic disease | 174 (15.3) | 140 (15.6) | 34 (14.3) |
| Charlson Comorbidity Index | 4 (3–6) | 4 (3–6) | 4 (3–6) |
| Vital parameters at admission: | |||
| Respiratory rate [breaths/min] | 18 (17–20) | 18 (17–20) | 18 (17–20) |
| Oxygen saturation (%) | 94 (90–96) | 94 (90–96) | 94 (90–96) |
| Systolic blood pressure [mm Hg] | 126 (117–137) | 127 (118–137) | 125 (117–137) |
| Diastolic blood pressure [mm Hg] | 73 (67–78) | 73 (67–79) | 72 (67–77) |
| Heart rate [bpm] | 80 (75–91) | 83 (75–92) | 81 (75–90) |
| p/F ratio [mm Hg] | 295 (238–347) | 295 (233–347) | 295 (247–350) |
| Glasgow Coma Scale | 15 (15–15) | 15 (15–15) | 15 (15–15) |
| Laboratory tests at admission: | |||
| White blood cell count [10⁹/l] | 6.98 (5.23–9.89) | 6.89 (5.14–9.74) | 7.33 (5.8–10.1) |
| Neutrophil count [10⁹/l] | 5.3 (3.7–8.1) | 5.3 (3.6–8.3) | 5.55 (4.1–8.1) |
| Lymphocyte count [10⁹/l] | 0.9 (0.6–1.2) | 0.9 (0.6–1.2) | 0.9 (0.6–1.2) |
| Hemoglobin [g/dl] | 13.7 (12.4–14.8) | 13.8 (12.6–14.7) | 13.6 (12.4–14.8) |
| Platelet count [10⁹/l] | 207 (158–271) | 212 (161–270) | 205 (155–271) |
| Ferritin [ng/ml] | 504 (232–967) | 469 (229–966) | 532 (249–968) |
| Creatinine [mg/dl] | 0.95 (0.77–1.27) | 0.95 (0.78–1.24) | 0.95 (0.78–1.24) |
| Urea [mmol/l] | 19 (14–27.7) | 19 (14–27.6) | 19.2 (14.1–28) |
| Bilirubin [mg/dl] | 0.7 (0.5–0.9) | 0.7 (0.5–0.9) | 0.7 (0.5–0.9) |
| LDH [IU/l] | 331 (256–428) | 331 (255–426) | 332 (263–441) |
| AST [IU/l] | 37 (26–54) | 37 (26–60) | 37 (27–53) |
| ALT [IU/l] | 27 (18–46) | 28 (18–51) | 27 (18–45) |
| C-reactive protein [mg/l] | 88.4 (36–146.9) | 87.1 (34.1–145.4) | 97.5 (42.4–150.4) |
| CPK [U/l] | 96 (57–194) | 96.5 (57–194) | 92 (57–189) |
| High sensitive troponin I [ng/l] | 10.9 (5.7–28) | 10.8 (5.6–28.4) | 11.3 (6.3–26.3) |
| BNP [pg/ml] | 66 (31–156) | 68 (32–157) | 62 (27–156) |
| Interleukin 6 [pg/ml] | 45 (21–84) | 45 (20.5–85) | 46 (21–79) |
p < 0.05.
Clinical outcomes in the cohort
| Variable | Total | Training | Test | p-value |
|---|---|---|---|---|
| Death | 252 (22.2) | 201 (26.4) | 51 (15.6) | < 0.001 |
| ICU admission | 141 (12.4) | 103 (13.5) | 38 (11.6) | NS |
| Non-invasive ventilation | 135 (11.9) | 107 (14.1) | 28 (8.6) | 0.011 |
| Length of stay [days] | 10 (7–18) | 10 (6–18) | 11 (7–17) | NS |
Figure 1. Flow diagram of study design
Discriminatory performances of different machine learning models after the randomized search
| Model | Macro f1 | Macro precision | Macro recall | Micro f1 | Micro precision | Micro recall | Weighted f1 | Weighted precision | Weighted recall |
|---|---|---|---|---|---|---|---|---|---|
| Gradient boosting | 0.729413 | 0.768761 | 0.713077 | 0.839942 | 0.839942 | 0.839942 | 0.829948 | 0.833368 | 0.839942 |
| Logistic regression | 0.744540 | 0.741612 | 0.767781 | 0.823369 | 0.823369 | 0.823369 | 0.827546 | 0.841610 | 0.823369 |
| Random forest | 0.738860 | 0.733639 | 0.769879 | 0.8166888 | 0.8166888 | 0.8166888 | 0.823591 | 0.842133 | 0.8166888 |
| XGB | 0.694104 | 0.772580 | 0.673127 | 0.831651 | 0.831651 | 0.831651 | 0.811808 | 0.823766 | 0.831651 |
XGB – extreme gradient boosting.
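The three averaging schemes in the table above can be illustrated with a minimal sketch in plain Python. The function names and the toy labels are hypothetical and are not taken from the study. The sketch assumes a single-label task, in which micro-averaged precision, recall and f1 all reduce to overall accuracy (which is why the three micro columns are identical for each model), and weighted recall equals that same value.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall and f1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def averaged_metrics(y_true, y_pred):
    """Macro, micro and weighted (precision, recall, f1) tuples."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)

    per_class = {c: prf(*counts[c]) for c in labels}
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)

    # Macro: unweighted mean of the per-class metrics.
    macro = tuple(
        sum(per_class[c][i] for c in labels) / len(labels) for i in range(3)
    )
    # Weighted: per-class metrics weighted by class support.
    weighted = tuple(
        sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3)
    )
    # Micro: pool the counts over all classes, then compute once.
    micro = prf(
        sum(counts[c][0] for c in labels),
        sum(counts[c][1] for c in labels),
        sum(counts[c][2] for c in labels),
    )
    return {"macro": macro, "micro": micro, "weighted": weighted}


# Hypothetical toy labels: 0 = survived, 1 = died (imbalanced, as in the cohort).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
result = averaged_metrics(y_true, y_pred)
```

On imbalanced outcomes such as in-hospital mortality, the macro average is the most sensitive to performance on the minority class, so it tends to be lower than the micro and weighted averages, as it is for every model in the table.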
Figure 2. Receiver operator characteristic (ROC) curves and confusion matrix for the logistic regression model after the randomized search
Figure 3. Receiver operator characteristic curves and confusion matrix for the random forest model after the feature selection iterative process
Figure 4. Feature importance plot (Gini importance or mean decrease impurity) for the gradient boosting classifier after the feature selection step (threshold 0.02)
Performances of the gradient boosting classifier after the feature selection iterative process.
| Model | Macro f1 | Macro precision | Macro recall | Micro f1 | Micro precision | Micro recall | Weighted f1 | Weighted precision | Weighted recall |
|---|---|---|---|---|---|---|---|---|---|
| Random forest | 0.72945 | 0.718855 | 0.777584 | 0.797596 | 0.797596 | 0.797596 | 0.810366 | 0.842462 | 0.797596 |
Performance metrics of the risk score to rule out and rule in mortality at different cut-off values in validation cohort
| Cut-off value | Sensitivity (%) | Specificity (%) | NPV (%) | PPV (%) |
|---|---|---|---|---|
| Rule out mortality: | ||||
| ≤ 3 | 41.4 | 97.2 | 32.0 | 98.1 |
| ≤ 7 | 62.9 | 90.8 | 41.1 | 96.0 |
| ≤ 9 | 75.3 | 79.7 | 47.9 | 92.9 |
| ≤ 10 | 78.9 | 75.3 | 50.4 | 91.8 |
| Rule in mortality: | ||||
| ≤ 10 | 78.5 | 75.3 | 92.5 | 47.5 |
| ≤ 11 | 74.1 | 78.9 | 91.5 | 50.0 |
| ≤ 12 | 62.2 | 85.8 | 88.9 | 55.3 |
| ≤ 13 | 52.2 | 91.2 | 87.0 | 62.7 |
| ≤ 15 | 29.9 | 95.7 | 82.8 | 66.4 |
NPV – negative predictive value, PPV – positive predictive value.
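As a rough illustration of how cut-off metrics of this kind are derived, the sketch below computes sensitivity, specificity, PPV and NPV for one cut-off of an integer risk score. It assumes that a score at or above the cut-off counts as a positive prediction (death); that convention, the function name and the toy data are assumptions made for illustration and are not taken from the study.

```python
def metrics_at_cutoff(scores, died, cutoff):
    """Sensitivity, specificity, PPV and NPV for 'score >= cutoff predicts death'.

    scores: integer risk scores; died: 1 if the patient died, else 0.
    A metric is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, died):
        predicted_death = score >= cutoff  # assumed decision rule
        if predicted_death and outcome:
            tp += 1
        elif predicted_death and not outcome:
            fp += 1
        elif not predicted_death and outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # deaths correctly flagged
        "specificity": ratio(tn, tn + fp),  # survivors correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged patients who died
        "npv": ratio(tn, tn + fn),          # cleared patients who survived
    }


# Hypothetical toy data, not patient data from the study.
scores = [2, 5, 8, 10, 11, 12, 13, 15]
died = [0, 0, 0, 0, 1, 0, 1, 1]
m = metrics_at_cutoff(scores, died, cutoff=11)
```

Raising the cut-off under this rule trades sensitivity for specificity, which is the pattern visible in the rule-in rows of the table.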
Figure 5. Receiver operator characteristic (ROC) curve of the prediction score