Elham Jamshidi¹, Amirhossein Asgary², Nader Tavakoli³, Alireza Zali¹, Soroush Setareh², Hadi Esmaily⁴, Seyed Hamid Jamaldini⁵, Amir Daaee⁶, Amirhesam Babajani⁷, Mohammad Ali Sendani Kashi⁸, Masoud Jamshidi⁹, Sahand Jamal Rahi¹⁰, Nahal Mansouri¹¹,¹².
Abstract
Rationale: Given the expanding number of COVID-19 cases and the potential for new waves of infection, there is an urgent need for early prediction of the severity of the disease in intensive care unit (ICU) patients to optimize treatment strategies.
Keywords: COVID-19; ICU—intensive care unit; SARS-CoV-2; artificial intelligence; machine learning (ML)
Year: 2022 PMID: 35098205 PMCID: PMC8792458 DOI: 10.3389/fdgth.2021.681608
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Machine learning methods with their parameters.
| Model | Parameter | Value |
|---|---|---|
| Random forest | Number of trees | 50 |
| | Min. number of samples at a leaf node | 0.1% of all samples |
| | Criterion | Gini |
| Logistic regression | C | 1.0 |
| Gradient boosting | Number of boosting stages to perform | 10 |
| | Fraction of samples used for fitting individual base learners | 0.8 |
| | Min. number of samples at a leaf node | 10% of all samples |
| | Number of iterations with no change required for early stopping | 3 |
| | Max. number of features considered when looking for a split | 3 |
| Support vector machine | C | 1.0 |
| | Kernel type | RBF |
| | Kernel coefficient | 1/number of features |
| Artificial neural network | Number of hidden layers | 3 |
| | Output space dimensionality for each hidden layer | 32, 16, 8 |
| | Activation function for each layer | tanh, tanh, tanh, sigmoid |
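Several entries in the parameter table are relative ("0.1% of all samples", "1/number of features") and only become concrete numbers once the training-set size and feature count are known. The sketch below resolves them, assuming scikit-learn's documented conventions: a fractional `min_samples_leaf` becomes `ceil(fraction * n_samples)`, and `gamma='auto'` means 1/n_features. The table does not name a library, so this mapping is an assumption, and the helper names, training-fold size, and feature count are hypothetical. It also counts the trainable weights of the 32-16-8 network, assuming a single sigmoid output unit for the binary outcome.

```python
import math

# Hypothetical helpers: resolve the table's relative settings into concrete values.
# Conventions are assumed from scikit-learn; the source does not name a library.

def min_leaf_count(fraction: float, n_samples: int) -> int:
    """Leaf-size fraction -> absolute minimum count via ceil(fraction * n_samples)."""
    return math.ceil(fraction * n_samples)

def rbf_gamma(n_features: int) -> float:
    """Kernel coefficient 1 / number of features."""
    return 1.0 / n_features

def rbf_kernel(x, y, gamma: float) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def mlp_param_count(n_inputs: int, hidden=(32, 16, 8), n_outputs: int = 1) -> int:
    """Weights + biases of a dense net n_inputs -> 32 -> 16 -> 8 -> n_outputs."""
    sizes = [n_inputs, *hidden, n_outputs]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

if __name__ == "__main__":
    n_train = 210     # hypothetical training-fold size
    n_features = 20   # hypothetical number of input features
    print("Random forest min. leaf size:", min_leaf_count(0.001, n_train))
    print("Gradient boosting min. leaf size:", min_leaf_count(0.10, n_train))
    print("SVM gamma:", rbf_gamma(n_features))
    print("ANN trainable parameters:", mlp_param_count(n_features))
```

Under this rule the random forest's 0.1% setting resolves to a minimum leaf size of 1 for any training set of up to 1,000 samples, so with 263 patients it adds no constraint beyond the usual default, whereas the gradient boosting model's 10% setting is considerably more restrictive.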
Figure 1. Investigation of model performance. Mean area under the receiver operating characteristic curve (ROC-AUC) of the random forest, logistic regression, gradient boosting, support vector machine, and artificial neural network models on the training and test sets of the cross-validation iterations. The random forest model shows superior performance on the validation sets and predicts patient outcomes with 70% sensitivity and 75% specificity.
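Both metrics in the caption can be computed directly from predicted probabilities. The plain-Python sketch below is an illustration, not the authors' code: it computes ROC-AUC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one with ties counted as one half, and it computes sensitivity and specificity at a single decision threshold. The 0.5 threshold and the toy data are assumptions.

```python
from itertools import product

def roc_auc(labels, scores):
    """ROC-AUC as P(random positive outscores random negative); ties count 0.5.

    Pairwise O(n_pos * n_neg) version, adequate for small cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp, sn in product(pos, neg))
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) at one threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

if __name__ == "__main__":
    # Hypothetical toy data: 1 = poor outcome, 0 = good outcome.
    y = [1, 1, 1, 0, 0, 0]
    p = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
    print("ROC-AUC:", roc_auc(y, p))
    print("Sensitivity, specificity:", sensitivity_specificity(y, p))
```

Sensitivity and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds; the caption does not state which threshold produced the 70% and 75% figures.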
Figure 2. Feature importance in the random forest model. Feature importance was assessed with submodular-pick local interpretable model-agnostic explanations (SP-LIME) using six submodules. Each submodule corresponds to a patient subpopulation (six subpopulations in this case) and represents the model's decision criteria for that subpopulation. Negative values (blue) indicate favorable parameters suggesting a better prognosis, and positive values (red) indicate unfavorable parameters suggesting a worse prognosis.
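Submodular pick (SP-LIME, Ribeiro et al., 2016) selects a small, non-redundant set of individual explanations that together cover the globally important features. A minimal sketch of its greedy step follows, assuming the per-patient LIME explanations are already available as a matrix `W` (rows = patients, columns = features, entries = local feature weights). Reading the caption's "six submodules" as a budget of six picked explanations is an assumption, and the function names and toy matrix are hypothetical.

```python
import math

def feature_importance(W):
    """Global importance I_j = sqrt(sum_i |W_ij|) across all explanations."""
    n_features = len(W[0])
    return [math.sqrt(sum(abs(row[j]) for row in W)) for j in range(n_features)]

def coverage(picked, W, importance):
    """Total importance of features used by at least one picked explanation."""
    return sum(imp for j, imp in enumerate(importance)
               if any(abs(W[i][j]) > 0 for i in picked))

def submodular_pick(W, budget):
    """Greedily add the explanation that maximises coverage, up to `budget` rows."""
    importance = feature_importance(W)
    picked = []
    candidates = set(range(len(W)))
    while len(picked) < budget and candidates:
        # sorted() makes ties resolve to the lowest patient index
        best = max(sorted(candidates),
                   key=lambda i: coverage(picked + [i], W, importance))
        picked.append(best)
        candidates.remove(best)
    return picked

if __name__ == "__main__":
    # Hypothetical explanations for three patients over three features.
    W = [
        [0.5, 0.0, 0.0],   # patient 0 relies on feature 0
        [0.4, 0.0, 0.0],   # patient 1 is redundant with patient 0
        [0.0, -0.3, 0.2],  # patient 2 relies on features 1 and 2
    ]
    print(submodular_pick(W, budget=2))
```

Because coverage only rewards features not yet covered, a second explanation that repeats the same feature (patient 1 here) adds nothing once patient 0 is picked; this diminishing-returns property is what makes the objective submodular and the greedy pick effective.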
Characteristics of intensive care unit patients with COVID-19 in our data.
| Variable | Group 1 (n = 105) | Group 1 with data, n (%) | Group 2 (n = 158) | Group 2 with data, n (%) | Total n / reference range | Statistic | P-value | SIG. |
|---|---|---|---|---|---|---|---|---|
| Gender | .. | 105 (100) | .. | 158 (100) | .. | 1.70 | 0.19 | |
| Male | 63 (36.4) | .. | 110 (63.6) | .. | 173 | .. | .. | |
| Female | 42 (46.7) | .. | 48 (53.3) | .. | 90 | .. | .. | |
| Comorbidity | .. | 105 (100) | .. | 158 (100) | .. | .. | .. | |
| Autoimmune disorder | 2 (33.3) | .. | 4 (66.7) | .. | 6 | 0.10 | 0.74 | |
| Cancer | 6 (42.9) | .. | 8 (57.1) | .. | 14 | 0.05 | 0.82 | |
| Cardiovascular disorder | 25 (29.1) | .. | 61 (70.9) | .. | 86 | 4.22 | 0.04 | * |
| Diabetes mellitus | 35 (38.0) | .. | 57 (62.0) | .. | 92 | 0.13 | 0.71 | |
| Thrombosis | 2 (40.0) | .. | 3 (60.0) | .. | 5 | 0.0003 | 0.99 | |
| Hypertension | 32 (34.0) | .. | 62 (66.0) | .. | 94 | 1.35 | 0.24 | |
| Hepatic failure | 2 (40.0) | .. | 3 (60.0) | .. | 5 | 0.0007 | 0.99 | |
| Neurological disorder | 8 (16.0) | .. | 42 (84.0) | .. | 50 | 11.93 | <0.001 | *** |
| Respiratory disorder | 7 (24.1) | .. | 22 (75.9) | .. | 29 | 3.01 | 0.08 | * |
| Age (years) | 58.0 (47.0–73.0) | 105 (100) | 72.5 (64.0–80.75) | 158 (100) | .. | 0.35 | <0.001 | *** |
| pH | 7.42 (7.375–7.457) | 87 (82) | 7.4 (7.33–7.441) | 129 (81) | 7.31–7.41 | 0.18 | 0.05 | * |
| pCO2 (mm Hg) | 38.4 (34.8–45.1) | 87 (82) | 40.2 (33.9–47.1) | 125 (79) | 35–40 | 0.09 | 0.66 | |
| pO2 (mm Hg) | 37.05 (25.1–57.425) | 86 (81) | 39.9 (26.975–56.65) | 124 (78) | 42–51 | 0.08 | 0.81 | |
| HCO3 (mEq/L) | 25.5 (22.825–28.575) | 86 (81) | 24.2 (21.2–27.55) | 123 (77) | 22–26 | 0.15 | 0.14 | |
| O2 saturation (%) | 72.7 (48.3–89.2) | 85 (80) | 73.5 (50.2–88.95) | 123 (77) | −2.0 to 2.0 | 0.08 | 0.87 | |
| Base excess (mEq/L) | 2.2 (−0.55 to 4.65) | 87 (82) | 0.6 (−3.1 to 3.275) | 126 (79) | .. | 0.18 | 0.06 | * |
| Total buffer base (mEq/L) | 49.1 (46.65–51.75) | 87 (82) | 47.5 (43.75–50.375) | 126 (79) | .. | 0.20 | 0.01 | * |
| Base excess in the extracellular fluid (mEq/L) | 2.2 (−0.4 to 4.9) | 87 (82) | 0.35 (−3.175 to 3.75) | 126 (79) | .. | 0.21 | 0.01 | * |
| White blood cell count (×10³/mm³) | 7.4 (5.0–11.225) | 104 (99) | 9.7 (7.1–13.45) | 155 (98) | 4.0–10.0 | 0.23 | 0.002 | ** |
| Band neutrophils (%) | 3.0 (2.0–5.5) | 23 (21) | 3.0 (2.0–6.0) | 38 (24) | .. | 0.05 | 1 | |
| Segmented neutrophils (%) | 78.0 (70.65–83.0) | 87 (82) | 82.8 (77.05–86.95) | 119 (75) | .. | 0.25 | 0.002 | ** |
| Lymphocytes (%) | 14.0 (10.0–20.225) | 86 (81) | 10.7 (6.85–15.4) | 119 (75) | .. | 0.25 | 0.002 | ** |
| Monocytes (%) | 6.0 (4.0–8.5) | 45 (42) | 5.0 (3.35–7.0) | 59 (37) | .. | 0.17 | 0.34 | |
| Basophils (%) | 0.3 (0.2–0.8) | 13 (12) | 0.1 (0.0–0.1) | 13 (8) | .. | 0.61 | 0.01 | * |
| Red blood cell count (×10⁶/mm³) | 4.335 (3.83–4.908) | 102 (97) | 4.185 (3.64–4.748) | 154 (97) | 4.2–5.4 | 0.12 | 0.28 | |
| Hemoglobin (g/dL) | 12.6 (10.95–13.8) | 103 (98) | 12.2 (10.2–13.75) | 155 (98) | 12.0–16.0 | 0.07 | 0.81 | |
| Hematocrit (%) | 37.0 (32.85–41.2) | 103 (98) | 36.6 (31.45–40.75) | 155 (98) | 36–46 | 0.06 | 0.93 | |
| Mean corpuscular volume (fL) | 85.0 (81.4–88.65) | 103 (98) | 88.0 (84.65–91.9) | 155 (98) | 77–97 | 0.24 | <0.001 | *** |
| Mean corpuscular hemoglobin (pg) | 28.7 (26.6–29.85) | 103 (98) | 29.6 (27.8–30.55) | 155 (98) | 26–32 | 0.21 | 0.006 | ** |
| Mean corpuscular hemoglobin concentration (%) | 33.1 (32.45–34.4) | 103 (98) | 33.3 (31.95–34.15) | 155 (98) | 32–36 | 0.09 | 0.59 | |
| Platelet count (×10³/mm³) | 196.0 (151.5–260.0) | 103 (98) | 179.0 (125.0–255.0) | 155 (98) | 140–440 | 0.17 | 0.04 | * |
| Red cell distribution width (%) | 13.95 (13.2–14.825) | 88 (83) | 14.6 (13.75–16.0) | 131 (82) | 11.0–16.0 | 0.23 | 0.006 | ** |
| Platelet distribution width (fL) | 12.8 (11.5–14.0) | 85 (80) | 13.2 (11.4–14.7) | 120 (75) | 10.0–17.0 | 0.13 | 0.32 | |
| Mean platelet volume (fL) | 9.7 (9.175–10.5) | 84 (80) | 10.0 (9.3–10.7) | 120 (75) | 8.5–12.5 | 0.13 | 0.30 | |
| Platelet larger cell ratio (%) | 24.4 (19.85–29.3) | 83 (79) | 26.7 (21.05–30.825) | 120 (75) | 17–45 | 0.17 | 0.07 | * |
| C-reactive protein (mg/L) | 48.0 (24.0–48.0) | 56 (53) | 48.0 (48.0–48.0) | 67 (42) | <6 | 0.23 | 0.05 | * |
| Erythrocyte sedimentation rate (mm/h) | 42.0 (27.5–68.5) | 55 (52) | 59.0 (33.75–75.25) | 56 (35) | <20 | 0.24 | 0.06 | * |
| Albumin level (g/dL) | 3.3 (3.0–3.7) | 48 (45) | 2.9 (2.6–3.2) | 71 (44) | 3.5–5.5 | 0.37 | <0.001 | *** |
| Serum calcium level (mg/dL) | 8.8 (8.3–9.2) | 72 (68) | 8.6 (7.9–9.2) | 97 (61) | 8.6–10.6 | 0.13 | 0.37 | |
| Inorganic phosphorus level (mg/dL) | 3.3 (2.45–4.4) | 59 (56) | 4.0 (2.95–5.4) | 87 (55) | 2.5–5.0 | 0.20 | 0.08 | * |
| Serum Na level (mEq/L) | 137.5 (135.0–140.0) | 102 (97) | 139.0 (135.0–142.0) | 155 (98) | 136–145 | 0.17 | 0.03 | * |
| Serum K level (mEq/L) | 4.3 (3.925–4.6) | 102 (97) | 4.4 (4.0–4.85) | 155 (98) | 3.7–5.5 | 0.11 | 0.37 | |
| Serum Mg level (mg/dL) | 2.25 (2.0–2.5) | 66 (62) | 2.4 (2.0–2.7) | 96 (60) | 1.8–2.6 | 0.14 | 0.32 | |
| Uric acid level (mg/dL) | 6.7 (4.05–9.0) | 15 (14) | 8.2 (5.95–9.95) | 31 (19) | 3.4–7.0 | 0.37 | 0.10 | |
| Fasting plasma glucose (mg/dL) | 124.0 (105.0–177.0) | 65 (61) | 154.0 (120.5–246.5) | 99 (62) | .. | 0.21 | 0.04 | * |
| Blood urea nitrogen (mg/dL) | 16.0 (11.25–22.5) | 102 (97) | 30.0 (21.0–52.5) | 156 (98) | 5.0–23.0 | 0.47 | <0.001 | *** |
| Creatinine (mg/dL) | 1.1 (0.9–1.4) | 102 (97) | 1.5 (1.2–2.2) | 156 (98) | 0.5–1.5 | 0.31 | <0.001 | *** |
| Aspartate aminotransferase (IU/L) | 40.0 (29.0–55.0) | 83 (79) | 45.0 (31.5–82.5) | 112 (70) | 5.0–40.0 | 0.17 | 0.10 | |
| Alanine aminotransferase (IU/L) | 26.0 (16.0–38.5) | 83 (79) | 25.0 (18.0–45.0) | 113 (71) | 5.0–40.0 | 0.11 | 0.54 | |
| Lactate dehydrogenase (U/L) | 710.0 (561.0–1019.0) | 57 (54) | 859.0 (623.5–1256.0) | 95 (60) | 225–500 | 0.17 | 0.20 | |
| Creatine phosphokinase (IU/L) | 233.0 (89.0–546.5) | 59 (56) | 204.0 (83.0–434.0) | 91 (57) | 24–195 | 0.08 | 0.90 | |
| Creatine phosphokinase-MB (U/L) | 30.0 (22.5–41.0) | 35 (33) | 30.0 (24.0–49.0) | 41 (25) | 5–25 | 0.10 | 0.96 | |
| Alkaline phosphatase (IU/L) | 180.0 (132.5–248.5) | 75 (71) | 193.0 (155.75–264.25) | 96 (60) | 64–306 | 0.12 | 0.46 | |
| Total bilirubin (mg/dL) | 0.7 (0.5–0.9) | 46 (43) | 0.8 (0.6–1.45) | 71 (44) | 0.2–1.2 | 0.25 | 0.04 | * |
| Direct bilirubin (mg/dL) | 0.25 (0.2–0.3) | 46 (43) | 0.3 (0.2–0.6) | 71 (44) | 0–0.4 | 0.27 | 0.02 | * |
| Prothrombin time (s) | 14.9 (14.0–16.5) | 81 (77) | 16.0 (14.3–18.0) | 121 (76) | 12.0–13.0 | 0.23 | 0.009 | ** |
| Prothrombin time activity (%) | 81.0 (72.0–89.0) | 40 (38) | 73.0 (54.5–85.5) | 51 (32) | 85–100 | 0.29 | 0.03 | * |
| International normalized ratio (index) | 1.2 (1.08–1.3) | 81 (77) | 1.3 (1.108–1.6) | 120 (75) | 1.0–1.1 | 0.29 | <0.001 | *** |
| Partial thromboplastin time (s) | 34.0 (30.0–40.0) | 81 (77) | 36.0 (31.0–45.75) | 122 (77) | 25–45 | 0.16 | 0.12 | |
| D-dimer (ng/mL) | 1513.0 (1071.0–2207.0) | 9 (8) | 1875.0 (1558.0–5269.0) | 11 (6) | .. | 0.28 | 0.68 | |
| Interleukin-6 (pg/mL) | 115.0 (76.0–147.25) | 10 (9) | 107.0 (47.0–301.0) | 17 (10) | .. | 0.37 | 0.27 | |
| Fibrin degradation product (mg/L) | 12.0 (12.0–12.0) | 1 (0) | 18.0 (18.0–18.0) | 1 (0) | .. | 1 | 1 | |
| Troponin (ng/L) | 227.5 (78.375–1400.5) | 12 (11) | 49.0 (28.2–1027.0) | 21 (13) | .. | 0.27 | 0.53 | |
| Fibrinogen (mg/dL) | 546.0 (477.5–727.0) | 7 (6) | 482.5 (226.75–615.75) | 18 (11) | 308–613 | 0.38 | 0.33 | |
| Hemoglobin A1c (%) | 7.2 (6.9–7.5) | 5 (4) | 6.4 (5.5–7.0) | 9 (5) | .. | 0.55 | 0.22 | |
SIG. Key: * = <0.1, ** = <0.01, *** = <0.001.
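For the laboratory variables, the statistic reported before each P-value always lies between 0 and 1, and the fibrin product row, which has a single patient in each group with different values, shows exactly 1 with P = 1. That pattern is consistent with a two-sample Kolmogorov-Smirnov D statistic, although the table does not name its test, so treat this reading as an assumption. A minimal sketch of D, the largest vertical gap between the two groups' empirical cumulative distribution functions (function names are hypothetical):

```python
from bisect import bisect_right

def ecdf(sorted_values, x):
    """Empirical CDF: fraction of observations <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is attained at one of the pooled observations.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

if __name__ == "__main__":
    # One observation per group, as in the fibrin product row.
    print(ks_statistic([12.0], [18.0]))
```

For real analyses, `scipy.stats.ks_2samp` returns both the statistic and a P-value; the hand-rolled version above only shows what the statistic measures.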
Figure 3. The relation between prediction horizon and performance. The x-axis denotes days from ICU admission to outcome. Bars (left vertical axis) show the distribution of days between intensive care unit admission and outcome, and the red line (right vertical axis) shows the random forest model's area under the receiver operating characteristic curve for each bin. The model performs best at predicting outcomes within a 15-day period.
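Figure 3 evaluates the model separately for patients grouped by the number of days between ICU admission and outcome. A minimal sketch of that stratified evaluation follows, assuming each patient has a predicted probability, a binary outcome label, and an integer days-to-outcome value. The 5-day bin width and the function names are hypothetical choices, not the paper's binning; bins containing only one outcome class are skipped because ROC-AUC is undefined there.

```python
from collections import defaultdict

def pairwise_auc(labels, scores):
    """ROC-AUC as P(positive score > negative score), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def auc_by_horizon(days_to_outcome, labels, scores, bin_width=5):
    """Map each (first_day, last_day) bin to the ROC-AUC of patients in that bin."""
    bins = defaultdict(list)
    for days, y, s in zip(days_to_outcome, labels, scores):
        bins[days // bin_width].append((y, s))
    result = {}
    for b in sorted(bins):
        ys = [y for y, _ in bins[b]]
        ss = [s for _, s in bins[b]]
        if 0 in ys and 1 in ys:  # AUC needs both outcome classes in the bin
            start = b * bin_width
            result[(start, start + bin_width - 1)] = pairwise_auc(ys, ss)
    return result

if __name__ == "__main__":
    # Hypothetical patients: days to outcome, outcome label, predicted risk.
    days = [1, 2, 3, 7, 8, 9, 12]
    outcome = [1, 0, 1, 1, 0, 0, 1]
    risk = [0.9, 0.2, 0.8, 0.3, 0.6, 0.1, 0.5]
    print(auc_by_horizon(days, outcome, risk))
```

Per-bin AUCs rest on fewer patients than the overall AUC, so bins with few cases (as the bar heights in the figure indicate) give noisier estimates.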
Figure 4. Investigation of the clinical impact and benefit of the model. Decision curve (top) and clinical impact curve (bottom) of the random forest model. The decision curve compares the net benefit of an intervention in three scenarios: intervention for all patients (blue dotted line), intervention for no patients (gray dotted line), and intervention for high-risk patients based on the model prediction (red line). The clinical impact curve compares the number of patients classified as high risk by the model with the number of those high-risk patients who actually had a poor outcome, across all possible high-risk thresholds of the model prediction from 0 to 1.
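A decision curve plots net benefit against the threshold probability pt at which a clinician would intervene. Under the standard decision-curve definition (Vickers and Elkin), net benefit = TP/n − FP/n × pt/(1 − pt); "intervention for all" applies the same formula with every patient treated, and "intervention for none" has a net benefit of 0. A clinical impact curve counts, at each threshold, how many patients the model flags as high risk and how many of them actually had the poor outcome. The plain-Python sketch below follows those definitions; the function names and toy data are hypothetical, and the source does not show how its curves were computed. Thresholds are restricted to [0, 1) because the odds pt/(1 − pt) diverge at 1.

```python
def _check_threshold(threshold):
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold probability must be in [0, 1)")

def net_benefit(labels, scores, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    _check_threshold(threshold)
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: intervene for every patient."""
    _check_threshold(threshold)
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

NET_BENEFIT_TREAT_NONE = 0.0  # intervening for nobody has zero net benefit by definition

def clinical_impact(labels, scores, threshold):
    """(number classified high risk, number of those who had the poor outcome)."""
    high_risk_outcomes = [y for y, s in zip(labels, scores) if s >= threshold]
    return len(high_risk_outcomes), sum(high_risk_outcomes)

if __name__ == "__main__":
    y = [1, 1, 0, 0]          # hypothetical outcomes (1 = poor outcome)
    p = [0.9, 0.6, 0.7, 0.1]  # hypothetical predicted risks
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt),
              NET_BENEFIT_TREAT_NONE, clinical_impact(y, p, pt))
```

The model is clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the red, blue, and gray lines make in the decision curve.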
Predictors with the highest predictive value selected in this study, along with the studies that report them.
| Predictor | Reported association | Reference |
|---|---|---|
| Gender | Sex-dependent differences in clinical manifestation | ( |
| Age | Older age is associated with poor COVID-19 outcomes | ( |
| Blood urea nitrogen | Carried the highest weights for prognosis | ( |
| Creatinine | Lower creatinine clearance increases mortality | ( |
| INR (international normalized ratio) | INR >1.3 significantly increases mortality | ( |
| Albumin | Carried the highest weights for prognosis | ( |
| WBC (white blood cell count) | Abnormal white blood cell count increases mortality | ( |
| Neutrophil count | Associated with poor COVID-19 outcomes | ( |
| Lymphocyte count | Lymphocytes <10% increase mortality | ( |
| RDW (red cell distribution width) | RDW >14.5% increases mortality | ( |
| MCH (mean corpuscular hemoglobin) | Abnormal MCH increases mortality | ( |
| Neurological disorders | Affect the COVID-19 outcome | ( |
| Cardiovascular disorders | Affect the COVID-19 outcome | ( |
| Respiratory disorders | Affect the COVID-19 outcome | ( |