Lorenzo Famiglini (1), Andrea Campagner (2), Anna Carobene (3), Federico Cabitza (2, 4).
Abstract
In this article, we discuss the development of prognostic machine learning (ML) models for COVID-19 progression, focusing on the task of predicting intensive care unit (ICU) admission at any point within the next 5 days. Using 6,625 complete blood count (CBC) tests from 1,004 patients, 18% of whom were admitted to the ICU, we created four ML models, following a robust development procedure designed, according to reference guidelines, to minimize the risks of bias and over-fitting. The best model, a support vector machine, had an AUC of .85, a Brier score of .14, and a standardized net benefit of .69: these scores indicate that the model performed well across a variety of prediction criteria. We also conducted an interpretability study to back up our findings, showing that the relationships between CBC features and ICU admission captured by the model are consistent with the current medical literature. This also demonstrates that CBC data and ML methods can be used to predict ICU admission of COVID-19 patients at a relatively low cost: in particular, since CBC data can be obtained quickly through routine blood exams, our models could be used in resource-constrained settings and provide health practitioners with rapid and reliable indications.
Keywords: COVID-19; Complete blood count; Machine learning; Prognostic models; eXplainable AI
Year: 2022 PMID: 35353302 PMCID: PMC8965547 DOI: 10.1007/s11517-022-02543-x
Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 3.079
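The prediction target described in the abstract can be made concrete as a labeling rule: a CBC test counts as positive if the patient is admitted to the ICU at any point in the 5 days after the test. The sketch below is a minimal illustration under our own assumptions, not the authors' pipeline: the column names `patient_id`, `test_time` and `icu_admission_time` are hypothetical, tests drawn after ICU admission are discarded, and the 5-day boundary is treated as inclusive.

```python
import pandas as pd

HORIZON = pd.Timedelta(days=5)  # prediction horizon stated in the abstract

def label_icu_within_horizon(tests: pd.DataFrame, horizon: pd.Timedelta = HORIZON) -> pd.DataFrame:
    """Label each CBC test 1 if ICU admission follows within `horizon`, else 0.

    Hypothetical columns: patient_id, test_time, icu_admission_time
    (NaT when the patient was never admitted to the ICU).
    """
    df = tests.copy()
    df["delta"] = df["icu_admission_time"] - df["test_time"]
    # Assumption: tests drawn after ICU admission are not used for prediction.
    # Comparisons with NaT are False, so never-admitted patients are kept.
    df = df[~(df["delta"] < pd.Timedelta(0))].copy()
    # Never-admitted patients (NaT delta) therefore receive label 0.
    df["label"] = ((df["delta"] >= pd.Timedelta(0)) & (df["delta"] <= horizon)).astype(int)
    return df.drop(columns="delta")

# Made-up example: patient 1 is admitted on 2020-03-10, patient 2 never is.
tests = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "test_time": pd.to_datetime(["2020-03-01", "2020-03-08", "2020-03-12", "2020-03-02"]),
    "icu_admission_time": pd.to_datetime(["2020-03-10", "2020-03-10", "2020-03-10", None]),
})
labeled = label_icu_within_horizon(tests)
print(labeled["label"].tolist())  # [0, 1, 0]: 9 days ahead, 2 days ahead, never admitted
```

The post-admission test (2020-03-12) is dropped, so three labeled rows remain.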
Fig. 1 Percentage of missing values in the dataset
Distribution of the demographic and CBC predictive features
| Feature | Unit of measure | Mean | Std | Min-max range | 25th–75th percentile | Missing (%) |
|---|---|---|---|---|---|---|
| Mean corpuscular volume (MCV) |  | 88.26 | 6.90 | [54.7, 121.5] | [85, 92.4] | 0 |
| Neutrophils count (NE) | % | 70.69 | 14.97 | [9.4, 99.6] | [61.2, 82.4] | 0 |
| Platelets (PLT) |  | 269.41 | 125.44 | [10, 1019.5] | [180, 337] | 0 |
| Red blood cells (RBC) |  | 4.21 | 0.75 | [1.75, 7.12] | [3.7, 4.73] | 0 |
| Mean platelet volume (MPV) | fL | 10.74 | 1.07 | [7.8, 15.6] | [10, 11.32] | 3.03 |
| Mean corpuscular hemoglobin (MCH) | pg/Cell | 29.09 | 2.60 | [15, 65.7] | [28.1, 30.5] | 0 |
| Monocytes count (MOT) |  | 0.67 | 0.38 | [0, 4.8] | [0.4, 0.9] | 0 |
| Basophils count (BAT) |  | 0.03 | 0.05 | [0, 0.7] | [0, 0] | 0 |
| Erythrocyte distribution width (RDW) | CV% | 14.36 | 2.36 | [6.71, 31.8] | [12.9, 15.1] | 0.16 |
| Neutrophils count (NET) |  | 6.63 | 4.58 | [0.3, 47.4] | [3.5, 8.4] | 0 |
| Eosinophils count (EO) | % | 1.83 | 2.50 | [0, 34.8] | [0.1, 2.5] | 0 |
| Hemoglobin (HGB) | g/dL | 12.20 | 2.11 | [4.8, 19.7] | [10.6, 13.7] | 0 |
| Lymphocytes count (LY) | % | 18.69 | 12.01 | [0.1, 76] | [9.4, 25.3] | 0 |
| Eosinophils count (EOT) |  | 0.15 | 0.28 | [0, 5] | [0, 0.2] | 0 |
| White blood cells (WBC) |  | 8.89 | 5.15 | [0.7, 111.1] | [5.6, 10.9] | 0 |
| Basophils count (BA) | % | 0.44 | 0.35 | [0, 3.4] | [0.2, 0.6] | 0 |
| Mean corpuscular hemoglobin concentration (MCHC) | g Hb/dL | 32.96 | 1.47 | [25.9, 58.7] | [32.1, 33.9] | 0 |
| Hematocrit (HCT) | % | 36.95 | 5.91 | [16.8, 59.7] | [32.7, 41.1] | 0 |
| Lymphocytes count (LYT) |  | 1.40 | 1.55 | [0, 82.8] | [0.8, 1.7] | 0 |
| Monocytes count (MO) | % | 8.35 | 3.93 | [0, 38.9] | [5.5, 10.7] | 0 |
| Age | Years | 64.76 | 15.47 | [0, 100] | [55, 77] | 0 |
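A per-feature summary like the one in this table (and the missing-value percentages shown in Fig. 1) can be reproduced with pandas. A minimal sketch, assuming one numeric column per feature and reading the 25th–75th percentile column as the first and third quartiles; the two example columns and their values are made up.

```python
import numpy as np
import pandas as pd

def describe_features(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, SD, min-max range, quartiles and percentage of missing values per column."""
    rows = {}
    for col in df.columns:
        s = df[col]
        rows[col] = {
            "Mean": s.mean(),                  # NaN values are skipped by default
            "Std": s.std(),                    # sample standard deviation (ddof=1)
            "Min": s.min(),
            "Max": s.max(),
            "Q25": s.quantile(0.25),           # linear interpolation between values
            "Q75": s.quantile(0.75),
            "Missing (%)": 100 * s.isna().mean(),
        }
    return pd.DataFrame.from_dict(rows, orient="index").round(2)

# Made-up values for two features; one MPV measurement is missing.
example = pd.DataFrame({
    "MPV": [10.0, 11.0, np.nan, 12.0],
    "Age": [50, 60, 70, 80],
})
summary = describe_features(example)
print(summary)
```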
Results of the Silhouette index and of the Fisher exact test comparing the proportion of ICU admissions between the two clusters, for each time window
| Time window | Fisher exact test p-value | Silhouette index |
|---|---|---|
| 0 | 0.0002 | 0.75 |
| 1 | 0.0166 | 0.72 |
| 2 | 0.0674 | 0.73 |
| 3 | 0.0205 | 0.72 |
| 4 | 0.0235 | 0.74 |
| 5 | 0.0215 | 0.73 |
| 6 | 0.0645 | 0.74 |
| 7 | 0.0582 | 0.73 |
| 8 | 0.0494 | 0.74 |
| 9 | 0.0121 | 0.73 |
| 10 | 0.1118 | 0.75 |
| 11 | 0.2558 | 0.75 |
| 12 | 0.2077 | 0.74 |
| 13 | 0.4504 | 0.75 |
| 14 | 0.2454 | 0.75 |
| 15 | 0.1161 | 0.74 |
| 16 | 0.3942 | 0.75 |
| 17 | 0.0782 | 0.74 |
| 18 | 0.0515 | 0.73 |
| 19 | 1 | 0.72 |
| 20 | 0.0864 | 0.72 |
| 21 | 0.2182 | 0.73 |
| 22 | 0.4357 | 0.71 |
| 23 | 0.2728 | 0.70 |
| 24 | 0.0872 | 0.74 |
| 25 | 0.5271 | 0.72 |
| 26 | 0.5294 | 0.73 |
| 27 | 0.3016 | 0.75 |
| 28 | 0.6631 | 0.71 |
| 29 | NA | 0.74 |
| 30 | NA | 0.74 |
| 31 | NA | 0.76 |
Fig. 2 Scatter plot of the lymphocyte and neutrophil counts of the patients in the severity and normal clusters (see Section 2.2)
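The per-window analysis summarized in the table and in Fig. 2 can be sketched as follows. This rests on our own assumptions, which this excerpt does not confirm: the two clusters come from k-means on the lymphocyte and neutrophil counts of one time window, the silhouette index is computed on the same two features, and the Fisher exact test compares the share of ICU-admitted patients between the clusters. Synthetic data stand in for the real CBC values.

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_and_test(X: np.ndarray, icu: np.ndarray, seed: int = 0):
    """Split patients into two clusters and test whether their ICU rates differ.

    X   : (n, 2) array of lymphocyte and neutrophil counts for one time window
    icu : (n,) binary array, 1 if the patient was admitted to the ICU
    """
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X)
    silhouette = silhouette_score(X, labels)
    # 2x2 contingency table: rows = clusters, columns = (ICU, no ICU).
    table = [[int(np.sum((labels == k) & (icu == 1))),
              int(np.sum((labels == k) & (icu == 0)))] for k in (0, 1)]
    _, p_value = fisher_exact(table)
    return silhouette, p_value, labels

rng = np.random.default_rng(0)
# Synthetic "normal" cluster: higher lymphocytes, lower neutrophils.
normal = rng.normal([1.8, 4.0], [0.4, 1.0], size=(80, 2))
# Synthetic "severity" cluster: low lymphocytes, high neutrophils.
severe = rng.normal([0.7, 10.0], [0.3, 2.0], size=(20, 2))
X = np.vstack([normal, severe])
icu = np.concatenate([rng.binomial(1, 0.1, 80), rng.binomial(1, 0.6, 20)])

sil, p, cluster_labels = cluster_and_test(X, icu)
print(f"silhouette={sil:.2f}, Fisher p={p:.4f}")
```

Repeating the call for each time window yields one row of the table per window.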
Range of evaluated hyper-parameters for the ML models. The values selected as a result of the nested cross-validation procedure are highlighted in bold
| Algorithm | Hyper-parameter | Value range |
|---|---|---|
| DT | criterion | gini, |
| | max_depth | [3, 6] ( |
| | min_samples_split | [2, 40] ( |
| | min_samples_leaf | [1, 20] ( |
| MLP | learning_rate_init | [1e-5, 5e-2] ( |
| | learning_rate | constant, |
| | solver | adam, |
| | max_iter | [5, 100] ( |
| | first_layer | [10, 150] ( |
| | second_layer | [5, 100] ( |
| | alpha | [1e-5, 5e-2] ( |
| XGB | n_estimators | [5, 200] ( |
| | max_depth | [2, 30] ( |
| | reg_alpha | [0, 5] ( |
| | reg_lambda | [0, 5] ( |
| | gamma | [0, 5] ( |
| | learning_rate | [0.005, 0.5] ( |
| SVM | C | [1e-10, 1e10] ( |
| | kernel | sigmoid, |
Fig. 3 Results of the nested cross-validation (CV) for all the evaluated models
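Nested cross-validation keeps hyper-parameter selection (inner loop) separate from performance estimation (outer loop), so the reported scores are not inflated by tuning. The sketch below uses scikit-learn with an SVM and a log-uniform search over C that mirrors the range in the hyper-parameter table. The use of random search, the fold counts, the number of iterations, the AUC scoring and any kernel other than `sigmoid` are illustrative assumptions, not the article's settings; synthetic data replace the CBC features.

```python
import warnings

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# SVC iterations are capped below so extreme C values stay fast; silence the resulting warnings.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic, imbalanced stand-in for the CBC dataset (about 18% positives).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.82], random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(max_iter=20000))
search_space = {
    "svc__C": loguniform(1e-10, 1e10),    # C range from the table
    "svc__kernel": ["sigmoid", "rbf"],    # "rbf" is an assumed extra candidate
}
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: the search is re-run on every outer training split.
search = RandomizedSearchCV(pipe, search_space, n_iter=10, cv=inner,
                            scoring="roc_auc", random_state=0)
# Outer loop: each held-out fold scores a model tuned without seeing it.
outer_auc = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print(f"nested CV AUC: {outer_auc.mean():.2f} +/- {outer_auc.std():.2f}")
```

The same pattern applies to the DT, MLP and XGB models by swapping the estimator and its search space.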
Results of the evaluated ML models
| Model | Accuracy | Sensitivity | Specificity | Precision | AUC | F2 | Brier | sNB |
|---|---|---|---|---|---|---|---|---|
| MLP | 0.79 | 0.29 |  | 0.40 | 0.71 | 0.31 | 0.204 | 0.50 |
| DT | 0.76 |  | 0.78 | 0.41 | 0.76 | 0.59 | 0.171 | 0.60 |
| SVM |  | 0.57 | 0.88 |  | 0.85 | 0.56 | 0.14 | 0.69 |
| XGB | 0.80 | 0.65 | 0.83 | 0.46 | 0.81 |  | 0.145 | 0.50 |
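The threshold-based metrics in the table (accuracy, sensitivity, specificity, precision, F2) all derive from the confusion matrix at a decision threshold, while AUC and the Brier score use the predicted probabilities directly. A minimal sketch, assuming a threshold of 0.5 and computing the standardized net benefit (sNB) as net benefit divided by prevalence at that same threshold probability; the article's actual threshold is not stated in this excerpt, and the six example predictions are made up.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, confusion_matrix, fbeta_score, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the metrics of the results table at an assumed decision threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    n = len(y_true)
    prevalence = y_true.mean()
    # Net benefit: true positives minus false positives weighted by the threshold odds.
    net_benefit = tp / n - fp / n * threshold / (1 - threshold)
    return {
        "Accuracy": (tp + tn) / n,
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        "Precision": tp / (tp + fp),
        "AUC": roc_auc_score(y_true, y_prob),
        "F2": fbeta_score(y_true, y_pred, beta=2),  # weights recall more than precision
        "Brier": brier_score_loss(y_true, y_prob),  # mean squared error of probabilities
        "sNB": net_benefit / prevalence,            # net benefit scaled by prevalence
    }

metrics = evaluate([0, 0, 0, 0, 1, 1], [0.1, 0.2, 0.6, 0.3, 0.8, 0.4])
print(metrics)
```

With these inputs, one true positive and one false positive cancel out at the 0.5 threshold, so sNB is 0, while the Brier score is 0.9 / 6 = 0.15.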
Fig. 4 ROC curves for the evaluated ML models
Fig. 5 Calibration curves for the evaluated ML models
Fig. 6 Decision curves for the evaluated ML models
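A decision curve plots the net benefit of a model against the threshold probability p_t, alongside the reference strategies "treat all" and "treat none" (whose net benefit is always 0). Dividing by prevalence gives the standardized net benefit. A minimal NumPy sketch; the threshold grid and example predictions are made up for illustration.

```python
import numpy as np

def decision_curve(y_true, y_prob, thresholds):
    """Standardized net benefit of the model and of 'treat all' at each threshold."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    prevalence = y_true.mean()
    model_snb, all_snb = [], []
    for pt in thresholds:
        odds = pt / (1 - pt)  # harm-to-benefit ratio implied by threshold pt
        pred = y_prob >= pt
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        model_snb.append((tp / n - fp / n * odds) / prevalence)
        # "Treat all": every patient is classified as positive.
        all_snb.append((prevalence - (1 - prevalence) * odds) / prevalence)
    return np.array(model_snb), np.array(all_snb)

thresholds = [0.25, 0.5, 0.75]
model_curve, treat_all_curve = decision_curve(
    [0, 0, 0, 0, 1, 1], [0.1, 0.2, 0.6, 0.3, 0.8, 0.4], thresholds)
print(model_curve, treat_all_curve)
```

A model is useful at a given p_t when its curve lies above both "treat all" and the zero line of "treat none".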
Fig. 7 Shapley value–based interpretability analysis of the developed SVM model. For the sex variable, 1 denotes a male patient and 0 a female patient.
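The Shapley value of a feature averages its marginal contribution to a prediction over all coalitions of the other features. With only a few features it can be computed exactly. The sketch below replaces "absent" features with a single background reference vector (for example, training means). This is a simplification of what libraries such as SHAP do by averaging over a background sample. The toy model and feature values are hypothetical and stand in for the article's SVM.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)

    def value(coalition):
        z = baseline.copy()
        z[list(coalition)] = x[list(coalition)]  # features in the coalition take x's values
        return f(z)

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Hypothetical model: one additive feature plus an interaction between the other two.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]

phi = exact_shapley(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print(phi)  # [2. 3. 3.]: the interaction term (6) is split equally
```

The values sum to f(x) - f(baseline) = 8, which is the efficiency property that plots like Fig. 7 rely on when attributing a prediction to individual features.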