Jean-Baptiste Excoffier1, Noémie Salaün-Penquer2, Matthieu Ortala2, Mathilde Raphaël-Rousseau3, Christos Chouaid4,5, Camille Jung6.
Abstract
The COVID-19 pandemic rapidly put heavy pressure on hospital centers, especially on intensive care units. There was an urgent need for tools to understand the typology of COVID-19 patients and to identify those most at risk of aggravation during their hospital stay. Data included more than 400 patients hospitalized due to COVID-19 during the first wave in France (spring of 2020), with clinical and biological features. Machine learning and explainability methods were used to construct an aggravation risk score and to analyze feature effects. The model had a robust AUC ROC score of 81%. The most important features were age, chest CT Severity and biological variables such as CRP, O2 Saturation and Eosinophils. Several features showed strong non-linear effects, especially CT Severity. Interaction effects were also detected between age and gender as well as between age and Eosinophils. Clustering techniques stratified inpatients into three main subgroups: low aggravation risk with no risk factor, medium risk due to high age, and high risk mainly due to high CT Severity and abnormal biological values. This in-depth analysis determined significantly distinct typologies of inpatients, which facilitated the definition of medical protocols to deliver the most appropriate care for each profile.
Graphical Abstract: The graphical abstract represents the main methods used and the results found, with a focus on feature impact on aggravation risk and on the identified groups of patients.
Keywords: COVID-19; Explainable artificial intelligence; Instance selection; Machine learning
Year: 2022 PMID: 35426076 PMCID: PMC9009979 DOI: 10.1007/s11517-022-02540-0
Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 3.079
Advantages of using influence values over initial values for clustering
| | Initial values | Influence values |
|---|---|---|
| Unit | Several different units (for example age in years, oxygen saturation in percentages or a comorbidity being a binary indicator). | Single unit, the same as the prediction task (a percentage in our study since the prediction is the probability of aggravation). |
| Threshold | Differences are linear (it is the same between an age of 55 and 60 as between 60 and 65). | Take into account non-linearity and sudden thresholds (for example a sudden increase of the aggravation risk starting at the age of 60, meaning that a difference between an age of 60 and 65 will be much larger than between 55 and 60 in terms of influences). |
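
The threshold argument in the table can be illustrated with a minimal sketch. The function `age_influence` below is a hypothetical flat-then-rising curve, chosen only to mimic a risk increase starting at 60; it is not the influence curve estimated in the study, and the slope of 3 points per year is an arbitrary assumption.

```python
# Hypothetical influence curve for age, in percentage points of aggravation risk.
# The shape (flat up to 60, then rising) is an assumption made for illustration;
# it is not the curve estimated in the study.
def age_influence(age):
    return 0.0 if age <= 60 else 3.0 * (age - 60)

def raw_gap(a, b):
    # Distance between two patients on the initial feature value (years).
    return abs(b - a)

def influence_gap(a, b):
    # Distance between the same two patients on the influence scale.
    return abs(age_influence(b) - age_influence(a))

gaps = {
    "55 vs 60": (raw_gap(55, 60), influence_gap(55, 60)),
    "60 vs 65": (raw_gap(60, 65), influence_gap(60, 65)),
}
for pair, (raw, infl) in gaps.items():
    print(f"{pair}: raw gap = {raw} years, influence gap = {infl} points")
```

On initial values both pairs are equally far apart, whereas on the (assumed) influence scale only the pair that crosses into the rising part of the curve is separated, which is the behavior the table describes.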
Population characteristics
| | Feature | Total | Non-aggravation | Aggravation | p-value |
|---|---|---|---|---|---|
| Quantitative features | Nb patients | 409 | 233 (57.0) | 176 (43.0) | |
| | Age (years) | 61.58 (± 23.9) | 54.77 (± 25.9) | 70.59 (± 17.2) | < 0.01 ** |
| | Platelets (G/L) | 227.48 (± 93.9) | 238.9 (± 101.6) | 212.37 (± 80.5) | 0.115 |
| | Eosinophils (G/L) | 0.08 (± 0.2) | 0.11 (± 0.3) | 0.05 (± 0.2) | 0.4075 |
| | Neutrophils (G/L) | 11.05 (± 17.3) | 9.8 (± 14.9) | 12.69 (± 20.0) | 1.0 |
| | CRP (mg/L) | 85.26 (± 84.2) | 61.98 (± 68.3) | 116.07 (± 93.2) | < 0.01 ** |
| | O2 Saturation (%) | 95.02 (± 4.6) | 96.28 (± 2.8) | 93.35 (± 5.8) | < 0.01 ** |
| | Serum Creatinine ( | 95.13 (± 119.6) | 82.3 (± 62.9) | 112.12 (± 166.1) | 0.31 |
| | White Globules (G/L) | 20.44 (± 227.1) | 9.05 (± 5.4) | 35.52 (± 346.1) | 1.0 |
| | Blood Sugar (mmol/L) | 7.23 (± 3.0) | 6.72 (± 2.1) | 7.91 (± 3.7) | < 0.01 ** |
| | Systolic Blood Pressure (mmHg) | 132.71 (± 24.1) | 131.8 (± 24.2) | 133.92 (± 24.1) | 1.0 |
| | Hemoglobin (mmol/L) | 12.72 (± 2.1) | 12.6 (± 2.1) | 12.87 (± 2.0) | 1.0 |
| | Arterial pulse (bpm) | 96.31 (± 23.6) | 98.35 (± 27.1) | 93.6 (± 17.6) | 0.8175 |
| Qualitative features | Gender | 190 (46.5) | 126 (54.1) | 64 (36.4) | < 0.05 * |
| | Anosmia Ageusia | 362 (88.5) | 196 (84.1) | 166 (94.3) | 0.0575 |
| | Cancer | 28 (6.8) | 15 (6.4) | 13 (7.4) | 1.0 |
| | Cardiovascular | 141 (34.5) | 75 (32.2) | 66 (37.5) | 1.0 |
| | Overweight/Obesity | 76 (18.6) | 39 (16.7) | 37 (21.0) | 1.0 |
| | Insulin Intake | 40 (9.8) | 15 (6.4) | 25 (14.2) | 0.3575 |
| | Type 2 Diabetes | 77 (18.8) | 33 (14.2) | 44 (25.0) | 0.2025 |
| | CT Severity: 0 | 180 (44.0) | 123 (52.8) | 57 (32.4) | < 0.01 ** |
| | CT Severity: 1 | 34 (8.3) | 21 (9.0) | 13 (7.4) | 1.0 |
| | CT Severity: 2 | 69 (16.9) | 47 (20.2) | 22 (12.5) | 1.0 |
| | CT Severity: 3 | 66 (16.1) | 34 (14.6) | 32 (18.2) | 1.0 |
| | CT Severity: 4 | 53 (13.0) | 8 (3.4) | 45 (25.6) | < 0.01 ** |
| | CT Severity: 5 | 7 (1.7) | 0 (0.0) | 7 (4.0) | 0.18 |
Results are presented as mean and standard deviation for quantitative features, and as number and proportion (%) for qualitative features. All qualitative features are binary indicators except CT Severity, which was therefore split into six binary indicators, one per modality.
Gender is coded 0 for men and 1 for women.
As a visual aid, a single star (*) denotes a p-value strictly below 0.05 and two stars (**) denote a p-value strictly below 0.01.
Confusion matrix
| Ground truth (rows) vs. prediction (columns) | Non-aggravation | Aggravation |
|---|---|---|
| Non-aggravation | 186 | 47 |
| Aggravation | 55 | 121 |
Performance measures per ground-truth class
| Ground truth | Sensitivity | Precision | F1-score |
|---|---|---|---|
| Non-aggravation | 79.8 | 77.2 | 78.5 |
| Aggravation | 68.8 | 72.0 | 70.3 |
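
The per-class measures above follow directly from the confusion matrix. The sketch below recomputes them, assuming rows are ground truth and columns are predictions (the orientation that matches the reported sensitivities); the helper name `per_class_metrics` is illustrative and not from the paper.

```python
# Confusion matrix from the table above.
# Assumption: outer keys are ground-truth classes, inner keys are predicted classes.
confusion = {
    "Non-aggravation": {"Non-aggravation": 186, "Aggravation": 47},
    "Aggravation": {"Non-aggravation": 55, "Aggravation": 121},
}

def per_class_metrics(confusion, label):
    """Return (sensitivity, precision, F1) for one ground-truth class."""
    true_positives = confusion[label][label]
    actual = sum(confusion[label].values())                   # row total
    predicted = sum(row[label] for row in confusion.values())  # column total
    sensitivity = true_positives / actual
    precision = true_positives / predicted
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, precision, f1

metrics = {label: per_class_metrics(confusion, label) for label in confusion}
for label, (sens, prec, f1) in metrics.items():
    print(f"{label}: sensitivity={100 * sens:.1f} "
          f"precision={100 * prec:.1f} F1={100 * f1:.1f}")
```

Rounded to one decimal, the values agree with the table, which also confirms that the measures are expressed as percentages.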
Fig. 1 ROC and PR curves
Fig. 2 Feature importance ranking and distribution of influences linked to feature initial values (best viewed in color). The feature importance ranking (left graph) is established by averaging the absolute influences over all patients. In the distribution of influences (right graph), each dot represents a particular patient, with feature names on the y-axis and influence values on the x-axis. A protective factor is indicated by a negative influence, since it decreases the probability of aggravation, while a risk factor is indicated by a positive influence. The initial value is represented by the color of the dot, through the colormap shown on the far right of the graph. For comorbidity indicators, 0 indicates absence and 1 presence, while gender is coded 0 for men and 1 for women
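
The ranking rule described in this caption (mean of absolute influences over patients) can be sketched as follows. The influence values are hypothetical numbers for three patients, made up for illustration and not taken from the study.

```python
# Hypothetical influence values (percentage points) for three patients.
# These numbers are invented for illustration; they are not study results.
influences = {
    "Age": [12.0, -8.0, 15.0],
    "CT Severity": [20.0, -5.0, -4.0],
    "CRP": [3.0, 2.0, -1.0],
}

def mean_absolute_influence(values):
    # Importance of a feature: average magnitude of its influence over patients.
    return sum(abs(v) for v in values) / len(values)

importance = {name: mean_absolute_influence(vals) for name, vals in influences.items()}
ranking = sorted(importance, key=importance.get, reverse=True)

def role(influence):
    # Sign convention from the caption: negative is protective, positive is a risk factor.
    if influence < 0:
        return "protective"
    return "risk" if influence > 0 else "neutral"

for name in ranking:
    roles = [role(v) for v in influences[name]]
    print(f"{name}: importance={importance[name]:.2f}, per-patient roles={roles}")
```

Taking absolute values before averaging keeps a feature that is protective for some patients and a risk factor for others from cancelling itself out in the ranking.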
Fig. 3 Univariate graphs for effect of Age, CT Severity, CRP and O2 Saturation. Each dot represents a patient, with the feature value on the x-axis and the associated influence on the y-axis
Fig. 4 Bivariate graphs for interaction effect of Age with Gender and CT Severity (best viewed in color). Each dot represents a particular patient. Color and size of the dot give information about the two feature initial values, respectively color for the feature located on the x-axis (Age) and size for the feature on the y-axis. The colormap for the x-axis feature is the same used in Fig. 2 with blue indicating a young age and red an old age
Fig. 5 Bivariate graphs for interaction effect of Age with CRP, O2 Saturation, Eosinophils and Platelets (best viewed in color). Reading information is the same as Fig. 4
Cluster characteristics
| | Feature | Cluster 1 | Cluster 2 | Cluster 3 | p-value |
|---|---|---|---|---|---|
| Cluster | Nb patients | 129 (31.5) | 172 (42.1) | 108 (26.4) | |
| | Nb Aggravations | 23 (17.8) | 73 (42.4) | 80 (74.1) | |
| | Model prediction of aggravation (%) | 20.64 (14.3) | 42.42 (17.7) | 75.22 (13.5) | |
| Quantitative features | Age (years) | 34.2 (± 17.0) | 78.49 (± 11.7) | 67.34 (± 14.5) | < 0.01 ** |
| | Platelets (G/L) | 243.42 (± 106.9) | 220.13 (± 85.3) | 220.15 (± 89.0) | 1.0 |
| | Eosinophils (G/L) | 0.08 (± 0.2) | 0.12 (± 0.3) | 0.03 (± 0.1) | < 0.01 ** |
| | Neutrophils (G/L) | 8.48 (± 12.2) | 10.91 (± 16.8) | 14.31 (± 22.3) | < 0.01 ** |
| | CRP (mg/L) | 59.2 (± 64.5) | 54.99 (± 51.0) | 164.59 (± 96.3) | < 0.01 ** |
| | O2 Saturation (%) | 96.72 (± 2.5) | 95.25 (± 5.1) | 92.63 (± 4.6) | < 0.01 ** |
| | Serum Creatinine ( | 75.93 (± 66.1) | 96.92 (± 78.1) | 115.23 (± 196.7) | < 0.01 ** |
| | White Globules (G/L) | 9.28 (± 5.6) | 35.67 (± 350.1) | 9.51 (± 4.7) | 1.0 |
| | Blood Sugar (mmol/L) | 6.32 (± 1.9) | 6.97 (± 2.3) | 8.73 (± 4.2) | < 0.01 ** |
| | Systolic Blood Pressure (mmHg) | 126.25 (± 19.0) | 137.22 (± 26.2) | 133.23 (± 24.6) | < 0.05 * |
| | Hemoglobin (mmol/L) | 12.71 (± 2.3) | 12.59 (± 1.9) | 12.93 (± 2.0) | 1.0 |
| | Arterial pulse (bpm) | 108.02 (± 27.9) | 88.19 (± 19.8) | 95.25 (± 17.0) | < 0.01 ** |
| Qualitative features | Gender | 69 (53.5) | 81 (47.1) | 40 (37.0) | 0.9975 |
| | Anosmia Ageusia | 101 (78.3) | 161 (93.6) | 100 (92.6) | < 0.01 ** |
| | Cancer | 4 (3.1) | 18 (10.5) | 6 (5.6) | 0.8975 |
| | Cardiovascular | 17 (13.2) | 78 (45.3) | 46 (42.6) | < 0.01 ** |
| | Overweight/Obesity | 27 (20.9) | 24 (14.0) | 25 (23.1) | 1.0 |
| | Insulin Intake | 7 (5.4) | 19 (11.0) | 14 (13.0) | 1.0 |
| | Type 2 Diabetes | 7 (5.4) | 40 (23.3) | 30 (27.8) | < 0.01 ** |
| | CT Severity: 0 | 73 (56.6) | 97 (56.4) | 10 (9.3) | < 0.01 ** |
| | CT Severity: 1 | 10 (7.8) | 22 (12.8) | 2 (1.9) | 0.1325 |
| | CT Severity: 2 | 25 (19.4) | 39 (22.7) | 5 (4.6) | < 0.01 ** |
| | CT Severity: 3 | 19 (14.7) | 13 (7.6) | 34 (31.5) | < 0.01 ** |
| | CT Severity: 4 | 2 (1.6) | 1 (0.6) | 50 (46.3) | < 0.01 ** |
| | CT Severity: 5 | 0 (0.0) | 0 (0.0) | 7 (6.5) | < 0.01 ** |
Results are presented as number and proportion (%) for Nb patients, Nb Aggravations and qualitative features, and as mean and standard deviation for Model prediction of aggravation (%) and quantitative features.
Gender is coded 0 for men and 1 for women.
As a visual aid, a single star (*) denotes a p-value strictly below 0.05 and two stars (**) denote a p-value strictly below 0.01.
Fig. 6 Influences of patients corresponding to the medoids of the three identified clusters. Feature names are represented by their initials for clusters 2 and 3. Initial feature values are indicated after the hyphen and rounded so they all appear as integers
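
A medoid is the cluster member with the smallest total distance to the other members, so it is always a real patient, unlike a centroid. The sketch below picks a medoid from hypothetical two-feature influence vectors; the Euclidean distance and the sample points are assumptions, since the distance used in the study is not stated here.

```python
import math

# Hypothetical influence vectors (two features) for four patients in one cluster.
# Values are invented for illustration; the Euclidean distance is an assumption.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (6.0, 5.0)]

def total_distance(index, points):
    # Sum of distances from one member to every other member of the cluster.
    return sum(math.dist(points[index], other)
               for j, other in enumerate(points) if j != index)

def medoid_index(points):
    # The medoid minimizes the total distance to the rest of the cluster.
    return min(range(len(points)), key=lambda i: total_distance(i, points))

medoid = cluster[medoid_index(cluster)]
print("medoid:", medoid)
```

Because the medoid is an actual patient, its influences and initial feature values can be displayed directly as a representative profile of the cluster.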