Alexander Aushev, Vicent Ribas Ripoll, Alfredo Vellido, Federico Aletti, Bernardo Bollen Pinto, Antoine Herpain, Emiel Hendrik Post, Eduardo Romay Medina, Ricard Ferrer, Giuseppe Baselli, Karim Bendjelid.
Abstract
Circulatory shock is a life-threatening disease that accounts for around one-third of all admissions to intensive care units (ICU). It requires immediate treatment, which is why tools for planning therapeutic interventions are needed to manage shock in the critical care environment. In this study, the original database of the ShockOmics European project is used to extract attributes capable of predicting mortality due to shock in the ICU. Missing data imputation techniques and machine learning models were used, followed by feature selection from different data subsets. The selected features were then used to build Bayesian Networks, revealing causal relationships between features and ICU outcome. The main result is a subset of predictive features that includes well-known indicators such as the SOFA and APACHE II scores, but also less commonly considered ones related to cardiovascular function assessed through echocardiography or to shock treatment with pressors. Importantly, certain selected features are shown to be most predictive at certain time-steps. This means that, as shock progresses, different attributes could be prioritized. Clinical traits obtained 24 h after ICU admission are shown to accurately predict cardiogenic and septic shock mortality, suggesting that relevant life-saving decisions could be made shortly after ICU admission.
Year: 2018 PMID: 30457997 PMCID: PMC6245679 DOI: 10.1371/journal.pone.0199089
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Clinical data summary.
| Parameter | Cardiogenic shock | Septic shock | p-values |
|---|---|---|---|
| Number of patients | 25 | 50 | - |
| Gender (Males:Females) | 20:5 | 34:16 | - |
| Outcome (Alive:Dead) | 19:6 | 38:12 | - |
| Body mass index | 26.8 ± 5.93 | 25.22 ± 5.54 | 0.2591 |
| Total days in ICU | 6.6 ± 5.42 | 7.56 ± 6.54 | 0.5289 |
| SOFA T1 | 10.72 ± 4.27 | 11.50 ± 3.7 | 0.4165 |
| SOFA T2 | 8.33 ± 4.92 | 8.77 ± 4.3 | 0.699 |
| SOFA T3 | 5.87 ± 3.68 | 6.75 ± 4.7 | 0.5334 |
| APACHE II T1 | 24.12 ± 9.79 | 23.16 ± 7.52 | 0.6396 |
| APACHE II T2 | 18.42 ± 9.41 | 16.36 ± 8.45 | 0.3530 |
| APACHE II T3 | 12.8 ± 5.35 | 14.83 ± 5.93 | 0.2727 |
| Lactate levels T1 | 6.04 ± 4.3 | 4.84 ± 2.58 | 0.1431 |
| Lactate levels T2 | 1.52 ± 0.91 | 2.1 ± 1.52 | 0.1126 |
| Lactate levels T3 | 1.25 ± 0.5 | 1.43 ± 0.58 | 0.3309 |
Summary of patient populations with cardiogenic and septic shock. Numerical parameters of the two populations (mean ± std) are compared using Welch's t-test, where the null hypothesis is that a parameter has the same mean value in both populations.
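As an illustration of the comparison described in this caption, the following is a minimal sketch of Welch's t-test computed from summary statistics alone. The function names (`welch_from_stats`, `t_pdf`, `two_sided_p`) are hypothetical, the p-value is obtained by numerically integrating the Student's t density rather than by a statistics library, and the example call assumes that the body mass index was recorded for all 25 and 50 patients. The result therefore need not match the tabulated p-value exactly, since the table reports rounded statistics and some features have missing observations.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)

def welch_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Return (t, df, p) for Welch's unequal-variance t-test."""
    v1, v2 = std1 ** 2 / n1, std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)

# Body mass index row: cardiogenic (n = 25) vs septic (n = 50), sample sizes assumed.
t, df, p = welch_from_stats(26.8, 5.93, 25, 25.22, 5.54, 50)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")
```

Unlike Student's t-test, this variant does not pool the two variances, which suits groups of unequal size such as the 25 and 50 patients compared here.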
Fig 1. Graphical depiction map of missing values in the ShockOmics dataset.
The rows and the columns of the map correspond to observations and features, respectively. Black cells represent missing values and white cells represent values that are present in the ShockOmics dataset.
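A map of this kind can be derived from a boolean mask over the data matrix. The sketch below is only an illustration on a made-up toy matrix, with `None` standing for a missing value; the helper names are hypothetical and the text rendering (`#` for missing, `.` for present) replaces the black and white cells of the figure.

```python
def missing_mask(rows):
    """True where a value is missing (None), False where it is present."""
    return [[value is None for value in row] for row in rows]

def render(mask):
    """Text rendering: '#' for missing (black), '.' for present (white)."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in mask)

def observed_per_feature(mask):
    """Number of present values in each column (feature)."""
    return [sum(not cell for cell in col) for col in zip(*mask)]

# Toy matrix: rows are observations, columns are features.
toy = [
    [6.0, 1.5, None],
    [4.8, None, 1.4],
    [5.1, 2.0, None],
]
mask = missing_mask(toy)
print(render(mask))
print(observed_per_feature(mask))
```

The per-column counts correspond to the kind of figure reported as the number of available observations for each feature.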
Initial feature set description.
| Name of the feature | Observations | Type |
|---|---|---|
| Which type of shock | 75 | categorical (4 val) |
| Lactate levels (mmol/L) T1 | | numerical (cont) |
| Lactate levels (mmol/L) T2 | | numerical (cont) |
| Lactate levels (mmol/L) T3 | | numerical (cont) |
| Mean arterial pressure (mmHg) T1 | 75 | numerical (cont) |
| Mean arterial pressure (mmHg) T2 | | numerical (cont) |
| Mean arterial pressure (mmHg) T3 | | numerical (cont) |
| SOFA T1 | 75 | numerical (cont) |
| SOFA T2 | | numerical (cont) |
| SOFA T3 | | numerical (cont) |
| APACHE II T1 | 75 | numerical (cont) |
| APACHE II T2 | 71 | numerical (cont) |
| APACHE II T3 | 44 | numerical (cont) |
| Result in ICU | 75 | categorical (2 val) |
The columns of the table correspond to the name of the feature, the number of available observations and the type of the feature. The initial feature set (IFS) consists of 13 features and the target (Result in ICU). Features with missing observations are highlighted in bold in the Observations column.
Fig 2. Graphical representation of the experimental pipeline.
The process of feature selection (FS) and classification consists of the following steps: 1) create 100 random splits of the dataset, each with 75% of the data for training and 25% for testing; 2) for each split, impute both sets separately, using the imputed training set to impute the test set; 3) apply the FS technique to the imputed training set, varying the size of the selected feature set (from a minimum of 2 to a maximum of 60), then keep the selected features in both sets, creating new pairs of training and test sets; 4) finally, using the sets with different numbers of features, train and evaluate a machine learning (ML) model and choose the one with the highest AUC; record which and how many features were used for training this model, and its performance measures. These steps are repeated for all 100 random splits for each dataset and FS technique.
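The four steps of this pipeline can be sketched with the standard library alone. Everything specific in the sketch is an assumption made for illustration and is not the authors' implementation: mean imputation stands in for the imputation techniques, a standardized class-mean difference stands in for the FS methods, a nearest-centroid rule stands in for the ML model, AUC is taken as (sensitivity + specificity) / 2 for hard predictions, and the data are synthetic. All function names are hypothetical.

```python
import random
import statistics

def split_indices(n, test_fraction, rng):
    """Step 1: one random split, returned as (train_idx, test_idx)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def fit_means(rows):
    """Per-column means of the observed (non-None) values."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(statistics.fmean(seen) if seen else 0.0)
    return means

def impute(rows, means):
    """Step 2: fill None entries with the supplied column means."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def rank_features(rows, labels):
    """Step 3: order feature indices by standardized class-mean difference."""
    scores = []
    for j, col in enumerate(zip(*rows)):
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        if pos and neg:
            gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
            scores.append((gap / (statistics.pstdev(col) or 1.0), j))
        else:
            scores.append((0.0, j))
    return [j for _, j in sorted(scores, reverse=True)]

def centroid_auc(x_train, y_train, x_test, y_test):
    """Step 4: nearest-centroid classifier scored as (sens + spec) / 2."""
    def centroid(label):
        rows = [x for x, y in zip(x_train, y_train) if y == label]
        return [statistics.fmean(col) for col in zip(*rows)]
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    c0, c1 = centroid(0), centroid(1)
    pred = [1 if dist(x, c1) < dist(x, c0) else 0 for x in x_test]
    pos = [p for p, y in zip(pred, y_test) if y == 1]
    neg = [p for p, y in zip(pred, y_test) if y == 0]
    sens = sum(pos) / len(pos) if pos else 0.0
    spec = (len(neg) - sum(neg)) / len(neg) if neg else 0.0
    return (sens + spec) / 2

def run_split(X, y, sizes, rng):
    """Steps 1-4 for a single split; returns (best_auc, selected_feature_indices)."""
    train, test = split_indices(len(X), 0.25, rng)
    means = fit_means([X[i] for i in train])  # statistics come from training rows only
    x_train = impute([X[i] for i in train], means)
    x_test = impute([X[i] for i in test], means)
    y_train, y_test = [y[i] for i in train], [y[i] for i in test]
    order = rank_features(x_train, y_train)
    best = (-1.0, [])
    for k in sizes:  # vary the size of the selected feature set
        keep = order[:k]
        auc = centroid_auc([[r[j] for j in keep] for r in x_train], y_train,
                           [[r[j] for j in keep] for r in x_test], y_test)
        if auc > best[0]:
            best = (auc, keep)
    return best

# Synthetic stand-in data: 40 rows, 5 features, roughly 10% of the cells missing.
rng = random.Random(0)
y = [1 if i % 4 == 0 else 0 for i in range(40)]
X = [[(2.0 * label if j == 0 else 0.0) + rng.gauss(0, 1) for j in range(5)] for label in y]
X = [[None if rng.random() < 0.1 else v for v in row] for row in X]
results = [run_split(X, y, range(2, 6), rng) for _ in range(100)]
print(statistics.fmean(auc for auc, _ in results))
```

Fitting the imputation statistics on the training rows only, and then applying them to the test rows, keeps information from the test set out of model selection. Aggregating `results` over the 100 splits gives mean ± std summaries of the kind reported for the feature sets.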
The performance comparison of the feature sets.
| # Features | Data | Method | Accuracy | MCC | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|
| - | - | Majority class | 0.763 ± 0.09 | 0.0 ± 0.0 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5 ± 0.0 |
| 11 | IFS | - | 0.644 ± 0.124 | 0.028 ± 0.255 | 0.269 ± 0.252 | 0.767 ± 0.14 | 0.518 ± 0.139 |
| 13.5 ± 13.6 | T1 | UFS | 0.828 ± 0.083 | 0.573 ± 0.183 | 0.731 ± 0.217 | 0.873 ± 0.095 | 0.802 ± 0.108 |
| 16.5 ± 13.2 | T1 | RFE | 0.823 ± 0.079 | 0.549 ± 0.19 | 0.702 ± 0.23 | 0.876 ± 0.089 | 0.789 ± 0.114 |
| 14.5 ± 13.0 | T1 | UFS+RFE | 0.814 ± 0.077 | 0.52 ± 0.181 | 0.656 ± 0.226 | 0.876 ± 0.092 | 0.766 ± 0.106 |
| 18.1 ± 10.8 | T1 | RF | 0.87 ± 0.07 | 0.668 ± 0.152 | 0.755 ± 0.198 | 0.922 ± 0.074 | 0.839 ± 0.092 |
| 15.7 ± 12.8 | T1 | Aggr. | 0.834 ± 0.08 | 0.577 ± 0.186 | 0.711 ± 0.221 | 0.887 ± 0.09 | 0.799 ± 0.109 |
| 10.8 ± 14.2 | T1+T2 | UFS | 0.839 ± 0.085 | 0.584 ± 0.207 | 0.715 ± 0.229 | 0.889 ± 0.091 | 0.802 ± 0.116 |
| 17.5 ± 15.8 | T1+T2 | RFE | 0.807 ± 0.089 | 0.503 ± 0.205 | 0.637 ± 0.265 | 0.874 ± 0.101 | 0.755 ± 0.122 |
| 12.9 ± 10.7 | T1+T2 | UFS+RFE | 0.796 ± 0.074 | 0.465 ± 0.175 | 0.596 ± 0.251 | 0.873 ± 0.097 | 0.734 ± 0.109 |
| 21.2 ± 14.4 | T1+T2 | RF | |||||
| 15.6 ± 13.9 | T1+T2 | Aggr. | 0.83 ± 0.087 | 0.561 ± 0.208 | 0.678 ± 0.246 | 0.89 ± 0.095 | 0.784 ± 0.12 |
| 17.9 ± 17.2 | Full | UFS | 0.754 ± 0.105 | 0.31 ± 0.23 | 0.431 ± 0.256 | 0.861 ± 0.129 | 0.646 ± 0.119 |
| 19.7 ± 16.5 | Full | RFE | 0.758 ± 0.088 | 0.334 ± 0.213 | 0.483 ± 0.278 | 0.855 ± 0.114 | 0.669 ± 0.118 |
| 17.4 ± 15.2 | Full | UFS+RFE | 0.743 ± 0.09 | 0.311 ± 0.21 | 0.477 ± 0.251 | 0.839 ± 0.123 | 0.658 ± 0.111 |
| 21.9 ± 15.5 | Full | RF | 0.819 ± 0.082 | 0.53 ± 0.176 | 0.614 ± 0.244 | 0.902 ± 0.087 | 0.758 ± 0.111 |
| 19.2 ± 16.2 | Full | Aggr. | 0.769 ± 0.096 | 0.371 ± 0.228 | 0.501 ± 0.267 | 0.864 ± 0.117 | 0.683 ± 0.123 |
Best feature sets obtained in the FS experiments and their general performance across 100 random data splits. The columns correspond to the number of features, the data that were used to obtain the feature set, the FS method and five performance measures (mean ± std (p-value)). Welch's t-test was used to obtain p-values for the null hypothesis that two performance measures had identical values. Each measure was tested against the same measure of the (T1+T2, RF) model. The IFS dataset was used as a baseline for comparison. The best results for all measures in each dataset group are highlighted in bold.
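For reference, the five performance measures can all be derived from a binary confusion matrix, as in the minimal sketch below. The function name and the example counts are hypothetical. Taking AUC as (sensitivity + specificity) / 2 is an assumption, valid for a classifier that outputs hard labels; the tabulated means are consistent with it, for example (0.731 + 0.873) / 2 = 0.802 for the (T1, UFS) row. The Matthews correlation coefficient (MCC) is set to 0 by convention when its denominator vanishes.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, MCC, sensitivity, specificity and AUC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": (sensitivity + specificity) / 2,  # hard-label assumption
    }

# Majority-class baseline on a hypothetical test split of 19 patients, 4 of whom died:
# every patient is predicted to survive, so there are no positive predictions.
baseline = binary_metrics(tp=0, fp=0, tn=15, fn=4)
print(baseline)
```

This shows why a majority-class predictor can reach a relatively high accuracy while its MCC, sensitivity and AUC stay at 0, 0 and 0.5, which is the pattern of the Majority class row.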
The performance of the feature sets selected for the causal discovery experiments.
| # Features | Data | Method | Accuracy | MCC | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|
| 18.1 ± 10.8 | T1 | RF | 0.87 ± 0.07 | 0.668 ± 0.152 | 0.755 ± 0.198 | 0.922 ± 0.074 | 0.839 ± 0.092 |
| 15.7 ± 12.8 | T1 | Aggr. | 0.834 ± 0.08 | 0.577 ± 0.186 | 0.711 ± 0.221 | 0.887 ± 0.09 | 0.799 ± 0.109 |
| 21.2 ± 14.4 | T1+T2 | RF | |||||
| 15.6 ± 13.9 | T1+T2 | Aggr. | 0.83 ± 0.087 | 0.561 ± 0.208 | 0.678 ± 0.246 | 0.89 ± 0.095 | 0.784 ± 0.12 |
| 21.9 ± 15.5 | Full | RF | 0.819 ± 0.082 | 0.53 ± 0.176 | 0.614 ± 0.244 | 0.902 ± 0.087 | 0.758 ± 0.111 |
| 19.2 ± 16.2 | Full | Aggr. | 0.769 ± 0.096 | 0.371 ± 0.228 | 0.501 ± 0.267 | 0.864 ± 0.117 | 0.683 ± 0.123 |
The columns of the table correspond to the number of features, the data that were used to obtain the feature set, the FS method used and five performance measures (mean ± std (p-value)). Each measure was tested against the same measure of the (T1+T2, RF) model. The best results in the performance measure columns are highlighted in bold.
Fig 3. The (T1, RF) CBN.
The CBN for the (T1, RF) feature set, obtained: a) without the target feature; b) with the target feature.
Fig 4. The (T1+T2, RF) CBN.
The CBN for the (T1+T2, RF) feature set obtained: a) without the target feature; b) with the target feature.