Kostas Stoitsas, Saurabh Bahulikar, Leonie de Munter, Mariska A C de Jongh, Maria A C Jansen, Merel M Jung, Marijn van Wingerden, Katrijn Van Deun.
Abstract
Predicting recovery after trauma is important to give patients a perspective on their estimated future health, to support shared decision making, and to target interventions to relevant patient groups. In the present study, several unsupervised techniques were employed to cluster patients based on longitudinal recovery profiles. Subsequently, these data-driven clusters were assessed for clinical validity by experts and used as targets in supervised machine learning models. We present a formalised analysis of the obtained clusters that incorporates evaluation of (i) statistical and machine learning metrics and (ii) the clusters' clinical validity, using descriptive statistics and medical expertise. Cluster quality assessment revealed that clusters obtained through a Bayesian method (High Dimensional Supervised Classification and Clustering) and a Deep Gaussian Mixture model, in combination with oversampling and a Random Forest for supervised learning of the cluster assignments, provided among the most clinically sensible partitionings of patients. Other methods that obtained higher classification accuracy suffered from cluster solutions with large majority classes or clinically less sensible classes. Models that used only physical outcomes, or a mix of physical and psychological outcomes, proved to be among the most sensible, suggesting that clustering on psychological outcomes alone yields recovery profiles that do not conform to known risk factors.
Year: 2022 PMID: 36216874 PMCID: PMC9550811 DOI: 10.1038/s41598-022-21390-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. PCA biplot for the set of General Health variables with indication of the ten clusters obtained with kml3d.
Figure 2. Optimum number of clusters with kml3d for the four different cases of variables and k-means.
Tuning parameters for HDclassif and Deepgmm, with the optimal number of clusters and BIC for the different sets of variables (cases).
| Case | Method | Algorithm | Initialization | Model | No. of clusters | BIC |
|---|---|---|---|---|---|---|
| Physical health | HDclassif | SEM | k-means | "AKJBKQKDK" | 7 | −211,842 |
| Psychological health | HDclassif | EM | k-means | "AKJBKQKDK" | 10 | −159,809 |
| General health | HDclassif | EM | k-means | "AKBKQKDK" | 6 | −390,612 |
| Physical health (pre-injury) | HDclassif | EM | k-means | "AKJBKQKDK" | 6 | −133,284 |
| Physical health | Deepgmm | – | hclass | – | 6 | −254,308 |
| Psychological health | Deepgmm | – | random | – | 6 | −180,027 |
| General health | Deepgmm | – | random | – | 6 | −469,750 |
| Physical health (pre-injury) | Deepgmm | – | random | – | 6 | −137,868 |
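The BIC column above is the criterion used to pick the number of clusters. As a minimal sketch of such a selection rule, the snippet below assumes the "larger is better" convention BIC = 2 ln L − k ln n, which is consistent with the negative values in the table but is an assumption here, since sign conventions differ between packages. The function names and the toy log-likelihoods and parameter counts are hypothetical and are not taken from the study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' convention: 2*lnL - k*ln(n) (assumed)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


def select_n_clusters(fits: dict) -> int:
    """Pick the number of clusters whose fit has the largest BIC.

    `fits` maps a candidate number of clusters to a tuple
    (log_likelihood, n_params, n_obs).
    """
    scores = {k: bic(*fit) for k, fit in fits.items()}
    return max(scores, key=scores.get)


# Hypothetical fits: more clusters raise the likelihood but cost parameters.
toy_fits = {
    4: (-1250.0, 20, 500),
    6: (-1180.0, 30, 500),
    8: (-1170.0, 40, 500),
}
best_k = select_n_clusters(toy_fits)
print(best_k)
```

With these toy numbers the six-cluster fit wins: going from six to eight clusters improves the likelihood too little to offset the penalty for the extra parameters.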
Example comparison of models for classifying the six clusters obtained for the Physical Health (pre-injury) case with the HDclassif method.
| Model | Mean accuracy % (f1_macro) | 95% CI for accuracy | Optimized hyper-parameters |
|---|---|---|---|
| Logistic regression | 36.53 (33.33) | [35.60–37.46] | Solver = ‘newton-cg’, C = 10, penalty = ‘ |
| Logistic regression (under-sampling) | 36.11 (34.10) | [34.44–37.79] | Solver = ‘newton-cg’, C = 10⁻², penalty = ‘ |
| Logistic regression (smote) | 37.88 (35.53) | [36.83–38.94] | Solver = ‘saga’, C = 10⁻², penalty = ‘ |
| Logistic regression (over-sampling) | 37.20 (35.64) | [35.96–38.45] | Solver = ‘liblinear’, C = 10⁻², penalty = ‘ |
| Random forest | 36.98 (35.89) | [34.69–39.27] | Estimators = 200, max depth = 15, min samples split = 5 |
| Random forest (under-sampling) | 36.07 (35.12) | [34.24–37.89] | Estimators = 50, max depth = 5, min samples split = 10 |
| Random forest (smote) | 54.75 (53.67) | [52.79–56.70] | Estimators = 500, max depth = 50, min samples split = 2 |
| Random forest (over-sampling) | 69.12 (68.71) | [67.81–70.44] | Estimators = 500, max depth = 50, min samples split = 2 |
| XGBClassifier (over-sampling) | 68.52 (67.78) | [67.37–69.67] | Estimators = 500, max depth = 10 |
| XGBClassifier (smote) | 57.14 (56.23) | [55.95–58.34] | Estimators = 500, max depth = 20 |
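The largest gains in the table above come from random over-sampling combined with tree-based models. A minimal sketch of random over-sampling in plain Python follows: rows of the smaller classes are re-drawn with replacement until every class matches the largest one. The function name `random_oversample` and the toy data are hypothetical, and this does not reproduce the implementation used in the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (drawn with replacement) until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Extra copies are sampled only from the original rows of this class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: class "A" dominates, "B" and "C" have one row each.
rows = [[0.1], [0.2], [0.3], [0.4], [1.5], [2.5]]
labels = ["A", "A", "A", "A", "B", "C"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)
print(class_counts)
```

In a cross-validation setting, resampling of this kind would normally be applied to the training portion only, so that duplicated rows cannot appear on both sides of a split.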
Summary of classification results and quality assessment for clusters obtained from longitudinal data.
| Case | Method | Optimal no. of clusters | Majority baseline (%) | Accuracy % (f1_macro) | 95% CI for accuracy | Clinical sensibleness |
|---|---|---|---|---|---|---|
| Physical Health | kml3d | 8 | 16.48 | 51.89 (50.03) | [50.84–52.94] | +++ |
| Psychological Health | kml3d | 9 | 26.15 | 83.12 (82.63) | [82.36–83.87] | ++ |
| General Health | kml3d | 10 | 17.07 | 68.26 (68.02) | [67.66–68.85] | +++ |
| Physical Health (pre-injury) | kml3d | 8 | 18.08 | 61.56 (60.74) | [60.32–62.81] | ++ |
| Physical Health | HDclassif | 7 | 24.49 | 69.52 (68.61) | [68.00–71.05] | +++ |
| Psychological Health | HDclassif | 10 | 19.12 | 70.24 (69.75) | [68.78–71.70] | + |
| Physical Health (pre-injury) | HDclassif | 6 | 26.07 | 69.12 (68.64) | [67.81–70.44] | +++ |
| Psychological Health | Deepgmm | 6 | 84.70 | 99.96 (98.67) | [99.94–99.98] | + |
| General Health | Deepgmm | 6 | 62.13 | 98.20 (97.87) | [97.87–98.54] | ++ |
| Physical Health (pre-injury) | Deepgmm | 6 | 61.02 | 94.78 (93.55) | [94.37–95.18] | + |
Best models based on classification metrics and clinical sensibleness are in bold.
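The summary table reports a majority baseline next to accuracy and macro F1, and reading the three together explains why very high accuracies can be less impressive than they look. The sketch below shows one way these quantities can be computed from label vectors. The helper names and toy labels are hypothetical, and the two accuracy/baseline pairs at the end are copied from the table above.

```python
from collections import Counter


def majority_baseline(y_true):
    """Accuracy (%) of always predicting the most frequent class."""
    counts = Counter(y_true)
    return 100.0 * max(counts.values()) / len(y_true)


def accuracy(y_true, y_pred):
    """Share (%) of predictions that match the true label."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean (%) of per-class F1 over the classes in y_true,
    so small classes count as much as large ones."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(scores) / len(scores)


# Toy example: a classifier that always predicts the dominant class.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
base = majority_baseline(y_true)
acc = accuracy(y_true, y_pred)
f1 = f1_macro(y_true, y_pred)

# Gain over the majority baseline, in percentage points (values from the table).
gain_kml3d_psych = round(83.12 - 26.15, 2)
gain_deepgmm_psych = round(99.96 - 84.70, 2)
print(base, acc, round(f1, 2), gain_kml3d_psych, gain_deepgmm_psych)
```

On the toy labels the always-majority classifier matches the baseline accuracy of 80% while its macro F1 falls to about 44%. In the table, the Deepgmm solution for Psychological Health gains about 15 points over its 84.70% baseline, whereas the kml3d solution for the same case gains about 57 points over a 26.15% baseline.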
Significant predictors and accuracy for the case of General Health and different clustering techniques before and after Boruta.
| Method | Nr clusters | Accuracy (f1_macro) | Important Predictors | Accuracy (f1_macro) with important predictors |
|---|---|---|---|---|
| kml3d | 10 | 68.26 (68.02) | ‘Age’, ‘Injury severity score’, ‘Comorbidities’, ‘BMI’, ‘Status score’, ‘Pre-injury EQ-VAS’, ‘Frailty’, ‘Admission days in hospital’ | 69.13 (68.23) |
| HDclassif | 6 | 73.96 (72.59) | ‘Age’, ‘Injury severity score’, ‘Comorbidities’, ‘BMI’, ‘Status score’, ‘Pre-injury EQ-VAS’, ‘Frailty’, ‘Admission days in hospital’ | 73.82 (72.43) |
| Deepgmm | 6 | 98.20 (97.87) | ‘Age’, ‘Category accident’, ‘Admission days in hospital’, ‘Injury severity score’, ‘Education level’, ‘Comorbidities’, ‘Status score’, ‘Pre-injury EQ-VAS’, ‘Frailty’, ‘Traumatic brain injury’, ‘Gender’, ‘Pre-injury cognition’ | 98.26 (97.92) |
Descriptive statistics for clusters obtained with different methods that were evaluated as highly (+++), medium (++) and poorly (+) sensible.
| Case / method / sensibleness | Cluster | Age (mean) | Age (S.E.) | Frailty (mean) | Frailty (S.E.) | Comorbidities (mean) | Comorbidities (S.E.) | Severity score (mean) | Severity score (S.E.) | Admission days in hospital (mean) | Admission days in hospital (S.E.) | Gender: male % | Gender: female % | Hip fracture: no % | Hip fracture: yes % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| General Health, HDclassif, highly sensible (+++) | 1 | 58.52 | 0.69 | 0.63 | 0.07 | 0.60 | 0.04 | 5.76 | 0.19 | 3.91 | 0.17 | 65.47 | 34.53 | 83.30 | 16.70 |
| | 2 | 61.78 | 0.48 | 1.95 | 0.12 | 0.88 | 0.03 | 6.32 | 0.13 | 5.13 | 0.12 | 51.98 | 48.02 | 78.80 | 21.20 |
| | 3 | 64.48 | 0.64 | 2.79 | 0.16 | 1.08 | 0.04 | 6.33 | 0.16 | 5.91 | 0.19 | 52.73 | 47.27 | 77.00 | 23.00 |
| | 4 | 65.91 | 0.57 | 4.86 | 0.19 | 1.44 | 0.04 | 7.01 | 0.16 | 7.50 | 0.20 | 39.70 | 60.30 | 72.00 | 28.00 |
| | 5 | 73.89 | 0.81 | 5.72 | 0.27 | 1.87 | 0.07 | 7.02 | 0.22 | 8.17 | 0.40 | 35.73 | 64.27 | 61.40 | 38.60 |
| | 6 | 74.72 | 0.93 | 7.96 | 0.28 | 2.04 | 0.08 | 7.76 | 0.28 | 9.65 | 0.54 | 30.95 | 69.05 | 54.80 | 45.20 |
| Physical Health (pre-injury), kml3d, medium sensible (++) | A | 63.82 | 0.61 | 1.89 | 0.11 | 0.97 | 0.04 | 6.06 | 0.15 | 4.83 | 0.14 | 49.49 | 50.51 | 78.14 | 21.86 |
| | B | 58.42 | 0.63 | 1.06 | 0.09 | 0.54 | 0.03 | 5.53 | 0.16 | 3.93 | 0.15 | 66.21 | 33.79 | 86.01 | 13.99 |
| | C | 58.06 | 0.68 | 1.46 | 0.15 | 0.69 | 0.03 | 6.72 | 0.19 | 5.31 | 0.17 | 57.11 | 42.89 | 79.77 | 20.23 |
| | D | 69.27 | 0.66 | 3.53 | 0.18 | 1.58 | 0.05 | 6.51 | 0.18 | 6.84 | 0.22 | 41.43 | 58.57 | 69.47 | 30.53 |
| | E | 63.22 | 0.74 | 2.81 | 0.21 | 1.19 | 0.04 | 7.09 | 0.22 | 7.30 | 0.27 | 43.37 | 56.63 | 76.91 | 23.09 |
| | F | 72.78 | 0.95 | 6.02 | 0.28 | 1.84 | 0.07 | 7.72 | 0.28 | 10.00 | 0.49 | 27.78 | 72.22 | 59.34 | 40.66 |
| | G | 73.64 | 0.84 | 7.09 | 0.24 | 2.05 | 0.08 | 7.31 | 0.28 | 8.27 | 0.37 | 34.96 | 65.04 | 62.18 | 37.82 |
| | H | 78.45 | 0.82 | 9.47 | 0.22 | 2.38 | 0.08 | 7.70 | 0.28 | 9.60 | 0.53 | 28.21 | 71.79 | 50.32 | 49.68 |
| Psychological health, Deepgmm, poorly sensible (+) | 1 | 64.21 | 1.44 | 3.38 | 0.42 | 1.19 | 0.09 | 6.90 | 0.38 | 5.95 | 0.35 | 52.02 | 47.98 | 70.71 | 29.29 |
| | 2 | 64.22 | 2.15 | 3.78 | 0.77 | 1.37 | 0.16 | 8.59 | 0.87 | 8.67 | 0.79 | 40.24 | 59.76 | 76.52 | 23.48 |
| | 3 | 64.79 | 0.30 | 5.04 | 0.48 | 1.37 | 0.09 | 5.98 | 0.30 | 6.40 | 0.47 | 39.47 | 60.53 | 73.68 | 26.32 |
| | 4 | 64.84 | 0.66 | 3.62 | 0.1 | 1.17 | 0.02 | 6.54 | 0.07 | 6.22 | 0.10 | 48.50 | 51.40 | 71.95 | 28.05 |
| | 5 | 66.70 | 1.82 | 3.84 | 0.59 | 1.34 | 0.12 | 6.76 | 0.57 | 6.05 | 0.63 | 40.87 | 59.13 | 74.41 | 25.59 |
| | 6 | 67.53 | 1.54 | 4.31 | 0.54 | 1.39 | 0.11 | 6.93 | 0.42 | 7.38 | 0.61 | 37.50 | 62.50 | 69.37 | 30.63 |
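The table above characterises each cluster by a mean and a standard error (S.E.) per variable. A minimal sketch of how such per-cluster summaries can be produced is given below, assuming the S.E. is the sample standard deviation divided by the square root of the cluster size. The record layout, field names and ages are hypothetical toy values, not study data.

```python
import math
import statistics
from collections import defaultdict


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    The standard error uses the sample standard deviation (n - 1 in the
    denominator); it is NaN when fewer than two values are available.
    """
    m = statistics.mean(values)
    if len(values) < 2:
        return m, float("nan")
    return m, statistics.stdev(values) / math.sqrt(len(values))


def summarize_by_cluster(records, cluster_key, value_key):
    """Group records by cluster label and summarise one variable."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[cluster_key]].append(rec[value_key])
    return {cluster: mean_and_se(vals) for cluster, vals in groups.items()}


# Hypothetical patients: two clusters with different age profiles.
patients = [
    {"cluster": 1, "age": 50.0},
    {"cluster": 1, "age": 60.0},
    {"cluster": 1, "age": 70.0},
    {"cluster": 2, "age": 72.0},
    {"cluster": 2, "age": 78.0},
]
summary = summarize_by_cluster(patients, "cluster", "age")
for cluster, (m, se) in sorted(summary.items()):
    print(cluster, round(m, 2), round(se, 2))
```

Comparing such summaries across clusters is what allows a reader to check whether, for example, age, frailty and comorbidities rise together from one cluster to the next, as they do in the highly sensible HDclassif solution.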
Figure 3. The two graphs at the top present recovery based on EQ-VAS and EQ-5D for the case of General Health with HDclassif. The two graphs at the bottom depict psychological condition (high values indicate high stress and anxiety) of various clusters after the injury.