Xue Zhou1, Keijiro Nakamura2, Naohiko Sahara2, Masako Asami2, Yasutake Toyoda2, Yoshinari Enomoto2, Hidehiko Hara2, Mahito Noro3, Kaoru Sugi3, Masao Moroi2, Masato Nakamura2, Ming Huang4, Xin Zhu1.
Abstract
Identifying patient prognostic phenotypes facilitates precision medicine. This study aimed to explore phenotypes of patients with heart failure (HF) that correspond to prognostic condition (risk of mortality) and to identify the phenotype of new patients using machine learning (ML). An unsupervised ML method was applied to explore patient phenotypes in a derivation dataset (n = 562) based on medical records. Supervised ML models were then trained on the derivation dataset to classify the identified phenotypes, and the trained classifiers were validated on an independent validation dataset (n = 168). Finally, Shapley additive explanations (SHAP) were used to interpret the decision making of phenotype classification. Three patient phenotypes corresponding to stratified mortality risk (high, low, and intermediate) were identified. Kaplan-Meier survival curves differed significantly among the three phenotypes (pairwise comparison p < 0.05). The hazard ratio of all-cause mortality was 2.08 (95% CI 1.29–3.37, p = 0.003) for phenotype 1 (n = 91; high risk) versus phenotype 3 (n = 329; intermediate risk), and 0.26 (95% CI 0.11–0.61, p = 0.002) for phenotype 2 (n = 142; low risk) versus phenotype 3. For phenotype classification by random forest, the AUCs for phenotypes 1, 2, and 3 were 0.736 ± 0.038, 0.815 ± 0.035, and 0.721 ± 0.03, respectively, slightly better than those of the decision tree. The classifier effectively identified the phenotypes of new patients in the validation dataset, with significant differences in survival curves and hazard ratios. Age and creatinine clearance rate were identified as the two most important predictors. ML could effectively identify patient prognostic phenotypes, facilitating reasonable management and treatment that takes prognostic condition into account.
Keywords: heart failure; machine learning; mortality risk; patient phenotypes; prognosis
Year: 2022 PMID: 35743806 PMCID: PMC9224610 DOI: 10.3390/life12060776
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Figure 1. The proposed system framework for phenotype exploration and classification, combining unsupervised and supervised machine learning methods.
Figure 2. Characteristics of patient clusters 1 (a), 2 (b), 3 (c), 4 (d), and 5 (e), respectively. The bottom violin plot in (a) shows the characteristics adjusted by gender. “***” in (a) indicates p < 0.001; “**” in (b) indicates p < 0.01.
Figure 3. Overall (a) and cardiovascular (b) survival curves of the three identified phenotypes.
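As a reminder of what a Kaplan-Meier survival curve estimates, the following is a minimal sketch of the product-limit estimator in plain Python. The function name `kaplan_meier` and the six follow-up records are invented for illustration; they are not the study's data or code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # deaths and censored patients both leave the risk set
    return curve


# Made-up follow-up times (months) for six hypothetical patients
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored patients never lower the curve themselves; they only shrink the risk set used for later steps, which is why the estimator can use incomplete follow-up.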
Univariate analysis for phenotype 1 and phenotype 3.
| Variables | Phenotype 1 (High Risk): Hazard Ratio (95% CI) | p-Value | Phenotype 3 (Intermediate Risk): Hazard Ratio (95% CI) | p-Value |
|---|---|---|---|---|
| NYHA at discharge | 3.61 (2.19–5.94) | <0.001 | 2.31 (1.43–3.75) | <0.001 |
| Low ADL at discharge | 2.94 (1.32–6.58) | 0.009 | - ¹ | |
| DOACWFuse at discharge | 0.11 (0.01–0.79) | 0.030 | - | |
| eGFR at discharge | 0.97 (0.95–0.99) | 0.009 | 0.97 (0.95–0.99) | 0.002 |
| Ccr at discharge | 0.97 (0.93–0.999) | 0.045 | 0.96 (0.93–0.98) | <0.001 |
| Creatinine at discharge | 1.15 (1.04–1.27) | 0.007 | 1.78 (1.37–2.30) | <0.001 |
| SBP at admission | 0.98 (0.97–0.998) | 0.029 | - | |
| SBP at discharge | 0.95 (0.93–0.96) | <0.001 | 0.98 (0.97–0.9998) | 0.047 |
| DBP at discharge | 0.95 (0.92–0.99) | 0.006 | - | |
| HR at discharge | 1.02 (1.002–1.05) | 0.035 | - | |
| TR | - | | 1.29 (1.03–1.62) | 0.025 |
| logNT-proBNP | - | | 2.64 (1.45–4.79) | 0.001 |
| Albumin | - | | 0.55 (0.31–0.97) | 0.039 |

¹ Not statistically significant.
Figure 4. Classification performance of decision tree and random forest. (a,b) ROC curves of the two models on internal validation. (c–f) Survival curves of phenotypes classified by decision tree and random forest, respectively, for patients in the independent validation dataset.
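A per-phenotype AUC for a three-class classifier is commonly computed one-vs-rest; the record does not state which scheme the authors used, so treat that as an assumption. The sketch below computes such an AUC directly from its rank interpretation (the probability that a randomly chosen patient of the target phenotype receives a higher score than a randomly chosen patient of any other phenotype). The helper name `auc_one_vs_rest` and the toy scores are hypothetical.

```python
def auc_one_vs_rest(labels, scores, positive):
    """AUC for one class against all others.

    Equals P(score of a positive > score of a negative), with ties counted as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: true phenotype of six hypothetical patients and the
# classifier's predicted probability of phenotype 1 for each of them
labels = [1, 1, 2, 2, 3, 3]
p1_scores = [0.7, 0.4, 0.1, 0.5, 0.2, 0.3]

auc1 = auc_one_vs_rest(labels, p1_scores, positive=1)
print(f"AUC (phenotype 1 vs rest) = {auc1:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; library implementations typically sort the scores instead.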
Figure 5. Feature importance for overall (a), phenotype 1 (b), phenotype 2 (c), and phenotype 3 (d), interpreted by SHAP.
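To illustrate the idea behind Shapley-based attributions, here is a brute-force sketch that computes exact Shapley values for a two-feature toy model. It is not the SHAP library and not the study's classifier: the function `toy_risk`, its coefficients, the patient, and the baseline are all invented, and "absent" features are simply set to a single baseline value rather than averaged over a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for k in names:
        others = [m for m in names if m != k]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {k}) - value(s))
        phi[k] = total
    return phi


def toy_risk(p):
    """Hypothetical risk score with an age-Ccr interaction (illustration only)."""
    interaction = 0.2 if p["age"] > 80 and p["ccr"] < 20 else 0.0
    return 0.02 * p["age"] - 0.01 * p["ccr"] + interaction


patient = {"age": 85, "ccr": 15}   # hypothetical patient
baseline = {"age": 75, "ccr": 40}  # hypothetical reference point

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that makes per-feature contributions comparable across features.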
Figure 6. Cutoff values of age (a) and Ccr (b) and their exact impact on model output.
Hazard ratio between two patient groups classified by cutoff values of the top two most important predictors.
| Variables and Cutoff Values | Hazard Ratio (95% CI) | p-Value |
|---|---|---|
| Age < 73 years | 0.28 (0.13–0.58) | <0.001 |
| Age > 80 years | 2.22 (1.40–3.55) | <0.001 |
| Ccr at discharge < 20 mL/min | 3.63 (2.34–5.63) | <0.001 |
| Ccr at discharge > 28 mL/min | 0.35 (0.22–0.55) | <0.001 |
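The hazard ratios and intervals in this table can be checked for internal consistency with the reported p-values. The sketch below assumes the bracketed ranges are 95% Wald intervals on the log hazard ratio scale (the usual output of a Cox model, though the record does not state the method) and reconstructs an approximate z statistic and two-sided p-value from each row. Because the published values are rounded, the results are approximate; the function name `wald_p_from_ci` is illustrative.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate z and two-sided p from a hazard ratio and its 95% CI.

    Assumes a Wald interval: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p


# Values copied from the table above: (hazard ratio, lower bound, upper bound)
rows = {
    "Age < 73 years": (0.28, 0.13, 0.58),
    "Age > 80 years": (2.22, 1.40, 3.55),
    "Ccr at discharge < 20 mL/min": (3.63, 2.34, 5.63),
    "Ccr at discharge > 28 mL/min": (0.35, 0.22, 0.55),
}

results = {name: wald_p_from_ci(*values) for name, values in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:+.2f}, p = {p:.1e}")
```

Under this assumption all four rows give p below 0.001, in line with the reported values.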