Joon-Tae Kim1, Nu Ri Kim2, Su Hoon Choi3, Seungwon Oh3, Man-Seok Park2, Seung-Han Lee2, Byeong C Kim2, Jonghyun Choi4, Min Soo Kim5.
Abstract
Clustering stroke patients with similar characteristics to predict subsequent vascular outcome events is critical. This study aimed to compare several clustering methods, particularly a deep neural network-based model, and to identify the clustering method that yields maximally distinct 1-year outcomes in patients with ischemic stroke. Prospective stroke registry data from a comprehensive stroke center, collected from January 2011 to July 2018, were retrospectively analyzed. Patients with acute ischemic stroke within 7 days of onset were included. The primary outcome was the composite of all strokes (either hemorrhagic or ischemic), myocardial infarction, and all-cause mortality within one year. Neural network-based clustering models (deep lifetime clustering, DLC) were compared with other clustering models (k-prototype and semi-supervised clustering, SSC) and a conventional risk score (Stroke Prognostic Instrument-II, SPI-II) to obtain a distinct distribution of 1-year vascular events. Ultimately, 7,650 patients were included, and the 1-year primary outcome event occurred in 13.1%. The DLC-Kuiper UB model had a significantly higher C-index (0.674) and log-rank score (153.1) and a lower Brier score (0.08) than the other cluster models (SSC and DLC-MMD) and the SPI-II score. There were significant differences in primary outcome events among the 3 clusters (41.7%, 13.4%, and 6.5% in clusters 0, 1, and 2, respectively) when the DLC-Kuiper UB model was used. A neural network-based clustering model, the DLC-Kuiper UB model, can improve the clustering of stroke patients by producing a maximally distinct distribution of 1-year vascular outcomes across clusters. Further studies are warranted to validate this deep neural network-based clustering model in ischemic stroke.
Year: 2022 PMID: 35676413 PMCID: PMC9177616 DOI: 10.1038/s41598-022-13636-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
General characteristics of subjects.
| Variables | Values |
|---|---|
| N | 7650 |
| Age, yr, mean (SD) | 68.6 (12.4) |
| Male, N (%) | 4433 (57.9) |
| Within 24 h | 5830 (76.2) |
| Beyond 24 h | 1820 (23.8) |
| BMI, mean (SD) | 23.5 (3.3) |
| Initial NIHSS, med (IQR) | 3 (1, 9) |
| Prestroke disability (pre-mRS > 1), N (%) | 1154 (15.1) |
| TOAST classification | |
| LAA | 2269 (29.7) |
| SVO | 801 (10.5) |
| CE | 1899 (24.8) |
| OE | 125 (1.6) |
| UD | 2556 (33.4) |
| History of TIA | 114 (1.5) |
| History of stroke | 1308 (17.1) |
| History of peripheral artery diseases | 47 (0.6) |
| History of coronary artery diseases | 432 (5.6) |
| HTN | 4435 (58.0) |
| DM | 2078 (27.2) |
| Dyslipidemia | 1190 (15.6) |
| Smoking | |
| Never | 5128 (67.0) |
| Current | 1496 (19.6) |
| Ex-smoker (quit ≥ 5 yr) | 444 (5.8) |
| Recent smoker (quit < 5 yr) | 582 (7.6) |
| Atrial fibrillation | 1889 (24.7) |
| High risk of cardioembolism | 1709 (22.3) |
| Congestive heart failure | 31 (0.4) |
| Antiplatelet | 1750 (22.9) |
| Anticoagulant | 337 (4.4) |
| Antihypertensive | 3619 (47.3) |
| Antidiabetic | 1696 (22.2) |
| Statin | 855 (11.2) |
| White blood cell counts, 10³/µL | 8.43 (3.13) |
| Hemoglobin, g/dL | 13.6 (1.9) |
| Platelet counts, 10³/µL | 221.6 (65.9) |
| Glucose, mg/dL | 136.6 (55.2) |
| Creatinine, mg/dL | 0.90 (0.73) |
| Systolic blood pressure, mmHg | 139.1 (114.0) |
| No stenosis | 3073 (40.2) |
| Mild stenosis < 50% | 479 (6.3) |
| Moderate-to-severe stenosis > 50% | 1171 (15.3) |
| Complete occlusion | 2927 (38.3) |
BMI; body mass index, NIHSS; National Institutes of Health Stroke Scale, TOAST; Trials of Org.
C-index, Log-rank score, and Brier score for different clustering methods in datasets.
| | SPI-II | K-prototype | SSC-Bair | DLC-MMD | DLC-Kuiper UB | P-value |
|---|---|---|---|---|---|---|
| C-index | 0.615 (0.015) | 0.523 (0.012) | 0.523 (0.013) | 0.673 (0.030) | 0.709 (0.019) | < 0.001 |
| Log-rank score | 116.990 (12.122) | 4.061 (2.356) | 4.515 (2.971) | 476.833 (110.306) | 817.736 (140.927) | < 0.001 |
| Brier score | 0.085 (0.001) | 0.086 (0.001) | 0.086 (0.001) | 0.081 (0.002) | 0.076 (0.002) | < 0.001 |
| C-index | 0.628 (0.022) | 0.531 (0.018) | 0.532 (0.024) | 0.654 (0.032) | 0.690 (0.018) | < 0.001 |
| Log-rank score | 43.690 (9.587) | 3.555 (2.547) | 3.673 (2.706) | 106.983 (30.735) | 170.533 (24.017) | < 0.001 |
| Brier score | 0.084 (0.002) | 0.086 (0.002) | 0.086 (0.002) | 0.080 (0.001) | 0.077 (0.001) | < 0.001 |
| C-index | 0.614 (0.024) | 0.525 (0.022) | 0.524 (0.020) | 0.657 (0.039) | 0.674 (0.027) | < 0.001 |
| Log-rank score | 38.247 (9.315) | 2.810 (2.769) | 2.796 (2.753) | 106.249 (33.097) | 153.099 (32.740) | < 0.001 |
| Brier score | 0.085 (0.002) | 0.086 (0.002) | 0.086 (0.002) | 0.081 (0.002) | 0.079 (0.001) | < 0.001 |
P-value; One-Way ANOVA.
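The C-index and Brier score reported in the table can be illustrated with a minimal sketch. It assumes that each patient's predicted risk is simply the event rate of the assigned cluster, and it uses a plain (uncensored) Brier score; the record does not state how the authors computed either metric, and the six-patient dataset below is hypothetical. A higher C-index indicates better discrimination, whereas a lower Brier score indicates better-calibrated predictions.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the higher-risk patient fails first."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter one is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predicted risks count as half
    return concordant / comparable


def brier_score(outcomes, predicted):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(outcomes)


# Hypothetical toy data (not from the study): follow-up in days, 1 = event observed.
times = [30, 200, 365, 90, 365, 365]
events = [1, 1, 0, 1, 0, 0]
# Assumed predicted risk: the event rate of the cluster each patient was assigned to.
risk = [0.4, 0.1, 0.1, 0.4, 0.05, 0.05]

c_index = concordance_index(times, events, risk)
brier = brier_score(events, risk)
print(f"C-index = {c_index:.3f}, Brier score = {brier:.4f}")
```

Because every patient in a cluster shares the same predicted risk, many pairs are tied, which is why the tie handling in `concordance_index` matters for cluster-based models.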
Figure 1. C-index (A), log-rank score (B), and Brier score (C) for different clustering methods applied to the datasets.
One-year outcomes of patient groups according to the clustering method (k = 3/test set).
| | Cluster 0 | Cluster 1 | Cluster 2 | P-value |
|---|---|---|---|---|
| N | 949 | 168 | 413 | |
| Primary outcome | 136 (14.3%) | 16 (9.5%) | 49 (11.9%) | 0.158 |
| Stroke | 26 (2.7%) | 3 (1.8%) | 14 (3.4%) | 0.557 |
| MI | 0 (0.0%) | 0 (0.0%) | 2 (0.5%) | 0.067 |
| All-cause mortality | 117 (12.3%) | 14 (8.3%) | 37 (9.0%) | 0.096 |
| N | 164 | 952 | 414 | |
| Primary outcome | 15 (9.1%) | 136 (14.3%) | 50 (12.1%) | 0.150 |
| Stroke | 3 (1.8%) | 26 (2.7%) | 14 (3.4%) | 0.579 |
| MI | 0 (0.0%) | 0 (0.0%) | 2 (0.5%) | 0.067 |
| All-cause mortality | 13 (7.9%) | 117 (12.3%) | 38 (9.2%) | 0.100 |
| N | 966 | 527 | 37 | |
| Primary outcome | 69 (7.1%) | 117 (22.2%) | 15 (40.5%) | < 0.001 |
| Stroke | 20 (2.1%) | 23 (4.4%) | 0 (0.0%) | 0.022 |
| MI | 1 (0.1%) | 1 (0.2%) | 0 (0.0%) | 0.885 |
| All-cause mortality | 52 (5.4%) | 101 (19.2%) | 15 (40.5%) | < 0.001 |
| N | 156 | 677 | 697 | |
| Primary outcome | 65 (41.7%) | 91 (13.4%) | 45 (6.5%) | < 0.001 |
| Stroke | 8 (5.1%) | 23 (3.4%) | 12 (1.7%) | 0.031 |
| MI | 0 (0.0%) | 2 (0.3%) | 0 (0.0%) | 0.283 |
| All-cause mortality | 63 (40.4%) | 69 (10.2%) | 36 (5.2%) | < 0.001 |
P-value; Chi-squared test.
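As a rough check on the P-values in the table, the Pearson chi-squared test can be recomputed from the cluster counts. The sketch below is an assumption about the test setup (a 3 × 2 table of event vs. no event, no continuity correction), not the authors' code. It uses the primary-outcome rows of the first and last blocks of the table; the last block's rates (41.7%, 13.4%, 6.5%) match those the abstract reports for DLC-Kuiper UB. With three clusters there are 2 degrees of freedom, for which the chi-squared survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def chi_squared_events(events, totals):
    """Pearson chi-squared statistic for event vs. no event across k groups."""
    overall_rate = sum(events) / sum(totals)
    stat = 0.0
    for e, n in zip(events, totals):
        expected_event = n * overall_rate
        expected_none = n - expected_event
        stat += (e - expected_event) ** 2 / expected_event
        stat += ((n - e) - expected_none) ** 2 / expected_none
    return stat


def p_value_df2(stat):
    """Upper-tail P-value for a chi-squared statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Primary-outcome counts and cluster sizes taken from the table (test set, k = 3).
first_block = chi_squared_events([136, 16, 49], [949, 168, 413])
last_block = chi_squared_events([65, 91, 45], [156, 677, 697])

p_first = p_value_df2(first_block)
p_last = p_value_df2(last_block)
print(f"first block: chi2 = {first_block:.2f}, p = {p_first:.3f}")
print(f"last block:  chi2 = {last_block:.2f}, p = {p_last:.2e}")
```

Under these assumptions the first block gives a P-value that rounds to the tabulated 0.158, and the last block falls far below 0.001.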
Figure 2. Kaplan–Meier curves of the primary outcome within 1 year from different clustering methods for K = 3 (3 clusters). (A) K-prototype, (B) SSC-Bair, (C) DLC-MMD, and (D) DLC-Kuiper UB, all test set results.
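For readers unfamiliar with the curves in Figure 2, the following is a minimal sketch of the Kaplan–Meier (product-limit) estimator for a single cluster. The follow-up times and event flags are hypothetical, chosen only to show how censored patients leave the risk set without lowering the curve; in practice one curve would be estimated per cluster.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time using the product-limit estimator."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients with an observed event at time t.
        failed = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        # Everyone whose follow-up ends at t (events and censored alike).
        leaving = sum(1 for ti in times if ti == t)
        if failed:
            survival *= 1 - failed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical cluster: follow-up in days, 1 = primary outcome observed, 0 = censored.
times = [30, 90, 90, 200, 365, 365]
events = [1, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: estimated event-free probability {s:.3f}")
```

The patient censored at day 90 and the two censored at day 365 reduce the number at risk but do not produce a step in the curve.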
Figure 3. Feature importance matrix plots. In the bar plot, the SHAP value indicates the degree to which a specific feature contributes to the model output. The higher the SHAP value, the larger that feature's contribution.