Caitlin E Coombes1,2, Zachary B Abrams2, Suli Li3, Lynne V Abruzzo4, Kevin R Coombes2.
Abstract
OBJECTIVE: Unsupervised machine learning approaches hold promise for large-scale clinical data. However, the heterogeneity of clinical data raises new methodological challenges in feature selection, choosing a distance metric that captures biological meaning, and visualization. We hypothesized that clustering could discover prognostic groups from patients with chronic lymphocytic leukemia, a disease that provides biological validation through well-understood outcomes.
Keywords: chronic lymphocytic leukemia; clinical informatics; mixed-type data; clustering; unsupervised machine learning
Year: 2020 PMID: 32483590 PMCID: PMC7647286 DOI: 10.1093/jamia/ocaa060
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Clinical characteristics of chronic lymphocytic leukemia (CLL) patients
| Characteristic | Patients, n (%) |
|---|---|
| Total patients | 247 |
| Sex | |
| Male | 173 (70.0) |
| Female | 74 (30.0) |
| Race/ethnicity | |
| Asian | 1 (0.4) |
| Black | 11 (4.5) |
| Hispanic | 7 (2.8) |
| White | 228 (92.3) |
| Rai stage | |
| Low (0–2) | 196 (79.4) |
| High (3–4) | 51 (20.6) |
| Cytogenetic abnormality (Döhner classification) | |
| del13q | 90 (36.4) |
| +12 | 37 (15.0) |
| FISH normal | 73 (29.6) |
| del11q | 34 (13.8) |
| del17p | 13 (5.3) |
| IGHV mutation status | |
| Mutated | 106 (43.1) |
| Unmutated | 140 (56.9) |
| Treatment | |
| Never treated | 20 (8.1) |
| Treated with FCR | 227 (91.9) |
| Age at diagnosis (years) | |
| Minimum | 26.74 |
| Median | 55.87 |
| Maximum | 82.41 |
Selected clinical and routine laboratory data, somatic mutation status, and common recurrent cytogenetic abnormalities collected at time of diagnosis on 247 treatment-naïve patients diagnosed with CLL and obtained by chart review.
Figure 1. Data transformation A: (A) Kaplan-Meier survival curve, (B) MDS plot, and (C) t-SNE plot for 7 unsupervised clusters of CLL patients. Unsupervised machine learning, using k-medoids clustering with Partitioning Around Medoids (PAM) and the Sokal & Michener distance, yields 7 clinical phenotypes with significant differences in overall survival (OS) (P = .0164). Clusters separated by MDS along the first dimension reflect OS outcomes.
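The clustering step named in the caption can be illustrated with a minimal sketch. It assumes that each patient has already been encoded as a vector of binary features (the source does not describe its encoding here), computes the Sokal & Michener (simple matching) distance as the fraction of features on which two patients disagree, and runs a simplified PAM with a greedy BUILD phase followed by SWAP. The function names and the toy `patients` data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def simple_matching_distance(x, y):
    """Sokal & Michener (simple matching) distance between two binary
    vectors: the fraction of features on which they disagree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


def distance_matrix(rows):
    """Symmetric matrix of pairwise simple matching distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = simple_matching_distance(rows[i], rows[j])
    return d


def total_cost(d, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d, k):
    """Simplified PAM: greedy BUILD, then SWAP until no swap lowers cost."""
    n = len(d)
    medoids = []
    # BUILD: repeatedly add the point that gives the lowest total cost.
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda c: total_cost(d, medoids + [c]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid point.
    improved = True
    while improved:
        improved = False
        current = total_cost(d, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(d, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True
    # Assign every point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy data: 6 patients, 5 binary features each.
patients = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
]
d = distance_matrix(patients)
medoids, labels = pam(d, k=2)
print(medoids, labels)
```

Because PAM only needs a precomputed dissimilarity matrix, the distance function can be swapped without touching the clustering code, which is one reason medoid-based methods are a natural fit when the choice of distance is itself under study.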
Informative, identifying features of clusters for data transformations A and B, in order of overall survival
Data Transformation A.

| ID | n | Sex | IGHV | ZAP70 | Döhner | CD38 | Light Chain | Other |
|---|---|---|---|---|---|---|---|---|
| A1 | 19 | M | Mutated | − | | Low | Lambda | Hypogammaglobulinemia |
| A2 | 29 | F | Mutated | − | del13q | Low | | |
| A3 | 36 | M | Mutated | − | del13q | Low | | |
| A4 | 27 | M | Unmutated | | | High | Kappa | |
| A5 | 25 | M | Unmutated | − | | Low | | Anemia |
| A6 | 38 | | Unmutated | + | | Low | Kappa | |
| A7 | 22 | | Unmutated | + | | Low | | Anemia |

Data Transformation B.

| ID | n | Sex | IGHV | ZAP70 | CD38 | Age (yrs) | Prolymphocytes (%) | Light Chain | Other |
|---|---|---|---|---|---|---|---|---|---|
| B1 | 46 | | Mutated | − | Low | < 65 | < 10 | Lambda | |
| B2 | 37 | | Mutated | − | Low | < 65 | < 10 | Kappa | |
| B3 | 26 | M | Unmutated | | | < 65 | < 10 | | Anemia |
| B4 | 44 | | Unmutated | + | Low | < 65 | < 10 | Kappa | |
| B5 | 12 | M | Unmutated | + | Low | ≥ 65 | | Lambda | Anemia |
| B6 | 31 | M | Unmutated | + | High | < 65 | < 10 | Kappa | |

Notes: Clusters are ordered by predicted survival outcome, from longest survival (A1 or B1) to shortest (A7 or B6). Characteristic features of each cluster, defined as a feature present in at least 75% of members of a given cluster, include known indicators of superior prognosis (IGHV-mutated status and female sex) and poor prognosis (ZAP70 positivity). For complete results and percentages, see Supplementary Table B.1.
M, male; F, female.
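The 75% rule in the notes can be made concrete with a small sketch. It assumes each patient is a dictionary of categorical features and that a missing value (`None`) still counts in the cluster's denominator; both choices, along with the function name and the toy records, are illustrative assumptions rather than details from the source.

```python
from collections import defaultdict


def characteristic_features(records, labels, threshold=0.75):
    """For each cluster, return the sorted (feature, value) pairs that are
    present in at least `threshold` of the cluster's members."""
    members = defaultdict(list)
    for record, label in zip(records, labels):
        members[label].append(record)

    result = {}
    for label, group in members.items():
        counts = defaultdict(int)
        for record in group:
            for feature, value in record.items():
                if value is not None:  # missing values are never counted
                    counts[(feature, value)] += 1
        result[label] = sorted(
            pair for pair, n in counts.items() if n / len(group) >= threshold
        )
    return result


# Hypothetical toy records: 4 patients in cluster "c1", 2 in cluster "c2".
toy_records = [
    {"sex": "F", "ighv": "Mutated", "zap70": "-"},
    {"sex": "F", "ighv": "Mutated", "zap70": "-"},
    {"sex": "M", "ighv": "Mutated", "zap70": "-"},
    {"sex": "F", "ighv": "Mutated", "zap70": "+"},
    {"sex": "M", "ighv": "Unmutated", "zap70": "+"},
    {"sex": "M", "ighv": "Unmutated", "zap70": "+"},
]
toy_clusters = ["c1", "c1", "c1", "c1", "c2", "c2"]
features = characteristic_features(toy_records, toy_clusters)
print(features)
```

In this toy example, female sex appears in exactly 3 of 4 members of cluster "c1", so it just meets the threshold. A blank cell in the table above corresponds to a feature for which no single value reached 75% in that cluster.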
Figure 2. Data transformation B: (A) Kaplan-Meier survival curve, (B) MDS plot, and (C) t-SNE plot for 6 unsupervised clusters of CLL patients. Unsupervised machine learning, using k-medoids clustering with Partitioning Around Medoids (PAM) and the Sokal & Michener distance, yields 6 clinical phenotypes with significant differences in time-to-progression (TTP) (P = .0451; Supplementary Figure 1). Clusters separated by MDS along the first dimension reflect the order of overall survival (OS) outcomes.