Alexandra L Young, Razvan V Marinescu, Neil P Oxtoby, Martina Bocchetta, Keir Yong, Nicholas C Firth, David M Cash, David L Thomas, Katrina M Dick, Jorge Cardoso, John van Swieten, Barbara Borroni, Daniela Galimberti, Mario Masellis, Maria Carmela Tartaglia, James B Rowe, Caroline Graff, Fabrizio Tagliavini, Giovanni B Frisoni, Robert Laforce, Elizabeth Finger, Alexandre de Mendonça, Sandro Sorbi, Jason D Warren, Sebastian Crutch, Nick C Fox, Sebastien Ourselin, Jonathan M Schott, Jonathan D Rohrer, Daniel C Alexander.
Abstract
The heterogeneity of neurodegenerative diseases is a key confound to disease understanding and treatment development, as study cohorts typically include multiple phenotypes on distinct disease trajectories. Here we introduce a machine-learning technique, Subtype and Stage Inference (SuStaIn), able to uncover data-driven disease phenotypes with distinct temporal progression patterns, from widely available cross-sectional patient studies. Results from imaging studies in two neurodegenerative diseases reveal subgroups and their distinct trajectories of regional neurodegeneration. In genetic frontotemporal dementia, SuStaIn identifies genotypes from imaging alone, validating its ability to identify subtypes; further, the technique reveals within-genotype heterogeneity. In Alzheimer's disease, SuStaIn uncovers three subtypes, uniquely characterising their temporal complexity. SuStaIn provides fine-grained patient stratification, which substantially enhances the ability to predict conversion between diagnostic categories over standard models that ignore subtype (p = 7.18 × 10⁻⁴) or temporal stage (p = 3.96 × 10⁻⁵). SuStaIn offers new promise for enabling disease subtype discovery and precision medicine.
Year: 2018 PMID: 30323170 PMCID: PMC6189176 DOI: 10.1038/s41467-018-05892-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Conceptual overview of SuStaIn. The Underlying model panel (a) considers a patient cohort to consist of an unknown set of disease subtypes. The input data (Input data panel, b), which can be entirely cross-sectional, contains snapshots of biomarker measurements from each subject with unknown subtype and unknown temporal stage. SuStaIn recovers the set of disease subtypes and their temporal progression (as shown in the Output panel, c) via simultaneous clustering and disease progression modelling. Given a new snapshot, SuStaIn can estimate the probability the subject belongs to each subtype and stage, by comparing the snapshot with the reconstruction (as shown in the Application panel, d). This figure depicts two hypothetical disease subtypes, labelled I and II, and the biomarkers are regional brain volumes, but SuStaIn is readily applicable to any scalar disease biomarker and any number of subtypes. The colour of each region indicates the amount of pathology in that region, ranging from white (no pathology) to red to magenta to blue (maximum pathology)
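The Application panel's idea of turning a comparison between a snapshot and the reconstruction into subtype and stage probabilities can be illustrated with a minimal sketch. It assumes the likelihood of the snapshot under every subtype and stage has already been computed, and that the prior is the subtype fraction with a uniform prior over stages; both are assumptions of this sketch, and the function name and toy numbers are hypothetical rather than taken from the paper.

```python
def subtype_stage_posterior(likelihood, subtype_fractions):
    """Normalise likelihood[c][k] (subtype c, stage k) into probabilities.

    Assumption of this sketch: prior = subtype fraction, uniform over stages.
    """
    weighted = [[f * value for value in row]
                for f, row in zip(subtype_fractions, likelihood)]
    total = sum(sum(row) for row in weighted)
    joint = [[w / total for w in row] for row in weighted]
    p_subtype = [sum(row) for row in joint]          # marginal over stages
    p_stage = [sum(col) for col in zip(*joint)]      # marginal over subtypes
    return joint, p_subtype, p_stage


# Toy likelihoods for two subtypes and three stages (made-up numbers).
toy_likelihood = [[0.02, 0.10, 0.30],
                  [0.01, 0.03, 0.04]]
toy_fractions = [0.5, 0.5]

joint, p_subtype, p_stage = subtype_stage_posterior(toy_likelihood, toy_fractions)
print(p_subtype, p_stage)
```

The joint table sums to 1, so the subtype and stage marginals can be read as the probabilities that the snapshot belongs to each subtype and each stage.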
Fig. 2 SuStaIn modelling of genetic frontotemporal dementia using GENFI data. a The progression pattern of each of four subtypes that SuStaIn identifies. Each progression pattern consists of a sequence of stages in which regional brain volumes in mutation carriers (symptomatic and presymptomatic) reach different z-scores relative to non-carriers. Intuitively (for a more precise description see Methods: Uncertainty Estimation), at each stage the colour in each region indicates the level of severity of volume loss: white is unaffected; red is mildly affected (z-score of 1); magenta is moderately affected (z-score of 2); and blue is severely affected (z-score of 3 or more). The circle labelled A indicates the asymmetry of the atrophy pattern (absolute value of the difference in volume between the left and right hemispheres divided by the total volume of the left and right hemispheres) at each stage for each subtype. CVS is the model cross-validation similarity (see Methods: Similarity between two progression patterns): the average similarity of the subtype progression patterns across cross-validation folds, which ranges from 0 (no similarity) to 1 (maximum similarity). f is the proportion of participants estimated to belong to each subtype. b The contribution of each genotype to each of the SuStaIn subtypes. This is calculated as the probability an individual has a particular genotype given that they belong to a particular subtype
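The asymmetry measure described in the caption is a simple ratio, sketched below. The function name and the example volumes are hypothetical; only the formula (absolute left-right difference divided by total volume) comes from the caption.

```python
def asymmetry_index(left_volume: float, right_volume: float) -> float:
    """Absolute left-right volume difference divided by the total volume."""
    total = left_volume + right_volume
    if total <= 0:
        raise ValueError("total volume must be positive")
    return abs(left_volume - right_volume) / total


# Hypothetical hemisphere volumes in arbitrary units (not GENFI data).
symmetric = asymmetry_index(50.0, 50.0)   # equal hemispheres
asymmetric = asymmetry_index(30.0, 50.0)  # left smaller than right
print(symmetric, asymmetric)
```

The index is 0 for perfectly symmetric volumes and approaches 1 as one hemisphere's volume dominates the total.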
Fig. 3 SuStaIn modelling of sporadic Alzheimer’s disease using ADNI data. The rows show the progression pattern of the three subtypes identified by SuStaIn. Diagrams as in Fig. 2, but the z-scores are measured relative to amyloid-negative (cerebrospinal fluid Aβ1–42 > 192 pg per ml) cognitively normal subjects, i.e. cognitively normal subjects with no evidence of amyloid pathology on cerebrospinal fluid. The cerebellum was not included as a region in the Alzheimer’s disease analysis and so is shaded in dark grey
Fig. 4 Reproducibility of the SuStaIn subtypes in Fig. 3. Results are shown for a largely independent Alzheimer’s disease data set consisting of subjects with regional brain volume measurements from 1.5T MRI scans rather than 3T MRI scans (only 59 subjects are in both the 576-subject data set used to generate this figure and the 793-subject data set used in Fig. 3). Diagrams are as in Fig. 3, with each row showing the progression pattern of one of the subtypes identified by SuStaIn. SuStaIn modelling identifies three major subtypes (typical, cortical and subcortical), which are in good agreement with the three subtypes in Fig. 3, as well as an additional very small outlier group (only 4%) with a subtype we term parietal. This small subgroup may represent outliers with a posterior cortical atrophy phenotype
Fig. 5 SuStaIn subtyping and staging of genetic frontotemporal dementia and Alzheimer’s disease. a, b The assignability of the disease subtypes estimated by SuStaIn for genetic frontotemporal dementia and Alzheimer’s disease. Each scatter plot visualises the probability that each individual belongs to each of the SuStaIn subtypes estimated for a genetic frontotemporal dementia (as shown in Fig. 2a) and b Alzheimer’s disease (as shown in Fig. 3). In the triangle scatter plots, each of the corners corresponds to a probability of 1 of belonging to that subtype, and 0 for the other subtypes; the centre point of the triangle corresponds to a probability of 1/3 of belonging to each subtype. c, d The probability that subjects from each of the diagnostic groups belong to each of the SuStaIn stages for c genetic frontotemporal dementia and d Alzheimer’s disease. CN = cognitively normal; MCI = mild cognitive impairment; AD = Alzheimer’s disease
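The triangle scatter plots described in this caption correspond to a barycentric layout: a triple of subtype probabilities is placed at the probability-weighted average of the three corners. The sketch below assumes an equilateral triangle with an arbitrary corner placement; the function name and corner coordinates are illustrative choices, not details from the paper.

```python
import math

# Corners of an equilateral triangle, one per subtype (arbitrary placement).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def to_triangle_xy(probs):
    """Map three subtype probabilities (summing to 1) to a 2D plot position."""
    if len(probs) != 3 or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("expected three probabilities that sum to 1")
    x = sum(p * cx for p, (cx, _) in zip(probs, CORNERS))
    y = sum(p * cy for p, (_, cy) in zip(probs, CORNERS))
    return x, y


corner_point = to_triangle_xy([1.0, 0.0, 0.0])      # certain of subtype 1
centre_point = to_triangle_xy([1 / 3, 1 / 3, 1 / 3])  # equally likely subtypes
print(corner_point, centre_point)
```

A probability of 1 for one subtype lands exactly on that subtype's corner, and equal probabilities of 1/3 land on the centroid, matching the description of the corners and centre point.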
Table 1 Ability of subtypes to distinguish between different genotypes in symptomatic mutation carriers in GENFI using the SuStaIn subtypes in Fig. 2a
| Subtype | | | |
|---|---|---|---|
| Asymmetric frontal (threshold | 93% (13) | 9% (1) | 4% (1) |
| Temporal (threshold | 0% (0) | 91% (10) | 21% (5) |
| Frontotemporal | 0% (0) | 0% (0) | 42% (10) |
| Subcortical | 7% (1) | 0% (0) | 33% (8) |
| Accuracy | 93% (13/14) | 91% (10/11) | 75% (18/24) |
Each entry is the percentage (number) of participants of a particular genotype assigned to that subtype. The final row indicates the percentage (fraction) of participants assigned to the correct subtype from each genotype. The results show that SuStaIn can accurately discriminate genotypes, validating the ability of SuStaIn to identify distinct phenotypes that align with known genetic groups
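As a consistency check, the percentages and accuracies can be recomputed from the participant counts in the table. The grouping of correct subtypes per genotype column is an assumption of this sketch, inferred from the Accuracy row (for example, 18/24 in the third column equals the Frontotemporal and Subcortical counts combined); the columns are referred to by position only.

```python
# Participant counts copied from the table; list positions follow the
# three genotype columns in table order.
counts = {
    "Asymmetric frontal": [13, 1, 1],
    "Temporal": [0, 10, 5],
    "Frontotemporal": [0, 0, 10],
    "Subcortical": [1, 0, 8],
}

# Assumption inferred from the Accuracy row: subtypes counted as correct
# for each genotype column.
correct_subtypes = [
    {"Asymmetric frontal"},
    {"Temporal"},
    {"Frontotemporal", "Subcortical"},
]

# Column totals: number of participants per genotype.
totals = [sum(row[j] for row in counts.values()) for j in range(3)]

# Percentage of each genotype assigned to each subtype.
percentages = {
    name: [round(100 * n / t) for n, t in zip(row, totals)]
    for name, row in counts.items()
}

# Accuracy: share of each genotype assigned to a correct subtype.
correct = [sum(counts[name][j] for name in correct_subtypes[j]) for j in range(3)]
accuracy = [round(100 * c / t) for c, t in zip(correct, totals)]

print(totals, accuracy)
```

Under that grouping, the recomputed column totals and rounded percentages agree with the values reported in the table.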
Table 2 As Table 1, but for subtypes obtained from a subtypes-only model that accounts for heterogeneity in disease subtype but not disease stage, i.e. the subtypes in Fig. 6
| Subtype | | | |
|---|---|---|---|
| Severe frontal (threshold | 57% (8) | 9% (1) | 4% (1) |
| Severe temporal (threshold | 0% (0) | 64% (7) | 8% (2) |
| Mild frontotemporal | 43% (6) | 27% (3) | 88% (21) |
| Accuracy | 57% (8/14) | 64% (7/11) | 88% (21/24) |
The results show that SuStaIn (Table 1) provides much better discrimination of the different genotypes than the subtypes-only model shown here, demonstrating the added utility of a model that accounts for heterogeneity in disease stage
Table 3 Utility of SuStaIn subtype and stage for predicting the risk of conversion from mild cognitive impairment to Alzheimer’s disease
| Model | SuStaIn subtype | SuStaIn stage | Age | Sex | Education | |
|---|---|---|---|---|---|---|
| S–C–T | 1.57** | 1.13† | 0.98 | 0.98 | 0.93~ | 1.82† |
| S–C | 1.76~ | 1.16† | 0.95* | 1.03 | 0.92 | 1.53* |
| C–T | 1.48~ | 1.11† | 0.98 | 0.87 | 0.97 | 1.84† |
| S–T | 2.11* | 1.13† | 1.02 | 1.13 | 0.90* | 2.13† |
Each row shows hazard ratios for a different Cox proportional hazards model that estimates the risk of conversion from mild cognitive impairment to Alzheimer’s disease using ADNI data. Each column shows the estimated hazard ratio for one variable. A hazard ratio describes how the risk of conversion changes for each unit increase of a variable: a ratio of 1 means no change in risk, a ratio greater than 1 means increased risk, and a ratio less than 1 means reduced risk. The first model (S–C–T) assumes that the hazard increases multiplicatively from the Subcortical subtype (S) to the Cortical subtype (C) to the Typical subtype (T), i.e. it estimates that each SuStaIn subtype has a hazard 1.57 times that of the previous subtype: the cortical group has a 1.57 times greater risk of conversion than the subcortical group, the typical group has a 1.57 times greater risk than the cortical group, and the typical group has a 2.46 (1.57²) times greater risk than the subcortical group. In the remaining models only two groups are compared at a time to demonstrate that the results are similar without this assumption, although the statistical power is reduced. This result demonstrates the added utility of both disease subtypes and stages obtained from SuStaIn for predicting conversion from mild cognitive impairment to Alzheimer’s disease, with both subtype and stage modifying the risk of conversion.
Statistical significance is indicated as: ~p < 0.1, *p < 0.05, **p < 0.01, †p < 1 × 10⁻³
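The multiplicative reading of the S–C–T hazard ratios can be checked with a few lines of arithmetic. The sketch below uses the subtype and stage hazard ratios from the S–C–T row; the helper name is hypothetical, and combining subtype and stage effects by multiplication relies on the standard Cox model form rather than on anything stated in the table notes.

```python
HR_SUBTYPE_STEP = 1.57  # S-C-T row: per step along Subcortical -> Cortical -> Typical
HR_STAGE_STEP = 1.13    # S-C-T row: per one-unit increase in SuStaIn stage


def relative_hazard(subtype_steps: int, stage_steps: int) -> float:
    """Hazard relative to a reference subject, other covariates held equal.

    Assumes hazard ratios compound multiplicatively, as in a Cox model.
    """
    return HR_SUBTYPE_STEP ** subtype_steps * HR_STAGE_STEP ** stage_steps


cortical_vs_subcortical = relative_hazard(1, 0)
typical_vs_subcortical = relative_hazard(2, 0)  # two subtype steps: 1.57 squared
print(round(typical_vs_subcortical, 2))
```

Two subtype steps reproduce the 2.46 quoted for the typical group relative to the subcortical group; the same function can be used to explore how a difference in stage would compound with a difference in subtype under this assumption.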
Fig. 6 Subtypes-only model for GENFI, not accounting for disease stage heterogeneity. Brain diagrams as in Fig. 2, but here each diagram represents a different subtype, which we refer to as severe frontal, severe temporal and mild frontotemporal. There is no notion of disease stage in the subtypes-only model