Benjamin Goudey, Bowen J Fung, Christine Schieber, Noel G Faux.
Abstract
It is increasingly recognized that Alzheimer's disease (AD) exists before dementia is present and that shifts in amyloid beta occur long before clinical symptoms can be detected. Early detection of these molecular changes is key to the success of interventions aimed at slowing down rates of cognitive decline. Recent evidence indicates that of the two established methods for measuring amyloid, a decrease in cerebrospinal fluid (CSF) amyloid β1-42 (Aβ1-42) may be an earlier indicator of Alzheimer's disease risk than measures of amyloid obtained from Positron Emission Tomography (PET). However, CSF collection is highly invasive and expensive. In contrast, blood collection is routinely performed, minimally invasive and cheap. In this work, we develop a blood-based signature that can provide a cheap and minimally invasive estimation of an individual's CSF amyloid status using a machine learning approach. We show that a Random Forest model derived from plasma analytes can accurately classify subjects as having abnormal (low) CSF Aβ1-42 levels indicative of AD risk (0.84 AUC, 0.78 sensitivity, and 0.73 specificity). Refinement of the modeling indicates that only APOEε4 carrier status and four plasma analytes (CGA, Aβ1-42, Eotaxin 3, APOE) are required to achieve a high level of accuracy. Furthermore, we show across an independent validation cohort that individuals with predicted abnormal CSF Aβ1-42 levels transitioned to an AD diagnosis over 120 months significantly faster than those with predicted normal CSF Aβ1-42 levels and that the resulting model also validates reasonably across PET Aβ1-42 status (0.78 AUC). This is the first study to show that a machine learning approach, using plasma protein levels, age and APOEε4 carrier status, is able to predict CSF Aβ1-42 status, the earliest risk indicator for AD, with high accuracy.
Year: 2019 PMID: 30853713 PMCID: PMC6409361 DOI: 10.1038/s41598-018-37149-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic characteristics of the ADNI data set separated into training and validation cohorts, corresponding to individuals with and without CSF measures respectively.
| Dataset | Training | | | Validation | | |
|---|---|---|---|---|---|---|
| Diagnosis | CN | MCI | AD | CN | MCI | AD |
| Number of participants (n) | 58 | 198 | 102 | 0 | 198 | 10 |
| Age (mean, (SD)) | 75.11 (5.77) | 74.35 (7.48) | 74.90 (7.91) | — | 75.13 (7.32) | 73.73 (10.04) |
| Gender; female (n, (%)) | 28 (48.28) | 65 (32.82) | 43 (42.16) | — | 75 (37.88) | 4 (40.00) |
| Years of education (mean, (SD)) | 15.67 (2.78) | 15.81 (2.99) | 15.16 (3.29) | — | 15.48 (3.09) | 14.70 (2.06) |
| | 5 (8.62) | 106 (53.54) | 71 (69.61) | — | 105 (53.03) | 5 (50.00) |
| CSF | 0 (0) | 146 (73.7) | 94 (92.16) | — | — | — |
| PET imaging (n, (%)) | 28 (48.28) | 71 (35.86) | 9 (8.82) | — | 108 (65.55) | — |
Columns in each cohort provide a further breakdown into individuals that are cognitively normal (CN), have mild cognitive impairment (MCI) or Alzheimer’s disease (AD). The units of each cell are shown in parentheses in the row names and commonly include number of patients (n) or mean of a given quantity (mean). If a secondary measure (percent (%) or standard deviation (SD)) is also present, it is listed in brackets next to the primary measure.
Mean and standard deviation (in parentheses) of performance metrics (area under the receiver operating characteristic curve, AUC; accuracy, Acc; sensitivity, Sens; specificity, Spec; and R² for the regression models) for Random Forest models trained on different feature sets, computed across all cross-validation folds.
| Feature set | Regression | | | | | Binary | | | |
|---|---|---|---|---|---|---|---|---|---|
| | AUC | R² | Acc | Sens | Spec | AUC | Acc | Sens | Spec |
| BPM | 0.830 (0.08) | 0.274 (0.11) | 0.784 (0.09) | 0.536 (0.14) | |||||
| BM | 0.788 (0.08) | 0.209 (0.13) | 0.744 (0.08) | 0.779 (0.09) | 0.679 (0.16) | 0.784 (0.08) | 0.737 (0.08) | 0.855 (0.08) | 0.507 (0.15) |
| BP | 0.782 (0.10) | 0.552 (0.14) | |||||||
| B | 0.795 (0.08) | 0.746 (0.06) | 0.751 (0.08) | 0.800 (0.07) | 0.745 (0.07) | 0.731 (0.09) | | | |
| BP | 0.812 (0.08) | 0.270 (0.14) | 0.749 (0.07) | | 0.638 (0.15) | | | | |
Left and right halves are for the regression and binary tasks respectively. For each metric, boldface text indicates the highest mean score or models whose performance is not significantly lower (via a Wilcoxon signed-rank test, Bonferroni-corrected significance threshold of 0.05/5 = 0.01). Feature sets describe combinations of (B) baseline model (age and APOEε4 carrier status), (P) Proteomics, (M) Metabolomics.
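The corrected threshold quoted above follows from dividing the nominal significance level by the number of comparisons. A minimal sketch of that arithmetic, assuming five comparisons; the p-values, the `BP5` label as a dictionary key, and both function names are hypothetical and are not taken from the study:

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def not_significantly_lower(p_values, alpha=0.05):
    """Names whose p-value does not fall below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p >= threshold]


# Invented p-values for five feature sets, for illustration only.
example_p = {"BPM": 0.20, "BM": 0.004, "BP": 0.03, "B": 0.0005, "BP5": 0.45}

threshold = bonferroni_threshold(0.05, len(example_p))  # 0.05 / 5 = 0.01
kept = not_significantly_lower(example_p)               # ['BPM', 'BP', 'BP5']
print(threshold, kept)
```

Under this reading, a feature set would be marked in bold when its p-value against the best model stays at or above 0.01.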
Figure 1. ROC curves comparing different sets of features to determine predictive value for the (a) regression, and (b) binary tasks. Different colours of lines correspond to different feature sets, (B) baseline model (age and APOEε4 carrier status), (P) Proteomics, (M) Metabolomics, with corresponding AUCs shown in the legend under each plot. The model BP in subplot (a) indicates the performance of the RF using feature selection.
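An AUC can be read as the probability that a randomly chosen abnormal subject receives a higher score than a randomly chosen normal one, while sensitivity and specificity depend on a chosen cut-off. A minimal sketch of both calculations, assuming binary labels (1 = abnormal) and a score where higher means more likely abnormal; the toy data and function names are hypothetical and do not reproduce the study's pipeline:

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and scores for eight subjects.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

auc = roc_auc(labels, scores)                              # 14 of 16 pairs -> 0.875
sens, spec = sensitivity_specificity(labels, scores, 0.5)  # 0.75, 0.5
print(auc, sens, spec)
```

The pairwise formulation is quadratic in the number of subjects, which is acceptable for a cohort of a few hundred but would normally be replaced by a rank-based computation for larger data.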
Figure 2. Partial dependency plots of the five features selected from the full BP model using a recursive feature elimination approach. Each subplot shows how the variation of a specific feature impacts the predicted levels of CSF Aβ1–42 assuming the other four features are fixed.
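One common way to compute a partial dependence curve is to force the feature of interest to each value on a grid, leave the other features at their observed values, and average the model's predictions. A minimal sketch under that assumption; whether the plots in this figure were produced exactly this way is not stated here, and `toy_model`, its coefficients, and the data are hypothetical stand-ins for a fitted Random Forest:

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over `rows` with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)             # copy so the input data is not mutated
            modified[feature_index] = value  # override only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted regressor (not the study's model)."""
    age, analyte = row
    return 200.0 - 0.5 * age + 10.0 * analyte


# Invented rows of [age, analyte level].
rows = [[70.0, 1.0], [75.0, 2.0], [80.0, 3.0]]

pd_curve = partial_dependence(toy_model, rows, feature_index=1, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # [162.5, 172.5, 182.5]
```

Because the toy model is linear, the curve rises by exactly 10 per unit of the analyte; a Random Forest would generally give a stepwise, non-linear curve.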
Figure 3. Kaplan-Meier curves for (a) the training cohort stratified by actual CSF Aβ status, and the validation cohort stratified by predicted CSF Aβ1–42 status from the (b) B model, (c) BP model and (d) BP5 model. The bands along the curves represent the 95% confidence intervals. Hazard ratios and 95% confidence intervals for the abnormal group compared to the normal group are shown in the bottom left of each subplot. In all cases, the low CSF Aβ1–42 group transitioned to AD diagnosis significantly faster than the normal group (p = 3.97 × 10⁻⁷, 7.89 × 10⁻⁶, 9.96 × 10⁻⁴, 1.65 × 10⁻³ for the four plots left-right). CN individuals were not included in this analysis because there were no CN individuals present in the validation cohort.
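A Kaplan-Meier curve is built by multiplying, at each observed event time, the fraction of at-risk subjects who did not have the event. A minimal sketch of the estimator, assuming follow-up times in months and an event flag (1 = transitioned to AD, 0 = censored); the data are invented and confidence bands and hazard ratios are not computed:

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Invented follow-up data for six subjects.
times = [12, 24, 24, 36, 48, 60]
events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(t, round(s, 4))  # 12 0.8333, 24 0.6667, 36 0.4444, 60 0.0
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.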
Figure 4. ROC curves comparing different sets of features to determine PET-based Aβ status on the (a) training and (b) validation cohorts. Different colours of lines correspond to different feature sets with corresponding AUCs shown in the legend under each plot. Results in the training cohort are more useful as a measure of similarity between the tasks of predicting CSF and PET Aβ status than as an estimate of performance, because these data were used to train the CSF model and the AUCs are therefore upwardly biased, especially for more complex models (e.g. BP).