Yanting Huang, Xiaobo Sun, Huige Jiang, Shaojun Yu, Chloe Robins, Matthew J Armstrong, Ronghua Li, Zhen Mei, Xiaochuan Shi, Ekaterina Sergeevna Gerasimov, Philip L De Jager, David A Bennett, Aliza P Wingo, Peng Jin, Thomas S Wingo, Zhaohui S Qin.
Abstract
Alzheimer's disease (AD) is influenced by both genetic and environmental factors; thus, brain epigenomic alterations may provide insights into AD pathogenesis. Multiple array-based Epigenome-Wide Association Studies (EWASs) have identified robust brain methylation changes in AD; however, array-based assays only test about 2% of all CpG sites in the genome. Here, we develop EWASplus, a computational method that uses a supervised machine learning strategy to extend EWAS coverage to the entire genome. Application to six AD-related traits predicts hundreds of new significant brain CpGs associated with AD, some of which are further validated experimentally. EWASplus also performs well on data collected from independent cohorts and different brain regions. Genes found near top EWASplus loci are enriched for kinases and for genes with evidence for physical interactions with known AD genes. In this work, we show that EWASplus implicates additional epigenetic loci for AD that are not found using array-based AD EWASs.
Year: 2021 PMID: 34294691 PMCID: PMC8298578 DOI: 10.1038/s41467-021-24710-8
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 Overview of the EWASplus approach.
The EWASplus procedure comprises four major steps: (1) collection of training data from existing EWASs; (2) selection of external features (from sources such as the ENCODE and Roadmap Epigenomics consortia); (3) ensemble learning; and (4) genome-wide CpG risk prediction, in which the trained ensemble model is applied to score every CpG in the genome.
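The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the EWASplus implementation: the feature-selection criterion (largest difference in class means), the two base learners (logistic regression and nearest centroid), and the soft-voting average are hypothetical stand-ins chosen for brevity, and the toy feature vectors stand in for per-CpG annotations that would in practice come from tracks such as ENCODE and Roadmap Epigenomics. Labels play the role of step 1: 1 for a CpG significant in an existing EWAS, 0 otherwise.

```python
import math
from statistics import mean


def select_features(X, y, k):
    """Step 2 (hypothetical criterion): keep the k features whose mean
    differs most between positive and negative training CpGs."""
    def gap(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Base learner 1: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            gw = [g + err * xi for g, xi in zip(gw, row)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda row: sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def train_centroid(X, y):
    """Base learner 2: nearest centroid, mapped to a score in (0, 1)."""
    def centroid(label):
        rows = [row for row, lab in zip(X, y) if lab == label]
        return [mean(col) for col in zip(*rows)]
    c_pos, c_neg = centroid(1), centroid(0)
    # Closer to the positive centroid than to the negative one -> score > 0.5.
    return lambda row: sigmoid(math.dist(row, c_neg) - math.dist(row, c_pos))


def ensemble_scores(models, rows):
    """Step 3: soft voting, i.e. the mean of the base learners' scores."""
    return [mean(m(row) for m in models) for row in rows]


def rank_cpgs(cpg_ids, scores):
    """Step 4: order candidate CpGs from highest to lowest ensemble score."""
    pairs = sorted(zip(cpg_ids, scores), key=lambda p: p[1], reverse=True)
    return [cpg_id for cpg_id, _ in pairs]


# Toy usage: feature 0 separates positives (EWAS-significant) from negatives.
X = [[3.0, 0.1], [2.5, -0.2], [2.8, 0.0], [0.1, 0.2],
     [-0.2, -0.1], [0.0, 0.3], [0.3, -0.3], [-0.1, 0.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
keep = select_features(X, y, k=1)
train = [[row[j] for j in keep] for row in X]
models = [train_logistic(train, y), train_centroid(train, y)]
candidates = {"cpg_a": [2.9, 0.0], "cpg_b": [0.0, 0.1]}
scores = ensemble_scores(models, [[v[j] for j in keep] for v in candidates.values()])
print(rank_cpgs(list(candidates), scores))  # ['cpg_a', 'cpg_b']
```

Averaging base-learner scores is one simple way to combine models; stacking with a trained meta-learner is a common alternative, and this excerpt does not specify which combination EWASplus uses.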
Summary of performance evaluation of all six AD-related traits.
| Outcome | Outcome type | AUC | AUPR | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| Beta-amyloid | Pathologic, IHC | 0.850 | 0.539 | 0.492 | 0.423 | 0.589 |
| Braak staging | Pathologic, Silver Stain | 0.860 | 0.599 | 0.530 | 0.487 | 0.581 |
| CERAD | Pathologic, Silver Stain | 0.833 | 0.502 | 0.508 | 0.457 | 0.571 |
| Cognitive trajectory | Clinical | 0.831 | 0.591 | 0.516 | 0.451 | 0.604 |
| Global pathology | Pathologic, Silver Stain | 0.882 | 0.622 | 0.577 | 0.507 | 0.671 |
| Neurofibrillary tangles | Pathologic, IHC | 0.962 | 0.858 | 0.754 | 0.677 | 0.852 |
Performance is evaluated on an independent test set of CpGs from the 450K array. Results are reported for an imbalanced setting (positive-to-negative CpG ratio of 1:10), which is closer to the real class imbalance across the human genome.
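To make the metrics in this table concrete, here is a short plain-Python sketch of how they are computed from labels and scores. It is illustrative only, not the authors' evaluation code, and the function names are hypothetical. It shows two things: F1 is the harmonic mean of precision and recall, so the F1 column can be checked against the other two columns; and at a 1:10 ratio an uninformative classifier has an AUC of 0.5 but an expected AUPR of only about 1/11, which is the relevant reference point for the AUPR column.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = AD-associated CpG)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def roc_auc(y_true, scores):
    """AUC = probability that a random positive outscores a random negative
    (ties count one half); this is the normalized Mann-Whitney U statistic."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_aupr_baseline(n_pos, n_neg):
    """Expected AUPR of an uninformative classifier: roughly the positive fraction."""
    return n_pos / (n_pos + n_neg)


# Consistency check against the table: Beta-amyloid precision 0.423, recall 0.589.
print(round(f1_from(0.423, 0.589), 3))  # 0.492
# At a 1:10 positive-to-negative ratio, chance-level AUPR is about 0.091.
print(round(random_aupr_baseline(1, 10), 3))  # 0.091
```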
Fig. 2 Summary of EWASplus results.
a ROC curves of the predictive performance of EWASplus on the six traits in the ROS/MAP cohort. b Precision-recall curves of the predictive performance of EWASplus on the six traits in the ROS/MAP cohort. Source data are provided as a Source data file.
Fig. 3 Genome-wide prediction results.
a Manhattan plots for neurofibrillary tangles: the top panel shows on-450K CpGs with EWAS p-values, and the bottom panel shows whole-genome CpGs with LRS imputed by EWASplus. The y-axis is the log-scale rank score (LRS). The top-ranked CpG has an LRS of 7.42 (empirical p-value of about 3.8 × 10−8), the 100th-ranked CpG has an LRS of 5.42 (about 3.8 × 10−6), and the 10,000th-ranked CpG has an LRS of 3.42 (about 3.8 × 10−4). b Raw and normalized stacked-proportion histograms for different genomic annotation types. Source data are provided as a Source data file. c Difference between the observed and expected proportions of chromatin states for the top 10,000 loci across the six AD-related traits: Beta-amyloid, Braak staging, CERAD, cognitive trajectory, global pathology, and neurofibrillary tangles. Source data are provided as a Source data file. The annotated chromatin states are from the Roadmap Epigenomics Project; we used the core 15-state chromatin model for the dorsolateral prefrontal cortex. To minimize ambiguity, we assign a single annotation type to each CpG site: if a CpG has multiple annotations, we record only the most “significant” one, using the order enhancer > promoter > exon > intron > near gene (1–5 kb from the TSS) > intergenic. 5′ UTRs and 3′ UTRs are not listed because, in the UCSC annotation system, they lie within the first and last exon of each gene.
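The single-annotation rule in this caption amounts to taking the minimum under a fixed priority order. A minimal sketch follows; the function and constant names are hypothetical, and two behaviors are assumptions rather than statements from the source: UTR labels are folded into "exon" (following the caption's note that UTRs lie within the first and last exon), and a CpG with no recognized label is treated as intergenic.

```python
# Priority order from the Fig. 3 caption; earlier entries win.
PRIORITY = ["enhancer", "promoter", "exon", "intron", "near gene", "intergenic"]

# Assumption: UTR labels are folded into "exon", since UTRs lie within the
# first/last exon under the UCSC annotation system.
ALIASES = {"5' UTR": "exon", "3' UTR": "exon"}


def collapse_annotation(labels):
    """Return the single highest-priority annotation for one CpG site.

    Assumption: a CpG with no recognized label is treated as intergenic.
    """
    normalized = [ALIASES.get(label, label) for label in labels]
    known = [label for label in normalized if label in PRIORITY]
    if not known:
        return "intergenic"
    return min(known, key=PRIORITY.index)


print(collapse_annotation(["intron", "exon", "enhancer"]))  # enhancer
print(collapse_annotation(["3' UTR", "intron"]))            # exon
```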
Top ten CpG loci for six AD-relevant traits.
| Chr | Position (bp) | Beta-amyloid | Braak staging | CERAD | Cognitive decline | Global pathology | Neurofibrillary tangles | Genes within 50 kb of associated CpG |
|---|---|---|---|---|---|---|---|---|
| 5 | 172175606 | 4.679 | 4.662 | 5.009 | 4.658 | 6.248 | 3.720 | |
| 7 | 47367933 | 4.008 | 4.736 | 5.251 | 4.254 | 5.278 | 5.210 | |
| 6 | 35286078 | 5.141 | 5.947 | 4.497 | 3.731 | 3.762 | 5.072 | |
| 19 | 10736075 | 4.130 | 5.977 | 3.387 | 4.047 | 4.400 | 5.845 | |
| 9 | 116225986 | 3.408 | 4.977 | 4.253 | 3.630 | 5.307 | 5.893 | |
| 1 | 59280358 | 3.065 | 6.248 | 3.973 | 5.179 | 5.130 | 3.755 | |
| 7 | 151433271 | 3.577 | 4.432 | 4.265 | 4.739 | 4.743 | 4.814 | |
For each AD-associated trait, this table gives the log-scale rank score (LRS), where an LRS of 1 corresponds to the 90th percentile, an LRS of 2 to the 99th percentile, and so forth. Genes within 50 kb of each CpG are listed; genes with prior evidence of association with AD are shown in bold.
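The LRS definition in this caption (LRS = 1 at the 90th percentile, LRS = 2 at the 99th) matches LRS = -log10(1 - percentile/100), i.e. the negative log10 of a CpG's rank divided by the number of CpGs scored, so that 10^(-LRS) behaves like an empirical p-value. A minimal sketch under that reading follows; the function names are hypothetical. It also reproduces the LRS values quoted in the Fig. 3 caption (7.42, 5.42, 3.42 for ranks 1, 100, 10,000) if the number of scored CpGs is taken to be about 26.3 million, a figure inferred here from 1/(3.8 × 10−8) rather than stated in the source.

```python
import math


def lrs_from_percentile(percentile):
    """LRS = -log10(1 - percentile/100): 90th -> 1.0, 99th -> 2.0."""
    return -math.log10(1.0 - percentile / 100.0)


def lrs_from_rank(rank, n_total):
    """LRS = -log10(rank / n_total); rank 1 is the top-scoring CpG."""
    if not 1 <= rank <= n_total:
        raise ValueError("rank must lie in [1, n_total]")
    return -math.log10(rank / n_total)


def empirical_p(lrs):
    """Invert the score: the fraction of CpGs ranked at or above this one."""
    return 10.0 ** (-lrs)


N_CPGS = 26_300_000  # assumption: inferred as roughly 1 / 3.8e-8, not from the source
for rank in (1, 100, 10_000):
    print(rank, round(lrs_from_rank(rank, N_CPGS), 2))  # 7.42, 5.42, 3.42
```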
Comparison of number and proportion of differentially methylated CpGs in various categories of CpGs.
| Trait | # of positives in EWASplus predicted positives (%), total = 38 | # of positives in on-array positives (%), total = 10 | # of positives in EWASplus predicted negatives (%), total = 10 |
|---|---|---|---|
| Any trait | 25 (65.8) | 6 (60.0) | 3 (30.0) |
| Beta-amyloid | 17 (44.7) | 1 (10.0) | 2 (20.0) |
| Braak staging | 11 (28.9) | 2 (20.0) | 1 (10.0) |
| CERAD | 17 (44.7) | 2 (20.0) | 3 (30.0) |
| Cognitive trajectory | 7 (18.4) | 3 (30.0) | 0 (0.0) |
| Global pathology | 13 (34.2) | 2 (20.0) | 3 (30.0) |
| Neurofibrillary tangles | 16 (42.1) | 5 (50.0) | 1 (10.0) |
Methylation levels are measured by a targeted bisulfite sequencing experiment.
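For readers unfamiliar with the assay, the per-CpG quantity a bisulfite experiment yields is the fraction of reads that still show a C at the site. The sketch below computes it; this excerpt does not describe the read processing or the statistical test used to call a CpG differentially methylated, so the `min_depth` coverage filter and the raw case-minus-control difference are hypothetical illustrations, not the authors' procedure, and a difference in means is not a significance test.

```python
from statistics import mean


def methylation_level(c_reads, t_reads, min_depth=10):
    """Fraction of reads that keep a C at a CpG after bisulfite conversion.

    Bisulfite converts unmethylated cytosines so they are read as T, while
    methylated cytosines are protected and still read as C. Sites with fewer
    than `min_depth` reads (hypothetical threshold) return None.
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


def mean_difference(case_levels, control_levels):
    """Mean methylation level in cases minus controls, skipping None values."""
    cases = [v for v in case_levels if v is not None]
    controls = [v for v in control_levels if v is not None]
    return mean(cases) - mean(controls)


print(methylation_level(30, 10))  # 0.75
print(methylation_level(3, 2))    # None (below min_depth)
```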
Summary of performance evaluation on three additional cohorts: London, Mount Sinai, and Arizona.
| Cohort | Brain tissue | AUC | AUPR | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| London | Prefrontal cortex | 0.697 | 0.272 | 0.325 | 0.248 | 0.471 |
| Mount Sinai | Prefrontal cortex | 0.863 | 0.604 | 0.481 | 0.364 | 0.708 |
| Arizona | Middle temporal gyrus | 0.699 | 0.233 | 0.275 | 0.196 | 0.461 |
Fig. 4 Manhattan plot of the neurofibrillary tangles EWAS at the HoxA locus on chromosome 7.
a Array-based EWAS p-values; the most significant CpG identified by De Jager et al. is shown with an arrow. b LRS predicted by EWASplus. c Landscape of the HoxA cluster genes.