Qiuyi Zhang, Yang Zhao, Ruyang Zhang, Yongyue Wei, Honggang Yi, Fang Shao, Feng Chen.
Abstract
An epigenome-wide association study (EWAS) is a large-scale study of human disease-associated epigenetic variation, specifically variation in DNA methylation. High-throughput technologies enable simultaneous profiling of DNA methylation at hundreds of thousands of CpGs across the genome. The clustering of correlated DNA methylation at CpGs is reportedly similar to linkage-disequilibrium (LD) correlation in genetic single nucleotide polymorphism (SNP) variation. However, current analysis methods, such as the t-test and rank-sum test, may be underpowered to detect differentially methylated markers. We propose to test the association between the outcome (e.g., case or control) and a set of CpG sites jointly. Here, we compared the performance of five CpG set analysis approaches, namely principal component analysis (PCA), supervised principal component analysis (SPCA), kernel principal component analysis (KPCA), sequence kernel association test (SKAT), and sliced inverse regression (SIR), with Hotelling's T² test and the t-test using Bonferroni correction. The simulation results revealed that the first six methods can control the type I error at the significance level, while the t-test is conservative. SPCA and SKAT performed better than other approaches when the correlation among CpG sites was strong. For illustration, these methods were also applied to a real methylation dataset.
Year: 2016 PMID: 27258058 PMCID: PMC4892473 DOI: 10.1371/journal.pone.0156895
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
18-21]. In our analysis, we used the first k PCs instead of the p CpGs to test the association with the disease outcome, where k is the smallest number of PCs that together explain more than 80% of the total variation. A k-df likelihood ratio test can then be used to test the significance of the CpG set.
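The PCA-based test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`chi2_sf`, `logistic_loglik`, `pca_lrt`) are hypothetical, PCA is run on centered but unscaled data, the outcome model is assumed to be a logistic regression, and the toy data are standard-normal values rather than real methylation levels.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        series = sum(z ** j / math.factorial(j) for j in range(df // 2))
        return min(1.0, math.exp(-z) * series)
    extra = sum(z ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return min(1.0, math.erfc(math.sqrt(z)) + math.exp(-z) * extra)


def logistic_loglik(X, y, n_iter=50, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        mu = 1.0 / (1.0 + np.exp(-eta))
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(X @ beta, -30, 30)
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))


def pca_lrt(M, y, var_threshold=0.80):
    """Test a CpG set via the first k PCs that explain more than var_threshold."""
    Mc = M - M.mean(axis=0)                      # assumption: center, do not scale
    U, s, _ = np.linalg.svd(Mc, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)     # cumulative explained variation
    k = min(int(np.searchsorted(cum, var_threshold, side="right")) + 1, len(s))
    scores = U[:, :k] * s[:k]
    scores = scores / scores.std(axis=0)         # rescaling leaves the LRT unchanged
    ones = np.ones((len(y), 1))
    ll_null = logistic_loglik(ones, y)
    ll_full = logistic_loglik(np.hstack([ones, scores]), y)
    stat = max(0.0, 2.0 * (ll_full - ll_null))   # k-df likelihood ratio statistic
    return k, stat, chi2_sf(stat, k)


# Toy usage: 200 subjects, 10 equicorrelated markers, first marker causal.
rng = np.random.default_rng(1)
n, p, r = 200, 10, 0.6
M = np.sqrt(r) * rng.standard_normal((n, 1)) + np.sqrt(1 - r) * rng.standard_normal((n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.7 * M[:, 0]))).astype(float)
k, stat, p_value = pca_lrt(M, y)
print(k, stat, p_value)
```

Because the likelihood ratio statistic is invariant to linear rescaling of the predictors, standardizing the PC scores only helps the numerical fit and does not alter the test.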
Parameter settings of virtual datasets.
| Simulations | Number of causal CpGs | Location of causal CpGs | Correlation coefficient (r) | Values of β |
|---|---|---|---|---|
| Scenario 1 | | | | |
| 1.1 | 0 | - | 0.2/0.4/0.6/0.8 | - |
| 1.2 | 1 | 1 | 0.2/0.4/0.6/0.8 | 0.5/0.6/0.7/0.8/0.9/1.0 |
| 1.3 | 2 | 1 and 2 | 0.2/0.4/0.6/0.8 | 0.1/0.2/0.3/0.4/0.5 |
| Scenario 2 | | | | |
| 2.1 | 0 | - | 0.2/0.4/0.6/0.8 | - |
| 2.2 | 1 | 1 | 0.2/0.4/0.6/0.8 | 0.5/0.6/0.7/0.8/0.9/1.0 |
| 2.3 | 1 | 5 | 0.2/0.4/0.6/0.8 | 0.5/0.6/0.7/0.8/0.9/1.0 |
| 2.4 | 1 | 10 | 0.2/0.4/0.6/0.8 | 0.5/0.6/0.7/0.8/0.9/1.0 |
| 2.5 | 2 | 1 and 5 | 0.2/0.4/0.6/0.8 | 0.1/0.2/0.3/0.4/0.5 |
| 2.6 | 2 | 1 and 10 | 0.2/0.4/0.6/0.8 | 0.1/0.2/0.3/0.4/0.5 |
| 2.7 | 2 | 5 and 10 | 0.2/0.4/0.6/0.8 | 0.1/0.2/0.3/0.4/0.5 |
Parameter settings based on real methylation datasets.
| Simulations | Number of causal CpGs | Location of causal CpGs | Values of β |
|---|---|---|---|
| 1 | 0 | - | - |
| 2 | 1 | 1 | 4.0/5.0 |
Empirical Type I error rates at α = 0.05 level under different scenarios.
| Scenario | r | PCA | SPCA | KPCA | SKAT | SIR | Hotelling's T² | t-test |
|---|---|---|---|---|---|---|---|---|
| Scenario 1 | 0.2 | 0.0504 | 0.0504 | 0.0546 | 0.0456 | 0.0492 | 0.0412 | 0.0486 |
| Scenario 1 | 0.4 | 0.0512 | 0.0506 | 0.0500 | 0.0490 | 0.0536 | 0.0498 | 0.0452 |
| Scenario 1 | 0.6 | 0.0472 | 0.0530 | 0.0572 | 0.0478 | 0.0438 | 0.0506 | 0.0340 |
| Scenario 1 | 0.8 | 0.0470 | 0.0514 | 0.0456 | 0.0446 | 0.0460 | 0.0468 | 0.0244 |
| Scenario 2 | 0.2 | 0.0494 | 0.0496 | 0.0518 | 0.0560 | 0.0552 | 0.0544 | 0.0450 |
| Scenario 2 | 0.4 | 0.0424 | 0.0478 | 0.0482 | 0.0504 | 0.0486 | 0.0524 | 0.0422 |
| Scenario 2 | 0.6 | 0.0476 | 0.0464 | 0.0498 | 0.0510 | 0.0454 | 0.0512 | 0.0356 |
| Scenario 2 | 0.8 | 0.0420 | 0.0538 | 0.0506 | 0.0484 | 0.0482 | 0.0514 | 0.0184 |
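The abstract notes that the Bonferroni-corrected t-test is conservative, and the smallest rates in the table above occur at r = 0.8. The following Monte Carlo sketch illustrates why a Bonferroni correction tends to become more conservative as markers grow more correlated. It is an illustration only and does not reproduce the paper's simulation: it assumes an exchangeable (equal pairwise) correlation r among 10 markers, replaces per-marker t-statistics with standard normal z-statistics, and uses a hypothetical function name.

```python
from statistics import NormalDist

import numpy as np


def bonferroni_type1_rate(r, n_cpgs=10, alpha=0.05, n_sim=20000, seed=0):
    """Estimate how often min-p with Bonferroni rejects when no marker is associated."""
    rng = np.random.default_rng(seed)
    # Equicorrelated null z-statistics: a shared component plus marker-specific noise.
    shared = rng.standard_normal((n_sim, 1))
    own = rng.standard_normal((n_sim, n_cpgs))
    z = np.sqrt(r) * shared + np.sqrt(1.0 - r) * own
    # Two-sided critical value at the Bonferroni-adjusted level alpha / n_cpgs.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / (2 * n_cpgs))
    # The set is declared significant if any single marker passes the threshold.
    return float(np.mean(np.abs(z).max(axis=1) > z_crit))


rates = {r: bonferroni_type1_rate(r) for r in (0.2, 0.4, 0.6, 0.8)}
for r, rate in rates.items():
    print(r, rate)
```

Under the null, the correlation between per-marker test statistics is approximately the correlation between the markers themselves, which is why the statistics are simulated directly here rather than derived from case and control samples.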
Fig 1. (a) Simulated power under the single causal CpG model, based on 10 CpGs from the same distribution (mean methylation level = 0.6). The regression coefficient in the disease model is β = 0.7. (b) Simulated power under the single causal CpG model, based on 10 CpGs from different distributions. The 5th CpG is set as the causal CpG. The regression coefficient in the disease model is β = 0.7.
Fig 2. (a) Simulated power under the two causal CpGs model, based on 10 CpGs from the same distribution (mean methylation level = 0.6). The regression coefficients in the disease model are β₁ = β₂ = 0.5. (b) Simulated power under the two causal CpGs model, based on 10 CpGs from different distributions. The 1st and 5th CpGs are set as the causal CpGs. The regression coefficients in the disease model are β₁ = β₂ = 0.5.
Empirical Type I error rates based on a real methylation dataset.
| Gene | PCA | SPCA | KPCA | SKAT | SIR | Hotelling's T² | t-test |
|---|---|---|---|---|---|---|---|
| | 0.0572 | 0.0578 | 0.0554 | 0.0480 | 0.0568 | 0.0584 | 0.0372 |
| | 0.0537 | 0.0556 | 0.0483 | 0.0514 | 0.0582 | 0.0518 | 0.0422 |
Fig 3. (a) Simulated power based on the PTPRD gene. (b) Simulated power based on the MLH1 gene.
CpG set analysis results of DNA methylation datasets from epigenome studies.
| Gene | Number of CpGs | PCA | SPCA | KPCA | SKAT | SIR | Hotelling's T² | t-test |
|---|---|---|---|---|---|---|---|---|
| | 6 | 1.11E-03 | 4.06E-05 | 2.42E-03 | 5.36E-04 | 5.39E-03 | 3.74E-03 | 1.88E-03 |
| | 51 | 3.32E-06 | 1.03E-10 | 9.55E-08 | 1.57E-08 | 2.80E-02 | 1.23E-03 | 4.08E-09 |