Carlos Ruiz-Arenas, Juan R González.
Abstract
BACKGROUND: DNA methylation is an epigenetic process that regulates gene expression. Methylation can be modified by environmental exposures, and changes in methylation patterns have been associated with diseases. Methylation microarrays measure methylation levels at more than 450,000 CpGs in a single experiment, and the most common analysis strategy is a single-probe analysis that searches for methylation probes associated with the outcome of interest. However, methylation changes usually occur at the regional level: for example, genomic structural variants can affect methylation patterns in regions up to several megabases in length. Existing methods return lists of Differentially Methylated Regions (DMRs) of only up to a few kilobases in length, and they cannot test whether a given target region is differentially methylated. Therefore, these methods are not suitable for evaluating methylation changes in large regions. To address these limitations, we developed a new DMR approach based on redundancy analysis (RDA) that assesses whether a target region is differentially methylated.
Keywords: DNA methylation; Epigenomics; Gene expression; Microarray; Region analysis
Year: 2017 PMID: 29237399 PMCID: PMC5729265 DOI: 10.1186/s12859-017-1986-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
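The central quantity in the RDA approach described in the abstract is the model R²: the proportion of variance in a region's methylation matrix (samples × CpGs) that the phenotype explains. The sketch below computes it for the simplest case only, one categorical variable such as HER2 status with no adjustment covariates or surrogate variables. It is an illustration under that assumption, not the authors' implementation, and the function name `rda_r2` is hypothetical. With one factor plus an intercept, the least-squares fitted value of each CpG for each sample is that sample's group mean, so R² reduces to the fitted (between-group) sum of squares divided by the total sum of squares, both pooled over every CpG in the region.

```python
from collections import defaultdict


def rda_r2(methylation, groups):
    """Constrained fraction of variance for an RDA with a single factor.

    methylation: list of samples, each a list of CpG values (equal length).
    groups: one label per sample (e.g. HER2 status).
    Assumes at least one CpG varies across samples (total SS > 0).
    """
    n = len(methylation)
    n_cpgs = len(methylation[0])
    # CpG-wise means over all samples: RDA works on column-centred data.
    grand = [sum(row[j] for row in methylation) / n for j in range(n_cpgs)]

    # With one factor plus intercept, the fitted value of every sample is
    # its group mean, computed separately for each CpG.
    sums = defaultdict(lambda: [0.0] * n_cpgs)
    counts = defaultdict(int)
    for row, g in zip(methylation, groups):
        counts[g] += 1
        for j, v in enumerate(row):
            sums[g][j] += v
    means = {g: [s / counts[g] for s in sums[g]] for g in sums}

    ss_total = 0.0   # trace of the centred cross-product matrix
    ss_fitted = 0.0  # same quantity for the fitted (constrained) matrix
    for row, g in zip(methylation, groups):
        for j, v in enumerate(row):
            ss_total += (v - grand[j]) ** 2
            ss_fitted += (means[g][j] - grand[j]) ** 2
    return ss_fitted / ss_total
```

Because the sums are pooled over all CpGs, a modest shift shared by many probes in the region contributes to R² even when no individual probe would pass a single-probe significance threshold.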
Analysis of methylation regions using current methods (40 samples)
| Region size | Sim. DMP % | Diff. Means | CpGs in Bumps (%) | R² target region (sd) | R² random region (sd) |
|---|---|---|---|---|---|
| 500 kb | 30 | 0.3 | 28.10 | 0.647 (0.109) | 0.026 (0.010) |
| 500 kb | 30 | 0.1 | 3.23 | 0.430 (0.098) | 0.025 (0.010) |
| 500 kb | 30 | 0.05 | 0.21 | 0.346 (0.124) | 0.026 (0.010) |
| 500 kb | 10 | 0.3 | 9.16 | 0.442 (0.075) | 0.024 (0.008) |
| 500 kb | 10 | 0.1 | 0.45 | 0.277 (0.101) | 0.026 (0.009) |
| 500 kb | 10 | 0.05 | 0.02 | 0.205 (0.135) | 0.025 (0.008) |
| 300 kb | 30 | 0.3 | 28.10 | 0.674 (0.113) | 0.026 (0.013) |
| 300 kb | 30 | 0.1 | 3.76 | 0.417 (0.083) | 0.025 (0.008) |
| 300 kb | 30 | 0.05 | 0.11 | 0.332 (0.127) | 0.026 (0.011) |
| 300 kb | 10 | 0.3 | 8.80 | 0.446 (0.084) | 0.026 (0.009) |
| 300 kb | 10 | 0.1 | 0.30 | 0.278 (0.100) | 0.026 (0.010) |
| 300 kb | 10 | 0.05 | 0.02 | 0.178 (0.112) | 0.026 (0.011) |
| 100 kb | 30 | 0.3 | 27.80 | 0.682 (0.125) | 0.025 (0.011) |
| 100 kb | 30 | 0.1 | 3.58 | 0.430 (0.103) | 0.026 (0.011) |
| 100 kb | 30 | 0.05 | 0.15 | 0.317 (0.122) | 0.026 (0.011) |
| 100 kb | 10 | 0.3 | 8.72 | 0.453 (0.104) | 0.027 (0.014) |
| 100 kb | 10 | 0.1 | 0.35 | 0.249 (0.112) | 0.025 (0.010) |
| 100 kb | 10 | 0.05 | 0.00 | 0.171 (0.123) | 0.025 (0.011) |
| 50 kb | 30 | 0.3 | 27.70 | 0.705 (0.120) | 0.027 (0.012) |
| 50 kb | 30 | 0.1 | 3.93 | 0.426 (0.096) | 0.027 (0.012) |
| 50 kb | 30 | 0.05 | 0.10 | 0.308 (0.096) | 0.026 (0.010) |
| 50 kb | 10 | 0.3 | 8.60 | 0.442 (0.120) | 0.028 (0.017) |
| 50 kb | 10 | 0.1 | 0.42 | 0.250 (0.100) | 0.026 (0.013) |
| 50 kb | 10 | 0.05 | 0.03 | 0.149 (0.090) | 0.028 (0.015) |
Values represent the mean over the 200 simulations (standard deviation in parentheses). Sim. DMP %, percentage of differentially methylated positions (DMPs) introduced in the simulation; Diff. Means, difference in mean methylation between groups A and B; CpGs in Bumps (%), percentage of CpGs in the modified region that lie within a bump with FDR < 0.05; R², R² estimate of the RDA model; Target region, region that includes the simulated DMPs; Random region, region without any of the simulated DMPs.
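The CpGs in Bumps (%) column can be derived from a list of bump calls. The sketch below assumes each bump is a hypothetical `(start, end, fdr)` tuple with inclusive coordinates on the same chromosome as the simulated region, and that CpGs are given as positions; it only does the bookkeeping and does not run any bump-finding method.

```python
def percent_cpgs_in_bumps(cpg_positions, bumps, fdr_cutoff=0.05):
    """Percentage of CpGs in the modified region covered by a significant bump.

    cpg_positions: positions of the CpGs in the simulated region.
    bumps: iterable of (start, end, fdr) tuples, inclusive coordinates
           (hypothetical input format, same chromosome assumed).
    """
    if not cpg_positions:
        return 0.0
    # Keep only bumps that pass the FDR threshold used in the table.
    significant = [(s, e) for s, e, fdr in bumps if fdr < fdr_cutoff]
    covered = sum(
        1 for pos in cpg_positions
        if any(s <= pos <= e for s, e in significant)
    )
    return 100.0 * covered / len(cpg_positions)
```

For example, with CpGs at 100, 200, 300 and 400 and bumps `(90, 210, 0.01)` and `(290, 310, 0.2)`, only the first bump is significant, so the function returns 50.0.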
Fig. 1 Precision and power of RDA, DMRcate, and blockFinder for simulated sets of 40 samples. To be considered a true positive, a DMRcate region had to comprise at least 50% of the simulated region. A DMRcate region including CpGs outside the DMR was considered a false positive. DMRcate parameters were set to preserve the default smoothing window while allowing DMRs as large as our target DMRs. Each subfigure represents a different scenario and each shape a different size of the simulated DMR. Results were derived from 200 simulations.
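The true and false positive rules in this caption can be written down directly. The sketch below treats each called region and the simulated DMR as sets of CpG identifiers. Three points are assumptions for illustration, not the paper's exact procedure: "comprise at least 50% of the simulated region" is read as covering at least half of the DMR's CpGs, the false-positive rule takes precedence when a region qualifies for both, and precision and power are pooled across simulations as shown. The function names are hypothetical.

```python
def classify_region(region_cpgs, dmr_cpgs, min_fraction=0.5):
    """Label one called region as 'FP', 'TP' or None (neither)."""
    region_cpgs, dmr_cpgs = set(region_cpgs), set(dmr_cpgs)
    if region_cpgs - dmr_cpgs:
        return "FP"  # includes CpGs outside the simulated DMR
    if len(region_cpgs) >= min_fraction * len(dmr_cpgs):
        return "TP"  # inside the DMR and covers at least half of it
    return None  # inside the DMR but too small to count


def precision_and_power(simulations, min_fraction=0.5):
    """simulations: list of (called_regions, dmr_cpgs) pairs, one per run.

    Precision = TP / (TP + FP) over all called regions (assumed pooling).
    Power = fraction of runs with at least one true-positive region.
    """
    tp = fp = runs_detected = 0
    for called_regions, dmr_cpgs in simulations:
        labels = [classify_region(r, dmr_cpgs, min_fraction)
                  for r in called_regions]
        tp += labels.count("TP")
        fp += labels.count("FP")
        runs_detected += int("TP" in labels)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    power = runs_detected / len(simulations)
    return precision, power
```

Under these rules, a region lying entirely inside the DMR but covering fewer than half of its CpGs counts neither for nor against the method.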
Fig. 2 RDA biplot of the HER2 region (BRCA dataset) for the crude and adjusted models. The adjusted model also includes surrogate variables in the analysis. The points represent the samples and are coloured according to HER2 status (HER2-, Negative; HER2+, Positive). The Negative and Positive labels are placed at the centroids of their respective groups. The CpGs most strongly associated with the RDA components are labelled. R2 is the proportion of variance in the methylation data explained by the model. The p-value of the RDA model was computed by permuting (resampling) the HER2 status variable.
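A resampling p-value of the kind reported in this caption can be sketched as a permutation test: shuffle the status labels across samples, recompute R², and count how often the shuffled R² reaches the observed value. This is a minimal sketch for one categorical variable without surrogate variables, and the function names are hypothetical. Using R² as the statistic gives the same permutation ordering as an RDA pseudo-F statistic, because with fixed degrees of freedom F = (R²/df1) / ((1 - R²)/df2) increases monotonically with R².

```python
import random
from collections import defaultdict


def rda_r2(methylation, groups):
    """Fraction of centred methylation variance explained by one factor."""
    n, p = len(methylation), len(methylation[0])
    grand = [sum(r[j] for r in methylation) / n for j in range(p)]
    sums, counts = defaultdict(lambda: [0.0] * p), defaultdict(int)
    for row, g in zip(methylation, groups):
        counts[g] += 1
        for j, v in enumerate(row):
            sums[g][j] += v
    ss_tot = ss_fit = 0.0
    for row, g in zip(methylation, groups):
        for j, v in enumerate(row):
            ss_tot += (v - grand[j]) ** 2
            # Fitted value = group mean of this CpG.
            ss_fit += (sums[g][j] / counts[g] - grand[j]) ** 2
    return ss_fit / ss_tot


def permutation_pvalue(methylation, groups, n_perm=999, seed=1):
    """Shuffle sample labels and compare permuted R2 with the observed R2."""
    rng = random.Random(seed)
    observed = rda_r2(methylation, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if rda_r2(methylation, labels) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (hits + 1) / (n_perm + 1)
```

With `n_perm=999` the smallest attainable p-value is 0.001, so the number of permutations bounds how small a reported p-value can be.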
Fig. 3 Manhattan plot of the adjusted analysis of the BRCA dataset. These results reflect the statistical significance of the association between each CpG and HER2 status. CpGs are ordered by chromosome and position. CpGs in the HER2 region are highlighted in green.
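The layout this caption describes (CpGs ordered by chromosome and position, with one region highlighted) can be sketched as a small coordinate transform. The code below only computes plot coordinates: x is a cumulative genomic position and y is -log10(p). It assumes human chromosome names without a "chr" prefix, p-values strictly greater than zero, and a highlighted region passed as a hypothetical `(chrom, start, end)` argument; the per-CpG p-values themselves would come from the adjusted single-probe analysis, which is not reproduced here.

```python
import math

# Assumed x-axis order: human autosomes 1-22, then X and Y.
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def manhattan_coordinates(cpgs, highlight=None):
    """cpgs: list of (chrom, position, p_value) tuples, p_value > 0.
    highlight: optional (chrom, start, end) region whose CpGs are flagged.
    Returns (x, y, flagged) tuples sorted by chromosome and position.
    """
    rank = {c: i for i, c in enumerate(CHROM_ORDER)}
    ordered = sorted(cpgs, key=lambda t: (rank[t[0]], t[1]))

    # Offset each chromosome by the largest positions seen on earlier ones.
    max_pos = {}
    for chrom, pos, _ in ordered:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    offsets, running = {}, 0
    for chrom in CHROM_ORDER:
        if chrom in max_pos:
            offsets[chrom] = running
            running += max_pos[chrom]

    points = []
    for chrom, pos, p in ordered:
        flagged = (highlight is not None and chrom == highlight[0]
                   and highlight[1] <= pos <= highlight[2])
        points.append((offsets[chrom] + pos, -math.log10(p), flagged))
    return points
```

With toy input `[("17", 500, 1e-8), ("1", 1000, 0.01), ("17", 200, 0.5), ("2", 300, 1e-3)]` and a highlight of `("17", 400, 600)`, the points come out in the order chromosome 1, 2, 17, 17 at x = 1000, 1300, 1500 and 1800, and only the last one is flagged.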
Fig. 4 RDA biplot of the HER2 region (BRCA dataset) using HER2 and ER sample status. The adjusted model also includes surrogate variables in the analysis. The points represent the samples and are coloured according to the combination of HER2 and ER status. The ER-, ER+, HER2- and HER2+ labels are placed at the centroids of their respective groups. The CpGs most strongly associated with the RDA components are labelled. R2 is the proportion of variance in the methylation data explained by the model. The p-value of the RDA model was computed by permuting (resampling) the HER2 and ER sample status variables.