Douglas D Baumann, R W Doerge.
Abstract
By incorporating annotation information into the analysis of next-generation sequencing DNA methylation data, we provide an improvement in performance over current testing procedures. Methylation analysis using genome information (MAGI) is applicable for both unreplicated and replicated data, and provides an effective analysis for studies with low sequencing depth. When compared with current tests, the annotation-informed tests provide an increase in statistical power and offer a significance-based interpretation of differential methylation.
Keywords: annotation informed; differential methylation; epigenetics; epigenomics; statistical bioinformatics; testing methylation
Year: 2014 PMID: 24589664 PMCID: PMC4063829 DOI: 10.4161/epi.28322
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528

Figure 1. Representation of the data structure and testing framework for NGS differential methylation studies (forward strand shown). For each cytosine, the number of methylated reads (filled circles) and unmethylated reads (unfilled circles) are recorded. These values are also recorded in binary form for each cytosine: a filled triangle indicates that the proportion of methylated reads for the given cytosine exceeds a predetermined threshold (e.g., 40%), and an unfilled triangle indicates that it does not. Differential methylation is tested either at each individual cytosine using the read information, with subsequent summarization over the region (MAGIC), or with a single region-level test on the summarized read information (MAGIG).
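The binary representation in Figure 1 can be sketched in a few lines. The function name `binary_calls` and the handling of zero-coverage cytosines are assumptions for illustration; the 40% cut-off is the caption's example value, and "exceeds" is read as a strict inequality.

```python
def binary_calls(methylated_reads, depths, threshold=0.40):
    """Per-cytosine binary status (True = filled triangle) as in Figure 1.

    A cytosine is called methylated when its proportion of methylated reads
    exceeds `threshold` (the caption's example is 40%).
    """
    calls = []
    for m, d in zip(methylated_reads, depths):
        # assumption: a cytosine with no mapped reads is called unmethylated
        calls.append(d > 0 and m / d > threshold)
    return calls


# five cytosines: methylated-read counts and total depth (hypothetical values)
print(binary_calls([3, 1, 2, 0, 6], [5, 4, 5, 0, 7]))
# prints [True, False, False, False, True]
```

Note that the third cytosine (2 of 5 reads, exactly 40%) is not called methylated because the proportion must exceed the threshold.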

Figure 2. Simulation results in unreplicated settings. Panel columns represent the separation of binomial probabilities for read methylation (Online Methods). Panel rows represent the transition matrices of the Hidden Markov Model (HMM) used to generate cytosine methylation status (Online Methods). MAGIG shows higher statistical power than MAGIC across the simulation settings. Observed false discovery rates (FDRs) are also lower for MAGIG (2–19%) than for MAGIC (4–43%). For both methods, observed FDRs increase with greater separation of binomial probabilities and decrease with greater correlation between cytosines. Increasing the average sequencing depth from 7 to 15 yields modest power gains with similar observed FDRs.
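In simulation studies like this one, where the true status of every region is known, observed FDR and power are typically computed as follows: FDR is the fraction of declared regions that are not truly differentially methylated, and power is the fraction of truly differentially methylated regions that are declared. A minimal sketch; the function name and the convention of reporting an FDR of 0 when nothing is declared are assumptions, not details taken from the Online Methods.

```python
def observed_fdr_and_power(calls, truth):
    """Observed FDR and power from per-region calls and simulated ground truth.

    calls: iterable of bools, True if the region was declared differentially methylated.
    truth: iterable of bools, True if the region was simulated as differentially methylated.
    """
    pairs = list(zip(calls, truth))
    true_pos = sum(1 for c, t in pairs if c and t)
    false_pos = sum(1 for c, t in pairs if c and not t)
    n_declared = true_pos + false_pos
    n_true = sum(1 for _, t in pairs if t)
    # assumption: report FDR as 0 when no region is declared
    fdr = false_pos / n_declared if n_declared else 0.0
    power = true_pos / n_true if n_true else float("nan")
    return fdr, power


calls = [True, True, False, True, False]
truth = [True, False, False, True, True]
fdr, power = observed_fdr_and_power(calls, truth)
print(f"FDR = {fdr:.3f}, power = {power:.3f}")  # prints "FDR = 0.333, power = 0.667"
```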
Table 4. Exploration and impact of low-coverage filtering on significance results from MAGIC and MAGIG for 33,759 analyzed gene regions
| Filtering Level | % Filtered | MAGIC | MAGIG | Intersection |
|---|---|---|---|---|
| No Filtering | 0 | 216 | 3146 | 181 |
| 5 | 40 | 612 | 2926 | 310 |
| 7 | 51 | 669 | 2132 | 275 |
| 10 | 67 | 651 | 1528 | 246 |
Arabidopsis data (Col-0 vs. met1–3) from Lister et al. (2008) were analyzed with both MAGIC and MAGIG under varying degrees of low-coverage filtering. Example significance thresholds of 0.10 and 0.05 were used for MAGIC and MAGIG, respectively. The “Filtering Level” is the minimum coverage a cytosine must have to be retained: if the coverage of a given cytosine is below this level in either sample, the cytosine is excluded from downstream analyses. “% Filtered” gives the percentage of cytosines removed as a result. As filtering becomes stricter (i.e., a higher filtering level), the number of significant subsets decreases for MAGIG and generally increases for MAGIC (from 216 with no filtering to 669 at level 7, then 651 at level 10). A filtering level of 5 balances the increased detection for MAGIC against the decreased detection for MAGIG; this level also gives the largest intersection of the two methods (310 regions).
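The coverage filter described above can be sketched as follows. The function name, the (methylated reads, depth) tuple layout, and the example counts are hypothetical; only the rule itself (drop a cytosine whose coverage is below the filtering level in either sample) comes from the table note.

```python
def filter_low_coverage(sample1, sample2, level):
    """Keep only cytosines with coverage >= `level` in both samples.

    sample1, sample2: lists of (methylated_reads, depth) tuples, aligned by cytosine.
    Returns the two filtered lists and the percentage of cytosines removed.
    """
    kept1, kept2 = [], []
    for (m1, d1), (m2, d2) in zip(sample1, sample2):
        # drop the cytosine if its coverage is below the level in either sample
        if d1 >= level and d2 >= level:
            kept1.append((m1, d1))
            kept2.append((m2, d2))
    n = len(sample1)
    pct_filtered = 100.0 * (n - len(kept1)) / n if n else 0.0
    return kept1, kept2, pct_filtered


col0 = [(1, 3), (2, 5), (4, 10), (0, 6)]  # hypothetical counts, not Lister et al. data
met1 = [(2, 6), (1, 4), (5, 8), (3, 5)]
kept_col0, kept_met1, pct = filter_low_coverage(col0, met1, level=5)
print(pct)  # prints 50.0
```

A level of 0 corresponds to the “No Filtering” row: every cytosine is kept.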
Table 1. Representation of the data structure and testing framework for MAGI differential methylation studies
| Treatment | Replicate | Cytosine 1 | Cytosine c | Cytosine C | Summary information |
|---|---|---|---|---|---|
| Trt1 | 1 | $(m_{111g}, D_{111g})$ | $(m_{11cg}, D_{11cg})$ | $(m_{11Cg}, D_{11Cg})$ | → |
| Trt1 | j | $(m_{1j1g}, D_{1j1g})$ | $(m_{1jcg}, D_{1jcg})$ | $(m_{1jCg}, D_{1jCg})$ | → |
| Trt1 | J | $(m_{1J1g}, D_{1J1g})$ | $(m_{1Jcg}, D_{1Jcg})$ | $(m_{1JCg}, D_{1JCg})$ | → |
| Trt2 | 1 | $(m_{211g}, D_{211g})$ | $(m_{21cg}, D_{21cg})$ | $(m_{21Cg}, D_{21Cg})$ | → |
| Trt2 | j | $(m_{2j1g}, D_{2j1g})$ | $(m_{2jcg}, D_{2jcg})$ | $(m_{2jCg}, D_{2jCg})$ | → |
| Trt2 | J | $(m_{2J1g}, D_{2J1g})$ | $(m_{2Jcg}, D_{2Jcg})$ | $(m_{2JCg}, D_{2JCg})$ | → |
For each treatment $i$, replicate $j$, cytosine $c$, and gene $g$, the number of methylated reads ($m_{ijcg}$) and the sequencing depth (the total number of reads mapped to the cytosine, $D_{ijcg}$) are recorded. MAGIC tests for differential methylation at each cytosine using Fisher’s exact test (no replicates) or logistic regression (replicates); if the proportion of positive base-pair decisions (cytosines declared differentially methylated) exceeds a predefined threshold, the subset is declared differentially methylated. MAGIG first summarizes the read information for each cytosine within each treatment and replicate, and then tests this summarized information. For treatment $i$, replicate $j$, and gene $g$, $M_{ijg} = \sum_{c=1}^{C_g} \mathbf{1}(m_{ijcg}/D_{ijcg} > T_S)$ is the number of cytosines whose methylated-read proportion exceeds a predetermined threshold $T_S$. Given $M_{ijg}$ and the subset length $C_g$, tests similar to those used in the base-pair-level framework (MAGIC) can be applied.
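For the unreplicated case, both frameworks can be sketched with Fisher's exact test alone. This is a minimal sketch under stated assumptions, not the authors' MAGI implementation: the function names are hypothetical; the defaults for the per-cytosine level `alpha`, the proportion cut-off `prop_threshold`, and `ts` ($T_S$) are illustrative; no multiple-testing adjustment is applied; and the MAGIG table $[[M_{1g}, C_g - M_{1g}], [M_{2g}, C_g - M_{2g}]]$ is one reading of "tests similar to those used for the base-pair level framework". The replicated case (logistic regression) is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # sum over all tables that are no more likely than the observed one
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def magic_unreplicated(trt1, trt2, alpha=0.05, prop_threshold=0.10):
    """MAGIC-style sketch: test each cytosine, then summarize over the region.

    trt1, trt2: lists of (methylated_reads, depth) per cytosine c = 1..C_g.
    Returns the proportion of significant cytosines and the region-level call.
    """
    significant = []
    for (m1, d1), (m2, d2) in zip(trt1, trt2):
        # methylated vs. unmethylated reads, treatment 1 vs. treatment 2
        p = fisher_exact_two_sided(m1, d1 - m1, m2, d2 - m2)
        significant.append(p < alpha)
    proportion = sum(significant) / len(significant)
    return proportion, proportion > prop_threshold


def magig_unreplicated(trt1, trt2, ts=0.40):
    """MAGIG-style sketch: one region-level test on summarized counts M_ig."""
    c_g = len(trt1)
    # assumption: zero-coverage cytosines count as not exceeding T_S
    m_1g = sum(1 for m, d in trt1 if d > 0 and m / d > ts)
    m_2g = sum(1 for m, d in trt2 if d > 0 and m / d > ts)
    return fisher_exact_two_sided(m_1g, c_g - m_1g, m_2g, c_g - m_2g)


trt1 = [(8, 10)] * 10  # hypothetical region with C_g = 10 cytosines
trt2 = [(1, 10)] * 10
print(magic_unreplicated(trt1, trt2))  # prints (1.0, True)
print(magig_unreplicated(trt1, trt2))
```

The design difference is visible in the table sizes: MAGIC runs $C_g$ tests on read counts, while MAGIG runs a single test on cytosine counts, which is less sensitive to low depth at individual cytosines.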
Table 2. Cytosine-specific methylation status transition matrices for methylated genes
| From / To | (a) UM | (a) M | (b) UM | (b) M | (c) UM | (c) M |
|---|---|---|---|---|---|---|
| UM | 0.35 | 0.65 | 0.50 | 0.50 | 0.35 | 0.65 |
| M | 0.15 | 0.85 | 0.15 | 0.85 | 0.35 | 0.65 |
“M” and “UM” represent methylated and unmethylated status, respectively. Unmethylated gene transition matrices are formed similarly, with elements on each diagonal interchanged. Transition matrix (a) forms chains with longer homogeneous strings of methylated cytosines, while matrices (b) and (c) allow more unmethylated cytosines to be generated when the gene is methylated.
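The cytosine-status chains behind the simulations can be sketched as a two-state Markov chain driven by the Table 2 matrices. This is a minimal sketch, not the authors' simulation code: the names are hypothetical, the first status is drawn from the chain's stationary distribution (an assumption; the Online Methods may initialize chains differently), and `unmethylated_version` encodes one reading of "elements on each diagonal interchanged" (swap the two main-diagonal entries and the two off-diagonal entries, which keeps each row summing to 1).

```python
import random

# Table 2 matrices for methylated genes. State 0 = UM, 1 = M;
# P[current][next] is the probability of the next cytosine's status.
METHYLATED_GENE_MATRICES = {
    "a": [[0.35, 0.65], [0.15, 0.85]],
    "b": [[0.50, 0.50], [0.15, 0.85]],
    "c": [[0.35, 0.65], [0.35, 0.65]],
}


def unmethylated_version(P):
    """Swap the main-diagonal pair and the off-diagonal pair (assumed reading)."""
    return [[P[1][1], P[1][0]], [P[0][1], P[0][0]]]


def stationary_methylated_fraction(P):
    """Long-run fraction of M states: p(UM->M) / (p(UM->M) + p(M->UM))."""
    return P[0][1] / (P[0][1] + P[1][0])


def simulate_statuses(P, n_cytosines, rng):
    """Generate a chain of cytosine methylation statuses (0 = UM, 1 = M)."""
    if n_cytosines <= 0:
        return []
    # assumption: the first status is drawn from the stationary distribution
    first = 1 if rng.random() < stationary_methylated_fraction(P) else 0
    states = [first]
    for _ in range(n_cytosines - 1):
        # move to M with probability P[current][1], otherwise to UM
        states.append(1 if rng.random() < P[states[-1]][1] else 0)
    return states


rng = random.Random(7)
print(simulate_statuses(METHYLATED_GENE_MATRICES["a"], 20, rng))
print(unmethylated_version(METHYLATED_GENE_MATRICES["a"]))  # prints [[0.85, 0.15], [0.65, 0.35]]
```

Under these matrices the long-run fraction of methylated cytosines is 0.8125 for (a), about 0.77 for (b), and 0.65 for (c), consistent with (b) and (c) producing more unmethylated cytosines in methylated genes.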
Table 3. Binomial probabilities for assigning methylated status (MR) to a read for unmethylated and methylated cytosines (UC and MC, respectively)
| Setting | Separation | P(MR \| UC) | P(MR \| MC) |
|---|---|---|---|
| 1 | Large | 0.10 | 0.80 |
| 2 | Medium | 0.15 | 0.70 |
| 3 | Small | 0.15 | 0.60 |
Setting 1 indicates a large separation of read probabilities, and settings 2 and 3 decrease this level of separation.
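Given a chain of cytosine statuses, reads can be generated from the Table 3 probabilities by treating each read at a cytosine as an independent Bernoulli draw, so the methylated-read count is binomial. A minimal sketch, not the authors' simulation code: the names are hypothetical, and because the depth distribution is not given here, depths are passed in (the usage example assumes a constant depth of 7, the lower average depth in Figure 2).

```python
import random

# Table 3: (P(MR | UC), P(MR | MC)) for each simulation setting
READ_PROBS = {1: (0.10, 0.80), 2: (0.15, 0.70), 3: (0.15, 0.60)}


def simulate_reads(statuses, depths, setting, rng):
    """Draw methylated-read counts per cytosine given its status (0 = UC, 1 = MC).

    Each of the `depth` reads at a cytosine is independently methylated with the
    Table 3 probability for that status, i.e. m ~ Binomial(depth, p).
    Returns a list of (methylated_reads, depth) tuples.
    """
    p_uc, p_mc = READ_PROBS[setting]
    reads = []
    for status, depth in zip(statuses, depths):
        p = p_mc if status == 1 else p_uc
        m = sum(1 for _ in range(depth) if rng.random() < p)
        reads.append((m, depth))
    return reads


rng = random.Random(2014)
statuses = [1, 1, 0, 1, 0, 0]
depths = [7] * len(statuses)  # assumption: constant depth equal to the average of 7
print(simulate_reads(statuses, depths, setting=3, rng=rng))
```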

Figure 3. Simulation results in replicated settings. Panel columns represent the separation of binomial probabilities for read methylation (Methods). Panel rows represent the transition matrices of the Hidden Markov Model (HMM) used to generate cytosine methylation status (Methods). The power of MAGIG increases as the binomial probabilities of read methylation become more separated (i.e., from “Small” to “Large”). Increasing the average sequencing depth from 7 to 15 yields only very small power gains. In general, replication may be sufficient to overcome differences in sequencing depth, as well as the differences between MAGIC and MAGIG.