Christoph Bock, Jörn Walter, Martina Paulsen, Thomas Lengauer.
Abstract
Genomic DNA methylation profiles exhibit substantial variation within the human population, with important functional implications for gene regulation. So far little is known about the characteristics and determinants of DNA methylation variation among healthy individuals. We performed bioinformatic analysis of high-resolution methylation profiles from multiple individuals, uncovering complex patterns of inter-individual variation that are strongly correlated with the local DNA sequence. CpG-rich regions exhibit low and relatively similar levels of DNA methylation in all individuals, but the sequential order of the (few) methylated among the (many) unmethylated CpGs differs randomly across individuals. In contrast, CpG-poor regions exhibit substantially elevated levels of inter-individual variation, but also significant conservation of specific DNA methylation patterns between unrelated individuals. This observation has important implications for experimental analysis of DNA methylation, e.g. in the context of epigenome projects. First, DNA methylation mapping at single-CpG resolution is expected to uncover informative DNA methylation patterns for the CpG-poor bulk of the human genome. Second, for CpG-rich regions it will be sufficient to measure average methylation levels rather than assaying every single CpG. We substantiate these conclusions by an in silico benchmarking study of six widely used methods for DNA methylation mapping. Based on our findings, we propose a cost-optimized two-track strategy for mammalian methylome projects.
Year: 2008 PMID: 18413340 PMCID: PMC2425484 DOI: 10.1093/nar/gkn122
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Functions for simulating experimental methods of DNA methylation mapping in silico
| Method name | References | Method type | Comment | Simulation function |
|---|---|---|---|---|
| Differential methylation hybridization (DMH) | Huang | Methylation-specific digestion, qualitative | Quantification is difficult due to different oligomer affinities and DNA melting temperatures | A1: HiMeth if #CpG[pattern* & meth ≥ 50%] ≥ 3; A2: HiMeth if #CpG[pattern* & meth ≥ 50%] ≥ 2; A3: HiMeth if #CpG[pattern* & meth ≥ 50%] ≥ 1; *pattern in {ACGT, CCGC, CCGG, GCGC} |
| Sequencing of methylation-specific digestion products | Rollins | Methylation-specific digestion, quantitative | Quantification is possible if sequencing depth is high | B1: Profile(all CpGs in ACGT patterns); B2: Profile(all CpGs in CCGC patterns); B3: Profile(all CpGs in CCGG patterns); B4: Profile(all CpGs in GCGC patterns); B5: Profile(all CpGs in all four patterns) |
| Methyl-DNA immunoprecipitation plus tiling microarrays (MeDIP-chip) | Weber | Immunoprecipitation, qualitative | Quantification is difficult due to different oligomer affinities and DNA melting temperatures | C1: HiMeth if #CpG[meth ≥ 67%] ≥ 4*; C2: HiMeth if #CpG[meth ≥ 50%] ≥ 3*; C3: HiMeth if #CpG[meth ≥ 33%] ≥ 2*; *minimum value per 200 bp |
| Sequencing of MeDIP-generated DNA libraries (MeDIP-seq) | Established at several labs, e.g. at the Max Planck Institute for Molecular Genetics (H. Lehrach, personal communication) | Immunoprecipitation, quantitative | Quantification is possible if the enrichment is statistically corrected for local differences in CpG density | D1: Value(Mean(all CpGs)); D2: Value(Median(all CpGs)) |
| Microarray hybridization of bisulfite-converted DNA | Adorjan | Bisulfite conversion, qualitative | Quantification has been attempted but is often unreliable | E1: HiMeth if mean(all CpGs) ≥ 67%; E2: HiMeth if mean(all CpGs) ≥ 50%; E3: HiMeth if mean(all CpGs) ≥ 33% |
| Direct sequencing of bisulfite-converted DNA | Eckhardt | Bisulfite conversion, quantitative | Quantitative and applicable to either all CpGs of an amplicon (by Sanger sequencing) or to a subset (by primer extension or pyrosequencing) | F1: Profile(all CpGs); F2–F5: Profile(1 to 4 random CpGs); F6: Profile(center CpG); F7: Profile(first and last CpG); F8: Profile(CpGs at positions 1/3 and 2/3); F9: Profile(first, center and last CpG); F10–F20: Profile(0%, 10%, … , 100% of CpGs, rounded to the next integer value and randomly selected) |
| Rule-based guess (for comparison as a negative control) | None | No DNA methylation data is taken into account | Worst-case baseline that any method should compare favorably with | G1: Value(0% methylated); G2: Value(50% methylated); G3: Value(100% methylated); G4: Value(LowMeth); G5: Value(MeanMeth); G6: Value(HiMeth); G7: Profile(random methylation values) |
This table summarizes the experimental methods for DNA methylation mapping that are covered in this study and describes the functions that were constructed to simulate them in silico (rightmost column). The simulation functions are written in an abbreviated notation as if-clauses, profile statements or value assignments. (i) For if-clause rules, a methylation constant named HiMeth is assigned to all CpGs in amplicons identified as high-methylation, and a constant named LowMeth is assigned to all CpGs in low-methylation amplicons. We set HiMeth = 80.39% and LowMeth = 13.13%, which are the mean methylation levels of all amplicons that exceed or fall below 50% methylation, respectively, in the HEP dataset. (ii) For profile statements, the subset of CpGs that fulfill the condition in brackets is selected, and the methylation values of all unselected CpGs are determined by interpolation or extrapolation. (iii) Value assignments are a special case of profile statements in which no CpGs are selected and the methylation values of all CpGs are set to a constant value (MeanMeth = 56.91% for the HEP dataset). The notation #CpG followed by a condition stands for the number of CpGs in the amplicon that fulfill that condition. The source code implementing each of these rules is available on request (written in the Python programming language).
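The three kinds of simulation function described in this legend can be illustrated with a minimal Python sketch. This is not the authors' code (which is available from them on request). The function names, the example amplicon, and in particular the interpolation scheme are assumptions: the legend only says "interpolation or extrapolation", so the sketch uses linear interpolation over CpG index with constant extrapolation at the ends.

```python
# Constants from the legend (HEP dataset), in percent methylation.
HI_METH = 80.39
LOW_METH = 13.13
MEAN_METH = 56.91


def if_clause_mean(profile, threshold):
    """Rules E1-E3: call the whole amplicon HiMeth or LowMeth by its mean."""
    mean = sum(profile) / len(profile)
    level = HI_METH if mean >= threshold else LOW_METH
    return [level] * len(profile)


def profile_statement(profile, selected):
    """Profile(...) rules: keep the selected CpGs and fill in the rest.

    Assumption: linear interpolation by CpG index between selected CpGs,
    constant extrapolation outside the outermost selected CpGs.
    """
    selected = sorted(set(selected))
    if not selected:
        raise ValueError("no CpG selected; use value_assignment instead")
    result = []
    for i in range(len(profile)):
        left = [s for s in selected if s <= i]
        right = [s for s in selected if s >= i]
        if not left:                      # before the first selected CpG
            result.append(profile[right[0]])
        elif not right:                   # after the last selected CpG
            result.append(profile[left[-1]])
        elif left[-1] == right[0]:        # CpG i itself was measured
            result.append(profile[i])
        else:                             # between two measured CpGs
            a, b = left[-1], right[0]
            frac = (i - a) / (b - a)
            result.append(profile[a] + frac * (profile[b] - profile[a]))
    return result


def value_assignment(profile, value=MEAN_METH):
    """Value(...) rules: every CpG receives the same constant."""
    return [value] * len(profile)


# Made-up amplicon with five CpGs (percent methylation per CpG).
amplicon = [90.0, 70.0, 80.0, 60.0, 100.0]
e2 = if_clause_mean(amplicon, 50.0)                         # rule E2
f7 = profile_statement(amplicon, [0, len(amplicon) - 1])    # rule F7
f6 = profile_statement(amplicon, [len(amplicon) // 2])      # rule F6
g5 = value_assignment(amplicon)                             # rule G5
```

With this toy amplicon, F7 reconstructs a straight line between the first and last CpG, while F6 assigns the center CpG's value to every position because only one CpG is measured.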
Figure 1. DNA methylation variation among healthy individuals (schematic figure). This figure displays artificial DNA methylation data for two amplicons with two unrelated samples/profiles each, which were designed to illustrate the effect of the three measures of inter-individual variation used in this study. The typical amplicon with high overall methylation (blue profiles, top) has a relatively high pairwise deviation between means (v3) and a pairwise deviation between high-resolution profiles (v1) that is substantially lower than the deviation between mean and high-resolution profile (v2), which is reflected in a substantial correlation between the rising and falling of the DNA methylation profile curves over the length of the amplicon. In contrast, the typical amplicon with low overall methylation (red profiles, bottom) has a low pairwise deviation between means (v3) and similar values for pairwise deviation between high-resolution profiles (v1) and deviation between mean and high-resolution profile (v2), indicating that the fluctuations in the profiles are not inter-individually conserved and presumably random.
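The relationship between the three measures can be made concrete with a small Python sketch. The exact definitions are not given in this caption, so everything here is an assumption for illustration: deviations are taken as mean absolute differences, v2 is taken to compare one sample's amplicon mean against the other sample's per-CpG profile, and the profiles are made-up numbers.

```python
def mean(values):
    return sum(values) / len(values)


def v1_profile_deviation(p, q):
    """Assumed v1: mean absolute per-CpG difference between two profiles."""
    return mean([abs(a - b) for a, b in zip(p, q)])


def v2_mean_vs_profile(p, q):
    """Assumed v2: mean absolute difference between the amplicon mean of p
    and each CpG value of q."""
    m = mean(p)
    return mean([abs(m - b) for b in q])


def v3_mean_deviation(p, q):
    """Assumed v3: absolute difference between the two amplicon means."""
    return abs(mean(p) - mean(q))


# High-methylation amplicon: both samples rise and fall at the same CpGs.
high_a = [90, 60, 90, 60]
high_b = [80, 50, 80, 50]

# Low-methylation amplicon: fluctuations are unrelated between samples.
low_a = [10, 0, 10, 0]
low_b = [10, 10, 0, 0]

high = (v1_profile_deviation(high_a, high_b),
        v2_mean_vs_profile(high_a, high_b),
        v3_mean_deviation(high_a, high_b))
low = (v1_profile_deviation(low_a, low_b),
       v2_mean_vs_profile(low_a, low_b),
       v3_mean_deviation(low_a, low_b))
```

Under these assumed definitions the conserved pattern gives v1 below v2 with a nonzero v3, while the unrelated pattern gives equal v1 and v2 with v3 of zero, which mirrors the qualitative contrast the schematic is meant to show.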
Figure 2. Effect of average amplicon methylation (left) and overlap with bona fide CpG islands (right) on inter-individual variation of DNA methylation. This figure shows the means of the three measures of DNA methylation variation as bar plots. In the left panel, values are reported separately for the top-25% most unmethylated amplicons with an average amplicon methylation of <11.5% (this threshold is motivated in the Materials and Methods section) and for the remaining 75% of amplicons. In the right panel, a distinction is made between amplicons that overlap with a bona fide CpG island (4) and those that do not. In both cases, error bars represent 95% confidence intervals under the assumption of normal distribution, and the P-values in the legends are based on two-sample, two-sided t-tests between the group means for each measure.
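As a rough illustration of the statistics named in this caption, the following Python sketch computes a normal-approximation 95% confidence interval for a group mean and a two-sample t statistic. The data are made up, the function names are hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption, since the caption does not say which variant was used.

```python
import math
import statistics


def ci95(values):
    """95% confidence interval for the mean under a normal approximation:
    mean +/- 1.96 * standard error of the mean."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    half_width = 1.96 * sem
    return m - half_width, m + half_width


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (assumed variant)."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)


# Made-up values of one variation measure in two amplicon groups.
low_meth_group = [2.0, 4.0, 6.0]
other_group = [10.0, 12.0, 14.0]

ci_low, ci_high = ci95(low_meth_group)
t_stat = welch_t(low_meth_group, other_group)
```

Turning the t statistic into a two-sided P-value requires the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` is one commonly used option.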
Figure 3. Benchmarking results for experimental mapping of DNA methylation. This figure displays the results of in silico benchmarking of different DNA methylation mapping methods for all amplicons. The y-axis shows v_method values for all experimental methods included in this study (A1–F9, described in Table 1) and for seven negative controls, which are based on guessing rules rather than on experimental data (G1–G7, described in Table 1). The standard boxplot format is used (boxes show center quartiles, whiskers extend to the most extreme data point which is no more than 1.5 times the interquartile range from the box) and outliers are hidden.
Correlation between high-resolution improvement and its potential predictors
This table displays pairwise Pearson correlation coefficients for the accuracy improvement of high-resolution methylation mapping (first row) and several potential factors of influence. Orange (light) boxes mark strong positive correlation and blue (dark) boxes mark strong negative correlation.