| Literature DB >> 20852634 |
Christoph Bock1, Eleni M Tomazou, Arie B Brinkman, Fabian Müller, Femke Simmer, Hongcang Gu, Natalie Jäger, Andreas Gnirke, Hendrik G Stunnenberg, Alexander Meissner.
Abstract
DNA methylation plays a key role in regulating eukaryotic gene expression. Although mitotically heritable and stable over time, patterns of DNA methylation frequently change in response to cell differentiation, disease and environmental influences. Several methods have been developed to map DNA methylation on a genomic scale. Here, we benchmark four of these approaches by analyzing two human embryonic stem cell lines derived from genetically unrelated embryos and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. Our analysis reveals that methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), reduced representation bisulfite sequencing (RRBS) and the Infinium HumanMethylation27 assay all produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies.Entities:
Mesh:
Year: 2010 PMID: 20852634 PMCID: PMC3066564 DOI: 10.1038/nbt.1681
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1Outline of the DNA methylation technology comparison
Four methods for DNA methylation mapping were compared on two pairs of samples. The resulting 16 DNA methylation maps were bioinformatically analyzed and benchmarked against each other. In addition, clonal bisulfite sequencing was performed on selected genomic regions to validate DNA methylation differences that were detected exclusively by one method.
Summary of DNA methylation mapping experiments
| Run. No. | Method | Sample name | #lanes | #reads | #reads | Alignment | #reads | #reads | Unique read |
|---|---|---|---|---|---|---|---|---|---|
| 1 | MeDIP | HUES6 ES cell line | 2 | 37,086,239 | 22,798,831 | 61.5% | 12,849,623 | 9,949,208 | 56.4% |
| 2 | MeDIP | HUES8 ES cell line | 2 | 36,078,308 | 24,266,670 | 67.3% | 12,287,174 | 11,979,496 | 50.6% |
| 3 | MeDIP | Primary colon tumor | 2 | 33,453,797 | 18,582,183 | 55.5% | 7,006,484 | 11,575,699 | 37.7% |
| 4 | MeDIP | Matched normal colon tissue | 2 | 37,789,936 | 21,793,567 | 57.7% | 10,360,103 | 11,433,464 | 47.5% |
Technical notes: (i) Between two and four lanes [#lanes] were sequenced on the Illumina Genome Analyzer II, which yielded approximately 30 to 40 million 36-basepair, singled-end reads [#reads (total)] per sample and method. These reads were aligned to the human genome [# reads (aligned)], and the number of unique [#reads (unique)] as well as duplicate reads [#reads (duplicates)] was calculated by counting the first read that aligns to a specific genomic position as unique and all further occurrences of the same genomic position as duplicates (genomic positions were defined by the combination of chromosome, read start position and strand). (ii) Three lanes were sequenced for samples 5–8, one lane per eluate (high, medium and low, as described in the Methods section). (iii) Samples 11 and 12 were part of a sequencing optimization run that resulted in lower sequencing yield and reduced alignment rates. Four lanes were sequenced to reach the target of 30 to 40 million reads per sample and method. (iv) All other samples were sequenced on two lanes each. There were no failed runs that had to be redone, and all sequencing data that were generated for the current project are summarized in this table. (v) The yield of a single lane of Illumina Genome Analyzer II sequencing strongly increased during the course of this study. As of June 2010, we observe total read numbers per lane that average around 40 million for MeDIP and MethylCap, and which are close to 30 million for RRBS. Current alignment rates range from 60% to 80% for typical runs with all three methods.
Figure 2Comparison of DNA methylation maps obtained with MeDIP, MethylCap, RRBS and Infinium
DNA methylation maps were generated using MeDIP (first two tracks, in green), MethylCap (three tracks in blue, grey and red), RRBS (stacked blue tracks) and Infinium (single black track with percentage values) and converted into UCSC Genome Browser tracks. The screenshot shows the HOXA cluster in a human ES cell line (HUES6). Each track represents data from a single sequencing lane (MeDIP, MethylCap, RRBS) or microarray hybridization (Infinium). MeDIP and MethylCap data are visually similar to ChIP-seq data, with peaks in regions that exhibit high density of the target molecule (5-methyl-cytosine) and troughs in regions with low density of methylated cytosines. The height of the peaks represents the number of reads in each genomic interval, for each track normalized to the same genome-wide read count (note the twofold compressed scaling of the MethylCap tracks relative to the MeDIP tracks, which is indicative of higher dynamic range for MethylCap compared to MeDIP). RRBS gives rise to clusters of CpGs with absolute DNA methylation measurements, separated by regions that are not covered due to the reduced-representation property of the RRBS protocol. Each data point corresponds to the methylation level at a single CpG, and dark blue points indicate higher methylation levels than light blue points. Infinium data is represented in a similar way as the RRBS data, and the methylation levels at single CpGs are shown as percentage values. The three grey columns highlight regions that are illustrative of specific properties of the enrichment methods: (1) A promoter region that is CpG-poor and therefore not detectable by MeDIP or MethylCap – independent of its DNA methylation level; (2) a promoter region that contains many CpGs but low levels of DNA methylation, which also results in the absence of MeDIP and MethylCap peaks; and (3) a CpG island that exhibits a strong enrichment peak for both MeDIP and MethylCap although the RRBS data indicates that it is only partially methylated. For reference, the CpG density is indicated by stacked points (black) at the bottom of the diagram, and CpG islands (red) as well as known genes (blue) are listed as described previously64,65. All DNA methylation maps are available online as custom tracks for interactive visualization in the UCSC Genome Browser (http://meth-benchmark.computational-epigenetics.org/).
Figure 3Quantification of DNA methylation with MeDIP, MethylCap and RRBS
Absolute DNA methylation levels were calculated from the data obtained by MeDIP (panel A), MethylCap (panel B) and RRBS (panel C), respectively, and compared to DNA methylation levels determined by the Infinium assay. For MeDIP and MethylCap, sequencing reads were counted in 1-kilobase regions surrounding each CpG that is interrogated by the Infinium assay, and a regression model was used to infer absolute DNA methylation levels. Scatterplots and correlation coefficients were calculated on a test set that was not used for model fitting or feature selection. For RRBS, the DNA methylation level was determined as the percentage of methylated CpGs within 200 basepairs surrounding each CpG that is interrogated by the Infinium assay. Data shown are for the HUES6 human ES cell line, and regions that did not have sufficient sequencing coverage were excluded.
Figure 4Genomic coverage of MeDIP, MethylCap, RRBS and Infinium
Genomic coverage was quantified by the number of DNA methylation measurements that overlap with CpG islands (top row), gene promoters (center row) and a 1-kilobase tiling of the genome (bottom row). For MeDIP and MethylCap, the number of measurements is equal to the number of unique sequencing reads that fall inside each region. For RRBS, it refers to the number of valid DNA methylation measurements at CpGs within each region (one RRBS sequencing read typically yields one measurement, but can also give rise to more than one measurement if it contains several CpGs). For Infinium, the number of measurements is equal to the number of CpGs within each region that are present on the HumanMethylation27 microarray. CpG islands were calculated using CgiHunter (http://cgihunter.bioinf.mpi-inf.mpg.de/), requiring a minimum CpG observed vs. expected ratio of 0.6, a minimum GC content of 0.5 and a minimum length of 700 basepairs64. Promoter regions were calculated based on Ensembl gene annotations, such that the region starts one kilo-base upstream of the annotated transcription start site (TSS) and extends to one kilobase downstream of the TSS. The genomic tiling was obtained by sliding a 1-kilobase window through the genome such that each tile starts at the position where the previous tile ends. No repeat-masking was performed for any of the three types of genomic regions. Data are shown for the HUES6 human ES cell line.
Figure 5Detection of differentially methylated regions with MeDIP, MethylCap and RRBS
Average DNA methylation measurements were calculated for each CpG island and compared between two human ES cell lines (HUES6 and HUES8). Total read frequencies are shown for MeDIP (panel A) and MethylCap (panel B), and mean DNA methylation levels are shown for RRBS (panel C). Regions with insufficient sequencing coverage were excluded. The Venn diagram (panel D) displays the total number and mutual overlap of differentially methylated CpG islands that could be identified by each method. CpG islands were classified as hypermethylated or hypomethylated (depending on the directionality of the difference) if the absolute DNA methylation difference exceeded 20% (for RRBS) or if there was at least a twofold difference in read number between the two samples (for MeDIP and MethylCap) – but only if Fisher’s exact test with multiple-testing correction gave rise to an estimated false-discovery rate of differential DNA methylation that was less than 0.1.
Validation of method-specific DMRs for MeDIP, MethylCap and RRBS
| DMR location | Description | Experimental validation | MeDIP | MethylCap | RRBS |
|---|---|---|---|---|---|
| Intergenic CpG island ~30kb up-stream of GRID1, partial overlap with degenerate L1 element | HUES6: 38/56 (68%) methylated CpGs HUES8: 26/44 (59%) methylated CpGs ➔ | hypermethylated (q=1.1E-04) | insignificant (q=0.59) | insignificant (q=0.43) | |
| CpG island overlapping with the terminal exon of TRIM72 | HUES6: 342/362 (94%) methylated CpGs HUES8: 466/523 (89%) methylated CpGs ➔ marginally | insignificant (q=0.73) | insufficient coverage | ||
| CpG island overlapping with the putative promoter region of RPS6KC1 | HUES6: 53/60 (88%) methylated CpGs HUES8: 45/50 (90%) methylated CpGs ➔ | hypermethylated (q= 3.0E-06) | insignificant (q=0.97) | insignificant (q=0.29) | |
| CpG island overlapping with the putative promoter region of REM1 | HUES6: 5/72 (7%) methylated CpGs HUES8: 78/84 (93%) methylated CpGs ➔ | insufficient coverage | insufficient coverage | ||
| CpG island overlapping with the putative promoter region of RBM43 and a known copy-number variation | HUES6: 161/208 (77%) methylated CpGs HUES8: 9/104 (9%) methylated CpGs ➔ | insignificant (q=0.18) | insufficient coverage | ||
| Intergenic CpG island ~60kb up-stream of NUFIP1, partial overlap with degenerate Alu element | HUES6: 80/88 (91%) methylated CpGs HUES8: 41/79 (52%) methylated CpGs ➔ | insignificant (q=0.40) | insufficient coverage | ||
| CpG island overlapping with an internal exon and intron of IGF2BP2 | HUES6: 5/90 (6%) methylated CpGs HUES8: 88/90 (98%) methylated CpGs ➔ | insufficient coverage | insignificant (q=0.18) | ||
| Intergenic CpG island ~20kb up-stream of DYNC1LI1 | HUES6: 41/121 (34%) methylated CpGs HUES8: 130/143 (91%) methylated CpGs ➔ | insufficient coverage | insignificant (q=0.52) |
Eight method-specific DMRs were selected based on the DNA methylation maps of MeDIP, MethylCap and RRBS and experimentally validated using bisulfite sequencing. The validation candidates were manually chosen from the list of CpG islands (Figure 5), such that each region is identified as DMR by only one method, while the other two methods do not detect any significant (or suggestive) difference in DNA methylation between the two human ES cell lines (HUES6 and HUES8). The validation was performed by clonal bisulfite sequencing with an average of 11 clones per region in each of the two human ES cell lines. The p-values in column 3 were calculated from the clonal bisulfite sequencing data using Fisher’s exact test, based on the DNA methylation levels of individual CpGs. The q-values in column 4 to 6 were derived from the DNA methylation maps as described in the Methods section. One out of three MeDIP-specific DMRs, three out of three MethylCap-specific DMRs and two out of two RRBS-specific DMRs were confirmed by clonal bisulfite sequencing data (bold print). All genomic coordinates refer to the amplicon on which the validation was performed, which was chosen such that it overlaps with the most differentially methylated region within the selected CpG islands. The coordinates are given relative to the NCBI36 (hg18) assembly of the human genome. A detailed documentation of the validation experiments is available online (Supplementary File 1).