Thadeous J Kacmarczyk, Mame P Fall, Xihui Zhang, Yuan Xin, Yushan Li, Alicia Alonso, Doron Betel.
Abstract
BACKGROUND: DNA methylation in CpG context is fundamental to the epigenetic regulation of gene expression in higher eukaryotes. Changes in methylation patterns are implicated in many diseases, cellular differentiation, imprinting, and other biological processes. Techniques that enrich for biologically relevant genomic regions with high CpG content are desired, since, depending on the size of an organism's methylome, the depth of sequencing required to cover all CpGs can be prohibitively expensive. Currently, restriction enzyme-based reduced representation bisulfite sequencing and its modified protocols are widely used to study methylation differences. Recently, Agilent Technologies, Roche NimbleGen, and Illumina have ventured to both reduce sequencing costs and capture CpGs of known biological relevance by marketing in-solution custom-capture hybridization platforms. We aimed to evaluate the similarities and differences of these four methods considering each platform targets approximately 10-13% of the human methylome.
Keywords: 5mC; Bisulfite sequencing; CpG; DNA methylation; Methylome capture; RRBS
Year: 2018 PMID: 29801521 PMCID: PMC5970534 DOI: 10.1186/s13072-018-0190-4
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Protocol comparison
| | ERRBS | SSMethylSeq | CpGiant | TruSeqEpic | WGBS-PBAT |
|---|---|---|---|---|---|
| DNA requirement | 75 ng > 40 kb | 1 µg; 3 µg | 0.25 µg; 1 µg | 0.5 µg | 100 ng > 40 kb |
| DNA processing | MspI digestion to completion followed by fractionation of 84–334 bp | Sonication to 150–200 bp | Sonication to 180–220 bp | Sonication to 180–220 bp | Single stranded and fragmented during bisulfite conversion |
| Enrichment method | Size fractionation of 84–334 bp sizes | Hybridization to RNA capture probes | Hybridization to oligo probes containing fully, partially and unmethylated cytosines from both strands | Hybridization to oligo probes of stranded design | None |
| Bisulfite conversion step | Post-adapter ligation | Post-hybridization capture | Pre-hybridization capture | Post-hybridization capture | Pre-adapter tagging |
| Zymo Research Bisulfite Conversion kit | EZ DNA Methylation (50 °C, 55 cycles) | EZ DNA Methylation-Gold (64 °C, 2.5 h) | EZ-Methylation Lightning (54 °C, 1 h) | EZ-Methylation Lightning (54 °C, 2 h) | EZ DNA Methylation-Gold |
| Total PCR amplification cycles | 18; post enrichment and bisulfite conversion | 14; post enrichment and bisulfite conversion (8 for amplification and 6 for indexing) | 29 (13 post-bisulfite conversion and 16 post enrichment); 27 (11 post-bisulfite conversion and 16 post enrichment) | 11; post enrichment and bisulfite conversion | 10; post-bisulfite conversion |
| DNA polymerase (uracil tolerant) | FastStart Taq (Roche) | Taq2000 (Agilent Technologies) | HiFi HotStart Uracil+ (Kapa Biosystems) | HiFi HotStart Uracil+ (Kapa Biosystems) | FailSafe Enzyme (Epicentre) |
| Predicted number of targeted CpG sites | 6.6 M | 3.7 M | 5.6 M | 3.3 M | 56 M |
| Relative price (library preparation + PE100 sequencing @300 M reads) | 13.5% | 15.5% | 15.5% | 3.3% (@75 M reads) | 100% |
Fig. 1 Length and overlap of design regions and MspI predicted regions. a Boxplot showing the distributions of targeted region lengths. For the capture methods, region lengths and CpG sites were extracted from the designed regions provided by the manufacturer; for ERRBS, they were extracted from 84–334 bp fragments obtained by computationally digesting the genome with MspI. SSMethylSeq and ERRBS show similar region proportions, while CpGiant regions are generally longer and both CpGiant and TruSeqEpic are more variable in length. b Barplots showing the pairwise overlap of each platform’s CpG coverage: number of region CpGs overlapping (red) and number of CpGs unique to platform regions (blue). The capture platforms share more regions with each other than any of them shares with ERRBS
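The computational MspI digestion mentioned in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single in-memory sequence, treats the 84–334 bp window as inclusive, and keeps only fragments flanked by two MspI cuts. The function names `mspi_fragments` and `count_cpgs` are hypothetical.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognition site
CUT_OFFSET = 1      # the enzyme cuts after the first C (C^CGG)

def mspi_fragments(seq, min_len=84, max_len=334):
    """Return half-open (start, end) coordinates of size-selected fragments.

    Assumption: only fragments bounded by two MspI cuts are kept, and the
    size window is inclusive on both ends.
    """
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(MSPI_SITE, seq.upper())]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments

def count_cpgs(seq, start, end):
    """Count CpG dinucleotides that lie entirely inside one fragment."""
    return seq[start:end].upper().count("CG")
```

Each retained fragment starts with CGG and ends with C, so every fragment carries at least one CpG at its 5' end, which is why this digestion enriches for CpG-containing regions.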
Fig. 2 Read alignment and strand parity. a Percent alignment of uniquely aligned reads (green), ambiguously mapped reads (gray), reads with no alignment (pink) and rejected reads (blue). b The fraction of CpGs covered at ≥ 10× coverage grouped by strand: forward/+ strand (blue) and reverse/− strand (pink). The strand-specific protocols of the SSMethylSeq and TruSeqEpic platforms are evident from the high proportion of reads mapped to the reverse strand. Note that for the TruSeqEpic analysis we combined the sequencing results of two libraries, and for WGBS-PBAT we combined the results of two sequencing lanes, to obtain higher sequencing coverage
Fig. 3 Strand symmetry of methylation values shown as MA-plots. MA-plots of the log average of the methylation levels (A) on the x-axis and the log ratio of the methylation levels (M) on the y-axis, between complementary CpG positions. Median absolute deviation (MAD) values are used to evaluate the agreement in methylation levels. The bimodal nature of methylation patterns (mostly unmethylated or methylated) is reflected in the high density at both ends of the x-axis. The artificially discordant sites introduced during ERRBS library preparation appear as increased density off the center line at low methylation values (in the A < 0 range). a ERRBS_A, b ERRBS_B, c SSMethylSeq_A, d SSMethylSeq_B, e CpGiant_A, f CpGiant_B, g TruSeqEpic, and h WGBS-PBAT
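The M, A and MAD quantities named in the Fig. 3 caption can be sketched as follows. The caption does not state the log base or how zero methylation values are handled, so the base-2 logarithm and the `pseudo` offset below are assumptions, and the function names are hypothetical.

```python
import math
from statistics import median

def ma_values(plus, minus, pseudo=0.01):
    """Return (M, A) lists for paired methylation levels in percent.

    plus/minus: methylation levels at complementary CpG positions on the
    + and - strands. pseudo is an assumed offset that avoids log(0).
    """
    m_vals, a_vals = [], []
    for p, q in zip(plus, minus):
        lp, lq = math.log2(p + pseudo), math.log2(q + pseudo)
        m_vals.append(lp - lq)        # M: log ratio
        a_vals.append((lp + lq) / 2)  # A: average of the logs
    return m_vals, a_vals

def mad(values):
    """Median absolute deviation around the median."""
    med = median(values)
    return median(abs(v - med) for v in values)
```

With levels in percent, A < 0 corresponds to a geometric-mean methylation below 1%, which matches the low-methylation range the caption singles out. A smaller MAD of the M values indicates closer agreement between the two strands.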
Fig. 4 Platform CpG-unit region coverage and CpG-unit overlap. a Number of CpG-units identified in targeted and off-target regions by each platform. WGBS-PBAT (orange) covers ~ 14.4 M CpG-units; however, there is no notion of targeted regions in this platform. For the targeted platforms, the predicted total CpG-units are shown as gray bars, the covered CpG-units within the predicted regions (on-target) in blue, and the CpG-units outside the predicted set (off-target) in red. b Barplot of the percent recovery of targeted CpG-units. c Density plots showing the portion of the regions covered by the data, where the y-axis is the position along the region from 0 to 100% and the color scale is the fraction of CpG-units covering a location of the region. All methods primarily cover the end of the region; however, ERRBS also shows abundant coverage at the start of the region and, to a lesser degree, throughout the region. These plots demonstrate that the reduced number of recovered CpG-units in ERRBS relative to the other platforms is attributable to an increased number of missed CpG-units, shown as increased density at the 0% position. d Overlap of CpG-units across the four platforms. The UpSet visualization technique [43] for set intersections is displayed as a matrix layout. Horizontal bars on the lower left indicate the total number of annotated CpG-units in the set. Dark circles in the matrix (lower right) indicate sets that are part of the intersection. Bars in the main plot area (upper right) indicate the number of intersecting CpG-units for the sets represented by the dark circles
Fig. 5 Inter-platform CpG-unit overlap and methylation level concordance. The upper triangle shows barplots of the number of CpG-units for any two samples, split into off-target non-overlapping (blue), on-target non-overlapping (salmon), off-target overlapping (shaded blue) and on-target overlapping (shaded salmon). The lower triangle shows MA-plots of common CpG-units between any two samples, where M is the log ratio of the methylation levels and A is the average log methylation level from the two platforms. The blue clouds show, on the log scale, the variance in methylation at low levels. See Additional file 1: Figure S3 for all pairwise comparisons
Fig. 6 Summary of coverage and representation of annotated genomic regions by each platform. A region here is defined as a specific genomic feature. a The fraction of the regions covered by at least 1 CpG-unit for each sample for each region. b The fraction of a region’s complement of CpG-units covered for each sample for each region. c The fraction of a sample’s CpG-units annotated with a genomic feature
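The three fractions in the Fig. 6 caption can be sketched for a single sample and a single feature type. This is an interpretation rather than the authors' code: panel b is computed here over all CpG-units of the feature pooled together, which is an assumption, and the inputs (`regions`, `covered`) and the function name are hypothetical. Both inputs are assumed to be non-empty.

```python
def feature_coverage(regions, covered):
    """Summarize how one sample covers one genomic feature type.

    regions: dict mapping region id -> set of CpG-unit positions annotated
             in that region (all regions belong to the same feature type).
    covered: set of CpG-unit positions covered by the sample.
    """
    feature_units = set().union(*regions.values())
    on_feature = feature_units & covered
    regions_hit = sum(1 for units in regions.values() if units & covered)
    return {
        # panel a: regions with at least one covered CpG-unit
        "regions_with_cpg": regions_hit / len(regions),
        # panel b (assumed pooled): share of the feature's CpG-units covered
        "feature_cpgs_covered": len(on_feature) / len(feature_units),
        # panel c: share of the sample's CpG-units that fall in the feature
        "sample_cpgs_in_feature": len(on_feature) / len(covered),
    }
```

Panels a and b are normalized by the annotation, while panel c is normalized by the sample, so a platform can score high on c and still cover only a small part of each feature.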
Fig. 7 CpG-unit annotations overlap. Overlaps of all CpG-unit annotations across all platforms. The UpSet visualization technique [43] for set intersections is displayed as a matrix layout. Horizontal bars on the lower left indicate the total number of annotated CpG-units in the set. Dark circles in the matrix (lower right) indicate sets that are part of the intersection. Bars in the main plot area (upper right) indicate the number of intersecting CpG-units for the sets represented by the dark circles. Roughly 29% of the annotations are common to all four platforms, suggesting that while the platforms may have similar proportions of CpG-unit annotations (i.e., they may be covering similar regions), they are covering different loci within those regions
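The bar heights in an UpSet plot such as those in Figs. 4d and 7 can be derived from per-platform sets, as in the minimal sketch below. It assumes the bars show exclusive intersections (each element counted once, under the exact combination of sets that contain it). The function name and the toy coordinates are hypothetical.

```python
from collections import Counter

def upset_counts(sets_by_name):
    """Count elements per exclusive combination of set memberships.

    sets_by_name: dict mapping platform name -> set of CpG-unit identifiers.
    Returns a Counter keyed by frozenset of platform names.
    """
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())

# Toy example with made-up coordinates, not data from the study.
platforms = {
    "ERRBS": {("chr1", 100), ("chr1", 200)},
    "SSMethylSeq": {("chr1", 100), ("chr1", 300)},
    "CpGiant": {("chr1", 100), ("chr1", 300)},
    "TruSeqEpic": {("chr1", 100)},
}
counts = upset_counts(platforms)
```

The count stored under the key containing all four platform names corresponds to the bar whose matrix column has every circle filled.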