Weilong Guo, Petko Fiziev, Weihong Yan, Shawn Cokus, Xueguang Sun, Michael Q Zhang, Pao-Yang Chen, Matteo Pellegrini.
Abstract
BACKGROUND: DNA methylation is an important epigenetic modification involved in many biological processes. Bisulfite treatment coupled with high-throughput sequencing provides an effective approach for studying genome-wide DNA methylation at base resolution. Library types such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, demanding efficient and versatile tools for aligning bisulfite sequencing data.
Year: 2013 PMID: 24206606 PMCID: PMC3840619 DOI: 10.1186/1471-2164-14-774
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The three main steps in the workflow of BS-Seeker2. (1) Index-building. Indexes for RRBS and WGBS are built separately from a three-letter converted genome. Four index instances are built to account for the asymmetric bisulfite conversion of the two strands and the properties of non-directional libraries. (2) Aligning reads to the indexes. Both WGBS and RRBS reads are converted to three letters prior to mapping. For RRBS, adapters should be removed first. Converted reads are mapped onto four index instances for non-directional libraries (two instances for directional libraries), and mapping to each index instance reports the two best hits. Multiple hits and mismatch numbers are checked before alignment results are reported. A C-to-T match is regarded as a mismatch in this step and is checked against the mismatch criteria. (3) Calling the methylation level for each site. The user can decide whether to filter unconverted reads in this step. BS-Seeker2 provides detailed outputs (BAM/SAM, wiggle, CGmap and ATCGmap files). Both the wiggle file and the BAM file can be imported directly into a genome browser, such as IGV. BS-Seeker2 is also integrated into the Galaxy web interface platform.
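The three-letter idea in steps (2) and (3) can be illustrated with a minimal Python sketch. It is not BS-Seeker2's code: the function names are hypothetical, it handles only a forward-strand read aligned without gaps, and it does not reproduce how the four index instances are arranged. It shows that a read and its reference agree after C-to-T conversion regardless of methylation state, and that the methylation level at a reference cytosine can then be estimated as C / (C + T) from the original reads.

```python
from collections import Counter


def c_to_t(seq: str) -> str:
    """Three-letter conversion: replace every C with T (forward-strand case)."""
    return seq.upper().replace("C", "T")


def call_sites(ref: str, read: str, offset: int) -> list[tuple[int, str]]:
    """Return (position, base) for each reference cytosine covered by the read.

    Assumes a gapless forward-strand alignment starting at ref[offset].
    A read base of C means unconverted (methylated); T means converted.
    """
    calls = []
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if ref[pos].upper() == "C" and base in "CT":
            calls.append((pos, base))
    return calls


def methylation_levels(calls: list[tuple[int, str]]) -> dict[int, float]:
    """Methylation level per site: C count divided by (C + T) count."""
    counts: dict[int, Counter] = {}
    for pos, base in calls:
        counts.setdefault(pos, Counter())[base] += 1
    return {pos: c["C"] / (c["C"] + c["T"]) for pos, c in counts.items()}


# Hypothetical toy data: one reference and two bisulfite-treated reads.
ref = "ACGTCCGA"
reads = ["ACGTTTGA", "ATGTTCGA"]

# After conversion, both reads match the converted reference exactly.
matches = [c_to_t(r) == c_to_t(ref) for r in reads]

calls = [call for r in reads for call in call_sites(ref, r, 0)]
levels = methylation_levels(calls)
```

Mapping in three-letter space is what lets the aligner ignore methylation state. The original read sequence is only consulted afterwards, when each covered cytosine is counted as C or T.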
Figure 2. Gapped alignment and local alignment. (A) An example showing how gapped alignment and local alignment work and the conditions under which each applies. (B) Venn diagram showing the percentages of the total reads from a real WGBS test data set that could be mapped by gapped alignment or local alignment using Bowtie2-local but not by Bowtie.
Figure 3. A diagram illustrating how specific indexes are built for RRBS. The original genome is cut by restriction enzyme(s) into fragments. Fragments with lengths in a specific range (e.g. from 50 bp to 300 bp) are selected, whereas unselected regions are masked. The unmasked genome is used for building the index.
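The fragment selection and masking described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BS-Seeker2's implementation. The enzyme is assumed to be MspI (recognition site CCGG, cutting after the first C), only fragments bounded by two cut sites are considered, and masking is modelled by replacing bases with `N`. All function and parameter names are hypothetical.

```python
def cut_positions(genome: str, site: str = "CCGG", cut_offset: int = 1) -> list[int]:
    """Return the cut coordinates for every occurrence of the recognition site.

    cut_offset=1 models an enzyme that cuts after the first base of the site
    (assumed here to be MspI, C^CGG).
    """
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = genome.find(site, start + 1)
    return positions


def mask_unselected(
    genome: str,
    low: int = 50,
    high: int = 300,
    site: str = "CCGG",
    cut_offset: int = 1,
    mask_char: str = "N",
) -> str:
    """Keep fragments whose length lies in [low, high]; mask everything else."""
    cuts = cut_positions(genome, site, cut_offset)
    masked = [mask_char] * len(genome)
    # Only fragments bounded by two consecutive cut sites are candidates.
    for left, right in zip(cuts, cuts[1:]):
        if low <= right - left <= high:
            masked[left:right] = genome[left:right]
    return "".join(masked)


# Toy genome with three CCGG sites; a tiny length range is used for readability.
toy = "AAAA" + "CCGG" + "TTTTTT" + "CCGG" + "AAAA" + "CCGG" + "AA"
toy_masked = mask_unselected(toy, low=9, high=12)
```

In this toy case the 10-bp fragment between the first two cuts is kept and the 8-bp fragment is masked. An index built from such a masked sequence only offers mapping targets inside size-selected fragments.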
Performance comparisons for mapping simulated RRBS reads to RR and WG indexes
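| | Error-free: RR index | | Error-free: WG index | Error-containing: RR index | | Error-containing: WG index |
|---|---|---|---|---|---|---|
| Mappability | 74.04% | ← | 72.52% | 74.41% | ← | 72.95% |
| User time | 1m23s | ← | 4m18s | 1m20s | ← | 4m37s |
| Accuracy | 100.00% | | 100.00% | 99.33% | ← | 97.92% |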
100 k reads of length 50 bp are simulated from the RR genome. Mapping is done using BS-Seeker2 (Bowtie). Mapping to the reduced representation (RR) index is much faster than mapping to the whole genome (WG) index. For error-free samples, mappability to the RR index is higher than to the WG index because pseudo-multiple hits are avoided. For error-containing samples, mapping to the RR index results in higher accuracy than mapping to the WG index. Arrows point toward the better value.
Figure 4. Filtering reads with incomplete bisulfite conversion. (A) Distribution of the unconverted ratio of CH sites (H = A, C or T) in phage DNA reads that have at least one unconverted CH site. Phage DNA is free of DNA methylation and is used as a control. The distribution indicates two different categories: sporadic (red) and dense (blue) methylation. BS-Seeker2 provides an option for removing reads with dense non-CpG methylation. (B) Filtering unconverted reads makes the methylation levels of two technical replicates more similar. Error bars, SD.
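The per-read quantity in panel (A) can be sketched as follows. This is a hedged illustration rather than BS-Seeker2's filter: it assumes a forward-strand read aligned without gaps, the function names are hypothetical, and the two thresholds (`max_ratio`, `min_count`) are placeholder values rather than the tool's defaults.

```python
def ch_unconverted(ref_segment: str, read: str) -> tuple[int, int]:
    """Count (unconverted, total) CH sites covered by a read.

    ref_segment is the reference starting at the read's first base; it should
    extend at least one base past the read so the last C can be classified.
    CpG sites and cytosines with unknown context are skipped.
    """
    ref = ref_segment.upper()
    unconverted = total = 0
    for i, base in enumerate(read.upper()):
        if ref[i] != "C":
            continue
        nxt = ref[i + 1] if i + 1 < len(ref) else None
        if nxt is None or nxt == "G":
            continue  # CpG context or context unknown
        if base == "C":  # cytosine survived bisulfite treatment
            unconverted += 1
            total += 1
        elif base == "T":  # cytosine was converted
            total += 1
    return unconverted, total


def is_densely_unconverted(
    ref_segment: str, read: str, max_ratio: float = 0.5, min_count: int = 3
) -> bool:
    """Flag reads whose CH sites are mostly unconverted (hypothetical thresholds)."""
    unconverted, total = ch_unconverted(ref_segment, read)
    if total == 0:
        return False
    return unconverted >= min_count and unconverted / total > max_ratio


# Toy example: the same locus covered by an unconverted and a converted read.
ref_seg = "CATCACCTCGA"
dense_read = "CATCACCTCG"     # every CH cytosine still reads as C
sporadic_read = "TATTATCTTG"  # a single CH cytosine left unconverted
flags = [is_densely_unconverted(ref_seg, r) for r in (dense_read, sporadic_read)]
```

Requiring both a minimum count and a minimum ratio separates the dense category from sporadic unconverted sites, which are expected at a low rate even in well-converted reads.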
Performance comparison of BS aligners on WGBS data
| | ||||||
|---|---|---|---|---|---|---|
| Simulation: error-free | ||||||
| map | 91.65% | 91.50% | 91.65% | 87.78% | 91.65% | 91.81% |
| acc | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| Simulation: error-containing | ||||||
| map | 91.62% | 90.51% | 91.69% | 86.90% | 91.64% | 91.90% |
| acc | 99.22% | 99.73% | 99.82% | 99.86% | 99.80% | 99.82% |
| Real data | ||||||
| map | 83.80% | 72.94% | 71.89% | 70.31% | 73.15% | 72.84% |
In this table, BS-Seeker2 maps the reads to the whole genome. map = mappability, acc = accuracy, local = local alignment mode of Bowtie2, e2e = end-to-end alignment mode of Bowtie2.
Performance comparison of BS aligners on RRBS data
| | ||||||
|---|---|---|---|---|---|---|
| Simulation: error-free | ||||||
| map | 78.29% | 78.02% | 78.29% | 72.51% | 78.08% | 78.63% |
| acc | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 99.82% |
| Simulation: error-containing | ||||||
| map | 79.18% | 78.42% | 78.72% | 71.36% | 78.17% | 79.10% |
| acc | 98.11% | 98.59% | 99.02% | 99.61% | 98.82% | 98.81% |
| Real data | ||||||
| map | 64.45% | 48.78% | 47.29% | 44.24% | 46.89% | 45.64% |
map = mappability, acc = accuracy, local = local alignment mode, e2e = end-to-end alignment mode. In this table, BS-Seeker2 maps the reads to the RR genome (fragment lengths ranging from 20 bp to 400 bp).
Features supported by BS-Seeker2, Bismark and BSMAP
| Feature | BS-Seeker2 | Bismark | BSMAP |
|---|---|---|---|
| Support local alignment | Yes | No | No |
| Tailored for one restriction enzyme RRBS | Yes | No | Yes |
| Map to reduced representation genome for RRBS | Yes | No | No |
| Option for removing un-converted reads | Yes | No | No |
| Tailored for double-restriction enzyme RRBS | Yes | No | No |
| # of supported input formats | 4 | 2 | 3 |
| # of supported output formats | 3 | 1 | 3 |
| Built-in adapter removal function | Yes | No | Yes |
| Generate wiggle file for methylation levels | Yes | No | No |
| Report read coverage for A/T | Yes | No | No |
| Able to manipulate all the parameters of Bowtie(2) | Yes | No | - |
| Programming language | Python | Perl | C++ |
| Mapping strategy | 3-letter | 3-letter | wild-card |
| Available in Galaxy Toolshed | Yes | Yes | Yes |
| Gapped alignment | Yes | Yes | Yes |
| Call methylation for CG | Yes | Yes | Yes |
| Support directional/non-directional sequencing | Yes/Yes | Yes/Yes | Yes/Yes |
| Support Single-end/Paired-end sequencing | Yes/Yes | Yes/Yes | Yes/Yes |