Pao-Yang Chen, Shawn J Cokus, Matteo Pellegrini.
Abstract
BACKGROUND: Bisulfite sequencing using next-generation sequencers yields genome-wide measurements of DNA methylation at single-nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, in which unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings.
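The three-letter conversion described above can be sketched in Python. This is a minimal illustration, not BS Seeker's own code: the function names `c_to_t`, `reverse_complement`, and `converted_references` are hypothetical, the sketch covers only C-to-T conversion of the two genomic strands, and the Bowtie alignment step is left out.

```python
# Hypothetical helpers illustrating a three-letter (A, G, T) conversion.
# Names and structure are assumptions made for this sketch.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def c_to_t(seq: str) -> str:
    """Replace every C with T, leaving a three-letter sequence."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def converted_references(genome: str) -> dict:
    """Return C/T-converted copies of the forward (+) and reverse (-) strands."""
    return {
        "+": c_to_t(genome),
        "-": c_to_t(reverse_complement(genome)),
    }


if __name__ == "__main__":
    refs = converted_references("ACGTCCGA")
    print(refs["+"])  # forward strand with every C replaced by T
    print(refs["-"])  # reverse strand with every C replaced by T
```

After conversion, a methylated C and an unmethylated C in a read both appear as T, so a standard aligner can match converted reads against the converted strands; the original sequences are still needed afterwards to tell the two cases apart.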
Year: 2010; PMID: 20416082; PMCID: PMC2871274; DOI: 10.1186/1471-2105-11-203
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Figure 1. The two library protocols generating bisulfite-converted reads. Cokus et al.'s experimental protocol uses two amplification steps for generating bisulfite-converted sequences and for high-throughput sequencing. Each bisulfite-converted read begins with one of two tags in its first 5 nucleotides. Lister et al.'s protocol generates bisulfite libraries using premethylated adapters, so no tags are present.
Figure 2. Schematic diagrams of the four forms of BS reads, mapping, and post-processing. 2A. BS reads may be in one of four forms: +FW, +RC, -FW, -RC. 2B. Bowtie aligns C/T-converted reads to the C/T-converted strands. During post-processing, mismatches are counted, excluding those between read Ts and genomic Cs. Low-quality mappings with many mismatches are removed.
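The post-processing rule in the Figure 2 caption (count mismatches, but ignore a read T opposite a genomic C) can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the authors' implementation: `count_mismatches` and `keep_mapping` are hypothetical names, the read is assumed to be already aligned without gaps to a genomic segment of equal length, and the default threshold of 2 is taken from the table footnote stating that up to 2 mismatches are allowed.

```python
def count_mismatches(read: str, genome_segment: str) -> int:
    """Count mismatches between an original (unconverted) read and the
    genomic segment it maps to, ignoring read-T / genome-C pairs, which
    are explained by bisulfite conversion of an unmethylated C."""
    if len(read) != len(genome_segment):
        raise ValueError("read and genome segment must have equal length")
    mismatches = 0
    for read_base, genome_base in zip(read.upper(), genome_segment.upper()):
        if read_base == genome_base:
            continue
        if read_base == "T" and genome_base == "C":
            continue  # consistent with bisulfite conversion, not an error
        mismatches += 1
    return mismatches


def keep_mapping(read: str, genome_segment: str, max_mismatches: int = 2) -> bool:
    """Return True if the mapping passes the (assumed) mismatch threshold."""
    return count_mismatches(read, genome_segment) <= max_mismatches


if __name__ == "__main__":
    genome = "ACGTCCGA"
    print(count_mismatches("ATGTTTGA", genome))  # only T-vs-C differences
    print(count_mismatches("ATGATTGC", genome))  # includes true mismatches
```

Note the asymmetry: a read C opposite a genomic T still counts as a mismatch, because bisulfite treatment converts C to T and never the reverse.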
Mapping 1M synthetic human chr. 21 reads onto human chr. 21
| Aligner | Experimental protocol | Mapped reads (%) | Accuracy (%) | CG methylation (%) | CHG methylation (%) | CHH methylation (%) | CPU time |
|---|---|---|---|---|---|---|---|
| BS Seeker | Lister et al. | 91.7 | 100 | 72.0 | 0 | 0 | 209 s (d) |
| BSMAP | Lister et al. | 92.1 | 100 | 72.3 | 0 | 0 | 15 h 43 m 20 s |
| RMAP | Lister et al. | 91.7 | 100 | 72.0 | 0 | 0 | 185 s |
| MAQ | Lister et al. | >99.9 | 93.4 | 67.7 | 0 | 0 | 353 s |
| BS Seeker | Cokus et al. | 89.6 | 100 | 72.0 | 0 | 0 | 263 s (d) |
| BSMAP | Cokus et al. | 89.8 | 99.6 | 72.4 | 0 | 0 | 15 h 46 m 40 s |
| RMAP | Cokus et al. | 80.2 | 99.0 | 71.3 | 0 | 0.1 | 400 s |
| MAQ | Cokus et al. | 73.0 | 92.2 | 68.2 | 0 | 0 | 665 s |
| BS Seeker | Lister et al. | 91.2 | 99.54 | 71.5 | 0.4 | 0.4 | 217 s |
| BSMAP | Lister et al. | 91.1 | 99.57 | 72.1 | 0.4 | 0.4 | 15 h 19 m 51 s |
| RMAP | Lister et al. | 91.0 | 99.52 | 71.6 | 0.4 | 0.4 | 188 s |
| MAQ | Lister et al. | 99.5 | 92.9 | 67.8 | 0.4 | 0.4 | 340 s |
(a) BS Seeker, BSMAP, and RMAP discard non-uniquely mapped reads. MAQ keeps non-uniquely mapped reads and assigns each to one of its best-matching positions.
(b) Up to 2 mismatches are allowed.
(c) The simulated methylation rates are set to 72%, 0%, and 0% for CG, CHG, and CHH, respectively.
(d) Preprocessing the reference genome takes 2-5 CPU minutes on the first run.
Comparison of mapping efficiency on three lanes of human data
| Lane | Number of reads | Uniquely mapped reads (%), BS Seeker | Uniquely mapped reads (%), RMAP | Running time, BS Seeker | Running time, RMAP |
|---|---|---|---|---|---|
| SRR019048 | 15,311,970 | 40.8 | 41.0 | 50 min | 13.70 h |
| SRR019501 | 7,217,883 | 52.0 | 52.5 | 26 min | 11.94 h |
| SRR019597 | 5,943,586 | 62.0 | 62.1 | 20 min | 13.45 h |
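The running times in the table above are reported in different units (minutes for BS Seeker, hours for RMAP), which makes the gap harder to read at a glance. The short sketch below converts both to minutes and computes the ratio per lane. The `speedup` helper and the `RUNS` dictionary are introduced here only for illustration; the numbers are copied from the table.

```python
# Running times copied from the table: (BS Seeker minutes, RMAP hours).
RUNS = {
    "SRR019048": (50, 13.70),
    "SRR019501": (26, 11.94),
    "SRR019597": (20, 13.45),
}


def speedup(bs_seeker_minutes: float, rmap_hours: float) -> float:
    """Return how many times longer RMAP ran than BS Seeker."""
    return (rmap_hours * 60.0) / bs_seeker_minutes


if __name__ == "__main__":
    for lane, (bs_minutes, rmap_hours) in RUNS.items():
        ratio = speedup(bs_minutes, rmap_hours)
        print(f"{lane}: RMAP ran about {ratio:.1f} times as long as BS Seeker")
```

On these three lanes the ratio ranges from roughly 16 to roughly 40, while the percentages of uniquely mapped reads differ by at most 0.5 points.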