Min Li, Ping Huang, Xiaodong Yan, Jianxin Wang, Yi Pan, Fang-Xiang Wu.
Abstract
BACKGROUND: Methylation is a common modification of DNA, and the correlation between methylation and disease is an important and actively studied topic in medical science. Traditional aligners are not designed for mapping bisulfite-treated reads, in which the unmethylated 'C's are converted to 'T's, so they do not work well with reads from such methylation experiments.
Keywords: Bisulfite mapping; DNA methylation; Visual alignment
Year: 2017 PMID: 29072146 PMCID: PMC5657078 DOI: 10.1186/s12859-017-1827-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Bisulfite treatment (unmethylated cytosines are converted to uracils (U)) and PCR treatment (U is converted into thymine (T), giving four distinct strands: bisulfite Watson, bisulfite Crick, reverse complement of bisulfite Watson, and reverse complement of bisulfite Crick)
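The conversion described in the Fig. 1 caption can be illustrated in silico. The following is a minimal sketch, not code from VAliBS: the function names, the strand labels (`BSW`, `BSC`, `BSWR`, `BSCR`), and the representation of methylation as a set of 0-based positions are all assumptions made for illustration. Positions for the Crick strand are indexed along the Crick strand itself, read 5' to 3'.

```python
from typing import Dict, Set

# Illustrative sketch only; names and data layout are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(strand: str, methylated: Set[int]) -> str:
    """Turn every unmethylated C into T (C -> U by bisulfite, U -> T by PCR)."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def four_strands(watson: str, meth_watson: Set[int], meth_crick: Set[int]) -> Dict[str, str]:
    """Build the four strands that exist after bisulfite treatment and PCR."""
    crick = reverse_complement(watson)
    bs_watson = bisulfite_convert(watson, meth_watson)
    bs_crick = bisulfite_convert(crick, meth_crick)
    return {
        "BSW": bs_watson,                       # bisulfite Watson
        "BSC": bs_crick,                        # bisulfite Crick
        "BSWR": reverse_complement(bs_watson),  # reverse complement of BSW
        "BSCR": reverse_complement(bs_crick),   # reverse complement of BSC
    }


strands = four_strands("ACGTCG", meth_watson={1}, meth_crick=set())
for name, seq in strands.items():
    print(name, seq)
```

Because the two original strands are converted independently, the four resulting strands are no longer complementary in pairs, which is why a bisulfite read may originate from any of the four.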
Fig. 2 Schematic diagram of VAliBS (pre-processing, mapping, and post-processing)
Table 1 Overlap of mapping rate between Bowtie2 and BWA on Illumina reads
| Mapping Tools | Illumina 75 bp | Illumina 100 bp |
|---|---|---|
| Bowtie2 | 9380 | 8676 |
| BWA | 9205 | 8836 |
| Overlap | 8799 | 7968 |
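If the entries in this table are read as counts of mapped reads (an assumption; the table itself does not state the unit), the number of reads mapped by at least one of the two aligners follows from inclusion-exclusion. The short sketch below, with a hypothetical helper name, computes that union and the gain over the better single aligner.

```python
from typing import Tuple


def union_and_gain(a: int, b: int, overlap: int) -> Tuple[int, int]:
    """Return (reads mapped by either tool, gain over the better single tool)."""
    union = a + b - overlap  # inclusion-exclusion
    return union, union - max(a, b)


# Values copied from the table: (Bowtie2, BWA, Overlap)
table = {
    "Illumina 75 bp": (9380, 9205, 8799),
    "Illumina 100 bp": (8676, 8836, 7968),
}

for dataset, (bowtie2, bwa, overlap) in table.items():
    union, gain = union_and_gain(bowtie2, bwa, overlap)
    print(f"{dataset}: union={union}, gain over best single aligner={gain}")
```

Under that reading, each aligner maps some reads that the other misses, so the union is larger than either column alone.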
Fig. 3 Example of a false match introduced by converting C to T. Positions marked in blue are methylated, because the C in the read remains unchanged after bisulfite treatment. Positions marked in green are unmethylated: the C was converted to T by bisulfite treatment. Positions marked in red are false matches introduced by the three-letter conversion. Each should be a mismatch, because T cannot be converted to C
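The three cases in the Fig. 3 caption can be told apart by comparing the read with the original (unconverted) reference after mapping. The sketch below is illustrative only and is not the VAliBS post-processing code: it assumes an ungapped alignment of a read from the bisulfite Watson strand, and the function names and labels are hypothetical.

```python
from typing import List


def three_letter(seq: str) -> str:
    """In-silico C-to-T conversion used for three-letter mapping."""
    return seq.replace("C", "T")


def classify_positions(read: str, reference: str) -> List[str]:
    """Label each aligned position of a bisulfite read against the original reference."""
    labels = []
    for r, g in zip(read, reference):
        if g == "C" and r == "C":
            labels.append("methylated")      # C survived bisulfite treatment
        elif g == "C" and r == "T":
            labels.append("unmethylated")    # C was converted to T
        elif g == "T" and r == "C":
            labels.append("false_match")     # equal only after C-to-T conversion
        elif r == g:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


read = "ACTCG"
reference = "ACCTG"

# The converted sequences are identical, so a three-letter aligner reports a perfect hit.
print(three_letter(read) == three_letter(reference))
print(classify_positions(read, reference))
```

Counting the `false_match` positions against the mismatch budget after mapping is one way such hits could be filtered out.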
Fig. 4 An example of visualization in VAliBS (operations and mapping results)
Table 2 Comparison of VAliBS, Bismark, BS-Seeker2, and BSMAP on simulated data
| Data set / metric | VAliBS (Bowtie2) | BS-Seeker2 (Bowtie2) | BS-Seeker2 (Bowtie) | Bismark (Bowtie2) | Bismark (Bowtie) | BSMAP |
|---|---|---|---|---|---|---|
| Single end, simulation: error-free | | | | | | |
| map | 92.80% | 91.50% | 91.65% | 87.78% | 91.65% | 91.81% |
| c-map | 92.09% | 91.50% | 91.65% | 87.78% | 91.65% | 91.81% |
| Single end, simulation: error-containing | | | | | | |
| map | 92.67% | 90.51% | 91.69% | 86.90% | 91.64% | 91.90% |
| c-map | 91.23% | 90.26% | 91.59% | 86.79% | 91.46% | 91.82% |
| Paired end, simulation: error-free | | | | | | |
| map | 92.79% | 78.02% | 78.29% | 72.51% | 78.08% | 78.63% |
| c-map | 92.08% | 78.02% | 78.29% | 72.51% | 78.08% | 78.49% |
| Paired end, simulation: error-containing | | | | | | |
| map | 94.24% | 78.42% | 78.72% | 71.36% | 78.17% | 79.10% |
| c-map | 92.64% | 77.07% | 77.95% | 71.08% | 77.25% | 78.16% |
Table 3 Comparison of VAliBS, Bismark, BS-Seeker2, and BSMAP on single-end data (SRR299053/mouse) and paired-end data (SRR306438/human)
| Mappability | VAliBS (Bowtie2) | BS-Seeker2 (Bowtie2) | BS-Seeker2 (Bowtie) | Bismark (Bowtie2) | Bismark (Bowtie) | BSMAP |
|---|---|---|---|---|---|---|
| Single end | 82.88% | 72.94% | 71.89% | 70.31% | 73.15% | 72.84% |
| Paired end | 56.64% | 48.78% | 47.29% | 44.24% | 46.89% | 45.64% |
Table 4 Features supported by Bismark, BS-Seeker, BS-Seeker2, BSMAP, and VAliBS
| Aligners | Bismark | BS-Seeker | BS-Seeker2 | BSMAP | VAliBS |
|---|---|---|---|---|---|
| O.S. | Linux, Mac | Linux, Mac | Linux, Unix, Mac | Linux, Unix, Mac | Linux |
| Seq.Plat. | I | I | I | I | I, 4 |
| Input | FASTA/Q | FASTA/Q | FASTA/Q, qseq | FASTA/Q, SAM/BAM | FASTA/Q |
| Output | SAM | SAM | SAM/BAM | SAM, BAM, Native | SAM |
| Min. RL | 16 | 10 | 20 | 22 | |
| Max. RL | 10 K | 200 | 144 | ||
| #Mis | Score | 3 | Score | 15 | Score |
| Indels | Score | 0 | Score | 1 | Score |
| Gaps | N | N | Y | N | Y |
| Align. Reported | U | U | B,U,S | B,R,U | A,B |
| Alignment | G, L | G | G, L | ||
| Parallel | SM | SM | SM | SM | SM |
| QA | Y | Y | N | N | Y |
| PE | Y | Y | Y | Y | Y |
| Vis | N | N | N | N | Y |
Abbreviations in Table 4 are as follows: 1) Sequencing Platform: I-Illumina; So-ABI Solid; 4-Roche 454; Sa-ABI Sanger; 2) Read Length: K denotes kilobases (1000 bases); M denotes megabases (1000 K bases); and * denotes an (unknown) large number; 3) Alignments reported: A-all; B-best; R-random; U-unique alignments only (no multimaps); S-user-defined number of matches; 4) Alignment: G-(semi-)global (a.k.a. end-to-end); L-local; 5) Parallelism: SM-shared memory; DM-distributed memory; Cloud-cloud computing; 6) Vis: visualization; 7) Y-Yes; N-No