Brian D Ondov, Charles Cochran, Mark Landers, Gavin D Meredith, Miroslav Dudas, Nicholas H Bergman.
Abstract
SUMMARY: Bisulfite sequencing allows cytosine methylation, an important epigenetic marker, to be detected via nucleotide substitutions. Since the Applied Biosystems SOLiD System uses a unique di-base encoding that increases confidence in the detection of nucleotide substitutions, it is a potentially advantageous platform for this application. However, the di-base encoding also makes reads with many nucleotide substitutions difficult to align to a reference sequence with existing tools, preventing the platform's potential utility for bisulfite sequencing from being realized. Here, we present SOCS-B, a reference-based, un-gapped alignment algorithm for the SOLiD System that is tolerant of both bisulfite-induced nucleotide substitutions and a parametric number of sequencing errors, facilitating bisulfite sequencing on this platform. An implementation of the algorithm has been integrated with the previously reported SOCS alignment tool, and was used to align CpG methylation-enriched Arabidopsis thaliana bisulfite sequence data, exhibiting a 2-fold increase in sensitivity compared to existing methods for aligning SOLiD bisulfite data.
AVAILABILITY: Executables, source code, and sample data are available at http://solidsoftwaretools.com/gf/project/socs/
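The alignment difficulty described in the summary can be illustrated with a minimal sketch. It is not the SOCS-B algorithm; it only shows the di-base (color space) encoding, written here in the common form where each base is a 2-bit value (A=0, C=1, G=2, T=3) and each color is the XOR of two adjacent bases. The function names, the example sequence, and the choice of methylated position are assumptions made for illustration.

```python
# Illustrative sketch only: di-base (color space) encoding and the effect of
# bisulfite conversion on it. Names and example data are hypothetical.

BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a nucleotide string as colors: one color per adjacent base pair."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def bisulfite_convert(seq, methylated=frozenset()):
    """Convert every unmethylated C to T; positions in `methylated` stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def color_mismatches(seq_a, seq_b):
    """Count positions where the color encodings of two sequences differ."""
    return sum(x != y for x, y in zip(to_colors(seq_a), to_colors(seq_b)))


if __name__ == "__main__":
    reference = "ACGTTCGACCGATCA"          # made-up reference fragment
    read = bisulfite_convert(reference, methylated={1})  # only the first C is methylated

    print("reference colors:", to_colors(reference))
    print("read colors:     ", to_colors(read))
    print("color mismatches:", color_mismatches(reference, read))

    # An isolated substitution changes two adjacent colors.
    print("single C->T:     ", color_mismatches("AACAA", "AATAA"))
```

Because each isolated C-to-T change alters two adjacent colors, a read with several converted cytosines accumulates color mismatches quickly, which is why an aligner with a small fixed mismatch budget would be expected to reject many bisulfite reads.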
Year: 2010 PMID: 20562417 PMCID: PMC2905549 DOI: 10.1093/bioinformatics/btq291
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Sensitivity of SOCS-B in aligning SOLiD bisulfite sequence data
| Errors permitted | Mapreads (reads aligned) | SOCS-B (reads aligned) | SOCS-B increase factor |
|---|---|---|---|
| 0 | 1 150 378 | 8 701 800 | 7.56 |
| 1 | 3 283 347 | 13 856 042 | 4.22 |
| 2 | 6 691 811 | 18 764 830 | 2.80 |
| 3 | 11 159 673 | 22 656 148 | 2.03 |
Alignments using Mapreads were performed against reference sequences representing the fully bisulfite-converted (both Watson and Crick strands) and unconverted genomes of A. thaliana and phage lambda, while alignments using SOCS-B were performed against only the unconverted genomes.
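The increase factor in the table appears to be the ratio of reads aligned by SOCS-B to reads aligned by Mapreads at the same error allowance. The short check below recomputes it from the table values under that assumption; the variable and function names are hypothetical.

```python
# Recompute the "increase factor" column, assuming it is
# (SOCS-B reads aligned) / (Mapreads reads aligned).

# (errors permitted, Mapreads reads aligned, SOCS-B reads aligned, reported factor)
ROWS = [
    (0, 1_150_378, 8_701_800, 7.56),
    (1, 3_283_347, 13_856_042, 4.22),
    (2, 6_691_811, 18_764_830, 2.80),
    (3, 11_159_673, 22_656_148, 2.03),
]


def increase_factor(mapreads_aligned, socsb_aligned):
    """Ratio of SOCS-B alignments to Mapreads alignments."""
    return socsb_aligned / mapreads_aligned


if __name__ == "__main__":
    for errors, mapreads, socsb, reported in ROWS:
        computed = increase_factor(mapreads, socsb)
        print(f"{errors} errors: computed {computed:.2f}, reported {reported:.2f}")
```

Under this reading, the 2-fold increase quoted in the abstract corresponds to the row with three permitted errors.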