Brian D Ondov, Anjana Varadarajan, Karla D Passalacqua, Nicholas H Bergman.
Abstract
Here, we report the development of SOCS (short oligonucleotide color space), a program designed for efficient and flexible mapping of Applied Biosystems SOLiD sequence data onto a reference genome. SOCS performs its mapping within the context of 'color space', and it maximizes usable data by allowing a user-specified number of mismatches. Sequence census functions facilitate a variety of functional genomics applications, including transcriptome mapping and profiling, as well as ChIP-Seq. AVAILABILITY: Executables, source code, and sample data are available at http://socs.biology.gatech.edu/
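The abstract's central idea, matching reads against a reference in color space while tolerating a user-chosen number of mismatches, can be illustrated with a minimal Python sketch. It assumes the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, each color being the XOR of two adjacent base codes) and uses a naive scan over every reference offset. The function names `to_color_space` and `map_read` are hypothetical, and the brute-force search is a stand-in for illustration, not SOCS's own indexing or scoring strategy.

```python
# Illustrative sketch only: NOT the SOCS implementation.
# Assumes the standard SOLiD 2-base encoding, where each color is the XOR
# of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3).

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq: str) -> list[int]:
    """Encode a DNA string as len(seq) - 1 colors, each in the range 0-3."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def map_read(read_colors: list[int], ref_colors: list[int],
             max_mismatches: int) -> list[tuple[int, int]]:
    """Return (offset, mismatches) for every reference offset at which the
    read's colors differ from the reference in at most max_mismatches
    positions. Brute force: every offset is checked."""
    hits = []
    n = len(read_colors)
    for offset in range(len(ref_colors) - n + 1):
        mismatches = 0
        for i in range(n):
            if read_colors[i] != ref_colors[offset + i]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this offset already exceeds the tolerance
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


if __name__ == "__main__":
    ref = to_color_space("ACGTACGTTAGC")
    read = to_color_space("ACGT")
    print(map_read(read, ref, max_mismatches=0))  # [(0, 0), (2, 0), (4, 0)]
```

Because a color depends only on a pair of adjacent bases, colors alone do not fix the underlying sequence: in the example, "ACGT" and "GTAC" both encode as [1, 3, 1], so the hit at offset 2 is a color-space match to different bases. Real SOLiD reads therefore carry a known leading base. An internal single-base substitution also changes two adjacent colors, so in this sketch a read with one such substitution needs a tolerance of at least 2 to be mapped.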
Year: 2008 PMID: 18842598 PMCID: PMC2639273 DOI: 10.1093/bioinformatics/btn512
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Performance of SOCS in mapping SOLiD sequence data
| Mismatch tolerance | Time required | Additional reads mapped (% of all reads; cumulative %) |
|---|---|---|
| 0 | 10.3 min | 4 004 404 (14.3%) |
| 1 | 11.9 min | 4 664 183 (16.7%, 31.0% total) |
| 2 | 15.7 min | 3 583 141 (12.8%, 43.8% total) |
| 3 | 35.4 min | 2 706 247 (9.7%, 53.5% total) |
| 4 | 3.5 h | 2 054 061 (7.4%, 60.9% total) |
| 5 | 22.1 h | 1 594 608 (5.7%, 66.6% total) |
SOCS was tested using a sample dataset containing 27 942 602 35-bp reads generated by the SOLiD sequencing system. The reads were drawn from an experiment in which an mRNA sample isolated from *B. anthracis* was sequenced, and they were mapped to the *B. anthracis* Ames Ancestor genome sequence. SOCS was run on an Apple Mac Pro (2 × 3.0 GHz dual-core Xeon, 4 GB of RAM). Times shown are the totals required for both mapping and scoring functions at the specified mismatch tolerance, and they reflect a single-threaded execution. Multithreading improved overall runtimes considerably, particularly at mismatch tolerances ≥3.
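The percentages in the table are relative to the 27 942 602 input reads, and each cumulative figure is the running sum of reads mapped at that tolerance or lower. A short Python check, using only the counts and total quoted above, reproduces both columns; the helper name `mapping_percentages` is hypothetical.

```python
# Recompute the table's percentages from its raw counts.
TOTAL_READS = 27_942_602  # reads in the sample dataset, from the text

# Additional reads mapped at mismatch tolerances 0 through 5, from the table
additional = [4_004_404, 4_664_183, 3_583_141, 2_706_247, 2_054_061, 1_594_608]


def mapping_percentages(counts, total):
    """Return (tolerance, % of total mapped at this tolerance,
    cumulative % mapped so far), each percentage rounded to one decimal."""
    rows = []
    cumulative = 0
    for tolerance, count in enumerate(counts):
        cumulative += count
        rows.append((tolerance,
                     round(100 * count / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows


for tol, pct, cum in mapping_percentages(additional, TOTAL_READS):
    print(f"{tol} mismatches: +{pct}% (cumulative {cum}%)")
```

The output matches the table, for example 31.0% cumulative at one mismatch and 66.6% at five, which makes the diminishing return of each extra mismatch easy to weigh against the sharply rising runtimes.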