Davide Campagna¹, Fabio Gasparini², Nicola Franchi², Lucia Manni², Andrea Telatin³, Nicola Vitulo², Loriano Ballarin², Giorgio Valle¹.
Abstract
SOLiD DNA sequences are typically analyzed against a reference genome, and they are not recommended for de novo assembly of genomes or transcriptomes. This is mainly due to the difficulty of translating SOLiD color-space data into normal base-space sequences: because of the nature of color-space, a single misinterpreted color leads to a chain of further translation errors and produces completely wrong results. Here we describe SATRAP, a computer program designed to efficiently translate de novo assembled color-space sequences into base-space format. The program was tested and validated using simulated and real transcriptomic data, and its modularity allows easy integration into more complex pipelines, such as Oases for RNA-seq de novo assembly. SATRAP is available at http://satrap.cribi.unipd.it, either as a multi-step pipeline incorporating several tools for RNA-seq assembly or as an individual module for use with the Oases package.
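The error cascade described above can be illustrated with a short Python sketch. It assumes the common 2-bit XOR formulation of SOLiD two-base encoding (A=0, C=1, G=2, T=3), which reproduces the 2-base encoding table; the functions `encode` and `decode` are illustrative names and are not part of SATRAP.

```python
# Assumed 2-bit codes for each base; the color of a dinucleotide is the XOR
# of the codes of its two bases (this reproduces the 2-base encoding table).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """Convert a base-space sequence into its color-space representation."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate a color-space sequence into base-space from a known first base."""
    bases = [first_base]
    for color in colors:
        # Each base is derived from the previously decoded base, so one wrong
        # color corrupts every base that follows it.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    truth = "ATGGCATTCA"
    colors = encode(truth)           # [3, 1, 0, 3, 1, 3, 0, 2, 1]
    corrupted = colors.copy()
    corrupted[2] = (corrupted[2] + 1) % 4  # simulate a single misread color
    print(decode("A", colors))     # ATGGCATTCA
    print(decode("A", corrupted))  # ATGTACGGAC
```

Changing one color (index 2) leaves the first three bases intact but alters every base after it, which is why naive translation of an assembled color-space contig is so fragile.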
Year: 2015 PMID: 26368549 PMCID: PMC4569514 DOI: 10.1371/journal.pone.0137436
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
2-base encoding table.
| Color | Dinucleotide 1 | Dinucleotide 2 | Dinucleotide 3 | Dinucleotide 4 |
|---|---|---|---|---|
| 0 | AA | CC | GG | TT |
| 1 | AC | CA | GT | TG |
| 2 | AG | CT | GA | TC |
| 3 | AT | CG | GC | TA |
Colors are defined by numerical values (0, 1, 2, 3). Each color represents four possible dinucleotides.
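The table can be regenerated from a compact rule, assuming each base is given a 2-bit code (A=0, C=1, G=2, T=3) and a color is the bitwise XOR of the two codes. The sketch below (helper names are hypothetical) groups all 16 dinucleotides by color, which also makes explicit why a color alone is ambiguous: without knowing one base of the pair, each color matches four dinucleotides.

```python
from itertools import product

BASES = "ACGT"  # index of each base is its assumed 2-bit code


def color(dinucleotide: str) -> int:
    """Color of a dinucleotide under the assumed XOR rule."""
    a, b = (BASES.index(base) for base in dinucleotide)
    return a ^ b


def encoding_table() -> dict[int, list[str]]:
    """Group all 16 dinucleotides by their color (0-3)."""
    table: dict[int, list[str]] = {c: [] for c in range(4)}
    for first, second in product(BASES, repeat=2):
        dinuc = first + second
        table[color(dinuc)].append(dinuc)
    return table


if __name__ == "__main__":
    for c, dinucs in encoding_table().items():
        print(c, " ".join(dinucs))
    # 0 AA CC GG TT
    # 1 AC CA GT TG
    # 2 AG CT GA TC
    # 3 AT CG GC TA
```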
Fig 1. Flowchart of the color-translation process.
Step 1: the first base (FTB) of each read can be translated from color-space with high accuracy; for each read, the FTB is mapped onto the contig. Step 2: each FTB is checked for color coherence with its neighboring FTBs, and three conditions can be detected: a) FTBs coherent with their neighboring FTBs on both sides (such as the 'A' at the centre of the figure); b) FTBs coherent on only one side (such as the 'G', which is coherent with the 'A' but not with the 'C'); c) FTBs with no coherence on either side (such as the 'A' circled in red). FTBs in condition c) are removed from the assembly. Steps 3 and 4: regions delimited by two reliable start sites are identified and translated from color-space into base-space. Any remaining regions are incoherent in terms of color compatibility; to resolve them, a threshold for color reliability is calculated (Step 5), and the resulting value is used to establish the critical regions of the contig (Step 6).
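The coherence check of Step 2 can be sketched in Python under simplifying assumptions of our own: the contig is stored as a list of colors where `colors[k]` links contig bases k and k + 1, at most one FTB is mapped per position, and two neighboring FTBs are treated as coherent when decoding the colors between them from the left FTB yields the right FTB. The function names are hypothetical and this is not SATRAP's actual implementation.

```python
from functools import reduce
from operator import xor

# Assumed 2-bit base codes; a color is the XOR of two adjacent base codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def coherent(colors: list[int], left: int, left_base: str,
             right: int, right_base: str) -> bool:
    """Check whether two FTBs (left < right) agree with the contig colors.

    XOR-ing colors[left:right] gives the code difference between the two bases.
    """
    delta = reduce(xor, colors[left:right], 0)
    return (BASE_CODE[left_base] ^ delta) == BASE_CODE[right_base]


def count_coherent_neighbours(colors: list[int], ftbs: dict[int, str]) -> dict[int, int]:
    """For each FTB position, count the neighbouring FTBs it agrees with (0-2).

    FTBs at the ends of the contig simply have only one neighbour to check.
    """
    positions = sorted(ftbs)
    counts: dict[int, int] = {}
    for i, pos in enumerate(positions):
        hits = 0
        if i > 0:  # neighbour on the left
            prev = positions[i - 1]
            hits += int(coherent(colors, prev, ftbs[prev], pos, ftbs[pos]))
        if i + 1 < len(positions):  # neighbour on the right
            nxt = positions[i + 1]
            hits += int(coherent(colors, pos, ftbs[pos], nxt, ftbs[nxt]))
        counts[pos] = hits
    return counts


def filter_incoherent(colors: list[int], ftbs: dict[int, str]) -> dict[int, str]:
    """Drop FTBs that agree with no neighbour, mirroring the removal in Step 2."""
    counts = count_coherent_neighbours(colors, ftbs)
    return {pos: base for pos, base in ftbs.items() if counts[pos] > 0}


if __name__ == "__main__":
    # Color-space contig of ATGGCATTCA; the FTB at position 5 is wrong (T, not A).
    contig_colors = [3, 1, 0, 3, 1, 3, 0, 2, 1]
    mapped = {0: "A", 2: "G", 3: "G", 5: "T", 8: "C", 9: "A"}
    print(count_coherent_neighbours(contig_colors, mapped))
    # {0: 1, 2: 2, 3: 1, 5: 0, 8: 1, 9: 1}
    print(sorted(filter_incoherent(contig_colors, mapped)))
    # [0, 2, 3, 8, 9]
```

In this toy example the FTB at position 2 is coherent on both sides, those at 3 and 8 on one side only, and the wrong FTB at position 5 on neither side, so it is the only one discarded.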
Fig 2. Effect of sequence coverage on color translation.
ASID, SATRAP and SOPRA were used to translate into base-space the color-space assemblies produced at different sequence coverages. The same set of reads was also assembled directly in base-space as a control.
Statistics of identified errors at different sequence coverage.
| Coverage | Substitution | Deletion | Insertion |
|---|---|---|---|
| 10X | 1 | 0.988 | 0.994 |
| 20X | 1 | 0.988 | 0.997 |
| 50X | 1 | 1 | 1 |
| 100X | 1 | 0.997 | 0.997 |