Abstract
BACKGROUND: RNA-seq has shown huge potential for phylogenomic inferences in non-model organisms. However, errors, incompleteness, and redundant assembled transcripts for each gene in de novo assemblies of short reads introduce noise into analyses and leave a large amount of missing data in the aligned matrix. To address these problems, we compare de novo assemblies of paired-end 90 bp RNA-seq reads produced with Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans against transcripts from the genome annotation of the model plant Ricinus communis. By doing so, we evaluate strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy.
Year: 2013 PMID: 23672450 PMCID: PMC3663818 DOI: 10.1186/1471-2164-14-328
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Chimera compositions among assembled transcripts before post-processing. Oases MN: Oases-M merging single k-mer assemblies of 21, 31, 41, 51 and 61; MW: Oases-M merging single k-mer assemblies of 19–71, with increment of 2; Trans-ABySS MK: Trans-ABySS merging single k-mer assemblies of 21, 31, 41, 51 and 61.
Figure 2. Parameters for choosing the representative transcript for each locus in Oases. (A) Only considering transcripts that are longer than 0.3 of the longest transcript in the same locus; (B) only considering transcripts that are longer than 0.85 of the longest transcript in the same locus. A data point is plotted only when there are five or more loci of the same size in each data set.
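The length cutoff described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `candidates_by_length_fraction`, the `(transcript id, length)` tuple layout, and the dictionary keyed by locus are assumptions made for illustration. It only shows the step of restricting each locus to transcripts longer than a given fraction (0.3 or 0.85) of the longest transcript in that locus.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (transcript id, length in bp).
Transcript = Tuple[str, int]


def candidates_by_length_fraction(
    loci: Dict[str, List[Transcript]], min_fraction: float
) -> Dict[str, List[Transcript]]:
    """Per locus, keep transcripts longer than min_fraction of the longest one."""
    kept: Dict[str, List[Transcript]] = {}
    for locus_id, transcripts in loci.items():
        if not transcripts:
            kept[locus_id] = []
            continue
        longest = max(length for _, length in transcripts)
        # Strict comparison, following "longer than" in the caption.
        kept[locus_id] = [t for t in transcripts if t[1] > min_fraction * longest]
    return kept


# Made-up example locus with four transcripts.
example = {"locus_1": [("t1", 1000), ("t2", 900), ("t3", 400), ("t4", 200)]}
loose = candidates_by_length_fraction(example, 0.3)    # panel (A) cutoff
strict = candidates_by_length_fraction(example, 0.85)  # panel (B) cutoff
```

With the made-up locus above, the 0.3 cutoff keeps three candidates and the 0.85 cutoff keeps two; choosing the representative among the remaining candidates is a separate step not covered here.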
Figure 3. Overall comparison among assembly strategies. (A) Number of transcripts in each category; and (B) percent reference coverage, redundancy and chimera rate among assembly strategies. Cap3: redundancy reduction using cap3; blast: trans chimera cleanup using blastx against model protein database; Oases MK filter: filter loci from Oases single k-mer assemblies by number of transcripts per locus at k = 21, 31, 41 and 51, with k = 61 not subject to filtering by number of transcripts per locus, before combining them. Oases MN: Oases-M merging single k-mer assemblies of 21, 31, 41, 51 and 61; MW: Oases-M merging single k-mer assemblies of 19–71, with increment of 2; SOAPdenovo-Trans contigs: combining contigs from SOAPdenovo-Trans single k-mer assemblies of 21, 31, 41, 51 and 61; Trans-ABySS MK: Trans-ABySS merging single k-mer assemblies of 21, 31, 41, 51 and 61; Trinity pickH: only keeping the transcript with the highest read coverage for each subcomponent; Trinity removeL: when there are two or more transcripts per subcomponent, remove the one with the lowest read coverage.
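The two Trinity filters named in the Figure 3 caption (pickH and removeL) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `(transcript id, read coverage)` tuple layout, and the tie handling (the first transcript encountered wins) are all assumptions, and parsing subcomponents from Trinity output is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (transcript id, read coverage).
Transcript = Tuple[str, float]
Subcomponents = Dict[str, List[Transcript]]


def pick_highest(subcomponents: Subcomponents) -> Subcomponents:
    """pickH: keep only the transcript with the highest read coverage."""
    return {
        sub_id: [max(transcripts, key=lambda t: t[1])]
        for sub_id, transcripts in subcomponents.items()
        if transcripts
    }


def remove_lowest(subcomponents: Subcomponents) -> Subcomponents:
    """removeL: with two or more transcripts, drop the lowest-coverage one."""
    result: Subcomponents = {}
    for sub_id, transcripts in subcomponents.items():
        kept = list(transcripts)
        if len(kept) >= 2:
            # Assumption: on ties, the first lowest-coverage transcript is dropped.
            kept.remove(min(kept, key=lambda t: t[1]))
        result[sub_id] = kept
    return result


# Made-up coverage values for two subcomponents.
example = {
    "comp1_sub0": [("seq1", 5.0), ("seq2", 12.0), ("seq3", 1.5)],
    "comp2_sub0": [("seq1", 3.0)],
}
picked = pick_highest(example)
trimmed = remove_lowest(example)
```

The difference between the two is how aggressively redundancy is reduced: pickH always leaves one transcript per subcomponent, while removeL leaves single-transcript subcomponents untouched and removes exactly one transcript from larger ones.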
Summary statistics among seven highest performing assembly strategies
| Oases filter1-cap3-blast | 26.50% | 45.70% | 10.52% | 0.96% | 2.42 |
| Oases filter1&3-cap3-blast | 27.67% | 45.65% | 11.73% | 1.11% | 2.41 |
| SOAPdenovo-Trans | 45.62% | 2.21% | |||
| SOAPdenovo-Trans contigs cap3-blast | 30.64% | 46.62% | 14.34% | 1.06% | 2.19 |
| Trans-ABySS MK-cap3-blast | 28.34% | 42.93% | 13.01% | 1.39% | 2.16 |
| Trinity blast | 26.22% | 9.58% | 1.69% | 3.11 | |
| Trinity pickH-cap-blast | 23.19% | 45.38% | 8.35% | 1.89 |
The best score for each column, measured as highest in reference coverage and lowest in chimera rate and redundancy, is in bold.