Jeffrey Martin, Vincent M Bruno, Zhide Fang, Xiandong Meng, Matthew Blow, Tao Zhang, Gavin Sherlock, Michael Snyder, Zhong Wang.
Abstract
BACKGROUND: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied.
Year: 2010 PMID: 21106091 PMCID: PMC3152782 DOI: 10.1186/1471-2164-11-663
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. A summary of the Rnnotator assembly pipeline.
Figure 2. Read dereplication and filtering greatly reduce the unevenness of coverage among genes in RNA-Seq data. Coverage of reference genes was calculated using raw reads, dereplicated reads, and filtered reads for *Candida albicans* SC5314.
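Figure 2 credits dereplication and filtering with evening out gene coverage: collapsing identical reads means that the many duplicate reads from a highly expressed gene count only once. The following is a minimal Python sketch of dereplication, assuming exact, case-insensitive sequence matching (reverse complements are not merged). The N-based filter is a hypothetical stand-in, since this record does not say which filters Rnnotator applies.

```python
from collections import Counter
from typing import Iterable


def dereplicate(reads: Iterable[str]) -> Counter:
    """Collapse identical read sequences into one entry with a count.

    Assumption: reads are compared as exact, case-insensitive strings;
    reverse complements are NOT merged here.
    """
    return Counter(read.upper() for read in reads)


def filter_reads(counts: Counter, max_n: int = 0) -> Counter:
    """Hypothetical filter: drop distinct reads with more than `max_n` ambiguous bases (N)."""
    return Counter({seq: n for seq, n in counts.items() if seq.count("N") <= max_n})


if __name__ == "__main__":
    raw = ["ACGTAC", "ACGTAC", "acgtac", "TTGACA", "ACNTAC"]
    unique = dereplicate(raw)   # ACGTAC x3, TTGACA x1, ACNTAC x1
    kept = filter_reads(unique)  # ACNTAC is dropped (contains an N)
    print(len(raw), len(unique), len(kept))  # prints: 5 3 2
```

Downstream assembly would use only the distinct sequences (the keys of the counter). The counts are kept in case abundance is needed later.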
Summary of the datasets used in this study
| Sequencing statistics | | |
|---|---|---|
| Number of lanes | 35 | 26 |
| Read length (bp) | 28, 34 | 34 |
| Number of reads | 186,148,364 | 318,539,427 |
| Non-strand-specific reads | 146,427,272 | 124,495,811 |
| Strand-specific reads | 39,721,092 | 194,043,616 |
| Unique reads | 40,800,738 | 41,402,683 |
| Median coverage of reference genes | 175× | 358× |
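Table 1 reports the median coverage of reference genes. One common way to compute this is to take mapped reads × read length / gene length for each gene and then the median across genes. That definition is assumed here, not taken from the paper. The sketch also checks that the two read classes in Table 1 sum to the reported totals and prints the share of unique reads. It uses a single read length, which simplifies the first dataset's mix of 28 and 34 bp reads.

```python
from statistics import median


def gene_coverage(read_count: int, read_length: int, gene_length: int) -> float:
    """Mean per-base coverage of one gene: sequenced bases divided by gene length."""
    return read_count * read_length / gene_length


def median_gene_coverage(genes: dict[str, tuple[int, int]], read_length: int) -> float:
    """Median coverage over genes; `genes` maps gene id -> (mapped reads, gene length in bp)."""
    return median(gene_coverage(n, read_length, length) for n, length in genes.values())


if __name__ == "__main__":
    # Consistency check on Table 1: the two read classes should sum to the total.
    columns = [
        (146_427_272, 39_721_092, 186_148_364, 40_800_738),
        (124_495_811, 194_043_616, 318_539_427, 41_402_683),
    ]
    for non_stranded, stranded, total, unique in columns:
        assert non_stranded + stranded == total
        print(f"unique reads: {unique / total:.1%} of total")
```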
Figure 3. An example of transcripts assembled by the Rnnotator pipeline. A) A GBrowse snapshot of assembled transcripts illustrating the effect of different Velvet k-mer parameters. Currently annotated genes are shown at the top; genes from the forward and reverse strands are shown in red and blue, respectively. Assembled contigs for five k-mer lengths are shown in grey, and the merged contigs are shown at the bottom. B) Contigs are split into transcripts from opposite strands (top) according to stranded RNA-Seq read coverage (bottom). Read coverage is shown on a log2 scale; reads from the forward strand are shown in red and those from the reverse strand in blue.
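Panel B describes splitting a contig into transcripts from opposite strands using stranded read coverage. The sketch below is a simplified illustration, not Rnnotator's implementation. It labels each contig position with the strand that has more stranded reads and cuts the contig wherever that label changes. It applies no smoothing or minimum segment length, and `min_cov` is a hypothetical parameter.

```python
from itertools import groupby


def split_by_strand(fwd: list[int], rev: list[int], min_cov: int = 1) -> list[tuple[str, int, int]]:
    """Split a contig into (strand, start, end) segments, end exclusive.

    Assumption: each position goes to the strand with more stranded reads;
    positions that are tied or below `min_cov` on both strands get "?".
    """
    if len(fwd) != len(rev):
        raise ValueError("coverage tracks must have equal length")

    def label(f: int, r: int) -> str:
        if max(f, r) < min_cov or f == r:
            return "?"
        return "+" if f > r else "-"

    segments = []
    pos = 0
    # Consecutive positions with the same label form one segment.
    for strand, run in groupby(label(f, r) for f, r in zip(fwd, rev)):
        length = sum(1 for _ in run)
        segments.append((strand, pos, pos + length))
        pos += length
    return segments


if __name__ == "__main__":
    fwd = [5, 6, 7, 0, 0, 0]
    rev = [0, 0, 1, 0, 4, 3]
    print(split_by_strand(fwd, rev))  # [('+', 0, 3), ('?', 3, 4), ('-', 4, 6)]
```

A practical version would likely smooth coverage and ignore very short segments before cutting, so that a few stray antisense reads do not fragment a transcript.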
A comparison of assembly performance between Rnnotator and a single Velvet assembly; Oases and Multiple- results are also shown.
| Metric (%) | Rnnotator (non-stranded) | Rnnotator | Velvet | Oases | Multiple- |
|---|---|---|---|---|---|
| Accuracy¹ | 94.0 | 95.0 | 97.4 | 92.3 | 96.6 |
| Completeness² | 81.9 | 80.4 | 66.7 | 79.9 | 85.9 |
| Contiguity³ | 58.4 | 58.0 | 46.6 | 47.9 | 37.3 |
| Gene fusions⁴ | 1.73 | 0.26 | 1.18 | 1.31 | 0.20 |
| Accuracy | 92.8 | 94.6 | 96.6 | 89.1 | 96.0 |
| Completeness | 82.9 | 82.2 | 74.0 | 82.1 | 88.2 |
| Contiguity | 59.1 | 59.4 | 43.3 | 48.6 | 48.7 |
| Gene fusions | 2.06 | 0.65 | 1.38 | 1.61 | 0.46 |

¹ Accuracy is the percentage of contigs that share at least 95% identity with the reference genome.
² Completeness is the percentage of known genes that are covered by contigs over at least 80% of the gene length.
³ Contiguity is the percentage of complete genes that are covered by a single contig over at least 80% of the gene length.
⁴ The gene fusion rate is the percentage of contigs that contain more than 50% of each of two or more annotated genes.
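Once contigs have been aligned to the reference genome and the annotated genes, the four metrics in the footnotes reduce to simple counts. This sketch takes pre-computed alignment summaries in a hypothetical format (`GeneHit` and plain lists of identities and fractions) and applies the footnotes' thresholds. Using complete genes, rather than all known genes, as the denominator for contiguity is an assumption based on the footnote's wording.

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    """Hypothetical alignment summary for one known gene."""
    covered_fraction: float             # fraction of the gene covered by all contigs combined
    best_single_contig_fraction: float  # largest fraction covered by any one contig


def _percent(hits: int, total: int) -> float:
    """Percentage helper that returns 0.0 for an empty denominator."""
    return 100.0 * hits / total if total else 0.0


def accuracy(contig_identities: list[float]) -> float:
    """% of contigs with >= 95% identity to the reference genome."""
    return _percent(sum(i >= 95.0 for i in contig_identities), len(contig_identities))


def completeness(genes: list[GeneHit]) -> float:
    """% of known genes covered by contigs over >= 80% of their length."""
    return _percent(sum(g.covered_fraction >= 0.8 for g in genes), len(genes))


def contiguity(genes: list[GeneHit]) -> float:
    """% of complete genes covered by a single contig over >= 80% of their length.

    Assumption: the denominator is the set of complete genes.
    """
    complete = [g for g in genes if g.covered_fraction >= 0.8]
    return _percent(sum(g.best_single_contig_fraction >= 0.8 for g in complete), len(complete))


def gene_fusions(contig_gene_fractions: list[list[float]]) -> float:
    """% of contigs containing more than 50% of each of two or more annotated genes.

    Each inner list holds, for one contig, the fraction of each overlapping
    gene that the contig contains.
    """
    fused = sum(sum(f > 0.5 for f in fracs) >= 2 for fracs in contig_gene_fractions)
    return _percent(fused, len(contig_gene_fractions))


if __name__ == "__main__":
    print(accuracy([99.1, 96.0, 90.0, 95.0]))  # 75.0
```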
Figure 4. Accuracy, completeness, and contiguity of assembled transcripts for . For contiguity, only genes with > 80% completeness are shown. In panels D), E), and F), box plots show the median gene coverage by unique reads for the genes in each bin; open circles above each box plot mark outliers in the coverage distribution.
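Panels D) to F) bin genes and summarize their coverage with box plots. The following is a sketch of that summary. It assumes 10 equal-width bins of a per-gene fraction (for example, the fraction of a gene covered by contigs) and the usual Tukey convention, in which points more than 1.5 × IQR beyond the quartiles are drawn as outliers. The caption states neither the binning nor the whisker rule, so both are assumptions.

```python
from collections import defaultdict
from statistics import median, quantiles


def bin_genes(values: dict[str, float], n_bins: int = 10) -> dict[int, list[str]]:
    """Group gene ids into equal-width bins of a metric in [0, 1]; a value of 1.0 goes in the last bin."""
    bins: dict[int, list[str]] = defaultdict(list)
    for gene, v in values.items():
        bins[min(int(v * n_bins), n_bins - 1)].append(gene)
    return dict(bins)


def coverage_summary(coverages: list[float]) -> dict:
    """Median coverage plus Tukey outliers (beyond 1.5 x IQR), as a box plot would draw them."""
    summary = {"n": len(coverages), "median": median(coverages), "outliers": []}
    if len(coverages) >= 2:  # quartiles need at least two points
        q1, _, q3 = quantiles(coverages, n=4, method="inclusive")
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary["outliers"] = sorted(c for c in coverages if c < lo or c > hi)
    return summary


if __name__ == "__main__":
    fraction_covered = {"g1": 0.95, "g2": 0.92, "g3": 0.91, "g4": 0.97, "g5": 0.99, "g6": 0.15}
    coverage = {"g1": 1, "g2": 2, "g3": 3, "g4": 4, "g5": 100, "g6": 0.5}
    for b, members in sorted(bin_genes(fraction_covered).items()):
        print(b, coverage_summary([coverage[g] for g in members]))
```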