Bo Li, Victor Ruotti, Ron M Stewart, James A Thomson, Colin N Dewey.
Abstract
MOTIVATION: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
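To make the multiread problem concrete, the following is a minimal sketch of expectation-maximization (EM) allocation, which the `em` label in the tables below suggests is the family of approach evaluated here. It is not the authors' model: the function name `em_allocate` and the input layout are hypothetical, and the sketch assumes every compatible gene is equally likely a priori to produce a read, ignoring gene length, read position, and base qualities.

```python
from collections import defaultdict


def em_allocate(read_hits, n_iter=200, tol=1e-9):
    """Estimate per-gene read fractions when reads may map to several genes.

    read_hits: list of tuples; each tuple holds the gene IDs one read maps to.
    Simplifying assumption (not from the paper): gene length, read position
    and sequencing errors are ignored.
    """
    # Unmapped reads (empty tuples) carry no information here; drop them.
    read_hits = [hits for hits in read_hits if hits]
    if not read_hits:
        return {}
    genes = sorted({g for hits in read_hits for g in hits})
    theta = {g: 1.0 / len(genes) for g in genes}  # uniform starting point

    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read among its genes in proportion to theta.
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                expected[g] += theta[g] / total
        # M-step: new fraction = expected read count / number of reads.
        new_theta = {g: expected[g] / len(read_hits) for g in genes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Six reads unique to gene A, two unique to gene B, four shared by both.
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
fractions = em_allocate(reads)
```

In this toy input the shared reads end up split roughly 3:1 in favour of gene A, so the estimated fractions approach 0.75 and 0.25. A discard-based method would instead throw the four shared reads away and estimate from the eight unique reads alone.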
Year: 2009 PMID: 20022975 PMCID: PMC2820677 DOI: 10.1093/bioinformatics/btp692
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The graphical model for RNA-Seq data used by our method.
Fractions of reads that are unmappable, map uniquely, map to multiple genes or are filtered in three RNA-Seq datasets
| Dataset | % unmapped | % unique | % multi | % filtered |
|---|---|---|---|---|
| Mouse Real | 46.2 | 44.4 | 9.2 | 0.2 |
| Mouse Sim | 47.6 | 43.2 | 8.7 | 0.6 |
| Maize Sim | 47.5 | 25.0 | 27.1 | 0.4 |
Error of the unique, rescue and em estimated gene expression levels with respect to sample expression values from simulations of mouse and maize RNA-Seq data
| Sample gene expression in NPM (ν) or TPM (τ) | [1, 10) | [10, 10^2) | [10^2, 10^3) | [10^3, 10^4) | [10^4, 10^5) | All | | |
|---|---|---|---|---|---|---|---|---|
| Simulation of mouse RNA-Seq data | ||||||||
| N | 5577 | 5240 | 1028 | 114 | 9 | 11968 | ||
| MPE | 18.9 | 18.7 | 19.1 | 19.9 | 20.7 | 18.8 | ||
| 2.8 | 1.1 | 0.8 | 0.7 | 1.2 | 1.6 | |||
| ν | ||||||||
| EF | 93.9 | 96.2 | 96.5 | 100.0 | 100.0 | 95.2 | ||
| 26.9 | 6.1 | 6.4 | 7.9 | 33.3 | 15.9 | |||
| 18.8 | 2.0 | 0.8 | 0.0 | 0.0 | 9.7 | |||
| N | 6279 | 4025 | 886 | 111 | 15 | 11316 | ||
| MPE | 29.6 | 29.2 | 30.9 | 32.8 | 32.1 | 29.6 | ||
| 12.6 | 6.8 | 6.1 | 5.9 | 5.8 | 8.2 | |||
| τ | ||||||||
| EF | 93.7 | 93.9 | 95.6 | 99.1 | 100.0 | 94.0 | ||
| 79.5 | 73.2 | 72.2 | 69.4 | 66.7 | 76.6 | |||
| 27.8 | 6.2 | 1.1 | 0.0 | 0.0 | 17.7 | |||
| Simulation of maize RNA-Seq data | ||||||||
| N | 8934 | 4737 | 988 | 119 | 14 | 14792 | ||
| MPE | 86.8 | 87.8 | 88.7 | 88.1 | 85.9 | 87.3 | ||
| 11.3 | 3.3 | 0.9 | 0.6 | 0.7 | 6.6 | |||
| ν | 0.4 | |||||||
| EF | 97.3 | 97.3 | 97.5 | 93.3 | 100.0 | 97.3 | ||
| 65.8 | 42.6 | 22.7 | 11.8 | 7.1 | 55.0 | |||
| 40.5 | 16.5 | 6.4 | 2.5 | 21.4 | 30.2 | |||
| N | 9210 | 4931 | 1040 | 113 | 12 | 15306 | ||
| MPE | 86.1 | 84.2 | 85.2 | 80.5 | 96.3 | 85.5 | ||
| 21.3 | 11.8 | 8.9 | 8.5 | 7.7 | 16.0 | |||
| τ | 0.3 | |||||||
| EF | 97.2 | 96.7 | 97.1 | 98.2 | 100.0 | 97.0 | ||
| 89.4 | 88.3 | 85.8 | 82.3 | 91.7 | 88.8 | |||
| 47.5 | 18.8 | 6.1 | 4.4 | 16.7 | 35.1 | |||
Error measures are given for genes at different levels of expression, as well as for all genes with expression at least 1 NPM (ν) or 1 TPM (τ). Bold values indicate that the estimates are significantly (P < 0.05) more accurate, as assessed by a paired Wilcoxon signed rank test. In all but one category, em is significantly more accurate than the others. For the highly expressed category (10^4 to 10^5 NPM) in maize, em actually performs slightly worse in terms of ν EF than rescue. We attribute this oddity to a couple of repetitive genes within the small number of genes (14) in this category.
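The error measures in the table can be illustrated with a short sketch. The definitions are assumptions, since this record does not spell them out: MPE is taken to be the median percent error, and EF the percentage of genes whose percent error exceeds a threshold (5% here, a hypothetical default). The function names and the toy values are illustrative only and are not data from the paper.

```python
from statistics import median


def percent_errors(estimates, truth, min_level=1.0):
    """Percent error per gene, restricted to genes whose true level is
    at least min_level (mirroring the '>= 1 NPM or 1 TPM' cutoff above)."""
    return [
        100.0 * abs(estimates.get(gene, 0.0) - true_level) / true_level
        for gene, true_level in truth.items()
        if true_level >= min_level
    ]


def mpe(errors):
    """Median percent error (assumed meaning of MPE)."""
    return median(errors)


def ef(errors, threshold=5.0):
    """Percentage of genes with percent error above threshold
    (assumed meaning of EF; the 5% threshold is an assumption)."""
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


# Toy values, not taken from the paper.
truth = {"g1": 10.0, "g2": 100.0, "g3": 0.5, "g4": 50.0}
estimated = {"g1": 12.0, "g2": 101.0, "g3": 2.0, "g4": 40.0}

errors = percent_errors(estimated, truth)  # g3 is below the cutoff and skipped
summary = {"MPE": mpe(errors), "EF": ef(errors)}
```

With these toy values the three retained genes have percent errors of 20, 1 and 20, so the MPE is 20 and two of the three genes exceed the 5% threshold. The paired significance test mentioned above is not reproduced here.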
Fig. 2. Gene expression estimation accuracy varies with read length given fixed base throughput (T). The curves are (1) mouse liver, T = 375 × 10^6, (2) mouse liver, T = 750 × 10^6, (3) mouse liver, T = 1.5 × 10^7, (4) mouse brain, T = 750 × 10^6 and (5) maize, T = 750 × 10^6. The τ MPE was calculated with respect to the true expression values for all genes with true level at least 1 TPM.