Roye Rozov, Eran Halperin, Ron Shamir.
Abstract
BACKGROUND: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take into account such population information, and it is an open question whether this information can improve quantification of the individual samples.
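To make "probabilistic allocation of reads to genes" concrete, the following is a minimal sketch of a generic expectation-maximization (EM) allocation of multireads. It is not the MGMR, SEQEM, or RSEM model: the function name `em_multiread_allocation`, the input layout (one list of candidate gene ids per read), and the uniform starting point are assumptions made for illustration, and gene length, sequencing error, and population information are all ignored.

```python
def em_multiread_allocation(read_hits, n_iter=50):
    """Estimate relative gene abundances from reads that may hit several genes.

    read_hits: list of lists; each inner list holds the distinct gene ids
    that one read aligns to (hypothetical input layout, for illustration).
    Returns a dict mapping gene id to its estimated share of the reads.
    """
    genes = sorted({g for hits in read_hits for g in hits})
    # Start from a uniform abundance estimate.
    theta = {g: 1.0 / len(genes) for g in genes}
    n_reads = len(read_hits)

    for _ in range(n_iter):
        # E-step: split each read across its candidate genes in
        # proportion to the current abundance estimates.
        expected = dict.fromkeys(genes, 0.0)
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                expected[g] += theta[g] / total
        # M-step: new abundances are the normalized expected counts.
        theta = {g: expected[g] / n_reads for g in genes}
    return theta


# Toy data: three reads unique to gene A, one unique to gene B,
# and two multireads that align to both.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
theta = em_multiread_allocation(reads)
```

On this toy input the estimates settle near 0.75 for A and 0.25 for B: the uniquely aligned reads pull the two multireads mostly toward A rather than splitting them evenly.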
Year: 2012 PMID: 22537041 PMCID: PMC3358656 DOI: 10.1186/1471-2105-13-S6-S2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A mesh representation of F(α) [equation (15)] showing non-convex behavior. P is a 10 × 2 constant matrix and α is varied on [0:50,0:50]. The case shown is for N = 10, M = 2 (ten samples, two α parameters). Non-convex behavior is demonstrated by the values on the plane defined by α1 = 0.06 on the range [0,50] on the right.
Figure 2. Relative error measured on SEQEM-A and SEQEM-B data sets. MGMR outputs on SEQEM-A and SEQEM-B initializations were compared with SEQEM up to 100 iterations. MGMR outputs were recorded at 1-10, 20, 30, 40, 50 and 100 iterations. The first few iterations have been trimmed to allow a compact presentation.
MGMR vs. SEQEM error at 100 iterations on SEQEM-A and SEQEM-B data sets
| | SEQEM-A sampling | | | | SEQEM-B sampling | | | |
|---|---|---|---|---|---|---|---|---|
| | SEQEM | | MGMR | | SEQEM | | MGMR | |
| Measure | Error | SD | Error | SD | Error | SD | Error | SD |
| E | 1.27 | 1 × 10⁻² | 1.03 | 0.14 | 1.50 | 0.70 | 0.82 | 6 × 10⁻³ |
| χ2 | 0.66 | 2 × 10⁻³ | 0.22 | 4 × 10⁻³ | 0.69 | 0.05 | 0.27 | 1 × 10⁻⁴ |
| KL | 0.29 | 7 × 10⁻⁴ | 0.14 | 1 × 10⁻⁴ | 0.18 | 2 × 10⁻⁴ | 0.17 | 1 × 10⁻⁴ |
These data sets were derived from SEQEM and MGMR(SEQEM) estimates, respectively, on 20 YRI samples. (E: relative error rate; χ2: Chi-squared error; KL: Kullback-Leibler divergence; SD: standard deviation)
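For readers unfamiliar with the KL measure, the sketch below computes the standard Kullback-Leibler divergence between two expression profiles. This record does not state the argument order, normalization, or logarithm base used in the article, so the choices here (true profile first, estimate second, natural logarithm, both vectors rescaled to sum to 1) are assumptions, and `kl_divergence` is a hypothetical helper name.

```python
import math


def kl_divergence(p, q):
    """Standard KL divergence D(p || q) with the natural logarithm.

    p, q: equal-length sequences of non-negative values; each is rescaled
    to sum to 1 before comparison (an assumption made for this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / sum_p, q_i / sum_q
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0.0:
            return math.inf  # q gives zero mass where p does not
        total += p_i * math.log(p_i / q_i)
    return total


# Identical profiles give zero divergence; differing ones give a positive value.
same = kl_divergence([1, 1], [1, 1])
diff = kl_divergence([0.5, 0.5], [0.25, 0.75])
```

The measure is asymmetric, so swapping the two arguments generally changes the value.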
MGMR vs. RSEM error at 100 iterations on RSEM-A and RSEM-B data sets
| | RSEM-A sampling | | | | RSEM-B sampling | | | |
|---|---|---|---|---|---|---|---|---|
| | RSEM | | MGMR | | RSEM | | MGMR | |
| Measure | Error | SD | Error | SD | Error | SD | Error | SD |
| E | 0.1 | 1 × 10⁻³ | 0.69 | 1 × 10⁻³ | 1.0 | 1 × 10⁻⁴ | 0.61 | 1 × 10⁻³ |
| χ2 | 0.02 | 6 × 10⁻⁴ | 1.25 | 0.01 | 0.02 | 9 × 10⁻⁴ | 0.58 | 3 × 10⁻⁴ |
| KL | 1.5 | 0.22 | 0.6 | 1 × 10⁻³ | 0.8 | 0.11 | 0.38 | 6 × 10⁻⁴ |
E: relative error rate; χ2: Chi-squared error; KL: Kullback-Leibler divergence; SD: standard deviation
Proportion of genes for which MGMR improves estimates on different data sets
| | SEQEM-A | SEQEM-B | RSEM-A | RSEM-B |
|---|---|---|---|---|
| Proportion | 104/285 | 78/285 | 126/524 | 173/524 |
| % | 36.5 | 27.3 | 24.0 | 33.0 |
Proportions of regions (genes for SEQEM, transcripts for RSEM) for which MGMR has a lower average relative error than the method it is compared with.