Hyunsoo Kim, Yingtao Bi, Sharmistha Pal, Ravi Gupta, Ramana V Davuluri.
Abstract
BACKGROUND: mRNA-Seq technology has revolutionized transcriptomics by enabling the identification and quantification of transcripts not only at the gene level but also at the isoform level. Estimating the expression levels of transcript isoforms from mRNA-Seq data is challenging because of constitutive exons, which are shared by all isoforms of a gene, so reads from them cannot be assigned to a single isoform.
Year: 2011 PMID: 21794104 PMCID: PMC3180389 DOI: 10.1186/1471-2105-12-305
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Algorithm and exon slices. (a) Algorithm flow chart. (b) Exon slices are determined by the genomic structures of transcript isoforms with overlapping exons, and a lower weight is applied to shorter exon slices. This example shows four exon slices (s1-s4) obtained from three transcript isoforms. The observed RPKM of the smallest exon slice (s3) was below 10 because of its short length.
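The slicing rule in panel (b) can be sketched in a few lines of Python. This is a minimal illustration, not the IsoformEx code: it assumes half-open [start, end) exon coordinates and treats a slice as a contiguous stretch of exonic sequence whose set of containing isoforms does not change. The helper exon_slices and the isoforms T1-T3 are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def exon_slices(isoforms: Dict[str, List[Interval]]) -> List[Tuple[Interval, FrozenSet[str]]]:
    """Split the exons of overlapping isoforms into slices of constant membership.

    Hypothetical helper: every exon start or end becomes a cut point, so no exon
    boundary falls strictly inside a slice.
    """
    cuts = sorted({p for exons in isoforms.values() for start, end in exons for p in (start, end)})
    slices = []
    for left, right in zip(cuts, cuts[1:]):
        # An isoform contains this piece iff one of its exons spans it entirely.
        members = frozenset(
            name
            for name, exons in isoforms.items()
            if any(start <= left and right <= end for start, end in exons)
        )
        if members:  # pieces covered by no exon (introns) are not slices
            slices.append(((left, right), members))
    return slices


# Hypothetical block of three isoforms sharing parts of two exons.
block = {
    "T1": [(100, 200), (300, 400)],
    "T2": [(100, 200), (350, 400)],
    "T3": [(190, 200), (300, 400)],
}
for (left, right), members in exon_slices(block):
    print(f"[{left}, {right}) length={right - left} isoforms={sorted(members)}")
```

For this toy block the output lists four slices; the 10-bp slice [190, 200), shared by all three isoforms, is the kind of short slice whose observed RPKM the caption describes as unreliable.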
Figure 2. RPKM values of transcript isoforms of ZNF580 and ZNF581 in the normal breast cell line (HME) and the breast cancer cell line (MCF-7). This transcript block contains six overlapping transcripts and eight exon slices. The approximate RPKM values of the exon slices in the MCF-7 cell line were α(s1) ≈ 10, α(s2) ≈ 8, α(s3) ≈ 14, α(s4) ≈ 30, α(s5) ≈ 20, α(s6) ≈ 0, α(s7) ≈ 45, and α(s8) ≈ 60, where α(·) is the RPKM value of an exon slice (for example, α(s1) and α(s8) are the RPKM values of the first and eighth exon slices). Although the sixth exon slice can be expressed from uc002qln.1 and uc002qlq.1, its observed RPKM was close to 0 because the slice is very short. To handle this small-exon effect (the observed RPKM of a very short exon slice is usually unreliable), we applied a lower weight to such slices.
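The caption states only that short exon slices receive a lower weight. One way such weights could enter an isoform-level fit is sketched below, under three assumptions that are not taken from the source: each slice's RPKM is approximately the sum of the RPKMs of the isoforms that contain it, the fit is a weighted non-negative least-squares problem, and the weight grows linearly with slice length up to 100 bp (the hypothetical slice_weight). The membership matrix and RPKM values are a made-up toy block, not the ZNF580/ZNF581 data.

```python
from typing import List


def slice_weight(length_bp: int, full_weight_len: int = 100) -> float:
    """Hypothetical weight: 1.0 for slices >= full_weight_len bp, linearly less below."""
    return min(1.0, length_bp / full_weight_len)


def weighted_nnls(A: List[List[float]], b: List[float], w: List[float], sweeps: int = 500) -> List[float]:
    """Minimise sum_s w[s] * (b[s] - sum_j A[s][j] * x[j])**2 subject to x >= 0.

    Cyclic coordinate descent: each update is the exact minimiser of the objective
    in x[j] with the other coordinates fixed, clipped at zero.
    """
    n_slices, n_iso = len(A), len(A[0])
    x = [0.0] * n_iso
    for _ in range(sweeps):
        for j in range(n_iso):
            denom = sum(w[s] * A[s][j] ** 2 for s in range(n_slices))
            if denom == 0.0:
                continue  # isoform j touches no weighted slice; leave it at 0
            resid = [b[s] - sum(A[s][k] * x[k] for k in range(n_iso)) for s in range(n_slices)]
            step = sum(w[s] * A[s][j] * resid[s] for s in range(n_slices)) / denom
            x[j] = max(0.0, x[j] + step)
    return x


# Hypothetical block: A[s][j] = 1 if isoform j contains slice s.
A = [[1, 0], [1, 1], [0, 1], [1, 1]]
alpha = [10.0, 40.0, 30.0, 0.0]  # observed slice RPKMs; the 10-bp slice s4 reads ~0
lengths = [200, 300, 250, 10]

print(weighted_nnls(A, alpha, [1.0] * 4))                           # equal weights
print(weighted_nnls(A, alpha, [slice_weight(n) for n in lengths]))  # short slice down-weighted
```

With equal weights the fit settles near (2, 22); down-weighting the 10-bp slice moves it to about (8.75, 28.75), much closer to the (10, 30) that the three longer slices imply on their own. The dependency-free coordinate-descent solver is only a convenience; scipy.optimize.nnls applied to rows scaled by the square roots of the weights solves the same problem.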
Performance comparison on the simulated mRNA-Seq data
| Algorithms | Condition | IsoformEx | | |
|---|---|---|---|---|
| Number of estimated transcripts (n) | vest ≥ 0 | 55416 | 55441 | 55441 |
| | vest > 0.01 | 35064 | 25967 | 25839 |
| Error (mean absolute difference between ptrue and pest) | vest ≥ 0 | 8.41 × 10⁻⁶ | 1.02 × 10⁻⁵ | 9.23 × 10⁻⁶ |
| | vest > 0.01 | 1.32 × 10⁻⁵ | 2.15 × 10⁻⁵ | 1.96 × 10⁻⁵ |
| r = Corr(vtrue, vest) | vest ≥ 0 | 0.921 | 0.839 | 0.913 |
| | vest > 0.01 | 0.920 | 0.838 | 0.912 |
Estimation error and correlation coefficient between estimated expression levels (vest) and known true expression levels (vtrue) for our simulated dataset, computed either over all estimated transcripts (vest ≥ 0) or over only the expressed transcripts (vest > 0.01). ptrue(i) denotes the i-th element of the proportion vector of true expression values (ptrue = vtrue/∑ vtrue), and pest(i) denotes the i-th element of the proportion vector of estimated expression values (pest = vest/∑ vest). The error is defined as the mean absolute difference between the true and estimated proportion vectors, that is, (1/n) ∑ᵢ |ptrue(i) − pest(i)|.
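The error and correlation described in this footnote can be computed as follows. This is a sketch with toy numbers; it assumes that, for the vest > 0.01 rows, the proportion vectors are renormalized over the retained transcripts only, which the footnote does not state explicitly. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def proportions(v: List[float]) -> List[float]:
    total = sum(v)
    return [x / total for x in v]


def proportion_error(v_true: List[float], v_est: List[float]) -> float:
    """Mean absolute difference between the true and estimated proportion vectors."""
    p_true, p_est = proportions(v_true), proportions(v_est)
    return sum(abs(a - b) for a, b in zip(p_true, p_est)) / len(p_true)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient r = Corr(x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(
    v_true: List[float], v_est: List[float], threshold: Optional[float] = None
) -> Tuple[int, float, float]:
    """Return (n, error, r), optionally keeping only transcripts with v_est > threshold.

    Assumption: proportions are renormalized over the kept transcripts.
    """
    if threshold is not None:
        kept = [(t, e) for t, e in zip(v_true, v_est) if e > threshold]
        v_true, v_est = [t for t, _ in kept], [e for _, e in kept]
    return len(v_est), proportion_error(v_true, v_est), pearson(v_true, v_est)


# Hypothetical toy values, not the paper's simulation.
v_true = [10.0, 20.0, 30.0, 0.0]
v_est = [12.0, 18.0, 30.0, 0.005]
print(evaluate(v_true, v_est))        # all transcripts (v_est >= 0)
print(evaluate(v_true, v_est, 0.01))  # expressed transcripts only (v_est > 0.01)
```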
qRT-PCR validation in human breast cell lines for IsoformEx
| Transcript | qRT-PCR poly(A) RNA: HME | qRT-PCR poly(A) RNA: MCF-7 | qRT-PCR poly(A) RNA: FCb | IsoformEx: HME | IsoformEx: MCF-7 | IsoformEx: FCb | HME | MCF-7 | FCb |
|---|---|---|---|---|---|---|---|---|---|
| uc002cvt.2 | 234.7 | 302.3 | 0.4 | 27.4 | 43.9 | 0.7 | 22.3 | 33.2 | 0.6 |
| uc002cvs.1 | 423.5 | 595.4 | 0.5 | 40.9 | 56.0 | 0.5 | 0.8 | 1.6 | 1.0 |
| FCw | 0.9 | 1.0 | | 0.6 | 0.4 | | -4.8† | -4.4† | |
| uc002qlq.1 | 621.8 | 755.9 | 0.3 | 8.3 | 23.5 | 1.5 | 8.1 | 20.0 | 1.3 |
| uc002qlp.1 | 277.9 | 381.3 | 0.5 | 7.7 | 14.4 | 0.9 | 2.2 | 5.3 | 1.3 |
| FCw | -1.2 | -1.0 | | -0.1 | -0.7 | | -1.9 | -1.9 | |
| uc002xmn.1 | 10.7 | 530.0 | 5.6 | 1.8 | 14.7 | 3.0 | 0.0 | 12.0 | 6.9‡ |
| uc002xmo.1 | 8.1 | 189.6 | 4.5 | 0.5 | 8.4 | 4.2 | 0.0 | 4.2 | 5.4‡ |
| FCw | -0.4 | -1.5 | | -2.0 | -0.8 | | 0.0‡ | -1.5 | |
| uc003ngr.1 | 12.4 | 317.8 | 4.7 | 0.7 | 6.9 | 3.3 | 4.3 | 55.9 | 3.7 |
| uc003ngs.1 | 538.2 | 19207.9 | 5.2 | 10.9 | 136.3 | 3.6 | 0.9 | 16.2 | 4.1 |
| FCw | 5.4 | 5.9 | | 4.0 | 4.3 | | -2.2† | -1.8† | |
FCw: log2(a/b), where a and b are the expression levels of the second and first transcripts of a block (in the order listed), respectively, within the same cell line. FCb: log2(a/b), where a and b are the expression levels of a transcript in the MCF-7 and HME cell lines, respectively. †The fold-change direction of the Cufflinks estimate was erroneously flipped (opposite to qRT-PCR). ‡When b = 0, the fold change was computed as log2((a + 0.1)/(b + 0.1)).
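The FCw, FCb, and pseudocount rules in this footnote translate directly into code. In this sketch the function names are hypothetical, and applying the 0.1 pseudocount when a = 0 as well as when b = 0 is an added assumption (the footnote covers only b = 0); the example inputs are values from the table above.

```python
import math

PSEUDOCOUNT = 0.1  # pseudocount stated in the table footnote


def log2_fc(a: float, b: float) -> float:
    """log2(a/b), switching to log2((a + 0.1)/(b + 0.1)) when a zero value appears.

    The footnote states the pseudocount for b == 0; also applying it when a == 0
    (to avoid log2(0)) is an assumption of this sketch.
    """
    if a == 0 or b == 0:
        return math.log2((a + PSEUDOCOUNT) / (b + PSEUDOCOUNT))
    return math.log2(a / b)


def fc_between(mcf7: float, hme: float) -> float:
    """FCb: one transcript, MCF-7 expression relative to HME."""
    return log2_fc(mcf7, hme)


def fc_within(second: float, first: float) -> float:
    """FCw: second transcript of a block relative to the first, same cell line."""
    return log2_fc(second, first)


print(round(fc_between(302.3, 234.7), 1))  # 0.4  (uc002cvt.2, qRT-PCR)
print(round(fc_between(12.0, 0.0), 1))     # 6.9  (uc002xmn.1, third column group, b = 0)
print(round(fc_within(423.5, 234.7), 1))   # 0.9  (uc002cvs.1 vs uc002cvt.2, qRT-PCR in HME)
```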
Figure 3. Agreement between estimated RPKM values and qRT-PCR validation for four transcript blocks in breast cell lines. We selected four transcript blocks, with two transcripts per block, and compared qRT-PCR measurements with the transcript expression levels estimated by several methods in the MCF-7 cell line. An 'o' mark indicates that the method correctly predicted the direction of the fold change within the HME cell line; otherwise, an 'x' mark is used. Cufflinks*: Cufflinks run with the min-isoform-fraction parameter set to 0.0 to recover very lowly expressed transcripts. IsoformEx and Cufflinks* used the same Bowtie output files.
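The 'o'/'x' rule in this caption amounts to a sign comparison between the reference and estimated fold changes. A minimal sketch, applied to the HME FCw values from the IsoformEx table above; the function names are hypothetical, counting a zero fold change as a wrong direction is an assumption, and the sketch is not claimed to reproduce the marks drawn in the figure.

```python
from typing import List


def direction_mark(fc_reference: float, fc_estimate: float) -> str:
    """Return 'o' if the estimate has the same non-zero sign as the reference, else 'x'.

    Treating a zero fold change as "no direction" (and thus 'x') is an assumption.
    """
    if fc_reference * fc_estimate > 0:
        return "o"
    return "x"


def marks(reference: List[float], estimate: List[float]) -> List[str]:
    return [direction_mark(r, e) for r, e in zip(reference, estimate)]


# HME FCw values for the four transcript blocks, copied from the IsoformEx table.
qpcr_hme = [0.9, -1.2, -0.4, 5.4]
isoformex_hme = [0.6, -0.1, -2.0, 4.0]
third_group_hme = [-4.8, -1.9, 0.0, -2.2]

print(marks(qpcr_hme, isoformex_hme))    # ['o', 'o', 'o', 'o']
print(marks(qpcr_hme, third_group_hme))  # ['x', 'o', 'x', 'x']
```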
qRT-PCR validation in human breast cell lines for other methods
| Transcript | qRT-PCR poly(A) RNA: HME | qRT-PCR poly(A) RNA: MCF-7 | qRT-PCR poly(A) RNA: FCb | HME | MCF-7 | FCb | HME | MCF-7 | FCb |
|---|---|---|---|---|---|---|---|---|---|
| uc002cvt.2 | 234.7 | 302.3 | 0.4 | 1378.0 | 1782.1 | 0.4 | 22.3 | 33.2 | 0.6 |
| uc002cvs.1 | 423.5 | 595.4 | 0.5 | 47.0 | 82.0 | 0.8 | 0.8 | 1.7 | 1.1 |
| FCw | 0.9 | 1.0 | | -4.9† | -4.4† | | -4.8† | -4.3† | |
| uc002qlq.1 | 621.8 | 755.9 | 0.3 | 358.9 | 687.6 | 0.9 | 8.6 | 20.0 | 1.2 |
| uc002qlp.1 | 277.9 | 381.3 | 0.5 | 148.1 | 245.0 | 0.7 | 2.3 | 5.4 | 1.2 |
| FCw | -1.2 | -1.0 | | -1.3 | -1.5 | | -1.9 | -1.9 | |
| uc002xmn.1 | 10.7 | 530.0 | 5.6 | 10.0 | 435.0 | 5.4 | 0.0 | 11.8 | 6.9‡ |
| uc002xmo.1 | 8.1 | 189.6 | 4.5 | 9.7 | 250.9 | 4.7 | 0.0 | 4.2 | 5.4‡ |
| FCw | -0.4 | -1.5 | | 0.0 | -0.8 | | 0.0‡ | -1.5 | |
| uc003ngr.1 | 12.4 | 317.8 | 4.7 | 77.5 | 566.1 | 2.9 | 4.4 | 55.9 | 3.7 |
| uc003ngs.1 | 538.2 | 19207.9 | 5.2 | 21.7 | 499.0 | 4.5 | 0.9 | 16.2 | 4.2 |
| FCw | 5.4 | 5.9 | | -1.8† | -0.2† | | -2.4† | -1.8† | |
FCw: log2(a/b), where a and b are the expression levels of the second and first transcripts of a block (in the order listed), respectively, within the same cell line. FCb: log2(a/b), where a and b are the expression levels of a transcript in the MCF-7 and HME cell lines, respectively. †The estimated fold-change direction was erroneously flipped (opposite to qRT-PCR). ‡When b = 0, the fold change was computed as log2((a + 0.1)/(b + 0.1)).