Peter E Larsen, Frank R Collart.
Abstract
BACKGROUND: Deep RNA sequencing, the application of next-generation sequencing technology to generate a comprehensive profile of the messenger RNA present in a set of biological samples, provides unprecedented resolution into the molecular foundations of biological processes. By aligning short-read RNA sequence data to a set of gene models, expression levels for all of the genes and gene variants in a biological sample can be calculated. However, accurate determination of gene model expression from deep RNA sequencing is hindered by the presence of ambiguously aligning short-read sequences.
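To make the ambiguity concrete, here is a minimal Python sketch of two simple ways to turn read alignments into per-gene counts and RPKM values: assigning each multiply aligning read to one of its candidate gene models at random, or keeping only uniquely aligning reads. The alignment dictionary, gene names, and gene lengths are hypothetical toy data, not output from ‘Bowtie’, and RPKM is computed with the common definition reads × 10^9 / (gene length in bp × total mapped reads).

```python
import random
from collections import Counter

# Hypothetical toy input: read id -> gene models the read aligns to equally well.
ALIGNMENTS = {
    "r1": ["geneA"],
    "r2": ["geneA", "geneB"],  # ambiguous (multiply aligning) read
    "r3": ["geneB"],
    "r4": ["geneA", "geneB"],  # ambiguous (multiply aligning) read
    "r5": ["geneC"],
}
GENE_LENGTHS = {"geneA": 1000, "geneB": 2000, "geneC": 500}  # bp, hypothetical


def count_unique(alignments):
    """Count only reads with a single, unambiguous alignment."""
    return Counter(genes[0] for genes in alignments.values() if len(genes) == 1)


def count_random(alignments, seed=0):
    """Assign every read, picking one candidate at random when it is ambiguous."""
    rng = random.Random(seed)
    return Counter(rng.choice(genes) for genes in alignments.values())


def rpkm(counts, lengths):
    """Reads per kilobase of gene model per million mapped reads."""
    total_mapped = sum(counts.values())
    return {
        gene: counts.get(gene, 0) * 1e9 / (length * total_mapped)
        for gene, length in lengths.items()
    }


unique_counts = count_unique(ALIGNMENTS)
random_counts = count_random(ALIGNMENTS)
print(rpkm(unique_counts, GENE_LENGTHS))
print(rpkm(random_counts, GENE_LENGTHS))
```

Even on this toy input the two strategies disagree: the random strategy credits geneA and geneB with the ambiguous reads r2 and r4 in a seed-dependent way, while the unique strategy discards them entirely.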
Year: 2012 PMID: 22676709 PMCID: PMC3494516 DOI: 10.1186/1756-0500-5-275
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Detection of gene expression
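| Reads (millions) | Accuracy: Random | Accuracy: Unique | Accuracy: BowStrap | % False Positive: Random | % False Positive: Unique | % False Positive: BowStrap |
|---|---|---|---|---|---|---|
| 50 | 0.97 | 0.91 | 0.98 | 2.70 | 0.0 | 0.43 |
| 25 | 0.97 | 0.91 | 0.97 | 2.67 | 0.0 | 0.21 |
| 10 | 0.97 | 0.90 | 0.92 | 2.72 | 0.0 | 0.08 |
| 1 | 0.97 | 0.88 | 0.74 | 3.44 | 0.0 | 0.00 |
| 0.5 | 0.97 | 0.87 | 0.66 | 3.40 | 0.0 | 0.00 |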
Three ‘Bowtie’-based methods for estimating gene model expression were compared, using known gene model expression values from synthetic RNAseq datasets of 50, 25, 10, 1, and 0.5 million total 46-mer sequence reads. “Random” uses the default random assignment of multiply aligning reads in ‘Bowtie’. “Unique” uses only reads with a single, unambiguous alignment location. “Accuracy” is the proportion of gene models correctly identified as either expressed or absent. For “Random” and “Unique”, a gene model is called expressed when its RPKM value is greater than 0. For “BowStrap”, a gene model is called expressed when its Benjamini-Hochberg corrected p-value is < 0.05. “% False Positive” is the percentage of gene models identified as expressed that are absent from the synthetic RNAseq data.
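The detection rules and scoring metrics in the caption above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the per-gene p-values, RPKM values, and ground truth are made-up toy values; the Benjamini-Hochberg adjustment is a standard textbook implementation, not BowStrap's own code; and “% False Positive” is read as the share of expressed calls that are absent from the synthetic data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest, keeping values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def detection_metrics(called, truth):
    """Accuracy and % false positive for expressed/absent calls (dicts gene -> bool)."""
    accuracy = sum(called[g] == truth[g] for g in truth) / len(truth)
    positives = [g for g in truth if called[g]]
    false_pos = sum(not truth[g] for g in positives)
    pct_false_positive = 100.0 * false_pos / len(positives) if positives else 0.0
    return accuracy, pct_false_positive


# Hypothetical toy data for four gene models.
genes = ["g1", "g2", "g3", "g4"]
pvalues = [0.001, 0.01, 0.03, 0.5]
truth = {"g1": True, "g2": True, "g3": False, "g4": False}

# BowStrap-style rule: expressed if the corrected p-value is < 0.05.
qvalues = dict(zip(genes, benjamini_hochberg(pvalues)))
called = {g: q < 0.05 for g, q in qvalues.items()}

# Random/Unique-style rule: expressed if RPKM > 0 (hypothetical RPKM values).
rpkm_values = {"g1": 12.5, "g2": 3.0, "g3": 0.9, "g4": 0.7}
called_rpkm = {g: v > 0 for g, v in rpkm_values.items()}

print(detection_metrics(called, truth))
print(detection_metrics(called_rpkm, truth))
```

In this toy case the RPKM > 0 rule calls every gene model expressed, so any read placed on an absent gene model becomes a false positive; a significance threshold is stricter by design.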
Spearman’s correlation between calculated gene expression and known gene model expression
| Reads (millions) | Random | Unique | BowStrap |
|---|---|---|---|
| 50 | 0.91 | 0.80 | 1.00 |
| 25 | 0.90 | 0.80 | 1.00 |
| 10 | 0.90 | 0.81 | 0.99 |
| 1 | 0.89 | 0.81 | 0.96 |
| 0.5 | 0.87 | 0.81 | 0.94 |
Spearman’s rank correlation was calculated between the known gene model expression values in the synthetic read data and the values estimated by each of three ‘Bowtie’-based methods, using synthetic RNAseq datasets of 50, 25, 10, 1, and 0.5 million total 46-mer sequence reads.
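For readers who want to reproduce this kind of comparison, a minimal Python sketch of Spearman's rank correlation is shown below: it ranks both series (tied values share their average rank) and takes the Pearson correlation of the ranks. The expression values are hypothetical; in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


known = [10.0, 20.0, 30.0, 40.0, 50.0]      # hypothetical known expression
estimated = [12.0, 18.0, 35.0, 33.0, 60.0]  # hypothetical estimates
print(spearman(known, estimated))  # 0.9
```

Average ranks matter here because methods such as “Unique” can leave many gene models at exactly zero expression, producing large groups of tied values.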
Figure 1. MA plots for different gene expression calculation methods. The 25 M read dataset is shown; results for the other dataset sizes are similar. ‘M’ is the log2 of the calculated gene model expression level divided by the known gene model expression level. ‘A’ is the log2 of the average of the calculated and known gene model expression levels. Each point in the scatter plot is the result for a single gene model.
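The per-gene M and A coordinates defined in the Figure 1 caption can be computed as in the minimal Python sketch below. The gene names and values are hypothetical, and skipping gene models whose calculated or known expression is zero (log2 of 0 is undefined) is an assumption, since the caption does not say how zeros were handled. A follows the caption's definition (log2 of the arithmetic mean); some MA-plot conventions instead average the two log2 values.

```python
import math


def ma_values(calculated, known):
    """Return (gene, M, A) per gene model, following the Figure 1 definitions."""
    points = []
    for gene, true_val in known.items():
        est = calculated.get(gene, 0.0)
        if est <= 0 or true_val <= 0:
            continue  # assumption: log2 is undefined at 0, so the gene is skipped
        m = math.log2(est / true_val)        # log2 ratio, calculated / known
        a = math.log2((est + true_val) / 2)  # log2 of the mean of the two levels
        points.append((gene, m, a))
    return points


known = {"g1": 4.0, "g2": 8.0, "g3": 0.0}       # hypothetical known levels
calculated = {"g1": 8.0, "g2": 8.0, "g3": 1.0}  # hypothetical estimates
for gene, m, a in ma_values(calculated, known):
    print(f"{gene}: M={m:.3f}, A={a:.3f}")
```

A perfect estimator places every point on the line M = 0; points above or below it are over- or under-estimated gene models.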
Runtime of BowStrap
| Reads (millions) | Total Core Hours |
|---|---|
| 50 | 35.5 |
| 25 | 9.1 |
| 10 | 6.9 |
| 1 | 0.8 |
| 0.5 | 0.2 |
BowStrap was used to generate gene model expression values from synthetic RNAseq datasets of 50, 25, 10, 1, and 0.5 million total 46-mer sequence reads. “Total Core Hours” is the total number of core hours required to perform 1000 bootstrap-style iterations. Running BowStrap multithreaded reduces the elapsed time approximately linearly with the number of processors used, but the memory required increases linearly.
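The scaling described in the runtime caption supports a back-of-the-envelope estimate. The Python sketch below assumes ideal linear scaling (elapsed hours = core hours / threads, memory = per-thread memory × threads); the function name and the per-thread memory figure are hypothetical, since the caption does not report memory use.

```python
# Total core hours for 1000 bootstrap-style iterations, keyed by dataset size
# in millions of reads (values from the BowStrap runtime table).
CORE_HOURS = {50: 35.5, 25: 9.1, 10: 6.9, 1: 0.8, 0.5: 0.2}


def estimate_parallel_run(core_hours, n_threads, mem_per_thread_gb):
    """Idealized elapsed time and memory for a multithreaded run (hypothetical helper)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    elapsed_hours = core_hours / n_threads     # assumes ideal linear speedup
    memory_gb = mem_per_thread_gb * n_threads  # memory grows linearly with threads
    return elapsed_hours, memory_gb


for threads in (1, 4, 8):
    # mem_per_thread_gb=2.0 is a hypothetical value, not a measured one.
    hours, mem = estimate_parallel_run(CORE_HOURS[50], threads, mem_per_thread_gb=2.0)
    print(f"50 M reads, {threads} threads: ~{hours:.2f} h, ~{mem:.0f} GB")
```

Real speedups are usually somewhat below ideal, so the elapsed-time figures should be treated as lower bounds.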