Alyssa C Frazee, Ben Langmead, Jeffrey T Leek.
Abstract
BACKGROUND: RNA sequencing is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA sequencing data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to the development of new statistical methods is the lack of RNA sequencing datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-sequencing datasets.
DESCRIPTION: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models, and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values.
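The overlap-and-tabulate step described above can be illustrated with a minimal sketch. It is not Myrna's implementation: each gene model is simplified to a single half-open interval, the input layout (`genes`, `alignments`) is hypothetical, and dropping reads that overlap more than one gene is an assumption. A linear scan over genes is fine for toy input; a real pipeline would likely use an interval index.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def count_table(genes, alignments, samples):
    """Tabulate aligned reads into a gene-by-sample count table.

    genes:      dict gene_id -> (chrom, start, end), half-open (hypothetical layout)
    alignments: iterable of (sample, chrom, start, end), one per uniquely aligned read
    samples:    column order for the output table
    Returns (gene_ids, matrix) where matrix[i][j] counts reads for gene i in sample j.
    """
    gene_ids = sorted(genes)
    col = {s: j for j, s in enumerate(samples)}
    table = {g: [0] * len(samples) for g in gene_ids}
    for sample, chrom, start, end in alignments:
        hits = [
            g for g in gene_ids
            if genes[g][0] == chrom and overlaps(start, end, genes[g][1], genes[g][2])
        ]
        # Assumption: a read overlapping more than one gene is ambiguous and dropped.
        if len(hits) == 1:
            table[hits[0]][col[sample]] += 1
    return gene_ids, [table[g] for g in gene_ids]


genes = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 150, 300),
    "geneC": ("chr2", 0, 50),
}
# 35 bp alignments, matching the truncation length used for the table below.
alignments = [
    ("s1", "chr1", 100, 135),  # geneA only
    ("s1", "chr1", 160, 195),  # geneA and geneB: dropped
    ("s2", "chr1", 250, 285),  # geneB only
    ("s2", "chr2", 10, 45),    # geneC only
    ("s2", "chr3", 0, 35),     # no gene
]
ids, matrix = count_table(genes, alignments, ["s1", "s2"])
for gene, row in zip(ids, matrix):
    print(gene, row)
# geneA [1, 0]
# geneB [0, 1]
# geneC [0, 1]
```

Each column of the resulting matrix corresponds to one sample, the same gene-by-sample shape as the ReCount count tables.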
Year: 2011 PMID: 22087737 PMCID: PMC3229291 DOI: 10.1186/1471-2105-12-449
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Datasets available for download (reads truncated to 35 bp)
| Study | Organism | Number of bio reps | Number of reads |
|---|---|---|---|
| BodyMap | human | 19 | 2,197,622,796 |
| Cheung | human | 41 | 834,584,950 |
| Core | human | 2 | 8,670,342 |
| Gilad | human | 6 | 41,356,738 |
| MAQC | human | 14 | 71,970,164 |
| Montgomery | human | 60 | *886,468,054 |
| Pickrell | human | 69 | *886,468,054 |
| Sultan | human | 4 | 6,573,643 |
| Wang | human | 22 | 223,929,919 |
| Katz | mouse | 4 | 14,368,471 |
| Mortazavi | mouse | 3 | 61,732,881 |
| Trapnell | mouse | 4 | 111,376,152 |
| Yang | mouse | 1 | 27,883,862 |
| Bottomly | mouse | 21 | 343,445,340 |
| Nagalakshmi | yeast | 4 | 7,688,602 |
| Hammer | rat | 8 | 158,178,477 |
| modENCODE - worm | worm | 46 | 1,451,119,823 |
| modENCODE - fly | fly | 147 | 2,278,788,557 |
The "Number of bio reps" column gives the number of individual samples in each dataset, and the "Number of reads" column gives the number of uniquely aligned reads used to create the count table. A version of this table, and an analogous table for the datasets created without Myrna's truncate option, are available on the ReCount website.
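As a consistency check against the abstract, the table's columns can be totalled. This is a minimal sketch; the `starred` flag and the rule of counting the shared starred read count only once are assumptions, since the table does not explain the asterisk on the Montgomery and Pickrell rows.

```python
# Rows from the table: (study, organism, bio reps, reads, starred).
# "starred" marks the two rows whose read count carries an asterisk.
DATASETS = [
    ("BodyMap", "human", 19, 2_197_622_796, False),
    ("Cheung", "human", 41, 834_584_950, False),
    ("Core", "human", 2, 8_670_342, False),
    ("Gilad", "human", 6, 41_356_738, False),
    ("MAQC", "human", 14, 71_970_164, False),
    ("Montgomery", "human", 60, 886_468_054, True),
    ("Pickrell", "human", 69, 886_468_054, True),
    ("Sultan", "human", 4, 6_573_643, False),
    ("Wang", "human", 22, 223_929_919, False),
    ("Katz", "mouse", 4, 14_368_471, False),
    ("Mortazavi", "mouse", 3, 61_732_881, False),
    ("Trapnell", "mouse", 4, 111_376_152, False),
    ("Yang", "mouse", 1, 27_883_862, False),
    ("Bottomly", "mouse", 21, 343_445_340, False),
    ("Nagalakshmi", "yeast", 4, 7_688_602, False),
    ("Hammer", "rat", 8, 158_178_477, False),
    ("modENCODE - worm", "worm", 46, 1_451_119_823, False),
    ("modENCODE - fly", "fly", 147, 2_278_788_557, False),
]


def totals(rows):
    """Return (total samples, total reads), counting each distinct starred read count once."""
    samples = sum(reps for _, _, reps, _, _ in rows)
    unstarred = sum(reads for _, _, _, reads, starred in rows if not starred)
    # Assumption: identical starred counts describe one combined set of reads.
    shared = sum({reads for _, _, _, reads, starred in rows if starred})
    return samples, unstarred + shared


samples, reads = totals(DATASETS)
print(f"{len(DATASETS)} studies, {samples} samples, {reads:,} reads")
# 18 studies, 475 samples, 8,725,758,771 reads
```

Counting the starred value twice instead gives 9,612,226,825 reads. Either way the totals agree with the abstract's 18 studies, 475 samples, and "over 8 billion reads".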
Figure 1. Histogram of adjusted p-values from differential expression analysis on the 29 samples included in both the Cheung and Montgomery studies. The p-values are from paired t-tests on the 25% of genes with nonzero counts in at least one of the two studies. The peak near zero suggests technical variability between the two studies.
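The per-gene comparison behind Figure 1 can be sketched as: keep a gene if it has a nonzero count in at least one study, then compute a paired t statistic over the matched samples. This minimal sketch assumes each study is a dict mapping gene IDs to count lists in the same sample order (a hypothetical layout). The caption does not say whether counts were normalized or transformed first, so the code uses whatever values it is given. It stops at the t statistic and its n - 1 degrees of freedom; a p-value would then come from the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def passes_filter(counts_a, counts_b):
    """Keep a gene with a nonzero count in at least one of the two studies."""
    return any(c > 0 for c in counts_a) or any(c > 0 for c in counts_b)


def paired_t(x, y):
    """Paired t statistic: mean of the differences divided by its standard error.

    Returns (t, df) with df = n - 1, or None when all differences are identical
    (zero spread), in which case the statistic is undefined or infinite.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        return None
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def paired_t_by_gene(study_a, study_b):
    """Apply the filter and the paired t statistic to every gene present in both studies."""
    results = {}
    for gene in study_a.keys() & study_b.keys():
        a, b = study_a[gene], study_b[gene]
        if not passes_filter(a, b):
            continue
        stat = paired_t(a, b)
        if stat is not None:
            results[gene] = stat
    return results


# Toy data: g2 is all zeros (filtered out); g3 is identical in both studies (zero spread).
study_a = {"g1": [10, 12, 9], "g2": [0, 0, 0], "g3": [5, 7, 6]}
study_b = {"g1": [8, 11, 7], "g2": [0, 0, 0], "g3": [5, 7, 6]}
print(paired_t_by_gene(study_a, study_b))
```

For `g1` the differences are 2, 1, 2, so the statistic is approximately 5.0 with 2 degrees of freedom.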
Figure 2. Histogram of adjusted p-values from analysis of differential expression between the YRI and CEU populations. The p-values are from two-sample t-tests on the 25% of genes with nonzero counts in at least one of the two studies. The peak near zero indicates differential gene expression that may result from either technical or biological variability.
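Both captions report adjusted p-values without naming the adjustment. A common choice is the Benjamini-Hochberg false discovery rate procedure; the sketch below assumes it, which may not be what the authors used. It adjusts a list of p-values and bins them for a histogram; the bin counts are arbitrary choices, not taken from the figures.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def histogram(values, n_bins=20):
    """Count values in [0, 1] into equal-width bins; a value of exactly 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


raw = [0.01, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
print(adj)                        # roughly [0.04, 0.0533, 0.0533, 0.5]
print(histogram(adj, n_bins=10))  # [3, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

A pile-up in the first bins of such a histogram is the "peak near zero" both captions describe.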