Yan Guo¹, Shilin Zhao¹, Chung-I Li², Quanhu Sheng¹, Yu Shyr¹.
Abstract
Sample size and power determination is the first step in designing a successful study, and sample size and power calculations are required in applications for National Institutes of Health (NIH) funding. Methods for sample size and power calculation are well established for traditional biological studies such as mouse model studies, genome-wide association studies (GWAS), and microarray studies. Recent developments in high-throughput sequencing technology have allowed RNAseq to replace microarrays as the technology of choice for high-throughput gene expression profiling. However, sample size and power analysis for RNAseq remains underdeveloped. Here, we present RNAseqPS, an advanced online RNAseq power and sample size calculation tool based on the Poisson and negative binomial distributions. RNAseqPS was built using the Shiny package in R. It provides an interactive graphical user interface that allows users to easily conduct sample size and power analysis for RNAseq experimental design. RNAseqPS can be accessed directly at http://cqs.mc.vanderbilt.edu/shiny/RNAseqPS/.
Keywords: RNAseq; experiment design; power analysis; sample size calculation
Year: 2014 PMID: 25374457 PMCID: PMC4213196 DOI: 10.4137/CIN.S17688
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
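To make the Poisson/negative binomial basis concrete, the derivation below sketches one common way to approximate power for a single gene: a Wald test on the log fold change, with the variance of each log group mean obtained by the delta method. This is a textbook normal approximation offered for illustration, not necessarily the exact method RNAseqPS implements. The symbols are assumptions introduced here: $\mu_0$ and $\mu_1 = \Delta\mu_0$ are the mean counts in the two groups, $\Delta$ is the fold change, $\phi$ is the dispersion, $n$ is the number of samples per group, $\alpha$ is the per-gene significance level, $1-\beta$ is the target power, and $z_q$ is the $q$ quantile of the standard normal distribution $\Phi$.

```latex
\begin{aligned}
\operatorname{Var}(Y) &= \mu + \phi\mu^{2}
  && \text{(negative binomial; } \phi = 0 \text{ gives Poisson)}\\
\operatorname{Var}\bigl(\log \bar{Y}\bigr) &\approx \frac{\mu + \phi\mu^{2}}{n\mu^{2}}
  = \frac{1}{n}\Bigl(\frac{1}{\mu} + \phi\Bigr)
  && \text{(delta method, } n \text{ samples)}\\
\text{power} &\approx \Phi\!\left(\frac{\lvert\log\Delta\rvert\sqrt{n}}{\sqrt{1/\mu_0 + 1/\mu_1 + 2\phi}} - z_{1-\alpha/2}\right)
  && \text{(opposite tail ignored)}\\
n &= \frac{\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\bigl(1/\mu_0 + 1/\mu_1 + 2\phi\bigr)}{(\log\Delta)^{2}}
  && \text{(per group, solving power} = 1-\beta)
\end{aligned}
```

Setting $\phi = 0$ recovers the Poisson model, so under this approximation the two distributions differ only through the $2\phi$ term; in practice $n$ is rounded up to the next integer.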
Figure 1. Example of the graphical interface of RNAseqPS.
Figure 2. Examples of the power curves produced by RNAseqPS.
RNAseqPS input parameters.
| PARAMETERS | LOWER BOUND | UPPER BOUND | INTERVAL | NOTE |
|---|---|---|---|---|
| Sample size | 1 | 500 | 1 | Required when computing power |
| Desired power | 0.8 | 0.95 | 0.05 | Required when computing sample size. Power should be at least 80% |
| Expected fold change | 1.4 | 10 | 0.2 | The expected fold change of differentially expressed genes between the two conditions, based on prior experience. If no prior data are available, RNAseqPS provides a best guess |
| Average reads per gene | 1 | 100 | 10 | This can be computed as R/G, where R is the total number of reads sequenced and G is the total number of genes detected |
| Total number of genes | 100 | 20000 | 100 | This usually depends on the gene transfer format (GTF) file used. A GTF file contains gene annotation information and is required for RNAseq analysis |
| Expected number of differentially expressed genes | 5 | 2000 | 50 | The number of genes you expect to be differentially expressed between the two conditions, also based on prior knowledge. When prior knowledge is unavailable, RNAseqPS provides a best guess |
| Dispersion | 0.1 | 2 | 0.1 | This parameter is used in the negative binomial model |
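The parameters in the table can be combined into a per-group sample size. The sketch below is a minimal illustration under stated assumptions, not RNAseqPS's own code: it uses a Wald-test normal approximation on the log fold change with negative binomial variance μ + φμ² (φ = 0 gives the Poisson case), and it turns the total and expected differentially expressed gene counts into a per-gene significance level through a target false discovery rate. That FDR target (`fdr`) and all function names are hypothetical; the table does not list an FDR parameter.

```python
"""Hedged sketch: per-group sample size for a two-group RNAseq comparison.

Assumptions (not taken from RNAseqPS): a Wald test on the log fold change with
negative binomial variance mu + phi*mu^2, and FDR control via a per-gene alpha
derived from the expected number of DE genes. `fdr` is a hypothetical input.
"""
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def average_reads_per_gene(total_reads: float, genes_detected: int) -> float:
    """Average reads per gene, R / G, as described in the parameter table."""
    return total_reads / genes_detected


def per_gene_alpha(total_genes: int, de_genes: int, power: float, fdr: float) -> float:
    """Per-gene two-sided alpha giving an expected FDR of about `fdr`.

    With m0 = total_genes - de_genes null genes and r1 = de_genes * power
    expected true discoveries, FDR ~= m0*alpha / (m0*alpha + r1).
    Solving for alpha gives alpha = fdr * r1 / (m0 * (1 - fdr)).
    """
    m0 = total_genes - de_genes
    r1 = de_genes * power
    return fdr * r1 / (m0 * (1.0 - fdr))


def sample_size_per_group(fold_change: float, mean_count: float, dispersion: float,
                          power: float, alpha: float) -> int:
    """Smallest n per group reaching `power` under the Wald approximation.

    dispersion = 0 corresponds to the Poisson model.
    """
    mu0 = mean_count                   # mean count in the reference group
    mu1 = fold_change * mean_count     # mean count in the comparison group
    spread = 1.0 / mu0 + 1.0 / mu1 + 2.0 * dispersion
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = _STD_NORMAL.inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * spread / math.log(fold_change) ** 2
    return math.ceil(n)                # round up to a whole number of samples


if __name__ == "__main__":
    # Hypothetical inputs chosen within the table's bounds.
    mu = average_reads_per_gene(total_reads=2_000_000, genes_detected=20_000)
    alpha = per_gene_alpha(total_genes=20_000, de_genes=200, power=0.8, fdr=0.05)
    for phi in (0.0, 0.1, 0.5):  # 0.0 is the Poisson case
        n = sample_size_per_group(fold_change=2.0, mean_count=mu,
                                  dispersion=phi, power=0.8, alpha=alpha)
        print(f"dispersion={phi}: n per group = {n}")
```

Because the 2φ term does not shrink as sequencing depth grows, raising the average reads per gene mainly helps at low counts; once counts are moderate, the dispersion largely determines the required number of samples under this approximation.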