Xuejun Peng, Constance L Wood, Eric M Blalock, Kuey Chu Chen, Philip W Landfield, Arnold J Stromberg.
Abstract
BACKGROUND: Microarray technology has become a very important tool for studying gene expression profiles under various conditions. Biologists often pool RNA samples extracted from different subjects onto a single microarray chip, both to help defray the cost of microarray experiments and to overcome the technical difficulty of obtaining sufficient RNA from a single subject. However, the statistical, technical and financial implications of pooling have not been explicitly investigated.
Year: 2003 PMID: 12823867 PMCID: PMC166151 DOI: 10.1186/1471-2105-4-26
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Relationships among type I error rate, sample size, effect size, and power with and without pooling. Note: n is the number of biological replicates per treatment group; c is the number of gene chips per group; α is the type I error rate; EQ means that samples are pooled with equal contributions; NE means that samples do not contribute equally when pooled together (weights of 0.7, 0.2, and 0.1 assigned randomly within each chip).
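One way to see why unequal contributions (NE) can matter is to compare the variance of a single pooled array under equal and unequal weights. The sketch below assumes a simple hypothetical model that is not taken from the paper: an array reads the weighted average of its pooled subjects' expression levels, subjects vary independently with biological variance `sigma_b2`, and each array adds technical variance `sigma_t2`. Under that model the array variance is `sigma_b2 * sum(w_i**2) + sigma_t2`, which is smallest when the weights are equal. The function name and variance values are illustrative only.

```python
def pooled_array_variance(weights, sigma_b2, sigma_t2):
    """Variance of one array's measurement for a pooled sample.

    Hypothetical model: the array measures a weighted average of the pooled
    subjects' true expression levels (independent, variance sigma_b2 each),
    plus technical noise with variance sigma_t2 added once per array.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("pooling weights must sum to 1")
    # Biological part shrinks by sum(w_i^2); technical part is not averaged away.
    return sigma_b2 * sum(w * w for w in weights) + sigma_t2


# Three subjects per chip, equal (EQ) versus unequal (NE) contributions.
equal = pooled_array_variance([1 / 3, 1 / 3, 1 / 3], sigma_b2=1.0, sigma_t2=0.25)
unequal = pooled_array_variance([0.7, 0.2, 0.1], sigma_b2=1.0, sigma_t2=0.25)
```

With three subjects per chip, the equal-weight sum of squared weights is 1/3, while the 0.7/0.2/0.1 weights give 0.54, so under this model each unequally pooled array is noisier.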
Figure 2. Approximately equivalent power curves under different pooling schemes. Power curves are generated for two-sample t-tests, assuming equal pooling. Legend: n is the number of subjects per treatment group; c is the number of arrays per group. With the type I error rate controlled at 0.05, the five pooling schemes, each using a different combination of subjects and arrays, have approximately equivalent power curves.
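The subjects-versus-arrays trade-off can be approximated in a few lines. This is a sketch under stated assumptions, not the authors' calculation: it assumes each equally pooled array has variance `sigma_b2 / pool_size + sigma_t2` (biological variance averaged over the pool plus one dose of technical variance), uses made-up default variance components, and substitutes a normal (z) approximation for the two-sample t-test, so it overstates power somewhat when c is small. The name `approx_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def approx_power(n, c, delta, sigma_b2=1.0, sigma_t2=0.25, alpha=0.05):
    """Approximate two-sided power for comparing two groups, each with n
    subjects pooled equally onto c arrays (pool size n / c).

    Hypothetical model: per-array variance = sigma_b2 / pool_size + sigma_t2;
    delta is the true difference in group means. A z-test stands in for
    the two-sample t-test.
    """
    if n % c != 0:
        raise ValueError("n must be a multiple of c for equal-size pools")
    pool_size = n // c
    array_var = sigma_b2 / pool_size + sigma_t2
    # Standard error of the difference between two group means of c arrays each.
    se_diff = (2 * array_var / c) ** 0.5
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se_diff
    # Probability of rejecting in either tail.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)
```

For example, `[approx_power(12, c, 1.0) for c in (12, 6, 4, 3)]` shows power falling as the same 12 subjects are spread over fewer arrays, while `approx_power(24, 12, 1.0)` shows that pooling more subjects onto the same number of arrays can raise it again.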
Figure 3. Scatter plots of P-values under different "virtual" pooling schemes. The Y-axis shows P-values from two-sample t-tests for 8799 genes on the RGU34A gene chip with no pooling (12 subjects, 12 arrays per group). The X-axis shows P-values from two-sample t-tests for pool size 2 (12 subjects, 6 arrays per group), pool size 3 (12 subjects, 4 arrays per group), and pool size 4 (12 subjects, 3 arrays per group), respectively.
Agreement of significant genes between "virtual" pooling and no pooling, using data from one real experiment. Note: total number of "genes" on the chip = 8799; α = 0.05. Pool size = number of subjects per chip (subjects per group / arrays per group).
| Pool size | # of subjects per group | # of arrays per group | # of significant genes | % agreement between pooling and no pooling |
|---|---|---|---|---|
| 1 | 12 | 12 | 228 | (reference) |
| 2 | 12 | 6 | 152 | 67% |
| 3 | 12 | 4 | 108 | 47% |
| 4 | 12 | 3 | 111 | 49% |
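The "virtual" pooling comparison can be sketched as two small steps: form pseudo-pooled values from individual-subject data, then compare the sets of significant genes. This sketch assumes, as a hypothetical simplification, that a virtual pool is the plain average of the measured values of the subjects it combines, and that "agreement" means the share of genes significant without pooling that are also significant after pooling; neither definition is confirmed by the text. The significance calls themselves would come from the two-sample t-tests described above, which are not reimplemented here. Both function names are hypothetical.

```python
from statistics import fmean


def virtual_pool(values, pool_size):
    """Average one gene's per-subject values in consecutive blocks of pool_size.

    Hypothetical helper: treats a pooled array as the plain average of the
    subjects it combines (equal contributions, measured scale).
    """
    if pool_size < 1 or len(values) % pool_size != 0:
        raise ValueError("number of subjects must be a positive multiple of pool_size")
    return [fmean(values[i:i + pool_size]) for i in range(0, len(values), pool_size)]


def percent_agreement(sig_unpooled, sig_pooled):
    """Percentage of genes significant without pooling that are also
    significant after virtual pooling (an assumed definition of agreement)."""
    reference = set(sig_unpooled)
    if not reference:
        return 0.0
    return 100.0 * len(reference & set(sig_pooled)) / len(reference)
```

For a group of 12 subjects, `virtual_pool(values, 3)` returns 4 values, matching the 12-subject, 4-array scheme.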
Observed effect sizes in several experiments using Affymetrix microarrays.
| Study | Subject | # of arrays per group | Genome | % of genes with effect size ≥ 0.5 |
|---|---|---|---|---|
| 1 | cell line | 9 | RGU 34 | 30.3 |
| 2 | rat | 10 | RGU 34 | 18.1 |
| 3 | rat | 16 | RGU 34 | 21.5 |
| 4 | mouse | 6 | MGU 74 | 16.5 |
| 5 | human | 14 | HGU133A | 10.6 |
| 6 | human | 25 | HGU133A | 38.9 |
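The table does not state how effect size was computed. A common choice, assumed here rather than taken from the paper, is the absolute difference in group means divided by the pooled standard deviation (a Cohen's d style measure). The sketch computes that for each gene and then the percentage of genes at or above the 0.5 threshold used in the table; the function names are hypothetical.

```python
from statistics import mean, variance


def effect_size(group1, group2):
    """Assumed definition: |mean1 - mean2| divided by the pooled standard
    deviation of the two groups (Cohen's d style)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return abs(mean(group1) - mean(group2)) / pooled_var ** 0.5


def percent_at_least(genes, threshold=0.5):
    """Percentage of genes whose effect size is >= threshold.

    genes: list of (group1_values, group2_values) pairs, one pair per gene.
    """
    if not genes:
        raise ValueError("need at least one gene")
    sizes = [effect_size(g1, g2) for g1, g2 in genes]
    return 100.0 * sum(d >= threshold for d in sizes) / len(sizes)
```

Each group needs at least two values, and genes with zero variance in both groups would need separate handling.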
Comparison of different pooling schemes and their total cost using model data. The table lists several pooling designs that achieve a power of at least 0.8 while controlling the type I error rate at 0.01 for an effect size of 1.0. Assuming a microarray chip costs $1000 and a subject costs $300, the total cost of each design is also computed; the optimal design is the one with the minimal total cost. A function written in R (free statistical software, downloadable at ) that performs this search automatically is provided as an additional file.
| Chips per group | Pool size | Power | Total cost ($) |
|---|---|---|---|
| 7 | 5 | 0.84 | 35000 (optimal) |
| 8 | 5 | 0.91 | 40000 |
| 8 | 4 | 0.83 | 35200 |
| 9 | 4 | 0.89 | 39600 |
| 10 | 3 | 0.82 | 38000 |
| 11 | 3 | 0.87 | 41800 |
| 14 | 2 | 0.82 | 44800 |
| 26 | 1 | 0.82 | 67600 |
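A minimal Python sketch of the cost arithmetic and the selection step follows. It is not the authors' R function, whose code is not reproduced here. It takes the chip counts, pool sizes, and power values listed in the table as given, assumes two groups of equal size, and applies the $1000-per-chip and $300-per-subject prices from the caption; computing the power values themselves would require a variance model, which is left out. The names `total_cost` and `cheapest` are hypothetical.

```python
CHIP_COST = 1000    # dollars per microarray chip (from the caption)
SUBJECT_COST = 300  # dollars per subject (from the caption)


def total_cost(chips_per_group, pool_size, groups=2):
    """Total cost of a design: chips plus subjects, summed over all groups."""
    subjects_per_group = chips_per_group * pool_size
    return groups * (chips_per_group * CHIP_COST + subjects_per_group * SUBJECT_COST)


# (chips per group, pool size, power) rows copied from the table.
designs = [(7, 5, 0.84), (8, 5, 0.91), (8, 4, 0.83), (9, 4, 0.89),
           (10, 3, 0.82), (11, 3, 0.87), (14, 2, 0.82), (26, 1, 0.82)]


def cheapest(designs, min_power=0.8):
    """Return the lowest-cost design whose power meets min_power."""
    feasible = [d for d in designs if d[2] >= min_power]
    return min(feasible, key=lambda d: total_cost(d[0], d[1]))
```

Under these assumptions the formula reproduces every listed total, and `cheapest(designs)` picks the 7-chip, pool-size-5 design at $35,000.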