Fei Ji, Stephen J Finch, Chad Haynes, Nancy R Mendell, Derek Gordon.
Abstract
BACKGROUND: Studies of association methods using DNA pooling of single nucleotide polymorphisms (SNPs) have focused primarily on the effects of "machine-error", number of replicates, and the size of the pool. We use the non-centrality parameter (NCP) for the analysis of variance test to compute the approximate power for genetic association tests with DNA pooling data on cases and controls. We incorporate genetic model parameters into the computation of the NCP. Parameters involved in the power calculation are disease allele frequency, frequency of the marker SNP allele in coupling with the disease locus, disease prevalence, genotype relative risk, sample size, genetic model, number of pools, number of replicates of each pool, and the proportion of variance of the pooled frequency estimate due to machine variability. We compute power for different settings of number of replicates and total number of genotypings when the genetic model parameters are fixed. Several significance levels are considered, including stringent significance levels (due to the increasing popularity of 100 K and 500 K SNP "chip" data). We use a factorial design with two to four settings of each parameter and multiple regression analysis to assess which parameters most significantly affect power.
Year: 2007 PMID: 17634103 PMCID: PMC1947971 DOI: 10.1186/1471-2164-8-238
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
The analysis of variance table for a two-stage nested design
| Source | DF | SS | E(MS) |
| Case or control (α) | 1 | | |
| Pools nested in case or control | 2(J − 1) | | |
| Replicates | | | |
Abbreviations for column headings are as noted below.
DF: the degrees of freedom for the respective source row;
SS: the sum of squares for the respective source row;
E(MS): expectation of the mean square for the respective source row.
The sums of squares are based on the following terms:
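As a point of reference, the display below gives the textbook sums of squares and degrees of freedom for a balanced two-stage nested design with two groups. The notation is an assumption made for illustration: Y_{ijk} is the kth replicate of the jth pool in group i, with J pools per group and K replicates per pool, and bars with dots denote averages over the dotted indices. The authors' own terms may differ, for example if the variance is allowed to depend on the group.

```latex
\begin{align*}
SS_{\text{group}} &= JK \sum_{i=0}^{1} \left(\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}\right)^{2},
  & \mathrm{DF} &= 1, \\
SS_{\text{pool}} &= K \sum_{i=0}^{1} \sum_{j=1}^{J} \left(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot}\right)^{2},
  & \mathrm{DF} &= 2(J-1), \\
SS_{\text{replicate}} &= \sum_{i=0}^{1} \sum_{j=1}^{J} \sum_{k=1}^{K} \left(Y_{ijk} - \bar{Y}_{ij\cdot}\right)^{2},
  & \mathrm{DF} &= 2J(K-1).
\end{align*}
```

In this textbook layout, with pools treated as a random effect, the group effect is usually tested with F = MS_group / MS_pool on 1 and 2(J − 1) degrees of freedom, where each mean square is the sum of squares divided by its degrees of freedom.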
The model we consider for individual pooled allele frequency estimates includes the following terms. The "group" effect associated with cases or controls is α_i, i = 0, 1, subject to the constraint ∑α_i = 0. A random effect is associated with the jth pool in either cases or controls. Finally, the error terms {E} are independent N(0, 1) random variables, and the variance of the allele frequency is specific to the ith group.
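To illustrate how a non-centrality parameter translates into power, the following is a minimal sketch rather than the authors' computation. It assumes the NCP is already known (its formula in terms of the genetic parameters is not reproduced here) and that the denominator degrees of freedom are large, so that the F test with one numerator degree of freedom behaves like a 1-df chi-square test. The function name `approx_power` is hypothetical. With few pools the exact non-central F distribution would give somewhat lower power.

```python
from statistics import NormalDist


def approx_power(ncp: float, alpha: float) -> float:
    """Approximate power of a test with 1 numerator df and NCP `ncp`.

    Assumption: the denominator df is large, so the F(1, df2) statistic is
    treated as the square of a N(sqrt(ncp), 1) variable (1-df chi-square).
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # two-sided normal critical value
    shift = ncp ** 0.5                # mean of the underlying normal variable
    # P(|Z + shift| > crit) for standard normal Z
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


if __name__ == "__main__":
    # Hypothetical NCP values, evaluated at a stringent significance level.
    for ncp in (5.0, 15.0, 30.0):
        print(ncp, round(approx_power(ncp, alpha=0.0001), 3))
```

When the NCP is zero the function returns the significance level itself, which is a quick sanity check on the approximation.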
Figure 1. Power as a function of the number of replicates (K). Power values presented here are for studies with N = 10000, prevalence φ = 0.05, disease allele frequency p = 0.15, relative risk for subjects homozygous for the disease allele R2 = 3, minor SNP marker allele frequency q1 = 0.35, machine replicability variance factor m = 2.25, linkage disequilibrium and significance level α = 0.0001. *The horizontal line represents the power for specified parameters with individual genotyping using the 2 × 2 test of independence. Power with individual genotyping was computed using the method implemented in the Power for Association With Error (PAWE) website [27].
Figure 2. Power as a function of the number of replicates (K). Power values presented here are for studies with N = 5000, prevalence φ = 0.05, disease allele frequency p = 0.15, relative risk of a genotype with at least one copy of the disease allele = 1.5, minor SNP marker allele frequency q1 = 0.35, machine replicability variance factor m = 2.25, linkage disequilibrium and significance level α = 0.0001. *The horizontal line represents the power for specified parameters with individual genotyping using the 2 × 2 test of independence. Power with individual genotyping was computed using the method implemented in the Power for Association With Error (PAWE) website [27].
Maximum power as a function of the number of genotypings (G = J × K) at specific experimental and genetic parameters. Each cell lists, in order: the maximum power; the number of replicates giving maximum power, K(G); the range of the number of replicates (K) at 95% of the maximum power, in parentheses; and the power at K = 1 when assuming no machine replicability variability (m = 1).
| Situation | N | MOI | R2 | α | m | MAF | p | G = 40 | G = 80 | G = 160 | G = 320 | G = 640 |
| 1 | 10000 | R | 2.2 | 0.0001 | 2.25 | 0.20 | 0.9 | 38%, 2, (2), 82% | 54%, 4, (3–4), 85% | 64%, 6, (4–7), 87% | 72%, 10, (5–16), 87% | 77%, 13, (6–27), 88% |
| 2 | 10000 | R | 2.0 | 0.001 | 2.25 | 0.20 | 0.9 | 43%, 2, (2), 79% | 56%, 4, (3–4), 81% | 65%, 7, (4–7), 82% | 71%, 11, (6–16), 83% | 75%, 16, (7–32), 83% |
| 3 | 10000 | R | 1.8 | 0.01 | 2.25 | 0.20 | 0.9 | 50%, 2, (2), 79% | 62%, 4, (3–4), 80% | 68%, 7, (5–7), 80% | 72%, 13, (6–16), 80% | 75%, 20, (7–32), 81% |
| 4 | 10000 | R | 2.2 | 0.0001 | 2.25 | 0.15 | 1 | 69%, 2, (2), 98% | 84%, 4, (3–4), 99% | 90%, 6, (3–7), 99% | 95%, 10, (4–16), 99% | 96%, 13, (4–32), 99% |
| 5 | 10000 | R | 2.2 | 0.0001 | 2.0 | 0.15 | 1 | 75%, 2, (2), 98% | 87%, 4, (2–4), 99% | 92%, 5, (3–7), 99% | 95%, 8, (3–16), 99% | 97%, 12, (3–32), 99% |
| 6 | 10000 | R | 2.0 | 0.0001 | 2.25 | 0.15 | 1 | 43%, 2, (2), 86% | 59%, 4, (3–4), 89% | 70%, 6, (4–7), 90% | 77%, 10, (5–16), 91% | 82%, 13, (6–28), 91% |
| 7 | 10000 | R | 2.0 | 0.0001 | 2.0 | 0.15 | 1 | 49%, 2, (2), 86% | 63%, 4, (3–4), 89% | 73%, 5, (4–7), 90% | 79%, 8, (4–16), 91% | 83%, 12, (5–27), 91% |
| 8 | 2000 | D | 1.5 | 0.0001 | 2.25 | 0.15 | 0.9 | 57%, 3, (2–3), 94% | 73%, 4, (3–6), 96% | 82%, 6, (4–10), 96% | 88%, 10, (4–18), -- | 91%, 13, (5–32), -- |
| 9 | 2000 | D | 1.5 | 0.0001 | 2.25 | 0.15 | 1 | 65%, 3, (2–3), 97% | 80%, 4, (3–6), 98% | 88%, 6, (3–9), 98% | 92%, 10, (4–21), -- | 95%, 13, (4–40), -- |
| 10 | 2000 | D | 1.5 | 0.0001 | 2.0 | 0.15 | 1 | 70%, 2, (2–3), 97% | 83%, 4, (3–6), 98% | 90%, 5, (3–10), 98% | 94%, 8, (3–20), -- | 96%, 12, (4–37), -- |
We only consider designs in which the pool size (T) is between 10 and 500. The prevalence φ is 0.05 and the disease allele frequency p is 0.15.
--: the pool size falls outside the range considered, so no power is provided;
N: sample size in cases or controls;
MOI: mode of inheritance (R = recessive MOI and D = dominant MOI);
R2: relative risk for subjects homozygous for the disease allele;
α: significance level;
m: machine replicability variance factor;
MAF: minor SNP marker allele frequency;
p: measure of linkage disequilibrium.
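The pool-size restriction explains the "--" entries in the table. The sketch below enumerates the replicate counts K compatible with a given number of genotypings G under two assumptions that are not stated explicitly above: G = J × K counts genotypings within one group (cases or controls), and each pool holds T = N / J subjects. For simplicity it also requires K to divide G exactly, which the table itself does not appear to require. The function name `feasible_designs` is hypothetical.

```python
def feasible_designs(n_per_group, genotypings, t_min=10.0, t_max=500.0):
    """List (K, J, T) designs whose pool size T lies in [t_min, t_max].

    Assumptions: J pools per group, K replicates per pool, G = J * K,
    and pool size T = N / J. Only K values dividing G are considered.
    """
    designs = []
    for k in range(1, genotypings + 1):
        if genotypings % k != 0:
            continue                      # keep J an integer
        j = genotypings // k              # pools per group
        t = n_per_group / j               # subjects per pool
        if t_min <= t <= t_max:
            designs.append((k, j, t))
    return designs


if __name__ == "__main__":
    # N = 2000 and G = 320: K = 1 would need pools of 6.25 subjects, so it is excluded.
    for k, j, t in feasible_designs(2000, 320):
        print(f"K={k:3d}  J={j:3d}  T={t:g}")
```

Under these assumptions, N = 2000 with G = 320 or G = 640 rules out K = 1, which agrees with the missing K = 1 power values in situations 8 to 10.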
List of parameters considered in the multiple regression analysis
| Parameter | Description | Value |
| N | Number of case (control) subjects | 1000, 2000, 5000, 10000 |
| φ | Prevalence of the disease | 0.01, 0.1 |
| T | Size of the pool | 25, 50, 100, 250, 500 |
| K | Number of replicates of each pool | 1, 2, 4, 8 |
| p | Disease allele frequency | 0.1, 0.25 |
| MOI | Mode of inheritance | dominant, recessive, multiplicative |
| R2 | Genotype relative risk of homozygote of disease allele | *1.2, 1.5, 2.25, 4 |
| q1 | Minor SNP marker allele frequency | 0.1, 0.35 |
| m | Machine replicability variance factor | 2.05, 2.1, 2.25, 3 |
*R1 is obtained from R2 according to the mode of inheritance: for multiplicative MOI, R2 = R1²; for dominant MOI, R1 = R2; for recessive MOI, R1 = 1. We considered all 30720 (4⁴ × 2³ × 3 × 5) situations generated from the parameters listed above.
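The count of 30720 situations can be checked by enumerating the full factorial grid. The sketch below does this with the parameter values from the table and derives R1 from R2 for each mode of inheritance, assuming the usual multiplicative relationship R2 = R1². The dictionary keys and the helper name `r1_from_r2` are chosen here for illustration only.

```python
from itertools import product

# Parameter levels from the table of parameters used in the regression analysis.
LEVELS = {
    "N": [1000, 2000, 5000, 10000],
    "prevalence": [0.01, 0.1],
    "T": [25, 50, 100, 250, 500],
    "K": [1, 2, 4, 8],
    "p": [0.1, 0.25],
    "MOI": ["dominant", "recessive", "multiplicative"],
    "R2": [1.2, 1.5, 2.25, 4],
    "q1": [0.1, 0.35],
    "m": [2.05, 2.1, 2.25, 3],
}


def r1_from_r2(moi, r2):
    """Heterozygote relative risk R1 implied by R2 under each MOI."""
    if moi == "multiplicative":
        return r2 ** 0.5      # assumes R2 = R1 squared
    if moi == "dominant":
        return r2             # R1 = R2
    if moi == "recessive":
        return 1.0            # R1 = 1
    raise ValueError(f"unknown mode of inheritance: {moi}")


# One dict per situation in the full factorial design.
settings = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
for s in settings:
    s["R1"] = r1_from_r2(s["MOI"], s["R2"])

if __name__ == "__main__":
    print(len(settings))
```

Four parameters have four levels, three have two levels, MOI has three and pool size has five, which gives 4⁴ × 2³ × 3 × 5 = 30720 combinations.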