Stuart Macgregor, Zhen Zhen Zhao, Anjali Henders, Martin G Nicholas, Grant W Montgomery, Peter M Visscher.
Abstract
Genome-wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA-pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used, but the efficiency of these arrays has not been quantified. We compared Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and showed that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300-based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII-based pooling extracts only approximately 30% of the available information. With HumanHap300 arrays, concordance with IG data is excellent. Guidance is given on the best study design, and it is shown that, even after taking pooling error into account, one-stage scans can be performed at >100-fold lower cost than IG. With appropriately designed two-stage studies, IG can provide confirmation of pooling results whilst still providing an approximately 20-fold reduction in total cost compared with IG-based alternatives. The large cost savings with Illumina HumanHap300-based pooling imply that future studies need be limited only by the availability of samples, not by cost.
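The effective-sample-size comparison above can be illustrated with a deliberately simple variance model. This model is an assumption for illustration, not the paper's stated method: a pooled allele-frequency estimate is treated as having the IG sampling variance p(1 - p)/(2N) plus array-specific error variance PSD²/k averaged over k independent arrays, with pool-construction error ignored. All function names are hypothetical, and the toy inputs (PSD values borrowed from the Fig. 4 caption, 3 arrays per N = 400 pool) are not expected to reproduce the paper's >80% and ~30% figures, which come from the authors' own analysis.

```python
"""Toy model of how much IG information a DNA pool retains (assumed model)."""


def ig_variance(p: float, n_individuals: int) -> float:
    """Binomial sampling variance of a frequency estimated from 2N alleles."""
    return p * (1.0 - p) / (2.0 * n_individuals)


def pooled_variance(p: float, n_individuals: int, psd: float, n_arrays: int) -> float:
    """IG sampling variance plus array error averaged over n_arrays arrays.

    Assumption: array errors are independent with SD `psd`; pool-construction
    error is ignored.
    """
    return ig_variance(p, n_individuals) + psd ** 2 / n_arrays


def effective_sample_size(p: float, n_individuals: int, psd: float, n_arrays: int) -> float:
    """Number of individually genotyped samples with the same variance as the pool."""
    return p * (1.0 - p) / (2.0 * pooled_variance(p, n_individuals, psd, n_arrays))


def efficiency(p: float, n_individuals: int, psd: float, n_arrays: int) -> float:
    """Effective sample size as a fraction of the number of pooled individuals."""
    return effective_sample_size(p, n_individuals, psd, n_arrays) / n_individuals


if __name__ == "__main__":
    # Toy inputs: PSD values from the Fig. 4 caption, 3 arrays per N = 400 pool.
    for label, psd in (("HumanHap300", 0.009), ("Genechip HindIII", 0.024)):
        print(label, round(efficiency(0.4, 400, psd, 3), 2))
```

Under this model, a larger per-array error SD directly lowers the effective sample size, which is the qualitative pattern the abstract reports when comparing the two array types.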
Year: 2008; PMID: 18276640; PMCID: PMC2346606; DOI: 10.1093/nar/gkm1060
Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971
Fig. 1. Affymetrix Genechip HindIII versus Illumina HumanHap300 array-specific error plots. Each plot shows, for one array type, the difference between the allele-frequency estimates from a pair of arrays run on the same control pool (so the true difference in frequency is 0). The Affymetrix results are from a pair of 50K Genechip HindIII arrays, and the Illumina results are from a pair of 300K HumanHap300 arrays. These results are for a single pair of arrays; in practice, the array-specific error is reduced by using multiple arrays.
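The pair-of-arrays design in Fig. 1 gives a direct way to estimate array-specific error. A minimal sketch, assuming each array's frequency estimate equals the true pool frequency plus independent error with standard deviation σ: the difference between two arrays then has SD σ√2, and averaging k arrays reduces the error SD to σ/√k, which is why multiple arrays help. The helper names and the simulated data are hypothetical.

```python
import math
import random


def per_array_error_sd(freqs_a, freqs_b):
    """Estimate per-array error SD from two arrays run on the same pool.

    Assumes estimate = true pool frequency + independent error (SD sigma),
    so each difference has mean 0 and SD sigma * sqrt(2).
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("need two equal-length, non-empty lists of frequencies")
    diffs = [a - b for a, b in zip(freqs_a, freqs_b)]
    # Root mean square about the known mean of 0, then undo the sqrt(2) factor.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rms / math.sqrt(2.0)


def averaged_error_sd(sigma, n_arrays):
    """SD of the mean of n_arrays independent array estimates."""
    return sigma / math.sqrt(n_arrays)


if __name__ == "__main__":
    # Simulated (hypothetical) data: 10 000 SNPs, true per-array SD of 0.02.
    random.seed(1)
    truth = [random.uniform(0.05, 0.95) for _ in range(10_000)]
    array1 = [t + random.gauss(0.0, 0.02) for t in truth]
    array2 = [t + random.gauss(0.0, 0.02) for t in truth]
    sigma_hat = per_array_error_sd(array1, array2)
    print(f"estimated sigma: {sigma_hat:.4f}")
    print(f"with 4 arrays:   {averaged_error_sd(sigma_hat, 4):.4f}")
```

Using the root mean square about zero, rather than the sample SD, reflects the design: both arrays measure the same pool, so the expected difference is known to be 0.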
Fig. 2. Allele frequencies from individual genotyping of publicly available Caucasian controls versus pooling frequencies for Affymetrix Genechip HindIII and Illumina HumanHap300 arrays. The data are the 15 645 SNPs common to the Affymetrix Genechip HindIII and Illumina HumanHap300 arrays. Allele frequencies in the sample of 271 publicly available Caucasian controls are on the y-axis, and pooling frequencies from the N = 384 pooled cases/controls are on the x-axis. The broken line is y = x; the solid line is the regression line.
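Fig. 2 judges agreement by comparing the fitted regression line with y = x. The sketch below (hypothetical helper names, plain ordinary least squares) computes the intercept and slope of IG frequencies regressed on pooled frequencies, together with the Pearson correlation and the mean absolute frequency difference; a slope near 1 and an intercept near 0 suggest that the pooled estimates track the IG frequencies. The example frequencies are made up for illustration, and the same summaries could equally be applied to the 104-SNP comparison in Fig. 3.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff(x, y):
    """Average absolute difference between paired frequency estimates."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical frequencies: pooled estimates (x-axis) and IG frequencies (y-axis).
    pooled = [0.12, 0.25, 0.31, 0.48, 0.66, 0.83]
    ig = [0.10, 0.27, 0.30, 0.50, 0.64, 0.85]
    intercept, slope = fit_line(pooled, ig)
    print(f"intercept={intercept:.3f} slope={slope:.3f} r={pearson_r(pooled, ig):.4f}")
    print(f"mean |difference| = {mean_abs_diff(pooled, ig):.3f}")
```

Correlation alone can be high even when pooled frequencies are systematically biased, which is why the slope and intercept relative to y = x are reported alongside it.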
Fig. 3. Comparison of Illumina HumanHap300-based pooling and individual genotyping for 104 SNPs. The solid line is y = x. A total of 53 SNPs were selected independently of the pooling results and 51 SNPs were selected on the basis of the pooling results.
Fig. 4. Power curves for individual genotyping and pooling. Power is calculated for 2000 cases and 2000 controls. ‘30x HumanHap300’ assumes 6 Illumina HumanHap300 arrays per N = 400 pool; ‘15x HumanHap300’ assumes 3 Illumina HumanHap300 arrays per N = 400 pool; ‘10x HumanHap300’ assumes 2 Illumina HumanHap300 arrays per N = 400 pool; ‘15x Genechip HindIII’ assumes 3 Affymetrix Genechip HindIII arrays per N = 400 pool. PSD is taken to be 0.009 for Illumina HumanHap300 arrays and 0.024 for Affymetrix Genechip HindIII arrays. The power calculation assumes a multiplicative disease model, marker and disease allele frequencies both equal to 0.4, complete linkage disequilibrium between marker and disease alleles, alpha = 0.0000001 (i.e. 0.05 corrected for 500 000 tests) and a disease prevalence of 0.01.
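The Fig. 4 settings can be turned into an approximate power calculation. This is a sketch, not the authors' code: it uses a normal approximation to a two-sided test of the case-control allele-frequency difference and adds pooling error as 2·PSD²/A, where A is the total number of arrays per group. That treatment assumes PSD is the per-array SD of a pool's frequency estimate with errors independent across arrays, and it reads the ‘Nx’ labels as arrays per group (2000/400 = 5 pools, so 5 × 6 = 30, 5 × 3 = 15, 5 × 2 = 10); both are interpretations consistent with the caption's numbers rather than statements from it. The genotype relative risk (GRR) values are illustrative, and all function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allele_freqs_multiplicative(p, grr, prevalence):
    """Risk-allele frequencies in cases and controls under a multiplicative model.

    Assumes Hardy-Weinberg equilibrium, genotype relative risk `grr` per copy of
    the risk allele, and complete LD, so marker and disease allele coincide.
    """
    q = 1.0 - p
    genotype_freqs = (q * q, 2.0 * p * q, p * p)
    # Baseline penetrance chosen so that overall prevalence equals `prevalence`.
    baseline = prevalence / (q + p * grr) ** 2
    penetrances = (baseline, baseline * grr, baseline * grr ** 2)
    # Cases: P(allele | case) reduces to p*grr / (q + p*grr) for this model.
    p_case = p * grr / (q + p * grr)
    # Controls: weight each genotype by P(unaffected | genotype).
    w = [g * (1.0 - f) for g, f in zip(genotype_freqs, penetrances)]
    p_ctrl = (0.5 * w[1] + w[2]) / sum(w)
    return p_case, p_ctrl


def pooling_error_var(psd, arrays_per_group):
    """Array-error variance of the case-minus-control frequency difference.

    Assumes each group's frequency averages `arrays_per_group` independent
    arrays (equal-sized pools), each with error SD `psd`.
    """
    return 2.0 * psd ** 2 / arrays_per_group


def allelic_test_power(p_case, p_ctrl, n_case, n_ctrl, alpha, extra_var=0.0):
    """Normal-approximation power of a two-sided allele-frequency test.

    `extra_var` is measurement-error variance (0 for individual genotyping).
    The chance of rejecting in the wrong direction is ignored.
    """
    var = (p_case * (1.0 - p_case) / (2.0 * n_case)
           + p_ctrl * (1.0 - p_ctrl) / (2.0 * n_ctrl)
           + extra_var)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(abs(p_case - p_ctrl) / var ** 0.5 - z_crit)


if __name__ == "__main__":
    alpha, n = 1e-7, 2000  # Fig. 4 settings: 2000 cases, 2000 controls
    designs = {
        "IG": 0.0,
        "30x HumanHap300": pooling_error_var(0.009, 30),
        "15x HumanHap300": pooling_error_var(0.009, 15),
        "10x HumanHap300": pooling_error_var(0.009, 10),
        "15x Genechip HindIII": pooling_error_var(0.024, 15),
    }
    for grr in (1.2, 1.3, 1.4):  # illustrative genotype relative risks
        p_case, p_ctrl = allele_freqs_multiplicative(0.4, grr, 0.01)
        row = ", ".join(
            f"{name} {allelic_test_power(p_case, p_ctrl, n, n, alpha, v):.2f}"
            for name, v in designs.items()
        )
        print(f"GRR {grr}: {row}")
```

Because pooling error enters only as an extra variance term, more arrays per pool or a smaller PSD moves a pooled design's power toward the IG curve, but under this model it can never exceed it.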