Joanna J Zhuang, Krina Zondervan, Fredrik Nyberg, Chris Harbron, Ansar Jawaid, Lon R Cardon, Bryan J Barratt, Andrew P Morris.
Abstract
Genome-wide association (GWA) studies have proved extremely successful in identifying novel genetic loci contributing effects to complex human diseases. In doing so, they have highlighted the fact that many potential loci of modest effect remain undetected, partly due to the need for samples consisting of many thousands of individuals. Large-scale international initiatives, such as the Wellcome Trust Case Control Consortium, the Genetic Association Information Network, and the database of genetic and phenotypic information, aim to facilitate discovery of modest-effect genes by making genome-wide data publicly available, allowing information to be combined for the purpose of pooled analysis. In principle, disease or control samples from these studies could be used to increase the power of any GWA study via judicious use as "genetically matched controls" for other traits. Here, we present the biological motivation for the problem and the theoretical potential for expanding the control group with publicly available disease or reference samples. We demonstrate that a naïve application of this strategy can greatly inflate the false-positive error rate in the presence of population structure. As a remedy, we make use of genome-wide data and model selection techniques to identify "axes" of genetic variation which are associated with disease. These axes are then included as covariates in association analysis to correct for population structure, which can result in increases in power over standard analysis of genetic information from the samples in the original GWA study. (c) 2010 Wiley-Liss, Inc.
Year: 2010 PMID: 20088020 PMCID: PMC2962805 DOI: 10.1002/gepi.20482
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Fig. 1. Power of a GWA study of 500 cases to detect association of a causal variant with allele frequency 20% for a range of heterozygous genotype relative risks under a multiplicative model with disease prevalence of 0.1%. Results are presented for a trend test of association for a significance level of 5%, with the number of control samples ranging from 500 to 5,000 individuals.
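The kind of calculation summarised in Fig. 1 can be approximated analytically. The sketch below is not the authors' procedure: it assumes Hardy-Weinberg equilibrium, treats the trend test as a two-sided comparison of allele frequencies under a normal approximation, and uses hypothetical function names (`allele_freqs`, `approx_power`) and an illustrative relative risk of 1.2.

```python
from statistics import NormalDist


def allele_freqs(p, grr, prevalence):
    """Risk-allele frequency in cases and in controls.

    Assumes Hardy-Weinberg genotype frequencies in the population and a
    multiplicative model with heterozygous genotype relative risk `grr`.
    """
    geno = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]   # 0, 1, 2 risk alleles
    rel = [1.0, grr, grr ** 2]                       # multiplicative risks
    baseline = prevalence / sum(g * r for g, r in zip(geno, rel))
    cases = [g * r * baseline / prevalence for g, r in zip(geno, rel)]
    controls = [g * (1 - r * baseline) / (1 - prevalence)
                for g, r in zip(geno, rel)]

    def freq(dist):
        return 0.5 * dist[1] + dist[2]

    return freq(cases), freq(controls)


def approx_power(n_cases, n_controls, p=0.2, grr=1.2,
                 prevalence=0.001, alpha=0.05):
    """Approximate power of a two-sided allele-frequency comparison."""
    p1, p0 = allele_freqs(p, grr, prevalence)
    se = (p1 * (1 - p1) / (2 * n_cases)
          + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z = (p1 - p0) / se
    norm = NormalDist()
    crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z - crit) + norm.cdf(-z - crit)


if __name__ == "__main__":
    # 500 cases, controls ranging from 500 to 5,000 as in the caption.
    for n_controls in (500, 1000, 2000, 5000):
        print(n_controls, round(approx_power(500, n_controls), 3))
```

Because the case sample is fixed, adding controls only shrinks the control term of the standard error, so the gain in power levels off as the control group grows.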
False-positive error rate (FPER) of three trend tests of association over 5,000 replicates of 100 cases, 100 controls and 100 samples from each of three external cohorts: T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS
| F | T_CC: FPER | T_CC: mean allelic OR | T_CC: allelic OR 5–95% | T_F: FPER | T_F: mean allelic OR | T_F: allelic OR 5–95% | T_Fmds: FPER | T_Fmds: mean allelic OR | T_Fmds: allelic OR 5–95% |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.0% | 1.00 | 0.66–1.52 | 5.0% | 0.99 | 0.71–1.36 | 4.9% | 0.99 | 0.71–1.37 |
| 0.001 | 5.6% | 1.00 | 0.65–1.53 | 5.4% | 0.99 | 0.70–1.37 | 5.3% | 0.99 | 0.70–1.38 |
| 0.002 | 5.2% | 1.00 | 0.65–1.54 | 5.6% | 0.99 | 0.70–1.39 | 5.6% | 0.99 | 0.70–1.40 |
| 0.005 | 5.1% | 0.99 | 0.64–1.51 | 6.9% | 0.99 | 0.69–1.40 | 6.4% | 0.99 | 0.66–1.45 |
| 0.01 | 5.0% | 1.00 | 0.66–1.53 | 8.3% | 1.00 | 0.69–1.44 | 6.3% | 1.00 | 0.65–1.53 |
| 0.02 | 5.3% | 1.00 | 0.65–1.53 | 11.7% | 1.00 | 0.66–1.50 | 6.1% | 0.99 | 0.64–1.54 |
| 0.05 | 4.9% | 1.00 | 0.66–1.52 | 21.5% | 1.01 | 0.60–1.68 | 5.2% | 1.00 | 0.65–1.54 |
| 0.1 | 5.2% | 1.00 | 0.66–1.52 | 31.6% | 1.03 | 0.56–1.96 | 5.5% | 1.00 | 0.65–1.54 |
Mean maximum likelihood estimates of the allelic odds ratio are presented, together with the 5- and 95-percentiles over 5,000 replicates of data. Results are presented for varying degrees of population structure, represented by F, for a significance level of 5%.
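The inflation of the uncorrected expanded-control test (T_F) as F grows can be illustrated with a small null simulation. This is a sketch under stated assumptions, not the authors' simulation design: cohort allele frequencies are drawn from a Balding-Nichols beta distribution around an assumed ancestral frequency of 20%, the trend statistic is computed as N·r², and all function names are hypothetical. It is not expected to reproduce the tabulated values.

```python
import random

CHI2_1DF_5PCT = 3.841459  # 95th percentile of chi-square with 1 df


def subpop_freq(p, f, rng):
    """Balding-Nichols draw of a cohort allele frequency (f = 0: no drift)."""
    if f == 0:
        return p
    return rng.betavariate(p * (1 - f) / f, (1 - p) * (1 - f) / f)


def draw_genotypes(n, q, rng):
    """n genotypes (0, 1 or 2 allele copies) under HWE with frequency q."""
    return [(rng.random() < q) + (rng.random() < q) for _ in range(n)]


def trend_statistic(case_geno, control_geno):
    """Cochran-Armitage trend statistic written as N * r^2."""
    x = case_geno + control_geno
    y = [1] * len(case_geno) + [0] * len(control_geno)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no genotype or phenotype variation: no evidence
    return n * sxy * sxy / (sxx * syy)


def fper(f, expand, reps=1000, p=0.2, n=100, n_external=3, seed=1):
    """Share of null replicates rejected at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        q_source = subpop_freq(p, f, rng)
        cases = draw_genotypes(n, q_source, rng)      # no disease effect
        controls = draw_genotypes(n, q_source, rng)
        if expand:  # pool in external cohorts with their own frequencies
            for _ in range(n_external):
                controls += draw_genotypes(n, subpop_freq(p, f, rng), rng)
        hits += trend_statistic(cases, controls) > CHI2_1DF_5PCT
    return hits / reps


if __name__ == "__main__":
    for f in (0.0, 0.01, 0.1):
        print(f, fper(f, expand=False), fper(f, expand=True))
```

With `expand=False` the cases and controls share one allele frequency, so the rejection rate should stay near the nominal level whatever the value of F; with `expand=True` the external cohorts shift the pooled control frequency and the test rejects too often.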
Fig. 2. Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20% as a function of the allelic odds ratio in the absence of population structure (F = 0): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
Fig. 3. Power of three trend tests of association at a 5% significance level for a high-risk allele frequency of 20%, as a function of the allelic odds ratio in the presence of population structure (F = 0.01): T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS. Power is estimated over 5,000 replicates of 100 cases, 100 controls, and 100 samples from each of three external cohorts.
Power of three trend tests of association for a SNP with minor allele frequency of 20% and a heterozygous genotype relative risk of 1.5 over 5,000 replicates of 100 cases, 100 controls and 100 samples from each of three external cohorts: T_CC, cases against controls from the source population, without correction for population structure; T_F, cases against control cohort expanded by external samples, without correction for population structure; T_Fmds, cases against control cohort expanded by external samples, corrected for up to three axes of genetic variation determined through MDS
| F | T_CC: power | T_CC: mean allelic OR | T_CC: allelic OR 5–95% | T_F: power | T_F: mean allelic OR | T_F: allelic OR 5–95% | T_Fmds: power | T_Fmds: mean allelic OR | T_Fmds: allelic OR 5–95% |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.3% | 1.51 | 1.03–2.26 | 59.5% | 1.50 | 1.11–2.02 | 59.6% | 1.51 | 1.11–2.03 |
| 0.001 | 41.1% | 1.51 | 1.02–2.28 | 59.7% | 1.50 | 1.10–2.04 | 59.5% | 1.51 | 1.10–2.05 |
| 0.002 | 40.5% | 1.51 | 1.00–2.30 | 58.7% | 1.50 | 1.08–2.06 | 58.6% | 1.51 | 1.08–2.08 |
| 0.005 | 40.5% | 1.51 | 1.03–2.25 | 57.2% | 1.49 | 1.09–2.08 | 50.4% | 1.51 | 1.05–2.18 |
| 0.01 | 40.4% | 1.51 | 1.00–2.29 | 58.2% | 1.50 | 1.05–2.13 | 45.1% | 1.53 | 1.01–2.33 |
| 0.02 | 41.0% | 1.51 | 1.02–2.26 | 57.3% | 1.50 | 1.01–2.23 | 42.7% | 1.53 | 1.02–2.31 |
| 0.05 | 41.1% | 1.51 | 1.03–2.27 | 57.7% | 1.52 | 0.93–2.48 | 42.4% | 1.53 | 1.01–2.31 |
| 0.1 | 41.0% | 1.52 | 1.00–2.29 | 58.0% | 1.53 | 0.82–2.90 | 41.8% | 1.53 | 1.01–2.31 |
Mean maximum likelihood estimates of the allelic odds ratio are presented, together with the 5- and 95-percentiles over 5,000 replicates of data. Results are presented for varying degrees of population structure, represented by F, for a significance level of 5%.
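The T_Fmds test relies on axes of genetic variation obtained through MDS. As a rough illustration of how such axes might be derived, the sketch below applies classical (Torgerson) MDS to pairwise squared Euclidean distances between genotype vectors. The distance measure, the Balding-Nichols simulation, the SNP count, and the function names are all assumptions made for the example; the record does not specify the authors' choices, and the model selection step that picks which axes enter the association model is not shown.

```python
import numpy as np


def simulate_cohorts(n_per_cohort=100, n_snps=500, f=0.05,
                     n_cohorts=2, seed=0):
    """Genotype matrix (individuals x SNPs) for cohorts that have drifted
    from shared ancestral allele frequencies (Balding-Nichols model)."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, size=n_snps)
    blocks, labels = [], []
    for k in range(n_cohorts):
        freq = rng.beta(ancestral * (1 - f) / f,
                        (1 - ancestral) * (1 - f) / f)
        blocks.append(rng.binomial(2, freq, size=(n_per_cohort, n_snps)))
        labels += [k] * n_per_cohort
    return np.vstack(blocks), np.array(labels)


def mds_axes(genotypes, n_axes=3):
    """Classical MDS coordinates from squared Euclidean genotype distances."""
    g = np.asarray(genotypes, dtype=float)
    sq = np.sum(g ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (g @ g.T)   # squared distances
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering              # double centering
    vals, vecs = np.linalg.eigh(b)                     # ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_axes]              # largest first
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


if __name__ == "__main__":
    genotypes, cohort = simulate_cohorts()
    axes = mds_axes(genotypes, n_axes=3)
    # Correlation between the first axis and cohort membership.
    print(abs(np.corrcoef(axes[:, 0], cohort)[0, 1]))
```

In an analysis of this kind, the returned columns would be supplied as covariates alongside the SNP genotype in the association model, so that allele-frequency differences between cohorts are absorbed by the axes.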