Bogdan Pasaniuc, Noah Zaitlen, Guillaume Lettre, Gary K Chen, Arti Tandon, W H Linda Kao, Ingo Ruczinski, Myriam Fornage, David S Siscovick, Xiaofeng Zhu, Emma Larkin, Leslie A Lange, L Adrienne Cupples, Qiong Yang, Ermeg L Akylbekova, Solomon K Musani, Jasmin Divers, Joe Mychaleckyj, Mingyao Li, George J Papanicolaou, Robert C Millikan, Christine B Ambrosone, Esther M John, Leslie Bernstein, Wei Zheng, Jennifer J Hu, Regina G Ziegler, Sarah J Nyante, Elisa V Bandera, Sue A Ingles, Michael F Press, Stephen J Chanock, Sandra L Deming, Jorge L Rodriguez-Gil, Cameron D Palmer, Sarah Buxbaum, Lynette Ekunwe, Joel N Hirschhorn, Brian E Henderson, Simon Myers, Christopher A Haiman, David Reich, Nick Patterson, James G Wilson, Alkes L Price.
Abstract
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.
Year: 2011 PMID: 21541012 PMCID: PMC3080860 DOI: 10.1371/journal.pgen.1001371
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Average statistical power of simulated case-control scores in African Americans computed using (a) typed or (b) imputed genotypes.
| Typed Genotypes | R = 1.2 random | R = 1.2 Δ>0.4 | R = 1.5 random | R = 1.5 Δ>0.4 | R = 2.0 random | R = 2.0 Δ>0.4 |
| ATT χ2(1dof) | 0.0017 | 0.0026 | 0.3803 | 0.5533 | 0.8351 | 0.9769 |
| SNP1 χ2(1dof) | 0.0014 | 0.0012 | 0.3628 | 0.4181 | 0.8279 | 0.9362 |
| ADM χ2(1dof) | 0.0001 | 0.0013 | 0.0081 | 0.0903 | 0.0737 | 0.6306 |
| SUM χ2(2dof) | 0.0012 | 0.0028 | 0.3555 | 0.624 | 0.8287 | 0.9874 |
| MIX χ2(1dof) | 0.0021 | 0.0046 | 0.4131 | 0.6899 | 0.8486 | 0.9907 |
For each score we list the proportion of SNPs for which the score attains genome-wide significance (defined as P<5e-08 for all scores except ADM, P<1e-05 for ADM), for random SNPs as well as SNPs in the top decile of population differences (Δ>0.4), for R = 1.2, R = 1.5, R = 2.0 simulations (see main text). For R = 1.0 the power is 0 for all scores. In general the MIX score shows an increase in statistical power relative to the ATT score, and a further increase in power relative to the SNP1 score, which is analogous to disease mapping in European or African populations. ATT-dose denotes ATT test using imputation dosages.
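The relative gains described in this caption can be checked directly from the typed-genotype table. The following is a minimal sketch that copies the R = 1.5 columns and computes the relative increase in power of one score over another; the dictionary layout and the helper name `relative_gain` are illustrative assumptions, not part of the published software.

```python
# Power values copied from the R = 1.5 columns of the typed-genotype table.
# "random" = random SNPs, "diff" = SNPs with allele frequency difference > 0.4.
power_r15 = {
    "ATT":  {"random": 0.3803, "diff": 0.5533},
    "SNP1": {"random": 0.3628, "diff": 0.4181},
    "MIX":  {"random": 0.4131, "diff": 0.6899},
}


def relative_gain(new_power: float, baseline_power: float) -> float:
    """Relative increase in power of one score over a baseline score."""
    return new_power / baseline_power - 1.0


# MIX versus ATT, for random SNPs and for highly differentiated SNPs.
gain_mix_att_random = relative_gain(power_r15["MIX"]["random"], power_r15["ATT"]["random"])
gain_mix_att_diff = relative_gain(power_r15["MIX"]["diff"], power_r15["ATT"]["diff"])

# MIX versus SNP1 for random SNPs.
gain_mix_snp1_random = relative_gain(power_r15["MIX"]["random"], power_r15["SNP1"]["random"])

for label, gain in [
    ("MIX vs ATT, random SNPs", gain_mix_att_random),
    ("MIX vs ATT, differentiated SNPs", gain_mix_att_diff),
    ("MIX vs SNP1, random SNPs", gain_mix_snp1_random),
]:
    print(f"{label}: {gain:.1%}")
```

For random SNPs at R = 1.5 the MIX score gains roughly 8.6% over ATT, which appears to be in line with the 8% figure quoted in the abstract, although this excerpt does not state how that figure was derived.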
Figure 1. Statistical power of SNP1, ATT, MIX scores as a function of population differentiation.
We plot the average power of each score as a function of allele frequency difference between CEU and YRI, for the R = 1.5 simulation only.
Figure 2. Imputation accuracy as a function of population differentiation.
We plot the average imputation accuracy as a function of allele frequency difference between CEU and YRI, both when CEU+YRI was used as the reference and when using the local ancestry-aware framework.
Disease scoring when the causal SNP is not typed or imputed.
| Score | Average maximum χ2 value | Proportion of regions that are genome-wide significant |
| ATT χ2(1dof) | 26.17 | 0.3834 |
| SNP1 χ2(1dof) | 25.47 | 0.3622 |
| ADM χ2(1dof) | 4.23 | 0.0135 |
| SUM χ2(2dof) | 28.62 | 0.3571 |
| MIX χ2(1dof) | 27.46 | 0.4158 |
We list the average maximum statistic and the proportion of times it attains genome-wide significance (defined as P<5e-08 for all scores except ADM, P<1e-05 for ADM) for each of the case-control scores obtained in a region of 40 SNPs centered around the 100,000 simulated causal SNPs with R = 1.5. The results obtained when the score at the simulated causal SNP was removed from the computation of the maximum are denoted in bold. The MIX score outperforms the other scores both when the causal SNP is present in the data and when it is unobserved.
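To relate the average maximum χ2 values to the significance thresholds quoted in this caption, the p-value cutoffs can be converted into χ2 cutoffs. The following is a minimal sketch using only the standard library: it relies on the closed-form upper-tail probabilities of the χ2 distribution for 1 and 2 degrees of freedom and inverts them by bisection. The function names are illustrative assumptions, and the degrees of freedom are taken from the score labels in the table.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-square variable (1 or 2 dof only)."""
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


def chi2_threshold(p: float, dof: int, lo: float = 0.0, hi: float = 200.0) -> float:
    """Smallest chi-square value whose upper-tail probability is at most p (bisection)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, dof) > p:
            lo = mid  # tail still too heavy: threshold lies above mid
        else:
            hi = mid  # tail small enough: threshold lies at or below mid
    return (lo + hi) / 2.0


thr_1dof = chi2_threshold(5e-8, dof=1)  # ATT, SNP1, MIX
thr_2dof = chi2_threshold(5e-8, dof=2)  # SUM
thr_adm = chi2_threshold(1e-5, dof=1)   # ADM uses the looser cutoff

print(f"1-dof cutoff at P<5e-08: {thr_1dof:.2f}")
print(f"2-dof cutoff at P<5e-08: {thr_2dof:.2f}")
print(f"ADM cutoff at P<1e-05:   {thr_adm:.2f}")
```

Under these assumptions the 1-dof cutoff is close to 29.7, so average maxima of 25 to 28 sit just below the threshold, which fits with fewer than half of the regions reaching genome-wide significance.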
Results for CHD and T2D case-control phenotypes.
| CHD |
| SNP | chrom | position (build35) | CEUfreq | YRIfreq | ATT | SNP1 | ADM | SUM | HET | MIX |
| rs17577085 | 5 | 141,843,788 | 0.11 | 0.00 | 2.66 | 1.54 | 1.46 | 2.00 | 0.00 | 2.06 |
| rs4244029* | 5 | 141,893,025 | 0.08 | 0.27 | 2.66 | 2.84 | 1.31 | 3.06 | 0.56 | 2.51 |
| Best score | 5 | - | - | - | 2.66 | 2.84 | 1.93 | | - | 2.51 |
| rs325105 | 6 | 147,805,960 | 0.47 | 0.012 | 2.62 | 1.65 | 0.81 | 1.57 | 0.65 | 2.15 |
| rs325129* | 6 | 147,848,836 | 0.25 | 0.74 | 3.22 | 2.55 | 1.05 | 2.57 | 0.26 | 3.12 |
| Best score | 6 | - | - | - | | 2.86 | 1.18 | 2.79 | - | 3.13 |
| rs6475606 | 9 | 22,071,850 | 0.5 | 0.01 | 1.87 | 2.72 | 0.11 | 2.11 | 2.04 | 2.38 |
| rs1333047* | 9 | 22,114,504 | 0.49 | 0.99 | 2.32 | 3.64 | 0.00 | 2.95 | 2.05 | 2.96 |
| Best score | 9 | - | - | - | 2.50 | | 0.32 | 2.95 | - | 2.99 |
For each CHD region, we list results for each score (-log10 of the p-value) for the originally implicated genotyped SNP, for the imputed (* denotes imputed SNPs) or genotyped SNP producing the most significant p-value in the region, and for the best score in the region for each of the five scores. T2D regions are reported analogously. The value achieving the smallest p-value is denoted in bold.
Results obtained at FGFR2 locus, SNP rs2981578 using MACH imputation.
| | ATT | ADM | MIX | SNP1 | HET | SUM |
| χ2 value | 13.99 | 6.16 | 17.04 | 16.57 | 1.80 | 22.74 |
| -log10(p-value) | 3.74 | 1.88 | 4.44 | 4.33 | 0.75 | 4.94 |
We list the χ2 values along with the -log10(p-value) obtained by the case-control scoring statistics, showing that incorporating the admixture signal (MIX, SUM) yields higher scores than the standard ATT test with correction for global ancestry. We note that SNP rs2981578 shows the highest scores in the region.
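The -log10(p-value) row can be reproduced from the χ2 row. The following is a minimal sketch using closed-form χ2 tail probabilities, assuming 1 degree of freedom for ATT, ADM, MIX, SNP1 and HET and 2 degrees of freedom for SUM; the degrees of freedom for HET are not stated in this excerpt and are an assumption, and the helper names are illustrative.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-square variable (1 or 2 dof only)."""
    if dof == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if dof == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


def neg_log10_p(x: float, dof: int) -> float:
    """Convert a chi-square statistic into -log10 of its p-value."""
    return -math.log10(chi2_sf(x, dof))


# (chi-square value, assumed degrees of freedom), copied from the table above.
fgfr2_scores = {
    "ATT": (13.99, 1),
    "ADM": (6.16, 1),
    "MIX": (17.04, 1),
    "SNP1": (16.57, 1),
    "HET": (1.80, 1),  # dof for HET is an assumption
    "SUM": (22.74, 2),
}

neg_log10 = {name: neg_log10_p(x, dof) for name, (x, dof) in fgfr2_scores.items()}

for name, value in neg_log10.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the computed values agree with the tabulated row to within rounding, which supports the reading that SUM is evaluated against a 2-dof distribution and the remaining scores against 1 dof.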
Average statistical power of simulated quantitative scores in African Americans.
| Typed Genotypes | ε = 0.05 random | ε = 0.05 Δ>0.4 | ε = 0.10 random | ε = 0.10 Δ>0.4 | ε = 0.20 random | ε = 0.20 Δ>0.4 |
| QATT χ2(1dof) | 0.0013 | 0.0009 | 0.2165 | 0.3223 | 0.8566 | 0.9883 |
| QSNP1 χ2(1dof) | 0.0012 | 0.0005 | 0.1951 | 0.2087 | 0.8437 | 0.9422 |
| QADM χ2(1dof) | 0 | 0.0001 | 0.0004 | 0.0048 | 0.0229 | 0.2594 |
| QSUM χ2(2dof) | 0.0006 | 0.0003 | 0.1636 | 0.2473 | 0.8353 | 0.9839 |
For each score we list the proportion of SNPs for which the score attains genome-wide significance (defined as P<5e-08 for all scores except QADM, P<1e-05 for QADM), for random SNPs as well as SNPs in the top decile of population differences (Δ>0.4), for ε = 0.05, ε = 0.10, and ε = 0.20 simulations (see main text). For ε = 0, the power is 0 for all scores. Imputed Genotypes: The same 100,000 SNPs were masked, followed by imputation, and the imputed genotypes were scored and presented as in Typed Genotypes.