Marie-Claude Babron, Marie de Tayrac, Douglas N Rutledge, Eleftheria Zeggini, Emmanuelle Génin.
Abstract
Although variations in allele frequencies at common SNPs have been extensively studied in different populations, little is known about the stratification of rare variants and its impact on association tests. In this paper, we used Affymetrix 500K genotype data from the WTCCC to investigate whether variants in three different frequency categories (below 1%, between 1 and 5%, above 5%) show different stratification patterns in the UK population. We found that these patterns are indeed different. The top principal component extracted from the rare variant category shows poor correlations with any principal component or combination of principal components from the low frequency or common variant categories. These results could suggest that a suitable solution to avoid false-positive associations due to population stratification would involve adjusting for the respective PCs when testing for variants in different allele frequency categories. However, we found that this was not the case on either type 2 diabetes data or simulated data. Indeed, adjusting rare variant association tests on PCs derived from rare variants does no better to correct for population stratification than adjusting on PCs derived from more common variants. Mixed models perform slightly better than PC-based adjustments for low frequency variants but less well for the rarest variants. These results call for new methodological developments specifically devoted to addressing rare variant stratification issues in association tests.
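As an illustration of the three frequency categories described above, the following minimal sketch computes a minor allele frequency (MAF) from genotype counts and assigns a category label. The function names, the 0/1/2 genotype coding, and the handling of the exact 1% and 5% boundaries are assumptions made for this example; the abstract does not specify them.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one SNP.

    Assumes genotypes are coded as 0, 1 or 2 copies of a reference allele,
    with None marking a missing call (a hypothetical coding for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    # Each individual carries two alleles, hence the factor of 2.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1 - freq)


def maf_category(maf):
    """Assign one of the three frequency categories used in the abstract.

    Boundary handling (1% and 5% both placed in "lowfreq") is an assumption.
    """
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "lowfreq"
    return "common"


if __name__ == "__main__":
    snp = [0] * 94 + [1] * 6  # 6 minor alleles among 200 chromosomes
    maf = minor_allele_frequency(snp)
    print(maf, maf_category(maf))
```

Separating the frequency computation from the labelling keeps the thresholds in one place, so a different boundary convention only requires changing `maf_category`.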
Year: 2012 PMID: 23071581 PMCID: PMC3465327 DOI: 10.1371/journal.pone.0046519
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart showing the different QC steps and the number of SNPs in the different MAF categories.
Figure 2. Distribution of the number of rare (A) and low frequency (B) variants shared by several regions.
Figure 3. PCA for the different pruned MAF sets.
The mean of PC1 and PC2 scores (top row) and of PC2 and PC3 scores (bottom row) in each region are plotted considering the common variants, the low frequency variants and the rare variants.
Correlation (R² values) between the first two PCs (PC1.x and PC2.x) obtained on the different subsets of variants (x = “common”, “lowfreq” or “rare”).
| | PC1.common | PC2.common | PC1.lowfreq | PC2.lowfreq | PC1.rare | PC2.rare | Top10.common | Top10.lowfreq | Top10.rare |
|---|---|---|---|---|---|---|---|---|---|
| PC1.common | 1.00 | 0.00 | 0.14 | 0.44 | 0.00 | 0.07 | 1.00 | 0.59 | 0.13 |
| PC2.common | | 1.00 | 0.50 | 0.12 | 0.01 | 0.19 | 1.00 | 0.63 | 0.43 |
| PC1.lowfreq | | | 1.00 | 0.00 | 0.03 | 0.31 | 0.67 | 1.00 | 0.65 |
| PC2.lowfreq | | | | 1.00 | 0.00 | 0.00 | 0.57 | 1.00 | 0.00 |
| PC1.rare | | | | | 1.00 | 0.00 | 0.02 | 0.04 | 1.00 |
| PC2.rare | | | | | | 1.00 | 0.28 | 0.31 | 1.00 |
The last three columns (Top10.x) give the cumulative R² values over the top 10 PCs, showing how well the PC.x in each row is captured by the combined top 10 PCs of each subset.
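The cumulative R² in the Top10.x columns can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the PCs within one subset are mutually uncorrelated (as PCA scores are), in which case the R² of a regression on all of them equals the sum of the individual squared correlations. The function names are hypothetical.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equally long score vectors."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def cumulative_r_squared(target_pc, other_pcs):
    """Sum of R² between one PC and each PC of another subset.

    Valid as a joint R² only under the assumption that the PCs in
    other_pcs are mutually uncorrelated.
    """
    return fsum(r_squared(target_pc, pc) for pc in other_pcs)


if __name__ == "__main__":
    # Two toy orthogonal, mean-zero "PCs" and a target that is their sum.
    pcs = [[1, -1, 1, -1], [1, 1, -1, -1]]
    target = [2, 0, 0, -2]
    print(r_squared(target, pcs[0]))          # share explained by the first PC
    print(cumulative_r_squared(target, pcs))  # share explained by both PCs
```

If the PCs were not uncorrelated, a multiple regression would be needed instead of a simple sum.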
Genomic control coefficient λGC obtained for the different tests of association performed on the simulated scenario of stratification in regions R11 and R12.
| Common | LowFreq | Rare | Rare, MAF ≤ 0.005 | Rare, MAF > 0.005 |
|---|---|---|---|---|
| 0.993 | 1.023 | 1.802 | 1.802 | 1.103 |
| 1.006 | 1.058 | 2.193 | 2.193 | 1.121 |
| 0.993 | 1.042 | 2.072 | 2.137 | 1.067 |
| 0.993 | 1.042 | 2.052 | 2.129 | 1.066 |
| 0.990 | 1.048 | 1.927 | 2.053 | 1.104 |
| 0.978 | 1.027 | 1.675 | 1.880 | 1.100 |
| 0.995 | 1.044 | 1.969 | 2.085 | 1.072 |
| 0.995 | 1.041 | 1.945 | 2.044 | 1.064 |
| 0.992 | 1.033 | 1.922 | 2.022 | 1.071 |
| 0.980 | 1.012 | 1.767 | 1.931 | 1.112 |
| 0.993 | 1.040 | 2.085 | 2.152 | 1.089 |
| 0.991 | 1.043 | 2.081 | 2.143 | 1.090 |
| 0.989 | 1.032 | 1.960 | 2.074 | 1.081 |
| 0.974 | 1.014 | 1.858 | 2.013 | 1.072 |
| 1.001 | 1.052 | 2.182 | 2.182 | 1.115 |
| 1.001 | 1.052 | 2.182 | 2.182 | 1.115 |
| 1.000 | 1.048 | 2.165 | 2.180 | 1.114 |
Cochran-Mantel-Haenszel test accounting for the 2 different regions.
Test not corrected for population stratification.
Test corrected for population stratification using different numbers of PCs computed on different pruned MAF sets (e.g., common.2 means that the PCs were computed on the common variant set and that the test is adjusted on 2 such PCs).
Test performed using the mixed model implemented in EMMAX with the relatedness matrix computed on either the common, low frequency or rare variant sets.
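For readers unfamiliar with the genomic control coefficient reported in the table above, the sketch below shows the usual median-based definition: the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.455). Whether the authors used exactly this estimator is an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df, obtained as the square of
# the 75th percentile of the standard normal distribution (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Median-based lambda_GC from 1-df chi-square test statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_from_pvalues(p_values):
    """Same estimator, starting from two-sided p-values of 1-df tests."""
    normal = NormalDist()
    # Convert each p-value back to the chi-square statistic it came from.
    chi2_stats = [normal.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return genomic_control_lambda(chi2_stats)


if __name__ == "__main__":
    # Statistics whose median is twice the null median give lambda_GC = 2,
    # i.e. a level of inflation similar to the rare variant column above.
    inflated = [0.1, 2 * CHI2_1DF_MEDIAN, 5.0]
    print(genomic_control_lambda(inflated))
```

A value close to 1 indicates that the test statistics follow their expected null distribution, while values well above 1 indicate inflation such as that caused by uncorrected stratification.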