Peter K Joshi, James Prendergast, Ross M Fraser, Jennifer E Huffman, Veronique Vitart, Caroline Hayward, Ruth McQuillan, Dominik Glodzik, Ozren Polašek, Nicholas D Hastie, Igor Rudan, Harry Campbell, Alan F Wright, Chris S Haley, James F Wilson, Pau Navarro.
Abstract
The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1–10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28–38% for SNPs with a minor allele frequency in the range 1–3%.
Year: 2013 PMID: 23874685 PMCID: PMC3712964 DOI: 10.1371/journal.pone.0068604
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Preparation of array data and local reference panel for imputation.
The genotype data were quality controlled and phased. These data were then used in further downstream analysis.
Figure 2. Illustration of the procedure to estimate imputation accuracy.
We used a drop-one-out cross-validation approach. For the imputation step, each subject was removed from the local reference panel in turn, and that subject's exome sequence SNPs were then imputed using either the 1000 Genomes reference panel alone or in conjunction with the local reference panel. All subjects' imputed allelic dosages were then compared with the exome sequence genotype data (the "gold standard").
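The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: `drop_one_out`, `dosage_r2`, and the `impute` callback are hypothetical names, and the real imputation software is not reproduced here. The accuracy measure is taken to be the squared Pearson correlation between imputed dosage and sequenced genotype across samples at one SNP.

```python
def drop_one_out(subjects, impute):
    """Impute each subject using a panel that excludes that subject.

    `impute(subject, panel)` is a hypothetical callback standing in for
    the real imputation step; it returns whatever the caller wants to
    store for that subject (for example, a list of allelic dosages).
    """
    results = {}
    for subject in subjects:
        # Remove the subject from the local reference panel before imputing.
        panel = [other for other in subjects if other != subject]
        results[subject] = impute(subject, panel)
    return results


def dosage_r2(imputed, truth):
    """Squared Pearson correlation between imputed dosages (0..2) and
    sequenced genotypes (0, 1, 2) across samples at a single SNP."""
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed)
    syy = sum((y - mean_y) ** 2 for y in truth)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    if sxx == 0 or syy == 0:
        # Monomorphic in this sample: the correlation is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)
```

In use, `dosage_r2` would be called once per SNP for each reference-panel configuration, and the per-SNP values then averaged within each MAF bin.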
Mean accuracy of imputation (r2 of allelic dosage across all samples for a SNP) averaged across SNPs split by Minor Allele Frequency (MAF).
| MAF | 1–3.2% | 1–3.2% | 3.2–10% | 3.2–10% | 10–32% | 10–32% | >32% | >32% |
|---|---|---|---|---|---|---|---|---|
| Population | Korčula | Orkney | Korčula | Orkney | Korčula | Orkney | Korčula | Orkney |
| N SNPs | 12132 | 12123 | 11548 | 10677 | 16243 | 15262 | 10174 | 9265 |
| 1kG | 0.504 | 0.586 | 0.729 | 0.778 | 0.868 | 0.894 | 0.894 | 0.913 |
| 1kG+LRP | 0.697 | 0.753 | 0.841 | 0.867 | 0.916 | 0.931 | 0.934 | 0.944 |
| Increase r2 | 0.193 | 0.167 | 0.112 | 0.089 | 0.049 | 0.037 | 0.039 | 0.031 |
| Std dev | 0.309 | 0.295 | 0.182 | 0.157 | 0.093 | 0.078 | 0.074 | 0.065 |
| Inc. Sample | 38% | 28% | 15% | 11% | 6% | 4% | 4% | 1% |
MAF bins increase by factors of √10, to create four exponentially increasing bins.
N SNPs: number of SNPs in MAF bin.
1kG: 1000 Genomes used as reference panel.
1kG+LRP: 1000 Genomes plus local reference panel.
Increase r2: Mean increase in r2 across all SNPs in the MAF bin.
Std dev: The standard deviation (across SNPs) of the increase in r2 at each SNP.
Inc. Sample: Increase in effective sample size for GWAS.
The standard errors of mean increases are less than 0.003. All improvements in r2 are significantly different from zero and significantly different between MAF bands (P<0.001, two-sided t tests).
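The binning rule and the sample-size figures in the notes above can be illustrated with a short sketch. It rests on two assumptions that the notes do not spell out: that each bin includes its lower edge, and that effective sample size scales in proportion to imputation r2 (a common approximation), so the gain is the ratio of the two r2 values minus one. The names `BIN_EDGES`, `LABELS`, `maf_bin`, and `sample_size_gain` are hypothetical. Applying the ratio to the bin means in the table is itself an approximation; the published percentages may have been derived differently.

```python
from math import sqrt

# Lower edges of the four bins, each a factor of sqrt(10) above the last:
# roughly 1%, 3.2%, 10% and 32%.
BIN_EDGES = [0.01 * sqrt(10) ** k for k in range(4)]
LABELS = ["1–3.2%", "3.2–10%", "10–32%", ">32%"]


def maf_bin(maf):
    """Return the MAF bin label, or None for MAF below 1%.

    Assumption: bins are closed on the left; values that sit exactly on
    an edge are sensitive to floating-point rounding.
    """
    if maf < BIN_EDGES[0]:
        return None
    index = sum(1 for edge in BIN_EDGES if maf >= edge) - 1
    return LABELS[index]


def sample_size_gain(r2_base, r2_new):
    """Fractional increase in effective sample size, assuming effective
    sample size is proportional to imputation r2."""
    if r2_base <= 0:
        raise ValueError("baseline r2 must be positive")
    return r2_new / r2_base - 1.0


# Korčula, MAF 1–3.2%: mean r2 rises from 0.504 (1kG) to 0.697 (1kG+LRP).
gain = sample_size_gain(0.504, 0.697)
print(f"{gain:.0%}")  # prints "38%"
```

The same calculation on the Korčula 3.2–10% column (0.729 to 0.841) gives about 15%, in line with the table.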
Figure 3. Frequency plot of imputation accuracy (r2) using 1000 Genomes data alone against 1000 Genomes plus a local reference panel for SNPs with Minor Allele Frequencies (MAF) of 1–3.2%.
Figure 4. Plot of mean improvement in imputation accuracy (r2) for SNPs with minor allele frequency (MAF) in the range 1–10% in our exome sequence data.