Joanna M Biernacka, Rui Tang, Jia Li, Shannon K McDonnell, Kari G Rabe, Jason P Sinnwell, David N Rider, Mariza de Andrade, Ellen L Goode, Brooke L Fridley.
Abstract
Several methods have been proposed to impute genotypes at untyped markers using observed genotypes and genetic data from a reference panel. We used the Genetic Analysis Workshop 16 rheumatoid arthritis case-control dataset to compare the performance of four of these imputation methods: IMPUTE, MACH, PLINK, and fastPHASE. We compared the methods' imputation error rates and performance of association tests using the imputed data, in the context of imputing completely untyped markers as well as imputing missing genotypes to combine two datasets genotyped at different sets of markers. As expected, all methods performed better for single-nucleotide polymorphisms (SNPs) in high linkage disequilibrium with genotyped SNPs. However, MACH and IMPUTE generated lower imputation error rates than fastPHASE and PLINK. Association tests based on allele "dosage" from MACH and tests based on the posterior probabilities from IMPUTE provided results closest to those based on complete data. However, in both situations, none of the imputation-based tests provide the same level of evidence of association as the complete data at SNPs strongly associated with disease.
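The abstract refers to allele "dosage", posterior genotype probabilities, and imputation error rates without defining them here. The following is a minimal sketch of one common way these quantities are computed, not the authors' code. It assumes each imputed genotype is given as three posterior probabilities (0, 1, or 2 copies of the counted allele) and that the error rate is the fraction of best-guess calls that differ from the true genotype; the function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed representation: P(0 copies), P(1 copy), P(2 copies) of the counted allele.
Probs = Tuple[float, float, float]


def allele_dosage(probs: Probs) -> float:
    """Expected number of copies of the counted allele (between 0 and 2)."""
    _, p1, p2 = probs
    return p1 + 2.0 * p2


def best_guess(probs: Probs) -> int:
    """Genotype (0, 1, or 2 copies) with the highest posterior probability."""
    return max(range(3), key=lambda g: probs[g])


def error_rate(true_genotypes: Sequence[int], posteriors: Sequence[Probs]) -> float:
    """Fraction of best-guess calls that differ from the true genotype.

    This definition is an assumption for illustration only.
    """
    if len(true_genotypes) != len(posteriors):
        raise ValueError("inputs must have the same length")
    if not true_genotypes:
        raise ValueError("at least one genotype is required")
    wrong = sum(1 for t, p in zip(true_genotypes, posteriors) if best_guess(p) != t)
    return wrong / len(true_genotypes)


# Made-up toy data: four individuals at one masked SNP.
posteriors = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.05, 0.15, 0.8), (0.6, 0.3, 0.1)]
truth = [0, 1, 2, 1]

dosages = [allele_dosage(p) for p in posteriors]
rate = error_rate(truth, posteriors)
```

A dosage keeps the uncertainty of the imputation (the last individual gets 0.5 rather than a hard call of 0), which is one reason dosage-based or probability-based tests can behave differently from tests on best-guess genotypes.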
Year: 2009 PMID: 20018042 PMCID: PMC2795949 DOI: 10.1186/1753-6561-3-s7-s5
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Mean error rates by imputation method and scenario
| Scenario | Stratum | Level | IMPUTE | MACH | PLINK | fastPHASE |
|---|---|---|---|---|---|---|
| Scenario 1 | Overall | | 0.112 | 0.114 | 0.142 | 0.135 |
| | By region (a) | null1 | 0.251 | 0.251 | 0.284 | 0.271 |
| | | null2 | 0.066 | 0.066 | 0.090 | 0.085 |
| | | | 0.083 | 0.092 | 0.131 | 0.111 |
| | | | 0.106 | 0.109 | 0.144 | 0.162 |
| | | | 0.099 | 0.098 | 0.122 | 0.107 |
| | | | 0.061 | 0.059 | 0.069 | 0.058 |
| | By max pairwise LD | | 0.208 | 0.212 | 0.245 | 0.248 |
| | | | 0.030 | 0.030 | 0.053 | 0.038 |
| Scenario 2 | Overall | | 0.116 | 0.112 | 0.173 | 0.127 |
| | By region (a) | null1 | 0.206 | 0.201 | 0.250 | 0.218 |
| | | null2 | 0.123 | 0.122 | 0.175 | 0.139 |
| | | | 0.079 | 0.069 | 0.145 | 0.097 |
| | | | 0.055 | 0.053 | 0.121 | 0.053 |
| | By max pairwise LD | | 0.200 | 0.197 | 0.256 | 0.211 |
| | | | 0.046 | 0.041 | 0.105 | 0.059 |

(a) In Scenario 1, for regions PADI4-1 and PTPN22-1, the most strongly associated SNP was removed and imputed, while for regions PADI4-2 and PTPN22-2, the two SNPs flanking the most strongly associated SNP were imputed, in addition to other SNPs as described in the methods.
Figure 1. Imputation error rates decline with increasing LD (scenario 2).
Mean (SD) difference (a) in -log10(p-value) between a test of association using complete data and a test of association using the imputed data

| Method | Scenario 1 | Scenario 2 |
|---|---|---|
| IMPUTE | 0.352 (1.26) | 0.078 (0.493) |
| MACH | 0.363 (1.27) | 0.093 (0.543) |
| PLINK | 0.509 (1.55) | 0.054 (0.617) |
| fastPHASE | 0.483 (1.70) | 0.046 (0.633) |

(a) Difference = (imputed data -log10(p-value)) - (complete data -log10(p-value))
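The difference defined in the footnote above can be computed per SNP and then summarized as a mean and SD. The sketch below illustrates that arithmetic only; the p-values are made up, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the record does not say which estimator was used.

```python
import math
import statistics
from typing import List, Sequence


def neg_log10(p: float) -> float:
    """Return -log10(p) for a p-value in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p)


def log_p_differences(p_imputed: Sequence[float], p_complete: Sequence[float]) -> List[float]:
    """Per-SNP difference: (imputed -log10 p) - (complete -log10 p)."""
    if len(p_imputed) != len(p_complete):
        raise ValueError("inputs must have the same length")
    return [neg_log10(pi) - neg_log10(pc) for pi, pc in zip(p_imputed, p_complete)]


# Made-up p-values for three SNPs, tested once with complete and once with imputed data.
p_complete = [1e-6, 0.01, 0.5]
p_imputed = [1e-4, 0.01, 0.05]

diffs = log_p_differences(p_imputed, p_complete)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator); an assumption
```

Under this sign convention, a negative difference at a SNP means the imputed data gave weaker evidence of association than the complete data at that SNP.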
Figure 2. Comparison of association test results (-log10(.
Figure 3. Association test results (-log10(.