Peng Lin, Sarah M Hartz, Zhehao Zhang, Scott F Saccone, Jia Wang, Jay A Tischfield, Howard J Edenberg, John R Kramer, Alison M Goate, Laura J Bierut, John P Rice.
Abstract
BACKGROUND: As the amount of data from genome-wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, imputation has been problematic in two situations: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets in which subjects are genotyped on different platforms. Traditional measures of imputation quality cannot effectively address these problems. METHODOLOGY/PRINCIPAL …
Year: 2010 PMID: 20300623 PMCID: PMC2837741 DOI: 10.1371/journal.pone.0009697
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Marginal cross classification of the genotypes used for the computation of IQS.
| Imputed genotypes (rows) / True genotypes (columns) | AA | AB | BB | Total |
|---|---|---|---|---|
| AA | n11 | n12 | n13 | n1. |
| AB | n21 | n22 | n23 | n2. |
| BB | n31 | n32 | n33 | n3. |
| Total | n.1 | n.2 | n.3 | N |
IQS adjusts for minor allele frequency by comparing observed frequencies to expected frequencies.
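A minimal sketch of the computation, assuming IQS takes the Cohen's-kappa form suggested by "observed versus expected frequencies": with n_ij the number of samples whose imputed genotype is i and true genotype is j, the observed agreement is Po = (n11 + n22 + n33)/N, the chance agreement is Pc = (n1.·n.1 + n2.·n.2 + n3.·n.3)/N², and IQS = (Po − Pc)/(1 − Pc). The function name `iqs` and the example counts are hypothetical.

```python
from typing import Sequence


def iqs(table: Sequence[Sequence[float]]) -> float:
    """Chance-corrected imputation quality score from a k x k table.

    table[i][j] = number of samples imputed as genotype class i whose true
    genotype is class j (classes ordered AA, AB, BB for a biallelic SNP).
    Assumes the Cohen's-kappa form (Po - Pc) / (1 - Pc).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    row_totals = [sum(table[i]) for i in range(k)]  # imputed-genotype marginals
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]  # true-genotype marginals
    p_observed = sum(table[i][i] for i in range(k)) / n  # concordance rate
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    if p_chance == 1.0:
        # Imputed and true genotypes all fall in one class: IQS would be 0/0.
        raise ValueError("IQS is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts for one SNP (rows: imputed AA/AB/BB; columns: true AA/AB/BB).
example = [
    [80, 4, 0],
    [2, 12, 1],
    [0, 0, 1],
]
print(f"concordance = {(80 + 12 + 1) / 100:.2f}, IQS = {iqs(example):.3f}")
```

For these counts the concordance is 0.93, while Pc = 0.713, so IQS = 0.217/0.287, roughly 0.76: the score discounts the agreement that the dominant AA class would produce by chance.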
Summary of evaluation measures for European American and African American samples.
| Measure | European Americans | African Americans |
|---|---|---|
| No. of imputed SNPs | 260,908 | 304,425 |
| Efficiency (%) | 94.5 | 85.1 |
| Imputation accuracy, mean (%) | 98.8 | 97.1 |
| Imputation accuracy, range (%) | 0.0 to 100.0 | 0.0 to 100.0 |
| Imputation accuracy, inter-quartile range (%) | 98.8 to 99.9 | 96.3 to 99.5 |
| IQS, mean (%) | 90.2 | 78.3 |
| IQS, range (%) | −9.1 to 100 | −7.9 to 100 |
| IQS, inter-quartile range (%) | 90.7 to 99.2 | 68.4 to 94.3 |
Figure 1. The means of IQS and imputation accuracy within each minor allele frequency interval.
IQS adjusts for chance agreement. As the minor allele frequency approaches 0, the difference between IQS and imputation accuracy increases. The standard deviation is shown for every other point.
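To illustrate why the gap widens as MAF approaches 0, the sketch below uses a hypothetical naive imputer that calls every sample as the major homozygote AA, with true genotype counts drawn from Hardy-Weinberg proportions. Its accuracy equals the AA frequency, which approaches 1 for rare variants, while its kappa-form IQS (an assumed form, as above) stays at 0 because all of that agreement is expected by chance. The helper names are illustrative only.

```python
from typing import List


def iqs(table: List[List[int]]) -> float:
    """Kappa-form IQS: (Po - Pc) / (1 - Pc) from a 3x3 imputed-by-true table."""
    n = sum(map(sum, table))
    k = len(table)
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    po = sum(table[i][i] for i in range(k)) / n
    pc = sum(rows[i] * cols[i] for i in range(k)) / n**2
    return (po - pc) / (1.0 - pc)


def naive_major_call_table(maf: float, n: int = 10_000) -> List[List[int]]:
    """Table for a hypothetical imputer that calls every sample AA.

    True genotype counts follow Hardy-Weinberg proportions for the given MAF;
    maf must be > 0, otherwise IQS is 0/0.
    """
    aa = round(n * (1 - maf) ** 2)
    ab = round(n * 2 * maf * (1 - maf))
    bb = n - aa - ab
    return [[aa, ab, bb], [0, 0, 0], [0, 0, 0]]


for maf in (0.3, 0.1, 0.01):
    t = naive_major_call_table(maf)
    accuracy = t[0][0] / sum(t[0])  # concordance with the true genotypes
    print(f"MAF={maf}: accuracy={accuracy:.4f}, IQS={iqs(t):.4f}")
```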
Figure 2. Q-Q plots based on randomly dividing the data into cases and controls.
Samples were divided randomly into cases and controls. (A) All Illumina 1 M SNPs are directly genotyped; the Q-Q plot indicates no population stratification or other non-random differences between cases and controls. (B) Cases were genotyped on the Illumina 550 K array, and the remaining Illumina 1 M SNPs were imputed. (C) An IQS filter (IQS>0.9) was applied, retaining 92% of the SNPs. (D) An imputation accuracy filter (accuracy>0.99) was applied, retaining 91% of the SNPs.
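Under a random case/control split, association p-values should be uniform on (0, 1), so a Q-Q plot of observed against expected −log10(p) should follow the diagonal; imputation artifacts show up as an upward departure. The sketch below computes the plotted points, assuming the common (i − 0.5)/n plotting position for the expected quantiles; `qq_points` and the p-values are hypothetical, and the plotting itself is left to any charting library.

```python
import math
from typing import List, Tuple


def qq_points(p_values: List[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(p) pairs for a Q-Q plot.

    Under the null hypothesis p-values are uniform on (0, 1); the i-th
    smallest of n is paired with the plotting position (i - 0.5) / n.
    All p-values must be strictly positive.
    """
    n = len(p_values)
    observed = sorted(p_values)  # most significant first
    return [
        (-math.log10((rank + 0.5) / n), -math.log10(p))  # rank is 0-based
        for rank, p in enumerate(observed)
    ]


# Hypothetical association p-values (e.g. SNPs that passed an IQS filter).
p_values = [0.62, 0.03, 0.47, 0.0009, 0.21]
for expected, observed in qq_points(p_values):
    print(f"{expected:.3f}\t{observed:.3f}")
```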
Comparison of empirical evaluations of imputation quality to IQS in European Americans.
| Filter | False positives, n (retained %), MAF >0.01 | False positives, n (retained %), MAF >0.05 | False positives, n (retained %), MAF >0.10 |
|---|---|---|---|
| IQS >0.9 | 0 (89.47%) | 0 (83.90%) | 0 (72.92%) |
| No filter | 3120 (96.63%) | 2331 (89.48%) | 1775 (77.47%) |
| Proper_info >0.5 | 3093 (96.62%) | 2329 (89.48%) | 1775 (77.47%) |
| Proper_info >0.7 | 2726 (96.32%) | 2080 (89.28%) | 1571 (77.31%) |
| Proper_info >0.9 | 1392 (94.16%) | 1032 (87.67%) | 805 (76.06%) |
| Variance Ratio >0.3 | 1869 (96.22%) | 1526 (89.27%) | 1234 (77.33%) |
| Variance Ratio >0.5 | 1226 (95.65%) | 928 (88.89%) | 770 (77.04%) |
| Variance Ratio >0.7 | 789 (94.57%) | 514 (88.12%) | 390 (76.47%) |
| Variance Ratio >0.9 | 498 (90.40%) | 253 (85.00%) | 153 (74.14%) |
| MAF difference <0.01 | 267 (22.89%) | 120 (19.63%) | 76 (15.60%) |
| MAF difference <0.1 | 2516 (95.11%) | 1739 (87.97%) | 1191 (75.94%) |
| MAF difference <0.2 | 2952 (96.57%) | 2168 (89.42%) | 1615 (77.38%) |
The sample comprises 2,597 European Americans randomly assigned to cases and controls. For cases, genotypes from the Illumina 550 K platform were imputed to the 1 M platform; controls were genotyped on the 1 M platform. Genome-wide significance was set at p<5×10⁻⁸. There were 792,563 SNPs available. False positives are the number of SNPs that passed the filter and still reached genome-wide significance; because case status was assigned at random, every such SNP is a false positive. The retained percentage is the proportion of SNPs that passed the filter.
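The evaluation in the table can be sketched as follows: apply a MAF threshold and a quality filter, count surviving SNPs with p<5×10⁻⁸ as false positives, and report the share of SNPs retained. The `SnpResult` record, `evaluate_filter`, and the per-SNP values are hypothetical, and taking the denominator of the retained percentage as all supplied SNPs is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold from the table note


@dataclass
class SnpResult:
    snp_id: str
    maf: float      # minor allele frequency
    p_value: float  # case/control association p-value
    iqs: float      # imputation quality score


def evaluate_filter(
    results: List[SnpResult],
    passes: Callable[[SnpResult], bool],
    min_maf: float,
) -> Tuple[int, float]:
    """Return (false positives, retained %) for one filter and MAF threshold.

    Cases and controls were assigned at random, so every genome-wide
    significant SNP that survives the filter counts as a false positive.
    Retained % is relative to all SNPs supplied (assumed denominator).
    """
    kept = [r for r in results if r.maf > min_maf and passes(r)]
    false_positives = sum(1 for r in kept if r.p_value < GENOME_WIDE_ALPHA)
    retained_pct = 100.0 * len(kept) / len(results)
    return false_positives, retained_pct


# Hypothetical per-SNP results.
results = [
    SnpResult("snp1", 0.20, 1e-9, 0.50),
    SnpResult("snp2", 0.20, 0.30, 0.95),
    SnpResult("snp3", 0.005, 1e-10, 0.99),
    SnpResult("snp4", 0.30, 0.04, 0.97),
]
filters = {
    "No filter": lambda r: True,
    "IQS >0.9": lambda r: r.iqs > 0.9,
}
for name, passes in filters.items():
    fp, pct = evaluate_filter(results, passes, min_maf=0.01)
    print(f"{name}: {fp} false positives ({pct:.2f}% retained)")
```

Other rows of the table would use different `passes` predicates (for example a threshold on an info or variance-ratio measure), which this sketch does not compute.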
Figure 3. Evaluation of the robustness of the IQS.
The European American (A) and African American (B) datasets were each split in half, and Illumina 550 K SNPs were imputed to Illumina 1 M SNPs. The IQS values from the two halves were plotted against each other. SNPs with minor allele frequency below 0.01 were excluded to avoid a zero denominator.
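A minimal split-half sketch, assuming each sample already carries an imputed call and a true call for the SNP (the imputation step itself is out of scope) and again assuming the kappa form of IQS. Samples are shuffled with a fixed seed, IQS is computed on each half, and SNPs with MAF below 0.01 are skipped, since near-monomorphic SNPs can make 1 − Pc zero. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

CLASSES = ("AA", "AB", "BB")
Pair = Tuple[str, str]  # (imputed genotype, true genotype)


def iqs_from_pairs(pairs: Sequence[Pair]) -> Optional[float]:
    """Kappa-form IQS from per-sample genotype pairs; None when undefined."""
    n = len(pairs)
    if n == 0:
        return None
    index = {g: i for i, g in enumerate(CLASSES)}
    table = [[0] * 3 for _ in range(3)]
    for imputed, true in pairs:
        table[index[imputed]][index[true]] += 1
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(3)) for j in range(3)]
    p_obs = sum(table[i][i] for i in range(3)) / n
    p_chance = sum(rows[i] * cols[i] for i in range(3)) / n**2
    if p_chance == 1.0:  # zero denominator
        return None
    return (p_obs - p_chance) / (1.0 - p_chance)


def minor_allele_frequency(genotypes: Sequence[str]) -> float:
    freq_b = sum(g.count("B") for g in genotypes) / (2 * len(genotypes))
    return min(freq_b, 1.0 - freq_b)


def split_half_iqs(
    pairs: Sequence[Pair], seed: int = 0, min_maf: float = 0.01
) -> Optional[Tuple[float, float]]:
    """IQS computed separately on two random halves of the samples."""
    if minor_allele_frequency([true for _, true in pairs]) < min_maf:
        return None  # mirrors the MAF < 0.01 exclusion
    shuffled: List[Pair] = list(pairs)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first = iqs_from_pairs(shuffled[:half])
    second = iqs_from_pairs(shuffled[half:])
    if first is None or second is None:
        return None
    return first, second


# Hypothetical genotype pairs for one SNP.
pairs = [("AA", "AA")] * 45 + [("AB", "AA")] * 5 + [("AB", "AB")] * 35 + [("BB", "BB")] * 15
print(split_half_iqs(pairs))
```

Repeating this over many SNPs and plotting the first half's IQS against the second's gives the kind of scatter described for Figure 3.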
Figure 4. A database of IQS values can be used to filter poorly imputed SNPs.
The set of hard-to-impute SNPs compiled from one dataset can be used to filter the imputed data in another dataset. (A) Cases were European Americans genotyped on the Illumina 550 K array, and the remaining Illumina 1 M SNPs were imputed. Controls were European Americans genotyped on the Illumina 1 M array. The Q-Q plot is shown for the 790,965 available SNPs. (B) An IQS filter (IQS>0.9) was applied, retaining 92% of the SNPs; IQS was calculated from an independent dataset. (C) The corresponding Q-Q plot for African Americans. Cases were genotyped on the Illumina 550 K array, and the remaining Illumina 1 M SNPs were imputed. Controls were genotyped on the Illumina 1 M array. The Q-Q plot is shown for the 836,993 available SNPs. (D) An IQS filter (IQS>0.9) was applied, retaining 78% of the SNPs; IQS was calculated from an independent dataset.
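A sketch of reusing IQS across datasets: scores computed in a reference dataset define a hard-to-impute set (IQS ≤ 0.9), and a second dataset's imputed SNPs are filtered against it. The function names, SNP identifiers, and scores are hypothetical, and how to handle SNPs with no reference score is a policy choice the caption does not specify, so it is exposed as the `keep_unscored` flag.

```python
from typing import Dict, Iterable, List, Set


def hard_to_impute(reference_iqs: Dict[str, float], threshold: float = 0.9) -> Set[str]:
    """SNPs whose IQS in the reference dataset does not exceed the threshold."""
    return {snp for snp, score in reference_iqs.items() if score <= threshold}


def filter_target(
    target_snps: Iterable[str],
    reference_iqs: Dict[str, float],
    threshold: float = 0.9,
    keep_unscored: bool = False,
) -> List[str]:
    """Keep target-dataset SNPs that were well imputed in the reference dataset.

    SNPs with no IQS in the reference are dropped unless keep_unscored=True.
    """
    hard = hard_to_impute(reference_iqs, threshold)
    return [
        snp for snp in target_snps
        if snp not in hard and (keep_unscored or snp in reference_iqs)
    ]


reference = {"snp1": 0.95, "snp2": 0.42, "snp3": 0.91}  # hypothetical IQS database
target = ["snp1", "snp2", "snp4"]
kept = filter_target(target, reference)
print(kept, f"{100 * len(kept) / len(target):.0f}% retained")
```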