Ehsan Ullah, Raghvendra Mall, Mostafa M Abbas, Khalid Kunji, Alejandro Q Nato, Halima Bensmail, Ellen M Wijsman, Mohamad Saad.
Abstract
Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score (IQS) and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
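The two accuracy metrics named in the abstract can be illustrated with a minimal sketch for a single variant. It rests on assumptions not stated in this record: genotypes are coded as minor-allele counts 0, 1, or 2; R² is taken as the squared Pearson correlation between true and imputed genotypes; and IQS is taken in its commonly described chance-corrected form, (Po - Pc) / (1 - Pc), where Po is the observed concordance and Pc the concordance expected by chance from the marginal genotype frequencies. The function names `pearson_r2` and `iqs` are hypothetical and do not come from any of the tools compared in the study.

```python
from collections import Counter


def pearson_r2(true_geno, imputed_geno):
    """Squared Pearson correlation between true and imputed genotypes.

    Assumption: genotypes (or dosages) are numeric minor-allele counts.
    Returns NaN when either vector has zero variance.
    """
    n = len(true_geno)
    mean_t = sum(true_geno) / n
    mean_i = sum(imputed_geno) / n
    cov = sum((t - mean_t) * (g - mean_i) for t, g in zip(true_geno, imputed_geno))
    var_t = sum((t - mean_t) ** 2 for t in true_geno)
    var_i = sum((g - mean_i) ** 2 for g in imputed_geno)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov * cov / (var_t * var_i)


def iqs(true_geno, imputed_geno):
    """Chance-corrected concordance on hard-call genotypes (0, 1, 2).

    Assumption: IQS = (Po - Pc) / (1 - Pc), a kappa-style statistic.
    Returns NaN when chance agreement is already 1 (monomorphic calls).
    """
    n = len(true_geno)
    # Po: fraction of subjects whose imputed genotype matches the truth.
    p_obs = sum(t == g for t, g in zip(true_geno, imputed_geno)) / n
    # Pc: agreement expected if calls were independent given the marginals.
    true_counts = Counter(true_geno)
    imp_counts = Counter(imputed_geno)
    p_chance = sum(true_counts[k] * imp_counts[k] for k in (0, 1, 2)) / (n * n)
    if p_chance == 1:
        return float("nan")
    return (p_obs - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    imputed = [0, 0, 0, 1]
    print(pearson_r2(truth, imputed), iqs(truth, imputed))
```

In this toy example the raw concordance is 0.75, but half of that agreement is expected by chance, which is why a chance-corrected score is lower than the raw concordance. Per-variant values like these would then be averaged across variants, for instance within minor allele frequency bins.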
Year: 2018 PMID: 30514702 PMCID: PMC6314157 DOI: 10.1101/gr.236315.118
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Mean correlation R² and IQS between true and imputed genotypes for all approaches, using the random selection strategy: (A) R² for EUR; (B) R² for AFR; (C) IQS for EUR; (D) IQS for AFR. The first/second of a pair of programs in the key indicates phasing/imputation functions. Computation of the mean of R² and IQS is based on all 100 genetic data sets with a sample size of 960 subjects, each having 7954 SNPs for EUR and 10,891 SNPs for AFR.
Figure 2. Mean correlation R² between true and imputed genotypes for SHAPEIT+minimac, duoHMM+minimac, SHAPEIT+IMPUTE, duoHMM+IMPUTE, Eagle+minimac, and Eagle+IMPUTE, using the random selection strategy: (A) EUR; (B) AFR. The first/second of a pair of programs indicates phasing/imputation functions. Computation of the mean of R² is based on all 100 genetic data sets with a sample size of 960 subjects, each having 7954 SNPs for EUR and 10,891 SNPs for AFR.
Power of association tests performed in European and African data for different combinations of phasing+imputation approaches using the random selection strategy for = 0.05
Figure 3. Mean correlation R² between true and imputed genotypes for the four selection strategies (GIGI-Pick, ExomePicks, PRIMUS, and random selection) for Ped_Pop, GIGI, and duoHMM+minimac: (A) EUR; (B) AFR. Computation of the mean of R² is based on all 100 genetic data sets with a sample size of 960 subjects, each having 7954 SNPs for EUR and 10,891 SNPs for AFR.
Phasing and imputation approach summary