Haiko Schurz, Stephanie J Müller, Paul David van Helden, Gerard Tromp, Eileen G Hoal, Craig J Kinnear, Marlo Möller.
Abstract
Genotype imputation is a powerful tool for increasing statistical power in an association analysis. Meta-analysis of multiple study datasets also requires a substantial overlap of SNPs for a successful association analysis, which can be achieved by imputation. The quality of imputed datasets depends largely on the software used, as well as on the reference populations chosen. The imputation accuracy of the available reference populations has not been tested for the five-way admixed South African Colored (SAC) population. In this study, imputation results obtained using three freely accessible methods were evaluated for accuracy and quality. We show that the African Genome Resource is the best reference panel for imputation of missing genotypes in samples from the SAC population, implemented via the freely accessible Sanger Imputation Server.
Keywords: 1000 Genomes; AGR; African; CAAPA; accuracy; admixture; imputation; quality
Year: 2019 PMID: 30804980 PMCID: PMC6370942 DOI: 10.3389/fgene.2019.00034
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Haplotype phasing and genotype imputation methods used.
| Protocol number | Server | Reference Panel | Phasing software | Imputation software |
|---|---|---|---|---|
| 1 | In-house | 1000G | ShapeITv2 | IMPUTE2 |
| 2 | Sanger Imputation Server | 1000G | ShapeITv2 | PBWT |
| 3 | Sanger Imputation Server | AGR | ShapeITv2 | PBWT |
| 4 | Michigan Imputation Server | 1000G | ShapeITv2 | Minimac3 |
| 5 | Michigan Imputation Server | CAAPA | ShapeITv2 | Minimac3 |
Number of imputed variants, number of variants overlapping with MEGA, and percentage of calls that did not reach the genotype calling threshold (0.7). Imputed SNP counts are given in millions; overlapping SNP counts are given in tens of thousands.
| Method | Reference | Autosomes: imputed (millions) | Autosomes: overlap (×10,000) | X chromosome: imputed (millions) | X chromosome: overlap (×10,000) | % No calls |
|---|---|---|---|---|---|---|
| In-house | 1000G | 57.8 | 71.8 | 2.5 | 3.98 | 25.46 |
| SIS | 1000G | 48.7 | 46.7 | 1.7 | 1.01 | 35.89 |
| SIS | AGR | 50.5 | 60.6 | 1.6 | 1.43 | 44.18 |
| MIS | 1000G | 28.6 | 47.8 | 1.3 | 2.79 | 35.22 |
| MIS | CAAPA | 16.9 | 34.3 | NA | NA | 43.40 |
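The "% No calls" column depends on the genotype calling threshold of 0.7 named in the caption. The following is a minimal sketch of how such a threshold could be applied, assuming each imputed genotype comes as three posterior probabilities (for 0, 1, or 2 copies of the alternate allele) and that a call is kept when the highest probability is at least the threshold. The function names and the input layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Iterable, Optional, Sequence

CALL_THRESHOLD = 0.7  # value stated in the table caption


def call_genotype(probs: Sequence[float],
                  threshold: float = CALL_THRESHOLD) -> Optional[int]:
    """Return the index of the most probable genotype, or None for a no-call.

    Assumption: probs holds posterior probabilities for 0, 1, and 2 copies
    of the alternate allele, and ">=" is the comparison used.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def percent_no_calls(all_probs: Iterable[Sequence[float]]) -> float:
    """Percentage of genotypes whose best probability falls below the threshold."""
    calls = [call_genotype(p) for p in all_probs]
    if not calls:
        return 0.0
    return 100.0 * sum(c is None for c in calls) / len(calls)
```

For example, `call_genotype([0.5, 0.4, 0.1])` yields a no-call because no genotype reaches 0.7, while `call_genotype([0.05, 0.9, 0.05])` is called as heterozygous.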
Genome wide error rate and accuracy of imputation on the autosomes and X chromosome.
| Method | Reference | Accuracy in overlap, autosomes (%) | Accuracy in overlap, X chromosome (%) | GW error rate in overlap (%) |
|---|---|---|---|---|
| In-house | 1000G | 88.00 | 87.93 | 11.98 |
| SIS | 1000G | 87.15 | 88.12 | 12.83 |
| SIS | AGR | 89.27 | 90.21 | 10.70 |
| MIS | 1000G | 83.68 | 69.89 | 17.084 |
| MIS | CAAPA | 62.39 | NA | 37.61 |
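The record does not spell out how accuracy and error rate in the overlap were computed. One plausible reading is genotype concordance: the percentage of overlapping SNP genotypes where the imputed call matches the array genotype, with the error rate as its complement. The sketch below illustrates that reading only; the function name, the genotype encoding (0, 1, 2, or `None` for a missing call), and the choice to skip missing calls are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def concordance(imputed: Sequence[Optional[int]],
                genotyped: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (accuracy %, error rate %) over sites called in both datasets.

    Assumption: genotypes are encoded as alternate allele counts (0, 1, 2)
    and None marks a missing call, which is excluded from the comparison.
    """
    if len(imputed) != len(genotyped):
        raise ValueError("inputs must cover the same overlapping sites")
    pairs = [(i, g) for i, g in zip(imputed, genotyped)
             if i is not None and g is not None]
    if not pairs:
        raise ValueError("no sites with calls in both datasets")
    matches = sum(i == g for i, g in pairs)
    accuracy = 100.0 * matches / len(pairs)
    return accuracy, 100.0 - accuracy


# Illustrative toy data, not values from the study.
acc, err = concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1])
print(f"accuracy={acc:.2f}% error={err:.2f}%")
```

Under this definition the two percentages always sum to 100, which matches most but not all rows of the table, so the study may have used a somewhat different calculation.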
Number of SNPs and accompanying median quality score for the three categories, within the MEGA overlapping region.
| Method | Reference | Autosomes, Total: SNPs | Autosomes, Total: score | Autosomes, Half: SNPs | Autosomes, Half: score | Autosomes, No: SNPs | Autosomes, No: score | X chromosome, Total: SNPs | X chromosome, Total: score | X chromosome, Half: SNPs | X chromosome, Half: score | X chromosome, No: SNPs | X chromosome, No: score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| In-house | 1000G | 632a | 0.78 | 38a | 0.36 | 48a | 0.89 | 35a | 0.73 | 2.7a | 0.37 | 2.1a | 0.83 |
| SIS | 1000G | 407 | 0.79 | 25 | 0.46 | 35 | 0.87 | 8.9 | 0.8 | 0.5 | 0.56 | 0.7 | 0.88 |
| SIS | AGR | 541 | 0.79 | 23 | 0.5 | 42 | 0.89 | 12.9 | 0.83 | 0.6 | 0.6 | 0.8 | 0.89 |
| MIS | 1000G | 400 | 0.69 | 45 | 0.11 | 33 | 0.83 | 19.5 | 0.57 | 7.1 | 0.08 | 1.3 | 0.70 |
| MIS | CAAPA | 214 | 0.68 | 105 | 0.03 | 24 | 0.76 | NA | NA | NA | NA | NA | NA |
FIGURE 1. Mean quality score for all variants within a given MAF range, for all imputed datasets.
FIGURE 2. Distribution of the number of imputed SNPs by quality score for (A) chromosome 1 and (B) the X chromosome.