Nab Raj Roshyara, Katrin Horn, Holger Kirsten, Peter Ahnert, Markus Scholz.
Abstract
A variety of modern software packages are available for genotype imputation, relying on advanced concepts such as pre-phasing of the target dataset or utilization of admixed reference panels. In this study, we performed a comprehensive evaluation of the accuracy of modern imputation methods on the basis of the publicly available POPRES samples. Good-quality genotypes were masked and re-imputed by different imputation frameworks, namely MaCH, IMPUTE2, MaCH-Minimac, SHAPEIT-IMPUTE2 and MaCH-Admix. Results were compared to evaluate the relative merit of pre-phasing and the usage of admixed references. We showed that the pre-phasing framework SHAPEIT-IMPUTE2 can overestimate the certainty of genotype distributions, resulting in the lowest percentage of correctly imputed genotypes in our case. MaCH-Minimac performed better than SHAPEIT-IMPUTE2. Pre-phasing always reduced imputation accuracy. IMPUTE2 and MaCH-Admix, both relying on admixed reference panels, showed comparable results. MaCH showed superior results if well-matched references were available (Nei's GST ≤ 0.010). For small to medium data sets, frameworks using the genetically closest reference panel are recommended if the genetic distance between target and reference data set is small. Our results are valid for small to medium data sets. As shown on a larger data set of population-based German samples, the disadvantage of pre-phasing decreases for larger sample sizes.
Year: 2016 PMID: 27698363 PMCID: PMC5048136 DOI: 10.1038/srep34386
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Imputation Frameworks analysed: Frameworks differ with respect to usage of pre-phasing or admixed versus specific reference panels.
| Imputation software/framework | Pre-phasing | Use of admixed reference panel |
|---|---|---|
| MaCH-Minimac | Yes | No |
| SHAPEIT-IMPUTE2 | Yes | Yes |
| MaCH-Admix | No | Yes |
| MaCH | No | No |
| IMPUTE2 | No | Yes |
We aim at comparing the impact of these features on imputation accuracy.
Figure 1. Violin plot of Hellinger scores of genotypes imputed with five different frameworks.
Results for the African-American (AfAm) population are shown. We present results for all imputed genotypes and, separately, for cases where best-guess genotypes match the true genotypes (correctly imputed) or not (wrongly imputed). A Hellinger score ≥ 0.45 almost always ensured that the best-guess genotype matches the true genotype.
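The Hellinger score itself is not defined in this record. A minimal sketch, assuming it is one minus the Hellinger distance between the imputed posterior genotype distribution and a point mass on the true genotype, could look as follows. The function names `hellinger_score` and `best_guess` are hypothetical, and the definition is an assumption rather than the authors' stated formula.

```python
import math


def hellinger_score(posterior, true_genotype):
    """Assumed definition: 1 - Hellinger distance to the true genotype.

    posterior     -- three probabilities for genotypes 0, 1, 2
    true_genotype -- index (0, 1 or 2) of the observed genotype
    """
    if len(posterior) != 3:
        raise ValueError("expected three genotype probabilities")
    total = sum(posterior)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    probs = [p / total for p in posterior]  # renormalise defensively
    # Against a point mass, the Bhattacharyya coefficient is sqrt(p_true).
    bhattacharyya = math.sqrt(probs[true_genotype])
    distance = math.sqrt(max(0.0, 1.0 - bhattacharyya))
    return 1.0 - distance


def best_guess(posterior):
    """Return the genotype index with the highest posterior probability."""
    return max(range(3), key=lambda g: posterior[g])


if __name__ == "__main__":
    posterior = [0.05, 0.90, 0.05]
    print(best_guess(posterior), round(hellinger_score(posterior, 1), 3))
```

Under this assumed definition, a score of at least 0.45 corresponds to a posterior probability of roughly 0.49 or more on the true genotype, which would be consistent with the caption's remark that the best-guess genotype then almost always matches the true one.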
Figure 2. Scatterplot of average Hellinger score versus MaCH-Rsq/IMPUTE-info score for four different POPRES populations imputed with MaCH (using YRI reference panel), MaCH-Minimac (using YRI reference panel), MaCH-Admix, IMPUTE2 and SHAPEIT-IMPUTE2 (using admixed reference panels).
For the same Hellinger score, Info scores of SHAPEIT-IMPUTE2 are clearly inflated compared to IMPUTE2.
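To illustrate why overconfident posteriors can inflate such quality scores, here is a minimal sketch of the commonly described form of the MaCH-Rsq statistic: the observed variance of the allele dosage divided by the variance expected under Hardy-Weinberg equilibrium. The function name `mach_rsq` and the toy data are illustrative assumptions; the IMPUTE2 info score uses a different formula and is not reproduced here.

```python
def mach_rsq(posteriors):
    """Ratio of observed dosage variance to the HWE-expected variance.

    posteriors -- list of (p0, p1, p2) tuples, one per individual,
                  giving the posterior probability of 0, 1 or 2 copies
                  of the counted allele at a single SNP
    """
    dosages = [p1 + 2.0 * p2 for _, p1, p2 in posteriors]
    n = len(dosages)
    mean = sum(dosages) / n
    freq = mean / 2.0                      # estimated allele frequency
    expected_var = 2.0 * freq * (1.0 - freq)
    if expected_var == 0.0:
        return float("nan")                # monomorphic: ratio undefined
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


if __name__ == "__main__":
    # Same best-guess genotypes (0, 1, 2, 1), different stated certainty.
    confident = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0)]
    cautious = [(0.7, 0.2, 0.1), (0.2, 0.6, 0.2),
                (0.1, 0.2, 0.7), (0.2, 0.6, 0.2)]
    print(mach_rsq(confident), mach_rsq(cautious))
```

In this toy example both sets of posteriors yield the same best-guess genotypes, yet the more confident set scores much higher. A framework that overstates its certainty would therefore report a higher quality score without being more accurate.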
Figure 3. Violin plot of posterior probabilities of best-guess genotypes in the AfAm population.
All imputation frameworks were used with default parameters and reference panels. SHAPEIT-IMPUTE2 shows considerably higher posterior probabilities for wrongly imputed SNPs.
Comparison of percentages of genotypes with good Hellinger scores (≥ 0.45) obtained for 20 different POPRES samples with either MaCH, MaCH-Minimac, MaCH-Admix, IMPUTE2, or SHAPEIT-IMPUTE2.
| Population | MaCH and MaCH-Minimac framework(Best-matched Reference Panel) | Mixed Reference Panel | |||||
|---|---|---|---|---|---|---|---|
| Reference Panel | Nei’s GST | MaCH | MaCH-Minimac | MaCH-Admix | IMPUTE2 | SHAPEIT-IMPUTE2 | |
| Australian | CEU | 0.0078287 | 88.334* | 89.031* | 89.393 | 88.081* | |
| British | CEU | 0.0078541 | 89.189* | 89.973* | 90.231* | 88.547* | |
| Canadian | CEU | 0.0078631 | 88.583* | 89.603* | 89.702* | 87.985* | |
| Swiss.French | CEU | 0.0079978 | 88.495* | 89.098* | 89.153* | 87.864* | |
| French | CEU | 0.0080226 | 88.277* | 89.241* | 89.291* | 88.255* | |
| German | CEU | 0.0080485 | 88.81* | 89.478* | 89.703* | 88.338* | |
| Irish | CEU | 0.0081449 | 89.155* | 89.49* | 89.704* | 88.541* | |
| Swiss | CEU | 0.0082549 | 88.151* | 89.264* | 89.357* | 87.937* | |
| Belgians | CEU | 0.0084603 | 89.062* | 89.992 | 90.009 | 88.291* | |
| Swiss.German | CEU | 0.0086417 | 88.456* | 89.366* | 89.081* | 87.848* | |
| eastEU | CEU | 0.0088483 | 88.256* | 88.991* | 89.144 | 87.851* | |
| Portuguese | CEU | 0.0096742 | 88.569 | 87.34* | 88.410 | 87.675* | |
| Spanish | CEU | 0.0096786 | 87.909* | 89.023 | 88.985 | 87.706* | |
| Italian | CEU | 0.0105699 | 88.025* | 88.781 | 88.742 | 87.28* | |
| From Yugoslavia | CEU | 0.0108079 | 87.832* | 88.643* | 88.819 | 87.629* | |
| Mexican | MEX | 0.0108799 | 89.137* | 87.908* | 89.477* | 88.188* | |
| AfAm | YRI | 0.0188273 | 82.603* | 80.86* | 86.123 | 83.437* | |
| Punjabi | CEU | 0.0244462 | 86.767* | 86.257* | 87.951 | 87.14* | |
| Indian | CEU | 0.0247062 | 86.441* | 85.202* | 87.527 | 86.315* | |
| Japanese | CHB.JPT | 0.0330444 | 89.089* | 88.391* | 89.524* | 88.575* | |
For imputation with the MaCH and MaCH-Minimac frameworks, the best-matched reference panels based on Nei’s GST were selected. Nei’s GST values and the corresponding reference panels are also presented. The imputation framework with the best result for each population is marked in bold italics, and scenarios that differ significantly from the best scenario are marked with an asterisk. McNemar’s test was used to determine significant differences between alternative scenarios and the best scenario.
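McNemar's test compares two frameworks on the same set of masked genotypes by looking only at the discordant pairs, that is, genotypes that one framework imputes well and the other does not. A minimal sketch using the chi-square approximation with continuity correction follows. The function name `mcnemar_test` and the input encoding (one boolean per genotype) are assumptions for illustration, not the authors' implementation.

```python
import math


def mcnemar_test(good_a, good_b):
    """McNemar's chi-square test (1 df) with continuity correction.

    good_a, good_b -- equal-length sequences of booleans; element i says
                      whether framework A (or B) imputed genotype i well
    Returns (statistic, p_value).
    """
    if len(good_a) != len(good_b):
        raise ValueError("paired inputs must have equal length")
    # Discordant pairs: only these carry information about a difference.
    only_a = sum(1 for a, b in zip(good_a, good_b) if a and not b)
    only_b = sum(1 for a, b in zip(good_a, good_b) if b and not a)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    a = [True] * 15
    b = [False] * 10 + [True] * 5
    print(mcnemar_test(a, b))
```

For small numbers of discordant pairs an exact binomial version of the test is usually preferred over the chi-square approximation.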
Percentage of genotypes with good Hellinger score (≥ 0.45) for three imputation frameworks considering mixed reference panels:
| Country | MaCH-Admix | IMPUTE2 | SHAPEIT-IMPUTE2 | ||||||
|---|---|---|---|---|---|---|---|---|---|
| Missing percentage | 50% | 70% | 100% | 50% | 70% | 100% | 50% | 70% | 100% |
| German | 90.26 | 90.06 | 91.1 | 89.2 | 89.17 | 88.88 | |||
| Swiss-German | 90.37 | 89.37 | 89.4 | 88.19 | 88.29 | 88.28 | |||
| Belgians | 91.34 | 90.5 | 90.23 | 89 | 88.83 | 88.63 | |||
| Spanish | 89.7 | 89.29 | 90.35 | 88.19 | 88.00 | 88.06 | |||
| French | 90.75 | 89.67 | 89.32 | 88.85 | 88.59 | 88.89 | |||
| Irish | 90.84 | 90.07 | 89.62 | 88.64 | 88.59 | 88.92 | |||
| Italian | 90.46 | 89.57 | 89.56 | 87.93 | 88.03 | 87.94 | |||
| Portuguese | 89.15 | 88.61 | 90.23 | 87.84 | 87.78 | 87.99 | |||
| Swiss-French | 90.77 | 89.76 | 89.39 | 89.02 | 88.69 | 88.71 | |||
| Swiss | 90.65 | 89.73 | 89.8 | 88.72 | 88.77 | 88.64 | |||
| British | 91.62 | 90.51 | 90.61 | 89.16 | 89.35 | 89.15 | |||
| FromYugoslavia | 90.1 | 89.15 | 88.87 | 88.23 | 87.83 | 88.09 | |||
| Canadian | 90.23 | 90.08 | 91.32 | 89.08 | 88.86 | 88.7 | |||
| Mexican | 91.49 | 90.64 | 90.37 | 89.54 | 89.42 | 89.07 | |||
| Australian | 90.91 | 89.7 | 89.29 | 88.62 | 89.08 | 88.68 | |||
| Japanese | 91.7 | 90.59 | 90.34 | 89.84 | 89.86 | 89.76 | |||
| AfAm | 86.25 | 87.85 | 86.87 | 83.49 | 83.36 | 83.8 | |||
| Punjabi | 89.14 | 88.91 | 89.95 | 87.69 | 88.03 | 88.04 | |||
| Indian | 89.61 | 88.44 | 88.78 | 87.37 | 87.36 | 87.27 | |||
| eastEU | 89.6 | 89.27 | 90.8 | 88.17 | 88.21 | 88.17 | |||
Twenty POPRES populations were studied. Different percentages of HQ-SNPs were masked (50%, 70%, and 100%) and re-imputed. The best software framework for each population and degree of missingness is presented in bold italics. An asterisk (*) indicates that the other software frameworks perform significantly worse for the corresponding missingness scenario.
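The mask-and-re-impute design can be sketched as follows: a chosen fraction of high-quality SNPs is hidden from the imputation software, and the imputed best-guess genotypes are later compared with the hidden true genotypes. The helper names `mask_snps` and `percent_concordant`, the random selection with a fixed seed, and the SNP identifiers are illustrative assumptions; how the masked SNPs were actually chosen is not described in this record.

```python
import random


def mask_snps(snp_ids, fraction, seed=0):
    """Split SNP identifiers into kept and masked sets.

    snp_ids  -- list of SNP identifiers (e.g. rs numbers)
    fraction -- share of SNPs to mask, between 0 and 1
    seed     -- seed for reproducible random selection (assumption)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    rng = random.Random(seed)
    n_mask = round(len(snp_ids) * fraction)
    masked = set(rng.sample(list(snp_ids), n_mask))
    kept = [s for s in snp_ids if s not in masked]
    return kept, sorted(masked)


def percent_concordant(true_genotypes, imputed_genotypes):
    """Percentage of best-guess genotypes equal to the true genotypes."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have equal length")
    matches = sum(t == g for t, g in zip(true_genotypes, imputed_genotypes))
    return 100.0 * matches / len(true_genotypes)


if __name__ == "__main__":
    ids = [f"rs{i}" for i in range(10)]
    for fraction in (0.5, 0.7, 1.0):
        kept, masked = mask_snps(ids, fraction)
        print(fraction, len(kept), len(masked))
```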
Percentage of most likely genotypes which agree with the original genotypes for three imputation frameworks considering mixed reference panels:
| Software-> | MaCH-Admix | IMPUTE2 | SHAPEIT-IMPUTE2 | ||||||
|---|---|---|---|---|---|---|---|---|---|
| German | 91.35 | 90.64 | 90.26 | 89.03 | 89.29 | 88.76 | |||
| Swiss-German | 91.09 | 90.20 | 89.8 | 88.22 | 88.31 | 88.33 | |||
| Belgians | 91.44 | 90.47 | 90.08 | 88.54 | 88.44 | 88.16 | |||
| Spanish | 90.56 | 89.99 | 89.84 | 87.96 | 87.87 | 87.98 | |||
| French | 91.12 | 90.07 | 89.84 | 88.85 | 88.62 | 88.82 | |||
| Irish | 91.23 | 90.43 | 90.20 | 88.53 | 88.47 | 88.71 | |||
| Italian | 90.80 | 89.77 | 89.97 | 87.98 | 88.07 | 87.98 | |||
| Portuguese | 90.34 | 89.55 | 89.34 | 87.63 | 87.63 | 87.93 | |||
| Swiss-French | 91.21 | 90.24 | 90.02 | 88.86 | 88.74 | 88.73 | |||
| Swiss | 90.40 | 91.29 | 90.13 | 88.59 | 88.7 | 88.58 | |||
| British | 91.92 | 91.18 | 91.03 | 89.16 | 89.24 | 89.03 | |||
| FromYugoslavia | 90.51 | 89.64 | 89.53 | 88.24 | 87.81 | 87.98 | |||
| Canadian | 91.52 | 90.94 | 90.57 | 89.07 | 88.78 | 88.6 | |||
| Mexican | 91.82 | 91.02 | 90.98 | 89.27 | 89.19 | 88.84 | |||
| Australian | 90.01 | 91.20 | 90.45 | 88.41 | 88.92 | 88.46 | |||
| Japanese | 90.97 | 91.7 | 91.19 | 89.52 | 89.5 | 89.43 | |||
| AfAm | 88.02 | 87.24 | 86.79 | 83.48 | 83.42 | 83.69 | |||
| Punjabi | 90.28 | 89.73 | 89.47 | 87.62 | 88.07 | 88.05 | |||
| Indian | 90.22 | 89.21 | 89.14 | 87.28 | 87.29 | 87.17 | |||
| eastEU | 91.10 | 90.07 | 89.75 | 88.07 | 88.17 | 88.11 | |||
Twenty POPRES populations were studied. Different percentages of HQ-SNPs were masked (50%, 70%, and 100%). The best software framework for each population and degree of missingness is presented in bold italics. An asterisk (*) indicates that the other software frameworks perform significantly worse for the corresponding missingness scenario.
Percentage of genotypes with good Hellinger score (≥ 0.45) for imputation frameworks with pre-phasing strategy:
| Country | Genetic similarity | MaCH-Minimac | SHAPEIT-IMPUTE2 | |||||
|---|---|---|---|---|---|---|---|---|
| Reference Panel | Nei’s GST | 50% | 70% | 100% | 50% | 70% | 100% | |
| Australian | CEU | 0.0078287 | 88.67 | 88.62 | 89.08 | |||
| British | CEU | 0.0078541 | 89.16 | 89.35 | 89.15 | |||
| Canadian | CEU | 0.0078631 | 89.08 | 88.86 | 88.7 | |||
| Swiss.French | CEU | 0.0079978 | 89.06 | 88.69 | 88.71 | |||
| French | CEU | 0.0080226 | 88.56 | 88.85 | 88.59 | |||
| German | CEU | 0.0080485 | 89.20 | 89.17 | 88.88 | |||
| Irish | CEU | 0.0081449 | 88.64 | 88.59 | 88.92 | |||
| Swiss | CEU | 0.0082549 | 88.73 | 88.77 | 88.64 | |||
| Belgians | CEU | 0.0084603 | 88.10 | 88.83 | 88.64 | |||
| Swiss.German | CEU | 0.0086417 | 88.19 | 88.29 | 88.28 | |||
| eastEU | CEU | 0.0088483 | 88.18 | 88.21 | 88.17 | |||
| Portuguese | CEU | 0.0096742 | 87.84 | 87.78 | ||||
| Spanish | CEU | 0.0096786 | 88.19 | 87.1 | 88.06 | |||
| Italian | CEU | 0.0105699 | 87.93 | 88.03 | 87.94 | |||
| From Yugoslavia | CEU | 0.0108079 | 88.23 | 87.83 | 88.09 | |||
| Mexican | MEX | 0.0108799 | 89.33 | 88.78 | 89.54 | |||
| AfAm | YRI | 0.0188273 | 83.11 | 81.92 | 81.72 | |||
| Punjabi | CEU | 0.0244462 | 87.39 | 86.98 | 87.69 | |||
| Indian | CEU | 0.0247062 | 86.79 | 86.13 | 87.37 | |||
| Japanese | CHB.JPT | 0.0330444 | 88.88 | 89.06 | 89.71 | |||
The rows of the table are arranged in increasing order of genetic distance between the target population and the best-matched reference. Different percentages of HQ-SNPs were masked (50%, 70%, and 100%). The best software framework for each population and degree of missingness is presented in bold italics. An asterisk (*) indicates that the other software framework performs significantly worse for the corresponding scenario. MaCH-Minimac tends to be advantageous for small distances between target and reference population and for lower percentages of missingness.
Percentage of well-imputed best-guess genotypes for two imputation frameworks relying on pre-phasing.
| Country | | MaCH-Minimac | SHAPEIT-IMPUTE2 | |||||
|---|---|---|---|---|---|---|---|---|
| Best matched reference | Nei’s GST | 50% | 70% | 100% | 50% | 70% | 100% | |
| Australian | CEU | 0.0078287 | 88.77 | 89.28 | 88.82 | |||
| British | CEU | 0.0078541 | 89.34 | 89.41 | 89.20 | |||
| Canadian | CEU | 0.0078631 | 89.28 | 88.98 | 88.81 | |||
| Swiss.French | CEU | 0.0079978 | 89.05 | 88.93 | 88.92 | |||
| French | CEU | 0.0080226 | 89.06 | 88.83 | 89.02 | |||
| German | CEU | 0.0080485 | 89.16 | 89.43 | 88.88 | |||
| Irish | CEU | 0.0081449 | 88.83 | 88.76 | 89.00 | |||
| Swiss | CEU | 0.0082549 | 88.83 | 88.94 | 88.82 | |||
| Belgians | CEU | 0.0084603 | 89.14 | 89.04 | 88.76 | |||
| Swiss.German | CEU | 0.0086417 | 88.37 | 88.45 | 88.48 | |||
| eastEU | CEU | 0.0088483 | 88.22 | 88.32 | 88.27 | |||
| Portuguese | CEU | 0.0096742 | 87.88 | 87.88 | 88.18 | |||
| Spanish | CEU | 0.0096786 | 88.16 | 88.073 | 88.18 | |||
| Italian | CEU | 0.0105699 | 88.12 | 88.19 | 88.11 | |||
| From Yugoslavia | CEU | 0.0108079 | 88.45 | 88.01 | 88.18 | |||
| Mexican | MEX | 0.0108799 | 89.58 | 89.51 | 89.15 | |||
| AfAm | YRI | 0.0188273 | 82.79 | 82.72 | 83.66 | |||
| Punjabi | CEU | 0.0244462 | 88.03 | 87.8 | 88.25 | |||
| Indian | CEU | 0.0247062 | 87.24 | 87.52 | 87.53 | |||
| Japanese | CHB.JPT | 0.0330444 | 89.80 | 89.76 | 89.75 | |||
The rows of the table are arranged in increasing order of genetic distance between the target population and the best-matched reference, measured by Nei’s GST. Different percentages of HQ-SNPs were masked (50%, 70%, 100%). The best software framework for each population and degree of missingness is presented in bold italics. An asterisk (*) indicates that the other software framework performs significantly worse for the corresponding scenario.
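As a rough illustration of the genetic distance used to order the rows, the sketch below computes Nei's GST between a target and a reference population from per-SNP allele frequencies, as (HT - HS) / HT with heterozygosities summed over loci. Equal weighting of the two populations, the absence of any sample-size correction, and the function name `nei_gst` are assumptions. The exact estimator used in the study is not given in this record.

```python
def nei_gst(freqs_target, freqs_reference):
    """Nei's GST for two populations from biallelic allele frequencies.

    freqs_target, freqs_reference -- equal-length lists with the frequency
    of the same allele at each SNP in the two populations
    """
    if len(freqs_target) != len(freqs_reference):
        raise ValueError("frequency lists must have equal length")
    hs_sum = 0.0  # mean within-population heterozygosity, summed over SNPs
    ht_sum = 0.0  # heterozygosity of the pooled population, summed over SNPs
    for p1, p2 in zip(freqs_target, freqs_reference):
        hs_sum += (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
        p_bar = (p1 + p2) / 2.0
        ht_sum += 2.0 * p_bar * (1.0 - p_bar)
    if ht_sum == 0.0:
        return 0.0  # no variation at all: treat populations as identical
    return (ht_sum - hs_sum) / ht_sum


if __name__ == "__main__":
    target = [0.40, 0.10, 0.75]
    reference = [0.60, 0.12, 0.70]
    print(round(nei_gst(target, reference), 4))
```

Identical allele frequencies give a value of 0 and fixed differences at every SNP give 1, so smaller values indicate a better-matched reference panel.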
Dependence of imputation accuracy on sample size studied in LIFE-Adult.
| Reference Panel | MaCH and MaCH-Minimac framework (Best-matched Reference Panel) | Mixed Reference Panel | |||
|---|---|---|---|---|---|
| MaCH | MaCH-Minimac | MaCH-Admix | IMPUTE2 | SHAPEIT-IMPUTE2 | |
| CEU | CEU | Mixed | Mixed | Mixed | |
| Sample size | |||||
| 40 | 90.05 | 90.86 | 92.23 | 90.06 | |
| 100 | 91.38 | 90.83 | 92.27 | 91.11 | |
| 250 | 91.86 | 90.64 | 92.27 | 91.57 | |
| 500 | 92.29 | 91.80 | 90.30 | 91.69 | |
| 1000 | 92.31 | 91.86 | 90.18 | 91.83 | |
| 2500 | 92.18 | 91.90 | 89.47 | 91.96 | |
Percentages of genotypes with good Hellinger scores (≥ 0.45) were analysed. Frameworks showing the best performance are written in bold italics, and frameworks showing significantly lower performance than the best one are marked with an asterisk (*).