Aniek C Bouwman, John M Hickey, Mario P L Calus, Roel F Veerkamp.
Abstract
BACKGROUND: Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives.
Year: 2014 PMID: 24490796 PMCID: PMC3929150 DOI: 10.1186/1297-9686-46-6
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Description of testing and reference sets for each scenario, with the testing set split by category of the closest genotyped recent ancestor
| | Real | SireMGS¹ | Off0 | Off1 | Off2 | Off4 |
|---|---|---|---|---|---|---|
| Testing set | 1,344 | 1,344 | 805² | 805 | 805 | 805 |
| Both parents | 14 | 43 | 25 | 25 | 25 | 25 |
| SireMGS | 62 | 1,258 | 756 | 756 | 756 | 756 |
| DamPGS | 6 | 0 | 0 | 0 | 0 | 0 |
| Sire | 241 | 24 | 15 | 15 | 15 | 15 |
| Dam | 23 | 0 | 0 | 0 | 0 | 0 |
| Other | 998 | 19 | 9 | 9 | 9 | 9 |
| Reference set | 4,079 | 4,716 | 4,716 | 5,521 | 6,326 | 7,936 |
| At least one offspring in reference² | 539 | 539 | 0 | 805 | 805 | 805 |
¹Genotypes of sires and maternal grandsires of phenotyped individuals not already included in the Real scenario were added. ²Within the testing set of 1,344 cows, 539 had at least one offspring genotyped; these cows were removed from the testing set for the offspring scenarios, so the testing set in Off0 to Off4 contained only 805 cows.
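The testing-set rows above group individuals by their closest genotyped recent ancestor. As a sketch of how such a grouping could be derived from a pedigree, the code below checks the categories in the row order of the table. The pedigree format, the function names (`parents_of`, `closest_genotyped_category`) and that priority order are assumptions for illustration, not the procedure documented for the study.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Hypothetical pedigree format: animal ID -> (sire ID, dam ID); None = unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def parents_of(ped: Pedigree, animal: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (sire, dam) of an animal, or (None, None) if it is unknown."""
    if animal is None:
        return (None, None)
    return ped.get(animal, (None, None))


def closest_genotyped_category(ped: Pedigree, animal: str, genotyped: Set[str]) -> str:
    """Assign one category, checked in the table's row order (assumed priority)."""
    sire, dam = parents_of(ped, animal)
    mgs = parents_of(ped, dam)[0]   # maternal grandsire = sire of the dam
    pgs = parents_of(ped, sire)[0]  # paternal grandsire = sire of the sire

    def is_genotyped(x: Optional[str]) -> bool:
        return x is not None and x in genotyped

    if is_genotyped(sire) and is_genotyped(dam):
        return "Both parents"
    if is_genotyped(sire) and is_genotyped(mgs):
        return "SireMGS"
    if is_genotyped(dam) and is_genotyped(pgs):
        return "DamPGS"
    if is_genotyped(sire):
        return "Sire"
    if is_genotyped(dam):
        return "Dam"
    return "Other"


# Toy example with made-up IDs (not data from the study).
ped: Pedigree = {
    "cowA": ("bull1", "cow0"),  # dam cow0 is a daughter of bull0
    "cow0": ("bull0", None),
    "cowB": ("bull1", "cow9"),  # both parents genotyped
    "cowC": ("bull2", None),    # sire not genotyped, dam unknown
}
genotyped = {"bull0", "bull1", "cow9"}
testing_set = ["cowA", "cowB", "cowC"]
counts = Counter(closest_genotyped_category(ped, a, genotyped) for a in testing_set)
print(counts)
```

Counting categories over a whole testing set in this way gives a breakdown of the same form as the rows between "Testing set" and "Reference set".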
Average imputation accuracy (r) from segregation analysis of 1,344 individuals for scenarios Real and SireMGS, for different categories of individuals
| Category | Real: N | Real: r | Real: SD² | SireMGS: N | SireMGS: r | SireMGS: SD² |
|---|---|---|---|---|---|---|
| Both parents | 14 | 0.72 | 0.08 | 43 | 0.73 | 0.07 |
| SireMGS | 62 | 0.61 | 0.13 | 1,258 | 0.60 | 0.12 |
| DamPGS | 6 | 0.63 | 0.08 | 0 | - | - |
| Sire | 241 | 0.59 | 0.13 | 24 | 0.54 | 0.14 |
| Dam | 23 | 0.70 | 0.09 | 0 | - | - |
| Other | 998 | 0.42 | 0.22 | 19 | 0.35 | 0.28 |
| Total | 1,344 | 0.47 | 0.22 | 1,344 | 0.60 | 0.13 |
| r (uncorrected)¹ | 1,344 | 0.80 | 0.06 | 1,344 | 0.84 | 0.04 |
Average imputation accuracies were split into categories of the closest genotyped recent ancestor and calculated over 10 replications. ¹Mean imputation accuracy calculated as the correlation between true genotypes and imputed genotype dosages, both uncorrected for the mean gene content. ²Standard deviation.
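The note separates r from the uncorrected r. A minimal sketch of the difference, assuming that "corrected for the mean gene content" means subtracting each SNP's expected allele count (2p) from both the true genotypes and the dosages before correlating across one individual's SNPs; the function name and the toy numbers are hypothetical. It illustrates how an uncorrected correlation can look respectable when the dosages carry almost nothing beyond allele frequencies.

```python
import numpy as np


def imputation_accuracy(true_geno, dosage, mean_gene_content=None):
    """Correlation across SNPs between true genotypes and imputed dosages for one individual.

    Genotypes and dosages are allele counts (0..2). If mean_gene_content is given
    (assumed here to be the per-SNP expected allele count, 2p), it is subtracted
    from both vectors first; without it, the result is the 'uncorrected' r.
    """
    t = np.asarray(true_geno, dtype=float)
    d = np.asarray(dosage, dtype=float)
    if mean_gene_content is not None:
        m = np.asarray(mean_gene_content, dtype=float)
        t, d = t - m, d - m
    return float(np.corrcoef(t, d)[0, 1])


# Toy individual with 4 SNPs: two where the counted allele is rare, two where it is common.
mean_gc = [0.2, 0.2, 1.8, 1.8]
true = [0, 1, 2, 1]
# Dosages equal the population mean everywhere except the last SNP, nudged the wrong way.
dosage = [0.2, 0.2, 1.8, 1.9]

r_uncorrected = imputation_accuracy(true, dosage)         # about 0.69
r_corrected = imputation_accuracy(true, dosage, mean_gc)  # about -0.79
print(round(r_uncorrected, 2), round(r_corrected, 2))
```

The uncorrected value is driven by the fact that SNPs with a common counted allele tend to have high genotype codes, which any dosage close to 2p reproduces; after centring on 2p, only the information added by imputation remains.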
Average imputation accuracy (r) of 805 individuals with varying offspring information, for different categories of individuals
| Category | N | Off0 | Off0 | Off1 | Off1 | Off2 | Off2 | Off4 | Off4 |
|---|---|---|---|---|---|---|---|---|---|
| Both parents | 25 | 0.70 | 0.66 | 0.78 | 0.78 | 0.84 | 0.83 | 0.93 | 0.94 |
| SireMGS | 756 | 0.57 | 0.54 | 0.73 | 0.68 | 0.82 | 0.77 | 0.92 | 0.88 |
| DamPGS | 0 | - | - | - | - | - | - | - | - |
| Sire | 15 | 0.50 | 0.48 | 0.69 | 0.68 | 0.80 | 0.78 | 0.93 | 0.87 |
| Dam | 0 | - | - | - | - | - | - | - | - |
| Other | 9 | 0.13 | 0.13 | 0.61 | 0.61 | 0.77 | 0.77 | 0.91 | 0.91 |
| Total¹ | 805 | 0.57 | 0.54 | 0.73 | 0.68 | 0.82 | 0.77 | 0.92 | 0.88 |
| r (uncorrected)² | 805 | 0.83 | 0.81 | 0.88 | 0.86 | 0.92 | 0.89 | 0.96 | 0.95 |
For each offspring scenario, results from both AlphaImpute with phasing (phased) and segregation analysis only (segregation) are reported. Average imputation accuracies were split into categories of the closest genotyped recent ancestor and calculated over 10 replications. ¹For AlphaImpute phased, the standard deviations for Total were 0.16, 0.13, 0.13 and 0.10 for scenarios Off0, Off1, Off2 and Off4, respectively; for segregation analysis, they were 0.12, 0.07, 0.07 and 0.05. ²Mean imputation accuracy calculated as the correlation between true genotypes and imputed genotype dosages, both uncorrected for the mean gene content.
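Because Total averages over all 805 individuals, it should equal the N-weighted mean of the category means, up to rounding of the published category values. A quick arithmetic check, using the first of the two Off0 values in each category row and the second of the two Off4 values; the helper name `weighted_total` is hypothetical.

```python
def weighted_total(counts, means):
    """Mean over all individuals from per-category sizes and means (empty categories skipped)."""
    pairs = [(n, m) for n, m in zip(counts, means) if n > 0]
    total_n = sum(n for n, _ in pairs)
    return sum(n * m for n, m in pairs) / total_n


# Category order: Both parents, SireMGS, DamPGS, Sire, Dam, Other
counts = [25, 756, 0, 15, 0, 9]
off0_first_values = [0.70, 0.57, None, 0.50, None, 0.13]
off4_second_values = [0.94, 0.88, None, 0.87, None, 0.91]

print(round(weighted_total(counts, off0_first_values), 2))   # 0.57, as in the Total row
print(round(weighted_total(counts, off4_second_values), 2))  # 0.88, as in the Total row
```

Since SireMGS holds 756 of the 805 individuals, Total is dominated by that category, which is why the Total and SireMGS rows coincide.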
Average percentage of correct, incorrect or not imputed genotypes per individual for each scenario and for different categories of individuals
| Scenario | Genotypes (%) | Both parents | SireMGS | DamPGS | Sire | Dam | Other | Total |
|---|---|---|---|---|---|---|---|---|
| Segregation | | | | | | | | |
| Real | Correct | 52.9 | 20.3 | 33.6 | 16.8 | 45.9 | 11.9 | 14.3 |
| | Incorrect | 0.0 | 0.2 | 0.4 | 0.2 | 0.7 | 0.2 | 0.2 |
| | Not imputed | 47.0 | 79.4 | 66.0 | 82.9 | 53.4 | 87.8 | 85.5 |
| SireMGS | Correct | 52.8 | 17.9 | - | 9.0 | - | 7.2 | 18.7 |
| | Incorrect | 0.0 | 0.1 | - | 0.1 | - | 0.2 | 0.1 |
| | Not imputed | 47.1 | 81.9 | - | 90.8 | - | 92.6 | 81.2 |
| Off0 | Correct | 51.1 | 14.4 | - | 6.6 | - | 6.5 | 15.3 |
| | Incorrect | 0.0 | 0.1 | - | 0.1 | - | 0.2 | 0.1 |
| | Not imputed | 48.8 | 85.4 | - | 93.2 | - | 93.2 | 84.6 |
| Off1 | Correct | 57.9 | 30.3 | - | 19.4 | - | 7.2 | 30.8 |
| | Incorrect | 0.2 | 1.15 | - | 1.3 | - | 0.1 | 1.1 |
| | Not imputed | 41.7 | 68.4 | - | 79.2 | - | 92.6 | 68.1 |
| Off2 | Correct | 61.6 | 39.3 | - | 31.0 | - | 11.9 | 39.6 |
| | Incorrect | 0.2 | 0.9 | - | 1.4 | - | 0.2 | 0.8 |
| | Not imputed | 38.1 | 59.7 | - | 67.5 | - | 87.9 | 59.6 |
| Off4 | Correct | 67.5 | 50.7 | - | 46.3 | - | 35.7 | 51.0 |
| | Incorrect | 0.1 | 0.3 | - | 0.2 | - | 0.2 | 0.3 |
| | Not imputed | 32.4 | 48.9 | - | 53.4 | - | 64.0 | 48.7 |
| Phased | | | | | | | | |
| Off0 | Correct | 40.8 | 12.4 | - | 4.9 | - | 4.7 | 13.1 |
| | Incorrect | 15.3 | 4.3 | - | 1.6 | - | 1.7 | 4.6 |
| | Not imputed | 43.8 | 83.2 | - | 93.3 | - | 93.5 | 82.3 |
| Off1 | Correct | 71.8 | 43.4 | - | 22.1 | - | 7.0 | 43.5 |
| | Incorrect | 4.0 | 6.2 | - | 3.3 | - | 0.1 | 6.0 |
| | Not imputed | 24.1 | 50.3 | - | 74.6 | - | 92.8 | 50.5 |
| Off2 | Correct | 73.4 | 60.1 | - | 37.1 | - | 11.8 | 59.6 |
| | Incorrect | 1.5 | 5.2 | - | 4.2 | - | 0.2 | 5.0 |
| | Not imputed | 25.0 | 34.6 | - | 58.7 | - | 87.9 | 35.4 |
| Off4 | Correct | 87.4 | 78.4 | - | 61.7 | - | 35.9 | 78.0 |
| | Incorrect | 0.5 | 2.7 | - | 4.1 | - | 0.2 | 2.6 |
| | Not imputed | 12.1 | 18.8 | - | 34.2 | - | 63.8 | 19.4 |
For the offspring scenarios, results from both AlphaImpute with phasing (phased) and segregation analysis only (segregation) are given; average percentages were split into categories of the closest genotyped recent ancestor and calculated over 10 replications.
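One simple way a genotype can stay "not imputed" is when the available relatives leave more than one genotype possible. The toy rule below only looks at the two parents' genotypes at a SNP and calls the child's genotype when Mendelian inheritance allows a single outcome; it then tallies correct, incorrect and not imputed percentages for one individual. The helpers `call_from_parents` and `tally` are hypothetical, and segregation analysis in AlphaImpute uses far more of the pedigree, so this illustrates the bookkeeping rather than the study's method.

```python
from itertools import product
from typing import Dict, List, Optional

# Genotypes as allele counts 0, 1, 2; None = unknown (parents) or not imputed (calls).
# Alleles a parent can transmit; a parent with unknown genotype may pass either allele.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}, None: {0, 1}}


def call_from_parents(sire: Optional[int], dam: Optional[int]) -> Optional[int]:
    """Toy Mendelian rule: return a genotype only if it is the single possible outcome."""
    options = {a + b for a, b in product(GAMETES[sire], GAMETES[dam])}
    return options.pop() if len(options) == 1 else None


def tally(true: List[int], called: List[Optional[int]]) -> Dict[str, float]:
    """Percentages of correct, incorrect and not imputed genotypes for one individual."""
    n = len(true)
    not_imputed = sum(c is None for c in called)
    correct = sum(c is not None and c == t for t, c in zip(true, called))
    incorrect = n - correct - not_imputed
    return {
        "correct": 100 * correct / n,
        "incorrect": 100 * incorrect / n,
        "not imputed": 100 * not_imputed / n,
    }


sire = [0, 2, 1, 0, 2]
dam = [0, 0, 1, None, None]  # dam genotype unknown at the last two SNPs
true = [0, 1, 2, 0, 1]
calls = [call_from_parents(s, d) for s, d in zip(sire, dam)]  # [0, 1, None, None, None]
print(tally(true, calls))  # {'correct': 40.0, 'incorrect': 0.0, 'not imputed': 60.0}
```

With correct parental genotypes this rule never produces an incorrect call; incorrect calls in practice come from steps that make probabilistic decisions, which a tally like this simply counts.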
Theoretically predicted imputation accuracy based on selection index theory
| Category | 0 offspring | 1 offspring | 2 offspring | 4 offspring |
|---|---|---|---|---|
| Both parents | 0.71 | 0.76 | 0.79 | 0.84 |
| SireMGS/DamPGS | 0.56 | 0.66 | 0.73 | 0.80 |
| Sire/Dam | 0.50 | 0.63 | 0.71 | 0.79 |
Values are predicted for individuals with both parents genotyped, with one parent and one grandparent genotyped (SireMGS/DamPGS), or with only one parent genotyped (Sire/Dam), and with different numbers of genotyped offspring.
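In selection index theory, the accuracy of predicting an individual's genotype from genotyped relatives is r = sqrt(c'P⁻¹c), where c holds the correlations between the target and each relative and P the correlations among the relatives (all with unit variance). The caption does not list the values used, so the sketch below assumes gene-content correlations equal to additive relationships: parent and offspring 0.5, grandparent 0.25, no inbreeding, an unrelated sire and maternal grandsire, and offspring out of different unrelated mates (half sibs, 0.25). Under these assumptions the rounded output matches the table for 0, 1, 2 and 4 genotyped offspring; Dam and DamPGS give the same values by symmetry. The names `TO_TARGET`, `BETWEEN` and `predicted_accuracy` are hypothetical.

```python
import numpy as np

# Assumed correlations (additive relationships, no inbreeding, unrelated mates).
TO_TARGET = {"sire": 0.5, "dam": 0.5, "mgs": 0.25, "offspring": 0.5}
BETWEEN = {
    frozenset({"sire", "dam"}): 0.0,
    frozenset({"sire", "mgs"}): 0.0,
    frozenset({"sire", "offspring"}): 0.25,  # grandparent of the offspring
    frozenset({"dam", "offspring"}): 0.25,
    frozenset({"mgs", "offspring"}): 0.125,  # great-grandparent of the offspring
    frozenset({"offspring"}): 0.25,          # two different offspring (half sibs)
}
CATEGORIES = {"Both parents": ["sire", "dam"], "SireMGS": ["sire", "mgs"], "Sire": ["sire"]}


def index_accuracy(c, P):
    """Selection-index accuracy r = sqrt(c' P^-1 c) for unit-variance target and sources."""
    c = np.asarray(c, dtype=float)
    return float(np.sqrt(c @ np.linalg.solve(P, c)))


def predicted_accuracy(category, n_offspring):
    """Build c and P for the genotyped relatives of one category plus n genotyped offspring."""
    kinds = CATEGORIES[category] + ["offspring"] * n_offspring
    k = len(kinds)
    P = np.eye(k)  # each source has unit variance
    for i in range(k):
        for j in range(i + 1, k):
            P[i, j] = P[j, i] = BETWEEN[frozenset({kinds[i], kinds[j]})]
    c = [TO_TARGET[kind] for kind in kinds]
    return index_accuracy(c, P)


for category in CATEGORIES:
    print(category, [round(predicted_accuracy(category, n), 2) for n in (0, 1, 2, 4)])
```

For example, with only the sire genotyped r = sqrt(0.25) = 0.50, and with both parents r = sqrt(0.5) ≈ 0.71; each extra half-sib offspring adds less because it shares information with the others.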
Figure 1. Imputation accuracy by SNP (r) plotted against the minor allele frequency (MAF) for each scenario. Imputation accuracy was defined as the correlation of true genotypes with imputed genotype dosages by SNP and was calculated across 10 replicates (2000 SNPs × 10 replicates) for scenarios Real (A), SireMGS (B), Off0 (C), Off1 (D), Off2 (E) and Off4 (F). The blue curves were obtained by fitting a nonparametric local regression (LOESS).
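Figure 1 measures accuracy by SNP rather than by individual: the correlation is taken across individuals at each SNP and plotted against that SNP's MAF. A minimal numpy sketch follows; computing MAF from the same individuals' true genotypes is an assumption (the caption does not say which population the MAF comes from), and the function names are hypothetical.

```python
import numpy as np


def per_snp_accuracy(true, dosage):
    """Correlation between true genotypes and imputed dosages at each SNP, across individuals.

    true, dosage: arrays of shape (n_individuals, n_snps) holding allele counts.
    Returns NaN for SNPs with no variation in either array.
    """
    t = np.asarray(true, dtype=float)
    d = np.asarray(dosage, dtype=float)
    tc = t - t.mean(axis=0)
    dc = d - d.mean(axis=0)
    num = (tc * dc).sum(axis=0)
    den = np.sqrt((tc ** 2).sum(axis=0) * (dc ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return num / den


def minor_allele_frequency(true):
    """MAF per SNP, computed here from the true genotypes of the same individuals (assumption)."""
    p = np.asarray(true, dtype=float).mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


# Toy data: 4 individuals x 3 SNPs (made up for illustration).
true = np.array([[0, 2, 1], [1, 2, 0], [2, 1, 1], [1, 2, 2]])
dosage = np.array([[0.1, 1.9, 1.0], [1.0, 1.8, 0.2], [1.8, 1.2, 1.1], [1.1, 2.0, 1.7]])
r = per_snp_accuracy(true, dosage)
maf = minor_allele_frequency(true)
for snp, (ri, mi) in enumerate(zip(r, maf)):
    print(f"SNP {snp}: MAF={mi:.3f}  r={ri:.2f}")
```

To add a smoothed trend like the blue curves, the `(maf, r)` pairs could be passed to a LOWESS smoother such as `statsmodels.nonparametric.smoothers_lowess.lowess`; the smoothing settings used for the figure are not given here.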
Figure 2. Percentage of correctly imputed, incorrectly imputed and not imputed genotypes by SNP, plotted against the minor allele frequency (MAF) for each scenario. Percentages of correctly imputed (black), incorrectly imputed (dark grey) and not imputed (light grey) genotypes by SNP are plotted against MAF for scenarios Real (A), SireMGS (B), Off0 (C), Off1 (D), Off2 (E) and Off4 (F), calculated across 10 replicates (2000 SNPs × 10 replicates).