Hubert Pausch, Bernhard Aigner, Reiner Emmerling, Christian Edel, Kay-Uwe Götz, Ruedi Fries.
Abstract
BACKGROUND: Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~54,000 SNP. Increasing the number of markers might improve genomic predictions and the power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.
Year: 2013 PMID: 23406470 PMCID: PMC3598996 DOI: 10.1186/1297-9686-45-3
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Number of validation animals without close relatives in the reference population
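| Relationship class | 50 reference animals | 100 reference animals | 200 reference animals | 400 reference animals |
| no relatives with r ≥ 0.50 | 621 | 562 | 453 | 316 |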
| no relatives with r ≥ 0.25 | 135 | 62 | 30 | 15 |
| no relatives with r ≥ 0.125 | 16 | 4 | - | - |
| no relatives with r ≥ 0.0625 | 5 | 2 | - | - |
The number of validation animals without close relatives in the reference population is presented for four different classes of relationship (r) and four scenarios with an increasing number of reference animals. Since most animals in our study were born between 1997 and 2004, the number of validation animals without close relatives in the reference population was very high across all scenarios.
Average number of relatives in the reference population
| Reference / validation animals | r ≥ 0.50 | r ≥ 0.25 | r ≥ 0.125 | r ≥ 0.0625 |
| 50 / 747 | 0.18 | 1.15 | 1.86 | 4.57 |
| 100 / 697 | 0.21 | 1.61 | 2.43 | 8.64 |
| 200 / 597 | 0.27 | 2.51 | 4.16 | 20.89 |
| 400 / 397 | 0.27 | 4.82 | 9.60 | 54.58 |
The average number of relatives in the reference population is given for the animals in the validation population for four classes of relationships (r) and four scenarios with an increasing number of reference animals. The average number of close relatives in the reference population was very small for most animals in the validation population.
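The two relationship tables can be derived from the same per-animal counts. The following is a minimal sketch, assuming the four classes are cumulative thresholds (r ≥ 0.50, 0.25, 0.125, 0.0625) and that the coefficient of relationship between each validation animal and each reference animal is already available. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Relationship thresholds as listed in the first table (assumed cumulative).
THRESHOLDS = [0.5, 0.25, 0.125, 0.0625]


def count_relatives(r_to_reference: List[float], threshold: float) -> int:
    """Number of reference animals related to one validation animal with r >= threshold."""
    return sum(1 for r in r_to_reference if r >= threshold)


def summarise(relationships: Dict[str, List[float]], threshold: float) -> Tuple[int, float]:
    """Return (validation animals without any such relative, mean number of relatives)."""
    counts = [count_relatives(rs, threshold) for rs in relationships.values()]
    without = sum(1 for c in counts if c == 0)
    mean = sum(counts) / len(counts)
    return without, mean


# Hypothetical toy data: three validation animals, each with coefficients
# of relationship to four reference animals.
toy = {
    "val_A": [0.5, 0.25, 0.0, 0.0],
    "val_B": [0.125, 0.0625, 0.0, 0.0],
    "val_C": [0.0, 0.0, 0.0, 0.0],
}

for t in THRESHOLDS:
    without, mean = summarise(toy, t)
    print(f"r >= {t}: {without} animals without relatives, mean {mean:.2f} relatives")
```

With real data, the first returned value would correspond to an entry of the first table and the second to an entry of the second table for a given scenario and threshold.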
Number of SNP used for the evaluation of imputation accuracy on six chromosomes
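| Chromosome | Length (Mb) | High-density SNP | Mean distance between high-density SNP (bp) | Medium-density SNP | Mean distance between medium-density SNP (bp) |
| 1 | 158.32 | 39 167 | 4042 | 2568 | 61 587 |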
| 5 | 121.18 | 29 050 | 4171 | 1621 | 74 740 |
| 10 | 104.30 | 26 695 | 3906 | 1646 | 62 724 |
| 15 | 85.27 | 21 425 | 3978 | 1280 | 65 850 |
| 20 | 71.98 | 19 111 | 3764 | 1183 | 60 530 |
| 25 | 42.85 | 11 725 | 3648 | 744 | 57 533 |
Number of high-density SNP passing stringent quality parameters for the six evaluated chromosomes. The medium-density SNP are the subset of the BovineHD BeadChip SNP that are also interrogated with the BovineSNP50 BeadChip (version 2). SNP positions were determined based on the UMD3.1 assembly of the bovine genome.
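The columns of this table are linked by simple arithmetic. The sketch below is a consistency check and not part of the original analysis. It assumes that the second column is chromosome length in Mb, that the third and fifth columns are the high-density and medium-density SNP counts, and that the fourth column is a mean marker spacing in base pairs. The number of SNP to impute is then the high-density count minus the medium-density count, which matches the figures of 36 599, 20 145 and 10 981 reported with the computing-time table.

```python
# (chromosome, length in Mb, high-density SNP, medium-density SNP),
# copied from the table above.
ROWS = [
    (1, 158.32, 39167, 2568),
    (5, 121.18, 29050, 1621),
    (10, 104.30, 26695, 1646),
    (15, 85.27, 21425, 1280),
    (20, 71.98, 19111, 1183),
    (25, 42.85, 11725, 744),
]


def mean_spacing_bp(length_mb: float, n_snp: int) -> float:
    """Approximate mean distance between adjacent SNP in base pairs."""
    return length_mb * 1e6 / n_snp


def n_imputed(n_high_density: int, n_medium_density: int) -> int:
    """SNP present on the high-density array only, i.e. those to be imputed."""
    return n_high_density - n_medium_density


for chrom, length_mb, n_hd, n_md in ROWS:
    print(
        f"BTA{chrom}: ~{mean_spacing_bp(length_mb, n_hd):.0f} bp between high-density SNP, "
        f"{n_imputed(n_hd, n_md)} SNP to impute"
    )
```

The spacing obtained this way agrees with the tabulated values only approximately, presumably because the published figures are based on the interval actually covered by SNP rather than on the full chromosome length.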
Figure 1. Imputation accuracy. Barplots indicate the correlation between true and imputed genotypes (rTG,IG) averaged over six chromosomes for an increasing reference population size. The black lines represent the minimum and maximum imputation accuracy for the six chromosomes.
Computing time for the imputation of high-density SNP on chromosomes 1, 15 and 25
| Reference / validation animals | Chromosome | Beagle | MaCH (a) | findhap.f90 (b) | Minimac (c) |
| 50 / 747 | BTA1 | 2.67 h | 1.30 h (0.03 h / 0.30 h / 0.97 h) | 0.07 h | 0.17 h (0.03 h / 0.07 h / 0.07 h) |
| | BTA15 | 1.18 h | 0.68 h (0.02 h / 0.14 h / 0.52 h) | 0.04 h | 0.09 h (0.02 h / 0.04 h / 0.03 h) |
| | BTA25 | 0.67 h | 0.37 h (0.01 h / 0.08 h / 0.28 h) | 0.04 h | 0.05 h (0.01 h / 0.02 h / 0.02 h) |
| 100 / 697 | BTA1 | 3.93 h | 5.01 h (0.08 h / 1.11 h / 3.82 h) | 0.07 h | 0.27 h (0.08 h / 0.06 h / 0.13 h) |
| | BTA15 | 2.48 h | 2.72 h (0.05 h / 0.55 h / 2.12 h) | 0.05 h | 0.15 h (0.05 h / 0.03 h / 0.07 h) |
| | BTA25 | 1.33 h | 1.48 h (0.03 h / 0.32 h / 1.13 h) | 0.04 h | 0.09 h (0.03 h / 0.02 h / 0.04 h) |
| 200 / 597 | BTA1 | 4.49 h | 18.92 h (0.20 h / 4.31 h / 14.41 h) | 0.07 h | 0.48 h (0.20 h / 0.05 h / 0.23 h) |
| | BTA15 | 2.87 h | 10.06 h (0.11 h / 2.22 h / 7.73 h) | 0.05 h | 0.27 h (0.11 h / 0.03 h / 0.13 h) |
| | BTA25 | 1.38 h | 5.76 h (0.06 h / 1.24 h / 4.45 h) | 0.04 h | 0.14 h (0.06 h / 0.01 h / 0.07 h) |
| 400 / 397 | BTA1 | 3.73 h | 81.23 h (0.44 h / 21.97 h / 58.82 h) | 0.07 h | 1.1 h (0.44 h / 0.03 h / 0.63 h) |
| | BTA15 | 2.45 h | 40.16 h (0.21 h / 10.52 h / 29.43 h) | 0.05 h | 0.56 h (0.21 h / 0.02 h / 0.33 h) |
| | BTA25 | 1.37 h | 28.30 h (0.11 h / 5.98 h / 22.21 h) | 0.04 h | 0.30 h (0.11 h / 0.01 h / 0.18 h) |
The number of imputed SNP was 36 599, 20 145 and 10 981 for chromosomes 1, 15 and 25, respectively. Computing was performed on an Intel Xeon 2.13 GHz processor.
a The entire computing time for MaCH can be partitioned into three separate steps (in parentheses): pre-phasing of the reference population with Beagle, inference of tuning parameters based on 200 randomly selected animals of the validation population and actual genotype imputation with MaCH.
b findhap.f90 was run exploiting the multi-threading option.
c The entire computing time for Minimac can be partitioned into three separate steps (in parentheses): pre-phasing of the reference population with Beagle, pre-phasing of the validation population with Beagle and actual genotype imputation with Minimac.
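The partitioned times in parentheses add up to the reported totals, which makes it easy to see which step dominates. The sketch below uses the BTA1 values for 400 reference and 397 validation animals. It assumes that the first parenthesised column belongs to MaCH and the second to Minimac, following the order of footnotes a and c. The step labels and helper names are hypothetical.

```python
from typing import Dict

# Step times in hours for BTA1 with 400 reference / 397 validation animals,
# copied from the table above. The column-to-tool mapping is an assumption
# based on the order of the footnotes.
mach_steps: Dict[str, float] = {
    "pre-phasing reference (Beagle)": 0.44,
    "tuning parameters": 21.97,
    "imputation (MaCH)": 58.82,
}
minimac_steps: Dict[str, float] = {
    "pre-phasing reference (Beagle)": 0.44,
    "pre-phasing validation (Beagle)": 0.03,
    "imputation (Minimac)": 0.63,
}


def total_hours(steps: Dict[str, float]) -> float:
    """Total computing time, rounded to two decimals as in the table."""
    return round(sum(steps.values()), 2)


def dominant_step(steps: Dict[str, float]) -> str:
    """Label of the step that takes the most time."""
    return max(steps, key=steps.get)


for name, steps in (("MaCH", mach_steps), ("Minimac", minimac_steps)):
    print(f"{name}: {total_hours(steps)} h in total, dominated by '{dominant_step(steps)}'")
```

Under these assumptions, the actual imputation step accounts for most of the time in both pipelines, while the shared pre-phasing of the reference population contributes the same 0.44 h to each.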
Evaluation of imputation accuracy
| 50 / 747 | 0.914 | 0.840 | 0.858 | 0.966 | 0.933 | 0.945 | 0.925 | 0.858 | 0.865 | 0.971 | 0.942 | 0.953 |
| 100 / 697 | 0.963 | 0.927 | 0.940 | 0.985 | 0.970 | 0.976 | 0.959 | 0.921 | 0.933 | 0.986 | 0.972 | 0.977 |
| 200 / 597 | 0.986 | 0.972 | 0.977 | 0.993 | 0.987 | 0.989 | 0.978 | 0.956 | 0.965 | 0.993 | 0.986 | 0.989 |
| 400 / 397 | 0.993 | 0.987 | 0.989 | 0.996 | 0.993 | 0.994 | 0.986 | 0.973 | 0.978 | 0.996 | 0.992 | 0.993 |
The mean allelic and genotypic accuracies over six chromosomes (BTA1, BTA5, BTA10, BTA15, BTA20, BTA25) were assessed for the imputed genotypes based on an increasing size of the reference population. Additionally, the correlation between true and imputed genotypes (rTG,IG) was calculated.
a A genotype is considered correctly imputed if both alleles are correctly imputed.
Figure 2. Allelic imputation accuracy. The proportion of correctly imputed alleles is displayed as a function of allele frequencies for findhap.f90 (light grey), Beagle (dark grey), MaCH (blue) and Minimac (light blue) for an increasing reference population size. The curves were obtained by fitting a nonparametric local regression (LOESS).
Figure 3. Individual imputation accuracy for the scenario with 50 reference animals. Barplots indicate the correlation between true and imputed genotypes (rTG,IG) for 747 animals based on 50 reference animals (A). The individual rTG,IG increased considerably as the number of close relatives (coefficient of relationship > 0.12) in the reference population increased (B).
Imputation accuracy on chromosome 20 based on varying reference populations
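| 50 most informative animals | 50 random animals (mean) | 50 random animals (minimum) | 50 random animals (maximum) | |
| 0.866 | 0.854 | 0.841 | 0.864 | |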
| 0.949 | 0.942 | 0.937 | 0.946 | |
| 0.876 | 0.837 | 0.812 | 0.856 | |
| 0.957 | 0.947 | 0.943 | 0.951 | |
The correlation between true and imputed genotypes (rTG,IG) based on the 50 most informative animals as reference population is compared with rTG,IG obtained with 50 randomly selected reference animals. The mean, minimum and maximum rTG,IG obtained with randomly selected reference animals are displayed across ten replications for the four imputation tools.
Figure 4. Genome-wide distribution of the proportion of correctly imputed genotypes. Genotypes of 599 535 SNP were imputed for 397 animals based on haplotype information of 400 reference animals using Minimac. Blue dots represent 5039 SNP that lie within regions of poor imputation quality and probably represent misplaced SNP.