Liuhong Chen, Changxi Li, Mehdi Sargolzaei, Flavio Schenkel.
Abstract
The aim of this study was to evaluate the impact of genotype imputation on the performance of the GBLUP and Bayesian methods for genomic prediction. A total of 10,309 Holstein bulls were genotyped on the BovineSNP50 BeadChip (50 k). Five low density single nucleotide polymorphism (SNP) panels, containing 6,177, 2,480, 1,536, 768 and 384 SNPs, were simulated from the 50 k panel. Fractions of 0%, 33% and 66% of the animals in the training sets were randomly selected to have low density genotypes, which were then imputed to 50 k genotypes. A GBLUP and a Bayesian method were used to predict direct genomic values (DGV) for validation animals using either imputed or actual 50 k genotypes. Traits studied included milk yield, fat percentage, protein percentage and somatic cell score (SCS). Results showed that the performance of both the GBLUP and Bayesian methods was influenced by imputation errors. For traits affected by a few large QTL, the Bayesian method showed greater reductions in accuracy due to imputation errors than GBLUP. Including the SNPs with the largest effects in the low density panel substantially improved the accuracy of genomic prediction for the Bayesian method. Including genotypes imputed from the 6 k panel achieved almost the same accuracy of genomic prediction as using the 50 k panel, even when 66% of the training population was genotyped on the 6 k panel. These results justify the application of the 6 k panel for genomic prediction. Imputations from lower density panels were more prone to errors and resulted in lower accuracy of genomic prediction. However, for animals closely related to the reference set, genotype imputation may still achieve relatively high accuracy.
Year: 2014; PMID: 25025158; PMCID: PMC4099124; DOI: 10.1371/journal.pone.0101544
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Number of SNPs on each chromosome for all SNP panels used in the study.
| Chromosome | 50 k | 6 k | 3 k | L1536 | L768 | L384 |
| BTA1 | 2,291 | 369 | 160 | 97 | 49 | 24 |
| BTA2 | 1,856 | 331 | 136 | 85 | 42 | 21 |
| BTA3 | 1,792 | 284 | 120 | 77 | 38 | 19 |
| BTA4 | 1,731 | 285 | 125 | 75 | 37 | 19 |
| BTA5 | 1,504 | 283 | 117 | 76 | 38 | 19 |
| BTA6 | 1,750 | 283 | 115 | 74 | 37 | 18 |
| BTA7 | 1,510 | 260 | 108 | 68 | 34 | 17 |
| BTA8 | 1,631 | 280 | 112 | 71 | 35 | 18 |
| BTA9 | 1,427 | 255 | 108 | 65 | 33 | 16 |
| BTA10 | 1,496 | 254 | 103 | 64 | 32 | 16 |
| BTA11 | 1,590 | 268 | 104 | 67 | 33 | 17 |
| BTA12 | 1,157 | 211 | 88 | 52 | 26 | 13 |
| BTA13 | 1,236 | 206 | 86 | 51 | 25 | 13 |
| BTA14 | 1,206 | 207 | 82 | 49 | 25 | 12 |
| BTA15 | 1,200 | 202 | 83 | 51 | 25 | 13 |
| BTA16 | 1,049 | 194 | 77 | 47 | 24 | 12 |
| BTA17 | 1,137 | 185 | 72 | 46 | 23 | 12 |
| BTA18 | 973 | 168 | 66 | 40 | 20 | 10 |
| BTA19 | 977 | 163 | 57 | 39 | 20 | 10 |
| BTA20 | 1,072 | 191 | 72 | 46 | 23 | 11 |
| BTA21 | 947 | 174 | 73 | 42 | 21 | 10 |
| BTA22 | 912 | 160 | 66 | 37 | 19 | 9 |
| BTA23 | 800 | 146 | 52 | 32 | 16 | 8 |
| BTA24 | 895 | 165 | 63 | 39 | 20 | 10 |
| BTA25 | 746 | 136 | 46 | 26 | 13 | 7 |
| BTA26 | 752 | 139 | 46 | 31 | 15 | 8 |
| BTA27 | 714 | 131 | 47 | 30 | 15 | 7 |
| BTA28 | 700 | 119 | 43 | 28 | 14 | 7 |
| BTA29 | 739 | 128 | 53 | 31 | 16 | 8 |
| Total | 35,790 | 6,177 | 2,480 | 1,536 | 768 | 384 |
Number of bulls in different groups by birth year.
| Year of birth | Reference | Training | Validation |
| 1950–1954 | 4 | 0 | 0 |
| 1955–1959 | 7 | 1 | 0 |
| 1960–1964 | 17 | 5 | 0 |
| 1965–1969 | 13 | 3 | 0 |
| 1970–1974 | 15 | 11 | 0 |
| 1975–1979 | 25 | 15 | 0 |
| 1980–1984 | 76 | 29 | 0 |
| 1985–1989 | 437 | 86 | 0 |
| 1990–1994 | 382 | 140 | 0 |
| 1995–1999 | 2,557 | 843 | 0 |
| 2000–2003 | 1,936 | 475 | 0 |
| 2004–2007 | 0 | 0 | 3,232 |
| Total | 5,469 | 1,608 | 3,232 |
Bulls in the reference group were used only for genotype imputation.
Accuracy of genotype imputation under different scenarios.
| Scenario | 6 k | 3 k | L1536 | L768 | L384 |
| S1 | 0.9841 | 0.9604 | 0.9430 | 0.8787 | 0.7965 |
| S2 | 0.9792 | 0.9507 | 0.9300 | 0.8573 | 0.7617 |
| S3 | 0.9723 | 0.9367 | 0.9120 | 0.8285 | 0.7210 |
The reference population sizes used for imputation were n = 7,077, n = 4,741 and n = 2,406 for scenarios S1, S2 and S3, respectively. In scenarios S1, S2 and S3, 0%, 33% and 66% of the training set, respectively, and all bulls in the validation set were genotyped on the low density panel.
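The three reference population sizes can be checked arithmetically against the birth-year table. The Python sketch below is an observation about the reported numbers, not a description of the authors' procedure. It assumes that the low density fraction is taken from the combined reference and training bulls (5,469 + 1,608 = 7,077) and that the masked count is rounded up. The names `combined` and `imputation_reference_size` are hypothetical.

```python
reference_bulls = 5469   # "Reference" total in the birth-year table
training_bulls = 1608    # "Training" total in the birth-year table
combined = reference_bulls + training_bulls

def imputation_reference_size(total, low_density_percent):
    """Bulls left on the 50 k panel after masking a percentage (rounded up)."""
    masked = -(-total * low_density_percent // 100)   # integer ceiling division
    return total - masked

# Assumption: the masked percentage applies to the combined 7,077 bulls.
sizes = {name: imputation_reference_size(combined, pct)
         for name, pct in (("S1", 0), ("S2", 33), ("S3", 66))}
print(sizes)
```

Under these assumptions the sketch yields 7,077, 4,741 and 2,406 bulls for S1, S2 and S3, which matches the sizes stated in the footnote.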
Accuracy of genomic prediction and posterior estimates of π using observed 50 k SNP genotypes under scenario S0.
| Trait | Accuracy (GBLUP) | Accuracy (Bayesian) | Posterior π |
| Milk | 0.61 | 0.64 | 0.96 |
| Fat % | 0.64 | 0.75 | 0.99 |
| Protein % | 0.71 | 0.76 | 0.99 |
| SCS | 0.62 | 0.62 | 0.93 |
S0: All animals in the training and validation sets were genotyped on the 50 k SNP panel.
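For orientation, a minimal GBLUP sketch in Python follows. It is an illustration under stated assumptions, not the authors' implementation: the genomic relationship matrix uses the common allele-frequency-centred form (VanRaden's first method), the heritability `h2` is supplied rather than estimated, the mean is the simple training average, and accuracy is taken as the correlation between predictions and validation phenotypes. The function names `genomic_relationship` and `gblup_predict` and the simulated data are hypothetical.

```python
import numpy as np

def genomic_relationship(genotypes):
    """Genomic relationship matrix from an animals x SNPs array coded 0/1/2."""
    p = genotypes.mean(axis=0) / 2.0            # allele frequency per SNP
    z = genotypes - 2.0 * p                     # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(g, y_train, train_idx, valid_idx, h2):
    """Predict genomic values of validation animals from training phenotypes."""
    lam = (1.0 - h2) / h2                       # residual / genetic variance ratio
    g_tt = g[np.ix_(train_idx, train_idx)]      # training x training block
    g_vt = g[np.ix_(valid_idx, train_idx)]      # validation x training block
    mu = y_train.mean()                         # assumption: simple mean, not GLS
    lhs = g_tt + lam * np.eye(len(train_idx))
    alpha = np.linalg.solve(lhs, y_train - mu)
    return mu + g_vt @ alpha

# Toy data (simulated, not from the study): 60 animals, 200 SNPs.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(60, 200)).astype(float)
effects = rng.normal(0.0, 0.1, size=200)
pheno = geno @ effects + rng.normal(0.0, 1.0, size=60)

train_idx = np.arange(0, 40)
valid_idx = np.arange(40, 60)
G = genomic_relationship(geno)
dgv = gblup_predict(G, pheno[train_idx], train_idx, valid_idx, h2=0.3)

# Assumption: accuracy is the correlation of predictions with phenotypes.
accuracy = np.corrcoef(dgv, pheno[valid_idx])[0, 1]
print(f"toy validation accuracy: {accuracy:.2f}")
```

GBLUP shrinks all SNP contributions equally through the relationship matrix, which is one way to read why a Bayesian model that allows a few large effects can score higher for fat and protein percentage in the table above.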
Figure 1. SNP effects for fat percentage estimated from GBLUP and the Bayesian methods.
Accuracy of genomic prediction using imputed 50 k genotypes. Accuracies from the Bayesian model are given in parentheses; accuracies from GBLUP are given outside the parentheses.
| Trait | Scenario | 6 k | 3 k | L1536 | L768 | L384 |
| Milk | S1 | 0.61 (0.64) | 0.60 (0.63) | 0.59 (0.61) | 0.55 (0.57) | 0.49 (0.50) |
| Milk | S2 | 0.61 (0.64) | 0.59 (0.62) | 0.58 (0.60) | 0.52 (0.54) | 0.44 (0.46) |
| Milk | S3 | 0.61 (0.64) | 0.59 (0.61) | 0.57 (0.58) | 0.50 (0.52) | 0.39 (0.42) |
| Fat % | S1 | 0.64 (0.75) | 0.63 (0.73) | 0.59 (0.63) | 0.54 (0.57) | 0.42 (0.42) |
| Fat % | S2 | 0.64 (0.75) | 0.63 (0.73) | 0.56 (0.61) | 0.50 (0.53) | 0.37 (0.39) |
| Fat % | S3 | 0.64 (0.74) | 0.62 (0.72) | 0.53 (0.59) | 0.45 (0.51) | 0.32 (0.34) |
| Protein % | S1 | 0.71 (0.76) | 0.70 (0.74) | 0.69 (0.72) | 0.65 (0.66) | 0.57 (0.57) |
| Protein % | S2 | 0.71 (0.75) | 0.69 (0.73) | 0.68 (0.70) | 0.62 (0.63) | 0.53 (0.54) |
| Protein % | S3 | 0.71 (0.75) | 0.68 (0.72) | 0.67 (0.69) | 0.58 (0.59) | 0.47 (0.47) |
| SCS | S1 | 0.62 (0.62) | 0.62 (0.62) | 0.61 (0.61) | 0.57 (0.57) | 0.53 (0.53) |
| SCS | S2 | 0.62 (0.62) | 0.61 (0.61) | 0.60 (0.60) | 0.54 (0.55) | 0.49 (0.50) |
| SCS | S3 | 0.62 (0.61) | 0.60 (0.60) | 0.60 (0.60) | 0.53 (0.54) | 0.44 (0.46) |
In scenarios S1, S2 and S3, 0%, 33% and 66% of the training set, respectively, and all bulls in the validation set were genotyped on the low density panel.
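The loss of accuracy caused by imputation can be read off by subtracting each imputed-genotype accuracy from its 50 k baseline. The short Python sketch below does this for fat percentage under scenario S1, using only values from the tables above; the helper name `accuracy_loss` is hypothetical.

```python
panels = ["6 k", "3 k", "L1536", "L768", "L384"]

# Fat % accuracies copied from the tables above.
s0 = {"GBLUP": 0.64, "Bayesian": 0.75}          # observed 50 k genotypes (S0)
s1 = {                                          # imputed genotypes (S1)
    "GBLUP":    [0.64, 0.63, 0.59, 0.54, 0.42],
    "Bayesian": [0.75, 0.73, 0.63, 0.57, 0.42],
}

def accuracy_loss(baseline, imputed):
    """Drop in accuracy relative to the 50 k baseline, rounded to 2 decimals."""
    return [round(baseline - value, 2) for value in imputed]

loss = {method: accuracy_loss(s0[method], s1[method]) for method in s0}
for method, values in loss.items():
    print(method, dict(zip(panels, values)))
```

With these numbers the Bayesian method loses 0.12 at L1536 against 0.05 for GBLUP, and 0.33 at L384 against 0.22, so its advantage for fat percentage disappears at the lowest density.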
Imputation accuracy under scenario S1 for two SNPs with largest effects on fat percentage.
| SNP ID | Location | 6 k | 3 k | L1536 | L768 | L384 |
| ARS-BFGL-NGS-4939 | 443,937 | 1 | 0.9830 | 0.8688 | 0.8181 | 0.7249 |
| ARS-BFGL-NGS-57820 | 226,532 | 1 | 0.9802 | 0.8642 | 0.8165 | 0.7200 |
Bulls in the training set were genotyped on the 50 k panel, and bulls in the validation set were genotyped on low density panels.
SNP locations are based on the bovine genome assembly Btau4.2.
Accuracy of genomic prediction for fat percentage under scenario S1 when actual genotypes of the two SNPs with largest effects are included in the low density SNP panel.
| Method | 6 k | 3 k | L1536 | L768 | L384 |
| GBLUP | 0.64 | 0.63 | 0.60 | 0.56 | 0.44 |
| Bayesian | 0.75 | 0.74 | 0.73 | 0.70 | 0.64 |
Bulls in the training set were genotyped on the 50 k panel, and bulls in the validation set were genotyped on low density panels.
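The effect of adding the two large-effect SNPs can be quantified by comparing this table with the scenario S1 fat percentage rows of the imputed-genotype table. The Python sketch below computes the gain per panel from the reported values only; the dictionary names are hypothetical.

```python
panels = ["6 k", "3 k", "L1536", "L768", "L384"]

# Fat % accuracies under S1, copied from the tables above.
imputed_only = {
    "GBLUP":    [0.64, 0.63, 0.59, 0.54, 0.42],
    "Bayesian": [0.75, 0.73, 0.63, 0.57, 0.42],
}
with_top_snps = {
    "GBLUP":    [0.64, 0.63, 0.60, 0.56, 0.44],
    "Bayesian": [0.75, 0.74, 0.73, 0.70, 0.64],
}

# Gain in accuracy from genotyping the two SNPs directly, per panel.
gain = {
    method: [round(after - before, 2)
             for before, after in zip(imputed_only[method], with_top_snps[method])]
    for method in imputed_only
}
for method, values in gain.items():
    print(method, dict(zip(panels, values)))
```

The gain for the Bayesian method reaches 0.22 at L384, while GBLUP gains at most 0.02, which is consistent with the abstract's statement that including the largest-effect SNPs mainly benefits the Bayesian method.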