Tingting Wang, Yi-Ping Phoebe Chen, Michael E Goddard, Theo H E Meuwissen, Kathryn E Kemper, Ben J Hayes.
Abstract
BACKGROUND: Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes that sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation-maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those of the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time.
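The "non-linear estimates of SNP effects" mentioned in the abstract can be illustrated with a minimal sketch for a single SNP. It assumes the standard BayesR prior, a mixture of four zero-mean normal distributions with variances 0, 0.0001, 0.001 and 0.01 times the genetic variance; these variances are not stated in this record. The function names, the sampling variance `s2` and the example inputs are hypothetical, and this is neither the MCMC nor the EM implementation described by the authors.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def mixture_posterior_mean(b_hat, s2, pr, prior_vars):
    """Posterior mean of one SNP effect under a normal-mixture prior.

    b_hat      : least-squares estimate of the SNP effect
    s2         : sampling variance of b_hat (assumed known here)
    pr         : mixing proportions of the prior components
    prior_vars : prior variance of each component (0 allowed)
    """
    # Marginally, b_hat ~ N(0, prior_var_k + s2) under component k.
    weights = [p * normal_pdf(b_hat, v + s2) for p, v in zip(pr, prior_vars)]
    total = sum(weights)
    post = [w / total for w in weights]          # membership probabilities
    shrink = [v / (v + s2) for v in prior_vars]  # per-component shrinkage
    mean = sum(pk * sk * b_hat for pk, sk in zip(post, shrink))
    return mean, post

# Assumed BayesR variance classes, relative to a genetic variance of 1.
PRIOR_VARS = [0.0, 1e-4, 1e-3, 1e-2]
```

Small least-squares estimates are shrunk almost to zero while large ones are left nearly untouched, which is the non-linearity that a single-normal prior (as in SNP-BLUP or GBLUP) cannot reproduce.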
Year: 2015 PMID: 25926276 PMCID: PMC4415253 DOI: 10.1186/s12711-014-0082-4
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Numbers of Holstein bulls in the reference and validation sets for functional traits and production traits
| Trait | Reference set | Validation set |
|---|---|---|
| Milk | 3049 | 262 |
| Protein | 3049 | 262 |
| Fertility | 2806 | 396 |
| Protein% | 3049 | 262 |
| Fat% | 3049 | 262 |
| Angularity | 1484 | 251 |
| Mammary conformation | 1484 | 251 |
| Stature | 1484 | 251 |
| Somatic cell count | 2662 | 410 |
Figure 1. Convergence of estimated SNP effects, error variance and Pr over 5000 iterations. The x axis shows the number of iterations, from 0 to 5000; the y axis shows the estimated SNP effects, error variance and the first element of Pr (the proportion of SNPs in the distribution with zero variance).
Figure 2. Correlation between SNP effect estimates from BayesR and emBayesR in four replicates of HD_Mix_45 (h² = 0.45). The x axis shows the BayesR estimates of SNP effects; the blue line plots emBayesR estimates against BayesR estimates; the black line plots BayesR estimates against themselves, for four replicates of HD_Mix with a heritability of 0.45.
Figure 3. Estimates of SNP effects from BayesR and emBayesR compared with their true effects in one replicate of HD_Mix_45 (HD_Mix_45_2). The x axis shows the true effects; the blue curve plots BayesR estimates against true effects; the red line plots emBayesR estimates against true effects; the black line plots true effects against themselves, for one replicate of the simulated HD_Mix data with a heritability of 0.45 (HD_Mix_45_2).
Figure 4. Estimates of SNP effects from SNP-BLUP, BayesR, emBayesR and FastBayesB against their least-squares estimates. The x axis shows the least-squares estimates of SNP effects; the blue line plots BayesR estimates; the red line plots emBayesR estimates; the dotted green line plots FastBayesB estimates; the black line plots SNP-BLUP estimates, for HD_Mix_45.
Estimated mixing proportions (Pr) from BayesR and emBayesR in the 10 k simulation data (HD_Mix_45)
| BayesR | emBayesR |
|---|---|
| [0.9865 0.0110 0.0010 0.0015] | [0.9813 0.0163 0.0009 0.0015] |
| [0.9861 0.0127 0.0004 0.0008] | [0.9852 0.0136 0.0003 0.0009] |
| [0.9933 0.0046 0.0009 0.0012] | [0.9899 0.0083 0.0005 0.0012] |
| [0.9909 0.0055 0.0022 0.0015] | [0.9864 0.0110 0.0010 0.0016] |
| [0.9944 0.0043 0.0006 0.0007] | [0.9910 0.0078 0.0005 0.0007] |

| |
|---|
| [0.9759 0.0021 0.0024 0.0010] |
| [0.9624 0.0343 0.0025 0.0009] |
| [0.9757 0.0022 0.0018 0.0008] |
| [0.9620 0.0334 0.0032 0.0014] |
| [0.9664 0.0295 0.0023 0.0018] |
Estimated mixing proportions (Pr) from BayesR and emBayesR for the 630 k real dairy cattle data
| Trait | | |
|---|---|---|
| Milk | [0.99291 0.00690 0.00018 0.00001] | |
| Protein | [0.99161 0.00831 0.00005 0.00003] | |
| Fertility | [0.98863 0.01034 0.00092 0.00011] | |
| Protein% | [0.99602 0.00378 0.00019 0.00001] | |
| Fat% | [0.99480 0.00485 0.00021 0.00014] | |
| Angularity | [0.99221 0.00739 0.00039 0.00001] | |
| Mammary conformation | [0.99091 0.00859 0.00047 0.00003] | |
| Stature | [0.99013 0.00927 0.00052 0.00008] | |
| Somatic cell count | [0.98688 0.01272 0.00039 0.00001] | |
Pr estimates (proportion of SNP in each distribution) with different prior values α for the HD_Mix_45 simulated data
| Prior α | Distribution 1 | Distribution 2 | Distribution 3 | Distribution 4 |
|---|---|---|---|---|
| (1, 1, 1, 1) | 0.9861 | 0.0127 | 0.0004 | 0.0008 |
| (1, 1, 1, 100) | 0.9801 | 0.0130 | 0.0042 | 0.0027 |
| (1, 1, 100, 1) | 0.9863 | 0.0101 | 0.0028 | 0.0008 |
| (100, 1, 1, 1) | 0.9883 | 0.0105 | 0.0003 | 0.0009 |
The prior α was (1, 1, 1, 1), (1, 1, 1, 100), (1, 1, 100, 1) or (100, 1, 1, 1).
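How a prior α pulls the mixing proportions can be sketched with the textbook Dirichlet-multinomial posterior mean. This assumes α acts as Dirichlet pseudo-counts on Pr; whether emBayesR uses exactly this update is not stated in this record. The function name and the SNP counts are hypothetical, and the counts are held fixed here, whereas in a real analysis they would change with the prior.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of mixing proportions.

    counts : (expected) number of SNPs assigned to each distribution
    alpha  : Dirichlet pseudo-counts, one per distribution
    """
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

# Hypothetical counts for 10,000 SNPs, for illustration only.
counts = [9861, 127, 4, 8]

flat = dirichlet_posterior_mean(counts, [1, 1, 1, 1])
heavy_last = dirichlet_posterior_mean(counts, [1, 1, 1, 100])
```

With so few SNPs in the large-variance distributions, a pseudo-count of 100 is large relative to the data for those components, so it shifts their proportions noticeably, while the same pseudo-count on the first distribution barely changes it.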
Accuracy of genomic prediction from emBayesR_without_PEV and emBayesR on HD_Mix dataset
| Method | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 |
|---|---|---|---|---|---|
| emBayesR_without_PEV | 0.91 | 0.90 | 0.85 | 0.90 | 0.91 |
| emBayesR | 0.97 | 0.96 | 0.93 | 0.97 | 0.97 |

| Method | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 |
|---|---|---|---|---|---|
| emBayesR_without_PEV | 0.89 | 0.82 | 0.87 | 0.81 | 0.79 |
| emBayesR | 0.91 | 0.87 | 0.93 | 0.86 | 0.87 |
Figure 5. Comparison of SNP effect estimates from emBayesR with and without accounting for PEV with estimates from BayesR. A: The x axis shows BayesR estimates of SNP effects; the blue line plots emBayesR estimates against BayesR estimates; the red line plots emBayesR_without_PEV estimates against BayesR estimates; the black line plots BayesR estimates against themselves. B: The x axis shows true effects; the blue line plots BayesR estimates against true effects; the green line plots emBayesR estimates against true effects; the red line plots emBayesR_without_PEV estimates against true effects; the black line plots true effects against themselves.
Accuracy of genomic prediction using posterior mode (emBayesR_Mode, Equation 8a) or posterior mean (emBayesR_Mean, Equation 8b) estimates of SNP effects in the HD_Mix dataset
| Method | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 |
|---|---|---|---|---|---|
| emBayesR_Mode | 0.97 | 0.96 | 0.93 | 0.97 | 0.97 |
| emBayesR_Mean | 0.97 | 0.95 | 0.93 | 0.97 | 0.97 |

| Method | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 |
|---|---|---|---|---|---|
| emBayesR_Mode | 0.91 | 0.87 | 0.93 | 0.86 | 0.87 |
| emBayesR_Mean | 0.91 | 0.88 | 0.93 | 0.87 | 0.87 |
Accuracy of genomic prediction and the regression coefficient of true breeding value (TBV) on genomic estimated breeding value (GEBV) for different methods for the HD_Mix simulated dataset
| Method | | | | |
|---|---|---|---|---|
| BayesR | 0.97 ± 0.01 | 0.89 ± 0.03 | 1.02 ± 0.02 | 1.00 ± 0.05 |
| emBayesR | | | | |
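The two statistics named in this table's caption can be computed as in the sketch below. It assumes that accuracy means the Pearson correlation between GEBV and TBV, the usual definition for simulated data, which this record does not spell out. The regression coefficient is the slope of TBV on GEBV, as the caption states. The function name and the example values are hypothetical.

```python
import math

def accuracy_and_regression(tbv, gebv):
    """Return (correlation of GEBV with TBV, slope of TBV regressed on GEBV)."""
    n = len(tbv)
    mean_t = sum(tbv) / n
    mean_g = sum(gebv) / n
    # Sums of squares and cross-products around the means.
    s_tg = sum((t - mean_t) * (g - mean_g) for t, g in zip(tbv, gebv))
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    accuracy = s_tg / math.sqrt(s_tt * s_gg)
    slope = s_tg / s_gg
    return accuracy, slope

# Hypothetical values: GEBV perfectly ranked but under-dispersed by half.
acc, slope = accuracy_and_regression([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

A slope close to 1 indicates that the GEBV are neither over- nor under-dispersed relative to the TBV, so the regression coefficient measures bias in the scale of the predictions while the correlation measures how well animals are ranked.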
Accuracy of genomic prediction from GBLUP, BayesR, fastBayesB and emBayesR for the 630 K dairy cattle data for production and functional traits
| Method | | | | | |
|---|---|---|---|---|---|
| GBLUP | 0.57 | 0.63 | 0.40 | 0.63 | 0.77 |
| BayesR | 0.63 | 0.64 | 0.41 | 0.79 | 0.83 |
| FastBayesB | 0.57 | 0.60 | 0.35 | 0.70 | 0.80 |
| emBayesR | | | | | |

| Method | | | | |
|---|---|---|---|---|
| GBLUP | 0.45 | 0.28 | 0.47 | 0.71 |
| BayesR | 0.44 | 0.28 | 0.47 | 0.71 |
| FastBayesB | 0.39 | 0.25 | 0.43 | 0.61 |
| emBayesR | | | | |
Figure 6. Accuracy of genomic prediction and running time for BayesR with an increasing number of iterations.
Number of iterations required for emBayesR and fastBayesB to reach convergence for five traits with the 630 K dairy cattle data
| Method | | | | | |
|---|---|---|---|---|---|
| emBayesR | | | | | |
| FastBayesB | 410 | 540 | 856 | 848 | 564 |
Figure 7. Computational time required for BayesR, emBayesR and FastBayesB on a range of SNP chips (10 K, 50 K and 630 K). The x axis shows the size of the SNP chip; the y axis shows the computational time in minutes; the blue bar is the running time of BayesR, the red bar that of emBayesR and the green bar that of FastBayesB.
Estimated mixing proportions (Pr) and genomic prediction accuracy from BayesR, emBayesR and GBLUP with the HD_Mix_45 and HD_One_45 datasets
| HD_Mix_45 | Pr | Accuracy |
|---|---|---|
| True | [0.9950 0.0017 0.0016 0.0017] | |
| BayesR | [0.9861 0.0127 0.0004 0.0008] | 0.97 |
| emBayesR | [0.9852 0.0136 0.0003 0.0009] | 0.97 |
| GBLUP | - | 0.67 |

| HD_One_45 | Pr | Accuracy |
|---|---|---|
| True | [0 0 0 1] | |
| BayesR | [0.722 0.2621 0.0115 0.0044] | 0.80 |
| emBayesR | [0.012 0.986 0.0007 0.0013] | 0.80 |
| GBLUP | - | 0.78 |