Gerhard Moser, Bruce Tier, Ron E Crump, Mehar S Khatkar, Herman W Raadsma.
Abstract
BACKGROUND: Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.
Year: 2009 PMID: 20043835 PMCID: PMC2814805 DOI: 10.1186/1297-9686-41-56
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Figure 1. Distribution of EBVs for Australian Selection Index (ASI, a) and protein percentage (PPT, b), distribution of reliabilities of EBVs (c), and number of bulls within year of birth (d).
Figure 2. Partial least squares regression model optimization for Australian Selection Index using cross-validation. Shown are the mean squared error of prediction (MSEP) in the training data set (MSEP_training), the average MSEP in the 5-fold cross-validation samples (MSEP_CV), and the proportion of EBV variance (VarEBV) and SNP variance (VarSNP) explained in the training data for models with an increasing number of latent components; the optimal prediction model includes the first 5 latent components, identified by the smallest MSEP_CV.
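The selection rule described in the Figure 2 caption (keep the number of latent components with the smallest average cross-validation MSEP) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_n_components` and the per-fold MSEP values are hypothetical, and the step that fits the regression models and produces those per-fold values is assumed to have been done elsewhere.

```python
def select_n_components(msep_by_fold):
    """Pick the model size with the smallest mean cross-validation MSEP.

    msep_by_fold maps a number of latent components to the list of MSEP
    values obtained in each cross-validation fold (hypothetical input).
    Returns (best_n_components, mean_msep_per_model).
    """
    mean_msep = {k: sum(v) / len(v) for k, v in msep_by_fold.items()}
    # Iterating over sorted keys means ties go to the smaller model.
    best = min(sorted(mean_msep), key=mean_msep.get)
    return best, mean_msep


# Made-up per-fold MSEP values for models with 1, 2 and 3 components.
example = {1: [10.0, 12.0], 2: [8.0, 9.0], 3: [9.0, 10.0]}
best_k, means = select_n_components(example)
print(best_k, means[best_k])
```

Breaking ties in favour of the smaller model is a design choice of this sketch, not something stated in the source.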
Cross-validation results for method fixed regression-least squares at different threshold values
| Trait | α† | nSNP | (SE) | MSEP | (SE) | rEBV,MBV | (SE) | bEBV,MBV | (SE) |
|---|---|---|---|---|---|---|---|---|---|
| ASI | 0.1 | 197.2 | (31.6) | 1464 | (139.2) | 0.52 | (0.031) | 0.49 | (0.059) |
| | 0.01 | 98.2 | (7.1) | 1235 | (62.1) | 0.54 | (0.043) | 0.58 | (0.045) |
| | 0.001 | 33.0 | (5.4) | 1090 | (124.4) | 0.53 | (0.036) | 0.71 | (0.048) |
| | 0.0001 | 15.0 | (1.9) | 1108 | (136.8) | 0.50 | (0.043) | 0.76 | (0.084) |
| PPT | 0.1 | 215.6 | (29.3) | 0.0214 | (0.0023) | 0.35 | (0.056) | 0.32 | (0.059) |
| | 0.01 | 81.6 | (5.0) | 0.0156 | (0.0016) | 0.42 | (0.059) | 0.48 | (0.075) |
| | 0.001 | 30.0 | (4.2) | 0.0135 | (0.0023) | 0.43 | (0.089) | 0.62 | (0.155) |
| | 0.0001 | 15.4 | (2.1) | 0.0136 | (0.0016) | 0.39 | (0.076) | 0.67 | (0.173) |
Average number of SNP (nSNP) in the model, mean square error (MSEP), correlation (rEBV,MBV) between EBV and MBV, and regression coefficient (bEBV,MBV) of EBV on MBV for different threshold levels (α) in five cross-validation samples of the training data set, standard error in parentheses; ASI: Australian Selection Index; PPT: protein percentage; † P-value used to select SNPs in or out of model.
Summary of MBV prediction in the training data for five methods obtained by cross-validation
| Trait | Method | MSEP | (SE) | rEBV,MBV | (SE) | bEBV,MBV | (SE) |
|---|---|---|---|---|---|---|---|
| ASI | FR-LS | 1,090 | (124.4) | 0.53 | (0.036) | 0.71 | (0.048) |
| | RR-BLUP | 712 | (93.5) | 0.71 | (0.017) | 1.07 | (0.076) |
| | Bayes-R | 714 | (95.3) | 0.71 | (0.016) | 1.09 | (0.071) |
| | SVR | 700 | (92.2) | 0.72 | (0.017) | 1.06 | (0.079) |
| | PLSR | 735 | (95.4) | 0.70 | (0.022) | 0.93 | (0.069) |
| PPT | FR-LS | 0.0135 | (0.0023) | 0.43 | (0.089) | 0.62 | (0.155) |
| | RR-BLUP | 0.0104 | (0.0018) | 0.56 | (0.067) | 1.01 | (0.104) |
| | Bayes-R | 0.0104 | (0.0010) | 0.56 | (0.067) | 1.06 | (0.117) |
| | SVR | 0.0100 | (0.0010) | 0.58 | (0.064) | 1.01 | (0.100) |
| | PLSR | 0.0109 | (0.0012) | 0.55 | (0.061) | 0.81 | (0.078) |
Mean square error (MSEP), correlation (rEBV,MBV) between EBV and MBV, and regression coefficient (bEBV,MBV) of EBV on MBV derived by 5-fold cross-validation of the training data set, standard errors in parentheses; ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.
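The three summary statistics reported in these tables can be computed for one validation sample as sketched below. This is an illustration under stated assumptions, not the authors' implementation: `validation_metrics` is a hypothetical helper, MSEP is taken to be the mean squared difference between EBV and MBV, and the regression coefficient is the ordinary least-squares slope of EBV on MBV. The input values are made up.

```python
from math import sqrt


def validation_metrics(ebv, mbv):
    """Return (MSEP, r, b) for paired EBV and MBV values.

    MSEP: mean squared difference between EBV and MBV (assumed definition).
    r:    Pearson correlation between EBV and MBV.
    b:    least-squares slope of the regression of EBV on MBV.
    """
    n = len(ebv)
    if n != len(mbv) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_ebv = sum(ebv) / n
    mean_mbv = sum(mbv) / n
    # Sums of squares and cross-products around the means.
    ss_ebv = sum((e - mean_ebv) ** 2 for e in ebv)
    ss_mbv = sum((m - mean_mbv) ** 2 for m in mbv)
    sp = sum((e - mean_ebv) * (m - mean_mbv) for e, m in zip(ebv, mbv))
    msep = sum((e - m) ** 2 for e, m in zip(ebv, mbv)) / n
    r = sp / sqrt(ss_ebv * ss_mbv)
    b = sp / ss_mbv
    return msep, r, b


# Made-up example: MBVs are perfectly correlated with the EBVs but
# only half as dispersed, so r is 1 while b is 2.
msep, r, b = validation_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(msep, r, b)
```

Under this definition a slope near 1 means the MBVs are on the same scale as the EBVs, a slope below 1 means the MBVs are more dispersed than the EBVs, and a slope above 1 means they are less dispersed; the correlation alone does not reveal this.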
Correlation (rEBV,MBV) between EBV and MBV and regression coefficient (bEBV,MBV) of EBV on MBV in cohorts of young bulls for five methods
| Trait | Method | Year of birth: 1998 | 1999 | 2000 | 2001 | 2002 | 1998-2002 |
|---|---|---|---|---|---|---|---|
| rEBV,MBV | | | | | | | |
| ASI | FR-LS | 0.22 | 0.23 | 0.33 | 0.26 | 0.12 | 0.27 |
| | RR-BLUP | 0.35 | 0.39 | 0.40 | 0.32 | 0.28 | 0.39 |
| | Bayes-R | 0.38 | 0.38 | 0.42 | 0.33 | 0.29 | 0.41 |
| | SVR | 0.42 | 0.40 | 0.46 | 0.40 | 0.35 | 0.45 |
| | PLSR | 0.39 | 0.38 | 0.40 | 0.35 | 0.34 | 0.42 |
| PPT | FR-LS | 0.48 | 0.52 | 0.41 | 0.46 | 0.43 | 0.47 |
| | RR-BLUP | 0.53 | 0.58 | 0.53 | 0.56 | 0.49 | 0.55 |
| | Bayes-R | 0.63 | 0.60 | 0.55 | 0.63 | 0.51 | 0.60 |
| | SVR | 0.64 | 0.61 | 0.57 | 0.63 | 0.52 | 0.61 |
| | PLSR | 0.63 | 0.55 | 0.50 | 0.62 | 0.43 | 0.56 |
| bEBV,MBV | | | | | | | |
| ASI | FR-LS | 0.18 | 0.25 | 0.29 | 0.26 | 0.11 | 0.26 |
| | RR-BLUP | 0.59 | 0.72 | 0.82 | 0.72 | 0.60 | 0.72 |
| | Bayes-R | 0.59 | 0.71 | 0.81 | 0.71 | 0.59 | 0.76 |
| | SVR | 0.61 | 0.65 | 0.76 | 0.77 | 0.66 | 0.74 |
| | PLSR | 0.45 | 0.49 | 0.55 | 0.51 | 0.48 | 0.55 |
| PPT | FR-LS | 0.58 | 0.62 | 0.45 | 0.59 | 0.40 | 0.55 |
| | RR-BLUP | 1.09 | 0.93 | 0.98 | 1.10 | 0.72 | 1.08 |
| | Bayes-R | 1.24 | 1.10 | 1.15 | 1.29 | 0.81 | 1.16 |
| | SVR | 1.13 | 0.99 | 1.07 | 1.17 | 0.72 | 1.05 |
| | PLSR | 0.88 | 0.73 | 0.74 | 0.93 | 0.50 | 0.80 |
| Number of bulls | | 144 | 189 | 173 | 137 | 63 | 706 |
The training data set included animals born before 1998; ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.
Figure 3. Fit of models relating EBVs and predicted MBVs in the training data and in young bulls. To avoid clutter, predictions are plotted for a single fold of the cross-validation (CV) of the training data set and for the 1998 young bull cohort; ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.
Pearson correlations of MBV predictions in the training data (above diagonal) and in cohorts of young bulls (below diagonal) between five methods
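| Method | ASI: FR-LS | ASI: RR-BLUP | ASI: Bayes-R | ASI: SVR | ASI: PLSR | PPT: FR-LS | PPT: RR-BLUP | PPT: Bayes-R | PPT: SVR | PPT: PLSR |
|---|---|---|---|---|---|---|---|---|---|---|
| FR-LS | - | 0.73 | 0.74 | 0.73 | 0.71 | - | 0.66 | 0.67 | 0.66 | 0.64 |
| RR-BLUP | 0.57 | - | 1 | 0.99 | 0.97 | 0.59 | - | 1 | 0.98 | 0.97 |
| Bayes-R | 0.58 | 0.96 | - | 0.99 | 0.97 | 0.63 | 0.95 | - | 0.98 | 0.96 |
| SVR | 0.59 | 0.93 | 0.96 | - | 0.96 | 0.63 | 0.91 | 0.97 | - | 0.95 |
| PLSR | 0.55 | 0.93 | 0.97 | 0.95 | - | 0.60 | 0.92 | 0.97 | 0.93 | - |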
ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.
Figure 4. Distribution of 7,372 SNP effects along the genome estimated by four methods. The rightmost 772 SNPs are unassigned to chromosomes; ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression.
Correlation (rEBV,MBV) between EBV and MBV in cohorts of young bulls with increasing size of the training data
| Trait | Training years | Training N | 1998 (N = 144) | 1999 (N = 189) | 2000 (N = 173) | 2001 (N = 137) | 2002 (N = 63) |
|---|---|---|---|---|---|---|---|
| ASI | ≤1997 | 1,239 | 0.39 | 0.38 | 0.40 | 0.35 | 0.34 |
| | ≤1998 | 1,383 | | 0.37 | 0.38 | 0.29 | 0.26 |
| | ≤1999 | 1,572 | | | 0.45 | 0.35 | 0.30 |
| | ≤2000 | 1,745 | | | | 0.39 | 0.34 |
| | ≤2001 | 1,882 | | | | | 0.32 |
| PPT | ≤1997 | 1,239 | 0.63 | 0.55 | 0.50 | 0.62 | 0.43 |
| | ≤1998 | 1,383 | | 0.55 | 0.51 | 0.64 | 0.41 |
| | ≤1999 | 1,572 | | | 0.52 | 0.66 | 0.43 |
| | ≤2000 | 1,745 | | | | 0.68 | 0.46 |
| | ≤2001 | 1,882 | | | | | 0.47 |
Results were obtained by cross-validation using partial least squares regression; ASI: Australian Selection Index; PPT: protein percentage
Correlation (rEBV,SMGS) between EBV and pre-progeny test sire maternal-grandsire EBV prediction and correlation (rEBV,GEBV) between EBV and GEBV in young bulls for five methods
| Trait | rEBV,SMGS | rEBV,GEBV: FR-LS | rEBV,GEBV: RR-BLUP | rEBV,GEBV: Bayes-R | rEBV,GEBV: SVR | rEBV,GEBV: PLSR |
|---|---|---|---|---|---|---|
| ASI | 0.35 | 0.37 | 0.45 | 0.45 | 0.47 | 0.45 |
| PPT | 0.49 | 0.57 | 0.60 | 0.62 | 0.60 | 0.62 |
GEBV predictions for bulls born between 1998 and 2002 were calculated by combining the MBV predictions with the sire maternal-grandsire pathway predictions, which were calculated at the time of birth of the young bull calves; ASI: Australian Selection Index; PPT: protein percentage; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.
Summary of ANOVA of factors affecting correlation (rEBV,MBV) between EBV and MBV and regression coefficient (loge bEBV,MBV) of EBV on MBV
| Model term | rEBV,MBV: P-value | rEBV,MBV: F-value | loge bEBV,MBV: P-value | loge bEBV,MBV: F-value |
|---|---|---|---|---|
| Method | < 0.001 | 48.88 | n.t. | 88.09 |
| Trait | n.t. | 350.84 | n.t. | 131.35 |
| Year | n.t. | 61.68 | n.t. | 20.44 |
| Method.Trait | 0.198 | 1.66 | 0.002 | 6.18 |
| Method.Year | 0.827 | 0.65 | 0.346 | 1.20 |
| Trait.Year | < 0.001 | 60.19 | < 0.001 | 10.73 |
Shown are the significance level (P-value) and F-value of each model term; the regression coefficient was loge-transformed to account for non-normality and unstable variance; † n.t. non-testable.
Computation times for estimation of SNP effects for five methods
| FR-LS | RR-BLUP | Bayes-R | SVR | PLSR |
|---|---|---|---|---|
| ~3 min | ~22 s | ~421 min | ~4 min | ~8 s |
Results were obtained by calculating SNP effects for a single replicate of the training data; FR-LS: fixed regression-least squares; RR-BLUP: random regression-BLUP; Bayes-R: Bayesian regression; SVR: support vector regression; PLSR: partial least squares regression.