Gustavo de los Campos, John M Hickey, Ricardo Pong-Wong, Hans D Daetwyler, Mario P L Calus.
Abstract
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Year: 2012 PMID: 22745228 PMCID: PMC3567727 DOI: 10.1534/genetics.112.143313
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1 Commonly used prior densities of marker effects (all with zero mean and unit variance). The densities are organized in a way that, starting from the Gaussian in the top left corner, as one moves clockwise, the amount of mass at zero increases and tails become thicker and flatter.
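The contrast described in the Figure 1 caption can be checked numerically with a small sketch. It compares three zero-mean, unit-variance densities: a Gaussian, a double exponential (Laplace), and a scaled t. The choice of these three, the 5 degrees of freedom for the scaled t, and all function names are assumptions made for illustration; they are not taken from the figure itself.

```python
import math


def gaussian_pdf(x):
    """Standard normal density (mean 0, variance 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_pdf(x):
    """Double-exponential density rescaled to unit variance.

    A Laplace density with scale b has variance 2 * b**2, so b = 1 / sqrt(2).
    """
    b = 1.0 / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)


def scaled_t_pdf(x, df=5.0):
    """Student-t density rescaled to unit variance (requires df > 2).

    A standard t has variance df / (df - 2), so the scale is
    s = sqrt((df - 2) / df). The value df = 5 is an assumed example.
    """
    s = math.sqrt((df - 2.0) / df)
    z = x / s
    const = math.gamma((df + 1.0) / 2.0) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2.0)
    )
    return const * (1.0 + z * z / df) ** (-(df + 1.0) / 2.0) / s


# Compare density near zero (x = 0) and in the tail (x = 4).
for name, pdf in [
    ("Gaussian", gaussian_pdf),
    ("scaled t (df=5)", scaled_t_pdf),
    ("double exponential", laplace_pdf),
]:
    print(f"{name:20s} f(0) = {pdf(0.0):.4f}   f(4) = {pdf(4.0):.6f}")
```

With variance held fixed, both non-Gaussian densities place more density at zero and more in the tails than the Gaussian, which is the trade-off the caption describes.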
Figure 2 Relationships between some prior densities commonly assigned to marker effects.
Table 1 Prior density of marker effects, prior variance of marker effects, and suggested formulas for choosing hyperparameter values by model
| Model | Prior variance | Solution for scale/variance parameter | |
|---|---|---|---|
| Hyperparameters | |||
| Bayesian ridge regression | |||
| Bayesian LASSO | |||
| BayesA | |||
| Spike–slab | |||
| BayesC | |||
| BayesB | |||
where $x_{ij}$ represents the number of copies of the allele coded as one at the $j$th ($j = 1,\dots,p$) locus of the $i$th ($i = 1,\dots,n$) individual, and $\bar{x}_j$ is the average genotype at the $j$th marker.
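The genotype coding and centering in the note above, together with the large-p, small-n setting from the abstract, can be illustrated with a minimal ridge regression sketch. Everything here is an assumption for illustration: the data are simulated, the sample sizes and the penalty `lam` are arbitrary, and the names `X`, `W`, and `K` are hypothetical. With a fixed penalty this is ordinary ridge regression in the spirit of RR-BLUP, not the article's own implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2000  # assumed sizes: far more markers (p) than individuals (n)

# Genotypes coded as the number of copies (0, 1, 2) of the allele coded as one.
freq = rng.uniform(0.1, 0.9, size=p)  # simulated allele frequencies
X = rng.binomial(2, freq, size=(n, p)).astype(float)

# Center each marker by its average genotype.
W = X - X.mean(axis=0)

# Simulated phenotypes: 20 markers with nonzero effects plus noise.
beta_true = np.zeros(p)
causal = rng.choice(p, size=20, replace=False)
beta_true[causal] = rng.normal(size=20)
y = W @ beta_true + rng.normal(size=n)
y_c = y - y.mean()

lam = 100.0  # assumed penalty; in RR-BLUP it is a ratio of variance components

# Primal form: solve a p x p system, (W'W + lam * I_p) beta = W'y.
beta_primal = np.linalg.solve(W.T @ W + lam * np.eye(p), W.T @ y_c)

# Dual form: solve an n x n system instead, beta = W'(WW' + lam * I_n)^{-1} y.
K = W @ W.T  # proportional to a genomic relationship matrix
alpha = np.linalg.solve(K + lam * np.eye(n), y_c)
beta_dual = W.T @ alpha

# Predicted genomic values are the same whether computed from marker
# effects or directly from K.
g_from_markers = W @ beta_dual
g_from_kernel = K @ alpha

print("max |primal - dual| =", np.abs(beta_primal - beta_dual).max())
```

The dual form only needs an n x n solve, which is why these regressions stay tractable when p is much larger than n. It also shows, for this fixed-penalty case, why a marker-based ridge model and a BLUP based on a relationship matrix give the same predictions.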
Table 2 Classification and abbreviations of the models included in Figure 3, A and B
| Name (abbreviation) | Bayesian | Penalized | Nonparametric |
|---|---|---|---|
| Least-squares regression (LSR) | |||
| Bayesian ridge regression (BRR) or RR-BLUP | X | X | |
| BLUP using a genomic relationship matrix (G-BLUP) | X | X | |
| Trait-specific BLUP (TA-BLUP) | X | X | |
| BayesA | X | ||
| BayesB | X | ||
| BayesC | X | ||
| Bayes SSVS | X | ||
| Bayesian LASSO (BL) | X | ||
| Double hierarchical generalized linear models (DHGLM) | |||
| Least absolute shrinkage and selection operator (LASSO) | X | ||
| Partial least-squares regression (PLS) | X | ||
| Principal component regression (PCR) | X | ||
| Elastic net (EN) | X | ||
| Reproducing kernel Hilbert spaces regressions (RKHS) | X | X | X |
| Support vector regression (SVR) | X | X | |
| Boosting | NA | NA | NA |
| Random forests (RF) | X | ||
| Neural networks (NN) | X | X | X |
The following are early references of the use of the above methods for genomic prediction (references with the original description of some of the methods are also given in earlier sections of this article and in the references given here). LSR, BRR, BayesA, and BayesB, Meuwissen ; G-BLUP, VanRaden (2008); TA-BLUP, Zhang ; BayesC, Habier ; Bayes SSVS, Calus ; BL, de los Campos ; DHGLM, Shen ; LASSO, Usai ; PLS and SVR, Moser ; PCR, Solberg ; EN, ; RKHS, ; Boosting, González-Recio ; RF, González-Recio and Forni (2011); and NN, Okut .
Boosting as an estimation technique could be applied to any method, Bayesian or penalized, parametric or nonparametric.
NN could be implemented in a nonpenalized, penalized, or Bayesian framework.
Figure 3 (A and B) Number of articles reviewed comparing one or more methods using simulated (A) or real (B) data. The abbreviations used for the methods are given in Table 2. The following references were used: (Meuwissen ; Habier ; Piyasatian ; González-Recio ; Lee ; Bennewitz ; de los Campos ; Gonzalez-Recio ; Hayes ,b; Lorenzana and Bernardo 2009; Luan ; Lund ; Meuwissen 2009; Meuwissen ; Moser ; Solberg ; Usai ; Verbyla ; Zhong ; Andreescu ; Bastiaansen ; Coster ; Crossa ; Daetwyler ,b; de los Campos ,b; Gonzalez-Recio ; Gredler ; Guo ; Habier ; Konstantinov and Hayes 2010; Meuwissen and Goddard 2010; Mrode ; Pérez ; Shepherd ; Zhang ; Calus and Veerkamp 2011; Clark ; Croiseau ; de Roos ; Gonzalez-Recio and Forni 2011; Habier ; Heffner ; Iwata and Jannink 2011; Legarra ; Long ,b; Makowsky ; Mujibi ; Ober ; Ostersen ; Pryce ; Pszczola ; Wiggans ; Wittenburg ; Wolc ,b; Yu and Meuwissen 2011; Bastiaansen ; Heslot ).
Figure 4 Accuracies of G-BLUP, BayesA, and Bayes SSVS models for fat and protein percentage, estimated using three different Holstein–Friesian reference populations (Hayes ; Verbyla ; de Roos ). Note that the data used by Hayes are a subset of the data used by Verbyla .