Daniel Gianola1,2,3,4, Rohan L Fernando3. 1. Department of Animal Sciences, University of Wisconsin-Madison, Wisconsin 53706 gianola@ansci.wisc.edu. 2. Department of Dairy Science, University of Wisconsin-Madison, Wisconsin 53706. 3. Department of Animal Science, Iowa State University, Ames, Iowa 50011. 4. Department of Plant Sciences, Technical University of Munich (TUM), TUM School of Life Sciences, Freising, 85354 Germany.
Abstract
A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the [Formula: see text] vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when [Formula: see text] The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ-a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions that either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO. MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the [Formula: see text] vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when [Formula: see text] The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ-a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions that either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO. MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
TWO main paradigms have been employed for investigating statistical associations between molecular markers and complex traits: marker-by-marker genome-wide association studies (GWAS) and whole-genome regression approaches (WGR). GWAS is dominant in human genetics; Visscher present a perspective and Gianola formulate a statistically orientated critique. WGR was developed mostly in animal and plant breeding (e.g., Lande and Thompson 1990; Meuwissen ; Gianola ) primarily for predicting future performance, but it has received some attention in human genetics as well (e.g., de los Campos ; Yang ; Lee ; Makowsky ; López de Maturana ). de los Campos , Gianola (2013) and Isik reviewed an extensive collection of WGR approaches. Other studies noted that WGR can be used both for “discovery” of associations and for prediction (Moser ; Goddard ; Fernando ). Hence, WGR methodology is an active area of research.Multiple-trait analysis has long been of great interest in plant and animal breeding, mainly from the point of view of joint selection for many traits (Smith 1936; Hazel 1943; Walsh and Lynch 2018). Henderson and Quaas (1976) developed multi-trait best linear unbiased prediction of breeding values for all individuals and traits measured in a population of animals—a methodology that has gradually become routine in the field. For example, Gao described an application of a nine-variate model to data representing close to 7 million and 4 million Holstein and Nordic Red cattle, respectively; the nine traits were milk, fat, and protein yields in each of the first three lactations of the cows.A multiple-trait analysis is also a natural choice in quests for understanding and dissecting genetic correlations between traits using molecular markers, e.g., evaluating whether pleiotropy or linkage disequilibrium are at the root of between-trait associations (Gianola ; Cheng ). For instance, Galesloot compared six methods of multivariate GWAS via simulation and found that all delivered a higher power than single-trait GWAS, even when genetic correlations were weak. Many single-trait WGR methods extend directly to the multiple-trait domain, e.g., genomic best linear unbiased prediction (GBLUP; VanRaden 2007). Other procedures, such as Bayesian mixture models, are more involved, but extensions are available (Calus and Veerkamp 2011; Jia and Jannink 2012; Cheng ). The mixture model of Cheng is particularly interesting because it provides insight into whether markers affect all, some, or none of the traits addressed. For example, the proportion of markers in each of the
and categories, where means “no effect,” and denotes “effect” on each of two disease traits in Pinus taeda was estimated by Cheng using single nucleotide polymorphisms (SNPs). The proportion of Markov chain Monte Carlo (MCMC) samples falling into the class was <3%, with ∼140 markers appearing as candidates for further scrutiny of pleiotropy; 97% of the SNP were in the class, and 0.5% were in the and classes. It must be noted that Cheng used Bayesian model averaging, so posterior estimates of effects, and of their uncertainties, constitute averages over all possible configurations. The resulting “average model” is not truly sparse, as Bayesian mixture models always assign some posterior probability to each of the possible configurations. An alternative to a mixture is to use a prior distribution that produces strong shrinkage toward the origin of “weak” vectors of marker effects; here, each marker has a vector with dimension equal to the number of traits.The LASSO (least absolute shrinkage and selection operator) method presented by Tibshirani (1996) is a single-response method based on minimizing a linear regression residual sum of squares subject to a constraint based on an norm. It can produce a sparse model, i.e., if the linear regression model has p regression coefficients, the LASSO yields a smaller model (i.e., model selection), but with a complexity that cannot exceed the number of observations. Tibshirani (1996) noted that the LASSO solutions can also be obtained by calculating the mode of the conditional posterior distribution of the regression coefficients in a Bayesian model in which each coefficient is assigned the same conditional double exponential (DE) or Laplace prior. Using a ridge regression reformulation of LASSO, it can be seen that its Bayesian version shrinks small-value regression coefficients very strongly toward zero, whereas large-effect variants are regularized to a much lesser extent than in ridge regression (Tibshirani 1996; Gianola 2013). Yuan and Lin (2006) and Yuan considered the problem of clustering regression coefficients into groups (factors), with the focus becoming factor selection, as opposed to the predictor variable selection that takes place in LASSO. For instance, a cluster could consist of a group of markers in tight physical linkage. These authors noted that, in some instances, grouping enhances prediction performance over ridge regression, while in others, it does not. Such findings are consistent with knowledge accumulated in close to two decades of experience with genome-enabled prediction in animal breeding: there is no universally best prediction machine. A multiple-trait application of a LASSO penalty on regression coefficients was presented by Li . These authors assigned a multivariate Laplace distribution to the model residuals, and a group-LASSO penalty (Yuan and Lin 2006) to the regression coefficients. The procedure differs from Tibshirani’s LASSO in that the model selects vectors of regressors (corresponding to regressions of a given marker over traits) as opposed to single-trait predictor variables.Park and Casella (2008) introduced a fully Bayesian LASSO (BL). Contrary to LASSO, BL produces a model where all regression coefficients are non-null (even if ); most regressions are often tiny in value, except those associated with covariates (markers) with strong effects. In short, LASSO produces a sparse model, whereas BL yields an effectively sparse specification, similar to Bayesian mixture models such as Bayes B (Meuwissen ). The first application of BL in quantitative genetics was made by Yi and Xu (2008) in the context of quantitative trait locus (QTL) mapping, with subsequent applications reported in de los Campos , Legarra , Li and Lehermeier .It appears that no multiple-trait generalization of the BL has yet been reported. The present paper describes a multi-trait Bayesian LASSO (MBL) model based on adopting a multivariate Laplace distribution with unknown scale matrix as prior distribution for the markers or variants under scrutiny. The MBL is introduced and compared with a multiple-trait GBLUP (MTGBLUP) using wheat and pine tree data sets. The section Multi-Trait Regression Model describes MBL, including a MCMC sampling algorithm. Subsequently, MBL is compared with MTGBLUP using a wheat data set. Finally, bivariate MBL and bivariate MTGBLUP are contrasted from a predictive perspective, showing a better performance of MBL over BLUP and over a single-trait Bayesian LASSO specification, corroborating the usefulness of multiple-trait analyses. The paper concludes with a general discussion and technical details are presented in Appendices.
Multi-Trait Regression Model
Assume there are T traits observed in each of N individuals, and let be a vector of allelic substitution effects at marker with representing the effect of marker j on trait t
. The multi-trait regression model (assuming no nuisance location effects other than a mean) for the T responses iswhere is a vector of responses observed in individual
is the vector of trait means, and is the genotype individual i possesses at marker locus j. The residual vector
is assumed to follow the Gaussian distribution where is a covariance matrix. All vectors are assumed to be mutually independent and identically distributed.If traits are sorted within individuals, the probability model associated with Equation 1 can be represented aswhereis a matrix of sums of squares and products of the unobserved regression residuals.The regression model can be formulated in an equivalent manner by sorting individuals within traits; hereafter, we will use . Let
and be response vectors of order N each observed for traits 1, 2, and 3, respectively. The representation of the model iswhere is an vector of
is an matrix of marker genotypes, and
and
are vectors of regression coefficients and of residuals for trait , respectively. Above, is a vector and has dimension . Note that . Putting the probability model isWe will work with either (1) or (4), depending on the context.
Bayesian prior assumptions
Parameters and :
The vector will be assigned a “flat” improper prior, and Jeffreys noninformative prior (e.g., Sorensen and Gianola 2002) will be adopted for so that their joint prior density is
Multivariate laplace prior distribution for marker effects:
The same T-variate Laplace prior distribution with a null mean vector will be assigned to each of the vectors
, assumed mutually independent, a
. Gómez presented a multi-dimensional version of the power exponential family of distributions; one special case is the multivariate Laplace distribution (MLAP). The density of the MLAP with a zero-mean vector used here iswhere is a positive-definite scale matrix. The variance–covariance matrix of MLAP isnote that the absolute values of the elements of the intertrait variance–covariance of marker effects, are larger than those of Σ. Hence, for where is the appropriate diagonal element of likewise, is the covariance of marker effects between traits t and for all Putting in (7) yieldsThe preceding is the density of a DE distribution with null mean, parameter and variance . As mentioned earlier, Tibshirani (1996) and Park and Casella (2008) used the DE distribution as conditional (given Σ) prior for regression coefficients in the BL—a member of the “Bayesian Alphabet” (Gianola ). Gianola assigned the DE distribution to residuals of a linear model for the purpose of attenuating outliers, and Li used the MLAP distribution for the residuals in a “robust” linear regression model for QTL mapping.MLAP is therefore an interesting candidate prior for multi-trait marker effects in a multiple trait generalization of the Bayesian LASSO (MBL). A zero-mean MLAP distribution has a sharp peak at the 0 coordinates, although, when MLAP reduces to a DE distribution, the marginal and conditional densities of MLAP are not DE. Gómez showed that such densities are elliptically contoured, and, thus, not DE. Appendix A and Supplemental Material, Figures S1–S3 in the Supplemental material give background on MLAP.Gómez-Sánchez-Manzano showed that MLAP can be represented as a scaled mixture of normal distributions under the hierarchy: (1) and (2) for
denotes a T−variate normal distribution with null mean and covariance matrix The density of isLet the collection of all marker effects over traits be represented by the vectorIf independent and identical MLAP prior distributions are assigned to each of the subvectors, the joint prior density of all marker effects, given Σ, can be represented asand the joint density of and isWhen individuals are sorted within traits (e.g., ), note that is a Tp−dimensional normal distribution, with null mean vector and covariance matrixwhere is a diagonal matrix. Hence,
Scale matrix Σ:
The scale matrix Σ of MLAP can be given a fixed value (becoming a hyper-parameter), or inferred, in which case a prior distribution is needed. Here, an inverse-Wishart distribution with scale matrix and degrees of freedom will be assigned as prior. The density is
Joint posterior and fully conditional distributions
The joint posterior distribution, including from the scale-mixture of normals representation of the prior distribution of , was assumed to take the formwhere H denotes the hyper-parameters; recall that is the data vector sorted by individuals within trait. The fully conditional distributions are presented below, with used to denote all parameters that are kept fixed, together with H, in a specific conditional distribution.
Parameters μ and given :
From (17,) and using representations (4) and (15), the fully conditional posterior distribution of and has densityThe preceding is a multivariate normal density (e.g., Sorensen and Gianola 2002). The mean vector of the distribution isand the variance–covariance matrix isA more explicit representation is presented in Appendix B for the case
Fully conditional distributions of partitions of the location vector:
For details, see Van Tassell and Van Vleck (1996) and Sorensen and Gianola (2002). Since the joint posterior of the location parameters, given and is multivariate normal, all conditionals and linear combinations thereof are normal as well. In particular ,andLikewiseand
Fully conditional distributions of and :
From (17), using (2) and (6)so is an distribution with degrees of freedom and scale matrix . In the kernel of the density is often written as whereRecall thatso, from (17)whereis a matrix. Hence the conditional posterior distribution of Σ is . The kernel of the density of Σ is often represented as , where
Fully conditional distribution of :
From (17), and using (13),The preceding density is not in a recognizable form. Appendix C gives details of a Metropolis-Hastings algorithm tailored for making draws from the distribution having density (29). A brief description of the procedure follows.Starting values for and Σ can be obtained “externally” from some estimates of and (the matrix of variances and covariances of marker effects) calculated with standard methods such as maximum likelihood. Recall thatSample each
using the following Metropolis-Hastings sampler:At round t, draw y from and evaluate y as proposal; stands for an inverse-gamma distribution.Draw with the probability of move being , with R as in Appendix C.If set and form as a new state. Otherwise, setIn a “single-pass” sampler, use (19) and (20) for sampling the entire location vector jointly. Otherwise, adopt a blocking strategy; for example draw and using (21), (22), (23) and (24).Sample from and Σ from .
Remarks
Appendix E shows that the degree of shrinkage of marker effects results from a joint action between Σ and the strength of marker effects. A vector of effects of a marker with a short Mahalanobis distance away from 0 is more strongly shrunk toward the origin (i.e., the mean of prior distribution) than vectors containing strong effects on at least one trait. MLAP preserves the spirit of BL, producing “pseudoselection” of covariates: all markers stay in the model, but some are effectively nullified. A marker with strong marginal and joint effects on the traits under consideration could flag potentially pleiotropic regions.
Missing records for some traits
Often, not all traits are measured in all individuals, a situation that is more common in animal breeding than in plant breeding. A standard approach (“data augmentation”) treats missing phenotypes as unknowns in an expanded joint posterior distribution. As shown in Appendix F, a predictive distribution can be used to produce an imputation of missing data.
Alternative Formulation in Dimensions
The MCMC sampler described above is based on a regression on markers formulation stemming from either (1) or (4). In a “single-pass” sampler, parameters must be drawn together; when p is very large, direct inversion is typically unfeasible, so the scheme must be reformulated into a “block-sampling” one, i.e., by drawing some of the location parameters jointly by conditioning on the other location parameters, or by using a single-site sampler (Sorensen and Gianola 2002). Blocking or single-site sampling facilitate computation at the expense of slowing down convergence to the target distribution. Appendix D gives a scheme in which effects (trait means and bivariate genomic breeding values) are inferred, and the marker effects are calculated indirectly, following ideas of Henderson (1977) and adapted by Goddard (2009) to a genome-based model.
Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. The wheat yield data set in the R package BGLR (Pérez and de los Campos 2014) was employed to contrast MBL with GBLUP and Bayes Cπ. This wheat data set has been studied extensively, e.g., by Crossa , Gianola , 2016), and Long . There are wheat inbred lines, each genotyped with DArT (Diversity Array Technology) markers, and each planted in four environments. The DArT markers are binary and denote the presence or absence of an allele at a marker locus in a given line. Grain yields in environments 1 and 2 were employed to compare outcomes between analyses based on bivariate GBLUP and the bivariate BL. In the bivariate model, yields in the two environments are treated as distinct traits, conceptually—an idea that dates back to Falconer (1952). This type of setting arises in breeding of dairy cattle, where milk production of daughters of bulls in different countries are regarded as different traits and in multi-environment situations in plant breeding; both instances can be represented as special cases of a multiple-trait mixed effects model.A publicly available Loblolly pine data described in Cheng was used to carry out a predictive comparison between a Bayesian bivariate GBLUP with the bivariate Bayesian LASSO, as well as the latter vs. a single-trait Bayesian LASSO. After edits, there were individuals with SNP markers with measurements on rust bin scores and rust gall volume—two disease traits; see Cheng . Supplemental material available at figshare: https://doi.org/10.25386/genetics.10689380.
Bivariate Analysis of Wheat Yield: MBL vs. GBLUP
Genomic BLUP and Bayesian BLUP
The bivariate model waswhere is the vector of grain yields in environment of the 599 inbred lines; and are the trait means in the two environments, and 1 is a incidence vector of ones; and are the “additive genomic values” of the lines, and and are model residuals. In GBLUP (VanRaden 2008) the genetic signals captured by markers are represented as and where is a centered and scaled matrix of genotype codes, and contains the marker allele substitution effects on trait The residual distribution waswhere, as before, is the between-trait residual variance–covariance matrix. Effects of environment 1 are expected to be uncorrelated with those of environment 2. However, allowance was made for a non-null residual covariance because the additive genomic model may not capture extant epistasis involving additive effects, potentially creating correlations among residuals of the same lines in different environmental conditions.GBLUP assumed
and sois the variance–covariance matrix of marker effects. It follows thatwhereis a between-trait variance–covariance matrix of the additive genomic values (here, e.g., ) and is a genomic-relationship matrix describing genome-based similarities among the 599 lines. The preceding assumptions induce the marginal distributionwhere is the phenotypic covariance matrix. The bivariate best linear unbiased predictor of and (Henderson 1975) is:whereis the bivariate generalized least-squares (GLS) estimator of the trait means.BLUP and GLS require knowledge of and and we replaced these unknown matrices by estimates obtained using a crude, but simple, procedure. Genomic and residual variance components were obtained by univariate maximum likelihood analyses of traits 1, 2, and and covariance component estimates were calculated from the expression The resulting estimates of and were inside their corresponding parameter spaces. An estimate of was obtained by applying relationship (34) to the estimatedHenderson (1977) showed how BLUP of vectors that are not likelihood identified can be obtained from best linear unbiased predictions of likelihood-identified random effects (see Gianola 2013). Goddard (2009) and Strandén and Garrick (2009) used this property to obtain predictions of marker effects given predictions of signal . If and have a joint normal distribution, under (30) one hasUsing iterated expectations, and recalling that BLUP can be viewed as an estimated conditional expectation (with fixed effects replaced by their GLS estimates), BLUP of marker effects is expressible aswith
After lengthy algebra, and using Henderson (1975), the prediction error variance–covariance matrix of the BLUP of marker effects is given byA set of can be formed by taking the ratio between the BLUP of a given marker effect as in (38), and the square root of the corresponding diagonal element of (39). The statistic is a crude criterion for association between marker and phenotype as it ignores uncertainty associated with the fact that and are estimated from the data, as opposed to being “true values” required by BLUP theory.The Bayesian bivariate GBLUP model used standard assumption as in Sorensen and Gianola (2002), i.e., it was a multivariate normal-inverse Wishart hierarchical specification. The only difference with GBLUP is that, in the Bayesian treatment, and were treated as unknown parameters, with the uncertainty about their values accounted for.
Bivariate LASSO
Our MCMC implementation for MBL was applied to markers directly, as opposed to inferring their effects from signal indirectly, as it is done for GBLUP. The model was as in (4) with Each marker was assigned a conditional bivariate Laplace prior distribution with scale matrix Σ in turn, Σ was given a two-dimensional (2D) inverse Wishart distribution on degrees of freedom and with scale matrix The residual variance–covariance matrix was assigned the 2D Jeffreys improper prior in (6).The MCMC scheme employed the scale mixture of normal representation of the bivariate Laplace distribution. First, six independent chains of 1500 iterations each were run. The shrinkage diagnostic metric of Gelman and Rubin (1992) was calculated for ,
and Σ, for the effect of marker 10 on trait 1, and for the effect of marker 200 on trait 2; the R package CODA was used for this purpose. Figures S4–S13 gave no strong evidence of lack of convergence, as indicated by shrinkage factor values close to 1.Post burn-in samples were collected for an additional 2000 iterations in each chain, so a total of 12,000 samples (without thinning) was used for inference. Figures S14 and S15 depict post burn-in trace plots for the elements of and Σ, respectively. The six chains “joined” eventually, and sample values fluctuated thereafter within what seemed to be stationary distributions. To assess convergence further, a test suggested by Geweke (1992) was applied to the combined 12,000 samples from the posterior distributions of (residual correlation between yields in environments 1 and 2) and , the correlation between effects of a marker on the two traits. The test compared means of two parts of the combined collection of 12,000 samples at each of 10 segments of the collection: there was no evidence of lack of convergence. In short, the implementation met successfully the convergence tests applied.Figure 1 and Figure 2 display estimated posterior densities of and Mixing for was poorer than for ; the effective number of samples was 220.6 and 979.0, respectively, and Monte Carlo errors were small enough. The residual correlation (posterior mean, 0.17) was positive and different from 0, whereas the parameter was estimated at −0.35, also different from zero. However, the posterior densities were not sharp enough for precise inference, probably due to the small sample size and low density of the marker panel . The quality of these estimates is of subsidiary interest here as our objective was to demonstrate the MBL in a comparison with bibariate BLUP of marker effects.
Figure 1
Bivariate Bayesian LASSO: trace plot and posterior density of residual correlation.
Figure 2
Bivariate Bayesian LASSO: trace plot and posterior density of correlation between marker effects.
Bivariate Bayesian LASSO: trace plot and posterior density of residual correlation.Bivariate Bayesian LASSO: trace plot and posterior density of correlation between marker effects.Location parameters mixed well. For example, the average effective sample size of over the six chains during burn-in was 1499 for a nominal 1500 iterations. For marker 10 effect on trait 1, it was 962 out of 1500, and, for marker 200 effect on trait 2, the effective size was 1130 out of 1500. These numbers suggest that all 2558 marker effects were estimated with a very small Monte Carlo error in our MBL implementation with 12,000 samples used for inference.
MBL vs. BLUP estimates of marker effects
Figure 3 gives a comparison between bivariate BLUP and posterior mean estimates of effects from MBL. The upper panel shows good alignment between estimates, except at the extremes of the scatter plots. The lower panel depicts that markers with the strongest absolute effects, as estimated by BLUP, had an even stronger effect when estimated under the bivariate BL. Figure 4 presents standardized estimates of each of the 1279 marker effects, by trait. For GBLUP the was the estimated marker effect divided by the square root of its prediction error variance; for MBL, it was the posterior mean divided by its posterior standard deviation. There is no evidence that any of the markers had an effect differing from 0, corroborating the view that wheat yield is a typical quantitative trait affected by many variants each having small effects (Singh ; Sleper and Poehlman 2006). Using a univariate least-squares, GWAS-type analysis, there were 29 (yield 1) and 56 (yield 2) significant hits after a Bonferroni correction (1279 tests, ) A comparison between the from the GWAS-type analysis with the standardized BLUP and MBL effects is provided in Figure 5. As expected, shrinkage toward null-mean distributions (bivariate Gaussian in BLUP and bivariate Laplace in MBL) made much smaller in absolute value than the corresponding ones from GWAS.
Figure 3
Bivariate GBLUP vs. bivariate Bayesian LASSO (posterior mean) estimates of marker effects on wheat grain yield.
Figure 4
t-Statistics for marker effects on wheat grain yield: GBLUP vs. bivariate Bayesian LASSO (MBL)
Figure 5
t-Statistics for marker effects on wheat grain yield: ordinary least-squares (OLS) vs. bivariate Bayesian LASSO (MBL) and bivariate GBLUP.
Bivariate GBLUP vs. bivariate Bayesian LASSO (posterior mean) estimates of marker effects on wheat grain yield.t-Statistics for marker effects on wheat grain yield: GBLUP vs. bivariate Bayesian LASSO (MBL)t-Statistics for marker effects on wheat grain yield: ordinary least-squares (OLS) vs. bivariate Bayesian LASSO (MBL) and bivariate GBLUP.Standard GWAS aims to find connections between markers and genomic regions having an effect on a single trait (e.g., Manolio , Visscher ; Gianola ; Schaid ) A search for pleiotropy, on the other hand, focuses on markers having multi-trait effects. The latter can be viewed as a search for vectors of effects with non-null coordinates that are distant from a T−dimensional 0 origin. Mahalanobis squared distances away from for each the 1279 bivariate vectors of marker effects were calculated for both BLUP and MBL. For BLUP and marker the squared distance was computed as and for MBL it was where are effect estimates for marker j, and is the estimated posterior expectation of Σ. For BLUP, had median and maximum values of 0.16 and 2.94, respectively, over markers. For MBL the corresponding values were 0.14 and 3.83. Figure 6 shows that the largest estimated distances were obtained with MBL, supporting the view that the method produces less shrinkage of multiple-trait effect sizes than BLUP. If the 95% percentile of a chi-squared distribution on 2 degrees of freedom (5.99 and 14.4 without and with a Bonferroni correction at ) is used as “significance threshold”, none of the 1279 markers could be claimed to have a bivariate effect on the trait, which is consistent with the
Figure 6
Mahalanobis squared distance (M) away from (0, 0) for bivariate effects on grain yield of 1279 markers: GBLUP vs. bivariate Bayesian LASSO (BLASSO)
Mahalanobis squared distance (M) away from (0, 0) for bivariate effects on grain yield of 1279 markers: GBLUP vs. bivariate Bayesian LASSO (BLASSO)
Predictive Comparison between MBL, MTGBLUP, and MT-BayesCπ: Wheat
Bivariate Bayesian GBLUP and BayesCπ models (Cheng ) were also fitted to the wheat data set. Multiple-trait Bayesian linear models are well known (e.g., Sorensen and Gianola 2002); BayesCπ consisted of a bivariate mixture in which each of the 1279 markers was allowed to fall, a
into one of four disjoint classes: , where means that a marker has no effect on either trait, indicates that a marker affects yield 2 only, and so on. The prior for the four probabilities of membership was a distribution. All three methods were run in each of 100 randomly constructed training sets, and predictions were formed for lines in corresponding testing sets. Training and testing set sizes had 300 and 299 wheat lines, respectively, in each of the 100 runs. For all methods, the MCMC scheme was a single long chain of 50,000 iterations, with a burn-in period of 1000 draws. The analyses were run using the JWAS package written in the JULIA language (Cheng ).Figure 7 and Figure 8 present pairwise plots (bivariate Bayesian GBLUP denoted as RR-BLUP in the plots) of predictive correlations and predictive mean-squared errors, respectively; the plots display <100 points because numbers were rounded to two decimal points. There were no appreciable differences in predictive performance between the three methods, supporting the view that cereal grain yield is multi-factorial and that there are none, if any, genomic regions, with large effects. The variability among replications of the training-testing layout is essentially random, reinforcing the notion of the importance of measuring uncertainty of prediction (Gianola ). Many studies fail to replicate, and often claim differences between, methods based on single realizations of predictive analyses.
Figure 7
Predictive correlations for wheat grain yield: bivariate Bayesian LASSO (Bayes L) vs. bivariate GBLUP (RR-BLUP).
Figure 8
Predictive mean-squared error for grain yield: bivariate Bayesian LASSO (Bayes L), bivariate GBLUP (RR-BLUP), and bivariate Bayes Cπ.
Predictive correlations for wheat grain yield: bivariate Bayesian LASSO (Bayes L) vs. bivariate GBLUP (RR-BLUP).Predictive mean-squared error for grain yield: bivariate Bayesian LASSO (Bayes L), bivariate GBLUP (RR-BLUP), and bivariate Bayes Cπ.
Predictive Comparison between MBL vs. MTGBLUP and MBL vs. Single Trait Bayesian LASSO: Pinus
Figure 9 and Figure 10 present scatter-plots of the predictive performance (mean squared error and correlation, respectively) of the bivariate Bayesian LASSO and bivariate Bayesian GBLUP (MTGBLUP, denoted as RR-BLUP in the plots) in the 100 testing sets. There were no obvious differences in mean-squared error for either rust bin or gall volume although, for the latter trait, a slight superiority of MBL was noted (Figure 9); the plot contains distinct 12 points because the overlap in numerical values produced “clusters” of points. On the other hand, there was a decisive superiority (Figure 10) of MBL over MTGBLUP in predictive correlation.
Figure 9
Mean-squared error of prediction for rust bin and gall volume in pine trees: bivariate Bayesian LASSO (Bayes L) vs. bivariate Bayesian GBLUP (RR-BLUP).
Figure 10
Predictive correlation for rust bin and gall volume in pine trees: bivariate Bayesian LASSO (Bayes L) vs. bivariate Bayesian GBLUP (RR-BLUP).
Mean-squared error of prediction for rust bin and gall volume in pine trees: bivariate Bayesian LASSO (Bayes L) vs. bivariate Bayesian GBLUP (RR-BLUP).Predictive correlation for rust bin and gall volume in pine trees: bivariate Bayesian LASSO (Bayes L) vs. bivariate Bayesian GBLUP (RR-BLUP).Figure 11 contrasts the predictive performance of the bivariate Bayesian LASSO over the single trait Bayesian LASSO for gall volume. The two trait analysis tended to produce larger predictive correlations and smaller mean-squared errors, illustrating instances in which a multiple-trait specification clearly constitutes a better prediction machine.
Figure 11
Predictive mean squared error and correlation for gall volume in pine trees: bivariate Bayesian LASSO (MTBayesL) vs. univariate Bayesian LASSO (STBayesL).
Predictive mean squared error and correlation for gall volume in pine trees: bivariate Bayesian LASSO (MTBayesL) vs. univariate Bayesian LASSO (STBayesL).
Conclusion
Our study is possibly the first report in the quantitative genetics literature of a MBL, inspired by the BL of Park and Casella (2008). MBL assumes that vectors of marker effects on T traits follow a null-mean multivariate Laplace distribution, a
This distribution has a sharp peak at the origin, and reduces to the DE prior of the BL when applied to a single trait. The implementation of MBL requires MCMC sampling, and a relative simple Metropolis-Hastings algorithm based on a scaled mixture of normals representation (Gómez-Sánchez-Manzano ) was presented. The algorithm was tested thoroughly with a wheat data set and found to mix well, with no evidence of lack of convergence to the posterior distribution, and with a small Monte Carlo error.A question that arises often in practice, is the extent to which a multiple-trait method will produce a better performance than a single-trait specification. If the parameters of the model (assuming it holds) representing the inter-trait distribution are either known or well estimated, one should expect more power for QTL detection and a better predictive performance for the multivariate specification. In our study, we found that MBL outperformed the single trait in terms of delivering a better predictive performance for gall volume but not for rust bin in Pinus. On the other hand, a multiple-trait analysis is more complex and requires more assumptions, so it may be less robust than a single trait procedure, and fail to deliver according to expectation in real-life circumstances. It is risky to make sweeping statements arguing in favor of a specific treatment of data, as outcomes are heavily dependent on the biological architecture of the traits considered, and on the data structure as well. The picture emerging from two decades of experience in genome-enabled prediction in the fields of animal and plant breeding is that, in view of the large variability of performance with respect to data structure for any given prediction machine, it is largely futile to categorize methods in terms of expected predictive performance using broad criteria (Morota and Gianola 2014; Gianola and Rosa 2015; Momen ; Montesinos-López et al. 2018 a,b, 2019a,b; Azodi ).MBL is expected to shrink more strongly toward zero vectors of markers with small effects in their coordinates, thus producing differential shrinkage and preserving the modus operandi of BL. Mimicking the single-trait argument in Tibshirani (1996), which shows equivalence between LASSO and a posterior mode, the representation in Appendix E illustrates that the degree of shrinkage of the vectorial effects of a marker (j, say) on a set of traits is inversely proportional to the quadratic form . Thus, multivariate Bayesian pseudosparsity is induced by MBL to an extent depending on the heterogeneity of over markers. We note, in passing, that the term given in (66) in Appendix E is the counterpart of , part of the “group-penalty” in Li , where g is some meaningful group of markers arrived at, say, on the basis of biological considerations, and is the group regression coefficient for trait t. The latter penalty assigns the same weight to these regressions over traits, contrary to MBL, where weights and coweights are driven by The BL or MBL can be adapted to situations where a group structure may be needed via hierarchical modeling; this fairly straightforward issue is outside of the scope of the paper, but may pursued in future extensions of MBL. Actually, Liquet described a Bayesian multiple-trait analysis where a LASSO-type penalty is assigned to group effects, and a spike-slab prior induces additional Bayesian sparsity at the level of individual regression coefficients. The authors did not address the predictive ability of their method so it would be interesting to compare it against MBL and the multiple-trait mixture model of Cheng . We plan to carry out this comparison in collaboration with CIMMYT (Centro Internacional de Mejoramiento de Maíz y Trigo, México) using a large number of data sets in various cereal crops.Knowledge of the genetic basis of complex traits is limited, and not vast enough to enable formulation of a priori prescriptions for any specific trait or situation. The number, location, and effects of causal variants, the linkage disequilibrium structure between such variants and markers, and the mode of gene action of QTL are largely unknown, this holding for all species of domesticated plants and animals, and for most common diseases in humans. Theoretically, MBL is expected to perform better than multiple-trait BLUP whenever appreciable heterogeneity exists over the effects of the markers in the panel employed, while behaving as multiple-trait GBLUP when all markers have tiny and similar effects. This consideration follows directly from the structure of the method, and computer simulations could be easily tailored to create scenarios where MBL has a better or a worse performance simply by design but without necessarily being relevant to a real-life inferential or predictive problem.The expectation stated above was verified empirically: markers with stronger (positive or negative) effects on the wheat yields examined had larger Mahalanobis distances away from zero than markers with small effects. Further, markers with short distances in GBLUP had even shorter distances under MBL. Neither of the two methods was able to detect variants having a strong effect on wheat yield, contrary to least-squares GWAS. However, outcomes from GWAS are not strictly comparable with those from shrinkage-based procedures. In single-marker least-squares, the estimator is potentially biased because other genomic regions are ignored in the model; further, short- and long-range linkage disequilibria create statistical ambiguity (Gianola ). In WGR, on the other hand, regressions are akin to partial derivatives, i.e., the coefficient gives the net effect of the marker given that the other markers are fitted; typically, regressions become smaller as p is increased at a fixedIn plant and animal breeding, a focal point is the evaluation of genetic merit of candidates for artificial selection, and the prediction of expected performance in either collateral relatives or in descendants. Under the assumptions of additive inheritance, genome-enabled prediction (Meuwissen ) produces estimates of marked additive genomic value, or signal as referred to in our paper. In MBL, and marker effects can be inferred from their posterior mean or from a modal approximation (MAP-MBL) that does not involve MCMC that is described in Appendix E. A rough comparison between GBLUP, MBL, and MAP-MBL was carried out with the wheat data. For the latter, we used and starting values for the iteration were calculated using BLUP estimates of marker effects. MAP with were iterated for 500 rounds. Figure S16 shows that, at iteration 500, the metric used for monitoring convergence had stabilized at the third decimal place, but, for our purposes, iteration could have stopped after 200 rounds. Figure S17 presents a scatter plot of the 2558 (bivariate) marker effect solutions at iterations 1 and 500 against the corresponding BLUP or MBL posterior mean estimates. Clearly, the MAP approach gave markedly different results, producing a stronger shrinkage to 0 of small-effect markers and, thus, an effectively sparser model. Figure S18 gives a comparison of the fitted genomic values, i.e., and for the two traits. GBLUP and MBL estimates were closely aligned and fit the data in a similar manner. On the other hand, MAP-MBL gave a larger mean-squared error of fit, and a smaller correlation between fitted and observed phenotypes, possibly because of the larger effective sparsity of MAP-MBL. A worse fit to the data does not necessarily imply a poorer predictive ability. A thorough comparison of predictive ability between MBL and MAP-MBL will be carried out in future research.Our predictive comparison in wheat involved three bivariate models: GBLUP, MBL and BayesC which employs Bayesian model averaging. A training-testing validation replicated 100 times at random indicated no differences among methods. However, it was found that MBL was better than MT Bayesian BLUP for the two pine tree traits. After almost two decades of genome-enabled prediction it is now clear that no universally best prediction machine exists (Gianola ; Heslot ; de los Campos ; Momen ; Bellot ; Montesinos-López ,b, 2019a,b), even when nonparametric or deep learning techniques are brought into the comparisons. For this reason, we refrain from making any sweeping claim about the superiority (inferiority) of MBL over any other competing multiple-trait Bayesian regression method. If the situation is such that multiple-trait vectors of effects are fairly homogeneous over makers, it is to be expected that most methods will have a similar performance. On the other hand, if there is underlying heterogeneity of vectors of effects, possibly reflecting “structural sparsity” at the QTL level, it is quite likely that MBL and multiple-trait Bayesian variable selection methods (e.g., Cheng ) will outperform MTGBLUP or a multiple-trait version of Bayes A. Unfortunately, is it difficult to anticipate a priori which method will deliver the best performance, given the limited knowledge of the biological architecture of complex traits, the strong influence of the data structure, and the variability of matrices over data sets, in dimension and content.As far as we know, our paper represents the first report in the quantitative genetics literature of a multiple-trait LASSO, implemented in a Bayesian or empirical Bayes (Appendix E) manner. MBL adds to the armamentarium of genome-enabled prediction, and expands the family of members of the Bayesian alphabet (Gianola ; Habier ; Gianola 2013). Further, it has been implemented in the publicly available JWAS software (Cheng ). We take the view that every prediction problem is unique, and that no claims about the superiority of a specific procedure over others should be made without qualification. For instance, MBL could perform worse or better than here when applied to other species, traits, or when confronted with different data structures. Most quantitative and disease traits are truly complex, and it is dangerous to offer simplistic solutions or predictive panaceas (Goddard ).
Authors: Gustavo de los Campos; Hugo Naya; Daniel Gianola; José Crossa; Andrés Legarra; Eduardo Manfredi; Kent Weigel; José Miguel Cotes Journal: Genetics Date: 2009-03-16 Impact factor: 4.562
Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330
Authors: Gerhard Moser; Sang Hong Lee; Ben J Hayes; Michael E Goddard; Naomi R Wray; Peter M Visscher Journal: PLoS Genet Date: 2015-04-07 Impact factor: 5.917