Paulino Pérez-Rodríguez, Rocío Acosta-Pech, Sergio Pérez-Elizalde, Ciro Velasco Cruz, Javier Suárez Espinosa, José Crossa.
Abstract
Genomic selection (GS) has become a tool for selecting candidates in plant and animal breeding programs. For quantitative traits, it is common to assume that the distribution of the response variable can be approximated by a normal distribution. However, the selection process is known to lead to skewed distributions. There is a vast statistical literature on skewed distributions, but the skew normal distribution is of particular interest in this research. This distribution includes a third parameter that drives the skewness, so that it generalizes the normal distribution. We propose an extension of Bayesian whole-genome regression to skew normal data in the context of GS applications, where the number of predictors usually vastly exceeds the sample size. The model can also be applied when the number of predictors is smaller than the sample size. We used a stochastic representation of a skew normal random variable, which allows standard Markov Chain Monte Carlo (MCMC) techniques to be used to fit the proposed model efficiently. The predictive ability and goodness of fit of the proposed model were evaluated using simulated and real data, and the results were compared to those obtained with the Bayesian Ridge Regression model. Results indicate that the proposed model fits better than the conventional Bayesian Ridge Regression model, based on the DIC criterion, and predicts as well as it does, based on cross-validation. A program written in R and C to fit the proposed model is available as supplementary material.
Keywords: GBLUP; GenPred; Genomic Selection; Ridge regression; Shared Data Resources; asymmetric distributions; data augmentation
Year: 2018 PMID: 29588381 PMCID: PMC5940167 DOI: 10.1534/g3.117.300406
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
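The abstract refers to a stochastic representation of a skew normal random variable without stating it. A minimal Python sketch of one well-known representation follows: with δ = λ / sqrt(1 + λ²), the variable δ|U0| + sqrt(1 − δ²)·U1 is standard skew normal when U0 and U1 are independent standard normals. Whether the authors use exactly this form is an assumption, and the function names are illustrative only.

```python
import math
import random


def delta_from_lambda(lam):
    """Map the shape parameter lambda to delta = lambda / sqrt(1 + lambda^2)."""
    return lam / math.sqrt(1.0 + lam * lam)


def skew_normal_sample(lam, rng):
    """Draw one standard skew normal variate with shape parameter lam.

    Uses the representation delta * |U0| + sqrt(1 - delta^2) * U1 with
    U0, U1 independent standard normals (assumed form, see lead-in).
    """
    delta = delta_from_lambda(lam)
    u0 = abs(rng.gauss(0.0, 1.0))  # half-normal latent variable
    u1 = rng.gauss(0.0, 1.0)       # independent normal component
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1


def skew_normal_mean(lam):
    """Theoretical mean of the standard skew normal: delta * sqrt(2 / pi)."""
    return delta_from_lambda(lam) * math.sqrt(2.0 / math.pi)
```

Conditional on the latent half-normal draw, the variable is normally distributed. This is the kind of structure that lets a data-augmentation sampler reuse standard normal-model updates, which is consistent with the "data augmentation" keyword listed above.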
Figure 1. Densities of the standard skew normal distribution for different values of λ and the corresponding values of the related parameters.
Figure 2. Density plot for Gray Leaf Spot (GLS) rating (disease resistance) at each site: Kakamega (Kenya), San Pedro Lagunillas (Mexico) and Santa Catalina (Colombia).
Point estimates and standard deviations (in parentheses) of the model parameters, correlations between observed and predicted values, and MSE of predictions. Phenotypes were simulated under model (9), and regression models with skew normal errors (BSN) and normal errors (BRR) were then fitted.
| Model | | | | | Correlation | MSE |
|---|---|---|---|---|---|---|
| BSN | 3.075 (0.854) | 2.257 (0.052) | 0.003 (0.001) | 833.72 | 0.479 | 2.441 |
| BRR | 3.113 (0.975) | 2.207 (0.155) | 0.003 (0.001) | 627.33 | 0.531 | 3.036 |
| BSN | 3.009 (0.771) | 2.218 (0.047) | 0.003 (0.001) | 803.35 | 0.648 | 3.274 |
| BRR | 2.991 (0.905) | 2.167 (0.133) | 0.003 (0.001) | 602.89 | 0.667 | 2.714 |
| BSN | 2.972 (0.714) | 2.210 (0.048) | 0.003 (0.001) | 816.71 | 0.442 | 2.219 |
| BRR | 2.945 (0.828) | 2.168 (0.139) | 0.003 (0.001) | 614.19 | 0.506 | 2.154 |
| BSN | 3.094 (0.821) | 2.219 (0.054) | 0.003 (0.001) | 833.48 | 0.648 | 3.112 |
| BRR | 3.120 (0.858) | 2.175 (0.155) | 0.003 (0.001) | 621.59 | 0.676 | 2.639 |
| BSN | 3.055 (0.942) | 2.270 (0.061) | 0.003 (0.001) | 872.98 | 0.662 | 1.676 |
| BRR | 3.067 (0.900) | 2.196 (0.169) | 0.003 (0.001) | 631.79 | 0.696 | 1.642 |
| BSN | 2.830 (1.280) | 2.167 (0.056) | 0.003 (0.001) | 668.16 | 0.578 | 2.593 |
| BRR | 2.890 (0.878) | 2.165 (0.153) | 0.004 (0.001) | 606.84 | 0.631 | 2.893 |
True and estimated posterior mean of , effective number of parameters (pD), deviance information criterion (DIC), correlations between “true” and estimated marker effects, and correlations between “true” and estimated genetic signals; standard deviations in parentheses. Phenotypes were simulated under model (9), and regression models with skew normal errors (BSN) and normal errors (BRR) were then fitted.
| Model | pD | DIC | Correlation (marker effects) | Correlation (genetic signals) |
|---|---|---|---|---|
| BSN | 40.794 | 2206.112 | 0.192 (0.046) | 0.697 (0.116) |
| BRR | 59.573 | 2212.001 | 0.193 (0.049) | 0.689 (0.115) |
| BSN | 80.469 | 2279.974 | 0.207 (0.049) | 0.718 (0.119) |
| BRR | 91.548 | 2279.706 | 0.207 (0.050) | 0.714 (0.117) |
| BSN | 41.996 | 2262.930 | 0.194 (0.051) | 0.717 (0.104) |
| BRR | 57.826 | 2267.114 | 0.195 (0.052) | 0.708 (0.104) |
| BSN | 76.978 | 2218.017 | 0.203 (0.049) | 0.718 (0.114) |
| BRR | 96.31 | 2238.787 | 0.198 (0.052) | 0.706 (0.115) |
| BSN | 93.687 | 2144.77 | 0.203 (0.046) | 0.734 (0.109) |
| BRR | 102.345 | 2174.32 | 0.191 (0.047) | 0.707 (0.116) |
| BSN | 85.465 | 2151.505 | 0.216 (0.055) | 0.747 (0.098) |
| BRR | 83.422 | 2276.08 | 0.196 (0.052) | 0.703 (0.109) |
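The effective number of parameters and the DIC reported in this table are usually computed from MCMC output using the standard definitions: pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters, and DIC is the posterior mean deviance plus pD. The sketch below assumes the deviance has already been evaluated at each posterior draw. The function name and inputs are hypothetical and are not taken from the authors' R/C program.

```python
def dic_from_deviance(deviance_draws, deviance_at_posterior_mean):
    """Return (p_d, dic) from per-draw deviances and the plug-in deviance.

    deviance_draws: deviance -2 * log-likelihood evaluated at each MCMC draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean
    of the parameters.
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean  # effective number of parameters
    dic = mean_deviance + p_d                         # smaller values indicate better fit
    return p_d, dic


# Toy numbers chosen only to illustrate the arithmetic.
p_d, dic = dic_from_deviance([10.0, 12.0, 14.0], 9.0)
print(p_d, dic)
```

Under these definitions, a model can have a lower DIC either because it fits the data more closely or because it uses fewer effective parameters.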
Posterior mean estimates of the model parameters from the full-data analysis of Kakamega, San Pedro Lagunillas and Santa Catalina for Gray Leaf Spot in 300 tropical inbred maize lines and 1,152 SNPs; standard deviations in parentheses.
| Site | Model | | | | pD | DIC | |
|---|---|---|---|---|---|---|---|
| Kakamega | BSN | 0.498 (0.0725) | 0.00032 (0.00009) | 1726.551 (723.794) | 61.257 | 586.361 | 0.981 (0.021) |
| | BRR | 0.425 (0.073) | 0.00053 (0.00014) | 901.913 (391.380) | 97.367 | 629.15 | - |
| San Pedro Lagunillas | BSN | 0.369 (0.079) | 0.00093 (0.00019) | 425.973 (173.541) | 126.833 | 602.752 | 0.376 (0.550) |
| | BRR | 0.331 (0.069) | 0.00104 (0.00019) | 339.114 (125.014) | 143.06 | 597.852 | - |
| Santa Catalina | BSN | 0.518 (0.092) | 0.00046 (0.00015) | 1331.033 (785.158) | 50.072 | 555.512 | 0.9226 (0.227) |
| | BRR | 0.404 (0.070) | 0.00075 (0.00016) | 574.862 (199.984) | 112.447 | 595.027 | - |
Figure 3. Scatterplot of predicted Gray Leaf Spot (GLS) ratings obtained when fitting the BSN model and the BRR model. In the three cases considered, Pearson’s correlation between predictions was higher than 0.95.
Figure 4. Plots of the predictive correlation for each of the 100 cross-validations and 3 locations. A filled circle indicates that BSN was the best model, and an open circle indicates that BRR was the best model. The number of times that Pearson’s correlation in BSN is better than Pearson’s correlation in BRR is also shown in the plots.
Figure 5. Plots of the mean squared error (MSE) in the testing set for each of the 100 cross-validations and 3 locations. A filled circle indicates that the MSE in BRR is bigger than the MSE in BSN, and an open circle indicates that the MSE in BSN is bigger than the MSE in BRR. The number of times that the MSE in BRR is bigger than the MSE in BSN is also shown in the plots.
Average of Pearson’s correlation and mean squared error (MSE) between observed and predicted values in the testing set. The predictions were obtained after fitting the BSN and BRR models. The average is across the 100 random partitions with 80% of observations in the training set and 20% in the testing set. Standard deviations are given in parentheses
| Site | Model | Pearson’s correlation | MSE |
|---|---|---|---|
| Kakamega | BSN | 0.2836 (0.1157) | 0.7017 (0.1130) |
| | BRR | 0.2609 (0.1163) | 0.7187 (0.1212) |
| San Pedro Lagunillas | BSN | 0.5489 (0.0895) | 0.7752 (0.1031) |
| | BRR | 0.5450 (0.0887) | 0.7804 (0.1064) |
| Santa Catalina | BSN | 0.4871 (0.1238) | 0.7790 (0.1302) |
| | BRR | 0.4804 (0.1220) | 0.7685 (0.1338) |
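The evaluation scheme summarized in this table (100 random partitions, 80% training and 20% testing, with Pearson's correlation and MSE averaged across partitions) can be sketched in Python as follows. The `fit_predict` callable is a hypothetical placeholder for fitting BSN or BRR on the training indices and predicting the testing indices; it is not part of the authors' software. Using the sample standard deviation across partitions is also an assumption.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mean_sd(values):
    """Mean and sample standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


def random_partition_cv(y, fit_predict, n_partitions=100, train_frac=0.8, seed=0):
    """Repeated random train/test splits; returns (mean, sd) of correlation and MSE.

    fit_predict(train_idx, test_idx) must return predictions for test_idx
    (hypothetical interface, see lead-in).
    """
    rng = random.Random(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    cors, mses = [], []
    for _ in range(n_partitions):
        idx = list(range(n))
        rng.shuffle(idx)
        train_idx, test_idx = idx[:n_train], idx[n_train:]
        predicted = fit_predict(train_idx, test_idx)
        observed = [y[i] for i in test_idx]
        cors.append(pearson(observed, predicted))
        mses.append(mse(observed, predicted))
    return mean_sd(cors), mean_sd(mses)
```

Passing the same `seed` when evaluating two models keeps the partitions identical, which allows a paired, per-partition comparison like the one shown in Figures 4 and 5.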