Bartolo de Jesús Villar-Hernández, Sergio Pérez-Elizalde, José Crossa, Paulino Pérez-Rodríguez, Fernando H Toledo, Juan Burgueño.
Abstract
Plant and animal breeders are interested in selecting the best individuals from a candidate set for the next breeding cycle. In this paper, we propose a formal method under the Bayesian decision theory framework to tackle the selection problem based on genomic selection (GS) in single- and multi-trait settings. We proposed and tested three univariate loss functions (Kullback-Leibler, KL; Continuous Ranked Probability Score, CRPS; Linear-Linear loss, LinLin) and their corresponding multivariate generalizations (Kullback-Leibler, KL; Energy Score, EnergyS; and the Multivariate Asymmetric Loss Function, MALF). We derived and expressed all the loss functions in terms of heritability and tested them on a real wheat dataset for one cycle of selection and in a simulated selection program. The performance of each univariate loss function was compared with the standard method of selection (Std), which does not use loss functions. We compared performance in terms of the selection response and the decrease in the population's genetic variance during recurrent breeding cycles. Results suggest that, in a long-term breeding program using the single-trait scheme, better performance can be obtained by selecting the best 30% of individuals in each cycle, but not by selecting the best 10%. For the multi-trait approach, results show that the population mean of every trait under consideration had positive gains, even though two of the traits were negatively correlated. The corresponding population variances did not differ significantly among the loss functions at the 10th selection cycle. Loss functions should therefore be a useful criterion for selecting candidates for the next breeding cycle.
Keywords: Bayesian Decision Theory; GenPred; Genomic Selection; Loss Function; Shared Data Resources; Simulation Scenarios
Year: 2018 PMID: 30021830 PMCID: PMC6118314 DOI: 10.1534/g3.118.200430
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. A) Classic idea of selection by truncation at a cutoff value. Loss functions need to be minimized at the target in order to favor lines with high response to selection. For representation, losses were standardized by subtracting their minimum value. The Kullback-Leibler (KL) and Continuous Ranked Probability Score (CRPS) loss functions are symmetric on both sides of the target, while the LinLin loss is asymmetric. B) The solid black line represents the base population; the solid gray line corresponds to the truncated distribution after censoring at the cutoff, which represents the breeder's preferences. Dashed lines are theoretical distributions of three possible candidates. The candidate distribution whose mean is close to the theoretical target and whose variance is similar to that of the parent distribution has the minimum loss.
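The symmetry contrast described in panel A can be illustrated with a minimal R sketch. It uses generic textbook forms (the closed-form CRPS of a normal distribution and a piecewise-linear LinLin loss), not the heritability-based expressions derived in the paper. The function names, the slopes `a` and `b`, and the target value are illustrative assumptions.

```r
# Generic forms for illustration only; not the paper's derived expressions.
# linlin_loss, crps_normal, a, b, target and d are hypothetical names/values.

# LinLin: linear on each side of the target, with different slopes.
linlin_loss <- function(y, target, a = 1, b = 3) {
  e <- y - target
  ifelse(e >= 0, a * e, b * (-e))  # slope a above the target, slope b below
}

# Closed-form CRPS of a N(mu, sigma^2) distribution evaluated at y.
crps_normal <- function(y, mu, sigma) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}

target <- 1.5
d <- 0.4

# CRPS takes the same value at equal distances on either side of the target
crps_sym <- c(crps_normal(target - d, target, 1),
              crps_normal(target + d, target, 1))

# LinLin does not: falling short costs b * d, overshooting costs a * d
linlin_asym <- c(linlin_loss(target - d, target),
                 linlin_loss(target + d, target))

print(crps_sym)
print(linlin_asym)
```

With these slopes a candidate below the target is penalized three times as much as one the same distance above it; which side the paper's LinLin loss penalizes more is not stated in this record.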
Figure 2. Univariate real data. Boxplots of estimated breeding values for a real wheat dataset (with four traits) of the top 10% of lines selected with three univariate loss functions, Kullback-Leibler (KL), Continuous Ranked Probability Score (CRPS) and Linear-Linear loss (LinLin), and breeding values of the lines selected under the standard method (Std) for A) grain yield (GY), B) thousand-kernel weight (TKW), C) Zn concentration in grain (GZnC), and D) Fe concentration in grain (GFeC). Values in parentheses are the lines that the loss functions selected but the Std did not.
Figure 3. Multivariate real data. Boxplots of estimated breeding values for a real spring wheat dataset of the top 10% of selected lines for four traits according to the multivariate loss functions Kullback-Leibler (KL), Energy Score (EnergyS) and Multivariate Asymmetric Loss Function (MALF) for A) grain yield (GY), B) thousand-kernel weight (TKW), C) Zn concentration in grain (GZnC), and D) Fe concentration in grain (GFeC). Brown dots indicate the mean of all lines.
Figure 4. Results of the univariate simulation study. The standardized selection response is shown for breeding cycles 1 to 5 in A) and for cycles 5 to 10 in B). In each selection cycle, the top 10% of lines with minimum posterior expected losses were selected using the Kullback-Leibler loss function (KL), the Continuous Ranked Probability Score (CRPS), the Linear-Linear loss function (LinLin), and the standard method (Std). Selected lines were crossed in each cycle to recover the population size for upcoming selection cycles. The plotted quantity is defined from the population mean and the selection response in each cycle and from the population mean and standard deviation in cycle 1. The black vertical lines indicate the standard error over 20 replications of the simulation study.
Figure 5. Results of the univariate simulation study. The scaled population variance is shown for breeding cycles 1 to 5 in A) and for cycles 5 to 10 in B). In each selection cycle, the top 10% of lines were selected using the Kullback-Leibler (KL), the Continuous Ranked Probability Score (CRPS), and the Linear-Linear (LinLin) loss functions, and the standard method (Std). Selected lines were crossed at each cycle to recover the population size for upcoming selection cycles. The variance is scaled using the population variances in the current cycle and in cycle 1. The black vertical lines indicate the standard error over 20 replications of the simulation study.
Figure 6. Results of the univariate simulation study at the 10th selection cycle when 10% of the lines were selected. A) Boxplots of the standardized selection response; B) boxplots of the scaled population variance using the Kullback-Leibler (KL), the Continuous Ranked Probability Score (CRPS), and the Linear-Linear (LinLin) loss functions, and the standard method (Std). The boxplots show the mean (white dots) and median (black middle line) of 20 replications of the simulation study. The standardized selection response and the scaled population variance are as defined in Figs. 4 and 5, evaluated at the 10th selection cycle.
Univariate simulation study. Student's t-tests of the mean and variance differences between the lines selected by the univariate Kullback-Leibler (KL), Continuous Ranked Probability Score (CRPS) and Linear-Linear (LinLin) loss functions and the lines selected under the standard selection method (Std), after 20 replications of the simulated breeding program. The selected proportions were the top 10% and top 30% of the candidates, and the means and variances were compared at the 10th selection cycle.
| Contrast | a) Mean of top 10%: t | a) Mean of top 10%: df | a) Mean of top 10%: p-value | b) Mean of top 30%: t | b) Mean of top 30%: df | b) Mean of top 30%: p-value |
|---|---|---|---|---|---|---|
| CRPS | -0.85 | 36 | 0.4 | 2.9 | 38 | 0.006* |
| KL | -0.11 | 38 | 0.914 | 1.7 | 37 | 0.088 |
| LinLin | -0.73 | 38 | 0.469 | 3.1 | 34 | 0.004* |
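A comparison of this kind can be sketched in R as follows. The data are simulated for illustration and are not the paper's results. The sketch assumes 20 replicate values per method and uses R's default Welch two-sample test; whether the authors pooled variances is not stated in this record. The names `resp_loss` and `resp_std` are hypothetical.

```r
# Illustrative only: simulated replicate values, not data from the paper.
set.seed(1)
n_rep <- 20                                       # replications per method
resp_loss <- rnorm(n_rep, mean = 5.2, sd = 0.5)   # hypothetical loss-function results
resp_std  <- rnorm(n_rep, mean = 5.0, sd = 0.5)   # hypothetical Std results

# Two-sample t-test; t.test() applies the Welch correction by default,
# so the degrees of freedom can fall below n_rep + n_rep - 2 = 38.
tt <- t.test(resp_loss, resp_std)

out <- c(t  = unname(tt$statistic),
         df = unname(tt$parameter),
         p  = tt$p.value)
print(round(out, 3))
```

Passing `var.equal = TRUE` to `t.test()` would give the pooled-variance version, which always has 38 degrees of freedom for two groups of 20.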
Figure 7. Results of the multivariate simulation study. A) Standardized population mean over the breeding cycles when heritability for all traits was fixed at 0.3, and B) standardized population mean when heritability was 0.6 for all traits. In each selection cycle, the top 10% of candidates were selected using the multivariate loss functions Kullback-Leibler (KL), Energy Score (EnergyS) and Multivariate Asymmetric Loss Function (MALF), and were crossed to recover the population size for the upcoming breeding cycles. The plotted quantity is defined from the population mean and the selection response in each cycle and from the population mean and standard deviation in cycle 1. The black vertical lines indicate the standard error over 20 replications of the simulation study.
Multivariate simulation study. Mean percentage differences, at the 10th breeding cycle relative to the first cycle, of a) the population means and b) the population variances for trait 1 (T1), trait 2 (T2) and trait 3 (T3) for lines selected under three multivariate loss functions: Kullback-Leibler (KL), Energy Score (EnergyS) and Multivariate Asymmetric Loss Function (MALF). Standard errors are in parentheses. Heritabilities of 0.3 and 0.6 for all traits were considered.
| Loss | a) Mean: T1 | a) Mean: T2 | a) Mean: T3 | b) Variance: T1 | b) Variance: T2 | b) Variance: T3 |
|---|---|---|---|---|---|---|
| KL | 1.583 (0.343) | 6.441 (0.387) | 5.762 (0.120) | -47.886 (1.678) | -48.128 (1.506) | -51.181 (1.370) |
| EnergyS | 0.462 (0.332) | 8.224 (0.477) | 6.140 (0.173) | -47.533 (1.146) | -47.584 (1.372) | -50.287 (1.327) |
| MALF | 1.211 (0.264) | 6.493 (0.344) | 5.499 (0.139) | -47.437 (1.221) | -47.928 (1.496) | -47.820 (1.352) |
| KL | 3.318 (0.327) | 6.206 (0.403) | 6.935 (0.118) | -44.532 (1.672) | -43.975 (1.341) | -49.588 (1.509) |
| EnergyS | 0.506 (0.375) | 9.819 (0.500) | 7.291 (0.139) | -45.256 (1.550) | -46.618 (1.456) | -48.025 (1.345) |
| MALF | 2.347 (0.232) | 5.721 (0.239) | 5.842 (0.121) | -44.977 (0.811) | -45.451 (0.827) | -44.910 (1.271) |
Figure 8. Results of the multivariate simulation study. A) Scaled population variance over the breeding cycles when heritability for all traits was fixed at 0.3, and B) scaled population variance when heritability was 0.6 for all traits. In each selection cycle, the top 10% of candidates were selected using the multivariate loss functions Kullback-Leibler (KL), Energy Score (EnergyS) and Multivariate Asymmetric Loss Function (MALF), and were crossed to recover the population size for the upcoming breeding cycles. The variance is scaled using the population variances in the current cycle and in cycle 1. The black vertical lines indicate the standard error over 20 replications of the simulation study.
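Selection through Standard Method (Std)

```r
# Standard selection: rank candidates by their posterior mean breeding value.
# As written, Xb is read with one row per candidate and posterior draws in columns.
# y_c, MU and sigma are not used by this function.
selStd <- function(Xb, y_c, Nsel, MU, sigma) {
  yHat <- apply(Xb, 1, mean, na.rm = TRUE)            # posterior mean of each row
  selected <- order(yHat, decreasing = TRUE)[1:Nsel]  # indices of the Nsel largest means
  return(selected)
}
```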
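A small usage sketch for the standard method, on simulated posterior draws, is given here. It assumes the layout implied by `apply(Xb, 1, mean)`, that is, one row per candidate line and one column per posterior draw. The values in `true_bv` and the dimensions are made up for illustration.

```r
# Illustrative data only: six hypothetical lines with known "true" breeding values.
set.seed(42)
n_lines <- 6
n_mcmc  <- 200
true_bv <- c(0.1, 1.2, -0.5, 2.0, 0.7, 1.6)

# rows = lines, columns = posterior draws (column-wise fill recycles true_bv per row)
Xb <- matrix(rnorm(n_lines * n_mcmc, mean = true_bv, sd = 0.2), nrow = n_lines)

# Same ranking rule as selStd: posterior mean per line, largest first
yHat <- rowMeans(Xb)
selected <- order(yHat, decreasing = TRUE)[1:2]

print(round(yHat, 2))
print(selected)   # lines 4 and 6 have the two largest true values
```

The standard method uses only the posterior mean of each candidate; the spread of the draws plays no role in the ranking.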
Selection through Multivariate Kullback-Leibler (KL)

```r
library(plyr)      # alply, llply
library(Matrix)    # nearPD
library(MCMCpack)  # xpnd (assumed source; expands a half-vectorized symmetric matrix)
library(mvtnorm)   # pmvnorm, rmvnorm
library(tmvtnorm)  # mtmvnorm

selKL <- function(Xb, MU, K, y_c, Nsel) {
  Xb <- as.matrix(Xb); MU <- as.matrix(MU); K <- as.matrix(K)
  n.traits <- ncol(MU)
  n.lines <- ncol(Xb) / n.traits
  n.mcmc <- nrow(Xb)
  losses <- matrix(0, nrow = n.lines, ncol = n.mcmc)
  # One covariance matrix per MCMC iteration, moved to the nearest positive definite matrix
  Kall <- alply(K, 1, function(V) as.matrix(nearPD(xpnd(unlist(V)))$mat))
  Kallinv <- llply(Kall, solve)
  # For each iteration, split the draws into one vector of line values per trait
  Xb <- apply(Xb, 1, split, f = rep(1:n.traits, each = n.lines))
  for (i in 1:n.mcmc) {
    mu1 <- as.vector(MU[i, ])
    mu2 <- do.call(cbind, Xb[[i]])
    K <- Kall[[i]]
    Kinv <- Kallinv[[i]]
    # Mean of the parent distribution truncated below at y_c
    muS <- as.vector(mtmvnorm(mean = mu1, sigma = K, lower = y_c,
                              upper = rep(Inf, length(y_c)),
                              doComputeVariance = FALSE)$tmean)
    # Probability mass of the parent distribution above y_c
    Z <- pmvnorm(lower = y_c, upper = rep(Inf, length(y_c)), mean = mu1, sigma = K)
    # posterior predictive distribution at each iteration.
    yppdf <- as.matrix(t(apply(mu2, 1, function(ypred) {
      rmvnorm(1, mean = ypred, sigma = K)})))
    S <- muS - mu1 # selection differential
    UKU <- as.numeric(t(S) %*% Kinv %*% S)
    muSmu2 <- sweep(yppdf, 2, muS, "-") # predictive draw minus muS
    losses[, i] <- as.vector(apply(muSmu2, 1, function(x) {
      0.5 * (t(x) %*% Kinv %*% x - UKU) - log(Z) }))
  }
  # Posterior expected loss per line; the Nsel lines with the smallest loss are selected
  e.loss <- apply(losses, 1, mean, na.rm = TRUE)
  selected <- order(e.loss, decreasing = FALSE)[1:Nsel]
  return(selected)
}
```
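The loss computed inside the loop of `selKL` can be checked on a single trait using base R alone. The sketch below is a univariate analogue written for illustration, not code from the paper: the truncated mean and the mass above the cutoff use the standard formulas for a normal distribution truncated from below, and the function names `trunc_norm_stats` and `kl_loss_uni` are hypothetical.

```r
# Univariate analogue of the quantities in selKL (illustrative sketch).
trunc_norm_stats <- function(mu, sigma, y_c) {
  alpha <- (y_c - mu) / sigma
  Z <- pnorm(alpha, lower.tail = FALSE)   # probability mass above y_c
  muS <- mu + sigma * dnorm(alpha) / Z    # mean of the normal truncated below at y_c
  list(muS = muS, Z = Z)
}

# Same form as the multivariate loss: 0.5 * (x' Kinv x - S' Kinv S) - log(Z),
# with x = y_pred - muS and S = muS - mu, reduced to one dimension.
kl_loss_uni <- function(y_pred, mu, sigma, y_c) {
  tn <- trunc_norm_stats(mu, sigma, y_c)
  S <- tn$muS - mu                        # selection differential
  0.5 * ((y_pred - tn$muS)^2 - S^2) / sigma^2 - log(tn$Z)
}

# Standard normal parent population truncated at its mean
tn <- trunc_norm_stats(mu = 0, sigma = 1, y_c = 0)
candidates <- c(-1, 0, tn$muS, 2)
losses <- kl_loss_uni(candidates, mu = 0, sigma = 1, y_c = 0)
best <- which.min(losses)

print(tn$muS)    # equals sqrt(2 / pi) for this parent population
print(losses)
print(best)      # the candidate located exactly at muS
```

The loss is quadratic in the distance between the predicted value and the truncated mean, so a candidate far above the target is penalized as much as one equally far below it.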