J Crossa, P Pérez, J Hickey, J Burgueño, L Ornella, J Cerón-Rojas, X Zhang, S Dreisigacker, R Babu, Y Li, D Bonnett, K Mathews.
Abstract
Genomic selection (GS) has been implemented in animal and plant species and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the prediction models, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in the International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when unrelated populations are used to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and are subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required.
Year: 2013 PMID: 23572121 PMCID: PMC3860161 DOI: 10.1038/hdy.2013.16
Source DB: PubMed Journal: Heredity (Edinb) ISSN: 0018-067X Impact factor: 3.821
Mean correlations between predicted and observed values for three models (BL, RKHS regression and RBFNN) and four traits (female flowering, FFL; male flowering, MFL; grain yield, GY; anthesis-silking interval, ASI) measured in the following environments: WW, SS, HIGH and LOW, in 284 maize inbred lines genotyped with 55 000 and 1148 SNPs
| FFL-WW (0.89) | 0.814 | 0.834 | |
| FFL-SS (0.81) | 0.754 | 0.757 | |
| MFL-WW (0.88) | 0.817 | 0.832 | |
| MFL-SS (0.83) | 0.776 | 0.780 | |
| ASI-WW (0.79) | 0.582 | 0.586 | |
| ASI-SS (0.77) | 0.612 | 0.605 | |
| GY-WW (0.49) | 0.548 | 0.529 | |
| GY-SS (0.38) | 0.326 | 0.288 | |
| GY-HI (0.50) | 0.633 | 0.653 | |
| GY-LOW (0.40) | 0.402 | 0.393 | |
| FFL-WW | 0.588 | — | |
| FFL-SS | 0.648 | — | |
| MFL-WW | 0.607 | — | |
| MFL-SS | 0.674 | — | |
| ASI-WW | 0.513 | — | |
| ASI-SS | 0.517 | — | |
| GY-WW | 0.514 | — | |
| GY-SS | 0.415 | — | |
| GY-HI | — | — | — |
| GY-LOW | — | — | — |
Abbreviations: BL, Bayesian LASSO; HI, optimum; LOW, stress; RBFNN, radial basis function neural network; RKHS, reproducing kernel Hilbert space; SNP, single-nucleotide polymorphism; SS, severe drought stress; WW, well-watered.
The best prediction model for each trait-environment combination in the data set is underlined.
Extracted from Gonzalez-Camacho .
Extracted from Crossa .
Mean correlations between observed and predicted values for GY using GBLUP; P; Bayesian LASSO with markers, and with markers + pedigree (MBL and PMBL, respectively); RKHS regression with markers, and with markers + pedigree (MRKHS and PMRKHS, respectively); MRBFNN; and MBRNN models in two data sets, one with 306 wheat lines genotyped with 1717 DArT markers and evaluated in seven environments, and the other with 599 wheat lines genotyped with 1279 DArT markers and evaluated in four environments
| 1 | 0.43 | 0.48 | 0.50 | |||
| 2 | 0.41 | 0.48 | 0.43 | 0.43 | ||
| GY | 3 | 0.29 | 0.20 | 0.37 | 0.32 | |
| 4 | 0.46 | 0.45 | 0.53 | 0.49 | ||
| 5 | 0.56 | 0.59 | 0.64 | 0.63 | ||
| 6 | 0.67 | 0.70 | 0.71 | 0.69 | ||
| 7 | 0.50 | 0.46 | 0.53 | 0.50 | ||
| 1 | 0.45 | 0.52 | 0.60 | 0.54 | ||
| 2 | 0.42 | 0.49 | 0.49 | 0.49 | ||
| GY | 3 | 0.42 | 0.40 | 0.40 | ||
| 4 | 0.45 | 0.46 | 0.50 | 0.46 | ||
Abbreviations: BRNN, Bayesian regularized neural networks; GBLUP, genomic best linear unbiased predictor; GY, grain yield; MBRNN, Bayesian regularized neural networks with markers; MRBFNN, radial basis function neural networks with markers; P, pedigree; RBFNN, radial basis function neural networks; RKHS, reproducing kernel Hilbert space.
The best prediction model for each environment in the data set is underlined.
Extracted from Pérez .
Extracted from Crossa .
Figure 1. Heat map of the G matrix for the data set with 306 wheat lines genotyped with 1717 DArT markers.
Figure 2. Heat map of the G matrix for the data set with 599 wheat lines genotyped with 1279 DArT markers.
Figure 3. Heat map of the genomic relationship matrix G of five wheat populations: PBW343/Pavon76, PBW343/Juchi, PBW343/Kingbird, PBW343/K-Nyangumi and PBW343/Muu. The numbers indicate the average values of the corresponding elements of G within and between populations (from Ornella ).
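The G matrices shown in these heat maps can be illustrated with a minimal sketch. It assumes G is formed as ZZ'/p from centered and scaled marker scores, which is one common construction; the record does not state which scaling was actually used. The function names, the toy 0/1 marker scores and the population labels are hypothetical.

```python
import numpy as np

def genomic_relationship(markers):
    """Return G = Z Z' / p from an (n_lines x n_markers) score matrix.

    Assumption: each marker column is centered and scaled to unit variance
    before forming G; the scaling used in the source is not stated.
    """
    markers = np.asarray(markers, dtype=float)
    sd = markers.std(axis=0)
    keep = sd > 0  # drop monomorphic markers, which carry no information
    z = (markers[:, keep] - markers[:, keep].mean(axis=0)) / sd[keep]
    return z @ z.T / z.shape[1]

def mean_block(g, labels, pop_a, pop_b):
    """Average of the G entries between two populations (diagonal included
    when pop_a == pop_b), similar to the block averages annotated in Figure 3."""
    labels = np.asarray(labels)
    rows = np.flatnonzero(labels == pop_a)
    cols = np.flatnonzero(labels == pop_b)
    return float(g[np.ix_(rows, cols)].mean())

# Toy data: 6 lines, 40 markers scored 0/1 (hypothetical values).
rng = np.random.default_rng(0)
toy_markers = rng.integers(0, 2, size=(6, 40))
toy_labels = ["A", "A", "A", "B", "B", "B"]  # hypothetical population labels

G = genomic_relationship(toy_markers)
print(np.round(G, 2))
print("within A:", round(mean_block(G, toy_labels, "A", "A"), 3))
print("between A and B:", round(mean_block(G, toy_labels, "A", "B"), 3))
```

With this scaling the diagonal of G averages 1 and each row sums to zero, so a heat map of G mainly shows which groups of lines are more or less related than the sample average.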
Pair-wise correlations between observed and predicted stem rust values for two models, Bayesian LASSO and GBLUP, each trained in one population and evaluated in another, across five populations (adapted from Ornella )
| | PBW343/Juchi | PBW343/Kingbird | PBW343/K-Nyangumi | PBW343/Muu | PBW343/Pavon76 |
| PBW343/Juchi | — | 0.48 | 0.14 | 0.28 | 0.31 |
| PBW343/Kingbird | 0.53 | — | 0.29 | 0.25 | 0.54 |
| PBW343/K-Nyangumi | 0.14 | 0.30 | — | 0.28 | 0.28 |
| PBW343/Muu | 0.18 | 0.30 | 0.33 | — | 0.29 |
| PBW343/Pavon76 | 0.37 | 0.51 | 0.22 | 0.33 | — |
Values above the diagonal: Bayes LASSO. Values below the diagonal: GBLUP.
There are five related populations: PBW343/Juchi, PBW343/Kingbird, PBW343/K-Nyangumi, PBW343/Muu and PBW343/Pavon76.
The upper-right triangle shows the prediction ability (correlation) of Bayes LASSO, with the rows indicating the training population (for example, PBW343/Juchi) and the columns the testing population (for example, PBW343/Kingbird, 0.48); the lower-left triangle gives the prediction ability of GBLUP, with the columns indicating the training population (for example, PBW343/Juchi) and the rows the testing population (for example, PBW343/Kingbird, 0.53).
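The train-in-one, test-in-another design described above can be sketched as follows. This is only an illustration: it uses ridge regression on marker scores as a simple stand-in for the Bayesian LASSO and GBLUP models, with a fixed penalty `lam` chosen arbitrarily, and the populations, marker scores and phenotypes are simulated under hypothetical names.

```python
import numpy as np

def ridge_fit(x, y, lam):
    """Ridge marker effects: solve (X'X + lam*I) b = X'(y - mean(y)) on centered X."""
    mu_x, mu_y = x.mean(axis=0), y.mean()
    xc = x - mu_x
    beta = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ (y - mu_y))
    return mu_x, mu_y, beta

def ridge_predict(model, x):
    mu_x, mu_y, beta = model
    return mu_y + (x - mu_x) @ beta

def pairwise_accuracy(pops, lam=1.0):
    """pops maps name -> (X, y). Returns names and acc[i, j], the correlation
    between observed and predicted values when training on i and testing on j."""
    names = list(pops)
    acc = np.full((len(names), len(names)), np.nan)
    for i, train_name in enumerate(names):
        model = ridge_fit(*pops[train_name], lam)
        for j, test_name in enumerate(names):
            if i == j:
                continue  # no within-population prediction in this design
            x_test, y_test = pops[test_name]
            acc[i, j] = np.corrcoef(ridge_predict(model, x_test), y_test)[0, 1]
    return names, acc

# Simulated toy populations that share the same marker effects (an assumption).
rng = np.random.default_rng(1)
true_effects = rng.normal(size=50)
pops = {}
for name in ["pop_A", "pop_B", "pop_C"]:  # hypothetical population names
    x = rng.integers(0, 2, size=(80, 50)).astype(float)
    pops[name] = (x, x @ true_effects + rng.normal(size=80))

names, acc = pairwise_accuracy(pops)
print(names)
print(np.round(acc, 2))
```

Unlike the table, which packs two models into one square layout, this sketch returns a full matrix for a single model, with rows as training populations and columns as testing populations.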
Correlations between predicted and observed stem rust values for five wheat populations, when four of them, with different numbers of individuals in the training set, predict each of the others in the testing set using the GBLUP and the Bayesian LASSO (BL) models
| PBW343/Muu | 534 | 148 | 0.59 | 0.60 |
| PBW343/K-Nyangumi | 506 | 176 | 0.52 | 0.53 |
| PBW343/Kingbird | 592 | 90 | 0.79 | 0.82 |
| PBW343/Juchi | 590 | 92 | 0.41 | 0.39 |
| PBW343/F6Pavon | 506 | 176 | 0.59 | 0.62 |
| All populations | 612 | 70 | 0.62 | 0.64 |
The "All populations" row gives mean correlations for all five populations combined, using a prediction design with 50 random partitions, each split 9:1 into training and testing sets.
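A design of repeated random 9:1 partitions can be sketched as below. The predictor is a GBLUP-style kernel predictor with a fixed ratio `lam` of residual to genetic variance, which is an assumption made for brevity; in practice the variance components would be estimated from the data. The marker data and phenotypes are simulated, and all function names are hypothetical.

```python
import numpy as np

def gblup_predict(g, y, train, test, lam=1.0):
    """Predict y[test] as mu + G[test, train] (G[train, train] + lam*I)^-1 (y[train] - mu).

    Assumption: lam (residual / genetic variance ratio) is fixed, not estimated.
    """
    mu = y[train].mean()
    k_tt = g[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(k_tt, y[train] - mu)
    return mu + g[np.ix_(test, train)] @ alpha

def random_partition_cv(g, y, n_partitions=50, test_fraction=0.1, lam=1.0, seed=0):
    """Mean correlation between observed and predicted values over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, round(n * test_fraction))
    cors = []
    for _ in range(n_partitions):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        pred = gblup_predict(g, y, train, test, lam)
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors)), cors

# Simulated toy data: 100 lines, 60 standardized marker scores.
rng = np.random.default_rng(2)
z = rng.normal(size=(100, 60))
g = z @ z.T / z.shape[1]
y = z @ rng.normal(size=60) + rng.normal(size=100)

mean_r, cors = random_partition_cv(g, y)
print("mean correlation over", len(cors), "partitions:", round(mean_r, 3))
```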
Figure 4. Mean correlations (across four environments) between predicted and observed grain yield values derived from models using only pedigree, only genomics and pedigree+genomics for two cross-validation schemes (CV1 and CV2) (adapted from Burgueño ). Cross-validation CV1 predicts genotypes that have never been evaluated in any environment, and cross-validation CV2 predicts genotypes that were evaluated in some environments but not in others.
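The difference between CV1 and CV2 comes down to which line-by-environment records are masked. The sketch below builds boolean test masks for both schemes. The fold assignment is an assumption made for illustration: in CV2 each line's environments are placed in distinct folds, so a line is missing in at most one environment per fold. The source may have assigned folds differently, and the function names are hypothetical.

```python
import numpy as np

def cv1_folds(n_lines, n_envs, n_folds=10, seed=0):
    """CV1: whole lines are held out, so a test line has no record in any environment."""
    rng = np.random.default_rng(seed)
    fold_of_line = rng.permutation(n_lines) % n_folds
    for fold in range(n_folds):
        mask = np.zeros((n_lines, n_envs), dtype=bool)
        mask[fold_of_line == fold, :] = True  # True marks a held-out record
        yield mask

def cv2_folds(n_lines, n_envs, n_folds=10, seed=0):
    """CV2: single line-by-environment records are held out, so every test line
    is still observed in the other environments (assumed assignment rule)."""
    if n_envs > n_folds:
        raise ValueError("this sketch needs n_envs <= n_folds")
    rng = np.random.default_rng(seed)
    # Each line gets n_envs distinct fold labels, one per environment.
    fold_of_cell = np.stack(
        [rng.permutation(n_folds)[:n_envs] for _ in range(n_lines)]
    )
    for fold in range(n_folds):
        yield fold_of_cell == fold

masks_cv1 = list(cv1_folds(n_lines=20, n_envs=4))
masks_cv2 = list(cv2_folds(n_lines=20, n_envs=4))
print("CV1 held-out records per fold:", [int(m.sum()) for m in masks_cv1])
print("CV2 held-out records per fold:", [int(m.sum()) for m in masks_cv2])
```

In a multi-environment model, the masked phenotypes would be set to missing before fitting, and the correlation between predicted and observed values would then be computed on the masked records only.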
Figure 5. Correlations between predicted and observed performance in environment 1 (E1) and the average of environments 2, 3 and 4 (E2 3 4) obtained in CV2 using models based on only pedigree (a), only genomics (b) or pedigree+genomics (c), with different specifications for the residual and genetic covariance matrices (FA=GE modeled using the factor analytic model; no FA=GE not modeled) (adapted from Burgueño ).
Mean correlations between the predicted and observed values of genotypes for individual environments (E1, E2, E3 and E4), and for three environments combined (E2, E3 and E4), using five different factor analytic (FA) models under cross-validation scheme CV2, each with 10-fold cross-validation
| E1 | 0.460 | 0.552 | 0.469 | 0.512 | |
| E2 | 0.623 | 0.609 | 0.612 | 0.637 | |
| E3 | 0.633 | 0.581 | 0.612 | 0.603 | |
| E4 | 0.533 | 0.513 | 0.448 | 0.465 | |
| E2, E3, E4 | 0.596 | 0.571 | 0.566 | 0.568 |
The best predictive model for each environment or environment combination is underlined. Data used were extracted from Crossa and Burgueño .
The five models are: factor analytic-pedigree for G; factor analytic-genomic for G; factor analytic-pedigree (G0P) and factor analytic-genomic (G0M); factor analytic-pedigree additive (G0P) and factor analytic-pedigree additive × additive (G0PP); factor analytic-genomic additive (G0M) and factor analytic-genomic additive × additive (G0MM).
Mean correlations between the predicted and observed grain yield values for three models (BL, RKHS regression and GBLUP) when the training sets, randomly sampled from the entire population 50 times, contain 30, 40, 50, 70 or 90 individuals, for different numbers of SNPs
| BL_30 | 0.1348 | 0.1009 | 0.0640 |
| RKHS-KA_30 | 0.1520 | 0.1144 | 0.0703 |
| GBLUP_30 | 0.1643 | 0.1173 | 0.0672 |
| BL_40 | 0.1539 | 0.1286 | 0.0899 |
| RKHS-KA_40 | 0.1685 | 0.1380 | 0.0925 |
| GBLUP_40 | 0.1789 | 0.1441 | 0.0906 |
| BL_50 | 0.2093 | 0.1598 | 0.0918 |
| RKHS-KA_50 | 0.2004 | 0.1555 | 0.0944 |
| GBLUP_50 | 0.2165 | 0.1612 | — |
| BL_70 | 0.2236 | 0.1814 | 0.1152 |
| RKHS-KA_70 | 0.2153 | 0.1756 | 0.1147 |
| GBLUP_70 | 0.2338 | 0.1839 | — |
| BL_90 | 0.2484 | 0.2000 | 0.1251 |
| RKHS-KA_90 | 0.2386 | 0.1952 | |
| GBLUP_90 | — | ||
| BL_30 | 0.2994 | 0.2765 | 0.2165 |
| RKHS-KA_30 | 0.2924 | 0.2725 | 0.2136 |
| GBLUP_30 | 0.2982 | 0.2770 | 0.2161 |
| BL_40 | 0.3367 | 0.3156 | 0.2447 |
| RKHS-KA_40 | 0.3371 | 0.3172 | 0.2471 |
| GBLUP_40 | 0.3378 | 0.3146 | — |
| BL_50 | 0.3471 | 0.3264 | 0.2585 |
| RKHS-KA_50 | 0.3486 | 0.3271 | 0.2621 |
| GBLUP_50 | 0.3467 | — | — |
| BL_70 | 0.3717 | 0.3549 | 0.2818 |
| RKHS-KA_70 | 0.3770 | 0.3605 | 0.2866 |
| GBLUP_70 | 0.3714 | — | — |
| BL_90 | 0.3919 | 0.3725 | 0.2998 |
| RKHS-KA_90 | |||
| GBLUP_90 | 0.3903 | — | — |
Abbreviations: BL, Bayesian LASSO; RKHS, reproducing kernel Hilbert space; SNP, single-nucleotide polymorphism.
For each bi-parental population and for each number of markers (columns) the best predictive model is underlined.