Helena Oakey, Brian Cullis, Robin Thompson, Jordi Comadran, Claire Halpin, Robbie Waugh.
Abstract
Genomic selection in crop breeding introduces modeling challenges not found in animal studies. These include the need to accommodate replicate plants for each line, consider spatial variation in field trials, address line by environment interactions, and capture nonadditive effects. Here, we propose a flexible single-stage genomic selection approach that resolves these issues. Our linear mixed model incorporates spatial variation through environment-specific terms, and also randomization-based design terms. It models marker effects and marker by environment interactions using ridge regression best linear unbiased prediction, extending genomic selection to multiple environments. Since the approach uses the raw data from line replicates, the line genetic variation is partitioned into marker and nonmarker residual genetic variation (i.e., additive and nonadditive effects). This results in a more precise estimate of marker genetic effects. Using barley height data from trials of up to 477 cultivars in two different years, we demonstrate that our new genomic selection model improves predictions compared to current models. Analyzing single trials revealed improvements in predictive ability of up to 5.7%. For the multiple environment trial (MET) model, combining both year trials improved predictive ability by up to 11.4% compared to a single environment analysis. Benefits were significant even when fewer markers were used. Compared to a single-year standard model run with 3490 markers, our partitioned MET model achieved the same predictive ability using between 500 and 1000 markers, depending on the trial. Our approach can be used to increase accuracy and confidence in the selection of the best lines for breeding and/or to reduce costs by using fewer markers.
Keywords: GEBV; GenPred; barley; genomic selection; multi-environment trial; random ridge regression; shared data resource
Year: 2016; PMID: 26976443; PMCID: PMC4856083; DOI: 10.1534/g3.116.027524
Source DB: PubMed; Journal: G3 (Bethesda); ISSN: 2160-1836; Impact factor: 3.154
Table 1. Summary of the special cases of the general form of
| Model | Description | STY or MET | Reference | |||||
|---|---|---|---|---|---|---|---|---|
| Single trial | Diagonal ( | 1 | 0 | STY | ||||
| DIAG | Diagonal | 0 | STY | |||||
| US | Unstructured | 0 | MET | |||||
| CS | Compound symmetry | 1 | MET | |||||
| CS+DIAG | CS+DIAG | 1 | MET | |||||
| FAM | Factor analytic (main effect) | MET |
STY, single trial year (note the DIAG model is equivalent to analyzing each trial year separately); MET, multi-environment trial.
A similar table could be constructed for with , and .
⊕ represents a Kronecker sum, so that results in a diagonal matrix with elements for the specific variance of trial t.
is a matrix of k factor loadings at each of the s trials.
For FAMk let where , var and .
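The covariance forms named in the table above can be illustrated for two trials with a minimal sketch. The function and parameter names are assumptions introduced for illustration, and the parameterizations are common textbook ones that may differ in detail from those used in the source. The compound symmetry example reuses the ECS estimates reported later in this record (common variance 103.22, covariance 92.77).

```python
import numpy as np

def direct_sum(values):
    """Sum (⊕) of scalars: a diagonal matrix with one entry per trial."""
    return np.diag(np.asarray(values, dtype=float))

def diag_form(trial_variances):
    """DIAG: a separate variance per trial, zero covariance between trials."""
    return direct_sum(trial_variances)

def cs_form(main_variance, interaction_variance, n_trials):
    """CS: a common variance and a common covariance across trials."""
    ones = np.ones((n_trials, n_trials))
    return main_variance * ones + interaction_variance * np.eye(n_trials)

def cs_diag_form(main_variance, specific_variances):
    """CS+DIAG: a common covariance plus a trial-specific variance."""
    s = len(specific_variances)
    return main_variance * np.ones((s, s)) + direct_sum(specific_variances)

def fa_form(loadings, specific_variances):
    """Factor analytic: loadings (s trials by k factors) plus specific variances."""
    lam = np.asarray(loadings, dtype=float).reshape(len(specific_variances), -1)
    return lam @ lam.T + direct_sum(specific_variances)

# Two-trial examples (2010 and 2011)
G_diag = diag_form([111.65, 86.35])
G_cs = cs_form(92.77, 103.22 - 92.77, n_trials=2)
G_fa = fa_form([3.0, 2.0], [1.0, 0.5])  # hypothetical loadings and variances
print(G_diag, G_cs, G_fa, sep="\n")
```

Under the DIAG form the off-diagonal entries are zero, which is why fitting it is equivalent to analyzing each trial year separately.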
Table 2. Number of lines with marker information in the groups used in the cross-validation
| Groups for Cross-Validation | Number of Lines: Common | Number of Lines: 2010 Only | Number of Lines: 2011 Only | Total Number of Lines in Each Group (Total Number of Lines Across Groups): 2010 | Total Number of Lines in Each Group (Total Number of Lines Across Groups): 2011 | Total Number of Lines in Each Group (Total Number of Lines Across Groups): MET |
|---|---|---|---|---|---|---|
| 1–6 | 46 | 0 | 2 | 46 (276) | 48 (288) | 48 (288) |
| 7–9 | 46 | 0 | 1 | 46 (138) | 47 (141) | 47 (141) |
| 10 | 45 | 1 | 2 | 46 (46) | 47 (47) | 48 (48) |
| Total | 459 | 1 | 17 | 460 | 476 | 477 |
These are the number of lines with marker information.
The common lines groups are kept the same across all analyses.
The multi-environment trial (MET) analyses contain information from both trial years.
Table 3. Summary of validation and training groups in three cross-validations
| Cross-Validation | Number of Groups in VALIDATION Set | Number: 2010 | Number: 2011 | Number: MET (both years) | Number of Groups in TRAINING Set | Total Number |
|---|---|---|---|---|---|---|
| CV10 | 1 | 46 (10) | 47 (9.7) – 48 (10.1) | 47 (9.9) – 48 (10.1) | 9 | 10 |
| CV20 | 2 | 92 (20) | 94 (19.7) – 96 (20.2) | 94 (19.7) – 96 (20.1) | 8 | 45 |
| CV40 | 4 | 184 (40) | 188 (39.5) – 192 (40.3) | 188 (39.4) – 192 (40.3) | 6 | 210 |
The number will be a range for 2011 and the MET as the number of lines in each group (Table 2) is variable.
This is the number of iterations so all combinations of groups in the validation set can be investigated.
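The iteration counts in the table above (10, 45, and 210) equal the number of ways of choosing 1, 2, or 4 validation groups out of 10. The following sketch enumerates those splits; the function name and the group numbering 1 to 10 are assumptions made for illustration, and the 2010 group size of 46 lines is taken from the preceding table.

```python
from itertools import combinations
from math import comb

N_GROUPS = 10
LINES_PER_GROUP_2010 = 46  # every 2010 group holds 46 lines

def validation_splits(n_validation_groups, n_groups=N_GROUPS):
    """Yield every (validation, training) split of the group labels."""
    groups = range(1, n_groups + 1)
    for validation in combinations(groups, n_validation_groups):
        training = tuple(g for g in groups if g not in validation)
        yield validation, training

iteration_counts = {}
for name, k in [("CV10", 1), ("CV20", 2), ("CV40", 4)]:
    n_iterations = sum(1 for _ in validation_splits(k))
    iteration_counts[name] = n_iterations
    # 2010 validation-set size: k groups of 46 lines each
    print(name, n_iterations, k * LINES_PER_GROUP_2010)
```

Each split leaves the remaining groups (9, 8, or 6 of them) as the training set, matching the TRAINING column.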
Figure 1. Correlation of mean heights of lines in the 2010 and 2011 trials. The data points represent only the lines with marker data that were grown in both years.
Table 4. Summary of the models fitted to the full data set
| Model Type | Model | Form of var(um) | Form of var(ue) | STY or MET | Log-Likelihood | AIC |
|---|---|---|---|---|---|---|
| Phenotypic | EDIAG | | DIAG | STY | −11,074.6 | 22,163.1 |
| | ECS | | CS | MET | −10,877.0 | 21,768.1 |
| | ECS+DIAG | | CS+DIAG | MET | −10,872.5 | 21,760.9 |
| | EFAM | | FAM | MET | −10,872.4 | 21,760.8 |
| Standard | SDIAG | DIAG | | STY | −10,924.9 | 21,863.8 |
| | SCS | CS | | MET | −10,794.0 | 21,602.1 |
| | SCS+DIAG | CS+DIAG | | MET | −10,790.6 | 21,597.2 |
| | SFAM | FAM | | MET | −10,787.8 | 21,591.6 |
| Partitioned | PDIAG | DIAG | DIAG | STY | −10,876.4 | 21,770.8 |
| | PCS | CS | CS | MET | −10,747.2 | 21,512.5 |
| | PCS+DIAG | CS+DIAG | CS+DIAG | MET | −10,744.6 | 21,511.2 |
| | PFAM | FAM | FAM | MET | −10,744.2 | 21,510.4 |
STY, single trial year; MET, multi-environment trial; AIC, Akaike information criterion.
All models derive from Equation 1 but are special cases of (Equation 2).
Details of forms of m are given in Table 1.
Phenotypic model has = e.
DIAG implies the covariance between the two trials is assumed to be zero, and is equivalent to fitting the two trials separately.
CS is the compound symmetry model.
CS+DIAG is the model described by Cullis .
FAM1 is the factor analytic model (Smith ), with main effect with k the number of factors equal to 1.
US is the unstructured model; for two trials this model is equivalent to the FAM1 model.
Standard RR-BLUP model has m.
Partitioned RR-BLUP model has .
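The relationship between the log-likelihood and AIC columns can be checked with a short sketch. It assumes the conventional definition AIC = −2 log L + 2p, where p is the number of estimated parameters; the source does not state which definition it used, so the implied parameter counts below are an inference, and the dictionary and function names are hypothetical.

```python
# (log-likelihood, AIC) pairs copied from the table of fitted models
models = {
    "EDIAG": (-11074.6, 22163.1),
    "ECS": (-10877.0, 21768.1),
    "ECS+DIAG": (-10872.5, 21760.9),
    "EFAM": (-10872.4, 21760.8),
    "SDIAG": (-10924.9, 21863.8),
    "SCS": (-10794.0, 21602.1),
    "SCS+DIAG": (-10790.6, 21597.2),
    "SFAM": (-10787.8, 21591.6),
    "PDIAG": (-10876.4, 21770.8),
    "PCS": (-10747.2, 21512.5),
    "PCS+DIAG": (-10744.6, 21511.2),
    "PFAM": (-10744.2, 21510.4),
}

def implied_parameter_count(log_likelihood, aic):
    """Solve AIC = -2 log L + 2p for p (assumed definition of AIC)."""
    return (aic + 2.0 * log_likelihood) / 2.0

implied = {name: round(implied_parameter_count(ll, aic))
           for name, (ll, aic) in models.items()}
best_model = min(models, key=lambda name: models[name][1])

for name, p in implied.items():
    print(f"{name}: about {p} parameters")
print("Lowest AIC:", best_model)
```

Under this assumption the tabulated values are consistent, up to rounding, with whole-number parameter counts, and the partitioned factor analytic model (PFAM) has the lowest AIC.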
Table 5. REML estimates of variance components of the models fitted (Table 4)
| Model Type | Model | Form of var(um) | Form of var(ue) | STY or MET | var(um): Year 2010 | var(um): Year 2011 | var(um): Covar | var(ue): Year 2010 | var(ue): Year 2011 | var(ue): Covar | Residual: Year 2010 | Residual: Year 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phenotypic | EDIAG | | DIAG | STY | | | | 111.65 | 86.35 | 0 | 39.23 | 95.41 |
| | ECS | | CS | MET | | | | 103.22 | 103.22 | 92.77 | 39.37 | 93.73 |
| | ECS+DIAG | | CS+DIAG | MET | | | | 109.05 | 89.45 | 89.45 | 39.23 | 94.95 |
| | EFAM | | FAM | MET | | | | 109.99 | 88.18 | 89.59 | 39.23 | 95.29 |
| Standard | SDIAG | DIAG | | STY | 0.0703 | 0.0365 | 0 | | | | 40.72 | 101.26 |
| | SCS | CS | | MET | 0.0659 | 0.0659 | 0.0626 | | | | 41.23 | 97.99 |
| | SCS+DIAG | CS+DIAG | | MET | 0.0670 | 0.0595 | 0.0595 | | | | 41.01 | 98.09 |
| | SFAM | FAM | | MET | 0.0718 | 0.0519 | 0.0584 | | | | 40.76 | 99.83 |
| Partitioned | PDIAG | DIAG | DIAG | STY | 0.0265 | 0.0194 | 0 | 27.67 | 23.43 | 0 | 39.24 | 95.73 |
| | PCS | CS | CS | MET | 0.0240 | 0.0240 | 0.0227 | 27.04 | 27.04 | 19.97 | 39.35 | 94.11 |
| | PCS+DIAG | CS+DIAG | CS+DIAG | MET | 0.0246 | 0.0218 | 0.0218 | 28.84 | 22.03 | 19.54 | 39.23 | 95.51 |
| | PFAM | FAM | FAM | MET | 0.0260 | 0.0205 | 0.0221 | 27.86 | 23.46 | 19.35 | 39.23 | 95.48 |
STY, single trial year; MET, multi-environment trial; Covar, covariance between trial year 2010 and trial year 2011.
All models derive from Equation 1 but are special cases of g (Equation 2).
Details of forms of m are given in Table 1 and Table 4.
For two trials the US model is equivalent to the FAM1 model.
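To show how a marker variance and a residual variance such as those tabulated above enter ridge regression BLUP, here is a deliberately simplified sketch. It is not the single-stage mixed model of the source: it has one observation per line, no spatial, design, or nonmarker genetic terms, and a single environment. The function names, the simulated 0/1 marker scores, and all variance values are assumptions for illustration only.

```python
import numpy as np

def rr_blup_fit(Z, y, var_marker, var_residual):
    """Ridge-regression BLUP of marker effects for y = mu + Z u + e."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    shrinkage = var_residual / var_marker        # ridge penalty (variance ratio)
    marker_means = Z.mean(axis=0)
    Zc = Z - marker_means                        # centre markers so mu is the mean of y
    mu = y.mean()
    lhs = Zc.T @ Zc + shrinkage * np.eye(Z.shape[1])
    u_hat = np.linalg.solve(lhs, Zc.T @ (y - mu))
    return mu, marker_means, u_hat

def gebv(Z_new, marker_means, u_hat):
    """Genomic estimated breeding values for lines with marker scores Z_new."""
    return (np.asarray(Z_new, dtype=float) - marker_means) @ u_hat

# Simulated example: 60 lines, 200 markers (hypothetical data)
rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(60, 200)).astype(float)
true_u = rng.normal(0.0, 0.3, size=200)
y = 90.0 + Z @ true_u + rng.normal(0.0, 3.0, size=60)

# Train on the first 50 lines, predict the remaining 10
mu, means, u_hat = rr_blup_fit(Z[:50], y[:50], var_marker=0.09, var_residual=9.0)
predicted = gebv(Z[50:], means, u_hat)
print(predicted.round(2))
```

A smaller marker variance relative to the residual variance gives a larger penalty and therefore stronger shrinkage of the marker effects toward zero.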
Table 6. Average R-squared (SD) of partitioned versus standard RR-BLUP model for different cross-validations (Table 3), models (Table 4), and effects used to generate the GV and GEBV
| Comparison | Phenotypic Model | Line Effects to Generate GV | Standard Model | Partitioned Model | Marker Effects to Generate GEBV | CV40: Standard Model | CV40: Partitioned Model | CV20: Standard Model | CV20: Partitioned Model | CV10: Standard Model | CV10: Partitioned Model |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EDIAG | 2010 | SDIAG | PDIAG | | 0.366(0.047) | 0.406(0.048) | 0.390(0.055) | 0.438(0.064) | 0.404(0.085) | 0.461(0.093) |
| 2 | EDIAG | 2010 | SDIAG | PDIAG | | 0.333(0.055) | 0.359(0.062) | 0.362(0.091) | 0.383(0.095) | 0.387(0.139) | 0.407(0.137) |
| 3 | EDIAG | 2011 | SDIAG | PDIAG | | 0.280(0.043) | 0.298(0.046) | 0.304(0.081) | 0.318(0.081) | 0.323(0.144) | 0.334(0.138) |
| 4 | EDIAG | 2011 | SDIAG | PDIAG | | 0.252(0.050) | 0.288(0.043) | 0.267(0.070) | 0.307(0.060) | 0.277(0.122) | 0.319(0.097) |
| 5 | ECS+DIAG | Total 2010 | SCS+DIAG | PCS+DIAG | | 0.368(0.044) | 0.410(0.046) | 0.392(0.049) | 0.439(0.061) | 0.406(0.078) | 0.462(0.088) |
| 6 | ECS+DIAG | Total 2011 | SCS+DIAG | PCS+DIAG | | 0.360(0.041) | 0.401(0.043) | 0.382(0.053) | 0.428(0.062) | 0.395(0.089) | 0.448(0.098) |
| 7 | EFAM | Total 2010 | SFAM | PFAM | | 0.320(0.044) | 0.376(0.051) | 0.335(0.059) | 0.403(0.071) | 0.344(0.093) | 0.423(0.097) |
| 8 | EFAM | Total 2011 | SFAM | PFAM | | 0.365(0.043) | 0.402(0.048) | 0.386(0.052) | 0.430(0.056) | 0.396(0.085) | 0.448(0.083) |
| 9 | EDIAG | 2010 | SCS+DIAG | PCS+DIAG | | 0.350(0.048) | 0.392(0.056) | 0.371(0.068) | 0.419(0.082) | 0.387(0.101) | 0.442(0.115) |
| 10 | EDIAG | 2011 | SCS+DIAG | PCS+DIAG | | 0.271(0.040) | 0.302(0.040) | 0.289(0.066) | 0.322(0.068) | 0.300(0.117) | 0.336(0.115) |
| 11 | EDIAG | 2010 | SFAM | PFAM | | 0.323(0.059) | 0.377(0.063) | 0.336(0.087) | 0.404(0.092) | 0.349(0.122) | 0.430(0.125) |
| 12 | EDIAG | 2011 | SFAM | PFAM | | 0.268(0.039) | 0.299(0.040) | 0.287(0.070) | 0.323(0.070) | 0.302(0.128) | 0.338(0.118) |
GEBV, genomic estimated breeding value; GV, genotypic value; SD, standard deviation.
The R-squared value is from a linear model for the validation set in which the GEBV is the covariate and the GV the response. The R-squared value shown is the average of the R-squared value over the different iterations (Table 2). Large R-squared values indicate better predictive ability.
GV are calculated using a phenotypic model with all of the lines.
Marker effects from the DIAG form are in bold with the year of trial shown, and are equivalent to results from a single trial year analysis. Marker effects for the MET analyses are in bold and italic; three marker effects are possible (main, interaction 2010, and interaction 2011), with the sum of the main and interaction marker effects being equivalent to a total marker effect for a particular year.
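The R-squared measure described in the footnotes above can be sketched as follows: regress GV on GEBV within a validation set, take the R-squared of that fit, then average over iterations. The function names and the small numeric example are hypothetical, and using the sample standard deviation (ddof=1) for the SD is an assumption.

```python
import numpy as np

def validation_r_squared(gebv, gv):
    """R-squared of the linear model GV ~ GEBV for one validation set."""
    gebv = np.asarray(gebv, dtype=float)
    gv = np.asarray(gv, dtype=float)
    slope, intercept = np.polyfit(gebv, gv, deg=1)
    residuals = gv - (intercept + slope * gebv)
    ss_residual = np.sum(residuals ** 2)
    ss_total = np.sum((gv - gv.mean()) ** 2)
    return 1.0 - ss_residual / ss_total

def mean_and_sd(values):
    """Average and sample SD of a measure across cross-validation iterations."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Hypothetical GEBV and GV values for two validation sets
iteration_1 = validation_r_squared([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
iteration_2 = validation_r_squared([0.5, 1.5, 2.5, 3.5], [1.0, 2.5, 2.0, 4.0])
print(mean_and_sd([iteration_1, iteration_2]))
```

For a single covariate, this R-squared equals the squared Pearson correlation between GEBV and GV.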
Table 7. Average mean square error (SD) of partitioned versus standard RR-BLUP model for cross-validation (Table 3), models (Table 4), and effects used to generate the GV and GEBV
| Comparison | Phenotypic Model | Line Effects to Generate GV | Standard Model | Partitioned Model | Marker Effects to Generate GEBV | CV40: Standard Model | CV40: Partitioned Model | CV20: Standard Model | CV20: Partitioned Model | CV10: Standard Model | CV10: Partitioned Model |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | EDIAG | 2010 | SDIAG | PDIAG | | 8.13(0.74) | 7.86(0.72) | 7.96(1.09) | 7.64(1.09) | 7.85(1.59) | 7.47(1.57) |
| 2 | EDIAG | 2010 | SDIAG | PDIAG | | 8.27(0.55) | 8.10(0.54) | 8.04(0.77) | 7.91(0.78) | 7.82(1.05) | 7.70(1.08) |
| 3 | EDIAG | 2011 | SDIAG | PDIAG | | 6.85(0.40) | 6.77(0.39) | 6.72(0.60) | 6.65(0.59) | 6.57(0.90) | 6.52(0.89) |
| 4 | EDIAG | 2011 | SDIAG | PDIAG | | 7.06(0.51) | 6.89(0.49) | 6.98(0.73) | 6.80(0.71) | 6.92(1.09) | 6.72(1.01) |
| 5 | ECS+DIAG | Total 2010 | SCS+DIAG | PCS+DIAG | | 8.01(0.75) | 7.74(0.70) | 7.85(1.09) | 7.53(1.04) | 7.73(1.55) | 7.35(1.49) |
| 6 | ECS+DIAG | Total 2011 | SCS+DIAG | PCS+DIAG | | 7.02(0.57) | 6.79(0.55) | 6.88(0.77) | 6.61(0.75) | 6.75(1.02) | 6.44(1.01) |
| 7 | EFAM | Total 2010 | SFAM | PFAM | | 9.98(0.87) | 9.55(0.83) | 9.87(1.39) | 9.34(1.34) | 9.80(2.07) | 9.19(1.96) |
| 8 | EFAM | Total 2011 | SFAM | PFAM | | 7.02(0.62) | 6.81(0.61) | 6.88(0.87) | 6.63(0.83) | 6.79(1.22) | 6.49(1.13) |
| 9 | EDIAG | 2010 | SCS+DIAG | PCS+DIAG | | 8.06(0.63) | 7.79(0.61) | 7.89(0.87) | 7.58(0.88) | 7.75(1.17) | 7.38(1.18) |
| 10 | EDIAG | 2011 | SCS+DIAG | PCS+DIAG | | 6.87(0.42) | 6.73(0.41) | 6.78(0.59) | 6.62(0.60) | 6.69(0.85) | 6.52(0.86) |
| 11 | EDIAG | 2010 | SFAM | PFAM | | 8.21(0.55) | 7.88(0.57) | 8.09(0.77) | 7.66(0.85) | 7.94(1.00) | 7.45(1.15) |
| 12 | EDIAG | 2011 | SFAM | PFAM | | 6.89(0.40) | 6.74(0.40) | 6.78(0.57) | 6.61(0.59) | 6.66(0.81) | 6.50(0.85) |
GEBV, genomic estimated breeding value; GV, genotypic value; SD, standard deviation.
The mean square error value is from a linear model for the validation set, in which the GEBV is the covariate and the GV the response. The mean square error shown is the average of the mean square error over the different iterations (Table 2). Lower mean square error indicates more accurate and precise estimates of GEBV.
GV are calculated using a phenotypic model with all of the lines.
Marker effects from the DIAG form are in bold with the year of trial shown, and are equivalent to results from a single trial year analysis. Marker effects for the MET analyses are in bold and italic; three marker effects are possible (main, interaction 2010, and interaction 2011), with the sum of the main and interaction marker effects being equivalent to a total marker effect for a particular year.
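The mean square error in the table above comes from the same kind of validation regression of GV on GEBV. A minimal sketch follows, assuming the mean square error is the residual mean square of that fit, that is, the residual sum of squares divided by n − 2. The source does not spell out the divisor, so this choice and the function name are assumptions.

```python
import numpy as np

def validation_mean_square_error(gebv, gv):
    """Residual mean square of the linear model GV ~ GEBV (assumed divisor n - 2)."""
    gebv = np.asarray(gebv, dtype=float)
    gv = np.asarray(gv, dtype=float)
    slope, intercept = np.polyfit(gebv, gv, deg=1)
    residuals = gv - (intercept + slope * gebv)
    degrees_of_freedom = len(gv) - 2   # two fitted coefficients
    return np.sum(residuals ** 2) / degrees_of_freedom

# Hypothetical validation set of three lines
mse = validation_mean_square_error([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(mse)
```

Averaging this quantity over all cross-validation iterations, with its SD, would give entries of the kind tabulated.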
Figure 2. Comparison of partitioned vs. standard RR-BLUP model of CV10 (Table 3) for different forms (Table 1) and comparisons (Table 6) for trial year 2010 across a range of subsets of random markers. The horizontal line is the maximum predictive ability of the standard single trial year analysis for 2010. Each subset represents the average results from 200 different sets of random markers; the comparisons across analyses are on the same subsets of random markers. MET, multi-environment trial analysis; STY, single trial year analysis; part, partitioned model; std, standard model; total, main marker effect + marker by trial interaction effect; main, main marker effect; same, same year used for prediction (2010 GEBV used to predict 2010 GV); opp, opposite year used for prediction (2011 GEBV used to predict 2010 GV); C, see Comparison as per Table 6 for more detail.
Figure 3. Comparison of partitioned vs. standard RR-BLUP model of CV10 (Table 3) for different forms (Table 1) and comparisons (Table 6) for trial year 2011 across a range of subsets of random markers. The horizontal line is the maximum predictive ability of the standard single trial year analysis for 2011. Each subset represents the average results from 200 different sets of random markers; the comparisons across analyses are on the same subsets of random markers. MET, multi-environment trial analysis; STY, single trial year analysis; part, partitioned model; std, standard model; total, main marker effect + marker by trial interaction effect; main, main marker effect; same, same year used for prediction (2011 GEBV used to predict 2011 GV); opp, opposite year used for prediction (2010 GEBV used to predict 2011 GV); C, see Comparison as per Table 6 for more detail.