Diego Jarquín, José Crossa, Xavier Lacaze, Philippe Du Cheyron, Joëlle Daucourt, Josiane Lorgeou, François Piraux, Laurent Guerreiro, Paulino Pérez, Mario Calus, Juan Burgueño, Gustavo de los Campos.
Abstract
New methods that incorporate the main and interaction effects of high-dimensional markers and of high-dimensional environmental covariates gave increased prediction accuracy of grain yield in wheat across and within environments. In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of a (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods like the one presented here, which can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
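The covariance-function idea in the abstract can be illustrated with a small sketch. With 2,395 SNPs and 68 ECs there would be 2,395 × 68 = 162,860 marker-by-EC interaction terms to fit explicitly, whereas a covariance function only needs one matrix with a row and column per record. The sketch below is not the authors' code. It assumes the marker-derived covariance is XX'/p, the EC-derived covariance is WW'/q, and that the interaction covariance between two records is the product of their genomic and environmental covariances, which is one common construction for reaction-norm models. All names and the toy data are hypothetical.

```python
def linear_kernel(rows):
    """Return M M' / (number of columns) for a matrix given as a list of rows."""
    n_cols = len(rows[0])
    return [[sum(a * b for a, b in zip(r_i, r_j)) / n_cols for r_j in rows]
            for r_i in rows]

# Hypothetical toy data (columns already centered):
# 3 lines x 4 marker codes, 2 environments x 3 environmental covariates.
X = [[1, -1, 1, 0],
     [-1, 1, 0, 1],
     [0, 0, -1, -1]]
W = [[0.5, -1.0, 0.2],
     [-0.5, 1.0, -0.2]]

G = linear_kernel(X)      # covariance among lines derived from markers
Omega = linear_kernel(W)  # covariance among environments derived from ECs

# One record per (line, environment) combination.
records = [(line, env) for line in range(len(X)) for env in range(len(W))]

# Assumed interaction covariance: element-wise product of the genomic and
# environmental covariances of the two records being compared.
K_gw = [[G[li][lj] * Omega[ei][ej] for (lj, ej) in records]
        for (li, ei) in records]

print(len(K_gw), "x", len(K_gw[0]), "interaction covariance matrix")
```

The size of `K_gw` depends only on the number of records, not on the number of markers or ECs, which is what makes the high-dimensional case tractable.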
Year: 2013 PMID: 24337101 PMCID: PMC3931944 DOI: 10.1007/s00122-013-2243-1
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Main effect and interaction of the seven models used to fit the data set
| Model abbreviation | E | L | G | W | G × E | G × W |
|---|---|---|---|---|---|---|
| EL | X | X | | | | |
| EG | X | | X | | | |
| ELW | X | X | | X | | |
| EGW | X | | X | X | | |
| EGW-G × E | X | | X | X | X | |
| EGW-G × W | X | | X | X | | X |
| EGW-G × WG × E | X | | X | X | X | X |

Columns E, L, G and W are main effects; columns G × E and G × W are interactions.
E environment, L line, G marker covariates, W environmental covariates, G × E interaction between environments and markers, G × W interactions between markers and ECs
Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1–5) and five environments (E1–E5)
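| Line | CV1 E1 | CV1 E2 | CV1 E3 | CV1 E4 | CV1 E5 | CV2 E1 | CV2 E2 | CV2 E3 | CV2 E4 | CV2 E5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Line 1 | $Y_{11}$ | $Y_{12}$ | $Y_{13}$ | $Y_{14}$ | $Y_{15}$ | $Y_{11}$ | NA | $Y_{13}$ | $Y_{14}$ | $Y_{15}$ |
| Line 2 | $Y_{21}$ | $Y_{22}$ | $Y_{23}$ | $Y_{24}$ | $Y_{25}$ | $Y_{21}$ | $Y_{22}$ | NA | $Y_{24}$ | $Y_{25}$ |
| Line 3 | NA | NA | NA | NA | NA | $Y_{31}$ | $Y_{32}$ | $Y_{33}$ | $Y_{34}$ | NA |
| Line 4 | $Y_{41}$ | $Y_{42}$ | $Y_{43}$ | $Y_{44}$ | $Y_{45}$ | $Y_{41}$ | $Y_{42}$ | $Y_{43}$ | NA | $Y_{45}$ |
| Line 5 | $Y_{51}$ | $Y_{52}$ | $Y_{53}$ | $Y_{54}$ | $Y_{55}$ | NA | $Y_{52}$ | $Y_{53}$ | $Y_{54}$ | $Y_{55}$ |

Lines with unobserved phenotypic data in the cross-validation scheme are indicated by NA (not available); the observed value of line i in environment j is denoted $Y_{ij}$ (i, j = 1, 2, 3, 4, 5)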
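The two masking patterns can be sketched in a few lines. This is a minimal illustration of the hypothetical five-by-five layout only, not the fold assignment used in the study: the function names, the choice of one hidden environment per line in CV2, and the random seed are all assumptions.

```python
import random

def cv1_mask(lines, envs, held_out_lines):
    """CV1: every record of a held-out line is masked (NA) in all environments."""
    return {(line, env): line in held_out_lines for line in lines for env in envs}

def cv2_mask(lines, envs, n_missing_per_line=1, seed=0):
    """CV2: each line is masked in some environments and observed in the others."""
    rng = random.Random(seed)
    mask = {}
    for line in lines:
        hidden = set(rng.sample(envs, n_missing_per_line))
        for env in envs:
            mask[(line, env)] = env in hidden
    return mask

lines = [1, 2, 3, 4, 5]
envs = ["E1", "E2", "E3", "E4", "E5"]

cv1 = cv1_mask(lines, envs, held_out_lines={3})  # Line 3 is never observed
cv2 = cv2_mask(lines, envs)                      # each line misses one environment

for name, mask in (("CV1", cv1), ("CV2", cv2)):
    print(name)
    for line in lines:
        print(" ", line, ["NA" if mask[(line, env)] else "Y" for env in envs])
```

In CV1 the prediction for a masked line cannot borrow from its own records, so it has to rely on other lines. In CV2 the same line is still observed in other environments.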
Fig. 1 Histogram of grain yield in quintals per hectare
Fig. 2 Scree plot (left panel) and loadings of the first two eigenvectors (right panel) of the covariance matrices derived from markers (top panel) and from environmental covariates (lower panel)
Estimated variance components
| Model | E | L | G | W | G × E | G × W | Res. | % L | % G | % W | % G × E | % G × W | % Res. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EL | 199.7 | 12.3 | | | | | 22.6 | 35.2 | | | | | 64.8 |
| EG | 199.0 | | 14.9 | | | | 22.6 | | 39.7 | | | | 60.3 |
| EW | 153.5 | | | 23.7 | | | 27.7 | | | 46.1 | | | 53.9 |
| ELW | 143.7 | 12.9 | | 24.9 | | | 22.1 | 21.5 | | 41.6 | | | 36.9 |
| EGW | 145.6 | | 14.6 | 23.3 | | | 22.1 | | 24.3 | 38.8 | | | 36.8 |
| EGW-G × E | 148.0 | | 14.4 | 20.2 | 5.9 | | 16.4 | | 25.3 | 35.5 | 10.4 | | 28.8 |
| EGW-G × W | 146.7 | | 12.8 | 22.3 | | 5.3 | 18.3 | | 21.8 | 38.0 | | 9.0 | 31.2 |
| EGW-G × WG × E | 148.6 | | 12.7 | 19.8 | 3.9 | 5.0 | 14.9 | | 22.6 | 35.2 | 6.9 | 8.9 | 26.5 |

Columns E to Res. give the variance component estimates; columns prefixed with % give the percentage of the within-environment variance (a).
E environment, L line, G genomic [marker] information, W environmental covariate [EC] information, G × E genotype × environment and G × W genotype × EC interaction, and Res model residual
(a) Relative to the total variance minus the variance due to the main effect of the environment
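As a worked check of how the percentages relate to the estimates, the short sketch below recomputes them for the most comprehensive model. The values are read from the EGW-G × WG × E row, with the environment main effect (148.6) left out of the denominator as the footnote describes. The dictionary keys are illustrative labels, not names from the source.

```python
# Variance component estimates from the EGW-G × WG × E row (E excluded).
estimates = {"G": 12.7, "W": 19.8, "GxE": 3.9, "GxW": 5.0, "Res": 14.9}

# Within-environment variance: total variance minus the environment main effect.
within_env_total = sum(estimates.values())

# Each component as a percentage of the within-environment variance.
percent = {name: round(100 * value / within_env_total, 1)
           for name, value in estimates.items()}

# Combined share of the two interaction terms.
interaction_share = 100 * (estimates["GxE"] + estimates["GxW"]) / within_env_total

print(percent)
print(round(interaction_share, 1))
```

The recomputed percentages agree with the tabulated ones, and the two interaction terms together come to about 15.8 %, in line with the 16 % quoted in the abstract.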
Estimated correlations between adjusted phenotypes and cross-validation prediction for each of the seven models for cross-validation CV1 (prediction without using phenotypic records of the lines whose performance is predicted, i.e., prediction for un-tested lines) and CV2 (prediction in incomplete field trials)
| Model | CV1 estimate | CV1 95 % CI, parametric (a) | CV1 95 % CI, non-parametric (b) | CV2 estimate | CV2 95 % CI, parametric (a) | CV2 95 % CI, non-parametric (b) |
|---|---|---|---|---|---|---|
| EL | 0.090 | [0.068; 0.112] | [0.063; 0.117] | 0.425 | [0.405; 0.445] | [0.404; 0.447] |
| EG | 0.191 | [0.169; 0.213] | [0.167; 0.215] | 0.426 | [0.406; 0.446] | [0.404; 0.448] |
| ELW | −0.027 | [−0.049; −0.005] | [−0.050; −0.004] | 0.438 | [0.418; 0.458] | [0.416; 0.459] |
| EGW | 0.175 | [0.153; 0.197] | [0.151; 0.198] | 0.439 | [0.419; 0.459] | [0.417; 0.460] |
| EGW-G × E | 0.209 | [0.187; 0.231] | [0.185; 0.232] | 0.454 | [0.434; 0.474] | [0.432; 0.476] |
| EGW-G × W | 0.214 | [0.192; 0.236] | [0.191; 0.237] | 0.506 | [0.486; 0.525] | [0.495; 0.525] |
| EGW-G × WG × E | 0.236 | [0.215; 0.257] | [0.213; 0.259] | 0.514 | [0.495; 0.533] | [0.494; 0.535] |
(a) Computed using $\hat{\rho} \pm 1.96\,\sqrt{(1-\hat{\rho}^{2})/(n-2)}$, where $\hat{\rho}$ is the estimated correlation and n is the number of records used to compute the correlation
(b) Obtained by bootstrapping the vectors of CV-adjusted predictions and CV-adjusted phenotypes 10,000 times
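The two kinds of interval can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the parametric interval assumes the usual large-sample standard error sqrt((1 − r²)/(n − 2)) with a 1.96 multiplier, the bootstrap interval is a plain percentile interval from resampling (prediction, phenotype) pairs, and the data are simulated. The function names are hypothetical, and 1,000 replicates are used here instead of 10,000 only to keep the example quick.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def parametric_ci(r, n, z=1.96):
    """Interval r +/- z * SE with the assumed SE = sqrt((1 - r^2) / (n - 2))."""
    se = math.sqrt((1 - r * r) / (n - 2))
    return r - z * se, r + z * se

def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval from resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated stand-ins for CV-adjusted predictions and phenotypes.
rng = random.Random(1)
pred = [rng.gauss(0, 1) for _ in range(200)]
obs = [0.5 * p + rng.gauss(0, 1) for p in pred]

r = pearson(pred, obs)
par_lo, par_hi = parametric_ci(r, len(pred))
boot_lo, boot_hi = bootstrap_ci(pred, obs)
print(round(r, 3), (round(par_lo, 3), round(par_hi, 3)),
      (round(boot_lo, 3), round(boot_hi, 3)))
```

With many records the two intervals tend to be close, which is the pattern seen in the table.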
Fig. 3 Adjusted phenotype versus adjusted cross-validation predictions, derived from the most comprehensive model (EGW-G × WG × E) in two cross-validation designs (CV 1: prediction without using phenotypic records of the lines whose performance is predicted, i.e., prediction of un-tested lines; and CV 2: prediction in incomplete field trials). Horizontal and vertical dashed lines give the 20, 50 and 80 % empirical percentiles of the variables on the vertical and horizontal axes, and the numbers inside the grid give the observed proportions of each of the four groups defined by the percentiles of observed adjusted yield, given the groups defined on the horizontal axis (predictions)
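The grid of proportions described in the caption can be reproduced along these lines. This is a rough sketch with assumed details: percentiles are taken by nearest rank, values equal to a cut point fall into the upper group, and the function names are hypothetical.

```python
from bisect import bisect_right

def percentile(sorted_vals, p):
    """Empirical percentile by nearest rank (p between 0 and 100)."""
    k = int(round(p / 100 * len(sorted_vals))) - 1
    k = max(0, min(len(sorted_vals) - 1, k))
    return sorted_vals[k]

def group_of(value, cuts):
    """Index of the group (0 .. len(cuts)) that a value falls into."""
    return bisect_right(cuts, value)

def conditional_proportions(pred, obs, probs=(20, 50, 80)):
    """Rows: prediction groups; columns: observed groups; each row sums to 1."""
    pred_cuts = [percentile(sorted(pred), p) for p in probs]
    obs_cuts = [percentile(sorted(obs), p) for p in probs]
    k = len(probs) + 1
    counts = [[0] * k for _ in range(k)]
    for p, o in zip(pred, obs):
        counts[group_of(p, pred_cuts)][group_of(o, obs_cuts)] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

# Toy check: if predictions equalled observations, every row would put all of
# its mass on the diagonal.
values = list(range(1, 101))
table = conditional_proportions(values, values)
for row in table:
    print([round(v, 2) for v in row])
```

Each row answers the question the figure poses: given that a record was predicted to fall in a certain percentile group, how often did its observed adjusted yield fall in each group.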