Caio Canella Vieira¹, Reyna Persa², Pengyin Chen¹, Diego Jarquin².
Abstract
The availability of high-dimensional molecular markers has allowed plant breeding programs to maximize their efficiency through the genomic prediction of phenotypes of interest. Yield is a complex quantitative trait whose expression is sensitive to environmental stimuli. In this research, we investigated the potential of incorporating soil texture information and its interaction with molecular markers via covariance structures to enhance predictive ability across breeding scenarios. A total of 797 soybean lines derived from 367 unique bi-parental populations were genotyped using the Illumina BARCSoySNP6K and tested for yield over 5 years in Tiptonville silt loam, Sharkey clay, and Malden fine sand environments. Four statistical models were considered: the GBLUP model (M1); the reaction norm model (M2), which includes the interaction between molecular markers and the environment (G×E); an extended version of M2 that also includes soil type (S) and the interaction between soil type and molecular markers (G×S) (M3); and a parsimonious version of M3 that discards the G×E term (M4). Four cross-validation scenarios simulating progeny testing and line selection of tested and untested genotypes (TG, UG) in observed and unobserved environments (OE, UE) were implemented: CV2 (TG, OE), CV1 (UG, OE), CV0 (TG, UE), and CV00 (UG, UE). Across environments, adding the G×S interaction in M3 decreased the share of variability captured by the environment (-30.4%) and residual (-39.2%) terms compared with M1. Within environments, the G×S term in M3 reduced the variability captured by the residual term by 60% and 30% relative to M1 and M2, respectively. M3 outperformed all other models in CV2 (0.577), CV1 (0.480), and CV0 (0.488) in terms of the correlation between observed and predicted values.
In addition to the Pearson correlation, other measures of predictive ability were considered; these showed that adding soil texture appears to partition the environmental term, revealing components that can enhance or hinder a model's predictive ability, especially in the most complex prediction scenario (CV00). Hence, soil texture information available before the growing season could be used to optimize the efficiency of a breeding program by informing field experimental design and resource allocation, reducing preliminary trials, and shortening the breeding cycle.
Keywords: genetic gain; genomic prediction/selection; genotype × environment (G×E) interaction; soil covariates; soybean breeding
Year: 2022 PMID: 36159995 PMCID: PMC9493273 DOI: 10.3389/fgene.2022.905824
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
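The covariance structures behind M1–M4 (the paper's Eqs 1–4) are not reproduced in this record. A minimal sketch, assuming the usual reaction-norm construction in which each interaction kernel is the Hadamard (element-wise) product of the record-level genomic relationship matrix with an environment or soil-type incidence kernel, is shown below. The standardized-marker form of G, the record layout, the function names, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Record = Tuple[int, str, str]  # hypothetical layout: (line index into G, environment, soil type)


def genomic_relationship(markers: Sequence[Sequence[float]]) -> Matrix:
    """G = X X' / p, with X holding centered and standardized marker columns.

    Monomorphic markers (zero variance) are dropped before standardizing.
    """
    n = len(markers)
    cols = []
    for j in range(len(markers[0])):
        col = [float(markers[i][j]) for i in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        if var > 0:
            sd = var ** 0.5
            cols.append([(x - mean) / sd for x in col])
    p = len(cols)
    return [[sum(c[i] * c[k] for c in cols) / p for k in range(n)] for i in range(n)]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def same_level(labels: Sequence[str]) -> Matrix:
    """Z Z' for an incidence matrix Z: 1 when two records share a level, else 0."""
    return [[1.0 if u == v else 0.0 for v in labels] for u in labels]


def record_kernels(G: Matrix, records: Sequence[Record]):
    """Expand G to the record level and build the interaction kernels.

    Returns K_G = Z_L G Z_L', K_GE = K_G o Z_E Z_E', and K_GS = K_G o Z_S Z_S'.
    """
    lines = [r[0] for r in records]
    K_G = [[G[a][b] for b in lines] for a in lines]
    K_GE = hadamard(K_G, same_level([r[1] for r in records]))
    K_GS = hadamard(K_G, same_level([r[2] for r in records]))
    return K_G, K_GE, K_GS


# Toy data: 3 lines coded 0/1/2 at 4 markers (two of them monomorphic).
markers = [[0, 1, 2, 0], [2, 1, 0, 0], [1, 1, 2, 0]]
records = [(0, "E1", "clay"), (0, "E2", "sand"), (1, "E1", "clay"), (2, "E3", "sand")]
G = genomic_relationship(markers)
K_G, K_GE, K_GS = record_kernels(G, records)
# Records 1 and 3 come from different environments but share a soil type.
print(K_GE[1][3], K_GS[1][3])
```

In this construction, the G×S kernel lets records from different environments that share a soil type borrow genomic information from each other, which the G×E kernel alone does not allow.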
Percentage of phenotypic variability explained by the different model components across and within environments for the four models (M1–M4).
| Model | Across: E | Across: L | Across: S | Across: G | Across: G×E | Across: G×S | Across: R | Within: L | Within: S | Within: G | Within: G×E | Within: G×S | Within: R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| M1: E+L+G | 65.7 | 1.2 | | 7.3 | | | 25.8 | 3.6 | | 21.3 | | | 75.2 |
| M2: E+L+G+G×E | 65.7 | 1.6 | | 6.0 | 12.7 | | 14.0 | 4.5 | | 17.5 | 37.1 | | 40.9 |
| M3: E+L+S+G+G×E+G×S | 45.7 | 2.2 | 12.5 | 3.5 | 10.9 | 9.5 | 15.7 | 4.0 | 23.1 | 6.5 | 20.0 | 17.5 | 28.9 |
| M4: E+L+S+G+G×S | 64.0 | 1.5 | 0.0 | 4.2 | | 9.5 | 20.8 | 4.2 | | 11.6 | | 26.4 | 57.8 |
The letters E, L, S, and G denote the main effects of environments, genotypes, soil type, and molecular markers, respectively, whereas G×E and G×S represent the interactions of each molecular marker with environments and soil type, respectively. The residual variance is denoted by R.
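The percentage changes quoted in the abstract can be recomputed from the tabulated variance shares. This sketch transcribes the rounded values from the table above and also checks that each row sums to roughly 100%. Recomputing from the rounded entries gives -39.1% rather than the reported -39.2% for the across-environment residual, a gap that presumably reflects rounding of the published percentages.

```python
# Variance shares (%) transcribed from the table; only terms reported for each model are listed.
ACROSS = {
    "M1": {"E": 65.7, "L": 1.2, "G": 7.3, "R": 25.8},
    "M2": {"E": 65.7, "L": 1.6, "G": 6.0, "GxE": 12.7, "R": 14.0},
    "M3": {"E": 45.7, "L": 2.2, "S": 12.5, "G": 3.5, "GxE": 10.9, "GxS": 9.5, "R": 15.7},
    "M4": {"E": 64.0, "L": 1.5, "S": 0.0, "G": 4.2, "GxS": 9.5, "R": 20.8},
}
WITHIN = {
    "M1": {"L": 3.6, "G": 21.3, "R": 75.2},
    "M2": {"L": 4.5, "G": 17.5, "GxE": 37.1, "R": 40.9},
    "M3": {"L": 4.0, "S": 23.1, "G": 6.5, "GxE": 20.0, "GxS": 17.5, "R": 28.9},
    "M4": {"L": 4.2, "G": 11.6, "GxS": 26.4, "R": 57.8},
}


def relative_change(new, old):
    """Percentage change of a variance share relative to a baseline model."""
    return 100.0 * (new - old) / old


# Each row should sum to about 100%; the tolerance covers rounding of the published values.
for table in (ACROSS, WITHIN):
    for model, shares in table.items():
        assert abs(sum(shares.values()) - 100.0) <= 0.2, model

changes = {
    "across E, M3 vs M1": relative_change(ACROSS["M3"]["E"], ACROSS["M1"]["E"]),
    "across R, M3 vs M1": relative_change(ACROSS["M3"]["R"], ACROSS["M1"]["R"]),
    "within R, M3 vs M1": relative_change(WITHIN["M3"]["R"], WITHIN["M1"]["R"]),
    "within R, M3 vs M2": relative_change(WITHIN["M3"]["R"], WITHIN["M2"]["R"]),
}
for label, value in changes.items():
    print(f"{label}: {value:.1f}%")
```

Running it prints -30.4%, -39.1%, -61.6%, and -29.3%, consistent with the abstract's -30.4%, -39.2%, and roughly 60% and 30% reductions.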
Weighted average correlation across environments for four cross-validation schemes and four models.
| Model | CV2 | CV1 | CV0 | CV00 |
|---|---|---|---|---|
| M1: E+L+G | 0.461 | 0.359 | 0.461 | |
| M2: E+L+G+G×E | 0.558 | | 0.459 | 0.192 |
| M3: E+L+S+G+G×E+G×S | **0.577** | **0.480** | **0.488** | 0.227 |
| M4: E+L+S+G+G×S | 0.515 | 0.405 | 0.484 | 0.231 |
E, L, S, and G denote the main effects of environments, genotypes, soil type, and molecular markers, respectively; G×E and G×S represent the interactions of each molecular marker with environments and soil type, respectively.
CV2 represents the prediction of incomplete field trials (i.e., some genotypes are tested in some environments but not in others), whereas CV1 assesses the accuracy of predicting newly developed genotypes. CV0 represents the prediction of previously tested genotypes in novel environments, and CV00 assesses new genotypes in novel environments. For CV2 and CV1, 10 replicates of fivefold cross-validation were used, whereas for CV0 and CV00 a leave-one-environment-out scheme was implemented.
Bolded numbers indicate the best model performance for each cross-validation scheme.
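The four cross-validation schemes and the across-environment summary can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout and all function names are hypothetical, and weighting each environment's correlation by its number of test records is an assumption, since this record does not say which weights produced the table.

```python
import random
from collections import defaultdict

# Records are hypothetical (line_id, environment) pairs, one per phenotypic observation.


def cv2_folds(records, k=5, seed=0):
    """CV2: line-by-environment records go to k folds at random, so a line
    masked in one environment is typically still observed in others."""
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)
    folds = [0] * len(records)
    for position, i in enumerate(order):
        folds[i] = position % k
    return folds


def cv1_folds(records, k=5, seed=0):
    """CV1: whole lines go to folds, so a test line has no records in training."""
    rng = random.Random(seed)
    lines = sorted({line for line, _ in records})
    rng.shuffle(lines)
    fold_of = {line: position % k for position, line in enumerate(lines)}
    return [fold_of[line] for line, _ in records]


def leave_environment_out(records, env, untested=False):
    """Return (train, test) record indices for one held-out environment.

    CV0 (untested=False): all other environments are used for training.
    CV00 (untested=True): records of lines that appear in the held-out
    environment are also dropped from training, so test lines are untested.
    """
    test = [i for i, (_, e) in enumerate(records) if e == env]
    test_lines = {records[i][0] for i in test}
    train = [i for i, (line, e) in enumerate(records)
             if e != env and not (untested and line in test_lines)]
    return train, test


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def weighted_mean_correlation(envs, observed, predicted):
    """Within-environment correlations averaged with weights equal to the
    number of test records per environment (an assumed weighting)."""
    groups = defaultdict(lambda: ([], []))
    for e, o, p in zip(envs, observed, predicted):
        groups[e][0].append(o)
        groups[e][1].append(p)
    total = sum(len(obs) * pearson(obs, pred) for obs, pred in groups.values())
    return total / sum(len(obs) for obs, _ in groups.values())


records = [("L1", "E1"), ("L2", "E1"), ("L3", "E1"),
           ("L1", "E2"), ("L2", "E2"), ("L4", "E2")]
cv0_train, cv0_test = leave_environment_out(records, "E2")
cv00_train, cv00_test = leave_environment_out(records, "E2", untested=True)
print(cv0_train, cv0_test)    # [0, 1, 2] [3, 4, 5]
print(cv00_train, cv00_test)  # [2] [3, 4, 5]
```

Calling `cv2_folds` or `cv1_folds` with ten different seeds would mimic the 10 replicates of fivefold cross-validation; for CV0 and CV00, `leave_environment_out` would be looped over every environment.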
FIGURE 1 Genotypic means (BLUPs centered on zero within environments) of observed versus predicted values from cross-validation for four models (M1–M4) under the cross-validation scheme CV2, which mimics the incomplete field trial prediction scenario (predicting tested genotypes in observed environments). Models and terms are described in detail in the Materials and Methods section (Eqs 1–4). Horizontal and vertical dashed lines indicate the 20, 50, and 80% empirical percentiles of the observed and predicted genotypic means; the numbers inside the grid give the conditional proportions on the y-axis for the different percentile bins on the x-axis (e.g., of the top 20% of values predicted with M3 (panel C), 79% [top right] correspond to genotypes whose observed performance was among the top 20% across fields).
FIGURE 2 Genotypic means (BLUPs centered on zero within environments) of observed versus predicted values from cross-validation for four models (M1–M4) under the cross-validation scheme CV1, which mimics the prediction of newly developed lines (predicting untested genotypes in observed environments). Models and terms are described in detail in the Materials and Methods section (Eqs 1–4). Horizontal and vertical dashed lines indicate the 20, 50, and 80% empirical percentiles of the observed and predicted values; the numbers inside the grid give the conditional proportions on the y-axis for the different percentile bins on the x-axis (e.g., of the top 20% of values predicted with M3 (panel C), 48% [top right] correspond to genotypes whose observed performance was among the top 20% in fields).
FIGURE 4 Genotypic means (BLUPs centered on zero within environments) of observed versus predicted values from cross-validation for four models (M1–M4) under the cross-validation scheme CV00, which mimics the prediction of newly developed lines in novel environments (predicting untested genotypes in unobserved environments). Models and terms are described in detail in the Materials and Methods section (Eqs 1–4). Horizontal and vertical dashed lines indicate the 20, 50, and 80% empirical percentiles of the observed and predicted values; the numbers inside the grid give the conditional proportions on the y-axis for the different percentile bins on the x-axis (e.g., of the top 20% of values predicted with M4 (panel D), 39% [top right] correspond to genotypes whose observed performance was among the top 20% in fields).
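The conditional proportions reported in these figures (for example, the share of the top 20% of predicted genotypes that also fall in the top 20% observed) can be computed along these lines. The sketch assumes linear-interpolation percentiles and assigns values that tie a cut point to the lower bin; neither convention is stated in the captions, and the function names are hypothetical.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bin_index(value, cuts):
    """With cuts at the 20th/50th/80th percentiles: 0 = bottom 20%, 1 = 20-50%,
    2 = 50-80%, 3 = top 20%. Values equal to a cut point fall into the lower bin."""
    return sum(value > c for c in cuts)


def conditional_grid(observed, predicted, probs=(0.2, 0.5, 0.8)):
    """grid[i][j] = share of genotypes in predicted bin i (x-axis)
    that fall in observed bin j (y-axis)."""
    obs_cuts = [quantile(observed, q) for q in probs]
    pred_cuts = [quantile(predicted, q) for q in probs]
    n_bins = len(probs) + 1
    counts = [[0] * n_bins for _ in range(n_bins)]
    for o, p in zip(observed, predicted):
        counts[bin_index(p, pred_cuts)][bin_index(o, obs_cuts)] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


observed = [float(x) for x in range(10)]
grid_perfect = conditional_grid(observed, observed)
grid_reversed = conditional_grid(observed, [9.0 - x for x in observed])
print(grid_perfect[3][3], grid_reversed[3][3])  # 1.0 0.0
```

The value reported in each caption's top-right cell corresponds to `grid[3][3]`: a perfect predictor gives 1.0, while a predictor that reverses the ranking places all of its top-20% picks in the bottom observed bin.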