Daniel J Tolhurst, R Chris Gaynor, Brian Gardunia, John M Hickey, Gregor Gorjanc.
Abstract
KEY MESSAGE: The integration of known and latent environmental covariates within a single-stage genomic selection approach provides breeders with an informative and practical framework to utilise genotype by environment interaction for prediction into current and future environments.
This paper develops a single-stage genomic selection approach which integrates known and latent environmental covariates within a special factor analytic framework. The factor analytic linear mixed model of Smith et al. (2001) is an effective method for analysing multi-environment trial (MET) datasets, but has limited practicality since the underlying factors are latent so the modelled genotype by environment interaction (GEI) is observable, rather than predictable. The advantage of using random regressions on known environmental covariates, such as soil moisture and daily temperature, is that the modelled GEI becomes predictable. The integrated factor analytic linear mixed model (IFA-LMM) developed in this paper includes a model for predictable and observable GEI in terms of a joint set of known and latent environmental covariates. The IFA-LMM is demonstrated on a late-stage cotton breeding MET dataset from Bayer CropScience. The results show that the known covariates predominately capture crossover GEI and explain 34.4% of the overall genetic variance. The most notable covariates are maximum downward solar radiation (10.1%), average cloud cover (4.5%) and maximum temperature (4.0%). The latent covariates predominately capture non-crossover GEI and explain 40.5% of the overall genetic variance. The results also show that the average prediction accuracy of the IFA-LMM is [Formula: see text] higher than conventional random regression models for current environments and [Formula: see text] higher for future environments.
The IFA-LMM is therefore an effective method for analysing MET datasets which also utilises crossover and non-crossover GEI for genomic prediction into current and future environments. This is becoming increasingly important with the emergence of rapidly changing environments and climate change.
Year: 2022 PMID: 36066596 PMCID: PMC9519718 DOI: 10.1007/s00122-022-04186-w
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.574
Summary of the 2017 P1 MET dataset for seed cotton yield
| Region | State | Env | Trials | Genotypes (total) | Genotypes (1 rep) | Genotypes (2 rep) | Genotypes (>2 rep) | Plots (total) | Plots (NA) | Mean yield (t/ha) | Heritability |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Southeast | North Carolina | 17NC1 | 3 | 208 | 15 | 189 | 4 | 432 | 16 | 1.43 | 0.48 |
| Southeast | South Carolina | 17SC1 | 3 | 206 | 0 | 202 | 4 | 432 | 5 | 1.63 | 0.59 |
| Southeast | South Carolina | 17SC2 | 3 | 183 | 52 | 127 | 4 | 432 | 107 | 1.94 | 0.46 |
| Southeast | South Carolina | 17SC3 | 3 | 208 | 5 | 199 | 4 | 432 | 5 | 2.32 | 0.50 |
| Southeast | Georgia | 17GA1 | 3 | 208 | 2 | 202 | 4 | 432 | 3 | 1.72 | 0.59 |
| Southeast | Georgia | 17GA2 | 3 | 208 | 2 | 202 | 4 | 432 | 2 | 1.92 | 0.64 |
| Southeast | Georgia | 17GA3 | 3 | 208 | 2 | 202 | 4 | 432 | 2 | 1.74 | 0.50 |
| Southeast | Georgia | 17GA4 | 3 | 208 | 2 | 202 | 4 | 432 | 2 | 1.62 | 0.49 |
| Midsouth | Missouri | 17MO1 | 3 | 207 | 69 | 134 | 4 | 432 | 76 | 1.95 | 0.61 |
| Midsouth | Arkansas | 17AR1 | 3 | 207 | 18 | 185 | 4 | 432 | 20 | 0.99 | 0.24 |
| Midsouth | Arkansas | 17AR2 | 3 | 205 | 2 | 199 | 4 | 432 | 9 | 1.63 | 0.83 |
| Midsouth | Mississippi | 17MS1 | 3 | 204 | 9 | 191 | 4 | 432 | 19 | 1.21 | 0.57 |
| Midsouth | Mississippi | 17MS2 | 3 | 207 | 6 | 197 | 4 | 432 | 10 | 1.93 | 0.63 |
| Midsouth | Mississippi | 17MS3 | 3 | 207 | 140 | 63 | 4 | 432 | 150 | 0.91 | 0.55 |
| Midsouth | Louisiana | 17LA1 | 3 | 208 | 4 | 200 | 4 | 432 | 6 | 1.32 | 0.72 |
| Midsouth | Louisiana | 17LA2 | 3 | 208 | 11 | 193 | 4 | 432 | 12 | 1.16 | 0.60 |
| Texas | Texas | 17TX1 | 3 | 208 | 1 | 203 | 4 | 432 | 1 | 2.12 | 0.62 |
| Texas | Texas | 17TX2 | 3 | 208 | 2 | 202 | 4 | 432 | 2 | 1.79 | 0.59 |
| Texas | Texas | 17TX3 | 3 | 207 | 4 | 199 | 4 | 432 | 7 | 2.05 | 0.72 |
| Texas | Texas | 17TX4 | 3 | 208 | 4 | 200 | 4 | 432 | 4 | 1.86 | 0.38 |
| Texas | Texas | 17TX5 | 3 | 198 | 132 | 62 | 4 | 432 | 161 | 1.38 | 0.56 |
| Texas | Texas | 17TX6 | 3 | 206 | 29 | 173 | 4 | 432 | 33 | 1.95 | 0.43 |
| Texas | Texas | 17TX7 | 3 | 208 | 7 | 197 | 4 | 432 | 7 | 1.77 | 0.56 |
| Texas | Texas | 17TX8 | 3 | 208 | 18 | 186 | 4 | 432 | 19 | 2.57 | 0.40 |
| – | – | – | – | |||||||
Presented for each environment is the number of trials, genotypes (with one, two or more than two replicates) and plots (total and missing), as well as the mean yield (t/ha) and generalised narrow-sense heritability
Note: Environments are grouped into the Southeast (NC, SC and GA), Midsouth (MO, AR, MS and LA) and Texas (TX) growing regions
*Total number after missing plots removed
Fig. 1 Map of the cotton growing environments in the 2017 P1 and 2018 P2 MET datasets. Note: States and years are distinguished by colour and growing regions are distinguished by shape
Summary of the known environmental covariates in the 2017 P1 MET dataset
| Covariate | Description (units) | Southeast min | Southeast mean | Southeast max | Midsouth min | Midsouth mean | Midsouth max | Texas min | Texas mean | Texas max |
|---|---|---|---|---|---|---|---|---|---|---|
| LAT | latitude (°) | 31.0 | 33.0 | 35.4 | 31.6 | 33.6 | 36.4 | 31.4 | 33.2 | 34.9 |
| LONG | longitude (°) | −84.7 | −81.7 | −78.0 | −91.9 | −91.1 | −89.7 | −102.3 | −101.1 | −99.5 |
| avgCCR | average cloud cover (%) | 53.4 | 56.0 | 59.1 | 46.6 | 48.7 | 52.2 | 32.1 | 34.5 | 37.0 |
| minHUM | min humidity (%) | 43.7 | 47.7 | 53.7 | 52.0 | 53.4 | 55.9 | 30.1 | 34.0 | 40.4 |
| maxDSR | max downward solar radiation (W/m²) | 0.74 | 0.76 | 0.77 | 0.75 | 0.76 | 0.77 | 0.82 | 0.85 | 0.87 |
| maxNSR | max net solar radiation (W/m²) | 0.62 | 0.64 | 0.66 | 0.63 | 0.64 | 0.65 | 0.68 | 0.68 | 0.70 |
| maxPRP | max precipitation (mm/hr) | 2.4 | 2.9 | 3.4 | 1.7 | 2.6 | 3.6 | 1.1 | 1.4 | 1.8 |
| totPRP | total precipitation (mm/day) | 3.2 | 3.5 | 4.2 | 3.0 | 3.7 | 4.9 | 1.3 | 1.6 | 2.1 |
| maxDPT | max dew point temperature (°C) | 20.5 | 21.1 | 22.1 | 18.9 | 20.7 | 22.0 | 13.5 | 15.7 | 17.6 |
| maxTMP | max temperature (°C) | 28.5 | 30.3 | 31.5 | 27.6 | 28.9 | 29.6 | 28.7 | 30.3 | 32.1 |
| minTMP | min temperature (°C) | 19.0 | 20.1 | 21.0 | 17.9 | 19.5 | 20.4 | 15.4 | 17.4 | 19.5 |
| minWSP | min wind speed (km/hr) | 4.9 | 5.2 | 5.7 | 4.7 | 4.9 | 5.0 | 7.4 | 8.1 | 9.4 |
| avgWDR | average wind direction (azimuth degrees) | 166.7 | 175.8 | 181.5 | 152.3 | 161.9 | 174.0 | 144.7 | 152.9 | 162.9 |
| maxST1 | max soil temperature 1 (°C) | 27.6 | 29.9 | 31.3 | 27.0 | 28.3 | 29.1 | 29.5 | 32.2 | 34.5 |
| minST1 | min soil temperature 1 (°C) | 19.8 | 21.8 | 23.2 | 19.3 | 20.6 | 21.5 | 19.0 | 20.6 | 22.8 |
| avgSM3 | soil moisture 3 (%) | 7.0 | 23.8 | 42.3 | 28.0 | 30.1 | 32.7 | 11.4 | 19.0 | 25.6 |
| avgSM4 | soil moisture 4 (%) | 10.0 | 29.5 | 44.6 | 29.8 | 32.9 | 35.3 | 8.3 | 15.8 | 21.8 |
| minST4 | min soil temperature 4 (°C) | 20.0 | 22.4 | 24.2 | 20.2 | 22.0 | 23.0 | 21.0 | 22.9 | 25.2 |
Note: Values presented are prior to centring and scaling
Presented for each covariate is the minimum, mean and maximum for the Southeast, Midsouth and Texas growing regions
Summary of the variance models for the additive GE effects considered in this paper
| Model | Description | Parameters | Reference |
|---|---|---|---|
| | Identity | 1 | |
| | Diagonal | | |
| | Compound symmetry | 2 | Patterson et al. |
| | Main effects plus diagonal | | Cullis et al. |
| FAM | Factor analytic plus main effects | | Smith et al. |
| FA | Factor analytic | | Smith et al. |
| | Random regression 1 | | Jarquín et al. (2014) |
| | Random regression 2 | | Heslot et al. (2014) |
| FAR | Factor analytic regression | | Jennrich and Schluchter |
| IFA | Integrated factor analytic | | This paper |
Presented for each model is the structure of the additive genetic variance matrix between environments, the number of estimated variance parameters and the reference
Note: The variance of the vp-vector of additive GE effects is built from the variance matrix between environments and the genomic relationship matrix between genotypes. The models also involve a matrix of latent covariates with p environments and k factors, a matrix of known covariates with q covariates, and an orthogonal projection matrix
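As a reading aid, the structure described in this note can be sketched as below. The paper's own notation did not survive extraction, so every symbol here is an assumption chosen for illustration: u is the vp-vector of additive GE effects, G_e the p × p variance matrix between environments, G_g the v × v genomic relationship matrix, Λ a p × k matrix of latent loadings and Ψ a diagonal matrix of specific variances. The Kronecker form and the factor analytic decomposition are the usual forms for this class of models and are not transcribed from the paper.

```latex
\operatorname{var}(\mathbf{u}) = \mathbf{G}_e \otimes \mathbf{G}_g ,
\qquad
\mathbf{G}_e^{\mathrm{FA}} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi} ,
\qquad
\#\text{parameters} = pk + p - \tfrac{1}{2}k(k-1)
```

Under these assumptions, with p = 24 environments the parameter count gives 48, 71, 93 and 134 for k = 1, 2, 3 and 5, which agrees with the Pars values reported for FA1, FA2, FA3 and FA5 in the table of regressions on latent covariates.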
Linear mixed models with random regressions on latent environmental covariates
| Regressions on latent covariates | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Pars | Loglik | AIC | Simple main effects (%) | Overall (%) | Model | Pars | Loglik | AIC | Generalised main effects (%) | Overall (%) |
| Compound symmetry | 2 | 10,504.2 | −20,748.4 | 36.2 | 36.2 | Identity | 1 | 10,156.9 | −20,055.9 | | |
| Main effects plus diagonal | 25 | 10,563.6 | −20,821.1 | 33.6 | 33.6 | Diagonal | 24 | 10,249.3 | −20,194.7 | – | – |
| FAM1 | 49 | 10,765.4 | −21,176.8 | 36.8 | 54.4 | FA1 | 48 | 10,667.1 | −20,982.2 | 43.2 | 43.2 |
| FAM2 | 72 | 10,893.8 | −21,387.6 | 37.2 | 67.5 | FA2 | 71 | 10,827.4 | −21,256.8 | 44.1 | 60.4 |
| FAM3 | 94 | 10,942.9 | −21,441.8 | 38.2 | 72.0 | FA3 | 93 | 10,940.3 | −21,438.5 | 43.8 | 70.7 |
| FAM5 | 135 | 11,011.2 | −21,496.5 | 38.7 | 80.0 | FA5 | 134 | 11,010.1 | −21,496.1 | 44.3 | 79.0 |
Presented for each model is the number of estimated genetic variance parameters, residual log-likelihood, AIC and percentage of variance explained by the simple or generalised main effects and overall
Note: 128 non-genetic and residual variance parameters estimated in all models. The selected FAM4 and FA4 models are distinguished with bold font
*Models where intercepts are not explicitly fitted
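The Pars, Loglik and AIC columns above can be cross-checked with a few lines of Python. This is a minimal sketch under two assumptions that the source does not state explicitly: that AIC is computed as −2 × log-likelihood + 2 × (total number of variance parameters), and that a factor analytic model with p environments and k factors has pk + p − k(k − 1)/2 free parameters. The function names are hypothetical; the 128 non-genetic and residual parameters and the 24 environments come from the tables.

```python
def fa_parameter_count(p: int, k: int) -> int:
    """Assumed count: p*k loadings + p specific variances - k(k-1)/2 constraints."""
    return p * k + p - k * (k - 1) // 2


def aic(loglik: float, genetic_pars: int, other_pars: int = 128) -> float:
    """Assumed definition: AIC = -2 * loglik + 2 * (all variance parameters)."""
    return -2.0 * loglik + 2.0 * (genetic_pars + other_pars)


# (factors, Pars, Loglik, reported AIC) copied from the FA rows of the table.
fa_rows = {
    "FA1": (1, 48, 10667.1, -20982.2),
    "FA2": (2, 71, 10827.4, -21256.8),
    "FA3": (3, 93, 10940.3, -21438.5),
    "FA5": (5, 134, 11010.1, -21496.1),
}

N_ENVIRONMENTS = 24  # number of environments in the 2017 P1 MET dataset

for name, (k, pars, loglik, reported) in fa_rows.items():
    recomputed = aic(loglik, pars)
    # Differences of about 0.1 are expected because Loglik is rounded.
    print(name, fa_parameter_count(N_ENVIRONMENTS, k), pars,
          round(recomputed, 1), reported)
```

Under these assumptions the recomputed values match the reported ones to within rounding of the log-likelihood, which suggests the tabulated AIC counts both the genetic and the 128 non-genetic parameters.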
Linear mixed models with random regressions on known and latent environmental covariates
| Regressions on known covariates | | | | | | Regressions on known and latent covariates | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Pars | Loglik | AIC | Known covariates (%) | Overall (%) | Model | Pars | Loglik | AIC | Known covariates (%) | Overall (%) |
| Random regression 1 | 26 | 10,721.2 | −21,134.3 | 20.8 | 57.1 | Identity | 1 | 10,156.9 | −20,055.9 | | |
| Random regression 2 | 43 | 10,750.7 | −21,159.3 | 23.2 | 58.5 | Diagonal | 24 | 10,249.3 | −20,194.7 | | |
| FAR1 | 43 | 10,636.7 | −20,931.4 | 6.2 | 40.0 | IFA1 | 48 | 10,667.1 | −20,982.2 | 7.0 | 43.2 |
| FAR2 | 61 | 10,791.4 | −21,204.8 | 19.2 | 57.0 | IFA2 | 71 | 10,827.4 | −21,256.8 | 20.1 | 60.4 |
| FAR3 | 78 | 10,887.0 | −21,361.9 | 29.2 | 66.7 | IFA3 | 93 | 10,940.3 | −21,438.5 | 30.1 | 70.7 |
| FAR5 | 109 | 10,931.3 | −21,388.7 | 36.2 | 73.8 | IFA5-3 | 122 | 10,996.4 | −21,492.8 | 36.2 | 78.0 |
Presented for each model is the number of estimated genetic variance parameters, residual log-likelihood, AIC and percentage of variance explained by the known covariates and overall
Note: 128 non-genetic and residual variance parameters estimated in all models. The two random regression models correspond to the random regressions in Jarquín et al. (2014) and Heslot et al. (2014), respectively. The selected FAR4 and IFA4-3 models are distinguished with bold font.
*Models where intercepts are not explicitly fitted
Fig. 2 Regression plots for checks C1 and C2 in terms of the first two factors obtained from the (a) FAM4 and (b, c) FA4 models. Note: The simple main effects in (a) and the generalised main effects in (b) are denoted with closed circles and the growing regions are distinguished by shape. The percentage of additive genetic variance explained by each factor is labelled. The additive GE effects in (c) have been adjusted for those in (b)
Summary of the prediction accuracies for the 2017 current and 2018 future environments
| Year | Model | Southeast | | | Midsouth | | | Texas | | | Overall | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | Min | Mean | Max | Min | Mean | Max | Min | Mean | Max | Min | Mean | Max |
| 2017 | Random regression 1 | 0.27 | 0.51 | 0.68 | 0.30 | 0.58 | 0.77 | 0.27 | 0.47 | 0.60 | 0.27 | 0.52 | 0.77 |
| 2017 | Random regression 2 | 0.27 | 0.52 | 0.69 | 0.29 | 0.58 | 0.76 | 0.27 | 0.47 | 0.61 | 0.27 | 0.52 | 0.76 |
| 2017 | FAR4 | 0.25 | 0.50 | 0.66 | 0.34 | 0.59 | 0.77 | 0.25 | 0.48 | 0.64 | 0.25 | 0.52 | 0.77 |
| 2018 | Random regression 1 | 0.58 | 0.60 | 0.64 | 0.30 | 0.50 | 0.71 | −0.03 | 0.20 | 0.34 | −0.03 | 0.42 | 0.71 |
| 2018 | Random regression 2 | 0.58 | 0.61 | 0.64 | 0.28 | 0.49 | 0.70 | −0.02 | 0.21 | 0.36 | −0.02 | 0.42 | 0.70 |
| 2018 | FAR4 | 0.58 | 0.61 | 0.67 | 0.26 | 0.49 | 0.71 | 0.02 | 0.22 | 0.36 | 0.02 | 0.43 | 0.71 |
Presented for each model is the minimum, mean and maximum prediction accuracy for the Southeast, Midsouth and Texas, as well as overall across all regions
Note: The two random regression models correspond to the random regressions in Jarquín et al. (2014) and Heslot et al. (2014), respectively. The highest accuracy is distinguished with bold font
The selected IFA4-3 model, Part 1: Summary of growing environments
| Region | State | Env | Var | Known covariates (%) | Overall (%) | Loading 1 | Loading 2 | Loading 3 | Loading 4 |
|---|---|---|---|---|---|---|---|---|---|
| Southeast | North Carolina | 17NC1 | 0.01 | 85.4 | 69.3 | 0.06 | −0.04 | 0.33 | 0.06 |
| Southeast | South Carolina | 17SC1 | 0.02 | 12.5 | 56.4 | 0.18 | −0.06 | 0.17 | −0.15 |
| Southeast | South Carolina | 17SC2 | 0.01 | 40.7 | 48.6 | 0.07 | −0.03 | 0.27 | −0.09 |
| Southeast | South Carolina | 17SC3 | 0.02 | 23.8 | 90.5 | 0.23 | −0.14 | 0.26 | −0.03 |
| Southeast | Georgia | 17GA1 | 0.03 | 23.1 | 63.8 | 0.20 | −0.08 | 0.29 | −0.02 |
| Southeast | Georgia | 17GA2 | 0.03 | 19.1 | 54.0 | 0.19 | −0.10 | 0.31 | 0.01 |
| Southeast | Georgia | 17GA3 | 0.02 | 29.8 | 82.3 | 0.21 | −0.12 | 0.20 | −0.09 |
| Southeast | Georgia | 17GA4 | 0.02 | 26.9 | 67.6 | 0.18 | −0.10 | 0.28 | 0.14 |
| Midsouth | Missouri | 17MO1 | 0.06 | 26.6 | 82.2 | 0.39 | −0.17 | −0.15 | 0.39 |
| Midsouth | Arkansas | 17AR1 | 0.01 | 49.1 | 100.0 | 0.14 | 0.00 | −0.32 | 0.09 |
| Midsouth | Arkansas | 17AR2 | 0.06 | 32.1 | 89.2 | 0.39 | −0.16 | −0.34 | 0.30 |
| Midsouth | Mississippi | 17MS1 | 0.03 | 46.0 | 81.6 | 0.23 | 0.00 | −0.26 | −0.44 |
| Midsouth | Mississippi | 17MS2 | 0.03 | 47.5 | 77.6 | 0.24 | −0.12 | −0.15 | 0.23 |
| Midsouth | Mississippi | 17MS3 | 0.03 | 37.3 | 100.0 | 0.26 | −0.09 | −0.23 | −0.43 |
| Midsouth | Louisiana | 17LA1 | 0.03 | 19.9 | 71.8 | 0.23 | −0.17 | −0.01 | −0.32 |
| Midsouth | Louisiana | 17LA2 | 0.02 | 22.5 | 76.4 | 0.20 | −0.10 | 0.11 | −0.07 |
| Texas | Texas | 17TX1 | 0.02 | 61.4 | 91.8 | 0.15 | 0.39 | 0.04 | 0.09 |
| Texas | Texas | 17TX2 | 0.02 | 36.6 | 61.9 | 0.12 | 0.28 | 0.10 | 0.17 |
| Texas | Texas | 17TX3 | 0.05 | 41.5 | 74.0 | 0.21 | 0.46 | 0.07 | −0.06 |
| Texas | Texas | 17TX4 | 0.01 | 32.6 | 64.6 | 0.10 | 0.22 | 0.07 | −0.17 |
| Texas | Texas | 17TX5 | 0.04 | 29.9 | 62.0 | 0.20 | 0.34 | 0.01 | −0.18 |
| Texas | Texas | 17TX6 | 0.01 | 80.7 | 44.3 | 0.06 | 0.17 | 0.05 | 0.10 |
| Texas | Texas | 17TX7 | 0.02 | 44.4 | 66.5 | 0.12 | 0.33 | −0.01 | 0.02 |
| Texas | Texas | 17TX8 | 0.02 | 24.1 | 72.0 | 0.13 | 0.28 | 0.12 | 0.19 |
| – |
Presented are the REML estimates of additive genetic variance, percentage of variance explained by the known covariates and overall, as well as estimates of the joint factor loadings
Note: The percentage of variance explained across all environments, as well as by individual factors, is presented in the final row. The percentage explained by the known covariates is greater than the overall percentage for 17NC1 and 17TX6 since the known and latent covariates are not orthogonal for individual environments
The selected IFA4-3 model, Part 2: Summary of known environmental covariates
| Covariate | Covar | Variance explained (%) | Loading 1 | Loading 2 | Loading 3 | Loading 4 |
|---|---|---|---|---|---|---|
| LAT | 0.02 | 0.4 | 0.02 | 0.10 | −0.21 | −0.20 |
| LONG | 0.05 | 0.5 | −0.18 | 0.04 | 0.56 | 0.33 |
| avgCCR | −0.18 | 4.5 | −0.37 | 0.31 | −0.02 | 0.29 |
| maxDPT | 0.25 | 3.7 | 0.47 | −0.46 | −0.68 | −0.22 |
| maxDSR | 0.25 | 10.1 | −0.30 | 0.41 | −0.10 | 0.17 |
| minHUM | −0.33 | 3.5 | −0.62 | 0.24 | 1.03 | 1.10 |
| maxNSR | 0.04 | 1.9 | 0.05 | 0.11 | −0.19 | −0.29 |
| maxPRP | −0.01 | 0.1 | 0.04 | 0.05 | −0.18 | −0.55 |
| totPRP | 0.03 | 1.6 | 0.11 | −0.01 | 0.05 | −0.15 |
| maxTMP | 0.18 | 4.0 | −0.31 | 0.09 | 0.58 | 0.32 |
| minTMP | −0.18 | 3.1 | −0.05 | 0.44 | −0.67 | −1.00 |
| minWSP | 0.01 | 0.1 | −0.13 | −0.09 | 0.31 | 0.16 |
| avgWDR | −0.03 | 1.5 | 0.03 | 0.14 | −0.01 | −0.33 |
| maxST1 | −0.04 | 1.0 | 0.09 | 0.06 | −0.27 | −0.25 |
| minST1 | 0.04 | 0.1 | 0.37 | −0.48 | 0.15 | 0.96 |
| avgSM3 | −0.02 | 0.4 | 0.10 | 0.12 | 0.10 | 0.19 |
| avgSM4 | 0.05 | 1.2 | −0.10 | −0.15 | −0.25 | −0.41 |
| minST4 | 0.09 | 1.4 | −0.30 | 0.32 | 0.10 | −0.48 |
Presented are the REML estimates of additive genetic covariance, percentage of variance explained by individual known covariates and estimates of the known factor loadings
Note: The percentage of variance explained by all known covariates and by individual factors is presented in the final row
Fig. 3 Heatmaps of the additive genetic correlation matrices between environments in terms of the (a) known covariates and (b) known and latent covariates. Note: Both matrices are ordered using the dendrogram applied to (b). Black lines distinguish the Southeast, Midsouth and Texas cotton growing regions. The colour key ranges from 1 (agreement in rankings) through zero (dissimilarity in rankings) to −1 (reversal of rankings)
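To make the correlation scale in these heatmaps concrete, the sketch below builds a small between-environment covariance matrix from factor loadings and specific variances, then rescales it to a correlation matrix. It assumes the usual factor analytic form (loadings times their transpose, plus a diagonal of specific variances). The three environments, their loadings and the function names are hypothetical toy values, not estimates from the paper.

```python
import math


def factor_analytic_cov(loadings, specific):
    """Covariance between environments: Lambda Lambda^T + diag(specific)."""
    p = len(loadings)
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cov[i][j] = sum(a * b for a, b in zip(loadings[i], loadings[j]))
            if i == j:
                cov[i][j] += specific[i]
    return cov


def cov_to_corr(cov):
    """Rescale a covariance matrix so that every diagonal entry equals 1."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Hypothetical loadings for three environments on two factors.
toy_loadings = [[0.3, 0.2], [0.3, 0.1], [0.2, -0.4]]
toy_specific = [0.02, 0.02, 0.02]

toy_corr = cov_to_corr(factor_analytic_cov(toy_loadings, toy_specific))
for row in toy_corr:
    print([round(value, 2) for value in row])
```

In this toy case the first two environments have similar loadings and a high positive correlation (similar genotype rankings), while the third loads in the opposite direction on the second factor and is weakly negatively correlated with the first.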
Fig. 4 (a) Regression plots for checks C1 and C2 in terms of four joint factors and (b) percentage of additive genetic variance in the joint factors explained by the known covariates. Note: The generalised main effects in (a) are denoted with closed circles and the growing regions are distinguished by shape. The percentage of variance explained by each factor is labelled in (a) and the percentage of variance in each factor explained by all known covariates is labelled in (b). The additive GE effects for the higher order factors are adjusted for the preceding factor(s). Only 10 (of the 18) known covariates are displayed in (b) for brevity