Washington Gapare1, Shiming Liu2, Warren Conaty2, Qian-Hao Zhu3, Vanessa Gillespie3, Danny Llewellyn3, Warwick Stiller2, Iain Wilson3.
Abstract
Genomic selection (GS) has successfully been used in plant breeding to improve selection efficiency and reduce breeding time and cost. However, no study has evaluated GS prediction models that may be used for predicting cotton breeding lines across multiple environments. In this study, we evaluated the performance of Bayesian Ridge Regression, BayesA, BayesB, BayesC and Reproducing Kernel Hilbert Spaces regression models. We then extended the single-site GS model to accommodate genotype × environment interaction (G×E) in order to assess the merits of multi- over single-environment models in a practical breeding and selection context in cotton, a crop for which this has not previously been evaluated. Our study was based on a population of 215 upland cotton (Gossypium hirsutum) breeding lines which were evaluated for fiber length and strength at multiple locations in Australia and genotyped with 13,330 single nucleotide polymorphism (SNP) markers. BayesB, which assumes a unique variance for each marker and that a proportion of markers have large effects while most others have zero effect, was the preferred model. GS accuracy for fiber length based on a single-site model varied across sites, ranging from 0.27 to 0.77 (mean = 0.38), while that of fiber strength ranged from 0.19 to 0.58 (mean = 0.35) using randomly selected sub-populations as the training population. Prediction accuracies from the marker × environment interaction (M×E) model were higher than those for single-site and across-site models, with an average accuracy of 0.71 and 0.59 for fiber length and strength, respectively. The use of the M×E model could therefore identify which breeding lines have effects that are stable across environments and which ones are responsible for G×E, and so reduce the amount of phenotypic screening required in cotton breeding programs to identify adaptable genotypes.
Keywords: Bayesian models; GenPred; Genomic prediction; Genomic selection; Gossypium hirsutum; Shared Data Resources; marker × environment interaction
Year: 2018 PMID: 29559536 PMCID: PMC5940163 DOI: 10.1534/g3.118.200140
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Details of test sites for fiber quality traits over several years and estimates of genomic heritability (h2g) ± SEs
| Site | Region | Latitude | Longitude | Years | No. lines | h2g ± SE (fiber length) | h2g ± SE (fiber strength) |
|---|---|---|---|---|---|---|---|
| Myall Vale | Central | 30° 14’S | 149° 38’E | 1998-2004 | 215 | 0.62 ± 0.11 | 0.21 ± 0.10 |
| Collarenebri (CO) | Central | 29° 30’S | 148° 44’E | 1994-2002 | 88 | 0.33 ± 0.12 | 0.29 ± 0.11 |
| Bourke | Hot | 30° 02’S | 145° 57’E | 1995-2010 | 125 | 0.51 ± 0.10 | 0.25 ± 0.13 |
| Emerald | Hot | 23° 31’S | 148° 10’E | 1993-2005 | 124 | 0.52 ± 0.15 | 0.47 ± 0.08 |
| St George | Hot | 28° 08’S | 148° 41’E | 1993-2010 | 128 | 0.32 ± 0.14 | 0.29 ± 0.11 |
| Breeza (BR) | Cool | 31° 06’S | 150° 31’E | 1993-2005 | 80 | 0.42 ± 0.19 | 0.33 ± 0.13 |
| Darling Downs (DD) | Cool | 27° 22’S | 150° 31’E | 1993-2005 | 99 | 0.37 ± 0.13 | 0.42 ± 0.17 |
The Myall Vale, Bourke, Emerald and St George trials had 116 breeding lines in common and were used for the across-site and marker-by-environment (M×E) interaction GS models.
Region refers to the Australian cotton belt, which is divided into three regions (hot, central and cool) based on day-degrees (McMahon and Low 1972).
Figure 1. Heat map of the G matrix of 215 cotton historical breeding lines genotyped with 13,330 SNP markers.
Figure 2. Plot of principal component (PC) 1 vs. PC 2 scores for each historical breeding line (N = 215). Principal component analysis was performed on the genomic relationship matrix (G) estimated from single nucleotide polymorphism data for each breeding line. Green, red, black and blue squares represent, respectively, a mix of elite varieties and overseas introduced lines, lines derived from pre-2000 crosses, lines derived from post-2000 crosses, and elite varieties.
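As a rough illustration of the analysis behind Figure 2, the sketch below builds a genomic relationship matrix from a marker matrix coded 0/1/2 and extracts the first two principal components by eigendecomposition. The source does not state which estimator of G was used; the VanRaden-style formula, the function names and the simulated genotypes are all assumptions made for illustration only.

```python
import numpy as np

def genomic_relationship(M):
    """VanRaden-style G from an (n_lines x n_markers) matrix coded 0/1/2.

    Assumption: this estimator is a common choice, not necessarily the
    one used in the study.
    """
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    Z = M - 2.0 * p                     # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p)) # scales G to the marker variance
    return Z @ Z.T / denom

def principal_components(G, k=2):
    """Scores on the k leading principal components of a symmetric G."""
    vals, vecs = np.linalg.eigh(G)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]    # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Hypothetical toy data: 20 lines, 200 markers (the study had 215 and 13,330).
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(20, 200)).astype(float)
G = genomic_relationship(M)
scores = principal_components(G, k=2)   # column 0 = PC 1, column 1 = PC 2
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by breeding group, would give a figure of the same kind as Figure 2.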
Measures of goodness of fit for different models for two fiber quality traits using data from the Myall Vale site
| Trait | Measure | BRR | BayesA | BayesB | BayesC | RKHS |
|---|---|---|---|---|---|---|
| Fiber length | Res var (SD) | 0.244 (0.05) | 0.236 (0.04) | 0.233 (0.05) | 0.243 (0.05) | 0.242 (0.05) |
| | DIC | 402.86 | 398.99 | | 402.21 | 401.7 |
| | PA | 0.96 | 0.96 | 0.96 | 0.95 | 0.94 |
| Fiber strength | Res var (SD) | 0.727 (0.09) | 0.708 (0.09) | 0.706 (0.09) | 0.723 (0.10) | 0.745 (0.09) |
| | DIC | 587.46 | 587.15 | | 587.00 | 592.88 |
| | PA | 0.78 | 0.81 | 0.82 | 0.79 | 0.77 |
Res var = residual variance; BRR = Bayesian Ridge Regression; RKHS = Reproducing Kernel Hilbert Spaces regression; SD = standard deviation; DIC = Deviance Information Criterion; DIC in bold was the best model for the trait; PA = prediction accuracy, i.e., the correlation between phenotypes and genomic estimated breeding values.
Sample phenotypic correlation estimates ± SE for fiber length and strength evaluated at four sites
| Site | Emerald | St George | Myall Vale |
|---|---|---|---|
| Fiber length | | | |
| Bourke | 0.68 ± 0.07 | 0.63 ± 0.07 | 0.60 ± 0.07 |
| Emerald | | 0.52 ± 0.08 | 0.68 ± 0.06 |
| St George | | | 0.51 ± 0.09 |
| Fiber strength | | | |
| Bourke | 0.71 ± 0.07 | 0.76 ± 0.06 | 0.79 ± 0.06 |
| Emerald | | 0.81 ± 0.05 | 0.80 ± 0.06 |
| St George | | | 0.77 ± 0.06 |
Estimated posterior residual variance components (and their posterior standard deviations, SD) and the estimated posterior probability of markers with nonnull effects from the single-site, across-site and the marker × environment interaction models for fiber length (LEN) and strength measured at four sites
| Model | Parameter | Site | Fiber length: Estimate | Fiber length: SD | Fiber strength: Estimate | Fiber strength: SD |
|---|---|---|---|---|---|---|
| Single-site | Residual | Myall Vale | 0.469 | 0.13 | 0.405 | 0.12 |
| | | Bourke | 0.407 | 0.13 | 0.446 | 0.13 |
| | | Emerald | 0.401 | 0.13 | 0.463 | 0.13 |
| | | St George | 0.465 | 0.12 | 0.488 | 0.12 |
| | Probability | Myall Vale | 0.321 | 0.15 | 0.158 | 0.06 |
| | | Bourke | 0.385 | 0.18 | 0.219 | 0.09 |
| | | Emerald | 0.377 | 0.21 | 0.214 | 0.10 |
| | | St George | 0.301 | 0.17 | 0.182 | 0.07 |
| Across-site | Residual | Myall Vale | 0.458 | 0.09 | 0.298 | 0.07 |
| | | Bourke | 0.309 | 0.06 | 0.310 | 0.06 |
| | | Emerald | 0.289 | 0.06 | 0.260 | 0.06 |
| | | St George | 0.750 | 0.13 | 0.269 | 0.07 |
| | Probability | All | 0.298 | 0.11 | 0.238 | 0.06 |
| M×E | Residual | Myall Vale | 0.341 | 0.06 | 0.225 | 0.05 |
| | | Bourke | 0.243 | 0.04 | 0.264 | 0.05 |
| | | Emerald | 0.254 | 0.05 | 0.209 | 0.04 |
| | | St George | 0.619 | 0.11 | 0.213 | 0.04 |
| | Probability (main effect) | | 0.382 | 0.07 | 0.376 | 0.07 |
| | Probability (environment-specific effect) | Myall Vale | 0.504 | 0.08 | 0.462 | 0.08 |
| | | Bourke | 0.216 | 0.08 | 0.200 | 0.07 |
| | | Emerald | 0.435 | 0.12 | 0.444 | 0.16 |
| | | St George | 0.567 | 0.09 | 0.500 | 0.10 |
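The split between a marker main effect and environment-specific marker effects reported for the M×E model can be pictured as a stacked design matrix. The sketch below shows one common way to write such a model: each marker gets one effect shared by all sites plus one deviation per site. This is an illustrative construction under that assumption, not the authors' code; `mxe_design`, the toy dimensions and the simulated effects are hypothetical.

```python
import numpy as np

def mxe_design(X_by_env):
    """Return (X_main, X_env) for a marker-by-environment model.

    X_main stacks the per-environment marker matrices, so its coefficients
    are marker effects shared by all environments. X_env is block-diagonal,
    so each block's coefficients are deviations specific to one environment.
    """
    p = X_by_env[0].shape[1]
    X_main = np.vstack(X_by_env)
    X_env = np.zeros((X_main.shape[0], p * len(X_by_env)))
    row = 0
    for j, Xj in enumerate(X_by_env):
        X_env[row:row + Xj.shape[0], j * p:(j + 1) * p] = Xj
        row += Xj.shape[0]
    return X_main, X_env

# Hypothetical toy data: 6 lines and 5 markers at each of the four sites.
rng = np.random.default_rng(1)
sites = ["Myall Vale", "Bourke", "Emerald", "St George"]
X_by_env = [rng.integers(0, 3, size=(6, 5)).astype(float) for _ in sites]
b_main = rng.normal(size=5)                  # effects stable across sites
b_env = [rng.normal(size=5) for _ in sites]  # site-specific deviations

X_main, X_env = mxe_design(X_by_env)
stacked = X_main @ b_main + X_env @ np.concatenate(b_env)
per_site = np.concatenate([Xj @ (b_main + bj) for Xj, bj in zip(X_by_env, b_env)])
```

Here `stacked` and `per_site` agree, which shows that the stacked form is the same as fitting, at each site, a marker effect equal to the main effect plus that site's deviation.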
Figure 3. Estimated prediction accuracy (correlation between phenotypes and predictions, averaged over 50 TRN-TST partitions) for cotton fiber length and fiber strength at seven test sites.
Estimated prediction accuracy (correlation coefficient between predicted and observed phenotypes, averaged over 50 TRN-TST partitions) ± SE for fiber length and strength in cotton by CV1
| Trait/Site | Single site | Across-site | M×E model | Selection efficiency (%) |
|---|---|---|---|---|
| Fiber length | | | | |
| Myall Vale | 0.19 ± 0.02 | 0.14 ± 0.02 | 0.16 ± 0.03 | 27; 37 |
| Bourke | 0.23 ± 0.03 | 0.23 ± 0.02 | 0.23 ± 0.02 | 24; 38 |
| Emerald | 0.33 ± 0.02 | 0.26 ± 0.03 | 0.29 ± 0.02 | 23; 37 |
| St George | 0.30 ± 0.02 | 0.22 ± 0.02 | 0.28 ± 0.02 | 19; 37 |
| Mean | 0.26 | 0.21 | 0.24 | 23; 37 |
| Fiber strength | | | | |
| Myall Vale | 0.26 ± 0.04 | 0.19 ± 0.04 | 0.19 ± 0.03 | 35; 52 |
| Bourke | 0.14 ± 0.02 | 0.13 ± 0.04 | 0.11 ± 0.02 | 28; 39 |
| Emerald | 0.35 ± 0.05 | 0.28 ± 0.02 | 0.29 ± 0.02 | 42; 51 |
| St George | 0.26 ± 0.02 | 0.26 ± 0.03 | 0.26 ± 0.03 | 43; 54 |
| Mean | 0.25 | 0.22 | 0.14 | 37; 49 |
Selection efficiency of the across-site model relative to the single-site model (before the semicolon) and relative to the M×E model (after the semicolon).
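The accuracies in this table are correlations between predicted and observed phenotypes, averaged over 50 training-testing (TRN-TST) partitions. A minimal single-site sketch of that calculation follows. Several things in it are assumptions rather than details from the source: ridge regression is used as a simple stand-in for BayesB, the 20% test fraction and the penalty `lam` are arbitrary, the standard error is taken as the standard deviation across partitions divided by the square root of their number, and the data are simulated.

```python
import numpy as np

def ridge_predict(X_trn, y_trn, X_tst, lam):
    """Ridge regression on centered markers (a stand-in for BayesB)."""
    x_mean = X_trn.mean(axis=0)
    y_mean = y_trn.mean()
    Xc = X_trn - x_mean
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_trn - y_mean))
    return y_mean + (X_tst - x_mean) @ beta

def cv_accuracy(X, y, n_partitions=50, test_frac=0.2, lam=50.0, seed=0):
    """Mean and SE of the predicted-observed correlation over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_tst = int(round(test_frac * n))
    accs = np.empty(n_partitions)
    for k in range(n_partitions):
        perm = rng.permutation(n)
        tst, trn = perm[:n_tst], perm[n_tst:]
        pred = ridge_predict(X[trn], y[trn], X[tst], lam)
        accs[k] = np.corrcoef(pred, y[tst])[0, 1]
    return accs.mean(), accs.std(ddof=1) / np.sqrt(n_partitions)

# Hypothetical toy data: 116 lines, 300 markers, 10 markers with real effects.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(116, 300)).astype(float)
effects = np.zeros(300)
effects[:10] = 1.0
y = X @ effects + rng.normal(size=116)

mean_acc, se_acc = cv_accuracy(X, y)
print(f"prediction accuracy: {mean_acc:.2f} ± {se_acc:.2f}")
```

Because the partitions reuse the same lines, the accuracies are not independent, so the standard error computed this way is only a rough guide.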
Estimated prediction accuracy (correlation coefficient between predicted and observed phenotypes, averaged over 50 TRN-TST partitions) ± SE for fiber length and strength in cotton by CV2
| Trait/Site | Single site | Across-site | M×E model | Selection efficiency (%) |
|---|---|---|---|---|
| Fiber length | | | | |
| Myall Vale | 0.19 ± 0.02 | 0.38 ± 0.02 | 0.64 ± 0.14 | 31; 39 |
| Bourke | 0.21 ± 0.02 | 0.33 ± 0.09 | 0.41 ± 0.11 | 28; 41 |
| Emerald | 0.34 ± 0.02 | 0.51 ± 0.03 | 0.65 ± 0.13 | 21; 23 |
| St George | 0.33 ± 0.02 | 0.42 ± 0.02 | 0.63 ± 0.14 | 33; 38 |
| Mean | 0.27 | 0.41 | 0.71 | 28; 35 |
| Fiber strength | | | | |
| Myall Vale | 0.25 ± 0.01 | 0.31 ± 0.09 | 0.52 ± 0.11 | 32; 47 |
| Bourke | 0.12 ± 0.02 | 0.28 ± 0.11 | 0.57 ± 0.11 | 29; 36 |
| Emerald | 0.34 ± 0.02 | 0.48 ± 0.12 | 0.61 ± 0.14 | 48; 57 |
| St George | 0.34 ± 0.02 | 0.49 ± 0.15 | 0.64 ± 0.13 | 49; 52 |
| Mean | 0.26 | 0.39 | 0.59 | 40; 48 |
Selection efficiency of the across-site model relative to the single-site model (before the semicolon) and relative to the M×E model (after the semicolon).
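CV1 and CV2 are not defined in this section. In multi-environment genomic prediction work, CV1 usually means predicting lines that have not been phenotyped in any environment, while CV2 means predicting lines that are missing in some environments but observed in others. The sketch below assumes those conventional definitions and shows how the two held-out patterns differ on a lines × sites grid. The function names, the 20% test fraction and the choice to hide each CV2 line at exactly one site are hypothetical.

```python
import numpy as np

def cv1_mask(n_lines, n_envs, test_frac, rng):
    """True marks a held-out record; CV1 hides a line in every environment."""
    mask = np.zeros((n_lines, n_envs), dtype=bool)
    n_tst = int(round(test_frac * n_lines))
    held_out = rng.choice(n_lines, size=n_tst, replace=False)
    mask[held_out, :] = True
    return mask

def cv2_mask(n_lines, n_envs, test_frac, rng):
    """CV2 hides each chosen line in one randomly picked environment only."""
    mask = np.zeros((n_lines, n_envs), dtype=bool)
    n_tst = int(round(test_frac * n_lines))
    held_out = rng.choice(n_lines, size=n_tst, replace=False)
    envs = rng.integers(0, n_envs, size=held_out.size)
    mask[held_out, envs] = True
    return mask

# 116 common lines at four sites, as in the across-site and M×E analyses.
rng = np.random.default_rng(7)
m1 = cv1_mask(116, 4, 0.2, rng)
m2 = cv2_mask(116, 4, 0.2, rng)
```

Under CV2 a held-out line still has records at the other sites in the training set, which a multi-environment model can borrow from. That is one plausible reading of why the across-site and M×E accuracies are higher under CV2 than under CV1 in the tables above.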