J Martin Sarinelli, J Paul Murphy, Priyanka Tyagi, James B Holland, Jerry W Johnson, Mohamed Mergoum, Richard E Mason, Ali Babar, Stephen Harrison, Russell Sutton, Carl A Griffey, Gina Brown-Guedira.
Abstract
KEY MESSAGE: The optimization of training populations and the use of diagnostic markers as fixed effects increase the predictive ability of genomic prediction models in a cooperative wheat breeding panel.

Plant breeding programs often have access to a large amount of historical data that is highly unbalanced, particularly across years. This study examined approaches to utilize these data sets as training populations to integrate genomic selection into existing pipelines. We used cross-validation to evaluate predictive ability in an unbalanced data set of 467 winter wheat (Triticum aestivum L.) genotypes evaluated in the Gulf Atlantic Wheat Nursery from 2008 to 2016. We evaluated the impact of different training population sizes and training population selection methods (Random, Clustering, PEVmean and PEVmean1) on predictive ability. We also evaluated inclusion of markers associated with major genes as fixed effects in prediction models for heading date, plant height, and resistance to powdery mildew (caused by Blumeria graminis f. sp. tritici). Increases in predictive ability as the size of the training population increased were more evident for Random and Clustering training population selection methods than for PEVmean and PEVmean1. The selection methods based on minimization of the prediction error variance (PEV) outperformed the Random and Clustering methods across all the population sizes. Major genes added as fixed effects always improved model predictive ability, with the greatest gains coming from combinations of multiple genes. Maximum predictabilities among all prediction methods were 0.64 for grain yield, 0.56 for test weight, 0.71 for heading date, 0.73 for plant height, and 0.60 for powdery mildew resistance. Our results demonstrate the utility of combining unbalanced phenotypic records with genome-wide SNP marker data for predicting the performance of untested genotypes.
Year: 2019 PMID: 30680419 PMCID: PMC6449317 DOI: 10.1007/s00122-019-03276-6
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Test year, number of entries per cooperating state breeding program, and total numbers of check cultivars and elite advanced line entries in the Gulf Atlantic Wheat Nursery from 2008 to 2016
| Test year | UA | UF | UG | LSU | NCSU | CU | TAMU | VPI | Check cultivars | Total by year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008 | 12 | 7 | 12 | 12 | 12 | 12 | 0 | 12 | 3 | 82 |
| 2009 | 13 | 0 | 12 | 10 | 12 | 12 | 0 | 12 | 3 | 74 |
| 2010 | 12 | 1 | 12 | 11 | 12 | 12 | 0 | 12 | 3 | 75 |
| 2011 | 0 | 0 | 12 | 12 | 12 | 4 | 0 | 12 | 4 | 56 |
| 2012 | 12 | 1 | 12 | 11 | 12 | 12 | 0 | 12 | 4 | 76 |
| 2013 | 12 | 1 | 12 | 11 | 12 | 0 | 0 | 12 | 4 | 64 |
| 2014 | 5 | 0 | 7 | 6 | 6 | 0 | 0 | 10 | 4 | 38 |
| 2015 | 6 | 0 | 6 | 6 | 6 | 0 | 6 | 10 | 4 | 44 |
| 2016 | 4 | 0 | 6 | 7 | 9 | 0 | 6 | 10 | 3 | 45 |
| Total by program | 76 | 10 | 91 | 86 | 93 | 52 | 12 | 102 | ||
Summary of phenotypic information for grain yield, test weight, heading date, plant height and powdery mildew resistance, including number of environments where each trait was evaluated, number of data points for the analysis of each trait, descriptive statistics, variance components estimates, and broad sense heritability estimate calculated on a per plot basis
| Trait | Grain yield (Mg ha⁻¹) | Test weight (kg m⁻³) | Heading date (days) | Plant height (cm) | Powdery mildew (0–9 scale) |
|---|---|---|---|---|---|
| No. environments | 49 | 49 | 53 | 44 | 19 |
| No. data points | 7028 | 5075 | 4861 | 4780 | 2446 |
| Minimum | 0.42 | 41.70 | 63.00 | 53.34 | 0.00 |
| Mean | 4.24 | 57.05 | 105.56 | 88.34 | 2.16 |
| Maximum | 8.27 | 65.80 | 131.00 | 137.16 | 9.00 |
| Standard deviation | 1.30 | 6.60 | 11.91 | 11.07 | 2.01 |
| Variance components estimates | |||||
| Location (L) | 52.60* | 3.12* | 37.95* | 3.24* | 0.60* |
| Year (Y) | 53.82 | 0.97 | 43.26* | 2.82* | |
| YL | 142.09* | 5.41* | 21.67* | 4.05* | |
| Rep (YL) | 3.76* | 0.07* | 0.08* | 0.15* | 0.10* |
| Genotype (G) | 19.84* | 1.07* | 14.42* | 4.85* | 1.66* |
| GY | 6.54* | 0.41* | 1.26* | 0.13 | 0.62* |
| GL | 26.24* | 0.32* | 5.39* | 0.06 | |
| GYL | 50.71* | 1.26* | 4.12* | 1.10* | |
| Residual | 39.65 | 0.80 | 1.45 | 2.42 | 1.03 |
| Heritability | 0.14 | 0.28 | 0.54 | 0.57 | 0.49 |
*Significant at α = 0.05
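The per-plot heritability row can be related to the variance components with a short calculation. The sketch below assumes, since the text does not state the formula, that per-plot broad-sense heritability is the genotypic variance divided by the sum of the genotypic, genotype-by-environment interaction, and residual variances. Under that assumption the tabulated values are reproduced to within rounding for four traits, while powdery mildew comes out at about 0.50 rather than the reported 0.49. The dictionary and function names are illustrative.

```python
# Variance component estimates copied from the table above.
# Cells left blank in the table (powdery mildew GL and GYL) are omitted.
VARIANCE_COMPONENTS = {
    "grain yield":    {"G": 19.84, "GY": 6.54, "GL": 26.24, "GYL": 50.71, "residual": 39.65},
    "test weight":    {"G": 1.07,  "GY": 0.41, "GL": 0.32,  "GYL": 1.26,  "residual": 0.80},
    "heading date":   {"G": 14.42, "GY": 1.26, "GL": 5.39,  "GYL": 4.12,  "residual": 1.45},
    "plant height":   {"G": 4.85,  "GY": 0.13, "GL": 0.06,  "GYL": 1.10,  "residual": 2.42},
    "powdery mildew": {"G": 1.66,  "GY": 0.62, "residual": 1.03},
}


def plot_basis_heritability(vc):
    """Genotypic variance over the phenotypic variance of a single plot (assumed formula)."""
    phenotypic_variance = sum(vc.values())
    return vc["G"] / phenotypic_variance


heritability = {
    trait: plot_basis_heritability(vc) for trait, vc in VARIANCE_COMPONENTS.items()
}
for trait, h2 in heritability.items():
    print(f"{trait}: {h2:.2f}")
```

Location, year, and replicate terms are left out of the denominator because they do not involve the genotype; this choice is part of the assumption above.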
Allele frequency, SNP position, and mean allelic effect for trait-associated markers used as fixed effects for heading date and plant height
| Trait | Locus | Chromosome | Positionᵃ (Mbp) | Freq. of early/dwarf allele | Dwarf/early allele | Reference allele | Mean effectᵇ | Std deviationᵇ |
|---|---|---|---|---|---|---|---|---|
| Plant height (cm) | Rht-B1 | 4B | 30.86 | 0.20 | T | C | −3.78 | 0.14 |
| | Rht-D1 | 4D | 18.78 | 0.74 | T | G | −5.38 | 0.15 |
| Heading date (days) | Vrn-A1 | 5A | 587.42 | 0.21 | C | T | −1.37 | 0.14 |
| | Vrn-B1 | 5B | 573.81 | 0.18 | C | G | −1.28 | 0.20 |
| | Ppd-D1 | 2D | 33.96 | 0.63 | Deletion | Non-deleted | −0.91 | 0.14 |
ᵃ Position of SNP in the International Wheat Genome Sequencing Consortium (IWGSC) RefSeq1.0 assembly
ᵇ Calculated as the average marker effect estimated from 50 different training populations of size 350, selected by the Random method to predict GEBVs in each of the 50 validation sets, for a model that included all diagnostic markers associated with the trait simultaneously
Fig. 1 Scatter plot of the first two principal components from analysis of 467 winter wheat genotypes based on the full data set of 34,107 markers. Points are color coded according to the origin of genotypes: AR, University of Arkansas; FL, University of Florida; GA, University of Georgia; LA, Louisiana State University; NC, North Carolina State University; SC, Clemson University; VA, Virginia Tech; TX, Texas A&M AgriLife Research. Different shapes represent the number of copies of the allele of SNP marker IWA8068 located in the t2BS:2GS·2GL:2BL translocation from T. timopheevii. Percentages on each axis represent the proportion of variance explained by each principal component
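A principal component analysis of a marker matrix, of the kind summarized in Fig. 1, can be sketched as follows. This is only an illustration on simulated genotypes from two hypothetical subpopulations. The marker coding (0/1/2 allele counts), the column centering, and the use of a singular value decomposition are assumptions, not details taken from the study.

```python
import numpy as np

rng = np.random.default_rng(3)
n_lines, n_markers = 100, 1000          # illustrative sizes, not 467 x 34,107

# Two simulated subpopulations with different allele frequencies
freqs = rng.uniform(0.1, 0.9, size=(2, n_markers))
group = np.repeat([0, 1], n_lines // 2)
M = rng.binomial(2, freqs[group]).astype(float)   # allele counts 0/1/2

# Center each marker, then take the SVD of the centered matrix
Mc = M - M.mean(axis=0)
U, s, _ = np.linalg.svd(Mc, full_matrices=False)

scores = U[:, :2] * s[:2]               # coordinates on the first two PCs
explained = s**2 / np.sum(s**2)         # proportion of variance per PC

print(f"PC1: {100 * explained[0]:.1f}%  PC2: {100 * explained[1]:.1f}%")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `group`, gives the analogue of the scatter plot, with the two percentages used as axis labels.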
Fig. 2 Comparison of mean predictive ability (Mean Pred. Ability) for grain yield (a), test weight (b), heading date (c), plant height (d) and powdery mildew resistance (e) using four training population optimization methods: Clustering (weighted proportion of translocation t2BS:2GS·2GL:2BL in the training population and validation set), PEVmean (training population selected by minimization of the PEV mean in the validation set), PEVmean1 (training population selected by minimization of the PEV of each individual in the validation set) and Random (random training population selection). All methods were evaluated for seven different training population sizes. Error bars represent ± one standard error of the mean
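The evaluation scheme behind Fig. 2 for the Random method (repeatedly drawing a validation set and a training population of a given size, then scoring the predictions) can be sketched on simulated data. Two assumptions are made here that the text does not confirm: the prediction model is ridge regression on all markers, and predictive ability is the Pearson correlation between predicted and observed values in the validation set. All sizes, the penalty `lam`, and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_markers = 200, 500           # illustrative sizes only

# Simulated markers coded -1/0/1 and a simulated polygenic trait
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
true_effects = rng.normal(0.0, 0.1, n_markers)
y = X @ true_effects + rng.normal(0.0, 1.0, n_lines)


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on markers, solved in its n x n dual form."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    marker_effects = Xc.T @ alpha
    return mu + (X_test - col_means) @ marker_effects


def mean_predictive_ability(tp_size, n_cycles=20, n_valid=50, lam=100.0):
    """Average validation-set correlation over random training populations."""
    abilities = []
    for _ in range(n_cycles):
        order = rng.permutation(n_lines)
        valid = order[:n_valid]
        train = order[n_valid:n_valid + tp_size]
        pred = ridge_predict(X[train], y[train], X[valid], lam)
        abilities.append(np.corrcoef(pred, y[valid])[0, 1])
    return float(np.mean(abilities))


results = {size: mean_predictive_ability(size) for size in (50, 100, 150)}
for size, ability in results.items():
    print(f"TP{size:03d}: {ability:.2f}")
```

The validation set is drawn before the training population in each cycle, so the two never overlap.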
Mean predictive ability after 50 cycles of cross-validation using two TP optimization methods (Random and PEVmean) for heading date, plant height and powdery mildew resistance calculated and averaged across seven TP sizes. Genomic selection models used only phenotypic data available from the historical series or incorporated phenotypic data from a common environment (Raleigh 2016) along with the historical series
| | All locations | Historical data |
|---|---|---|
| Heading date | ||
| Random | 0.59** | 0.53 |
| PEVmean | 0.66** | 0.58 |
| Plant height | ||
| Random | 0.57 | 0.57 |
| PEVmean | 0.62 | 0.62 |
| Powdery mildew resistance | ||
| Random | 0.48 | 0.48 |
| PEVmean | 0.53 | 0.52 |
**Significant at α = 0.01
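The idea behind PEVmean, choosing the training population that minimizes the mean prediction error variance of the validation lines, can be illustrated with a small sketch. It assumes the standard mixed-model expression for PEV, the inverse of (Z'MZ + λK⁻¹) in units of the residual variance, where M absorbs the overall mean, K is a genomic relationship matrix, and λ is the ratio of residual to genetic variance. The random single-swap search is a hypothetical stand-in; the text does not say which optimizer or software the study used.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_markers = 60, 300            # illustrative sizes only

# Simulated markers coded -1/0/1 and a genomic relationship matrix K
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
Xc = X - X.mean(axis=0)
K = Xc @ Xc.T / n_markers + 1e-3 * np.eye(n_lines)   # small ridge keeps K invertible
K_inv = np.linalg.inv(K)


def mean_pev(train, valid, lam=1.0):
    """Mean PEV of the validation lines, in units of the residual variance."""
    train = np.asarray(train)
    n = len(train)
    C = np.zeros((n_lines, n_lines))
    # Z'MZ: identity on the training lines after absorbing the overall mean
    C[np.ix_(train, train)] = np.eye(n) - 1.0 / n
    pev = np.linalg.inv(C + lam * K_inv)
    return float(np.diag(pev)[valid].mean())


def exchange_search(candidates, valid, tp_size, n_iter=200):
    """Random single-swap search that keeps a swap only if mean PEV drops."""
    train = list(rng.choice(candidates, size=tp_size, replace=False))
    start = best = mean_pev(train, valid)
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in train]
        trial = train.copy()
        trial[rng.integers(tp_size)] = rng.choice(pool)
        score = mean_pev(trial, valid)
        if score < best:
            train, best = trial, score
    return train, start, best


order = rng.permutation(n_lines)
valid, candidates = order[:10], order[10:]
train, pev_random, pev_optimized = exchange_search(candidates, valid, tp_size=20)
print(f"random start: {pev_random:.3f}, after exchange: {pev_optimized:.3f}")
```

Because the criterion needs only marker data, a training population can be chosen this way before any phenotypes are collected for the validation lines.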
Comparison of mean predictive ability across 50 cycles of cross-validation for heading date, plant height and powdery mildew resistance between genomic selection models with no markers as fixed effects and models that add trait-associated markers as fixed effects. Analyses used the Random training population selection method and different training population sizes. Combinations of diagnostic markers for loci associated with heading date (Ppd-D1, Vrn-A1, Vrn-B1) and plant height (Rht-B1, Rht-D1) were used, while for powdery mildew resistance the most significant SNP detected in GWAS for each validation cycle was used as the fixed effect
| Training population size | TP050 | TP100 | TP150 | TP200 | TP250 | TP300 | TP350 |
|---|---|---|---|---|---|---|---|
| Heading date | |||||||
| No fixed marker | 0.46 | 0.53 | 0.56 | 0.61 | 0.63 | 0.67 | 0.68 |
| | 0.42* | 0.54 | 0.58 | 0.62 | 0.64 | 0.68 | 0.69 |
| | 0.51** | 0.58** | 0.60** | 0.63* | 0.65* | 0.68 | 0.69 |
| | 0.49 | 0.56 | 0.59 | 0.63 | 0.64 | 0.67 | 0.68 |
| | 0.51** | 0.59** | 0.62** | 0.65** | 0.67** | 0.70** | 0.70** |
| | 0.56** | 0.61** | 0.64** | 0.67** | 0.68** | 0.71** | 0.71** |
| Plant height | |||||||
| No fixed marker | 0.49 | 0.53 | 0.55 | 0.58 | 0.60 | 0.61 | 0.63 |
| | 0.47 | 0.52 | 0.55 | 0.58 | 0.60 | 0.61 | 0.62 |
| | 0.56** | 0.60** | 0.62** | 0.64** | 0.66** | 0.67** | 0.68** |
| | 0.59** | 0.64** | 0.67** | 0.69** | 0.71** | 0.72** | 0.73** |
| Powdery mildew resistance | |||||||
| No fixed marker | 0.36 | 0.44 | 0.48 | 0.51 | 0.52 | 0.53 | 0.55 |
| Most significant SNP | 0.42** | 0.50** | 0.53** | 0.56** | 0.56** | 0.57** | 0.60** |
*, **Significantly different from the no fixed marker model at α = 0.05 and α = 0.01, respectively
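The contrast in the table, a model with all markers shrunk together against a model that gives a major-gene marker its own unshrunk fixed effect, can be sketched on simulated data. The sketch assumes a ridge-type mixed model in which fixed effects are estimated by generalized least squares and the remaining markers are penalized. The simulated major locus, its effect of −4, the penalty `lam`, and all names are hypothetical and not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n_lines, n_markers = 300, 400           # illustrative sizes only

X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
major = rng.integers(0, 2, size=n_lines).astype(float)  # 0/1 allele at a hypothetical major locus
y = (-4.0 * major
     + X @ rng.normal(0.0, 0.1, n_markers)
     + rng.normal(0.0, 1.5, n_lines))


def fit_predict(W_tr, X_tr, y_tr, W_te, X_te, lam):
    """Fixed effects b by GLS, marker effects u by ridge-type shrinkage."""
    V = X_tr @ X_tr.T + lam * np.eye(len(y_tr))
    V_inv_W = np.linalg.solve(V, W_tr)
    V_inv_y = np.linalg.solve(V, y_tr)
    b = np.linalg.solve(W_tr.T @ V_inv_W, W_tr.T @ V_inv_y)
    u = X_tr.T @ np.linalg.solve(V, y_tr - W_tr @ b)
    return W_te @ b + X_te @ u


def compare(n_cycles=10, tp_size=200, n_valid=50, lam=225.0):
    ones = np.ones((n_lines, 1))
    W_fixed = np.column_stack([ones, major])     # intercept + major marker, unshrunk
    X_with_major = np.column_stack([X, major])   # major marker shrunk like the rest
    base, fixed = [], []
    for _ in range(n_cycles):
        order = rng.permutation(n_lines)
        va, tr = order[:n_valid], order[n_valid:n_valid + tp_size]
        p0 = fit_predict(ones[tr], X_with_major[tr], y[tr], ones[va], X_with_major[va], lam)
        p1 = fit_predict(W_fixed[tr], X[tr], y[tr], W_fixed[va], X[va], lam)
        base.append(np.corrcoef(p0, y[va])[0, 1])
        fixed.append(np.corrcoef(p1, y[va])[0, 1])
    return float(np.mean(base)), float(np.mean(fixed))


base_ability, fixed_ability = compare()
print(f"no fixed marker: {base_ability:.2f}, major marker fixed: {fixed_ability:.2f}")
```

In this simulation the gain comes from the large-effect locus no longer being shrunk toward zero along with the small-effect markers; how much a real marker helps depends on the size of its effect and its allele frequency.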