Jiayin Song, Brett F Carver, Carol Powers, Liuling Yan, Jaroslav Klápště, Yousry A El-Kassaby, Charles Chen.
Abstract
Crop improvement is a long-term, expensive institutional endeavor. Genomic selection (GS), which uses single nucleotide polymorphism (SNP) information to estimate genomic breeding values, has proven efficient in increasing genetic gain by accelerating the breeding process in animal breeding programs. In crop improvement, with few exceptions, GS applicability has remained at the stage of evaluating algorithm performance. In this study, we examined factors related to GS applicability at the line development stage for grain yield using a hard red winter wheat (Triticum aestivum L.) doubled-haploid population. The performance of GS in predicting grain yield was evaluated in two consecutive years. In general, the semi-parametric reproducing kernel Hilbert space (RKHS) prediction algorithm outperformed the parametric genomic best linear unbiased prediction (GBLUP). For both parametric and semi-parametric algorithms, an upward bias in predictability was apparent in within-year cross-validation, suggesting that cross-year validation is a prerequisite for a more reliable prediction. Adjusting the training population's phenotype for the genotype by environment effect had a positive impact on the GS model's predictive ability. Possibly due to marker redundancy, a selected subset of SNPs at an absolute pairwise correlation coefficient threshold value of 0.4 produced comparable results and reduced the computational burden of considering the full SNP set. Finally, in the context of an ongoing breeding and selection effort, the present study has provided a measure of confidence based on the deviation of line selection from GS results, supporting the implementation of GS in wheat variety development.
Keywords: Genomic best linear unbiased prediction; Genomic selection; Reproducing kernel Hilbert space regression; Single nucleotide polymorphism; Wheat
Year: 2017 PMID: 28936114 PMCID: PMC5582076 DOI: 10.1007/s11032-017-0715-8
Source DB: PubMed Journal: Mol Breed ISSN: 1380-3743 Impact factor: 2.589
Fig. 1 Comparison of two missing data imputation methods, EM and mean, based on the predictive ability from the GBLUP (above) and RKHS (below) cross-validation models (with SNP effect only) across a gradient of SNP call rate; TP, training population; the bandwidth parameter was set to 0.1 for all RKHS models
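The call-rate gradient and the mean-imputation arm of the comparison in Fig. 1 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `None` marker for a missing call, and the numeric SNP coding are assumptions made for illustration, and the EM imputation method is not sketched here.

```python
from statistics import mean


def call_rate(column):
    """Fraction of lines with a non-missing genotype call at one SNP."""
    return sum(g is not None for g in column) / len(column)


def filter_and_impute(genotypes, min_call_rate):
    """Keep SNPs whose call rate is >= min_call_rate, then mean-impute them.

    genotypes: list of rows (one per line); each row is a list of numeric
               SNP codes, with None marking a missing call (assumed coding).
    Returns (kept_column_indices, imputed_matrix).
    """
    n_snps = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_snps)]
    kept = []
    imputed_cols = []
    for j, col in enumerate(columns):
        observed = [g for g in col if g is not None]
        # Skip SNPs below the call-rate cutoff, and SNPs with no calls at all.
        if not observed or call_rate(col) < min_call_rate:
            continue
        col_mean = mean(observed)
        kept.append(j)
        imputed_cols.append([col_mean if g is None else g for g in col])
    # Transpose back to one row per line.
    imputed = [list(row) for row in zip(*imputed_cols)]
    return kept, imputed
```

Raising `min_call_rate` trades marker count for data completeness, which is the gradient along which Fig. 1 compares predictive ability.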
Best-performing models and the number of SNPs required
| Training population | Algorithm | Model | Predictive ability (± SE), prediction population 2014 | Predictive ability (± SE), prediction population 2015 | Number of SNPs (call rate), prediction population 2014 | Number of SNPs (call rate), prediction population 2015 |
|---|---|---|---|---|---|---|
| 2014 | GBLUP | G | 0.58 (±0.008) | 0.35 (±0.002) | 5726 (0.6) | 7260 (0.5) |
| 2014 | GBLUP | G + HD | 0.57 (±0.003) | 0.33 (±0.002) | 7260 (0.5) | 5726 (0.6) |
| 2014 | RKHS | G | 0.57 (±0.005) | 0.36 (±0.003) | 5726 (0.6) | 7260 (0.5) |
| 2014 | RKHS | G + HD | 0.65 (±0.005) | 0.42 (±0.003) | 7260 (0.5) | 5726 (0.6) |
| 2015 | GBLUP | G | 0.37 (±0.003) | 0.57 (±0.007) | 9244 (0.4) | 4010 (0.75) |
| 2015 | GBLUP | G + HD | 0.38 (±0.002) | 0.55 (±0.004) | 7260 (0.5) | 5726 (0.6) |
| 2015 | GBLUP | G + rust (RI5) | 0.37 (±0.003) | 0.56 (±0.006) | 4010 (0.75) | 5726 (0.6) |
| 2015 | GBLUP | G + HD + rust (RI5) | 0.37 (±0.006) | 0.53 (±0.006) | 7260 (0.5) | 4010 (0.75) |
| 2015 | RKHS | G | 0.39 (±0.003) | 0.60 (±0.004) | 7260 (0.5) | 4010 (0.75) |
| 2015 | RKHS | G + HD | 0.39 (±0.003) | 0.61 (±0.003) | 5726 (0.6) | 5726 (0.6) |
| 2015 | RKHS | G + rust (RI5) | 0.40 (±0.001) | 0.68 (±0.003) | 7260 (0.5) | 5726 (0.6) |
| 2015 | RKHS | G + HD + rust (RI5) | 0.39 (±0.002) | 0.70 (±0.003) | 7260 (0.5) | 5726 (0.6) |
| BLUP | GBLUP | G | 0.48 (±0.003) | 0.52 (±0.006) | 5726 (0.6) | 5726 (0.6) |
| BLUP | RKHS | G | 0.50 (±0.003) | 0.56 (±0.006) | 5726 (0.6) | 7260 (0.5) |
SE standard error
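The two algorithms compared in the table differ mainly in how they measure similarity between lines: GBLUP uses a linear (parametric) genomic relationship, whereas RKHS uses a non-linear kernel controlled by a bandwidth parameter. The sketch below contrasts the two kernels under stated assumptions. The simplified centered cross-product for the linear kernel, the Gaussian kernel form, and the scaling of squared distance by the number of SNPs are illustrative choices, not necessarily those used in the study; only the bandwidth value of 0.1 comes from the Fig. 1 caption.

```python
from math import exp


def linear_kernel(markers):
    """Simplified GBLUP-style relationship matrix (assumed form).

    Markers are centered per SNP and the cross-products are divided by the
    number of SNPs. Published GBLUP variants often scale by allele
    frequencies instead; that refinement is omitted here.
    """
    n, p = len(markers), len(markers[0])
    means = [sum(row[j] for row in markers) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in markers]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / p for rj in centered]
        for ri in centered
    ]


def gaussian_kernel(markers, bandwidth=0.1):
    """RKHS-style Gaussian kernel (assumed form).

    K[i][j] = exp(-bandwidth * d2 / p), where d2 is the squared Euclidean
    distance between lines i and j and p is the number of SNPs.
    """
    p = len(markers[0])

    def squared_distance(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return [
        [exp(-bandwidth * squared_distance(ri, rj) / p) for rj in markers]
        for ri in markers
    ]
```

Either matrix would then serve as the covariance structure of the genetic effect in a mixed model or kernel regression; the model-fitting step is outside the scope of this sketch.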
Number of SNP markers within each correlation group based on the whole population data
| Absolute pairwise correlation threshold (t) | Full set | 1.0 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of SNPs | 5726 | 4976 | 4506 | 4241 | 3883 | 3338 | 2595 | 1473 | 267 | 27 |
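One way to obtain marker subsets like those counted above is a greedy filter that keeps a SNP only if its absolute Pearson correlation with every SNP already kept is below the threshold. This is a sketch under assumptions: the source does not describe the selection procedure, so the greedy order, the strict inequality, and the function names are hypothetical. The strict inequality is at least consistent with the table, where a threshold of 1.0 already removes markers from the full set.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic marker: correlation is undefined, treated as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_by_correlation(snp_columns, threshold):
    """Greedy pruning (assumed procedure, not taken from the source).

    snp_columns: list of SNP vectors, one list of genotype codes per SNP.
    Returns the indices of SNPs kept, in input order.
    """
    kept = []
    for j, col in enumerate(snp_columns):
        # Keep SNP j only if it is weakly correlated with all kept SNPs.
        if all(abs(pearson(col, snp_columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept
```

The comparison cost grows with the number of kept SNPs times the number of candidates, so for thousands of markers a vectorized correlation matrix would be preferable in practice.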
Fig. 2 Predictive ability from the best within-year cross-validation model (within: year 2015 RKHS model with the marker effect and both heading date and disease index as covariates) and the best cross-year prediction model (cross: year 2014 predicting 2015 RKHS model with the marker effect and heading date as covariate) across subsets of markers filtered by absolute pairwise correlation threshold (t)
Fig. 3 Trend of average ranking distance over the number of individuals from models that do and do not consider the G × E effect. The analysis started with the ranking distance of the best-performing individual in the year 2014 and proceeded by adding the next best individual's ranking distance and taking the mean until all individuals in the population were included (18 individuals were removed from the G × E models due to missing replicates)
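The running mean described in the Fig. 3 caption can be sketched as follows. The caption does not define ranking distance, so this sketch assumes it is the absolute difference between an individual's rank by observed performance and its rank by GS prediction; that definition and the function names are hypothetical.

```python
def ranks(values):
    """Rank 1 = highest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    rank = [0] * len(values)
    for position, i in enumerate(order, start=1):
        rank[i] = position
    return rank


def average_ranking_distance_trend(observed, predicted):
    """Running mean of ranking distance, from best to worst observed line.

    Ranking distance is assumed to be |observed rank - predicted rank|.
    Element k of the result is the mean distance over the k + 1 best
    individuals by observed performance.
    """
    obs_rank = ranks(observed)
    pred_rank = ranks(predicted)
    distance = [abs(o - p) for o, p in zip(obs_rank, pred_rank)]
    # Visit individuals in order of observed rank (best first).
    order = sorted(range(len(observed)), key=lambda i: obs_rank[i])
    trend = []
    running_total = 0
    for count, i in enumerate(order, start=1):
        running_total += distance[i]
        trend.append(running_total / count)
    return trend
```

Under this assumed definition, a trend that stays low for the first few individuals would indicate that the GS model ranks the top-performing lines close to their observed positions.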