Stefan McKinnon Edwards¹, Jaap B Buntjer¹, Robert Jackson², Alison R Bentley², Jacob Lage³, Ed Byrne³, Chris Burt⁴, Peter Jack⁴, Simon Berry⁵, Edward Flatman⁵, Bruno Poupard⁵, Stephen Smith⁶, Charlotte Hayes⁶, R Chris Gaynor¹, Gregor Gorjanc¹, Phil Howell², Eric Ober², Ian J Mackay⁷, John M Hickey⁸.
Abstract
Genomic selection offers several routes for increasing the genetic gain or efficiency of plant breeding programmes. In various species of livestock, there is empirical evidence of increased rates of genetic gain from the use of genomic selection to target different aspects of the breeder's equation. Accurate predictions of genomic breeding value are central to this, and the design of training sets is in turn central to achieving sufficient levels of accuracy. In general, both small numbers of close relatives and very large numbers of distant relatives are expected to enable more accurate predictions. To quantify the effect of some properties of training sets on the accuracy of genomic selection in crops, we performed an extensive field-based winter wheat trial. The trial involved the construction of 44 F2:4 bi- and tri-parental populations, from which 2992 lines were grown at four field locations and measured for yield. For each line, genotype data were generated for 25K segregating SNP markers. The overall heritability of yield was estimated at 0.65, and estimates within individual families ranged from 0.10 to 0.85. Genomic prediction accuracies of yield BLUEs were 0.125–0.127 using two different cross-validation approaches and generally increased with training set size. Using related crosses in training and validation sets generally resulted in higher prediction accuracies than using unrelated crosses. The results of this study emphasise the importance of designing the training panel in relation to the genetic material to which the resulting prediction model is to be applied.
Year: 2019 PMID: 30888431 PMCID: PMC6588656 DOI: 10.1007/s00122-019-03327-y
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Trial design summary showing number of plots per tested line per location
| No. of lines | Cambridge 2015/2016 | Duxford 2015/2016 | Duxford 2016/2017 | Hinxton 2016/2017 |
|---|---|---|---|---|
| 367 | 2 | 1 | 1 | 0 |
| 381 | 2 | 1 | 0 | 1 |
| 381 | 1 | 2 | 1 | 0 |
| 367 | 1 | 2 | 0 | 1 |
| 748 | 1 | 1 | 1 | 1 |
| 748 | 0 | 0 | 2 | 2 |
Summary of line means per location after adjusting for spatial effects
| Year | Location | No. of lines | Avg. value | Coef. variation (%) | Correlation^a |
|---|---|---|---|---|---|
| 2016 | Cambridge | 2247 | 8.58 | 6.1 | 0.63 |
| 2016 | Duxford | 2248 | 10.82 | 6.3 | 0.81 |
| 2017 | Hinxton | 2249 | 4.64 | 10.3 | 0.71 |
| 2017 | Duxford | 2235 | 8.24 | 6.6 | 0.62 |
^a Correlation between moisture-corrected yield values and spatially adjusted values
Fig. 1 Resampling strategies applied to assess the impact of training set design. The leave-one-cross-out strategy (left) tests the impact of the number of crosses included as well as training set size, while tenfold cross-validation (right) tests training set size only
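The fold construction behind these resampling strategies can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: each line is assumed to carry a cross label, the function names `leave_one_cross_out` and `tenfold` and the `seed` parameter are hypothetical, and the subsampling used to vary training set size and the number of crosses is omitted. With `by_cross=True`, whole crosses are assigned to folds; with `by_cross=False`, individual lines are assigned, which corresponds to the "random" tenfold variant in which sets are grouped by lines instead of crosses.

```python
import random


def leave_one_cross_out(cross_of):
    """Yield (train, validation) index lists, holding out one whole cross each time.

    cross_of[i] is the (hypothetical) cross label of line i.
    """
    for held_out in sorted(set(cross_of)):
        val = [i for i, c in enumerate(cross_of) if c == held_out]
        train = [i for i, c in enumerate(cross_of) if c != held_out]
        yield train, val


def tenfold(cross_of, by_cross=True, k=10, seed=1):
    """Yield up to k (train, validation) splits.

    by_cross=True:  whole crosses are assigned to folds, so no cross appears in
                    both the training and the validation set.
    by_cross=False: individual lines are assigned to folds at random, so most
                    crosses contribute lines to both sets.
    """
    rng = random.Random(seed)
    units = sorted(set(cross_of)) if by_cross else list(range(len(cross_of)))
    rng.shuffle(units)
    # Deal the shuffled units round-robin into k folds with near-equal unit counts.
    fold_of = {u: j % k for j, u in enumerate(units)}
    for f in range(k):
        val = [i for i, c in enumerate(cross_of)
               if fold_of[c if by_cross else i] == f]
        if not val:  # happens only when there are fewer units than folds
            continue
        val_set = set(val)
        train = [i for i in range(len(cross_of)) if i not in val_set]
        yield train, val


# Hypothetical toy data: 12 lines from three crosses.
cross_of = ["A"] * 4 + ["B"] * 3 + ["C"] * 5
for train, val in leave_one_cross_out(cross_of):
    print("validate on cross", cross_of[val[0]], "with", len(train), "training lines")
```

Grouping by cross mimics predicting a new family that has no phenotyped lines in the training set, whereas grouping by line lets other lines from the same cross inform the model for each validation line.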
Fig. 2 Yield heritabilities estimated per cross. Crosses (blue bars) are ordered by heritability; the overall heritability for this trait is shown in red
Prediction accuracies using the largest training sets, by cross-validation approach
| Cross-validation approach | Correlation metric | Training set size | Correlation^a (IQR) | Bi-/tri-parental^d (IQR) |
|---|---|---|---|---|
| Leave-one-cross-out | By cross | 2787 | 0.127 (0.222) | 0.12 (0.20) / 0.20 (0.11) |
| Tenfold, crosses | By cross | 2563 | 0.125 (0.193) | 0.11 (0.20) / 0.20 (0.08) |
| Tenfold, random^b | By cross | 2567 | 0.142 (0.195) | 0.12 (0.17) / 0.24 (0.09) |
| Tenfold, crosses | Across all^c | 2567 | 0.289 (0.259) | N/A |
| Tenfold, random^b | Across all^c | 2567 | 0.543 (0.009) | N/A |
^a Average across all replicates; values in parentheses give the interquartile range of the correlations
^b Tenfold cross-validation in which validation and training sets were grouped by lines instead of crosses
^c Correlations were calculated across multiple crosses in the validation set
^d As for (a), but averaged separately over bi-parental and tri-parental crosses
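The gap between the "By cross" and "Across all" correlation metrics can be illustrated with a small sketch. It assumes, as the table footnotes suggest, that "By cross" averages Pearson correlations computed within each validation cross, whereas "Across all" pools lines from several crosses into a single correlation. The data below are invented for illustration only, and `accuracy_by_cross` and `accuracy_across_all` are hypothetical helper names. When crosses differ in mean yield and the predictions capture those differences, the pooled correlation can be high even if the predictions rank lines within a cross no better than chance; this is one plausible contributor to the larger "Across all" values, not an explanation confirmed by the source.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_cross(pred, obs, cross_of):
    """Mean of the within-cross correlations (hypothetical 'By cross' metric)."""
    groups = defaultdict(lambda: ([], []))
    for p, o, c in zip(pred, obs, cross_of):
        groups[c][0].append(p)
        groups[c][1].append(o)
    # Skip crosses too small for a meaningful correlation.
    rs = [pearson(p, o) for p, o in groups.values() if len(p) > 2]
    return sum(rs) / len(rs)


def accuracy_across_all(pred, obs):
    """One correlation over all pooled validation lines (hypothetical 'Across all' metric)."""
    return pearson(pred, obs)


# Invented example: two crosses with different mean yields. Within each cross the
# predictions are uncorrelated with the observations; only the cross means differ.
cross_of = ["A"] * 4 + ["B"] * 4
pred = [1.0, 1.1, 1.2, 1.3, 3.0, 3.1, 3.2, 3.3]
obs = [5.0, 4.0, 4.0, 5.0, 9.0, 8.0, 8.0, 9.0]

print("by cross:  ", round(accuracy_by_cross(pred, obs, cross_of), 3))
print("across all:", round(accuracy_across_all(pred, obs), 3))
```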
Fig. 3 Increasing training set size increased prediction accuracy (correlation). The solid line shows the average of all leave-one-cross-out cross-validations, with the range between the 10th and 90th percentiles shown by the grey area
Fig. 4 Prediction accuracies increased with the number of crosses or the number of lines per cross in the training set. Numbers on the right show the number of lines per cross in the training set
Fig. 5 Prediction accuracies increased when the validation cross was partly in the training set or when its related crosses were in the training set. Results show average prediction accuracies for six validation crosses. Lines show prediction accuracies when the training set comprised related crosses (green solid line), unrelated crosses (purple line) or a mix of both (blue line). The lower set of lines shows prediction accuracies when the validation cross was not included in the training set; the upper set shows prediction accuracies when 3/4 of the lines of the validation cross were included in the training set. Grey horizontal lines show the average prediction accuracy when only 1/4, 2/4 or 3/4 of the validation cross was used as the training set. The inset shows the increase in accuracy when adding 1/4, 2/4 and 3/4 of the validation cross to the training set; the thick lines in the inset correspond to the lines of the main figure (color figure online)