Hans-Jürgen Auinger, Christina Lehermeier, Daniel Gianola, Manfred Mayer, Albrecht E Melchinger, Sofia da Silva, Carsten Knaak, Milena Ouzunova, Chris-Carolin Schön.
Abstract
KEY MESSAGE: Model training on data from all selection cycles yielded the highest prediction accuracy by attenuating specific effects of individual cycles. Expected reliability was a robust predictor of accuracies obtained with different calibration sets.

The transition from phenotypic to genome-based selection requires a profound understanding of factors that determine genomic prediction accuracy. We analysed experimental data from a commercial maize breeding programme to investigate if genomic measures can assist in identifying optimal calibration sets for model training. The data set consisted of six contiguous selection cycles comprising testcrosses of 5968 doubled haploid lines genotyped with a minimum of 12,000 SNP markers. We evaluated genomic prediction accuracies in two independent prediction sets in combination with calibration sets differing in sample size and genomic measures (effective sample size, average maximum kinship, expected reliability, number of common polymorphic SNPs and linkage phase similarity).

Our results indicate that across selection cycles prediction accuracies were as high as 0.57 for grain dry matter yield and 0.76 for grain dry matter content. Including data from all selection cycles in model training yielded the best results because interactions between calibration and prediction sets as well as the effects of different testers and specific years were attenuated. Among genomic measures, the expected reliability of genomic breeding values was the best predictor of empirical accuracies obtained with different calibration sets. For grain yield, a large difference between expected and empirical reliability was observed in one prediction set. We propose to use this difference as guidance for determining the weight phenotypic data of a given selection cycle should receive in model retraining and for selection when both genomic breeding values and phenotypes are available.
Year: 2021 PMID: 34117908 PMCID: PMC8354938 DOI: 10.1007/s00122-021-03880-5
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Description of data sets S1 to S6 tested in the years 2010 to 2015, respectively. Given are the sample size (N), the number of parents and crosses from which DH lines were derived, the median [minimum–maximum] number of DH lines per parent and cross, the number of locations and the number of testers used for evaluating each data set
| Data set | N | No. of parents | No. of crosses | Lines per parent, median [min–max] | Lines per cross, median [min–max] | No. of locationsᵃ | No. of testers |
|---|---|---|---|---|---|---|---|
| S1 | 928 | 52 | 173 | 21 [1–203] | 3 [1–63] | 6 (4) | 2 |
| S2 | 842 | 73 | 287 | 12 [1–129] | 2 [1–26] | 6 (3.4) | 2 |
| S3 | 1085 | 148 | 246 | 6 [1–115] | 1 [1–28] | 7 (4.5) | 4 |
| S4 | 1017 | 58 | 130 | 13 [1–455] | 4 [1–47] | 6 | 1 |
| S5 | 1545 | 145 | 607 | 5 [1–62] | 2 [1–31] | 5 | 1 |
| S6 | 551 | 36 | 228 | 30 [2–82] | 2 [1–6] | 5 | 1 |
ᵃ If DH lines were not tested in all locations, numbers in parentheses indicate the average
Mean, minimum and maximum of BLUEs, variance components and heritabilities for traits grain dry matter yield (GDY) and grain dry matter content (GDC) for data sets S1 to S6
| Trait | Set | Mean | Minimum | Maximum | Genotypic variance | G × L + residual varianceᵃ | Heritability |
|---|---|---|---|---|---|---|---|
| GDY | S1 | 128 | 95 | 146 | 35.6 | 24.3 | 0.85 |
| GDY | S2 | 144 | 111 | 163 | 43.4 | 41.1 | 0.78 |
| GDY | S3 | 142 | 113 | 163 | 16.9 | 47.6 | 0.75 |
| GDY | S4 | 120 | 97 | 136 | 12.9 | 61.4 | 0.56 |
| GDY | S5 | 144 | 110 | 168 | 52.8 | 87.2 | 0.74 |
| GDY | S6 | 124 | 87 | 143 | 20.7 | 93.7 | 0.52 |
| GDC | S1 | 69 | 65 | 74 | 1.20 | 0.20 | 0.96 |
| GDC | S2 | 72 | 66 | 77 | 1.92 | 0.19 | 0.97 |
| GDC | S3 | 70 | 66 | 75 | 0.80 | 2.27 | 0.75 |
| GDC | S4 | 69 | 66 | 73 | 1.04 | 0.53 | 0.92 |
| GDC | S5 | 70 | 67 | 73 | 0.70 | 0.44 | 0.88 |
| GDC | S6 | 69 | 66 | 72 | 0.88 | 0.61 | 0.88 |
ᵃ Variance component represents the genotype × location and the residual variance
Fig. 1 Principal coordinate analysis of pairwise realised kinship coefficients of 5968 DH lines. DH lines are coloured according to their grouping in data sets. Axis labels show the percentage of variance explained by the coordinate
Mean and range of prediction accuracy (r), effective sample size (Neff) of calibration sets, number of polymorphic SNPs shared by the calibration and prediction set (nPoly), average maximum kinship (umax), linkage phase similarity (LPS) and trait-specific reliability (ρ²) for prediction sets S5 and S6 in combination with all possible calibration sets (15 for S5, 31 for S6)
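The counts of 15 and 31 calibration sets are what one obtains when every non-empty combination of the data sets preceding the prediction set is used for training (2^4 − 1 and 2^5 − 1). The following minimal Python sketch enumerates these combinations under that assumption. The function name `calibration_sets` is hypothetical, the set labels are patterned on the label S_1_2_3_4_5 used in the Fig. 3 caption, and the sample sizes are the per-data-set values from the first table, read here as N because they sum to 5968.

```python
from itertools import combinations

# Sample sizes of data sets S1..S6 (assumed to be the N column of the first table).
sample_size = {1: 928, 2: 842, 3: 1085, 4: 1017, 5: 1545, 6: 551}


def calibration_sets(prediction_cycle):
    """List (label, combined sample size) for every non-empty combination
    of the cycles that precede the given prediction cycle."""
    earlier = range(1, prediction_cycle)
    result = []
    for size in range(1, len(earlier) + 1):
        for combo in combinations(earlier, size):
            label = "S_" + "_".join(str(c) for c in combo)
            n = sum(sample_size[c] for c in combo)
            result.append((label, n))
    return result


sets_s5 = calibration_sets(5)  # candidates for predicting S5
sets_s6 = calibration_sets(6)  # candidates for predicting S6
print(len(sets_s5), len(sets_s6), len(sets_s5) + len(sets_s6))
print(sets_s6[-1])  # the largest combined calibration set
```

The two lists together contain 46 combinations of calibration and prediction set, and the last entry for S6 is the combined set of all five earlier cycles.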
Pairwise correlations between sample size (N), the genomic measures effective sample size (Neff), number of polymorphic SNPs shared by the calibration and prediction set (nPoly), average maximum kinship (umax), linkage phase similarity (LPS), expected trait-specific reliability (ρ²) and empirical trait-specific prediction accuracy (r). In the upper triangle, values are based on combinations of 15 calibration sets with S5 as the prediction set; in the lower triangle, values are based on combinations of 31 calibration sets with S6 as the prediction set
Fig. 2 Prediction accuracies for grain dry matter yield (GDY) and grain dry matter content (GDC) for prediction set S5 (orange) and S6 (grey)
Fig. 3 Prediction accuracy for grain dry matter yield (GDY) and grain dry matter content (GDC) as a function of sample size assessed by repeated sampling from combined calibration set S_1_2_3_4_5
Fig. 4 Relationship of prediction accuracy for grain yield and sample size (N), effective sample size (Neff), average maximum kinship (umax), reliability (ρ²), number of polymorphic SNPs shared by the calibration and prediction set (nPoly) and linkage phase similarity (LPS) for 15 calibration sets predicting genomic breeding values (GBVs) in S5 (orange) and 31 calibration sets predicting GBVs in S6 (grey)
Regression analysis of prediction accuracy for grain dry matter yield (GDY) and grain dry matter content (GDC) on genomic measures characterising the 46 possible combinations of calibration and prediction sets. Significance (p-value), Akaike information criterion (AIC) and explained variance (R²adj) are given for models fitting sample size (N), effective sample size (Neff), number of polymorphic SNPs shared by the calibration and prediction set (nPoly), average maximum kinship (umax), linkage phase similarity (LPS) and trait-specific reliability (ρ²) in combination with the affiliation to the prediction set (PS) as covariates. The last row presents results from the best model selected by stepwise regression
| GDY model | R²adj | p-value | AIC | GDC model | R²adj | p-value | AIC |
|---|---|---|---|---|---|---|---|
| PS | 0.54 | | −225 | PS | 0.06 | | −271 |
| PS + N | 0.74 | 5.9E−07 | −250 | PS + N | 0.50 | 1.2E−07 | −299 |
| PS + Neff | 0.70 | 1.4E−05 | −242 | PS + Neff | 0.23 | 2.5E−03 | −307 |
| PS + nPoly | 0.63 | 1.2E−03 | −244 | PS + nPoly | 0.23 | 2.0E−03 | −279 |
| PS + umax | 0.69 | 3.1E−05 | −235 | PS + umax | 0.59 | 2.4E−09 | −279 |
| PS + LPS | 0.73 | 1.7E−06 | −248 | PS + LPS | 0.71 | 8.4E−13 | −324 |
| PS + ρ² | 0.80 | 1.0E−07 | −263 | PS + ρ² | 0.75 | 4.7E−14 | −330 |
| PS + nPoly + | 0.81 | | −264 | PS + | 0.84 | | −347 |
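The kind of model comparison summarised in this table can be illustrated with a small Python sketch that regresses accuracy on the prediction-set indicator alone and on the indicator plus one genomic measure, then compares adjusted R² and AIC. Everything in it is an assumption made for illustration: the eight data points are synthetic toy values and not the study's results, the helper names (`ols_fit`, `adjusted_r2`, `aic`) are hypothetical, and the AIC is computed as n·log(RSS/n) + 2k with additive constants dropped, which may differ from the convention used by the authors.

```python
import math


def ols_fit(X, y):
    """Ordinary least squares via the normal equations.
    Returns the coefficient list and the residual sum of squares."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
              for r in range(n))
    return beta, rss


def adjusted_r2(rss, y, k):
    """Adjusted R² for a model with k coefficients (intercept included)."""
    n = len(y)
    mean = sum(y) / n
    tss = sum((v - mean) ** 2 for v in y)
    return 1 - (rss / (n - k)) / (tss / (n - 1))


def aic(rss, n, k):
    """AIC up to an additive constant (assumed convention)."""
    return n * math.log(rss / n) + 2 * k


# Synthetic toy data: prediction-set indicator, one genomic measure, accuracy.
ps = [0, 0, 0, 0, 1, 1, 1, 1]
measure = [0.20, 0.30, 0.40, 0.50, 0.15, 0.25, 0.35, 0.45]
accuracy = [0.42, 0.47, 0.55, 0.58, 0.25, 0.33, 0.36, 0.44]

X_ps = [[1.0, p] for p in ps]                           # accuracy ~ PS
X_full = [[1.0, p, m] for p, m in zip(ps, measure)]     # accuracy ~ PS + measure

beta_ps, rss_ps = ols_fit(X_ps, accuracy)
beta_full, rss_full = ols_fit(X_full, accuracy)
n = len(accuracy)

for name, rss, k in [("PS", rss_ps, 2), ("PS + measure", rss_full, 3)]:
    print(name, round(adjusted_r2(rss, accuracy, k), 2), round(aic(rss, n, k), 1))
```

In this toy example the model that includes the genomic measure has the higher adjusted R² and the lower AIC, which is the pattern the table uses to rank the measures.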