Amber Hoffstetter, Antonio Cabrera, Mao Huang, Clay Sneller.
Abstract
Genomic selection (GS) is a breeding tool that estimates breeding values (GEBVs) of individuals based solely on marker data by using a model built using phenotypic and marker data from a training population (TP). The effectiveness of GS increases as the correlation of GEBVs and phenotypes (accuracy) increases. Using phenotypic and genotypic data from a TP of 470 soft winter wheat lines, we assessed the accuracy of GS for grain yield, Fusarium Head Blight (FHB) resistance, softness equivalence (SE), and flour yield (FY). Four TP data sampling schemes were tested: (1) use all TP data, (2) use subsets of TP lines with low genotype-by-environment interaction, (3) use subsets of markers significantly associated with quantitative trait loci (QTL), and (4) a combination of 2 and 3. We also correlated the phenotypes of relatives of the TP to their GEBVs calculated from TP data. The GS accuracy within the TP using all TP data ranged from 0.35 (FHB) to 0.62 (FY). On average, the accuracy of GS from using subsets of data increased by 54% relative to using all TP data. Using subsets of markers selected for significant association with the target trait had the greatest impact on GS accuracy. Between-environment prediction accuracy was also increased by using data subsets. The accuracy of GS when predicting the phenotypes of TP relatives ranged from 0.00 to 0.85. These results suggest that GS could be useful for these traits and GS accuracy can be greatly improved by using subsets of TP data.
Keywords: GenPred; breeding values; genomic selection; shared data resource; soft wheat
Year: 2016 PMID: 27440921 PMCID: PMC5015948 DOI: 10.1534/g3.116.032532
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Accuracy of genomic selection for grain yield (all environments, GYA; Wooster Ohio, GYW; Northwest Ohio, GYN), Fusarium Head Blight resistance (FHB), softness equivalence (SE), and flour yield (FY) using either all TP data (470 lines and 33,169 markers) or subsets of lines (n < 470) chosen for low genotype-by-environment interactions, and subsets of marker data (M2–M6) chosen based on significance criteria (see Materials and Methods)
| Trait | M1 | M2 | M3 | M4 | M5 | M6 |
|---|---|---|---|---|---|---|
| No. markers for GYA, GYW, GYN | 33169 | 2902 | 2524 | 664 | 293 | 362 |
| GYA, | 0.45 | 0.77 | 0.76 | 0.73 | 0.74 | 0.73 |
| GYA, | 0.43 | 0.82 | 0.82 | 0.81 | 0.79 | 0.79 |
| GYW, | 0.57 | 0.79 | 0.79 | 0.77 | 0.77 | 0.75 |
| GYW, | 0.35 | 0.60 | 0.60 | 0.57 | 0.55 | 0.57 |
| GYN, | 0.41 | 0.36 | 0.36 | 0.27 | 0.21 | 0.28 |
| GYN, | 0.33 | 0.77 | 0.73 | 0.73 | 0.70 | 0.59 |
| GYW to predict GYN, | −0.07 | −0.10 | −0.09 | −0.08 | −0.06 | −0.08 |
| GYW to predict GYN, | 0.13 | 0.31 | 0.32 | 0.33 | 0.35 | 0.37 |
| GYN to predict GYW, | −0.10 | −0.16 | −0.15 | −0.15 | −0.14 | −0.15 |
| GYN to predict GYW, | 0.13 | 0.28 | 0.29 | 0.29 | 0.30 | 0.37 |
| No. markers for FHB | 33169 | 1556 | 1031 | 286 | 134 | — |
| FHB, | 0.35 | 0.64 | 0.62 | 0.62 | 0.58 | — |
| FHB, | 0.37 | 0.81 | 0.79 | 0.78 | 0.72 | — |
| 2010 FHB to predict 2011 FHB, | 0.13 | 0.17 | 0.17 | 0.17 | 0.21 | — |
| 2010 FHB to predict 2011 FHB, | 0.13 | 0.17 | 0.18 | 0.16 | 0.20 | — |
| 2011 FHB to predict 2010 FHB, | 0.15 | 0.32 | 0.31 | 0.32 | 0.36 | — |
| 2011 FHB to predict 2010 FHB, | 0.16 | 0.38 | 0.37 | 0.38 | 0.42 | — |
| No. markers for SE | 33169 | 1672 | 1133 | 330 | 151 | — |
| SE, | 0.51 | 0.87 | 0.85 | 0.83 | 0.80 | — |
| SE, | 0.51 | 0.89 | 0.87 | 0.85 | 0.82 | — |
| 2010 SE to predict 2011 SE, | 0.33 | 0.57 | 0.53 | 0.59 | 0.59 | — |
| 2010 SE to predict 2011 SE, | 0.32 | 0.58 | 0.53 | 0.59 | 0.50 | — |
| 2011 SE to predict 2010 SE, | 0.24 | 0.32 | 0.30 | 0.33 | 0.37 | — |
| 2011 SE to predict 2010 SE, | 0.24 | 0.32 | 0.31 | 0.33 | 0.26 | — |
| No. markers for FY | 33169 | 1632 | 968 | 316 | 166 | — |
| FY, | 0.62 | 0.91 | 0.88 | 0.87 | 0.84 | — |
| FY, | 0.62 | 0.91 | 0.88 | 0.87 | 0.84 | — |
| 2010 FY to predict 2011 FY, | 0.56 | 0.70 | 0.68 | 0.70 | 0.70 | — |
| 2010 FY to predict 2011 FY, | 0.59 | 0.74 | 0.72 | 0.74 | 0.74 | — |
| 2011 FY to predict 2010 FY, | 0.47 | 0.49 | 0.49 | 0.49 | 0.49 | — |
| 2011 FY to predict 2010 FY, | 0.48 | 0.50 | 0.50 | 0.50 | 0.50 | — |
Accuracy is shown for the trait itself using cross-validation and when using data from one set of environments to predict the phenotype from the other set of environments. Also shown is the number of markers in each marker subset for each trait.
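As a rough illustration of how the effect of a marker subset can be read off the table above, the following sketch computes the percentage change in accuracy from M1 (all 33,169 markers) to M2 for the first within-trait row of each trait. The choice of rows and the helper name `relative_gain` are assumptions made for this example. The abstract's 54% figure is an average whose exact composition is not given here, so these per-trait numbers are not expected to match it.

```python
# Within-trait cross-validation accuracies copied from the first row listed
# for each trait in the table above (M1 = all markers, M2 = first marker subset).
ACCURACY = {
    "GYA": {"M1": 0.45, "M2": 0.77},
    "FHB": {"M1": 0.35, "M2": 0.64},
    "SE": {"M1": 0.51, "M2": 0.87},
    "FY": {"M1": 0.62, "M2": 0.91},
}


def relative_gain(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline accuracy must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Per-trait gain from restricting the model to the M2 marker subset.
gains = {
    trait: relative_gain(acc["M1"], acc["M2"]) for trait, acc in ACCURACY.items()
}

for trait, gain in gains.items():
    print(f"{trait}: {gain:.1f}% change in accuracy from M1 to M2")
```

The same helper can be applied to any other pair of cells in the table, for example the between-environment rows.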
Genomic selection accuracy (r), standard deviation of accuracy (σ), relative efficiency per cycle (REc), and relative efficiency per year (REy) of three genomic selection models using a TP of 470 wheat lines and 10-fold cross-validation for four traits: grain yield (over all six environments, GYA; at the four Northwest Ohio environments, GYN; at the two Wooster, Ohio environments, GYW), flour yield (FY), softness equivalence (SE), and Fusarium Head Blight index (FHB)
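| Trait | Ridge-Regression BLUP r | Ridge-Regression BLUP σ | Ridge-Regression BLUP REc | Ridge-Regression BLUP REy | Random Forest r | Random Forest σ | Random Forest REc | Random Forest REy | Bayesian LASSO r | Bayesian LASSO σ | Bayesian LASSO REc | Bayesian LASSO REy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|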
| GYA | 0.45 | 0.01 | 0.58 | 4.1 | 0.48 | 0.01 | 0.62 | 4.3 | 0.12 | 0.02 | 0.15 | 1.1 |
| GYN | 0.41 | 0.01 | 0.58 | 4.1 | 0.42 | 0.01 | 0.59 | 4.1 | 0.14 | 0.02 | 0.20 | 1.4 |
| GYW | 0.57 | 0.01 | 0.67 | 4.7 | 0.57 | 0.01 | 0.69 | 4.8 | 0.57 | 0.01 | 0.68 | 4.8 |
| FY | 0.62 | 0.01 | 0.67 | 3.4 | 0.63 | 0.01 | 0.68 | 3.4 | 0.22 | 0.01 | 0.24 | 1.2 |
| SE | 0.51 | 0.01 | 0.53 | 2.7 | 0.49 | 0.01 | 0.52 | 2.6 | 0.06 | 0.02 | 0.06 | 0.3 |
| FHB | 0.35 | 0.01 | 0.46 | 2.3 | 0.37 | 0.01 | 0.48 | 2.4 | 0.17 | 0.02 | 0.22 | 1.1 |
Ridge-Regression BLUP (RRBLUP) and Bayesian LASSO (BLR) were run for 1500 cycles and Random Forest (RF) was run for 500 cycles.
Accuracy is the Pearson correlation between the phenotype and the genomic estimated breeding value.
Relative efficiency per cycle calculated by .
Relative efficiency per year calculated as REc times the ratio of years in a cycle of phenotypic selection to years in a cycle of genomic selection.
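The per-year definition above is a single multiplication, sketched below. The function name and the cycle lengths in the example are assumptions for illustration only; the source does not state the cycle lengths. A 7:1 ratio of phenotypic to genomic cycle length is used simply because it gives a value close to the 4.1 listed for GYA under Ridge-Regression BLUP.

```python
def relative_efficiency_per_year(
    re_cycle: float, years_ps: float, years_gs: float
) -> float:
    """REy = REc * (years per phenotypic-selection cycle / years per GS cycle)."""
    if years_ps <= 0 or years_gs <= 0:
        raise ValueError("cycle lengths must be positive")
    return re_cycle * (years_ps / years_gs)


# REc = 0.58 is the GYA value under Ridge-Regression BLUP in the table above.
# The cycle lengths (7 years vs. 1 year) are hypothetical.
gya_re_year = relative_efficiency_per_year(0.58, years_ps=7, years_gs=1)
print(f"GYA relative efficiency per year: {gya_re_year:.1f}")
```

Because the year ratio multiplies REc directly, a shorter genomic selection cycle can make GS more efficient per year even when REc is below 1.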
Genomic selection accuracy obtained using data from the TP to calculate genomic estimated breeding values (GEBVs) for lines in the Parental (PP) and Validation (VP) populations and correlating those GEBVs to the phenotypes of the PP and VP lines
| Trait | PP phenotypes, unweighted | PP phenotypes, weighted | PP true breeding values, unweighted | PP true breeding values, weighted | VP phenotypes, all VP lines | VP phenotypes, 85 most related VP lines |
|---|---|---|---|---|---|---|
| GYA | 0.02 | 0.08 | 0.44 | 0.34 | −0.17 | −0.19 |
| GYN | −0.41 | −0.23 | −0.02 | 0.16 | −0.17 | −0.16 |
| GYW | 0.57 | 0.67 | 0.32 | 0.31 | −0.25 | −0.27 |
| FY | −0.15 | 0.50 | 0.13 | 0.55 | 0.05 | 0.05 |
| SE | 0.10 | 0.85 | 0.00 | −0.22 | 0.27 | 0.33 |
| FHB | 0.14 | 0.47 | −0.05 | −0.03 | 0.22 | 0.22 |
For the PP, we also correlated the GEBVs with the estimated true breeding values (TBVs) of the PP lines. We used unweighted and weighted correlations in the PP; the weights were the percentage of TP parentage derived from each PP line. In the VP, the correlation was computed using either all VP lines or only the 85 VP lines that had a pedigree relationship to the TP. The traits are grain yield over all six environments (GYA), grain yield at Northwest Ohio environments (GYN), grain yield at Wooster, Ohio environments (GYW), flour yield (FY), softness equivalence (SE), and Fusarium Head Blight index (FHB).
Accuracy is the Pearson correlation between the phenotype and the GEBV of the 21 parental lines.
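To make the difference between the unweighted and weighted columns concrete, here is a minimal sketch of a weighted Pearson correlation, where each parental line's weight would be its percentage contribution to TP parentage. The source does not give the exact estimator used, so this is one common definition and not necessarily the authors' computation. The function name and the toy numbers are hypothetical.

```python
import math


def weighted_pearson(x, y, w=None):
    """Pearson correlation of x and y; optional non-negative weights w.

    With w=None every observation gets equal weight, which reduces to the
    ordinary (unweighted) Pearson correlation.
    """
    if w is None:
        w = [1.0] * len(x)
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have equal length of at least 2")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): GEBVs, phenotypes, and percent TP parentage per line.
gebv = [1.0, 2.0, 3.0, 4.0]
phenotype = [2.0, 4.0, 6.0, 0.0]
parentage_pct = [30.0, 30.0, 40.0, 0.0]

r_unweighted = weighted_pearson(gebv, phenotype)
r_weighted = weighted_pearson(gebv, phenotype, parentage_pct)
print(f"unweighted r = {r_unweighted:.2f}, weighted r = {r_weighted:.2f}")
```

In this toy case the fourth line contributes nothing to the TP, so it drops out of the weighted correlation. This illustrates how weighted and unweighted accuracies for the same trait can differ sharply.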
Average phenotype of the top three (T3) and bottom three (B3) of the 21 parental lines (PP) and validation population lines (VP) as ranked based on their genomic estimated breeding values (GEBVs) that were predicted using data from the training population (TP) and the Ridge-Regression BLUP model
| Trait | PP Avg. T3 | PP Avg. B3 | PP T3-B3 | PP P-value | VP Avg. T3 | VP Avg. B3 | VP T3-B3 | VP P-value |
|---|---|---|---|---|---|---|---|---|
| GYA | 4782 | 4600 | 181 | 0.45 | 5100 | 5442 | −343 | 0.52 |
| GYN | 4529 | 4450 | 78 | 0.67 | 4074 | 3955 | 119 | 0.34 |
| GYW | 5496 | 4616 | 880 | 0.06 | 5846 | 5908 | −63 | 0.84 |
| FY | 69.0 | 68.9 | 0.13 | 0.92 | 69.0 | 64.4 | 4.6 | 0.02 |
| SE | 58.7 | 57.4 | 1.3 | 0.09 | 58.7 | 56.4 | 2.3 | 0.46 |
| FHB | 16.4 | 24.2 | −7.8 | 0.07 | 9.6 | 16.5 | −6.9 | 0.06 |
Values are shown for grain yield over all environments (GYA, kg ha⁻¹), grain yield at Northwest Ohio environments (GYN), grain yield at Wooster, Ohio environments (GYW), flour yield (FY), softness equivalence (SE), and Fusarium Head Blight index (FHB).
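The T3 and B3 averages in the table can be obtained by ranking lines on their GEBVs and averaging the observed phenotypes of the three highest-ranked and three lowest-ranked lines. The sketch below shows that ranking step under stated assumptions: the function name and the toy values are hypothetical, and the significance test behind the P-values is not specified in the source, so it is left out.

```python
def top_bottom_contrast(gebv, phenotype, k=3):
    """Return (mean phenotype of top k, mean of bottom k, difference),
    where top and bottom are defined by GEBV rank, not by phenotype."""
    if len(gebv) != len(phenotype):
        raise ValueError("gebv and phenotype must have the same length")
    if k < 1 or len(gebv) < 2 * k:
        raise ValueError("need at least 2*k lines so the groups do not overlap")
    # Indices of lines sorted from highest to lowest GEBV.
    order = sorted(range(len(gebv)), key=lambda i: gebv[i], reverse=True)
    top = [phenotype[i] for i in order[:k]]
    bottom = [phenotype[i] for i in order[-k:]]
    avg_top = sum(top) / k
    avg_bottom = sum(bottom) / k
    return avg_top, avg_bottom, avg_top - avg_bottom


# Toy data (hypothetical): seven lines with GEBVs and observed grain yields.
toy_gebv = [0.9, 0.1, 0.5, -0.3, 0.7, -0.8, 0.2]
toy_yield = [5000, 4600, 4800, 4500, 4900, 4400, 4700]

avg_t3, avg_b3, diff = top_bottom_contrast(toy_gebv, toy_yield)
print(f"Avg. T3 = {avg_t3:.0f}, Avg. B3 = {avg_b3:.0f}, T3-B3 = {diff:.0f}")
```

For a trait where lower values are desirable, such as FHB index, a useful model would be expected to give a negative T3-B3 if the ranking is arranged so that the top group is the more resistant one; the table's FHB rows show negative differences.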
Figure 1. Principal component analysis of the 470 wheat lines of the training population (TP) and the 94 lines of the validation population (VP) using data from markers scored in both populations. Five groups of TP lines were defined by Cabrera and are identified as groups 1 to 5. The VP lines are identified by “+” and “*”, where “+” represents the 85 lines with varying relationship to the TP and “*” represents the 17 VP lines that are half sibs of the TP.