Benjamin B Stewart-Brown, Qijian Song, Justin N Vaughn, Zenglu Li.
Abstract
Genomic selection (GS) has become viable for selection of quantitative traits for which marker-assisted selection has often proven less effective. The potential of GS for soybean was characterized using 483 elite breeding lines genotyped with BARCSoySNP6K iSelect BeadChips. Cross-validation was performed using RR-BLUP, and predictive abilities (rMP) of 0.81, 0.71, and 0.26 for protein, oil, and yield, respectively, were achieved at the largest tested training set size. Minimal differences were observed when comparing different marker densities, and there appeared to be inflation in rMP due to population structure. For comparison purposes, two additional methods to predict breeding values for lines of four bi-parental populations within the GS dataset were tested. The first method predicted within each bi-parental population (WP method) and utilized a training set of full-sibs of the validation set. The second method predicted across populations (AP method) and utilized a training set of all remaining breeding lines except for full-sibs of the validation set. The AP method is more practical, as the WP method would likely delay the breeding cycle and leverage smaller training sets. Averaging across populations for protein and oil content, rMP for the AP method (0.55, 0.30) approached rMP for the WP method (0.60, 0.52). Though comparable, rMP for yield was low for both AP and WP methods (0.12, 0.13). Based on increases in rMP as training sets increased and the effectiveness of the WP vs. AP method, the AP method could potentially improve with larger training sets and increased relatedness between training and validation sets.
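The cross-validation procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it treats RR-BLUP as ridge regression on a marker matrix with a fixed shrinkage parameter `lam` (an assumption; in practice the ratio of residual to marker variance is usually estimated, for example by REML), it computes predictive ability as the Pearson correlation between predicted and observed values in the validation set, and it runs on synthetic genotypes and phenotypes rather than the soybean data. All function and variable names are hypothetical.

```python
import numpy as np

def rrblup_fit(Z, y, lam):
    """Ridge estimate of marker effects with a fixed shrinkage parameter lam (assumed, not estimated)."""
    mu = y.mean()
    n = Z.shape[0]
    # Dual form of ridge regression: u = Z' (Z Z' + lam I)^-1 (y - mu)
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y - mu)
    return mu, Z.T @ alpha

def predictive_ability(Z_train, y_train, Z_val, y_val, lam):
    """Correlation between predicted breeding values and observed phenotypes in the validation set."""
    mu, u = rrblup_fit(Z_train, y_train, lam)
    gebv = mu + Z_val @ u
    return np.corrcoef(gebv, y_val)[0, 1]

def cross_validate(Z, y, n_train, n_reps, lam, seed=0):
    """Average predictive ability over repeated random training/validation splits."""
    rng = np.random.default_rng(seed)
    r_values = []
    for _ in range(n_reps):
        idx = rng.permutation(len(y))
        train, val = idx[:n_train], idx[n_train:]
        r_values.append(predictive_ability(Z[train], y[train], Z[val], y[val], lam))
    return float(np.mean(r_values))

# Synthetic example only: 200 lines, 500 markers coded -1/1, additive effects plus noise.
rng = np.random.default_rng(1)
n_lines, n_markers = 200, 500
Z = rng.choice([-1.0, 1.0], size=(n_lines, n_markers))
true_u = rng.normal(0.0, 0.1, n_markers)
y = Z @ true_u + rng.normal(0.0, 1.0, n_lines)

# lam = residual variance / marker-effect variance used in the simulation (1.0 / 0.01)
r_mp = cross_validate(Z, y, n_train=150, n_reps=20, lam=100.0)
print(f"mean predictive ability: {r_mp:.2f}")
```

Varying `n_train` in this sketch is one way to mimic the comparison of training set sizes described in the abstract, although the synthetic markers here are independent and carry no population structure.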
Keywords: GenPred; Genomic Prediction; Genomic selection; RR-BLUP; Seed composition; Seed yield; Shared Data Resources; Soybean
Year: 2019 PMID: 31088906 PMCID: PMC6643879 DOI: 10.1534/g3.118.200917
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Summary of genomic selection (GS) dataset
| Set | Generation | # of pedigrees per set | # of breeding lines per set | # of pedigrees for GS | # of breeding lines for GS | Oil (Y/N) | Protein (Y/N) | Yield (Y/N) | Descriptor for GS |
|---|---|---|---|---|---|---|---|---|---|
| Set1-2 | F5:7 | 1 | 84 | 1 | 84 | Y | Y | Y | Pop1 |
| Set3-4 | F5:7 | 1 | 84 | 1 | 84 | Y | Y | Y | Pop2 |
| Set5-6 | F5:7 | 1 | 84 | 1 | 82 | Y | Y | Y | Pop3 |
| Set7-8 | F5:7 | 1 | 84 | 1 | 84 | Y | Y | Y | Pop4 |
| Set9-11 | F5:8 | 14 | 102 | 12 | 82 | N | N | Y | Ped1-12 |
| Set12-14 | F5:8 | 12 | 102 | 10 | 67 | Y | Y | Y | Ped13-22 |
| Total | | | 540 | | 483 | | | | |
Figure 1. Diagram displaying the three methods performed for estimating predictive ability within the genomic selection dataset. (A) Perform cross-validation using the entire mixed population as both the validation set and training set (EGSD method); (B) Perform cross-validation within bi-parental populations using Pop1-4 individually as the validation set and training set (WP method); and (C) Predict across populations using one of Pop1-4 as the validation set and the remaining breeding lines as the training set (AP method).
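The WP and AP designs described in this caption differ only in how the training and validation sets are drawn. The sketch below is a hypothetical illustration: group sizes come from the "# of breeding lines for GS" column of the dataset summary table, the WP training size of 50 comes from the Figure 5 caption, and the function names are invented for this example. It ignores trait availability (Ped1-12 have no oil or protein data), so the AP training set for those traits would be smaller than shown.

```python
import numpy as np

# Number of breeding lines used for GS in each group (from the dataset summary table).
group_sizes = {"Pop1": 84, "Pop2": 84, "Pop3": 82, "Pop4": 84,
               "Ped1-12": 82, "Ped13-22": 67}
labels = np.concatenate([np.full(n, g) for g, n in group_sizes.items()])

def wp_split(labels, pop, n_train, rng):
    """Within-population: train on a random subset of full-sibs, validate on the rest."""
    members = rng.permutation(np.flatnonzero(labels == pop))
    return members[:n_train], members[n_train:]

def ap_split(labels, pop):
    """Across-population: validate on one population, train on all lines outside it."""
    val = np.flatnonzero(labels == pop)
    train = np.flatnonzero(labels != pop)
    return train, val

rng = np.random.default_rng(0)
wp_train, wp_val = wp_split(labels, "Pop1", 50, rng)
ap_train, ap_val = ap_split(labels, "Pop1")

print(len(labels))                  # 483 lines in total
print(len(wp_train), len(wp_val))   # 50 34
print(len(ap_train), len(ap_val))   # 399 84
```

Because the AP training set excludes every full-sib of the validation population, its size depends only on how many lines come from other pedigrees, which is why it can be far larger than any WP training set.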
Figure 2. Principal component analysis of genomic selection dataset.
Figure 3. Boxplots of the effect of training set size (NP) on predictive ability (rMP) for each trait when utilizing the entire genomic selection dataset (EGSD) method. Solid line represents median and dotted line represents mean.
Figure 4. Boxplots of the effect of marker density (NM) on predictive ability (rMP) for each trait when utilizing the entire genomic selection dataset (EGSD) method. Number of markers indicated in parentheses. Solid line represents median and dotted line represents mean.
Figure 5. Graph displaying the effect of training set size (NP) on predictive ability (rMP) for each trait when contrasting the within population (WP) method vs. the across population (AP) method. rMP was averaged across the four validation sets (Pop1-4). The WP method was indicated with a horizontal dashed line while the AP method was indicated with a solid trend line across TS sizes. For the WP method, a single training set size of 50 breeding lines was used.
Figure 6. Effects of population structure on prediction of oil content when utilizing the entire genomic selection dataset (EGSD) method. (A) PCA of genomic prediction population using all SNPs. (B) PCA of genomic prediction population using 8th tag SNPs. (C) Average predicted GEBV vs. observed BLUP values when using all SNPs. (D) Average predicted GEBV vs. observed BLUP values when using 8th tag SNPs. (E) Average predicted GEBV vs. observed BLUP within Pop1-4 when using all SNPs. (F) Average predicted GEBV vs. observed BLUP within Pop1-4 when using 8th tag SNPs. Correlation coefficients presented within scatterplots (C-F).