Cécile Grenier, Tuong-Vi Cao, Yolima Ospina, Constanza Quintero, Marc Henri Châtel, Joe Tohme, Brigitte Courtois, Nourollah Ahmadi.
Abstract
Genomic selection (GS) is a promising strategy for enhancing genetic gain. We investigated the accuracy of genomic estimated breeding values (GEBV) in four inter-related synthetic populations that underwent several cycles of recurrent selection in an upland rice breeding program. A total of 343 S2:4 lines extracted from those populations were phenotyped for flowering time, plant height, grain yield and panicle weight, and genotyped with an average density of one marker per 44.8 kb. The relative effect of the linkage disequilibrium (LD) and minor allele frequency (MAF) thresholds for selecting markers, the relative size of the training population (TP) and of the validation population (VP), the selected trait and the genomic prediction models (frequentist and Bayesian) on the accuracy of GEBVs was investigated in 540 cross validation experiments with 100 replicates. The effect of kinship between the training and validation populations was tested in an additional set of 840 cross validation experiments with a single genomic prediction model. LD was high (average r² = 0.59 at 25 kb) and decreased slowly, the distribution of allele frequencies at individual loci was markedly skewed toward unbalanced frequencies (MAF average value 15.2% and median 9.6%), and differentiation between the four synthetic populations was low (FST ≤ 0.06). The accuracy of GEBV across all cross validation experiments ranged from 0.12 to 0.54 with an average of 0.30. Significant differences in accuracy were observed among the different levels of each factor investigated. Phenotypic traits had the biggest effect, and the size of the incidence matrix had the smallest. Significant first-order interactions were observed for GEBV accuracy between traits and all the other factors studied, and between prediction models and LD, MAF and composition of the TP. The potential of GS to accelerate genetic gain and breeding options to increase the accuracy of predictions are discussed.
Year: 2015 PMID: 26313446 PMCID: PMC4551487 DOI: 10.1371/journal.pone.0136594
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
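The cross validation scheme described in the abstract (fit a prediction model on the TP, predict the VP, and report the correlation between predicted and observed values) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the genotypes and phenotypes are simulated, the closed-form ridge regression stands in for the prediction models named in the abstract, and the penalty `lam`, the marker count and the function names `ridge_fit` and `kfold_accuracy` are assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns the intercept and marker effects."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'(y - mean(y))
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def kfold_accuracy(X, y, k, lam, rng):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    idx = rng.permutation(len(y))
    accuracies = []
    for vp in np.array_split(idx, k):      # each fold serves once as the VP
        tp = np.setdiff1d(idx, vp)         # the remaining lines form the TP
        intercept, beta = ridge_fit(X[tp], y[tp], lam)
        predicted = intercept + X[vp] @ beta
        accuracies.append(np.corrcoef(predicted, y[vp])[0, 1])
    return float(np.mean(accuracies))

# Simulated data (assumption): 343 inbred lines, 500 markers coded 0/2
rng = np.random.default_rng(0)
n_lines, n_snp = 343, 500
X = rng.choice([0.0, 2.0], size=(n_lines, n_snp), p=[0.85, 0.15])
g = X @ rng.normal(0.0, 1.0, n_snp)             # simulated genetic values
y = g + rng.normal(0.0, g.std(), n_lines)       # noise variance about equal to genetic variance

for k in (3, 6, 9):
    reps = [kfold_accuracy(X, y, k, lam=250.0, rng=rng) for _ in range(5)]
    print(f"k = {k}: mean accuracy over 5 replicates = {np.mean(reps):.3f}")
```

A larger k leaves more lines in the TP for each fold, which is the TP/VP size ratio the study varies with k = 3, 6 and 9.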
Summary statistics of the four phenotypic traits, ANOVA results and heritability on phenotypes of the 343 S2:4 lines extracted from four subpopulations.
| Statistics | FL | PH | YLD | PW |
|---|---|---|---|---|
| Adjusted means (standard error) | | | | |
| Subpopulation PCT4-C0 (n = 86) | 79.59 (0.77) | 90.16 (1.25) | 37.21 (2.55) | 2.09 (0.09) |
| Subpopulation PCT4-C1 (n = 83) | 80.95 (0.79) | 93.27 (1.27) | 45.46 (2.57) | 2.18 (0.09) |
| Subpopulation PCT4-C2 (n = 82) | 78.06 (0.79) | 99.34 (1.28) | 45.09 (2.60) | 2.39 (0.09) |
| Subpopulation PCT11-C1 (n = 92) | 79.72 (0.75) | 98.03 (1.22) | 44.2 (2.49) | 2.36 (0.09) |
| ANOVA results (p-values) | | | | |
| Replicate | < 0.0001 | < 0.0001 | 0.0003 | 0.0013 |
| Subpopulation | 0.0672 | < 0.0001 | 0.0156 | 0.0086 |
| S2:4 lines (Subpopulation) | < 0.0001 | < 0.0001 | < 0.0001 | 0.0027 |
| Variance components | | | | |
| Block (Replicate) | 1.59 | 12.85 | 96.04 | 0.13 |
| S2:4 lines (Subpopulation) | 46.03 | 89.89 | 214.55 | 0.12 |
| Residual | 4.16 | 34.74 | 273.68 | 0.61 |
| Heritability (h²) | 0.86 | 0.58 | 0.29 | 0.10 |
FL: days to flowering; PH: plant height; YLD: grain yield; PW: panicle weight; N and n: number of S2:4 lines that comprise the population and subpopulations, respectively. p-values from Fisher’s test to test the fixed effects.
Fig 1. Projection of the 343 S2:3 lines on the first plane of a factorial discriminant analysis using (A) phenotypic data and (B) genotypic data.
Pairwise Fisher distance and genetic differentiation (FST) between subpopulations, effective population size (Ne) and number of monomorphic loci for each subpopulation (Nπ) out of 3,675 SNP.
| Subpopulation | PCT4-C0 | PCT4-C1 | PCT4-C2 | PCT11-C1 | Ne | Nπ |
|---|---|---|---|---|---|---|
| PCT4-C0 | | 4.526 | 11.568 | 7.814 | 32 ± 0.05 | 836 |
| PCT4-C1 | | | 10.422 | 5.381 | 28 ± 0.05 | 577 |
| PCT4-C2 | | | | 1.249 NS | 48 ± 0.10 | 314 |
| PCT11-C1 | | | | | 53 ± 0.05 | 608 |
Pairwise Fisher distances are presented above the diagonal and pairwise FST below the diagonal.
*, ** and ***: significant with p < 0.01, 0.001 and 0.0001, respectively; NS: non-significant.
Best average accuracies among the 135 GEBVs obtained from the training data sets (TP) and the observed BLUP of the validation data sets (VP) considering each trait.
| Trait | k-fold | LD (r² ≤) | MAF (≥ %) | Number of SNP | Method | Average accuracy |
|---|---|---|---|---|---|---|
| PH | 6 | 0.9 | 5 | 4011 | BRR | 0.538 (0.082) |
| PW | 9 | 1 | 5 | 5604 | BL | 0.327 (0.126) |
| YLD | 9 | 0.9 | 5 | 4011 | BL | 0.309 (0.148) |
| FL | 6 | 0.9 | 5 | 4011 | LASSO | 0.295 (0.113) |
FL: days to flowering; PH: plant height; YLD: grain yield; PW: panicle weight; LD: linkage disequilibrium; MAF: minor allele frequency. Methods: LASSO: least absolute shrinkage and selection operator; BL: Bayesian LASSO; BRR: Bayesian ridge regression.
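The LD and MAF thresholds in the table above determine which SNP enter the incidence matrix. The source does not spell out the filtering procedure, so the following is only a sketch under stated assumptions: genotypes are coded 0/1/2, MAF is computed from the column means, and LD pruning is done greedily by dropping a marker whose r² with any of the most recently kept markers exceeds the threshold. The function names, the `window` parameter and the toy data are hypothetical.

```python
import numpy as np

def maf(G):
    """Minor allele frequency per marker for genotypes coded 0/1/2."""
    p = G.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def filter_markers(G, maf_min, r2_max, window=50):
    """Indices of markers kept after a MAF filter and greedy LD pruning.

    A marker is dropped when its squared correlation with any of the last
    `window` kept markers exceeds r2_max (assumed pruning rule).
    """
    kept = []
    for j in np.flatnonzero(maf(G) >= maf_min):
        recent = kept[-window:]
        if recent:
            r = np.array([np.corrcoef(G[:, j], G[:, i])[0, 1] for i in recent])
            if np.any(r ** 2 > r2_max):
                continue
        kept.append(int(j))
    return np.array(kept, dtype=int)

# Toy genotypes (assumption): 30 independent markers, each duplicated so that
# adjacent columns are in complete LD, plus one rare marker (MAF = 1%).
rng = np.random.default_rng(1)
base = rng.choice([0.0, 2.0], size=(200, 30), p=[0.8, 0.2])
rare = np.zeros((200, 1))
rare[:2] = 2.0
G = np.hstack([np.repeat(base, 2, axis=1), rare])

kept_ld09 = filter_markers(G, maf_min=0.05, r2_max=0.9)
kept_ld1 = filter_markers(G, maf_min=0.05, r2_max=1.0)
print(len(kept_ld09), len(kept_ld1))
```

With `r2_max=1.0` no marker is pruned for LD, so only the MAF filter applies. This mirrors the table, where the r² ≤ 1 setting retains more SNP (5604) than the r² ≤ 0.9 setting (4011) at the same MAF threshold.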
Sources of variation of the average accuracy (AA) in the first cross validation experiment considering 540 scenarios.
| Section | Model | Source | DF | SS | MS | F Value | Prob F | Sig. | R² | CV | Root MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ANOVA fit statistics | 1. Scenarios | Model | 539 | 1092.21 | 2.03 | 127.14 | < .0001 | *** | 0.556 | 39.61 | 0.126 |
| | | Error | 54625 | 870.60 | 0.02 | | | | | | |
| | | Corrected Total | 55164 | 1962.81 | | | | | | | |
| | 2. Individual factors | Model | 13 | 1020.48 | 78.50 | 4594.18 | < .0001 | *** | 0.520 | 41.01 | 0.131 |
| | | Error | 55151 | 942.33 | 0.02 | | | | | | |
| | | Corrected Total | 55164 | 1962.81 | | | | | | | |
| | 3. Factors & interactions | Model | 79 | 1081.65 | 13.69 | 855.94 | < .0001 | *** | 0.551 | 39.68 | 0.126 |
| | | Error | 55085 | 881.16 | 0.02 | | | | | | |
| | | Corrected Total | 55164 | 1962.81 | | | | | | | |
| Test statistics for effects | Controlled factors (1) | Trait | 3 | 987.38 | 329.13 | 19262.40 | < .0001 | *** | | | |
| | | Method | 4 | 21.61 | 5.40 | 316.16 | < .0001 | *** | | | |
| | | k-fold | 2 | 9.78 | 4.89 | 286.13 | < .0001 | *** | | | |
| | | LD | 2 | 1.98 | 0.99 | 57.99 | < .0001 | *** | | | |
| | | MAF | 2 | 0.89 | 0.45 | 26.11 | < .0001 | *** | | | |
| | First-order interactions (2) | Method*Trait | 12 | 52.27 | 4.36 | 272.28 | < .0001 | *** | | | |
| | | LD*Trait | 6 | 4.24 | 0.71 | 44.14 | < .0001 | *** | | | |
| | | Trait*k-fold | 6 | 1.99 | 0.33 | 20.77 | < .0001 | *** | | | |
| | | Method*LD | 8 | 1.04 | 0.13 | 8.13 | < .0001 | *** | | | |
| | | Method*MAF | 8 | 0.78 | 0.10 | 6.10 | < .0001 | *** | | | |
| | | Method*k-fold | 8 | 0.61 | 0.08 | 4.77 | < .0001 | *** | | | |
| | | MAF*Trait | 6 | 0.18 | 0.03 | 1.90 | 0.0766 | NS | | | |
| | | LD*MAF | 4 | 0.02 | 0.00 | 0.28 | 0.8899 | NS | | | |
| | | MAF*k-fold | 4 | 0.04 | 0.01 | 0.69 | 0.5999 | NS | | | |
| | | LD*k-fold | 4 | 0.08 | 0.02 | 1.22 | 0.2984 | NS | | | |
Sources of variation were: method (BL, BRR, G-BLUP, RR-BLUP and LASSO), trait (FL, PH, YLD, PW), MAF (≥ 2.5, 5 and 10%), LD (≤ 0.75, 0.9, 1) and k-folds (k = 3, 6, 9).
(1) The denominator term used was the mean square error (MSE) of model 2.
(2) The denominator term used was the mean square error (MSE) of model 3.
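The entries of the ANOVA table follow from the sums of squares and degrees of freedom by standard arithmetic, and the scenario count follows from the factor levels listed in the footnote. The sketch below recomputes them from the rounded values printed in the table, so results agree with the published figures only up to rounding. The variable names are illustrative, and the choice of denominators for the effect tests follows footnotes (1) and (2).

```python
import math
from itertools import product

# Factor levels as listed in the table footnote
levels = {
    "trait": ["FL", "PH", "YLD", "PW"],
    "method": ["BL", "BRR", "G-BLUP", "RR-BLUP", "LASSO"],
    "maf": [2.5, 5, 10],
    "ld": [0.75, 0.9, 1.0],
    "k": [3, 6, 9],
}
n_scenarios = len(list(product(*levels.values())))    # 4 * 5 * 3 * 3 * 3
per_trait = n_scenarios // len(levels["trait"])       # scenarios per trait
main_effect_df = {name: len(v) - 1 for name, v in levels.items()}

def anova_fit(ss_model, df_model, ss_error, df_error):
    """Mean squares, F, R2 and root MSE from sums of squares and DF."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "MS": ms_model,
        "MSE": ms_error,
        "F": ms_model / ms_error,
        "R2": ss_model / (ss_model + ss_error),
        "RootMSE": math.sqrt(ms_error),
    }

# (SS model, DF model, SS error, DF error) copied from the table
models = {
    "1. Scenarios": (1092.21, 539, 870.60, 54625),
    "2. Individual factors": (1020.48, 13, 942.33, 55151),
    "3. Factors & interactions": (1081.65, 79, 881.16, 55085),
}
fits = {name: anova_fit(*values) for name, values in models.items()}

# Effect tests: main effects use the MSE of model 2, interactions that of model 3
f_trait = (987.38 / 3) / fits["2. Individual factors"]["MSE"]
f_method_trait = (52.27 / 12) / fits["3. Factors & interactions"]["MSE"]

print(n_scenarios, per_trait, sum(main_effect_df.values()))
for name, fit in fits.items():
    print(name, round(fit["F"], 2), round(fit["R2"], 3), round(fit["RootMSE"], 3))
print(round(f_trait, 1), round(f_method_trait, 1))
```

The degrees of freedom of the five main effects sum to 13, the model DF of model 2, and the 540 scenarios give the 539 model DF of model 1.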
Adjusted means (LSMeans) of AA for controlled factors in the first cross validation experiment considering all 540 scenarios.
| Controlled factor | Modality | LSMeans | Group ¶ |
|---|---|---|---|
| Method | BL | 0.312 | a |
| | BRR | 0.307 | b |
| | G-BLUP | 0.304 | b |
| | RR-BLUP | 0.304 | b |
| | LASSO | 0.265 | c |
| Trait | PH | 0.489 | a |
| | PW | 0.274 | b |
| | YLD | 0.239 | c |
| | FL | 0.192 | d |
| k-fold ratio | 9 | 0.308 | a |
| | 6 | 0.302 | b |
| | 3 | 0.285 | c |
| LD (r²) | 0.75 | 0.302 | a |
| | 0.90 | 0.301 | a |
| | 1 | 0.291 | b |
| MAF (%) | 2.5 | 0.301 | a |
| | 5 | 0.301 | a |
| | 10 | 0.293 | b |
¶ Different letters indicate significant differences (p < 0.05)
Fig 2. Mean correlation between GEBV obtained by cross validation of the training data set (Yp) and the observed BLUP values of the validation data sets (Yo).
Results are presented for 2 traits, 9 incidence matrices and 3 k-fold cross validation experiments.
Fig 3. Mean correlation between GEBV obtained by cross validation of the training data set (Yp) and the observed BLUP values of the validation data sets (Yo).
Results are presented for 4 traits and 9 incidence matrices.
Fig 4. Mean correlation between GEBV obtained by cross validation of the training data set (Yp) and the observed BLUP values of the validation data sets (Yo).
Results are presented for days to flowering (FL) and plant height (PH) and 15 incidence matrices.
Fig 5. Mean correlation between GEBV obtained by cross validation of the training data set (Yp) and the observed BLUP values of the validation data sets (Yo).
Results are presented for days to flowering (FL), plant height (PH), panicle weight (PW) and grain yield (YLD) and 35 incidence matrices.
Fig 6. Mean correlation between GEBV obtained by cross validation of the training data set (Yp) and the observed BLUP values of the validation data sets (Yo).
Results are presented for days to flowering (FL) with different compositions of the validation population (VP).