M Ben Hassen, T V Cao, J Bartholomé, G Orasen, C Colombi, J Rakotomalala, L Razafinimpiasa, C Bertone, C Biselli, A Volante, F Desiderio, L Jacquin, G Valè, N Ahmadi.
Abstract
KEY MESSAGE: Rice breeding programs based on pedigree schemes can use a genomic model trained with data from their working collection to predict the performance of progenies produced through rapid generation advancement. So far, most potential applications of genomic prediction in plant improvement have been explored using cross-validation approaches. This is the first empirical study to evaluate the accuracy of genomic prediction of progeny performance in a typical rice breeding program. Using a cross-validation approach, we first analyzed the effects of marker selection and statistical method on the accuracy of prediction of three traits of differing heritability in a reference population (RP) of 284 inbred accessions. Next, we investigated which subsets of the RP, in terms of size and degree of relatedness to the progeny population (PP), maximize the accuracy of phenotype prediction across generations, i.e., for 97 F5–F7 lines derived from biparental crosses between 31 accessions of the RP. The extent of linkage disequilibrium (LD) was high (r² = 0.2 at 0.80 Mb in the RP and at 1.1 Mb in the PP). Consequently, an average marker density above one marker per 22 kb did not improve the accuracy of predictions in the RP. The accuracy of progeny prediction varied greatly depending on the composition of the training set, the trait, LD and minor allele frequency. The highest accuracy achieved for each trait exceeded 0.50 and was only slightly below the accuracy achieved by cross-validation in the RP. Our results thus show that relatively high accuracy (0.41–0.54) can be achieved using as the training set only a rather small share of the RP, namely the accessions most related to the PP. The practical implications of these results for rice breeding programs are discussed.
Year: 2017 PMID: 29138904 PMCID: PMC5787227 DOI: 10.1007/s00122-017-3011-4
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Genomic selection studies conducted on rice
| Plant material | Phenotypic data | Genotypic data | Statistical methods | Range of accuracy of GEBV | Main conclusion | References |
|---|---|---|---|---|---|---|
| 110 Asian cultivars | Eight traits including days to flowering (FL) | 3071 SNPs | rrBLUP, ENet, GBLUP, RKHS, RF, Lasso, BL, EBL, wBSR | FL: 0.65–0.85 | Reliability depended to a great extent on the traits targeted. Reliability was low when only a small number of cultivars were used for validation | Onogi et al. ( |
| Highly structured diversity panel of 413 accessions | Eight traits including grain yield (GY), flowering date (FL) and plant height (PH) | 36,901 SNPs (1 SNP per 10 kb) | GBLUP | FL: 0.25–0.60 | Maximizing the phenotypic variance captured by the training set is important for optimal performance. Stratified sampling of the training set ensures better accuracy than sampling based on the CDmean | Isidro et al. ( |
| | 15 traits of rather high heritability, including flowering time (FL), plant height (PH) and protein content | 36,901 SNPs (1 SNP per 10 kb) | GBLUP, GBLUP-CPS | FL: 0.44–0.66 | Prediction accuracy was affected by the genomic relationship between TP and VP and by genomic heritability in the TP and VP | Guo et al. ( |
| 369 elite breeding lines | Six traits including days to flowering (FL) and grain yield (GY) | 73,147 SNPs | rrBLUP, BL, RKHS, RF, | FL: 0.35–0.65 | Using one marker every 0.2 cM was sufficient for genomic selection in this collection of rice breeding material. rrBLUP was the most efficient statistical method for GY, for which GWAS detected no QTL with a marked effect | Spindel et al. ( |
| 354 S3:4 lines | Days to flowering (FL), plant height (PH) and grain yield (GY) | 8336 SNPs | rrBLUP, GBLUP, Lasso, BL | FL: 0.20–0.30 | Accuracy of GEBV was affected by (i) relatedness between TP and CP, (ii) trait heritability and the interaction between traits and all the other factors studied (prediction models, LD, MAF, composition of the TP) | Grenier et al. ( |
| 115 lines of hybrids | Eight traits including grain yield (GY) and plant height (PH) | 2,395,866 SNPs | GBLUP, GBLUP with dominance effects | FL: – | The model including dominance effects provided more accurate predictions, particularly in the multi-trait scenario for a low-heritability target trait with highly correlated auxiliary traits | Wang et al. ( |
Size of the incidence matrices used in the cross validation experiments in the reference population
| LD | MAF ≥ 5%: N | MAF ≥ 5%: D | MAF ≥ 10%: N | MAF ≥ 10%: D | MAF ≥ 20%: N | MAF ≥ 20%: D |
|---|---|---|---|---|---|---|
| ≤ 0.25 | 3322 | 8.7 | 1927 | 5.0 | 1173 | 3.1 |
| ≤ 0.36 | 5365 | 14.0 | 3450 | 9.0 | 2270 | 5.9 |
| ≤ 0.49 | 8324 | 21.7 | 5738 | 14.9 | 4013 | 10.5 |
| ≤ 0.64 | 12,099 | 31.5 | 8744 | 22.8 | 6095 | 15.9 |
| ≤ 0.81 | 16,923 | 44.1 | 12,652 | 34.2 | 8917 | 23.2 |
| ≤ 0.98 | 28,164 | 73.3 | 23,119 | 60.2 | 16,750 | 43.6 |
| ≤ 1 | 32,066 | 83.5 | 26,845 | 69.9 | 20,104 | 52.4 |
N total number of SNPs, D SNP density per Mb
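As a quick consistency check on the table, the genome length implied by each row can be recovered as N/D, and the average marker spacing as 1000/D kb. The short Python sketch below does this for a few rows copied from the table; the helper names are illustrative, and it assumes D was computed as N divided by the total genome length in Mb. For these rows the implied length is about 382 to 384 Mb, and the LD ≤ 0.81, MAF ≥ 5% matrix (44.1 SNPs per Mb) works out to roughly one SNP every 22.7 kb, which appears to correspond to the density threshold mentioned in the abstract.

```python
# (LD threshold, MAF threshold) -> (N, D), copied from a few rows of the table above
ROWS = {
    ("LD <= 0.25", "MAF >= 5%"): (3322, 8.7),
    ("LD <= 0.81", "MAF >= 5%"): (16923, 44.1),
    ("LD <= 1", "MAF >= 5%"): (32066, 83.5),
    ("LD <= 1", "MAF >= 20%"): (20104, 52.4),
}


def implied_genome_mb(n_snps: int, density_per_mb: float) -> float:
    """Genome length in Mb implied by N and D, assuming D = N / genome length."""
    return n_snps / density_per_mb


def kb_per_snp(density_per_mb: float) -> float:
    """Average distance between adjacent markers in kb (1 Mb = 1000 kb)."""
    return 1000.0 / density_per_mb


for (ld, maf), (n, d) in ROWS.items():
    print(f"{ld}, {maf}: genome ~ {implied_genome_mb(n, d):.0f} Mb, "
          f"one SNP every {kb_per_snp(d):.1f} kb")
```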
Scenarios for genomic prediction across generations
| Scenario | Training set | Validation set |
|---|---|---|
| S1 | 31 parents | 97 progeny |
| S2 | 58 related accessions | 97 progeny |
| S3 | 31 parents + 58 related accessions | 97 progeny |
| S4 | 31 parents + 252 accessions | 97 progeny |
| S5 | 252 accessions, excluding the parents | 97 progeny |
| S6 | 100 random samples of 31 accessions, excluding the parents | 97 progeny |
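The six scenarios amount to set operations on lists of RP accessions, with the 97 progeny as the validation set in every case. The sketch below is a hypothetical illustration with placeholder IDs, not the authors' code. It assumes that the 252 non-parental accessions of S4 and S5 include the 58 related accessions of S2, and that each S6 sample is drawn without replacement from those 252 accessions; neither point is stated explicitly in the table.

```python
import random


def build_scenarios(parents, related, others, n_random=100, sample_size=31, seed=1):
    """Return {scenario: [training set, ...]} for scenarios S1-S6.

    parents: the 31 parental accessions; related: the 58 related accessions;
    others: the remaining non-parental accessions. ASSUMPTION: related + others
    together form the 252 non-parental accessions used in S4, S5 and S6.
    """
    parents, related, others = list(parents), list(related), list(others)
    non_parents = related + others
    rng = random.Random(seed)  # fixed seed so the S6 draws are reproducible
    return {
        "S1": [parents],
        "S2": [related],
        "S3": [parents + related],
        "S4": [parents + non_parents],
        "S5": [non_parents],
        # S6: 100 independent samples of 31 non-parents, each drawn without replacement
        "S6": [rng.sample(non_parents, sample_size) for _ in range(n_random)],
    }


# Hypothetical accession IDs (58 related + 194 others = 252 non-parents)
parents = [f"P{i:02d}" for i in range(31)]
related = [f"R{i:02d}" for i in range(58)]
others = [f"O{i:03d}" for i in range(194)]

scenarios = build_scenarios(parents, related, others)
for name, sets in scenarios.items():
    print(name, len(sets), "training set(s) of", len(sets[0]), "accessions")
```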
Fig. 1 Distribution of phenotypic values for days to flowering (FL), nitrogen balance index (NI) and 100 panicle weight (PW) in the reference and progeny populations
Variance components of three phenotypic traits in the reference and progeny populations
| Population | Factor | FL | NI | PW |
|---|---|---|---|---|
| Reference population | Genotype | 47.78*** | 6.17*** | 5023.13*** |
| | Year | 16.82 NS | 2.96 NS | 222.18 NS |
| | Year × genotype | 4.36*** | 4.08*** | 889.8*** |
| | Residual | 5.95 | 16.74 | 2378.04 |
| | H | 0.937 (0.007) | 0.558 (0.052) | 0.852 (0.018) |
| Progeny population | Genotype | 23.2*** | 4.12*** | 2698.61*** |
| | Year | 55.47 NS | 4.99 NS | 16.19 NS |
| | Year × genotype | 7.38*** | 0.7*** | 415.03*** |
| | Residual | 2.27 | 3.72 | 554.11 |
| | H | 0.849 (0.031) | 0.798 (0.041) | 0.899 (0.021) |
FL days to flowering, NI nitrogen balance index, PW 100 panicle weight, H broad sense heritability, NS not significant
***Significant at the 0.001 level
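Broad-sense heritability on an entry-mean basis is commonly computed as H = σ²G / (σ²G + σ²GY/nY + σ²e/(nY·nR)), where nY is the number of years and nR the number of replicates per year. The formula actually used by the authors, and the values of nY and nR, are not given in this section. The sketch below assumes nY = 2 and nR = 3, chosen only because with them the computed H comes within about 0.01 of every value in the table; treat them as illustrative, not as the experimental design.

```python
def entry_mean_h2(var_g, var_gy, var_e, n_years, n_reps):
    """Entry-mean broad-sense heritability:
    H = var_g / (var_g + var_gy / n_years + var_e / (n_years * n_reps))
    """
    return var_g / (var_g + var_gy / n_years + var_e / (n_years * n_reps))


# (population, trait) -> (genotype, year x genotype, residual) variances from the table
COMPONENTS = {
    ("RP", "FL"): (47.78, 4.36, 5.95),
    ("RP", "NI"): (6.17, 4.08, 16.74),
    ("RP", "PW"): (5023.13, 889.8, 2378.04),
    ("PP", "FL"): (23.2, 7.38, 2.27),
    ("PP", "NI"): (4.12, 0.7, 3.72),
    ("PP", "PW"): (2698.61, 415.03, 554.11),
}
# H values reported in the table, for comparison
REPORTED = {
    ("RP", "FL"): 0.937, ("RP", "NI"): 0.558, ("RP", "PW"): 0.852,
    ("PP", "FL"): 0.849, ("PP", "NI"): 0.798, ("PP", "PW"): 0.899,
}

# ASSUMPTION: 2 years and 3 replicates per year (not stated in this section)
for key, (g, gy, e) in COMPONENTS.items():
    h = entry_mean_h2(g, gy, e, n_years=2, n_reps=3)
    print(key, f"computed H = {h:.3f}", f"reported H = {REPORTED[key]}")
```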
Fig. 2 Patterns of decay in linkage disequilibrium in the reference population (red) and in the progeny population (gray). The curve represents the average r² across the 12 chromosomes and the bars represent the associated standard deviation (color figure online)
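An LD-decay curve of this kind is typically built by computing r² for every pair of markers on a chromosome and averaging it within physical-distance bins. The sketch below assumes r² is estimated as the squared Pearson correlation between allele dosages (0/1/2), one common estimator for inbred panels; the estimator, bin width and maximum distance used by the authors are not stated here, so `bin_size_mb` and `max_dist_mb` are hypothetical defaults. Running it per chromosome and averaging over the 12 chromosomes would give a curve comparable to the one in the figure.

```python
from statistics import mean


def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two allele-dosage vectors.

    Returns 0.0 when either marker is monomorphic (zero variance).
    """
    n = len(snp_a)
    ma, mb = sum(snp_a) / n, sum(snp_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - ma) ** 2 for a in snp_a)
    var_b = sum((b - mb) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def ld_decay(positions_mb, genotypes, bin_size_mb=0.1, max_dist_mb=2.0):
    """Mean r2 per distance bin for all marker pairs on one chromosome.

    positions_mb: marker positions in Mb; genotypes: one dosage vector per marker.
    Keys of the result are the lower bounds of the distance bins (Mb).
    """
    bins = {}
    for i in range(len(positions_mb)):
        for j in range(i + 1, len(positions_mb)):
            dist = abs(positions_mb[j] - positions_mb[i])
            if dist > max_dist_mb:
                continue
            k = int(dist / bin_size_mb)
            bins.setdefault(k, []).append(r_squared(genotypes[i], genotypes[j]))
    return {round(k * bin_size_mb, 3): mean(v) for k, v in sorted(bins.items())}


# Toy chromosome: three markers scored on four inbred lines (dosages 0/2)
positions = [0.0, 0.05, 0.57]
genos = [[0, 2, 0, 2], [0, 2, 0, 2], [0, 0, 2, 2]]
print(ld_decay(positions, genos))
```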
Fig. 3 Unweighted neighbor-joining tree based on simple matching distances constructed from the genotype of 284 accessions of the reference population (RP) and 97 lines of the progeny population (PP), using 4824 SNP markers. Red: parental lines (PL); Black and blue: RP accessions belonging to tropical japonica and temperate japonica, respectively; Green: PP (color figure online)
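The simple matching distance between two accessions is the proportion of markers at which their genotype calls differ. The minimal Python sketch below builds the distance matrix that a neighbor-joining algorithm would take as input; skipping missing calls (encoded here as None) pair by pair is an assumption, the accession data are hypothetical, and the tree-building step itself is left to a dedicated phylogenetics tool.

```python
def simple_matching_distance(g1, g2):
    """Proportion of markers with different calls, over markers scored in both.

    Missing calls are encoded as None (an assumption about the data format).
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no marker is scored in both genotypes")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(genotypes):
    """Symmetric matrix (list of lists) of pairwise simple matching distances."""
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = simple_matching_distance(genotypes[i], genotypes[j])
    return dist


# Three hypothetical accessions scored at four SNPs (None = missing call)
accessions = [[0, 2, 2, 0], [0, 2, 0, 0], [2, None, 0, 2]]
for row in distance_matrix(accessions):
    print([round(d, 2) for d in row])
```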
Fig. 4 Accuracy of genomic prediction in cross-validation experiments in the reference population for days to flowering (FL), nitrogen balance index (NI) and 100 panicle weight (PW), obtained with three statistical methods: BayesB, GBLUP and RKHS
ANOVA of factors affecting the transformed accuracy (Z) of the 63 cross validation experiments per trait in the reference population
| Model | Trait | R² | CV | RMSE | Mean | Source | DF | SS | MS | F | Prob |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Main effects only | FL | 0.5650 | 3.27 | 0.024 | 0.74 | Model | 10 | 0.0393 | 0.0039 | 6.75 | < .0001 |
| | | | | | | Error | 52 | 0.0302 | 0.0006 | | |
| | | | | | | Corrected total | 62 | 0.0695 | | | |
| | | | | | | LD | 6 | 0.0127 | 0.0021 | 3.63 | 0.0044 |
| | | | | | | MAF | 2 | 0.0068 | 0.0034 | 5.85 | 0.0051 |
| | | | | | | Method | 2 | 0.0198 | 0.0099 | 17.02 | < .0001 |
| | NI | 0.9404 | 2.82 | 0.016 | 0.56 | Model | 10 | 0.2014 | 0.0201 | 82.02 | < .0001 |
| | | | | | | Error | 52 | 0.0128 | 0.0002 | | |
| | | | | | | Corrected total | 62 | 0.2142 | | | |
| | | | | | | LD | 6 | 0.1780 | 0.0297 | 120.8 | < .0001 |
| | | | | | | MAF | 2 | 0.0055 | 0.0028 | 11.22 | < .0001 |
| | | | | | | Method | 2 | 0.0179 | 0.0090 | 36.47 | < .0001 |
| | PW | 0.8849 | 1.96 | 0.013 | 0.67 | Model | 10 | 0.0692 | 0.0069 | 39.99 | < .0001 |
| | | | | | | Error | 52 | 0.0090 | 0.0002 | | |
| | | | | | | Corrected total | 62 | 0.0782 | | | |
| | | | | | | LD | 6 | 0.0679 | 0.0113 | 65.34 | < .0001 |
| | | | | | | MAF | 2 | 0.0009 | 0.0005 | 2.61 | 0.0834 |
| | | | | | | Method | 2 | 0.0005 | 0.0002 | 1.31 | 0.2792 |
| Main effects + first-order interactions | FL | 0.9742 | 1.17 | 0.009 | 0.74 | Model | 38 | 0.0677 | 0.0018 | 23.82 | < .0001 |
| | | | | | | Error | 24 | 0.0018 | 0.0001 | | |
| | | | | | | Corrected total | 62 | 0.0695 | | | |
| | | | | | | LD | 6 | 0.0127 | 0.0021 | 28.25 | < .0001 |
| | | | | | | MAF | 2 | 0.0068 | 0.0034 | 45.44 | < .0001 |
| | | | | | | Method | 2 | 0.0198 | 0.0099 | 132.28 | < .0001 |
| | | | | | | LD × method | 12 | 0.0126 | 0.0010 | 14.01 | < .0001 |
| | | | | | | LD × MAF | 12 | 0.0139 | 0.0012 | 15.43 | < .0001 |
| | | | | | | MAF × method | 4 | 0.0020 | 0.0005 | 6.74 | 0.0009 |
| | NI | 0.9938 | 1.34 | 0.007 | 0.56 | Model | 38 | 0.2129 | 0.0056 | 100.89 | < .0001 |
| | | | | | | Error | 24 | 0.0013 | 0.0001 | | |
| | | | | | | Corrected total | 62 | 0.2142 | | | |
| | | | | | | LD | 6 | 0.1780 | 0.0297 | 534.32 | < .0001 |
| | | | | | | MAF | 2 | 0.0055 | 0.0028 | 49.62 | < .0001 |
| | | | | | | Method | 2 | 0.0179 | 0.0090 | 161.3 | < .0001 |
| | | | | | | LD × method | 12 | 0.0087 | 0.0007 | 13.04 | < .0001 |
| | | | | | | LD × MAF | 12 | 0.0024 | 0.0002 | 3.55 | 0.004 |
| | | | | | | MAF × method | 4 | 0.0004 | 0.0001 | 1.72 | 0.178 |
| | PW | 0.9998 | 0.13 | 0.001 | 0.67 | Model | 38 | 0.0782 | 0.0021 | 2610.91 | < .0001 |
| | | | | | | Error | 24 | 0.0000 | 0.0000 | | |
| | | | | | | Corrected total | 62 | 0.0782 | | | |
| | | | | | | LD | 6 | 0.0679 | 0.0113 | 14,349.8 | < .0001 |
| | | | | | | MAF | 2 | 0.0009 | 0.0005 | 572.6 | < .0001 |
| | | | | | | Method | 2 | 0.0005 | 0.0002 | 287.19 | < .0001 |
| | | | | | | LD × method | 12 | 0.0001 | 0.0000 | 13.55 | < .0001 |
| | | | | | | LD × MAF | 12 | 0.0088 | 0.0007 | 935.42 | < .0001 |
| | | | | | | MAF × method | 4 | 0.0000 | 0.0000 | 2.13 | 0.1078 |
R² coefficient of determination, CV coefficient of variation, RMSE root mean square error, Mean intercept value of the transformed accuracy (Z), FL days to flowering, NI nitrogen balance index, PW 100 panicle weight, LD linkage disequilibrium with 7 levels (LD ≤ 0.25, LD ≤ 0.36, LD ≤ 0.49, LD ≤ 0.64, LD ≤ 0.81, LD ≤ 0.98, LD ≤ 1), MAF minor allele frequency with 3 levels (MAF ≥ 5%, MAF ≥ 10%, MAF ≥ 20%), Method BayesB, GBLUP, RKHS
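The tables do not say which transformation produced Z. For accuracies expressed as correlation coefficients, a common choice is Fisher's z, z = ½ ln[(1 + r)/(1 − r)], whose inverse is tanh(z); the sketch below assumes that transformation and is not necessarily the one the authors used. Under that assumption, the FL intercept of 0.74 in the table above would correspond to r ≈ 0.63. The example also shows the usual reason for the transform: averaging accuracies on the z scale and back-transforming.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient r, with |r| < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))  # equivalent to math.atanh(r)


def inverse_fisher_z(z):
    """Back-transformation from the z scale to the correlation scale."""
    return math.tanh(z)


# Averaging accuracies on the z scale, then back-transforming
accuracies = [0.41, 0.50, 0.54]  # illustrative values in the range quoted in the abstract
z_values = [fisher_z(r) for r in accuracies]
mean_z = sum(z_values) / len(z_values)
print(round(mean_z, 3), round(inverse_fisher_z(mean_z), 3))

# Under the Fisher-z assumption, the FL intercept (Mean = 0.74) maps back to:
print(round(inverse_fisher_z(0.74), 2))
```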
Fig. 5 Accuracy of genomic prediction of progeny phenotype for days to flowering (FL), nitrogen balance index (NI) and 100 panicle weight (PW), obtained with three statistical methods: BayesB, GBLUP and RKHS. The six scenarios are described in Table 3. For scenario S6, which involves random sampling, the average and the 95% confidence interval are shown. 1-a and 1-b represent incidence matrices with no selection on r², filtered with MAF > 5% and MAF > 2.5%, respectively
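The MAF filters mentioned in the caption can be illustrated with a few lines of Python. This sketch assumes genotypes coded as allele dosages 0/1/2 (inbred lines mostly 0 or 2) with missing calls as None, and applies a strict "greater than" threshold as written in the caption; the marker names and data are hypothetical. A SNP whose minor allele occurs in 1 of 30 inbred lines (MAF ≈ 3.3%) passes the 2.5% filter but not the 5% filter, so here the 5% filter keeps only snp2 while the 2.5% filter keeps snp1 and snp2.

```python
def minor_allele_frequency(dosages):
    """MAF from allele dosages (0/1/2); missing calls (None) are ignored."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the allele counted by the dosage
    return min(p, 1 - p)


def filter_by_maf(markers, threshold):
    """Keep markers (name -> dosage list) whose MAF is strictly above threshold."""
    return {name: g for name, g in markers.items() if minor_allele_frequency(g) > threshold}


# Hypothetical markers scored on 30 inbred lines
markers = {
    "snp1": [0] * 29 + [2],        # minor allele in 1 line: MAF = 1/30, about 3.3%
    "snp2": [0] * 15 + [2] * 15,   # MAF = 50%
    "snp3": [0] * 30,              # monomorphic: MAF = 0
}
print(sorted(filter_by_maf(markers, 0.05)))
print(sorted(filter_by_maf(markers, 0.025)))
```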
ANOVA of factors influencing the transformed accuracy (Z) of 126 progeny prediction experiments for three phenotypic traits
| Model | Trait | R² | CV | RMSE | Mean | Source | DF | SS | MS | F | Prob |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Main effects only | FL | 0.855 | 14.64 | 0.036 | 0.25 | Model | 13 | 0.862 | 0.066 | 50.77 | < .0001 |
| | | | | | | Error | 112 | 0.146 | 0.001 | | |
| | | | | | | Corrected total | 125 | 1.008 | | | |
| | | | | | | LD | 6 | 0.188 | 0.031 | 24 | < .0001 |
| | | | | | | Method | 2 | 0.003 | 0.002 | 1.22 | 0.2999 |
| | | | | | | Scenario | 5 | 0.671 | 0.134 | 102.72 | < .0001 |
| | NI | 0.848 | 12.48 | 0.045 | 0.36 | Model | 13 | 1.277 | 0.098 | 48.19 | < .0001 |
| | | | | | | Error | 112 | 0.228 | 0.002 | | |
| | | | | | | Corrected total | 125 | 1.505 | | | |
| | | | | | | LD | 6 | 0.237 | 0.040 | 19.41 | < .0001 |
| | | | | | | Method | 2 | 0.000 | 0.000 | 0.03 | 0.968 |
| | | | | | | Scenario | 5 | 1.039 | 0.208 | 102 | < .0001 |
| | PW | 0.876 | 12.17 | 0.047 | 0.39 | Model | 13 | 1.791 | 0.138 | 61.11 | < .0001 |
| | | | | | | Error | 112 | 0.252 | 0.002 | | |
| | | | | | | Corrected total | 125 | 2.043 | | | |
| | | | | | | LD | 6 | 0.035 | 0.006 | 2.6 | 0.0213 |
| | | | | | | Method | 2 | 0.003 | 0.001 | 0.6 | 0.5497 |
| | | | | | | Scenario | 5 | 1.753 | 0.351 | 155.53 | < .0001 |
| Main effects + first-order interactions | FL | 0.978 | 7.79 | 0.019 | 0.25 | Model | 65 | 0.986 | 0.015 | 40.99 | < .0001 |
| | | | | | | Error | 60 | 0.022 | 0.000 | | |
| | | | | | | Corrected total | 125 | 1.008 | | | |
| | | | | | | LD | 6 | 0.188 | 0.031 | 84.68 | < .0001 |
| | | | | | | Method | 2 | 0.003 | 0.002 | 4.3 | 0.018 |
| | | | | | | Scenario | 5 | 0.671 | 0.134 | 362.49 | < .0001 |
| | | | | | | LD × method | 12 | 0.014 | 0.001 | 3.06 | 0.002 |
| | | | | | | LD × scenario | 30 | 0.102 | 0.003 | 9.15 | < .0001 |
| | | | | | | Method × scenario | 10 | 0.009 | 0.001 | 2.4 | 0.0181 |
| | NI | 0.980 | 6.18 | 0.022 | 0.36 | Model | 65 | 1.475 | 0.023 | 45.45 | < .0001 |
| | | | | | | Error | 60 | 0.030 | 0.000 | | |
| | | | | | | Corrected total | 125 | 1.505 | | | |
| | | | | | | LD | 6 | 0.237 | 0.040 | 79.21 | < .0001 |
| | | | | | | Method | 2 | 0.000 | 0.000 | 0.13 | 0.8761 |
| | | | | | | Scenario | 5 | 1.039 | 0.208 | 416.31 | < .0001 |
| | | | | | | LD × method | 12 | 0.006 | 0.001 | 1.01 | 0.4507 |
| | | | | | | LD × scenario | 30 | 0.173 | 0.006 | 11.57 | < .0001 |
| | | | | | | Method × scenario | 10 | 0.019 | 0.002 | 3.78 | 0.0006 |
| | PW | 0.972 | 7.93 | 0.031 | 0.39 | Model | 65 | 1.986 | 0.031 | 31.97 | < .0001 |
| | | | | | | Error | 60 | 0.057 | 0.001 | | |
| | | | | | | Corrected total | 125 | 2.043 | | | |
| | | | | | | LD | 6 | 0.035 | 0.006 | 6.14 | < .0001 |
| | | | | | | Method | 2 | 0.003 | 0.001 | 1.42 | 0.2499 |
| | | | | | | Scenario | 5 | 1.753 | 0.351 | 366.83 | < .0001 |
| | | | | | | LD × method | 12 | 0.012 | 0.001 | 1.04 | 0.4256 |
| | | | | | | LD × scenario | 30 | 0.120 | 0.004 | 4.18 | < .0001 |
| | | | | | | Method × scenario | 10 | 0.063 | 0.006 | 6.62 | < .0001 |
R² coefficient of determination, CV coefficient of variation, RMSE root mean square error, Mean intercept value of the transformed accuracy (Z), FL days to flowering, NI nitrogen balance index, PW 100 panicle weight, LD linkage disequilibrium with 7 levels (LD ≤ 0.25, LD ≤ 0.36, LD ≤ 0.49, LD ≤ 0.64, LD ≤ 0.81, LD ≤ 0.98, LD ≤ 1), MAF minor allele frequency with 3 levels (MAF ≥ 5%, MAF ≥ 10%, MAF ≥ 20%), Method BayesB, GBLUP, RKHS
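The summary columns of both ANOVA tables can be recovered from the sums of squares and degrees of freedom using standard fixed-effects identities: R² = SS(model)/SS(corrected total), RMSE = √MS(error), CV = 100 × RMSE/Mean, and F = MS(effect)/MS(error). The sketch below applies them to the FL main-effects rows of the progeny table; the function name is illustrative, and because the tabulated SS and Mean values are rounded, the recomputed F and CV values match the table only approximately (for example, CV comes out near 14.4 rather than 14.64).

```python
import math


def anova_summary(ss_model, df_model, ss_error, df_error, mean_z, effects):
    """Recompute R2, RMSE, CV (%) and F values of a fixed-effects ANOVA.

    effects: dict mapping a source name to (df, SS).
    """
    ms_error = ss_error / df_error
    rmse = math.sqrt(ms_error)
    out = {
        "R2": ss_model / (ss_model + ss_error),  # SS(model) / SS(corrected total)
        "RMSE": rmse,
        "CV": 100 * rmse / mean_z,
        "F_model": (ss_model / df_model) / ms_error,
    }
    for name, (df, ss) in effects.items():
        out["F_" + name] = (ss / df) / ms_error
    return out


# FL, main-effects model, progeny-prediction ANOVA (values copied from the table)
fl_main = anova_summary(
    ss_model=0.862, df_model=13, ss_error=0.146, df_error=112, mean_z=0.25,
    effects={"LD": (6, 0.188), "Method": (2, 0.003), "Scenario": (5, 0.671)},
)
for key, value in fl_main.items():
    print(key, round(value, 3))
```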