Marcus O Olatoye, Lindsay V Clark, Nicholas R Labonte, Hongxu Dong, Maria S Dwiyanti, Kossonou G Anzoua, Joe E Brummer, Bimal K Ghimire, Elena Dzyubenko, Nikolay Dzyubenko, Larisa Bagmet, Andrey Sabitov, Pavel Chebukin, Katarzyna Głowacka, Kweon Heo, Xiaoli Jin, Hironori Nagano, Junhua Peng, Chang Y Yu, Ji H Yoo, Hua Zhao, Stephen P Long, Toshihiko Yamada, Erik J Sacks, Alexander E Lipka.
Abstract
Miscanthus is a perennial grass with potential for lignocellulosic ethanol production. To ensure its utility for this purpose, breeding efforts should focus on increasing genetic diversity of the nothospecies Miscanthus × giganteus (M×g) beyond the single clone used in many programs. Germplasm from the corresponding parental species M. sinensis (Msi) and M. sacchariflorus (Msa) could theoretically be used as training sets for genomic prediction of M×g clones with optimal genomic estimated breeding values for biofuel traits. To this end, we first showed that subpopulation structure makes a substantial contribution to the genomic selection (GS) prediction accuracies within a 538-member diversity panel of predominantly Msi individuals and a 598-member diversity panel of Msa individuals. We then assessed the ability of these two diversity panels to train GS models that predict breeding values in an interspecific diploid 216-member M×g F2 panel. Low and negative prediction accuracies were observed when various subsets of the two diversity panels were used to train these GS models. To overcome the drawback of having only one interspecific M×g F2 panel available, we also evaluated prediction accuracies for traits simulated in 50 simulated interspecific M×g F2 panels derived from different sets of Msi and diploid Msa parents. The results revealed that genetic architectures with common causal mutations across Msi and Msa yielded the highest prediction accuracies. Ultimately, these results suggest that the ideal training set should contain the same causal mutations segregating within interspecific M×g populations, and thus efforts should be undertaken to ensure that individuals in the training and validation sets are as closely related as possible.
Keywords: GenPred; Genomic selection; Miscanthus; Population Structure; Prediction Accuracy; Shared data resources
Year: 2020 PMID: 32457095 PMCID: PMC7341128 DOI: 10.1534/g3.120.401402
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Schematic representation of the methods to account for population structure within Miscanthus sinensis and Miscanthus sacchariflorus diversity panels. Yellow rectangles refer to a given training set of individuals obtained from a fivefold cross-validation procedure conducted within each diversity panel. Purple circles refer to the models that are trained; specifically, the genomic selection (GS) model is a random regression best linear unbiased prediction (RR-BLUP) model, and the principal components (PC)-only model includes only the top PCs of 5,140 genome-wide markers as explanatory variables (see Figure S1 for scree plots). The blue diamonds refer to the process of randomly selecting 200 individuals and of running the CDmean procedure. Prediction accuracy, quantified as the Pearson correlation between the observed phenotypic values and the genomic estimated breeding values (GEBVs) from each GS model, is then calculated among individuals in the corresponding validation set (gray symbol).
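The cross-validation scheme described in this caption can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names (`rrblup_fit`, `fivefold_accuracy`) are hypothetical, and the shrinkage parameter `lam` is fixed by hand here, whereas an RR-BLUP fit would normally estimate the corresponding variance ratio from the data (for example by REML).

```python
import numpy as np

def rrblup_fit(Z, y, lam):
    """Ridge-type marker effects: u = Z'(ZZ' + lam*I)^-1 (y - mean(y)).

    lam plays the role of the residual-to-marker variance ratio; it is a
    fixed, assumed value here rather than a REML estimate.
    """
    mu = y.mean()
    K = Z @ Z.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - mu)
    return mu, Z.T @ alpha

def fivefold_accuracy(Z, y, lam=50.0, k=5, seed=0):
    """Pearson correlation between observed values and GEBVs in each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mu, u = rrblup_fit(Z[train], y[train], lam)
        gebv = mu + Z[val] @ u          # predictions for the held-out fold
        accs.append(np.corrcoef(y[val], gebv)[0, 1])
    return np.array(accs)

# Synthetic demo data (assumption): 300 individuals, 200 markers coded -1/0/1,
# 20 of which carry an effect.
rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(300, 200)).astype(float) - 1.0
effects = np.zeros(200)
effects[:20] = rng.uniform(0, 1, 20)
y = Z @ effects + rng.normal(0, 1.5, 300)

accs = fivefold_accuracy(Z, y)
print(accs.round(2), round(float(accs.mean()), 2))
```

Repeating the call with different `seed` values gives the replicated cross-validation distributions of the kind summarized in boxplots.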
Training populations derived from the Miscanthus sinensis (Msi) and Miscanthus sacchariflorus (Msa) diversity panels that were used to train genomic selection models fitted to predict trait values in the 09F2 population. Note that the CDmean procedure will select different subsets of individuals for each trait.
| Scenario | Individuals from Msi panel used in training set | Individuals from Msa panel used in training set |
|---|---|---|
| Msi.Random | 200 Randomly selected individuals from Msi panel | None |
| Msa.Random | None | 200 Randomly selected individuals from Msa panel |
| Msi.CDmean | 200 Msi individuals selected from CDmean procedure | None |
| Msa.CDmean | None | 200 Msa individuals selected from CDmean procedure |
| Msi.Whole | Entire Msi panel | None |
| Msa.Whole | None | Entire Msa panel |
| Whole.Msi.Msa | Entire Msi panel | Entire Msa panel |
Description of trait simulation in the Miscanthus sinensis (Msi) and Miscanthus sacchariflorus (Msa) diversity panels and in the 50 simulated F2 populations that were used to perform genomic selection.
| Scenario | Description | QTN effect sizes | Model + QTN # |  |  |
|---|---|---|---|---|---|
| 1 | The same QTNs in Msi and Msa | Random uniform between 0 and 1 (Same effects in both Msi and Msa) | A20D0E0, A20D4E0, A20D0E4 | 0.60 | 0.37 |
| 2 | Half of the QTNs similar and half different between Msi and Msa | Random uniform between 0 and 1 (Same effects in both Msi and Msa) | A20D0E0, A20D4E0, A20D0E4 | 0.60 | 0.37 |
| 3 | Completely different QTNs used in Msi and Msa | Random uniform between 0 and 1 (Same effects in both Msi and Msa) | A20D0E0, A20D4E0, A20D0E4 | 0.60 | 0.37 |
| 4 | Different QTNs with Large Effects in Msi | QTN from Msi with large effects (random uniform between 0.5 and 0.99) and Msa with small effects (random uniform between 0 and 0.25) | A20D0E0, A20D4E0, A20D0E4 | 0.60 | 0.37 |
| 5 | Different QTNs with Large Effects in Msa | QTN from Msa with large effects (random uniform between 0.5 and 0.99) and Msi with small effects (random uniform between 0 and 0.25) | A20D0E0, A20D4E0, A20D0E4 | 0.60 | 0.37 |
QTN, Quantitative trait nucleotide.
A, D, and E respectively refer to additive, dominance, and additive-by-additive epistatic QTNs. The number after each letter refers to the number of QTNs with that genetic mechanism simulated. For example A20D0E0 means that 20 additive QTN were simulated, 0 dominance QTN were simulated, and 0 epistatic QTN were simulated.
H, broad-sense heritability.
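To make the A/D/E notation concrete, the sketch below simulates a trait from a genotype matrix under an assumed coding. It is an illustration only: the function name `simulate_trait`, the 0/1/2 genotype coding, the use of separate loci for additive, dominance, and epistatic QTNs, and the way residual variance is scaled to reach a target broad-sense heritability `H2` are all assumptions, not details taken from the table.

```python
import numpy as np

def simulate_trait(G, n_add=20, n_dom=0, n_epi=0, H2=0.6, seed=0):
    """Simulate a trait such as A20D0E0 from genotypes G (n x m, coded 0/1/2).

    Effects are drawn uniformly between 0 and 1. Dominance QTNs add their
    effect to heterozygotes; each epistatic QTN is a product of two loci.
    """
    rng = np.random.default_rng(seed)
    n, m = G.shape
    loci = rng.choice(m, size=n_add + n_dom + 2 * n_epi, replace=False)
    add, dom, epi = np.split(loci, [n_add, n_add + n_dom])

    g = G[:, add].astype(float) @ rng.uniform(0, 1, n_add)
    if n_dom:
        g = g + (G[:, dom] == 1).astype(float) @ rng.uniform(0, 1, n_dom)
    if n_epi:
        pairs = epi.reshape(n_epi, 2)
        inter = (G[:, pairs[:, 0]] * G[:, pairs[:, 1]]).astype(float)
        g = g + inter @ rng.uniform(0, 1, n_epi)

    # Scale residual variance so that var(g) / (var(g) + var(e)) equals H2.
    var_e = g.var() * (1.0 - H2) / H2
    y = g + rng.normal(0.0, np.sqrt(var_e), n)
    return y, g

# Synthetic genotypes (assumption): 500 individuals, 1,000 markers.
G = np.random.default_rng(2).integers(0, 3, size=(500, 1000))
y_a, g_a = simulate_trait(G, n_add=20, n_dom=0, n_epi=0)   # A20D0E0
y_d, g_d = simulate_trait(G, n_add=20, n_dom=4, n_epi=0)   # A20D4E0
y_e, g_e = simulate_trait(G, n_add=20, n_dom=0, n_epi=4)   # A20D0E4
print(round(float(g_a.var() / y_a.var()), 2))
```

The printed ratio is the realized share of phenotypic variance that is genetic in the sample; it fluctuates around the target `H2` from one simulated trait to the next.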
Figure 2. Prediction accuracy for within-diversity-panel genomic selection. Each boxplot represents the distribution of prediction accuracy (Y-axis) across ten replicates of fivefold cross-validation for each method (color coded) and for each trait (X-axis), specifically basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), diameter of basal internode (DBI; mm), days to first heading (HD1; days), and yield (Yld; g/plant) for (A) Miscanthus sinensis and (B) Miscanthus sacchariflorus. RR-BLUP refers to the random regression best linear unbiased prediction model, while PCA refers to the model where the trait is the response variable and the top principal components of a principal component analysis of 5,140 markers are used as explanatory variables. CDmean refers to the subset of 200 individuals selected using the CDmean procedure, while Random refers to a random subset of 200 individuals. The white dots represent the mean value of each distribution.
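The PC-only comparison model can be illustrated with a short sketch: the trait is regressed on the top principal components of the marker matrix, and accuracy is measured on held-out individuals. The helper names, the choice of five PCs, the decision to compute PCs on all genotyped individuals, and the two-subpopulation synthetic data are assumptions made for illustration; the caption does not fix any of them.

```python
import numpy as np

def pc_scores(Z, n_pcs):
    """Scores of the top principal components of the centered marker matrix."""
    Zc = Z - Z.mean(axis=0)
    U, S, _ = np.linalg.svd(Zc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

def pc_only_accuracy(Z, y, train, test, n_pcs=5):
    """Fit y ~ intercept + top PCs on the training set; correlate on the test set."""
    X = np.column_stack([np.ones(len(y)), pc_scores(Z, n_pcs)])
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return np.corrcoef(y[test], X[test] @ beta)[0, 1]

# Synthetic demo (assumption): two subpopulations with different allele
# frequencies and a trait whose mean differs between them.
rng = np.random.default_rng(3)
pop = np.repeat([0, 1], 100)
freqs = rng.uniform(0.1, 0.9, size=(2, 300))
Z = rng.binomial(2, freqs[pop]).astype(float)
y = 2.0 * pop + rng.normal(0, 1, 200)

order = rng.permutation(200)
train, test = order[:160], order[160:]
acc_pc = pc_only_accuracy(Z, y, train, test)
print(round(float(acc_pc), 2))
```

When a PC-only model of this kind predicts nearly as well as the full GS model, much of the apparent accuracy is attributable to subpopulation structure rather than to within-subpopulation marker effects.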
Figure 3. Prediction accuracy for using Miscanthus sinensis (Msi) and Miscanthus sacchariflorus (Msa) diversity panels to train genomic selection (GS) models for prediction in the 09F2 breeding population. Each boxplot represents the distribution of prediction accuracy (Y-axis; across 1,000 bootstraps of the 09F2 breeding population) when the genomic selection model was trained using 200 individuals selected using the CDmean procedure (Msa.CDMean, Msi.CDMean), 200 randomly selected individuals (Msa.Random, Msi.Random), whole diversity panels (Msa.Whole, Msi.Whole), or a combination of the GEBVs estimated from the Msi and Msa panels (Whole.Msi.Msa). The evaluated traits (X-axis) include basal circumference (Bcirc; cm), compressed circumference (Ccirc; cm), culm length (CmL; cm), diameter of basal internode (DBI; mm), days to first heading (HD1; days), and yield (Yld; g/plant). The white dots represent the mean value of each distribution.
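A bootstrap distribution of prediction accuracy in a validation population can be sketched as below. The caption states only that 1,000 bootstraps of the 09F2 population were used; resampling individuals with replacement and recomputing the Pearson correlation each time is an assumed reading, and the function name and synthetic values are hypothetical.

```python
import numpy as np

def bootstrap_accuracy(observed, gebv, n_boot=1000, seed=0):
    """Pearson correlation recomputed on resampled validation individuals."""
    rng = np.random.default_rng(seed)
    n = len(observed)
    accs = np.empty(n_boot)
    for b in range(n_boot):
        s = rng.integers(0, n, size=n)      # sample individuals with replacement
        accs[b] = np.corrcoef(observed[s], gebv[s])[0, 1]
    return accs

# Synthetic stand-ins (assumption) for phenotypes and GEBVs of 216 individuals.
rng = np.random.default_rng(4)
observed = rng.normal(size=216)
gebv = 0.3 * observed + rng.normal(size=216)

boot = bootstrap_accuracy(observed, gebv)
point = np.corrcoef(observed, gebv)[0, 1]
print(round(float(point), 2), np.percentile(boot, [2.5, 97.5]).round(2))
```

The spread of `boot` is what a boxplot of this kind summarizes; it reflects sampling variation in the validation set only, since the trained model is held fixed.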
Figure 4. Prediction accuracy for using Miscanthus sinensis (Msi) and Miscanthus sacchariflorus (Msa) diversity panels to train GS models for making predictions in simulated F2 populations. Boxplots represent a distribution of prediction accuracies (Y-axis) across 50 simulated interspecific F2 populations for simulated traits (X-axis). Boxplots are color coded according to the approach used to select training sets: 200 individuals selected randomly (Msa.Random, Msi.Random) or using the CDmean procedure (Msa.CDMean, Msi.CDMean), whole diversity panels (Msa.Whole, Msi.Whole), and the sum of the genomic estimated breeding values (GEBVs) estimated from GS models fitted separately within the Msi and Msa panels (Whole.Msi.Msa). Traits were simulated using five different scenarios, namely: D.QTN (traits simulated with completely different QTNs in Msi and Msa but with the same effect sizes), D.QTN.Msa (traits simulated with different QTNs in each of Msi and Msa, with Msa QTNs having large effects while Msi QTNs had small effects), D.QTN.Msi (traits simulated with different QTNs in each of Msi and Msa, with Msi QTNs having large effects while Msa QTNs had small effects), P.QTN (traits where 50% of the QTNs were the same across Msi and Msa, while 50% were different), and S.QTN (traits simulated in Msi and Msa based on the same QTNs and same effect sizes). Three different combinations of additive and non-additive QTNs were considered, specifically (A) traits with 20 additive QTNs, 0 dominance QTNs, and 0 epistatic QTNs, (B) traits with 20 additive QTNs, 4 dominance QTNs, and 0 epistatic QTNs, and (C) traits with 20 additive QTNs, 0 dominance QTNs, and 4 epistatic QTNs. The white dots represent the mean value of each distribution, while the black dots represent the prediction accuracy value for the same simulated genetic architecture using polyRAD genetic data in the 09F2 population.
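The Whole.Msi.Msa approach, in which GEBVs from two separately fitted models are summed, can be sketched as follows. This is an assumed, simplified rendering: it uses a ridge solution with a hand-picked shrinkage value `lam` in place of a full RR-BLUP fit, assumes all three populations are genotyped at the same markers with the same coding, and uses synthetic data in which both panels share the same causal markers (analogous to the S.QTN scenario). The function names are hypothetical.

```python
import numpy as np

def marker_effects(Z, y, lam=50.0):
    """Ridge-type marker effects; lam is an assumed, fixed shrinkage value."""
    mu = y.mean()
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(y)), y - mu)
    return mu, Z.T @ alpha

def summed_gebv(Z_msi, y_msi, Z_msa, y_msa, Z_f2, lam=50.0):
    """Fit one model per panel, predict the F2 with each, and add the GEBVs."""
    mu_i, u_i = marker_effects(Z_msi, y_msi, lam)
    mu_a, u_a = marker_effects(Z_msa, y_msa, lam)
    return (mu_i + Z_f2 @ u_i) + (mu_a + Z_f2 @ u_a)

# Synthetic demo (assumption): 100 shared markers, 10 shared causal loci.
rng = np.random.default_rng(5)
def genotypes(n):
    return rng.integers(0, 3, size=(n, 100)).astype(float) - 1.0

true_u = np.zeros(100)
true_u[:10] = rng.uniform(0, 1, 10)
Z_msi, Z_msa, Z_f2 = genotypes(150), genotypes(150), genotypes(80)
y_msi = Z_msi @ true_u + rng.normal(0, 1, 150)
y_msa = Z_msa @ true_u + rng.normal(0, 1, 150)
y_f2 = Z_f2 @ true_u + rng.normal(0, 1, 80)

gebv_sum = summed_gebv(Z_msi, y_msi, Z_msa, y_msa, Z_f2)
acc_sum = np.corrcoef(y_f2, gebv_sum)[0, 1]
print(round(float(acc_sum), 2))
```

Because Pearson correlation is unaffected by shifting or rescaling the predictions, summing the two sets of GEBVs changes the ranking of F2 individuals only through the relative weight each panel's marker effects receive.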