Cesar A Medina1, Harpreet Kaur2, Ian Ray2, Long-Xi Yu1.
Abstract
Agronomic traits such as biomass yield and abiotic stress tolerance are genetically complex and challenging to improve through conventional breeding approaches. Genomic selection (GS) is an alternative approach in which genome-wide markers are used to determine the genomic estimated breeding value (GEBV) of individuals in a population. In alfalfa (Medicago sativa L.), previous results indicated that low to moderate prediction accuracies (<70%) were obtained for complex traits, such as yield and abiotic stress resistance. Prediction accuracy needs to increase before GS can be employed in breeding programs. In this paper we reviewed different statistical models and their applications in polyploid crops, such as alfalfa and potato. Specifically, we used empirical data on alfalfa yield under salt stress to investigate weighted GBLUP (WGBLUP) analyses in which the SNP weights come either from DNA marker importance values derived from machine learning models or from marker-trait association scores obtained by genome-wide association studies (GWAS) with different GWASpoly models. This approach increased prediction accuracies from 50% to more than 80% for alfalfa yield under salt stress. Finally, we extended the weighted GBLUP approach to potato, analyzed 13 phenotypic traits, and obtained similar results. This is the first report on alfalfa to use variable importance and GWAS-assisted approaches to increase the prediction accuracy of GS, thus helping to select superior alfalfa lines based on their GEBVs.
Keywords: Medicago sativa; WGBLUP; genomic selection
Year: 2021 PMID: 34943880 PMCID: PMC8699225 DOI: 10.3390/cells10123372
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Indirect selection based on molecular markers. (a) Generalized Manhattan plots illustrating a comparison of GWAS effectiveness in simple (left) vs. complex traits (right). Note: Bold dashed line indicates minimum threshold to select significant markers. A significant signal (i.e., QTL) was identified in the simple trait (left panel), while no defined QTL was identified for the complex trait. Therefore, genomic selection (GS) is more appropriate and practical for complex traits. (b) Common parametric and non-parametric models used in GS and their computational requirements. GBLUP, genomic best linear unbiased prediction; RRBLUP, ridge-regression BLUP; RF, random forest; SVM, support vector machine; MLP, multilayer perceptron; CNN, convolutional neural network; RNN, recurrent neural network.
Different prior distributions for Bayesian models.
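| Model | Prior Distribution ‡ | Ref. |
|---|---|---|
| Bayes A | $\beta_j \sim t(df, S)$ | [ |
| Bayes B | $\beta_j = 0$ with probability $1-\pi$; $\beta_j \sim t(df, S)$ with probability $\pi$ | [ |
| Bayes Cπ | $\beta_j = 0$ with probability $1-\pi$; $\beta_j \sim N(0, \sigma^2_{\beta})$ with probability $\pi$ | [ |
| Bayesian LASSO | $\beta_j \sim DE(\lambda)$ | [ |

‡ $\beta_j$, additive effect of the $j$-th SNP; $t$, scaled-t distribution; $df$, degrees of freedom; $S$, scale parameter; $N$, normal distribution with variance $\sigma^2_{\beta}$; $\pi$, fraction of the SNPs that are in linkage disequilibrium with a quantitative trait locus; $1-\pi$, probability of the marker effect equal to zero; $DE$, double exponential; $\lambda$, parameter of the exponential distribution.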
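Kernels used in support vector machine (SVM) model. Meta-parameters used for tuning include gamma ($\gamma$), degree of polynomial ($d$) and intercept ($c$).
| Kernel | Formula ‡ |
|---|---|
| Linear | $K(x_i, x_j) = x_i^{\top} x_j$ |
| Polynomial | $K(x_i, x_j) = (\gamma\, x_i^{\top} x_j + c)^{d}$ |
| Radial basis function | $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^{2})$ |
| Sigmoidal | $K(x_i, x_j) = \tanh(\gamma\, x_i^{\top} x_j + c)$ |

‡ $x_i$ and $x_j$ are two vectors in the n-dimensional space.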
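The four kernel families named in the table can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard textbook (libsvm-style) forms of each kernel; the function names and the parameter name `coef0` (the intercept) are choices made here, not taken from the paper.

```python
import math


def dot(x, y):
    """Inner product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, degree=3, coef0=0.0):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)
```

In a GS setting the vectors would be the marker profiles of two individuals, so each kernel returns a similarity between genotypes; gamma, the degree, and the intercept are the values tuned during model fitting.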
Coding effect assumptions of GWASpoly models according to allele dosage in biallelic SNPs.
| Allele Dosage ¶ | AAAA | AAAB | AABB | ABBB | BBBB |
|---|---|---|---|---|---|
| Numerical Code | 0 | 1 | 2 | 3 | 4 |
| Model (phenotypic effect §) | | | | | |
| Diplo-additive | 0.00 | 0.50 | 0.50 | 0.50 | 1.00 |
| Diplo-general ‡ | 0.00 | 0.00 < x < 1.00 | 0.00 < x < 1.00 | 0.00 < x < 1.00 | 1.00 |
| Additive | 0.00 | 0.25 | 0.50 | 0.75 | 1.00 |
| 1-dom-ref (A > B simplex) | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 |
| 2-dom-ref (A > B duplex) | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 |
| 1-dom-alt (B > A simplex) | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2-dom-alt (B > A duplex) | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 |
| General † | No restrictions | | | | |

¶, allele dosage A is coded as the reference allele and B is coded as the alternative allele; §, phenotypic effects are scaled from 0.00 to 1.00; ‡, for the diplo-general model all heterozygotes have the same effect (x), but x is not constrained to be halfway between the homozygous effects; †, the general model has no restrictions on the effects of the different dosage levels.
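The fixed coding schemes in the table can be written as a small lookup function. The sketch below is illustrative only: `code_dosage` is a hypothetical helper, not part of the GWASpoly R package, and it covers only the models whose scores are fully determined by dosage. The general and diplo-general models are left out because their heterozygote effects are estimated from data rather than fixed in advance.

```python
def code_dosage(dosage, model, ploidy=4):
    """Map the count of the alternative allele B (0..ploidy) to a score in [0, 1].

    Hypothetical helper mirroring the fixed rows of the coding table.
    """
    if not 0 <= dosage <= ploidy:
        raise ValueError("dosage must be between 0 and ploidy")
    if model == "additive":
        # Effect proportional to the number of B alleles.
        return dosage / ploidy
    if model == "diplo-additive":
        # All heterozygotes share the midpoint between the two homozygotes.
        if dosage == 0:
            return 0.0
        if dosage == ploidy:
            return 1.0
        return 0.5
    if model == "1-dom-alt":
        # One copy of B is enough for the full effect.
        return 1.0 if dosage >= 1 else 0.0
    if model == "2-dom-alt":
        # Two copies of B are needed for the full effect.
        return 1.0 if dosage >= 2 else 0.0
    if model == "1-dom-ref":
        # One copy of A (at most ploidy - 1 copies of B) gives the full effect.
        return 1.0 if dosage <= ploidy - 1 else 0.0
    if model == "2-dom-ref":
        # Two copies of A are needed for the full effect.
        return 1.0 if dosage <= ploidy - 2 else 0.0
    raise ValueError(f"unsupported model: {model}")
```

For example, `[code_dosage(d, "2-dom-alt") for d in range(5)]` walks the genotypes AAAA through BBBB in the same order as the table columns.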
Recent achievements in genomic selection (GS) in polyploid crops.
| Crop | Ploidy | Trait § | GS Method | Accuracy ‡ | Notes | Author |
|---|---|---|---|---|---|---|
| | Allohexaploid | Seed lipid content | MK-BLUP | 0.48 | Additive marker effects from Bayesian models were used to construct the G matrix | [ |
| | Allotetraploid | Seed yield | GBLUP | 0.69 | Several agronomic and seed quality traits were tested | [ |
| | Allotetraploid | Canopy diameter | GBLUP | 0.40 | 18 agronomic traits were tested. Diploid dosage assumed | [ |
| | Paleotetraploid | Wood density | MVGLUP † | 0.77 | Marker selection in multivariate analysis. Requires multiple highly correlated traits | [ |
| | Autotetraploid | Yield | RRBLUP | 0.66 | Multi-environment trials over two generations. First report of GS in alfalfa | [ |
| | Autotetraploid | Yield | SVM | 0.35 | Six GS models were tested. First report of machine learning models in alfalfa | [ |
| | Autotetraploid | Leaf crude protein | RRBLUP | 0.40 | Nine alfalfa forage quality traits were tested by five GS models | [ |
| | Autotetraploid | Fall plant height | Bayes B | 0.65 | 15 quality traits and 10 agronomic traits were tested using three GS models | [ |
| | Autotetraploid | Yield under salt stress | SVM | 0.50 | Multi-environment trials with seven yield measurements. Eight GS models were tested | [ |
| | Autotetraploid | Organic matter | Bayes B-TD | 0.39 | Genomic selection using tetraploid dosage (GS-TD) vs. diploid dosage (GS-DD) | [ |
| | Autopolyploid | Yield | GBLUP | 0.55 | Incorporation of additive and digenic dominant G covariance matrix | [ |
| | Autopolyploid | Tuber weight | RKHS | 0.59 | Four agronomic tuber traits were tested by eight GS models | [ |
| | Octaploid and decaploid | Fiber | GBLUP | 0.44 | Inclusion of additive and non-additive genetic components for GS | [ |
| | Allohexaploid | Grain yield | GBLUP | 0.47 | Multi-trait selection for grain yield and protein content | [ |
| | Allohexaploid | Grain yield | GBLUP | 0.53 | GWAS markers as fixed effects in GS models | [ |
| | Autotetraploid | Weight | GBLUP | 0.49 | Comparison of allele dosage calls across sequencing depths (6×–60×) | [ |

§ For multiple traits, the trait with the highest predictive accuracy was selected; ‡, predictive accuracy measured as Pearson’s correlation; MK-BLUP, multi-kernel trait-specific BLUP; MVGLUP, Multi-trait model GBLUP; SVM, support vector machine; Bayes B-TD, Bayes B with tetraploid allele dosage; RKHS, Reproducing Kernel Hilbert Space; † In multi-trait genomic selection (MT-GS) a secondary trait that is genetically correlated with the primary trait is incorporated in the prediction model, to predict the primary trait with higher accuracy.
Figure 2. Optimization of GS models. (a) GS model accuracy measured as Pearson’s correlation after 10-fold cross-validation for biomass yield under salt stress. Computing time was measured as system time in seconds to run one cross-validation. (b) Example of variable importance values derived from SVM for 10 randomly chosen SNPs. (c) Pearson’s correlation between the weights of 6796 SNPs obtained from variable importance (SVM, RF) or from −log10 p-values of different GWASpoly models. (d) Accuracy of GBLUP (GBLUP VR and GBLUP FA) and WGBLUP models. Accuracy was measured 10 times using Pearson’s correlation with 10-fold cross-validation. SNP weights for WGBLUP were obtained from variable importance values (SVM, RF) or −log10 p-values of different GWASpoly models. RRBLUP, best linear unbiased prediction using ridge-regression; BL, Bayesian LASSO; GBLUP, genomic best linear unbiased prediction; VR, VanRaden G matrix; FA, full autotetraploid G matrix; RF, random forest; SVM, support vector machine; WGBLUP, weighted GBLUP; 1-dom-alt and 1-dom-ref, simplex dominant models; 2-dom-alt and 2-dom-ref, duplex dominant models; diplo-general, diploidized general; diplo-additive, diploidized additive.
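The idea of feeding SNP weights into GBLUP can be illustrated by building a weighted genomic relationship matrix. The sketch below is one plausible formulation, not necessarily the one used by the authors. It assumes a VanRaden-style matrix G_w = Z D Z' / (ploidy × Σ w_j p_j (1 − p_j)), where Z holds centered allele dosages, D is a diagonal matrix of SNP weights rescaled to mean 1, and p_j is the frequency of the alternative allele. The function names and the normalization are assumptions made for illustration.

```python
import math


def neg_log10_weights(p_values):
    """Turn GWAS p-values into non-negative SNP weights (-log10 p)."""
    return [-math.log10(p) for p in p_values]


def weighted_g_matrix(dosages, weights, ploidy=4):
    """Weighted genomic relationship matrix from an n x m dosage matrix.

    dosages[i][j] is the count (0..ploidy) of the alternative allele of
    SNP j in individual i; weights[j] is the weight of SNP j.
    Assumed normalization: weights are rescaled to mean 1 and the
    denominator is ploidy * sum_j w_j * p_j * (1 - p_j).
    """
    n, m = len(dosages), len(dosages[0])
    if len(weights) != m:
        raise ValueError("need exactly one weight per SNP")
    # Alternative-allele frequency of each SNP.
    freqs = [sum(row[j] for row in dosages) / (ploidy * n) for j in range(m)]
    # Rescale weights so that equal weights reproduce the unweighted matrix.
    total = sum(weights)
    w = [wj * m / total for wj in weights]
    # Center each dosage by its expected value, ploidy * p_j.
    z = [[row[j] - ploidy * freqs[j] for j in range(m)] for row in dosages]
    scale = ploidy * sum(w[j] * freqs[j] * (1 - freqs[j]) for j in range(m))
    return [
        [sum(z[a][j] * w[j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]
```

With all weights equal, the function reduces to the unweighted matrix. Raising the weight of a SNP makes individuals that share dosage at that SNP look more related, which is how marker importance or GWAS evidence would influence the predicted breeding values.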
Comparison of genomic selection (GS) models in 13 phenotypic traits collected in the SolCAP potato diversity panel. Mean and standard deviation of Pearson’s correlation obtained by 10-fold cross validation in 10 replicates. SNP weights for WGBLUP were obtained from −log10 p-values of different GWASpoly models.
| Trait | RRBLUP | GBLUP | WGBLUP 1-d-a | WGBLUP 1-d-r | WGBLUP 2-d-a | WGBLUP 2-d-r | WGBLUP General | WGBLUP d-Gen | WGBLUP d-Add | WGBLUP Additive |
|---|---|---|---|---|---|---|---|---|---|---|
| Chip color | 0.723 ± 0.014 | 0.721 ± 0.015 | 0.826 ± 0.009 | 0.798 ± 0.011 | 0.859 ± 0.007 | 0.850 ± 0.013 | 0.867 ± 0.008 | 0.849 ± 0.009 | 0.855 ± 0.007 | 0.896 ± 0.007 |
| log10 fructose | 0.682 ± 0.024 | 0.676 ± 0.025 | 0.819 ± 0.014 | 0.785 ± 0.017 | 0.845 ± 0.007 | 0.833 ± 0.011 | 0.868 ± 0.011 | 0.839 ± 0.015 | 0.855 ± 0.003 | 0.895 ± 0.008 |
| log10 glucose | 0.678 ± 0.017 | 0.668 ± 0.030 | 0.796 ± 0.009 | 0.809 ± 0.016 | 0.855 ± 0.009 | 0.849 ± 0.009 | 0.875 ± 0.009 | 0.844 ± 0.011 | 0.848 ± 0.013 | 0.910 ± 0.007 |
| Malic acid | 0.602 ± 0.016 | 0.598 ± 0.027 | 0.751 ± 0.021 | 0.745 ± 0.022 | 0.802 ± 0.021 | 0.801 ± 0.016 | 0.838 ± 0.011 | 0.808 ± 0.016 | 0.826 ± 0.009 | 0.876 ± 0.007 |
| Sucrose | 0.539 ± 0.024 | 0.519 ± 0.034 | 0.676 ± 0.011 | 0.675 ± 0.022 | 0.702 ± 0.019 | 0.716 ± 0.015 | 0.725 ± 0.023 | 0.722 ± 0.011 | 0.739 ± 0.019 | 0.806 ± 0.011 |
| Total yield | 0.132 ± 0.023 | 0.117 ± 0.041 | 0.401 ± 0.026 | 0.413 ± 0.030 | 0.418 ± 0.031 | 0.428 ± 0.017 | 0.470 ± 0.029 | 0.492 ± 0.030 | 0.504 ± 0.030 | 0.584 ± 0.028 |
| Tuber eye depth | 0.495 ± 0.026 | 0.478 ± 0.019 | 0.605 ± 0.029 | 0.655 ± 0.016 | 0.693 ± 0.025 | 0.717 ± 0.014 | 0.740 ± 0.020 | 0.693 ± 0.020 | 0.736 ± 0.018 | 0.812 ± 0.007 |
| Tuber length | 0.826 ± 0.012 | 0.821 ± 0.014 | 0.891 ± 0.006 | 0.884 ± 0.009 | 0.899 ± 0.006 | 0.889 ± 0.012 | 0.904 ± 0.008 | 0.908 ± 0.008 | 0.912 ± 0.005 | 0.928 ± 0.009 |
| Tuber shape | 0.775 ± 0.018 | 0.780 ± 0.017 | 0.865 ± 0.010 | 0.853 ± 0.013 | 0.886 ± 0.008 | 0.863 ± 0.005 | 0.896 ± 0.010 | 0.890 ± 0.008 | 0.891 ± 0.009 | 0.922 ± 0.006 |
| Tuber size | 0.501 ± 0.024 | 0.499 ± 0.027 | 0.641 ± 0.019 | 0.650 ± 0.020 | 0.679 ± 0.020 | 0.663 ± 0.022 | 0.666 ± 0.024 | 0.661 ± 0.022 | 0.679 ± 0.019 | 0.742 ± 0.021 |
| Tuber width | 0.635 ± 0.023 | 0.638 ± 0.021 | 0.752 ± 0.020 | 0.749 ± 0.021 | 0.782 ± 0.016 | 0.772 ± 0.018 | 0.805 ± 0.012 | 0.789 ± 0.015 | 0.803 ± 0.013 | 0.847 ± 0.017 |
| Vine maturity 95 days | 0.288 ± 0.035 | 0.286 ± 0.042 | 0.550 ± 0.028 | 0.538 ± 0.020 | 0.603 ± 0.022 | 0.589 ± 0.028 | 0.668 ± 0.022 | 0.632 ± 0.019 | 0.650 ± 0.025 | 0.746 ± 0.017 |
| Vine maturity 120 days | 0.321 ± 0.047 | 0.323 ± 0.024 | 0.495 ± 0.026 | 0.569 ± 0.021 | 0.636 ± 0.021 | 0.633 ± 0.013 | 0.669 ± 0.025 | 0.616 ± 0.023 | 0.666 ± 0.026 | 0.755 ± 0.019 |

Values are mean ± standard deviation. RRBLUP, best linear unbiased prediction using ridge-regression; GBLUP, genomic best linear unbiased prediction using VanRaden G matrix; WGBLUP, weighted GBLUP; 1-d-a and 1-d-r, simplex dominant models; 2-d-a and 2-d-r, duplex dominant models; d-Gen, diploidized general; d-Add, diploidized additive.
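The accuracy figures reported here (mean and standard deviation of Pearson's correlation over 10 replicates of 10-fold cross-validation) follow a procedure that can be sketched as below. Several details are assumptions: the source does not say whether correlations were computed per fold or on pooled out-of-fold predictions (this sketch pools them, giving one correlation per replicate), and `fit_predict` is a hypothetical callable standing in for whatever GS model is being evaluated.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(y, fit_predict, k=10, replicates=10, seed=0):
    """Mean and SD of Pearson's r across replicated k-fold cross-validation.

    fit_predict(train_idx, test_idx) is a user-supplied (hypothetical)
    callable that trains on train_idx and returns one prediction per
    entry of test_idx, in the same order.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(replicates):
        preds = [None] * len(y)
        for test in kfold_indices(len(y), k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            for i, p in zip(test, fit_predict(train, test)):
                preds[i] = p
        # One correlation per replicate, on pooled out-of-fold predictions.
        scores.append(pearson(preds, y))
    return statistics.fmean(scores), statistics.stdev(scores)
```

Passing the same phenotype vector and fold seed to each candidate model keeps the comparison between models on identical partitions.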