Mehdi Momen, Ahmad Ayatollahi Mehrgardi, Ayyub Sheikhi, Andreas Kranis, Llibertat Tusell, Gota Morota, Guilherme J M Rosa, Daniel Gianola.
Abstract
Recent work has suggested that the performance of prediction models for complex traits may depend on the architecture of the target traits. Here we compared several prediction models with respect to their ability to predict phenotypes under various statistical architectures of gene action: (1) purely additive, (2) additive and dominance, (3) additive, dominance, and two-locus epistasis, and (4) purely epistatic settings. Simulated data and a real chicken dataset were used. Fourteen prediction models were compared: BayesA, BayesB, BayesC, Bayesian LASSO, Bayesian ridge regression, elastic net, genomic best linear unbiased prediction, a Gaussian process, LASSO, random forests, reproducing kernel Hilbert spaces regression, ridge regression (best linear unbiased prediction), relevance vector machines, and support vector machines. When the trait was under additive gene action, the parametric prediction models outperformed the non-parametric ones. Conversely, when the trait was under epistatic gene action, the non-parametric prediction models provided more accurate predictions. Thus, prediction models must be selected according to the most probable underlying architecture of the traits. In the chicken dataset examined, most models had similar prediction performance. Our results corroborate the view that there is no universally best prediction model, and that the development of robust prediction models is an important research objective.
Year: 2018 PMID: 30120288 PMCID: PMC6098164 DOI: 10.1038/s41598-018-30089-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Genotypic values of simulated QTL for a one-locus, two-allele model of gene action when a trait is affected by additive effects only (second column) and by both additive and dominance effects (third column).
| k | Pure additive | Additive:Dominance |
|---|---|---|
p: allelic frequency, a: additive effect, d: dominance effect, α = a + d(q − p): average effect of allelic substitution.
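As a small numerical illustration of the footnote's definition, the sketch below evaluates α = a + d(q − p) for a single locus, with q = 1 − p. The function name and the example values of a, d, and p are illustrative assumptions, not values taken from the study.

```python
def average_effect(a, d, p):
    """Average effect of allelic substitution: alpha = a + d * (q - p), with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    return a + d * (q - p)


# Hypothetical example values (not from the paper).
alpha_additive = average_effect(a=1.0, d=0.0, p=0.3)   # no dominance: alpha equals a
alpha_dominance = average_effect(a=1.0, d=0.5, p=0.3)  # dominance shifts alpha by d * (q - p)
```

With d = 0 the average effect reduces to the additive effect a, which is the purely additive column of the table.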
Distribution of simulated QTL effects (Gamma for additive and normal for epistatic) and corresponding parameters. The dominance QTL effects were derived from the additive effects and a degree of dominance drawn from a normal distribution.
| Genetic Effects | Number of QTL/Interactions | Distribution |
|---|---|---|
| additive | 300 | |
| dominance | 300 | |
| additive × additive | 1500 | |
| additive × dominance | 1500 | |
| dominance × additive | 1500 | |
| dominance × dominance | 1500 | |
m: mean, t2: variance, δk: degree of dominance, G~: Gamma distribution, N~: normal distribution.
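The sampling scheme described in the caption can be sketched as follows. The counts (300 QTL, 1500 interactions per epistatic class) come from the table; every distribution parameter below is a hypothetical placeholder, and the rule "dominance effect = degree of dominance × additive effect" is one common convention assumed here rather than the authors' stated formula.

```python
import random

rng = random.Random(2018)  # fixed seed so the sketch is reproducible

N_QTL = 300     # additive and dominance QTL (from the table)
N_PAIRS = 1500  # interactions per epistatic class (from the table)

# Hypothetical parameters; the values used in the study are not reproduced here.
GAMMA_SHAPE, GAMMA_SCALE = 0.4, 1.0
DELTA_MEAN, DELTA_SD = 0.0, 1.0
EPI_MEAN, EPI_SD = 0.0, 1.0

# Additive effects drawn from a Gamma distribution.
additive = [rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE) for _ in range(N_QTL)]

# Degree of dominance drawn from a normal distribution.
delta = [rng.gauss(DELTA_MEAN, DELTA_SD) for _ in range(N_QTL)]

# Assumed rule: dominance effect = degree of dominance x additive effect.
dominance = [dk * ak for dk, ak in zip(delta, additive)]

# Epistatic effects drawn from a normal distribution, one list per interaction class.
epistatic = {
    kind: [rng.gauss(EPI_MEAN, EPI_SD) for _ in range(N_PAIRS)]
    for kind in ("aa", "ad", "da", "dd")
}
```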
Genotypic values and genotypic frequencies[1] in a two-locus, two-allele model with additive, dominance, and epistatic gene action.
Two locus genotypic frequencies were obtained by multiplication of marginal frequencies under linkage equilibrium[63].
μ: population mean; a: additive substitution effect; d: dominance deviation; aa, ad, da, and dd: additive × additive, additive × dominance, dominance × additive, and dominance × dominance gene actions, respectively; p and q are the major and minor allele frequencies.
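The frequency rule in the footnote (two-locus frequencies as products of marginal frequencies under linkage equilibrium) can be illustrated with a short sketch. It additionally assumes Hardy-Weinberg proportions at each locus to obtain the marginal genotype frequencies; that assumption, the genotype labels, and the allele frequencies 0.6 and 0.7 are illustrative and not taken from the paper.

```python
from itertools import product


def genotype_freqs(p, labels):
    """Single-locus genotype frequencies, assuming Hardy-Weinberg proportions."""
    q = 1.0 - p
    return dict(zip(labels, (p * p, 2.0 * p * q, q * q)))


def two_locus_freqs(p1, p2):
    """Joint genotype frequencies under linkage equilibrium: product of the marginals."""
    f1 = genotype_freqs(p1, ("AA", "Aa", "aa"))
    f2 = genotype_freqs(p2, ("BB", "Bb", "bb"))
    return {(g1, g2): f1[g1] * f2[g2] for g1, g2 in product(f1, f2)}


# Hypothetical major-allele frequencies at the two loci.
freqs = two_locus_freqs(p1=0.6, p2=0.7)
```

The nine joint frequencies sum to one, which is a quick check that the marginals were combined correctly.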
Variance components for main effects (additive and dominance) and two-locus epistatic interactions that contributed to genetic variance under different genetic architectures.
| Additive | |
| Dominance | |
| Additive × Additive | |
| Additive × Dominance | |
| Dominance × Additive | |
| Dominance × Dominance | |
a: additive substitution effect; d: dominance deviation; α: average allelic effect; αα, αδ, δα and δδ are additive × additive, additive × dominance, dominance × additive and dominance × dominance epistatic deviations, respectively; p and q are major and minor allele frequencies[64].
Heritability of simulated traits under various forms of gene action (additive, dominance and epistatic).
| Gene action | Additive | Dominance | Additive × Additive | Additive × Dominance | Dominance × Additive | Dominance × Dominance | Broad sense |
|---|---|---|---|---|---|---|---|
| Purely Additive | 0.30 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.30 |
| Additive:Dominance | | | 0.00 | 0.00 | 0.00 | 0.00 | 0.40 |
| Additive:Dominance:Epistatic | | | | | | | 0.80 |
| Purely Epistatic | 0.00 | 0.00 | | | | | 0.30 |
Columns, in order: additive heritability, dominance heritability, and the additive by additive, additive by dominance, dominance by additive, and dominance by dominance epistatic heritabilities; the last column is the broad-sense heritability.
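The record does not say how residual variance was set to reach these heritabilities. One common approach, assumed here purely for illustration, is to scale the residual variance so that the ratio of genetic to phenotypic variance equals the target broad-sense heritability. The function name and the unit genetic variance are hypothetical.

```python
def residual_variance(genetic_variance, broad_sense_h2):
    """Residual variance giving H2 = var_g / (var_g + var_e) for a target H2."""
    if not 0.0 < broad_sense_h2 < 1.0:
        raise ValueError("heritability must lie strictly between 0 and 1")
    return genetic_variance * (1.0 - broad_sense_h2) / broad_sense_h2


# Broad-sense heritabilities of the four scenarios (from the table).
targets = {"Ad": 0.30, "Ad:Dom": 0.40, "Ad:Dom:Epi": 0.80, "Epi": 0.30}

var_g = 1.0  # hypothetical total genetic variance
var_e = {name: residual_variance(var_g, h2) for name, h2 in targets.items()}
```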
Figure 1. Overall mean (standard error) of predictive and empirical accuracy of different prediction models under various gene action scenarios: purely additive (Ad), additive and dominance (Ad:Dom), additive, dominance, and epistasis (Ad:Dom:Epi), and pure epistasis (Epi).
Figure 2. Predictive and empirical accuracies of genomic prediction models for traits simulated under purely additive (Ad), additive:dominance (Ad:Dom), additive:dominance:epistatic (Ad:Dom:Epi), and purely epistatic (Epi) gene action scenarios with a broad-sense heritability of 0.30, 0.40, 0.80, and 0.30, respectively. Prediction models: BayesA, BayesB, BayesC, Bayesian least absolute shrinkage and selection operator (BL), Bayesian ridge regression (BRR), elastic net (EN), genomic best linear unbiased prediction (GBLUP), Gaussian process (GP), least absolute shrinkage and selection operator (LASSO), random forest (RF), reproducing kernel Hilbert spaces regression (RKHS), ridge regression best linear unbiased prediction (rrBLUP), relevance vector machine (RVM), and support vector machine (SVM).
Figure 3. Boxplots of bias (regression coefficient of simulated phenotypes on genomic estimated breeding values) for traits simulated under purely additive (Ad), additive:dominance (Ad:Dom), additive:dominance:epistatic (Ad:Dom:Epi), and purely epistatic (Epi) gene action scenarios and heritability of 0.30, 0.40, 0.80, and 0.30, respectively. Prediction models: BayesA, BayesB, BayesC, Bayesian least absolute shrinkage and selection operator (BL), Bayesian ridge regression (BRR), elastic net (EN), genomic best linear unbiased prediction (GBLUP), Gaussian process (GP), least absolute shrinkage and selection operator (LASSO), random forest (RF), reproducing kernel Hilbert spaces regression (RKHS), ridge regression best linear unbiased prediction (rrBLUP), relevance vector machine (RVM), and support vector machine (SVM). Outliers are denoted as black dots.
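The bias measure named in this caption, the regression coefficient of phenotypes on genomic estimated breeding values (GEBV), is the ordinary least-squares slope cov(y, GEBV) / var(GEBV). A minimal sketch follows; the function name and the toy numbers are illustrative assumptions, not data from the study.

```python
def regression_slope(phenotypes, gebv):
    """Least-squares slope of phenotypes regressed on GEBV: cov(y, g) / var(g)."""
    n = len(phenotypes)
    if n != len(gebv) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_y = sum(phenotypes) / n
    mean_g = sum(gebv) / n
    cov = sum((y - mean_y) * (g - mean_g) for y, g in zip(phenotypes, gebv))
    var = sum((g - mean_g) ** 2 for g in gebv)
    if var == 0.0:
        raise ValueError("GEBV have zero variance; slope is undefined")
    return cov / var


# Toy data: phenotypes spread twice as widely as the predictions.
toy_gebv = [-1.0, -0.5, 0.0, 0.5, 1.0]
toy_phenotypes = [2.0 * g + 1.0 for g in toy_gebv]
slope = regression_slope(toy_phenotypes, toy_gebv)
```

A slope close to 1 is usually read as little or no bias. A slope above 1, as in the toy data, suggests the predictions are less dispersed than the phenotypes, and a slope below 1 suggests the opposite.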
Figure 4. Ward’s hierarchical clustering on predicted genomic values derived from traits simulated under purely additive (Ad), additive:dominance (Ad:Dom), additive:dominance:epistatic (Ad:Dom:Epi), and purely epistatic (Epi) gene action. Prediction models: BayesA, BayesB, BayesC, Bayesian least absolute shrinkage and selection operator (BL), Bayesian ridge regression (BRR), elastic net (EN), genomic best linear unbiased prediction (GBLUP), Gaussian process (GP), least absolute shrinkage and selection operator (LASSO), random forest (RF), reproducing kernel Hilbert spaces regression (RKHS), ridge regression best linear unbiased prediction (rrBLUP), relevance vector machine (RVM), and support vector machine (SVM).
Figure 5. Boxplots of bias (regression coefficient of observed phenotypes on genomic estimated breeding values) obtained in the testing sets from a 20-fold cross-validation using chicken data for body weight (BW), breast meat (BM), and hen-house production (HHP). Prediction models: BayesA, BayesB, BayesC, Bayesian least absolute shrinkage and selection operator (BL), Bayesian ridge regression (BRR), elastic net (EN), genomic best linear unbiased prediction (GBLUP), Gaussian process (GP), least absolute shrinkage and selection operator (LASSO), random forest (RF), reproducing kernel Hilbert spaces regression (RKHS), ridge regression best linear unbiased prediction (rrBLUP), relevance vector machine (RVM), and support vector machine (SVM). Outliers are denoted as black dots.
Average correlations between phenotypes and predicted breeding values obtained in the testing sets from a 20-fold cross-validation using the chicken data for body weight (BW), breast meat (BM), and hen-house production (HHP).
| Model | BW | BM | HHP |
|---|---|---|---|
| BayesA | 0.320 (0.023) | 0.195 (0.012) | 0.209 (0.017) |
| BayesB | 0.330 (0.034) | 0.196 (0.012) | 0.219 (0.017) |
| BayesC | 0.188 (0.023) | 0.190 (0.012) | 0.220 (0.017) |
| BL | 0.196 (0.023) | 0.188 (0.012) | 0.186 (0.016) |
| BRR | 0.190 (0.024) | 0.176 (0.011) | 0.247 (0.018) |
| EN | 0.249 (0.027) | 0.198 (0.015) | 0.231 (0.019) |
| GBLUP | 0.192 (0.021) | 0.268 (0.009) | 0.221 (0.017) |
| GP | 0.178 (0.023) | 0.140 (0.011) | 0.227 (0.017) |
| LASSO | 0.284 (0.019) | 0.201 (0.015) | 0.176 (0.010) |
| RKHS | 0.191 (0.018) | 0.206 (0.009) | 0.219 (0.016) |
| rrBLUP | 0.175 (0.016) | 0.169 (0.010) | 0.236 (0.015) |
| RVM | 0.185 (0.026) | 0.159 (0.024) | 0.196 (0.015) |
| SVM | 0.172 (0.024) | 0.161 (0.018) | 0.202 (0.017) |
Prediction models: BayesA, BayesB, BayesC, Bayesian least absolute shrinkage and selection operator (BL), Bayesian ridge regression (BRR), elastic net (EN), genomic best linear unbiased prediction (GBLUP), Gaussian process (GP), least absolute shrinkage and selection operator (LASSO), reproducing kernel Hilbert spaces regression (RKHS), ridge regression best linear unbiased prediction (rrBLUP), relevance vector machine (RVM), and support vector machine (SVM).
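The table's entries are correlations between phenotypes and predictions in the testing sets, averaged over 20 folds. The sketch below shows one way such a summary could be computed. The random fold assignment, the `predict_fold` callable (a stand-in for any of the prediction models), and the reading of the parenthesised values as standard errors across folds are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_correlation(y, predict_fold, k=20, seed=0):
    """Mean and standard error of test-set correlations across k folds.

    predict_fold(train_idx, test_idx) is a hypothetical callable that fits a
    model on the training records and returns predictions for the test records.
    """
    folds = kfold_indices(len(y), k, seed)
    cors = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        preds = predict_fold(train_idx, test_idx)
        cors.append(pearson([y[i] for i in test_idx], preds))
    mean = sum(cors) / k
    se = math.sqrt(sum((c - mean) ** 2 for c in cors) / (k - 1)) / math.sqrt(k)
    return mean, se


# Toy check with a "perfect" predictor that simply returns the test phenotypes.
toy_y = [0.5 * i for i in range(200)]


def perfect_predictor(train_idx, test_idx):
    return [toy_y[i] for i in test_idx]


mean_r, se_r = cv_correlation(toy_y, perfect_predictor, k=20)
```

Swapping `perfect_predictor` for a function that fits a real model on `train_idx` gives per-model summaries in the same form as the table rows.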