Johannes W R Martini, Ning Gao, Diercles F Cardoso, Valentin Wimmer, Malena Erbe, Rodolfo J C Cantet, Henner Simianer.
Abstract
BACKGROUND: Epistasis marker effect models incorporating products of marker values as predictor variables in a linear regression approach (extended GBLUP, EGBLUP) have been assessed as potentially beneficial for genomic prediction, but their performance depends on marker coding. Although this fact has been recognized in the literature, the nature of the problem has not been thoroughly investigated so far.
Keywords: Epistasis model; Genomic prediction; Interaction
Year: 2017 PMID: 28049412 PMCID: PMC5209948 DOI: 10.1186/s12859-016-1439-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of the interaction effects which are given implicitly by the marker coding {−1,0,1} (left) and {0,1,2} (right) in the interaction terms of EGBLUP. Each entry has to be multiplied with the interaction effect h
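The coding dependence described in this caption can be illustrated with a minimal sketch. It assumes that genotypes are indexed by allele count 0, 1, 2 and that the interaction predictor for two loci is the plain product of their coded marker values; the function and variable names are illustrative and do not come from the paper. Each printed entry would still have to be multiplied by the interaction effect.

```python
from itertools import product


def interaction_table(coding):
    """Map each genotype pair (g_j, g_k) to the product of its coded values.

    Genotypes are indexed 0, 1, 2 (assumed allele counts); `coding` gives
    the numeric value assigned to each genotype, e.g. (-1, 0, 1).
    """
    return {
        (gj, gk): coding[gj] * coding[gk]
        for gj, gk in product(range(3), repeat=2)
    }


def format_table(table):
    """Render the 3 x 3 table, rows = genotype at locus j, columns = locus k."""
    return "\n".join(
        " ".join(f"{table[(gj, gk)]:>2}" for gk in range(3))
        for gj in range(3)
    )


symmetric = interaction_table((-1, 0, 1))  # coding {-1, 0, 1}
counts = interaction_table((0, 1, 2))      # coding {0, 1, 2}

print("coding {-1,0,1}:")
print(format_table(symmetric))
print("coding {0,1,2}:")
print(format_table(counts))
```

The two tables differ by more than a constant because shifting the coding by one turns the product xy into (x+1)(y+1) = xy + x + y + 1, so the shifted interaction term also carries additive contributions of both loci.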
Fig. 1 Comparison of the parametrization of the genotypic values in GBLUP and the categorical marker effect model CM. Black dots: genotypic values of the corresponding genotype of a certain locus. GBLUP parameterizes the genotypic values by a fixed effect (red dot) and a random effect determining the slope (blue line), whereas CM parameterizes by the fixed effect (red line) and independent random effects (blue lines) for each genotype
Predictive abilities of the models on the simulated data. Comparison of the predictive abilities in terms of correlations between the measured phenotypes and the predictions for the individuals of the test sets (“Pearson’s correlation”; 100 test set genotypes were drawn randomly from all 1000 genotypes; 200 repeats for each simulated population; 20 independent simulations of population and phenotypes). Traits of different genetic architecture (additive A, dominant D, Epistasis E) and increasing number of QTL. Model abbreviations as introduced in the text. For EGBLUP, only the matrix based on the interactions was considered here
| | GBLUP | EGBLUP 0,1,2 | EGBLUP -2,-1,0 | EGBLUP -1,0,1 | EGBLUP VR | CM | CE | K |
|---|---|---|---|---|---|---|---|---|
| A1 | 0.551 ± 0.005 | | | 0.550 ± 0.005 | 0.372 ± 0.006 | 0.489 ± 0.005 | 0.494 ± 0.005 | 0.530 ± 0.005 |
| A2 | 0.549 ± 0.005 | | | 0.548 ± 0.005 | 0.351 ± 0.006 | 0.486 ± 0.005 | 0.490 ± 0.005 | 0.527 ± 0.005 |
| A3 | 0.569 ± 0.005 | | | 0.568 ± 0.005 | 0.372 ± 0.006 | 0.500 ± 0.005 | 0.504 ± 0.005 | 0.545 ± 0.005 |
| D1 | 0.159 ± 0.006 | 0.160 ± 0.006 | 0.159 ± 0.006 | 0.161 ± 0.007 | 0.111 ± 0.007 | 0.174 ± 0.006 | | 0.162 ± 0.006 |
| D2 | 0.172 ± 0.006 | 0.172 ± 0.006 | 0.172 ± 0.006 | 0.171 ± 0.006 | 0.103 ± 0.006 | | | 0.170 ± 0.006 |
| D3 | 0.156 ± 0.006 | 0.156 ± 0.006 | 0.156 ± 0.006 | 0.158 ± 0.006 | 0.116 ± 0.006 | 0.177 ± 0.006 | | 0.160 ± 0.006 |
| E1 | 0.244 ± 0.006 | 0.244 ± 0.006 | 0.244 ± 0.006 | 0.244 ± 0.006 | 0.159 ± 0.006 | | | 0.243 ± 0.006 |
| E2 | 0.275 ± 0.006 | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.277 ± 0.006 | 0.188 ± 0.006 | 0.301 ± 0.006 | | 0.277 ± 0.006 |
| E3 | 0.279 ± 0.006 | 0.278 ± 0.006 | 0.279 ± 0.006 | 0.278 ± 0.006 | 0.176 ± 0.006 | | | 0.276 ± 0.006 |
EGBLUP VR denotes the interaction model based on the matrix standardized by allele frequencies. The given values represent the empirical mean and the corresponding mean standard error across the 20 independently simulated data sets. The highest predictive ability is shown in bold
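The evaluation scheme described in the caption and note of this table (random test sets, Pearson's correlation between measured phenotypes and predictions, summary across simulated data sets) can be sketched as follows. This is a minimal illustration and not the authors' code: the `fit_predict` callback, the toy data, and the stand-in predictor are hypothetical, and taking the standard error over the per-data-set means is one assumed reading of the note.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_and_se(values):
    """Empirical mean and standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def repeated_test_correlations(phenotypes, fit_predict, n_test, n_repeats, rng):
    """Draw random test sets and correlate observed with predicted values.

    fit_predict(train_ids, test_ids) is a hypothetical callback returning
    one prediction per test individual.
    """
    ids = list(range(len(phenotypes)))
    correlations = []
    for _ in range(n_repeats):
        test_ids = rng.sample(ids, n_test)
        held_out = set(test_ids)
        train_ids = [i for i in ids if i not in held_out]
        predictions = fit_predict(train_ids, test_ids)
        observed = [phenotypes[i] for i in test_ids]
        correlations.append(pearson(observed, predictions))
    return correlations


# Toy demonstration: 20 simulated data sets of 1000 individuals, 100 test
# individuals per draw. Only 20 repeats per data set are used here to keep
# the example light (the caption states 200).
rng = random.Random(1)
per_population_means = []
for _ in range(20):
    genetic = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    phenotypes = [g + rng.gauss(0.0, 1.0) for g in genetic]

    def stand_in_model(train_ids, test_ids, genetic=genetic):
        # Stand-in predictor: returns the simulated genetic values directly.
        return [genetic[i] for i in test_ids]

    cors = repeated_test_correlations(
        phenotypes, stand_in_model, n_test=100, n_repeats=20, rng=rng
    )
    per_population_means.append(sum(cors) / len(cors))

mean_ability, standard_error = mean_and_se(per_population_means)
print(f"{mean_ability:.3f} ± {standard_error:.3f}")
```

In a real analysis, `stand_in_model` would be replaced by a function that fits the chosen model (GBLUP, EGBLUP, CM, CE or a kernel model) on the training individuals and predicts the test individuals.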
Predictive abilities of the models on the wheat data. Comparison of the predictive abilities as Pearson’s correlation of the measured phenotypes and the predictions for the individuals of the test sets (60 test set genotypes, trait: grain yield)
| | GBLUP | EGBLUP 0,1 | EGBLUP -1,0 | EGBLUP -1,1 | EGBLUP VR | CE | Gaussian kernel |
|---|---|---|---|---|---|---|---|
| Environment 1 | 0.511 | 0.554 | 0.561 | 0.581 | 0.541 | 0.558 | |
| Environment 2 | 0.499 | 0.502 | | 0.495 | 0.422 | | 0.500 |
| Environment 3 | 0.371 | 0.390 | 0.396 | 0.409 | 0.365 | 0.393 | |
| Environment 4 | 0.463 | 0.498 | 0.504 | 0.530 | 0.500 | 0.502 | |
Letters indicate groups that were not distinguishable at a 5% significance level in a Tukey’s ‘Honest Significant Difference’ test
Predictive abilities of the models on the mouse data. Comparison of the predictive abilities as Pearson’s correlation of the measured phenotypes and the predictions for the individuals of the test set (130 test set genotypes). Here, the phenotype residuals already pre-corrected for fixed effects, which are also provided with the publicly available data, were used
| | GBLUP | EGBLUP 0,1,2 | EGBLUP -2,-1,0 | EGBLUP -1,0,1 | EGBLUP VR | CM | CE | Gaussian kernel |
|---|---|---|---|---|---|---|---|---|
| W6W | 0.493 | 0.540 | 0.505 | 0.545 | 0.553 | 0.486 | 0.514 | |
| W10W | 0.466 | 0.491 | 0.474 | 0.495 | 0.461 | 0.466 | 0.479 | |
| GrowthSlope | 0.347 | 0.363 | 0.350 | 0.364 | | 0.355 | 0.363 | 0.371 |
| BMI | 0.195 | 0.204 | 0.200 | | 0.194 | 0.153 | 0.166 | |
| BodyLength | 0.271 | 0.282 | 0.276 | | 0.275 | 0.226 | 0.240 | 0.284 |
| %B220 | 0.549 | 0.573 | 0.556 | 0.576 | 0.540 | 0.547 | 0.561 | |
| %CD3 | 0.522 | 0.535 | 0.527 | | 0.485 | 0.521 | 0.528 | 0.535 |
| %CD4 | 0.495 | 0.506 | 0.499 | | 0.458 | 0.495 | 0.502 | 0.506 |
| %CD8 | 0.694 | 0.703 | 0.699 | 0.706 | 0.656 | 0.706 | | 0.702 |
| %CD4/CD3 | 0.643 | 0.655 | 0.647 | 0.656 | 0.618 | 0.660 | | 0.653 |
| %CD8/CD3 | 0.683 | 0.689 | 0.687 | 0.690 | 0.638 | 0.701 | | 0.686 |
| CD4Intensity | 0.581 | 0.601 | 0.587 | | 0.561 | 0.578 | 0.586 | |
| CD8Intensity | 0.388 | 0.442 | 0.401 | 0.450 | | 0.406 | 0.434 | 0.475 |
Letters indicate groups that were not distinguishable at a 5% significance level in a Tukey’s ‘Honest Significant Difference’ test
For a description of the traits, see the corresponding UCL website, currently at http://mtweb.cs.ucl.ac.uk/mus/www/mouse/HS/index.shtml
Predictive abilities on the wheat data when prior information is incorporated in the marker coding of EGBLUP. Predictive abilities when the coding for each interaction is determined based on records under different environmental conditions
| | G-Env 1 | G-Env 2 | G-Env 3 | G-Env 4 |
|---|---|---|---|---|
| Environment 1 | - | 0.555 ± 0.007 | 0.559 ± 0.007 | 0.552 ± 0.007 |
| Environment 2 | 0.503 ± 0.007 | - | | |
| Environment 3 | 0.394 ± 0.008 | | - | 0.402 ± 0.008 |
| Environment 4 | 0.500 ± 0.007 | 0.511 ± 0.006 | 0.513 ± 0.006 | - |
G-Env 1 means that the relationship matrix was constructed using the data of Environment 1 (analogously for other environments; for a description of the construction of the matrices see section “Methods”). Bold numbers indicate predictive abilities higher than that of all previously used methods for this trait
Predictive abilities on the mouse data when prior information is incorporated in the marker coding of EGBLUP. Predictive abilities when the coding for each interaction is determined based on the records of other traits
| | G-W6W | G-W10W | G-GrowthSlope | G-BMI | G-BodyLength | G-%B220 |
|---|---|---|---|---|---|---|
| W6W | - | 0.548 ± 0.004 | 0.511 ± 0.004 | 0.507 ± 0.004 | 0.511 ± 0.004 | 0.507 ± 0.004 |
| W10W | | - | 0.480 ± 0.005 | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.474 ± 0.005 |
| GrowthSlope | 0.356 ± 0.005 | 0.355 ± 0.005 | - | 0.351 ± 0.005 | 0.355 ± 0.005 | 0.351 ± 0.005 |
| BMI | 0.202 ± 0.006 | 0.202 ± 0.006 | 0.200 ± 0.006 | - | | 0.200 ± 0.006 |
| BodyLength | 0.283 ± 0.006 | 0.278 ± 0.006 | 0.281 ± 0.006 | | - | 0.276 ± 0.006 |
| %B220 | 0.557 ± 0.004 | 0.557 ± 0.004 | 0.557 ± 0.004 | 0.556 ± 0.004 | 0.556 ± 0.004 | - |
| %CD3 | 0.527 ± 0.004 | 0.527 ± 0.004 | 0.527 ± 0.004 | 0.527 ± 0.004 | 0.527 ± 0.004 | |
| %CD4 | 0.500 ± 0.004 | 0.500 ± 0.004 | 0.499 ± 0.004 | 0.499 ± 0.004 | 0.500 ± 0.004 | |
| %CD8 | 0.701 ± 0.003 | 0.701 ± 0.003 | 0.700 ± 0.003 | 0.700 ± 0.003 | 0.699 ± 0.003 | 0.708 ± 0.003 |
| %CD4/CD3 | 0.649 ± 0.004 | 0.649 ± 0.004 | 0.648 ± 0.004 | 0.648 ± 0.004 | 0.647 ± 0.004 | 0.648 ± 0.004 |
| %CD8/CD3 | 0.688 ± 0.003 | 0.688 ± 0.003 | 0.687 ± 0.003 | 0.687 ± 0.003 | 0.686 ± 0.003 | 0.687 ± 0.003 |
| CD4Intensity | 0.589 ± 0.004 | 0.588 ± 0.004 | 0.588 ± 0.004 | 0.588 ± 0.004 | 0.588 ± 0.004 | 0.588 ± 0.004 |
| CD8Intensity | 0.406 ± 0.005 | 0.405 ± 0.005 | 0.404 ± 0.005 | 0.405 ± 0.005 | 0.405 ± 0.005 | 0.404 ± 0.005 |

| | G-%CD3 | G-%CD4 | G-%CD8 | G-%CD4/CD3 | G-%CD8/CD3 | G-CD4Intensity | G-CD8Intensity |
|---|---|---|---|---|---|---|---|
| W6W | 0.507 ± 0.005 | 0.507 ± 0.005 | 0.507 ± 0.005 | 0.507 ± 0.005 | 0.507 ± 0.004 | 0.507 ± 0.005 | 0.508 ± 0.005 |
| W10W | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.475 ± 0.005 | 0.476 ± 0.005 |
| GrowthSlope | 0.351 ± 0.005 | 0.351 ± 0.005 | 0.351 ± 0.005 | 0.351 ± 0.005 | 0.351 ± 0.005 | 0.351 ± 0.005 | 0.351 ± 0.005 |
| BMI | 0.200 ± 0.006 | 0.200 ± 0.006 | 0.201 ± 0.006 | 0.201 ± 0.006 | 0.201 ± 0.006 | 0.200 ± 0.006 | 0.202 ± 0.006 |
| BodyLength | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.276 ± 0.006 | 0.277 ± 0.006 |
| %B220 | | | 0.570 ± 0.004 | 0.557 ± 0.004 | 0.557 ± 0.004 | 0.556 ± 0.004 | 0.558 ± 0.004 |
| %CD3 | - | | | 0.527 ± 0.004 | 0.527 ± 0.004 | 0.527 ± 0.004 | 0.527 ± 0.004 |
| %CD4 | | - | 0.504 ± 0.004 | | | 0.500 ± 0.004 | 0.499 ± 0.004 |
| %CD8 | | 0.702 ± 0.003 | - | | | 0.700 ± 0.003 | 0.7 ± 0.003 |
| %CD4/CD3 | 0.649 ± 0.004 | 0.656 ± 0.004 | | - | | 0.649 ± 0.004 | 0.649 ± 0.004 |
| %CD8/CD3 | 0.688 ± 0.003 | 0.694 ± 0.003 | | | - | 0.687 ± 0.003 | 0.687 ± 0.003 |
| CD4Intensity | 0.588 ± 0.004 | 0.589 ± 0.004 | 0.589 ± 0.004 | 0.589 ± 0.004 | 0.588 ± 0.004 | - | 0.595 ± 0.004 |
| CD8Intensity | 0.403 ± 0.005 | 0.403 ± 0.005 | 0.403 ± 0.005 | 0.405 ± 0.005 | 0.404 ± 0.005 | 0.414 ± 0.005 | - |
G-W6W means that the relationship matrix was constructed using the pre-corrected residuals of the trait W6W. Bold numbers indicate predictive abilities higher than that of all previously used methods for this trait