Gustavo de Los Campos, Yogasudha Veturi, Ana I Vazquez, Christina Lehermeier, Paulino Pérez-Rodríguez.
Abstract
Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regressions (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model. ELECTRONIC SUPPLEMENTARY MATERIAL: Supplementary materials for this article are available at 10.1007/s13253-015-0222-5.
Keywords: Bayesian; Genomic prediction; Genomic selection; Multi-breed analysis; Population structure
Year: 2015 PMID: 26660276 PMCID: PMC4666286 DOI: 10.1007/s13253-015-0222-5
Source DB: PubMed Journal: J Agric Biol Environ Stat ISSN: 1085-7117 Impact factor: 1.524
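The decomposition described in the abstract (marker effects split into a main effect shared by all groups plus group-specific deviations) can be illustrated by how the design matrix is built. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `interaction_design` and `ridge_fit`, the simulated genotypes, and the fixed penalty `lam` are all hypothetical, and plain ridge regression stands in for the Bayesian Gaussian-prior and BayesB models used in the article.

```python
import numpy as np

def interaction_design(X, groups):
    """Return [X, X_1, ..., X_G]; X_g equals X for rows in group g and 0 elsewhere."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    blocks = [X]  # main-effect block, shared by all groups
    for g in labels:
        mask = (groups == g).astype(float)[:, None]
        blocks.append(X * mask)  # interaction block for group g
    return np.hstack(blocks), labels

def ridge_fit(W, y, lam):
    """Ridge solution (W'W + diag(lam))^{-1} W'y; lam is a scalar or a per-column vector."""
    p = W.shape[1]
    penalty = np.diag(np.broadcast_to(np.asarray(lam, dtype=float), (p,)))
    return np.linalg.solve(W.T @ W + penalty, W.T @ y)

# Toy data (simulated, purely illustrative): 60 lines, 5 markers, 2 groups.
rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
groups = np.repeat([1, 2], n // 2)
y = rng.normal(size=n)

W, labels = interaction_design(X, groups)
coef = ridge_fit(W, y, lam=10.0)  # lam is an arbitrary assumed value

b0 = coef[:p]           # main effects
b1 = coef[p:2 * p]      # deviations for group 1
b2 = coef[2 * p:]       # deviations for group 2
effects_g1 = b0 + b1    # group-specific marker effects
effects_g2 = b0 + b2
```

In this layout, an across-group analysis corresponds to keeping only the first block, and a stratified analysis corresponds to fitting each group's rows separately. Passing a per-column `lam` vector would let the main and interaction blocks be shrunk by different amounts, loosely mirroring separate variance parameters.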
Fig. 1 Clustering in the wheat data set. First two marker-derived principal components (left) and allele frequency by group (right).
Fig. 2 Scatter plot of estimated effects obtained with a stratified analysis (left) and estimated sampling distribution of the correlation between estimated effects obtained with 1000 permutations (right).
Fig. 3 Clustering in the pig data set. First two principal components (top-left panel) and allele frequency (top-right and lower panels) by group (1 in red, 2 in blue and 3 in black).
Estimated posterior means of variance parameters (posterior SD) from Gaussian Model (Pig data set).
| Trait | Variance | Across groups: all | Across groups: G1 | Across groups: G2 | Across groups: G3 | Interaction model: Main | Interaction model: G1 | Interaction model: G2 | Interaction model: G3 | Stratified: G1 | Stratified: G2 | Stratified: G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T3 | Genomic | 0.215 (0.030) | | | | 0.064 (0.018) | 0.068 (0.019) | 0.051 (0.013) | 0.127 (0.025) | 0.309 (0.064) | 0.222 (0.046) | 0.335 (0.053) |
| | Residual | | 0.782 (0.052) | 0.853 (0.045) | 0.797 (0.035) | | 0.722 (0.056) | 0.804 (0.046) | 0.712 (0.036) | 0.729 (0.057) | 0.810 (0.047) | 0.699 (0.037) |
| T4 | Genomic | 0.381 (0.036) | | | | 0.260 (0.042) | 0.072 (0.021) | 0.046 (0.011) | 0.059 (0.016) | 0.495 (0.085) | 0.373 (0.060) | 0.352 (0.047) |
| | Residual | | 0.602 (0.045) | 0.636 (0.037) | 0.675 (0.031) | | 0.529 (0.050) | 0.604 (0.038) | 0.654 (0.031) | 0.553 (0.056) | 0.625 (0.043) | 0.673 (0.035) |
| T5 | Genomic | 0.397 (0.036) | | | | 0.280 (0.040) | 0.047 (0.011) | 0.053 (0.013) | 0.060 (0.018) | 0.419 (0.059) | 0.379 (0.069) | 0.349 (0.047) |
| | Residual | | 0.471 (0.036) | 0.684 (0.040) | 0.641 (0.029) | | 0.447 (0.037) | 0.633 (0.043) | 0.625 (0.030) | 0.474 (0.040) | 0.677 (0.049) | 0.662 (0.034) |
T3, T4, and T5 are three different traits. G1, G2, and G3 identify groups 1, 2, and 3, respectively. In the interaction model “Main” refers to the main effect and G1–G3 refer to interactions.
Estimated posterior means of variance parameters (posterior SD) from Gaussian Model (wheat data set).
| Environment | Variance | Across groups: all | Across groups: G1 | Across groups: G2 | Interaction model: Main | Interaction model: G1 | Interaction model: G2 | Stratified: G1 | Stratified: G2 |
|---|---|---|---|---|---|---|---|---|---|
| E1 | Genomic | 0.558 (0.095) | | | 0.215 (0.078) | 0.380 (0.121) | 0.426 (0.121) | 0.568 (0.133) | 0.635 (0.126) |
| | Residual | | 0.605 (0.061) | 0.434 (0.061) | | 0.554 (0.061) | 0.350 (0.056) | 0.563 (0.063) | 0.350 (0.057) |
| E2 | Genomic | 0.497 (0.092) | | | 0.277 (0.088) | 0.308 (0.106) | 0.270 (0.086) | 0.600 (0.140) | 0.481 (0.115) |
| | Residual | | 0.647 (0.064) | 0.475 (0.062) | | 0.612 (0.064) | 0.440 (0.062) | 0.618 (0.068) | 0.469 (0.067) |
| E3 | Genomic | 0.470 (0.096) | | | 0.219 (0.080) | 0.346 (0.115) | 0.379 (0.132) | 0.556 (0.135) | 0.611 (0.159) |
| | Residual | | 0.531 (0.056) | 0.769 (0.091) | | 0.490 (0.056) | 0.657 (0.092) | 0.491 (0.059) | 0.657 (0.095) |
| E4 | Genomic | 0.485 (0.097) | | | 0.185 (0.069) | 0.378 (0.116) | 0.437 (0.132) | 0.568 (0.128) | 0.602 (0.140) |
| | Residual | | 0.620 (0.063) | 0.564 (0.076) | | 0.561 (0.061) | 0.444 (0.070) | 0.558 (0.062) | 0.449 (0.071) |
E1–E4 are four different mega environments. G1 and G2 identify groups 1 and 2, respectively. In the interaction model “Main” refers to the main effect and G1 and G2 refer to interactions.
Fig. 4 Estimated correlations between groups, by environment (wheat data set) or trait (pig data set).
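To see how a between-group correlation of effects can arise from the interaction model's variance parameters, suppose the effect of a marker in group g is the sum of a main effect with variance σ²₀ and an independent group-specific deviation with variance σ²_g. Under that assumption, the correlation of effects between groups 1 and 2 is σ²₀ / √((σ²₀ + σ²₁)(σ²₀ + σ²₂)). The sketch below plugs posterior means from the wheat Gaussian-model table (E1, interaction model) into that expression. It is illustrative only: the function name `effect_correlation` is hypothetical, the tabulated variances may be on a different scale than marker-effect variances, and a plug-in value need not match the posterior estimates summarized in Fig. 4.

```python
import math

def effect_correlation(var_main, var_g1, var_g2):
    """Plug-in correlation of effects between two groups.

    Assumes effect_g = main + deviation_g, with main and deviations independent.
    """
    return var_main / math.sqrt((var_main + var_g1) * (var_main + var_g2))

# Wheat, environment E1, interaction model: Main, G1, G2 (posterior means).
r_e1 = effect_correlation(0.215, 0.380, 0.426)
print(f"plug-in correlation, E1: {r_e1:.3f}")
```

When both interaction variances are zero the expression equals 1 (identical effects across groups), and it approaches 0 as the main-effect variance vanishes, which corresponds to the stratified analysis.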
Estimated posterior means of parameters (posterior SD) from BayesB (pig data set).
| Trait | Parameter | Across groups: all | Across groups: G1 | Across groups: G2 | Across groups: G3 | Interaction model: Main | Interaction model: G1 | Interaction model: G2 | Interaction model: G3 | Stratified: G1 | Stratified: G2 | Stratified: G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| T3 | Scale | 2.04 (0.647) | | | | 1.09 (0.785) | 0.532 (0.434) | 0.624 (0.616) | 0.670 (0.156) | 3.760 (1.500) | 1.599 (1.032) | 4.421 (4.114) |
| | Prob. In | 0.345 (0.108) | | | | 0.304 (0.154) | 0.390 (0.140) | 0.336 (0.188) | 0.504 (0.104) | 0.269 (0.099) | 0.412 (0.170) | 0.358 (0.196) |
| | Res. Var. | | 0.754 (0.050) | 0.857 (0.045) | 0.784 (0.034) | | 0.710 (0.057) | 0.825 (0.051) | 0.706 (0.035) | 0.701 (0.055) | 0.832 (0.049) | 0.697 (0.039) |
| T4 | Scale | 2.504 (0.644) | | | | 2.014 (0.688) | 0.455 (0.300) | 0.291 (0.166) | 0.182 (0.134) | 5.956 (1.169) | 5.891 (3.035) | 2.099 (0.621) |
| | Prob. In | 0.486 (0.118) | | | | 0.516 (0.144) | 0.349 (0.121) | 0.290 (0.115) | 0.437 (0.127) | 0.282 (0.064) | 0.250 (0.116) | 0.503 (0.117) |
| | Res. Var. | | 0.596 (0.046) | 0.633 (0.037) | 0.673 (0.031) | | 0.545 (0.051) | 0.614 (0.038) | 0.663 (0.031) | 0.533 (0.057) | 0.618 (0.043) | 0.681 (0.036) |
| T5 | Scale | 2.737 (0.654) | | | | 2.556 (0.835) | 0.310 (0.188) | 0.352 (0.123) | 0.207 (0.216) | 2.983 (1.800) | 2.592 (0.802) | 4.500 (3.499) |
| | Prob. In | 0.426 (0.110) | | | | 0.387 (0.104) | 0.343 (0.127) | 0.417 (0.124) | 0.384 (0.158) | 0.469 (0.178) | 0.445 (0.114) | 0.295 (0.139) |
| | Res. Var. | | 0.460 (0.035) | 0.692 (0.040) | 0.632 (0.028) | | 0.438 (0.036) | 0.634 (0.042) | 0.628 (0.03) | 0.450 (0.039) | 0.683 (0.052) | 0.659 (0.034) |
T3, T4, and T5 are three different traits. G1, G2, and G3 identify groups 1, 2, and 3, respectively. Prob. In represents the estimated proportion of markers with non-null effects, and Res. Var. denotes the residual variance. In the interaction model, “Main” refers to the main effect and G1–G3 refer to interactions.
Estimated posterior means of parameters (posterior SD) from BayesB (wheat data set).
| Environment | Parameter | Across groups: all | Across groups: G1 | Across groups: G2 | Interaction model: Main | Interaction model: G1 | Interaction model: G2 | Stratified: G1 | Stratified: G2 |
|---|---|---|---|---|---|---|---|---|---|
| E1 | Scale | 3.422 (1.231) | | | 1.636 (1.086) | 3.079 (1.884) | 2.933 (1.549) | 4.458 (2.471) | 4.132 (1.851) |
| | Prob. In | 0.541 (0.128) | | | 0.463 (0.144) | 0.444 (0.140) | 0.481 (0.141) | 0.438 (0.143) | 0.510 (0.137) |
| | Res. Var. | | 0.607 (0.062) | 0.451 (0.062) | | 0.549 (0.062) | 0.364 (0.057) | 0.564 (0.063) | 0.368 (0.058) |
| E2 | Scale | 3.397 (1.340) | | | 2.857 (1.427) | 1.513 (1.344) | 0.985 (0.770) | 4.626 (2.590) | 3.789 (2.268) |
| | Prob. In | 0.502 (0.130) | | | 0.478 (0.136) | 0.457 (0.143) | 0.448 (0.142) | 0.464 (0.142) | 0.450 (0.144) |
| | Res. Var. | | 0.649 (0.065) | 0.474 (0.062) | | 0.615 (0.066) | 0.452 (0.064) | 0.612 (0.072) | 0.472 (0.070) |
| E3 | Scale | 3.287 (1.429) | | | 2.136 (1.287) | 2.255 (2.621) | 2.237 (1.545) | 3.932 (1.932) | 4.325 (2.292) |
| | Prob. In | 0.480 (0.135) | | | 0.457 (0.140) | 0.459 (0.143) | 0.463 (0.142) | 0.479 (0.137) | 0.479 (0.142) |
| | Res. Var. | | 0.532 (0.056) | 0.776 (0.092) | | 0.493 (0.058) | 0.673 (0.099) | 0.494 (0.061) | 0.667 (0.102) |
| E4 | Scale | 3.298 (1.349) | | | 1.322 (1.025) | 2.911 (2.250) | 3.267 (1.873) | 3.880 (1.828) | 4.268 (2.137) |
| | Prob. In | 0.478 (0.133) | | | 0.457 (0.144) | 0.465 (0.140) | 0.469 (0.141) | 0.483 (0.139) | 0.480 (0.142) |
| | Res. Var. | | 0.628 (0.063) | 0.576 (0.076) | | 0.566 (0.063) | 0.456 (0.074) | 0.565 (0.063) | 0.463 (0.073) |
E1–E4 are four different mega environments. G1 and G2 identify groups 1 and 2, respectively. Prob. In represents the estimated proportion of markers with non-null effects, and Res. Var. denotes the residual variance. In the interaction model, “Main” refers to the main effect and G1 and G2 refer to interactions.
Average (SD) prediction accuracy (correlation between phenotypes, average of 50 training-testing partitions), by trait, cluster and model (pig data set).
| Trait | Group | BRR: Across groups | BRR: Interaction model | BRR: Stratified analyses | BayesB: Across groups | BayesB: Interaction model | BayesB: Stratified analyses |
|---|---|---|---|---|---|---|---|
| T3 | 1 | 0.213 (0.051) | 0.234 (0.050) | 0.231 (0.050) | 0.256 (0.054) | 0.257 (0.054) | 0.244 (0.050) |
| | 2 | 0.199 (0.039) | 0.210 (0.043) | 0.210 (0.044) | 0.192 (0.038) | 0.208 (0.042) | 0.212 (0.046) |
| | 3 | 0.280 (0.060) | 0.301 (0.057) | 0.301 (0.057) | 0.297 (0.058) | 0.307 (0.056) | 0.304 (0.057) |
| T4 | 1 | 0.371 (0.042) | 0.379 (0.041) | 0.356 (0.041) | 0.373 (0.042) | 0.380 (0.041) | 0.355 (0.042) |
| | 2 | 0.438 (0.053) | 0.424 (0.053) | 0.390 (0.050) | 0.439 (0.053) | 0.425 (0.052) | 0.391 (0.050) |
| | 3 | 0.389 (0.056) | 0.382 (0.053) | 0.355 (0.050) | 0.390 (0.055) | 0.382 (0.053) | 0.354 (0.050) |
| T5 | 1 | 0.544 (0.039) | 0.541 (0.039) | 0.523 (0.040) | 0.564 (0.039) | 0.563 (0.038) | 0.550 (0.035) |
| | 2 | 0.359 (0.035) | 0.347 (0.036) | 0.299 (0.041) | 0.359 (0.034) | 0.345 (0.034) | 0.298 (0.041) |
| | 3 | 0.401 (0.050) | 0.393 (0.047) | 0.368 (0.043) | 0.420 (0.050) | 0.410 (0.048) | 0.382 (0.045) |
| Average | | 0.355 | 0.357 | 0.337 | 0.366 | 0.364 | 0.343 |
T3–T5 are three different traits.
BRR: Bayesian Ridge Regression (Gaussian prior).
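The accuracy figures in the table above are averages, with SDs, of within-group correlations over 50 training-testing partitions. A rough sketch of such an evaluation loop is given below. It rests on assumptions not stated in this record: the testing fraction (`test_frac=0.2`), the random partitioning scheme, the simulated data, and the stand-in model `ridge_across` (a fixed-penalty ridge regression that ignores groups) are all hypothetical, as are the function names.

```python
import numpy as np

def accuracy_by_group(y_obs, y_pred, groups):
    """Pearson correlation between observed and predicted values, per group."""
    return {g: np.corrcoef(y_obs[groups == g], y_pred[groups == g])[0, 1]
            for g in np.unique(groups)}

def evaluate(fit_predict, X, y, groups, n_partitions=50, test_frac=0.2, seed=0):
    """Mean and SD of within-group testing correlations over random partitions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))
    results = []
    for _ in range(n_partitions):
        perm = rng.permutation(n)
        te, tr = perm[:n_test], perm[n_test:]
        y_pred = fit_predict(X[tr], y[tr], groups[tr], X[te], groups[te])
        results.append(accuracy_by_group(y[te], y_pred, groups[te]))
    summary = {}
    for g in np.unique(groups):
        vals = [r[g] for r in results if g in r]  # skip partitions lacking group g
        summary[g] = (float(np.mean(vals)), float(np.std(vals)))
    return summary

def ridge_across(X_tr, y_tr, g_tr, X_te, g_te, lam=1.0):
    """Across-group stand-in model: ridge regression that ignores group labels."""
    mu = y_tr.mean()
    p = X_tr.shape[1]
    b = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ (y_tr - mu))
    return mu + X_te @ b

# Simulated toy data, for illustration only.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.normal(size=(n, p))
groups = np.repeat([1, 2], n // 2)
y = X @ rng.normal(size=p) + rng.normal(size=n)

summary = evaluate(ridge_across, X, y, groups)
for g, (mean_r, sd_r) in summary.items():
    print(f"group {g}: mean r = {mean_r:.3f}, SD = {sd_r:.3f}")
```

Because `fit_predict` receives the group labels for both sets, an interaction or a stratified model could be swapped in without changing the evaluation loop, so all three analyses would be scored on identical partitions.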
Fig. 5 Prediction accuracy (average over 50 training-testing partitions and 3 clusters) by trait and model, all based on model BRR (pig data set).
Prediction accuracy (correlation between phenotypes, average over 50 training-testing partitions), by trait, cluster, and model (wheat data set).
| Environment | Group | BRR: Across groups | BRR: Interaction model | BRR: Stratified analyses | BayesB: Across groups | BayesB: Interaction model | BayesB: Stratified analyses |
|---|---|---|---|---|---|---|---|
| E1 | G1 | 0.453 (0.082) | 0.466 (0.079) | 0.455 (0.080) | 0.449 (0.083) | 0.467 (0.079) | 0.461 (0.079) |
| | G2 | 0.573 (0.079) | 0.612 (0.079) | 0.613 (0.078) | 0.559 (0.080) | 0.604 (0.081) | 0.604 (0.079) |
| E2 | G1 | 0.463 (0.079) | 0.463 (0.082) | 0.443 (0.083) | 0.453 (0.080) | 0.459 (0.082) | 0.440 (0.085) |
| | G2 | 0.505 (0.088) | 0.488 (0.086) | 0.464 (0.090) | 0.501 (0.085) | 0.494 (0.084) | 0.462 (0.089) |
| E3 | G1 | 0.410 (0.075) | 0.406 (0.071) | 0.396 (0.073) | 0.410 (0.073) | 0.408 (0.071) | 0.396 (0.072) |
| | G2 | 0.362 (0.114) | 0.381 (0.092) | 0.373 (0.088) | 0.361 (0.111) | 0.376 (0.091) | 0.370 (0.089) |
| E4 | G1 | 0.443 (0.072) | 0.460 (0.070) | 0.458 (0.071) | 0.442 (0.071) | 0.455 (0.071) | 0.454 (0.073) |
| | G2 | 0.445 (0.089) | 0.487 (0.075) | 0.489 (0.072) | 0.442 (0.088) | 0.482 (0.074) | 0.481 (0.073) |
| Average | | 0.457 | 0.470 | 0.461 | 0.452 | 0.468 | 0.459 |
E1–E4 are four different mega environments.
BRR: Bayesian Ridge Regression (Gaussian prior).
Fig. 6 Prediction accuracy (average over 50 training-testing partitions and 3 clusters) by environment and model (all based on model BRR): wheat data set.