Zhe Zhang, Malena Erbe, Jinlong He, Ulrike Ober, Ning Gao, Hao Zhang, Henner Simianer, Jiaqi Li.
Abstract
Whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data, makes it possible to obtain accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information from within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (the S matrix) and the realized relationship matrix (G). The BLUP|GA algorithm is provided and illustrated with real and simulated datasets. The predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures, and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the difference in accuracy between BLUP|GA and GBLUP correlates significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, allowing the genetic architecture of the quantitative trait under consideration to be accounted for when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection.
Keywords: BLUP|GA; GenPred; genetic architecture; shared data resource; trait specific relationship matrix; whole-genome prediction
Year: 2015 PMID: 25670771 PMCID: PMC4390577 DOI: 10.1534/g3.114.016261
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
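The abstract describes the trait-specific matrix T as a weighted sum of a genetic architecture part (S) and the realized relationship matrix (G). A minimal NumPy sketch of that idea follows. The exact form T = ωS + (1 − ω)G, the scaling of each matrix to a mean diagonal of 1, and the function names `relationship_matrix` and `trait_specific_matrix` are assumptions made for illustration; they are not taken from this record.

```python
import numpy as np

def relationship_matrix(Z, weights=None):
    """Relationship matrix from centered genotypes Z (individuals x SNPs).

    With weights=None every SNP counts equally (a G-like matrix); with
    per-SNP weights the result is an S-like, trait-specific matrix.
    Assumption: the matrix is scaled so that its mean diagonal equals 1.
    """
    Z = np.asarray(Z, dtype=float)
    n, m = Z.shape
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    K = (Z * w) @ Z.T              # weighted cross-product Z W Z'
    return K * n / np.trace(K)

def trait_specific_matrix(S, G, omega):
    """Assumed form of the weighted sum: T = omega * S + (1 - omega) * G."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * S + (1.0 - omega) * G

# Toy data: 6 individuals, 50 SNPs coded 0/1/2, centered per SNP.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(6, 50)).astype(float)
Z = M - M.mean(axis=0)

G = relationship_matrix(Z)
snp_weights = np.zeros(50)
snp_weights[:5] = 1.0              # hypothetical "top" SNPs
S = relationship_matrix(Z, snp_weights)
T = trait_specific_matrix(S, G, omega=0.2)
```

Under this assumed form, ω = 0 returns G itself, so GBLUP is the special case in which no genetic architecture information is used.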
Summary of datasets
| Dataset | Trait | N | Mean | SD | r²/h² |
|---|---|---|---|---|---|
| Cattle | Fat percentage | 5024 | −0.06 | 0.28 | 0.94 |
| | Milk yield | 5024 | 370.79 | 641.60 | 0.95 |
| | Somatic cell score | 5024 | 102.32 | 11.73 | 0.88 |
| Loblolly pine | Rustbin | 807 | −0.01 | 0.40 | 0.21 |
| | Gall | 807 | −0.02 | 1.13 | 0.12 |
| | Density | 910 | 0.05 | 2.50 | 0.09 |
| | Rootnum | 925 | 0.32 | 0.96 | 0.07 |
| | CWAC | 861 | 2.28 | 42.03 | 0.45 |
| | Rootnum_bin | 925 | 0.11 | 0.26 | 0.10 |
| QTL-MAS2012 | T1 | 3000 | 0.00 | 176.52 | 0.36 |
| | T2 | 3000 | 0.00 | 9.51 | 0.35 |
| | T3 | 3000 | 0.00 | 0.02 | 0.52 |
| GSA data | PolyUnres | 2000 | − | − | 0.25 |
| | GammaUnres | 2000 | − | − | 0.25 |
| | PolyRes | 2000 | − | − | 0.25 |
| | GammaRes | 2000 | − | − | 0.25 |
Mean and SD of conventional estimated breeding values for the three cattle traits, or of phenotypic values for the other traits; the statistics were not calculated for the GSA data because they comprise 10 replicates of simulated datasets. GSA, Genetics Society of America.
Reliability (r²) of the estimated breeding values for cattle traits, or heritability (h²) of the phenotypes for the other traits.
Figure 1. Accuracy of genomic prediction using genomic best linear unbiased prediction (GBLUP) and BLUP-given genetic architecture (BLUP|GA). Points show the average accuracy of fivefold cross-validation for scenarios with different population sizes. Results for fat percentage (FP), milk yield (MY), and somatic cell score (SCS) are presented with blue filled circles, green filled squares, and red filled triangles, respectively.
Accuracy (r) and unbiasedness (b) of genomic prediction in the dairy cattle dataset from the training stage
| N | Method | Fat percentage, r | Fat percentage, b | Milk yield, r | Milk yield, b | Somatic cell score, r | Somatic cell score, b |
|---|---|---|---|---|---|---|---|
| 5024 | GBLUP | 0.816 ± 0.000 | 1.003 ± 0.001 | 0.774 ± 0.001 | 1.010 ± 0.001 | 0.738 ± 0.001 | 0.996 ± 0.001 |
| | BLUP\|GA | 0.862 ± 0.000 | 0.959 ± 0.001 | 0.789 ± 0.000 | 0.990 ± 0.001 | 0.741 ± 0.001 | 0.981 ± 0.001 |
| 4000 | GBLUP | 0.798 ± 0.001 | 1.007 ± 0.001 | 0.757 ± 0.001 | 1.011 ± 0.001 | 0.722 ± 0.001 | 1.001 ± 0.001 |
| | BLUP\|GA | 0.856 ± 0.000 | 0.965 ± 0.001 | 0.777 ± 0.001 | 0.990 ± 0.001 | 0.723 ± 0.001 | 1.001 ± 0.001 |
| 2000 | GBLUP | 0.698 ± 0.001 | 0.997 ± 0.002 | 0.680 ± 0.001 | 1.014 ± 0.002 | 0.642 ± 0.001 | 1.005 ± 0.002 |
| | BLUP\|GA | 0.808 ± 0.001 | 0.963 ± 0.002 | 0.714 ± 0.001 | 0.992 ± 0.002 | 0.643 ± 0.001 | 0.996 ± 0.002 |
| 1000 | GBLUP | 0.594 ± 0.002 | 1.005 ± 0.004 | 0.632 ± 0.002 | 1.072 ± 0.003 | 0.555 ± 0.003 | 1.019 ± 0.006 |
| | BLUP\|GA | 0.778 ± 0.001 | 0.978 ± 0.002 | 0.683 ± 0.002 | 1.039 ± 0.002 | 0.556 ± 0.003 | 1.008 ± 0.006 |
| 500 | GBLUP | 0.557 ± 0.004 | 1.102 ± 0.008 | 0.551 ± 0.004 | 1.151 ± 0.009 | 0.526 ± 0.004 | 1.128 ± 0.009 |
| | BLUP\|GA | 0.761 ± 0.002 | 0.983 ± 0.003 | 0.600 ± 0.003 | 1.051 ± 0.007 | 0.531 ± 0.004 | 1.098 ± 0.008 |
| 250 | GBLUP | 0.441 ± 0.006 | 1.111 ± 0.016 | 0.447 ± 0.007 | 1.230 ± 0.018 | 0.441 ± 0.008 | 1.157 ± 0.024 |
| | BLUP\|GA | 0.697 ± 0.004 | 0.952 ± 0.006 | 0.555 ± 0.006 | 1.087 ± 0.011 | 0.435 ± 0.007 | 1.058 ± 0.022 |
| 125 | GBLUP | 0.371 ± 0.010 | 1.108 ± 0.032 | 0.361 ± 0.010 | 1.167 ± 0.040 | 0.424 ± 0.009 | 1.328 ± 0.030 |
| | BLUP\|GA | 0.676 ± 0.005 | 0.959 ± 0.011 | 0.447 ± 0.010 | 1.168 ± 0.030 | 0.435 ± 0.009 | 1.257 ± 0.026 |
EBV, estimated breeding value; GEBV, genomic estimated breeding value; GBLUP, genomic best linear unbiased prediction; BLUP|GA, best linear unbiased prediction-given genetic architecture.
Accuracy (r) was calculated as the correlation between the conventional EBV and the GEBV in the validation set of the cross-validation procedure.
Unbiasedness (b) was calculated as the regression coefficient of the conventional EBV on the GEBV in the validation set.
Values are the mean (± SE) of 20 averaged accuracies, one from each replicate of fivefold cross-validation.
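The footnotes above define accuracy as a correlation and unbiasedness as a regression coefficient, summarized as a mean ± SE over replicates. The sketch below shows one way these quantities could be computed with NumPy. The function names are hypothetical, and using the sample standard deviation divided by the square root of the number of replicates as the SE is an assumption.

```python
import numpy as np

def accuracy_and_unbiasedness(ebv, gebv):
    """Return (r, b) for one validation set.

    r: Pearson correlation between conventional EBV and GEBV.
    b: slope of the regression of EBV on GEBV, cov(EBV, GEBV) / var(GEBV).
    """
    ebv = np.asarray(ebv, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    r = np.corrcoef(ebv, gebv)[0, 1]
    b = np.cov(ebv, gebv, ddof=1)[0, 1] / np.var(gebv, ddof=1)
    return r, b

def mean_and_se(values):
    """Mean and standard error across replicates (assumed: sd / sqrt(n))."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)

# Toy check: EBV is an exact linear function of GEBV.
gebv_demo = np.array([1.0, 2.0, 3.0, 4.0])
ebv_demo = 2.0 * gebv_demo + 1.0
r_demo, b_demo = accuracy_and_unbiasedness(ebv_demo, gebv_demo)
```

A value of b close to 1 indicates that the GEBV are neither inflated nor deflated relative to the conventional EBV; in the toy check the slope is 2 by construction.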
Accuracy of BLUP|GA and GBLUP and the optimal parameters used in the application stage in the dairy cattle dataset
| Trait | N | Accuracy, GBLUP | Accuracy, BLUP\|GA | Top SNPs (%) | ω | nflank |
|---|---|---|---|---|---|---|
| Fat percentage | 125 | 0.321 | | 0.01 | 0.44 | 3 |
| | 250 | 0.417 | | 0.01 | 0.68 | 3 |
| | 500 | 0.508 | | 0.01 | 0.40 | 3 |
| | 1000 | 0.629 | | 0.05 | 0.20 | 3 |
| | 2000 | 0.734 | | 0.50 | 0.16 | 5 |
| | 4000 | 0.796 | | 0.50 | 0.18 | 5 |
| | 5024 | – | – | 0.50 | 0.18 | 5 |
| Milk yield | 125 | 0.370 | | 0.01 | 0.02 | 5 |
| | 250 | 0.449 | | 0.01 | 0.04 | 5 |
| | 500 | 0.549 | | 0.01 | 0.02 | 5 |
| | 1000 | 0.638 | | 0.05 | 0.04 | 5 |
| | 2000 | 0.717 | | 0.05 | 0.02 | 5 |
| | 4000 | 0.767 | | 0.50 | 0.04 | 5 |
| | 5024 | – | – | 0.10 | 0.10 | 5 |
| Somatic cell score | 125 | 0.295 | | 0.01 | 0.02 | 5 |
| | 250 | 0.419 | | 0.01 | 0.02 | 5 |
| | 500 | 0.520 | | 0.05 | 0.02 | 5 |
| | 1000 | 0.596 | | 0.10 | 0.02 | 5 |
| | 2000 | 0.665 | | 0.50 | 0.02 | 5 |
| | 4000 | 0.731 | | 0.50 | 0.02 | 5 |
| | 5024 | – | – | 0.50 | 0.04 | 5 |
BLUP|GA, best linear unbiased prediction-given genetic architecture; GBLUP, genomic best linear unbiased prediction; EBV, estimated breeding value; GEBV, genomic estimated breeding value; SNP, single-nucleotide polymorphism.
Size of the reference population.
Accuracy was calculated as the correlation between the conventional EBV and the GEBV in the candidate population, which has a size of 5024 − N.
Percentage of top SNPs.
Overall weight ω for the genetic architecture part when building the T matrix.
Number of selected flanking SNPs near each top SNP.
Scenario with higher accuracy is shown in bold face.
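The three BLUP|GA parameters in the footnotes (percentage of top SNPs, weight ω, and number of flanking SNPs) suggest a marker-selection step before the S matrix is built. The sketch below illustrates such a step under stated assumptions: SNPs are ranked by absolute estimated effect, they are stored in map order on a single chromosome, and `n_flank` SNPs are taken on each side of every top SNP. Whether the source counts flanking SNPs per side or in total is not stated here, and the function name is hypothetical.

```python
import numpy as np

def select_top_with_flanks(effects, top_pct, n_flank):
    """Indices of the top `top_pct` percent of SNPs plus their neighbors.

    effects : estimated marker effects in map order (assumed one chromosome)
    top_pct : percentage of SNPs treated as "top", e.g. 1.0 for 1%
    n_flank : neighbors added on each side of a top SNP (assumption)
    """
    abs_eff = np.abs(np.asarray(effects, dtype=float))
    m = abs_eff.size
    n_top = max(1, int(round(m * top_pct / 100.0)))
    top = np.argsort(-abs_eff, kind="stable")[:n_top]

    selected = np.zeros(m, dtype=bool)
    for i in top:
        lo = max(0, i - n_flank)            # clip at the chromosome start
        hi = min(m, i + n_flank + 1)        # clip at the chromosome end
        selected[lo:hi] = True
    return np.flatnonzero(selected)

# Toy example: 10 SNPs, one clear peak at index 2.
toy_effects = [0.1, 0.2, 5.0, 0.1, 0.3, 0.1, 0.2, 0.1, 0.1, 0.1]
picked = select_top_with_flanks(toy_effects, top_pct=10, n_flank=1)
```

With `n_flank=0` only the top SNPs themselves are returned, which corresponds to the setting reported for the loblolly pine dataset.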
Figure 2. Manhattan plot of the marker effects estimated for fat percentage. The marker effects (g_i) were estimated using ridge regression best linear unbiased prediction and rescaled so that the average marker effect was 1, which makes marker effect sizes comparable across population sizes (N) and traits.
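The caption mentions ridge regression BLUP estimates of marker effects that are rescaled to an average of 1. A minimal sketch of both steps follows. The ridge parameter `lam` is treated as given rather than estimated, and rescaling the absolute effects (rather than, say, squared effects) is an assumption; both function names are hypothetical.

```python
import numpy as np

def rrblup_effects(Z, y, lam):
    """Ridge estimates g = (Z'Z + lam * I)^-1 Z'(y - mean(y))."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    m = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ (y - y.mean()))

def rescale_to_unit_mean(effects):
    """Absolute effects divided by their mean, so the average equals 1."""
    a = np.abs(np.asarray(effects, dtype=float))
    return a / a.mean()

# Toy data: 30 individuals, 8 centered markers, one marker with a real effect.
rng = np.random.default_rng(1)
Z_demo = rng.integers(0, 3, size=(30, 8)).astype(float)
Z_demo -= Z_demo.mean(axis=0)
y_demo = 2.0 * Z_demo[:, 3] + rng.normal(size=30)

g_hat = rrblup_effects(Z_demo, y_demo, lam=5.0)
g_scaled = rescale_to_unit_mean(g_hat)
```

After rescaling, a value above 1 marks a SNP whose estimated effect is larger than average, regardless of the trait's measurement scale.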
Accuracy and optimal parameters of BLUP|GA for common datasets obtained from the training stage
| Dataset | Trait | Accuracy, GBLUP | Accuracy, BLUP\|GA | Top SNPs (%) | ω | nflank |
|---|---|---|---|---|---|---|
| Loblolly pine | Rustbin | 0.298 | | 0.12 | 0.140 | 0 |
| | Gall | 0.237 | | 0.32 | 0.450 | 0 |
| | Density | 0.238 | | 5.00 | 0.024 | 0 |
| | Rootnum | 0.268 | | 5.20 | 0.024 | 0 |
| | CWAC | 0.475 | | 0.15 | 0.006 | 0 |
| | Rootnum_bin | | | 0.25 | 0.005 | 0 |
| QTL-MAS2012 | T1 | 0.707 | | 0.40 | 0.280 | 5 |
| | T2 | 0.717 | | 0.20 | 0.300 | 5 |
| | T3 | 0.761 | | 0.20 | 0.600 | 5 |
| GSA dataset | PolyUnres | 0.453 | | 5.00 | 0.010 | 2 |
| | GammaUnres | 0.442 | | 0.12 | 0.123 | 2 |
| | PolyRes | 0.390 | | 6.00 | 0.010 | 2 |
| | GammaRes | 0.410 | | 0.17 | 0.175 | 2 |
BLUP|GA, best linear unbiased prediction-given genetic architecture; GBLUP, genomic best linear unbiased prediction; GSA, Genetics Society of America; SNP, single-nucleotide polymorphism.
Percentage of top SNPs.
Overall weight ω for the genetic architecture part when building the T matrix.
Number of selected flanking SNPs near each top SNP (nflank); nflank was set to 0 for the loblolly pine dataset rather than chosen in a validation procedure.
Scenario with the highest accuracy is shown in bold face.
Accuracy for different genomic selection models in the three validation datasets
| Dataset | Trait | GBLUP | BLUP\|GA | BayesA | BayesB | BayesC | RRBLUP |
|---|---|---|---|---|---|---|---|
| Loblolly pine | Rustbin | 0.298 | 0.34 | – | 0.34 | 0.29 | |
| Gall | 0.237 | 0.28 | – | 0.29 | 0.23 | ||
| Density | 0.238 | 0.23 | – | 0.22 | 0.20 | ||
| Rootnum | 0.268 | 0.25 | – | 0.24 | 0.24 | ||
| CWAC | 0.475 | 0.47 | – | 0.47 | |||
| Rootnum_bin | 0.27 | 0.28 | 0.28 | ||||
| QTL-MAS2012 | T1 | 0.732 | 0.794 | 0.794 | – | 0.707 | |
| T2 | 0.771 | 0.834 | 0.834 | – | 0.746 | ||
| T3 | 0.758 | 0.828 | 0.828 | – | 0.723 | ||
| GSA dataset | PolyUnres | 0.453 | 0.453 | 0.451 | 0.452 | 0.453 | |
| GammaUnres | 0.442 | 0.539 | 0.544 | 0.542 | 0.447 | ||
| PolyRes | 0.390 | 0.388 | 0.383 | 0.390 | 0.390 | ||
| GammaRes | 0.410 | 0.491 | 0.495 | 0.504 | 0.413 |
GBLUP, genomic best linear unbiased prediction; BLUP|GA, best linear unbiased prediction-given genetic architecture.
Accuracies of BLUP|GA were calculated in the application stage for the QTL-MAS2012 dataset and in the training stage for the pine and GSA datasets.
BayesA, BayesB, BayesC, and RRBLUP results were obtained from Table S1 in Daetwyler .
Scenario with the highest accuracy is shown in bold face.
Figure 3. Heat map of the best linear unbiased prediction-given genetic architecture (BLUP|GA) accuracy for trait T3 in the validation population of the QTL-MAS2012 dataset. Each cell of the heat map shows the BLUP|GA accuracy (×100) obtained with the assigned weight (vertical axis) and top% (horizontal axis). The red area marks scenarios in which BLUP|GA performs worse than genomic best linear unbiased prediction (0.758); the green area marks scenarios in which BLUP|GA performs better than BayesB (0.828). The optimal parameter combination obtained from the reference population by cross-validation is marked with a black box.
Figure 4. Cumulative proportion of genetic variance explained by single-nucleotide polymorphisms (SNPs). The top 1% (A), 10% (B), and 100% (C) of SNPs were sorted by the size of their estimated effects in decreasing order. Results for fat percentage, milk yield, and somatic cell score are plotted with blue solid lines, green dashed lines, and red dotted lines, respectively. The marker weights for genomic best linear unbiased prediction are shown by black solid lines.
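The curves described in this caption can be approximated by sorting SNPs by effect size and accumulating their share of the total. In the sketch below, each SNP's contribution is taken to be proportional to its squared estimated effect; this ignores allele frequencies and is an assumption, as is the function name.

```python
import numpy as np

def cumulative_variance_share(effects, top_fraction=1.0):
    """Cumulative share of (assumed) genetic variance for the largest SNPs.

    Contributions are squared effects sorted in decreasing order; the
    returned curve covers the top `top_fraction` of SNPs (1.0 = all SNPs).
    """
    contrib = np.sort(np.asarray(effects, dtype=float) ** 2)[::-1]
    cum = np.cumsum(contrib) / contrib.sum()
    k = max(1, int(round(top_fraction * contrib.size)))
    return cum[:k]

# Toy example: one large effect among four SNPs.
curve_all = cumulative_variance_share([3.0, 1.0, 1.0, 1.0])
curve_half = cumulative_variance_share([3.0, 1.0, 1.0, 1.0], top_fraction=0.5)

# Equal marker weights, as in GBLUP, give a straight reference line.
gblup_line = np.arange(1, 5) / 4.0
```

A curve that rises far above the equal-weight line indicates that a few SNPs account for a large share of the variance, which is the situation in which a trait-specific matrix differs most from G.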
Figure 5. Heat maps of the realized relationship matrix (G) and three trait-specific relationship matrices (S) in the dairy cattle dataset. The G matrix was built with all markers (A), and the S matrices were built with the top 1% of SNPs for fat percentage (B), milk yield (C), and somatic cell score (D), respectively. These matrices were calculated from the genotypes of 1000 randomly selected bulls, sorted by their genotype at the SNP with the largest marker effect for each trait.
Figure 6. Regression of the absolute increase in accuracy of best linear unbiased prediction-given genetic architecture (BLUP|GA) over genomic best linear unbiased prediction (∆) on the distance between the T and G matrices (σ).