Sayed M Hosseini-Vardanjani1, Mohammad M Shariati1, Hossein Moradi Shahrebabak2, Mojtaba Tahmoorespur1.
Abstract
Genomic prediction using a large number of markers is challenging because of the curse of dimensionality and the multicollinearity arising from linkage disequilibrium between markers. Several methods have been proposed to address these problems, such as principal component analysis (PCA), which is commonly used to reduce the dimension of the predictor variables by generating orthogonal variables. Usually, the knowledge from PCA is incorporated in genomic prediction by assuming either equal variance for the PCs or a variance proportional to the eigenvalues; both approaches treat the variances as fixed. Here, three prior distributions (normal, scaled-t, and double exponential) were assumed for PC effects in a Bayesian framework with a subset of PCs. These developed PCR models (dPCRm) were compared to routine genomic prediction models (RGPM), i.e., ridge and Bayesian ridge regression, BayesA, and BayesB, and to PC regression with a subset of PCs but with PC variances predefined as proportional to the eigenvalues (PCR-Eigen). The performance of the methods was compared by simulating a single trait with a heritability of 0.25 on a genome consisting of 3,000 SNPs on three chromosomes, with QTL numbers of 15, 60, and 105. After 500 generations of random mating as the historical population, a population was isolated and mated for another 15 generations. Generations 8 and 9 of the recent population were used as the reference population and the next six generations as validation populations. The accuracy and bias of predictions were evaluated within the reference population and in each of the validation populations. The accuracies of dPCRm were similar to those of RGPM (0.536 to 0.664 vs. 0.542 to 0.671) and higher than the accuracies of PCR-Eigen (0.504 to 0.641) within the reference population across the different QTL numbers. Across the validation populations, accuracies declined from 0.633 to 0.310, 0.639 to 0.313, and 0.617 to 0.298 using dPCRm, RGPM, and PCR-Eigen, respectively. Prediction biases of dPCRm and RGPM were similar and always much smaller than the biases of PCR-Eigen. In conclusion, treating PC variances as random variables via prior specification yielded higher accuracy than PCR-Eigen and the same accuracy as RGPM, while using fewer predictors.
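To make the two fixed-variance baselines mentioned in the abstract concrete, the following is a minimal NumPy sketch of shrinkage regression on a subset of PCs, with PC effects assigned either equal prior variance or a prior variance proportional to the eigenvalues. It is not the authors' implementation, and it does not cover the Bayesian models with random variances (dPCRm), which would require sampling. The function name `pc_regression`, the `ratio` parameter (a residual-to-effect variance ratio), and the scaling of the eigenvalue weights are assumptions made for illustration.

```python
import numpy as np

def pc_regression(X, y, n_pcs, prior="equal", ratio=1.0):
    """Shrinkage regression of y on the leading n_pcs principal components.

    prior="equal": all PC effects share one prior variance.
    prior="eigen": prior variance of each PC effect is proportional
                   to its eigenvalue.
    ratio: assumed residual-to-effect variance ratio (hypothetical
           tuning value, not taken from the paper).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                       # centre the marker matrix
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_pcs
    T = U[:, :k] * s[:k]                  # PC scores; T'T = diag(s^2)
    eig = s[:k] ** 2 / (X.shape[0] - 1)   # eigenvalues of the covariance matrix

    if prior == "equal":
        rel_var = np.ones(k)
    elif prior == "eigen":
        rel_var = eig / eig.mean()        # assumed scaling: mean weight of 1
    else:
        raise ValueError("prior must be 'equal' or 'eigen'")

    penalty = ratio / rel_var             # larger prior variance -> less shrinkage
    # Because the PC scores are orthogonal, the solution is element-wise.
    beta = (T.T @ (y - y_mean)) / (s[:k] ** 2 + penalty)

    def predict(X_new):
        return y_mean + (X_new - x_mean) @ Vt[:k].T @ beta

    return beta, predict

# Usage on toy genotypes coded 0/1/2 (simulated here, not the paper's data)
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 40)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(size=100)
beta, predict = pc_regression(X, y, n_pcs=10, prior="eigen", ratio=5.0)
y_hat = predict(X)
```

Under the eigenvalue-proportional prior, PCs with small eigenvalues receive a larger penalty and are shrunk more strongly; replacing the fixed `penalty` with quantities estimated from the data is the step that the prior-based models in the abstract address.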
Keywords: accuracy; genomic selection; principal component analysis; statistical models; variable selection
Year: 2018 PMID: 30116258 PMCID: PMC6082966 DOI: 10.3389/fgene.2018.00289
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Average number of SNPs and PCs after quality control, over 10 replicates.
| SNP | 2868.4 ± 11.64 | 2871.1 ± 19.81 | 2865.8 ± 26 |
| PC | 1583.7 ± 7.55 | 1595.7 ± 15.56 | 1587 ± 21.97 |
Figure 1. The proportion of variance (%) accounted for by each PC (Top), and the cumulative variance of successive PCs (Bottom) for replicate 1 of the scenario with 105 QTL.
Pearson correlations between predicted genomic breeding values and true breeding values for different methods with five-fold cross validation in training populations.
| 105 | 0.658 ± 0.007 | 0.671 ± 0.008 | 0.664 ± 0.009 | 0.667 ± 0.009 | 0.664 ± 0.008 | 0.662 ± 0.009 | 0.653 ± 0.009 | 0.641 ± 0.009 |
| 60 | 0.643 ± 0.01 | 0.654 ± 0.01 | 0.648 ± 0.01 | 0.653 ± 0.01 | 0.651 ± 0.01 | 0.646 ± 0.01 | 0.643 ± 0.01 | 0.625 ± 0.01 |
| 15 | 0.542 ± 0.02 | 0.553 ± 0.02 | 0.556 ± 0.02 | 0.565 ± 0.02 | 0.551 ± 0.02 | 0.548 ± 0.02 | 0.536 ± 0.02 | 0.504 ± 0.02 |
Ridge-R, Ridge regression-BLUP; Bayes-Ridge, Bayesian Ridge regression; PCR-Normal, Bayesian principal component regression with normal distribution of effects; PCR-t, Bayesian principal component regression with scaled t distribution of effects; PCR-Lasso, Bayesian principal component regression with double exponential distribution of effects; PCR-Eigen, Principal component regression-BLUP with eigenvalues as prior variance of effects.
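As an illustration of how an accuracy of this kind can be computed, the sketch below runs k-fold cross-validation with a plain ridge regression and reports the Pearson correlation between predictions and true breeding values in each held-out fold. It is a hedged example, not the procedure used to produce the table: the function names, the penalty `lam`, the random fold assignment, and the choice to summarise folds by mean and standard deviation are all assumptions.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Closed-form ridge regression on centred markers (illustrative)."""
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu))
    return mu + (X_test - x_mean) @ beta

def cv_accuracy(X, y, tbv, n_folds=5, lam=1.0, seed=0):
    """Mean and SD over folds of cor(predicted value, true breeding value)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accuracies.append(np.corrcoef(pred, tbv[test_idx])[0, 1])
    return np.mean(accuracies), np.std(accuracies)
```

True breeding values are only available in simulation, which is why the correlation here is taken against `tbv` rather than against the phenotypes `y`.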
Intercept and regression coefficient of true breeding value on predicted genomic breeding value and coefficient of determination for different estimation methods for 5-fold cross validation in training population.
| 105 | b0 | 0.88 ± 0.03 | 0.86 ± 0.04 | 0.88 ± 0.03 | 0.86 ± 0.03 | 0.83 ± 0.04 | 0.85 ± 0.04 | 0.85 ± 0.04 | 0.66 ± 0.04 |
| | b1 | 1.22 ± 0.04 | 1.002 ± 0.04 | 1.05 ± 0.04 | 1.05 ± 0.05 | 1.004 ± 0.04 | 1.09 ± 0.07 | 1.2 ± 0.08 | 0.76 ± 0.03 |
| | R2 | 0.48 ± 0.02 | 0.49 ± 0.02 | 0.50 ± 0.02 | 0.50 ± 0.03 | 0.49 ± 0.02 | 0.49 ± 0.03 | 0.48 ± 0.03 | 0.46 ± 0.02 |
| 60 | b0 | 0.85 ± 0.05 | 0.84 ± 0.05 | 0.83 ± 0.06 | 0.84 ± 0.05 | 0.78 ± 0.05 | 0.79 ± 0.05 | 0.81 ± 0.05 | 0.63 ± 0.05 |
| | b1 | 1.141 ± 0.05 | 0.964 ± 0.03 | 1.007 ± 0.04 | 1.09 ± 0.05 | 0.962 ± 0.03 | 1.05 ± 0.05 | 1.09 ± 0.05 | 0.706 ± 0.03 |
| | R2 | 0.46 ± 0.03 | 0.47 ± 0.03 | 0.48 ± 0.03 | 0.48 ± 0.04 | 0.47 ± 0.03 | 0.48 ± 0.04 | 0.46 ± 0.03 | 0.43 ± 0.03 |
| 15 | b0 | 0.94 ± 0.08 | 0.92 ± 0.08 | 0.92 ± 0.08 | 0.90 ± 0.07 | 0.87 ± 0.08 | 0.89 ± 0.08 | 0.90 ± 0.08 | 0.79 ± 0.1 |
| | b1 | 0.97 ± 0.1 | 0.85 ± 0.07 | 1.02 ± 0.1 | 0.93 ± 0.06 | 0.85 ± 0.07 | 1.007 ± 0.09 | 1.08 ± 0.1 | 0.58 ± 0.07 |
| | R2 | 0.35 ± 0.04 | 0.36 ± 0.05 | 0.36 ± 0.05 | 0.38 ± 0.05 | 0.36 ± 0.05 | 0.36 ± 0.05 | 0.35 ± 0.05 | 0.30 ± 0.05 |
b0, intercept; b1, regression coefficient; R2, coefficient of determination.
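The three statistics in this table come from a simple linear regression of true breeding values on predicted genomic breeding values. A minimal sketch of that calculation is given below; the function name `regress_true_on_predicted` is hypothetical, and the code illustrates the standard least-squares formulas rather than the authors' own scripts.

```python
import numpy as np

def regress_true_on_predicted(tbv, gebv):
    """Regress true breeding values (tbv) on predicted values (gebv).

    Returns (b0, b1, r2): intercept, slope and coefficient of
    determination of the simple linear regression tbv ~ gebv.
    """
    tbv = np.asarray(tbv, dtype=float)
    gebv = np.asarray(gebv, dtype=float)
    b1 = np.cov(gebv, tbv, ddof=1)[0, 1] / np.var(gebv, ddof=1)
    b0 = tbv.mean() - b1 * gebv.mean()
    r2 = np.corrcoef(gebv, tbv)[0, 1] ** 2
    return b0, b1, r2
```

A slope b1 close to 1 indicates predictions on the same scale as the true values; b1 below 1 means the predictions are over-dispersed (inflated), and b1 above 1 means they are under-dispersed.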
Figure 2. Persistency of accuracy across validation generations measured as correlation between true and estimated genomic breeding values with different estimation methods. (Top): 105 QTL; (Middle): 60 QTL; (Bottom): 15 QTL.
Figure 3. Regression coefficient of true breeding values on estimated genomic breeding values for different generations of selection candidates. (Top): 105 QTL; (Middle): 60 QTL; (Bottom): 15 QTL.