Vitara Pungpapong, William M Muir, Xianran Li, Dabao Zhang, Min Zhang.
Abstract
Recent advances in high-throughput genotyping have motivated genomic selection using high-density markers. However, an increasingly large number of markers raises both statistical and computational issues and makes breeding values difficult to estimate. We propose applying the penalized orthogonal-components regression (POCRE) method to estimate breeding values. As a supervised dimension reduction method, POCRE sequentially constructs linear combinations of markers, i.e. orthogonal components, such that these components are most closely correlated with the phenotype. Such dimension reduction groups highly correlated predictors and accommodates collinear or nearly collinear markers. Unlike BayesB, which predetermines hyperparameters, POCRE uses an empirical Bayes thresholding method to obtain data-driven optimal hyperparameters and to select important markers effectively when constructing each component. In simulation studies, POCRE greatly reduces computing time compared with BayesB. Moreover, unlike fBayesB, which slightly sacrifices prediction accuracy for fast computation, POCRE predicts breeding values with accuracy similar to or even better than that of BayesB in both simulation studies and real data analyses.
Keywords: GenPred; Shared data resources; genomic selection; genotypic estimate of breeding values (GEBV); penalized orthogonal-components regression (POCRE); phenotypic estimate of breeding values (PEBV)
Year: 2012 PMID: 23050228 PMCID: PMC3464110 DOI: 10.1534/g3.112.003822
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
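The abstract describes POCRE as sequentially building sparse, mutually orthogonal linear combinations of markers that track the phenotype. The sketch below illustrates that general idea only. It is not the POCRE algorithm: it uses a sparse partial-least-squares-style deflation, and a fixed soft threshold `lam` stands in for the empirical Bayes thresholding. The function names, the toy data, and the value of `lam` are all hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def column(M, j):
    """Column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def soft_threshold(v, lam):
    """Shrink every entry toward zero by lam; small entries become zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


def supervised_components(X, y, n_components, lam):
    """Build sparse, mutually orthogonal components from centered X and y.

    Assumption: fixed threshold lam instead of empirical Bayes thresholding.
    """
    n, p = len(X), len(X[0])
    R = [row[:] for row in X]  # residual marker matrix, deflated each round
    components, loadings = [], []
    for _ in range(n_components):
        # Score each marker by its covariance with the phenotype, then sparsify.
        w = soft_threshold([dot(column(R, j), y) / n for j in range(p)], lam)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # no marker survives the threshold
        w = [x / norm for x in w]
        t = [dot(row, w) for row in R]  # new component (one score per sample)
        tt = dot(t, t)
        if tt == 0.0:
            break
        # Remove the part of every marker explained by t, so that later
        # components are orthogonal to this one.
        proj = [dot(t, column(R, j)) / tt for j in range(p)]
        R = [[R[i][j] - t[i] * proj[j] for j in range(p)] for i in range(n)]
        components.append(t)
        loadings.append(w)
    return components, loadings


# Hypothetical toy data: 40 individuals, 6 markers, two of them causal.
rng = random.Random(0)
n, p = 40, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * row[0] - row[1] + rng.gauss(0, 0.5) for row in X]

# Center markers and phenotype before extracting components.
col_means = [sum(column(X, j)) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

components, loadings = supervised_components(X, y, n_components=3, lam=0.05)
```

Because each round deflates the residual marker matrix by the component just extracted, every later component is orthogonal to the earlier ones regardless of which markers the threshold keeps.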
Prediction accuracy of BayesB and POCRE
| Method | Generation 1001/1002 | Generation 1003 | Generation 1004 | Generation 1005 | Generation 1006 | Generation 1007 |
|---|---|---|---|---|---|---|
| BayesB | 0.7038 | 0.5008 | 0.4636 | 0.3716 | 0.3670 | 0.3096 |
| (0.0399) | (0.0460) | (0.0423) | (0.0468) | (0.0426) | (0.0428) | |
| POCRE(CV0) | 0.4503 | 0.3651 | 0.3490 | 0.2983 | ||
| (0.0407) | ( | ( | ( | ( | ( | |
| POCRE(CV1) | 0.4573 | 0.3676 | 0.3563 | 0.3086 | ||
| ( | ( | ( | ( | ( | ( | |
| POCRE(CV2) | 0.4526 | 0.3673 | 0.3535 | 0.3066 | ||
| (0.0408) | ( | ( | ( | ( | ( | |
| POCRE0(CV0) | 0.4430 | 0.3654 | 0.3482 | 0.2974 | ||
| (0.0456) | ( | (0.0430) | ( | ( | ( | |
| POCRE0(CV1) | 0.4532 | 0.3715 | 0.3561 | 0.3074 | ||
| ( | ( | ( | ( | ( | ( | |
| POCRE0(CV2) | 0.4541 | 0.3493 | 0.3061 | |||
| (0.0427) | ( | ( | ( | ( | ( | |
Estimated corr(TBV, EBV) and corresponding standard deviations (in parentheses) among 50 simulated datasets. Values shown in bold indicate better performance than BayesB.
Comparison of the estimated β(TBV, EBV) using BayesB and POCRE
| Method | Generation 1001/1002 | Generation 1003 | Generation 1004 | Generation 1005 | Generation 1006 | Generation 1007 |
|---|---|---|---|---|---|---|
| BayesB | 0.99999 | 1.00015 | 0.99998 | 1.00015 | 0.99999 | 1.00010 |
| POCRE(CV0) | 0.99997 | 1.00020 | 1.00000 | 1.00020 | 0.99996 | 1.00010 |
| POCRE(CV1) | 0.99997 | 1.00020 | 0.99997 | 1.00020 | 0.99994 | 1.00020 |
| POCRE(CV2) | 0.99997 | 1.00020 | 0.99998 | 1.00020 | 0.99995 | 1.00010 |
| POCRE0(CV0) | 0.99996 | 1.00020 | 0.99992 | 1.00020 | 0.99994 | 1.00015 |
| POCRE0(CV1) | 0.99997 | 1.00010 | 0.99996 | 1.00015 | 0.99995 | 1.00010 |
| POCRE0(CV2) | 0.99997 | 1.00020 | 0.99996 | 1.00015 | 0.99994 | 1.00020 |
Total of 50 simulated datasets.
Results of POCRE and POCRE0 for analyzing pine dataset
| Method | DBH Training | DBH Test | HT Training | HT Test |
|---|---|---|---|---|
| POCRE | 0.9636 | |||
| ( | ( | ( | ( | |
| POCRE0 | 0.9661 | 0.6900 | 0.6366 | |
| (0.0029) | (0.0472) | (0.0023) | (0.0550) | |
Estimated correlation coefficients between the observed and estimated phenotypic values and the corresponding standard deviations (in parentheses) based on 10-fold random cross-validation. Values in bold indicate the better performance in each column.
Results of POCRE and POCRE0 for analyzing pine dataset
| Method | DBH Training | DBH Test | HT Training | HT Test |
|---|---|---|---|---|
| POCRE | 0.8999 | 0.7444 | 0.9181 | |
| ( | ( | ( | ( | |
| POCRE0 | 0.7349 | |||
| (0.0198) | (0.0612) | (0.0137) | (0.0444) | |
Estimated β(TBV, EBV) and corresponding standard deviations (in parentheses) based on 10-fold random cross-validation. Values in bold indicate the better performance in each column.
Results of analyzing maize flowering time data using POCRE and POCRE0
| Replicate | POCRE Training Data | POCRE Test Data | POCRE0 Training Data | POCRE0 Test Data |
|---|---|---|---|---|
| 1 | 0.9348 | 0.6763 | 0.9406 | 0.6988 |
| 2 | 0.9437 | 0.9096 | 0.9483 | 0.9140 |
| 3 | 0.9268 | 0.8992 | 0.9399 | 0.9072 |
| 4 | 0.9418 | 0.9060 | 0.9382 | 0.9066 |
| 5 | 0.9433 | 0.9076 | 0.9460 | 0.9053 |
| MEAN | 0.9381 | 0.8597 | 0.9426 | 0.8664 |
| STDEV | 0.0073 | 0.1026 | 0.0043 | 0.0937 |
Estimated correlation coefficients between PEBV and GEBV, i.e. corr(PEBV, GEBV).
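As a quick arithmetic check, the MEAN and STDEV rows for POCRE can be reproduced from the five replicate values in the table above. The sketch uses the sample standard deviation (divisor n - 1), which is an inference from the numbers rather than something stated in this record.

```python
from statistics import mean, stdev

# Replicate values for POCRE, copied from the table above.
pocre_training = [0.9348, 0.9437, 0.9268, 0.9418, 0.9433]
pocre_test = [0.6763, 0.9096, 0.8992, 0.9060, 0.9076]

# stdev() is the sample standard deviation (divisor n - 1).
training_summary = (round(mean(pocre_training), 4), round(stdev(pocre_training), 4))
test_summary = (round(mean(pocre_test), 4), round(stdev(pocre_test), 4))

print(training_summary)  # (0.9381, 0.0073)
print(test_summary)      # (0.8597, 0.1026)
```

The large test-data standard deviation is driven almost entirely by replicate 1, whose test correlation (0.6763) is far below the other four.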
Results of the estimated β(TBV, EBV) in analyzing maize flowering time data using POCRE and POCRE0
| Replicate | POCRE Training Data | POCRE Test Data | POCRE0 Training Data | POCRE0 Test Data |
|---|---|---|---|---|
| 1 | 1.0001 | 0.9783 | 1.0000 | 0.9782 |
| 2 | 1.0000 | 1.0002 | 1.0001 | 0.9999 |
| 3 | 0.9998 | 1.0002 | 1.0001 | 1.0002 |
| 4 | 0.9999 | 1.0004 | 0.9999 | 1.0004 |
| 5 | 1.0000 | 0.9995 | 1.0000 | 0.9995 |
| MEAN | 0.9999 | 0.9957 | 1.0000 | 0.9956 |
| STDEV | 0.0001 | 0.0097 | 0.0001 | 0.0097 |
Comparison of BayesB and POCRE for analyzing the maize flowering time data using 2000 SNPs
| Method | Training Data | Test Data |
|---|---|---|
| BayesB | 0.8084 (0.0094) | 0.6572 (0.3049) |
| POCRE | | |
| POCRE0 | | |
Estimated correlation coefficients between the observed and estimated phenotypic values and the corresponding standard deviations (in parentheses). Values in bold indicate better performance than BayesB.
Comparison of BayesB and POCRE for analyzing the maize flowering time data using 2000 SNPs
| Method | Training Data | Test Data |
|---|---|---|
| BayesB | 1.0016 (0.0388) | 1.0233 (0.0281) |
| POCRE | | |
| POCRE0 | | |
Estimated β(TBV, EBV) and the corresponding standard deviations (in parentheses). Values in bold indicate better performance than BayesB.