Joseph O Ogutu, Hans-Peter Piepho, Torben Schulz-Streeck.
Abstract
BACKGROUND: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs.
Year: 2011 PMID: 21624167 PMCID: PMC3103196 DOI: 10.1186/1753-6561-5-S3-S11
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
| CV/TBV | Sample size (mean) | Sample size (range) | Random forests (mean) | Random forests (range) | Boosting (mean) | Boosting (range) | Support vector machines (mean) | Support vector machines (range) | Ridge regression BLUP (mean) | Ridge regression BLUP (range) |
|---|---|---|---|---|---|---|---|---|---|---|
| CV | 439 | 416-514 | 0.466 | 0.392-0.534 | 0.503 | 0.431-0.567 | 0.503 | 0.432-0.567 | 0.530 | 0.451-0.620 |
| TBV | 900 | | 0.483 | | 0.547 | | 0.497 | | 0.607 | |
Predictive accuracies of random forests, boosted regression trees, epsilon support vector machines and RR-BLUP, expressed as the Pearson correlation between GEBVs and observed values from the 5-fold cross-validation (CV) and between GEBVs and TBV for non-phenotyped individuals (TBV).
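The accuracy measure in the table caption can be illustrated with a minimal sketch: split the phenotyped individuals into 5 folds, predict each held-out fold from the other four, and report the mean and range of the Pearson correlation between predictions and observed values. Everything named here is an assumption for illustration: `marginal_effects_predict` is a hypothetical, deliberately simple stand-in predictor (shrunken single-marker regressions), not any of the four methods compared in the study, and the genotypes and phenotypes are simulated toy data, not the study's data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def marginal_effects_predict(X_train, y_train, X_test, shrink=1.0):
    """Hypothetical stand-in predictor: sum of shrunken single-marker slopes."""
    n, m = len(y_train), len(X_train[0])
    y_mean = sum(y_train) / n
    effects, means = [], []
    for j in range(m):
        col = [row[j] for row in X_train]
        c_mean = sum(col) / n
        sxy = sum((c - c_mean) * (v - y_mean) for c, v in zip(col, y_train))
        sxx = sum((c - c_mean) ** 2 for c in col)
        effects.append(sxy / (sxx + shrink))  # shrink keeps the slope finite
        means.append(c_mean)
    return [
        y_mean + sum(e * (x - mu) for e, x, mu in zip(effects, row, means))
        for row in X_test
    ]


def cv_accuracy(X, y, fit_predict, k=5, seed=0):
    """Return (mean, min, max) of the per-fold Pearson correlations."""
    accs = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        accs.append(pearson(preds, [y[i] for i in test]))
    return sum(accs) / k, min(accs), max(accs)


# Toy data (assumption): 100 individuals, 20 markers coded 0/1/2,
# of which the first three have an additive effect of 1.0.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(100)]
y = [sum(row[:3]) + rng.gauss(0.0, 1.0) for row in X]

mean_acc, lowest, highest = cv_accuracy(X, y, marginal_effects_predict)
print(f"CV accuracy: mean {mean_acc:.3f}, range {lowest:.3f}-{highest:.3f}")
```

Any other predictor with the same `(X_train, y_train, X_test)` signature can be passed to `cv_accuracy`, which is how several methods could be compared on identical folds.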
Figure 1. Importance ranking of the 10031 SNP markers by random forest using percent increase in mean squared error. Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.
Figure 2. Importance ranking of the 10031 SNP markers by random forest using tree node impurity. Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.
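The percent-increase-in-MSE importance named in the Figure 1 caption rests on a permutation idea: shuffle one marker's genotypes, re-predict, and measure how much the prediction error grows. The sketch below is a simplified, model-agnostic version under stated assumptions. Random forest implementations typically compute this per tree on out-of-bag samples, which is not done here, and the `predict` function and toy data are hypothetical placeholders rather than anything from the study.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Percent increase in MSE after shuffling each marker column in turn."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between marker j and the phenotype
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        permuted = mse(y, [predict(row) for row in X_perm])
        scores.append(100.0 * (permuted - base) / base)
    return scores


# Toy data (assumption): 50 individuals, 5 markers coded 0/1/2;
# only marker 0 affects the phenotype.
rng = random.Random(2)
X = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(50)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]


def predict(row):
    """Hypothetical fitted model that uses marker 0 only."""
    return 2.0 * row[0]


scores = permutation_importance(predict, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("Markers ranked by importance:", ranking)
```

Markers the model never uses score exactly zero in this sketch, because permuting them leaves every prediction unchanged. Ranking markers by such scores is the kind of pre-screening the abstract refers to.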