Abstract
BACKGROUND: The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is high-dimensional data from many thousands of SNPs scored on several thousands of individuals. A large number of methods have been developed for GWP, which are mostly parametric methods that assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on the sum of nonparametric regression trees with the priors being used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which will automatically model nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods.
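The claim that binary partitioning captures dominance can be illustrated with a toy sketch. It assumes genotypes coded 0/1/2, a simulated phenotype with complete dominance, and a single fixed two-split partition (at 0.5 and 1.5) standing in for one regression tree. The data, effect sizes and helper name `fit_tree_partition` are hypothetical and are not taken from the study. The comparison is in-sample only and is not BART itself.

```python
import numpy as np

def fit_tree_partition(x, y, thresholds=(0.5, 1.5)):
    """Piecewise-constant fit: predict the mean of y within each cell
    obtained by cutting x at the given thresholds (two binary splits)."""
    cells = np.digitize(x, thresholds)  # genotype 0 -> cell 0, 1 -> 1, 2 -> 2
    means = np.array([y[cells == c].mean() for c in range(len(thresholds) + 1)])
    return means[cells]

rng = np.random.default_rng(0)
genotype = rng.integers(0, 3, size=300)      # hypothetical SNP coded 0/1/2
effect = np.array([0.0, 2.0, 2.0])           # assumed complete dominance
y = effect[genotype] + rng.normal(0.0, 0.5, size=300)

# Additive (linear) model: intercept + slope on allele count.
X = np.column_stack([np.ones_like(genotype, dtype=float), genotype])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_linear = float(np.mean((y - X @ beta) ** 2))

# Two-split partition: one fitted mean per genotype class.
mse_tree = float(np.mean((y - fit_tree_partition(genotype, y)) ** 2))

print(f"in-sample MSE, additive model: {mse_linear:.3f}")
print(f"in-sample MSE, two-split tree: {mse_tree:.3f}")
```

Because the three cell means can reproduce any function of a single genotype, the partition fit is never worse in-sample than the additive line, and it is strictly better when the heterozygote deviates from the midpoint of the homozygotes.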
Year: 2016 PMID: 27286957 PMCID: PMC4901500 DOI: 10.1186/s12711-016-0219-8
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Mean squared prediction error (MSPE) for the LASSO, Bayesian LASSO (BLASSO), genomic BLUP (GBLUP), reproducing kernel Hilbert space (RKHS) regression, random forests (RF) and Bayesian additive regression trees (BART) methods evaluated on the simulated original QTLMAS2010 data
| Method | Mean squared prediction error (MSPE) | | | | | | |
|---|---|---|---|---|---|---|---|
| LASSO | | | | | | | |
| | 63.404 | | | | | | |
| BLASSO | | | | | | | |
| GBLUP | | | | | | | |
| RKHS | 66.910 | | | | | | |
| | | | | | | | |
| | 67.200 | | | | | | |
| RF | 82.108 | 79.772 | 77.794 | 77.274 | 77.149 | | 76.419 |
| BART | | | | | | | |
| | 76.231 | 69.974 | 65.703 | 64.967 | 64.324 | 64.213 | 64.574 |
| | 71.325 | 68.537 | 66.755 | 63.772 | 62.782 | 62.919 | 63.476 |
| | 79.264 | 66.554 | 66.376 | 63.596 | | 63.119 | 63.790 |
| | 72.344 | 70.608 | 65.467 | 62.705 | 62.715 | 63.997 | 64.982 |
| | | | | | | | |
| | 78.656 | 76.734 | 68.282 | 64.126 | 64.218 | 63.697 | 64.566 |
| | 74.893 | 68.379 | 64.858 | 63.762 | 62.884 | 63.108 | 63.402 |
| | 74.128 | 66.817 | 64.788 | 63.836 | 62.596 | 63.175 | 63.807 |
| | 76.757 | 66.284 | 64.512 | 62.648 | 62.823 | 63.912 | 64.976 |
The lowest MSPE obtained with each method is highlighted in italics. M is the number of trees for RF and BART, and and are hyperparameters of the BART priors. The stopping criteria for the regularization coefficient λ in LASSO were obtained based on tenfold cross-validation both at minimum MSE and minimum MSE plus 1 standard error [42]
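The two LASSO stopping criteria named in the note can be sketched as follows, assuming they correspond to the commonly used "minimum" and "one-standard-error" choices of λ from K-fold cross-validation. The function name `select_lambdas` and the toy error values are hypothetical and are not results from the study.

```python
import numpy as np

def select_lambdas(lambdas, fold_mse):
    """Pick lambda at minimum CV error and by the one-standard-error rule.

    lambdas  : shape (L,)  candidate penalties
    fold_mse : shape (L, K) test MSE for each penalty in each CV fold
    """
    lambdas = np.asarray(lambdas, dtype=float)
    fold_mse = np.asarray(fold_mse, dtype=float)
    mean_mse = fold_mse.mean(axis=1)
    se_mse = fold_mse.std(axis=1, ddof=1) / np.sqrt(fold_mse.shape[1])

    i_min = int(np.argmin(mean_mse))
    threshold = mean_mse[i_min] + se_mse[i_min]
    # Strongest penalty whose mean CV error is within one SE of the minimum.
    eligible = np.flatnonzero(mean_mse <= threshold)
    i_1se = int(eligible[np.argmax(lambdas[eligible])])
    return float(lambdas[i_min]), float(lambdas[i_1se])

# Toy numbers for illustration only (3 folds instead of 10).
lambdas = [0.01, 0.1, 1.0, 10.0]
fold_mse = [
    [70.0, 72.0, 68.0],
    [64.0, 66.0, 62.0],
    [65.0, 66.0, 64.0],
    [80.0, 82.0, 78.0],
]
lam_min, lam_1se = select_lambdas(lambdas, fold_mse)
print(lam_min, lam_1se)
```

The one-standard-error choice trades a small increase in cross-validated error for a sparser model, which may explain why the table reports a separate MSPE for each criterion.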
Mean squared prediction error (MSPE) for the LASSO, Bayesian LASSO (BLASSO), genomic BLUP (GBLUP), reproducing kernel Hilbert space (RKHS) regression, random forests (RF) and Bayesian additive regression trees (BART) methods evaluated on the simulated QTLMAS2010 data when dominance and epistatic effects were added
| Method | Mean squared prediction error (MSPE) | | | | | | |
|---|---|---|---|---|---|---|---|
| LASSO | | | | | | | |
| | 84.832 | | | | | | |
| BLASSO | | | | | | | |
| GBLUP | | | | | | | |
| RKHS | 92.361 | | | | | | |
| | | | | | | | |
| | 91.906 | | | | | | |
| RF | 107.908 | 105.123 | 100.784 | 101.992 | 100.327 | 100.900 | |
| BART | | | | | | | |
| | 80.717 | 76.892 | 70.845 | 65.294 | 65.196 | 66.283 | 66.906 |
| | 79.277 | 72.720 | 67.061 | 65.120 | 64.943 | 65.542 | 66.593 |
| | 87.030 | 71.401 | 65.635 | | 65.149 | 66.483 | 68.050 |
| | 79.249 | 71.243 | 67.748 | 64.741 | 65.611 | 68.290 | 70.510 |
| | | | | | | | |
| | 86.328 | 70.452 | 67.744 | 65.465 | 65.308 | 65.801 | 66.998 |
| | 76.438 | 69.833 | 67.123 | 65.522 | 65.045 | 65.513 | 66.601 |
| | 86.653 | 74.651 | 67.164 | 67.220 | 65.074 | 66.544 | 68.163 |
| | 90.456 | 69.571 | 65.085 | 66.086 | 65.790 | 68.298 | 70.566 |
The lowest MSPE obtained with each method is highlighted in italics. h is the bandwidth of the radial basis function kernel. M is the number of trees for RF and BART, and and are hyperparameters of the BART priors. The stopping criteria for the regularization coefficient λ in LASSO were obtained based on tenfold cross-validation both at minimum MSE and minimum MSE plus 1 standard error [42]
Fig. 1 Manhattan plots of the penalized regression coefficients from the LASSO, BLASSO, GBLUP and RKHS methods, VIMP (percent decrease in MSE) for RF, VIP (average number of node splits per iteration) for BART from the analyses of the original QTLMAS2010 dataset. Dotted lines delimit chromosomes; the major additive genetic effects on chromosome 3 are indicated by magenta circles and the epistatic loci on chromosome 1 by blue diamonds
Fig. 2 Manhattan plots of the penalized regression coefficients from the LASSO, BLASSO, GBLUP and RKHS methods, VIMP (percent decrease in MSE) for RF, VIP (average number of node splits per iteration) for BART from the analyses of the QTLMAS2010 dataset with non-additive genetic effects added on chromosome 5. Dotted lines delimit chromosomes. The dominant locus is indicated by a red square, the over-dominant locus by a green upper triangle, the under-dominant locus by a cyan lower triangle, and the epistatic loci by blue diamonds
Mean squared prediction error (MSPE) for the LASSO, Bayesian LASSO (BLASSO), genomic BLUP (GBLUP), reproducing kernel Hilbert space (RKHS) regression, random forests (RF) and Bayesian additive regression trees (BART) methods evaluated on the pig PorcineSNP60 chip genotype data with one phenotype
| Method | Mean squared prediction error (MSPE) | | | | | |
|---|---|---|---|---|---|---|
| LASSO | | | | | | |
| | 0.861 | | | | | |
| BLASSO | | | | | | |
| GBLUP | | | | | | |
| RKHS | 0.821 | | | | | |
| | | | | | | |
| | 0.820 | | | | | |
| RF | 0.819 | 0.820 | 0.815 | 0.817 | | 0.813 |
| BART | | | | | | |
| | 0.822 | 0.820 | 0.821 | – | – | – |
| | 0.819 | 0.814 | 0.815 | – | – | – |
| | 0.814 | | 0.812 | – | – | – |
| | 0.815 | 0.813 | 0.814 | – | – | – |
| | | | | | | |
| | 0.826 | 0.820 | 0.821 | – | – | – |
| | 0.823 | 0.814 | 0.814 | – | – | – |
| | 0.815 | 0.812 | 0.812 | – | – | – |
| | 0.814 | 0.814 | 0.814 | – | – | – |
The estimates are the mean over five random cross-validation-folds with 70 % training and 30 % test partitions. The lowest MSPE obtained with each method is highlighted in italics. h is the bandwidth of the radial basis function kernel. M is the number of trees for RF and BART, and and are hyperparameters of the BART priors. The stopping criteria for the regularization coefficient λ in LASSO were obtained based on tenfold cross-validation both at minimum MSE and minimum MSE plus 1 standard error [42]
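One reading of the evaluation scheme in this note is repeated random hold-out: five independent 70 %/30 % splits, with the MSPE averaged over the five test sets. The sketch below follows that reading. The ridge predictor is only a placeholder model, and the simulated genotypes, function names and penalty value are hypothetical rather than taken from the study.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def repeated_holdout_mspe(X, y, fit_predict, n_repeats=5, train_frac=0.7, seed=0):
    """Average test MSPE over random train/test partitions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    scores = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        y_hat = fit_predict(X[train], y[train], X[test])
        scores.append(mspe(y[test], y_hat))
    return float(np.mean(scores)), scores

def ridge_fit_predict(X_train, y_train, X_test, penalty=1.0):
    """Placeholder predictor (ridge regression); any method could be swapped in."""
    mu = y_train.mean()
    p = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + penalty * np.eye(p),
                           X_train.T @ (y_train - mu))
    return mu + X_test @ beta

# Simulated stand-in data: 100 individuals, 20 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 20)).astype(float)
y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(0.0, 1.0, size=100)

mean_score, scores = repeated_holdout_mspe(X, y, ridge_fit_predict)
print(f"mean MSPE over {len(scores)} splits: {mean_score:.3f}")
```

Passing the model as a `fit_predict` callable keeps the splitting logic identical across methods, so differences in the averaged MSPE reflect the methods rather than the partitions.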
Fig. 3 Manhattan plots of the penalized regression coefficients from the LASSO, BLASSO, GBLUP and RKHS methods, VIMP (percent decrease in MSE) for RF, VIP (average number of node splits per iteration) for BART from the analyses of the Cleveland pig dataset. The five most important SNPs in the BART analysis are highlighted in different colors for comparison with the other plots
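The BART importance measure described in the captions (VIP, the average number of node splits per iteration) and the selection of the five most important SNPs could be computed along these lines. The sketch assumes the sampler's output has already been summarized as a matrix of split counts per saved iteration and SNP. That matrix, the function names and the toy counts are hypothetical.

```python
import numpy as np

def variable_inclusion(split_counts):
    """split_counts[i, j] = number of splitting rules that use SNP j
    across all trees at saved MCMC iteration i. Returns the per-SNP average."""
    return np.asarray(split_counts, dtype=float).mean(axis=0)

def top_snps(vip, k=5):
    """Indices of the k SNPs with the largest VIP (ties keep original order)."""
    return np.argsort(-vip, kind="stable")[:k]

# Toy counts: 3 saved iterations x 7 SNPs, for illustration only.
split_counts = np.array([
    [0, 3, 1, 0, 2, 0, 1],
    [1, 2, 0, 0, 3, 0, 1],
    [0, 4, 1, 0, 2, 1, 0],
])
vip = variable_inclusion(split_counts)
top = top_snps(vip, k=5)
print(vip.round(3), top)
```

Plotting `vip` against marker position would give a Manhattan-style display comparable to the ones described in the figure captions.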