Bayesian Additive Regression Trees using Bayesian Model Averaging.

Belinda Hernández, Adrian E Raftery, Stephen R Pennington, Andrew C Parnell.

Abstract

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n large p" scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to distinguish aggressive from non-aggressive prostate cancer. We compare BART-BMA with its main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
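
To make the model-averaging idea concrete, the following is a minimal sketch of generic Bayesian Model Averaging, not the BART-BMA implementation itself. It assumes that each candidate model is scored by a BIC value, that all models have equal prior probability, and that posterior model probabilities are approximated as proportional to exp(-BIC/2); the abstract does not state how BART-BMA scores its models, so these choices are illustrative assumptions. The function names and all numbers are hypothetical.

```python
import math


def bma_weights(bic_values):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model and the common
    approximation p(M_k | data) proportional to exp(-BIC_k / 2).
    """
    best = min(bic_values)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    unnormalised = [math.exp(-0.5 * (b - best)) for b in bic_values]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bma_predict(predictions, weights):
    """Return the weighted average of per-model predictions per observation."""
    n_obs = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_obs)
    ]


# Hypothetical BIC values and predictions from three candidate models.
bics = [100.0, 102.0, 110.0]
preds = [[1.0, 2.0], [1.5, 2.5], [3.0, 4.0]]

weights = bma_weights(bics)
averaged = bma_predict(preds, weights)
```

Models with a lower BIC receive a larger weight, so the averaged prediction is dominated by the better-supported models while still reflecting uncertainty about which model is correct.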

Year: 2017; PMID: 30449953; PMCID: PMC6238959; DOI: 10.1007/s11222-017-9767-1

Source DB: PubMed; Journal: Stat Comput; ISSN: 0960-3174; Impact factor: 2.559


