Identifying representative trees from ensembles.

Mousumi Banerjee, Ying Ding, Anne-Michelle Noone.

Abstract

Tree-based methods have become popular for analyzing complex data structures where the primary goal is risk stratification of patients. Ensemble techniques improve prediction accuracy and address the instability of a single tree by growing an ensemble of trees and aggregating them. However, in the process, individual trees get lost. In this paper, we propose a methodology for identifying the most representative trees in an ensemble on the basis of several tree distance metrics. Although our focus is on binary outcomes, the methods are applicable to censored data as well. For any two trees, the distance metrics are chosen to (1) measure similarity of the covariates used to split the trees; (2) reflect similar clustering of patients in the terminal nodes of the trees; and (3) measure similarity in predictions from the two trees. Whereas the third metric focuses on prediction, the first two focus on the architectural similarity between two trees. The most representative trees in the ensemble are chosen on the basis of the average distance between a tree and all other trees in the ensemble. An out-of-bag estimate of the error rate is obtained using neighborhoods of representative trees. Simulations and data examples show gains in predictive accuracy when averaging over such neighborhoods. We illustrate our methods using a dataset of kidney cancer treatment receipt (binary outcome) and a second dataset of breast cancer survival (censored outcome).
Copyright © 2012 John Wiley & Sons, Ltd.
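
The selection rule described in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the three distance functions below are only one plausible reading of metrics (1) to (3), not necessarily the authors' definitions. All names (`covariate_distance`, `clustering_distance`, `prediction_distance`, `most_representative`) and the toy ensemble are hypothetical. Each tree is summarized by the covariates it splits on, the terminal node assigned to each patient, and its predicted probability for each patient.

```python
from itertools import combinations


def covariate_distance(vars_a, vars_b, n_covariates):
    """Assumed form of metric (1): share of candidate covariates
    used for splitting by exactly one of the two trees."""
    return len(set(vars_a) ^ set(vars_b)) / n_covariates


def clustering_distance(leaves_a, leaves_b):
    """Assumed form of metric (2): share of patient pairs that one tree
    places in the same terminal node and the other tree separates."""
    pairs = list(combinations(range(len(leaves_a)), 2))
    disagree = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)


def prediction_distance(pred_a, pred_b):
    """Assumed form of metric (3): mean squared difference between
    the two trees' predictions for the same patients."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


def most_representative(n_trees, distance):
    """Return (index, scores), where scores[i] is the average distance from
    tree i to every other tree and index is the tree with the lowest score."""
    scores = [
        sum(distance(i, j) for j in range(n_trees) if j != i) / (n_trees - 1)
        for i in range(n_trees)
    ]
    best = min(range(n_trees), key=scores.__getitem__)
    return best, scores


# Toy ensemble of three trees evaluated on four patients (made-up values).
trees = [
    {"vars": {"age", "stage"}, "leaves": [0, 0, 1, 1], "pred": [0.2, 0.2, 0.8, 0.8]},
    {"vars": {"age", "stage"}, "leaves": [0, 0, 1, 2], "pred": [0.2, 0.2, 0.7, 0.9]},
    {"vars": {"grade"}, "leaves": [0, 1, 0, 1], "pred": [0.4, 0.6, 0.4, 0.6]},
]


def by_clustering(i, j):
    # Pairwise distance between trees i and j under the clustering metric.
    return clustering_distance(trees[i]["leaves"], trees[j]["leaves"])


best, scores = most_representative(len(trees), by_clustering)
print("most representative tree:", best)
print("average distances:", scores)
```

Swapping `by_clustering` for a wrapper around `covariate_distance` or `prediction_distance` ranks the same ensemble under the other two metrics. The out-of-bag error estimate over neighborhoods of representative trees mentioned in the abstract is not covered by this sketch.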

Year:  2012        PMID: 22302520     DOI: 10.1002/sim.4492

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373

