Search for the smallest random forest.

Heping Zhang, Minghui Wang.

Abstract

Random forests have emerged as one of the most commonly used nonparametric statistical methods in many scientific areas, particularly in the analysis of high-throughput genomic data. A general practice in using random forests is to generate a sufficiently large number of trees, although how large is sufficient remains subjective. Furthermore, random forests are viewed as "black boxes" because of their sheer size. In this work, we address a fundamental issue in the use of random forests: how large does a random forest have to be? To this end, we propose a specific method to find a sub-forest (e.g., with a single-digit number of trees) that can achieve the prediction accuracy of a large random forest (on the order of thousands of trees). We tested the method in extensive simulation studies and in a real study on the prognosis of breast cancer. The results show that such sub-forests usually exist and that most of them are very small, suggesting that they are in effect the "representatives" of the whole random forest. We conclude that the sub-forests are indeed the core of a random forest. Thus it is not necessary to use the whole forest to obtain satisfactory prediction performance. Also, once a random forest is reduced to a manageable size, it is no longer a black box.
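The idea of searching for a small sub-forest can be illustrated with a minimal Python sketch. It shows one possible search, a greedy backward elimination over trees, and is not necessarily the procedure the authors propose. Everything named here is a hypothetical assumption for illustration: the helper names (`simulate_votes`, `majority_vote_accuracy`, `smallest_subforest`), the tolerance `tol`, the use of binary labels, and the simulated "trees", which are simply independent noisy voters rather than fitted decision trees.

```python
import random


def simulate_votes(n_trees=25, n_samples=200, seed=0):
    """Simulate binary labels and per-tree votes (assumption: each 'tree'
    is an independent voter that is right with its own fixed probability)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    votes = []
    for _ in range(n_trees):
        p_correct = rng.uniform(0.55, 0.9)
        votes.append([y if rng.random() < p_correct else 1 - y for y in labels])
    return labels, votes


def majority_vote_accuracy(votes, labels, members):
    """Accuracy of the majority vote over the trees listed in `members`."""
    correct = 0
    for i, y in enumerate(labels):
        ones = sum(votes[t][i] for t in members)
        pred = 1 if 2 * ones > len(members) else 0  # ties go to class 0
        correct += pred == y
    return correct / len(labels)


def smallest_subforest(votes, labels, tol=0.0):
    """Greedy backward elimination.

    At each step, drop the tree whose removal leaves the most accurate
    remaining ensemble. Return ((members, accuracy), full_accuracy), where
    `members` is the smallest sub-forest seen along the elimination path
    whose accuracy is at least full_accuracy - tol.
    """
    members = list(range(len(votes)))
    full_acc = majority_vote_accuracy(votes, labels, members)
    best = (list(members), full_acc)
    while len(members) > 1:
        scored = [
            (majority_vote_accuracy(votes, labels,
                                    [m for m in members if m != t]), t)
            for t in members
        ]
        # Highest remaining accuracy wins; ties are broken by lowest index.
        acc, drop = max(scored, key=lambda s: (s[0], -s[1]))
        members.remove(drop)
        if acc >= full_acc - tol:
            best = (list(members), acc)
    return best, full_acc


if __name__ == "__main__":
    labels, votes = simulate_votes()
    (members, acc), full_acc = smallest_subforest(votes, labels, tol=0.01)
    print(f"full forest: {len(votes)} trees, accuracy {full_acc:.3f}")
    print(f"sub-forest:  {len(members)} trees, accuracy {acc:.3f}")
```

In practice the per-tree votes would come from a fitted forest, and the accuracy used to choose the sub-forest should be measured on data that is separate from the data used for the final evaluation; otherwise the selected sub-forest will look better than it is.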

Year:  2009        PMID: 20165560      PMCID: PMC2822360          DOI: 10.4310/sii.2009.v2.n3.a11

Source DB:  PubMed          Journal:  Stat Interface        ISSN: 1938-7989            Impact factor:   0.582


