
The Effect of Splitting on Random Forests.

Hemant Ishwaran

Abstract

The effect of a splitting rule on random forests (RF) is systematically studied for regression and classification problems. A class of weighted splitting rules, which includes as special cases CART weighted variance splitting and Gini index splitting, is studied in detail and shown to possess a unique adaptive property to signal and noise. We show for noisy variables that weighted splitting favors end-cut splits. While end-cut splits have traditionally been viewed as undesirable for single trees, we argue that for deeply grown trees (a trademark of RF) end-cut splitting is useful because: (a) it maximizes the sample size, making it possible for a tree to recover from a bad split, and (b) if a branch repeatedly splits on noise, the tree's minimal node size will be reached, which promotes termination of the bad branch. For strong variables, weighted variance splitting is shown to possess the desirable property of splitting at points of curvature of the underlying target function. This adaptivity to both noise and signal does not hold for unweighted and heavy weighted splitting rules. These latter rules are either too greedy, making them poor at recognizing noisy scenarios, or they have an overly aggressive end-cut preference (ECP), making them poor at recognizing signal. These results also shed light on pure random splitting and show that such rules are the least effective. Because randomized rules are nevertheless desirable for their computational efficiency, we introduce a hybrid method employing random split-point selection which retains the adaptive property of weighted splitting rules while remaining computationally efficient.
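
The three families of rules contrasted in the abstract can be illustrated with a minimal sketch for a single continuous variable in a regression node. The abstract gives no formulas, so the scoring functions below are assumptions: the weighted rule is taken to be the usual CART form (daughter-node variances weighted by the proportion of cases in each daughter), the unweighted rule drops those weights, and the heavy weighted rule squares them. The function names `split_scores` and `best_left_size` are hypothetical, and distinct x values are assumed.

```python
import numpy as np

def split_scores(x, y, rule="weighted"):
    """Score every split of a node on one variable (lower is better).

    Returns (n_left, score), where n_left[i] is the number of cases sent
    to the left daughter by the i-th candidate split.
    Assumes distinct x values; the three formulas are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ys = y[np.argsort(x)]                # responses ordered by x
    n = ys.size
    n_left = np.arange(1, n)             # left daughter sizes 1..n-1
    n_right = n - n_left

    # Prefix sums give every left/right variance in O(n).
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    sum_l, sum2_l = csum[:-1], csum2[:-1]
    sum_r, sum2_r = csum[-1] - sum_l, csum2[-1] - sum2_l
    var_l = sum2_l / n_left - (sum_l / n_left) ** 2
    var_r = sum2_r / n_right - (sum_r / n_right) ** 2

    p_l, p_r = n_left / n, n_right / n   # daughter proportions
    if rule == "weighted":               # CART weighted variance (assumed form)
        score = p_l * var_l + p_r * var_r
    elif rule == "unweighted":           # no weights (assumed form)
        score = var_l + var_r
    elif rule == "heavy":                # squared weights (assumed form)
        score = p_l ** 2 * var_l + p_r ** 2 * var_r
    else:
        raise ValueError(f"unknown rule: {rule}")
    return n_left, score

def best_left_size(x, y, rule="weighted"):
    """Size of the left daughter at the best-scoring split."""
    n_left, score = split_scores(x, y, rule)
    return int(n_left[np.argmin(score)])
```

To probe the end-cut behavior described in the abstract, one could generate y as pure noise independent of x, call `best_left_size` over many repetitions for each rule, and compare how often the chosen split falls near the edges of the node. No particular outcome is claimed here.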


Keywords:  CART; end-cut preference; law of the iterated logarithm; split-point; splitting rule

Year:  2014        PMID: 28919667      PMCID: PMC5599182          DOI: 10.1007/s10994-014-5451-2

Source DB:  PubMed          Journal:  Mach Learn        ISSN: 0885-6125            Impact factor:   2.940



