
Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.

Stefan Wager, Trevor Hastie, Bradley Efron.

Abstract

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
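
As a rough illustration of the estimators described above, here is a minimal pure-Python sketch. It assumes the bias-corrected Monte Carlo forms V_IJ-U = sum_i C_i^2 - (n/B^2) sum_b (t*_b - t_bar)^2, with C_i = (1/B) sum_b (N*_bi - 1)(t*_b - t_bar), and V_J-U = ((n-1)/n) sum_i (t_bar_(-i) - t_bar)^2 - (e - 1)(n/B^2) sum_b (t*_b - t_bar)^2, as we read them from the paper; check them against the paper's own equations before relying on them. Here N*_bi counts how often observation i appears in bootstrap sample b, t*_b is the base learner's prediction on that sample, and t_bar_(-i) averages t*_b over the samples that omit observation i. The helper names (bootstrap_replicates, ij_variance, jackknife_variance) are hypothetical, and the bagged "learner" is simply the sample mean so the output can be checked by hand; for a random forest, t*_b would be tree b's prediction at a test point x.

```python
import math
import random
from statistics import fmean


def bootstrap_replicates(x, statistic, B, rng):
    """Draw B bootstrap samples of x.

    Returns (counts, preds): counts[b][i] is N*_bi, the number of times
    observation i appears in sample b; preds[b] is t*_b = statistic(sample b).
    """
    n = len(x)
    counts, preds = [], []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        c = [0] * n
        for i in idx:
            c[i] += 1
        counts.append(c)
        preds.append(statistic([x[i] for i in idx]))
    return counts, preds


def ij_variance(counts, preds, bias_correct=True):
    """Infinitesimal-jackknife variance of the bagged prediction mean(preds)."""
    B, n = len(preds), len(counts[0])
    t_bar = fmean(preds)
    centered = [t - t_bar for t in preds]
    v = 0.0
    for i in range(n):
        # Monte Carlo covariance between N*_bi and t*_b
        c_i = sum((counts[b][i] - 1) * centered[b] for b in range(B)) / B
        v += c_i ** 2
    if bias_correct:
        # assumed correction for Monte Carlo noise: n / B^2 * sum_b (t*_b - t_bar)^2
        v -= n / B ** 2 * sum(d * d for d in centered)
    return v


def jackknife_variance(counts, preds, bias_correct=True):
    """Jackknife-after-bootstrap variance of the bagged prediction.

    Requires every observation to be left out of at least one bootstrap
    sample; otherwise fmean raises StatisticsError on an empty list.
    """
    B, n = len(preds), len(counts[0])
    t_bar = fmean(preds)
    v = 0.0
    for i in range(n):
        # average prediction over the bootstrap samples that omit observation i
        t_minus_i = fmean([preds[b] for b in range(B) if counts[b][i] == 0])
        v += (t_minus_i - t_bar) ** 2
    v *= (n - 1) / n
    if bias_correct:
        # assumed correction, larger than the IJ one by a factor (e - 1)
        v -= (math.e - 1) * n / B ** 2 * sum((t - t_bar) ** 2 for t in preds)
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(40)]
    counts, preds = bootstrap_replicates(x, fmean, B=4000, rng=rng)
    print("IJ-U:", ij_variance(counts, preds))
    print("J-U: ", jackknife_variance(counts, preds))
```

For the sample mean, the IJ estimate should land near sum_i (x_i - x_bar)^2 / n^2 and the jackknife estimate near sum_i (x_i - x_bar)^2 / (n(n - 1)), up to Monte Carlo noise. The (e - 1) ≈ 1.72 factor in the jackknife correction is roughly the 1.7 ratio quoted in the abstract.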


Keywords:  Monte Carlo noise; bagging; jackknife methods; variance estimation

Year:  2014        PMID: 25580094      PMCID: PMC4286302     

Source DB:  PubMed          Journal:  J Mach Learn Res        ISSN: 1532-4435            Impact factor:   3.654


  30 in total

1.  Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

Authors:  Xiaogang Su; Annette T Peña; Lei Liu; Richard A Levine
Journal:  Stat Med       Date:  2018-04-29       Impact factor: 2.373

2.  Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches.

Authors:  Cole Brokamp; Roman Jandarov; M B Rao; Grace LeMasters; Patrick Ryan
Journal:  Atmos Environ (1994)       Date:  2016-12-01       Impact factor: 4.798

3.  Correcting Measurement Error in Satellite Aerosol Optical Depth with Machine Learning for Modeling PM2.5 in the Northeastern USA.

Authors:  Allan C Just; Margherita M De Carli; Alexandra Shtein; Michael Dorman; Alexei Lyapustin; Itai Kloog
Journal:  Remote Sens (Basel)       Date:  2018-05-22       Impact factor: 4.848

4.  Effects of an invasive polychaete on benthic phosphorus cycling at sea basin scale: An ecosystem disservice.

Authors:  Antonia Nyström Sandman; Johan Näslund; Ing-Marie Gren; Karl Norling
Journal:  Ambio       Date:  2018-05-05       Impact factor: 5.129

5.  Estimating the Optimal Personalized Treatment Strategy Based on Selected Variables to Prolong Survival via Random Survival Forest with Weighted Bootstrap.

Authors:  Jincheng Shen; Lu Wang; Stephanie Daignault; Daniel E Spratt; Todd M Morgan; Jeremy M G Taylor
Journal:  J Biopharm Stat       Date:  2017-10-25       Impact factor: 1.051

6.  Drawing inferences for high-dimensional linear models: A selection-assisted partial regression and smoothing approach.

Authors:  Zhe Fei; Ji Zhu; Moulinath Banerjee; Yi Li
Journal:  Biometrics       Date:  2019-03-29       Impact factor: 2.571

7.  PSICA: Decision trees for probabilistic subgroup identification with categorical treatments.

Authors:  Oleg Sysoev; Krzysztof Bartoszek; Eva-Charlotte Ekström; Katarina Ekholm Selling
Journal:  Stat Med       Date:  2019-06-27       Impact factor: 2.373

8.  Bayesian Additive Regression Trees using Bayesian Model Averaging.

Authors:  Belinda Hernández; Adrian E Raftery; Stephen R Pennington; Andrew C Parnell
Journal:  Stat Comput       Date:  2017-07-27       Impact factor: 2.559

9.  Estimating restricted mean treatment effects with stacked survival models.

Authors:  Andrew Wey; David M Vock; John Connett; Kyle Rudser
Journal:  Stat Med       Date:  2016-03-02       Impact factor: 2.373

10.  Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

Authors:  Hemant Ishwaran; Min Lu
Journal:  Stat Med       Date:  2018-06-04       Impact factor: 2.373

