
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

Hemant Ishwaran, Min Lu.

Abstract

Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While their widespread popularity stems from their prediction performance, an equally important feature is that they provide a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and to construct confidence intervals. The method is general enough to be applied in many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and, in particular, find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates because of its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, making it especially attractive for big data settings.
Copyright © 2018 John Wiley & Sons, Ltd.
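
Illustration (not part of the original record): the abstract names two estimators, a subsampling variance estimator and a delete-d jackknife variant, without giving formulas. The sketch below uses the textbook forms of both, which is an assumption about what the paper does: draw K subsamples of size b without replacement, recompute the statistic on each, then scale the spread by b/n (subsampling, centered at the subsample mean) or by b/(n-b) (delete-d jackknife with d = n-b, centered at the full-sample estimate). The function names `subsample_variances` and `normal_ci` are hypothetical, and the sample mean stands in for VIMP so the example runs with the standard library only. In the paper's setting the statistic would be the VIMP of one variable from a forest grown on each subsample. The confidence interval uses a normal approximation, which is also an assumption.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def subsample_variances(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    b: int,
    K: int,
    rng: random.Random,
) -> Tuple[float, float, float]:
    """Return (full-sample estimate, subsampling variance, delete-d jackknife variance)."""
    n = len(data)
    if not 0 < b < n:
        raise ValueError("subsample size b must satisfy 0 < b < n")
    theta_n = statistic(data)
    # Recompute the statistic on K subsamples drawn without replacement.
    thetas: List[float] = [statistic(rng.sample(list(data), b)) for _ in range(K)]
    theta_bar = sum(thetas) / K
    # Subsampling estimator: spread around the subsample mean, rescaled by b/n.
    var_sub = (b / n) * sum((t - theta_bar) ** 2 for t in thetas) / K
    # Delete-d jackknife (d = n - b): spread around the full-sample estimate,
    # rescaled by b/(n - b).
    var_jk = (b / (n - b)) * sum((t - theta_n) ** 2 for t in thetas) / K
    return theta_n, var_sub, var_jk


def normal_ci(estimate: float, variance: float, level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval (an assumption, see the lead-in)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width


# Toy usage: the sample mean stands in for VIMP.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
theta_n, var_sub, var_jk = subsample_variances(data, statistics.fmean, b=50, K=2000, rng=rng)
lo, hi = normal_ci(theta_n, var_jk)
print(f"estimate={theta_n:.4f} var_sub={var_sub:.6f} var_jk={var_jk:.6f}")
print(f"95% CI (delete-d jackknife): [{lo:.4f}, {hi:.4f}]")
```

For the sample mean, the delete-d jackknife value should land close to the usual s²/n, while the subsampling value is smaller by roughly a factor of 1 - b/n. This is one way to see the bias correction the abstract attributes to the jackknife variant.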

Keywords:  VIMP; bootstrap; delete-d jackknife; permutation importance; prediction error; subsampling

Year:  2018        PMID: 29869423      PMCID: PMC6279615          DOI: 10.1002/sim.7803

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


References:  9 in total

1.  Assessment and comparison of prognostic classification schemes for survival data.

Authors:  E Graf; C Schmoor; W Sauerbrei; M Schumacher
Journal:  Stat Med       Date:  1999 Sep 15-30       Impact factor: 2.373

2.  Consistent estimation of the expected Brier score in general survival models with right-censored event times.

Authors:  Thomas A Gerds; Martin Schumacher
Journal:  Biom J       Date:  2006-12       Impact factor: 2.207

3.  Identifying important risk factors for survival in patient with systolic heart failure using random survival forests.

Authors:  Eileen Hsich; Eiran Z Gorodeski; Eugene H Blackstone; Hemant Ishwaran; Michael S Lauer
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2010-11-23

4.  Reinforcement Learning Trees.

Authors:  Ruoqing Zhu; Donglin Zeng; Michael R Kosorok
Journal:  J Am Stat Assoc       Date:  2015-04-16       Impact factor: 5.033

5.  Random survival forests for competing risks.

Authors:  Hemant Ishwaran; Thomas A Gerds; Udaya B Kogalur; Richard D Moore; Stephen J Gange; Bryan M Lau
Journal:  Biostatistics       Date:  2014-04-11       Impact factor: 5.899

6.  Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.

Authors:  Stefan Wager; Trevor Hastie; Bradley Efron
Journal:  J Mach Learn Res       Date:  2014-01       Impact factor: 3.654

7.  Evaluating the yield of medical tests.

Authors:  F E Harrell; R M Califf; D B Pryor; K L Lee; R A Rosati
Journal:  JAMA       Date:  1982-05-14       Impact factor: 56.272

8.  The behaviour of random forest permutation-based variable importance measures under predictor correlation.

Authors:  Kristin K Nicodemus; James D Malley; Carolin Strobl; Andreas Ziegler
Journal:  BMC Bioinformatics       Date:  2010-02-27       Impact factor: 3.169

9.  Recursively Imputed Survival Trees.

Authors:  Ruoqing Zhu; Michael R Kosorok
Journal:  J Am Stat Assoc       Date:  2011-12-06       Impact factor: 5.033

Cited by:  35 in total (first 10 listed)

1.  Multidimensional Sleep and Mortality in Older Adults: A Machine-Learning Comparison With Other Risk Factors.

Authors:  Meredith L Wallace; Daniel J Buysse; Susan Redline; Katie L Stone; Kristine Ensrud; Yue Leng; Sonia Ancoli-Israel; Martica H Hall
Journal:  J Gerontol A Biol Sci Med Sci       Date:  2019-11-13       Impact factor: 6.053

2.  Vascular biomarkers and digital ulcerations in systemic sclerosis: results from a randomized controlled trial of oral treprostinil (DISTOL-1).

Authors:  Christopher A Mecoli; Jamie Perin; Jennifer E Van Eyk; Jie Zhu; Qin Fu; Andrew G Allmon; Youlan Rao; Scott Zeger; Fredrick M Wigley; Laura K Hummers; Ami A Shah
Journal:  Clin Rheumatol       Date:  2019-12-19       Impact factor: 2.980

3.  Variables of importance in the Scientific Registry of Transplant Recipients database predictive of heart transplant waitlist mortality.

Authors:  Eileen M Hsich; Lucy Thuita; Dennis M McNamara; Joseph G Rogers; Maryam Valapour; Lee R Goldberg; Clyde W Yancy; Eugene H Blackstone; Hemant Ishwaran
Journal:  Am J Transplant       Date:  2019-02-13       Impact factor: 8.086

4.  Context-Specific Transcription Factor Functions Regulate Epigenomic and Transcriptional Dynamics during Cardiac Reprogramming.

Authors:  Nicole R Stone; Casey A Gifford; Reuben Thomas; Karishma J B Pratt; Kaitlen Samse-Knapp; Tamer M A Mohamed; Ethan M Radzinsky; Amelia Schricker; Lin Ye; Pengzhi Yu; Joke G van Bemmel; Kathryn N Ivey; Katherine S Pollard; Deepak Srivastava
Journal:  Cell Stem Cell       Date:  2019-07-03       Impact factor: 24.633

5.  m6A Regulator-Mediated RNA Methylation Modification Patterns Regulate the Immune Microenvironment in Osteoarthritis.

Authors:  Yang Duan; Cheng Yu; Meiping Yan; Yuzhen Ouyang; Songjia Ni
Journal:  Front Genet       Date:  2022-06-23       Impact factor: 4.772

6.  Machine Learning-Based Gray-Level Co-Occurrence Matrix (GLCM) Models for Predicting the Depth of Myometrial Invasion in Patients with Stage I Endometrial Cancer.

Authors:  Xiaoyuan Qian; Du He; Li Qin; Lin Lai; Hongli Wang; Yukun Zhang
Journal:  Cancer Manag Res       Date:  2022-06-30       Impact factor: 3.602

7.  Reply: The standardization and automation of machine learning for biomedical data.

Authors:  Hemant Ishwaran; Robert O'Brien
Journal:  J Thorac Cardiovasc Surg       Date:  2020-08-28       Impact factor: 5.209

8.  Development of a "meta-model" to address missing data, predict patient-specific cancer survival and provide a foundation for clinical decision support.

Authors:  Jason M Baron; Ketan Paranjape; Tara Love; Vishakha Sharma; Denise Heaney; Matthew Prime
Journal:  J Am Med Inform Assoc       Date:  2021-03-01       Impact factor: 4.497

9.  Survival Benefit of Donation After Circulatory Death Kidney Transplantation in Children Compared With Remaining on the Waiting List for a Kidney Donated After Brain Death.

Authors:  Sarah J Kizilbash; Michael D Evans; Blanche M Chavers
Journal:  Transplantation       Date:  2022-03-01       Impact factor: 5.385

10.  Discussion on "Nonparametric variable importance assessment using machine learning techniques" by Brian D. Williamson, Peter B. Gilbert, Marco Carone, and Noah Simon.

Authors:  Min Lu; Hemant Ishwaran
Journal:  Biometrics       Date:  2020-12-08       Impact factor: 1.701
