Nonparametric variable importance assessment using machine learning techniques.

Brian D Williamson1, Peter B Gilbert1,2, Marco Carone1,2, Noah Simon1.   

Abstract

In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often resort to only one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have different interpretations, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we discuss a generalization of the analysis of variance (ANOVA) variable importance measure and describe how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. Using this measure, the importance of each feature or group of features in the data can then be described individually. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
© 2020 The International Biometric Society.
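
The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions. It assumes the generalized ANOVA measure for a feature set s has the form psi_s = E[{mu(X) - mu_s(X)}^2] / var(Y), where mu(x) = E(Y | X = x) and mu_s is the conditional mean of Y given all features except those in s. Because Y - mu(X) is uncorrelated with every function of X, this equals {MSE(mu_s) - MSE(mu)} / var(Y), i.e. the nonparametric R^2 of the full regression minus that of the reduced regression. The function `r2_difference_vim` is a hypothetical name, not the authors' code: it takes predictions from any two fitted regressions (full and reduced) and forms a Wald interval from delta-method influence values. How those predictions are produced (any machine learning technique, ideally evaluated out of sample) is left to the caller.

```python
import math
from statistics import NormalDist


def r2_difference_vim(y, pred_full, pred_reduced, level=0.95):
    """Plug-in estimate and Wald interval for an ANOVA-type variable importance.

    Hypothetical helper (an illustration, not the paper's implementation):
    psi = (MSE_reduced - MSE_full) / var(Y) = R^2(full) - R^2(reduced).
    """
    n = len(y)
    if n < 2 or len(pred_full) != n or len(pred_reduced) != n:
        raise ValueError("y and both prediction vectors must have the same length >= 2")

    y_bar = sum(y) / n
    # Per-observation squared errors: full fit, reduced fit, and the sample mean.
    sq_full = [(yi - fi) ** 2 for yi, fi in zip(y, pred_full)]
    sq_reduced = [(yi - ri) ** 2 for yi, ri in zip(y, pred_reduced)]
    sq_centered = [(yi - y_bar) ** 2 for yi in y]

    mse_full = sum(sq_full) / n
    mse_reduced = sum(sq_reduced) / n
    var_y = sum(sq_centered) / n
    if var_y == 0:
        raise ValueError("var(Y) is zero, so the measure is undefined")

    psi = (mse_reduced - mse_full) / var_y

    # Delta-method influence values for the ratio (A - B) / V, with
    # A = MSE_reduced, B = MSE_full, V = var(Y).
    phi = [
        ((r - mse_reduced) - (f - mse_full) - psi * (v - var_y)) / var_y
        for f, r, v in zip(sq_full, sq_reduced, sq_centered)
    ]
    # Standard error: sqrt of (mean squared influence value) / n.
    se = math.sqrt(sum(p * p for p in phi) / n / n)

    # Two-sided standard normal quantile (about 1.96 for level=0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return psi, (psi - z * se, psi + z * se)


if __name__ == "__main__":
    # Toy values made up for illustration only.
    y = [1.0, 2.0, 3.0, 4.0]
    full = [1.5, 1.5, 3.5, 3.5]     # predictions from a fit using all features
    reduced = [2.5, 2.5, 2.5, 2.5]  # predictions from a fit without feature set s
    est, ci = r2_difference_vim(y, full, reduced)
    print(round(est, 3), tuple(round(b, 3) for b in ci))
```

With these toy inputs the estimate is 0.8 (R^2 of 0.8 for the full fit versus 0 for the reduced fit). Note that when the reduced and full predictions coincide, every influence value above is exactly zero, so the standard error collapses and the interval degenerates to a point; this sketch is therefore only informative for feature sets with nonzero importance.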

Keywords:  machine learning; nonparametric R2; statistical inference; targeted learning; variable importance

Year:  2020        PMID: 33043428      PMCID: PMC7946807          DOI: 10.1111/biom.13392

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571


  6 in total

1.  Super learner.

Authors:  Mark J van der Laan; Eric C Polley; Alan E Hubbard
Journal:  Stat Appl Genet Mol Biol       Date:  2007-09-16

2.  Targeted estimation of binary variable importance measures with interval-censored outcomes.

Authors:  Stephanie Sapp; Mark J van der Laan; Kimberly Page
Journal:  Int J Biostat       Date:  2014       Impact factor: 0.968

3.  Type A Behavior Pattern: its association with coronary heart disease.

Authors:  M Friedman; R H Rosenman
Journal:  Ann Clin Res       Date:  1971-12

4.  Coronary risk factor screening in three rural communities. The CORIS baseline study.

Authors:  J E Rossouw; J P Du Plessis; A J Benadé; P C Jordaan; J P Kotzé; P L Jooste; J J Ferreira
Journal:  S Afr Med J       Date:  1983-09-17

5.  Estimation of a non-parametric variable importance measure of a continuous exposure.

Authors:  Antoine Chambaz; Pierre Neuvial; Mark J van der Laan
Journal:  Electron J Stat       Date:  2012       Impact factor: 1.125

6.  Bias in random forest variable importance measures: illustrations, sources and a solution.

Authors:  Carolin Strobl; Anne-Laure Boulesteix; Achim Zeileis; Torsten Hothorn
Journal:  BMC Bioinformatics       Date:  2007-01-25       Impact factor: 3.169

  7 in total

1.  Testing a global null hypothesis using ensemble machine learning methods.

Authors:  Sunwoo Han; Youyi Fong; Ying Huang
Journal:  Stat Med       Date:  2022-03-07       Impact factor: 2.497

2.  Interpretable machine learning for genomics.

Authors:  David S Watson
Journal:  Hum Genet       Date:  2021-10-20       Impact factor: 5.881

3.  All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.

Authors:  Aaron Fisher; Cynthia Rudin; Francesca Dominici
Journal:  J Mach Learn Res       Date:  2019       Impact factor: 5.177

4.  Efficient nonparametric statistical inference on population feature importance using Shapley values.

Authors:  Brian D Williamson; Jean Feng
Journal:  Proc Mach Learn Res       Date:  2020-07

5.  A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data.

Authors:  Jung-Yi Joyce Lin; Liangyuan Hu; Chuyue Huang; Ji Jiayi; Steven Lawrence; Usha Govindarajulu
Journal:  BMC Med Res Methodol       Date:  2022-05-04       Impact factor: 4.612

6.  Prediction of HIV Sensitivity to Monoclonal Antibodies Using Aminoacid Sequences and Deep Learning.

Authors:  Vlad-Rareş Dănăilă; Cătălin Buiu
Journal:  Bioinformatics       Date:  2022-07-25       Impact factor: 6.931

7.  Interactions between staphylococcal enterotoxins A and D and superantigen-like proteins 1 and 5 for predicting methicillin and multidrug resistance profiles among Staphylococcus aureus ocular isolates.

Authors:  Min Lu; Jean-Marie Parel; Darlene Miller
Journal:  PLoS One       Date:  2021-07-28       Impact factor: 3.240
