On what to permute in test-based approaches for variable importance measures in Random Forests.

Stefano Nembrini.

Abstract

MOTIVATION: In bioinformatics applications, it is currently customary to permute the outcome variable in order to produce inference on covariates to test novel methods or statistics whose distributions are poorly known. The seminal publication of Altmann et al. in Bioinformatics uses the same permutation scheme to obtain P-values that can be treated as a corrected measure of feature importance to rectify the bias of the Gini variable importance in Random Forests. Since then, this method has also been used in applied work to draw statistical conclusions on variable importance measures from the resulting P-values.
RESULTS: In this paper, we show that permuting the outcome may produce unexpected results, including P-values with undesirable properties, and illustrate how more refined permutation schemes can be appropriate to obtain desirable results, including high power in discovering relevant variables.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
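
The outcome-permutation scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's code or the exact procedure of Altmann et al.: the function names are hypothetical, the add-one empirical P-value is one common convention chosen here as an assumption, and the absolute Pearson correlation is a stand-in for a Random Forest importance (such as the Gini importance) so that the example needs only the standard library.

```python
import random


def abs_correlation_importance(X, y):
    """Stand-in importance score: |Pearson correlation| of each column with y.

    Assumption: in real use this would be replaced by a Random Forest
    variable importance computed from a fitted model.
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        ss_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        scores.append(abs(cov) / (ss_x * ss_y) if ss_x > 0 and ss_y > 0 else 0.0)
    return scores


def outcome_permutation_p_values(X, y, importance_fn, n_perm=200, seed=0):
    """Empirical P-value per variable, obtained by permuting the outcome y.

    Each permutation breaks the link between y and all covariates at once;
    the importances recomputed on permuted data form the null distribution.
    """
    rng = random.Random(seed)
    observed = importance_fn(X, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        null_scores = importance_fn(X, y_perm)
        for j, (obs, null) in enumerate(zip(observed, null_scores)):
            if null >= obs:
                exceed[j] += 1
    # Add-one correction (an assumption of this sketch) keeps P-values above 0.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Toy data: only the first of three covariates drives the outcome.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [2.0 * row[0] + data_rng.gauss(0, 1) for row in X]

p_values = outcome_permutation_p_values(X, y, abs_correlation_importance)
print(p_values)
```

Because the outcome is shuffled as a whole, every covariate is tested against the same global null of no association with the outcome; the more refined schemes the abstract refers to would change what is permuted and are not shown here.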

Year:  2019        PMID: 30561510     DOI: 10.1093/bioinformatics/bty1025

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  1 in total

1.  Blood identification at the single-cell level based on a combination of laser tweezers Raman spectroscopy and machine learning.

Authors:  Ziqi Wang; Yiming Liu; Weilai Lu; Yu Vincent Fu; Zhehai Zhou
Journal:  Biomed Opt Express       Date:  2021-11-12       Impact factor: 3.732

