Permutation importance: a corrected feature importance measure.

André Altmann, Laura Toloşi, Oliver Sander, Thomas Lengauer.

Abstract

MOTIVATION: In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred.
RESULTS: In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models.
AVAILABILITY: R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/~altmann/download/PIMP.R
CONTACT: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
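
The permutation scheme described under RESULTS can be illustrated with a minimal Python sketch. This is not the authors' R code. It makes several assumptions for the sake of a self-contained example: the importance measure is the absolute Pearson correlation (a simple stand-in for an RF-based importance), the P-value is the empirical fraction of permuted importances that reach the observed one, and all names (`abs_correlation`, `importances`, `pimp_p_values`, `n_permutations`) are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Stand-in importance (assumption): absolute Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def importances(features, outcome):
    """Importance of every feature column with respect to the outcome."""
    return [abs_correlation(column, outcome) for column in features]


def pimp_p_values(features, outcome, n_permutations=200, seed=0):
    """Empirical P-value of each observed importance under outcome permutation."""
    rng = random.Random(seed)
    observed = importances(features, outcome)
    exceed = [0] * len(features)
    shuffled = list(outcome)
    for _ in range(n_permutations):
        # Permuting the outcome breaks any feature-outcome relationship,
        # so the importances computed here sample the non-informative setting.
        rng.shuffle(shuffled)
        null = importances(features, shuffled)
        for j, (null_imp, obs_imp) in enumerate(zip(null, observed)):
            if null_imp >= obs_imp:
                exceed[j] += 1
    # Add-one correction keeps the estimate strictly above zero.
    return [(count + 1) / (n_permutations + 1) for count in exceed]


# Toy data: one informative feature and one pure-noise feature.
data_rng = random.Random(42)
n = 100
informative = [data_rng.gauss(0, 1) for _ in range(n)]
noise = [data_rng.gauss(0, 1) for _ in range(n)]
outcome = [x + 0.5 * data_rng.gauss(0, 1) for x in informative]

p_values = pimp_p_values([informative, noise], outcome)
print(p_values)
```

In this sketch the informative feature should receive a small P-value, while the noise feature has no systematic reason to. Swapping `abs_correlation` for an importance taken from a fitted model would bring the example closer to the setting in the abstract, at the cost of refitting the model once per permutation.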

Year: 2010 PMID: 20385727 DOI: 10.1093/bioinformatics/btq134

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

180 in total

1. Random forest classification of etiologies for an orphan disease.

Authors: Jaime Lynn Speiser; Valerie L Durkalski; William M Lee
Journal: Stat Med Date: 2014-11-03 Impact factor: 2.373

2. Power of data mining methods to detect genetic associations and interactions.

Authors: Annette M Molinaro; Nicholas Carriero; Robert Bjornson; Patricia Hartge; Nathaniel Rothman; Nilanjan Chatterjee
Journal: Hum Hered Date: 2011-09-17 Impact factor: 0.444

3. Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. (Review)

Authors: Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit
Journal: Chem Rev Date: 2020-06-10 Impact factor: 60.622

4. Machine learning for modeling animal movement.

Authors: Dhanushi A Wijeyakulasuriya; Elizabeth W Eisenhauer; Benjamin A Shaby; Ephraim M Hanks
Journal: PLoS One Date: 2020-07-27 Impact factor: 3.240

5. Pseudo-pneumatosis of the gastrointestinal tract: its incidence and the accuracy of a checklist supported by artificial intelligence (AI) techniques to reduce the misinterpretation of pneumatosis.

Authors: Andrea Alessandro Esposito; Stefania Zannoni; Laura Castoldi; Caterina Giannitto; Emanuele Avola; Elena Casiraghi; Onofrio Catalano; Gianpaolo Carrafiello
Journal: Emerg Radiol Date: 2021-05-22

6. Factors Affecting Sentinel Node Metastasis in Thin (T1) Cutaneous Melanomas: Development and External Validation of a Predictive Nomogram.

Authors: Andrea Maurichi; Rosalba Miceli; Hanna Eriksson; Julia Newton-Bishop; Jérémie Nsengimana; May Chan; Andrew J Hayes; Kara Heelan; David Adams; Roberto Patuzzo; Francesco Barretta; Gianfranco Gallino; Catherine Harwood; Daniele Bergamaschi; Dorothy Bennett; Konstantinos Lasithiotakis; Paola Ghiorzo; Bruna Dalmasso; Ausilia Manganoni; Francesca Consoli; Ilaria Mattavelli; Consuelo Barbieri; Andrea Leva; Umberto Cortinovis; Vittoria Espeli; Cristina Mangas; Pietro Quaglino; Simone Ribero; Paolo Broganelli; Giovanni Pellacani; Caterina Longo; Corrado Del Forno; Lorenzo Borgognoni; Serena Sestini; Nicola Pimpinelli; Sara Fortunato; Alessandra Chiarugi; Paolo Nardini; Elena Morittu; Antonio Florita; Mara Cossa; Barbara Valeri; Massimo Milione; Giancarlo Pruneri; Odysseas Zoras; Andrea Anichini; Roberta Mortarini; Mario Santinami
Journal: J Clin Oncol Date: 2020-03-13 Impact factor: 44.544

7. Environmental factors drive language density more in food-producing than in hunter-gatherer populations.

Authors: Curdin Derungs; Martina Köhl; Robert Weibel; Balthasar Bickel
Journal: Proc Biol Sci Date: 2018-08-22 Impact factor: 5.349

8. Interpretation of machine learning predictions for patient outcomes in electronic health records.

Authors: William La Cava; Christopher Bauer; Jason H Moore; Sarah A Pendergrass
Journal: AMIA Annu Symp Proc Date: 2020-03-04

9. askMUSIC: Leveraging a Clinical Registry to Develop a New Machine Learning Model to Inform Patients of Prostate Cancer Treatments Chosen by Similar Men.

Authors: Gregory B Auffenberg; Khurshid R Ghani; Shreyas Ramani; Etiowo Usoro; Brian Denton; Craig Rogers; Benjamin Stockton; David C Miller; Karandeep Singh
Journal: Eur Urol Date: 2018-10-11 Impact factor: 20.096

10. Early Detection of Heart Failure With Reduced Ejection Fraction Using Perioperative Data Among Noncardiac Surgical Patients: A Machine-Learning Approach.

Authors: Michael R Mathis; Milo C Engoren; Hyeon Joo; Michael D Maile; Keith D Aaronson; Michael L Burns; Michael W Sjoding; Nicholas J Douville; Allison M Janda; Yaokun Hu; Kayvan Najarian; Sachin Kheterpal
Journal: Anesth Analg Date: 2020-05 Impact factor: 5.108