Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, Achim Zeileis.
Abstract
BACKGROUND: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables.
Year: 2008 PMID: 18620558 PMCID: PMC2491635 DOI: 10.1186/1471-2105-9-307
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation design. Regression coefficients of the data generating process.
| Predictor | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | ⋯ | X12 |
| Coefficient | 5 | 5 | 2 | 0 | -5 | -5 | -2 | 0 | ⋯ | 0 |
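As a rough illustration of how the listed coefficients could drive a data generating process, the sketch below draws a response from a linear model. The linear form, the use of twelve predictors (taken from the Figure 1 caption), the independent standard-normal predictors, and the Gaussian noise with `noise_sd` are all assumptions made for this example; the table itself gives only the coefficients, and the function name `simulate` is hypothetical.

```python
import random

# Coefficients from the simulation-design table: eight listed values,
# then zeros up to the twelfth predictor (count assumed from Figure 1).
BETA = [5, 5, 2, 0, -5, -5, -2, 0, 0, 0, 0, 0]


def simulate(n, noise_sd=1.0, seed=0):
    """Draw n observations from y = sum_j beta_j * x_j + noise.

    Assumptions (not stated in the table): predictors are independent
    standard normals and the noise is Gaussian with sd `noise_sd`.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in BETA] for _ in range(n)]
    y = [
        sum(b * x for b, x in zip(BETA, row)) + rng.gauss(0.0, noise_sd)
        for row in X
    ]
    return X, y


X, y = simulate(100)
```

To study the bias the abstract describes, the independent draws would have to be replaced by correlated ones; that step is left out here because the correlation structure is not given in this record.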
Figure 1. Selection rates. Relative selection rates for twelve variables in the first splits (left) and in all splits (right) of all trees in random forests built with different values for mtry.
Figure 2. Permutation scheme for the original marginal (left) and for the newly suggested conditional (right) permutation importance.
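The difference between the two schemes can be sketched in a few lines: a marginal permutation shuffles a predictor across all observations, while a conditional permutation shuffles it only among observations that fall in the same group. This is a minimal sketch, not the authors' implementation; the function names and the `strata` labels are hypothetical, and how the groups are formed (for example by partitioning on the other, correlated predictors) is assumed to be decided elsewhere.

```python
import random


def permute_marginal(x, rng):
    """Shuffle x across all observations (marginal scheme)."""
    out = list(x)
    rng.shuffle(out)
    return out


def permute_conditional(x, strata, rng):
    """Shuffle x only among observations that share a stratum label."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    out = list(x)
    for idx in groups.values():
        vals = [x[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out


rng = random.Random(0)
x = [1, 2, 3, 10, 20, 30]
strata = ["a", "a", "a", "b", "b", "b"]  # hypothetical grouping
x_marginal = permute_marginal(x, rng)
x_conditional = permute_conditional(x, strata, rng)
```

In the conditional result, the values 1, 2, 3 stay within the first group and 10, 20, 30 within the second, so the association between the predictor and the grouping is preserved.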
Figure 3. Permutation importance. Median permutation importance for marginal (dashed) and conditional (solid) permutation scheme along with inter-quartile range. Note that the ordering of variables in the plot is arbitrary.
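A permutation importance of the kind summarized in this figure can be sketched as the increase in prediction error after one predictor is shuffled, with repeated values then reduced to a median and inter-quartile range. The example below is only an illustration under stated assumptions: it uses mean squared error, a marginal shuffle, and an arbitrary `predict` callable in place of a fitted random forest, and all function names are hypothetical.

```python
import random
import statistics


def mse(predict, X, y):
    """Mean squared error of predict over rows X against targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, j, rng):
    """Increase in MSE after shuffling column j (marginal scheme)."""
    col = [row[j] for row in X]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return mse(predict, X_perm, y) - mse(predict, X, y)


def summarize(values):
    """Median and inter-quartile range of repeated importance values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


rng = random.Random(0)
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def predict(row):
    # Toy stand-in for a fitted model: uses only the first predictor.
    return 2.0 * row[0]


imp_used = [permutation_importance(predict, X, y, 0, rng) for _ in range(10)]
imp_unused = [permutation_importance(predict, X, y, 1, rng) for _ in range(10)]
```

Because the toy model ignores the second predictor, every value in `imp_unused` is zero, while shuffling the first predictor can only increase the error.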
Figure 4. Example: peptide-binding data. Marginal (top) and conditional (bottom) permutation importance of 104 predictors of peptide-binding.