
Enriched random forests.

Dhammika Amaratunga, Javier Cabrera, Yung-Seop Lee.

Abstract

Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees whose nodes are populated by non-informative features. To some extent, this can be achieved by prefiltering, but we propose a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features. This results in an 'enriched random forest'. We illustrate the superior performance of this procedure in several actual microarray datasets.
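The core adjustment described above, replacing simple random sampling of candidate features at each node with weighted sampling tilted toward informative features, can be sketched briefly. This is an illustrative sketch, not the paper's implementation: here the weights come from absolute two-sample t-statistics, whereas the paper derives its tilting from feature-level significance; all function names and the simulated data are assumptions.

```python
import numpy as np

def feature_weights(X, y):
    # Score each feature with an absolute two-sample t-statistic and
    # normalize to sampling probabilities (illustrative weighting;
    # the paper tilts weights using feature-level p-values).
    X0, X1 = X[y == 0], X[y == 1]
    se = np.sqrt(X0.var(axis=0, ddof=1) / len(X0)
                 + X1.var(axis=0, ddof=1) / len(X1)) + 1e-12
    t = np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / se
    return t / t.sum()

def sample_candidate_features(weights, mtry, rng):
    # The enrichment step: weighted sampling without replacement of the
    # mtry features eligible for splitting at a node, in place of the
    # usual simple random sample.
    return rng.choice(len(weights), size=mtry, replace=False, p=weights)

# Simulated microarray-like data: 40 arrays, 1000 genes, of which only
# the first 10 carry a class signal (mean shift in class 1).
rng = np.random.default_rng(0)
n, p, n_inf = 40, 1000, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :n_inf] += 2.0

weights = feature_weights(X, y)
mtry = int(np.sqrt(p))  # candidate-subset size, as in a standard forest

# Fraction of candidate slots filled by informative features over many
# simulated nodes; simple random sampling would give about n_inf/p = 1%.
draws = 2000
hits = sum(np.intersect1d(sample_candidate_features(weights, mtry, rng),
                          np.arange(n_inf)).size for _ in range(draws))
rate = hits / (draws * mtry)
print(f"informative share of candidates: {rate:.3f} (uniform: {n_inf/p:.3f})")
```

Under this weighting, nodes see informative features several times more often than under uniform sampling, so fewer trees are dominated by noise splits, which is the mechanism the abstract credits for the improved performance.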


Year:  2008        PMID: 18650208     DOI: 10.1093/bioinformatics/btn356

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Related articles: 34 in total

1.  Maximal conditional chi-square importance in random forests.

Authors:  Minghui Wang; Xiang Chen; Heping Zhang
Journal:  Bioinformatics       Date:  2010-02-03       Impact factor: 6.937

2.  Effect of Amino Acid Substitutions Within the V3 Region of HIV-1 CRF01_AE on Interaction with CCR5-Coreceptor.

Authors:  Sayamon Hongjaisee; Martine Braibant; Francis Barin; Nicole Ngo-Giang-Huong; Wasna Sirirungsi; Tanawan Samleerat
Journal:  AIDS Res Hum Retroviruses       Date:  2017-06-12       Impact factor: 2.205

3. (Review) A Survey of Data Mining and Deep Learning in Bioinformatics.

Authors:  Kun Lan; Dan-Tong Wang; Simon Fong; Lian-Sheng Liu; Kelvin K L Wong; Nilanjan Dey
Journal:  J Med Syst       Date:  2018-06-28       Impact factor: 4.460

4.  Correction for population stratification in random forest analysis.

Authors:  Yang Zhao; Feng Chen; Rihong Zhai; Xihong Lin; Zhaoxi Wang; Li Su; David C Christiani
Journal:  Int J Epidemiol       Date:  2012-11-12       Impact factor: 7.196

5.  A Weighted Random Forests Approach to Improve Predictive Performance.

Authors:  Stacey J Winham; Robert R Freimuth; Joanna M Biernacka
Journal:  Stat Anal Data Min       Date:  2013-12-01       Impact factor: 1.051

6.  Definitions, methods, and applications in interpretable machine learning.

Authors:  W James Murdoch; Chandan Singh; Karl Kumbier; Reza Abbasi-Asl; Bin Yu
Journal:  Proc Natl Acad Sci U S A       Date:  2019-10-16       Impact factor: 11.205

7.  Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

Authors:  Thanh-Tung Nguyen; Joshua Huang; Qingyao Wu; Thuy Nguyen; Mark Li
Journal:  BMC Genomics       Date:  2015-01-21       Impact factor: 3.969

8. (Review) Random forests for genomic data analysis.

Authors:  Xi Chen; Hemant Ishwaran
Journal:  Genomics       Date:  2012-04-21       Impact factor: 5.736

9.  Machine-Learning Algorithms Predict Graft Failure After Liver Transplantation.

Authors:  Lawrence Lau; Yamuna Kankanige; Benjamin Rubinstein; Robert Jones; Christopher Christophi; Vijayaragavan Muralidharan; James Bailey
Journal:  Transplantation       Date:  2017-04       Impact factor: 4.939

10.  Identification of protein functions using a machine-learning approach based on sequence-derived properties.

Authors:  Bum Ju Lee; Moon Sun Shin; Young Joon Oh; Hae Seok Oh; Keun Ho Ryu
Journal:  Proteome Sci       Date:  2009-08-09       Impact factor: 2.480


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.