
Enriched random forests.

Dhammika Amaratunga, Javier Cabrera, Yung-Seop Lee.

Abstract

Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees whose nodes are populated by non-informative features. To some extent, this can be achieved by prefiltering, but we propose a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features. This results in an 'enriched random forest'. We illustrate the superior performance of this procedure in several actual microarray datasets.
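The core adjustment described above, replacing simple random sampling of candidate features at each node with weighted sampling tilted toward informative features, can be sketched briefly. This is an illustrative sketch, not the paper's implementation: here the weights come from absolute two-sample t-statistics, whereas the paper derives its tilting from feature-level significance; all function names and the simulated data are assumptions.

```python
import numpy as np

def feature_weights(X, y):
    # Score each feature with an absolute two-sample t-statistic and
    # normalize to sampling probabilities (illustrative weighting;
    # the paper tilts weights using feature-level p-values).
    X0, X1 = X[y == 0], X[y == 1]
    se = np.sqrt(X0.var(axis=0, ddof=1) / len(X0)
                 + X1.var(axis=0, ddof=1) / len(X1)) + 1e-12
    t = np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / se
    return t / t.sum()

def sample_candidate_features(weights, mtry, rng):
    # The enrichment step: weighted sampling without replacement of the
    # mtry features eligible for splitting at a node, in place of the
    # usual simple random sample.
    return rng.choice(len(weights), size=mtry, replace=False, p=weights)

# Simulated microarray-like data: 40 arrays, 1000 genes, of which only
# the first 10 carry a class signal (mean shift in class 1).
rng = np.random.default_rng(0)
n, p, n_inf = 40, 1000, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :n_inf] += 2.0

weights = feature_weights(X, y)
mtry = int(np.sqrt(p))  # candidate-subset size, as in a standard forest

# Fraction of candidate slots filled by informative features over many
# simulated nodes; simple random sampling would give about n_inf/p = 1%.
draws = 2000
hits = sum(np.intersect1d(sample_candidate_features(weights, mtry, rng),
                          np.arange(n_inf)).size for _ in range(draws))
rate = hits / (draws * mtry)
print(f"informative share of candidates: {rate:.3f} (uniform: {n_inf/p:.3f})")
```

Under this weighting, nodes see informative features several times more often than under uniform sampling, so fewer trees are dominated by noise splits, which is the mechanism the abstract credits for the improved performance.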


Year:  2008        PMID: 18650208     DOI: 10.1093/bioinformatics/btn356

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


Related articles: 34 in total

1.  Maximal conditional chi-square importance in random forests.

Authors:  Minghui Wang; Xiang Chen; Heping Zhang
Journal:  Bioinformatics       Date:  2010-02-03       Impact factor: 6.937

2.  Effect of Amino Acid Substitutions Within the V3 Region of HIV-1 CRF01_AE on Interaction with CCR5-Coreceptor.

Authors:  Sayamon Hongjaisee; Martine Braibant; Francis Barin; Nicole Ngo-Giang-Huong; Wasna Sirirungsi; Tanawan Samleerat
Journal:  AIDS Res Hum Retroviruses       Date:  2017-06-12       Impact factor: 2.205

3. (Review) A Survey of Data Mining and Deep Learning in Bioinformatics.

Authors:  Kun Lan; Dan-Tong Wang; Simon Fong; Lian-Sheng Liu; Kelvin K L Wong; Nilanjan Dey
Journal:  J Med Syst       Date:  2018-06-28       Impact factor: 4.460

4.  Correction for population stratification in random forest analysis.

Authors:  Yang Zhao; Feng Chen; Rihong Zhai; Xihong Lin; Zhaoxi Wang; Li Su; David C Christiani
Journal:  Int J Epidemiol       Date:  2012-11-12       Impact factor: 7.196

5.  A Weighted Random Forests Approach to Improve Predictive Performance.

Authors:  Stacey J Winham; Robert R Freimuth; Joanna M Biernacka
Journal:  Stat Anal Data Min       Date:  2013-12-01       Impact factor: 1.051

6.  Definitions, methods, and applications in interpretable machine learning.

Authors:  W James Murdoch; Chandan Singh; Karl Kumbier; Reza Abbasi-Asl; Bin Yu
Journal:  Proc Natl Acad Sci U S A       Date:  2019-10-16       Impact factor: 11.205

7.  Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

Authors:  Thanh-Tung Nguyen; Joshua Huang; Qingyao Wu; Thuy Nguyen; Mark Li
Journal:  BMC Genomics       Date:  2015-01-21       Impact factor: 3.969

8. (Review) Random forests for genomic data analysis.

Authors:  Xi Chen; Hemant Ishwaran
Journal:  Genomics       Date:  2012-04-21       Impact factor: 5.736

9.  Machine-Learning Algorithms Predict Graft Failure After Liver Transplantation.

Authors:  Lawrence Lau; Yamuna Kankanige; Benjamin Rubinstein; Robert Jones; Christopher Christophi; Vijayaragavan Muralidharan; James Bailey
Journal:  Transplantation       Date:  2017-04       Impact factor: 4.939

10.  Identification of protein functions using a machine-learning approach based on sequence-derived properties.

Authors:  Bum Ju Lee; Moon Sun Shin; Young Joon Oh; Hae Seok Oh; Keun Ho Ryu
Journal:  Proteome Sci       Date:  2009-08-09       Impact factor: 2.480


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.