Classification of large microarray datasets using fast random forest construction.

Elena A Manilich, Z Meral Özsoyoğlu, Valeriy Trubachev, Tomas Radivoyevitch.

Abstract

Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance, that is, the impact of each factor on a predicted class label. These characteristics make the algorithm well suited to microarray data, and it has been shown to build highly accurate models when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining large datasets, because they require the entire dataset to remain in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes the computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. Its computational performance makes the algorithm useful for interactive data analyses and data mining.
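
The two ingredients named in the abstract, bootstrap samples and restricted subsets of attributes, can be illustrated with a minimal Python sketch. This is not the authors' optimized implementation, which the abstract does not describe in detail. As a simplifying assumption, each "tree" here is a one-split decision stump, and the function names (fit_stump, fit_forest, predict), the parameter defaults, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split exists: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: draw n rows with replacement
        feats = rng.sample(range(p), n_features)     # restricted subset of attributes
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): column 0 separates the classes, columns 1-3 are noise.
rows = [
    [0.1, 5.0, 3.2, 7.1], [0.4, 1.2, 8.8, 2.0],
    [0.3, 9.1, 4.4, 6.5], [0.2, 3.3, 0.7, 9.9],
    [5.1, 4.8, 2.9, 1.1], [5.6, 0.9, 6.1, 8.4],
    [5.3, 7.7, 5.0, 3.0], [5.9, 2.5, 9.3, 4.6],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict(forest, [0.0, 5.0, 5.0, 5.0]))
```

A full random forest grows deep trees and redraws the attribute subset at every split; with a single split per stump, drawing the subset once per tree amounts to the same thing. Note that this sketch also holds all rows in memory, which is exactly the limitation of existing implementations that the paper sets out to remove.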

Year: 2011    PMID: 21523931    DOI: 10.1142/s021972001100546x

Source DB: PubMed    Journal: J Bioinform Comput Biol    ISSN: 0219-7200    Impact factor: 1.122


  2 in total

1.  Use of random forest to estimate population attributable fractions from a case-control study of Salmonella enterica serotype Enteritidis infections.

Authors:  W Gu; A R Vieira; R M Hoekstra; P M Griffin; D Cole
Journal:  Epidemiol Infect       Date:  2015-02-12       Impact factor: 4.434

2.  Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes.

Authors:  Sharmistha Pal; Yingtao Bi; Luke Macyszyn; Louise C Showe; Donald M O'Rourke; Ramana V Davuluri
Journal:  Nucleic Acids Res       Date:  2014-02-06       Impact factor: 16.971

