Caleb D Bastian (1), Grzegorz A Rempala (2). 1. Program in Applied and Computational Mathematics, Princeton University, Fine Hall, Washington Road, Princeton, NJ 08544. 2. Department of Biostatistics and the Cancer Center, Georgia Health Sciences University, Augusta, GA 30912.
Abstract
BACKGROUND: In the typical setting of gene-selection problems from high-dimensional data, e.g., gene expression data from microarray or next-generation sequencing-based technologies, an enormous volume of high-throughput data is generated, and there is often a need for a simple, computationally-inexpensive, non-parametric screening procedure that can quickly and accurately find a low-dimensional variable subset that preserves the biological information of the original very high-dimensional data (dimension p > 40,000). This is in contrast to the very sophisticated variable selection methods that are computationally expensive, need pre-processing routines, and often require calibration of priors. RESULTS: We present a tree-based sequential CART (S-CART) approach to variable selection in the binary classification setting and compare it against more sophisticated procedures using simulated and real biological data. On simulated data, we analyze S-CART performance versus (i) a random forest (RF), (ii) a fully-parametric Bayesian stochastic search variable selection (SSVS), and (iii) the moderated t-test statistic from the LIMMA package in R. The simulation study is based on a hierarchical Bayesian model in which dataset dimensionality, the percentage of significant variables, and substructure via dependency vary. Selection efficacy is measured through false-discovery and missed-discovery rates. In all scenarios, the S-CART method consistently outperforms SSVS and RF in both speed and detection accuracy. We demonstrate the utility of the S-CART technique both on simulated data and in a control-treatment mouse study. We show that the network analysis based on the S-CART-selected gene subset in essence recapitulates the biological findings of the study using only a fraction of the original set of genes considered in the study's analysis.
CONCLUSIONS: Relatively simple-minded gene selection algorithms like S-CART may, in practical circumstances, often be preferred over much more sophisticated ones. The advantage of the "greedy" selection methods employed by S-CART and the like is that they scale well with the problem size and require virtually no tuning or training, while remaining efficient at extracting the relevant information from microarray-like datasets containing a large number of redundant or irrelevant variables. AVAILABILITY: The MATLAB 7.4b code for the S-CART implementation is available for download from https://neyman.mcg.edu/posts/scart.zip.
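The abstract does not spell out the S-CART algorithm itself, but the "greedy sequential tree" idea it describes can be sketched as follows: repeatedly fit a shallow CART to the current variable set, record the variables actually used in splits, remove them from play, and refit. This is a hedged illustration only, not the authors' MATLAB implementation; the function name `s_cart_screen` and all parameter defaults are assumptions, and scikit-learn's `DecisionTreeClassifier` stands in for a generic CART fitter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def s_cart_screen(X, y, n_rounds=5, max_depth=3, random_state=0):
    """Sketch of a sequential-CART screen (hypothetical, not the authors' code).

    Each round fits a shallow CART on the remaining variables, harvests the
    variables used in its internal splits, and removes them before refitting.
    Returns the indices of all harvested variables, in order of selection.
    """
    selected = []
    active = np.arange(X.shape[1])          # variables still in play
    for _ in range(n_rounds):
        if active.size == 0:
            break
        tree = DecisionTreeClassifier(max_depth=max_depth,
                                      random_state=random_state)
        tree.fit(X[:, active], y)
        # internal nodes store the split feature; leaves are marked negative
        used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
        if used.size == 0:                   # tree degenerated to a stump/leaf
            break
        selected.extend(active[used].tolist())
        active = np.delete(active, used)     # greedy removal before the refit
    return selected

# Toy usage: one informative variable (column 0) among 50 noise columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = (X[:, 0] > 0).astype(int)
picked = s_cart_screen(X, y, n_rounds=3)
```

Because each round discards the variables just used, the procedure makes a fixed small number of tree fits regardless of p, which is one plausible reading of the scalability claim in the conclusions.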