Genetic test bed for feature selection.

Ashish Choudhary, Marcel Brun, Jianping Hua, James Lowey, Ed Suh, Edward R Dougherty.

Abstract

MOTIVATION: Given a large set of potential features, such as the set of all gene-expression values from a microarray, it is necessary to find a small subset with which to classify. The task of finding an optimal feature set of a given size is inherently combinatorial because, to assure optimality, all feature sets of that size must be checked. Thus, numerous suboptimal feature-selection algorithms have been proposed. There are strong impediments to evaluating feature-selection algorithms using real data when data are limited, a common situation in genetic classification. The difficulty is compound. First, there are no class-conditional distributions from which to draw data points, only a single small labeled sample. Second, there are no test data with which to estimate the feature-set errors, and one must depend on a training-data-based error estimator. Finally, there is no optimal feature set with which to compare the feature sets found by the algorithms.
RESULTS: This paper describes a genetic test bed for the evaluation of feature-selection algorithms. It begins with a large biological feature-label dataset that is used as an empirical distribution and, using massively parallel computation, finds the top feature sets of various sizes based on a given sample size and classification rule. The user can draw random samples from the data, apply a proposed algorithm, and evaluate the proficiency of the proposed algorithm via three different measures (code provided). A key feature of the test bed is that, once a dataset is input, a single command creates the entire test bed relative to the dataset. The particular dataset used for the first version of the test bed comes from a microarray-based classification study that analyzes a large number of microarrays, prepared with RNA from breast tumor samples from each of 295 patients.
AVAILABILITY: The software and supplementary material are available at http://public.tgen.org/tgen-cb/support/testbed/
CONTACT: edward@ece.tamu.edu
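
The workflow outlined in the abstract (treat a large labeled dataset as an empirical distribution, exhaustively score all feature sets of a given size, then compare a proposed algorithm's choice against the best set) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' software. Everything specific in it is an assumption: the synthetic data standing in for the patient dataset, the nearest-mean classification rule, the toy ranking filter `rank_by_mean_difference`, the balanced sampling in `draw_sample`, and the single comparison (difference in mean error) used in place of the three measures, which the abstract does not specify.

```python
import itertools
import math

import numpy as np


def nearest_mean_error(X_tr, y_tr, X_te, y_te, features):
    """Design a nearest-mean classifier on the sample; return its error on the held-out points."""
    cols = list(features)
    m0 = X_tr[y_tr == 0][:, cols].mean(axis=0)
    m1 = X_tr[y_tr == 1][:, cols].mean(axis=0)
    d0 = ((X_te[:, cols] - m0) ** 2).sum(axis=1)
    d1 = ((X_te[:, cols] - m1) ** 2).sum(axis=1)
    return float(((d1 < d0).astype(int) != y_te).mean())


def rank_by_mean_difference(X, y, k):
    """Hypothetical 'proposed algorithm': keep the k features with the largest scaled mean gap."""
    gap = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
    score = gap / (X.std(axis=0) + 1e-12)
    return tuple(sorted(np.argsort(score)[-k:].tolist()))


def draw_sample(y, n_per_class, rng):
    """Draw a balanced random sample (an assumption); the remaining points act as test data."""
    idx0 = rng.choice(np.flatnonzero(y == 0), n_per_class, replace=False)
    idx1 = rng.choice(np.flatnonzero(y == 1), n_per_class, replace=False)
    train = np.concatenate([idx0, idx1])
    mask = np.ones(len(y), dtype=bool)
    mask[train] = False
    return train, np.flatnonzero(mask)


def evaluate(X, y, select, k, n_per_class, n_trials, rng):
    """Score every size-k feature set over repeated samples and compare with `select`."""
    subsets = list(itertools.combinations(range(X.shape[1]), k))
    index = {s: j for j, s in enumerate(subsets)}
    errors = np.empty((n_trials, len(subsets)))
    chosen = np.empty(n_trials)
    for t in range(n_trials):
        tr, te = draw_sample(y, n_per_class, rng)
        for j, s in enumerate(subsets):
            errors[t, j] = nearest_mean_error(X[tr], y[tr], X[te], y[te], s)
        chosen[t] = errors[t, index[select(X[tr], y[tr], k)]]
    mean_err = errors.mean(axis=0)
    best = int(mean_err.argmin())
    return {
        "n_subsets": len(subsets),
        "best_subset": subsets[best],
        "best_error": float(mean_err[best]),
        "selected_error": float(chosen.mean()),
    }


# Synthetic stand-in data: 200 points, 8 features, only features 0 and 1 informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 8))
X[y == 1, :2] += 2.0

result = evaluate(X, y, rank_by_mean_difference, k=2, n_per_class=15, n_trials=20, rng=rng)
print(result)
print("feature sets of size 2 among 8 features:", math.comb(8, 2))
```

The exhaustive loop over `subsets` is what makes the problem combinatorial: the number of candidate sets is the binomial coefficient C(d, k), which grows far too quickly for thousands of genes without the kind of massively parallel computation the abstract mentions.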

Year:  2006        PMID: 16428263     DOI: 10.1093/bioinformatics/btl008

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  A hybrid BPSO-CGA approach for gene selection and classification of microarray data.

Authors:  Li-Yeh Chuang; Cheng-Huei Yang; Jung-Chike Li; Cheng-Hong Yang
Journal:  J Comput Biol       Date:  2011-01-06       Impact factor: 1.479

2.  Quantification of the impact of feature selection on the variance of cross-validation error estimation.

Authors:  Yufei Xiao; Jianping Hua; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

3.  Validation of computational methods in genomics.

Authors:  Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal:  Curr Genomics       Date:  2007-03       Impact factor: 2.236

4.  MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets.

Authors:  Bracken M King; Bruce Tidor
Journal:  Bioinformatics       Date:  2009-03-04       Impact factor: 6.937

5.  Which is better: holdout or full-sample classifier design?

Authors:  Marcel Brun; Qian Xu; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2008

6.  Performance of feature selection methods.

Authors:  Edward R Dougherty; Jianping Hua; Chao Sima
Journal:  Curr Genomics       Date:  2009-09       Impact factor: 2.236

7.  Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer.

Authors:  Jing Wang; Kim Anh Do; Sijin Wen; Spyros Tsavachidis; Timothy J McDonnell; Christopher J Logothetis; Kevin R Coombes
Journal:  Cancer Inform       Date:  2007-02-14

8.  An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

Authors:  Zena M Hira; George Trigeorgis; Duncan F Gillies
Journal:  PLoS One       Date:  2014-03-03       Impact factor: 3.240

9.  Gene selection for cancer classification with the help of bees.

Authors:  Johra Muhammad Moosa; Rameen Shakur; Mohammad Kaykobad; Mohammad Sohel Rahman
Journal:  BMC Med Genomics       Date:  2016-08-10       Impact factor: 3.063
