The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms.

Xin Zhou, K Z Mao.

Abstract

MOTIVATION: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in the microarray data analysis literature. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. However, because microarray data are high-dimensional and have small sample sizes, counting-based error estimation may not be an ideal criterion for the gene selection problem.
RESULTS: Our study reveals that evaluating genes with counting-based error estimators, such as resubstitution error, leave-one-out error, cross-validation error and bootstrap error, may encounter a severe ties problem, i.e. two or more gene subsets score equally, which in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as the generalised |w|^2 measure for support vector machines and the modified Relief's measure for k-nearest neighbors, produce improved gene selection compared with counting-based error estimators.
AVAILABILITY: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/ The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of experiments.
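The ties problem described above can be illustrated with a minimal sketch. It assumes synthetic Gaussian data, a simple nearest-class-mean classifier applied to one gene at a time, and a standardized mean difference as a stand-in continuous criterion. None of these come from the paper; in particular, the continuous score here is not the generalised |w|^2 measure or the modified Relief's measure, and the function names are hypothetical. The point is only that a leave-one-out error count on n samples can take at most n + 1 distinct values, so many genes must share the same score.

```python
import numpy as np


def loo_error_counts(X, y):
    """Leave-one-out misclassification counts, one per gene (column of X).

    Illustrative classifier (an assumption, not the paper's): assign the
    held-out sample to the class whose training mean is nearer.
    """
    n, p = X.shape
    errors = np.zeros(p, dtype=int)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        X_train, y_train = X[mask], y[mask]
        mu0 = X_train[y_train == 0].mean(axis=0)
        mu1 = X_train[y_train == 1].mean(axis=0)
        pred = (np.abs(X[i] - mu1) < np.abs(X[i] - mu0)).astype(int)
        errors += (pred != y[i])
    return errors


def continuous_score(X, y):
    """Stand-in continuous criterion: standardized absolute mean difference."""
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))


# Synthetic "microarray": few samples, many genes (all values are made up).
rng = np.random.default_rng(0)
n, p = 20, 500
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :25] += 1.5  # make the first 25 genes informative

err = loo_error_counts(X, y)
score = continuous_score(X, y)

n_distinct_err = np.unique(err).size      # bounded by n + 1
n_tied_best = int((err == err.min()).sum())
largest_tie = int(np.bincount(err).max())
n_distinct_score = np.unique(score).size  # typically one value per gene

print("distinct LOO error values:", n_distinct_err)
print("genes tied at the lowest LOO error:", n_tied_best)
print("largest group of tied genes:", largest_tie)
print("distinct continuous scores:", n_distinct_score)
```

With 500 genes and at most 21 possible error counts, the pigeonhole principle forces at least one group of 24 or more tied genes, whereas the continuous score can rank the genes without such forced ties.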

Year:  2006        PMID: 16908500     DOI: 10.1093/bioinformatics/btl438

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Decorrelation of the true and estimated classifier errors in high-dimensional settings.

Authors:  Blaise Hanczar; Jianping Hua; Edward R Dougherty
Journal:  EURASIP J Bioinform Syst Biol       Date:  2007

2.  Validation of computational methods in genomics.

Authors:  Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal:  Curr Genomics       Date:  2007-03       Impact factor: 2.236

3.  Predicting antitumor activity of peptides by consensus of regression models trained on a small data sample.

Authors:  Andreja Radman; Matija Gredičak; Ivica Kopriva; Ivanka Jerić
Journal:  Int J Mol Sci       Date:  2011-11-29       Impact factor: 5.923

4.  Analyzing kernel matrices for the identification of differentially expressed genes.

Authors:  Xiao-Lei Xia; Huanlai Xing; Xueqin Liu
Journal:  PLoS One       Date:  2013-12-09       Impact factor: 3.240

5.  Effective classification and gene expression profiling for the Facioscapulohumeral Muscular Dystrophy.

Authors:  Félix F González-Navarro; Lluís A Belanche-Muñoz; Karen A Silva-Colón
Journal:  PLoS One       Date:  2013-12-13       Impact factor: 3.240
