What should be expected from feature selection in small-sample settings.

Chao Sima, Edward R Dougherty.

Abstract

MOTIVATION: High-throughput technologies for rapid measurement of vast numbers of biological variables offer the potential for highly discriminatory diagnosis and prognosis; however, high dimensionality together with small samples creates the need for feature selection, while at the same time making feature-selection algorithms less reliable. Feature selection must typically be carried out from among thousands of gene-expression features and in the context of a small sample (small number of microarrays). Two basic questions arise: (1) Can one expect feature selection to yield a feature set whose error is close to that of an optimal feature set? (2) If a good feature set is not found, should it be expected that good feature sets do not exist?
RESULTS: The two questions translate quantitatively into questions concerning conditional expectation. (1) Given the error of an optimal feature set, what is the conditionally expected error of the selected feature set? (2) Given the error of the selected feature set, what is the conditionally expected error of the optimal feature set? We address these questions using three classification rules (linear discriminant analysis, linear support vector machine and k-nearest-neighbor classification) and feature selection via sequential floating forward search and the t-test. We consider three feature-label models and patient data from a study concerning survival prognosis for breast cancer. With regard to the two focus questions, there is similarity across all experiments: (1) One cannot expect to find a feature set whose error is close to optimal, and (2) the inability to find a good feature set should not lead to the conclusion that good feature sets do not exist. In practice, the latter conclusion may be more immediately relevant, since when faced with the common occurrence that a feature set discovered from the data does not give satisfactory results, the experimenter can draw no conclusions regarding the existence or nonexistence of suitable feature sets.
AVAILABILITY: http://ee.tamu.edu/~edward/feature_regression/
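
The two conditional-expectation questions can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' experimental code: it assumes an independent unit-variance Gaussian feature-label model, uses t-test ranking for selection, and swaps in a nearest-centroid rule (a simplified stand-in for linear discriminant analysis) so that the true error of any designed classifier can be computed exactly. All function names and parameter values (`simulate`, `delta`, `d`, `k`, and so on) are hypothetical choices made for illustration. Here the "best" feature set is found by exhaustive search over all size-`k` subsets, which is only feasible because `d` is kept small.

```python
import itertools
import math
import random


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_sample(rng, n, mu):
    """Draw n points from N(mu, I): independent unit-variance features."""
    return [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]


def column_stats(rows):
    """Per-feature sample mean and unbiased sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    variances = [sum((v - m) ** 2 for v in col) / (n - 1)
                 for col, m in zip(cols, means)]
    return means, variances


def t_test_select(x0, x1, k):
    """Indices of the k features with the largest absolute Welch t-statistic."""
    m0, v0 = column_stats(x0)
    m1, v1 = column_stats(x1)
    t = [abs(a - b) / math.sqrt(va / len(x0) + vb / len(x1))
         for a, b, va, vb in zip(m0, m1, v0, v1)]
    top = sorted(range(len(t)), key=lambda j: -t[j])[:k]
    return tuple(sorted(top))


def true_error(subset, m0, m1, mu0, mu1):
    """Exact model error of the nearest-centroid rule designed on `subset`.

    The rule assigns class 1 when w.x > thr, with w = m1 - m0 (sample means).
    Under N(mu, I), w.x is Gaussian with mean w.mu and std ||w||.
    """
    w = [m1[j] - m0[j] for j in subset]
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0.0:
        return 0.5
    thr = sum(c * (m0[j] + m1[j]) / 2.0 for c, j in zip(w, subset))
    p0 = sum(c * mu0[j] for c, j in zip(w, subset))
    p1 = sum(c * mu1[j] for c, j in zip(w, subset))
    # Equal class priors: average the two class-conditional error rates.
    return 0.5 * (1.0 - phi((thr - p0) / norm)) + 0.5 * phi((thr - p1) / norm)


def simulate(trials=200, n=10, d=10, k=3, delta=0.8, seed=0):
    """Return (selected_error, best_error) pairs, one per simulated sample."""
    rng = random.Random(seed)
    mu0 = [0.0] * d
    mu1 = [delta if j < k else 0.0 for j in range(d)]  # first k features informative
    pairs = []
    for _ in range(trials):
        x0, x1 = draw_sample(rng, n, mu0), draw_sample(rng, n, mu1)
        m0, _ = column_stats(x0)
        m1, _ = column_stats(x1)
        selected = true_error(t_test_select(x0, x1, k), m0, m1, mu0, mu1)
        best = min(true_error(s, m0, m1, mu0, mu1)
                   for s in itertools.combinations(range(d), k))
        pairs.append((selected, best))
    return pairs


def conditional_mean(pairs, lo, hi, given="best"):
    """Mean of one error among trials where the other lies in [lo, hi)."""
    if given == "best":
        vals = [s for s, b in pairs if lo <= b < hi]
    else:
        vals = [b for s, b in pairs if lo <= s < hi]
    return sum(vals) / len(vals) if vals else None


if __name__ == "__main__":
    pairs = simulate()
    print("mean selected error:", sum(s for s, _ in pairs) / len(pairs))
    print("mean best error:    ", sum(b for _, b in pairs) / len(pairs))
    print("E[selected | best in [0.10, 0.20)]:", conditional_mean(pairs, 0.10, 0.20))
```

Binning the pairs with `conditional_mean` gives a rough empirical estimate of each conditional expectation; with `given="selected"` it approximates the second question instead of the first. The bin edges are arbitrary, and a bin with no trials returns `None`.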

Year: 2006    PMID: 16870934    DOI: 10.1093/bioinformatics/btl407

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


