Optimal number of features as a function of sample size for various classification rules.

Jianping Hua, Zixiang Xiong, James Lowey, Edward Suh, Edward R Dougherty.

Abstract

MOTIVATION: Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features.
RESULTS: Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification.
AVAILABILITY: For the companion website, please visit http://public.tgen.org/tamu/ofs/
CONTACT: e-dougherty@ee.tamu.edu
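
The peaking behavior described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not reproduce their experiments. It assumes a two-class Gaussian model with a blocked covariance matrix and applies linear discriminant analysis only. The block size, correlation, mean shifts, sample sizes, and every function name are hypothetical choices made for illustration.

```python
import numpy as np


def blocked_covariance(n_features, block_size=4, rho=0.5):
    """Unit-variance covariance: correlation rho inside a block, 0 across blocks."""
    cov = np.zeros((n_features, n_features))
    for start in range(0, n_features, block_size):
        stop = min(start + block_size, n_features)
        cov[start:stop, start:stop] = rho
    np.fill_diagonal(cov, 1.0)
    return cov


def sample_classes(rng, n_per_class, mean, chol):
    """Draw n_per_class points from N(-mean, cov) and from N(+mean, cov)."""
    d = mean.size
    x0 = rng.standard_normal((n_per_class, d)) @ chol.T - mean
    x1 = rng.standard_normal((n_per_class, d)) @ chol.T + mean
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)


def lda_fit(x, y):
    """Linear discriminant analysis with a pooled sample covariance."""
    m0, m1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    centered = np.vstack([x[y == 0] - m0, x[y == 1] - m1])
    pooled = centered.T @ centered / (len(x) - 2)
    # The pseudo-inverse keeps the sketch usable if the estimate is singular.
    w = np.linalg.pinv(pooled) @ (m1 - m0)
    b = -w @ (m0 + m1) / 2.0
    return w, b


def error_curve(n_train_per_class, feature_counts,
                n_test_per_class=2000, n_reps=50, seed=0):
    """Mean test error of LDA for each feature count at a fixed sample size."""
    rng = np.random.default_rng(seed)
    max_d = max(feature_counts)
    cov = blocked_covariance(max_d)
    # Assumption: later features are less informative (decaying mean shift).
    mean = 0.5 / np.sqrt(np.arange(1, max_d + 1))
    errors = []
    for d in feature_counts:
        chol = np.linalg.cholesky(cov[:d, :d])
        rep_errors = []
        for _ in range(n_reps):
            x_tr, y_tr = sample_classes(rng, n_train_per_class, mean[:d], chol)
            x_te, y_te = sample_classes(rng, n_test_per_class, mean[:d], chol)
            w, b = lda_fit(x_tr, y_tr)
            pred = (x_te @ w + b > 0).astype(int)
            rep_errors.append(np.mean(pred != y_te))
        errors.append(float(np.mean(rep_errors)))
    return errors


if __name__ == "__main__":
    counts = [1, 2, 4, 8, 16]
    for d, err in zip(counts, error_curve(10, counts)):
        print(f"features={d:2d}  mean test error={err:.3f}")
```

Running the script prints one estimated error per feature count for 10 training points per class. Repeating the call over a grid of sample sizes would give a coarse error surface of the kind the abstract refers to, under these assumed settings only.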

Substances: DNA

Year: 2004 PMID: 15572470 DOI: 10.1093/bioinformatics/bti171

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

75 in total

1. A hybrid BPSO-CGA approach for gene selection and classification of microarray data.

Authors: Li-Yeh Chuang; Cheng-Huei Yang; Jung-Chike Li; Cheng-Hong Yang
Journal: J Comput Biol Date: 2011-01-06 Impact factor: 1.479

2. Using Low-Frequency Oscillations to Detect Temporal Lobe Epilepsy with Machine Learning.

Authors: Gyujoon Hwang; Veena A Nair; Jed Mathis; Cole J Cook; Rosaleena Mohanty; Gengyan Zhao; Neelima Tellapragada; Candida Ustine; Onyekachi O Nwoke; Charlene Rivera-Bonet; Megan Rozman; Linda Allen; Courtney Forseth; Dace N Almane; Peter Kraegel; Andrew Nencka; Elizabeth Felton; Aaron F Struck; Rasmus Birn; Rama Maganti; Lisa L Conant; Colin J Humphries; Bruce Hermann; Manoj Raghavan; Edgar A DeYoe; Jeffrey R Binder; Elizabeth Meyerand; Vivek Prabhakaran
Journal: Brain Connect Date: 2019-03

3. Radiomics robustness assessment and classification evaluation: A two-stage method demonstrated on multivendor FFDM.

Authors: Kayla Robinson; Hui Li; Li Lan; David Schacht; Maryellen Giger
Journal: Med Phys Date: 2019-03-12 Impact factor: 4.071

4. Decorrelation of the true and estimated classifier errors in high-dimensional settings.

Authors: Blaise Hanczar; Jianping Hua; Edward R Dougherty
Journal: EURASIP J Bioinform Syst Biol Date: 2007

5. Validation of computational methods in genomics.

Authors: Edward R Dougherty; Jianping Hua; Michael L Bittner
Journal: Curr Genomics Date: 2007-03 Impact factor: 2.236

6. High-dimensional bolstered error estimation.

Authors: Chao Sima; Ulisses M Braga-Neto; Edward R Dougherty
Journal: Bioinformatics Date: 2011-09-13 Impact factor: 6.937

7. Robust clinical marker identification for diabetic kidney disease with ensemble feature selection.

Authors: Xing Song; Lemuel R Waitman; Yong Hu; Alan S L Yu; David C Robbins; Mei Liu
Journal: J Am Med Inform Assoc Date: 2019-03-01 Impact factor: 4.497

8. Prediction of high proliferative index in pituitary macroadenomas using MRI-based radiomics and machine learning.

Authors: Lorenzo Ugga; Renato Cuocolo; Domenico Solari; Elia Guadagno; Alessandra D'Amico; Teresa Somma; Paolo Cappabianca; Maria Laura Del Basso de Caro; Luigi Maria Cavallo; Arturo Brunetti
Journal: Neuroradiology Date: 2019-08-02 Impact factor: 2.804

9. Transfer Learning From Convolutional Neural Networks for Computer-Aided Diagnosis: A Comparison of Digital Breast Tomosynthesis and Full-Field Digital Mammography.

Authors: Kayla Mendel; Hui Li; Deepa Sheth; Maryellen Giger
Journal: Acad Radiol Date: 2018-08-01 Impact factor: 3.173

10. Performance of feature selection methods.

Authors: Edward R Dougherty; Jianping Hua; Chao Sima
Journal: Curr Genomics Date: 2009-09 Impact factor: 2.236