Wengang Zhou, Julie A Dickerson.
Abstract
Identifying key biomarkers for different cancer types can improve diagnostic accuracy and treatment. Gene expression data can help differentiate between cancer subtypes. However, a dataset typically contains far fewer samples than genes, which leads to overfitting of classification models. Feature selection methods can help select the most distinguishing feature sets for classifying different cancers. A new class-dependent feature selection approach integrates the F-statistic, Maximum Relevance Binary Particle Swarm Optimization (MRBPSO), and a Class Dependent Multi-category Classification (CDMC) system. This feature selection method combines filter-based and wrapper-based methods. For each dataset, a set of highly differentially expressed genes (features) is pre-selected using the F-statistic, which acts as a filter for the most meaningful features. MRBPSO and CDMC then function as a wrapper that selects a desirable feature subset for each class and classifies the samples using those class-dependent feature subsets. The performance of the proposed methods is evaluated on eight real cancer datasets. The results indicate that the class-dependent approaches can effectively identify biomarkers related to each cancer type and improve classification accuracy compared to class-independent feature selection methods.
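The filter stage described in the abstract can be illustrated with a minimal sketch. It assumes the standard one-way ANOVA F-statistic (between-class variance over within-class variance) computed per gene, followed by keeping the top-ranked genes. The abstract does not state the exact F-statistic variant or how many genes are retained, so the function names `f_statistic` and `top_k_genes`, the `k` parameter, and the data layout are hypothetical choices for illustration only. The MRBPSO and CDMC wrapper stages are not shown.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for a single gene.

    values: expression levels, one per sample.
    labels: class label of each sample (same order as values).
    """
    # Group the expression values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples, n_classes = len(values), len(groups)
    if n_classes < 2 or n_samples <= n_classes:
        raise ValueError("need at least 2 classes and more samples than classes")

    grand_mean = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        # Perfect separation (or a constant gene): avoid dividing by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (n_classes - 1)) / (ss_within / (n_samples - n_classes))


def top_k_genes(expression, labels, k):
    """Return indices of the k genes with the largest F-statistic.

    expression: list of genes, each a list of per-sample values
    (an assumed layout, not taken from the paper).
    """
    scores = [f_statistic(gene, labels) for gene in expression]
    ranked = sorted(range(len(expression)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In a pipeline of the kind the abstract outlines, the indices returned by `top_k_genes` would define the reduced gene pool that a wrapper method then searches for class-specific subsets.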
Keywords: Binary particle swarm optimization; Cancer biomarker discovery; Class dependent multi-category classification; Feature selection; Support vector machine
Year: 2014 PMID: 24561345 DOI: 10.1016/j.compbiomed.2014.01.014
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589