Jin Cao, Li Zhang, Bangjun Wang, Fanzhang Li, Jiwen Yang.
Abstract
For cancer classification problems based on gene expression, the data usually contain only a few dozen samples but thousands to tens of thousands of genes, many of which may be irrelevant. A robust feature selection algorithm is required to remove the irrelevant genes and keep the informative ones. Support vector data description (SVDD) has been applied to gene selection for many years. However, SVDD cannot handle problems with multiple classes, since it considers only the target class. In addition, applying SVDD to gene selection is time-consuming. This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data. A recursive feature elimination (RFE) scheme is introduced to iteratively remove irrelevant features, so the proposed method is called multiple SVDD-RFE (MSVDD-RFE). To make full use of all classes for a given task, MSVDD-RFE independently selects a relevant gene subset for each class. The final selected gene subset is the union of these relevant gene subsets. The effectiveness and accuracy of MSVDD-RFE are validated by experiments on five publicly available microarray datasets. In these experiments, the proposed method is faster and more effective than the compared methods.
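The overall scheme described in the abstract (one recursive elimination per class, then the union of the per-class subsets) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the SVDD-based ranking criterion, so the sketch swaps in a hypothetical stand-in, `centroid_spread_scores`, which scores each feature by its mean squared deviation from the class centroid. All function names and the toy data are assumptions made for illustration.

```python
from statistics import mean


def centroid_spread_scores(rows, features):
    """Hypothetical stand-in for an SVDD-based criterion (an assumption, not
    the paper's): mean squared deviation of each feature from the class
    centroid. A larger score means the feature is less compact in the class."""
    scores = {}
    for j in features:
        center = mean(r[j] for r in rows)
        scores[j] = mean((r[j] - center) ** 2 for r in rows)
    return scores


def rfe_for_class(rows, n_keep, score_fn=centroid_spread_scores):
    """Recursive feature elimination on the samples of one class:
    re-score the surviving features, drop the worst, repeat."""
    features = list(range(len(rows[0])))
    while len(features) > n_keep:
        scores = score_fn(rows, features)  # re-evaluated after every removal
        worst = max(features, key=lambda j: scores[j])
        features.remove(worst)
    return set(features)


def multi_class_rfe(X, y, n_keep):
    """Select a feature subset independently for each class and
    return the union of those subsets."""
    selected = set()
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        selected |= rfe_for_class(rows, n_keep)
    return selected


# Toy data: 4 samples, 3 "genes", 2 classes (values are made up).
X = [
    [1.0, 5.0, 0.0],
    [1.1, 9.0, 4.0],
    [3.0, 2.0, 7.0],
    [8.0, 2.1, 7.2],
]
y = ["A", "A", "B", "B"]

print(sorted(multi_class_rfe(X, y, n_keep=1)))  # [0, 1]
```

In this toy run, class A keeps feature 0 and class B keeps feature 1, so the union contains both. With the stand-in criterion the per-feature scores do not change between iterations; with a kernel-based model such as SVDD they generally would, which is why the loop re-scores after each removal.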
Keywords: Gene expression data; Gene selection; Multi-class classification; Support vector data description; Support vector machine
Year: 2014 PMID: 25549938 DOI: 10.1016/j.jbi.2014.12.009
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317