Abstract
In this paper, we perform diagnostic pattern recognition on gene-expression profile data by using one-class classification. Unlike conventional multiclass classifiers, the one-class (OC) classifier is built on one class only. Ideally, it accepts samples coming from the class used for training and rejects all samples from other classes. We evaluate six OC classifiers: the Gaussian model, Parzen windows, support vector data description (with two types of kernels: inner product and Gaussian), nearest neighbor data description, K-means, and PCA. They are tested on three gene-expression profile data sets: an SRBCT data set, a Colon data set, and a Leukemia data set. Provided that the training and test samples are split well and features are selected appropriately, most OC classifiers can produce high-quality results. Parzen windows and support vector data description are "over-strict" in most cases, while nearest neighbor data description is "over-loose". The other classifiers fall between these two extremes. The main difficulty for the OC classifier is obtaining an optimum decision threshold when only a limited number of training samples is available.
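The accept/reject behavior and the threshold problem described above can be illustrated with a minimal sketch of a Gaussian-model OC classifier. This is not the paper's implementation: the class name `GaussianOneClass`, the parameters `reject_fraction` and `ridge`, and the synthetic data are all assumptions made for illustration. The threshold is set as a quantile of the training distances, which is one common choice and becomes unreliable when few training samples are available.

```python
import numpy as np


class GaussianOneClass:
    """Hypothetical one-class classifier: fit a Gaussian to the target class only."""

    def __init__(self, reject_fraction=0.05, ridge=1e-3):
        self.reject_fraction = reject_fraction  # share of training samples allowed to be rejected
        self.ridge = ridge  # small diagonal term to keep the covariance invertible

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        cov = np.atleast_2d(np.cov(X, rowvar=False))
        cov = cov + self.ridge * np.eye(X.shape[1])
        self.precision_ = np.linalg.inv(cov)
        # Decision threshold: a quantile of the training distances.
        self.threshold_ = np.quantile(self.distance(X), 1.0 - self.reject_fraction)
        return self

    def distance(self, X):
        # Squared Mahalanobis distance of each row to the target-class mean.
        diff = np.asarray(X, dtype=float) - self.mean_
        return np.einsum("ij,jk,ik->i", diff, self.precision_, diff)

    def predict(self, X):
        # True = accepted as target class, False = rejected as outlier.
        return self.distance(X) <= self.threshold_


# Synthetic stand-in data (an assumption, not gene-expression profiles).
rng = np.random.default_rng(0)
target_train = rng.normal(0.0, 1.0, size=(40, 5))
target_test = rng.normal(0.0, 1.0, size=(20, 5))
outliers = rng.normal(6.0, 1.0, size=(20, 5))

clf = GaussianOneClass(reject_fraction=0.05).fit(target_train)
accept_rate_train = clf.predict(target_train).mean()
accept_rate_target = clf.predict(target_test).mean()
accept_rate_outlier = clf.predict(outliers).mean()
print(accept_rate_train, accept_rate_target, accept_rate_outlier)
```

In this sketch, raising `reject_fraction` gives a tighter boundary (closer to the "over-strict" behavior), while lowering it gives a looser one.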
Year: 2005 PMID: 16180916 DOI: 10.1021/ci049726v
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956