Alan R Dabney. Department of Biostatistics, University of Washington, Seattle, 98195, USA. adabney@u.washington.edu
Abstract
MOTIVATION: Classification of biological samples by microarrays is a topic of much interest. A number of methods have been proposed and successfully applied to this problem. It has recently been shown that classification by nearest centroids provides an accurate predictor that may outperform much more complicated methods. The 'Prediction Analysis of Microarrays' (PAM) approach is one such example, which its authors motivate strongly by its simplicity and interpretability. In this spirit, I seek to assess the performance of classifiers even simpler than PAM. RESULTS: Surprisingly, I show that the modified t-statistics and shrunken centroids employed by PAM tend to increase misclassification error when compared with their simpler counterparts. Based on these observations, I propose a classification method called 'Classification to Nearest Centroids' (ClaNC). ClaNC ranks genes by standard t-statistics, does not shrink centroids and uses a class-specific gene-selection procedure. Because of these modifications, ClaNC is arguably simpler and easier to interpret than PAM, and it can be viewed as a traditional nearest centroid classifier that uses specially selected genes. I demonstrate that ClaNC error rates tend to be significantly lower than those of PAM for a given number of active genes. AVAILABILITY: Point-and-click software is freely available at http://students.washington.edu/adabney/clanc.
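The three ingredients named in the abstract (ranking genes by standard t-statistics, unshrunken centroids, and class-specific gene selection) can be illustrated with a minimal sketch. This is not the ClaNC software itself, and several details are assumptions that the abstract does not specify: the t-statistic compares each class centroid with the overall centroid using a pooled within-class standard deviation, each class simply keeps its top `n_per_class` genes, and samples are assigned by standardized squared distance over the active genes with equal class priors. The function names `fit_centroids`, `select_genes` and `classify` are hypothetical. The sketch needs at least two classes and more samples than classes.

```python
import math

def fit_centroids(X, y):
    """Return per-class centroids, class sizes, and pooled within-class SDs.

    X is a list of samples (each a list of gene expression values),
    y is the list of class labels. Centroids are plain class means
    (no shrinkage).
    """
    n_genes = len(X[0])
    classes = sorted(set(y))
    sizes = {c: sum(1 for label in y if label == c) for c in classes}
    centroids = {
        c: [sum(x[j] for x, label in zip(X, y) if label == c) / sizes[c]
            for j in range(n_genes)]
        for c in classes
    }
    # Pooled within-class variance with n - K degrees of freedom.
    dof = len(X) - len(classes)
    sds = []
    for j in range(n_genes):
        ss = sum((x[j] - centroids[label][j]) ** 2 for x, label in zip(X, y))
        sds.append(math.sqrt(ss / dof))
    return centroids, sizes, sds

def select_genes(X, y, n_per_class):
    """Pick, for each class, the genes with the largest |t| (assumed form)."""
    centroids, sizes, sds = fit_centroids(X, y)
    n = len(X)
    n_genes = len(X[0])
    overall = [sum(x[j] for x in X) / n for j in range(n_genes)]
    selected = {}
    for c, centroid in centroids.items():
        # Standard-error factor for (class mean - overall mean).
        scale = math.sqrt(1.0 / sizes[c] - 1.0 / n)
        # Genes with zero pooled SD are skipped (assumption of this sketch).
        t_abs = {j: abs(centroid[j] - overall[j]) / (scale * sds[j])
                 for j in range(n_genes) if sds[j] > 0}
        ranked = sorted(t_abs, key=t_abs.get, reverse=True)
        selected[c] = sorted(ranked[:n_per_class])
    return selected

def classify(x, centroids, sds, active):
    """Assign x to the class with the nearest centroid over the active genes."""
    def dist(c):
        return sum(((x[j] - centroids[c][j]) / sds[j]) ** 2 for j in active)
    return min(centroids, key=dist)
```

To use the sketch, call `fit_centroids` and `select_genes` on the training data, take the union of the per-class gene lists as the active genes, and pass that list to `classify` for each new sample.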