David R Bickel. Medical College of Georgia, Office of Biostatistics and Bioinformatics, 1120 Fifteenth St, AE-3037, Augusta 30912-4900, USA. dbickel@mail.mcg.edu
Abstract
MOTIVATION: The success of each method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive methods of clustering genes are robust against violations of model assumptions.
RESULTS: A measure of dissimilarity that combines advantages of the Euclidean distance and the correlation coefficient is introduced. The measure can be made robust by using a rank-order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to a public data set, showing that rank-based methods perform better than log-based methods.
AVAILABILITY: Software is available from http://www.davidbickel.com.
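The abstract does not give the formula for the proposed dissimilarity measure, so the following is only a minimal sketch of its rank-based ingredient, not the author's method. It assumes the common choice of one minus the Spearman rank-order correlation as the dissimilarity between two expression profiles. The function names (`average_ranks`, `pearson`, `spearman`, `rank_dissimilarity`) and the example profiles are hypothetical. The sketch shows why ranks resist outliers: a single extreme value lowers the ordinary correlation but leaves the rank-order correlation unchanged.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Rank-order correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_dissimilarity(x, y):
    """Assumed dissimilarity: 0 for identical orderings, 2 for reversed ones."""
    return 1.0 - spearman(x, y)


# Hypothetical expression profiles: same ordering, one outlying value in gene_b.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 2.3, 2.9, 4.2, 50.0]

print("Pearson correlation:", pearson(gene_a, gene_b))
print("Rank dissimilarity:", rank_dissimilarity(gene_a, gene_b))
```

Because the two profiles have the same ordering, the rank dissimilarity is zero even though the outlier pulls the ordinary correlation well below one. A measure that also reflects differences in magnitude, as the Euclidean distance does, would need an additional term that this sketch leaves out.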