Kean Ming Tan, Daniela M Witten.
Abstract
We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log likelihood. We apply an ℓ1 penalty to the means of the biclusters in order to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for biclustering based on the matrix-variate normal distribution. The performance of our proposals is demonstrated in a simulation study and on a gene expression data set. This article has supplementary material online.
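As a rough illustration of the criterion described in the abstract: with normally distributed entries and a common variance, maximizing the log likelihood is equivalent to minimizing the squared error between each entry and the mean of its bicluster, and the ℓ1 penalty adds a multiple of the absolute bicluster means. The sketch below is not the authors' code. The function names, the integer label encoding, the tuning parameter `lam`, and the choice to show only the mean update for fixed row and column labels are all assumptions made for illustration. For a fixed bicluster with n entries and sample mean m, the penalized mean that minimizes this criterion is the soft-thresholded value sign(m) · max(|m| − lam/n, 0).

```python
import numpy as np


def soft_threshold(a, t):
    """Shrink a toward zero by t, setting it to zero if |a| <= t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def bicluster_means(X, row_labels, col_labels, K, R, lam):
    """Penalized bicluster means for FIXED row and column labels.

    Assumed (hypothetical) encoding: row_labels takes values in 0..K-1,
    col_labels takes values in 0..R-1.
    """
    mu = np.zeros((K, R))
    for k in range(K):
        for r in range(R):
            block = X[np.ix_(row_labels == k, col_labels == r)]
            if block.size:
                # Minimizer of 0.5 * sum((x - mu)^2) + lam * |mu| over the block
                mu[k, r] = soft_threshold(block.mean(), lam / block.size)
    return mu


def penalized_objective(X, row_labels, col_labels, mu, lam):
    """Squared-error fit plus an l1 penalty on the bicluster means."""
    fitted = mu[np.ix_(row_labels, col_labels)]  # mean of each entry's bicluster
    return 0.5 * np.sum((X - fitted) ** 2) + lam * np.abs(mu).sum()


# Toy example: one strong bicluster, everything else near zero.
X = np.full((4, 3), 0.1)
X[:2, 0] = 4.0
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 1])
mu = bicluster_means(X, rows, cols, K=2, R=2, lam=1.0)
obj = penalized_objective(X, rows, cols, mu, lam=1.0)
```

In this toy example the three weak biclusters are shrunk to exactly zero while the strong one is retained with a reduced mean, which is the sparsity the abstract refers to. A full procedure would also have to update the row and column labels; that step is not shown here.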
Keywords: Clustering; Gene expression; Matrix-variate normal distribution; Unsupervised learning; ℓ1 penalty
Year: 2014 PMID: 25364221 PMCID: PMC4212513 DOI: 10.1080/10618600.2013.852554
Source DB: PubMed Journal: J Comput Graph Stat ISSN: 1061-8600 Impact factor: 2.302