A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition.

Feng Pan, Xiang Zhang, Wei Wang.

Abstract

Simultaneously clustering the columns and rows of a large data matrix (co-clustering) is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n (if not higher). This limits their applicability to data matrices with a large number of columns and rows. Moreover, existing co-clustering methods implicitly assume that the whole data matrix needs to be held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets using recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be in main memory. Extensive experimental results on synthetic and several well-known real-life datasets show that CRD achieves accuracy competitive with existing co-clustering methods at much lower computational cost.
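
The abstract does not spell out the CRD algorithm, so the following is only a minimal sketch of the general idea of co-clustering through a sampled decomposition, not the authors' method. It assumes length-squared sampling of columns and rows (a common choice in sampling-based decompositions) and swaps in a plain k-means as the clustering step. All function names and parameters (`length_squared_sample`, `kmeans`, `cocluster_by_sampling`, `n_sampled`) are hypothetical. Note that computing the sampling probabilities here takes one pass over the full matrix, which a real out-of-core implementation would stream.

```python
import numpy as np


def length_squared_sample(A, count, rng, axis):
    """Sample `count` columns (axis=1) or rows (axis=0) with replacement,
    with probability proportional to squared Euclidean norm, and rescale
    each sample by 1 / sqrt(count * probability)."""
    norms = np.sum(A * A, axis=0 if axis == 1 else 1)
    probs = norms / norms.sum()
    idx = rng.choice(A.shape[axis], size=count, replace=True, p=probs)
    scale = np.sqrt(count * probs[idx])
    if axis == 1:
        return A[:, idx] / scale
    return A[idx, :] / scale[:, None]


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation
    (an assumed stand-in for whatever clustering step is used)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def cocluster_by_sampling(A, k_rows, k_cols, n_sampled, seed=0):
    """Cluster rows using a few sampled columns, and columns using a few
    sampled rows, so clustering never touches the full m x n matrix."""
    rng = np.random.default_rng(seed)
    C = length_squared_sample(A, n_sampled, rng, axis=1)  # m x n_sampled
    R = length_squared_sample(A, n_sampled, rng, axis=0)  # n_sampled x n
    row_labels = kmeans(C, k_rows)
    col_labels = kmeans(R.T, k_cols)
    return row_labels, col_labels


# Usage on a small synthetic matrix with a 2 x 2 block structure.
noise = 0.1 * np.random.default_rng(1).standard_normal((60, 40))
A_demo = np.kron(np.array([[5.0, 0.0], [0.0, 5.0]]), np.ones((30, 20))) + noise
row_labels, col_labels = cocluster_by_sampling(A_demo, 2, 2, n_sampled=10)
```

In this sketch the clustering cost depends on m × n_sampled and n × n_sampled rather than on m × n, which illustrates why a sampled representation can scale linearly in m and n when the number of samples is fixed.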

Year:  2008        PMID: 20419039      PMCID: PMC2858408          DOI: 10.1109/ICDE.2008.4497548

Source DB:  PubMed          Journal:  Proc ACM SIGMOD Int Conf Manag Data        ISSN: 0730-8078


  2 in total

1.  Learning the parts of objects by non-negative matrix factorization.

Authors:  D D Lee; H S Seung
Journal:  Nature       Date:  1999-10-21       Impact factor: 49.962

2.  Classification of a large microarray data set: algorithm comparison and analysis of drug signatures.

Authors:  Georges Natsoulis; Laurent El Ghaoui; Gert R G Lanckriet; Alexander M Tolley; Fabrice Leroy; Shane Dunlea; Barrett P Eynon; Cecelia I Pearson; Stuart Tugendreich; Kurt Jarnagin
Journal:  Genome Res       Date:  2005-05       Impact factor: 9.043

