
CRD: Fast Co-clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition.

Feng Pan, Xiang Zhang, Wei Wang.

Abstract

The problem of simultaneously clustering the rows and columns of a data matrix (co-clustering) arises in important applications such as text data mining, microarray analysis, and recommendation system analysis. Compared with classical clustering algorithms, co-clustering algorithms have been shown to be more effective at discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m × n), where m and n are the numbers of rows and columns in the data matrix, respectively. This limits their applicability to data matrices with large numbers of rows and columns. Moreover, some huge datasets cannot be held entirely in main memory during co-clustering, which violates an assumption made by previous algorithms. In this paper, we propose CRD, a general framework for fast co-clustering of large datasets. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. CRD also does not require the whole data matrix to be held in main memory. We conducted extensive experiments on both real and synthetic data. Compared with previous co-clustering algorithms, CRD achieves competitive accuracy at a much lower computational cost.
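The abstract does not spell out the decomposition step, so the following is only a rough illustration of the general idea behind sampling-based matrix decompositions of the CUR/CX family: draw a few columns and rows with probability proportional to their squared norms, then rescale them. This is an assumption about the flavor of technique involved, not CRD itself. The function names (`streaming_squared_norms`, `to_probabilities`, `sample_columns_and_rows`) and the parameters `c`, `r`, and `seed` are hypothetical, and the co-clustering step that CRD performs on top of the decomposition is not shown. The sketch does illustrate two properties the abstract relies on: the sampling weights can be gathered in one pass over the rows, and the sampled matrices C (m × c) and R (r × n) grow only linearly in m and n for fixed c and r.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple

Matrix = List[List[float]]


def streaming_squared_norms(
    rows: Iterable[Sequence[float]], n_cols: int
) -> Tuple[List[float], List[float]]:
    """Accumulate squared row and column norms in a single pass, one row at a time.

    Because rows are consumed from an iterator, this step could read them from
    disk instead of keeping the whole matrix in memory.
    """
    row_sq: List[float] = []
    col_sq = [0.0] * n_cols
    for row in rows:
        total = 0.0
        for j, x in enumerate(row):
            total += x * x
            col_sq[j] += x * x
        row_sq.append(total)
    return row_sq, col_sq


def to_probabilities(weights: Sequence[float]) -> List[float]:
    """Normalize nonnegative weights into a probability distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("matrix has no nonzero entries")
    return [w / total for w in weights]


def sample_columns_and_rows(A: Matrix, c: int, r: int, seed: int = 0):
    """Hypothetical helper: norm-based column/row sampling (with replacement).

    Each sampled column j is scaled by 1 / sqrt(c * p_j) and each sampled row i
    by 1 / sqrt(r * q_i), so that C C^T and R^T R are unbiased estimates of
    A A^T and A^T A.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # For simplicity A is an in-memory list here; see streaming_squared_norms.
    row_sq, col_sq = streaming_squared_norms(A, n)
    p_col = to_probabilities(col_sq)
    q_row = to_probabilities(row_sq)
    # Zero-norm columns/rows have zero weight and are never drawn.
    cols = rng.choices(range(n), weights=p_col, k=c)
    rows = rng.choices(range(m), weights=q_row, k=r)
    # C is m x c and R is r x n: storage is linear in m and n for fixed c, r.
    C = [[A[i][j] / math.sqrt(c * p_col[j]) for j in cols] for i in range(m)]
    R = [[A[i][j] / math.sqrt(r * q_row[i]) for j in range(n)] for i in rows]
    return cols, rows, C, R


if __name__ == "__main__":
    A = [[1.0, 0.0, 2.0], [3.0, 0.0, 4.0], [0.0, 0.0, 5.0]]
    cols, rows, C, R = sample_columns_and_rows(A, c=2, r=2, seed=42)
    print("sampled columns:", cols, "sampled rows:", rows)
    print("C is", len(C), "x", len(C[0]), "and R is", len(R), "x", len(R[0]))
```

Sampling with replacement and squared-norm weights is the textbook choice for this family of methods because it makes the rescaled products unbiased; a full CUR-style decomposition would additionally compute a small linking matrix U from the sampled pieces, which is omitted here.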


Year:  2008        PMID: 22915836      PMCID: PMC3422895     

Source DB:  PubMed          Journal:  Proc Int Conf Data Eng        ISSN: 1084-4627