CRD: Fast Co-clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition.

Abstract

The problem of simultaneously clustering the rows and columns of a data matrix (co-clustering) arises in important applications such as text data mining, microarray analysis, and recommendation system analysis. Compared with classical clustering algorithms, co-clustering algorithms have been shown to be more effective at discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m × n), where m and n are the numbers of rows and columns in the data matrix, respectively. This limits their applicability to data matrices with a large number of rows and columns. Moreover, some huge datasets cannot be held entirely in main memory during co-clustering, which violates an assumption made by the previous algorithms. In this paper, we propose CRD, a general framework for fast co-clustering of large datasets. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. CRD also does not require the whole data matrix to be in main memory. We conducted extensive experiments on both real and synthetic data. Compared with previous co-clustering algorithms, CRD achieves competitive accuracy at a much lower computational cost.
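
The abstract does not say which sampling-based decomposition CRD uses or how clusters are read off from it. As a rough illustration only, the sketch below assumes a CUR-style decomposition (sampled columns C, sampled rows R, and the pseudoinverse of their intersection as U) followed by plain k-means on the sampled factors. The function names `sample_cur` and `kmeans_labels`, the norm-proportional sampling, and the clustering step are all assumptions made for this example and should not be read as the published CRD algorithm.

```python
import numpy as np


def sample_cur(A, n_cols, n_rows, seed=0):
    """CUR-style sketch (assumed, not the paper's exact method).

    Samples columns and rows with probability proportional to their
    squared norm and returns C (m x n_cols), U (n_cols x n_rows) and
    R (n_rows x n) so that C @ U @ R approximates A.
    The norms need one pass over A; after that only the sampled rows
    and columns are used.
    """
    rng = np.random.default_rng(seed)
    col_p = np.sum(A * A, axis=0)
    row_p = np.sum(A * A, axis=1)
    cols = rng.choice(A.shape[1], size=n_cols, p=col_p / col_p.sum())
    rows = rng.choice(A.shape[0], size=n_rows, p=row_p / row_p.sum())
    C = A[:, cols]                      # sampled columns
    R = A[rows, :]                      # sampled rows
    W = A[np.ix_(rows, cols)]           # intersection of the two samples
    U = np.linalg.pinv(W, rcond=1e-10)  # small linking matrix
    return C, U, R


def kmeans_labels(X, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen center
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Toy data matrix with two co-clusters (blocks) on the diagonal.
A = np.zeros((6, 8))
A[:3, :4] = 1.0
A[3:, 4:] = 2.0

C, U, R = sample_cur(A, n_cols=3, n_rows=3)
row_labels = kmeans_labels(C, 2)    # rows clustered in the sampled-column space
col_labels = kmeans_labels(R.T, 2)  # columns clustered in the sampled-row space
```

In this sketch the clustering steps work on an m × c and an n × r matrix, where c and r are the sample sizes, so their cost grows linearly with m and n once the samples are fixed. This mirrors the scaling the abstract claims, under the assumptions stated above.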

Entities: Gene

Year: 2008 PMID: 22915836 PMCID: PMC3422895

Source DB: PubMed Journal: Proc Int Conf Data Eng ISSN: 1084-4627
