
A Scalable Framework For Cluster Ensembles.

Prodip Hore, Lawrence O Hall, Dmitry B Goldgof.

Abstract

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable in quality to that of the best existing approaches, yet they scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, against clustering all the data at once, and against a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or to clustering all the data at once, while providing very large speedups.
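
The abstract does not spell out the two merging algorithms, so the following is only a minimal sketch of the general idea of centroid-based merging, not the authors' method. It assumes hard k-means on each disjoint subset and merges the resulting centers by running a size-weighted k-means over the pooled centers. The names `kmeans`, `merge_centroid_ensemble`, and `assign` are hypothetical and chosen for illustration.

```python
import numpy as np

def kmeans(X, k, weights=None, iters=50, seed=0):
    """Plain (optionally weighted) Lloyd's k-means; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    # Initialise with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster empties
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers, assign(X, centers)

def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def merge_centroid_ensemble(subsets, k, seed=0):
    """Cluster each disjoint subset, then merge the centers (assumed scheme)."""
    pooled, sizes = [], []
    for i, subset in enumerate(subsets):
        centers, labels = kmeans(subset, k, seed=seed + i)
        counts = np.bincount(labels, minlength=k)
        keep = counts > 0  # drop centers that own no points
        pooled.append(centers[keep])
        sizes.append(counts[keep])
    # Only the pooled centers and their sizes are needed from here on,
    # which is why the merge step is cheap compared with the raw data.
    final_centers, _ = kmeans(np.vstack(pooled), k,
                              weights=np.concatenate(sizes), seed=seed)
    return final_centers

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                      rng.normal(10.0, 1.0, (200, 2))])
    rng.shuffle(data)
    centers = merge_centroid_ensemble(np.array_split(data, 4), k=2)
    print(centers)
    print(np.bincount(assign(data, centers)))
```

The merge step sees only a handful of centers per subset rather than the raw points, which illustrates why a centroid representation can scale to very large or distributed data. The final partition of the full data set is obtained by a single nearest-center pass with `assign`.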


Year:  2009        PMID: 20160846      PMCID: PMC2654620          DOI: 10.1016/j.patcog.2008.09.027

Source DB:  PubMed          Journal:  Pattern Recognit        ISSN: 0031-3203            Impact factor:   7.740


