
Scalable model-based clustering for large databases based on data summarization.

Huidong Jin, Man-Leung Wong, K S Leung.

Abstract

The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources such as memory and computation time. In this paper, two scalable clustering algorithms, bEMADS and gEMADS, are presented based on the Gaussian mixture model. Both summarize data into subclusters and then generate Gaussian mixtures from their data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results substantiate that both algorithms can run several orders of magnitude faster than expectation-maximization with little loss of accuracy.
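The two-stage idea in the abstract (summarize data into subclusters, then fit a Gaussian mixture to the summaries) can be illustrated with a minimal sketch. This is not the authors' EMADS, bEMADS, or gEMADS: the fixed-width grid summarization, the farthest-point initialization, the simplification of evaluating each subcluster's likelihood at its mean, and all function and parameter names (`summarize`, `em_on_summaries`, `cell_width`, `reg`) are assumptions made for illustration only.

```python
import numpy as np

def summarize(X, cell_width):
    """Assumed grid summarization: count, mean, covariance per non-empty cell."""
    keys = np.floor(X / cell_width).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    m, d = inverse.max() + 1, X.shape[1]
    counts = np.bincount(inverse, minlength=m).astype(float)
    sums = np.zeros((m, d))
    np.add.at(sums, inverse, X)
    means = sums / counts[:, None]
    outer = np.zeros((m, d, d))
    np.add.at(outer, inverse, X[:, :, None] * X[:, None, :])
    covs = outer / counts[:, None, None] - means[:, :, None] * means[:, None, :]
    return counts, means, covs

def log_gauss(points, mean, cov):
    """Log density of a multivariate Gaussian at each row of `points`."""
    d = points.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (points - mean).T)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + np.sum(z ** 2, axis=0))

def em_on_summaries(counts, means, covs, k, n_iter=50, reg=1e-6):
    """Simplified EM over summaries: each subcluster acts as `counts[m]` points
    located at its mean, and its internal scatter is added back in the M-step."""
    m, d = means.shape
    n = counts.sum()
    # Deterministic farthest-point initialization (an assumption of this sketch).
    idx = [int(np.argmax(counts))]
    for _ in range(k - 1):
        d2 = ((means[:, None, :] - means[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    mu = means[idx].copy()
    g = counts @ means / n
    diff = means - g
    global_cov = (np.einsum('m,mi,mj->ij', counts, diff, diff)
                  + np.einsum('m,mij->ij', counts, covs)) / n + reg * np.eye(d)
    sigma = np.tile(global_cov, (k, 1, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities evaluated at subcluster means only.
        logp = np.stack([np.log(pi[j]) + log_gauss(means, mu[j], sigma[j])
                         for j in range(k)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weight every subcluster by its point count.
        w = r * counts[:, None]
        nk = w.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (w.T @ means) / nk[:, None]
        for j in range(k):
            diff = means - mu[j]
            between = np.einsum('m,mi,mj->ij', w[:, j], diff, diff)
            within = np.einsum('m,mij->ij', w[:, j], covs)
            sigma[j] = (between + within) / nk[j] + reg * np.eye(d)
    return pi, mu, sigma
```

Each EM iteration in this sketch costs time proportional to the number of subclusters rather than the number of data points, which is the general source of the speedup the abstract describes. The published EMADS treats the within-subcluster behavior more carefully than this simplification does.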

Year: 2005   PMID: 16285371   DOI: 10.1109/TPAMI.2005.226

Source DB: PubMed   Journal: IEEE Trans Pattern Anal Mach Intell   ISSN: 0162-8828   Impact factor: 6.226


  3 in total

1.  A Scalable Framework For Cluster Ensembles.

Authors:  Prodip Hore; Lawrence O Hall; Dmitry B Goldgof
Journal:  Pattern Recognit       Date:  2009-05       Impact factor: 7.740

2.  Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing.

Authors:  Siow Hoo Leong; Seng Huat Ong
Journal:  PLoS One       Date:  2017-07-07       Impact factor: 3.240

3.  Compatibility Evaluation of Clustering Algorithms for Contemporary Extracellular Neural Spike Sorting.

Authors:  Rakesh Veerabhadrappa; Masood Ul Hassan; James Zhang; Asim Bhatti
Journal:  Front Syst Neurosci       Date:  2020-06-30