
A Hybrid Approach to Clustering in Big Data.

Dheeraj Kumar, James C Bezdek, Marimuthu Palaniswami, Sutharshan Rajasegarar, Christopher Leckie, Timothy Craig Havens.   

Abstract

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well-known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the dataset using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real-world KDD-99 cup data (4 292 637 samples in 41 dimensions) in 76 s.
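
The sample, cluster, and extend pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' clusiVAT implementation; it rests on several assumptions. Uniform random sampling stands in for the paper's sampling scheme, which the abstract does not specify. The number of clusters k is passed in directly rather than estimated visually from a reordered distance matrix image. Plain single linkage, obtained by cutting the k-1 longest edges of a minimum spanning tree, stands in for the "relative of single linkage". The function names single_linkage_labels and cluster_by_sampling and the toy data are hypothetical.

```python
import math
import random


def single_linkage_labels(points, k):
    """Single linkage into k clusters: build an MST, drop its k-1 longest edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known edge from the growing tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []              # (length, u, v) for every MST edge
    for _ in range(n):
        # Prim's algorithm: add the point closest to the current tree.
        u = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u

    # Keeping the n-k shortest MST edges leaves exactly k connected components.
    edges.sort()
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in edges[: n - k]:
        root[find(u)] = find(v)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def cluster_by_sampling(data, k, sample_size, seed=0):
    """Cluster a random sample, then label every point by its nearest sample point."""
    rng = random.Random(seed)
    # Assumption: uniform random sampling replaces the paper's sampling step.
    sample_idx = rng.sample(range(len(data)), min(sample_size, len(data)))
    sample = [data[i] for i in sample_idx]
    sample_labels = single_linkage_labels(sample, k)

    # Nearest prototype rule: one non-iterative pass over the full dataset.
    labels = []
    for p in data:
        nearest = min(range(len(sample)), key=lambda s: math.dist(p, sample[s]))
        labels.append(sample_labels[nearest])
    return labels


# Toy data (hypothetical): two compact, well-separated 2-D blobs.
_rng = random.Random(1)
blob_a = [(_rng.gauss(0, 0.3), _rng.gauss(0, 0.3)) for _ in range(200)]
blob_b = [(_rng.gauss(10, 0.3), _rng.gauss(10, 0.3)) for _ in range(200)]
labels = cluster_by_sampling(blob_a + blob_b, k=2, sample_size=40)
```

In this sketch only the sample incurs the quadratic cost of pairwise distances. The remaining points are labeled in a single pass with no iteration and no initialization of cluster centers, which is the property the abstract contrasts with k-means.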

Year:  2015        PMID: 26441434     DOI: 10.1109/TCYB.2015.2477416

Source DB:  PubMed          Journal:  IEEE Trans Cybern        ISSN: 2168-2267            Impact factor:   11.448


  3 in total

1.  Clustering Algorithm in English Language Learning Pattern Matching under Big Data Framework.

Authors:  Liying Zheng
Journal:  Comput Intell Neurosci       Date:  2022-09-06

2.  Cluster tendency assessment in neuronal spike data.

Authors:  Sara Mahallati; James C Bezdek; Milos R Popovic; Taufik A Valiante
Journal:  PLoS One       Date:  2019-11-12       Impact factor: 3.240

3.  A novel sampling-based visual topic models with computational intelligence for big social health data clustering.

Authors:  K Narasimhulu; K T Meena Abarna; B Siva Kumar; T Suresh
Journal:  J Supercomput       Date:  2022-01-19       Impact factor: 2.557

