
Efficient similarity-based data clustering by optimal object to cluster reallocation.

Mathias Rossignol, Mathieu Lagrange, Arshia Cont.

Abstract

We present an iterative flat hard clustering algorithm designed to operate on arbitrary similarity matrices, the only constraint being that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal maximizes average intra-class similarity instead of minimizing a squared distance, in order to remain closer to the semantics of similarities. We show that this approach relaxes some conditions on usable affinity matrices, such as positive semi-definiteness, and opens possibilities for the computational optimizations required for large datasets. Systematic evaluation on a variety of datasets shows that, compared with kernel k-means and spectral clustering methods, the proposed approach gives equivalent or better performance while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections. Material enabling the reproducibility of the results is made available online.
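
To make the idea of object-to-cluster reallocation concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible reading of "average intra-class similarity" (the sum, over clusters, of the within-cluster similarity total divided by cluster size) and uses a naive greedy sweep that recomputes the objective for every candidate move. The names `objective` and `reallocate` and the parameter `max_sweeps` are hypothetical, and none of the computational optimizations mentioned in the abstract are attempted.

```python
from typing import List, Sequence


def objective(sim: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Assumed objective: for each cluster, the sum of all pairwise
    similarities inside it divided by its size, added up over clusters."""
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            within = sum(sim[i][j] for i in members for j in members)
            total += within / len(members)
    return total


def reallocate(sim: Sequence[Sequence[float]], labels: Sequence[int], k: int,
               max_sweeps: int = 100) -> List[int]:
    """Greedy hard clustering: move one object at a time to the cluster
    that most increases the objective, until a full sweep makes no move."""
    labels = list(labels)
    best = objective(sim, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            home = labels[i]
            if labels.count(home) == 1:
                continue  # keep every cluster non-empty
            for c in range(k):
                if c == home:
                    continue
                labels[i] = c  # tentatively move object i to cluster c
                score = objective(sim, labels, k)
                if score > best + 1e-12:  # accept strict improvements only
                    best, home, moved = score, c, True
            labels[i] = home  # commit the best cluster found for object i
        if not moved:
            break
    return labels


# Toy symmetric similarity matrix with two obvious groups: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
result = reallocate(sim, [0, 1, 0, 1], k=2)
print(result)
```

Because a move is accepted only when it strictly increases the objective, the objective never decreases and the loop terminates. The sketch places no positive semi-definiteness requirement on `sim`; it only reads the matrix entries.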

Year:  2018        PMID: 29856755      PMCID: PMC5983489          DOI: 10.1371/journal.pone.0197450

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


