
Fast clustering using adaptive density peak detection.

Xiao-Feng Wang, Yifan Xu.

Abstract

Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recently proposed clustering approach uses a fast search for cluster centers based on their local densities. However, the selection of the key intrinsic parameters of that algorithm has not been systematically investigated. The "optimal" parameters are relatively difficult to estimate because the algorithm's original definition of local density is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, in which the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method that maximizes an average silhouette index. The advantages and flexibility of the proposed method are demonstrated through simulation studies and the analysis of several benchmark gene expression data sets. The method runs in a single step without any iteration, so it is fast and has great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
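The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the ADPclust implementation, and every function name in it is hypothetical. It makes several assumptions that the abstract does not state: a Gaussian kernel for the local density, a rule-of-thumb bandwidth in place of the paper's theoretically justified one, ranking of candidate centers by the product of density and distance to the nearest higher-density point, and a small candidate range for the number of clusters.

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance matrix between all rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def kde_density(D, h):
    # Gaussian kernel density at each point, up to a constant factor.
    return np.exp(-0.5 * (D / h) ** 2).sum(axis=1)

def delta_and_parent(D, f):
    # For each point: distance to, and index of, the nearest denser point.
    n = len(f)
    order = np.argsort(-f, kind="stable")  # indices by descending density
    delta = np.empty(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()  # densest point has no denser neighbour
        else:
            higher = order[:rank]
            j = higher[np.argmin(D[i, higher])]
            delta[i], parent[i] = D[i, j], j
    return delta, parent, order

def assign(centers, parent, order):
    # Each non-center point inherits the label of its nearest denser point.
    labels = np.full(len(parent), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:  # descending density, so the parent is labelled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def mean_silhouette(D, labels):
    # Average silhouette index; singleton clusters contribute 0.
    ks = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()

def density_peak_cluster(X, h, k_range=range(2, 6)):
    D = pairwise_dist(X)
    f = kde_density(D, h)
    delta, parent, order = delta_and_parent(D, f)
    gamma = f * delta              # assumption: rank centers by f * delta
    gamma[order[0]] = np.inf       # the densest point is always a center
    ranked = np.argsort(-gamma)
    best = None
    for k in k_range:              # keep the k with the highest silhouette
        labels = assign(ranked[:k], parent, order)
        score = mean_silhouette(D, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best

# Illustrative data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
n, d = X.shape
h = n ** (-1.0 / (d + 4)) * X.std(axis=0).mean()  # assumed rule-of-thumb bandwidth
score, k, labels = density_peak_cluster(X, h)
```

Each stage (density estimation, peak detection, label assignment) runs once, with no iterative refinement of the cluster centers, which matches the single-step character described in the abstract. The only loop is over the candidate numbers of clusters scored by the silhouette index.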

Keywords:  Clustering; automatic intrinsic parameter selection; density peak; fast computation; multivariate kernel density estimation


Year:  2015        PMID: 26475830     DOI: 10.1177/0962280215609948

Source DB:  PubMed          Journal:  Stat Methods Med Res        ISSN: 0962-2802            Impact factor:   3.021


  5 in total

1.  SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble.

Authors:  Ruth Huh; Yuchen Yang; Yuchao Jiang; Yin Shen; Yun Li
Journal:  Nucleic Acids Res       Date:  2020-01-10       Impact factor: 16.971

2.  Single-Cell RNA-Seq of Mouse Dopaminergic Neurons Informs Candidate Gene Selection for Sporadic Parkinson Disease.

Authors:  Paul W Hook; Sarah A McClymont; Gabrielle H Cannon; William D Law; A Jennifer Morton; Loyal A Goff; Andrew S McCallion
Journal:  Am J Hum Genet       Date:  2018-03-01       Impact factor: 11.025

3.  CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles.

Authors:  Sylvain Träger; Giorgio Tamò; Deniz Aydin; Giulia Fonti; Martina Audagnotto; Matteo Dal Peraro
Journal:  Bioinformatics       Date:  2021-05-17       Impact factor: 6.937

4.  Clusterdv: a simple density-based clustering method that is robust, general and automatic.

Authors:  João C Marques; Michael B Orger
Journal:  Bioinformatics       Date:  2019-06-01       Impact factor: 6.937

5.  LINEAGE: Label-free identification of endogenous informative single-cell mitochondrial RNA mutation for lineage analysis.

Authors:  Li Lin; Yufeng Zhang; Weizhou Qian; Yao Liu; Yingkun Zhang; Fanghe Lin; Cenxi Liu; Guangxing Lu; Di Sun; Xiaoxu Guo; YanLing Song; Jia Song; Chaoyong Yang; Jin Li
Journal:  Proc Natl Acad Sci U S A       Date:  2022-02-01       Impact factor: 12.779

