
Parallel spectral clustering in distributed systems.

Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, Edward Y Chang.

Abstract

Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix: sparsifying the matrix, and the Nyström method. We then pick the strategy of sparsifying the matrix by retaining nearest neighbors and investigate its parallelization, distributing both memory use and computation across distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863 instances, we show that our parallel algorithm can effectively handle large problems.
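The chosen strategy (keep only each point's nearest neighbors in the similarity matrix, then run standard spectral clustering on the sparsified graph) can be sketched on a single machine as follows. This is a minimal NumPy sketch, not the authors' parallel implementation: the Gaussian kernel with a fixed width `sigma`, the neighbor count `t`, the normalized-Laplacian embedding with row normalization, and the farthest-point-initialized k-means are assumptions chosen for illustration, and the function names `tnn_similarity`, `kmeans`, and `spectral_clustering` are hypothetical.

```python
import numpy as np


def tnn_similarity(X, t, sigma):
    """Gaussian similarity matrix sparsified to each point's t nearest neighbors.

    Assumption: fixed kernel width sigma. Entry (i, j) is kept if j is among
    i's t nearest neighbors OR i is among j's, so the result is symmetric.
    Stored densely here for clarity; a scalable version would use sparse storage.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :t]      # indices of the t nearest neighbors per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nn] = True
    mask |= mask.T                          # symmetrize the neighbor relation
    return np.where(mask, np.exp(-d2 / (2.0 * sigma ** 2)), 0.0)


def kmeans(V, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [V[0]]
    for _ in range(1, k):
        # next center: the row farthest from every center chosen so far
        dist = np.min([np.sum((V - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([V[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X, k, t=5, sigma=1.0, n_iter=100):
    S = tnn_similarity(X, t, sigma)
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # A = D^{-1/2} S D^{-1/2}; its k largest eigenvectors are the k smallest
    # eigenvectors of the normalized Laplacian I - A.
    A = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    V = eigvecs[:, -k:]                     # keep the k largest
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)  # unit-length rows
    return kmeans(V, k, n_iter)
```

For example, two well-separated 5-by-6 grids of points, `X = np.vstack([grid, grid + 5.0])`, passed to `spectral_clustering(X, k=2, t=8)` are split into one cluster per grid. The dense `n`-by-`n` arrays here are exactly the memory bottleneck the abstract describes; the sparsification only pays off once the matrix is stored sparsely and the neighbor search, eigensolver, and k-means are distributed across machines.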


Year:  2011        PMID: 20421667     DOI: 10.1109/TPAMI.2010.88

Source DB:  PubMed        Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN:  0162-8828        Impact factor:  6.226


13 articles in total; the first 10 are listed below.

1.  Spectral clustering strategies for heterogeneous disease expression data.

Authors:  Grace T Huang; Kathryn I Cunningham; Panayiotis V Benos; Chakra S Chennubhotla
Journal:  Pac Symp Biocomput       Date:  2013

2.  SPEX2: automated concise extraction of spatial gene expression patterns from Fly embryo ISH images.

Authors:  Kriti Puniyani; Christos Faloutsos; Eric P Xing
Journal:  Bioinformatics       Date:  2010-06-15       Impact factor: 6.937

3.  Sampling from Determinantal Point Processes for Scalable Manifold Learning.

Authors:  Christian Wachinger; Polina Golland
Journal:  Inf Process Med Imaging       Date:  2015

4.  A coevolutionary residue network at the site of a functionally important conformational change in a phosphohexomutase enzyme family.

Authors:  Yingying Lee; Jacob Mick; Cristina Furdui; Lesa J Beamer
Journal:  PLoS One       Date:  2012-06-07       Impact factor: 3.240

5.  Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system.

Authors:  Ross E Curtis; Anuj Goyal; Eric P Xing
Journal:  BMC Genet       Date:  2012-04-03       Impact factor: 2.797

6.  Semi-supervised consensus clustering for gene expression data analysis.

Authors:  Yunli Wang; Youlian Pan
Journal:  BioData Min       Date:  2014-05-08       Impact factor: 2.522

7.  Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering.

Authors:  Lerato Lerato; Thomas Niesler
Journal:  PLoS One       Date:  2015-10-30       Impact factor: 3.240

8.  DeepAISE - An interpretable and recurrent neural survival model for early prediction of sepsis.

Authors:  Supreeth P Shashikumar; Christopher S Josef; Ashish Sharma; Shamim Nemati
Journal:  Artif Intell Med       Date:  2021-02-13       Impact factor: 5.326

9.  Evaluation of clustering and topic modeling methods over health-related tweets and emails.

Authors:  Juan Antonio Lossio-Ventura; Sergio Gonzales; Juandiego Morzan; Hugo Alatrista-Salas; Tina Hernandez-Boussard; Jiang Bian
Journal:  Artif Intell Med       Date:  2021-05-07       Impact factor: 7.011

10.  Parallel clustering algorithm for large-scale biological data sets.

Authors:  Minchao Wang; Wu Zhang; Wang Ding; Dongbo Dai; Huiran Zhang; Hao Xie; Luonan Chen; Yike Guo; Jiang Xie
Journal:  PLoS One       Date:  2014-04-04       Impact factor: 3.240

