Parallel clustering algorithm for large data sets with applications in bioinformatics.

Victor Olman, Fenglou Mao, Hongwei Wu, Ying Xu.

Abstract

Large bioinformatics data sets make cluster identification time-consuming, which motivates a parallel algorithm for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed and identifies clusters as densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of the graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction is to first partition the graph, then construct MSTs for the partitioned subgraphs and for auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that, when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.
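The partition, sub-MST, and merge strategy described in the abstract can be illustrated with a minimal sequential sketch. This is not the CLUMP implementation: the function names (`kruskal`, `partitioned_mst`, `direct_mst`), the use of Euclidean distance on a complete graph, and the round-robin partitioning are all assumptions made for illustration. The two inner loops are written sequentially here; each sub-problem is independent, which is the property a parallel version would presumably exploit.

```python
import itertools
import math


def kruskal(num_vertices, edges):
    """Return a minimum spanning forest of `edges` as a list of (weight, u, v)."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def partitioned_mst(points, num_parts):
    """MST of the complete Euclidean graph on `points`, built from sub-MSTs.

    Assumption: vertices are split round-robin into `num_parts` groups.
    """
    n = len(points)

    def dist(u, v):
        return math.dist(points[u], points[v])

    parts = [list(range(i, n, num_parts)) for i in range(num_parts)]
    candidates = []

    # Step 1: MST of each partitioned subgraph (independent tasks).
    for part in parts:
        edges = [(dist(u, v), u, v) for u, v in itertools.combinations(part, 2)]
        candidates.extend(kruskal(n, edges))

    # Step 2: MST of the bipartite graph between each pair of parts
    # (also independent tasks).
    for a, b in itertools.combinations(parts, 2):
        edges = [(dist(u, v), u, v) for u in a for v in b]
        candidates.extend(kruskal(n, edges))

    # Step 3: merge. An MST of the union of the sub-MSTs has the same
    # total weight as an MST of the original graph.
    return kruskal(n, candidates)


def direct_mst(points):
    """Reference MST computed on the full edge set, for comparison."""
    n = len(points)
    edges = [
        (math.dist(points[u], points[v]), u, v)
        for u, v in itertools.combinations(range(n), 2)
    ]
    return kruskal(n, edges)
```

The merge step works because every edge of the full graph lies in exactly one subgraph or one bipartite graph, and an edge that is not needed in the MST of its own sub-problem is not needed in the MST of the whole graph either. The final Kruskal pass therefore runs on a much smaller candidate edge set than the full graph.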

Year:  2009        PMID: 19407357     DOI: 10.1109/TCBB.2007.70272

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  4 in total

1.  HAMSTER: visualizing microarray experiments as a set of minimum spanning trees.

Authors:  Raymond Wan; Larisa Kiseleva; Hajime Harada; Hiroshi Mamitsuka; Paul Horton
Journal:  Source Code Biol Med       Date:  2009-11-20

2.  An improved fuzzy c-means clustering algorithm based on shadowed sets and PSO.

Authors:  Jian Zhang; Ling Shen
Journal:  Comput Intell Neurosci       Date:  2014-11-12

3.  Hybrid Fuzzy Clustering Method Based on FCM and Enhanced Logarithmical PSO (ELPSO).

Authors:  Jian Zhang; Zongheng Ma
Journal:  Comput Intell Neurosci       Date:  2020-03-18

4.  Barcodes for genomes and applications.

Authors:  Fengfeng Zhou; Victor Olman; Ying Xu
Journal:  BMC Bioinformatics       Date:  2008-12-17       Impact factor: 3.169
