PACC: Large scale connected component computation on Hadoop and Spark.

Ha-Myung Park¹, Namyong Park², Sung-Hyon Myaeng³, U Kang⁴.

Abstract

A connected component of a graph is a maximal set of nodes in which every pair is linked by a path. Finding connected components underlies diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. Ironically, these distributed algorithms do not scale well enough because of unnecessary data I/O and processing, massive intermediate data, numerous rounds of computation, and load balancing issues. In this paper, we propose PACC (Partition-Aware Connected Components), a fast and scalable distributed algorithm for connected component computation based on three key techniques: two-step processing of partitioning and computation, edge filtering, and sketching. PACC considerably shrinks the size of the intermediate data, the size of the input graph, and the number of rounds without suffering from load balancing issues. On real-world graphs, PACC runs 2.9 to 10.7 times faster than the state-of-the-art MapReduce and Spark algorithms.
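
To make the problem statement concrete, the following is a minimal single-machine sketch of connected component computation using a union-find (disjoint-set) structure. It is not PACC and does not implement partitioning, edge filtering, or sketching; it only illustrates the output that any connected component algorithm is expected to produce. The function name `connected_components` and its parameters are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def connected_components(edges, nodes=()):
    """Group nodes into connected components with union-find.

    Illustrative baseline only (an assumption of this sketch, not the
    paper's distributed method). `edges` is an iterable of (u, v) pairs;
    `nodes` optionally lists isolated nodes that appear in no edge.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller label as the representative of the component.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    for n in nodes:
        find(n)
    for u, v in edges:
        union(u, v)

    groups = defaultdict(list)
    for n in list(parent):
        groups[find(n)].append(n)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)], nodes=[6]))
```

Each component is labeled here by its smallest node. The sketch keeps the whole graph in one process's memory, which is exactly the assumption that stops holding for the enormous graphs the abstract targets.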

Year: 2020 PMID: 32187232 PMCID: PMC7080249 DOI: 10.1371/journal.pone.0229936

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
