Ariful Azad, Georgios A Pavlopoulos, Christos A Ouzounis, Nikos C Kyrpides, Aydin Buluç.
Abstract
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to the increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to large datasets remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
Year: 2018 PMID: 29315405 PMCID: PMC5888241 DOI: 10.1093/nar/gkx1313
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Computational infrastructure used for HipMCL benchmarking
| | | Edison (Cray XC30 supercomputer) | Cori2 (Cray XC40 supercomputer) | In-house system |
|---|---|---|---|---|
| Overall system | #nodes | 5586 | 9688 | 1 |
| | #cores | 134 064 | 658 784 | 8 |
| | aggregate memory | 357 TB | 1 PB | 1 TB |
| | max #nodes used in experiments | 2025 | 2048 | 8 |
| One computing node of the system | processor | Intel Ivy Bridge | Intel KNL | Intel Xeon |
| | number of cores | 24 | 68 (272 threads) | 8 |
| | memory | 64 GB | 112 GB | 1 TB |
The impact of parallelizing different steps of MCL when clustering a eukaryotic network with 3 million nodes and 359 million edges (Table 3)
| | File I/O (s) | Expansion (s) | Prune (s) | Inflation (s) | Components (s) |
|---|---|---|---|---|---|
| MCL (1 node) | 600.12 | 1052.11 | 9.93 | 199.97 | 608.77 |
| HipMCL (1024 nodes) | 7.23 | 27.20 | 0.92 | 0.19 | 0.19 |
| HipMCL speedup | 83× | 39× | 11× | 1052× | 3288× |
The last row shows the speedups achieved by HipMCL on 1024 nodes of Edison (Table 1). While HipMCL drastically reduces the running time of all five steps, expansion remains the most expensive step in Markov clustering. Hence, we devoted most of our research effort to making the expansion step scalable.
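The speedups in the last row can be recomputed from the per-step timings. The following is a minimal sketch, not part of HipMCL; the variable and function names are illustrative. Because the table reports rounded timings, ratios recomputed for the sub-second steps can differ from the published values (for Components, 608.77 / 0.19 gives about 3204 rather than 3288, presumably due to rounding of the 0.19 s figure).

```python
# Per-step timings in seconds, copied from the table above.
STEPS = ["File I/O", "Expansion", "Prune", "Inflation", "Components"]
MCL_1_NODE = [600.12, 1052.11, 9.93, 199.97, 608.77]
HIPMCL_1024_NODES = [7.23, 27.20, 0.92, 0.19, 0.19]


def speedups(baseline, parallel):
    """Return the element-wise ratio baseline / parallel."""
    return [b / p for b, p in zip(baseline, parallel)]


def share(timings, index):
    """Return the fraction of the summed time spent in one step."""
    return timings[index] / sum(timings)


per_step = dict(zip(STEPS, speedups(MCL_1_NODE, HIPMCL_1024_NODES)))
# Speedup over the sum of the five listed steps (not a full end-to-end runtime).
overall = sum(MCL_1_NODE) / sum(HIPMCL_1024_NODES)
expansion_share_hipmcl = share(HIPMCL_1024_NODES, STEPS.index("Expansion"))

for step, ratio in per_step.items():
    print(f"{step:<11}{ratio:8.0f}x")
print(f"{'Overall':<11}{overall:8.0f}x")
print(f"Expansion share of HipMCL time: {expansion_share_hipmcl:.0%}")
```

Under these numbers, expansion accounts for roughly three quarters of the summed HipMCL time, which is consistent with the statement that it remains the most expensive step.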
Figure 1. An example of expansion and pruning of b (= 2) columns of a column-stochastic matrix A. Non-zero entries are shown as filled circles. The submatrix being expanded consists of all N rows and the b (= 2) columns of A that are currently active. The product of A and this submatrix is computed and pruned to obtain the final result for these b columns. Parts of the matrices that are active in the current expansion are shown in darker shades. For comparison, MCL sets b to 1. HipMCL dynamically selects a large value for b from the range [1, N] such that the expanded columns of A² do not overflow memory. Once these columns are expanded and pruned, the computation moves to the next set of b columns.
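The batched expansion described in this caption can be sketched in a few lines of pure Python. This is an illustrative simplification under stated assumptions, not HipMCL's code: the matrix is stored as a list of sparse columns (dictionaries), `b` is a fixed parameter rather than being chosen dynamically from available memory, and pruning is reduced to a simple threshold followed by rescaling. All function names are hypothetical.

```python
from typing import Dict, List

Column = Dict[int, float]  # sparse column: row index -> non-zero value


def expand_columns(A: List[Column], cols: range) -> List[Column]:
    """Compute the columns `cols` of A*A for a column-wise sparse matrix A."""
    out = []
    for j in cols:
        acc: Column = {}
        # Column j of A*A is a combination of columns of A,
        # weighted by the non-zeros of column j of A.
        for k, a_kj in A[j].items():
            for i, a_ik in A[k].items():
                acc[i] = acc.get(i, 0.0) + a_ik * a_kj
        out.append(acc)
    return out


def prune(col: Column, threshold: float) -> Column:
    """Drop entries below `threshold`, then rescale the column to sum to 1."""
    kept = {i: v for i, v in col.items() if v >= threshold}
    total = sum(kept.values())
    return {i: v / total for i, v in kept.items()} if total else kept


def expand_and_prune(A: List[Column], b: int,
                     threshold: float = 1e-4) -> List[Column]:
    """Expand and prune A*A in batches of b columns at a time."""
    n = len(A)
    result: List[Column] = []
    for start in range(0, n, b):
        # Only b unpruned columns of A*A are held in memory at once.
        batch = expand_columns(A, range(start, min(start + b, n)))
        result.extend(prune(c, threshold) for c in batch)
    return result
```

The result does not depend on `b`; the batch size only bounds how many unpruned columns are held at once, which is why a larger `b` trades memory for fewer passes.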
Figure 2. Execution of the sparse SUMMA algorithm for the sparse matrix–matrix multiplication A² = A × A on a 3-by-3 process grid. We use the same input matrix as in Figure 1 and denote submatrices local to different processes by blue squares. Here, we show the first stage of the sparse SUMMA algorithm, where members of the first process column broadcast their local pieces of A horizontally (along the process row) and members of the first process row broadcast their local pieces of A vertically (along the process column). Broadcasting processes in the first stage are marked with blue shades, and the direction of data communication is shown with red arrowheads. The rightmost panel depicts each process locally multiplying the received parts of A and merging the products into its local part of the output matrix A².
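The stage structure in this caption can be illustrated with a serial simulation. The sketch below is an assumption-laden toy, not HipMCL's implementation: it uses dense integer blocks instead of sparse storage, replaces the MPI broadcasts with direct reads of the owning process's block, and assumes the matrix dimension is divisible by the grid size `q`. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def zeros(n: int) -> Matrix:
    return [[0] * n for _ in range(n)]


def multiply_add(C: Matrix, X: Matrix, Y: Matrix) -> None:
    """In place: C += X * Y for square blocks of equal size."""
    n = len(C)
    for i in range(n):
        for k in range(n):
            if X[i][k] == 0:  # skip zeros, loosely mimicking sparse storage
                continue
            for j in range(n):
                C[i][j] += X[i][k] * Y[k][j]


def block(M: Matrix, i: int, j: int, s: int) -> Matrix:
    """Return the s-by-s block owned by process (i, j)."""
    return [row[j * s:(j + 1) * s] for row in M[i * s:(i + 1) * s]]


def summa_square(M: Matrix, q: int) -> Matrix:
    """Simulate SUMMA for M*M on a q-by-q process grid (len(M) % q == 0)."""
    s = len(M) // q
    local = [[block(M, i, j, s) for j in range(q)] for i in range(q)]
    out = [[zeros(s) for _ in range(q)] for _ in range(q)]
    for stage in range(q):
        for i in range(q):
            for j in range(q):
                # Process (i, j) receives local[i][stage] from the row
                # broadcast and local[stage][j] from the column broadcast,
                # multiplies them, and merges the product into its output.
                multiply_add(out[i][j], local[i][stage], local[stage][j])
    # Gather the distributed output blocks back into one matrix.
    return [
        [out[i][j][r][c] for j in range(q) for c in range(s)]
        for i in range(q) for r in range(s)
    ]
```

With `q = 3`, stage 0 corresponds to the stage drawn in the figure: blocks from the first process column travel along rows and blocks from the first process row travel along columns. After all `q` stages, each process holds its block of the full product.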
Clustering quality results of HipMCL by directly comparing it to the original MCL
| Dataset | Inflation | #clusters from MCL | #clusters from HipMCL | F-score | #mismatched clusters |
|---|---|---|---|---|---|
| Eukarya, \|V\| = 3 243 106, \|E\| = 359 744 161 | 1.4 | 228 965 | 228 965 | 0.99 | 8 |
| | 2 | 284 026 | 284 026 | 1.00 | 1 |
| | 4 | 446 216 | 446 216 | 1.00 | 1 |
| | 6 | 597 014 | 597 014 | 1.00 | 0 |
| Archaea, \|V\| = 1 644 227, \|E\| = 204 784 551 | 1.4 | 87 559 | 87 559 | 0.99 | 19 |
| | 2 | 107 207 | 107 207 | 1.00 | 0 |
| | 4 | 163 840 | 163 840 | 1.00 | 0 |
| | 6 | 222 937 | 222 937 | 1.00 | 0 |
| Viruses, \|V\| = 219 715, \|E\| = 4 583 048 | 1.4 | 34 519 | 34 519 | 1.00 | 0 |
| | 2 | 37 216 | 37 216 | 1.00 | 0 |
| | 4 | 41 835 | 41 835 | 1.00 | 0 |
| | 6 | 45 294 | 45 294 | 1.00 | 0 |
All experiments were run on Edison (NERSC). Column 1: The dataset, where |V| is the number of vertices and |E| the number of edges. Column 2: The inflation value used for MCL. Column 3: The number of clusters produced by MCL. Column 4: The number of clusters produced by HipMCL. Column 5: The F-score comparing the results of MCL and HipMCL. As shown, the results are nearly identical. Column 6: The number of HipMCL clusters that contain a slightly different number of proteins than the corresponding MCL clusters; very few clusters mismatch.
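The exact F-score definition used for this comparison is not given here. As a hedged illustration only, the sketch below implements one common clustering F-measure: each reference cluster is matched to the candidate cluster with the best F1 overlap, and the per-cluster scores are averaged with weights proportional to cluster size. The function names and the choice of this particular definition are assumptions, not the method confirmed by the source.

```python
from typing import Hashable, Iterable, List, Set


def f_score(reference: Iterable[Iterable[Hashable]],
            candidate: Iterable[Iterable[Hashable]]) -> float:
    """Size-weighted best-match F1 between two clusterings (assumed definition)."""
    ref_sets: List[Set[Hashable]] = [set(c) for c in reference]
    cand_sets: List[Set[Hashable]] = [set(c) for c in candidate]
    n = sum(len(r) for r in ref_sets)
    total = 0.0
    for r in ref_sets:
        best = 0.0
        for c in cand_sets:
            overlap = len(r & c)
            if overlap:
                precision = overlap / len(c)
                recall = overlap / len(r)
                f1 = 2 * precision * recall / (precision + recall)
                best = max(best, f1)
        total += len(r) / n * best
    return total


def mismatched_clusters(reference, candidate) -> int:
    """Count candidate clusters with no exactly equal reference cluster."""
    ref = {frozenset(c) for c in reference}
    return sum(1 for c in candidate if frozenset(c) not in ref)
```

For two identical clusterings this score is 1.0 and the mismatch count is 0, matching the pattern in most rows of the table. This quadratic-time loop is meant for small examples; comparing clusterings with hundreds of thousands of clusters would need an index from members to clusters.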
Evaluation of HipMCL clustering for large-scale networks
| Network | #nodes (millions) | #edges (billions) | #clusters (millions) | HipMCL runtime (h) | Running platform |
|---|---|---|---|---|---|
| Isolate-1 | 47 | 7 | 1.59 | 1 | 1024 nodes on Edison |
| Isolate-2 | 69 | 12 | 3.37 | 1.66 | 1024 nodes on Edison |
| Isolate-3 | 70 | 68 | 2.88 | 2.41 | 2048 nodes on Cori2 |
| Metaclust50 | 282 | 37 | 41.52 | 3.23 | 2048 nodes on Cori2 |
Figure 3. Comparison of the runtimes of the original MCL and HipMCL on three networks. Both axes are in log scale. Both MCL and HipMCL ran on Edison. MCL ran on a single compute node with 24 cores. HipMCL ran on an increasing number of compute nodes to show how the clustering time decreases as more computing resources are added. HipMCL uses all 24 cores available in each node via multithreading. HipMCL ran on up to 64 nodes (1536 cores) for the smaller viruses network and on up to 1024 nodes (24 576 cores) for the archaea and eukarya networks. The performance improvements of the highest-concurrency HipMCL execution over the single-node MCL and HipMCL executions are shown to the right of each subfigure. See text for details.