Fang Zhang, Anjun Ma, Zhao Wang, Qin Ma, Bingqiang Liu, Lan Huang, Yan Wang.
Abstract
Overlapping structures in protein–protein interaction networks are prevalent across biological processes and reflect a mechanism by which common functional components are shared. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and widely accepted algorithm for OCD in networks. CNS consists of two main steps: central node selection and clustering. However, the original CNS ignores both the mutual influence among nodes and the importance of how the edges of a network are divided. In this paper, an OCD algorithm based on central edge selection (CES) is proposed for detecting overlapping communities in protein–protein interaction (PPI) networks. Unlike traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to select more reasonable central edges, and, during clustering, employs a new distance between a non-central edge and the set of central edges to assign each non-central edge to the correct cluster. In addition, CES improves the overlapping node pruning (ONP) strategy to make the division more precise. Experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli, and Cerevisiae show that the CES algorithm performs well.
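The abstract describes CES only at a high level: choose central edges, attach every non-central edge to its nearest central edge, then read overlapping node communities off the edge clusters. The Python sketch below illustrates that general edge-centred pipeline under stated assumptions; it is not the authors' implementation. CMI and the paper's edge distance are not defined on this page, so `score` (common neighbours of an edge's endpoints), `edge_distance` (nearest endpoint pair, ties broken by summed endpoint distances), the separation parameter `min_gap`, and all function names are hypothetical stand-ins.

```python
from collections import deque


def build_adj(edge_list):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_dist(adj, src):
    """Hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def central_edge_clustering(adj, k, min_gap=2):
    """Hypothetical central-edge clustering sketch (requires k >= 1)."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    dist = {n: bfs_dist(adj, n) for n in adj}
    inf = float("inf")

    def edge_distance(e, c):
        # Hypothetical edge-to-edge distance: nearest endpoint pair,
        # ties broken by the summed endpoint distances.
        pairs = [dist[a].get(b, inf) for a in e for b in c]
        return (min(pairs), sum(pairs))

    def score(e):
        # Hypothetical stand-in for CMI: common neighbours of the endpoints.
        u, v = e
        return len(adj[u] & adj[v])

    central = []
    for e in sorted(edges, key=lambda e: (-score(e), e)):
        # Accept an edge only if it is far enough from every chosen central edge.
        if all(edge_distance(e, c)[0] >= min_gap for c in central):
            central.append(e)
            if len(central) == k:
                break

    # Each edge joins its nearest central edge; a node inherits the labels of
    # all its incident edges, so boundary nodes can end up in several clusters.
    membership = {n: set() for n in adj}
    for e in edges:
        label = min(range(len(central)),
                    key=lambda i: (edge_distance(e, central[i]), i))
        for n in e:
            membership[n].add(label)
    return central, membership


# Two 4-cliques that share node 3.
clique_a = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
clique_b = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
central, membership = central_edge_clustering(build_adj(clique_a + clique_b), k=2)
print(central)  # [(0, 1), (4, 5)]
print(sorted(n for n, c in membership.items() if len(c) > 1))  # [3]
```

Working on edges rather than nodes is what lets a shared node such as 3 fall into two communities without any extra overlap rule: its incident edges are split between the two central edges.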
Keywords: central edge selection; overlapping community detection; overlapping node pruning; protein–protein interaction network
Year: 2018 PMID: 30322177 PMCID: PMC6222769 DOI: 10.3390/molecules23102633
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Six real networks.
| Dataset | Nodes | Edges | BCN |
|---|---|---|---|
| Karate | 34 | 78 | 2 |
| Dolphin | 62 | 158 | 2 |
| Football | 115 | 612 | 12 |
| | 1396 | 2092 | - |
| | 1883 | 2597 | - |
| | 2172 | 5124 | - |
BCN (benchmark network category number) is the number of categories recorded for each benchmark network in its original publication; it is not defined (-) for the PPI networks.
Figure 1. Example of the limitation of the central node selection (CNS) algorithm.
Figure 2. Workflow of the CES algorithm.
Figure 3. An example of a CES result.
Figure 4. An example of the first pruning strategy.
Figure 5. An example of the second pruning strategy.
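Figures 4 and 5 illustrate two pruning strategies for overlapping nodes, but their rules are not given on this page. As a hedged illustration of what overlapping node pruning can look like, the sketch below applies one hypothetical rule: an overlapping node keeps a community only if at least `min_share` of its neighbours also belong to it. The function name `prune_overlaps` and the threshold value are assumptions, not the paper's ONP strategy.

```python
def prune_overlaps(adj, membership, min_share=0.3):
    """Hypothetical ONP-style rule, not the paper's: keep a community for an
    overlapping node only if >= min_share of its neighbours are also in it.
    Neighbour memberships are read from the unpruned input, and every
    neighbour is assumed to appear in `membership`."""
    pruned = {}
    for node, labels in membership.items():
        neighbours = adj.get(node, set())
        if len(labels) <= 1 or not neighbours:
            pruned[node] = set(labels)
            continue

        def support(c):
            # Fraction of this node's neighbours that belong to community c.
            return sum(c in membership[w] for w in neighbours) / len(neighbours)

        keep = {c for c in labels if support(c) >= min_share}
        if not keep:
            # Never leave a node without a community: keep its best-supported one.
            keep = {max(sorted(labels), key=support)}
        pruned[node] = keep
    return pruned


# Node 0 sits in communities 0 and 1, but only one of its four neighbours is in 1.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
membership = {0: {0, 1}, 1: {0}, 2: {0}, 3: {0}, 4: {1}}
print(prune_overlaps(adj, membership))  # {0: {0}, 1: {0}, 2: {0}, 3: {0}, 4: {1}}
```

A share-based threshold is only one possible design; it removes weakly supported memberships while leaving nodes that genuinely bridge two dense groups overlapping.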
Comparison of the algorithms’ running times (RT(s), in seconds).
| Methods | Karate | Dolphin | Football | | | |
|---|---|---|---|---|---|---|
| CES | 0.02 | | | 1.742 | 58.746 | 863.617 |
| CNS | 0.124 | 0.609 | 2.809 | 67.487 | 1395.1 | 15,780 |
| CPM | | 0.3 | 0.8 | | | |
| LC | 0.636 | 1.841 | 7.331 | 20.988 | 187 | 1682.22 |
RT(s) denotes the running time in seconds; bold numbers mark the best (shortest) RT among all algorithms.
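As a worked reading of the running-time values, the ratio of CNS to CES running time in the three rightmost columns, where both algorithms report a value, is simple arithmetic on the table entries:

```latex
\frac{RT_{\mathrm{CNS}}}{RT_{\mathrm{CES}}}:\qquad
\frac{67.487}{1.742} \approx 38.7,\qquad
\frac{1395.1}{58.746} \approx 23.7,\qquad
\frac{15{,}780}{863.617} \approx 18.3
```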
The evaluation results of four overlapping community detection (OCD) algorithms (CES, CNS, clique percolation method (CPM), and link clustering (LC)) on three benchmark networks (Karate Network, Dolphin Network, and Football Network).
| Dataset | Karate | | | | | Dolphin | | | | | Football | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Evaluation | EQ | NMI | CR | BCN | ECN | EQ | NMI | CR | BCN | ECN | EQ | NMI | CR | BCN | ECN |
| CES | 0.37 | 0.92 | 100% | 2 | 2 | 0.38 | 0.76 | 100% | 2 | 2 | 0.40 | 0.52 | 99% | 12 | 12 |
| CNS | 0.35 | 0.69 | 100% | 2 | 2 | 0.46 | 0.41 | 100% | 2 | 3 | 0.28 | 0.62 | 44% | 12 | 5 |
| CPM | 0.19 | 0.18 | 94% | 2 | 3 | 0.36 | 0.32 | 74% | 2 | 4 | 0.19 | 0.26 | 100% | 12 | 4 |
| LC | 0.17 | 0.06 | 97% | 2 | 12 | 0.18 | 1 × 10⁻¹⁶ | 87% | 2 | 22 | 0.16 | 5.5 × 10⁻¹⁷ | 100% | 12 | 46 |
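EQ and CR are not defined on this page. Assuming EQ is the extended modularity of Shen et al. (2009), a common quality score for overlapping covers, and CR is the share of nodes that belong to at least one community, both could be computed as in this sketch. The function names are hypothetical. For a non-overlapping partition, EQ reduces to ordinary Newman modularity, which gives an easy sanity check.

```python
def extended_modularity(adj, communities):
    """EQ (Shen et al., 2009): modularity in which each pair's contribution is
    divided by O_v * O_w, the number of communities containing v and w."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m for an undirected graph
    overlap = {v: 0 for v in adj}
    for com in communities:
        for v in com:
            overlap[v] += 1
    total = 0.0
    for com in communities:
        for v in com:
            for w in com:
                a_vw = 1.0 if w in adj[v] else 0.0
                expected = len(adj[v]) * len(adj[w]) / two_m
                total += (a_vw - expected) / (overlap[v] * overlap[w])
    return total / two_m


def coverage_rate(adj, communities):
    """Assumed meaning of CR: share of nodes in at least one community."""
    covered = set().union(*communities)
    return len(covered & set(adj)) / len(adj)


# Two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(round(extended_modularity(adj, [{0, 1, 2}, {3, 4, 5}]), 4))  # 0.3571
print(coverage_rate(adj, [{0, 1, 2}]))  # 0.5
```

Here the partition gives 5/14, the same value as standard modularity, because no node is shared between communities.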
Figure 6. The selection of GF.
The visualization of three algorithms’ (CES, CNS, and CPM) results on three benchmark networks (Karate Network, Dolphin Network, and Football Network).
[Image grid: one row per benchmark network; columns CES, CNS, CPM.]
The results of four algorithms (CES, CNS, CPM, and LC) on three Protein–Protein Interaction (PPI) networks (M. musculus Network, E. coli Network, and Cerevisiae).
| Dataset | M. musculus | | | E. coli | | | Cerevisiae | | |
|---|---|---|---|---|---|---|---|---|---|
| Evaluation | EQ | CR | ECN | EQ | CR | ECN | EQ | CR | ECN |
| CES | | | 85 | | | 77 | | | 105 |
| CNS | 0.534 | 65% | 43 | 0.49 | 72% | 18 | 0.438 | 55% | 46 |
| CPM | 0.191 | 18% | 41 | 0.226 | 23% | 19 | 0.467 | 53% | 161 |
| LC | 0.19 | 78% | 149 | 0.10 | 60% | 47 | 0.06 | 92% | 580 |
The bold numbers represent the best result among all algorithms.
Figure 7. Visualization of the predicted PPI network using four algorithms. (A) M. musculus dataset. (B) E. coli dataset. (C) Cerevisiae dataset.
Total number of significant categories (p-value ≤ 0.001) predicted by each algorithm; each entry gives significant categories / total predicted categories.
| Datasets | CES | CNS | CPM | LC |
|---|---|---|---|---|
| M. musculus | 66/85 | 33/43 | 40/41 | 118/149 |
| E. coli | 44/77 | 17/18 | 15/19 | 10/47 |
| Cerevisiae | 79/105 | 44/46 | 159/161 | 344/580 |
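The page does not say which test produced these p-values. A one-sided hypergeometric test (equivalently, a one-sided Fisher exact test) is a common choice for checking whether a predicted category is enriched for a functional annotation, and the sketch below assumes it. `hypergeom_sf`, `count_significant`, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, cluster_size, n_annotated, n_total):
    """P(X >= k): chance that a cluster of cluster_size proteins, drawn from
    n_total proteins of which n_annotated carry an annotation, contains at
    least k annotated proteins."""
    top = min(cluster_size, n_annotated)
    hits = sum(comb(n_annotated, i) * comb(n_total - n_annotated, cluster_size - i)
               for i in range(k, top + 1))
    return hits / comb(n_total, cluster_size)


def count_significant(pvalues, alpha=0.001):
    """Return (significant, total), the format used in the table."""
    return sum(p <= alpha for p in pvalues), len(pvalues)


# Hypothetical cluster: 5 proteins, all sharing an annotation held by 20 of 2000.
p = hypergeom_sf(5, 5, 20, 2000)
print(p <= 0.001)  # True
print(count_significant([p, 0.2, 0.0004]))  # (2, 3)
```

`math.comb` returns 0 when asked to choose more items than exist, so impossible overlaps contribute nothing to the tail sum.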
Figure 8. Overlapping structure of category No. 1 in E. coli.