Xiwei Tang, Jianxin Wang, Min Li, Yiming He, Yi Pan.
Abstract
Most biological processes are carried out by protein complexes. Protein-protein interaction (PPI) data contain a substantial number of false positives, which can compromise the utility of the datasets for complex reconstruction. To reduce the impact of such errors, a number of data integration and affinity scoring schemes have been devised. These methods encode the reliability (confidence) of the physical interaction between each pair of proteins. The challenge now is to identify novel and meaningful protein complexes from the resulting weighted PPI network. To address this problem, a novel protein complex mining algorithm, ClusterBFS (Cluster with Breadth-First Search), is proposed. Starting from a given seed protein, ClusterBFS grows a cluster by breadth-first search and uses the weighted density of the cluster to decide which proteins to keep. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in identifying protein complexes.
Year: 2014 PMID: 24818139 PMCID: PMC4003846 DOI: 10.1155/2014/354539
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Example illustrating the clustering process. The example network has 12 vertices, and every edge carries a confidence weight. Suppose the weighted density threshold is Td = 0.2. Vertex 0 is taken as the seed protein, and the initial cluster {0} is constructed. In the first step of the breadth-first search, vertex 1 has the highest edge weight (0.75) among the neighbors of vertex 0. Vertex 1 is added, and the cluster {0, 1} has a weighted density of 0.75, which exceeds the threshold of 0.2. Similarly, vertices 2, 3, 4, and 5 are added in sequence, and the cluster {0, 1, 2, 3, 4, 5} has a weighted density of 0.23, still above the threshold. Next, the neighbors of vertex 4 are considered. Of these, vertex 6 has the highest edge weight (0.52) and is added to the cluster. However, the weighted density of the cluster {0, 1, 2, 3, 4, 5, 6} is 0.19, below the threshold, so vertex 6 is removed and the neighbor of vertex 3 is examined. Because the weight of the edge between vertex 3 and its neighbor, vertex 9, is 0.51, which is less than 0.52, vertex 9 is not added to the cluster. When the neighbors of vertex 2 are checked, vertex 10 is added, but the weighted density of the cluster {0, 1, 2, 3, 4, 5, 10} is less than 0.2, so vertex 10 is removed. Likewise, vertex 11 is not added. The extension then stops, and the final cluster {0, 1, 2, 3, 4, 5} is output. For simplicity, the elimination of redundant clusters is not shown in this figure.
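The expansion walked through in Figure 1 can be approximated by the following minimal sketch. It is not the published ClusterBFS pseudocode: the weighted density formula (sum of internal edge weights divided by the number of possible vertex pairs) is an assumption that happens to agree with the value 0.75 for the cluster {0, 1}, and the rule that a rejected vertex is never retried is a simplification. The names `weighted_density`, `grow_cluster`, `toy_weights`, and `toy_adjacency`, as well as the toy edge weights, are hypothetical.

```python
from collections import deque
from itertools import combinations


def weighted_density(cluster, weights):
    """Assumed definition: internal edge weight sum / number of vertex pairs."""
    n = len(cluster)
    if n < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(cluster, 2))
    return total / (n * (n - 1) / 2)


def grow_cluster(seed, adjacency, weights, td):
    """Grow a cluster breadth-first from `seed`, keeping weighted density >= td."""
    cluster = [seed]
    rejected = set()  # simplification: a rejected vertex is never retried
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        # Try the neighbours of v from the heaviest edge to the lightest.
        candidates = sorted(
            (u for u in adjacency[v] if u not in cluster and u not in rejected),
            key=lambda u: weights[frozenset((v, u))],
            reverse=True,
        )
        for u in candidates:
            if weighted_density(cluster + [u], weights) >= td:
                cluster.append(u)
                queue.append(u)
            else:
                rejected.add(u)
    return cluster


# Hypothetical 4-vertex network (not the 12-vertex network of Figure 1).
toy_weights = {
    frozenset((0, 1)): 0.75,
    frozenset((0, 2)): 0.60,
    frozenset((1, 2)): 0.70,
    frozenset((2, 3)): 0.10,
}
toy_adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
toy_cluster = grow_cluster(0, toy_adjacency, toy_weights, td=0.5)
```

In this toy run, vertex 3 is rejected because adding it would lower the density of the four-vertex cluster below the threshold, which mirrors how vertices 6 and 10 are removed in the figure.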
Algorithm 1. ClusterBFS algorithm.
Algorithm 2. Breadth-first search: BFS(G, v).
Algorithm 3. Redundancy-filtering (C).
Figure 2. Example illustrating redundancy filtering. Complex B and Complex C contain 14 and 10 proteins, respectively. They share 4 proteins: A, B, C, and D.
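How strongly two complexes such as those in Figure 2 overlap can be quantified with an overlap score. The sketch below assumes the commonly used definition OS(A, B) = |A ∩ B|² / (|A| · |B|); this text does not state the exact formula or the threshold used by the redundancy filter, so both should be checked against the full paper. The function name `overlap_score` and the placeholder protein names are hypothetical.

```python
def overlap_score(a, b):
    """Assumed overlap score: |A & B|^2 / (|A| * |B|), in the range [0, 1]."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


# Placeholder proteins reproducing the sizes in Figure 2:
# 14 and 10 members with 4 proteins in common.
complex_b = {f"p{i}" for i in range(14)}      # p0 .. p13
complex_c = {f"p{i}" for i in range(10, 20)}  # p10 .. p19, shares p10 .. p13
score = overlap_score(complex_b, complex_c)   # equals 16 / 140 under the assumed formula
```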
Figure 3. Example illustrating the maximum matching ratio. R1 and R2 are real complexes, while P1, P2, and P3 are three predictions. An edge connects a reference complex and a predicted complex if their overlap score is larger than zero. The maximum matching is shown by the thick edges. Note that P2 was not matched to R1 because P1 provides a better match with R1. The maximum matching ratio in this example is (0.8 + 0.75)/2 = 0.775.
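The arithmetic in Figure 3 can be reproduced with a small brute-force sketch of the maximum matching ratio: pick a one-to-one matching between reference and predicted complexes that maximizes the summed overlap score, then divide by the number of reference complexes. The function name `max_matching_ratio` is hypothetical, and the R1-P2 score of 0.5 is an invented placeholder because the caption only gives the two matched scores. The exhaustive search is exponential and only suitable for tiny examples.

```python
def max_matching_ratio(scores, n_reference):
    """scores maps (reference_index, prediction_index) to an overlap score > 0."""

    def best(ref, used):
        if ref == n_reference:
            return 0.0
        # Option 1: leave this reference complex unmatched.
        top = best(ref + 1, used)
        # Option 2: match it to any prediction that is still unused.
        for (r, p), s in scores.items():
            if r == ref and p not in used:
                top = max(top, s + best(ref + 1, used | {p}))
        return top

    return best(0, frozenset()) / n_reference


# References R1, R2 are indices 0, 1; predictions P1, P2, P3 are indices 0, 1, 2.
figure3_scores = {
    (0, 0): 0.80,  # R1-P1, from the caption
    (0, 1): 0.50,  # R1-P2, hypothetical value lower than R1-P1
    (1, 2): 0.75,  # R2-P3, from the caption
}
mmr = max_matching_ratio(figure3_scores, n_reference=2)
```

For realistic dataset sizes, a polynomial-time maximum weight bipartite matching routine would replace the recursion.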
Comparison of the number of predictions matching at least one known complex.
| | ClusterBFS | MCL | ClusterONE | HC-PIN | SPICi | MCODE |
|---|---|---|---|---|---|---|
| OS ≥ 0.0 | 300 | 203 | 281 | 156 | 111 | |
| OS ≥ 0.1 | 199 | 148 | 187 | 123 | 98 | |
| OS ≥ 0.2 | 182 | 131 | 170 | 114 | 91 | |
| OS ≥ 0.3 | 169 | 123 | 159 | 111 | 88 | |
| OS ≥ 0.4 | 161 | 116 | 150 | 107 | 81 | |
| OS ≥ 0.5 | 146 | 98 | 139 | 95 | 74 | |
| OS ≥ 0.6 | 135 | 87 | 129 | 83 | 64 | |
| OS ≥ 0.7 | 106 | 69 | 97 | 65 | 52 | |
| OS ≥ 0.8 | 93 | 51 | 86 | 51 | 42 | |
| OS ≥ 0.9 | 77 | 38 | 75 | 39 | 34 | |
| OS = 1.0 | 74 | 33 | 70 | 33 | 30 | |
Comparison of the number of real complexes matching at least one detected complex.
| | ClusterBFS | MCL | ClusterONE | HC-PIN | SPICi | MCODE |
|---|---|---|---|---|---|---|
| OS ≥ 0.0 | 408 | 408 | 408 | 408 | 408 | |
| OS ≥ 0.1 | 256 | 195 | 244 | 176 | 142 | |
| OS ≥ 0.2 | 227 | 164 | 209 | 142 | 113 | |
| OS ≥ 0.3 | 196 | 149 | 185 | 129 | 103 | |
| OS ≥ 0.4 | 181 | 137 | 171 | 117 | 92 | |
| OS ≥ 0.5 | 159 | 111 | 150 | 104 | 82 | |
| OS ≥ 0.6 | 145 | 96 | 136 | 88 | 70 | |
| OS ≥ 0.7 | 109 | 73 | 100 | 68 | 54 | |
| OS ≥ 0.8 | 94 | 52 | 87 | 53 | 42 | |
| OS ≥ 0.9 | 77 | 38 | 75 | 39 | 34 | |
| OS = 1.0 | 74 | 33 | 70 | 33 | 30 | |
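The counts in the two tables above are the ingredients of a precision/recall style F-measure, although this text does not spell out the definition. The sketch below assumes the usual one: precision is the fraction of predictions matching at least one known complex, recall is the fraction of real complexes matching at least one detected complex, and the F-measure is their harmonic mean. Reading the OS ≥ 0.0 row as the totals and choosing OS ≥ 0.2 as the matching threshold are both assumptions, and the function name `f_measure` is hypothetical.

```python
def f_measure(matched_predictions, total_predictions,
              matched_references, total_references):
    """Harmonic mean of precision and recall (assumed definition)."""
    precision = matched_predictions / total_predictions
    recall = matched_references / total_references
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First value column of the two tables, assuming the OS >= 0.0 row gives the
# totals (300 predictions, 408 real complexes) and OS >= 0.2 defines a match.
example = f_measure(182, 300, 227, 408)
```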
Figure 4. F-measure and MMR of various algorithms on the Collins dataset.
Figure 5. Co-localization and co-annotation scores of complexes identified by various methods on the Collins dataset.
Figure 6. F-measure and MMR of various methods on Krogan's core dataset.
Figure 7. F-measure and MMR of various methods on Krogan's extended dataset.
Figure 8. The effect of parameter Td: how varying the weighted density threshold Td affects the F-measure of ClusterBFS.