Mahnaz Habibi, Changiz Eslahchi, Limsoon Wong.
Abstract
BACKGROUND: Protein complexes play an important role in cellular mechanisms. Recently, several methods have been presented to predict protein complexes in a protein interaction network. In these methods, a protein complex is predicted as a dense subgraph of protein interactions. However, interaction data are incomplete, and a protein complex does not have to be a complete or dense subgraph.
Year: 2010 PMID: 20846398 PMCID: PMC2949670 DOI: 10.1186/1752-0509-4-129
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Connectivity of two known complexes. Part (A) contains two known complexes reported by MIPS (MIPS ID: 510.40.10 and 550.1.213). In complex 1, except for one vertex, there are at least two independent paths between every two proteins. In complex 2, except for two vertices, there are at least two independent paths between every two proteins. Part (B) shows two 2-connected subgraphs obtained from the network in Part (A).
Summary statistics of each data set.
| Data set | Proteins | Interactions | Min. Deg | Avg. Deg | Max. Deg |
|---|---|---|---|---|---|
| | 5040 | 27557 | 0 | 10.93 | 318 |
| | 1563 | 6531 | 0 | 8.36 | 81 |
| | 1373 | 3200 | 0 | 4.66 | 52 |
| | 2672 | 7073 | 0 | 5.29 | 140 |
| | 1563 | 3596 | 1 | 4.60 | 62 |
| | 775 | 732 | 0 | 1.8 | 54 |
| | 823 | 823 | 0 | 1.7 | 21 |
Summary statistics of the protein complex data sets for each PPI network.
| PPI | MPC: No. of complexes | MPC: Avg. Size | MPC: Max size | APC: No. of complexes | APC: Avg. Size | APC: Max size |
|---|---|---|---|---|---|---|
| | 651 | 11.94 | 88 | 62 | 9.29 | 34 |
| | 443 | 11.31 | 80 | 53 | 8.84 | 27 |
| | 439 | 11.35 | 88 | 54 | 8.72 | 26 |
| | 531 | 10.89 | 75 | 56 | 8.94 | 31 |
| | 543 | 10.55 | 70 | 30 | 6.60 | 18 |
| | 119 | 5.85 | 20 | 15 | 4.86 | 8 |
| | 355 | 9.15 | 56 | 12 | 6.41 | 14 |
Features of clusters predicted by different algorithms on both the original and Cnetworks.
| CMC | MCL | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg | Avg | Avg | Avg | ||||||||
| Setting | Cluster | Size | Prec | Recall | Cluster | Size | Prec | Recall | |||
| (1) | 295 | 9.58 | 0.210 | 0.124 | 0.78 | 376 | 10.41 | 0.098 | 0.072 | 0.40 | |
| (2) | 296 | 9.36 | 0.75 | 647 | 9.43 | 0.49 | |||||
| (1) | 155 | 10.51 | 0.367 | 0.203 | 0.66 | 160 | 11.72 | 0.471 | 0.194 | 0.50 | |
| (2) | 299 | 9.55 | 0.62 | 327 | 9.40 | 0.57 | |||||
| (1) | 110 | 11.5 | 0.390 | 0.118 | 0.44 | 115 | 9.81 | 0.252 | 0.39 | ||
| (2) | 213 | 9.69 | 0.49 | 373 | 9.63 | 0.479 | 0.40 | ||||
| (1) | 215 | 8.93 | 0.251 | 0.124 | 0.60 | 246 | 8.07 | 0.146 | 0.094 | 0.43 | |
| (2) | 166 | 8.15 | 0.64 | 247 | 7.77 | 0.54 | |||||
| (1) | 121 | 8.17 | 0.206 | 0.057 | 0.32 | 146 | 8.19 | 0.486 | 0.34 | ||
| (2) | 149 | 7.89 | 0.45 | 96 | 7.28 | 0.110 | 0.36 | ||||
| (1) | 174 | 8.73 | 0.253 | 0.109 | 0.51 | 174 | 6.31 | 0.367 | 0.119 | 0.78 | |
| (2) | 341 | 9.27 | 0.60 | 301 | 7.36 | 0.81 | |||||
| (1) | 95 | 9.61 | 0.463 | 0.185 | 0.63 | 105 | 6.41 | 0.381 | 0.126 | 0.74 | |
| (2) | 228 | 9.14 | 0.62 | 295 | 7.33 | 0.72 | |||||
| (1) | 54 | 9.40 | 0.125 | 0.50 | 89 | 5.98 | 0.370 | 0.074 | 0.61 | ||
| (2) | 121 | 9.17 | 0.446 | 0.45 | 158 | 6.81 | 0.59 | ||||
| (1) | 100 | 7.90 | 0.380 | 0.109 | 0.61 | 92 | 6.25 | 0.423 | 0.081 | 0.72 | |
| (2) | 205 | 7.66 | 0.68 | 200 | 6.77 | 0.69 | |||||
| (1) | 42 | 5.59 | 0.285 | 0.040 | 0.29 | 15 | 6.85 | 0.333 | 0.046 | 0.41 | |
| (2) | 51 | 5.00 | 0.37 | 26 | 6.84 | 0.38 | |||||
In the setting column, (1) refers to the original network and (2) refers to the network obtained by segregation according to informative cellular component GO term annotations.
Figure 2. Examples of the clusters predicted on the original and the . Part (A) illustrates the impact of using informative cellular component GO term annotations on the performance of CMC. CMC predicts the unmatched cluster on the original network. This cluster is refined in C to match well with the real complex in MPC. Part (B) shows a seven-member cluster predicted by PCP after the input PPI network is cleansed using informative cellular component GO term annotations.
Figure 3. The frequency distribution of known protein complexes having various densities of protein interactions within them.
The kscore and average density of different PPI networks on MPC.
| Data Set | |||||
|---|---|---|---|---|---|
| Avg Density | 0.41 | 0.29 | 0.21 | 0.20 | 0.25 |
| 1 | 0.995 | 0.929 | 0.970 | 0.870 | 0.983 |
| 2 | 0.784 | 0.868 | 0.758 | 0.678 | 0.748 |
| 3 | 0.537 | 0.521 | 0.494 | 0.351 | 0.446 |
| 4 | 0.374 | 0.318 | 0.397 | 0.254 | 0.232 |
Precision and recall values of different algorithms on each PPI network.
| Method | Data | No. of clusters | APC: matched complexes | APC: Recall/Prec | APC: matched clusters | MPC: matched complexes | MPC: Recall/Prec | MPC: matched clusters |
|---|---|---|---|---|---|---|---|---|
| CFA | (1) | 423 | 52 | | 131 | 119 | | 184 |
| CMC | (1) | 296 | 51 | 0.822 0.293 | 87 | 101 | 0.155 0.361 | 107 |
| MCL | (1) | 647 | 51 | 0.822 0.179 | 116 | 113 | 0.173 0.241 | 156 |
| PCP | (1) | 341 | 50 | 0.806 0.290 | 99 | 103 | 0.158 0.343 | 117 |
| RNSC | (1) | 301 | 52 | 0.838 | 104 | 102 | 0.156 0.425 | 128 |
| CFA | (2) | 324 | 51 | | 148 | 122 | | 176 |
| CMC | (2) | 299 | 50 | 0.943 0.347 | 104 | 106 | 0.239 0.401 | 120 |
| MCL | (2) | 327 | 50 | 0.943 0.422 | 138 | 116 | 0.261 0.486 | 159 |
| PCP | (2) | 228 | 48 | 0.905 0.403 | 92 | 108 | 0.243 0.482 | 110 |
| RNSC | (2) | 295 | 50 | 0.943 0.362 | 107 | 104 | 0.234 0.410 | 121 |
| CFA | (3) | 235 | 49 | | 117 | 119 | | 140 |
| CMC | (3) | 213 | 31 | 0.574 0.347 | 74 | 70 | 0.159 0.417 | 89 |
| MCL | (3) | 373 | 47 | 0.870 0.332 | 124 | 117 | 0.266 0.479 | 179 |
| PCP | (3) | 121 | 28 | 0.518 0.388 | 47 | 62 | 0.141 0.446 | 54 |
| RNSC | (3) | 158 | 25 | 0.463 0.392 | 62 | 58 | 0.132 0.487 | 77 |
| CFA | (4) | 330 | 45 | | 149 | 104 | | 176 |
| CMC | (4) | 166 | 40 | 0.714 0.379 | 63 | 87 | 0.163 0.494 | 82 |
| MCL | (4) | 247 | 45 | 0.803 0.368 | 91 | 98 | 0.184 0.477 | 118 |
| PCP | (4) | 205 | 40 | 0.714 0.400 | 82 | 84 | 0.158 0.458 | 94 |
| RNSC | (4) | 200 | 37 | 0.660 0.430 | 86 | 88 | 0.165 0.510 | 102 |
| CFA | (5) | 120 | 13 | | 20 | 62 | | 50 |
| CMC | (5) | 149 | 6 | 0.200 0.060 | 9 | 56 | 0.103 0.335 | 50 |
| MCL | (5) | 96 | 12 | 0.400 0.250 | 24 | 60 | 0.110 0.500 | 48 |
| PCP | (5) | 51 | 4 | 0.133 0.098 | 5 | 33 | 0.060 0.372 | 19 |
| RNSC | (5) | 26 | 5 | 0.166 | 13 | 21 | 0.038 | 19 |
| CFA | (6) | 45 | 3 | | 4 | 15 | | 12 |
| CMC | (6) | 9 | 0 | 0.000 0.000 | 0 | 1 | 0.008 0.111 | 1 |
| MCL | (6) | 65 | 3 | 0.200 0.076 | 5 | 15 | 0.126 0.230 | 15 |
| PCP | (6) | 8 | 0 | 0.000 0.000 | 0 | 1 | 0.008 0.125 | 1 |
| RNSC | (6) | 11 | 0 | 0.000 0.000 | 0 | 1 | 0.008 | 6 |
The "data sets" column refers to networks, where (1) denotes PPI, (2) denotes PPI, (3) denotes PPI, (4) denotes PPI, (5) denotes PPI, and (6) denotes PPI. The best precision and recall values for each PPI network are highlighted in bold font.
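The recall and precision entries in the table above appear to follow directly from the match counts: recall is the number of matched reference complexes divided by the number of reference complexes, and precision is the number of matched clusters divided by the number of predicted clusters. The sketch below checks this reading on the CMC row for data set (1). It assumes that the first row of the complex summary table (651 MPC and 62 APC complexes) belongs to data set (1), since the network labels did not survive; the function name is hypothetical.

```python
def precision_recall(matched_clusters, predicted_clusters,
                     matched_complexes, reference_complexes):
    """Return (precision, recall) from raw match counts."""
    precision = matched_clusters / predicted_clusters
    recall = matched_complexes / reference_complexes
    return precision, recall


# CMC on data set (1): 296 predicted clusters.
# Assumption: data set (1) has 62 APC and 651 MPC reference complexes.
apc_prec, apc_rec = precision_recall(87, 296, 51, 62)
mpc_prec, mpc_rec = precision_recall(107, 296, 101, 651)

print(f"APC: recall={apc_rec:.4f} precision={apc_prec:.4f}")
print(f"MPC: recall={mpc_rec:.4f} precision={mpc_prec:.4f}")
```

The computed values agree with the tabulated 0.822/0.293 (APC) and 0.155/0.361 (MPC) to within the last digit; the table seems to truncate rather than round.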
Detailed breakdown of predicted clusters by different algorithms with respect to APC and MPC reference protein complexes.
| Method | \|A\| | \|B\| | \|A ∪ B\| | \|A − B\| | \|B − A\| | No. of Clusters | Precision |
|---|---|---|---|---|---|---|---|
| CFA | 184 | 131 | 208 | | | 423 | 0.492 |
| CMC | 107 | 87 | 125 | 38 | 18 | 296 | 0.422 |
| MCL | 156 | 116 | 177 | 61 | 21 | 647 | 0.274 |
| PCP | 117 | 99 | 140 | 41 | 23 | 341 | 0.411 |
| RNSC | 128 | 104 | 151 | 47 | 23 | 301 | 0.502 |
Here A is the set of matched clusters on MPC and B is the set of matched clusters on APC. The table shows the number of clusters matched on MPC or APC (A ∪ B), on MPC exclusively (A − B), and on APC exclusively (B − A).
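The counts in this breakdown are tied together by simple set arithmetic, and the precision column is consistent with dividing the size of A ∪ B by the number of predicted clusters. A minimal sketch, using the CMC row as input (the function and key names are hypothetical):

```python
def breakdown(n_a, n_b, n_union, n_clusters):
    """Derive exclusive counts and precision from |A|, |B| and |A ∪ B|."""
    return {
        "only_mpc": n_union - n_b,        # |A − B|
        "only_apc": n_union - n_a,        # |B − A|
        "both": n_a + n_b - n_union,      # |A ∩ B|, by inclusion-exclusion
        "precision": n_union / n_clusters,
    }


# CMC row: |A| = 107, |B| = 87, |A ∪ B| = 125, 296 predicted clusters.
cmc = breakdown(107, 87, 125, 296)
print(cmc)
```

For CMC this gives 38 clusters matched only on MPC, 18 matched only on APC, and a precision of about 0.422, in line with the tabulated row.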
Figure 4. The Venn diagram of matched complexes. A Venn diagram of the combined set of complexes in APC and MPC that are correctly predicted by CFA, CMC and RNSC based on PPI network.
Precision and recall values after removing highly overlapping clusters.
| Method | Data Set | No. of Clusters | Recall | Prec | F-measure | Recall | Prec | F-measure |
|---|---|---|---|---|---|---|---|---|
| CFA | (1) | 238 | ||||||
| CMC | (1) | 208 | 0.741 | 0.235 | 0.358 | 0.145 | 0.322 | 0.200 |
| MCL | (1) | 467 | 0.790 | 0.113 | 0.199 | 0.147 | 0.164 | 0.155 |
| PCP | (1) | 230 | 0.758 | 0.226 | 0.348 | 0.133 | 0.282 | 0.181 |
| RNSC | (1) | 186 | 0.809 | 0.274 | 0.409 | 0.150 | 0.365 | 0.213 |
| CFA | (2) | 164 | ||||||
| CMC | (2) | 197 | 0.274 | 0.423 | 0.214 | 0.355 | 0.267 | |
| MCL | (2) | 191 | 0.905 | 0.272 | 0.419 | 0.221 | 0.356 | 0.272 |
| PCP | (2) | 144 | 0.811 | 0.319 | 0.458 | 0.214 | 0.416 | 0.283 |
| RNSC | (2) | 152 | 0.348 | 0.506 | 0.205 | 0.263 | 0.230 | |
| CFA | (3) | 124 | 0.475 | |||||
| CMC | (3) | 122 | 0.500 | 0.237 | 0.322 | 0.123 | 0.295 | 0.173 |
| MCL | (3) | 215 | 0.851 | 0.237 | 0.371 | 0.248 | 0.395 | 0.305 |
| PCP | (3) | 82 | 0.481 | 0.485 | 0.116 | 0.365 | 0.176 | |
| RNSC | (3) | 90 | 0.425 | 0.255 | 0.320 | 0.120 | 0.377 | 0.183 |
| CFA | (4) | 169 | 0.337 | 0.469 | ||||
| CMC | (4) | 120 | 0.714 | 0.341 | 0.462 | 0.158 | 0.450 | 0.234 |
| MCL | (4) | 150 | 0.293 | 0.425 | 0.169 | 0.386 | 0.235 | |
| PCP | (4) | 130 | 0.678 | 0.300 | 0.416 | 0.133 | 0.369 | 0.196 |
| RNSC | (4) | 108 | 0.660 | 0.160 | 0.500 | 0.242 | ||
| CFA | (5) | 96 | 0.156 | 0.225 | 0.385 | 0.165 | ||
| CMC | (5) | 109 | 0.166 | 0.045 | 0.072 | 0.073 | 0.247 | 0.113 |
| MCL | (5) | 71 | 0.169 | |||||
| PCP | (5) | 43 | 0.133 | 0.093 | 0.110 | 0.055 | 0.325 | 0.094 |
| RNSC | (5) | 16 | 0.166 | 0.217 | 0.023 | 0.562 | 0.045 | |
| CFA | (6) | 41 | 0.049 | 0.071 | 0.195 | |||
| CMC | (6) | 8 | 0.000 | 0.000 | 0.000 | 0.008 | 0.125 | 0.015 |
| MCL | (6) | 52 | 0.117 | 0.192 | 0.145 | |||
| PCP | (6) | 8 | 0.000 | 0.000 | 0.000 | 0.008 | 0.125 | 0.015 |
| RNSC | (6) | 5 | 0.000 | 0.000 | 0.000 | 0.008 | 0.016 | |
The "data sets" column refers to networks, where (1) denotes PPI, (2) denotes PPI, (3) denotes PPI, (4) denotes PPI, (5) denotes PPI, and (6) denotes PPI. The best precision and recall values for each PPI network are highlighted in bold font.
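The F-measure column is not defined in this excerpt. Assuming the usual harmonic mean of precision and recall, the sketch below reproduces the first recall/precision/F-measure triple of the RNSC and PCP rows for data set (1) to within rounding; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoids division by zero when nothing is matched
    return 2 * precision * recall / (precision + recall)


# First triple of the RNSC and PCP rows for data set (1).
rnsc_f = f_measure(precision=0.274, recall=0.809)
pcp_f = f_measure(precision=0.226, recall=0.758)
print(f"RNSC: {rnsc_f:.4f}  PCP: {pcp_f:.4f}")
```

Because the inputs are themselves rounded table entries, small differences in the last digit are expected for other rows.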
Figure 5. Examples of matched and unmatched clusters. Clusters 1 and 2 are matched clusters of different densities predicted by CFA. The remaining example is an unmatched cluster predicted by CFA whose proteins share the same specific GO annotation (GO: 0015031; protein transport).
Precision and recall values of maximal k-connected (k ≥ 1) subgraphs, C1, C2, ..., C9, and their union U.
| Data | Prec | Recall | Prec | Recall | Prec | Recall | Prec | Recall | Prec | Recall |
|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 0.356 | 0.163 | 0.486 | 0.248 | 0.685 | 0.252 | 0.537 | 0.184 | 0.423 | 0.112 |
| C2 | 0.380 | 0.149 | 0.497 | 0.241 | 0.535 | 0.161 | 0.462 | 0.184 | 0.461 | 0.058 |
| C3 | 0.516 | 0.150 | 0.597 | 0.187 | 0.523 | 0.102 | 0.549 | 0.150 | 0.555 | 0.023 |
| C4 | 0.631 | 0.112 | 0.666 | 0.37 | 0.520 | 0.045 | 0.709 | 0.090 | 0.000 | 0.000 |
| C5 | 0.615 | 0.094 | 0.666 | 0.070 | 0.538 | 0.022 | 0.720 | 0.065 | -- | -- |
| C6 | 0.614 | 0.059 | 0.562 | 0.049 | 0.600 | 0.013 | 0.645 | 0.045 | -- | -- |
| C7 | 0.561 | 0.043 | 0.800 | 0.024 | 0.500 | 0.002 | 0.608 | 0.037 | -- | -- |
| C8 | 0.680 | 0.0353 | 0.714 | 0.018 | 1.000 | 0.002 | 0.666 | 0.033 | -- | -- |
| C9 | 0.880 | 0.0276 | 0.000 | 0.000 | -- | -- | -- | -- | -- | -- |
| U | 0.435 | 0.182 | 0.543 | 0.275 | 0.595 | 0.271 | 0.533 | 0.195 | 0.416 | 0.114 |
Figure 6. The F-measure graphs of the five methods as the threshold on matching scores is varied, for (A) PPI, (B) PPI, (C) PPI, and (D) PPI.
Pseudocode of CFA
| Step1:// Find maximal k-connected subgraphs |
| Input: Graph |
| Output: All vertices in |
| The reduced graph is returned. |
| Input: Connected graph |
| Output: Fragment the graph |
| Find some |
| call |
| Input: Graph |
| Output: |
| Input: Graph |
| Output: Maximal k-connected subgraphs in |
| Set |
| Set |
| Increment |
| Set |
| Set |
| Set |
| Remove all subgraphs of size less than 4 in the set |
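The pseudocode table above lost its symbols in extraction, so the following is only a rough, brute-force reading of its visible steps for a single value of k: repeatedly drop vertices of degree below k, look for a vertex cut with fewer than k vertices, split the graph along that cut and recurse, then discard subgraphs with fewer than 4 vertices. All function names are hypothetical, the exhaustive cut search is a stand-in chosen for brevity, and this is not the authors' implementation.

```python
from itertools import combinations


def from_edges(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induced(adj, keep):
    """Subgraph induced by the vertex set `keep`."""
    return {v: adj[v] & keep for v in keep}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def prune_low_degree(adj, k):
    """Repeatedly remove vertices whose degree is below k."""
    adj = {v: set(ns) for v, ns in adj.items()}
    while True:
        low = {v for v, ns in adj.items() if len(ns) < k}
        if not low:
            return adj
        adj = induced(adj, set(adj) - low)


def find_small_cut(adj, k):
    """Brute-force search for a disconnecting vertex set of size < k."""
    for size in range(k):
        for cut in combinations(sorted(adj), size):
            rest = set(adj) - set(cut)
            if rest and len(components(induced(adj, rest))) > 1:
                return set(cut)
    return None


def maximal_k_connected(adj, k, min_size=4):
    """Vertex sets of maximal k-connected subgraphs with >= min_size vertices."""
    adj = prune_low_degree(adj, k)
    if len(adj) <= k:  # a k-connected graph needs more than k vertices
        return []
    cut = find_small_cut(adj, k)
    if cut is None:
        return [set(adj)] if len(adj) >= min_size else []
    found = []
    for comp in components(induced(adj, set(adj) - cut)):
        found += maximal_k_connected(induced(adj, comp | cut), k, min_size)
    return found


# Toy network: two 4-cliques sharing vertex "d", plus a pendant vertex "h".
clique = lambda vs: list(combinations(vs, 2))
graph = from_edges(clique("abcd") + clique("defg") + [("g", "h")])
print(maximal_k_connected(graph, 2))
```

On this toy network the call with k = 2 separates the two cliques at the shared vertex and drops the pendant vertex. The cut search is exponential in k, so a practical implementation would replace it with a flow-based vertex-connectivity routine.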
Optimal parameters for CMC, MCL, PCP and RNSC algorithms.
| Algorithm | Parameter | Optimal value |
|---|---|---|
| MCL | Inflation | 1.8 |
| CMC | Min-deg-ratio | 1 |
| | Overlap-threshold | 0.5 |
| | Merge-threshold | 0.25 |
| | Min-size | 4 |
| PCP | FSWeight-threshold | 0.4 |
| | Min clique size | 4 |
| | Overlap-threshold | 0.5 |
| RNSC | Diversification frequency | 50 |
| | Tabu length | 50 |
| | Number of experiments | 3 |
| | Scaled stopping tolerance | 15 |
| | Shuffling diversification length | 9 |