Quanzhong Liu, Jiangning Song, Jinyan Li.
Abstract
Most protein complex detection methods use unsupervised techniques to cluster densely connected nodes in a protein-protein interaction (PPI) network, even though many true complexes are not dense subgraphs. Supervised methods have been proposed recently, but they do not explain why a group of proteins is predicted to be a complex, and they have not investigated how to detect new complexes in one species by training the model on the PPI data of another species. We propose a novel supervised method to address these issues. The key idea is to discover emerging patterns (EPs), a type of contrast pattern, which can clearly distinguish true complexes from random subgraphs in a PPI network. An integrative score of EPs is defined to measure how likely a subgraph of proteins is to form a complex. New complexes can thus be grown from seed proteins by iteratively updating this score. The performance of our method is tested on eight benchmark PPI datasets and compared with seven unsupervised methods, two supervised methods, and one semi-supervised method, under five criteria for assessing the quality of the predicted complexes. The results show that in most cases our method achieves better performance, sometimes significantly so.
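
The seed-growing step described in the abstract can be illustrated with a generic greedy expansion. This is only a sketch under stated assumptions: the EP-based integrative score is not specified in this record, so the `cohesiveness` function below is a hypothetical stand-in, and the names `grow_from_seed` and `Graph` are illustrative rather than taken from the paper.

```python
from typing import Callable, Dict, Set

# Hypothetical representation: undirected PPI network as an adjacency dict.
Graph = Dict[str, Set[str]]


def cohesiveness(graph: Graph, cluster: Set[str]) -> float:
    """Stand-in score (NOT the paper's EP score): internal / (internal + boundary) edges."""
    internal = sum(len(graph[p] & cluster) for p in cluster) // 2
    boundary = sum(len(graph[p] - cluster) for p in cluster)
    total = internal + boundary
    return internal / total if total else 0.0


def grow_from_seed(graph: Graph, seed: str,
                   score: Callable[[Graph, Set[str]], float]) -> Set[str]:
    """Greedily add the neighbouring protein that most improves the score."""
    cluster = {seed}
    best = score(graph, cluster)
    while True:
        # Candidates are neighbours of the current cluster not yet inside it.
        frontier = set().union(*(graph[p] for p in cluster)) - cluster
        best_candidate = None
        for protein in sorted(frontier):  # sorted for deterministic tie handling
            candidate_score = score(graph, cluster | {protein})
            if candidate_score > best:
                best, best_candidate = candidate_score, protein
        if best_candidate is None:  # no neighbour improves the score: stop
            return cluster
        cluster.add(best_candidate)


# Toy network: two triangles (A-B-C and D-E-F) joined by the edge C-D.
toy: Graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
result = grow_from_seed(toy, "A", cohesiveness)
print(sorted(result))
```

Passing the score as a callable keeps the expansion loop independent of how a subgraph is scored, so a different scoring function could be substituted without changing the loop.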
Year: 2016 PMID: 26868667 PMCID: PMC4751475 DOI: 10.1038/srep21223
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sparse complexes in the MIPS complex catalogue database.
Examples of emerging patterns from the Collins PPI network.
| Emerging patterns | Frequency (%) in subgraphs representing true complexes | Frequency (%) in random subgraphs |
|---|---|---|
| { | 99.4 | 0 |
| { | 88.6 | 0 |
| { | 2.4 | 85 |
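
The frequencies in the table above can be read through the usual growth-rate definition of an emerging pattern: the ratio of a pattern's support in one class to its support in the other. The sketch below assumes that standard definition; it does not reproduce the paper's integrative score, and the function name `growth_rate` is illustrative.

```python
import math


def growth_rate(freq_target: float, freq_background: float) -> float:
    """Growth rate of a pattern from the background class to the target class."""
    if freq_target == 0 and freq_background == 0:
        return 0.0
    if freq_background == 0:
        return math.inf  # pattern never occurs in the background class
    return freq_target / freq_background


# (frequency % in true complexes, frequency % in random subgraphs) from the table
rows = [(99.4, 0.0), (88.6, 0.0), (2.4, 85.0)]
for complex_freq, random_freq in rows:
    toward_complex = growth_rate(complex_freq, random_freq)
    toward_random = growth_rate(random_freq, complex_freq)
    print(f"complex-favouring: {toward_complex:.2f}, random-favouring: {toward_random:.2f}")
```

Under this reading, the first two patterns characterise true complexes (they never occur in random subgraphs), while the third characterises random subgraphs.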
Performance comparison with SCI-BN and RM.
| Train | Test | Method | Precision | Recall | F1 |
|---|---|---|---|---|---|
| MIPS | TAP06 | 0.399 | |||
| MIPS | TAP06 | SCI-BN | 0.312 | 0.489 | 0.381 |
| MIPS | TAP06 | SCI-SVM | 0.247 | 0.377 | 0.298 |
| MIPS | TAP06 | RM | 0.433 | 0.429 | |
| TAP06 | MIPS | ||||
| TAP06 | MIPS | SCI-BN | 0.219 | 0.537 | 0.312 |
| TAP06 | MIPS | SCI-SVM | 0.176 | 0.379 | 0.240 |
| TAP06 | MIPS | RM | 0.489 | 0.525 | 0.506 |
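
The F1 column is the harmonic mean of precision and recall. A minimal sketch, assuming the standard definition; the sample rows are taken from the table above and reproduce the listed F1 values to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, F1 as listed in the table)
rows = [
    ("SCI-BN", 0.312, 0.489, 0.381),   # train MIPS, test TAP06
    ("SCI-SVM", 0.247, 0.377, 0.298),  # train MIPS, test TAP06
    ("RM", 0.489, 0.525, 0.506),       # train TAP06, test MIPS
]
for method, precision, recall, listed in rows:
    print(method, round(f1_score(precision, recall), 3), listed)
```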
Performance comparison with NN.
| Train | Test | Method | Precision | Recall | F1 |
|---|---|---|---|---|---|
| MIPS | MIPS | ClusterEPs | |||
| MIPS | MIPS | SCI-BN | 0.273 | 0.473 | 0.346 |
| MIPS | MIPS | SCI-SVM | 0.239 | 0.412 | 0.302 |
| MIPS | MIPS | RM | 0.419 | 0.67 | 0.514 |
| MIPS | MIPS | NN | 0.333 | 0.491 | 0.397 |
Performance comparison of eight algorithms tested on five yeast PPI datasets using MIPS as the test set.
| Datasets | Methods | #cluster | Frac | Acc | MMR | Composite score |
|---|---|---|---|---|---|---|
| Gavin | ClusterEPs | 240 | 0.479 | | | |
| | MCL | 252 | 0.681 | 0.331 | 1.515 | |
| | MCODE | 135 | 0.611 | 0.447 | 0.301 | 1.359 |
| | CMC | 339 | 0.663 | 0.452 | 0.347 | 1.462 |
| | ClusterONE | 196 | 0.708 | 0.5 | 0.375 | 1.583 |
| | RNSC | 138 | 0.611 | 0.485 | 0.319 | 1.415 |
| | RRW | 234 | 0.664 | 0.446 | 0.34 | 1.45 |
| | CFinder | 137 | 0.558 | 0.487 | 0.279 | 1.324 |
| Krogan core | ClusterEPs | 364 | 0.420 | | | |
| | MCL | 376 | 0.6 | 0.272 | 1.313 | |
| | MCODE | 79 | 0.326 | 0.334 | 0.144 | 0.804 |
| | CMC | 156 | 0.37 | 0.374 | 0.172 | 0.916 |
| | ClusterONE | 522 | 0.674 | 0.44 | 0.319 | 1.433 |
| | RNSC | 87 | 0.422 | 0.383 | 0.182 | 0.987 |
| | RRW | 329 | 0.504 | 0.361 | 0.248 | 1.113 |
| | CFinder | 115 | 0.341 | 0.369 | 0.166 | 0.876 |
| Krogan extended | ClusterEPs | 561 | 0.386 | | | |
| | MCL | 483 | 0.436 | 0.409 | 0.193 | 1.038 |
| | MCODE | 64 | 0.192 | 0.294 | 0.097 | 0.583 |
| | CMC | 421 | 0.365 | 0.343 | 0.172 | 0.88 |
| | ClusterONE | 530 | 0.577 | 0.284 | 1.283 | |
| | RNSC | 93 | 0.365 | 0.369 | 0.159 | 0.893 |
| | RRW | 232 | 0.468 | 0.354 | 0.222 | 1.044 |
| | CFinder | 121 | 0.218 | 0.315 | 0.107 | 0.64 |
| Collins | ClusterEPs | 171 | 0.705 | 0.528 | 1.654 | |
| | MCL | 183 | 0.739 | 0.537 | 0.397 | 1.673 |
| | MCODE | 112 | 0.652 | 0.499 | 0.351 | 1.502 |
| | CMC | 184 | 0.591 | 0.509 | 0.309 | 1.409 |
| | ClusterONE | 195 | 0.418 | | | |
| | RNSC | 94 | 0.608 | 0.499 | 0.306 | 1.413 |
| | RRW | 190 | 0.678 | 0.446 | 0.378 | 1.502 |
| | CFinder | 114 | 0.557 | 0.495 | 0.312 | 1.364 |
| BioGRID | ClusterEPs | 862 | 0.366 | | | |
| | MCL | 338 | 0.196 | 0.348 | 0.083 | 0.627 |
| | MCODE | 85 | 0.201 | 0.325 | 0.08 | 0.606 |
| | CMC | N/A | N/A | N/A | N/A | N/A |
| | ClusterONE | 473 | 0.466 | 0.195 | 1.101 | |
| | RNSC | 209 | 0.481 | 0.43 | 0.212 | 1.123 |
| | RRW | 253 | 0.402 | 0.348 | 0.176 | 0.926 |
| | CFinder | N/A | N/A | N/A | N/A | N/A |
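
In the fully populated rows of the table above, the composite score equals the sum of Frac, Acc and MMR. This is an observation from the listed numbers rather than a definition stated in this record, so the sketch below treats it as an assumption; `composite_score` is an illustrative name.

```python
def composite_score(frac: float, acc: float, mmr: float) -> float:
    """Assumed composite: plain sum of the three quality measures."""
    return frac + acc + mmr


# (method, Frac, Acc, MMR, composite as listed) for the Gavin dataset, MIPS test set
rows = [
    ("MCODE", 0.611, 0.447, 0.301, 1.359),
    ("CMC", 0.663, 0.452, 0.347, 1.462),
    ("ClusterONE", 0.708, 0.5, 0.375, 1.583),
    ("RNSC", 0.611, 0.485, 0.319, 1.415),
]
for method, frac, acc, mmr, listed in rows:
    print(method, round(composite_score(frac, acc, mmr), 3), listed)
```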
Performance comparison of eight algorithms tested on five yeast PPI datasets using SGD as the test set.
| Datasets | Methods | #cluster | Frac | Acc | MMR | Composite score |
|---|---|---|---|---|---|---|
| Gavin | ClusterEPs | 244 | 0.657 | | | |
| | MCL | 253 | 0.75 | 0.689 | 0.438 | 1.877 |
| | MCODE | 135 | 0.602 | 0.628 | 0.38 | 1.61 |
| | CMC | 339 | 0.726 | 0.612 | 0.466 | 1.804 |
| | ClusterONE | 196 | 0.789 | 0.476 | 1.971 | |
| | RNSC | 143 | 0.648 | 0.684 | 0.398 | 1.73 |
| | RRW | 237 | 0.758 | 0.667 | 0.471 | 1.896 |
| | CFinder | 137 | 0.609 | 0.668 | 0.369 | 1.646 |
| Krogan core | ClusterEPs | 292 | 0.642 | 0.584 | 1.656 | |
| | MCL | 367 | 0.636 | 0.637 | 0.354 | 1.627 |
| | MCODE | 79 | 0.394 | 0.477 | 0.218 | 1.089 |
| | CMC | 156 | 0.394 | 0.516 | 0.232 | 1.142 |
| | ClusterONE | 522 | 0.418 | | | |
| | RNSC | 88 | 0.43 | 0.529 | 0.23 | 1.189 |
| | RRW | 264 | 0.606 | 0.561 | 0.361 | 1.528 |
| | CFinder | 115 | 0.418 | 0.494 | 0.243 | 1.155 |
| Krogan extended | ClusterEPs | 522 | 0.550 | | | |
| | MCL | 517 | 0.492 | 0.594 | 0.253 | 1.339 |
| | MCODE | 64 | 0.278 | 0.422 | 0.147 | 0.847 |
| | CMC | 351 | 0.401 | 0.513 | 0.232 | 1.146 |
| | ClusterONE | 530 | 0.594 | 0.364 | 1.586 | |
| | RNSC | 97 | 0.406 | 0.527 | 0.225 | 1.158 |
| | RRW | 232 | 0.54 | 0.529 | 0.311 | 1.38 |
| | CFinder | 88 | 0.262 | 0.47 | 0.155 | 0.887 |
| Collins | ClusterEPs | 385 | 0.805 | 0.646 | 1.987 | |
| | MCL | 181 | 0.723 | 0.518 | 2.077 | |
| | MCODE | 112 | 0.672 | 0.66 | 0.439 | 1.771 |
| | CMC | 184 | 0.589 | 0.626 | 0.37 | 1.585 |
| | ClusterONE | 195 | 0.828 | 0.532 | | |
| | RNSC | 95 | 0.627 | 0.661 | 0.377 | 1.665 |
| | RRW | 190 | 0.754 | 0.656 | 0.494 | 1.904 |
| | CFinder | 114 | 0.605 | 0.648 | 0.412 | 1.665 |
| BioGRID | ClusterEPs | 780 | 0.539 | | | |
| | MCL | 335 | 0.3 | 0.46 | 0.144 | 0.904 |
| | MCODE | 85 | 0.21 | 0.424 | 0.094 | 0.728 |
| | CMC | N/A | N/A | N/A | N/A | |
| | ClusterONE | 481 | 0.562 | 0.277 | 1.467 | |
| | RNSC | 220 | 0.527 | 0.616 | 0.287 | 1.43 |
| | RRW | 270 | 0.485 | 0.534 | 0.263 | 1.282 |
| | CFinder | N/A | N/A | N/A | N/A | N/A |
Performance of ClusterEPs in the detection of human complexes through the model constructed on the yeast PPI and complexes data.
| Method | Trained yeast PPI | Trained yeast complexes | Human PPI | Precision | Recall | F1 | Acc | MMR |
|---|---|---|---|---|---|---|---|---|
| ClusterEPs | Gavin | MIPS + SGD | HPRD | 0.211 | 0.603 | 0.313 | 0.243 | 0.206 |
| ClusterEPs | Krogan core | MIPS + SGD | HPRD | 0.232 | 0.551 | 0.326 | 0.226 | 0.195 |
| ClusterEPs | Krogan extended | MIPS + SGD | HPRD | 0.217 | 0.586 | 0.317 | 0.238 | 0.207 |
| ClusterEPs | Collins | MIPS + SGD | HPRD | 0.191 | 0.612 | 0.292 | 0.247 | 0.201 |
| ClusterEPs | BioGRID | MIPS + SGD | HPRD | 0.197 | 0.580 | 0.294 | 0.239 | 0.198 |
| ClusterEPs | DIP | MIPS + SGD | HPRD | 0.190 | 0.574 | 0.286 | 0.245 | 0.199 |
| ClusterEPs | CombinedYeast | MIPS + SGD | HPRD | 0.208 | 0.482 | 0.290 | 0.213 | 0.173 |
| ClusterONE | | | HPRD | 0.169 | 0.376 | 0.233 | 0.317 | 0.114 |
| ClusterEPs | CombinedYeast | MIPS + SGD | HSN | 0.121 | 0.429 | 0.189 | 0.248 | 0.139 |
| ClusterONE | | | HSN | 0.057 | 0.128 | 0.079 | 0.308 | 0.035 |
| ClusterEPs | CombinedYeast | MIPS + SGD | HSN + HPRD | 0.115 | 0.448 | 0.183 | 0.217 | 0.170 |
| ClusterONE | | | HSN + HPRD | 0.062 | 0.164 | 0.090 | 0.312 | 0.057 |
| ClusterEPs | CombinedYeast | MIPS + SGD | HSNHPRD_F | 0.196 | 0.504 | 0.282 | 0.237 | 0.171 |
| ClusterONE | | | HSNHPRD_F | 0.125 | 0.242 | 0.165 | 0.332 | 0.074 |
Figure 2. The RNA polymerase I complex as predicted by ClusterONE and ClusterEPs.
Red nodes represent proteins that belong to the true complex; white nodes represent proteins that do not. The shaded area indicates the whole predicted subgraph.
Figure 3. The RecQ helicase-Topo III complex as predicted by ClusterEPs.
Red nodes represent proteins that belong to the true complex; white nodes represent proteins that do not.
Figure 4. The Ubiquitin E3 ligase complex as predicted by ClusterEPs.
Figure 5. Four examples of complexes identified by ClusterEPs.