Meghana Venkata Palukuri, Edward M Marcotte.
Abstract
Characterization of protein complexes, i.e., sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells, such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms are unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, of which 234 are linked to SARS-CoV-2, the virus that causes COVID-19; 111 uncharacterized proteins are present in 103 of the learned complexes. Super.Complex is generalizable, and its results can be improved by incorporating domain-specific features.
Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.
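The search step described in the abstract can be illustrated with a minimal Python sketch. It is not the Super.Complex implementation: the learned AutoML fitness function is replaced by a hypothetical stand-in (`mean_internal_weight`), and `build_graph`, `grow_community`, and the toy network are names and data assumed purely for illustration. The ϵ-greedy rule mirrors the search strategy named in the parameter table; the annealing variants are omitted.

```python
import random


def build_graph(edges):
    """Build a symmetric adjacency dict {u: {v: weight}} from (u, v, w) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def mean_internal_weight(nodes, graph):
    """Hypothetical stand-in fitness: total internal edge weight per node.

    Super.Complex learns its fitness function from known complexes; this
    fixed formula is only an assumption that keeps the sketch runnable.
    """
    total = sum(w for u in nodes for v, w in graph[u].items() if v in nodes)
    return (total / 2) / len(nodes)  # each edge is seen from both endpoints


def grow_community(graph, seed, fitness, epsilon=0.01, max_steps=20, rng=None):
    """Grow a candidate community from one seed node with an epsilon-greedy rule."""
    rng = rng or random.Random(0)
    community = {seed}
    score = fitness(community, graph)
    for _ in range(max_steps):
        # Candidate additions: neighbors of the community not yet in it.
        frontier = sorted({v for u in community for v in graph[u]} - community)
        if not frontier:
            break
        if rng.random() < epsilon:
            # Exploration: accept a random neighbor regardless of its score.
            candidate = rng.choice(frontier)
            new_score = fitness(community | {candidate}, graph)
        else:
            # Exploitation: take the best-scoring neighbor, stop if none improves.
            new_score, candidate = max(
                (fitness(community | {v}, graph), v) for v in frontier
            )
            if new_score <= score:
                break
        community.add(candidate)
        score = new_score
    return community, score


# Toy weighted network: a tight triangle a-b-c with a weak link to d.
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0), ("c", "d", 0.25)]
graph = build_graph(edges)
community, score = grow_community(graph, "a", mean_internal_weight, epsilon=0.0)
print(sorted(community), score)
```

With `epsilon=0.0` the search is purely greedy, so the toy run is deterministic: it keeps the triangle and stops before adding the weakly attached node. In a parallel setting, one such search could be started from each seed independently, with overlapping results merged afterwards.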
Year: 2021 PMID: 34972161 PMCID: PMC8719692 DOI: 10.1371/journal.pone.0262056
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Best parameters found and used in each of the experiments.
| PPI Network: | hu.MAP | Yeast | Yeast | Yeast |
|---|---|---|---|---|
| | train: CORUM, test: CORUM (independent) | 1. train: TAP, test: MIPS | 2. train: MIPS, test: TAP | 3. train: MIPS, test: MIPS |
| | All nodes | All nodes | All nodes | All nodes |
| | 10x positives | 1.1x positives | 1.1x positives | 1.1x positives |
| | Uniform | Uniform | Uniform | Uniform |
| | ϵ-greedy + iterative simulated annealing | ϵ-greedy + iterative simulated annealing | ϵ-greedy + pseudo-metropolis | ϵ-greedy + iterative simulated annealing |
| | 0.01 | 0.01 | 0.01 | 0.01 |
| | T0 = 1.75 and α = 0.005 | T0 = 0.88 and α = 1.8 | Probability p = 0.1 | T0 = 0.88 and α = 1.8 |
| | 20 | 4 | 9 | 10 |
| | All neighbors | All neighbors | All neighbors | All neighbors |
| | Qi overlap measure = 0.375 | Qi overlap measure = 0.1 | Qi overlap measure = 0.3 | Qi overlap measure = 0.9 |
Evaluating learned complexes on hu.MAP with respect to ‘refined CORUM’ complexes.
| Method | FMM Precision | FMM Recall | FMM F1 score | CMF F1 score | Unbiased Sn-PPV accuracy | Qi et al. F1 score (t = 0.5) | F-Grand k-Clique | F-weighted k-Clique |
|---|---|---|---|---|---|---|---|---|
| Super.Complex | | 0.534 | | 0.783 | 0.888 | 0.739 | | |
| hu.MAP (ClusterONE + MCL) | 0.471 | | 0.559 | | | | 0.77 | 0.967 |
Refined CORUM comprises 188 complexes after cleaning original CORUM complexes.
Comparing our method with 6 supervised and 4 unsupervised methods on a yeast PPI network.
| Method | Train | Test | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| Super.Complex | TAP | MIPS | | 0.629 | |
| ClusterSS | TAP | MIPS | 0.526 | 0.807 | 0.636 |
| ClusterEPs | TAP | MIPS | 0.606 | 0.664 | 0.633 |
| RM | TAP | MIPS | 0.489 | 0.525 | 0.506 |
| SCI-BN | TAP | MIPS | 0.219 | 0.537 | 0.312 |
| SCI-SVM | TAP | MIPS | 0.176 | 0.379 | 0.240 |
| ClusterONE | | MIPS | 0.428 | 0.435 | 0.431 |
| COACH | | MIPS | 0.364 | 0.495 | 0.419 |
| CMC | | MIPS | 0.46 | 0.38 | 0.416 |
| MCODE | | MIPS | 0.4 | 0.1 | 0.16 |
| Super.Complex | MIPS | TAP | | 0.581 | |
| ClusterSS | MIPS | TAP | 0.477 | 0.864 | 0.614 |
| ClusterEPs | MIPS | TAP | 0.424 | 0.782 | 0.548 |
| RM | MIPS | TAP | 0.424 | 0.433 | 0.429 |
| SCI-BN | MIPS | TAP | 0.312 | 0.489 | 0.381 |
| SCI-SVM | MIPS | TAP | 0.247 | 0.377 | 0.298 |
| ClusterONE | | TAP | 0.480 | 0.46 | 0.47 |
| COACH | | TAP | 0.387 | 0.533 | 0.449 |
| CMC | | TAP | 0.447 | 0.353 | 0.395 |
| MCODE | | TAP | 0.422 | 0.127 | 0.195 |
| Super.Complex | MIPS | MIPS | | | |
| NN | MIPS | MIPS | 0.333 | 0.491 | 0.397 |
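Precision, recall, and F-measure are computed as defined by Qi et al. Parameters for each of the Super.Complex experiments are given in the table of best parameters ("Best parameters found and used in each of the experiments").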
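The precision, recall, and F-measure columns can be read with the following minimal Python sketch. It assumes the matching criterion usually attributed to Qi et al.: a predicted and a known complex match when their shared proteins exceed a fraction t of each set (t = 0.5 here). The function names and the toy complexes are hypothetical, and the values in the table were not produced by this code.

```python
def is_match(predicted, known, t=0.5):
    """Assumed Qi et al.-style criterion: overlap exceeds t of both sets."""
    common = len(predicted & known)
    return common / len(predicted) > t and common / len(known) > t


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted_sets, known_sets, t=0.5):
    """Precision: fraction of predicted complexes matching some known complex.
    Recall: fraction of known complexes matched by some predicted complex."""
    matched_predicted = sum(
        any(is_match(p, k, t) for k in known_sets) for p in predicted_sets
    )
    matched_known = sum(
        any(is_match(p, k, t) for p in predicted_sets) for k in known_sets
    )
    precision = matched_predicted / len(predicted_sets)
    recall = matched_known / len(known_sets)
    return precision, recall, f_measure(precision, recall)


# Toy data (hypothetical protein identifiers, not taken from TAP or MIPS).
predicted_sets = [{"a", "b", "c"}, {"d", "e"}, {"x", "y"}]
known_sets = [{"a", "b", "c", "d"}, {"d", "e", "f"}, {"g", "h"}]
precision, recall, f1 = evaluate(predicted_sets, known_sets)
print(round(precision, 3), round(recall, 3), round(f1, 3))

# The F-measure column is the harmonic mean of the two preceding columns,
# e.g. for the RM row (train: TAP, test: MIPS):
print(round(f_measure(0.489, 0.525), 3))
```

Because the tabulated precision and recall are themselves rounded, recomputing the F-measure from them can differ from the tabulated value in the last digit for some rows.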