Abstract
Recent advances in high-throughput laboratory techniques have produced large-scale protein-protein interaction (PPI) data, making it possible to build detailed maps of protein interaction networks and, in turn, to detect protein complexes from these PPI networks. However, most current state-of-the-art methods still have shortcomings: for instance, an inability to identify overlapping clusters, a failure to consider the inherent organization within protein complexes, and neglect of the biological meaning of complexes. We therefore present a novel overlapping protein complex prediction method based on core-attachment structure and function annotations (CFOCM), which works in two stages. First, it detects protein complex cores that maximize our defined cluster closeness function and whose proteins are also closely related through at least one common function. Then it appends attachment proteins to these detected cores to form the returned complexes. For performance evaluation, CFOCM and six classical methods were used to identify protein complexes on three different yeast PPI networks, with three sets of real complexes selected as benchmarks: the Munich Information Center for Protein Sequences (MIPS), the Saccharomyces Genome Database (SGD) and the Catalogues of Yeast protein Complexes (CYC2008). The results show that CFOCM is effective and robust, achieving the highest F-measure values in all tests.
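The two-stage core-attachment procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not define the cluster closeness function, so `closeness` here is a hypothetical stand-in (edge density of the induced subgraph), and the attachment rule (attach a neighbor that interacts with more than half of the core) is likewise an assumption. All function names are hypothetical.

```python
from itertools import combinations


def closeness(nodes, adj):
    """Hypothetical stand-in for CFOCM's cluster closeness function:
    edge density of the subgraph induced by `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return internal / (n * (n - 1) / 2)


def shares_function(nodes, go):
    """True if all proteins in `nodes` carry at least one common GO term."""
    return bool(set.intersection(*(go[p] for p in nodes)))


def grow_core(seed, adj, go):
    """Stage 1 (sketch): greedily add the neighbor with the highest score,
    provided the score does not drop and a common GO term is preserved."""
    core = {seed}
    while True:
        base = closeness(core, adj)
        frontier = set().union(*(adj[p] for p in core)) - core
        best, best_score = None, -1.0
        for cand in sorted(frontier):
            trial = core | {cand}
            if not shares_function(trial, go):
                continue
            score = closeness(trial, adj)
            if score >= base and score > best_score:
                best, best_score = cand, score
        if best is None:
            return core
        core.add(best)


def attach(core, adj):
    """Stage 2 (sketch): attach outside proteins that interact with more
    than half of the core proteins (hypothetical rule)."""
    outside = set().union(*(adj[p] for p in core)) - core
    return core | {p for p in outside if 2 * len(adj[p] & core) > len(core)}


def predict_complexes(adj, go, min_core=3):
    """Seed from every protein (highest degree first). Cores are grown
    independently, so the returned complexes may overlap."""
    complexes, seen = [], set()
    for seed in sorted(adj, key=lambda p: -len(adj[p])):
        core = grow_core(seed, adj, go)
        key = frozenset(core)
        if len(core) < min_core or key in seen:
            continue
        seen.add(key)
        complexes.append(attach(core, adj))
    return complexes


# Toy network: A, B, C, D form a 4-clique sharing GO term g1;
# E touches A, B, C but lacks g1; F hangs off A.
adj = {"A": {"B", "C", "D", "E", "F"}, "B": {"A", "C", "D", "E"},
       "C": {"A", "B", "D", "E"}, "D": {"A", "B", "C"},
       "E": {"A", "B", "C"}, "F": {"A"}}
go = {p: {"g1"} for p in "ABCDF"}
go["E"] = {"g2"}
print([sorted(c) for c in predict_complexes(adj, go)])  # [['A', 'B', 'C', 'D', 'E']]
```

In the toy run, E is excluded from the core because it shares no GO term with the other proteins, yet it is still appended as an attachment because it interacts with three of the four core proteins.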
Keywords: clustering; overlapping; protein–protein interaction network
Year: 2017 PMID: 28878201 PMCID: PMC5618559 DOI: 10.3390/ijms18091910
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The effect of t: how variation of parameter t affects the performance, in terms of F-measure, of our proposed overlapping protein complex prediction method based on core–attachment structure and function annotations (CFOCM).
Figure 2. Comparative performance of CFOCM and the other six methods on the DIP data using the MIPS, SGD and CYC2008 benchmarks, respectively.
Results of various approaches using DIP data.
| Algorithms | MCL | MCODE | RNSC | COER | ClusterONE | COACH | CFOCM |
|---|---|---|---|---|---|---|---|
| # complexes | 4838 | 63 | 543 | 592 | 341 | 746 | 748 |
| # matched predicted (MIPS) | 305 | 31 | 65 | 78 | 69 | 179 | 205 |
| # matched benchmark (MIPS) | 117 | 42 | 96 | 113 | 89 | 134 | 126 |
| # matched predicted (SGD) | 621 | 39 | 106 | 117 | 112 | 231 | 285 |
| # matched benchmark (SGD) | 262 | 53 | 134 | 138 | 121 | 176 | 168 |
| # matched predicted (CYC2008) | 853 | 46 | 134 | 153 | 145 | 311 | 351 |
| # matched benchmark (CYC2008) | 358 | 55 | 149 | 168 | 132 | 215 | 196 |
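The count rows of the DIP table can be turned into precision, recall and F-measure. Comparing the CFOCM column with the CFOCM-versus-unCFOCM table shows that precision is the matched-predicted count divided by the number of predicted complexes (205 / 748 ≈ 0.2741) and recall is the matched-benchmark count divided by the benchmark size. A minimal sketch, assuming benchmark sizes of 203 (MIPS), 323 (SGD) and 408 (CYC2008); these sizes are inferred from the recall values (e.g. 126 / 0.6207 ≈ 203) and are not stated in this record.

```python
# Benchmark sizes are NOT given in this record; they are inferred from the
# recall column of the CFOCM/unCFOCM tables (e.g. 126 / 0.6207 ~ 203).
BENCHMARK_SIZE = {"MIPS": 203, "SGD": 323, "CYC2008": 408}

METHODS = ["MCL", "MCODE", "RNSC", "COER", "ClusterONE", "COACH", "CFOCM"]
N_PREDICTED = [4838, 63, 543, 592, 341, 746, 748]

# DIP data: benchmark -> (matched predicted counts, matched benchmark counts)
MATCHED = {
    "MIPS": ([305, 31, 65, 78, 69, 179, 205], [117, 42, 96, 113, 89, 134, 126]),
    "SGD": ([621, 39, 106, 117, 112, 231, 285], [262, 53, 134, 138, 121, 176, 168]),
    "CYC2008": ([853, 46, 134, 153, 145, 311, 351], [358, 55, 149, 168, 132, 215, 196]),
}


def prf(n_pred, n_cp, n_cb, n_bench):
    """Precision, recall and F-measure (harmonic mean) from match counts."""
    precision = n_cp / n_pred
    recall = n_cb / n_bench
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def score_table():
    """Map (benchmark, method) -> (precision, recall, F-measure)."""
    table = {}
    for bench, (cp_row, cb_row) in MATCHED.items():
        for i, method in enumerate(METHODS):
            table[bench, method] = prf(N_PREDICTED[i], cp_row[i], cb_row[i],
                                       BENCHMARK_SIZE[bench])
    return table


for (bench, method), (p, r, f) in score_table().items():
    print(f"{bench:8s} {method:10s} P={p:.4f} R={r:.4f} F={f:.4f}")
```

For CFOCM this reproduces the F-measures reported in the DIP CFOCM-versus-unCFOCM table (0.3802, 0.4398, 0.4748); the same functions apply to the Gavin and Srihari tables by swapping in their rows.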
Figure 3. Comparative performance of CFOCM and the other six methods on the Gavin data using the MIPS, SGD and CYC2008 benchmarks, respectively.
Results of various approaches using Gavin data.
| Algorithms | MCL | MCODE | RNSC | COER | ClusterONE | COACH | CFOCM |
|---|---|---|---|---|---|---|---|
| # complexes | 232 | 69 | 476 | 267 | 292 | 326 | 453 |
| # matched predicted (MIPS) | 59 | 31 | 22 | 69 | 65 | 106 | 191 |
| # matched benchmark (MIPS) | 96 | 47 | 21 | 98 | 80 | 94 | 91 |
| # matched predicted (SGD) | 86 | 46 | 53 | 101 | 109 | 130 | 250 |
| # matched benchmark (SGD) | 114 | 61 | 55 | 120 | 121 | 118 | 119 |
| # matched predicted (CYC2008) | 115 | 51 | 68 | 130 | 136 | 171 | 305 |
| # matched benchmark (CYC2008) | 142 | 63 | 79 | 148 | 143 | 135 | 131 |
Figure 4. Comparative performance of CFOCM and the other six methods on the Srihari data using the MIPS, SGD and CYC2008 benchmarks, respectively.
Results of various approaches using Srihari data.
| Algorithms | MCL | MCODE | RNSC | COER | ClusterONE | COACH | CFOCM |
|---|---|---|---|---|---|---|---|
| # complexes | 4732 | 88 | 552 | 525 | 773 | 726 | 758 |
| # matched predicted (MIPS) | 325 | 26 | 78 | 92 | 117 | 219 | 225 |
| # matched benchmark (MIPS) | 168 | 42 | 102 | 111 | 131 | 150 | 152 |
| # matched predicted (SGD) | 654 | 36 | 108 | 176 | 224 | 299 | 322 |
| # matched benchmark (SGD) | 292 | 44 | 184 | 189 | 217 | 231 | 240 |
| # matched predicted (CYC2008) | 846 | 46 | 138 | 218 | 275 | 397 | 452 |
| # matched benchmark (CYC2008) | 362 | 57 | 154 | 236 | 272 | 281 | 290 |
Results of CFOCM and CFOCM without using Gene Ontology (GO) (unCFOCM) on DIP data.
| Algorithms + Benchmark | # Complexes | # Matched predicted | # Matched benchmark | Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|
| CFOCM + MIPS | 748 | 205 | 126 | 0.2741 | 0.6207 | 0.3802 |
| unCFOCM + MIPS | 862 | 213 | 130 | 0.2471 | 0.6404 | 0.3566 |
| CFOCM + SGD | 748 | 285 | 168 | 0.3810 | 0.5201 | 0.4398 |
| unCFOCM + SGD | 862 | 297 | 175 | 0.3445 | 0.5418 | 0.4212 |
| CFOCM + CYC2008 | 748 | 351 | 196 | 0.4693 | 0.4804 | 0.4748 |
| unCFOCM + CYC2008 | 862 | 363 | 201 | 0.4211 | 0.4926 | 0.4541 |
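In this table, the two count columns after # Complexes are consistent with Precision = (first count) / # Complexes and Recall = (second count) / benchmark size; for example, 205 / 748 ≈ 0.2741. The record does not state how a predicted complex is judged to match a benchmark complex. A convention used in several complex-detection evaluations, assumed here rather than taken from this record, is the neighborhood affinity NA(p, b) = |p ∩ b|² / (|p| · |b|) with a match threshold of 0.2. A minimal sketch of counting matches under that assumption (function names are hypothetical):

```python
def neighborhood_affinity(p, b):
    """NA(p, b) = |p & b|^2 / (|p| * |b|) for two sets of proteins."""
    overlap = len(p & b)
    return overlap * overlap / (len(p) * len(b))


def matched_counts(predicted, benchmark, threshold=0.2):
    """Return (matched predicted, matched benchmark) under NA >= threshold.
    The 0.2 default is an assumed convention, not taken from this record."""
    n_cp = sum(1 for p in predicted
               if any(neighborhood_affinity(p, b) >= threshold for b in benchmark))
    n_cb = sum(1 for b in benchmark
               if any(neighborhood_affinity(p, b) >= threshold for p in predicted))
    return n_cp, n_cb


# Toy example: the first two predictions match a benchmark complex each;
# the third prediction and the third benchmark complex match nothing.
predicted = [{1, 2, 3}, {4, 5}, {7, 8, 9, 10}]
benchmark = [{1, 2, 3, 4}, {4, 5, 6}, {11, 12}]
print(matched_counts(predicted, benchmark))  # (2, 2)
```

Because one predicted complex may match several benchmark complexes (and vice versa), the two counts need not be equal, which is why the tables report them separately.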
Results of CFOCM and CFOCM without using Gene Ontology (GO) (unCFOCM) on Gavin data.
| Algorithms + Benchmark | # Complexes |
|
| Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|
| CFOCM + MIPS | 453 | 191 | 91 | 0.4216 | 0.4483 | 0.4345 |
| unCFOCM + MIPS | 551 | 197 | 92 | 0.3575 | 0.4532 | 0.3997 |
| CFOCM + SGD | 453 | 250 | 119 | 0.5519 | 0.3684 | 0.4419 |
| unCFOCM + SGD | 551 | 262 | 124 | 0.4755 | 0.3839 | 0.4248 |
| CFOCM + CYC2008 | 453 | 305 | 131 | 0.6733 | 0.3211 | 0.4348 |
| unCFOCM + CYC2008 | 551 | 321 | 138 | 0.5826 | 0.3382 | 0.4280 |
Figure 5. The glycine decarboxylase complex and the RNA polymerase I complex as detected by CFOCM. Yellow nodes represent proteins in the complex core, and blue nodes represent attachment proteins.
Figure 6. Diagram of the Merge_Similar_Cores algorithm. In the example, (A) is the family graph of clique {A,B,C}, comprising the cliques {A,B,C}, {A,B,D}, {A,B,H}, {A,C,E}, {B,C,F} and {B,C,G}, with protein set {A,B,C,D,E,F,G,H}. In (B), the common Gene Ontology (GO) term is GO:02; vertex E is reserved as , while the dropped vertex F is . In (C), the dropped vertex G is . In (D), the dropped vertex H is , and the next candidate core {A,B,C,D,E} is returned, as no removal operation can improve the .
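The Merge_Similar_Cores example in Figure 6 can be sketched as a greedy pruning loop. This is a hypothetical reconstruction from the caption alone, not the authors' code: it assumes the family of a clique consists of the cliques sharing all but one vertex with it (true of the example), that only vertices outside the seed clique are removal candidates, and that the objective `score` is supplied by the caller, since the caption does not specify it. The toy score used here (fraction of members annotated with GO:02, with F, G and H assumed unannotated) is purely illustrative.

```python
def clique_family(seed, cliques):
    """Hypothetical: cliques sharing at least len(seed) - 1 vertices with seed."""
    return [c for c in cliques if len(c & seed) >= len(seed) - 1]


def merge_similar_cores(seed, cliques, score):
    """Start from the union of the seed's clique family, then repeatedly
    drop non-seed vertices whenever dropping one raises `score`."""
    candidate = set(seed).union(*clique_family(seed, cliques))
    improved = True
    while improved:
        improved = False
        for v in sorted(candidate - seed):
            trial = candidate - {v}
            if score(trial) > score(candidate):
                candidate = trial
                improved = True
    return candidate


# Example from Figure 6 with an illustrative (assumed) score.
cliques = [set(c) for c in ("ABC", "ABD", "ABH", "ACE", "BCF", "BCG")]
has_go02 = set("ABCDE")  # assumed GO:02 annotations


def fraction_go02(members):
    """Toy objective: share of members annotated with GO:02."""
    return len(members & has_go02) / len(members)


core = merge_similar_cores(set("ABC"), cliques, fraction_go02)
print(sorted(core))  # ['A', 'B', 'C', 'D', 'E']
```

With this toy objective, dropping F, G and H each raises the score while dropping D or E lowers it, so the loop ends at the candidate core {A,B,C,D,E} described in the caption.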
Three protein-protein interaction (PPI) networks used in the experiments.
| Dataset | #Proteins | #Interactions | Average Node Degree |
|---|---|---|---|
| DIP | 4930 | 17203 | 6.98 |
| Gavin | 1430 | 6531 | 9.13 |
| Srihari | 3680 | 20000 | 10.87 |
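The Average Node Degree column is consistent with 2 × #Interactions / #Proteins, since each undirected interaction adds one to the degree of both partner proteins. A minimal check using the counts from the table (the function name is hypothetical):

```python
def average_degree(n_proteins, n_interactions):
    """Mean degree of an undirected graph: each edge touches two nodes."""
    return 2 * n_interactions / n_proteins


NETWORKS = {"DIP": (4930, 17203), "Gavin": (1430, 6531), "Srihari": (3680, 20000)}

for name, (n, m) in NETWORKS.items():
    print(f"{name}: {average_degree(n, m):.2f}")
# DIP: 6.98
# Gavin: 9.13
# Srihari: 10.87
```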