Abstract
Protein complexes play significant roles in cellular processes. Identifying protein complexes from protein-protein interaction (PPI) networks is an effective strategy for understanding biological processes and cellular functions. A number of methods have recently been proposed to detect protein complexes. However, most methods predict protein complexes from static PPI networks and usually overlook the inherent dynamics and topological properties of protein complexes. In this paper, we propose a novel method, called NABCAM (Neighbor Affinity-Based Core-Attachment Method), to identify protein complexes from dynamic PPI networks. First, the centrality score of every protein is calculated, and the proteins with the highest centrality scores are taken as seed proteins. Second, the seed proteins are expanded into complex cores by calculating the similarity values between the seed proteins and their neighboring proteins. Third, attachments are appended to their corresponding complex cores by comparing a candidate's affinity with its neighbors inside the core against its affinity with those outside the core. Finally, filtering is carried out to obtain the final clustering result. Results on the DIP database show that the NABCAM algorithm predicts protein complexes effectively in comparison with other state-of-the-art methods. Moreover, many protein complexes predicted by our method are biologically significant.
Keywords: core-attachment; neighbor affinity; protein complexes; protein-protein interaction (PPI) network
Year: 2017 PMID: 28737728 PMCID: PMC6151993 DOI: 10.3390/molecules22071223
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Construction of dynamic protein-protein interaction (PPI) networks: (1) the subnet of time point 1; (2) the subnet of time point 2; (3) the subnet of time point 3; (4) the subnet of time point 4.
Figure 2. The formation process of an attachment: the proteins inside the black circle constitute a complex core; the yellow protein represents a candidate neighbor protein of the complex core; the blue proteins represent its neighbors inside the core; the green proteins represent its neighbors outside the core.
Figure 3. The formation process of a protein complex: (a) the red protein represents the seed protein; (b) the green proteins represent neighbor proteins of the seed protein; (c) the proteins inside the green dotted circle constitute a complex core; (d) the blue proteins represent neighbor proteins of the core; (e) the proteins inside the blue dotted circle constitute a protein complex.
Figure 4. The description of the Neighbor Affinity-Based Core-Attachment Method (NABCAM) algorithm.
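
The seed, core, attachment and filtering steps named in the abstract and in Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the NABCAM implementation: the record does not define the centrality score, the similarity measure, or the affinity comparison, so the sketch assumes degree centrality, Jaccard similarity of closed neighborhoods, and a simple count of links inside versus outside the core. The function names, `core_threshold`, and `min_size` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def jaccard(adj, u, v):
    """Jaccard similarity of closed neighborhoods (assumed similarity measure)."""
    a = adj[u] | {u}
    b = adj[v] | {v}
    return len(a & b) / len(a | b)


def grow_complex(adj, seed, core_threshold=0.5):
    """Expand a seed into a core, then collect attachments around the core."""
    # Core: the seed plus neighbors that are sufficiently similar to it.
    core = {seed} | {n for n in adj[seed] if jaccard(adj, seed, n) >= core_threshold}
    # Candidates: proteins adjacent to the core but not part of it.
    candidates = set().union(*(adj[p] for p in core)) - core
    attachments = set()
    for c in candidates:
        inside = len(adj[c] & core)    # neighbors of c inside the core
        outside = len(adj[c] - core)   # neighbors of c outside the core
        if inside > outside:           # assumed stand-in for the affinity test
            attachments.add(c)
    return core, attachments


def detect_complexes(edges, core_threshold=0.5, min_size=3):
    """Seed by degree (assumed centrality), grow complexes, drop small or repeated ones."""
    adj = build_adjacency(edges)
    seeds = sorted(adj, key=lambda p: (-len(adj[p]), p))
    used, complexes = set(), []
    for seed in seeds:
        if seed in used:
            continue
        core, attachments = grow_complex(adj, seed, core_threshold)
        candidate = core | attachments
        if len(candidate) >= min_size and candidate not in complexes:
            complexes.append(candidate)
            used |= core
    return complexes


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("D", "B"), ("D", "E")]
    print(detect_complexes(toy_edges, core_threshold=0.7))
```

On the toy graph, the triangle A, B, C forms the core and D joins as an attachment because two of its three neighbors lie inside the core. For a dynamic network, such a routine would presumably be run on each time-point subnet separately.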
The number of proteins and interactions in each subnet of different PPI networks.
| Data | Timestamp | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DIP | Proteins | 797 | 941 | 796 | 623 | 601 | 530 | 493 | 944 | 1090 | 592 | 661 | 461 |
| | Interactions | 981 | 1444 | 1188 | 745 | 750 | 646 | 573 | 1705 | 2185 | 856 | 974 | 526 |
| MIPS | Proteins | 737 | 897 | 781 | 583 | 570 | 531 | 470 | 839 | 1014 | 523 | 616 | 402 |
| | Interactions | 1097 | 1443 | 1183 | 754 | 684 | 642 | 504 | 1238 | 1637 | 878 | 1207 | 700 |
| Krogan | Proteins | 336 | 379 | 320 | 256 | 206 | 189 | 202 | 580 | 626 | 304 | 330 | 250 |
| | Interactions | 334 | 464 | 331 | 234 | 210 | 184 | 213 | 1025 | 1081 | 314 | 373 | 258 |
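
Per-timestamp counts like those in the table above could be produced along the following lines. This is only a sketch under a stated assumption: the record does not say how a protein is assigned to a time point, so the code assumes a hypothetical `active_times` map (protein to the set of time points at which it is active) and keeps an interaction in a subnet only when both partners are active at that time point.

```python
def build_subnets(edges, active_times, n_timepoints):
    """Split a static edge list into one edge list per time point.

    Assumption: an interaction is kept at time t only if both proteins
    are listed as active at t in `active_times` (hypothetical input).
    """
    subnets = {t: [] for t in range(1, n_timepoints + 1)}
    for u, v in edges:
        shared = active_times.get(u, set()) & active_times.get(v, set())
        for t in shared:
            if t in subnets:
                subnets[t].append((u, v))
    return subnets


def subnet_summary(subnets):
    """Return {time point: (number of proteins, number of interactions)}."""
    summary = {}
    for t, subnet_edges in subnets.items():
        proteins = {p for edge in subnet_edges for p in edge}
        summary[t] = (len(proteins), len(subnet_edges))
    return summary


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C"), ("C", "D")]
    active = {"A": {1, 2}, "B": {1, 2, 3}, "C": {2, 3}, "D": {3}}
    print(subnet_summary(build_subnets(edges, active, 3)))
```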
Figure 5. Visualization of a protein complex: (a) standard complex; (b) identified complex: the yellow protein represents the wrong protein; the blue proteins represent correct proteins.
Figure 6. Precision, recall and f-measure values of various algorithms on the DIP dataset.
Figure 7. Precision, recall and f-measure values of various algorithms on the MIPS dataset.
Figure 8. Precision, recall and f-measure values of various algorithms on the Krogan dataset.
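
For readers unfamiliar with how precision, recall and f-measure are computed for predicted complexes, the sketch below shows one common convention. The matching rule is an assumption, since this record does not state it: a predicted and a reference complex are treated as matched when the overlap score |A ∩ B|² / (|A| · |B|) reaches a threshold, set here to a hypothetical default of 0.2.

```python
def overlap_score(a, b):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two protein sets."""
    common = len(a & b)
    return common * common / (len(a) * len(b))


def evaluate(predicted, reference, threshold=0.2):
    """Return (precision, recall, f_measure) under the assumed matching rule."""
    # Predicted complexes that match at least one reference complex.
    matched_pred = sum(
        1 for p in predicted
        if any(overlap_score(p, r) >= threshold for r in reference)
    )
    # Reference complexes that are matched by at least one prediction.
    matched_ref = sum(
        1 for r in reference
        if any(overlap_score(p, r) >= threshold for p in predicted)
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ref / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    predicted = [{"A", "B", "C", "D"}, {"X", "Y"}]
    reference = [{"A", "B", "C"}, {"M", "N", "O"}]
    print(evaluate(predicted, reference))
```

The f-measure is the harmonic mean of precision and recall, so it rewards methods that do well on both rather than on one alone.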
Functional enrichment analysis of complexes detected on different datasets.
| Data | Algorithm | PC | <10^−15 | [10^−15, 10^−10) | [10^−10, 10^−5) | [10^−5, 0.01) | ≥0.01 |
|---|---|---|---|---|---|---|---|
| DIP | NABCAM | 1702 | 136 (7.99%) | 230 (13.51%) | 820 (48.18%) | 343 (20.15%) | 173 (10.16%) |
| | MCL | 1053 | 19 (1.80%) | 47 (4.46%) | 183 (17.38%) | 362 (34.38%) | 442 (41.98%) |
| | CORE | 344 | 1 (0.29%) | 3 (0.87%) | 78 (22.67%) | 114 (33.14%) | 148 (43.02%) |
| | ClusterOne | 574 | 21 (3.66%) | 52 (9.06%) | 177 (30.84%) | 184 (32.06%) | 140 (24.39%) |
| MIPS | NABCAM | 966 | 30 (3.10%) | 70 (7.25%) | 332 (34.37%) | 333 (34.47%) | 201 (20.81%) |
| | MCL | 606 | 5 (0.83%) | 13 (2.15%) | 94 (15.51%) | 220 (36.30%) | 274 (45.21%) |
| | CORE | 340 | 0 (0.00%) | 4 (1.18%) | 65 (19.12%) | 107 (31.47%) | 164 (48.24%) |
| | ClusterOne | 372 | 7 (1.88%) | 16 (4.30%) | 117 (31.45%) | 126 (33.87%) | 106 (28.49%) |
| Krogan | NABCAM | 587 | 75 (12.78%) | 75 (12.78%) | 304 (51.79%) | 108 (18.39%) | 25 (4.26%) |
| | MCL | 403 | 16 (3.97%) | 43 (10.67%) | 103 (25.56%) | 119 (29.53%) | 122 (30.27%) |
| | CORE | 255 | 3 (1.18%) | 10 (3.92%) | 60 (23.53%) | 102 (40.00%) | 80 (31.37%) |
| | ClusterOne | 399 | 13 (3.26%) | 43 (10.78%) | 98 (24.56%) | 120 (30.08%) | 125 (31.33%) |
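
The layout of the table above (a count per p-value range, with its share of the row total in parentheses) can be reproduced with a short sketch. The bin edges are taken from the column headers; the helper names `bin_p_values` and `percentages` are hypothetical, and the example only re-derives the percentages of the DIP/NABCAM row from the counts printed in the table.

```python
# Bin labels with [low, high) bounds, following the table's column headers.
BINS = [
    ("<1e-15", 0.0, 1e-15),
    ("[1e-15, 1e-10)", 1e-15, 1e-10),
    ("[1e-10, 1e-5)", 1e-10, 1e-5),
    ("[1e-5, 0.01)", 1e-5, 0.01),
    (">=0.01", 0.01, float("inf")),
]


def bin_p_values(p_values):
    """Count how many p-values fall into each half-open range."""
    counts = {label: 0 for label, _, _ in BINS}
    for p in p_values:
        for label, low, high in BINS:
            if low <= p < high:
                counts[label] += 1
                break
    return counts


def percentages(counts):
    """Express each bin count as a percentage of the total, to two decimals."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


if __name__ == "__main__":
    # Counts copied from the DIP / NABCAM row of the table.
    dip_nabcam = dict(zip((label for label, _, _ in BINS), (136, 230, 820, 343, 173)))
    print(sum(dip_nabcam.values()), percentages(dip_nabcam))
```

The five counts sum to 1702, the value in the PC column, and the recomputed shares agree with the percentages printed in that row.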
Predicted protein complexes with small p-values on different datasets.
| Data | ID | p-value | Cluster Frequency | Gene Ontology Term |
|---|---|---|---|---|
| DIP | 1 | 2.45 × 10^−47 | 30 out of 34 genes, 88.2% | ribosomal small subunit biogenesis |
| | 2 | 4.48 × 10^−38 | 22 out of 23 genes, 95.7% | mRNA splicing, via spliceosome |
| | 3 | 1.41 × 10^−37 | 21 out of 21 genes, 100.0% | mRNA splicing, via spliceosome |
| | 4 | 3.88 × 10^−36 | 22 out of 23 genes, 95.7% | ribosomal small subunit biogenesis |
| | 5 | 1.45 × 10^−33 | 12 out of 12 genes, 100.0% | polyadenylation-dependent snoRNA 3′-end processing |
| MIPS | 1 | 9.62 × 10^−27 | 16 out of 18 genes, 88.9% | ribosomal large subunit biogenesis |
| | 2 | 1.15 × 10^−25 | 18 out of 25 genes, 72.0% | mitotic sister chromatid segregation |
| | 3 | 1.15 × 10^−23 | 16 out of 17 genes, 94.1% | mitotic nuclear division |
| | 4 | 4.02 × 10^−23 | 14 out of 16 genes, 87.5% | ribosomal large subunit biogenesis |
| | 5 | 2.69 × 10^−22 | 16 out of 23 genes, 69.6% | mitotic sister chromatid segregation |
| Krogan | 1 | 1.36 × 10^−34 | 17 out of 18 genes, 94.4% | ncRNA transcription |
| | 2 | 3.69 × 10^−34 | 13 out of 15 genes, 86.7% | tRNA catabolic process |
| | 3 | 2.49 × 10^−33 | 13 out of 14 genes, 92.9% | chromatin disassembly |
| | 4 | 3.17 × 10^−32 | 13 out of 16 genes, 81.2% | exonucleolytic trimming to generate mature 3′-end of 5.8S rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA) |
| | 5 | 5.79 × 10^−32 | 18 out of 18 genes, 100.0% | mRNA splicing, via spliceosome |
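
Enrichment p-values of the kind listed above are commonly obtained from a hypergeometric tail probability: the chance of seeing at least the observed number of annotated genes in a cluster drawn at random from the background. Whether this record's values were computed exactly this way is an assumption, and the background sizes needed to reproduce them are not given here, so the sketch uses small made-up numbers. The function name and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Hypergeometric upper tail P(X >= hits).

    population   -- number of genes in the background (assumed input)
    annotated    -- background genes carrying the GO term
    cluster_size -- number of genes in the predicted complex
    hits         -- genes in the complex carrying the GO term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 10 background genes, 4 annotated, a cluster of 3 with 2 hits.
    print(enrichment_p_value(population=10, annotated=4, cluster_size=3, hits=2))
```

The "Cluster Frequency" column supplies `hits` and `cluster_size` (for example 30 and 34 in the first DIP row); `population` and `annotated` would have to come from the annotation source used by the authors.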