Abstract
BACKGROUND: Intuitively, proteins in the same complex should interact densely with each other but rarely with other proteins in the protein-protein interaction (PPI) network. Surprisingly, many existing computational algorithms do not detect protein complexes using both of these topological properties directly. Most rely on a mathematical definition of either "modularity" or "conductance", and each has its own limitation: modularity has an inherent resolution problem that causes small protein complexes to be overlooked, while conductance characterizes the separability of complexes but fails to capture the interaction density within them.
Keywords: Dense subnetwork; Low conductance set; Mixed integer programming; Protein complex identification
Year: 2017 PMID: 28361714 PMCID: PMC5475323 DOI: 10.1186/s12918-017-0405-5
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
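The abstract's point that conductance measures separability but not internal density can be illustrated on a toy graph. The sketch below is not taken from the paper: the helper names (`build_adjacency`, `conductance`, `density`) and the eight-node graph are assumptions made for illustration, and conductance is taken in its common form, cut edges divided by the smaller of the two volumes.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, nodes):
    """Cut edges leaving `nodes`, divided by min(vol(nodes), vol(rest))."""
    s = set(nodes)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 0.0


def density(adj, nodes):
    """Fraction of possible node pairs inside `nodes` that interact."""
    s = set(nodes)
    n = len(s)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in s for v in adj[u] if v in s) / 2
    return internal / (n * (n - 1) / 2)


# Hypothetical toy network: a 4-clique (a-d) joined by one edge to a chain (e-h).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h")]
adj = build_adjacency(edges)
clique = {"a", "b", "c", "d"}
chain = {"e", "f", "g", "h"}

for name, group in (("clique", clique), ("chain", chain)):
    print(name, round(conductance(adj, group), 3), round(density(adj, group), 3))
```

Both groups have the same conductance (1/7) because a single edge separates them, yet the clique has density 1.0 and the chain only 0.5. A criterion based on conductance alone cannot tell the two apart.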
Details of the four yeast PPI networks and the numbers of covered SGD and MIPS reference complexes
| Network | # proteins | # interactions | # SGD complexes | # MIPS complexes |
|---|---|---|---|---|
| SceDIP | 5136 | 22491 | 224 | 184 |
| SceBG | 6438 | 80577 | 234 | 189 |
| SceIntAct | 5453 | 54134 | 231 | 187 |
| SceMINT | 5414 | 27316 | 230 | 188 |
Fig. 1 Comparison of all competing algorithms by SGD reference dataset in terms of the composite scores. Shades of the same color indicate different evaluating scores. Each bar height reflects the value of the composite score
Fig. 2 Comparison of all competing algorithms by MIPS reference dataset in terms of the composite scores. Shades of the same color indicate different evaluating scores. Each bar height reflects the value of the composite score
Comparison of protein complex prediction by SGD reference dataset
| Network | Method | # complexes | # matched | Coverage | Recall | Precision | F-measure | Sn | PPV | Acc | MMR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SceDIP | FLCD | 2134 | | 3921 | 0.6786 | 0.2020 | | 0.5964 | 0.5003 | 0.5462 | |
| | CONE | 380 | 86 | 1503 | 0.3839 | 0.2579 | 0.3085 | 0.4082 | 0.6203 | 0.5032 | 0.1950 |
| | LinkC | 1839 | 137 | 3735 | 0.6116 | 0.1289 | 0.2130 | 0.6290 | 0.4820 | | 0.3276 |
| | SR-MCL | 3216 | 44 | 4678 | 0.2228 | 0.0221 | 0.0412 | 0.5120 | 0.2893 | 0.3489 | 0.0708 |
| SceBG | FLCD | 4027 | | 5836 | 0.7821 | 0.2000 | 0.3181 | 0.7363 | 0.5621 | | |
| | CONE | 522 | 122 | 2735 | 0.5214 | 0.2433 | | 0.6488 | 0.6035 | 0.6257 | 0.2542 |
| | LinkC | 5382 | 164 | 6076 | 0.7008 | 0.1217 | 0.2072 | 0.8880 | 0.4373 | 0.6231 | 0.4100 |
| | SR-MCL | 1862 | 108 | 5889 | 0.4615 | 0.1245 | 0.1961 | 0.8999 | 0.3034 | 0.5225 | 0.2151 |
| SceIntAct | FLCD | 3394 | | 4678 | 0.7446 | 0.1933 | 0.3069 | 0.6699 | 0.5391 | | |
| | CONE | 496 | 117 | 1994 | 0.5065 | 0.2419 | | 0.5742 | 0.5944 | 0.5842 | 0.2742 |
| | LinkC | 1297 | 93 | 5290 | 0.4026 | 0.0941 | 0.1525 | 0.9223 | 0.2393 | 0.4698 | 0.2285 |
| | SR-MCL | 1079 | 68 | 5342 | 0.2294 | 0.0437 | 0.1517 | 0.7784 | 0.2402 | 0.4341 | 0.1213 |
| SceMINT | FLCD | 2483 | | 4210 | 0.6826 | 0.2280 | | 0.6524 | 0.5284 | 0.5871 | |
| | CONE | 513 | 110 | 2335 | 0.4783 | 0.2027 | 0.2848 | 0.5370 | 0.5954 | 0.5654 | 0.2442 |
| | LinkC | 2201 | 144 | 4068 | 0.6261 | 0.1595 | 0.2542 | 0.6757 | 0.5540 | | 0.3743 |
| | SR-MCL | 3698 | 33 | 4976 | 0.1435 | 0.0169 | 0.0302 | 0.5013 | 0.2597 | 0.3608 | 0.0609 |
CONE and LinkC are short for ClusterONE and LinkComm, respectively
Bold values denote the best scores corresponding to specific criteria
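Several columns in this table are related by simple formulas, which can serve as a sanity check on the reported values. The sketch below assumes the usual definitions: recall as matched reference complexes over the total number of reference complexes, F-measure as the harmonic mean of precision and recall, and Acc as the geometric mean of Sn and PPV. These definitions are not stated in this excerpt. The function names are illustrative, and the inputs are the ClusterONE row for SceDIP together with the 224 SGD complexes listed for that network.

```python
import math


def recall_from_counts(matched, total_reference):
    """Fraction of reference complexes that are matched."""
    return matched / total_reference


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def geometric_accuracy(sn, ppv):
    """Geometric mean of sensitivity (Sn) and positive predictive value (PPV)."""
    return math.sqrt(sn * ppv)


# ClusterONE on SceDIP, SGD reference (values copied from the table).
recall = round(recall_from_counts(86, 224), 4)
f_score = round(f_measure(0.2579, 0.3839), 4)
acc = round(geometric_accuracy(0.4082, 0.6203), 4)

print(recall, f_score, acc)  # 0.3839 0.3085 0.5032
```

The three computed values agree with the Recall, F-measure, and Acc entries reported for that row.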
Comparison of protein complex prediction by MIPS reference dataset
| Network | Method | # complexes | # matched | Coverage | Recall | Precision | F-measure | Sn | PPV | Acc | MMR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SceDIP | FLCD | 2134 | | 3921 | 0.6522 | 0.1603 | | 0.4001 | 0.3901 | 0.3951 | |
| | CONE | 380 | 74 | 1503 | 0.4022 | 0.1868 | 0.2551 | 0.2749 | 0.4015 | 0.3322 | 0.1533 |
| | LinkC | 1839 | 109 | 3735 | 0.5924 | 0.1104 | 0.1862 | 0.4775 | 0.3646 | | 0.2993 |
| | SR-MCL | 2851 | 41 | 4687 | 0.1964 | 0.0230 | 0.0402 | 0.4592 | 0.2104 | 0.3108 | 0.0726 |
| SceBG | FLCD | 4027 | | 5836 | 0.6561 | 0.1393 | | 0.4643 | 0.4315 | 0.4476 | |
| | CONE | 522 | 86 | 2735 | 0.4450 | 0.1533 | 0.2293 | 0.4537 | 0.4452 | 0.4494 | 0.1795 |
| | LinkC | 5382 | 109 | 6076 | 0.6349 | 0.0918 | 0.1604 | 0.8179 | 0.3504 | | 0.3285 |
| | SR-MCL | 1862 | 65 | 5889 | 0.3439 | 0.0673 | 0.1126 | 0.7360 | 0.2436 | 0.4234 | 0.1384 |
| SceIntAct | FLCD | 3394 | | 4678 | 0.6417 | 0.1452 | | 0.4183 | 0.4034 | 0.4108 | |
| | CONE | 496 | 79 | 1994 | 0.4225 | 0.1633 | 0.2356 | 0.3587 | 0.4296 | 0.3925 | 0.1927 |
| | LinkC | 1297 | 80 | 5290 | 0.4278 | 0.0732 | 0.1251 | 0.9028 | 0.1986 | | 0.1886 |
| | SR-MCL | 1079 | 45 | 5342 | 0.1337 | 0.0190 | 0.0941 | 0.6246 | 0.1850 | 0.3399 | 0.0960 |
| SceMINT | FLCD | 2483 | | 4210 | 0.5904 | 0.1800 | | 0.4147 | 0.4086 | 0.4116 | |
| | CONE | 513 | 67 | 2335 | 0.3564 | 0.1267 | 0.1869 | 0.3274 | 0.4017 | 0.3626 | 0.1519 |
| | LinkC | 2201 | 100 | 4068 | 0.5319 | 0.1040 | 0.1740 | 0.4744 | 0.4038 | | 0.2744 |
| | SR-MCL | 3698 | 24 | 4976 | 0.1277 | 0.0112 | 0.0205 | 0.4192 | 0.1999 | 0.2894 | 0.0481 |
CONE and LinkC are short for ClusterONE and LinkComm, respectively
Bold values denote the best scores corresponding to specific criteria
Comparison by GO enrichment analysis
| Network | Method | # complexes | % enriched | # GO |
|---|---|---|---|---|
| SceDIP | FLCD | 2134 | | |
| | CONE | 380 | 71.8 | 852 |
| | LinkC | 1839 | 67.4 | 1273 |
| | SR-MCL | 2851 | 23.5 | 957 |
| SceBG | FLCD | 4027 | 72.4 | |
| | CONE | 522 | | 1282 |
| | LinkC | 5382 | 39.8 | 1554 |
| | SR-MCL | 1862 | 56.4 | 1702 |
| SceIntAct | FLCD | 3394 | 62.4 | |
| | CONE | 496 | | 1031 |
| | LinkC | 1297 | 46.5 | 1129 |
| | SR-MCL | 1079 | 44.7 | 888 |
| SceMINT | FLCD | 2483 | | |
| | CONE | 513 | 59.4 | 954 |
| | LinkC | 2201 | 32.1 | 1123 |
| | SR-MCL | 3698 | 19.7 | 856 |
“% enriched” gives the percentage of predicted complexes that are enriched with at least one GO term. “# GO” denotes the number of enriched GO terms
Bold values denote the best scores corresponding to specific criteria
Fig. 3 Statistical significance of the predicted complexes of all competing algorithms
Fig. 4 Illustrations of predicted complexes in the SceBG network. The first four panels show the Smc5-Smc6 complexes predicted by FLCD, ClusterONE, LinkComm, and SR-MCL, respectively. Nodes in blue are proteins in the reference Smc5-Smc6 complex and nodes in white are proteins outside the reference Smc5-Smc6 complex. Nodes in yellow are proteins that the corresponding algorithm failed to detect. The remaining four panels show the RNase complexes predicted by FLCD, ClusterONE, LinkComm, and SR-MCL, respectively. Nodes in red are proteins in the reference RNase complex and nodes in white are proteins outside the reference RNase complex
Fig. 5 A motivating example for FLCD. Red dotted lines mark the complexes detected based on conductance minimization. Blue dashed lines mark the complexes predicted by our FLCD algorithm. Nodes with green border lines are removed by FLCD due to the lack of dense interactions
The FLCD algorithm
| Step | Operation |
|---|---|
| 1 | While ( … |
| 2 | Estimate … |
| 3 | Sort nodes in … |
| 4 | Finding the lowest-conductance set … |
| 5 | Identifying the node set … |
| 6 | Considering … |
| 7 | EndWhile |
| 8 | Remove duplicated complexes and complexes with size smaller than three in … |
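The sort-then-search pattern in steps 3 and 4 resembles a standard sweep procedure: given an ordering of the nodes, evaluate the conductance of every prefix and keep the best one. The sketch below shows only that generic technique. It is not the FLCD implementation, whose sort key and remaining steps are not recoverable from this excerpt. The function name `sweep_lowest_conductance`, the toy graph, and the ordering are all assumptions.

```python
def sweep_lowest_conductance(adj, ordering):
    """Return the prefix of `ordering` with the lowest conductance, and its value.

    `adj` maps each node to the set of its neighbours (undirected, no self-loops).
    """
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    in_set = set()
    vol = 0   # sum of degrees of nodes currently in the prefix
    cut = 0   # number of edges crossing the prefix boundary
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(ordering, start=1):
        inside = sum(1 for v in adj[u] if v in in_set)
        # Edges from u into the prefix stop being cut edges;
        # edges from u to the outside become cut edges.
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        denom = min(vol, total_vol - vol)
        if denom == 0:
            continue  # the prefix covers the whole graph; conductance is undefined
        phi = cut / denom
        if phi < best_phi:
            best_phi, best_k = phi, k
    return ordering[:best_k], best_phi


# Hypothetical toy network: a 4-clique (a-d) joined by one edge to a chain (e-h).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Assumed ordering; a real run would sort nodes by some relevance score.
ordering = ["a", "b", "c", "d", "e", "f", "g", "h"]
best_set, best_phi = sweep_lowest_conductance(adj, ordering)
print(best_set, round(best_phi, 3))
```

On this toy graph the sweep stops at the clique `a` to `d`, whose single outgoing edge gives a conductance of 1/7. Updating `cut` and `vol` incrementally keeps the whole sweep linear in the number of edges instead of recomputing each prefix from scratch.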