Feng Yu, Zhi Yang, Nan Tang, Hong Lin, Jian Wang, Zhi Yang.
Abstract
BACKGROUND: Protein complexes are important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, making it possible to predict protein complexes from protein-protein interaction networks. However, most current methods are unsupervised and cannot exploit the information contained in the large number of known complexes.
Year: 2014 PMID: 25349902 PMCID: PMC4243764 DOI: 10.1186/1752-0509-8-S3-S4
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Protein complex detection algorithm
Feature distribution
| Features of the unweighted network | | | Features of the weighted network | | |
|---|---|---|---|---|---|
| 1 | Graph density | 1 | 1 | Graph density | 1 |
| 2 | Degree statistics | 2 | 2 | Degree statistics | 2 |
| 4 | Clustering coefficient | 1 | 3 | Edge weight statistics | 1 |
| - | - | - | 5 | Topological change | 2 |
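The feature table above lists only feature groups. As a minimal sketch, assuming a candidate complex is given as a set of proteins plus a dictionary of edge weights (weight 1.0 throughout for the unweighted network), the individual features named in the rank list further on (density, mean and median degree, mean clustering coefficient, mean edge weight) could be computed as below. The function name and input format are hypothetical, reading "medium degree" as the median degree is an assumption, and the topological-change features are omitted because they are not defined in this excerpt.

```python
from statistics import mean, median


def subgraph_features(nodes, edges):
    """Hypothetical feature extractor for one candidate complex.

    nodes: iterable of protein IDs in the candidate subgraph.
    edges: dict mapping frozenset({u, v}) -> weight (1.0 if unweighted).
           Only edges with both endpoints inside `nodes` are used.
    """
    nodes = set(nodes)
    n = len(nodes)
    inner = {e: w for e, w in edges.items() if len(e) == 2 and e <= nodes}

    # Adjacency sets restricted to the candidate subgraph.
    adj = {v: set() for v in nodes}
    for e in inner:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    degrees = [len(adj[v]) for v in nodes]

    def clustering(v):
        # Fraction of pairs of v's neighbours that are themselves linked.
        nb = list(adj[v])
        k = len(nb)
        if k < 2:
            return 0.0
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        return 2.0 * links / (k * (k - 1))

    return {
        "density": 2.0 * len(inner) / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": mean(degrees) if degrees else 0.0,
        # ASSUMPTION: "medium degree" in the source is read as the median.
        "median_degree": median(degrees) if degrees else 0.0,
        "mean_clustering": mean([clustering(v) for v in nodes]) if nodes else 0.0,
        "mean_edge_weight": mean(list(inner.values())) if inner else 0.0,
    }
```

For example, a triangle a-b-c with a pendant node d attached to a has density 4/6 and mean clustering coefficient 7/12; an edge to a protein outside the candidate set is ignored.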
Figure 1. Size distribution of the positive, intermediate and negative samples. The horizontal axis denotes the complex size and the vertical axis denotes the number of complex samples of each size.
Performance comparison between different regression models (clique_size ≥ 3); values are F-measures at each merg_thred
| Model | merg_thred = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|
| Model100 | 0.5244 | 0.5429 | 0.5467 | 0.5518 | 0.5628 | 0.5656 | 0.5791 | 0.5806 | 0.5719 |
| Model200 | 0.5060 | 0.5380 | 0.5542 | 0.5525 | 0.5577 | 0.5586 | 0.5798 | 0.5819 | 0.5748 |
| Model300 | 0.5070 | 0.5353 | 0.5451 | 0.5543 | 0.5637 | 0.5696 | 0.5887 | 0.5866 | 0.5783 |
| Model400 | 0.5058 | 0.5414 | 0.5427 | 0.5578 | 0.5660 | 0.5730 | 0.5880 | 0.5904 | 0.5808 |
| Model500 | 0.5115 | 0.5462 | 0.5465 | 0.5596 | 0.5710 | 0.5747 | 0.5896 | | 0.5783 |
| Model600 | 0.4942 | 0.5368 | 0.5380 | 0.5460 | 0.5544 | 0.5667 | 0.5819 | 0.5831 | 0.5741 |
| Model700 | 0.4995 | 0.5378 | 0.5417 | 0.5408 | 0.5498 | 0.5617 | 0.5713 | 0.5737 | 0.5692 |
| Model800 | 0.5042 | 0.5331 | 0.5428 | 0.5402 | 0.5470 | 0.5544 | 0.5632 | 0.5676 | 0.5660 |
| Model900 | 0.4962 | 0.5251 | 0.5338 | 0.5320 | 0.5355 | 0.5513 | 0.5592 | 0.5638 | 0.5627 |
| Model1000 | 0.4966 | 0.5282 | 0.5339 | 0.5323 | 0.5322 | 0.5491 | 0.5540 | 0.5604 | 0.5598 |
| Model1500 | 0.4936 | 0.5243 | 0.5273 | 0.5240 | 0.5278 | 0.5435 | 0.5563 | 0.5605 | 0.5597 |
| Model2000 | 0.4944 | 0.5126 | 0.5185 | 0.5221 | 0.5262 | 0.5344 | 0.5487 | 0.5516 | 0.5513 |
| Model2500 | 0.4813 | 0.5129 | 0.5183 | 0.5215 | 0.5239 | 0.5292 | 0.5418 | 0.5431 | 0.5426 |
| Model3000 | 0.4849 | 0.5142 | 0.5210 | 0.5192 | 0.5233 | 0.5297 | 0.5406 | 0.5409 | 0.5401 |
| Model3500 | 0.4749 | 0.5161 | 0.5247 | 0.5215 | 0.5231 | 0.5299 | 0.5391 | 0.5390 | 0.5379 |
| Model4000 | 0.4789 | 0.5143 | 0.5238 | 0.5157 | 0.5200 | 0.5257 | 0.5368 | 0.5368 | 0.5357 |
Figure 2. F-measure curves of the different models. The horizontal axis denotes merg_thred and the vertical axis denotes the F-measure.
Figure 3. F-measure and accuracy curves for the two-category and three-category training sets. The horizontal axis denotes merg_thred and the vertical axis denotes the F-measure and accuracy.
Performance comparison for different clique_size and merg_thred values
| | clique_size ≥ 4 | | | | clique_size ≥ 3 | | | |
|---|---|---|---|---|---|---|---|---|
| merg_thred | Num | P | R | F | Num | P | R | F |
| 0.1 | 96 | 0.7813 | 0.2635 | 0.3941 | 212 | 0.6792 | 0.4102 | 0.5115 |
| 0.2 | 113 | 0.7876 | 0.2934 | 0.4275 | 286 | 0.6434 | 0.4746 | 0.5462 |
| 0.3 | 125 | 0.76 | 0.3009 | 0.4311 | 333 | 0.6186 | 0.4895 | 0.5465 |
| 0.4 | 142 | 0.7465 | 0.3084 | 0.4365 | 388 | 0.6237 | 0.5075 | 0.5596 |
| 0.5 | 171 | 0.7427 | 0.3129 | 0.4403 | 491 | 0.6273 | 0.5240 | 0.5710 |
| 0.6 | 203 | 0.7340 | 0.3159 | 0.4417 | 624 | 0.6298 | 0.5284 | 0.5747 |
| 0.7 | 248 | 0.7540 | 0.3189 | 0.4482 | 755 | 0.6530 | 0.5374 | 0.5896 |
| 0.8 | 303 | 0.7294 | 0.3234 | 0.4481 | 880 | 0.6477 | 0.5434 | |
| 0.9 | 368 | 0.6440 | 0.3249 | 0.4319 | 945 | 0.6180 | 0.5434 | 0.5783 |
Num denotes the number of predicted complexes; P, R and F denote precision, recall and F-measure, respectively.
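The tables do not define F, but the reported values agree with the usual harmonic mean F = 2PR/(P + R): at merg_thred = 0.7 with clique_size ≥ 3, P = 0.6530 and R = 0.5374 give F ≈ 0.5896, as listed. The same formula gives an estimate for the missing F cell at merg_thred = 0.8. Because P and R are themselves rounded, a recomputed value may differ from the original in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# merg_thred = 0.7, clique_size >= 3 (F is reported as 0.5896)
print(round(f_measure(0.6530, 0.5374), 4))  # 0.5896
# merg_thred = 0.8, clique_size >= 3 (F cell is missing in the table)
print(round(f_measure(0.6477, 0.5434), 4))  # 0.591, an estimate only
```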
Experimental results for three different feature sets
| | Unweighted feature set | | | | Weighted feature set | | | | All features | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| merg_thred | Num | P | R | F | Num | P | R | F | Num | P | R | F |
| 0.1 | 169 | 0.5621 | 0.3009 | 0.3920 | 205 | 0.6537 | 0.4042 | 0.4995 | 212 | 0.6792 | 0.4102 | 0.5115 |
| 0.2 | 242 | 0.4876 | 0.3443 | 0.4036 | 266 | 0.6353 | 0.4536 | 0.5293 | 286 | 0.6434 | 0.4746 | 0.5462 |
| 0.3 | 309 | 0.4434 | 0.3757 | 0.4068 | 301 | 0.6146 | 0.4790 | 0.5384 | 333 | 0.6186 | 0.4895 | 0.5465 |
| 0.4 | 389 | 0.4216 | 0.4042 | 0.4127 | 350 | 0.62 | 0.4910 | 0.5480 | 388 | 0.6237 | 0.5075 | 0.5596 |
| 0.5 | 476 | 0.4202 | 0.4132 | 0.4166 | 443 | 0.6163 | 0.5090 | 0.5575 | 491 | 0.6273 | 0.5240 | 0.5710 |
| 0.6 | 576 | 0.4184 | 0.4266 | 0.4225 | 572 | 0.6189 | 0.5165 | 0.5631 | 624 | 0.6298 | 0.5284 | 0.5747 |
| 0.7 | 703 | 0.4296 | 0.4371 | 0.4333 | 717 | 0.6374 | 0.5329 | 0.5805 | 755 | 0.6530 | 0.5374 | 0.5896 |
| 0.8 | 815 | 0.4331 | 0.4431 | | 834 | 0.6319 | 0.5389 | | 880 | 0.6477 | 0.5434 | |
| 0.9 | 910 | 0.4286 | 0.4446 | 0.4364 | 930 | 0.6108 | 0.5404 | 0.5734 | 945 | 0.6180 | 0.5434 | 0.5783 |
Two feature rankings
| Rank | Feature (list 1) | Weight | Feature (list 2) | F |
|---|---|---|---|---|
| 1 | Mean edge weight of the weighted network | 0.1529 | | 0.5697 |
| 2 | Density of the unweighted network | 0.0964 | Mean edge weight of the weighted network | 0.5709 |
| 3 | | 0.0243 | | 0.5768 |
| 4 | Mean of the unweighted clustering coefficient | 0.0198 | | 0.5773 |
| 5 | Topological change 7 | 0.0051 | Density of the unweighted network | 0.5884 |
| 6 | Mean degree of the unweighted network | 0.0047 | Mean degree of the unweighted network | 0.5884 |
| 7 | Topological change 5 | 0.0044 | Medium degree of the unweighted network | 0.5884 |
| 8 | | 0.0040 | Topological change 7 | 0.5889 |
| 9 | Medium degree of the unweighted network | 0.0027 | Mean of the unweighted clustering coefficient | 0.5896 |
| 10 | | 0.0008 | Topological change 5 | 0.5905 |
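The Weight column suggests that each feature carries a coefficient in a linear scoring model, but this excerpt gives neither the model's exact form nor its training targets. The sketch below is a hypothetical stand-in, not the authors' method: ordinary least squares with an intercept, solved through the normal equations, followed by ranking features by absolute weight. The function names, and the idea of regressing onto numeric sample labels (for example 1 for positive and 0 for negative samples), are assumptions.

```python
def fit_linear(X, y):
    """Hypothetical OLS fit with intercept via the normal equations.

    X: list of feature vectors, y: list of numeric targets.
    Returns (intercept, weights).
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # intercept column first
    d = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(d):
        pivot_row = max(range(c, d), key=lambda k: abs(M[k][c]))
        if abs(M[pivot_row][c]) < 1e-12:
            raise ValueError("features are linearly dependent")
        M[c], M[pivot_row] = M[pivot_row], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for k in range(d):
            if k != c:
                factor = M[k][c]
                M[k] = [a - factor * b for a, b in zip(M[k], M[c])]
    beta = [M[i][d] for i in range(d)]
    return beta[0], beta[1:]


def rank_features(names, weights):
    """Sort (name, weight) pairs by absolute weight, largest first."""
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
```

Ranking by absolute weight is only meaningful when the features are on comparable scales, so standardizing them before fitting is advisable.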
Performance comparison between the Equal Weight (EW) and Regression Model (RM) approaches
| merg_thred | Model | Num | Ncp | P | Ncb | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | EW | 167 | 94 | 0.5629 | 182 | 0.2725 | 0.3672 | 0.6075 | 0.6550 | 0.6308 |
| | RM | 212 | 144 | 0.6792 | 274 | 0.4102 | 0.5115 | 0.3473 | 0.7849 | 0.5221 |
| 0.2 | EW | 227 | 109 | 0.4802 | 217 | 0.3249 | 0.3875 | 0.6092 | 0.6635 | 0.6358 |
| | RM | 286 | 184 | 0.6434 | 317 | 0.4746 | 0.5462 | 0.4054 | 0.7749 | 0.5605 |
| 0.3 | EW | 289 | 127 | 0.4394 | 234 | 0.3503 | 0.3898 | 0.6190 | 0.6329 | 0.6259 |
| | RM | 333 | 206 | 0.6186 | 327 | 0.4895 | 0.5465 | 0.4242 | 0.7713 | 0.5720 |
| 0.4 | EW | 336 | 151 | 0.4494 | 254 | 0.3802 | 0.4119 | 0.6256 | 0.6393 | 0.6324 |
| | RM | 388 | 242 | 0.6237 | 339 | 0.5075 | 0.5596 | 0.4466 | 0.7557 | 0.5809 |
| 0.5 | EW | 405 | 188 | 0.4642 | 269 | 0.4027 | 0.4313 | 0.6229 | 0.6387 | 0.6307 |
| | RM | 491 | 308 | 0.6273 | 350 | 0.5240 | 0.5710 | 0.4649 | 0.7317 | 0.5832 |
| 0.6 | EW | 475 | 221 | 0.4653 | 279 | 0.4177 | 0.4402 | 0.6197 | 0.6328 | 0.6262 |
| | RM | 624 | 393 | 0.6298 | 353 | 0.5284 | 0.5747 | 0.4779 | 0.7335 | 0.5920 |
| 0.7 | EW | 568 | 261 | 0.4595 | 285 | 0.4266 | 0.4425 | 0.6201 | 0.6353 | 0.6277 |
| | RM | 755 | 493 | 0.6530 | 359 | 0.5374 | 0.5896 | 0.4908 | 0.7370 | 0.6014 |
| 0.8 | EW | 674 | 298 | 0.4421 | 288 | 0.4311 | | 0.6203 | 0.6362 | 0.6282 |
| | RM | 880 | 570 | 0.6477 | 363 | 0.5434 | | 0.4938 | 0.7363 | 0.6030 |
| 0.9 | EW | 819 | 348 | 0.4249 | 288 | 0.4311 | 0.4280 | 0.6195 | 0.6395 | 0.6294 |
| | RM | 945 | 584 | 0.6180 | 363 | 0.5434 | 0.5783 | 0.4942 | 0.7350 | 0.6027 |
Ncp denotes the number of predicted complexes that match at least one true complex, and Ncb denotes the number of true complexes that match at least one predicted complex.
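Given these counts, precision and recall follow as P = Ncp/Num and R = Ncb/(number of true complexes); for RM at merg_thred = 0.1, 144/212 ≈ 0.6792 matches the table. This excerpt does not define the matching criterion, so the sketch below assumes the neighborhood-affinity overlap |p ∩ b|²/(|p|·|b|) with a threshold of 0.2, a choice that is common in the protein-complex literature. Both the score and the threshold are assumptions, and the function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

Complex = FrozenSet[str]


def neighborhood_affinity(p: Complex, b: Complex) -> float:
    # ASSUMPTION: overlap score |p & b|^2 / (|p| * |b|); the source's own
    # matching score may be defined differently. Complexes must be non-empty.
    return len(p & b) ** 2 / (len(p) * len(b))


def evaluate(predicted: Sequence[Complex], reference: Sequence[Complex],
             threshold: float = 0.2,  # ASSUMPTION: hypothetical threshold
             score=neighborhood_affinity
             ) -> Tuple[int, int, float, int, float, float]:
    """Return (Num, Ncp, P, Ncb, R, F) for predicted vs. true complexes."""
    num = len(predicted)
    # Ncp: predicted complexes matching at least one true complex.
    ncp = sum(1 for p in predicted
              if any(score(p, b) >= threshold for b in reference))
    # Ncb: true complexes matched by at least one predicted complex.
    ncb = sum(1 for b in reference
              if any(score(p, b) >= threshold for p in predicted))
    precision = ncp / num if num else 0.0
    recall = ncb / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return num, ncp, precision, ncb, recall, f
```

The R column is consistent with a reference set of about 668 complexes (for example 274/0.4102 ≈ 668 and 363/0.5434 ≈ 668); that size is inferred from the table, not stated in this excerpt.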
Performance comparison with MCODE, COACH, CMC and ClusterONE on four PIN datasets
| Dataset | Method | Num | P | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|---|---|
| DIP | MCODE | 79 | 0.5570 | 0.1332 | 0.2150 | 0.2758 | 0.6880 | 0.4356 |
| | COACH | 747 | 0.4351 | 0.5195 | 0.4735 | 0.4779 | 0.6921 | 0.5751 |
| | CMC | 262 | 0.5687 | 0.4102 | 0.4766 | 0.4791 | 0.7241 | 0.5890 |
| | ClusterONE | 354 | 0.5113 | 0.4072 | 0.4533 | 0.3903 | 0.7124 | 0.5273 |
| | Ours | 613 | 0.6232 | 0.5269 | 0.5710 | 0.4764 | 0.7375 | 0.5927 |
| Gavin | MCODE | 78 | 0.8718 | 0.2809 | 0.4249 | 0.4174 | 0.7017 | 0.5412 |
| | COACH | 326 | 0.7393 | 0.6086 | 0.6676 | 0.6277 | 0.7162 | 0.6705 |
| | CMC | 202 | 0.7228 | 0.4176 | 0.5294 | 0.3817 | 0.7067 | 0.5194 |
| | ClusterONE | 200 | 0.8050 | 0.5693 | 0.6669 | 0.6211 | 0.7048 | 0.6617 |
| | Ours | 275 | 0.8145 | 0.5730 | 0.6728 | 0.5083 | 0.7526 | 0.6185 |
| Krogan | MCODE | 63 | 0.6349 | 0.1544 | 0.2484 | 0.4439 | 0.4865 | 0.4642 |
| | COACH | 570 | 0.4439 | 0.4865 | 0.4642 | 0.502 | 0.6575 | 0.5745 |
| | CMC | 242 | 0.5909 | 0.3555 | 0.4439 | 0.3263 | 0.7215 | 0.4852 |
| | ClusterONE | 258 | 0.5349 | 0.4381 | 0.4817 | 0.4865 | 0.7567 | 0.6067 |
| | Ours | 465 | 0.5591 | 0.4955 | 0.5254 | 0.4944 | 0.7189 | 0.5962 |
| Collins | MCODE | 111 | 0.8468 | 0.431 | 0.5713 | 0.5438 | 0.7600 | 0.6429 |
| | COACH | 251 | 0.6972 | 0.5651 | 0.6243 | 0.6275 | 0.7931 | 0.7054 |
| | CMC | 172 | 0.6919 | 0.4234 | 0.5253 | 0.4882 | 0.7336 | 0.5985 |
| | ClusterONE | 180 | 0.8222 | 0.5958 | 0.6909 | 0.6526 | 0.7275 | 0.6891 |
| | Ours | 150 | 0.8133 | 0.5096 | 0.6266 | 0.6338 | 0.7431 | 0.6863 |
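Sn, PPV and Acc are not defined in this excerpt. The sketch below assumes the definitions commonly used in protein-complex evaluation (introduced by Brohée and van Helden), where T_ij is the number of proteins shared by true complex i and predicted complex j: Sn = Σ_i max_j T_ij / Σ_i |b_i|, PPV = Σ_j max_i T_ij / Σ_j Σ_i T_ij, and Acc = √(Sn · PPV). The geometric-mean relation for Acc is consistent with the tables (√(0.3473 × 0.7849) ≈ 0.5221 for RM at merg_thred = 0.1); the Sn and PPV formulas are assumptions, and the function name is hypothetical.

```python
from math import sqrt


def sn_ppv_acc(reference, predicted):
    """Hypothetical Sn / PPV / Acc for lists of protein sets."""
    if not reference or not predicted:
        return 0.0, 0.0, 0.0
    # T[i][j]: proteins shared by true complex i and predicted complex j.
    T = [[len(b & p) for p in predicted] for b in reference]
    # Sensitivity: best coverage of each true complex, over all true proteins.
    sn = sum(max(row) for row in T) / sum(len(b) for b in reference)
    # PPV: best match of each predicted complex, over all shared proteins.
    cols = list(zip(*T))
    col_total = sum(sum(col) for col in cols)
    ppv = sum(max(col) for col in cols) / col_total if col_total else 0.0
    # Accuracy: geometric mean of Sn and PPV.
    return sn, ppv, sqrt(sn * ppv)
```

For true complexes {A, B, C, D} and {E, F} against predictions {A, B, C} and {D, E}, this gives Sn = 4/6, PPV = 0.8 and Acc = √(0.5333) ≈ 0.73.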
Details of the four protein interaction network (PIN) datasets
| Dataset | #Original proteins | #Original interactions | #Remaining proteins | #Remaining interactions |
|---|---|---|---|---|
| DIP | 4928 | 17201 | 3449 | 11081 |
| Gavin | 1430 | 6531 | 1304 | 5941 |
| Krogan | 3581 | 14076 | 2270 | 9218 |
| Collins | 1622 | 9074 | 1513 | 8949 |
Performance comparison with Qi et al.'s method
| | MIPS (training) / TAP06 (testing) | | | | | | TAP06 (training) / MIPS (testing) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | Num | Ncp | P | Ncb | R | F | Num | Ncp | P | Ncb | R | F |
| Ours | 271 | 115 | 0.424 | 65 | 0.433 | | 262 | 128 | 0.489 | 105 | 0.525 | |
| SCI-BN | | | 0.312 | | 0.489 | 0.381 | | | 0.219 | | 0.537 | 0.312 |
| SCI-SVM | | | 0.247 | | 0.377 | 0.298 | | | 0.176 | | 0.379 | 0.24 |
| MCODE | 45 | 19 | 0.422 | 19 | 0.127 | 0.195 | 45 | 18 | 0.4 | 20 | 0.1 | 0.160 |
| ClusterOne | 173 | 83 | 0.480 | 69 | 0.46 | | 173 | 74 | 0.428 | 87 | 0.435 | 0.431 |
| COACH | 294 | 114 | 0.387 | 80 | 0.533 | 0.449 | 294 | 107 | 0.364 | 99 | 0.495 | 0.419 |
| CMC | 161 | 72 | 0.447 | 53 | 0.353 | 0.395 | 161 | 74 | 0.460 | 76 | 0.380 | 0.416 |
| Ours1 | 274 | 119 | 0.434 | 69 | 0.46 | | 270 | 142 | 0.526 | 102 | 0.51 | |
Ten predicted complexes with low p-values that match the true complexes
| ID | Complex | Match_score | p-value (GO_Process) | p-value (GO_Function) | p-value (GO_Component) |
|---|---|---|---|---|---|
| 1 | YLR148W YMR231W YLR396C | 0.83 | 6.72e-14 | 2.90e-10 | 1.10e-15 |
| 2 | YMR224C YNL250W YDR369C | 1.0 | 2.90e-07 | 3.58e-10 | 2.93e-10 |
| 3 | YLR438C-A YNL147W YBL026W YMR268C YER112W YDR378C | 1.0 | 1.57e-13 | 1.10e-07 | 1.51e-24 |
| 4 | YPR162C YJL194W YLL004W YNL261W YBR060C YHR118C YML065W | 0.86 | 3.38e-17 | 2.43e-16 | 1.15e-18 |
| 5 | YDL225W YLR314C YHR033W YCR002C YJR076C YHR107C | 0.83 | 4.76e-07 | 1.73e-10 | 2.39e-13 |
| 6 | YMR047C YLR335W YLR347C YAR002W YNL189W | 0.8 | 2.51e-09 | 4.23e-06 | 1.92e-03 |
| 7 | YBL016W YDR103W YLR362W YDL159W YGR040W | 1.0 | 1.21e-10 | 6.83e-06 | 4.60e-04 |
| 8 | YML056C YIL079C YJL050W YOL115W YDL175C | 0.8 | 5.18e-11 | 1.62e-08 | 3.50e-11 |
| 9 | YPL083C YMR059W YAR008W YLR105C | 1.0 | 2.27e-13 | 9.09e-14 | 2.54e-13 |
| 10 | YKL166C YJL164C YPL203W YIL033C | 1.0 | 3.70e-08 | 5.21e-10 | 8.47e-10 |
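This excerpt does not state how the GO p-values were computed. A common approach, assumed here, is a one-sided hypergeometric test: with N annotated proteins in the background, K of them carrying a given GO term, and a predicted complex of n proteins of which k carry the term, the p-value is P(X ≥ k). The function name and parameters are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N total, K annotated, n drawn).

    ASSUMPTION: this is a generic enrichment test, not necessarily the
    procedure used to produce the p-values in the tables.
    """
    upper = min(K, n)
    # Sum the probabilities of drawing i annotated proteins, for i = k..upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

In practice the test is run for each GO term of interest; whether the source reports the minimum over terms or applies a multiple-testing correction is not stated here.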
Four predicted complexes with low p-values that do not match any true complex
| ID | Complex | Match_score | p-value (GO_Process) | p-value (GO_Function) | p-value (GO_Component) |
|---|---|---|---|---|---|
| A | YPL149W YBR217W YMR159C | 0.0 | 2.41e-08 | 1.95e-10 | 2.28e-10 |
| B | YNL214W YDR244W YLR191W YDR142C YGL153W | 0.0 | 2.92e-16 | 9.30e-03 | 1.49e-09 |
| C | YMR047C YJR042W YKL068W YDR192C YDL116W YGR119C YKL057C YKR082W YLR335W YGL092W YER165W YGL172W YAR002W | 0.21 | 1.57e-23 | 8.46e-26 | 2.92e-06 |
| D | YMR047C YJR042W YKL068W YGR218W YDR192C YDL116W YGR119C YLR335W YKR082W YKL057C YGL092W YGL172W | 0.22 | 3.39e-18 | 1.78e-23 | 1.65e-09 |
Figure 4. The four predicted complexes that do not match any true complex.