Yijia Zhang, Hongfei Lin, Zhihao Yang, Jian Wang.
Abstract
BACKGROUND: Accurate determination of protein complexes is crucial for understanding cellular organization and function. High-throughput experimental techniques have generated a large amount of protein-protein interaction (PPI) data, allowing prediction of protein complexes from PPI networks. However, the high-throughput data often includes false positives and false negatives, making accurate prediction of protein complexes difficult.
Year: 2015 PMID: 25708571 PMCID: PMC4331718 DOI: 10.1186/1471-2164-16-S2-S4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Workflow for extracting PPI data from PubMed abstracts.
Figure 2. Example of a protein complex prediction network. (a) A PPI network of eight proteins. (b) The PPI network is annotated using GO slims. (c) Prediction of two protein complexes in the PPI network based on structural and GO annotation similarities.
Figure 3. Example of an attributed PPI network. (a) An attributed PPI network. (b) GO slim attributes of protein vertices and type attributes of the PPIs. T1: high-throughput type; T2: biomedical literature type. (c) and (d) are the subgraphs induced by {GS1} and {GS1,GS2}, respectively.
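The attributed network described in Figure 3 can be illustrated with a minimal Python sketch. The proteins, GO slim annotations, and edges below are hypothetical toy data, not the contents of the figure, and the sketch assumes that a vertex belongs to the subgraph induced by an attribute set only if it carries every attribute in that set.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy data (not taken from Figure 3): GO slim attributes per protein.
go_slims: Dict[str, Set[str]] = {
    "P1": {"GS1"},
    "P2": {"GS1", "GS2"},
    "P3": {"GS1", "GS2"},
    "P4": {"GS2"},
}

# Each undirected PPI carries a type attribute:
# "T1" = high-throughput, "T2" = biomedical literature.
edges: Dict[FrozenSet[str], str] = {
    frozenset({"P1", "P2"}): "T1",
    frozenset({"P2", "P3"}): "T2",
    frozenset({"P3", "P4"}): "T1",
}


def induced_subgraph(
    attrs: Iterable[str],
) -> Tuple[Set[str], Dict[FrozenSet[str], str]]:
    """Return the vertices annotated with all of `attrs` and the PPIs among them.

    Assumption: membership requires every attribute in `attrs`.
    """
    required = set(attrs)
    vertices = {p for p, slims in go_slims.items() if required <= slims}
    kept = {e: t for e, t in edges.items() if e <= vertices}
    return vertices, kept


if __name__ == "__main__":
    print(induced_subgraph({"GS1"}))
    print(induced_subgraph({"GS1", "GS2"}))
```

Adding attributes to the set can only shrink the induced subgraph, which is why the {GS1,GS2} subgraph is contained in the {GS1} subgraph.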
Figure 4. Examples of ontology correlated cliques.
Statistics of the ontology correlated cliques in Figure 4.
| Clique | GO slim set | | | |
|---|---|---|---|---|
| {P2, P5, P6} | {GS1} | 0.667 | 0.625 | 1.251 |
| {P2, P3, P5} | {GS1, GS2} | 0.867 | 1 | 5.202 |
| {P2, P3, P4, P5} | {GS1, GS2} | 0.833 | 1 | 6.664 |
(= 0.6 and = 0.4).
PPI extraction results for five training corpora
| Training dataset | Annotated PPIs | Extracted PPIs | Different from Gavin | Different from Krogan |
|---|---|---|---|---|
| AImed | 5834 | 2957 | 2729 | 2659 |
| BioInfer | 9666 | 1196 | 1100 | 1073 |
| IEPA | 817 | 2223 | 2072 | 2039 |
| HPRD50 | 433 | 2573 | 2390 | 2362 |
| LLL | 300 | 1871 | 1756 | 1722 |
Results of PPI data extracted from biomedical literature.
| | COACH | CMC | ClusterONE | COAN |
|---|---|---|---|---|
| AImed | 0.1938 | 0.1395 | 0.1873 | 0.153 |
| BioInfer | 0.1233 | 0.1072 | 0.1578 | 0.1231 |
| IEPA | 0.1325 | 0.1015 | 0.1597 | 0.1422 |
| HPRD50 | 0.1566 | 0.1175 | 0.1462 | 0.1396 |
| LLL | 0.1345 | 0.1103 | 0.1403 | 0.1325 |
Results of Gavin PPI data and biomedical literature PPI data.
| Dataset | COACH F | COACH σF | CMC F | CMC σF | ClusterONE F | ClusterONE σF | COAN F | COAN σF |
|---|---|---|---|---|---|---|---|---|
| Gavin dataset | 0.406 | - | 0.321 | - | 0.418 | - | 0.404 | - |
| Gavin + random I | 0.402 | 0.003 | 0.311 | 0.009 | 0.408 | 0.012 | 0.402 | 0.004 |
| Gavin + random II | 0.398 | 0.005 | 0.298 | 0.005 | 0.389 | 0.013 | 0.401 | 0.005 |
| Gavin + random III | 0.395 | 0.005 | 0.283 | 0.012 | 0.366 | 0.013 | 0.393 | 0.008 |
| Gavin + AImed | | | | | | | | |
| Gavin + BioInfer | 0.414 | - | 0.329 | - | 0.415 | - | 0.413 | - |
| Gavin + IEPA | 0.406 | - | 0.342 | - | 0.427 | - | 0.409 | - |
| Gavin + HPRD50 | 0.417 | - | 0.32 | - | 0.423 | - | 0.409 | - |
| Gavin + LLL | 0.41 | - | 0.337 | - | 0.411 | - | 0.419 | - |
Gavin + random I, Gavin + random II, and Gavin + random III show the results of randomly adding 1000, 2000 and 3000 interactions to the Gavin dataset, respectively. The highest F-score of each approach is shown in bold.
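The random controls described above can be sketched as follows. The source does not say how the random interactions were drawn, so this sketch assumes uniform sampling of new, distinct protein pairs among proteins already in the network; the function name `add_random_interactions` and the seed parameter are illustrative only.

```python
import random
from typing import FrozenSet, Iterable, Optional, Set, Tuple


def add_random_interactions(
    edges: Iterable[Tuple[str, str]],
    n_new: int,
    seed: Optional[int] = None,
) -> Set[FrozenSet[str]]:
    """Return the undirected edge set plus `n_new` random, previously absent PPIs.

    Assumption: pairs are drawn uniformly among proteins already present.
    """
    rng = random.Random(seed)
    # Ignore self-interactions so every edge is a pair of distinct proteins.
    existing = {frozenset(e) for e in edges if e[0] != e[1]}
    proteins = sorted({p for e in existing for p in e})
    max_new = len(proteins) * (len(proteins) - 1) // 2 - len(existing)
    if n_new > max_new:
        raise ValueError("not enough absent protein pairs to add")
    augmented = set(existing)
    target = len(existing) + n_new
    while len(augmented) < target:
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        augmented.add(frozenset((a, b)))  # duplicates are absorbed by the set
    return augmented


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "C"), ("C", "D")]
    print(add_random_interactions(toy, 2, seed=0))
```

Repeating such a draw with different seeds and re-running a predictor each time would give a spread of F-scores, which is presumably what the σF columns report for the random rows.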
The results of Krogan PPI data and biomedical literature PPI data.
| Dataset | COACH F | COACH σF | CMC F | CMC σF | ClusterONE F | ClusterONE σF | COAN F | COAN σF |
|---|---|---|---|---|---|---|---|---|
| Krogan dataset | 0.441 | - | 0.358 | - | 0.401 | - | 0.451 | - |
| Krogan + random I | 0.439 | 0.002 | 0.353 | 0.004 | 0.379 | 0.014 | 0.445 | 0.002 |
| Krogan + random II | 0.436 | 0.004 | 0.349 | 0.006 | 0.354 | 0.021 | 0.449 | 0.006 |
| Krogan + random III | 0.433 | 0.004 | 0.347 | 0.006 | 0.34 | 0.018 | 0.444 | 0.007 |
| Krogan + AImed | | | | | | | | |
| Krogan + BioInfer | 0.453 | - | 0.366 | - | 0.405 | - | 0.458 | - |
| Krogan + IEPA | 0.444 | - | 0.398 | - | 0.393 | - | 0.453 | - |
| Krogan + HPRD50 | 0.445 | - | 0.384 | - | 0.389 | - | 0.463 | - |
| Krogan + LLL | 0.454 | - | 0.393 | - | 0.404 | - | 0.453 | - |
Krogan + random I, Krogan + random II, and Krogan + random III show the results of randomly adding 1000, 2000 and 3000 PPIs to the Krogan dataset, respectively. The highest F-score of each approach is shown in bold.
The effect of extend_thres on protein complex prediction performance using Attributed PPI network I.
| extend_thres | P | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|
| 0.05 | 0.506 | 0.314 | 0.387 | | 0.389 | 0.471 |
| 0.1 | | | | 0.521 | 0.541 | |
| 0.2 | 0.55 | 0.341 | 0.421 | 0.451 | 0.611 | 0.525 |
| 0.3 | 0.524 | 0.321 | 0.398 | 0.39 | 0.653 | 0.505 |
| 0.4 | 0.515 | 0.311 | 0.388 | 0.349 | 0.677 | 0.486 |
| 0.5 | 0.506 | 0.314 | 0.387 | 0.332 | 0.699 | 0.482 |
| 0.6 | 0.502 | 0.314 | 0.386 | 0.331 | | 0.482 |
| 0.7 | 0.502 | 0.314 | 0.386 | 0.331 | | 0.482 |
| 0.8 | 0.502 | 0.314 | 0.386 | 0.331 | | 0.482 |
| 0.9 | 0.502 | 0.314 | 0.386 | 0.331 | | 0.482 |
| 1.0 | 0.502 | 0.314 | 0.386 | 0.331 | | 0.482 |
F: F-score, P: precision, R: recall. The highest F-score of each approach is shown in bold.
The effect of extend_thres on protein complex prediction performance using Attributed PPI network II.
| extend_thres | P | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|
| 0.05 | 0.571 | 0.316 | 0.407 | 0.581 | 0.413 | 0.49 |
| 0.1 | | | | 0.525 | 0.576 | |
| 0.2 | 0.599 | 0.367 | 0.457 | 0.447 | 0.647 | 0.538 |
| 0.3 | 0.567 | 0.365 | 0.444 | 0.389 | 0.702 | 0.523 |
| 0.4 | 0.559 | 0.355 | 0.434 | 0.348 | 0.72 | 0.501 |
| 0.5 | 0.551 | 0.35 | 0.428 | 0.339 | 0.732 | 0.498 |
| 0.6 | 0.551 | 0.348 | 0.426 | 0.336 | | 0.497 |
| 0.7 | 0.551 | 0.348 | 0.426 | 0.336 | | 0.497 |
| 0.8 | 0.551 | 0.348 | 0.426 | 0.336 | | 0.497 |
| 0.9 | 0.551 | 0.348 | 0.426 | 0.336 | | 0.497 |
| 1.0 | 0.551 | 0.348 | 0.426 | 0.336 | | 0.497 |
F: F-score, P: precision, R: recall. The highest F-score of each approach is shown in bold.
The contribution weight of high-throughput experimental and literature PPI data using Attributed PPI networks for protein complex prediction.
| | | High-throughput | Literature |
|---|---|---|---|
| Attributed networks I | PPIs | 6351 | 2957 |
| | Weight | 0.59 | 0.41 |
| Attributed networks II | PPIs | 7080 | 2957 |
| | Weight | 0.55 | 0.45 |
extend_thres = 0.1.
Performance comparison of the weight mechanism.
| | P | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|
| Weight I | 0.589 | 0.36 | 0.447 | 0.521 | 0.541 | 0.531 |
| No weight I | 0.556 | 0.341 | 0.422 | 0.52 | 0.526 | 0.523 |
| Weight II | 0.636 | 0.382 | 0.477 | 0.525 | 0.576 | 0.551 |
| No weight II | 0.623 | 0.368 | 0.463 | 0.523 | 0.567 | 0.544 |
Weight I and Weight II denote the performance using the weight mechanism on attributed PPI networks I and II, respectively. No weight I and No weight II denote the performance without the weight mechanism. F: F-score, P: precision, R: recall.
Performance comparison with other protein complex prediction methods.
| PPIN | Methods | #Complexes | P | R | F | Sn | PPV | Acc |
|---|---|---|---|---|---|---|---|---|
| Attr. PPIN I | Our method (BP,MF,CC) | 231 | 0.589 | | | 0.521 | 0.541 | 0.531 |
| | Our method (BP,MF) | 182 | 0.659 | 0.326 | 0.436 | 0.471 | 0.571 | 0.518 |
| | ClusterONE | 199 | 0.568 | 0.331 | 0.418 | 0.468 | | |
| | COACH | 326 | 0.525 | 0.333 | 0.406 | 0.44 | 0.547 | 0.49 |
| | CMC | 120 | 0.608 | 0.218 | 0.321 | 0.371 | 0.606 | 0.474 |
| Gavin PPIN | HUNTER | 69 | | 0.206 | 0.333 | 0.386 | 0.508 | 0.443 |
| | MCL | 103 | 0.718 | 0.245 | 0.366 | | 0.489 | 0.509 |
| | MCODE | 70 | 0.739 | 0.154 | 0.255 | 0.283 | 0.519 | 0.384 |
| Attr. PPIN II | Our method (BP,MF,CC) | 247 | 0.636 | 0.382 | | 0.525 | 0.576 | 0.551 |
| | Our method (BP,MF) | 206 | 0.679 | 0.348 | 0.46 | 0.477 | 0.578 | 0.525 |
| | ClusterONE | 464 | 0.375 | | 0.401 | 0.523 | | |
| | COACH | 345 | 0.617 | 0.343 | 0.441 | 0.432 | 0.544 | 0.485 |
| Krogan PPIN | CMC | 111 | 0.748 | 0.235 | 0.358 | 0.381 | 0.589 | 0.474 |
| | HUNTER | 74 | | 0.199 | 0.323 | 0.374 | 0.569 | 0.462 |
| | MCL | 309 | 0.291 | 0.245 | 0.266 | | 0.396 | 0.475 |
| | MCODE | 72 | 0.75 | 0.159 | 0.263 | 0.27 | 0.552 | 0.386 |
#Complexes refers to the number of predicted complexes. F: F-score, P: precision, R: recall. The highest F-score of each approach is shown in bold.
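The relationship between the reported measures can be checked with a short Python sketch. The F-score is taken as the harmonic mean of precision and recall. The record does not define Sn, PPV, or Acc, so the sketch assumes they denote sensitivity, positive predictive value, and the geometric mean of the two, as is common in the protein complex prediction literature; the tabulated values are consistent with this reading up to rounding. The function names are illustrative.

```python
import math


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def geometric_accuracy(sn: float, ppv: float) -> float:
    """Assumed definition: Acc = sqrt(Sn * PPV)."""
    return math.sqrt(sn * ppv)


if __name__ == "__main__":
    # CMC row with 120 predicted complexes:
    # P = 0.608, R = 0.218, Sn = 0.371, PPV = 0.606.
    print(round(f_score(0.608, 0.218), 3))             # table reports F = 0.321
    print(round(geometric_accuracy(0.371, 0.606), 3))  # table reports Acc = 0.474
```

Because the published inputs are already rounded to three decimals, recomputed values for some other rows differ from the table in the last digit.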