Yijia Zhang, Hongfei Lin, Zhihao Yang, Jian Wang.
Abstract
Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks from protein-protein interaction data and gene ontology, which combine the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into a unified distance measure. Based on the ontology augmented networks, a novel method (clustering based on ontology augmented networks, COAN) was proposed to predict protein complexes; it takes into account both the topological structure of the protein-protein interaction network and the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine structural closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and achieves higher F1 and accuracy than other competing methods.
Year: 2013 PMID: 23650509 PMCID: PMC3641129 DOI: 10.1371/journal.pone.0062077
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. An example of protein complex prediction: (a) A PPI network constructed from eight proteins. (b) The PPI network annotated with GO slims. (c) Prediction of two protein complexes in the PPI network based on structural and GO annotation similarities.
Figure 2. Ontology augmented graph with GO slims.
Figure 3. Transition probability matrix of the ontology augmented network example.
Figure 4. Pseudocode of the COAN algorithm.
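As a rough illustration of the ideas behind Figures 2 and 3, the sketch below builds a small augmented graph in which each protein is linked both to its interaction partners and to the GO slim terms that annotate it, and then row-normalizes the adjacency structure into a transition probability matrix. The toy proteins (`P1` to `P3`), the toy terms (`GO:A`, `GO:B`), the equal edge weights, and the function names are all assumptions made for illustration; the weighting scheme and distance measure actually used by COAN are not given in this record.

```python
from collections import defaultdict


def build_augmented_graph(ppi_edges, go_annotations):
    """Return an undirected adjacency map over protein and GO-term nodes.

    Assumption: PPI edges and protein-to-term edges are weighted equally.
    """
    adj = defaultdict(set)
    for a, b in ppi_edges:
        adj[a].add(b)
        adj[b].add(a)
    for protein, terms in go_annotations.items():
        for term in terms:
            adj[protein].add(term)
            adj[term].add(protein)
    return adj


def transition_matrix(adj):
    """Row-normalize the adjacency map into a random-walk transition matrix."""
    nodes = sorted(adj)
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for node, neighbours in adj.items():
        for neighbour in neighbours:
            # Each neighbour is reached with probability 1 / degree(node).
            matrix[index[node]][index[neighbour]] = 1.0 / len(neighbours)
    return nodes, matrix


# Hypothetical toy input: three proteins and two GO slim terms.
ppi_edges = [("P1", "P2"), ("P2", "P3")]
go_annotations = {"P1": ["GO:A"], "P2": ["GO:A"], "P3": ["GO:B"]}

adj = build_augmented_graph(ppi_edges, go_annotations)
nodes, P = transition_matrix(adj)
```

In this toy graph, `P1` and `P2` are connected both directly and through the shared term `GO:A`, so a random walk starting at `P1` has two short routes to `P2`; this is one plausible way in which annotation similarity and topology could reinforce each other in a single measure.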
The effect of extend_thres on the performance of COAN on the DIP database.
| extend_thres | Size | Precision | Recall | F1 | Sensitivity | PPV | Accuracy |
| Threshold = 0.1 | 118 | 0.274 | 0.174 | 0.213 | | 0.192 | 0.368 |
| Threshold = 0.2 | 77 | 0.339 | 0.252 | 0.29 | 0.657 | 0.27 | 0.421 |
| Threshold = 0.3 | 69 | 0.405 | 0.324 | 0.36 | 0.588 | 0.338 | 0.446 |
| Threshold = 0.4 | 50 | 0.462 | 0.407 | 0.433 | 0.515 | 0.41 | 0.46 |
| Threshold = 0.5 | 37 | 0.48 | 0.431 | 0.455 | 0.464 | 0.481 | 0.472 |
| Threshold = 0.6 | 31 | | | | 0.435 | 0.555 | 0.491 |
| Threshold = 0.7 | 28 | 0.457 | 0.412 | 0.433 | 0.403 | 0.598 | 0.491 |
| Threshold = 0.8 | 21 | 0.441 | 0.404 | 0.422 | 0.383 | 0.636 | |
| Threshold = 0.9 | 14 | 0.433 | 0.397 | 0.414 | 0.369 | | 0.493 |
The word ‘size’ refers to the size of the largest predicted complex with different extend_thres. The highest value in each row is in bold.
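The F1 and Accuracy columns can be cross-checked against the other columns. The sketch below assumes that F1 is the harmonic mean of Precision and Recall, and that Accuracy is the geometric mean of Sensitivity and PPV. The second definition is an assumption, not stated in this record, although it agrees with every complete row of the table to within rounding of the three-decimal values. The function names are hypothetical.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def geometric_accuracy(sensitivity, ppv):
    """Assumed definition: geometric mean of sensitivity and PPV."""
    return math.sqrt(sensitivity * ppv)


# Complete rows of the table above:
# (extend_thres, precision, recall, F1, sensitivity, PPV, accuracy)
rows = [
    (0.2, 0.339, 0.252, 0.29, 0.657, 0.27, 0.421),
    (0.3, 0.405, 0.324, 0.36, 0.588, 0.338, 0.446),
    (0.4, 0.462, 0.407, 0.433, 0.515, 0.41, 0.46),
    (0.5, 0.48, 0.431, 0.455, 0.464, 0.481, 0.472),
    (0.7, 0.457, 0.412, 0.433, 0.403, 0.598, 0.491),
]

# Largest gap between the recomputed and the reported values.
max_f1_gap = max(abs(f1_score(p, r) - f1) for _, p, r, f1, _, _, _ in rows)
max_acc_gap = max(
    abs(geometric_accuracy(sn, ppv) - acc) for _, _, _, _, sn, ppv, acc in rows
)
```

Both gaps stay below 0.002, which is consistent with the reported values having been computed from unrounded inputs.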
Performance comparison of protein complex prediction methods using the DIP dataset.
| Methods | #Complexes | Size | Precision | Recall | F1 | Sensitivity | PPV | Accuracy |
| COAN | 383 | 31 | 0.486 | 0.438 | | 0.435 | 0.555 | |
| COACH | 730 | 85 | 0.364 | | 0.41 | 0.544 | 0.38 | 0.455 |
| CMC | 173 | 49 | 0.595 | 0.287 | 0.387 | 0.399 | | 0.475 |
| HUNTER | 92 | 160 | | 0.199 | 0.308 | 0.496 | 0.467 | 0.482 |
| MCODE | 77 | 60 | 0.468 | 0.098 | 0.162 | 0.279 | 0.352 | 0.313 |
| MCL | 372 | 498 | 0.21 | 0.232 | 0.221 | | 0.331 | 0.429 |
The ‘#Complexes’ refers to the number of predicted complexes, and ‘Size’ refers to the size of the largest predicted complex. extend_thres was set at 0.6 for COAN. The highest score is in bold.
Performance comparison of protein complex prediction methods using the Krogan dataset.
| Methods | #Complexes | Size | Precision | Recall | F1 | Sensitivity | PPV | Accuracy |
| COAN | 237 | 20 | 0.709 | 0.331 | | 0.388 | | |
| COACH | 345 | 24 | 0.617 | | 0.441 | 0.432 | 0.544 | 0.485 |
| CMC | 111 | 24 | 0.748 | 0.235 | 0.358 | 0.381 | 0.589 | 0.474 |
| HUNTER | 74 | 67 | | 0.199 | 0.323 | 0.374 | 0.569 | 0.462 |
| MCODE | 72 | 52 | 0.75 | 0.159 | 0.263 | 0.27 | 0.552 | 0.386 |
| MCL | 309 | 486 | 0.291 | 0.245 | 0.266 | | 0.396 | 0.475 |
The ‘#Complexes’ refers to the number of predicted complexes, and ‘Size’ refers to the size of the largest predicted complex. extend_thres was set at 0.6 for COAN. The highest score is in bold.
Figure 5. Two protein complexes predicted by the COAN method on the Krogan dataset.
Examples of predicted complexes using the DIP dataset.
| ID | Predicted complexes | NA | GO biological process annotation | P-value | GO molecular function annotation | P-value | GO cellular component annotation | P-value |
| 1 | YDR469W YKL018W YHR119W YBR258C YAR003W YBR175W YLR015W YPL138C | 1 | GO:0051568 (histone H3-K4 methylation) | 1.21e-20 | GO:0042800 (histone methylase activity) | 7.56e-26 | GO:0035097 (histone methyltransferase complex) | 1.68e-25 |
| 2 | YBR234C YNR035C YKL013C YIL062C YDL029W YLR370C YJR065C | 1 | GO:0030041 (actin filament polymerization) | 3.99e-18 | GO:0003779 (actin binding) | 3.66e-12 | GO:0005885 (Arp2/3 protein complex) | 1.16e-21 |
| 3 | YHR187W YMR312W YGR200C YPL101W YPL086C YLR384C | 1 | GO:0006400 (tRNA modification) | 4.65e-11 | GO:0000049 (tRNA binding) | 4.43e-06 | GO:0033588 (Elongator holoenzyme complex) | 6.92e-20 |
| 4 | YPR072W YGR134W YDL165W YNR052C YCR093W YER068W YAL021C YNL288W YIL038C | 1 | GO:0032968 (positive regulation of transcription elongation from RNA polymerase II promoter) | 1.62e-21 | GO:0004842 (ubiquitin-protein ligase activity) | 4.38e-05 | GO:0030015 (CCR4-NOT core complex) | 1.46e-28 |
| 5 | YER133W YKL059C YLR115W YGR156W YAL043C YKR002W YJR093C YNL317W YDR301W YDR195W YLR277C YPR107C YNL222W | 0.87 | GO:0006378 (mRNA polyadenylation) | 1.29e-26 | GO:0003723 (RNA binding) | 1.58e-07 | GO:0005847 (CFII complex) | 1.20e-37 |
| 6 | YPL178W YGR013W YIL061C YHR086W YML046W YLR275W YFL017W-A YBR119W YKL012W YLR298C YMR125W YDR240C YPR182W YDL087C YLR147C YDR235W YGR074W | 0.81 | GO:0000398 (mRNA splicing, via spliceosome) | 3.03e-30 | GO:0003723 (RNA binding) | 5.84e-15 | GO:0005685 (U1 snRNP) | 3.00e-39 |
| 7 | YNL166C YJR076C YDR507C YCR002C YHR107C YDL225W YLR314C | – | GO:0000921 (septin ring assembly) | 5.30e-15 | GO:0005545 (1-phosphatidylinositol binding) | 9.62e-10 | GO:0000144 (cellular bud neck septin ring) | 1.56e-16 |
| 8 | YHR200W YDL147W YER012W YMR308C YML092C YDL188C YMR314W YMR047C YGL011C YOR362C | – | GO:0010499 (proteasomal ubiquitin-independent protein catabolic process) | 2.77e-10 | GO:0004298 (threonine-type endopeptidase activity) | 3.51e-11 | GO:0034515 (proteasome storage granule) | 7.89e-12 |
| 9 | YMR213W YLR117C YPL151C YBR065C YPR182W YLL036C | – | GO:0000398 (mRNA splicing, via spliceosome) | 4.58e-10 | GO:0000384 (first spliceosomal transesterification activity) | 5.00e-07 | GO:0071006 (U2-type catalytic step 1 spliceosome) | 1.85e-06 |
‘NA’ refers to the neighborhood affinity score between a predicted complex and a reference complex. ‘–’ denotes that the NA score is less than 0.2.
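The neighborhood affinity score can be sketched in a few lines. The code below assumes the definition commonly used in the complex-prediction literature, NA(A, B) = |A ∩ B|² / (|A| · |B|), and assumes that precision and recall count complexes having at least one counterpart with NA ≥ 0.2. Neither definition is spelled out in this record, and the function names and the toy proteins (`X1`, `Y1`, and so on) are hypothetical. The first complex is entry 3 of the table; an NA of 1 under this definition implies that its reference complex has exactly the same members.

```python
def neighborhood_affinity(predicted, reference):
    """NA(A, B) = |A ∩ B|^2 / (|A| * |B|); assumed definition."""
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    overlap = len(predicted & reference)
    return overlap ** 2 / (len(predicted) * len(reference))


def precision_recall(predicted_complexes, reference_complexes, threshold=0.2):
    """Fraction of predicted (resp. reference) complexes with a match."""
    matched_predicted = sum(
        any(neighborhood_affinity(p, r) >= threshold for r in reference_complexes)
        for p in predicted_complexes
    )
    matched_reference = sum(
        any(neighborhood_affinity(p, r) >= threshold for p in predicted_complexes)
        for r in reference_complexes
    )
    precision = matched_predicted / len(predicted_complexes) if predicted_complexes else 0.0
    recall = matched_reference / len(reference_complexes) if reference_complexes else 0.0
    return precision, recall


# Entry 3 of the table (NA = 1, so prediction and reference coincide).
elongator = {"YHR187W", "YMR312W", "YGR200C", "YPL101W", "YPL086C", "YLR384C"}

# Hypothetical toy complexes that share a single protein.
toy_predicted = {"X1", "X2", "X3"}
toy_reference = {"X1", "Y1", "Y2", "Y3", "Y4"}

na_exact = neighborhood_affinity(elongator, elongator)
na_weak = neighborhood_affinity(toy_predicted, toy_reference)
precision, recall = precision_recall(
    [elongator, toy_predicted], [elongator, toy_reference]
)
```

Here the toy pair shares one protein out of three and five, giving an NA of 1/15, below the 0.2 cutoff, so only one of the two predicted complexes counts as matched.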