Le Ou-Yang1, Hong Yan1,2, Xiao-Fei Zhang3.
Abstract
BACKGROUND: The accurate identification of protein complexes is important for understanding cellular organization. To date, computational methods for protein complex detection have mostly focused on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy, so it is hard to achieve reliable predictions by simply applying computational methods to PPI data. Behind protein interactions, there are protein domains that interact with each other. Therefore, based on domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDIs) has the potential to improve protein complex detection. As traditional computational methods are designed to detect protein complexes from a single PPI network, a new algorithm is needed that can effectively utilize the information inherent in multiple heterogeneous networks.
Keywords: Domain-domain interaction; Multi-network clustering; Protein complex; Protein-protein interaction
Year: 2017 PMID: 29219066 PMCID: PMC5773919 DOI: 10.1186/s12859-017-1877-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic overview of the algorithm. The flowchart of our multi-network clustering procedure for detecting protein complexes
Fig. 2 The effect of b. Performance of MNC on protein complex identification for different values of b, measured by the geometric mean of Acc and FRAC with respect to the MIPS benchmark complex set. The x-axis denotes the value of b and the y-axis denotes the geometric mean of Acc and FRAC
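The summary score on the y-axis of Fig. 2 can be sketched as follows. This is a minimal illustration, assuming the usual definition of the geometric mean of two scores, sqrt(Acc × FRAC); the function name `geometric_mean` is hypothetical. The inputs are the CYC2008 values from the first MNC row of the table on initialization methods, used only as example numbers. They are not the MIPS values plotted in the figure.

```python
import math


def geometric_mean(acc: float, frac: float) -> float:
    """Geometric mean of two non-negative scores (hypothetical helper)."""
    if acc < 0 or frac < 0:
        raise ValueError("scores must be non-negative")
    return math.sqrt(acc * frac)


# Example inputs only: Acc and FRAC of MNC with respect to CYC2008.
score = geometric_mean(0.697, 0.726)
print(f"geometric mean of Acc and FRAC: {score:.3f}")
```

Because the geometric mean is pulled toward the smaller of its two inputs, a parameter setting scores well only when Acc and FRAC are both reasonably high.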
Fig. 3 Comparison with existing protein complex identification algorithms. Performance of existing algorithms and our method in terms of (a) Acc and (b) FRAC, with respect to CYC2008 and SGD
Performance of MNC with different initialization methods
| Methods | # complexes | # proteins | Acc (CYC2008) | FRAC (CYC2008) | Acc (SGD) | FRAC (SGD) |
|---|---|---|---|---|---|---|
| MNC | 1048 | 3038 | 0.697 | 0.726 | 0.651 | 0.648 |
| MNC | 597 | 1952 | 0.695 | 0.685 | 0.652 | 0.609 |
Here “# complexes” denotes the number of complexes predicted by each algorithm, and “# proteins” denotes the number of proteins covered by the complexes predicted by each algorithm. MNC corresponds to the results of MNC with random initial conditions
Fig. 4 Performance of existing algorithms and MNC in protein complex detection. Numbers of known protein complexes in reference sets (a) CYC2008 and (b) SGD that are recognized by various algorithms under varying OS threshold ω
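The counts plotted in Fig. 4 depend on how a predicted complex is matched to a known one. The sketch below assumes a commonly used overlap score, OS(A, B) = |A ∩ B|² / (|A| · |B|), and counts a reference complex as recognized when at least one predicted complex reaches OS ≥ ω. This record does not define OS, so the formula, the function names, and the toy protein identifiers are all assumptions for illustration.

```python
from typing import Iterable, List, Set


def overlap_score(a: Iterable[str], b: Iterable[str]) -> float:
    """Assumed overlap score: |A ∩ B|^2 / (|A| * |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    return shared * shared / (len(set_a) * len(set_b))


def recognized_count(predicted: List[Set[str]],
                     reference: List[Set[str]],
                     omega: float) -> int:
    """Number of reference complexes matched by some prediction with OS >= omega."""
    return sum(
        1
        for ref in reference
        if any(overlap_score(ref, pred) >= omega for pred in predicted)
    )


# Toy data with made-up identifiers (not taken from CYC2008 or SGD).
reference = [{"P1", "P2", "P3"}, {"P4", "P5"}]
predicted = [{"P1", "P2", "P3", "P6"}, {"P5", "P7", "P8"}]

for omega in (0.1, 0.2, 0.8):
    print(omega, recognized_count(predicted, reference, omega))
```

Raising ω makes the matching criterion stricter, so the number of recognized complexes can only stay the same or decrease as ω grows.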
The number and percentage of the complexes predicted by MNC and CMC whose P-values fall within different intervals
| Methods | < 1E(-15) | 1E(-15) to 1E(-10) | 1E(-10) to 1E(-5) | 1E(-5) to 1E(-2) | 1E(-2) to 1 |
|---|---|---|---|---|---|
| MNC | 50 (4.8%) | 56 (5.3%) | 199 (19%) | 476 (45.4%) | 267 (25.5%) |
| CMC | 30 (7.3%) | 26 (6.3%) | 79 (19.2%) | 173 (42%) | 104 (25.2%) |
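The percentages in this table follow directly from the counts. The short check below recomputes them, assuming each percentage is the interval count divided by the total number of complexes listed for that method; the variable and function names are illustrative.

```python
# Counts per P-value interval, copied from the table above.
counts = {
    "MNC": [50, 56, 199, 476, 267],
    "CMC": [30, 26, 79, 173, 104],
}


def interval_percentages(values):
    """Return each count as a percentage of the row total, to one decimal place."""
    total = sum(values)
    return [round(100.0 * v / total, 1) for v in values]


for method, values in counts.items():
    print(method, sum(values), interval_percentages(values))
```

The MNC counts sum to 1048, which agrees with the number of complexes reported for MNC in the table on initialization methods.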
Ten predicted protein complexes with smallest P-values
| Index | P-value | Predicted protein complexes | Gene ontology term | Cluster frequency |
|---|---|---|---|---|
| 2 | 1.21e-31 | YCR035C, YDL111C, YDR280W, YGR095C, YHR069C, YHR081W, YNL189W, YNL232W, YOR001W, YOR076C, YGR158C, YGR195W, YOL021C, YOL142W | polyadenylation-dependent snoRNA 3’-end processing | 12 out of 14 genes, 85.7% |
| 5 | 8.98e-31 | YAL043C, YDR195W, YDR228C, YDR301W, YJL033W, YJR093C, YKL018W, YKL059C, YLR277C, YMR061W, YNL317W, YOR179C, YKR002W, YLR115W, YER133W, YGR156W, YPR107C | mRNA polyadenylation | 13 out of 17 genes, 76.5% |
| 7 | 5.85e-32 | YBR146W, YBR251W, YDR036C, YDR041W, YGL129C, YGR084C, YHL004W, YIL093C, YNL137C, YNL306W, YPL118W, YDR347W, YJR113C, YKL155C, YDR337W | organellar small ribosomal subunit | 14 out of 15 genes, 93.3% |
| 10 | 3.70e-43 | YBR217W, YBR272C, YDL007W, YDL097C, YDR427W, YEL037C, YER012W, YER021W, YFR052W, YGL004C, YGL048C, YHL030W, YOR259C, YOR261C, YPR108W, YHR200W, YFR004W, YFR010W, YDL147W, YDR394W, YKL145W | proteasome complex | 20 out of 21 genes, 95.2% |
| 13 | 1.65e-35 | YBR119W, YDL087C, YDR235W, YDR240C, YHR086W, YIL061C, YKL012W, YLR147C, YML046W, YMR125W, YPL178W, YPR182W, YLR275W, YLR298C, YFL017W-A, YGR013W | U1 snRNP | 14 out of 16 genes, 87.5% |
| 18 | 4.7e-29 | YBR254C, YDR108W, YDR246W, YDR407C, YGR166W, YJL044C, YKR068C, YML077W, YMR218C, YOR115C, YDR472W | TRAPP complex | 10 out of 11 genes, 90.9% |
| 27 | 7.34e-36 | YBR055C, YBR152W, YDL098C, YDR473C, YJR022W, YKL173W, YLR147C, YLR275W, YPR082C, YPR178W, YPR182W, YFL017W-A, YGR091W, YOR159C, YOR308C | U4/U6 x U5 tri-snRNP complex | 15 out of 15 genes, 100% |
| 35 | 2.05e-30 | YBL084C, YDL008W, YDR118W, YFR036W, YHR166C, YKL022C, YLR102C, YLR127C, YNL172W, YOR249C, YGL240W | anaphase-promoting complex | 11 out of 11 genes, 100% |
| 46 | 9.34e-32 | YBL093C, YBR193C, YBR253W, YCR081W, YDR443C, YER022W, YGL025C, YGR104C, YNL236W, YNR010W, YOL051W, YOL135C, YHR041C, YHR058C, YDL005C, YDR308C, YOR174W | transcription factor activity, RNA polymerase II transcription factor binding | 16 out of 17 genes, 94.1% |
| 399 | 2.77e-28 | YBR127C, YDL185W, YEL051W, YGR020C, YKL080W, YLR447C, YMR054W, YOR270C, YOR332W, YPR036W, YHR039C-A | proton-transporting ATPase activity, rotational mechanism | 11 out of 11 genes, 100% |
Fig. 5 The GINS complex as detected by different computational methods. The shaded area shows the complex predicted by each method: (a) MNC, (b) MCL and (c) SPICi. Red rectangular nodes represent subunits of the GINS complex in CYC2008, blue circular nodes represent proteins with other functions, and green diamond nodes represent protein domains. Solid lines between nodes represent interactions between proteins (or between protein domains). Dashed lines between nodes represent associations between proteins and protein domains