Christine Brun, Carl Herrmann, Alain Guénoche.
Abstract
BACKGROUND: Reliable and efficient strategies for inferring the function of as yet uncharacterized proteins from interaction networks are of crucial interest in the current context of high-throughput data generation. In this paper, we develop a new algorithm that clusters the vertices of a protein-protein interaction network using a density function, yielding disjoint classes.
Year: 2004 PMID: 15251039 PMCID: PMC487898 DOI: 10.1186/1471-2105-5-95
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of the method. (a) A graph is built from a list of binary protein-protein interactions. (b) The Czekanowski-Dice distance is calculated between all pairs of proteins. (c) A graph Γ is built based on the distance values (see text for details). (d) Classes are constructed after computing a density function De. (e) Classes are functionally annotated according to a threshold majority rule in classes (MRC). (f) Functions are predicted for uncharacterized proteins by a next-neighbor exploration.
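Steps (a) and (b) of the flowchart can be illustrated with a minimal Python sketch. It assumes the commonly used form of the Czekanowski-Dice distance, in which each protein's interactor set contains the protein itself plus its direct partners, and the distance is the size of the symmetric difference divided by the sum of the sizes of the union and the intersection. The function names and the toy protein names (`P1` to `P4`) are hypothetical and do not come from the paper.

```python
from itertools import combinations


def build_graph(interactions):
    """Build an undirected adjacency map from binary interactions (step a)."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def czekanowski_dice(graph, i, j):
    """Distance between proteins i and j computed from their interactor sets.

    Assumption: each set holds the protein itself plus its direct partners.
    """
    int_i = graph[i] | {i}
    int_j = graph[j] | {j}
    sym_diff = len(int_i ^ int_j)
    return sym_diff / (len(int_i | int_j) + len(int_i & int_j))


def all_distances(graph):
    """Distances for every unordered pair of proteins (step b)."""
    return {(i, j): czekanowski_dice(graph, i, j)
            for i, j in combinations(sorted(graph), 2)}


# Toy interaction list with hypothetical protein names.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
graph = build_graph(interactions)
distances = all_distances(graph)
```

Under this definition, two proteins with identical interactor sets are at distance 0 (here `P1` and `P2`), and the distance grows towards 1 as the sets share fewer members.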
Figure 2. Comparison of the number of predictions made with our procedure (shaded bars) and the MRC strategy (full bars) as a function of the threshold parameter d. The total number of proteins is 876.
List of predicted functions
| Protein | Predicted functions (% of annotated class members sharing the function, number of annotated neighbors) |
| YPL077C | Vesicular transport (90%, 1) |
| IES5 | Vesicular transport (100%, 1) |
| YDL089W | Vesicular transport (100%, 1) |
| YLR324W | Vesicular transport (100%, 1) |
| YKR022C | Vesicular transport (100%, 1) |
| QUT1 | Vesicular transport (83%, 3) |
| TVP15 | Vesicular transport (100%, 3); Membrane fusion (50%, 1) |
| YFR008W | Mating response (67%, 2) |
| YLR238W | Mating response (67%, 2) |
| YNL127W | Mating response (67%, 2) |
| PST2 | Mating response (100%, 1); Signal transduction (100%, 1) |
| SLX4 | DNA repair (75%, 1); Recombination (75%, 1) |
| SHU2 | DNA repair (100%, 1); Recombination (100%, 1) |
| YCL063W | DNA repair (100%, 1); Recombination (67%, 1); DNA synthesis (50%, 1) |
| NKP2 | Mitosis (60%, 1); Chromatin/chromosome structure (60%, 1) |
| SOG2 | Mitosis (60%, 1) |
| YGL079W | Mitosis (71%, 1) |
| APP2 | Cell structure (50%, 1) |
| YBR108W | Cell structure (50%, 2) |
| YGR058W | Cell structure (50%, 1) |
| YLR456W | RNA processing/modification (57%, 1) |
| YNL092W | RNA processing/modification (57%, 1) |
| YDR140W | Protein modification (100%, 1); Pol II transcription (100%, 1); Chromatin/chromosome structure (100%, 1) |
| YEL023C | Protein modification (50%, 1); DNA repair (50%, 1); DNA synthesis (75%, 1) |
| NIS1 | Cell cycle control (50%, 2) |
| YLR125W | Cell cycle control (62%, 2) |
| BIT61 | Cell polarity (100%, 1) |
| YKL082C | Cell polarity (60%, 2) |
| YGL230C | Pol II transcription (88%, 1) |
| TAH18 | Pol II transcription (64%, 2) |
| YJL084C | Carbohydrate metabolism (67%, 2) |
| TSR2 | Protein synthesis (100%, 2) |
| AKL1 | RNA turnover (50%, 1) |
| YER071C | Cell structure (50%, 1); Protein folding (50%, 1) |
| YKR007W | Small molecule transport (50%, 1); Cell stress (100%, 1); Other metabolism (50%, 1) |
| RMD1 | Meiosis (75%, 1) |
| FIN1 | Signal transduction (75%, 2); Differentiation (50%, 2) |
Predictions made by our method for 37 previously uncharacterized proteins (no annotation in SGD, version of February 3rd, 2004). The numbers in parentheses indicate (1) the percentage of annotated proteins in the class sharing this cellular function, and (2) the number of neighbors of the protein which are annotated for this function.
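The two numbers reported for each prediction can be computed as in the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: this excerpt does not say how the class share and the neighbor count are combined, so requiring at least one annotated neighbor (`support >= 1`) is an assumption, as are the function names, the default threshold, and the toy data.

```python
from collections import Counter


def class_functions(members, annotations, threshold):
    """Functions shared by at least `threshold` of the annotated class members."""
    annotated = [p for p in members if annotations.get(p)]
    if not annotated:
        return {}
    counts = Counter(f for p in annotated for f in set(annotations[p]))
    return {f: c / len(annotated) for f, c in counts.items()
            if c / len(annotated) >= threshold}


def predict(protein, members, graph, annotations, threshold=0.5):
    """Return {function: (class share, number of annotated neighbors)}."""
    shares = class_functions(members, annotations, threshold)
    result = {}
    for function, share in shares.items():
        support = sum(1 for n in graph.get(protein, ())
                      if function in annotations.get(n, ()))
        if support >= 1:  # assumption: at least one neighbor must agree
            result[function] = (share, support)
    return result


# Toy class with one uncharacterized protein "U" (hypothetical data).
members = {"U", "A", "B", "C"}
annotations = {
    "A": ["Vesicular transport"],
    "B": ["Vesicular transport", "Membrane fusion"],
    "C": ["Mitosis"],
}
graph = {"U": {"A", "B"}}
prediction = predict("U", members, graph, annotations, threshold=0.5)
```

In this toy class, two of the three annotated members share "Vesicular transport" and both neighbors of `U` carry it, so an entry in the style of the table would read "Vesicular transport (67%, 2)".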
Comparison with the GOM approach
| Protein | Hybrid method | GOM [8] | Current SGD annotations (2/06/2004) |
| YLR324W | vesicular transport (≠) | nuclear organization (≠) | peroxisome organization and biogenesis |
| YKR022C | vesicular transport (≠) | nuclear organization (≠) | nuclear mRNA splicing, via spliceosome |
| YFR008W | mating response (=) | pheromone response, mating type determination, sex-specific protein (=) | cell cycle arrest in response to pheromone |
| YLR238W | mating response (=) | nuclear organization (≠) | cell cycle arrest in response to pheromone |
| YNL127W | mating response (=) | budding, cell polarity and filament organization (=) | cell cycle arrest in response to pheromone |
| SLX4 | DNA repair, recombination (≃) | assimilation of ammonia (≠) | DNA replication, DNA dependent DNA replication |
| YCL063W | DNA repair, recombination, DNA synthesis (≠) | biogenesis of cell wall (≠) | vacuole inheritance |
| APP2 | cell structure (=) | (no prediction) | actin filament organization |
| NIS1 | cell cycle control (=) | nuclear organization (≠) | regulation of mitosis |
| YKL082C | cell polarity (=) | (no prediction) | establishment of cell polarity (sensu Saccharomyces) |
| TSR2 | protein synthesis (≃) | organization of cytoplasm (≠) | processing of 20S pre-rRNA |
| AKL1 | RNA turnover (≠) | (no prediction) | actin cytoskeleton organization and biogenesis, regulation of endocytosis |
Comparison of the predictions made by our method and by the GOM [8] for the 12 previously uncharacterized proteins (SGD, 2/02/2004) that have since received an annotation (SGD, 2/06/2004). The hybrid method uses YPD keywords, whereas the GOM uses MIPS keywords. The SGD annotations are Gene Ontology terms. The symbol = means that a prediction is equal or strongly similar to the actual annotation, ≃ means that it is related to the annotation, and ≠ means that it is different.
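The verdicts in the comparison table can be tallied per method with a short Python sketch. The symbols are transcribed from the table as ASCII strings (`"="`, `"~"` for ≃, `"!="` for ≠, and `None` for "no prediction"); this encoding is a convenience chosen here, not part of the paper.

```python
from collections import Counter

# (protein, hybrid-method verdict, GOM verdict), transcribed from the table.
# "=" equal or strongly similar, "~" related, "!=" different, None no prediction.
rows = [
    ("YLR324W", "!=", "!="),
    ("YKR022C", "!=", "!="),
    ("YFR008W", "=", "="),
    ("YLR238W", "=", "!="),
    ("YNL127W", "=", "="),
    ("SLX4", "~", "!="),
    ("YCL063W", "!=", "!="),
    ("APP2", "=", None),
    ("NIS1", "=", "!="),
    ("YKL082C", "=", None),
    ("TSR2", "~", "!="),
    ("AKL1", "!=", None),
]

hybrid = Counter(h for _, h, _ in rows)
gom = Counter(g for _, _, g in rows)

print("hybrid:", dict(hybrid))
print("GOM:", dict(gom))
```

Over these 12 proteins, the hybrid method has 6 equal, 2 related, and 4 different predictions, while the GOM has 2 equal, 7 different, and 3 proteins without a prediction.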
Figure 3. Comparison of our procedure (full squares) and the MRC strategy (full diamonds) in terms of the rate of true functions recovered (TFR, plot (a)) and the rate of correct predictions (RCP, plot (b)). The straight lines show the linear fits for our procedure (full line) and the MRC procedure (dashed line). The horizontal axis indicates the number of proteins for which a prediction has been made. All rates on the vertical axis are computed with respect to the total number of annotated proteins.
Figure 4. Comparison between the hybrid method (blue squares), the MRC method (red diamonds) and the MRN method [4] (green triangles). The rate of true functions recovered is plotted against the rate of correct predictions. For the hybrid method and the MRC method, the points correspond to thresholds d from 30% to 70% in steps of 5%, whereas for the MRN method the points correspond to predictions made with the n most frequent functions represented among direct interaction partners, with n = 1...5.
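The MRN baseline, as described in the caption (the n most frequent functions among direct interaction partners), can be sketched in Python as follows. The function name, the alphabetical tie-breaking rule, and the toy data are assumptions made for illustration; the caption does not specify how ties are resolved.

```python
from collections import Counter


def mrn_predict(protein, graph, annotations, n=1):
    """Return the n most frequent functions among direct interaction partners."""
    counts = Counter(
        function
        for partner in graph.get(protein, ())
        for function in set(annotations.get(partner, ()))
    )
    # Sort by decreasing count, then alphabetically so ties are deterministic
    # (the tie-breaking rule is an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:n]]


# Toy neighborhood of an uncharacterized protein "U" (hypothetical data).
graph = {"U": {"A", "B", "C"}}
annotations = {
    "A": ["DNA repair", "Recombination"],
    "B": ["DNA repair"],
    "C": ["Mitosis"],
}
top1 = mrn_predict("U", graph, annotations, n=1)
top2 = mrn_predict("U", graph, annotations, n=2)
```

Raising n from 1 to 5 returns more candidate functions per protein, which is the parameter varied along the MRN points in the figure.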