Christine Brun, François Chevenet, David Martin, Jérôme Wojcik, Alain Guénoche, Bernard Jacq.
Abstract
We describe here PRODISTIN, a new computational method for the functional clustering of proteins on the basis of protein-protein interaction data. This method, assessed biologically and statistically, enabled us to classify 11% of the Saccharomyces cerevisiae proteome into several groups, the majority of which contained proteins involved in the same biological process(es), and to predict a cellular function for many otherwise uncharacterized proteins.
Year: 2003 PMID: 14709178 PMCID: PMC395738 DOI: 10.1186/gb-2003-5-1-r6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flowchart of PRODISTIN. (a) A graph is constructed from a list of binary protein-protein interactions. (b) A functional distance based on the identity of the shared interactors is calculated among all proteins. (c) The distance matrix obtained is used to build a classification tree, on which functional classes are subsequently determined and analyzed by evaluating (d) their statistical robustness and (e) their biological relevance.
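Steps (a) and (b) of the flowchart can be illustrated with a minimal Python sketch. The caption does not state the distance formula, so the code assumes a Czekanowski-Dice distance computed on the interactor sets, with each protein counted among its own interactors. The function names (`build_graph`, `dice_distance`, `distance_matrix`) are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def build_graph(interactions):
    """Step (a): map each protein to the set of its interactors."""
    graph = {}
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def dice_distance(graph, p, q):
    """Step (b), assumed form: Czekanowski-Dice distance on interactor sets.

    Assumption: each protein belongs to its own interactor set, so two
    proteins that interact only with each other are at distance 0.
    """
    int_p = graph[p] | {p}
    int_q = graph[q] | {q}
    not_shared = len(int_p ^ int_q)  # interactors specific to one protein
    return not_shared / (len(int_p | int_q) + len(int_p & int_q))


def distance_matrix(graph):
    """Pairwise distances for all proteins, stored symmetrically in a dict."""
    proteins = sorted(graph)
    dist = {(p, p): 0.0 for p in proteins}
    for p, q in combinations(proteins, 2):
        dist[(p, q)] = dist[(q, p)] = dice_distance(graph, p, q)
    return dist
```

Under these assumptions, two proteins that share all of their interactors are close, and two proteins with no interactor in common are at the maximum distance of 1. The resulting matrix could then be passed to any distance-based tree-building routine for step (c).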
Figure 2. A functional classification tree for 602 yeast proteins computed with the PRODISTIN method. (a) The foundation for protein clustering. PRODISTIN classes are clustered according to the 'cellular role' of proteins only (pink), according to the 'functional category' of proteins only (blue), and according to both criteria (yellow). (b) Functional classification. PRODISTIN classes on the circular classification tree have been colored according to their corresponding 'cellular role'. Protein names have been omitted for clarity (see Additional data file 1 for details of the classes). Classes corresponding to two different 'cellular roles' are colored according to the first annotation used in Additional data file 1.
Figure 3. Examples of PRODISTIN classes. (a) Class 21 'lipid and fatty acid metabolism/protein translocation'. (b) Class 20 'DNA synthesis'. (c) Class 50 'RNA processing/modification'. Asterisks indicate founder proteins of the class (that is, proteins annotated in YPD with the 'cellular role' given to the class). Computed class robustness indexes (CRIs) are shown in front of nodes.
Cross-talk between cellular processes after PRODISTIN classification
| Cellular processes | PRODISTIN classes |
| PRODISTIN classes composed of doubly annotated proteins | |
| Cell stress | 10 |
| Cell structure | 14 |
| Lipid fatty acid metabolism | 23 |
| PolII transcription | 34 |
| RNA processing and modification | 50 |
| PRODISTIN classes composed of at least three proteins annotated for a cellular role, three proteins annotated for another one, with some doubly annotated | |
| Cell polarity | 7 |
| Cell polarity | 9 |
| Cell structure | 13 |
| Chromosome and chromatin structure | 17 |
| Mating response | 24 |
| Protein degradation | 45 |
| Nested PRODISTIN classes | |
| Aging ⊂ Signal transduction | 0 ⊂ 54 |
| Cell cycle control ⊂ Amino acid metabolism | 3 ⊂ 1 |
| Cytokinesis ⊂ Cell polarity | 20 ⊂ 8, 21 ⊂ 8 |
| Mating response ⊂ Cell polarity | 25 ⊂ 8, 26 ⊂ 8 |
| Cell polarity/Mating response ⊂ Signal transduction | 9 ⊂ 54 |
| Cell stress ⊂ Protein degradation/Vesicular transport | 11 ⊂ 45 |
| Cell stress ⊂ Signal transduction | 12 ⊂ 54 |
| Cell structure/Protein complex assembly ⊂ Mitosis | 13 ⊂ 28 |
| Chromatin/Chromosome structure ⊂ PolII transcription | 16 ⊂ 35 |
| Mating response/Differentiation ⊂ Signal transduction | 24 ⊂ 54 |
| PolIII transcription ⊂ PolII transcription | 42 ⊂ 39 |
| RNA processing and modification ⊂ Nucleus-cytoplasm transport | 51 ⊂ 31 |
| RNA splicing ⊂ RNA processing/modification | 53 ⊂ 52 |
| Vesicular transport ⊂ Cell polarity/cell structure | 55 ⊂ 7 |
| Vesicular transport ⊂ Cell polarity | 59 ⊂ 8 |
| Unknown ⊂ Cell structure/protein folding | 60 ⊂ 14 |
| Unknown ⊂ Vesicular transport | 62 ⊂ 56 |
Functional predictions and comparisons with predictions obtained by other means
| Protein name | Class | Predicted function (this study) | Prediction after [ | Prediction after [ | Prediction after [ | GO annotations, September 2003 [ |
| FLM1 | 1, 3 | Amino acid metabolism, cell cycle control (0) | ≈ (0) | ≠ (0) | Mitochondrion organization and biogenesis | |
| VTS1 | 4 | Cell cycle control (0) | ≈ (0) | Protein-vacuolar targeting | ||
| YPR171W | 7 | Cell polarity (1) | ≈ (1) | Cell polarity and structure, actin cytoskeleton organization and biogenesis | ||
| YBR108W | 7 | Cell polarity | ≈ | Unknown | ||
| YGR268C | 7, 55 | Cell polarity, cell structure, vesicular transport | ≈ | Unknown | ||
| DSE1 | 8, 25 | Cell polarity, mating response (1) | Cell wall organization and biogenesis | |||
| YKL082C | 8 | Cell polarity | ≈ | Unknown | ||
| YMR322C | 10 | Cell stress, other metabolism | ≈ | ≈ | Unknown | |
| VPS64 | 14, 60 | Cell structure, protein folding (1) | ≠ (1) | Protein-vacuolar targeting, cell cycle arrest in response to pheromone | ||
| YFR008W | 14, 60 | Cell structure, protein folding (0) | ≠ (1) | Cell cycle arrest in response to pheromone | ||
| YNL127W | 14, 60 | Cell structure, protein folding (0) | ≈ (1) | Cell cycle arrest in response to pheromone | ||
| YJL019W | 22 | DNA synthesis (1) | ≈ (1) | ≈ (1) | Spindle pole duplication | |
| PST2 | 24, 54 | Mating response, differentiation, signal transduction | ≠ | ≈ | Unknown | |
| YLL049W | 29 | Mitosis | ≠ | Unknown | ||
| YNR069C | 29 | Mitosis | Unknown | |||
| NIS1 | 30 | Nucleus-cytoplasm transport (0) | ≠ (0) | Regulation of mitosis | ||
| YKL061W | 30 | Nucleus-cytoplasm transport | ≠ | Unknown | ||
| YDR489W | 30 | Nucleus-cytoplasm transport (0) | ≠ (0) | DNA-dependent DNA replication | ||
| YHL018W | 33 | PolI transcription | Unknown | |||
| YDR179C | 35 | PolII transcription (1) | ≠ (0) | Protein synthesis turnover, protein deneddylation | ||
| YMR025W | 35 | PolII transcription (1) | ≠ (0) | Protein synthesis turnover, protein deneddylation | ||
| YJL058C | 36 | PolII transcription | ≠ | Unknown | ||
| SOH1 | 37 | PolII transcription (1) | Transcription from polII promoter, DNA repair | |||
| YJR083C | 37 | PolII transcription | ≈ | Unknown | ||
| YGL230C | 38 | PolII transcription | ≠ | Unknown | ||
| VAC14 | 43 | Protein degradation (0) | ≠ (0) | Intermediate and energy metabolism, transcription, DNA maintenance, chromatin structure, phospholipid metabolism, vacuole inheritance | ||
| AKL1 | 43 | Protein degradation | ≠ | Unknown | ||
| YHR115C | 43 | Protein degradation | ≠ | Unknown | ||
| YPL105C | 48 | Protein synthesis | ≠ | Unknown | ||
| YLR424W | 49 | RNA processing and modification | ≈ | Unknown | ||
| YKR022C | 49 | RNA processing and modification | ≈ | Unknown | ||
| AIR2 | 52 | RNA processing and modification (1) | RNA metabolism, mRNA nucleus export | |||
| DHH1 | 52 | RNA processing and modification (1) | Deadenylation-dependent decapping, NOT mRNA catabolism, nonsense mediated | |||
| YEL015W | 52 | RNA processing and modification (1) | ≈ (1) | = (1) | ≠ (0) | RNA metabolism |
| YOR285W | 54 | Signal transduction | ≠ | Unknown | ||
| YGL161C | 56 | Vesicular transport | ≈ | ≈ | Unknown | |
| YDR100W | 56 | Vesicular transport | ≈ | ≈ | Unknown | |
| YDR425W | 56 | Vesicular transport (1) | Protein, transport | |||
| YDR084C | 56 | Vesicular transport | ≈ | Unknown | ||
| YGL198W | 56 | Vesicular transport | ≈ | Unknown | ||
| YPL246C | 56 | Vesicular transport | ≈ | Unknown | ||
| YLR285W | 57 | Vesicular transport (0) | ≠ (0) | Chromatin silencing at ribosomal DNA, nicotinamide metabolism |
The symbols =, ≈ and ≠ indicate that the prediction from another bioinformatic method is the same as, almost the same as, or different from the PRODISTIN prediction, respectively. The number in parentheses indicates whether the prediction is in accordance with or related to (1), or different from (0), the functions demonstrated experimentally.
Success rates for PRODISTIN vs majority rule
| | Majority rule (MR) | PRODISTIN |
| Success rate | 0.43 | 0.67 |
| Predictions | | |
| Totally in accordance | 0.23 | 0.35 |
| Partially in accordance | 0.69 | 0.76 |
| In disagreement | 0.31 | 0.24 |
| Number of proteins on which a prediction is possible | 520 | 346 |
Figure 4. Robustness of PRODISTIN towards false interactions. The prediction rate (number of correct predictions divided by number of predictions) was measured for PRODISTIN (yellow curve) and for the majority rule algorithm (green curve) on networks in which a given percentage of interactions (from 10 to 50%) had been randomly 'rewired' (see text). The number of proteins for which a prediction is possible is also reported as a histogram (dark red, PRODISTIN; blue, majority rule). The values are averages over 50 experiments for each percentage of false interactions introduced into the dataset.
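The robustness experiment summarized in this caption can be sketched in Python for the majority rule baseline. The caption does not detail the rewiring procedure, so the sketch assumes that a rewired interaction is replaced by a random pair of distinct proteins from the same network, and that the majority rule assigns to a protein the most frequent function among its annotated interactors. The names `rewire`, `majority_rule` and `prediction_rate` are hypothetical.

```python
import random
from collections import Counter


def rewire(interactions, fraction, rng):
    """Replace a fraction of the interactions by random protein pairs.

    Assumption: 'rewiring' means drawing two distinct proteins at random;
    duplicate interactions are not filtered out in this sketch.
    """
    edges = [tuple(e) for e in interactions]
    proteins = sorted({p for e in edges for p in e})
    n_rewire = round(fraction * len(edges))
    for idx in rng.sample(range(len(edges)), n_rewire):
        edges[idx] = tuple(rng.sample(proteins, 2))
    return edges


def majority_rule(edges, annotation):
    """Predict for each protein the most frequent function of its interactors."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    predictions = {}
    for protein, partners in neighbours.items():
        counts = Counter(annotation[p] for p in partners if p in annotation)
        if counts:  # no prediction is possible without annotated interactors
            predictions[protein] = counts.most_common(1)[0][0]
    return predictions


def prediction_rate(predictions, annotation):
    """Number of correct predictions divided by number of predictions."""
    scored = [p for p in predictions if p in annotation]
    if not scored:
        return 0.0
    correct = sum(predictions[p] == annotation[p] for p in scored)
    return correct / len(scored)
```

To mimic the averaging described in the caption, one could call `rewire` 50 times per rewiring percentage, each time with a differently seeded `random.Random` instance, and average the resulting `prediction_rate` values together with `len(predictions)`, the number of proteins for which a prediction is possible.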
Figure 5. Evaluation of PRODISTIN robustness by analysis of the H. pylori interactome. Average class robustness index (CRI) value for the five H. pylori trees obtained with interactions of decreasing PBS (blue histograms) and for the yeast tree (orange histogram).