Vera Pancaldi, Omer S Saraç, Charalampos Rallis, Janel R McLean, Martin Převorovský, Kathleen Gould, Andreas Beyer, Jürg Bähler.
Abstract
A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein-protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting a selected high-confidence test set with 70-80% accuracy; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the TOR pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, which was lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt).
Keywords: Cbf11; Mak1/2/3; TOR; random forest; support vector machine
Year: 2012 PMID: 22540037 PMCID: PMC3337474 DOI: 10.1534/g3.111.001560
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Description of features used in the prediction of the interactions
| Feature Class | Features and Description of Data |
|---|---|
| Gene Ontology | GO.0006259_DNA_METABOLISM, GO.0006350_TRANSCRIPTION, GO.0006412_PROTEIN_BIOSYNTHESIS, GO.0006810_TRANSPORT, GO.0007005_MITOCHONDRION_ORGANIZATION_AND_BIOGENESIS, GO.0007049_CELL_CYCLE, GO.0007165_SIGNAL_TRANSDUCTION, GO.0008150_BIOLOGICAL_PROCESS, GO.0016070_RNA_METABOLISM, GO.0044238_PRIMARY_METABOLISM |
| Chromosomal position | Strand, chromosome, start and end positions |
| Distance from centromeres/telomeres | Absolute and relative distance |
| Gene physical properties | Length of the ORF, number of introns, and length and GC content of the first intron |
| Protein physical properties | Isoelectric point and mass (kDa) of the protein, total and relative abundance of each amino acid in the protein, sulfur and nitrogen content, Codon Adaptation Index, protein length, codon bias, frequency of optimal codons (Fop), hydropathicity index (GRAVY score), and aromaticity (frequency of aromatic amino acids, such as phenylalanine, tyrosine, and tryptophan) |
| Protein localization | Protein localization in the cell and index of co-localization |
| Experimental gene properties | mRNA half-life, ribosome occupancy and density, mRNA levels, and Pol II occupancy |
| Genetic interactions | Known genetic interactions from BioGRID |
| Pair physical features | Same strand, same chromosome, and distance on chromosome |
| Expression correlation | Pearson correlation of mRNA levels over about 100 different experimental conditions |
More details can be found in File S1.
The Gene Ontology features are the terms of a custom-built GO superslim classification.
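The pair-level features in the table (same strand, same chromosome, distance on chromosome, and expression correlation) can be derived from per-gene records. The following is a minimal Python sketch, not the authors' code: the `Gene` record and its field names are hypothetical, and defining distance as the gap between the nearer ends of two genes on the same chromosome is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical per-gene record; field names are illustrative only.
    chromosome: str
    strand: str              # "+" or "-"
    start: int
    end: int
    expression: list[float]  # mRNA levels, one value per experimental condition


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant profile has no defined correlation
    return cov / (sx * sy)


def pair_features(g1: Gene, g2: Gene) -> dict:
    """Pair features for one candidate interaction."""
    same_chrom = g1.chromosome == g2.chromosome
    # Assumed definition: gap between the nearer ends (0 if the genes overlap);
    # undefined (None) for genes on different chromosomes.
    if same_chrom:
        distance = max(0, max(g1.start, g2.start) - min(g1.end, g2.end))
    else:
        distance = None
    return {
        "same_strand": g1.strand == g2.strand,
        "same_chromosome": same_chrom,
        "distance": distance,
        "expression_correlation": pearson(g1.expression, g2.expression),
    }
```

Returning `None` for inter-chromosomal distance keeps the missing value explicit; a real classifier input would need an encoding choice for it.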
Figure 1. Model cross-validation and model testing on an independent test set. (A, B) ROC curves of 10 repeats of 2-fold cross-validation performed with SVM and RF, respectively. ROC curves show how the true- and false-positive rates change together as the probability threshold varies. To make a few predictions with the smallest possible chance of error, a high threshold should be chosen. (C, D) Precision-recall curves corresponding to A and B. Precision-recall curves show the fraction of predicted interactions that are correct (precision) against the fraction of true interactions that are recovered (recall). (E, F) Model testing using an independent fission yeast data set and an equally large, degree-balanced negative test set composed of the same proteins. (G, H) ROC and precision-recall curves for a second test set composed of the 204 high-confidence interactions and ∼32,000 random negative pairs, which are assumed not to interact. In this test, the ratio of positives to total pairs is similar to what we expect when predicting the whole interactome.
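The ROC and precision-recall curves in Figure 1 come from sweeping a threshold over the classifier's interaction probabilities. A minimal sketch of that sweep, assuming scores in [0, 1] and labels of 1 (interacting) or 0 (negative pair); the function names are illustrative, the trapezoidal AUC is an added convenience, and the quadratic loop favors readability over speed.

```python
def curve_points(scores, labels):
    """Sweep the threshold from high to low.

    Returns (threshold, fpr, tpr, precision, recall) tuples, where a pair is
    predicted as interacting when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / pos               # true-positive rate, identical to recall
        fpr = fp / neg               # false-positive rate
        precision = tp / (tp + fp)   # tp + fp >= 1: some score equals t
        points.append((t, fpr, tpr, precision, tpr))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    xs = [0.0] + [p[1] for p in points]
    ys = [0.0] + [p[2] for p in points]
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
               for i in range(1, len(xs)))
```

A high threshold corresponds to the left end of the ROC curve: few predictions, few false positives.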
Figure 2. Importance of features in the RF classifier. Only the most important features are shown. Expression correlation and GO functional categories are the most important features, followed by protein length, mRNA levels, and protein co-localization (File S3; see File S1 for an explanation of feature names). Values are the average importance of each feature over 10 repeats of the 2-fold cross-validation test.
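The averaging behind Figure 2 can be sketched as repeated 2-fold cross-validation: shuffle the training pairs, split them in half, fit once on each half, and average each feature's importance over all fits. This is a minimal sketch; `fit_importance` is a hypothetical callback standing in for training a random forest and reading out its importance scores, and whether the original importances were collected exactly this way is not stated here.

```python
import random
from statistics import mean


def repeated_two_fold_importance(n_samples, fit_importance, repeats=10, seed=0):
    """Average feature importances over `repeats` rounds of 2-fold CV.

    fit_importance(train_idx) is a hypothetical callback: it trains a model on
    the given sample indices and returns {feature_name: importance}.
    """
    rng = random.Random(seed)
    collected = {}  # feature name -> importance values from every fit
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        # Each half serves once as the training set: 2 fits per repeat.
        for train in (idx[:half], idx[half:]):
            for feature, value in fit_importance(train).items():
                collected.setdefault(feature, []).append(value)
    averaged = {f: mean(v) for f, v in collected.items()}
    # Rank from most to least important, as in a feature-importance bar chart.
    return dict(sorted(averaged.items(), key=lambda kv: kv[1], reverse=True))
```

Showing "only the most important features" then amounts to keeping the first few keys of the returned dictionary.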
Figure 3. Estimated performance of the predictions for the interaction partners of Cbf11. (A) ROC curves for RF (upper curve) and SVM (lower curve), obtained by comparing the predicted Cbf11 interactions with the experimentally verified targets. (B) Corresponding precision-recall curves. (C) Distributions of the interactors predicted by RF (red line) and SVM (blue line) and of the corresponding 100 sets of genes picked at random from the fission yeast proteome (RF, pink line; SVM, light-blue line). (D) Overlap between the predicted interactors and the ∼300 experimentally verified targets (solid lines: RF, red; SVM, blue) and overlaps for each of the 100 random sets (RF, pink squares; SVM, light-blue triangles). Lower panel: Venn diagrams showing overlaps between the experimentally identified targets and the predictions of SVM (E) and RF (F), and the overlap of RF and SVM (G).
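Panels C and D of Figure 3 compare the predicted Cbf11 interactors with 100 random gene sets. A minimal sketch of such a baseline, assuming each random set is drawn without replacement from the proteome with the same size as the prediction; the empirical p-value with a +1 correction is an added illustration, not a statistic reported in the caption.

```python
import random


def overlap_vs_random(predicted, targets, proteome, n_random=100, seed=1):
    """Overlap of a predicted set with experimental targets, plus the overlaps
    of `n_random` equally sized random gene sets as a baseline."""
    predicted, targets = set(predicted), set(targets)
    observed = len(predicted & targets)
    rng = random.Random(seed)
    pool = sorted(proteome)  # sorted so a given seed gives reproducible draws
    random_overlaps = [
        len(set(rng.sample(pool, len(predicted))) & targets)
        for _ in range(n_random)
    ]
    # Fraction of random sets doing at least as well (with +1 correction).
    p = (1 + sum(o >= observed for o in random_overlaps)) / (1 + n_random)
    return observed, random_overlaps, p
```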
Figure 4. Known interactions among predicted Cbf11 interactors. The proteins shown are the subset of predicted Cbf11 interactors that are annotated as part of a complex in fission yeast. Many complexes are predicted to interact with Cbf11; in some cases, almost all of a complex's subunits are predicted interactors (File S4).
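The observation that almost all subunits of some complexes are predicted interactors can be expressed as a per-complex coverage score. A minimal sketch, assuming complexes are supplied as a mapping from a (hypothetical) complex name to its subunits.

```python
def complex_coverage(predicted, complexes):
    """Fraction of each complex's subunits found among the predicted
    interactors, sorted from highest to lowest coverage.

    complexes: {complex_name: iterable of subunit IDs}; empty complexes are skipped.
    """
    predicted = set(predicted)
    coverage = {}
    for name, subunits in complexes.items():
        subunits = set(subunits)
        if subunits:
            coverage[name] = len(subunits & predicted) / len(subunits)
    return dict(sorted(coverage.items(), key=lambda kv: kv[1], reverse=True))
```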
Figure 5. Predicted network of interactions supported by both RF and SVM. For clarity, the network shown is limited to 3438 proteins and 37,325 interactions. The Cytoscape organic layout was used for visualization (Smoot). The emerging clusters reflect highly connected portions of the network; they coincide with the GO superslim functional categories of the proteins and are color-coded as indicated (Ashburner).
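The network in Figure 5 keeps interactions predicted by both classifiers. A minimal sketch, assuming each classifier's output is a dictionary from protein pairs to scores and that a pair is kept when both scores exceed a common threshold; the threshold used for Figure 5 is not given in the caption, so it is a parameter here.

```python
def consensus_network(rf_scores, svm_scores, threshold):
    """Pairs predicted by both RF and SVM.

    *_scores: {(protein_a, protein_b): score}; pair order is ignored.
    Returns (proteins, edges), with each edge stored as a sorted tuple.
    """
    def normalize(scores):
        # If both (a, b) and (b, a) are present, the later entry wins.
        return {tuple(sorted(pair)): s for pair, s in scores.items()}

    rf, svm = normalize(rf_scores), normalize(svm_scores)
    edges = {pair for pair in rf.keys() & svm.keys()
             if rf[pair] > threshold and svm[pair] > threshold}
    proteins = {p for pair in edges for p in pair}
    return proteins, edges
```

The node and edge counts quoted in the caption correspond to `len(proteins)` and `len(edges)` for whatever filtering was actually applied.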
Figure 6. The stress response sub-network. (A) Known interactions of stress-related genes (from BioGRID). (B) Network in A expanded using our predictions. (C) Detail of the network for the Tor1 and Tor2 kinases.
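Panel B of Figure 6 expands the known stress network with predicted interactions, and panel C zooms in on the Tor kinases. A minimal sketch, assuming the expansion adds every predicted edge that touches at least one seed (stress-related) gene; the caption does not state the exact rule, and the helper names are hypothetical.

```python
def _undirected(edge):
    # Store endpoints in sorted order so that (a, b) and (b, a) are the same edge.
    a, b = edge
    return (a, b) if a <= b else (b, a)


def expand_network(known_edges, predicted_edges, seeds):
    """Add predicted edges with at least one endpoint in `seeds`.

    Returns (expanded_network, newly_added_edges).
    """
    seeds = set(seeds)
    known = {_undirected(e) for e in known_edges}
    added = {_undirected(e) for e in predicted_edges if seeds & set(e)} - known
    return known | added, added


def partners(edges, protein):
    """Direct interaction partners of one protein (e.g. for a Tor1 detail view)."""
    return {b if a == protein else a for a, b in edges if protein in (a, b)}
```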
Proteins predicted to interact with both Tor1 and Tor2 by both RF and SVM (score > 0.87)
| Description (GeneDB Annotation) | Common Name | Systematic ID |
|---|---|---|
| Adenylyl cyclase–associated protein Cap1 | Cap1 | SPCC306.09c |
| Arrestin Aly1 related | SPBC839.02 | SPBC839.02 |
| Autophagy-associated protein | SPBC1711.11 | SPBC1711.11 |
| Cytoskeletal signaling protein | SPAC637.13c | SPAC637.13c |
| GTPase activating protein | SPAC1952.17c | SPAC1952.17c |
| GTPase activating protein | SPAC23D3.03c | SPAC23D3.03c |
| GTPase activating protein | SPAC3G9.05 | SPAC3G9.05 |
| Guanyl-nucleotide exchange factor (predicted) | SPAC11E3.11c | SPAC11E3.11c |
| Guanyl-nucleotide exchange factor Sec73 | Sec73 | SPAC19A8.01c |
| Meiotically upregulated gene Mug79 | Mug79 | SPAC6G9.04 |
| Regulator of G-protein signaling Rgs1 | Rgs1 | SPAC22F3.12c |
| RhoGEF Rgf2 | Rgf2 | SPAC1006.06 |
| RhoGEF Scd1 | Scd1 | SPAC16E8.09 |
| Rho-type GTPase activating protein Rga2 | Rga2 | SPAC26A3.09c |
| Rho-type GTPase activating protein Rga3 | Rga3 | SPAC29A4.11 |
| Rho-type GTPase activating protein Rga6 | Rga6 | SPBC354.13 |
| Rho-type GTPase activating protein Rga7 | Rga7 | SPBC23G7.08c |
| RNB-like protein | Sts5 | SPCC16C4.09 |
| Scaffold protein Scd2 | Scd2 | SPAC22H10.07 |
| Sorting nexin Mvp1 | Mvp1 | SPAC3A11.06 |
| SPRY domain protein | SPCC285.10c | SPCC285.10c |
| Type 2a phosphatase regulator Tip41 | Tip41 | SPCC4B3.16 |
| Two-component GAP Byr4 | Byr4 | SPAC222.10c |
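The table above is an intersection: partners predicted for Tor1 and for Tor2, each passing the 0.87 cutoff. A minimal sketch, assuming per-pair scores stored as `(rf_score, svm_score)` tuples and that both scores must exceed the cutoff; the data layout is hypothetical.

```python
def shared_partners(scores, kinases=("Tor1", "Tor2"), cutoff=0.87):
    """Partners predicted for every kinase in `kinases` by both classifiers.

    scores: {(kinase, partner): (rf_score, svm_score)} (hypothetical layout).
    Assumption: a prediction counts only if both scores are strictly above cutoff.
    """
    partner_sets = [
        {partner for (k, partner), (rf, svm) in scores.items()
         if k == kinase and rf > cutoff and svm > cutoff}
        for kinase in kinases
    ]
    return set.intersection(*partner_sets)
```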
Predicted interaction partners of the Tor kinases whose orthologs are associated with human disease
| Human Protein | S. pombe Protein | Disease | Comment |
|---|---|---|---|
| CLN3 | Btn1 | Batten disease (MIM ID #204200) | Genetically interacts with the core stress signaling pathways; |
| PDHB | Pdb1 | Pyruvate decarboxylase deficiency (MIM ID #312170) | Related to cerebral ataxia. |
| CYCS | Cyc1 | Huntington disease (MIM ID #143100) and thrombocytopenia 1 (MIM ID #313900) | Cytochrome c, a protein participating in the mitochondrial electron transport chain; also localized in the cytoplasm. |
| PDK3 | Pkp1 | Congenital myopathy (MIM ID #300580) | Mitochondrial pyruvate dehydrogenase (lipoamide) kinase. |
| PPOX | Hem14 | Porphyria (MIM IDs #176200, #176000) | Penultimate enzyme of haem biosynthesis; targeted to mitochondria. |