Meghana Kshirsagar, Sylvia Schleker, Jaime Carbonell, Judith Klein-Seetharaman.
Abstract
We consider the problem of building a model to predict protein-protein interactions (PPIs) between the bacterial species Salmonella Typhimurium and the plant host Arabidopsis thaliana, a host-pathogen pair for which no known PPIs are available. To achieve this, we present approaches that use homology and a class of statistical learning methods called "transfer learning." In the transfer learning setting, the task of predicting PPIs between Arabidopsis and its pathogen S. Typhimurium is called the "target task." The presented approaches utilize labeled data, i.e., known PPIs of other host-pathogen pairs (we call these the "source tasks"). The homology-based approaches use heuristics based on biological intuition to predict PPIs. The transfer learning methods use the similarity of the PPIs from the source tasks to the target task to build a model. For a quantitative evaluation, we consider Salmonella-mouse PPI prediction and some other host-pathogen tasks where known PPIs exist. We use metrics such as precision and recall, and our results show that our methods perform well on the target task in various transfer settings. We present a brief qualitative analysis of the predicted Arabidopsis-Salmonella interactions. We filter the predictions from all approaches using Gene Ontology term enrichment and retain only those interactions involving Salmonella effectors. We thereby observe that Arabidopsis proteins involved in, e.g., transcriptional regulation, hormone-mediated signaling, and defense response may be affected by Salmonella.
Keywords: host pathogen protein interactions; kernel mean matching; machine learning methods; plant pathogen protein interactions; protein interaction prediction; transfer learning
Year: 2015 PMID: 25699028 PMCID: PMC4313693 DOI: 10.3389/fmicb.2015.00036
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Transfer of PPIs from the source host (for example, human) to another host, the target host.
Datasets used in the various approaches, their sizes and the appropriate citations.
| Approach | Source task | Size | Citation | Feature set |
| 1. Homology based | Human- | 62 190,868 | Schleker et al., | No feature set. Heuristics are used to infer interactions |
| 2. T-SVM | Human- | 62 | Schleker et al., | (a) Protein sequence k-mers |
| | | | | (b) Gene expression (from GEO) |
| | | | | (c) GO term similarity |
| 3. KMM | Human- | 62 | Schleker et al., | |
| | Human- | 1380 | | |
| | Human- | 32 | | |
| | | 22 | PHISTO | Protein sequence k-mers |
| | | 15 | (Tekir et al., | |
| | | 13 | | |
| | | 23 | | |
KMM, Kernel Mean Matching; SVM, Support Vector Machine; GO, Gene Ontology.
This source reports PPIs validated experimentally by biochemical and biophysical methods.
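The table lists protein sequence k-mers as a feature set. As a rough illustration of what such features can look like, the sketch below counts overlapping k-mers and builds a normalized frequency vector for a host-pathogen protein pair. The choice of k = 2, the 20-letter amino acid alphabet, the normalization, and the concatenation of the two vectors are all assumptions made for this example and are not taken from the paper; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard 20-letter amino acid alphabet (assumption for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=2):
    """Count overlapping k-mers in a protein sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence, k=2):
    """Frequency of each of the 20**k possible k-mers, in a fixed order."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def pair_features(host_seq, pathogen_seq, k=2):
    """Features for a protein pair: host vector followed by pathogen vector."""
    return kmer_vector(host_seq, k) + kmer_vector(pathogen_seq, k)


if __name__ == "__main__":
    features = pair_features("MKKLAV", "ACDACD", k=2)
    print(len(features))  # 2 * 20**2 entries
```

Concatenation keeps host and pathogen features in separate coordinates; other pairings (for example, an outer product) would be equally plausible under these assumptions.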
Figure 2. Approach 1(a): ortholog-based protein interaction inference. S1 represents a Salmonella protein, and S2 is a homolog of S1 or S1 itself. H represents a human protein, and A represents an Arabidopsis protein that is an ortholog of the human protein.
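The inference rule in this caption can be sketched in a few lines: for every known interaction (H, S1), each Arabidopsis ortholog A of H is paired with S1 and with each homolog S2 of S1. This is a minimal sketch, assuming the ortholog and homolog mappings are already available as dictionaries; the function name, the data structures, and the identifiers in the usage example are hypothetical and not from the paper.

```python
def transfer_interactions(known_ppis, host_orthologs, pathogen_homologs):
    """Infer (A, S2) pairs from known (H, S1) interactions.

    known_ppis:        iterable of (human_protein, salmonella_protein) pairs
    host_orthologs:    dict mapping a human protein to its Arabidopsis orthologs
    pathogen_homologs: dict mapping a Salmonella protein to its homologs
    """
    inferred = set()
    for human_protein, s1 in known_ppis:
        # S2 is either S1 itself or one of its homologs.
        s2_candidates = {s1} | set(pathogen_homologs.get(s1, ()))
        for plant_protein in host_orthologs.get(human_protein, ()):
            for s2 in s2_candidates:
                inferred.add((plant_protein, s2))
    return inferred


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    known = [("H1", "S1")]
    orthologs = {"H1": ["A1", "A2"]}
    homologs = {"S1": ["S2"]}
    for pair in sorted(transfer_interactions(known, orthologs, homologs)):
        print(pair)
```

A human protein without an Arabidopsis ortholog contributes no inferred pairs, which is the behavior one would expect from a purely homology-driven heuristic.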
Figure 3. Approach 1(b): graph-based interaction transfer. The big circles show the two protein complexes found to be enriched by Network Blast: the Arabidopsis protein complex on the left, and the human protein complex on the right. The edges within a protein complex are the PPIs within the host organism. The edges connecting the two protein complexes (i.e., the two circles) are the homology edges. The solid line connecting sipA with a human protein node is a bootstrap interaction. We use this to infer the new plant-Salmonella interaction indicated by the dotted line.
Figure 4. Transductive Support Vector Machine (SVM) for transfer learning. The first panel shows the conventional SVM classifier. The second panel shows T-SVM, with circles representing unlabeled examples. We use examples from the target task, i.e., Arabidopsis-Salmonella protein pairs, as the unlabeled examples to influence the classifier boundary.
Performance of the machine learning based methods on various transfer settings.
| Baseline | 42.8 | 58.8 | |||
| T-SVM | 45.4 | 61.2 | |||
| KMM-SVM | |||||
| Baseline | 95.4 | 33.8 | 50 | ||
| T-SVM | 67.5 | 43.5 | |||
| KMM-SVM | 52 | ||||
| Baseline | 17.8 | 12.9 | 14.9 | ||
| T-SVM | 15 | 14.5 | 14.7 | ||
| KMM-SVM | |||||
| Baseline | 12.9 | 12.5 | 12.7 | ||
| T-SVM | 10.4 | 15.6 | 12.5 | ||
| KMM-SVM | |||||
We compare the methods with a simple baseline: an inductive kernel SVM. We report precision (P), recall (R) and F-score (F1). The data used to build each model is shown in the first column. The second column shows the target task, i.e., the data on which we evaluate the model. The numbers in bold font indicate the highest performance in that column (i.e., for that metric).
Computed using the default classifier threshold: 0.5.
The positive:negative class ratio in all datasets was 1:100.
A random classifier would achieve an F-score of about 1 on the percentage scale used in the table.
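Since the F-score is the harmonic mean of precision and recall, rows of the table that report all three values can be checked directly, and the random-classifier footnote can be reproduced under one extra assumption. The sketch below assumes that the "random classifier" labels pairs as positive at the class prior rate (1 in 101 for a 1:100 ratio), so that its expected precision and recall both equal the prior; the footnote does not state this, and the function names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def random_classifier_f_score(pos_fraction, predicted_pos_rate):
    """Expected F-score of a classifier that ignores its input.

    Such a classifier has expected precision equal to the fraction of
    positives in the data, and expected recall equal to the rate at which
    it predicts the positive class.
    """
    return f_score(pos_fraction, predicted_pos_rate)


if __name__ == "__main__":
    # Check two table rows that list P, R and F1 (values in percent).
    print(round(f_score(12.9, 12.5), 1))
    print(round(f_score(10.4, 15.6), 1))

    # 1:100 positive:negative ratio, predictions made at the prior rate.
    prior = 1 / 101
    print(round(100 * random_classifier_f_score(prior, prior), 2))
```

Under a different notion of "random" (for example, predicting positive half the time), the expected F-score would be closer to 2 on the same scale, so the footnote's value depends on this assumption.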
Figure 5. Overlap among the novel PPI predictions from each approach. All predictions from the homology-based approach and the T-SVM are shown. For the KMM-SVM method, we filter the predictions using a threshold of 0.7 on the interaction probability reported by the classifier. We picked this threshold based on the interaction probabilities reported on the known interactions.
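The filtering and overlap computation described in this caption amount to a threshold on classifier probabilities followed by set bookkeeping. The following is a minimal sketch, assuming each approach yields a set of (Arabidopsis protein, Salmonella protein) pairs and that the KMM-SVM output is a mapping from pair to probability; whether the cutoff is inclusive is not stated in the caption, so the `>=` below is an assumption, and all names and toy pairs are hypothetical.

```python
from collections import Counter


def filter_by_probability(scored_pairs, threshold=0.7):
    """Keep pairs whose interaction probability reaches the threshold."""
    return {pair for pair, prob in scored_pairs.items() if prob >= threshold}


def overlap_regions(predictions):
    """Count pairs by the exact set of approaches that predicted them.

    predictions: dict mapping an approach name to its set of predicted pairs.
    Returns a Counter keyed by frozenset of approach names, i.e., one entry
    per region of the Venn diagram.
    """
    membership = {}
    for name, pairs in predictions.items():
        for pair in pairs:
            membership.setdefault(pair, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


if __name__ == "__main__":
    # Toy data, for illustration only.
    kmm_scores = {("A1", "S1"): 0.9, ("A3", "S2"): 0.75, ("A4", "S3"): 0.4}
    predictions = {
        "homology": {("A1", "S1"), ("A2", "S1")},
        "T-SVM": {("A1", "S1"), ("A3", "S2")},
        "KMM-SVM": filter_by_probability(kmm_scores),
    }
    for approaches, count in overlap_regions(predictions).items():
        print(sorted(approaches), count)
```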
GO terms that were enriched in the most targeted .
| Gene ID | Protein | GO term description | GO ID |
| AT1G01030 | B3 domain containing transcription factor | Sequence-specific DNA binding transcription factor activity; regulation of transcription, DNA-templated | GO:0003700; GO:0006355 |
| AT1G06160 | Ethylene-responsive transcription factor ERF094 | DNA binding; sequence-specific DNA binding transcription factor activity; regulation of transcription from RNA-polymerase II promoter; response to jasmonic acid stimulus | GO:0003677; GO:0003700; GO:0006355; GO:0009753 |
| AT1G01060 | Myb-related putative transcription factor | Response to cadmium ion; response to salt stress; response to auxin stimulus; response to cold | GO:0046686; GO:0009651; GO:0009733; GO:0009409 |
| AT1G13180 | Actin-related protein 3 | Actin binding | GO:0003779 |
| AT2G40220 | Ethylene-responsive transcription factor ABI4. Protein glucose insensitive 6 | DNA binding; response to water deprivation; positive regulation of transcription, DNA-dependent; sequence-specific DNA binding | GO:0003677; GO:0009414; GO:0045893; GO:0043565 |
| AT2G46400 | Putative WRKY transcription factor 46 | Response to chitin | GO:0010200 |
| AT1G01080 | Ribonucleoprotein, putative | Nucleic acid binding; RNA binding | GO:0003676; GO:0003723 |
| AT3G12110 | Actin-11 | Chloroplast stroma | GO:0009570 |
| AT3G56400 | Probable WRKY transcription factor 70 | Response to salicylic acid stimulus; sequence-specific DNA binding transcription factor activity; protein amino acid binding | GO:0009751; GO:0003700; GO:0005515 |
| AT1G01090 | Pyruvate dehydrogenase E1 component subunit alpha-3, chloroplastic | Chloroplast stroma | GO:0009570 |
| AT4G09570 | Ca-dependent protein kinase 4 | Protein amino acid binding | GO:0005515 |
| AT1G01150 | Homeodomain-like protein with RING-type zinc finger domain | Zinc ion binding; regulation of transcription, DNA-templated | GO:0008270; GO:0006355 |
| AT4G18170 | Probable WRKY transcription factor 28 | Regulation of transcription, DNA-templated; sequence-specific DNA binding transcription factor activity | GO:0006355; GO:0003700 |
| AT1G01160 | GRF1-interacting factor 2 | Protein amino acid binding | GO:0005515 |
| AT1G01200 | Ras-related protein RABA3 | GTP binding; small GTPase mediated signal transduction; protein transport | GO:0005525; GO:0007264; GO:0015031 |
| AT5G47220 | Ethylene-responsive transcription factor 2 | Positive regulation of transcription, DNA-dependent; ethylene mediated signaling pathway | GO:0045893; GO:0009873 |
| AT1G01250 | Ethylene-responsive TF ERF023 | Sequence-specific DNA binding transcription factor activity; nuclear envelope | GO:0003700; GO:0005634 |
| AT1G01350 | Zinc finger CCCH domain-containing protein 1 | Nucleic acid binding; zinc ion binding | GO:0003676; GO:0008270 |
| AT1G01370 | Histone H3-like centromeric protein HTR12 | DNA binding; protein amino acid binding | GO:0003677; GO:0005515 |
To get this list, we performed a GO term enrichment analysis using the FuncAssociate tool (Berriz et al., .
List of all enriched GO terms obtained by applying the enrichment analysis tool FuncAssociate (Berriz et al., .
| GO:0003676 | Nucleic acid binding |
| GO:0003677 | DNA binding |
| GO:0003700 | Sequence-specific DNA binding TF activity |
| GO:0003723 | RNA binding |
| GO:0003735 | Structural constituent of ribosome |
| GO:0003755 | Peptidyl-prolyl cis-trans isomerase activity |
| GO:0003779 | Actin binding |
| GO:0003899 | DNA-directed RNA polymerase activity |
| GO:0004298 | Threonine-type endopeptidase activity |
| GO:0004693 | Cyclin-dependent protein serine/threonine kinase activity |
| GO:0004842 | Ubiquitin-protein transferase activity |
| GO:0004871 | Signal transducer activity |
| GO:0005484 | SNAP receptor activity |
| GO:0005507 | Copper ion binding |
| GO:0005509 | Calcium ion binding |
| GO:0005515 | Protein binding |
| GO:0005525 | GTP binding |
| GO:0005576 | Extracellular region |
| GO:0005622 | Intracellular region |
| GO:0005634 | Nuclear envelope |
| GO:0005839 | Proteasome core complex |
| GO:0005840 | Ribosome |
| GO:0006351 | Transcription, DNA-templated |
| GO:0006355 | Regulation of transcription, DNA-templated |
| GO:0006412 | Translation |
| GO:0006413 | Translational initiation |
| GO:0006457 | Protein folding |
| GO:0006511 | Ubiquitin-dependent protein catabolic process |
| GO:0007264 | Small GTPase mediated signal transduction |
| GO:0007267 | Cell-cell signaling |
| GO:0008233 | Peptidase activity |
| GO:0008270 | Zinc ion binding |
| GO:0008794 | Arsenate reductase (glutaredoxin) activity |
| GO:0009408 | Response to heat |
| GO:0009409 | Response to cold |
| GO:0009414 | Response to water deprivation |
| GO:0009570 | Chloroplast stroma |
| GO:0009579 | Thylakoid |
| GO:0009651 | Response to salt stress |
| GO:0009733 | Response to auxin |
| GO:0009737 | Response to abscisic acid |
| GO:0009739 | Response to gibberellin |
| GO:0009751 | Response to salicylic acid |
| GO:0009753 | Response to jasmonic acid |
| GO:0009828 | Plant-type cell wall loosening |
| GO:0009873 | Ethylene mediated signaling pathway |
| GO:0010200 | Response to chitin |
| GO:0015031 | Protein transport |
| GO:0015035 | Protein disulfide oxidoreductase activity |
| GO:0016491 | Oxidoreductase activity |
| GO:0016607 | Nuclear speck |
| GO:0016762 | Xyloglucan:xyloglucosyl transferase activity |
| GO:0022626 | Cytosolic ribosome |
| GO:0022627 | Cytosolic small ribosomal subunit |
| GO:0042254 | Ribosome biogenesis |
| GO:0042742 | Defense response to bacterium |
| GO:0043565 | Sequence-specific DNA binding |
| GO:0045454 | Cell redox homeostasis |
| GO:0045892 | Negative regulation of transcription, DNA-templated |
| GO:0045893 | Positive regulation of transcription, DNA-templated |
| GO:0046686 | Response to cadmium ion |
| GO:0046872 | Metal ion binding |
| GO:0051726 | Regulation of cell cycle |
The terms shown had a p-value less than 0.001.
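To make the p-value cutoff concrete, the sketch below computes a one-sided hypergeometric p-value, a common choice for GO term enrichment. It is an illustration only: the exact statistical procedure used by FuncAssociate, including any multiple-testing correction, is not described in this record, and the function name, parameters, and counts in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, annotated_in_sample):
    """One-sided hypergeometric p-value for term enrichment.

    population:          number of genes in the background set
    annotated:           background genes annotated with the GO term
    sample:              number of genes in the query list
    annotated_in_sample: query genes annotated with the GO term

    Returns P(X >= annotated_in_sample), where X is the number of annotated
    genes in a random sample of the given size drawn without replacement.
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / comb(population, sample)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    p = enrichment_p_value(population=20000, annotated=300,
                           sample=100, annotated_in_sample=10)
    print(p < 0.001)
```

With many GO terms tested at once, a raw cutoff such as 0.001 would normally be paired with a correction for multiple hypotheses, which this sketch omits.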