
MoDPepInt: an interactive web server for prediction of modular domain-peptide interactions.

Kousik Kundu¹, Martin Mann¹, Fabrizio Costa¹, Rolf Backofen².

Abstract

MoDPepInt (Modular Domain Peptide Interaction) is a new, easy-to-use web server for predicting binding partners of modular protein domains. Currently, we offer models for SH2, SH3 and PDZ domains via the tools SH2PepInt, SH3PepInt and PDZPepInt, respectively. More specifically, our server offers predictions for 51 human SH2 domains and 69 human SH3 domains via single domain models, and predictions for 226 PDZ domains across several species via 43 multidomain models. All models are based on support vector machines with different kernel functions, ranging from polynomial and Gaussian kernels to advanced graph kernels. In this way, we model non-linear interactions between amino acid residues. Results were validated on manually curated datasets, achieving competitive performance against various state-of-the-art approaches.
AVAILABILITY AND IMPLEMENTATION: The MoDPepInt server is available at http://modpepint.informatik.uni-freiburg.de/.

Substances: Peptides

Year: 2014; PMID: 24872426; PMCID: PMC4155253; DOI: 10.1093/bioinformatics/btu350

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 INTRODUCTION

In eukaryotes, protein–protein interactions are often mediated by modular protein domains and play an essential role in diverse biological processes such as signal transduction, cellular growth and cell polarity (Pawson and Nash, 2003). Modular domains that specifically bind short linear peptides are known as peptide recognition modules. Each domain family recognizes peptides with specific characteristics. For example, phosphotyrosine (pY)-containing peptides, proline-rich peptides and C-terminal peptides are recognized by SH2, SH3 and PDZ domains, respectively. However, individual domains from the same family show different binding specificities. Accurate models that help explain the mechanisms behind this highly selective binding are therefore of interest. Recently, several high-throughput techniques, such as protein microarrays, phage display and SPOT synthesis, have been developed to detect the binding specificity of various modular domains. However, efficient bioinformatics tools are needed to extract meaningful knowledge from the enormous amount of data produced. To this end, we used state-of-the-art machine learning approaches to build support vector machine models that can accurately predict binding specificity. We have collected three tools, SH2PepInt, SH3PepInt and PDZPepInt, for three modular domains, namely SH2, SH3 and PDZ, into a unified web-based system called MoDPepInt (Modular Domain Peptide Interaction) (Kundu et al., 2013a,b; Kundu and Backofen, 2014). Currently, we offer single domain models for 51 human SH2 and 69 human SH3 domains, and multidomain models for 226 PDZ domains across human, mouse, fly and worm. To assess the quality of our models, we used manually curated interaction data, achieving competitive performance against various state-of-the-art approaches.
In summary, MoDPepInt's unique features include (i) a domain-peptide prediction system for SH2, SH3 and PDZ in a single platform and (ii) the largest number of modeled domains (see Supplementary Table S1).

2 APPLICATION AND FUNCTIONALITY

2.1 Input

All tools share a unified input format. Up to 500 query sequences can be supplied, either in FASTA format or as UniProt database accession numbers. PDZPepInt also offers predictions for domains that are newly developed and/or not among the original 226 PDZ domains; such an unknown query domain should be supplied in FASTA format. Multiple query domain sequences can also be provided.
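The input constraints above can be illustrated with a minimal sketch of a FASTA reader that enforces the 500-sequence limit. This is not the server's own code; the function name `parse_fasta`, the constant `MAX_QUERIES` and the example records are assumptions made for illustration.

```python
MAX_QUERIES = 500  # upper bound on query sequences stated in the text


def parse_fasta(text, max_records=MAX_QUERIES):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Hypothetical helper: raises ValueError on malformed input or when
    more than max_records sequences are supplied.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    if len(records) > max_records:
        raise ValueError(
            f"{len(records)} sequences supplied; the limit is {max_records}"
        )
    return records


# Made-up example input with two query sequences.
example = """>query1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>query2
PPLPPRPQV
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

Rejecting oversized submissions before any prediction runs keeps the check cheap; a real front end would additionally need to resolve UniProt accession numbers, which this sketch does not attempt.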

2.2 Filters

Several filters are available to increase predictive accuracy. SH2 domains generally recognize phosphotyrosine (pY) residues of binding proteins. For this reason, SH2PepInt offers a phosphotyrosine filter that considers only those peptides whose tyrosine phosphorylation has been experimentally verified and reported in the PhosphoSitePlus database (Hornbeck et al.). As SH3 domains mainly bind proline-rich peptides, SH3PepInt offers a proline-rich filter that uses 31 regular expressions to select proline-rich peptides (Carducci et al.). PDZ domains tend to bind the unstructured C-terminal regions of binding proteins; hence, PDZPepInt offers a filter for intrinsically unstructured/disordered regions based on the IUPred algorithm (Dosztanyi et al.), which selects five C-terminal residues with IUPred scores >0.4 (Akiva et al.). Finally, a cellular localization filter is available for all tools. This filter considers only those interactions where the protein containing the peptide and the protein containing the modular domain have the same cellular localization according to the Gene Ontology Database (Ashburner et al.).
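The logic of three of these filters can be sketched as follows. This is a simplified illustration, not the server's implementation: the single `P..P` pattern is a stand-in and not one of the 31 expressions used by SH3PepInt; the disorder filter assumes that all five C-terminal residues must score above the threshold and takes precomputed per-residue scores rather than calling IUPred; and "same localization" is assumed to mean sharing at least one Gene Ontology cellular component term. All function names are hypothetical.

```python
import re

# Illustrative stand-in only: a PxxP-like pattern, NOT one of the
# 31 regular expressions used by SH3PepInt.
PXXP = re.compile(r"P..P")


def is_proline_rich(peptide, patterns=(PXXP,)):
    """Return True if any of the given patterns matches the peptide."""
    return any(p.search(peptide) for p in patterns)


def passes_disorder_filter(scores, n_terminal=5, threshold=0.4):
    """Check the last n_terminal per-residue disorder scores.

    Assumption: every one of the five C-terminal residues must exceed
    the threshold. Scores are supplied by the caller (e.g. from IUPred).
    """
    tail = scores[-n_terminal:]
    return len(tail) == n_terminal and all(s > threshold for s in tail)


def same_localization(domain_terms, peptide_terms):
    """Assumption: two proteins co-localize if they share at least one
    GO cellular component term."""
    return bool(set(domain_terms) & set(peptide_terms))


# Made-up inputs for demonstration.
print(is_proline_rich("APPLPPRNR"))
print(passes_disorder_filter([0.1, 0.2, 0.5, 0.6, 0.7, 0.8, 0.9]))
print(same_localization({"GO:0005737", "GO:0005886"}, {"GO:0005737"}))
```

Each filter is a plain predicate, so filters can be combined by applying them in sequence to the candidate peptides before or after scoring.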

2.3 Processing and output

An internal queuing system (which currently uses 40 computation nodes) distributes the submitted jobs so that they run in parallel. MoDPepInt is implemented in C++, Perl and shell scripts, with runtimes typically on the order of a few minutes. The output of all three tools is formatted as a downloadable table. For each domain–ligand protein interaction pair, we report (i) the sequence ID, (ii) the ligand binding position, (iii) the ligand binding sequence and (iv) the ligand binding domains. See Figure 1 for a schematic representation of the MoDPepInt pipeline.
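The four reported fields map naturally onto a tabular record. The sketch below shows one possible way to represent and serialize such rows as tab-separated text; the column names, the `Prediction` class, the comma-joined domain list and the example values are all assumptions, as the actual file layout is not specified here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted interaction (field names are hypothetical)."""
    sequence_id: str   # (i) ID of the query sequence
    position: int      # (ii) ligand binding position
    peptide: str       # (iii) ligand binding sequence
    domains: list      # (iv) domains predicted to bind the peptide


def to_tsv(predictions):
    """Serialize predictions as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["sequence_id", "binding_position", "binding_sequence", "binding_domains"]
    )
    for p in predictions:
        writer.writerow([p.sequence_id, p.position, p.peptide, ",".join(p.domains)])
    return buf.getvalue()


# Made-up example row; IDs and domain names are placeholders.
rows = [Prediction("EXAMPLE_1", 101, "EPQYEEIPIYL", ["DOMAIN_A", "DOMAIN_B"])]
table = to_tsv(rows)
print(table)
```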

Fig. 1. Schematic representation of the MoDPepInt pipeline

3 DISCUSSION

MoDPepInt collects three protein–protein interaction predictive models that can be efficiently tuned using data derived from various high-throughput experimental techniques and thus do not require structural information, as in Brannetti et al., 2000 and Hou et al., 2012. The resulting models exhibit significant performance improvement in comparison with other existing tools. This improvement stems mainly from (i) non-linear modeling, an advantage over linear position weight matrix (PWM) models (Obenauer et al.), (ii) balanced discriminative training and (iii) dataset pooling. SH2PepInt uses polynomial kernels and is trained on additional high-confidence negatives obtained via semisupervised techniques. SH3PepInt uses graph kernels on a complex representation of both the peptide sequence and the aligned domains. The graph-type representation allows the inclusion of the physico-chemical properties of amino acids, which increases the generalization capacity of the models. Furthermore, the method does not need any prior alignment of the peptides. This is a major advantage because poly-proline-rich peptides are hard to align. PDZPepInt uses Gaussian kernels and is trained on interaction data from additional highly related domains. Pooling data from closely related domains makes it possible to leverage the limited information available for some domains and to extrapolate to unseen, but alignable, novel domains. Once trained, all models can be used to efficiently scan entire proteomes to identify novel interactions, with typical runtimes of a few minutes. In addition, we offer a meta-web server for use in non-expert mode that submits the input simultaneously to all tools and displays a summary of the main results. For performance comparisons, details on the novelty of the methods and a description of the meta-web server, see Supplementary Information.

Funding: This work was funded by the Bundesministerium für Bildung und Forschung (e-bio; FKZ 0316174A to R.B.) and the Excellence Initiative of the German Federal and State Governments (EXC 294 to R.B.).

Conflict of interest: none declared.
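As a rough illustration of the non-linear kernels mentioned in the Discussion, the sketch below computes a polynomial and a Gaussian kernel on one-hot encoded peptides. The encoding, the hyperparameters (`degree`, `coef0`, `gamma`) and the example peptides are assumptions chosen for readability; they are not the features or settings used by SH2PepInt or PDZPepInt, and the graph kernel of SH3PepInt is not covered.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector, 20 entries per position
    (an assumed encoding, chosen for simplicity)."""
    vec = []
    for aa in peptide:
        vec.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vec


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; with degree 2 the implicit feature
    space contains products of pairs of position-specific residues."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.1):
    """exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two made-up peptides of equal length that differ at the last position.
a, b = one_hot("PPLP"), one_hot("PPLA")
print(polynomial_kernel(a, b))  # three shared positions: (3 + 1) ** 2
print(gaussian_kernel(a, b))    # squared distance 2: exp(-0.2)
```

A PWM scores each position independently and sums the contributions, whereas a degree-2 polynomial kernel implicitly includes terms for pairs of positions, which is one way a model can capture dependencies between residues.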

REFERENCES (13 in total)

1. SH3-SPOT: an algorithm to predict preferred ligands to different members of the SH3 gene family.

Authors: B Brannetti; A Via; G Cestra; G Cesareni; M Helmer-Citterich
Journal: J Mol Biol Date: 2000-04-28 Impact factor: 5.469

2. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

Review 3. Assembly of cell regulatory systems through protein interaction domains.

Authors: Tony Pawson; Piers Nash
Journal: Science Date: 2003-04-18 Impact factor: 47.728

4. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.

Authors: Zsuzsanna Dosztányi; Veronika Csizmok; Peter Tompa; István Simon
Journal: Bioinformatics Date: 2005-06-14 Impact factor: 6.937

5. Characterization of domain-peptide interaction interface: a case study on the amphiphysin-1 SH3 domain.

Authors: Tingjun Hou; Wei Zhang; David A Case; Wei Wang
Journal: J Mol Biol Date: 2008-01-03 Impact factor: 5.469

6. Characterization of domain-peptide interaction interface: prediction of SH3 domain-mediated protein-protein interaction network in yeast by generic structure-based models.

Authors: Tingjun Hou; Nan Li; Youyong Li; Wei Wang
Journal: J Proteome Res Date: 2012-04-09 Impact factor: 4.466

7. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.

Authors: Peter V Hornbeck; Jon M Kornhauser; Sasha Tkachev; Bin Zhang; Elzbieta Skrzypek; Beth Murray; Vaughan Latham; Michael Sullivan
Journal: Nucleic Acids Res Date: 2011-12-01 Impact factor: 16.971

8. Cluster based prediction of PDZ-peptide interactions.

Authors: Kousik Kundu; Rolf Backofen
Journal: BMC Genomics Date: 2014-01-24 Impact factor: 3.969

9. Semi-supervised prediction of SH2-peptide interactions from imbalanced high-throughput data.

Authors: Kousik Kundu; Fabrizio Costa; Michael Huber; Michael Reth; Rolf Backofen
Journal: PLoS One Date: 2013-05-17 Impact factor: 3.240

10. A graph kernel approach for alignment-free domain-peptide interaction prediction with an application to human SH3 domains.

Authors: Kousik Kundu; Fabrizio Costa; Rolf Backofen
Journal: Bioinformatics Date: 2013-07-01 Impact factor: 6.937
