MODPROPEP: a program for knowledge-based modeling of protein-peptide complexes.

Abstract

MODPROPEP is a web server for knowledge-based modeling of protein-peptide complexes, specifically peptides in complex with major histocompatibility complex (MHC) proteins and kinases. Crystal structures of protein-peptide complexes available in the PDB are used as templates for modeling peptides of a desired sequence in the substrate-binding pocket of MHCs or protein kinases. The substrate peptides are modeled using the same backbone conformation as in the template, and the side-chain conformations are obtained with the program SCWRL. MODPROPEP provides a number of user-friendly interfaces for visualizing the structure of the modeled protein-peptide complexes and for analyzing the contacts made by the modeled peptide ligand in the substrate-binding pocket of the MHC or protein kinase. Analysis of these specific inter-molecular contacts is crucial for understanding the structural basis of the substrate specificity of these two protein families. The software also provides interfaces for identifying putative MHC-binding peptides in the sequence of an antigen, or phosphorylation sites on the substrate protein of a kinase, by scoring these inter-molecular contacts using residue-based statistical pair potentials. MODPROPEP would complement the available sequence-based programs (SYFPEITHI, SCANSITE, etc.) for predicting substrates of MHCs and protein kinases. The program is available at http://www.nii.res.in/modpropep.html.

Substances:
Peptides
Proteins

Year: 2007 PMID: 17478500 PMCID: PMC1933231 DOI: 10.1093/nar/gkm266

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Proteins involved in a majority of cellular processes usually perform their function by binding to target proteins and forming protein–protein complexes. Interactions between two or more proteins often occur over short contiguous stretches of amino acids within one protein. For example, recognition of substrate proteins by various protein kinases during cell signaling events is governed primarily by specific interactions between the kinase and a contiguous peptide stretch containing the phosphorylation site. Several receptors, e.g. the major histocompatibility complex (MHC) (1), have peptide fragments as ligands. Thus, understanding the molecular details of interactions between proteins and short peptide motifs is essential for dissecting the underlying mechanisms of several major cellular processes. Among the various proteins which interact specifically with short peptide motifs, protein kinases and MHCs represent two major protein families whose substrate specificities have been extensively studied by various experimental approaches (2–4). Although a number of computational tools such as NetPhosK (5), KinasePhos (6), GPS (7), Scansite (8), SYFPEITHI (9), ProPred (10), etc. are available for predicting the putative substrate peptides for protein kinases and MHC proteins, these methods are mostly based on available experimental binding data for a given class of protein kinase or MHC. These tools predict substrate peptides by identifying conserved motifs in a set of known peptide substrates and do not use information from the three-dimensional structure of the protein–peptide complex. Hence, these sequence-based prediction tools do not give information about the key residues in kinases and MHCs which control substrate specificity. Information about specificity determining residues (SDRs) can help in the design of novel peptide ligands.
Correct identification of the SDRs of a given protein kinase or MHC can help in the prediction of substrates for those protein kinases or MHCs for which no peptide-binding data are available, as demonstrated successfully in structure-based substrate prediction methods like PREDIKIN (11) and PREDEP (12). These studies have demonstrated that structural analysis of interactions in protein–peptide complexes can lead to novel insight into the mode of substrate recognition. Therefore, molecular modeling of peptide–MHC and peptide–kinase interactions has been carried out by several groups using ab initio docking (13) or MD simulation approaches (14). However, the compute-intensive nature of these calculations has limited such studies to a few protein–peptide complexes. Since knowledge-based methods are less compute intensive and have better prediction accuracy, development of suitable knowledge-based tools for modeling protein–peptide complexes would permit quick structural analysis of MHCs and protein kinases with their substrate peptides. A knowledge-based approach has been used recently for developing kinDOCK (15), a powerful tool for modeling ATP analogs into the active site pocket of protein kinases. However, no such user-friendly tool is presently available for knowledge-based modeling of peptides in the binding pockets of MHCs or protein kinases. Therefore, we have developed MODPROPEP, a web server for structural modeling of peptides of any desired sequence in the active site pockets of kinases/MHCs for which crystal structures or homology models are available. In this manuscript, we give a brief description of the development of MODPROPEP, the assumptions made in the knowledge-based modeling protocol, the features of MODPROPEP and a few examples of its use.

METHODS

Compilation of crystal structures

The available crystal structures of MHCs and protein kinases were downloaded from the PDB website at http://www.rcsb.org (16). The structures were divided into two groups, i.e. structures in complex with a substrate peptide ligand and structures without a bound peptide ligand. These crystal structures were manually examined, and chain/residue numbering was edited where necessary. All the crystal structures were categorized into three major classes, i.e. class I MHC, class II MHC and protein kinases. Each of these three classes was further grouped into various functional families of protein kinases or MHC alleles. Detailed analysis of these crystal structures indicated that all the protein kinases share a conserved structural fold despite their sequence divergence. For example, the crystal structures of IR and PHK, which share a sequence identity of only 40%, can be superposed with a Cα RMSD of 1.6 Å. Similar conservation of structure was also observed for both class I and class II MHC structures, which share a higher degree of sequence identity among themselves. BLAST alignment of a large number of protein kinases and MHC proteins available in sequence databases with these crystal structures indicated that homology models can be obtained for most of these sequences with reasonable accuracy. Comparison of the bound peptide structures indicated that in all three classes of proteins, the substrate peptides bind at a structurally homologous site on the conserved fold and the bound peptides maintain a more or less similar extended conformation. This suggested that bound peptides from peptide–protein complexes can be transformed onto protein structures lacking the bound peptide, based on optimum superposition of the protein structures. It may be noted that a similar assumption has been used successfully in structural modeling studies of protein–ligand complexes involving protein kinases (11), MHCs (12,17) and other enzyme families (18,19).
There are several examples where more than one crystal structure of an allele is found with bound peptides of different lengths. It is generally assumed that three residues on each side of the phosphorylation site make significant contact with the protein kinase and are responsible for the specificity of a kinase (11). Therefore, bound peptides having more than seven amino acids were truncated to three amino acids on either side of the phosphorylation site. All these structures were stored in the template library of MODPROPEP.
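The truncation rule for kinase-bound peptides can be illustrated with a minimal Python sketch. The function name `truncate_around_site`, the 0-based indexing and the clipping behaviour at peptide ends are assumptions made here for illustration; they are not taken from the MODPROPEP source code.

```python
def truncate_around_site(peptide: str, site_index: int, flank: int = 3) -> str:
    """Keep `flank` residues on each side of the phosphorylation site.

    `site_index` is the 0-based position of the phosphorylated Ser/Thr/Tyr.
    Assumption: if the peptide is too short on one side, the window is
    simply clipped at the peptide end.
    """
    if not 0 <= site_index < len(peptide):
        raise ValueError("site_index is outside the peptide")
    if peptide[site_index] not in "STY":
        raise ValueError("residue at site_index is not Ser, Thr or Tyr")
    start = max(0, site_index - flank)
    end = min(len(peptide), site_index + flank + 1)
    return peptide[start:end]


# Hypothetical 12-residue peptide with the phosphorylated Ser at index 6.
print(truncate_around_site("AAKRTPSFLKGG", 6))  # RTPSFLK
```

With the default `flank=3` the result is a heptamer whenever at least three residues are present on both sides of the site.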

Modeling of protein–peptide complexes

The current template library of MODPROPEP has protein–peptide complex crystal structures for 16 alleles of class I MHC proteins, 12 alleles of class II MHC proteins and six different protein kinase families. Figure 1 shows a flowchart depicting the tasks which can be performed using MODPROPEP. For these MHC alleles and protein kinase families, a substrate peptide of any desired sequence can be modeled. Modeling of the peptide in the binding pocket of the MHC or protein kinase is carried out using the same backbone conformation as in the template complex, and the side-chain conformations are generated by the program SCWRL (20), which uses a backbone-dependent rotamer library approach. The template library of MODPROPEP also has structures for many MHC alleles or kinase families without the bound peptide substrate. For modeling peptide substrates in complex with any of these MHC alleles or kinase families, peptide conformations are transformed from the available crystal structures of protein–peptide complexes after optimum superposition of the proteins. If no crystal structure is available for a given protein kinase or MHC protein, the program can model its structure in complex with peptides of the desired sequence using the crystal structure of the closest homologous protein–peptide complex. Sequences of various MHC alleles have been obtained from the IMGT/HLA database (21) and stored locally, so that the user can select the protein to be modeled from the list of alleles. The crystal structure having maximum sequence similarity is used as a template for modeling the structure of the query allele. All sequence alignments are carried out using a local version of the program BLAST. The SCWRL program is used for mutating the residues as per the BLAST alignment to generate the desired homology model. Since only protein–peptide complexes are used to generate the homology models, the residues of the bound peptide are likewise mutated by SCWRL to model the substrate of the desired sequence.
Thus, MODPROPEP provides options for modeling a peptide of any desired sequence in complex with any MHC protein or protein kinase.
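The template selection step can be pictured with a simplified Python sketch. MODPROPEP uses BLAST alignments and sequence similarity; the sketch below swaps in plain percent identity over sequences that are assumed to be already aligned (equal length, `-` for gaps), and the names `percent_identity`, `closest_template`, `tmplA` and `tmplB` are hypothetical.

```python
from typing import Dict, Tuple


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns that contain no gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def closest_template(alignments: Dict[str, Tuple[str, str]]) -> str:
    """Return the template name whose alignment to the query scores highest.

    `alignments` maps a template name to (aligned query, aligned template).
    """
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Toy alignments, not real kinase or MHC sequences.
toy = {
    "tmplA": ("ACDEFG", "ACDQFG"),  # 5 of 6 columns identical
    "tmplB": ("ACDEFG", "AC-EYG"),  # 4 of 5 ungapped columns identical
}
print(closest_template(toy))  # tmplA
```

A real pipeline would take the alignment and its score from BLAST output rather than recomputing identity in this way.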

Figure 1.

A flowchart depicting the organization and features of MODPROPEP. Pink boxes represent the information provided by the user as input.

In order to analyze the interactions between the peptide and the protein, the residues of the MHC or the kinase which are in contact with the different side chains of the modeled peptide are identified using a distance-based cutoff. Based on these contact residues, putative binding pockets are defined for each of the residues in the peptide. MODPROPEP provides a user-friendly Jmol Java applet interface (http://jmol.sourceforge.net) for visualizing the modeled complexes and analyzing the binding pockets in detail. Apart from structural modeling of a peptide of a given sequence in the substrate-binding pocket of an MHC protein or protein kinase, MODPROPEP is also capable of scanning an antigenic protein for potential MHC-binding peptides. Similarly, putative substrate proteins for various protein kinases can be scanned for potential phosphorylation sites. Scanning of the input sequence is done by breaking the protein sequence into all possible overlapping peptides of a given length. This length is usually the length of the bound peptide present in the template protein–peptide complex, i.e. 9- or 10-mers for class I MHC and longer peptides for class II MHC. For protein kinases, however, only heptameric peptides containing Ser/Thr/Tyr as the central residue are chosen. For each of these peptides, instead of building all-atom side-chain conformations, contacting residue pairs between the peptide and the protein are first identified based on Cβ–Cβ distances. The binding score of these peptides with the MHC or kinase is then evaluated using the residue-based statistical energy function of Miyazawa and Jernigan (MJ) (22). It may be noted that a similar scoring scheme has been used earlier for identifying MHC-binding peptides using a threading approach (12).
Apart from the MJ statistical potential, the program also has options for ranking peptide-binding affinities using the residue-based statistical energy function of Betancourt and Thirumalai (BT) (23) or other user-defined residue-based schemes. The peptides are sorted according to their binding score, and the user can select some or all of these peptides for detailed side-chain modeling by SCWRL depending on their preliminary scores.
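The scanning and preliminary scoring steps described above can be sketched in Python as follows. This is an illustrative outline only: the function names, the representation of contacts as (peptide position, protein residue type) pairs, the toy potential values and the convention that a lower summed energy ranks higher are all assumptions, and the numbers are not taken from the published MJ or BT matrices.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, str]                 # (peptide position, protein residue type)
Potential = Dict[Tuple[str, str], float]  # residue-pair energies


def overlapping_peptides(sequence: str, length: int) -> List[Tuple[int, str]]:
    """All overlapping windows of `length`, each with its 0-based start."""
    return [(i, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]


def kinase_heptamers(sequence: str) -> List[Tuple[int, str]]:
    """Heptamers whose central (4th) residue is Ser, Thr or Tyr."""
    return [(i, p) for i, p in overlapping_peptides(sequence, 7)
            if p[3] in "STY"]


def pair_energy(potential: Potential, a: str, b: str) -> float:
    """Symmetric lookup; pairs missing from the toy table count as 0.0."""
    return potential.get((a, b), potential.get((b, a), 0.0))


def score_peptide(peptide: str, contacts: List[Contact],
                  potential: Potential) -> float:
    """Sum pair energies over template-derived peptide-protein contacts."""
    return sum(pair_energy(potential, peptide[pos], res)
               for pos, res in contacts)


def rank_peptides(peptides: List[Tuple[int, str]], contacts: List[Contact],
                  potential: Potential) -> List[Tuple[float, int, str]]:
    """Sort candidates by score, most negative (assumed best) first."""
    return sorted((score_peptide(p, contacts, potential), start, p)
                  for start, p in peptides)


# Toy inputs: made-up energies, contacts and sequence.
toy_potential = {("R", "E"): -2.0, ("K", "E"): -1.5, ("L", "L"): -1.0}
toy_contacts = [(0, "E"), (4, "L")]
for score, start, pep in rank_peptides(kinase_heptamers("RAASLAAKGGTLGAA"),
                                       toy_contacts, toy_potential):
    print(start, pep, score)
# 0 RAASLAA -3.0
# 7 KGGTLGA -2.5
```

Because the contact list comes from the template complex, each candidate peptide is scored without building side chains, which is what makes the scan quick.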

Query interface

Currently, the structural library of the program contains crystal structures of class I MHC, class II MHC and protein kinases, and protein–peptide complexes involving these three classes of protein can be modeled. The user can access the features for each class by clicking the links on the horizontal bar just below the header graphics. The program requires the user to select an MHC allele or protein kinase from the pull-down menu, and then automatically shows the peptide length options available for modeling with that MHC allele or protein kinase. The program next takes the user to the available crystal structure templates for the selected protein and peptide length. From here the user can choose a task, which is either modeling of peptides or scanning a protein sequence for favorable binders. The user is prompted to enter the sequence of the peptides in one-letter amino acid code. The program models the peptides in complex with the selected protein, and the models are available for download as files in PDB format. If no ligand-bound structure is available for the selected protein, the peptide is modeled by transferring the ligand peptide coordinates from a homologous protein–peptide complex. Figure 2 shows an example where a peptide has been modeled in complex with the kinase GSK3-beta by transferring the coordinates from CDK2. In order to test the accuracy of this ligand transformation approach, we modeled a peptide in complex with PKB by transforming the bound peptide from PKA. The tutorial section of MODPROPEP shows the superposition of the modeled and the experimentally determined bound peptide in the active site of PKB. As can be seen, the backbones of the two peptides superpose quite well, with an RMSD of 1.3 Å.

Figure 2.

A snapshot from MODPROPEP showing the result of transfer of the bound peptide from the kinase CDK2 (template: 1QMZ) to GSK3-beta (template: 1GNG). Links are provided for downloading PDB coordinates of the modeled complex and viewing the superposition of the two protein structures along with the peptide in the Jmol applet. A pop-up window shows the BLAST alignment between CDK2 and GSK3-beta.

MODPROPEP provides a user-friendly interface to analyze each modeled peptide in detail for contacts with the protein. Inter-residue contacts can be calculated either based on the distance between Cβ atoms or based on the distance between any two atoms in a pair of residues. A list of neighboring residues in the protein is displayed for each residue in the peptide. These amino acids on the protein define the binding subsite for each of the peptide residues. Residue pairs having steric clashes are highlighted in yellow. The program also provides an interface for analyzing detailed atomic contacts between each pair of residues. Additionally, MODPROPEP uses the Jmol applet for rapid visualization of these subsites in the proteins. A mouse click on a peptide residue shows that residue and the neighboring residues in the protein in the Jmol applet on the right-hand side. The clicked peptide residue is depicted in ball and stick, while the neighboring residues are shown in CPK. The protein backbone is shown as a ribbon, while the peptide backbone is shown as sticks. As mentioned earlier, the current version of MODPROPEP permits scoring of bound peptides using the residue-based statistical scoring matrices of MJ and BT. Both these scoring matrices have been used in the literature for evaluating the binding energy of protein–peptide complexes. It has been reported that, while the MJ potential gives better results for binding of peptides involving hydrophobic interfaces, the BT potential is more appropriate for binding of peptides involving polar contacts.
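The Cβ-based definition of binding subsites can be sketched in Python as below. The function name `subsites_by_cutoff`, the residue labels, the coordinates and the cutoff value used in the example are all hypothetical; the text does not state the cutoff MODPROPEP applies, so it is left as a parameter.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def subsites_by_cutoff(peptide_cb: Dict[int, Coord],
                       protein_cb: Dict[str, Coord],
                       cutoff: float) -> Dict[int, List[str]]:
    """For each peptide residue, list the protein residues whose C-beta
    atom lies within `cutoff` angstroms of the peptide residue's C-beta."""
    subsites: Dict[int, List[str]] = {}
    for pep_pos, pep_xyz in peptide_cb.items():
        subsites[pep_pos] = sorted(
            label for label, xyz in protein_cb.items()
            if math.dist(pep_xyz, xyz) <= cutoff
        )
    return subsites


# Made-up coordinates and labels, with an arbitrary illustrative cutoff.
peptide = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)}
protein = {"GLU10": (3.0, 4.0, 0.0),    # 5 A from peptide residue 1
           "GLU25": (0.0, 0.0, 6.0),    # 6 A from peptide residue 1
           "LYS40": (10.0, 0.0, 9.0)}   # 9 A from peptide residue 2
print(subsites_by_cutoff(peptide, protein, cutoff=6.5))
# {1: ['GLU10', 'GLU25'], 2: []}
```

The any-atom variant mentioned in the text would follow the same pattern, taking the minimum distance over all atom pairs of two residues instead of the single Cβ–Cβ distance.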
Here, we discuss a typical example of ranking the site of phosphorylation on the beta-adducin protein (accession no: P35612) by protein kinase A (24). Out of a total of 118 S/T-containing heptamers, RTPSFLK, which contains the experimentally identified phosphorylation site S713, is ranked 8 by the MJ potential, while scoring by the BT matrix gives it a rank of 3. Modeling of this peptide in complex with PKA shows that R710 is stabilized by contacts with E127 and E170. The screenshots for this example are available on the tutorial page of MODPROPEP. Prediction of phosphorylation sites in Limulus myosin III by PKA (25) and in PS1 by GSK3-beta (26) indicates that the true phosphorylation sites identified in recent experiments are ranked as high-scoring peptides by MODPROPEP using the BT matrix. Figure 3 shows the ranking of a recently identified ligand of the class I MHC allele HLA-A*0201 by MODPROPEP (27). As can be seen, out of a total of 625 nonameric peptides present in the antigen CABL1_HUMAN (accession no: Q8TDN4), VALEFALHL has a rank of 13 and 26 by the MJ and BT potentials, respectively. Analysis of inter-molecular contacts indicates that this peptide is stabilized by interactions involving K66, A150, V152, Y159 and W167. We have also tested the predictive ability of MODPROPEP using all the known substrates of PKA cataloged in phospho.ELM (28). Our results indicate that in 76% of cases the true phosphorylation site is ranked within the top 30% using the BT matrix. Similar benchmarking on 90 class I MHC–peptide complexes shows that MODPROPEP ranks the true binder within the top 30% in 61% of cases.
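To relate the ranks quoted above to the "top 30%" benchmark criterion, the short Python sketch below converts each rank into a percentile. The helper names and the exact reading of "within top 30%" as rank divided by the number of candidates being at most 0.30 are assumptions; the ranks and totals are those quoted in the text.

```python
def top_percent(rank: int, total: int) -> float:
    """Position of a 1-based rank among `total` candidates, in percent."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return 100.0 * rank / total


def within_top(rank: int, total: int, fraction: float = 0.30) -> bool:
    """Assumed criterion: rank / total does not exceed `fraction`."""
    return rank / total <= fraction


examples = [
    ("RTPSFLK, MJ", 8, 118),
    ("RTPSFLK, BT", 3, 118),
    ("VALEFALHL, MJ", 13, 625),
    ("VALEFALHL, BT", 26, 625),
]
for label, rank, total in examples:
    print(f"{label}: top {top_percent(rank, total):.1f}%",
          within_top(rank, total))
# RTPSFLK, MJ: top 6.8% True
# RTPSFLK, BT: top 2.5% True
# VALEFALHL, MJ: top 2.1% True
# VALEFALHL, BT: top 4.2% True
```

Under this reading, both worked examples fall well inside the top 30% with either potential.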

Figure 3.

A snapshot from MODPROPEP showing the result of scanning the CABL1_HUMAN protein for HLA-A*0201-restricted antigenic peptides. The experimentally identified substrate peptide VALEFALHL is chosen for modeling in complex with HLA-A*0201 using 1AKJ as the template. The residues of HLA-A*0201 in contact with the peptide residues are listed in tabular format. The right-hand frame shows the 3D structure of a selected peptide residue and its contacts with HLA-A*0201.

Implementation of the web server

MODPROPEP has been implemented using Perl CGI scripts, JavaScript, the Jmol applet and the Apache web server. The BLAST program downloaded from the NCBI website is used for local alignments. SCWRL3 is used for side-chain modeling. Structural superpositions have been carried out using the program ProFit (http://www.bioinf.org.uk/software/profit).

DISCUSSION

MODPROPEP is a web server for knowledge-based modeling of peptide ligands in the active site of various MHCs and protein kinases. The software uses available crystal structures as templates and uses the program SCWRL to mutate the sequence of the protein as well as the peptide to model any peptide–MHC or peptide–kinase complex. It provides a number of user-friendly interfaces for visualization and analysis of binding pockets in these protein–peptide complexes. The software has been developed on the assumption that MHCs and protein kinases have a conserved structural fold and that the ligand peptides bind at essentially the same site. A major advantage of MODPROPEP over other structural modeling programs is that it can be used to quickly model a large number of peptides in the binding pockets of MHCs and protein kinases. Thus, MODPROPEP will complement the available sequence-based programs for predicting peptide ligands for MHCs and protein kinases. Using this software, the user can identify amino acids on the MHC or kinase which are crucial for selection of a peptide ligand. Such information is important for the design of novel peptide ligands or for assigning specificities to new alleles of MHCs or novel families of kinases. The software also has an option for searching for MHC-binding peptides in the sequence of an antigen, or for phosphorylation sites on the substrate protein of a protein kinase, using a structure-based approach. Presently, the binding energy is assessed using residue-based statistical potentials. This scoring function is appropriate for quick preliminary ranking of putative peptide ligands. High-ranking peptides need to be modeled, and their detailed interactions with the protein analyzed, before actual binders are predicted.

28 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles.

Authors: O Schueler-Furman; Y Altuvia; A Sette; H Margalit
Journal: Protein Sci Date: 2000-09 Impact factor: 6.725

3. ProPred: prediction of HLA-DR binding sites.

Authors: H Singh; G P Raghava
Journal: Bioinformatics Date: 2001-12 Impact factor: 6.937

4. An automated prediction of MHC class I-binding peptides based on positional scanning with peptide libraries.

Authors: K Udaka; K H Wiesmüller; S Kienle; G Jung; H Tamamura; H Yamagishi; K Okumura; P Walden; T Suto; T Kawasaki
Journal: Immunogenetics Date: 2000-08 Impact factor: 2.846

5. Structural basis and prediction of substrate specificity in protein serine/threonine kinases.

Authors: Ross I Brinkworth; Robert A Breinl; Bostjan Kobe
Journal: Proc Natl Acad Sci U S A Date: 2002-12-26 Impact factor: 11.205

6. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs.

Authors: John C Obenauer; Lewis C Cantley; Michael B Yaffe
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

7. A graph-theory algorithm for rapid protein side-chain prediction.

Authors: Adrian A Canutescu; Andrew A Shelenkov; Roland L Dunbrack
Journal: Protein Sci Date: 2003-09 Impact factor: 6.725

Review 8. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence.

Authors: Nikolaj Blom; Thomas Sicheritz-Pontén; Ramneek Gupta; Steen Gammeltoft; Søren Brunak
Journal: Proteomics Date: 2004-06 Impact factor: 3.984

9. A structural switch of presenilin 1 by glycogen synthase kinase 3beta-mediated phosphorylation regulates the interaction with beta-catenin and its nuclear signaling.

Authors: Kai Prager; Lihua Wang-Eckhardt; Regina Fluhrer; Richard Killick; Esther Barth; Heike Hampel; Christian Haass; Jochen Walter
Journal: J Biol Chem Date: 2007-03-14 Impact factor: 5.157

10. IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex.

Authors: James Robinson; Matthew J Waller; Peter Parham; Natasja de Groot; Ronald Bontrop; Lorna J Kennedy; Peter Stoehr; Steven G E Marsh
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971
