
PepBind: a comprehensive database and computational tool for analysis of protein-peptide interactions.

Arindam Atanu Das, Om Prakash Sharma, Muthuvel Suresh Kumar, Ramadas Krishna, Premendu P Mathur.

Abstract

Protein–peptide interactions, in which one partner is a globular protein (domain) and the other a flexible linear peptide, are key components of cellular processes, predominantly in signaling and regulatory networks, and are hence prime targets for drug design. Deriving the details of a protein–peptide interaction mechanism is often cumbersome, though specific databases and tools can make it easier. The Peptide Binding Protein Database (PepBind) is a curated, searchable repository of the structures, sequences and experimental observations of 3100 protein–peptide complexes. Its web interface includes a computational tool, protein inter-chain interaction (PICI), that computes several types of weak and strong interactions at the protein–peptide interface and visualizes the identified residue–residue interactions in the Jmol viewer. This initial release focuses on protein–peptide interface information, together with structure and sequence information, for protein–peptide complexes deposited in the Protein Data Bank (PDB). Structures in PepBind are classified by cellular activity. More than 40% of the structures in the database are involved in various regulatory pathways and nearly 20% in the immune system. These data indicate the importance of protein–peptide complexes in the regulation of cellular processes.


Keywords: PepBind; Peptide-binding proteins database; Protein inter-chain interaction; Protein–peptide complex; Protein–peptide interaction tool; Protein–peptide interface


Substances:
Peptides
Proteins

Year: 2013 PMID: 23896518 PMCID: PMC4357787 DOI: 10.1016/j.gpb.2013.03.002

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

Functional analysis of proteins involves exploring their interactions with other molecules, which play vital roles in different pathways. Nearly 60% of interaction pathways, such as signal transduction, apoptotic, immune system and other pathways, contain domains with bound peptides [1]. Such interactions are prevalent in Src homology 2 (SH2) domains, major histocompatibility complex (MHC) molecules, antibodies, proteases, calmodulin, the PapD chaperone and OppA (oligopeptide permease A) structures, with variable sequence specificity and binding affinity [2]. Because protein–peptide interactions require only a small interface and occur in many interaction networks, they are attractive drug targets for both small molecules and inhibitory peptides [3-5]. Synthetic peptides can therefore be designed to alter specific interactions in disease or other pathways [1,6,7]. Each month, around 20 of the new entries deposited in the Protein Data Bank (PDB) [8] show interactions with small peptides. As the number of protein–peptide complex structures continues to grow, our understanding of protein–peptide recognition events should improve. Understanding and analyzing protein–peptide interaction mechanisms requires a reliable database of protein–peptide complexes. Several sequence-based protein–peptide interaction databases are available, such as ELM [9], PhosphoELM [10], DOMINO [11], SCANSITE [12], PepBank [13], APD [14], ASPD [15] and BIOPEP [16]. Structural data on protein–peptide complexes are also available in peptiDB [17] and PepX [18]: peptiDB is a set of 103 curated PDB files of non-redundant protein–peptide complexes, whereas PepX contains 1431 non-redundant X-ray structures clustered by binding interface and backbone variation.
Previous studies report that a single domain or protein can bind multiple, diverse peptides (e.g., at least 13 different types of peptides have been reported to bind SH3 domains [19]). Detailed analysis of how similar proteins interact with different peptides requires a large amount of protein–peptide complex structure data. To address this need, we created the Peptide Binding Protein Database (PepBind), which contains 3100 protein structures from the PDB, irrespective of structure determination method or backbone similarity. Different kinds of interactions contribute to stabilizing protein–peptide binding, and analyses of the interfaces between linear peptides and protein domains help distinguish transient from permanent complexes [20-22]. Protein–peptide interfaces have been shown to contain more hydrogen bonds per 100 Å² of solvent-accessible surface area (ASA): 50% more than protein–protein interfaces and 100% more than interfaces between intrinsically unstructured regions and proteins [17]. The importance of other interactions, such as those between nonpolar hydrophobic residues and ionic interactions, in protein structure and function is also well known [23,24]. Given the importance of interface hydrogen bonds and other interactions, we developed and integrated a web-based tool, protein inter-chain interaction (PICI), which calculates all interface hydrogen bonds along with other interactions (disulfide bonds, hydrophobic interactions and ionic interactions) in the tertiary structures of protein–peptide complexes; the results can be visualized in an integrated Jmol [25] viewer.
A similar tool, Protein Interaction Calculator (PIC) [26], is already available; PICI differs in that it calculates interface interactions specifically for the peptide chain of a protein–peptide complex and displays them on a single web page, with the interacting residues highlighted on the sequences. We have also built a binding prediction server into PepBind (http://pepbind.bicpu.edu.in/PepBind_prediction_beta.php) that predicts which protein domains in the PepBind database may bind a user-defined peptide sequence.

Results

The PepBind database provides residue- and atom-level information about the sequences and structures of protein–peptide complexes and their interfaces. It supports the analysis of protein–peptide interactions by computing various interface interactions and by presenting structural information both interactively on screen and in text format (Figure 1). PepBind also maintains a repository of structure coordinate files, PDBML [27] data files and the protein–peptide interaction files generated by the PICI tool. The database is updated regularly to serve as a resource for structural, functional and protein–peptide interaction studies of peptide-binding proteins. Researchers can also submit protein–peptide complexes, which are added to PepBind after manual verification.

Figure 1

Interactions in the complex of apopain with the tetrapeptide inhibitor ACE-DVA-ASK (PDB ID: 1CP3). Hydrogen bonds (A), hydrophobic interactions (B) and ionic interactions (C) were identified by the PICI server. Interacting residues are colored brown.

Database statistics

As shown in Table 1, the current version of PepBind contains structural information for a total of 3100 protein–peptide complexes. By cellular activity, 1745 of the 3100 complexes (56.3%) are involved in regulatory pathways or are inhibitory complexes. Of all structures in the database, 1278 (41.2%) play major roles in hormonal activity, gene regulation, transcription and signal transduction pathways, or are transferases. A further 600 structures (19.3%) function in the immune system. Structural, contractile and membrane proteins account for 252 structures (8.1%), mainly involved in transport (5.2%) and cell adhesion (1.9%). In addition, 953 structures (30.7%) have protease or other hydrolase activity, while 326 structures (10.5%) are associated with other cellular activities.

Table 1

Contents of the PepBind database

Cellular activity	No. of complexes (%)	Functional category	No. of complexes (%)
Cell cycle	90 (2.9)	Structural, contractile and membrane proteins	252 (8.1)
Structural proteins	126 (4.0)
Cell adhesion	59 (1.9)
Transportᵃ	163 (5.2)

Calmodulin (CaM)	42 (1.3)	Regulatory proteins	1278 (41.2)
Apoptosis	125 (4.0)
Signaling	626 (20.2)
Hormones	84 (2.7)
Transferasesᵇ	415 (12.7)
Transcription	268 (8.6)
Gene regulation	38 (1.2)

Inhibitory complex	663 (21.4)	Inhibitory complexes	663 (21.4)

MHC	340 (10.9)	Immune system	600 (19.3)
Immunoglobulin (Ig)	250 (8.0)
Antibiotics	15 (0.5)
Other immune system proteins	98 (3.1)

Proteases	687 (22.1)	Proteases and other hydrolases	953 (30.7)
Other hydrolases	266 (8.5)

Others	326 (10.5)	Others	326 (10.5)

Note: PepBind contains 3100 protein–peptide complexes in total. Because some proteins are multi-functional, the categories overlap.

ᵃ Transporters, channels and pumps.

ᵇ Transferases, together with kinases, phosphomutases, transaldolases and transketolases.
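As a quick arithmetic check on the overlap noted under Table 1, the Python sketch below (an illustration, not part of PepBind) sums the six functional-category counts and expresses each as a share of the 3100 complexes. The categories together hold 4072 assignments, 972 more than there are complexes, which reflects the multi-functional proteins counted in more than one category; the one-decimal shares it prints agree with Table 1 to within 0.1 percentage point.

```python
# Functional-category counts copied from Table 1; a complex may appear in several categories.
TOTAL_COMPLEXES = 3100
FUNCTIONAL_CATEGORIES = {
    "Structural, contractile and membrane proteins": 252,
    "Regulatory proteins": 1278,
    "Inhibitory complexes": 663,
    "Immune system": 600,
    "Proteases and other hydrolases": 953,
    "Others": 326,
}


def share(count: int, total: int = TOTAL_COMPLEXES) -> float:
    """Percentage of all complexes that fall in a category."""
    return 100.0 * count / total


for name, count in FUNCTIONAL_CATEGORIES.items():
    print(f"{name}: {count} ({share(count):.1f}%)")

assignments = sum(FUNCTIONAL_CATEGORIES.values())
excess = assignments - TOTAL_COMPLEXES  # positive only because the categories overlap
print(f"{assignments} category assignments for {TOTAL_COMPLEXES} complexes; excess = {excess}")
```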

Web interface

The user interface allows browsing of the full database contents, either as a list or by category (Figure 2). To make data easier to find, we have integrated several search tools into the web interface (Figure 2A). The 'simple search' function retrieves protein–peptide complexes by PDB ID or protein name. The 'keyword search' tool scans every field of every table in PepBind for the query word and returns a list of all related protein structures. The 'advanced search' function lets users filter by peptide length, cellular activity of the protein, structure determination method (e.g., X-ray diffraction, nuclear magnetic resonance or electron microscopy) and the authors who solved the structure. All advanced-search criteria are combined with the 'AND' operator, so only complexes that satisfy every specified criterion are returned. In addition, BLAST searching [28] against PepBind, PDB or SwissProt sequences is provided to find sequences homologous to a submitted sequence.
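The 'AND' semantics of the advanced search can be sketched in Python as follows. The record fields, the `ComplexRecord` class and the `advanced_search` function are all hypothetical (PepBind itself runs PHP against MySQL); any criterion the user leaves empty is skipped, and a complex is returned only if every supplied criterion holds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ComplexRecord:
    """Hypothetical, simplified record for one protein-peptide complex."""
    pdb_id: str
    peptide_length: int
    method: str
    activities: Set[str] = field(default_factory=set)
    authors: List[str] = field(default_factory=list)


def advanced_search(records, min_len: Optional[int] = None, max_len: Optional[int] = None,
                    activity: Optional[str] = None, method: Optional[str] = None,
                    author: Optional[str] = None):
    """Return records satisfying every supplied criterion (criteria combined with AND)."""
    def matches(r: ComplexRecord) -> bool:
        checks = [
            min_len is None or r.peptide_length >= min_len,
            max_len is None or r.peptide_length <= max_len,
            activity is None or activity in r.activities,
            method is None or r.method == method,
            author is None or any(author.lower() in a.lower() for a in r.authors),
        ]
        return all(checks)  # logical AND over all criteria; unset criteria are always True

    return [r for r in records if matches(r)]


# Made-up records for illustration (not real PepBind entries).
records = [
    ComplexRecord("demo1", 4, "X-RAY DIFFRACTION", {"Apoptosis", "Inhibitory complex"}, ["Author A"]),
    ComplexRecord("demo2", 9, "X-RAY DIFFRACTION", {"MHC"}, ["Author B"]),
    ComplexRecord("demo3", 12, "SOLUTION NMR", {"Signaling"}, ["Author A"]),
]
hits = advanced_search(records, max_len=10, method="X-RAY DIFFRACTION", author="author a")
print([r.pdb_id for r in hits])  # ['demo1']
```

Here `demo2` passes the length and method filters but fails the author filter, and `demo3` fails the length filter, so only `demo1` survives the conjunction.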

Figure 2

Snapshots of PepBind output. A. Search page with search parameters. B. Result summary page showing all chains with their sequences. C. Jmol view of the protein–peptide interface and sequence viewer showing protein chains with the identified residues highlighted. D. Detailed result page displaying the protein summary and other tabs.

The results page shows all the chains present in the protein structure (Figure 2B). Each chain is linked to the PICI web tool for analyzing its interactions with the other chains of the protein. PICI displays interaction details by highlighting the interacting residues in the displayed sequence, alongside a Jmol view of the identified residue–residue interactions (Figure 2C); separate tabs are provided for each interaction type. The protein detail page presents information about the protein complex on a single web page under different tabs (Figure 2D): summary, sequence and source, gene ontology, methodology, Ramachandran plot, citation and external links. The 'sequence and source' tab displays the amino acid sequence colored by biochemical property, together with source organism data; the 'Ramachandran plot' tab shows a Ramachandran plot generated by the MolProbity [29] server; and the 'Gene Ontology' tab shows GO functional annotation [30]. For structure similarity searches, we use the PDB web service, which employs the FATCAT algorithm [31] to identify homologous domains in PepBind, SCOP [32] and PDP [33].

Discussion

Protein–peptide interactions are key components of cellular processes such as signal transduction, protein trafficking, defense mechanisms and enzyme regulation. Many protein interaction databases exist; they can be grouped into protein–small molecule, protein–nucleic acid and protein–protein interaction databases. However, retrieving structural and functional information on protein–peptide interactions in biological processes is tedious because specific databases providing such details are lacking. The PepX database addressed the lack of a protein–peptide interaction database by classifying complexes according to backbone variation and binding interface. Whereas PepX groups complexes solely by 3D similarity, PepBind complements it by providing interface information for both the peptide and protein chains of each complex, together with their cellular functions and options for sequence and structure similarity searches. PepBind is integrated with the Jmol viewer to visualize interface residues, along with the interaction files generated by the PICI tool. PepBind also provides BLAST search and structure similarity search for protein chains, as well as a prediction service for the binding of user-supplied peptides to protein domains in the database. Links to other related databases and servers are provided for each queried protein to support further analysis of the structures. These resources include PDB [8], PDBsum [34], Pfam [35], CASTp [36], OCA Browser (http://bip.weizmann.ac.il/oca/), PSI/KB (http://sbkb.org/kb/), SRS [37], MMDB [38], PQS [39], SCOP [32], CATH [40], Proteopedia [41], Jena Library [42] and UniProt [43]. Currently, PICI analyzes inter-chain hydrogen bonds, disulfide bridges, hydrophobic interactions and ionic interactions.
Given the importance of other weak interactions in stabilizing protein structure, we plan to extend PICI to aromatic–aromatic interactions [44], cation–π interactions [45] and aromatic–sulfur interactions [46]. The interaction tool will also be extended to user-submitted structures, allowing examination of interfaces in complexes not currently present in PepBind.

Methods

Data collection and curation

Atomic coordinate files (PDB format, version 3.30), sequence files (FASTA format) and other data files (PDBML, version 4.0) for 3100 protein–peptide complexes were downloaded from the PDB after a thorough manual screening of all available structures. Because PepBind aims to be a comprehensive collection of protein–peptide complexes from the PDB, it includes all available complexes, irrespective of sequence or structure redundancy. The collected structures were classified in three steps: (I) an automated program scanned the amino acid sequences and classified the complexes by the length of the bound peptide; (II) the cellular activity of each complex was curated manually from the literature; and (III) an automated program read the data files and grouped the complexes by structure determination method. Based on the literature survey, the complexes were assigned to 19 cellular-activity categories.
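Steps (I) and (III) are automated and can be illustrated with a small Python sketch. The text does not say how the peptide chain is identified or which length classes are used, so both are assumptions here: the shortest polymer chain in an entry is taken to be the peptide, and the bins in `LENGTH_BINS` are hypothetical. Step (II), literature-based curation of cellular activity, is manual and is not modelled.

```python
from collections import defaultdict

# Hypothetical peptide-length classes (inclusive bounds); PepBind's actual classes are not stated.
LENGTH_BINS = ((2, 5), (6, 10), (11, 20), (21, 30))


def peptide_length(chain_sequences):
    """Assumption: the shortest polymer chain of an entry is the bound peptide."""
    return min(len(seq) for seq in chain_sequences.values())


def length_class(n):
    """Map a peptide length onto one of the hypothetical bins."""
    for lo, hi in LENGTH_BINS:
        if lo <= n <= hi:
            return f"{lo}-{hi}"
    return "other"


def classify(entries):
    """Step (I): group entries by peptide length; step (III): group by experimental method."""
    by_length, by_method = defaultdict(list), defaultdict(list)
    for entry_id, (chains, method) in entries.items():
        by_length[length_class(peptide_length(chains))].append(entry_id)
        by_method[method].append(entry_id)
    return dict(by_length), dict(by_method)


# Made-up entries: {entry_id: ({chain_id: sequence}, method)}.
entries = {
    "demo1": ({"A": "MKTAYIAKQRQISFVK", "B": "DEVD"}, "X-RAY DIFFRACTION"),
    "demo2": ({"A": "GSHSMRYFYTSVSRPGRG", "C": "SIINFEKL"}, "X-RAY DIFFRACTION"),
    "demo3": ({"A": "MADQLTEEQIAEFKEAFSLFDKDGDG", "B": "KRRWKKNFIAVSAANRFK"}, "SOLUTION NMR"),
}
by_length, by_method = classify(entries)
print(by_length)
print(by_method)
```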

Database schema and implementation

PepBind is implemented as a series of server-side PHP scripts, with HTML and JavaScript for the user interface, running on an Apache 2.2 web server with MySQL 5.1 as the database back-end. Atomic coordinates from the PDB and related information from other remote databases and web servers were retrieved by an automated program and stored in a file repository for further processing. Three sets of PHP scripts process these data for integration into the database and the front-end user interface: the first set reads the PDBML files [27], extracts the data and inserts them into the database tables; the second sorts the data by each attribute; and the third generates web pages with specific information about individual complexes.
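The first set of scripts (read PDBML, extract fields, insert rows) can be sketched in Python with the standard-library `xml.etree.ElementTree` and `sqlite3` modules. This swaps in Python for PHP and SQLite for MySQL purely for illustration; the two-table schema is hypothetical, and the input is a simplified PDBML-style fragment rather than a real entry. The `{*}` namespace wildcard in the element paths requires Python 3.8 or later.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified PDBML-style fragment for illustration only (not a real PDB entry).
SAMPLE_PDBML = """<PDBx:datablock datablockName="DEMO"
    xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd">
  <PDBx:exptlCategory>
    <PDBx:exptl entry_id="DEMO" method="X-RAY DIFFRACTION"/>
  </PDBx:exptlCategory>
  <PDBx:entity_polyCategory>
    <PDBx:entity_poly entity_id="1">
      <PDBx:pdbx_seq_one_letter_code_can>MKTAYIAKQR
QISFVK</PDBx:pdbx_seq_one_letter_code_can>
    </PDBx:entity_poly>
    <PDBx:entity_poly entity_id="2">
      <PDBx:pdbx_seq_one_letter_code_can>DEVD</PDBx:pdbx_seq_one_letter_code_can>
    </PDBx:entity_poly>
  </PDBx:entity_polyCategory>
</PDBx:datablock>"""

# Hypothetical schema: one row per entry, one row per polymer entity.
SCHEMA = """
CREATE TABLE entry  (pdb_id TEXT PRIMARY KEY, method TEXT);
CREATE TABLE entity (pdb_id TEXT REFERENCES entry(pdb_id), entity_id TEXT,
                     sequence TEXT, length INTEGER);
"""


def load_pdbml(xml_text, conn):
    """Extract the entry ID, experimental method and polymer sequences, then insert them."""
    root = ET.fromstring(xml_text)
    pdb_id = root.get("datablockName")
    exptl = root.find(".//{*}exptl")  # {*} matches the element in any namespace
    method = exptl.get("method") if exptl is not None else None
    conn.execute("INSERT INTO entry VALUES (?, ?)", (pdb_id, method))
    for poly in root.iterfind(".//{*}entity_poly"):
        raw = poly.findtext("{*}pdbx_seq_one_letter_code_can", default="")
        seq = "".join(raw.split())  # sequences may be wrapped over several lines
        conn.execute("INSERT INTO entity VALUES (?, ?, ?, ?)",
                     (pdb_id, poly.get("entity_id"), seq, len(seq)))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_pdbml(SAMPLE_PDBML, conn)
rows = conn.execute("SELECT entity_id, length FROM entity ORDER BY length").fetchall()
print(rows)
```

Keeping the extraction step separate from the database insert mirrors the split described above, where later script sets sort the stored data and render pages from it.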

Utilities and tools

The PICI tool, which identifies potential hydrogen bonds and other interactions between the short peptide and the core protein, was developed and integrated into PepBind. The tool parses the structure coordinate file, removes hetero atoms and water molecules, and predicts interactions from the distances between atoms of the peptide residues and atoms of the protein residues. For structures determined by NMR, only the first model in the file is used. For two atoms $A(x_1, y_1, z_1)$ and $B(x_2, y_2, z_2)$, the distance $D$ is the Euclidean distance

$$D(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

Potential interactions are then assigned using standard published criteria. A hydrogen bond is detected if the distance between an oxygen or nitrogen atom of the peptide and an oxygen or nitrogen atom of the protein domain is ⩽3.5 Å [47]. Interactions between hydrophobic residues (alanine, valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, proline and tyrosine) [48] are predicted if the residues lie within 5 Å of each other, and pairs of ionic residues (arginine, lysine, histidine, aspartic acid and glutamic acid) within 6 Å are counted as ionic interactions. The integrated Jmol viewer displays the interactions between the peptide and the residues of the interacting protein chains, and the positions of the interacting residues are highlighted on the displayed sequence (Figure 2C). The tool also generates an interaction file for each interaction type.
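The distance rules above can be sketched in Python. This is not the PICI source: the fixed-column parsing follows the standard PDB ATOM record layout, HETATM records (hetero atoms and waters) are skipped, and reading stops at the first ENDMDL so that only the first model is used. Two details are assumptions because the text does not specify them: when the element column is missing, the element is taken from the first letter of the atom name, and a residue pair counts as a hydrophobic or ionic contact if any of its atom pairs is within the cut-off. Disulfide bonds are omitted, and the helper `atom_line` exists only to build the toy input.

```python
import math
from itertools import product

# Residue sets and cut-offs (angstroms) as quoted in the Methods text.
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "TYR"}
IONIC = {"ARG", "LYS", "HIS", "ASP", "GLU"}
HBOND_CUTOFF, HYDROPHOBIC_CUTOFF, IONIC_CUTOFF = 3.5, 5.0, 6.0


def parse_first_model(pdb_lines):
    """Read ATOM records of the first model; HETATM records (hetero atoms, waters) are skipped."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # multi-model (e.g. NMR) files: stop after the first model
        if not line.startswith("ATOM  "):
            continue
        name = line[12:16].strip()
        atoms.append({
            "chain": line[21],
            "resname": line[17:20].strip(),
            "resseq": int(line[22:26]),
            "name": name,
            # Element from columns 77-78 if present; otherwise (assumption) first letter of the name.
            "element": line[76:78].strip() or name[0],
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def distance(a, b):
    """Euclidean distance D(A, B) = sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def interface_interactions(atoms, peptide_chain, protein_chain):
    """Return (hydrogen bonds, hydrophobic residue pairs, ionic residue pairs) across two chains."""
    peptide = [a for a in atoms if a["chain"] == peptide_chain]
    protein = [a for a in atoms if a["chain"] == protein_chain]
    hbonds, hydrophobic, ionic = [], set(), set()
    for a, b in product(peptide, protein):
        d = distance(a["xyz"], b["xyz"])
        if a["element"] in {"N", "O"} and b["element"] in {"N", "O"} and d <= HBOND_CUTOFF:
            hbonds.append((a["resseq"], a["name"], b["resseq"], b["name"], round(d, 2)))
        # Assumption: a residue pair counts if ANY of its atom pairs is within the cut-off.
        pair = (a["resname"], a["resseq"], b["resname"], b["resseq"])
        if a["resname"] in HYDROPHOBIC and b["resname"] in HYDROPHOBIC and d <= HYDROPHOBIC_CUTOFF:
            hydrophobic.add(pair)
        if a["resname"] in IONIC and b["resname"] in IONIC and d <= IONIC_CUTOFF:
            ionic.add(pair)
    return hbonds, sorted(hydrophobic), sorted(ionic)


def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a simplified fixed-column ATOM record (element column omitted) for the demo."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up toy structure: protein chain A, peptide chain B, plus one water that is skipped.
demo = [
    atom_line(1, "NZ", "LYS", "A", 10, 0.0, 0.0, 0.0),
    atom_line(2, "CB", "LEU", "A", 11, 10.0, 0.0, 0.0),
    atom_line(3, "OD1", "ASP", "B", 1, 3.0, 0.0, 0.0),
    atom_line(4, "CD1", "ILE", "B", 2, 14.0, 0.0, 0.0),
    "HETATM    5  O   HOH A 201       1.000   0.000   0.000",
]
hbonds, hydrophobic, ionic = interface_interactions(parse_first_model(demo), "B", "A")
print(hbonds)       # [(1, 'OD1', 10, 'NZ', 3.0)]
print(hydrophobic)  # [('ILE', 2, 'LEU', 11)]
print(ionic)        # [('ASP', 1, 'LYS', 10)]
```

The brute-force pairwise loop is fine for a single peptide against one chain; a spatial index would only matter for much larger interfaces.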
A sequence-coloring tool incorporated into the result page reads the protein sequence file and colors each residue of the sequence (one-letter code) according to its biochemical properties: green for non-polar hydrophobic, yellow for uncharged polar, blue for positively charged, red for negatively charged and black for non-standard amino acids. A web-based prediction server identifies protein domains in the database that are likely to bind a user-supplied peptide. The sequence search tool in the web interface lets users run BLAST searches of a query sequence against the database with various parameters. All structure, sequence and interface-interaction data currently in PepBind, together with the complete list of the PepBind dataset, can be downloaded freely from the database. An integrated reporting tool generates the results as a printer-friendly PDF file.
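The coloring rule can be sketched in Python as below. The source names the five colors but not which residues belong to each class, so the grouping in `CLASS_COLORS` is an assumption (a common textbook classification, with glycine treated as non-polar and tyrosine and cysteine as uncharged polar, which differs from the hydrophobic list PICI uses); any other one-letter code is shown in black as non-standard. The `<span style=...>` markup is likewise illustrative.

```python
# Assumed residue classes (one-letter codes); the source gives the colours but not the membership.
CLASS_COLORS = {
    "green": set("GAVLIMFWP"),  # non-polar hydrophobic
    "yellow": set("STCNQY"),    # uncharged polar
    "blue": set("KRH"),         # positively charged
    "red": set("DE"),           # negatively charged
}


def residue_color(aa: str) -> str:
    """Colour for one residue; anything outside the 20 standard codes is non-standard (black)."""
    for color, members in CLASS_COLORS.items():
        if aa.upper() in members:
            return color
    return "black"


def sequence_to_html(seq: str) -> str:
    """Wrap each residue in a coloured <span> (illustrative markup only)."""
    return "".join(f'<span style="color:{residue_color(aa)}">{aa}</span>' for aa in seq)


print(sequence_to_html("KDSAX"))
```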

Authors’ contributions

PPM, RK and MSK conceived and designed the project. AAD collected the data, developed the database, developed the tools and designed the website. OPS developed the BLAST search script. AAD and OPS wrote the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors have no competing interests to declare.

References

1. The Protein Data Bank at 40: reflecting on the past to prepare for the future.

Authors: Helen M Berman; Gerard J Kleywegt; Haruki Nakamura; John L Markley
Journal: Structure; Date: 2012-03-07

2. Structural characterisation and functional significance of transient protein-protein interactions.

Authors: Irene M A Nooren; Janet M Thornton
Journal: J Mol Biol; Date: 2003-01-31

3. FATCAT: a web server for flexible structure comparison and structure similarity searching.

Authors: Yuzhen Ye; Adam Godzik
Journal: Nucleic Acids Res; Date: 2004-07-01

4. Inter-residue interactions in protein folding and stability. (Review)

Authors: M Michael Gromiha; S Selvaraj
Journal: Prog Biophys Mol Biol; Date: 2004-10

5. Peptidomimetics, a synthetic tool of drug discovery. (Review)

Authors: Josef Vagner; Hongchang Qu; Victor J Hruby
Journal: Curr Opin Chem Biol; Date: 2008-05-14

6. The structural basis of peptide-protein binding strategies.

Authors: Nir London; Dana Movshovitz-Attias; Ora Schueler-Furman
Journal: Structure; Date: 2010-02-10

7. The EBI SRS server--recent developments.

Authors: Evgeni M Zdobnov; Rodrigo Lopez; Rolf Apweiler; Thure Etzold
Journal: Bioinformatics; Date: 2002-02

8. Protein-peptide interactions. (Review)

Authors: R L Stanfield; I A Wilson
Journal: Curr Opin Struct Biol; Date: 1995-02

9. Ion-pairs in proteins.

Authors: D J Barlow; J M Thornton
Journal: J Mol Biol; Date: 1983-08-25

10. Approved drug mimics of short peptide ligands from protein interaction motifs.

Authors: Laavanya Parthasarathi; Fergal Casey; Amelie Stein; Patrick Aloy; Denis C Shields
Journal: J Chem Inf Model; Date: 2008-10-01
