PiSite: a database of protein interaction sites using multiple binding states in the PDB.

Miho Higurashi, Takashi Ishida, Kengo Kinoshita.

Abstract

The vast accumulation of protein structural data has now facilitated the observation of many different complexes in the PDB for the same protein. Therefore, a single protein complex is not sufficient to identify all of a protein's interaction sites, especially for proteins with multiple binding states or different partners, such as hub proteins. PiSite is a database that provides protein-protein interaction sites at the residue level with consideration of multiple complexes at the same time, by mapping the binding sites of all complexes containing the same protein in the PDB. PiSite provides easy web interfaces with an interactive viewer working with typical web browsers, and the different binding modes can be checked visually. All of the information can also be downloaded for further analyses. In addition, PiSite provides a list of proteins with multiple binding partners and multiple binding states, as well as up-to-date statistics of protein-protein interfaces. PiSite is available at http://pisite.hgc.jp.

Year: 2008 PMID: 18836195 PMCID: PMC2686547 DOI: 10.1093/nar/gkn659

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein–protein interactions are fundamental for proteins to exert their biological functions, and the molecular interactions of proteins can be understood by observing the 3D structure of their complexes. Therefore, many efforts have been made to solve complex structures experimentally, and a large number of structures are now available in the Protein Data Bank (PDB) (1). As a result of the vast accumulation of structural data from several genomic projects (2), we can now estimate the structural changes of proteins (3), the evolution of homo-oligomerization states (4,5) and the changes in protein–protein interactions (6) by analyzing all of the structures in the PDB simultaneously. The structural characteristics of protein–protein interaction sites have been extensively studied (7,8), and the knowledge has been used for the prediction of binding sites (9–12). For the statistical analyses, one representative is usually selected for each group with similar amino acid sequences. However, some proteins, called hub proteins (13), interact with several kinds of different proteins, and thus one representative complex is not sufficient to describe all of the interfaces of a hub protein. For the identification of all of the binding sites of a hub protein, all of the complexes within the PDB should be considered at the same time. Protein structures often consist of structural domains, and protein interactions can sometimes be interpreted as domain–domain interactions (14). Therefore, many databases to describe domain–domain interactions have been developed [3did (15), DIMA2 (15) and DOMINE (16)], and networks of domain interactions have been constructed. In these databases, large-scale interaction networks are the main focus, and the residue-level interactions are not described. 
On the other hand, to better understand the molecular interactions of protein domains, SCOPPI (17) and iPfam (18) provide residue-level information about the interacting domains. However, the interacting residues are provided for each pair of interacting domains (iPfam) or as a multiple alignment in the family with interface classification (SCOPPI), and thus it is difficult to observe several different binding modes of proteins with multiple partners. In addition, SCOPPI focuses on the diversity at the family level, while we would like to observe the multiple binding modes for individual proteins. Similarly, PiBase (19) provides detailed physicochemical properties for all the protein complexes in the PDB as a collection of binary relationships.

In this article, we describe a new protein–protein interaction database, PiSite, based on the protein complexes in the PDB. We tried to provide a simple user interface to observe the ‘real’ binding sites, by simultaneously considering multiple binding states of individual proteins at the residue level, and not just for protein domains but for entire protein chains. PiSite also provides a list of ‘sociable proteins’, proteins with multiple binding states and multiple binding partners, which are considered to be the key molecules in protein interaction networks. It should be noted that the sociable proteins are somewhat different from the so-called hub proteins, because hub proteins, which are usually defined as proteins with multiple binding partners in a protein interaction network obtained by large-scale experiments (20–23), are sometimes the subunits of a supermolecule (6). Furthermore, PiSite provides up-to-date statistics of protein interfaces. PiSite will be updated every 3 months, using the most recent version of the PDB.

CONSTRUCTION METHOD

Data set

The current version of PiSite was constructed from the PDB entries as of July 2008. We did not use the asymmetric unit of the coordinates, but employed the biological units distributed by RCSB, to eliminate the crystallographic interfaces (ftp://ftp.rcsb.org/pub/pdb/data/biounit/coordinates). All protein chains with more than 30 residues were considered as proteins and we excluded the entries with >5.0 Å resolution and the models with only Cα coordinates. As a result, we selected 110 325 protein chains from the 51 482 PDB entries in our data set.
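The selection rules above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Chain` record and `keep_chain` function are hypothetical names, and keeping chains whose resolution is not reported (for example NMR models) is an assumption, since the text does not say how such entries were handled.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RESIDUES = 30      # chains must have MORE than 30 residues
MAX_RESOLUTION = 5.0   # entries with resolution > 5.0 Å are excluded


@dataclass
class Chain:
    """Hypothetical summary record for one chain of a biological unit."""
    pdb_id: str
    chain_id: str
    n_residues: int
    resolution: Optional[float]  # in Å; None when not reported (assumption)
    ca_only: bool                # True if the model has only Cα coordinates


def keep_chain(chain: Chain) -> bool:
    """Return True if the chain passes the data set criteria in the text."""
    if chain.n_residues <= MIN_RESIDUES:
        return False
    if chain.ca_only:
        return False
    # Assumption: a missing resolution value does not exclude the chain.
    if chain.resolution is not None and chain.resolution > MAX_RESOLUTION:
        return False
    return True
```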

Mapping

The binding sites that appeared in the PDB were gathered by mapping from all complexes to each protein chain (Fig. 1). For this purpose, a similarity search by BLAST (24) against all protein chains in the data set was first carried out for each protein chain. Ideally, an exact match of the amino acid sequence would be sufficient to find other complex structures in the PDB. However, to enlarge the complex information and to tolerate minor discrepancies in sequence records due to missing residues and/or modifications, the proteins with sequence identity >90% and with coverage of the smaller protein >80% were selected for mapping. Then, mapping of the binding sites from one complex to a query chain was performed according to the BLAST alignment. The binding site residues were identified by a distance criterion: when the minimum distance between the atoms in a residue pair was <4.0 Å, the pair of residues was defined as contact residues. If the number of contacting residues between a pair of proteins was less than two, then the pair of protein chains was not used for mapping. It should be noted that we refer to the proteins selected for the mapping as similar proteins for simplicity, but our focus is not to analyze the interfaces among the family members, in contrast to other protein–protein databases.
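The thresholds in this procedure can be illustrated with a small sketch. It is not the PiSite implementation: the function names and data layouts (a chain as a dictionary from residue number to atom coordinates, an alignment as a dictionary from template residue number to query residue number) are hypothetical, and counting contact residue pairs for the "less than two" rule is one possible reading of the text.

```python
import math
from itertools import product

CONTACT_CUTOFF = 4.0   # Å, minimum inter-atomic distance for a contact
MIN_CONTACTS = 2       # chain pairs with fewer contacts are not mapped


def is_similar(identity: float, coverage_of_smaller: float) -> bool:
    """Similar-protein criterion: identity >90% and coverage >80%."""
    return identity > 90.0 and coverage_of_smaller > 80.0


def contact_pairs(chain_a, chain_b):
    """Return residue pairs whose closest atoms are < 4.0 Å apart.

    Each chain is a dict: residue number -> list of (x, y, z) tuples.
    """
    pairs = []
    for (res_a, atoms_a), (res_b, atoms_b) in product(
        chain_a.items(), chain_b.items()
    ):
        closest = min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))
        if closest < CONTACT_CUTOFF:
            pairs.append((res_a, res_b))
    return pairs


def usable_for_mapping(pairs) -> bool:
    """Assumption: the 'less than two' rule is applied to contact pairs."""
    return len(pairs) >= MIN_CONTACTS


def map_binding_site(template_site, alignment):
    """Transfer binding-site residues to the query chain via an alignment.

    Residues without an aligned counterpart in the query are dropped.
    """
    return {alignment[r] for r in template_site if r in alignment}
```

This brute-force all-against-all distance check is chosen for readability; a spatial index would be the usual choice for a data set of this size.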

Figure 1.

An explanation of residue mapping. The upper panel shows a schematic representation of residue mapping, and the lower panel shows an example of the mapping by using 3D models. In the example, the GTP binding protein RAN (PDB: 1byu, chain A) was used. The gray, light blue, purple, green, blue and red chains were taken from 1byuB, 1a2kD, 1ibrB, 1k5gKL (chains K and L) and 1l1mB, respectively (the four-letter code indicates the PDB ID and the following letters indicate the chain ID).

Definition of the binding state

By selecting similar entries, as mentioned, we can enumerate all complexes containing the protein chain under consideration, and can identify all of the binding partners that interact with it. The binding partners were grouped by sequence identity, and the number of groups was used as the number of binding partners shown in PiSite. Here, we regarded two proteins as being in the same group if they have >30% sequence identity and >50% coverage of the smaller chain. Each binding state of a protein was defined as the combination and the number of binding partners that appeared in the PDB. For example, when we consider the binding state of protein A with the complexes A–B and A–B–C in the PDB, then the number of binding states of protein A is two. If we have another complex A–B–B, then the number of binding states is three, because the number of copies of binding partner B is different from that in the complex A–B. It should be noted that the similarities between binding sites are not considered explicitly in our approach; thus, the number of binding states can be smaller than that of the binding modes, although we used all complexes for the residue mapping to obtain a comprehensive mapping. The definitions of binding states and binding partners are virtually the same as those used in Higurashi et al. (6), but we modified the original protocol to handle all PDB entries in PiSite, as described.
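The A–B, A–B–C and A–B–B example can be reproduced with a short sketch that treats each binding state as a multiset of partner groups. The function names are hypothetical, and the partner groups are assumed to have been assigned already (by the >30% identity and >50% coverage rule).

```python
from collections import Counter


def binding_state(partner_groups):
    """A binding state: which partner groups are bound, and how many of each."""
    return frozenset(Counter(partner_groups).items())


def count_states_and_partners(complexes):
    """Return (number of binding states, number of binding partners).

    `complexes` lists, for each complex, the group labels of the chains
    bound to the protein under consideration.
    """
    states = {binding_state(partners) for partners in complexes}
    partners = {group for complex_ in complexes for group in complex_}
    return len(states), len(partners)


# Protein A observed in the complexes A–B, A–B–C and A–B–B.
print(count_states_and_partners([["B"], ["B", "C"], ["B", "B"]]))  # → (3, 2)
```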

Definition of sociable proteins

We essentially followed the sociable protein definition by Higurashi et al. (6). However, since automated processes are required in this study and we could not apply manual curation, we used a stricter definition. We defined proteins with three or more binding states and three or more binding partners as sociable proteins. We excluded proteins with more than 10 chains in a single PDB entry as supermolecules. In addition, we also excluded proteins with four or fewer similar proteins in the data set. This last condition ensures the reliability of sociable protein identification: if the number of similar proteins is too small, the identification may contain errors.
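As a compact restatement, the criteria can be written as a single predicate. This is a sketch with hypothetical names; in particular, applying the supermolecule rule to the largest chain count among the entries containing the protein is an assumed reading of the text.

```python
def is_sociable(n_states: int, n_partners: int,
                n_similar: int, max_chains_in_entry: int) -> bool:
    """Apply the automated sociable-protein criteria described above."""
    if max_chains_in_entry > 10:   # treated as a supermolecule (assumed reading)
        return False
    if n_similar <= 4:             # too few similar proteins to be reliable
        return False
    return n_states >= 3 and n_partners >= 3
```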

CONTENTS OF PiSite

The main content of PiSite is an interactive view of the multiple binding states, and we refer to the corresponding web page as the interaction viewer. The interaction viewer (Fig. 2) shows all binding states appearing in the PDB, and is prepared for each protein chain. It consists of four parts: a title table including a link to UniProtKB (25), a molecule viewer, a binding state viewer and a download menu. The title table contains brief descriptions of the protein chain and the numbers of similar proteins, binding states and binding partners.

Figure 2.

An example of a PiSite entry. See the main text for details.

The molecule viewer shows the protein, colored by default according to the number of binding partners for each residue. The amino acid sequence is colored in the same way, and is shown just below the viewer. The position of each amino acid can be checked by clicking a residue in the sequence, which changes the specified residue into a CPK model (Fig. 2). Visualization of the molecule was done by jV [formerly known as pdbjviewer (26)], and thus the view of the molecule can be interactively rotated and translated by mouse operations, and the residue position can be inspected by clicking the residue on the screen. More detailed options for jV are also available, by opening the option screen at the top of the molecule viewer (Fig. 2). The binding state viewer shows the different binding states of the protein, by communicating with the molecule viewer. By switching to a different binding state with the radio button, the complex structure is shown in the molecule viewer with the same color as the background of the partner name. Note that two or more protein names can appear as binding partners, as in the cases of the 9th and 10th binding states in Fig. 2, which means that the binding state contains three or more chains, including the chain under consideration. The names of the binding partners are taken from the DBREF record in the PDB entry. All of the data obtained after mapping the similar proteins can be downloaded in a flat file or XML format. The format is described in the download page.
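The per-residue quantity used for the default coloring can be sketched as follows. The function name and input layout are hypothetical; the input is assumed to be the mapped binding sites, one per complex, each labeled with the partner group it came from.

```python
from collections import defaultdict


def partners_per_residue(mapped_sites):
    """Count distinct binding partner groups touching each residue.

    `mapped_sites` is an iterable of (partner_group, residue_numbers) pairs,
    where residue_numbers are positions on the chain under consideration.
    """
    groups_by_residue = defaultdict(set)
    for group, residues in mapped_sites:
        for residue in residues:
            groups_by_residue[residue].add(group)
    return {res: len(groups) for res, groups in groups_by_residue.items()}
```

Using a set per residue means that the same partner group seen in several complexes is counted once, which matches coloring by number of partners and not by number of complexes.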

ACCESS TO EACH ENTRY, WITH AN EXAMPLE

PiSite provides two different ways to access each entry. The first way is access from a compiled list. PiSite provides a list of representative sociable proteins and supermolecules, as mentioned above. These lists contain links for the entries and the user can access them by clicking the links. The other way is to search by sequence or keywords. Both sequence and keyword searches are available through the search form on the top page. A sequence search requires the amino acid sequence of a protein chain as an input and is performed by using BLAST. A more general search is by keywords. If the user wants to check the PiSite entry for Ras p21 proteins, for example, then the user can search ‘RAS p21’ as a keyword. The search results include a molecular description of each target and the numbers of binding states and partners. Each PiSite entry can be accessed from links in a search result page. In this case, the user can access the PiSite entry of the hRas p21 protein (PDB ID: 121p, chain A) by clicking the link of the top hit of the search results (Fig. 2). If the PDB ID of a target protein is already known, then a rapid search is available by selecting the ‘search by PDB ID’ radio button. The interaction viewer shows that the protein has 10 different binding states and 9 different binding partners. The molecule viewer shows the number of binding partners of each residue, and the residues contacting many partners are colored magenta. Interestingly, this magenta-colored region, including residues 32–42, corresponds to the effector region of a Ras protein. This region of a Ras protein is known as a hot spot and has been identified as a neutralizing epitope of hRas (27). More details of the mapping results can be obtained from the download menu.

FUNDING

Institute for Bioinformatics Research and Development; Japan Science and Technology Corporation (BIRD-JST); Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to K.K.). Funding for open access charge: Annual budget for Human Genome Center. Conflict of interest statement. None declared.

27 in total

1. iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions.

Authors: Robert D Finn; Mhairi Marshall; Alex Bateman
Journal: Bioinformatics Date: 2004-09-07 Impact factor: 6.937

2. eF-site and PDBjViewer: database and viewer for protein functional sites.

Authors: Kengo Kinoshita; Haruki Nakamura
Journal: Bioinformatics Date: 2004-02-10 Impact factor: 6.937

3. PIBASE: a comprehensive database of structurally defined protein interfaces.

Authors: Fred P Davis; Andrej Sali
Journal: Bioinformatics Date: 2005-01-18 Impact factor: 6.937

Review 4. The impact of structural genomics: expectations and outcomes.

Authors: John-Marc Chandonia; Steven E Brenner
Journal: Science Date: 2006-01-20 Impact factor: 47.728

Review 5. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

6. Identification of effector residues and a neutralizing epitope of Ha-ras-encoded p21.

Authors: I S Sigal; J B Gibbs; J S D'Alonzo; E M Scolnick
Journal: Proc Natl Acad Sci U S A Date: 1986-07 Impact factor: 11.205

Review 7. Protein-protein interactions: a review of protein dimer structures.

Authors: S Jones; J M Thornton
Journal: Prog Biophys Mol Biol Date: 1995 Impact factor: 3.667

8. SCOPPI: a structural classification of protein-protein interfaces.

Authors: Christof Winter; Andreas Henschel; Wan Kyu Kim; Michael Schroeder
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. 3D complex: a structural classification of protein complexes.

Authors: Emmanuel D Levy; Jose B Pereira-Leal; Cyrus Chothia; Sarah A Teichmann
Journal: PLoS Comput Biol Date: 2006-10-05 Impact factor: 4.475

10. 3did: interacting protein domains of known three-dimensional structure.

Authors: Amelie Stein; Robert B Russell; Patrick Aloy
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971
