
HotPoint: hot spot prediction server for protein interfaces.

Nurcan Tuncbag, Ozlem Keskin, Attila Gursoy.

Abstract

The energy distribution along the protein-protein interface is not homogeneous; certain residues, called 'hot spots', contribute more to the binding free energy. Here, we present a web server, HotPoint, which predicts hot spots in protein interfaces using an empirical model. The empirical model incorporates a few simple rules based on occlusion from solvent and total knowledge-based pair potentials of residues. The prediction model is computationally efficient and achieves an accuracy of 70%. The input to the HotPoint server is a protein complex and two chain identifiers that form an interface. The server provides the hot spot prediction results, a table of residue properties and an interactive 3D visualization of the complex with hot spots highlighted. Results are also downloadable as text files. This web server can be used to analyze any protein-protein interface and should be useful to researchers working on binding site characterization and the rational design of small molecules targeting protein interactions. HotPoint is accessible at http://prism.ccbb.ku.edu.tr/hotpoint.

Year: 2010 PMID: 20444871 PMCID: PMC2896123 DOI: 10.1093/nar/gkq323

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Most molecular and cellular processes are controlled by protein–protein interactions. Proteins interact through interfaces. The energy distribution along the interface region is not homogeneous; certain residues, called ‘hot spots’, contribute more to the binding free energy (1–3). Hot spots form tightly packed regions in protein interfaces (4). Hot spots are important both as targets for therapeutic molecules that disrupt malfunctioning protein associations and for the rational design of highly specific protein complexes (5,6). Experimentally, a hot spot can be identified by measuring the change in binding free energy upon mutating the residue to alanine. Alanine mutation data are available for only a limited number of protein complexes, and this information is deposited in databases (7,8). Because experimental information is limited, highly efficient computational methods have emerged to identify hot spots. Although there is no strict rule for identifying hot spots, combining several physical and chemical features of residues gives successful results. Several groups have developed energy-based methods (9–12), learning-based methods (13–19) and molecular dynamics-based methods (20–22) to predict hot spot residues computationally. Some of these methods are available as servers, such as Robetta (10,11) and the KFC server (23). The Robetta server (10,11) performs computational alanine scanning by estimating energies (including van der Waals and H-bond terms) at the atomic level for a given complex, and outputs the change in binding free energy for each residue in the interface. The KFC server (23) predicts hot spots for a given complex using a machine learning approach that considers the shape specificity and surrounding structural features of the residues. Its output consists of confidence scores and predictions, and the results can be visualized in an interactive viewer.

Here, we present the HotPoint web server, which provides a user-friendly interface to run the method developed by Tuncbag et al. (19) for online prediction of hot spots in protein interfaces. Our aim is to provide an efficient server at a single location for the analysis of any protein–protein interface, for use by researchers interested in protein binding sites. The method principally considers the solvent accessibility and the total contact potential of the interface residues. The output tabulates the interface residues and their features, with the predicted hot spots highlighted. Additionally, it provides an interactive 3D visualization of the submitted protein–protein interface with the predicted hot spots, so that their localization can be observed. HotPoint differs from the existing servers (Robetta and the KFC server) in its improved efficiency and accuracy. The calculation of solvent accessibility and residue pair potentials is faster than the atomic-level computations performed by Robetta, and the prediction accuracy is higher than that of both Robetta and the KFC server.

THE HOTPOINT METHOD

HotPoint is based on a few simple rules that combine the solvent accessibility and the energetic contribution of residues. The thresholds of the model are adjusted on a training set of 150 residues with experimental alanine mutation data, of which 58 are hot spots and 92 are non-hot spots. Interface residues whose mutation changes the binding free energy by at least 2.0 kcal/mol are considered experimental hot spots. If the mutation results in a change <0.4 kcal/mol, the residue is labeled as an experimental non-hot spot. The independent test set is derived from the Binding Interface Database (BID) (7) and is composed of 112 residues (of which 54 are hot spots and 58 are non-hot spots). The predictive performance of the method is assessed using accuracy (the ratio of the number of correctly predicted residues to the number of all predicted residues), precision (the ratio of the number of correctly classified hot spot residues to the number of all residues classified as hot spots), recall (the ratio of the number of correctly classified hot spot residues to the number of all hot spot residues), specificity (the ratio of the number of correctly predicted non-hot spot residues to the number of all non-hot spot residues) and F1 score (the harmonic mean of precision and recall). Several empirical and machine learning methods are trained and tested using several features [relative accessible surface area (ASA) in complex state, relative change in ASA upon complexation, conservation, amino acid propensity and total contact potential]. The best performance is achieved by an empirical model based on relative accessibility in complex state and total pair potentials. According to this model, if an individual interface residue is buried (its relative ASA in complex state is ≤20%) and its total contact potential is ≥18.0, the residue is flagged as a hot spot; otherwise, it is flagged as a non-hot spot.

The thresholds of the model (20% and 18.0) are inferred from the training set. The model achieves an accuracy of 0.70, a precision of 0.73, a recall of 0.59 and a specificity of 0.79 on the independent test set, which exceeds the performance of existing approaches [such as KFC (15), KFCA (15), ISIS (18) and Robetta (11)] and of machine learning approaches (such as SVM, BayesNet and decision tree) on the same test set. The details of the data sets, the methodology and an exhaustive comparison with other approaches are available in Tuncbag et al. (19).
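The performance measures defined above can be computed from the four confusion counts. The following is a minimal sketch, not the authors' evaluation code. The function name and the example counts are assumptions: the text reports only rates, and the counts used here (32 true positives, 22 false negatives, 46 true negatives, 12 false positives) are inferred as the integers that reproduce the reported rates for 54 hot spots and 58 non-hot spots.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity and F1 from confusion counts.

    'Positive' means hot spot, 'negative' means non-hot spot.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Inferred (not reported) counts for the 112-residue independent test set.
metrics = classification_metrics(tp=32, fp=12, tn=46, fn=22)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these inferred counts, the rounded accuracy, precision, recall and specificity agree with the values quoted in the text.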

HOTPOINT WEB SERVER

The HotPoint web server is available at http://prism.ccbb.ku.edu.tr/hotpoint. The server interface is written in PHP, and the hot spot prediction code is written in Python.

Input

The input data are a protein structure as a PDB-formatted coordinate file, two chain identifiers forming the interface and the interface definition. Users can either run the server with the default distance thresholds for extracting interface residues or change the interface definition by submitting a distance threshold. There are two options for submitting a structure file. Users can enter the four-letter PDB code of a protein, in which case the structure is downloaded directly from the ftp site of the PDB. The second option is to upload a structure file in PDB format. HotPoint requires two chain identifiers that define a protein interface. The server does not work for PDB files containing only one chain and returns an error for them. For NMR structures, it uses the first model for the prediction and reports results for that model only. HotPoint is specific to protein–protein interfaces; chains corresponding to DNA structures trigger a warning in the web server. When the input data are incomplete, the server informs the user of what is missing. The HotPoint web server is free and open to all users and there are no login requirements.

Extraction of computational hot spots

When a protein structure with its chain identifiers is submitted, the HotPoint server performs three consecutive steps:

(i) Extraction of interface residues: a protein interface is defined as the set of amino acids forming a region that links two protein chains by non-covalent interactions. According to the default interface definition in the server, if the distance between any two atoms belonging to two residues, one from each chain, is less than the sum of their van der Waals radii plus a 0.5 Å tolerance, these two residues are defined as interacting. Users can change this definition by submitting a distance threshold.

(ii) Calculation of the features: residue solvent accessibilities are calculated using Naccess (24). The residue accessibilities in complex state and in monomer state are converted into relative accessibilities by dividing them by the maximum accessibility of that residue. Knowledge-based solvent-mediated inter-residue potentials are taken from Keskin et al. (25). The contact potential matrix contains 210 distinct contact potentials between all possible pairs of the 20 amino acids, in RT units (R, universal gas constant; T, absolute temperature). To calculate the total contact potential of a residue in the interface, the server extracts the neighbors of that residue whose side chain centers of mass are closer than the cutoff (7.0 Å). A further constraint is that neighbors must not be close in sequence (|i−j| ≥4, where i and j are residue numbers). The total contact potential of the residue is defined as the absolute value of the sum of the contact potentials with its neighbors (19).

(iii) Prediction based on the empirical model: finally, the empirical model [presented in Tuncbag et al. (19)] is applied to the residue to determine whether it is a computational hot spot. If the relative accessibility of an individual interface residue is ≤20% and its total contact potential is ≥18.0, it is labeled as a hot spot (19).
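The three steps can be illustrated with a short Python sketch. This is not the HotPoint source code, only a minimal reading of the description above under stated assumptions: the van der Waals radii are Bondi values (the radii used by the server are not given in the text), the sequence-separation constraint is applied only to residues in the same chain, all function and class names are hypothetical, and the pair potentials and coordinates are toy values rather than the matrix of Keskin et al. (25).

```python
from dataclasses import dataclass
from math import dist

# Assumed Bondi van der Waals radii in Å; the server's own radii are not stated.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def atoms_in_contact(elem_a, xyz_a, elem_b, xyz_b, tolerance=0.5):
    """Default interface rule: distance < sum of vdW radii + 0.5 Å tolerance."""
    return dist(xyz_a, xyz_b) < VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance


def relative_asa(asa, max_asa):
    """Relative accessibility in percent: ASA divided by the residue's maximum ASA."""
    return 100.0 * asa / max_asa


@dataclass(frozen=True)
class Residue:
    chain: str
    number: int
    name: str        # one-letter amino acid code
    centroid: tuple  # side chain center of mass (x, y, z) in Å


def total_contact_potential(target, residues, potentials, cutoff=7.0, min_sep=4):
    """Absolute value of the summed pair potentials between target and its neighbors."""
    total = 0.0
    for other in residues:
        if other is target:
            continue
        # Assumption: |i - j| >= 4 is only meaningful within one chain.
        if other.chain == target.chain and abs(other.number - target.number) < min_sep:
            continue
        if dist(target.centroid, other.centroid) < cutoff:
            # The matrix is symmetric, so pairs are stored under a sorted key.
            total += potentials[tuple(sorted((target.name, other.name)))]
    return abs(total)


def is_hot_spot(rel_asa_complex, total_potential, asa_max=20.0, potential_min=18.0):
    """Empirical rule: buried (relative ASA <= 20%) and total potential >= 18.0."""
    return rel_asa_complex <= asa_max and total_potential >= potential_min


# Toy pair potentials in RT units (illustrative values only).
toy_potentials = {("F", "I"): -5.0, ("F", "L"): -5.0, ("F", "W"): -5.0,
                  ("F", "Y"): -5.0, ("A", "F"): -1.0}
target = Residue("A", 100, "F", (0.0, 0.0, 0.0))
residues = [
    target,
    Residue("A", 102, "A", (3.0, 0.0, 0.0)),   # too close in sequence, skipped
    Residue("A", 130, "L", (0.0, 5.0, 0.0)),
    Residue("B", 10, "W", (0.0, 0.0, 6.0)),
    Residue("B", 11, "I", (4.0, 4.0, 0.0)),
    Residue("B", 30, "Y", (-5.0, 0.0, 0.0)),
    Residue("B", 90, "L", (20.0, 0.0, 0.0)),   # beyond the 7.0 Å cutoff, skipped
]
potential = total_contact_potential(target, residues, toy_potentials)
rel_asa = relative_asa(25.0, 200.0)  # toy ASA values in Å^2
print(potential, rel_asa, is_hot_spot(rel_asa, potential))
```

In this toy example the target residue has four qualifying neighbors, so its total contact potential is 20.0; combined with a relative ASA of 12.5% it satisfies both thresholds and would be labeled a hot spot.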

Output

During processing, the server informs users about the steps it is performing. The output of the server is a table of the interface residues with their features (Figure 1). The interface residues are tabulated with chain names, one-letter residue names, residue numbers, relative ASA in complex, relative ASA in monomer and total pair potentials. In the last column of the table, the prediction is presented as H (hot spot) or NH (non-hot spot). Predicted hot spots are highlighted with a red background. The prediction results (as a text file) and the interface residue coordinates (in PDB file format) can also be downloaded, so the results can be viewed in any visualization tool. Besides the downloadable files, the overall complex, the interface residues and the hot spots can be visualized interactively using the Jmol (26) applet window in the HotPoint server.

Figure 1.

The output page of HotPoint for the p53 DNA binding domain/53BP2 protein complex (pdb:1ycs, chains A and B). Interface residues of this complex are shown in the table with hot spot predictions: (1) the coordinates of interface residues can be downloaded; (2) hot spot prediction results are also downloadable; and (3) the interface with predicted hot spots can be visualized by Jmol.

An independent case study: Interleukin-2 and its receptor complex

Interleukin-2 (IL-2) is a cytokine, a signaling molecule of the immune system. IL-2 becomes functional when it associates with the IL-2 receptor. To find the residues necessary for binding, several residues on IL-2 (K35, R38, M39, T41, F42, K43, F44, Y45, E62, P65, V69 and L72) have been mutated to alanine. Among these, mutations of F42, Y45 and E62 reduce the binding affinity of IL-2 for its receptor more than 100-fold. Further, the small inhibitor molecule SP4206 also targets these hot spots of the receptor (27). HotPoint predicts all three experimental hot spots (F42, Y45 and E62) correctly for the IL-2/IL-2 receptor complex (PDB code: 1z92; chain A is IL-2 and chain B is the IL-2 receptor). According to our interface definition, M39 is not among the interface residues. Of the remaining eight residues, HotPoint correctly labels five (K35, R38, T41, K43 and P65) as non-hot spots. However, three residues (F44, V69 and L72) are false positives in the HotPoint prediction. As a result, 8 out of 11 alanine mutations are correctly predicted. This protein complex is independent of the training and test sets. The predictions are illustrated in Figure 2 in 3D using the output files obtained from HotPoint.
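The tally in this case study can be reproduced with a few lines of Python. The residue labels and their classifications are taken directly from the paragraph above; the variable names are illustrative only.

```python
# Residues on IL-2 that were mutated to alanine, as listed in the text.
mutated = ["K35", "R38", "M39", "T41", "F42", "K43",
           "F44", "Y45", "E62", "P65", "V69", "L72"]
experimental_hot = {"F42", "Y45", "E62"}
outside_interface = {"M39"}  # not an interface residue under the default definition
predicted_hot = {"F42", "Y45", "E62", "F44", "V69", "L72"}

# Only residues found in the interface can be evaluated.
evaluated = [r for r in mutated if r not in outside_interface]

true_pos = [r for r in evaluated if r in predicted_hot and r in experimental_hot]
false_pos = [r for r in evaluated if r in predicted_hot and r not in experimental_hot]
true_neg = [r for r in evaluated if r not in predicted_hot and r not in experimental_hot]
false_neg = [r for r in evaluated if r not in predicted_hot and r in experimental_hot]

correct = len(true_pos) + len(true_neg)
print(f"{correct} of {len(evaluated)} mutations correctly predicted")
print("false positives:", ", ".join(false_pos))
```

The counts come out as three true positives, five true negatives, three false positives and no false negatives, which matches the 8 out of 11 stated in the text.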

Figure 2.

IL-2/IL-2 receptor complex. The PDB code for this complex is 1z92. The red-colored residues are correctly predicted hot spots. The blue-colored ones are correctly predicted non-hot spots. The yellow-colored residues represent non-hot spot residues that are incorrectly predicted as hot spots.

CONCLUSIONS

A small subset of residues in protein interfaces, namely the hot spots, accounts for a large portion of the binding free energy. We present the HotPoint server, which determines computational hot spots in protein interfaces based on solvent accessibility and pair potentials and allows online calculation for any protein interface within practical running times. Further, the model outperforms other existing approaches. The server tabulates residue-level features and prediction results for a given protein complex, and these are also downloadable. We hope that, with its simple architecture and visualization tool, HotPoint will be useful to both experimentalists and computational scientists working on protein recognition, modeling of protein complexes and drug design.

FUNDING

TUBITAK (Research Grant No 109T343 and 109E207). TUBITAK fellowship (to N.T.). Funding for open access charge: Koc University.

Conflict of interest statement. None declared.

26 in total

1. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations.

Authors: Raphael Guerois; Jens Erik Nielsen; Luis Serrano
Journal: J Mol Biol Date: 2002-07-05 Impact factor: 5.469

2. The binding interface database (BID): a compilation of amino acid hot spots in protein interfaces.

Authors: T B Fischer; K V Arunachalam; D Bailey; V Mangual; S Bakhru; R Russo; D Huang; M Paczkowski; V Lalchandani; C Ramachandra; B Ellison; S Galer; J Shapley; E Fuentes; J Tsai
Journal: Bioinformatics Date: 2003-07-22 Impact factor: 6.937

Review 3. Targeting protein-protein interactions with small molecules: challenges and perspectives for computational binding epitope detection and ligand finding.

Authors: Domingo González-Ruiz; Holger Gohlke
Journal: Curr Med Chem Date: 2006 Impact factor: 4.530

Review 4. Principles of protein-protein interactions: what are the preferred ways for proteins to interact?

Authors: Ozlem Keskin; Attila Gursoy; Buyong Ma; Ruth Nussinov
Journal: Chem Rev Date: 2008-03-21 Impact factor: 60.622

Review 5. Anatomy of hot spots in protein interfaces.

Authors: A A Bogan; K S Thorn
Journal: J Mol Biol Date: 1998-07-03 Impact factor: 5.469

6. ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions.

Authors: K S Thorn; A A Bogan
Journal: Bioinformatics Date: 2001-03 Impact factor: 6.937

7. Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy.

Authors: Nurcan Tuncbag; Attila Gursoy; Ozlem Keskin
Journal: Bioinformatics Date: 2009-04-08 Impact factor: 6.937

8. Empirical estimation of the energetic contribution of individual interface residues in structures of protein-protein complexes.

Authors: Mainak Guharoy; Pinak Chakrabarti
Journal: J Comput Aided Mol Des Date: 2009-05-29 Impact factor: 3.686

9. PCRPi: Presaging Critical Residues in Protein interfaces, a new computational tool to chart hot spots in protein interfaces.

Authors: Salam A Assi; Tomoyuki Tanaka; Terence H Rabbitts; Narcis Fernandez-Fuentes
Journal: Nucleic Acids Res Date: 2009-12-11 Impact factor: 16.971

10. A feature-based approach to modeling protein-protein interaction hot spots.

Authors: Kyu-il Cho; Dongsup Kim; Doheon Lee
Journal: Nucleic Acids Res Date: 2009-03-09 Impact factor: 16.971
