
PRUNE and PROBE--two modular web services for protein-protein docking.

Pralay Mitra, Debnath Pal.

Abstract

Protein-protein docking programs typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. Although the tasks can be integrated or performed in a serial order, they are by nature modular, allowing an opportunity to substitute one algorithm with another. We have implemented two modular web services, (i) PRUNE: to select a subset of docking poses generated during sampling search (http://pallab.serc.iisc.ernet.in/prune) and (ii) PROBE: to refine, score and rank them (http://pallab.serc.iisc.ernet.in/probe). The former uses a new interface-area-based edge-scoring function to eliminate >95% of the poses generated during docking search. In contrast to other multi-parameter-based screening functions, this single-parameter-based elimination reduces the computational time significantly, in addition to increasing the chances of selecting native-like models in the top rank list. The PROBE server ranks the pruned poses after structure refinement and scoring, using a regression model for geometric compatibility and normalized interaction energy. While web services similar to PROBE are infrequent, no web service akin to PRUNE has been described before. Both servers are publicly accessible and free to use.

Year:  2011        PMID: 21576226      PMCID: PMC3125751          DOI: 10.1093/nar/gkr317

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Docking provides a mechanistic understanding of protein–protein interaction, giving researchers fundamental insight. The field of docking has received increasing attention due to its primary role in studying protein interaction networks. However, in the absence of experimental or evolutionary information, protein docking is difficult because of the low free energy of biological complexes in general, as well as the computational time needed to execute the docking scheme and arrive at putative answers. A number of docking methods have been developed over the past three decades (1). Some of them have also been implemented as free web services (2–10). All docking methodologies typically perform four major tasks: (i) generation of docking poses, (ii) selection of a subset of poses, (iii) their structural refinement and (iv) scoring and ranking for the final assessment of the true quaternary structure. These tasks can be integrated or performed in a serial order; however, they are by nature modular, allowing an opportunity to substitute one algorithm with another. This allows us to take the best components from one method and use their output as input to another, giving a significant improvement in performance, both in time taken and in accuracy. This is a different paradigm from that of the meta servers, which take output from different servers to provide a filtered final output; here we combine components from various methods to improve the final outcome. One of the tasks that make protein–protein docking unwieldy for common use is the generation of a large number of docking poses. Although attempts have been made to use directed search (11) or pseudo-random methods (12) to reduce the search space, the structural refinement of poses and the integrated or edge-scoring strategy can be further improved.
For example, multi-parametric scoring functions require considerable computing time, and reducing the parametric space without losing efficacy can be a significant advantage. We have recently shown that a single-parameter edge-scoring function can select a subset of poses from a large pool of unrefined docking poses arising out of an exhaustive sampling search (13). Interface area (IA) alone is sufficient to screen the poses and can be used as a simple rule-based edge-scoring function, on the proposition that native-like poses must have the largest or near-largest interface areas. This edge-scoring function can be integrated into any docking search scheme to eliminate >95% of the poses generated during the docking search. Our single-parameter-based pruning technique reduces the computational time significantly, yet increases the chances of selecting native-like models in the top rank list. This unique method, named PRUNE, has been benchmarked (13) against the PatchDock (6) and FireDock (14) programs, two state-of-the-art methods widely used for protein–protein docking. In this article, we provide additional results by benchmarking against docking search inputs from FTDock (15), ZDock (8) and Gramm-X (5) as well. To extend the advantages of the modular design of PRUNE, we have developed another modular method, protein binding evaluation (PROBE), which works on a subset of docking poses by optimizing their side-chain contacts and then scoring and ranking them using a regression model for geometric compatibility [based on two highly correlated geometric parameters (16)] and normalized interaction energy (calculated from correlated non-bonded and solvation energies). This simple scoring function has been benchmarked against state-of-the-art predictive docking methods and has been shown to perform either equally well or much better than its more sophisticated counterparts (17).
We provide an additional comparison of PROBE results on a test data set against the top two successful servers in the recent CAPRI evaluations (18). The kind of service provided by PROBE is infrequent: FireDock (3) and FiberDock (10) are the only two sister web services we are aware of that allow upload of a subset of docking poses for ranking, although a number of other software packages exist for re-ranking docking poses on a local machine (14,19–22). Here we describe the web servers that we have developed using the PRUNE and PROBE algorithms. We briefly discuss their methods and describe their modular usage. These are further elaborated in the Help pages and Tutorials at the web site. Execution usually takes a few hours. The job status is available in the browser and notifications are sent by email if an email address is provided on the job submission page. The servers PRUNE (http://pallab.serc.iisc.ernet.in/prune) and PROBE (http://pallab.serc.iisc.ernet.in/probe) are publicly available and free to use.

PRUNE: A RULE-BASED METHOD TO SELECT SUBSET OF DOCKING POSES CONTAINING NATIVE-LIKE MODELS

The output of a docking sampling search is a set of new docking poses, each described by the transformation matrix applied to the initial coordinates of the docking partners. In PRUNE, we only calculate the IA of the unrefined docking poses generated between the reference (static) and the mobile molecule (13). We define IA as the accessible surface area (ASA) of the individual subunits buried on complex formation. For IA, the ASA of all the atoms in the individual subunits and in the complex is first calculated as per the definition of Lee and Richards (23), using a default probe radius of 1.4 Å. An atom is thereafter defined to be an interface atom if it loses >0.1 Å² of its ASA upon complex formation. The summed loss of ASA over all the interface atoms, divided by 2, is the IA for the dimer. After computing the IA of all docking poses, a histogram of 50 bins based upon IA is drawn. A sixth-order polynomial [f(x)] is fitted to this histogram, and the saddle points of f(x) are determined. Proceeding towards increasing absolute values of x for the polynomial f(x), the first two saddle points (I1 and I2) are chosen, where I1 is the saddle point corresponding to the highest frequency value in the histogram and I2 is the immediate next saddle point. The bin number at which the straight line joining I1 and I2 intersects the X-axis is chosen as the cut-off point (Cp). If I2 cannot be computed, then Cp is chosen as double of I1. The poses belonging to bin numbers ≥Cp are selected as the subset of poses suitable for final scoring and ranking. The final subset of docking poses thus has the largest and near-largest interface areas, chosen using the dynamically computed parameter Cp. The underlying rule as described above and used in the PRUNE server has been extensively tested and benchmarked (13). Briefly, the tests were done on 922 bound and 77 unbound binary docking targets covering an interface area range of 193–7658 Å².
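The pruning rule described above can be illustrated with a short Python sketch. It is not the server's code, and it rests on stated assumptions: the "saddle points" are read here as stationary points of the fitted polynomial (roots of f'(x) inside the histogram range), the two points are called I1 and I2, and all function names and the NumPy-based fitting are hypothetical choices made for illustration. Per-atom ASA values are assumed to come from an external program.

```python
import numpy as np
from numpy.polynomial import Polynomial


def interface_area(asa_free, asa_complex, min_loss=0.1):
    """IA of a dimer: half the summed ASA loss of atoms losing > min_loss A^2."""
    loss = np.asarray(asa_free, dtype=float) - np.asarray(asa_complex, dtype=float)
    return float(loss[loss > min_loss].sum() / 2.0)


def cutoff_from_points(i1, f1, i2=None, f2=None):
    """x-intercept of the line through (i1, f1) and (i2, f2); 2 * i1 if I2 is unusable."""
    if i2 is None or f2 is None or f2 == f1:
        return 2.0 * i1
    return i1 - f1 * (i2 - i1) / (f2 - f1)


def prune_cutoff(ia_values, n_bins=50, order=6):
    """Fit a polynomial to the IA histogram and return the cut-off bin number Cp."""
    counts, _ = np.histogram(ia_values, bins=n_bins)
    bins = np.arange(n_bins, dtype=float)
    poly = Polynomial.fit(bins, counts, order)
    # Assumption: "saddle points" are taken as real roots of f'(x) within the bins.
    roots = poly.deriv().roots()
    real = roots[np.abs(roots.imag) < 1e-9].real
    stationary = np.sort(real[(real >= 0) & (real <= n_bins - 1)])
    if stationary.size == 0:
        raise ValueError("no stationary point of f(x) inside the histogram range")
    i1 = stationary[np.argmax(poly(stationary))]  # point with the highest frequency
    later = stationary[stationary > i1]           # candidates for I2
    if later.size:
        return float(cutoff_from_points(i1, poly(i1), later[0], poly(later[0])))
    return float(cutoff_from_points(i1, poly(i1)))


def select_poses(ia_values, cp, n_bins=50):
    """Indices of poses whose IA falls in a histogram bin numbered >= cp."""
    _, edges = np.histogram(ia_values, bins=n_bins)
    bin_number = np.digitize(ia_values, edges[1:-1])
    return np.nonzero(bin_number >= cp)[0]
```

With a list of IA values for all poses, `select_poses(ia, prune_cutoff(ia))` would return the indices of the retained poses under these assumptions.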
The unbound data set was further divided into three categories: rigid body, medium and difficult, as per the definition of Mintseris et al. (24). Sampling at 12° rotation for the bound cases returned at least one near-native docking pose in the pruned subset in 91% of cases. A near-native pose was defined using the ‘acceptable’ (10 Å LRMSD) criterion of Mendez et al. (25). Sampling at 12° rotation for the unbound data set returned near-native poses in only 68% of cases. We therefore sampled the data set using 9° rotation, which improved the success rate to 83%. This suggests that the chance of locating a near-native pose in the pruned subset increases as the rotation step size is lowered. Comparative tests on rank improvement by FireDock scoring on the subsets of top 1000 poses selected by the PatchDock program and by our method showed that, for the 61 unbound rigid-body docking cases, our subset gave an improved rank in 2-fold more cases than the PatchDock input; for the 10 medium-category cases the comparative improvement was in ∼3-fold more cases; and in the six difficult-category cases there were no winners. The ability of the FireDock program to return a correct result within the top 10 ranks increased at least 2-fold when the subset of poses obtained by our method was used, compared with using all the docking poses available from the sampling search.

PRUNE SERVER: INPUT, OUTPUT AND USER INTERFACE

Input and output

The PRUNE server requires four inputs: (i) a receptor coordinate file (Protein Data Bank, PDB format), (ii) a ligand coordinate file (PDB format), (iii) a file containing the transformation matrices for generating docking poses and (iv) the source format of the transformation-matrix file. Currently, four file formats for transformation matrices are supported by the server: FTDock (15), ZDock (8), PatchDock (6) and Gramm-X (5). All the inputs are mandatory, as they are used to recreate the three-dimensional representation of the docking poses and compute their IA. A typical example executed on a 2.3 GHz single-CPU workstation for pruning 10 000 poses takes 5 min for generation of the protein-complex coordinates and another 76 min for pruning based on interface area, when the two molecules are of chain length 294 and 76. IA is computed using the NACCESS program (26), which takes 1–2 s per protein molecule depending on the size of the complex. The result page outputs the transformation matrices of only those poses that lie beyond the cut-off point (Cp) of the IA distribution, in the same format as the input. The user may download the information on the pruned poses and use it for ranking with any software of their choice. Alternatively, a single-button-click option is provided on the PRUNE result page to score and rank the pruned poses with our scoring method by forwarding them to the PROBE server. Example sizes of the pruned subsets and the percentages of near-native poses obtained from the PRUNE server are shown in Table 1. These can be compared against the total number of near-native poses available from the docking search. As can be seen, for all the targets in the test set, a small subset of the total number of poses is selected, and it contains at least one near-native pose.
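Recreating a docking pose from a transformation can be sketched as follows. This is an illustration under assumptions, not the server's implementation: each docking program stores its transformations in its own file format, and parsing those formats is not shown. The sketch assumes the transformation has already been reduced to a 3×3 rotation matrix followed by a translation vector, and the function names are hypothetical.

```python
import math


def rotation_about_z(angle_deg):
    """3x3 rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_transform(coords, rotation, translation):
    """Rotate each (x, y, z) point, then shift it by the translation vector."""
    moved = []
    for point in coords:
        moved.append(tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved
```

Applying `apply_transform` to the ligand (mobile) coordinates while leaving the receptor (static) coordinates unchanged would give one pose per transformation, whose IA can then be computed.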
Table 1.

Performance of PRUNE showing the percentage of poses selected and the near natives isolated when a list of docking poses derived from four different docking methods are input

Columns, left to right: Target (PDB_Chains; IA in Å²); Docking (Partner 1 PDB_Chain; Partner 2 PDB_Chain); Total number of near native poses generated (FTDock (b); ZDock (c); PatchDock (d); Gramm-X (e)); Percentage cases after pruning, given as Poses and Near native for each of FTDock (b), ZDock (c), PatchDock (d) and Gramm-X (e).
1J2J_ABf6051O3Y_A1OXZ_A3385022183.4200.610.40.9400.660
1EAW_ABg7451EAX_A9PTI_A4152725702.2014.62.252.11.17014.1417.1
Capri 88532331702.5500.469.10.3611.81.89X
1Z0K_ABf8932BME_A1YZM_A68110524125.6010.30.270.10.7905.648.3
1CLV_AIh10421JAE_A1QFD_A60302040413.6655.00.847.74.4780.03.26100
1GPW_ABf10491THF_D1K9V_F2420933262.2045.83.5277.54.8890.93.2138.5
Capri 6108619188401.0026.33.4155.94.301003.09X
1UGH_EIi10962SSP_E1UDI_I3066632112.2440.02.2236.02.3331.30.430
Capri 26116622641833.3027.30.561.60.5600.500
2G77_ABh12621FKM_A1Z06_A204002682.7230.02.7813.32.0730.82.0825.0
1DFJ_EIg12919RSA_B2BNH_A4178012.9367.06.471002.37X2.53100
1KXP_ADg16711IJJ_B1KW2_B3151002.311004.3386.11.07X3.09X

(a) Near natives are defined as per the ‘acceptable’ criteria of Mendez et al. (25).

(b) 12° rotation: 27 720 total poses.

(c) 6° rotation: 54 000 total poses.

(d) Using default parameters at PatchDock web server: http://bioinfo3d.cs.tau.ac.il/PatchDock/.

(e) 10° rotation: 10 000 total poses.

(f) Ref. (29).

(g) Ref. (24).

(h) Ref. (28).

(i) Ref. (27).

X means that there were no near native pose in the total set of poses generated, so no near natives could be pruned.


PROBE FOR SCORING AND RANKING DOCKING POSES

The subset of docking poses available after the use of an edge- or integrated-scoring function during a docking search is suitable for use with the PROBE algorithm (17). The subset of docking poses received by PROBE is structurally refined using the Restricted Side-Chain Optimization (RISCO) algorithm, which includes a step of Monte Carlo-based rigid-body optimization (14). However, since a geometry optimization does not guarantee that all steric clashes are eliminated, we use a pseudo-scaling procedure to eliminate spurious estimates of energy/ASA values. It may be noted that both van der Waals and Coulomb interactions make significant contributions at closer than canonical contact distances, and the pseudo-scaling procedure eliminates these potential errors. The same is true when ASA is used for estimation of solvation energy. The final scoring in PROBE is based on four physicochemical parameters: interface packing (IP) (16), surface complementarity (SC) (16), pseudo-scaled non-bonding energy (NE) and solvation energy (SE). The SC and IP values of the docking pose are used to calculate a surface complementarity/interface packing distance (SP) from the expected SC and IP values representing a geometric model, derived from the linear regression curve fitted to SC and IP values from a non-redundant data set of interfaces (mathematically, SP = |SC − IP × 0.6547 − 0.1495|). Poses that are alike, within ≤1.0 Å root-mean-square deviation (RMSD) and with a difference in SP of <0.04, are grouped together. Representative poses for each group are chosen and group-based SP, NE and SE values are calculated. The NE and SE values are mapped onto a normalized grid and used together with SP in a scoring function to compute the scores. The scores are sorted to compute the ranks of the representative poses. Table 2 shows the performance of PRUNE+PROBE on 12 arbitrarily chosen unbound targets: three from different CAPRI targets (http://www.ebi.ac.uk/msd-srv/capri/), one from Bernauer et al. (27), two from the recently published Benchmark 4.0 (28) and the rest from Benchmark 2.0 (24) and 3.0 (29). The P-values of the poses, calculated using the hypergeometric distribution, indicate a high confidence in the top-ranked predictions (30). Comparison shows that the overall ranking performance of ClusPro (31) is better, but PROBE is able to give better results in cases where ClusPro underperforms or fails. Interestingly, the PROBE performance competes strongly with FiberDock, despite the use of a much simpler scoring function and methodology.
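The SP distance and the grouping step can be illustrated with a minimal Python sketch. Only the SP formula and the two thresholds (RMSD ≤1.0 Å, SP difference <0.04) come from the text. The greedy first-fit grouping, the choice of the first member as representative, the RMSD computed directly on ligand coordinates in a common receptor frame, and all names are assumptions made for illustration.

```python
import math


def sp_distance(sc, ip):
    """SP = |SC - IP * 0.6547 - 0.1495|, the distance from the regression model."""
    return abs(sc - ip * 0.6547 - 0.1495)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def group_poses(poses, rmsd_cut=1.0, sp_cut=0.04):
    """Greedy grouping: a pose joins the first group whose representative is
    within both cut-offs; otherwise it starts a new group (assumed strategy)."""
    groups = []
    for index, pose in enumerate(poses):
        for group in groups:
            rep = poses[group[0]]  # first member acts as the representative
            if (abs(pose["sp"] - rep["sp"]) < sp_cut
                    and rmsd(pose["coords"], rep["coords"]) <= rmsd_cut):
                group.append(index)
                break
        else:
            groups.append([index])
    return groups
```

Each pose is represented here as a dictionary with hypothetical keys `"sp"` and `"coords"`; the NE and SE terms and the final scoring function are not reproduced because the text does not give their form.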
Table 2.

Comparative performance of PROBE on select unbound targets

Columns, left to right: Target PDB; Rank (b) from ClusPro (c), FiberDock (d) and PROBE (e); PROBE P-value.
1J2J14No result120.179
1EAW24110.010
Capri 8No resultNo result190.011
1Z0K122850.098
1CLV1820.109
1GPW2130.011
Capri 614510.003
1UGH22970.083
Capri 26122630.072
2G77319230.157
1DFJ2No result70.023
1KXP13010.003

(a) The PROBE scoring and ranking was done on the pruned subsets as depicted in Table 1.

(b) Rank of top ‘acceptable’ solution as per Mendez et al. (25).

(c) To dock the proteins, the coordinate files of the subunits were uploaded to the ClusPro web server: http://cluspro.bu.edu/.

(d) The docking partners were uploaded to the PatchDock web server for docking. The maximum permissible top 100 poses output by the PatchDock web server were scored and ranked using the FiberDock web server (http://bioinfo3d.cs.tau.ac.il/FiberDock/).

(e) The poses were generated locally by the ZDock program using 6° rotational sampling and pruned using the PRUNE server. The pruned subset of poses was scored and ranked using the PROBE server.
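The PROBE P-values in Table 2 are described only as being calculated from the hypergeometric distribution (30); the exact formulation is not given here. As a hedged illustration, one common formulation is the probability that a random ordering of the pruned poses places at least one near-native pose within the top r ranks. The function name and the example numbers below are hypothetical and are not taken from the tables.

```python
from math import comb


def p_value_top_rank(n_poses, n_near_native, rank):
    """P(at least one near-native pose within the top `rank` of a random ordering).

    Assumed formulation: 1 - C(N - K, r) / C(N, r), the hypergeometric
    probability of drawing no near-native pose in r draws, subtracted from 1.
    """
    if not 0 < rank <= n_poses:
        raise ValueError("rank must be between 1 and n_poses")
    if not 0 <= n_near_native <= n_poses:
        raise ValueError("n_near_native must be between 0 and n_poses")
    return 1.0 - comb(n_poses - n_near_native, rank) / comb(n_poses, rank)
```

Under this reading, a small P-value means that finding a near-native pose at that rank would be unlikely by chance alone.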


PROBE SERVER: INPUT, OUTPUT AND USER INTERFACE

The primary task of the PROBE server is to take a subset of docking poses and rank them after refining and scoring. The inputs are therefore designed to cater to two situations: (i) the user has a subset of poses generated by some docking search technique and wants to rank them, or (ii) the user has two unbound protein molecules and wants to dock them. For all uploaded coordinate files, a number of preprocessing steps are performed, such as checking for the minimum peptide length of 25, removal of solvent molecules such as water, resolving atoms with multiple occupancy, and conversion of atoms from HETATM MSE to ATOM MET (if the user wishes to do so) as per the PDB file format. If the first option is selected, the file with transformation-matrix information is additionally needed to recreate the docking poses. The format of the matrix file also needs to be specified using a drop-down box (currently supported formats are FTDock, ZDock, PatchDock and Gramm-X). For the second option, the server uses the FTDock module (15) to generate the docking poses. The default rotation sampling is set to 12° (the default of the FTDock module), but the user can choose other values, such as 9°, 15° and 18°. The time taken for docking-pose generation increases as the rotation step size is decreased. We therefore strongly recommend that users generate their own docking poses and upload them for scoring and ranking by PRUNE+PROBE. In any case, users wishing to use denser sampling (<9°) need to generate the docking poses locally on their own system. By default, the server returns the top 10 predictions, but the user may change this. The output of the PROBE server is a rank-sorted list of complexes with details of the individual parameters and the final PROBE score values. Hyperlinks are provided to download the coordinates of the receptor and ligand files. A summary of results in tabulated form is also provided (Figure 1).
The coordinates in these files are post-processed versions of the original PDB files uploaded by the user. Each predicted complex can also be visualized with the Jmol software (http://jmol.sourceforge.net/) through a Java applet, for which Java has to be installed on the user’s system. The input/output contents are similar to those of the FireDock and FiberDock servers, although derived from different scoring functions.
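The preprocessing steps listed for uploaded coordinate files can be sketched in Python as below. The server itself is stated to use Perl for this step, so this is only an illustration under assumptions: the function name is hypothetical, only ATOM/HETATM records are kept, the first alternate location of an atom is the one retained, residue length is counted over ATOM records, and renaming of the selenium atom in MSE is not attempted.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}


def preprocess_pdb(lines, convert_mse=True, min_length=25):
    """Return cleaned ATOM/HETATM lines from fixed-width PDB records."""
    kept, seen_atoms, residues = [], set(), set()
    for line in lines:
        record = line[:6]
        if record not in ("ATOM  ", "HETATM"):
            continue  # assumption: only coordinate records are retained
        res_name = line[17:20]
        if res_name in WATER_NAMES:
            continue  # drop solvent
        if convert_mse and record == "HETATM" and res_name == "MSE":
            # HETATM MSE -> ATOM MET (record type and residue name only)
            line = "ATOM  " + line[6:17] + "MET" + line[20:]
            record = "ATOM  "
        # Resolve multiple occupancy: keep the first copy of each atom.
        atom_key = (line[21], line[22:27], line[12:16])
        if atom_key in seen_atoms:
            continue
        seen_atoms.add(atom_key)
        line = line[:16] + " " + line[17:]  # clear the altLoc column
        kept.append(line)
        if record == "ATOM  ":
            residues.add((line[21], line[22:27]))
    if len(residues) < min_length:
        raise ValueError("peptide shorter than %d residues" % min_length)
    return kept
```

The column slices follow the fixed-width layout of PDB ATOM records (atom name in columns 13–16, altLoc in column 17, residue name in columns 18–20, chain in column 22, residue number and insertion code in columns 23–27).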
Figure 1.

The PROBE result page. (A) The table contains five physicochemical parameters (column second–sixth) as computed by the PROBE server at the protein–protein interface. These parameters are used to calculate the PROBE score to determine the ranks. First column which indicates the rank of the predicted complex is hyperlinked to allow download of coordinates and the button at last column is for Jmol-visualization of the complex. A compressed file containing all the predicted complexes can be downloaded as well. A summary table can be retrieved from hyperlink ‘PROBEresult’. (B) The visualization of the complex is available at the browser. The title indicates the rank of the complex.


IMPLEMENTATION

The server is implemented using C, Perl, Python, FORTRAN, HTML and PHP. The front end is written in PHP (version 5.2.9) and HTML. Perl has very strong string-handling features, so it is used for the initial preprocessing of the protein coordinate files. The core of the method is implemented in C, whose low-level features help to optimize and parallelize the code wherever needed. The parallel C code uses the Message Passing Interface (MPI), an application programming interface that allows many computers to interact with each other and thus distribute the load of the job. The computation takes place on a Linux cluster of 64-bit AMD Opteron systems. The pruning of docking poses for the PROBE and PRUNE servers is implemented in Python (version 2.5.2). The Python code first fits the sixth-order polynomial to the IA histogram generated by the C code and then computes the saddle points to draw a straight line. The cut-off point (Cp) on the X-axis is then returned to the C program for further computation. The IA calculations are done using FORTRAN code.

CONCLUSION

Two modular web services, PRUNE and PROBE, are described for protein–protein docking. The web services allow users to try out a variety of methods to improve their predictions. An advantage of such web services is that they can be continually upgraded to include new methods. PRUNE is a unique web service that uses a single-parameter, interface-area-based edge-scoring function that has been shown to select subsets of docking poses time-efficiently for improved scoring and ranking. PROBE, on the other hand, is similar to the previously described FireDock and FiberDock servers, but uses its own method for scoring and ranking a subset of docking poses. It can also work as a standalone docking server using the FTDock program’s docking-pose generation module. The method has been shown to score docking poses efficiently using its simple scoring function.

FUNDING

Funding for open access charge: Department of Biotechnology, New Delhi, Government of India. Conflict of interest statement. None declared.
  29 in total

1.  ClusPro: an automated docking and discrimination method for the prediction of protein complexes.

Authors:  Stephen R Comeau; David W Gatchell; Sandor Vajda; Carlos J Camacho
Journal:  Bioinformatics       Date:  2004-01-01       Impact factor: 6.937

2.  Assessment of blind predictions of protein-protein interactions: current status of docking methods.

Authors:  Raúl Méndez; Raphaël Leplae; Leonardo De Maria; Shoshana J Wodak
Journal:  Proteins       Date:  2003-07-01

3.  ProMate: a structure based prediction program to identify the location of protein-protein binding sites.

Authors:  Hani Neuvirth; Ran Raz; Gideon Schreiber
Journal:  J Mol Biol       Date:  2004-04-16       Impact factor: 5.469

4.  Using correlated parameters for improved ranking of protein-protein docking decoys.

Authors:  Pralay Mitra; Debnath Pal
Journal:  J Comput Chem       Date:  2010-10-12       Impact factor: 3.376

5.  Protein-Protein Docking Benchmark 2.0: an update.

Authors:  Julian Mintseris; Kevin Wiehe; Brian Pierce; Robert Anderson; Rong Chen; Joël Janin; Zhiping Weng
Journal:  Proteins       Date:  2005-08-01

6.  Automated docking of substrates to proteins by simulated annealing.

Authors:  D S Goodsell; A J Olson
Journal:  Proteins       Date:  1990

7.  Modelling protein docking using shape complementarity, electrostatics and biochemical information.

Authors:  H A Gabb; R M Jackson; M J Sternberg
Journal:  J Mol Biol       Date:  1997-09-12       Impact factor: 5.469

8.  The interpretation of protein structures: estimation of static accessibility.

Authors:  B Lee; F M Richards
Journal:  J Mol Biol       Date:  1971-02-14       Impact factor: 5.469

9.  A geometry-based suite of molecular docking processes.

Authors:  D Fischer; S L Lin; H L Wolfson; R Nussinov
Journal:  J Mol Biol       Date:  1995-04-28       Impact factor: 5.469

10.  PatchDock and SymmDock: servers for rigid and symmetric docking.

Authors:  Dina Schneidman-Duhovny; Yuval Inbar; Ruth Nussinov; Haim J Wolfson
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971
