Edward S C Shih¹, Ming-Jing Hwang².
Abstract
Protein-protein docking (PPD) predictions usually rely on a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used to compute PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, NPPD, in which the network has two types of nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and two nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters computing the dyadic and heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some form of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that combining conventional scoring functions with NPPD might significantly improve the accuracy of current PPD predictions.
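The network construction described in the abstract can be sketched as follows. This is a minimal illustration, not the NPPD implementation: it assumes one representative coordinate per residue, a hypothetical 6.5 Å contact cutoff, and a hypothetical hydrophobic residue set, and it does not distinguish intra- from intermolecular contacts. Dyadicity and heterophilicity follow the general definitions used for networks whose nodes carry a binary property (observed like-like or mixed edges divided by the count expected if the labels were placed at random); the exact NPPD parameters may differ.

```python
import math
from itertools import combinations

# Hypothetical hydrophobic residue set (assumption; NPPD's own classification may differ).
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO", "CYS"}
CONTACT_CUTOFF = 6.5  # hypothetical contact distance in angstroms


def build_network(residues, cutoff=CONTACT_CUTOFF):
    """Build a two-type residue contact network.

    residues: list of (residue_name, (x, y, z)) with one representative
    coordinate per residue. Returns (labels, edges): label 1 marks a
    hydrophobic node, 0 a hydrophilic node, and an edge (i, j) joins residues
    whose coordinates lie within `cutoff` of each other.
    """
    labels = [1 if name in HYDROPHOBIC else 0 for name, _ in residues]
    edges = [
        (i, j)
        for (i, (_, ci)), (j, (_, cj)) in combinations(enumerate(residues), 2)
        if math.dist(ci, cj) <= cutoff
    ]
    return labels, edges


def dyadicity_heterophilicity(labels, edges):
    """Dyadicity D = m11 / E[m11] and heterophilicity H = m10 / E[m10].

    m11 counts hydrophobic-hydrophobic edges and m10 mixed edges; E[...] is the
    count expected if the same labels were assigned to nodes at random.
    D > 1: hydrophobic nodes link to each other more often than by chance.
    H < 1: mixed hydrophobic-hydrophilic links are rarer than by chance.
    """
    n, n1, m = len(labels), sum(labels), len(edges)
    if n < 2 or m == 0:
        raise ValueError("need at least two nodes and one edge")
    p = 2 * m / (n * (n - 1))  # connectance: fraction of possible edges present
    m11 = sum(1 for i, j in edges if labels[i] == labels[j] == 1)
    m10 = sum(1 for i, j in edges if labels[i] != labels[j])
    expected_m11 = n1 * (n1 - 1) / 2 * p
    expected_m10 = n1 * (n - n1) * p
    d = m11 / expected_m11 if expected_m11 else math.nan
    h = m10 / expected_m10 if expected_m10 else math.nan
    return d, h


# Toy pose: four residues on a line, 4 angstroms apart (hypothetical coordinates).
toy = [("LEU", (0, 0, 0)), ("ILE", (4, 0, 0)), ("SER", (8, 0, 0)), ("LYS", (12, 0, 0))]
labels, edges = build_network(toy)
print(labels, edges)                             # [1, 1, 0, 0] [(0, 1), (1, 2), (2, 3)]
print(dyadicity_heterophilicity(labels, edges))  # (2.0, 0.5)
```

In the toy pose the single hydrophobic-hydrophobic contact is twice what random labeling would predict (D = 2.0), while mixed contacts occur at half the random rate (H = 0.5).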
Year: 2015 PMID: 25811640 PMCID: PMC4498300 DOI: 10.3390/biology4020282
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Procedures used to develop NPPD. (a) An example of an amino acid network and the network parameters used in this study for a docking pose; (b) Flowchart of the training and testing of a Bayesian network model of NPPD.
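The train/test scheme of Figure 1(b) can be illustrated with a swapped-in stand-in: a Gaussian naive Bayes classifier, which is the simplest Bayesian network (the class node is the sole parent of every attribute). The actual NPPD network structure and any attribute discretization are not described here, so this is only a sketch under that assumption. Training is done leave-one-complex-out: poses of every other complex train the model, which then ranks the poses of the held-out complex. The data layout and function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Fit class priors and per-attribute Gaussian means/variances.

    X: list of attribute vectors; y: list of labels (1 = good pose, 0 = bad).
    Both classes must be present in the training data.
    """
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        model[label] = (
            len(rows) / len(X),                                   # prior P(label)
            [fmean(col) for col in columns],                      # per-attribute mean
            [max(pvariance(col), var_floor) for col in columns],  # floored variance
        )
    return model


def log_odds(model, x):
    """log P(good | x) - log P(bad | x); higher means more native-like."""
    def log_joint(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for value, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        return total
    return log_joint(1) - log_joint(0)


def leave_one_complex_out(poses):
    """poses: dict complex_id -> list of (attribute_vector, label).

    For each complex, train on the poses of all other complexes, then rank the
    held-out complex's poses by log-odds. Returns complex_id -> ranked labels.
    """
    rankings = {}
    for held_out in poses:
        X = [x for cid, items in poses.items() if cid != held_out for x, _ in items]
        y = [t for cid, items in poses.items() if cid != held_out for _, t in items]
        model = fit_gaussian_nb(X, y)
        ranked = sorted(poses[held_out], key=lambda item: log_odds(model, item[0]), reverse=True)
        rankings[held_out] = [label for _, label in ranked]
    return rankings


# Hypothetical one-attribute toy data: good poses have larger attribute values.
toy = {
    "c1": [([0.9], 1), ([0.1], 0), ([0.2], 0)],
    "c2": [([1.1], 1), ([0.0], 0), ([0.3], 0)],
    "c3": [([1.0], 1), ([0.2], 0), ([-0.1], 0)],
}
print(leave_one_complex_out(toy))  # {'c1': [1, 0, 0], 'c2': [1, 0, 0], 'c3': [1, 0, 0]}
```

Holding out whole complexes, rather than individual poses, keeps near-duplicate poses of the same complex from leaking between training and test sets.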
Figure 2. TopN success rates for NPPD, ZDOCK, and IRAD on the benchmark dataset of unbound docking poses for 176 protein complexes. A pose with IRMSD < 2.5 Å was considered good (near-correct). The success rates of ZDOCK and IRAD were obtained from the ZDOCK website (http://zlab.umassmed.edu/zdock/perf_decoys.shtml).
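A TopN success rate is commonly computed as the fraction of benchmark complexes that have at least one good pose among their N best-scored poses. The sketch below assumes that definition, the IRMSD < 2.5 Å criterion from the caption, and a hypothetical input layout (per-complex IRMSD values already sorted by score); it also returns the set of successfully predicted complexes, which is what the set operations in Table 1 act on.

```python
def successful_complexes(ranked_irmsd, n, cutoff=2.5):
    """Complexes with at least one good pose (IRMSD < cutoff, in angstroms) in the top n.

    ranked_irmsd: dict complex_id -> list of pose IRMSDs, ordered by the scoring
    function from best-scored to worst-scored pose.
    """
    return {
        cid for cid, irmsds in ranked_irmsd.items()
        if any(r < cutoff for r in irmsds[:n])
    }


def top_n_success_rate(ranked_irmsd, n, cutoff=2.5):
    """Fraction of benchmark complexes successfully predicted at TopN."""
    if not ranked_irmsd:
        raise ValueError("no complexes given")
    return len(successful_complexes(ranked_irmsd, n, cutoff)) / len(ranked_irmsd)


# Hypothetical IRMSDs (angstroms) of ranked poses for three complexes.
ranked = {"a": [3.0, 1.2, 8.0], "b": [0.9, 5.0], "c": [6.0, 7.0, 2.4]}
for n in (1, 2, 3):
    # Top1: only "b"; Top2 adds "a"; Top3 adds "c".
    print(n, sorted(successful_complexes(ranked, n)), top_n_success_rate(ranked, n))
```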
Table 1. Number of benchmark complexes successfully predicted by NPPD and/or IRAD at different TopN levels.
| Set | Top1 | Top10 | Top100 | Top1000 | Top2000 |
|---|---|---|---|---|---|
| NPPD (A) | 9 | 28 | 65 | 102 | 110 |
| IRAD (B) | 16 | 43 | 64 | 92 | 102 |
| Intersection (A∩B) | 3 | 15 | 44 | 80 | 95 |
| Union (A∪B) = a | 22 | 56 | 85 | 114 | 117 |
| Unique to NPPD or IRAD (A⊖B) = b | 19 | 41 | 41 | 34 | 22 |
| Complementarity = b/a | 86% | 73% | 48% | 30% | 19% |
⊖ (Symmetric difference): the set of elements in either of the sets and not in their intersection.
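The union, symmetric-difference, and complementarity rows of the table follow from the first three rows by inclusion-exclusion: a = |A| + |B| − |A∩B|, b = a − |A∩B|, and complementarity = b/a. The short sketch below recomputes them from the NPPD, IRAD, and intersection counts reported in the table; the function name is illustrative.

```python
def complementarity(n_a, n_b, n_both):
    """Return (|A ∪ B|, |A ⊖ B|, complementarity) from |A|, |B|, and |A ∩ B|."""
    union = n_a + n_b - n_both  # inclusion-exclusion
    sym_diff = union - n_both   # complexes found by exactly one method
    return union, sym_diff, sym_diff / union


# (NPPD, IRAD, intersection) counts from Table 1.
TABLE_1 = {
    "Top1": (9, 16, 3),
    "Top10": (28, 43, 15),
    "Top100": (65, 64, 44),
    "Top1000": (102, 92, 80),
    "Top2000": (110, 102, 95),
}

for top_n, counts in TABLE_1.items():
    a, b, c = complementarity(*counts)
    print(f"{top_n}: a={a}, b={b}, complementarity={c:.0%}")
# Top1: a=22, b=19, complementarity=86%
# Top10: a=56, b=41, complementarity=73%
# Top100: a=85, b=41, complementarity=48%
# Top1000: a=114, b=34, complementarity=30%
# Top2000: a=117, b=22, complementarity=19%
```

The falling complementarity at larger N reflects that both methods eventually find most complexes, so the intersection dominates the union.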
Conditions and Top1/Top10 success rates for NPPD and two other network-based scoring functions.
| | Pons | NPPD | Chang | NPPD |
|---|---|---|---|---|
| Generation of docking poses | FTDock [ | ZDOCK | RosettaDock 1.0 [ | ZDOCK |
| Number of poses generated | 10,000 | 10,000 | 1000 | 1000 |
| Criterion for a success hit | L-RMSD < 10 Å | L-RMSD < 10 Å | L-RMSD < 5 Å | L-RMSD < 5 Å |
| Top1 success rate * | 5.0% (7.0%) | 8.0% | 2.3% (25.6%) | 11.6% |
| Top10 success rate * | 10.6% (29.8%) | 18.5% | 23.2% (53.4%) | 25.6% |
* Values in parentheses are success rates obtained by combining the network parameters with the energy terms of the sampling method.
Figure 3. Benchmark results for NPPD and complementarity of NPPD with several of the best-performing PPD scoring functions. (a) The 20 best-performing PPD scoring functions, ordered from left to right by increasing Top10 success rate. All data except those for NPPD were taken from [67]. Note that the Top1, Top10, and Top100 success rates for each method, shown as the left, center, and right bar of each group, respectively, were computed using a set of unbound docking poses (~500 for each of 118 complexes) generated by SwarmDock [68], which differs from the ZDOCK-generated set used in Figure 2 and Table 1. The leave-one-out Bayesian model of NPPD was therefore derived from these SwarmDock poses, but otherwise followed the procedures described in Figure 1. The portions of the success rates contributed by high-, medium-, and acceptable-quality poses are shown in red, orange, and yellow, respectively, using the CAPRI criteria for these three quality levels [2]; (b) Complementarity between NPPD and each of 16 other best-performing PPD scoring functions. The blue, purple, and green bars indicate the complementarity, as defined in Table 1, computed from the Top1, Top10, and Top100 success rates, respectively. The horizontal blue, purple, and green lines show the average complementarity over all pairs of the 16 scoring functions for the Top1, Top10, and Top100 success rates, respectively. Three of the 19 scoring functions compared in (a) (SIPPER, PYDOCK_TOT, and PROPNSTS) were not included because their data were not made available to us. References for these 19 PPD scoring functions can be found in Reference [67] and references therein.
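The averaged complementarity lines in Figure 3(b) can be sketched by applying the Table 1 definition, |A ⊖ B| / |A ∪ B|, to every unordered pair of scoring functions and taking the mean. The sketch assumes that, for a given TopN, each method is represented by the set of complexes it predicts successfully; the method names and sets below are hypothetical toy data.

```python
from itertools import combinations
from statistics import fmean


def pair_complementarity(a, b):
    """|A ⊖ B| / |A ∪ B| for two sets of successfully predicted complexes."""
    union = a | b
    # Design choice (assumption): two methods that predict nothing score 0.
    return len(a ^ b) / len(union) if union else 0.0


def mean_pairwise_complementarity(successes):
    """successes: dict method -> set of complex IDs; needs at least two methods.

    Averages the complementarity over all unordered pairs of methods.
    """
    return fmean(
        pair_complementarity(successes[m1], successes[m2])
        for m1, m2 in combinations(successes, 2)
    )


# Hypothetical Top1 success sets for three scoring functions.
toy = {"m1": {"1a", "1b"}, "m2": {"1b", "1c"}, "m3": {"1a", "1b", "1c"}}
print(mean_pairwise_complementarity(toy))  # mean of 2/3, 1/3, 1/3, i.e. 4/9
```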
Figure 4. Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered by increasing unbound/bound IRMSD, i.e., the best RMSD of the interface residues after superimposing the unbound form onto the bound form of the complex; the PDB ID of every 5th complex is indicated on the X-axis. The dashed line marks 300 positive poses. The top half of the figure shows the mean and standard deviation of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, as well as those computed for negative poses, showed a similarly random distribution [64].
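The "best RMSD" after superposition in the caption is the minimum RMSD over all rigid-body rotations and translations, which the Kabsch algorithm computes in closed form. The sketch below uses NumPy and assumes two coordinate arrays whose rows are already matched (for example, one atom per interface residue in the same order); which atoms and which interface definition the benchmark uses are not stated here and are assumptions.

```python
import numpy as np


def superposed_rmsd(mobile, target):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: remove translation by centering both sets, then find the
    RMSD-minimizing rotation from the SVD of their 3x3 covariance matrix.
    Row i of `mobile` must correspond to row i of `target`.
    """
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    if p.ndim != 2 or p.shape[1] != 3 or p.shape != q.shape:
        raise ValueError("expected two (N, 3) arrays of the same shape")
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q               # rotate mobile onto target
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Usage: a rotated and translated copy superposes back with RMSD close to 0.
rng = np.random.default_rng(0)
bound = rng.normal(size=(12, 3))
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
unbound = bound @ rz.T + np.array([1.0, -2.0, 3.0])
print(superposed_rmsd(unbound, bound))  # floating-point noise only
```

Forcing the last singular direction to sign d keeps the result a proper rotation, so a mirror image of the interface is never reported as a perfect match.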