Shide Liang, Song Liu, Chi Zhang, Yaoqi Zhou.
Abstract
Near-native selections from docking decoys have proved challenging, especially when unbound proteins are used in the molecular docking. One reason is that significant atomic clashes in docking decoys lead to poor predictions of the binding affinities of near-native decoys. Atomic clashes can be removed by structural refinement through energy minimization. Such energy minimization, however, leads to an unrealistic bias toward docked structures with large interfaces. Here, we extend an empirical energy function developed for protein design to protein-protein docking selection by introducing a simple reference state that removes the unrealistic dependence of the binding affinity of docking decoys on the buried solvent-accessible surface area of the interface. The energy function, called EMPIRE (EMpirical Protein-InteRaction Energy), when coupled with a refinement strategy, is found to provide a significantly improved success rate in near-native selections when applied to RosettaDock and refined ZDOCK docking decoys. Our work underlines the importance of removing nonspecific interactions from specific ones in near-native selections from docking decoys. (c) 2007 Wiley-Liss, Inc.
Year: 2007 PMID: 17623864 PMCID: PMC2673351 DOI: 10.1002/prot.21498
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
The Number of Top 5 Decoys with rmsd < 10 Å given by EMPIRE and the RosettaDock scoring function
| Pdb ID | 1CGI | 1CHO | 2PTC | 1TGS | 2SNI | 2SIC | 1CSE | 2KAI |
|---|---|---|---|---|---|---|---|---|
| EMPIRE | 1 | 5 | 4 | 5 | 5 | 5 | 5 | 5 |
| RosettaDock | 4 | 3 | 2 | 5 | 4 | 5 | 2 | 4 |
| Pdb ID | 1BRC | 1ACB | 1BRS | 1MAH | 1UGH | 1DFJ | 1FSS | 1AVW |
| EMPIRE | 5 | 3 | 4 | 5 | 5 | 5 | 3 | 5 |
| RosettaDock | 1 | 2 | 4 | 5 | 5 | 4 | 5 | 5 |
| Pdb ID | 1PPE | 1TAB | 1UDI | 1STF | 2TEC | 4HTC | 1MLC | 1WEJ |
| EMPIRE | 5 | 5 | 5 | 5 | 5 | 5 | 2 | 2 |
| RosettaDock | 5 | 5 | 5 | 5 | 5 | 5 | 0 | 0 |
| Pdb ID | 1AHW | 1DQJ | 1BVK | 1FBI | 2JEL | 1BQL | 1JHL | 1NCA |
| EMPIRE | 0 | 1 | 1 | 5 | 5 | 2 | 1 | 5 |
| RosettaDock | 5 | 2 | 5 | 3 | 5 | 5 | 1 | 5 |
| Pdb ID | 1NMB | 1MEL | 2VIR | 1EO8 | 1QFU | 1IAI | 2PCC | 1WQ1 |
| EMPIRE | 5 | 5 | 3 | 1 | 4 | 3 | 4 | 4 |
| RosettaDock | 5 | 5 | 4 | 1 | 5 | 0 | 3 | 3 |
| Pdb ID | 1AVZ | 1MDA | 1IGC | 1ATN | 1GLA | 1SPB | 2BTF | 1A0O |
| EMPIRE | 0 | 4 | 1 | 5 | 5 | 5 | 3 | 4 |
| RosettaDock | 0 | 3 | 2 | 5 | 1 | 5 | 4 | 1 |
| Pdb ID | 1BTH | 1FIN | 1FQ1 | 1GOT | 1EFU | 3HHR | #(≥3) | #(>) |
| EMPIRE | 0 | 0 | 4 | 5 | 2 | 2 | 39 | 21 |
| RosettaDock | 0 | 0 | 2 | 0 | 0 | 0 | 34 | 10 |
Pdb ID: enzyme/inhibitor, the first 22 protein complexes (1CGI to 4HTC); antibody-antigen, the next 16 protein complexes (1MLC to 1IAI); others, 2PCC to 1A0O; and the difficult set, 1BTH to 3HHR.
EMPIRE: this work.
RosettaDock: the high-resolution RosettaDock scoring function (refs. 41, 71).
#(≥3): the number of protein-protein complexes with at least 3 near-native structures (rmsd < 10 Å) among the top 5 ranked decoys.
#(>), EMPIRE row: the number of complexes for which EMPIRE gives more near-native structures than RosettaDock.
#(>), RosettaDock row: the number of complexes for which RosettaDock gives more near-native structures than EMPIRE.
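The two summary columns can be recomputed from the per-complex counts. The sketch below is only a consistency check on the table, not code from the paper: the list names and the `summarize` helper are made up for illustration, and the values are transcribed row by row from the table above.

```python
# Top-5 near-native counts, transcribed in table order (54 complexes).
EMPIRE = [
    1, 5, 4, 5, 5, 5, 5, 5,
    5, 3, 4, 5, 5, 5, 3, 5,
    5, 5, 5, 5, 5, 5, 2, 2,
    0, 1, 1, 5, 5, 2, 1, 5,
    5, 5, 3, 1, 4, 3, 4, 4,
    0, 4, 1, 5, 5, 5, 3, 4,
    0, 0, 4, 5, 2, 2,
]
ROSETTADOCK = [
    4, 3, 2, 5, 4, 5, 2, 4,
    1, 2, 4, 5, 5, 4, 5, 5,
    5, 5, 5, 5, 5, 5, 0, 0,
    5, 2, 5, 3, 5, 5, 1, 5,
    5, 5, 4, 1, 5, 0, 3, 3,
    0, 3, 2, 5, 1, 5, 4, 1,
    0, 0, 2, 0, 0, 0,
]


def summarize(own, other, threshold=3):
    """Return (#complexes with >= threshold near natives, #complexes where own > other)."""
    at_least = sum(1 for x in own if x >= threshold)
    wins = sum(1 for x, y in zip(own, other) if x > y)
    return at_least, wins


print("EMPIRE:", summarize(EMPIRE, ROSETTADOCK))
print("RosettaDock:", summarize(ROSETTADOCK, EMPIRE))
```

With a threshold of 3 (that is, at least 3 near-native decoys in the top 5), the tallies reproduce the #(≥3) and #(>) entries of the table.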
The Ranks and rmsd Values of Refined Structures in ZDOCK2.3 Decoy Sets
| Complex | No. of hits | Original: rank (rmsd) | Sidechain: rank (rmsd) | Minimization: rank (rmsd) |
|---|---|---|---|---|
| 1CGI | 77 | 107 (1.54) | 48 (2.02) | 1 (2.18) |
| 1CHO | 99 | 1 (1.26) | 1 (1.01) | 1 (1.57) |
| 2PTC | 48 | 8 (1.03) | 1 (0.44) | 1 (0.44) |
| 1TGS | 109 | 10 (2.46) | 4 (1.55) | 3 (1.85) |
| 2SNI | 1 | 425 (2.22) | 617 (2.22) | 92 (2.22) |
| 2SIC | 52 | 2 (2.06) | 3 (2.06) | 3 (1.04) |
| 1CSE | 29 | 1 (0.50) | 5 (1.10) | 4 (1.24) |
| 2KAI | 16 | 151 (2.30) | 3 (1.69) | 28 (1.69) |
| 1BRC | 54 | 21 (1.21) | 1 (1.73) | 1 (2.30) |
| 1ACB | 93 | 2 (1.44) | 14 (1.44) | 4 (0.93) |
| 1BRS | 21 | 20 (1.30) | 26 (1.97) | 15 (2.29) |
| 1MAH | 28 | 238 (1.78) | 104 (0.84) | 1 (0.89) |
| 1UGH | 20 | 1069 (1.60) | 66 (1.13) | 1 (1.60) |
| 1DFJ | 51 | 517 (2.38) | 1 (1.70) | 1 (1.70) |
| 1FSS | 15 | 54 (1.04) | 1 (1.07) | 2 (1.05) |
| 1AVW | 52 | 1 (1.89) | 12 (1.48) | 1 (1.53) |
| 1PPE | 393 | 1 (0.52) | 1 (1.46) | 1 (0.87) |
| 1TAB | 50 | 1 (0.51) | 1 (1.56) | 1 (1.56) |
| 1UDI | 35 | 12 (1.06) | 1 (0.94) | 1 (0.79) |
| 1STF | 83 | 1 (0.80) | 1 (1.42) | 1 (1.01) |
| 2TEC | 185 | 1 (0.68) | 1 (1.25) | 1 (0.92) |
| 4HTC | 57 | 45 (1.40) | 1 (0.69) | 1 (0.69) |
| 1MLC | 17 | 46 (2.46) | 395 (2.46) | 338 (2.46) |
| 1WEJ | 22 | 5 (0.91) | 12 (0.57) | 62 (0.57) |
| 1AHW | 67 | 25 (1.41) | 7 (1.75) | 4 (1.23) |
| 1DQJ | 0 | − (−) | − (−) | − (−) |
| 1BVK | 2 | 672 (2.34) | 450 (2.34) | 419 (2.34) |
| 1FBI | 5 | 1593 (2.18) | 534 (2.18) | 447 (2.18) |
| 2JEL | 35 | 598 (1.90) | 20 (1.16) | 1 (1.09) |
| 1BQL | 70 | 14 (0.68) | 11 (0.84) | 9 (0.84) |
| 1JHL | 12 | 121 (1.16) | 9 (1.16) | 50 (1.85) |
| 1NCA | 67 | 8 (1.51) | 56 (0.83) | 2 (1.93) |
| 1NMB | 9 | 1 (0.99) | 427 (0.99) | 337 (1.13) |
| 1MEL | 71 | 2 (1.36) | 3 (1.01) | 1 (1.07) |
| 2VIR | 3 | 79 (1.03) | 527 (1.03) | 521 (1.19) |
| 1EO8 | 2 | 55 (0.94) | 607 (0.94) | 72 (0.94) |
| 1QFU | 18 | 21 (0.75) | 92 (0.78) | 1 (0.78) |
| 1IAI | 3 | 52 (1.47) | 106 (1.47) | 429 (1.70) |
| 2PCC | 0 | − (−) | − (−) | − (−) |
| 1WQ1 | 54 | 121 (2.23) | 10 (1.88) | 9 (1.20) |
| 1AVZ | 0 | − (−) | − (−) | − (−) |
| 1MDA | 0 | − (−) | − (−) | − (−) |
| 1IGC | 3 | 141 (1.18) | 785 (1.20) | 227 (1.18) |
| 1ATN | 24 | 1 (0.56) | 1 (0.52) | 1 (0.80) |
| 1GLA | 0 | − (−) | − (−) | − (−) |
| 1SPB | 112 | 2 (0.61) | 1 (0.61) | 1 (0.95) |
| 2BTF | 35 | 1 (0.65) | 1 (1.02) | 1 (0.83) |
| 1A0O | 4 | 21 (2.45) | 13 (2.25) | 427 (2.45) |
| Top 1 (Top 10) | | 10 (18) | 14 (22) | 20 (29) |
Complex: enzyme/inhibitor, the first 22 protein complexes (1CGI to 4HTC); antibody-antigen, the next 16 protein complexes (1MLC to 1IAI). The rest are 10 other complexes.
No. of hits: the number of near-native structures (interface rmsd < 2.5 Å).
Rank (rmsd): the best rank among the hits, with the interface rmsd (Å) of that hit in parentheses.
Original: decoys without any refinement.
Sidechain: results after sidechain optimization.
Minimization: results after sidechain optimization and energy minimization.
Docking decoys from unbound and bound structures.
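The "Top 1 (Top 10)" summary row can be checked against the per-complex ranks. The following is a minimal sketch, not the authors' code: `BEST_RANKS` and `top_n_counts` are hypothetical names, and the ranks are transcribed from the table above, leaving out the five complexes with no hits.

```python
# (complex, best hit rank in Original, Sidechain, Minimization decoys).
# Complexes with no hits (1DQJ, 2PCC, 1AVZ, 1MDA, 1GLA) are omitted.
BEST_RANKS = [
    ("1CGI", 107, 48, 1), ("1CHO", 1, 1, 1), ("2PTC", 8, 1, 1),
    ("1TGS", 10, 4, 3), ("2SNI", 425, 617, 92), ("2SIC", 2, 3, 3),
    ("1CSE", 1, 5, 4), ("2KAI", 151, 3, 28), ("1BRC", 21, 1, 1),
    ("1ACB", 2, 14, 4), ("1BRS", 20, 26, 15), ("1MAH", 238, 104, 1),
    ("1UGH", 1069, 66, 1), ("1DFJ", 517, 1, 1), ("1FSS", 54, 1, 2),
    ("1AVW", 1, 12, 1), ("1PPE", 1, 1, 1), ("1TAB", 1, 1, 1),
    ("1UDI", 12, 1, 1), ("1STF", 1, 1, 1), ("2TEC", 1, 1, 1),
    ("4HTC", 45, 1, 1), ("1MLC", 46, 395, 338), ("1WEJ", 5, 12, 62),
    ("1AHW", 25, 7, 4), ("1BVK", 672, 450, 419), ("1FBI", 1593, 534, 447),
    ("2JEL", 598, 20, 1), ("1BQL", 14, 11, 9), ("1JHL", 121, 9, 50),
    ("1NCA", 8, 56, 2), ("1NMB", 1, 427, 337), ("1MEL", 2, 3, 1),
    ("2VIR", 79, 527, 521), ("1EO8", 55, 607, 72), ("1QFU", 21, 92, 1),
    ("1IAI", 52, 106, 429), ("1WQ1", 121, 10, 9), ("1IGC", 141, 785, 227),
    ("1ATN", 1, 1, 1), ("1SPB", 2, 1, 1), ("2BTF", 1, 1, 1),
    ("1A0O", 21, 13, 427),
]


def top_n_counts(rows, n):
    """Count, per refinement stage, the complexes whose best hit ranks within the top n."""
    return tuple(sum(1 for row in rows if row[col] <= n) for col in (1, 2, 3))


print("Top 1: ", top_n_counts(BEST_RANKS, 1))
print("Top 10:", top_n_counts(BEST_RANKS, 10))
```

The three values in each tuple correspond to the original, sidechain-optimized, and minimized decoys, in that order.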
Figure 1. The binding affinity as a function of rmsd (Å) for the original ZDOCK decoys (a), after sidechain optimization (b), and after 50 steps of minimization (c). Only the top 500 ranked decoys for each case are shown. This is the result for target 1DFJ.
Figure 2. The number of successful predictions with or without the reference state, as labeled, for the original ZDOCK decoys, after sidechain optimization, and after a further 50, 100, and 200 steps of minimization.
Figure 3. As in Figure 1, but comparing the binding affinity for EMPIRE with or without the reference state (target 1ATN). Only the top-ranked 500 decoys are shown.
Figure 4. The binding affinity as a function of the buried solvent-accessible surface area of the interface for EMPIRE with or without the reference state (target 1ATN). Only the top-ranked 500 decoys are shown. The solid line denotes the result of linear regression on the data given by EMPIRE without the reference state (correlation coefficient of −0.54). There is no correlation for the data given by EMPIRE (with the reference state).
Figure 5. As in Figure 1, but comparing the binding affinity for EMPIRE with or without the reference state (target 1GOT). Only the top-ranked 500 decoys are shown. The solid lines denote the results of linear regression on the data, with a correlation coefficient of 0.12 for EMPIRE without the reference state and 0.41 for EMPIRE.
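The correlation coefficients quoted in the captions of Figures 4 and 5 come from linear regression. As a rough illustration of how such a coefficient could be computed for a set of decoys, here is a minimal Pearson correlation sketch. The function name `pearson_r` and the four data points are placeholders invented for this example; they are not values from the paper, and the paper does not state how its coefficients were calculated.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder data (NOT from the paper): buried interface area and a score
# that decreases linearly with it, i.e. a score fully explained by interface size.
buried_area = [800.0, 1000.0, 1200.0, 1400.0]
score = [-8.0, -10.0, -12.0, -14.0]

r = pearson_r(buried_area, score)
print(f"r = {r:.2f}")
```

A strongly negative coefficient between score and buried area, as in this artificial case, is the kind of size dependence that the reference state is meant to remove. A coefficient near zero would indicate no such dependence.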