Jing Wang, Alexey Ishchenko, Wei Zhang, Asghar Razavi, David Langley.
Abstract
Although developing a general and accurate method for calculating binding free energies of protein-protein and protein-ligand interactions has been a continuous effort for decades, only limited success has been achieved so far. Here, we report the development of a metadynamics-based procedure that calculates the Dissociation Free Energy (DFE), and its application to 19 non-congeneric protein-protein complexes and hundreds of protein-ligand complexes covering eight targets. We achieved very high correlations with experimental binding free energies for these diverse sets of systems, demonstrating the generality and accuracy of the method. Since structures of most proteins are available owing to the recent success of prediction by artificial intelligence, a general free energy method such as DFE, combined with other methods, can make structure-based drug design a widely viable and reliable approach for developing traditional small-molecule drugs, biologic drugs, and PROTACs.
Year: 2022 PMID: 35132139 PMCID: PMC8821539 DOI: 10.1038/s41598-022-05875-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Concept and workflow of a DFE calculation. (a) One-way trip dissociation of a molecular complex by running metadynamics using an intermolecular distance D as the CV. The dotted curve in the right panel is an illustrative trajectory of the centroid of the CV residue of protein B (green ribbon). (b) Primitive FES from a number of replicas with different random seeds; note that a wall is placed in each run. (c) Average FES obtained from an ensemble of primitive FES, showing the position r0 of the bound-state minimum and the boundary rb separating the bound-state region from the free-state region. (d) Boltzmann factor as a function of D, narrowly concentrated around r0. (e) Convergence analysis: DFE is recalculated as more and more replicas enter the average, showing that DFE converges once a sufficient number of replicas are averaged.
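Panels (c) to (e) can be illustrated with a minimal sketch. It assumes that the average FES is a plain arithmetic mean of the primitive FES on a shared grid of D, and it runs on synthetic Gaussian-well profiles rather than real simulation output. The caption does not give the DFE formula, so the running quantity below is only the depth of the averaged minimum, used as a stand-in for the convergence check. All function names are hypothetical.

```python
import numpy as np

KT_310 = 1.987204e-3 * 310.0  # kT in kcal/mol at 310 K


def average_fes(primitive_fes):
    """Arithmetic mean over replicas (rows) at each grid point of D (assumed scheme)."""
    return np.mean(np.asarray(primitive_fes, dtype=float), axis=0)


def boltzmann_weights(fes, kt=KT_310):
    """Normalized Boltzmann factor exp(-F/kT) on the grid; the weights sum to 1."""
    w = np.exp(-(fes - fes.min()) / kt)
    return w / w.sum()


# Synthetic stand-in data: 50 noisy replicas of a 20 kcal/mol well centred at D = 8.
d = np.linspace(4.0, 30.0, 261)
rng = np.random.default_rng(0)
replicas = [
    -20.0 * np.exp(-((d - 8.0) / 1.5) ** 2) + rng.normal(0.0, 0.5, d.size)
    for _ in range(50)
]

avg = average_fes(replicas)
weights = boltzmann_weights(avg)
r0 = d[np.argmin(avg)]  # position of the bound-state minimum

# Convergence check in the spirit of panel (e): recompute a quantity as replicas are added.
# The well depth is a placeholder here, NOT the DFE definition used in the paper.
running_depth = [average_fes(replicas[:n]).min() for n in range(1, len(replicas) + 1)]
```

With kT near 0.6 kcal/mol, a well this deep puts essentially all of the Boltzmann weight within a narrow window around r0, which is the behaviour panel (d) describes.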
Table 1. Calculated DFE of 19 non-congeneric PPCs compared with experimental binding free energies.
| PDB code | PPI type | CV^a | DFE (kcal/mol) | ∆Gc^b (kcal/mol) | ∆Ge^c (kcal/mol) | Kd^d (M) |
|---|---|---|---|---|---|---|
| 1EMV | IM9 immunity protein–Colicin E9 nuclease | V37A–F86B | −32.42 ± 0.16 | −15.65 | −19.32 | 2.4 × 10^−14 |
| 2PTC | Trypsin–BPTI | D194E–K50I | −36.59 ± 0.27 | −17.53 | −18.75 | 6 × 10^−14 |
| 1BVN | α-Amylase–Tendamistat | D300P–A823T | −34.67 ± 0.07 | −16.66 | −15.65 | 9.2 × 10^−12 |
| 1R0R | A serine protease–OMTKY3 | L126E–T17I | −32.38 ± 0.13 | −15.63 | −14.94 | 2.94 × 10^−11 |
| 1ACB | Chymotrypsin–Eglin C | S214E–L45I | −29.38 ± 0.2 | −14.27 | −13.76 | 2 × 10^−10 |
| 1AY7 | Rnase SA–Barstar | T82A–L20B | −28.54 ± 0.18 | −13.9 | −13.76 | 2 × 10^−10 |
| 2UUY | Trypsin–Tryptase inhibitor | D196E–K39I | −24.11 ± 0.32 | −11.9 | −11.7 | 5.6 × 10^−9 |
| 1KAC | Adenovirus protein–Human receptor | L430A–W59B | −23.99 ± 0.16 | −11.84 | −11.11 | 1.48 × 10^−8 |
| 3BZD | TCR Vβ8.2–Enterotoxin C-3 | F75A–M24B | −17.73 ± 0.23 | −9.02 | −9.95 | 9.6 × 10^−8 |
| 2C0L | PEX5–SCP2 | S612A–L31B | −14.56 ± 0.15 | −7.59 | −9.88 | 1.09 × 10^−7 |
| 1KTZ | TGFβ–TGFβ receptor | V33A–T51B | −20.78 ± 0.05 | −10.39 | −9.27 | 2.9 × 10^−7 |
| 3LVK | Cys desulfurase–Sulfurtransferase | D52A–R27B | −17.72 ± 0.17 | −9.01 | −9.25 | 3 × 10^−7 |
| 1FFW | Chemotaxis protein CheY–Chemotaxis protein CheA | L84A–L212B | −14.46 ± 0.07 | −7.54 | −8.33 | 1.35 × 10^−6 |
| 3F1P | HIF2A–ARNT | V340A–I458B | −20.83 ± 0.47 | −10.42 | −8.3 | 1.4 × 10^−6 |
| 1US7 | HSP90–P50 | F104A–L205B | −17.59 ± 0.16 | −8.96 | −8.28 | 1.46 × 10^−6 |
| 3A4S | UBC9–SLD2 | I45B–I408C | −13.61 ± 0.17 | −7.16 | −7.87 | 2.81 × 10^−6 |
| 1QA9 | CD2–CD58 | K91A–D33B | −20.74 ± 0.12 | −10.38 | −7.16 | 9 × 10^−6 |
| 2OOB | CBL-B–Ubiquitin | L69B–A937A | −9.81 ± 0.15 | −5.44 | −5.99 | 6 × 10^−5 |
| 3SGB | Proteinase B–OMTKY3 | L18I–S195E | −18.84 ± 0.03 | −9.52 | −15.24 | 1.79 × 10^−11 |
^a A CV was defined by the distance between the centroid of the backbone heavy atoms of a residue in protein 1 and the centroid of the backbone heavy atoms of a residue in protein 2. For 2PTC, 2C0L, 3LVK, 3F1P and 2OOB, the Cβ was also included in the definition of the corresponding centroid.
^b Calculated binding free energy obtained by introducing DFE into Eq. (3) in the text.
^c Experimental binding free energy calculated using ∆Ge = kT ln(Kd) at T = 310 K.
^d Experimental binding constant Kd from ref. [27].
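The conversion in footnote c can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the gas constant in kcal/(mol K) is used in place of k because the energies are molar.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); plays the role of k for molar energies


def experimental_dg(kd_molar, temperature=310.0):
    """∆Ge = kT ln(Kd), in kcal/mol, for a dissociation constant given in M."""
    return R_KCAL * temperature * math.log(kd_molar)


# Kd values copied from the table for 1EMV, 2PTC and 2OOB.
for pdb, kd in [("1EMV", 2.4e-14), ("2PTC", 6e-14), ("2OOB", 6e-5)]:
    print(pdb, round(experimental_dg(kd), 2))
# prints:
# 1EMV -19.32
# 2PTC -18.75
# 2OOB -5.99
```

These values reproduce the ∆Ge column for the three complexes.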
Figure 2. Correlation plots between calculated DFE and experimental binding free energies ∆Ge. (a) Correlation plot for the 19 PPCs. The open circle is an outlier (complex 3SGB in Table 1). The indicated R² and SE (standard error) correspond to the least-squares fit after excluding the outlier. (b) Correlation plot for the PLCs of target CDK2.
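The fit in panel (a) can be approximated from the Table 1 values. The sketch below assumes an ordinary least-squares line of ∆Ge against DFE and takes SE to be the residual standard error with n − 2 degrees of freedom. The caption does not state the exact SE definition, so that choice is an assumption, and `linear_fit` is a hypothetical helper.

```python
import numpy as np

# DFE and ∆Ge (kcal/mol) copied from Table 1; the last entry is the outlier 3SGB.
dfe = np.array([-32.42, -36.59, -34.67, -32.38, -29.38, -28.54, -24.11, -23.99,
                -17.73, -14.56, -20.78, -17.72, -14.46, -20.83, -17.59, -13.61,
                -20.74, -9.81, -18.84])
dge = np.array([-19.32, -18.75, -15.65, -14.94, -13.76, -13.76, -11.70, -11.11,
                -9.95, -9.88, -9.27, -9.25, -8.33, -8.30, -8.28, -7.87,
                -7.16, -5.99, -15.24])


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, with R² and residual standard error."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    se = (ss_res / (len(x) - 2)) ** 0.5  # assumed SE definition
    return slope, intercept, r2, se


slope_all, _, r2_all, _ = linear_fit(dfe, dge)                    # all 19 complexes
slope_ex, icpt_ex, r2_ex, se_ex = linear_fit(dfe[:-1], dge[:-1])  # 3SGB excluded
print(f"all 19: R2 = {r2_all:.2f}; without 3SGB: R2 = {r2_ex:.2f}, SE = {se_ex:.2f}")
```

Running both fits shows how much the single 3SGB point lowers the correlation.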
Table 2. Comparison of calculated DFE and binding free energies ∆Gc before and after correction.
| PDB code | N^a | t^b (ns) | Applied corrections | DFE^c (kcal/mol) | DFE_nc^d (kcal/mol) | ∆Gc^e (kcal/mol) | ∆Gc,nc^f (kcal/mol) |
|---|---|---|---|---|---|---|---|
| 1EMV | 50 | 40 | No corrections | −32.42 | −32.42 | −15.65 | −15.57 |
| 2PTC | 50 | 20 | 18 runs extended to 30 ns; 2 runs to 40 ns | −36.59 | −35.95 | −17.53 | −17.43 |
| 1BVN | 50 | 40 | No corrections | −34.67 | −34.67 | −16.66 | −16.57 |
| 1R0R | 50 | 20 | No corrections | −32.38 | −32.38 | −15.63 | −15.55 |
| 1ACB | 50 | 30 | No corrections | −29.38 | −29.38 | −14.27 | −14.22 |
| 1AY7 | 60 | 40 | 13 runs with invasions removed | −28.54 | −30.76 | −13.9 | −14.36 |
| 2UUY | 50 | 20 | 4 runs extended to 30 ns | −24.11 | −24.14 | −11.9 | −11.88 |
| 1KAC | 50 | 30 | No corrections | −23.99 | −23.99 | −11.84 | −11.82 |
| 3BZD | 50 | 20 | 3 runs with invasions removed | −17.73 | −17.77 | −9.02 | −9.04 |
| 2C0L | 50 | 20 | 13 runs with repeated sampling removed | −14.56 | −15.9 | −7.59 | −7.61 |
| 1KTZ | 50 | 20 | 8 runs extended to 30 ns | −20.78 | −20.79 | −10.39 | −10.39 |
| 3LVK | 50 | 20 | 3 runs with repeated sampling removed; 11 runs extended to 30 ns; 2 runs to 40 ns | −17.72 | −16.89 | −9.01 | −9.02 |
| 1FFW | 50 | 10 | 11 runs with repeated sampling removed; 6 runs extended to 20 ns | −14.46 | −13.24 | −7.54 | −7.03 |
| 3F1P | 50 | 30 | 7 runs with repeated sampling removed; 2 runs extended to 40 ns | −20.83 | −20.31 | −10.42 | −10.41 |
| 1US7 | 50 | 10 | No corrections | −17.59 | −17.59 | −8.96 | −8.96 |
| 3A4S | 50 | 10 | 2 runs with invasions removed; 17 runs extended to 20 ns | −13.61 | −14.42 | −7.16 | −7.55 |
| 1QA9 | 50 | 20 | No corrections | −20.74 | −20.74 | −10.38 | −10.37 |
| 2OOB | 50 | 10 | 2 runs with repeated sampling removed; 5 runs extended to 20 ns | −9.81 | −9.54 | −5.44 | −5.5 |
| 3SGB | 50 | 40 | No corrections | −18.84 | −18.84 | −9.52 | −8.96 |
^a The number of runs.
^b Chemical time of a run before the correction process.
^c DFE after the correction process.
^d DFE before the correction process.
^e Calculated binding free energy after the correction process.
^f Calculated binding free energy before the correction process.
Table 3. R², standard errors (SE) and relationships between DFE and experimental binding free energies ∆Ge for 8 protein targets with different sets of ligands, together with the simulation conditions.
| Target | Number of ligands | CV^a | N^b | t^c (ns) | R² | SE (kcal/mol) | Relationship |
|---|---|---|---|---|---|---|---|
| CDK2 | 16 | F80 | 40 | 20 | 0.67 | 0.72 | ∆Ge = 0.858DFE + 7.26 |
| TYK2 | 16 | M978 | 50 | 10 | 0.66 | 0.79 | ∆Ge = 0.602DFE − 0.85 |
| P38α | 34 | L75 | 40 | 20 | 0.6 | 0.65 | ∆Ge = 0.39DFE − 3.33 |
| JNK1 | 21 | L110 | 27 | 10 | 0.51 | 0.62 | ∆Ge = 0.559DFE − 2.86 |
| MCL1 | 42 | L290 | 30 | 10 | 0.48 | 0.78 | ∆Ge = 0.642DFE − 1.96 |
| PTP1B | 23 | R221 | 40 | 15 | 0.35 | 1.09 | ∆Ge = 0.628DFE + 4.19 |
| BACE | 36 | W176 | 40 | 20 | 0.32 | 0.65 | ∆Ge = 0.268DFE − 4.18 |
| Thrombin^d | 11 | S214 | 50 | 10 | 0.01 (0.62) | 0.57 (0.36) | (∆Ge = 0.793DFE + 0.37) |
| Averaged | | | | | 0.45 (0.53) | 0.73 (0.71) | |
^a A CV was defined by the distance between the centroid of the backbone heavy atoms of a residue of the corresponding protein and the centroid of the heavy atoms of the whole ligand, except that for TYK2 and Thrombin the Cβ was also included in the definition of the corresponding centroid, and for CDK2, P38α and BACE the heavy atoms of the whole residue were used.
^b The number of runs performed for each ligand.
^c Chemical time of a run.
^d The numbers in parentheses correspond to the results after exclusion of two outlier points in the Thrombin data set.
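The per-target relationships and the "Averaged" row can be expressed as data. The sketch below is only an illustration of how the table might be used: `estimate_dg` is a hypothetical helper, and the DFE value passed to it is made up, not a result from the study. The averages are recomputed from the eight listed R² and SE values, with and without the parenthesized Thrombin entries.

```python
from statistics import mean

# (slope, intercept) of "∆Ge = slope * DFE + intercept", copied from the table.
# The Thrombin entry is the parenthesized fit obtained after excluding two outliers.
RELATIONSHIP = {
    "CDK2": (0.858, 7.26), "TYK2": (0.602, -0.85), "P38α": (0.39, -3.33),
    "JNK1": (0.559, -2.86), "MCL1": (0.642, -1.96), "PTP1B": (0.628, 4.19),
    "BACE": (0.268, -4.18), "Thrombin": (0.793, 0.37),
}

# R² and SE per target, in table order; Thrombin is last.
R2 = [0.67, 0.66, 0.60, 0.51, 0.48, 0.35, 0.32, 0.01]
SE = [0.72, 0.79, 0.65, 0.62, 0.78, 1.09, 0.65, 0.57]
R2_THROMBIN_CORRECTED, SE_THROMBIN_CORRECTED = 0.62, 0.36


def estimate_dg(target, dfe):
    """Map a DFE (kcal/mol) to an estimated ∆Ge using the target's linear relationship."""
    slope, intercept = RELATIONSHIP[target]
    return slope * dfe + intercept


mean_r2 = mean(R2)
mean_se = mean(SE)
mean_r2_corrected = mean(R2[:-1] + [R2_THROMBIN_CORRECTED])
mean_se_corrected = mean(SE[:-1] + [SE_THROMBIN_CORRECTED])

example = estimate_dg("CDK2", -20.0)  # -20.0 kcal/mol is a hypothetical DFE input
```

The recomputed means agree with the "Averaged" row to two decimal places: 0.45 (0.53) for R² and 0.73 (0.71) for SE.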
Figure 3. Examination of each run against the one-way trip description. (a) D versus time plot of a one-way trip run, in which the sampling initially stays around r0 (lower dotted red line) and then rises past rb (upper dotted red line) without returning to the level around r0. (b) D versus time plot of a multi-trip run, in which the sampling rises past rb and then returns to the original level around r0. (c) D versus time plot of a run in which D moves in the direction opposite to dissociation, signaling invasions, conformational changes or other movements unrelated to the dissociation. (d) Average FES (orange curve) and the primitive FES (blue curve) corresponding to the run in panel (c). The primitive FES of this run invades the "inner" region, to the left of the left wall of the primary minimum of the average FES. (e) D versus time plot of a run that stays in the bound state because the simulation time is not long enough. (f) D versus time plot after the run in panel (e) is extended to a longer simulation time, showing one-way trip behavior.
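The four behaviours in the caption suggest a simple screening rule for a D versus time trace. The sketch below is one possible reading, not the authors' procedure: the function name, the tolerance `tol` around r0, and the `invasion_margin` below r0 are all hypothetical thresholds chosen for illustration.

```python
import numpy as np


def classify_run(d, r0, rb, tol=1.0, invasion_margin=2.0):
    """Label a D(t) trace following the panels of the figure (illustrative thresholds)."""
    d = np.asarray(d, dtype=float)
    # Panel (c): D drops well below the bound-state minimum.
    if d.min() < r0 - invasion_margin:
        return "invasion"
    crossed = np.nonzero(d > rb)[0]
    # Panel (e): the boundary rb is never crossed, so the run needs extending.
    if crossed.size == 0:
        return "bound"
    # Panel (b): after crossing rb the trace comes back to the level around r0.
    if np.any(d[crossed[0]:] < r0 + tol):
        return "multi-trip"
    # Panel (a): crosses rb and never returns.
    return "one-way"


# Synthetic traces with r0 = 8 and rb = 15 (arbitrary distance units).
one_way = np.concatenate([np.full(50, 8.0), np.linspace(8.0, 30.0, 50)])
multi = np.concatenate([np.full(20, 8.0), np.linspace(8.0, 20.0, 20), np.linspace(20.0, 8.0, 20)])
bound = np.full(100, 8.0)
invading = np.linspace(8.0, 4.0, 50)

labels = [classify_run(t, r0=8.0, rb=15.0) for t in (one_way, multi, bound, invading)]
print(labels)  # ['one-way', 'multi-trip', 'bound', 'invasion']
```

A run labelled "bound" would be a candidate for extension, as in panels (e) and (f), while "multi-trip" and "invasion" runs correspond to the repeated-sampling and invasion cases described in the caption.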