Son Tung Ngo1,2, Khanh B Vu3, Le Minh Bui3, Van V Vu3. 1. Laboratory of Theoretical and Computational Biophysics, Ton Duc Thang University, Ho Chi Minh City 7000000, Vietnam. 2. Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City 7000000, Vietnam. 3. NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City 700000, Vietnam.
Abstract
The binding between two biomolecules is one of the most critical factors controlling many bioprocesses. Therefore, it is of great interest to derive a reliable method to calculate the free binding energy between two biomolecules. In this work, we have demonstrated that the binding affinity of ligands to proteins can be determined through biased sampling simulations. The umbrella sampling (US) method was applied on 20 protein-ligand complexes, including the cathepsin K (CTSK), type II dehydroquinase (DHQase), heat shock protein 90 (HSP90), and factor Xa (FXa) systems. The ligand-binding affinity was evaluated as the difference between the largest and smallest values of the free-energy curve, which was obtained via a potential of mean force analysis. The calculated affinities differ sizably from the previously reported experimental values, with an average difference of ∼3.14 kcal/mol. However, the calculated results are in good correlation with the experimental data, with correlation coefficients of 0.76, 0.87, 0.96, and 0.97 for CTSK, DHQase, HSP90, and FXa, respectively. Thus, the binding free energy of a new ligand can be reliably estimated using our US approach. Furthermore, the root-mean-square errors (RMSEs) of binding affinity of these systems are 1.13, 0.90, 0.37, and 0.25 kcal/mol, for CTSK, DHQase, HSP90, and FXa, respectively. The small RMSE values indicate the good precision of the biased sampling method that can distinguish the ligands exhibiting similar binding affinities.
The binding between two biomolecules is one of the most critical factors controlling many bioprocesses. Therefore, it is of great interest to derive a reliable method to calculate the free binding energy between two biomolecules. In this work, we have demonstrated that the binding affinity of ligands to proteins can be determined through biased sampling simulations. The umbrella sampling (US) method was applied on 20 protein-ligand complexes, including the cathepsin K (CTSK), type II dehydroquinase (DHQase), heat shock protein 90 (HSP90), and factor Xa (FXa) systems. The ligand-binding affinity was evaluated as the difference between the largest and smallest values of the free-energy curve, which was obtained via a potential of mean force analysis. The calculated affinities differ sizably from the previously reported experimental values, with an average difference of ∼3.14 kcal/mol. However, the calculated results are in good correlation with the experimental data, with correlation coefficients of 0.76, 0.87, 0.96, and 0.97 for CTSK, DHQase, HSP90, and FXa, respectively. Thus, the binding free energy of a new ligand can be reliably estimated using our US approach. Furthermore, the root-mean-square errors (RMSEs) of binding affinity of these systems are 1.13, 0.90, 0.37, and 0.25 kcal/mol, for CTSK, DHQase, HSP90, and FXa, respectively. The small RMSE values indicate the good precision of the biased sampling method that can distinguish the ligands exhibiting similar binding affinities.
A large number of biological
processes involve the binding of two or more biomolecules, which is
often evaluated through Gibbs free-energy difference.[1] The accurate determination or ranking of the binding affinity
is a prerequisite for the synthesis of potential inhibitors that would
allow for the reduction of therapeutic development and medication
cost.[2] Several schemes have been developed,
including molecular docking,[3−6] quantitative structure–activity relationship,[7−10] linear interaction energy,[11,12] molecular mechanism/Poisson–Boltzmann
surface area (MM-PBSA),[13−15] fast pulling of ligand,[16,17] free-energy perturbation,[18,19] thermodynamic integration,[20,21] and nonequilibrium molecular dynamics (MD) simulations.[22] Many studies in this area have been published
in recent years.[23−26] Nevertheless, precise prediction of binding affinity yet remains
elusive.The biased sampling method has emerged as a promising
approach for the determination of the binding free energy of protein–ligand
complexes[27−29] or protein–protein complexes.[23,30] Furthermore, the free-energy permeation of an ion across a channel
has been also investigated using this approach.[31] The ligand is normally forced to move out of an enzyme
active site by an external force. The movement of the ligand is tracked
and recorded along the reaction coordinate (ξ), and then the
obtained conformations were subjected to umbrella sampling (US) simulations.
The free-energy barrier between smallest and largest values extracted
from the potential of mean force (PMF) during US simulations can be
used for the prediction of the binding affinity.[30] Here, the PMF values are determined using the weighted
histogram analysis method (WHAM)[32] calculation.In this work, the US method was applied to 20 protein–ligand
complexes with solvent-exposed binding sites, including cathepsin
K (CTSK), type II dehydroquinase (DHQase), heat shock protein 90 (HSP90),
and factor Xa (FXa) systems. CTSK is a protease associating with a
number of biological problems. The weakening of bone and cartilage
is related to the action of CTSK on the catabolization of collagen,
elastin, and gelatin. The CTSK inhibitors have thus been developed
to prevent osteoporosis.[33] Moreover, the
enzyme was also shown to be involved in humanbreast cancers[34] and overexpressed in glioblastoma.[35] DHQase is known to be linked with the shikimic
acid pathway. Inhibiting DHQase is a potential method to treat malaria,
a parasitic infection causing more than a million deaths every year.[36] HSP90 is a chaperone protein that helps to stabilize
other proteins under the effect of irregular temperature.[37] The protein also aids the protein folding and
degradation processes.[37] As HSP90 stabilizes
proteins needed for tumor growth, HSP90 inhibitors have been screened
for anticancer therapy development.[38] FXa
is an enzyme involved in the coagulation cascade.[39] FXa inhibitors have been developed to prevent the venous
thromboembolism.[40] Our obtained results
are well correlated with previously reported experimental data with
high relation coefficients, on the basis of which the experimental
binding free energy of a new ligand can be reliably calculated. The
method is also demonstrated to be precise with a relatively small
root-mean-square error (RMSE).
Results and Discussion
Selected Model and Generated
Conformation for Umbrella Sampling Simulation
Many enzymes
and proteins contain sophisticated ligand-binding sites, which require
significant protein conformational changes upon ligand binding or
dissociation. The binding affinity estimation for these proteins requires
simultaneous calculations of both free energy and the kinetics of
protein folding, which requires a tremendous amount of computational
processing time. The biased sampling method used in our work may not
be suitable for these proteins. Many other proteins, however, contain
solvent-exposed ligand-binding sites, which allows their ligands to
easily move in/out without substantial protein conformational changes.
Thus, for the complexes of these proteins, the ligand-binding affinity
can be rapidly determined through US computation, as mentioned in Materials and Methods section. The structure of
the protein could be restrained using a harmonic potential applied
on the Cα atoms during the simulations. In this work,
initially, the compound is forced to mobilize out of the binding site
onward the unbinding pathway by an external harmonic force with a
constant velocity as described above, according to a previous work.[16] Although the slower pulling speeds provided
better results, the efficient gain is not significant.[16,30] The forms of recorded pulling forces are consistent with the previous
works (Figure S1).[41] The coordination of the ligand center of mass and applied force
value is recorded every 0.1 ps over FPL simulations, as shown in Figure S2. The value was employed to estimate
the initial conformations of biasing sampling simulations, as mentioned
above.
Free-Energy Calculation
The free-energy profile along
the reaction coordinate (ξ) was analyzed using the “gmx
wham” tools. The obtained PMF value of each studied system
is shown as a function of the reaction coordinates (ξ) in Figure . These PMF curves
show similar trends and are in good agreement with a previous study.[42] The free energy starts at zero value, then drops
to a minimum value, and finally increases to a stable value when ξ
reaches 1.0–1.2 nm (Figure ). This range corresponds to the state where the noncovalent
interaction between the protein and the ligand is broken. The calculated
binding free energy ΔGUS can be
determined as the difference between the largest and smallest values
of PMF curves, which is described in Figure C. The binding affinities calculated for
the complexes of CTSK, DHQ, HSP90, and FXa are provided in Table . The computational
error of the binding free energy was estimated using bootstrapping
analysis with 100 rounds of computations.[43]
Figure 1
PMF
curves obtained from WHAM analysis for CTSK systems (A), DHQase complexes
(B), HSP90 systems (C), and FXa systems (D). The errors were evaluated
over 100 rounds of bootstrapping analysis.[43]
Figure 4
Computational scheme.
(A) Initial windows of US simulations with indications of displacement
of the center of mass of the ligand; (B) the representative histograms
of US simulations; (C) the binding free energy is estimated as the
difference between the maximum and minimum values of the PMF curve
provided from the US simulations.[30] Shown
in (C) is the calculation result of the 1AUO system. The error of computations was
evaluated over 100 rounds of bootstrapping analysis.[43]
Table 1
Free-Energy Values
Obtained from US Simulationsa
complexes
protein
ΔGUS
ΔGEXP
1AU0
CTSK
–9.05 ± 0.36
–10.51
1AU2
CTSK
–8.47 ± 0.70
–10.82
1MEM
CTSK
–11.54 ± 0.57
–10.90
1NLJ
CTSK
–12.74 ± 0.23
–13.45
2FTD
CTSK
–11.91 ± 0.69
–14.26
1GU1
DHQase
–6.83 ± 0.63
–6.21
2C4W
DHQase
–9.84 ± 0.51
–6.45
2XD9
DHQase
–16.21 ± 1.07
–8.86
2Y71
DHQase
–22.25 ± 0.77
–10.12
4B6O
DHQase
–12.96 ± 0.55
–9.61
3K99
HSP90
–12.56 ± 1.01
–9.91
3QDD
HSP90
–20.62 ± 1.38
–12.04
3R4M
HSP90
–7.17 ± 0.44
–8.13
3R4N
HSP90
–9.27 ± 0.66
–9.45
3RLQ
HSP90
–15.61 ± 0.94
–10.15
1F0R
FXa
–12.65 ± 0.55
–10.51
1KSN
FXa
–24.45 ± 0.52
–12.90
1NFU
FXa
–15.43 ± 0.97
–10.63
2J34
FXa
–16.06 ± 0.28
–10.74
1FJS
FXa
–13.48 ± 0.77
–10.14
The errors were evaluated over 100
rounds of bootstrapping analysis.[43] The
unit of free energy is kcal/mol.
PMF
curves obtained from WHAM analysis for CTSK systems (A), DHQase complexes
(B), HSP90 systems (C), and FXa systems (D). The errors were evaluated
over 100 rounds of bootstrapping analysis.[43]The errors were evaluated over 100
rounds of bootstrapping analysis.[43] The
unit of free energy is kcal/mol.The calculated binding free energies (ΔGUS) of the HSP90, DHQase, and FXa systems are larger than
the corresponding experimental values (ΔGEXP), with average differences of ∼3.11, 5.37, and 5.43
kcal/mol, respectively. In contrast, the average ΔGUS of the CTSK system is slightly (1.24 kcal/mol) smaller
than the experimental value. The larger calculated binding free energy
likely resulted from the inaccuracy of mimicking the interaction among
constituent molecules, including protein, ligand, and water molecules.[44,45] The differences ΔGUS and ΔGEXP are similar to the difference obtained with
an alchemical binding free-energy calculation described in previous
studies.[46,47] In some cases, the difference is more than
10 kcal/mol (Table ). Nevertheless, there is a good correlation between ΔGUS and ΔGEXP in each studied system, especially in the case of HSP90 and FXa
(Figure ). The regression
equations obtained for the studied systems areUsing these equations and our US approach described
herein, the binding affinity of a new ligand to these proteins can
be reliably calculated.
Figure 2
Correlation between computed and experimental
binding free energies of CTSK complexes (A), DHQase complexes (B),
HSP90 complexes (C), and FXa complexes (D).
Correlation between computed and experimental
binding free energies of CTSK complexes (A), DHQase complexes (B),
HSP90 complexes (C), and FXa complexes (D).In addition, the errors of calculations are small (Table ). The computed errors
(δ) obtained for CTSK, DHQase, HSP90, and FXa systems are ca.
5, 5, 7, and 4%, respectively. These values are significantly smaller
than those obtained from other methods, such as FPL (∼7–15%[16,48]) and MM-PBSA (∼20%[49,50]). It is generally difficult
to select the dielectric constant for polar solvation calculation
as the ligand is partially solvent-exposed, resulting in poor precision
in the MM-PBSA method.[50] Furthermore, the
polar solvation free energy is also dependent on the calculating approach,
which are Poisson–Boltzmann (PB) or generalized Born (GB) schemes.
Many studies indicated that the obtained results strongly depend on
the studied system.[50] The PB method provides
better results in some cases,[51] whereas
the GB method is more accurate/precise in some other cases.[52] In addition, the lower precision of the FPL
method in comparison with the US method is probably caused by the
fewer number of ligand samples over the unbinding pathway. Furthermore,
the binding affinity estimation using the FPL method is relatively
simple wherein it scores the ligand affinity using recorded rupture
force[41] or pulling work.[16] Absolutely, these metrics do not consider the contribution
of entropic conformation, resulting in reducing the precision of the
calculation. Moreover, the RMSE analysis indicates that the computational
study would predict the experimental value with the precisions of
RMSECTSK = 1.13, RMSEDHQase = 0.90, RMSEHSP90 = 0.37, and RMSEFXa = 0.25 kcal/mol. The estimation
with high precision would definitely distinguish the ligands with
similar binding affinities.The better results obtained with
the US method compared with the FPL scheme[16,48] come at the cost of computational resources, although. In the FPL
scheme, one calculation requires ca. 0.7 ns whereas the total molecular
dynamics (MD) simulation time in the US method is ca. 120.7 ns (including
FPL simulation as well). Fortunately, recent developments in GPU computing
help remarkably speeding up the US simulation studies. One compute
node with dual Xeon E5-2670V3 and GTX 1070 GPU can approximately produce
one US calculation per day. It is noteworthy that the computing resource
required for US calculation is similar to that found for the MM-PBSA
method, which required several nanoseconds of MD simulation to generate
equilibrium snapshots for the free-energy calculation.[50] The MM-PBSA method would take a much longer
CPU time to evaluate the contribution of conformational entropy for
a large complex consisting of more than 4000 atoms. Hence, for large
systems, the US computation has great advantages over the MM-PBSA
method.
Conclusions
In this work, we have
demonstrated that biased sampling simulation is a highly appropriate
approach for ranking the affinity between ligand and enzyme for the
CTSK, DHQase, HSP90, and FXa complexes. The computational binding
affinity of the inhibitor was evaluated as the difference between
the largest and smallest values of the free-energy curve obtained
with PMF calculation. The calculated binding affinities sizably differ
from the corresponding experimental values, with an average difference
of of 3.14 kcal/mol (some values >10 kcal/mol).[53] To decrease the difference, more advanced methods, such
as the combination of free-energy perturbation and US simulations,
should be employed.[53] However, it is very
time consuming to evaluate the absolute binding free energy of a ligand.
The calculated binding free energy highly correlates with the experimental
values, with correlation coefficients ranging from R = 0.76–0.97, which allows for the reliable estimation of
the binding affinity of new ligands. Furthermore, the high precision,
which is observed with the values ranging from RMSE = 1.13–0.25
kcal/mol, would easily categorize the ligands with similar binding
affinities. Using the US method, both the accuracy and precision are
significantly increased in comparison with FPL and MM-PBSA methods.[16,48−50] Besides, the computational processing time required
in the US method is acceptable, compared to that in the MM-PBSA method.
Especially, when simulating large systems, the US computation would
require much lesser CPU time than the MM-PBSA method.
Materials and
Methods
Target Complexes
Three-dimensional structures of complexes
were obtained from the protein data bank (PDB). The PDB IDs of the
CTSK complexes are 1AU0,[54]1AU2,[54]1MEM,[55]1NLJ,[56] and 2FTD.[57] The IDs
of the DHQase complexes are 1GU1,[58]2C4W,[59]2XD9,[60]2Y71,[61] and 4B6O.[62] The IDs of HSP90 complexes are 3K99,[63]3QDD,[64]3R4M,[65]3R4N,[65] and 3RLQ.[66] The IDs of FXa complexes are 1F0R,[67]1FJS,[68]1KSN,[69]1NFU,[70] and 2J34.[71] The structures of the ligands are described in Table S1. The ligand was forced to move out of
the cavity of the enzyme along an unbinding pathway, which was predicted
using the Caver program,[72] as described
in previous works.[16] The complexes were
rotated to fit the unbinding pathway along Z axis.
A representative model with the unbinding direction is shown in Figure .
Figure 3
Representation of the
initial simulation setup. The ligand (green) is pulled along the unbinding
direction (Z axis) using a harmonic force with a
cantilever spring constant of 600 kJ/mol/nm2 and a pulling
speed of ν = 0.005 nm/ps, as described in previous studies.[73,74]
Representation of the
initial simulation setup. The ligand (green) is pulled along the unbinding
direction (Z axis) using a harmonic force with a
cantilever spring constant of 600 kJ/mol/nm2 and a pulling
speed of ν = 0.005 nm/ps, as described in previous studies.[73,74]
Atomistic Simulations
GROMACS 5.1.3 was used to simulate the soluble complexes.[75] The proteins were parameterized using the Amber99SB-ILDN
force field.[76] The ligands were represented
using the general Amber force field.[77] Charges
were obtained by employing the restrained electrostatic potential
method[78] using quantum chemical calculation
at MP2 level with 6-31G(d,p) basis set. Each protein–ligand
complex was inserted into a rectangular box with periodic boundary
conditions (Figure ). The dimensions of the boxes for CTSK, DHQase, HSP90, and FXa are
4.92 × 5.67 × 7.19, 5.23 × 6.68 × 8.02, 5.52 ×
6.13 × 8.72, and 6.08 × 6.40 × 8.50 nm3,
respectively. The simulation parameters were referred from previous
studies.[79] van der Waals interaction cutoff
was set at 0.9 nm, and the particle mesh Ewald method[80] is used to treat the electrostatic interaction with a radius
of 0.9 nm. Initially, the solvated systems were energy-minimized using
the steepest descent method. The complexes were then relaxed during
NVT and NPT simulations at 300 K for 100 ps. During relaxed simulations,
the Cα atoms of the proteins were restrained by a
small harmonic force.The last conformations of NPT simulations
were used as the starting structures of ligand unbinding simulation
using the FPL method.[16,17] An external harmonic force was
applied to dissociate the ligands along the pulling orientation (Figure ). The Cα atoms of the proteins were restrained over the steered-molecular
dynamic (SMD) simulations by a small harmonic potential. The ligands
were thus dissociated utilizing a harmonic force with the cantilever
spring constant of k = 600 kJ/mol/nm2 along
the reaction coordinate (ξ). The pulling speed was chosen as
ν = 0.005 nm/ps according to previous studies.[16,48] The length of the SMD simulations was ca. 500 ps. During the FPL
process, the pulling force and the coordination of the ligand center
of mass were recorded every 0.1 ps.The initial structures of
biased sampling simulations were extracted from the unbinding trajectory
with the spacing of the center of mass of the ligand shown in Figure A. During the FPL process, the conformation of the solvated
complexes was recorded at every step of ∼0.2 nm and then used
as the initial conformation for the US simulations.[30] Eleven snapshots were extracted from bound to unbound states
along the reaction coordinate (ξ). However, because the ligand
narrowly diffuses in the bound state, as shown in the histograms of
simulations (Figure B), an additional conformation was taken, resulting in the spacing
of 0.1 nm between the first three windows (Figure A). Totally, 12 conformations along the reaction
coordinate were obtained during FPL simulations. These conformations
were subjected to US simulations over 10 ns. Therefore, there are
totally 120 ns of MD simulations used for US simulations. To avoid
initial fluctuation, the first 0.5 ns duration was removed from the
analysis. The weighted histogram analysis method (WHAM)[32] was used to evaluate a potential of the mean
force (PMF) along the reaction coordinate (ξ), following the
gmx wham protocol.[81] The WHAM equations
can be expressed as follows[81]wherein which with temperature T and Boltzmann constant kB, h is the histogram with total number
of values n, g = 1 + 2τ is the statistical inefficiency with autocorrelated
time τ corresponding to umbrella
sampling i; and P(ξ) corresponds
to the unbiased probability distribution relating to PMF through where ξ0 corresponds
to W(ξ0) = 0. The binding free energy
ΔG is simply evaluated as the difference between
the largest and smallest values of the PMF curve (Figure C).[30] The error of computations was evaluated over 100 rounds of bootstrapping
analysis.[43]Computational scheme.
(A) Initial windows of US simulations with indications of displacement
of the center of mass of the ligand; (B) the representative histograms
of US simulations; (C) the binding free energy is estimated as the
difference between the maximum and minimum values of the PMF curve
provided from the US simulations.[30] Shown
in (C) is the calculation result of the 1AUO system. The error of computations was
evaluated over 100 rounds of bootstrapping analysis.[43]
Authors: Nguyen Trong Dan; Hoang Duc Quang; Vuong Van Truong; Do Huu Nghi; Nguyen Manh Cuong; To Dao Cuong; Tran Quoc Toan; Long Giang Bach; Nguyen Huu Thuan Anh; Nguyen Thi Mai; Ngo Thi Lan; Luu Van Chinh; Pham Minh Quan Journal: Sci Rep Date: 2020-07-10 Impact factor: 4.379