Literature DB >> 31459599

Effective Estimation of Ligand-Binding Affinity Using Biased Sampling Method.

Son Tung Ngo^1,2, Khanh B Vu³, Le Minh Bui³, Van V Vu³.

Abstract

The binding between two biomolecules is one of the most critical factors controlling many bioprocesses. Therefore, it is of great interest to derive a reliable method to calculate the free binding energy between two biomolecules. In this work, we have demonstrated that the binding affinity of ligands to proteins can be determined through biased sampling simulations. The umbrella sampling (US) method was applied on 20 protein-ligand complexes, including the cathepsin K (CTSK), type II dehydroquinase (DHQase), heat shock protein 90 (HSP90), and factor Xa (FXa) systems. The ligand-binding affinity was evaluated as the difference between the largest and smallest values of the free-energy curve, which was obtained via a potential of mean force analysis. The calculated affinities differ sizably from the previously reported experimental values, with an average difference of ∼3.14 kcal/mol. However, the calculated results are in good correlation with the experimental data, with correlation coefficients of 0.76, 0.87, 0.96, and 0.97 for CTSK, DHQase, HSP90, and FXa, respectively. Thus, the binding free energy of a new ligand can be reliably estimated using our US approach. Furthermore, the root-mean-square errors (RMSEs) of binding affinity of these systems are 1.13, 0.90, 0.37, and 0.25 kcal/mol, for CTSK, DHQase, HSP90, and FXa, respectively. The small RMSE values indicate the good precision of the biased sampling method that can distinguish the ligands exhibiting similar binding affinities.

Entities: Chemical Disease Gene Mutation Species

Year: 2019 PMID： 31459599 PMCID： PMC6648447 DOI： 10.1021/acsomega.8b03258

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

A large number of biological processes involve the binding of two or more biomolecules, which is often evaluated through Gibbs free-energy difference.[1] The accurate determination or ranking of the binding affinity is a prerequisite for the synthesis of potential inhibitors that would allow for the reduction of therapeutic development and medication cost.[2] Several schemes have been developed, including molecular docking,[3−6] quantitative structure–activity relationship,[7−10] linear interaction energy,[11,12] molecular mechanism/Poisson–Boltzmann surface area (MM-PBSA),[13−15] fast pulling of ligand,[16,17] free-energy perturbation,[18,19] thermodynamic integration,[20,21] and nonequilibrium molecular dynamics (MD) simulations.[22] Many studies in this area have been published in recent years.[23−26] Nevertheless, precise prediction of binding affinity yet remains elusive. The biased sampling method has emerged as a promising approach for the determination of the binding free energy of protein–ligand complexes[27−29] or protein–protein complexes.[23,30] Furthermore, the free-energy permeation of an ion across a channel has been also investigated using this approach.[31] The ligand is normally forced to move out of an enzyme active site by an external force. The movement of the ligand is tracked and recorded along the reaction coordinate (ξ), and then the obtained conformations were subjected to umbrella sampling (US) simulations. The free-energy barrier between smallest and largest values extracted from the potential of mean force (PMF) during US simulations can be used for the prediction of the binding affinity.[30] Here, the PMF values are determined using the weighted histogram analysis method (WHAM)[32] calculation. In this work, the US method was applied to 20 protein–ligand complexes with solvent-exposed binding sites, including cathepsin K (CTSK), type II dehydroquinase (DHQase), heat shock protein 90 (HSP90), and factor Xa (FXa) systems. CTSK is a protease associating with a number of biological problems. The weakening of bone and cartilage is related to the action of CTSK on the catabolization of collagen, elastin, and gelatin. The CTSK inhibitors have thus been developed to prevent osteoporosis.[33] Moreover, the enzyme was also shown to be involved in human breast cancers[34] and overexpressed in glioblastoma.[35] DHQase is known to be linked with the shikimic acid pathway. Inhibiting DHQase is a potential method to treat malaria, a parasitic infection causing more than a million deaths every year.[36] HSP90 is a chaperone protein that helps to stabilize other proteins under the effect of irregular temperature.[37] The protein also aids the protein folding and degradation processes.[37] As HSP90 stabilizes proteins needed for tumor growth, HSP90 inhibitors have been screened for anticancer therapy development.[38] FXa is an enzyme involved in the coagulation cascade.[39] FXa inhibitors have been developed to prevent the venous thromboembolism.[40] Our obtained results are well correlated with previously reported experimental data with high relation coefficients, on the basis of which the experimental binding free energy of a new ligand can be reliably calculated. The method is also demonstrated to be precise with a relatively small root-mean-square error (RMSE).

Results and Discussion

Selected Model and Generated Conformation for Umbrella Sampling Simulation

Many enzymes and proteins contain sophisticated ligand-binding sites, which require significant protein conformational changes upon ligand binding or dissociation. The binding affinity estimation for these proteins requires simultaneous calculations of both free energy and the kinetics of protein folding, which requires a tremendous amount of computational processing time. The biased sampling method used in our work may not be suitable for these proteins. Many other proteins, however, contain solvent-exposed ligand-binding sites, which allows their ligands to easily move in/out without substantial protein conformational changes. Thus, for the complexes of these proteins, the ligand-binding affinity can be rapidly determined through US computation, as mentioned in Materials and Methods section. The structure of the protein could be restrained using a harmonic potential applied on the Cα atoms during the simulations. In this work, initially, the compound is forced to mobilize out of the binding site onward the unbinding pathway by an external harmonic force with a constant velocity as described above, according to a previous work.[16] Although the slower pulling speeds provided better results, the efficient gain is not significant.[16,30] The forms of recorded pulling forces are consistent with the previous works (Figure S1).[41] The coordination of the ligand center of mass and applied force value is recorded every 0.1 ps over FPL simulations, as shown in Figure S2. The value was employed to estimate the initial conformations of biasing sampling simulations, as mentioned above.

Free-Energy Calculation

The free-energy profile along the reaction coordinate (ξ) was analyzed using the “gmx wham” tools. The obtained PMF value of each studied system is shown as a function of the reaction coordinates (ξ) in Figure . These PMF curves show similar trends and are in good agreement with a previous study.[42] The free energy starts at zero value, then drops to a minimum value, and finally increases to a stable value when ξ reaches 1.0–1.2 nm (Figure ). This range corresponds to the state where the noncovalent interaction between the protein and the ligand is broken. The calculated binding free energy ΔGUS can be determined as the difference between the largest and smallest values of PMF curves, which is described in Figure C. The binding affinities calculated for the complexes of CTSK, DHQ, HSP90, and FXa are provided in Table . The computational error of the binding free energy was estimated using bootstrapping analysis with 100 rounds of computations.[43]

Figure 1

PMF curves obtained from WHAM analysis for CTSK systems (A), DHQase complexes (B), HSP90 systems (C), and FXa systems (D). The errors were evaluated over 100 rounds of bootstrapping analysis.[43]

Figure 4

Computational scheme. (A) Initial windows of US simulations with indications of displacement of the center of mass of the ligand; (B) the representative histograms of US simulations; (C) the binding free energy is estimated as the difference between the maximum and minimum values of the PMF curve provided from the US simulations.[30] Shown in (C) is the calculation result of the 1AUO system. The error of computations was evaluated over 100 rounds of bootstrapping analysis.[43]

Table 1

Free-Energy Values Obtained from US Simulationsa

complexes	protein	ΔG_US	ΔG_EXP
1AU0	CTSK	–9.05 ± 0.36	–10.51
1AU2	CTSK	–8.47 ± 0.70	–10.82
1MEM	CTSK	–11.54 ± 0.57	–10.90
1NLJ	CTSK	–12.74 ± 0.23	–13.45
2FTD	CTSK	–11.91 ± 0.69	–14.26
1GU1	DHQase	–6.83 ± 0.63	–6.21
2C4W	DHQase	–9.84 ± 0.51	–6.45
2XD9	DHQase	–16.21 ± 1.07	–8.86
2Y71	DHQase	–22.25 ± 0.77	–10.12
4B6O	DHQase	–12.96 ± 0.55	–9.61
3K99	HSP90	–12.56 ± 1.01	–9.91
3QDD	HSP90	–20.62 ± 1.38	–12.04
3R4M	HSP90	–7.17 ± 0.44	–8.13
3R4N	HSP90	–9.27 ± 0.66	–9.45
3RLQ	HSP90	–15.61 ± 0.94	–10.15
1F0R	FXa	–12.65 ± 0.55	–10.51
1KSN	FXa	–24.45 ± 0.52	–12.90
1NFU	FXa	–15.43 ± 0.97	–10.63
2J34	FXa	–16.06 ± 0.28	–10.74
1FJS	FXa	–13.48 ± 0.77	–10.14

The errors were evaluated over 100 rounds of bootstrapping analysis.[43] The unit of free energy is kcal/mol.

PMF curves obtained from WHAM analysis for CTSK systems (A), DHQase complexes (B), HSP90 systems (C), and FXa systems (D). The errors were evaluated over 100 rounds of bootstrapping analysis.[43] The errors were evaluated over 100 rounds of bootstrapping analysis.[43] The unit of free energy is kcal/mol. The calculated binding free energies (ΔGUS) of the HSP90, DHQase, and FXa systems are larger than the corresponding experimental values (ΔGEXP), with average differences of ∼3.11, 5.37, and 5.43 kcal/mol, respectively. In contrast, the average ΔGUS of the CTSK system is slightly (1.24 kcal/mol) smaller than the experimental value. The larger calculated binding free energy likely resulted from the inaccuracy of mimicking the interaction among constituent molecules, including protein, ligand, and water molecules.[44,45] The differences ΔGUS and ΔGEXP are similar to the difference obtained with an alchemical binding free-energy calculation described in previous studies.[46,47] In some cases, the difference is more than 10 kcal/mol (Table ). Nevertheless, there is a good correlation between ΔGUS and ΔGEXP in each studied system, especially in the case of HSP90 and FXa (Figure ). The regression equations obtained for the studied systems areUsing these equations and our US approach described herein, the binding affinity of a new ligand to these proteins can be reliably calculated.

Figure 2

Correlation between computed and experimental binding free energies of CTSK complexes (A), DHQase complexes (B), HSP90 complexes (C), and FXa complexes (D).

Correlation between computed and experimental binding free energies of CTSK complexes (A), DHQase complexes (B), HSP90 complexes (C), and FXa complexes (D). In addition, the errors of calculations are small (Table ). The computed errors (δ) obtained for CTSK, DHQase, HSP90, and FXa systems are ca. 5, 5, 7, and 4%, respectively. These values are significantly smaller than those obtained from other methods, such as FPL (∼7–15%[16,48]) and MM-PBSA (∼20%[49,50]). It is generally difficult to select the dielectric constant for polar solvation calculation as the ligand is partially solvent-exposed, resulting in poor precision in the MM-PBSA method.[50] Furthermore, the polar solvation free energy is also dependent on the calculating approach, which are Poisson–Boltzmann (PB) or generalized Born (GB) schemes. Many studies indicated that the obtained results strongly depend on the studied system.[50] The PB method provides better results in some cases,[51] whereas the GB method is more accurate/precise in some other cases.[52] In addition, the lower precision of the FPL method in comparison with the US method is probably caused by the fewer number of ligand samples over the unbinding pathway. Furthermore, the binding affinity estimation using the FPL method is relatively simple wherein it scores the ligand affinity using recorded rupture force[41] or pulling work.[16] Absolutely, these metrics do not consider the contribution of entropic conformation, resulting in reducing the precision of the calculation. Moreover, the RMSE analysis indicates that the computational study would predict the experimental value with the precisions of RMSECTSK = 1.13, RMSEDHQase = 0.90, RMSEHSP90 = 0.37, and RMSEFXa = 0.25 kcal/mol. The estimation with high precision would definitely distinguish the ligands with similar binding affinities. The better results obtained with the US method compared with the FPL scheme[16,48] come at the cost of computational resources, although. In the FPL scheme, one calculation requires ca. 0.7 ns whereas the total molecular dynamics (MD) simulation time in the US method is ca. 120.7 ns (including FPL simulation as well). Fortunately, recent developments in GPU computing help remarkably speeding up the US simulation studies. One compute node with dual Xeon E5-2670V3 and GTX 1070 GPU can approximately produce one US calculation per day. It is noteworthy that the computing resource required for US calculation is similar to that found for the MM-PBSA method, which required several nanoseconds of MD simulation to generate equilibrium snapshots for the free-energy calculation.[50] The MM-PBSA method would take a much longer CPU time to evaluate the contribution of conformational entropy for a large complex consisting of more than 4000 atoms. Hence, for large systems, the US computation has great advantages over the MM-PBSA method.

Conclusions

In this work, we have demonstrated that biased sampling simulation is a highly appropriate approach for ranking the affinity between ligand and enzyme for the CTSK, DHQase, HSP90, and FXa complexes. The computational binding affinity of the inhibitor was evaluated as the difference between the largest and smallest values of the free-energy curve obtained with PMF calculation. The calculated binding affinities sizably differ from the corresponding experimental values, with an average difference of of 3.14 kcal/mol (some values >10 kcal/mol).[53] To decrease the difference, more advanced methods, such as the combination of free-energy perturbation and US simulations, should be employed.[53] However, it is very time consuming to evaluate the absolute binding free energy of a ligand. The calculated binding free energy highly correlates with the experimental values, with correlation coefficients ranging from R = 0.76–0.97, which allows for the reliable estimation of the binding affinity of new ligands. Furthermore, the high precision, which is observed with the values ranging from RMSE = 1.13–0.25 kcal/mol, would easily categorize the ligands with similar binding affinities. Using the US method, both the accuracy and precision are significantly increased in comparison with FPL and MM-PBSA methods.[16,48−50] Besides, the computational processing time required in the US method is acceptable, compared to that in the MM-PBSA method. Especially, when simulating large systems, the US computation would require much lesser CPU time than the MM-PBSA method.

Materials and Methods

Target Complexes

Three-dimensional structures of complexes were obtained from the protein data bank (PDB). The PDB IDs of the CTSK complexes are 1AU0,[54]1AU2,[54]1MEM,[55]1NLJ,[56] and 2FTD.[57] The IDs of the DHQase complexes are 1GU1,[58]2C4W,[59]2XD9,[60]2Y71,[61] and 4B6O.[62] The IDs of HSP90 complexes are 3K99,[63]3QDD,[64]3R4M,[65]3R4N,[65] and 3RLQ.[66] The IDs of FXa complexes are 1F0R,[67]1FJS,[68]1KSN,[69]1NFU,[70] and 2J34.[71] The structures of the ligands are described in Table S1. The ligand was forced to move out of the cavity of the enzyme along an unbinding pathway, which was predicted using the Caver program,[72] as described in previous works.[16] The complexes were rotated to fit the unbinding pathway along Z axis. A representative model with the unbinding direction is shown in Figure .

Figure 3

Representation of the initial simulation setup. The ligand (green) is pulled along the unbinding direction (Z axis) using a harmonic force with a cantilever spring constant of 600 kJ/mol/nm2 and a pulling speed of ν = 0.005 nm/ps, as described in previous studies.[73,74]

Atomistic Simulations

GROMACS 5.1.3 was used to simulate the soluble complexes.[75] The proteins were parameterized using the Amber99SB-ILDN force field.[76] The ligands were represented using the general Amber force field.[77] Charges were obtained by employing the restrained electrostatic potential method[78] using quantum chemical calculation at MP2 level with 6-31G(d,p) basis set. Each protein–ligand complex was inserted into a rectangular box with periodic boundary conditions (Figure ). The dimensions of the boxes for CTSK, DHQase, HSP90, and FXa are 4.92 × 5.67 × 7.19, 5.23 × 6.68 × 8.02, 5.52 × 6.13 × 8.72, and 6.08 × 6.40 × 8.50 nm3, respectively. The simulation parameters were referred from previous studies.[79] van der Waals interaction cutoff was set at 0.9 nm, and the particle mesh Ewald method[80] is used to treat the electrostatic interaction with a radius of 0.9 nm. Initially, the solvated systems were energy-minimized using the steepest descent method. The complexes were then relaxed during NVT and NPT simulations at 300 K for 100 ps. During relaxed simulations, the Cα atoms of the proteins were restrained by a small harmonic force. The last conformations of NPT simulations were used as the starting structures of ligand unbinding simulation using the FPL method.[16,17] An external harmonic force was applied to dissociate the ligands along the pulling orientation (Figure ). The Cα atoms of the proteins were restrained over the steered-molecular dynamic (SMD) simulations by a small harmonic potential. The ligands were thus dissociated utilizing a harmonic force with the cantilever spring constant of k = 600 kJ/mol/nm2 along the reaction coordinate (ξ). The pulling speed was chosen as ν = 0.005 nm/ps according to previous studies.[16,48] The length of the SMD simulations was ca. 500 ps. During the FPL process, the pulling force and the coordination of the ligand center of mass were recorded every 0.1 ps. The initial structures of biased sampling simulations were extracted from the unbinding trajectory with the spacing of the center of mass of the ligand shown in Figure A. During the FPL process, the conformation of the solvated complexes was recorded at every step of ∼0.2 nm and then used as the initial conformation for the US simulations.[30] Eleven snapshots were extracted from bound to unbound states along the reaction coordinate (ξ). However, because the ligand narrowly diffuses in the bound state, as shown in the histograms of simulations (Figure B), an additional conformation was taken, resulting in the spacing of 0.1 nm between the first three windows (Figure A). Totally, 12 conformations along the reaction coordinate were obtained during FPL simulations. These conformations were subjected to US simulations over 10 ns. Therefore, there are totally 120 ns of MD simulations used for US simulations. To avoid initial fluctuation, the first 0.5 ns duration was removed from the analysis. The weighted histogram analysis method (WHAM)[32] was used to evaluate a potential of the mean force (PMF) along the reaction coordinate (ξ), following the gmx wham protocol.[81] The WHAM equations can be expressed as follows[81]wherein which with temperature T and Boltzmann constant kB, h is the histogram with total number of values n, g = 1 + 2τ is the statistical inefficiency with autocorrelated time τ corresponding to umbrella sampling i; and P(ξ) corresponds to the unbiased probability distribution relating to PMF through where ξ0 corresponds to W(ξ0) = 0. The binding free energy ΔG is simply evaluated as the difference between the largest and smallest values of the PMF curve (Figure C).[30] The error of computations was evaluated over 100 rounds of bootstrapping analysis.[43] Computational scheme. (A) Initial windows of US simulations with indications of displacement of the center of mass of the ligand; (B) the representative histograms of US simulations; (C) the binding free energy is estimated as the difference between the maximum and minimum values of the PMF curve provided from the US simulations.[30] Shown in (C) is the calculation result of the 1AUO system. The error of computations was evaluated over 100 rounds of bootstrapping analysis.[43]

14 in total

1. In silico screening of potential β-secretase (BACE1) inhibitors from VIETHERB database.

Authors: Nguyen Thao Nhung; Nhung Duong; Huong Thi Thu Phung; Quan V Vo; Nguyen Minh Tam
Journal: J Mol Model Date: 2022-02-14 Impact factor: 1.810

2. Umbrella Sampling-Based Method to Compute Ligand-Binding Affinity.

Authors: Son Tung Ngo; Minh Quan Pham
Journal: Methods Mol Biol Date: 2022

Review 3. Thermodynamics and Kinetics of Drug-Target Binding by Molecular Simulation.

Authors: Sergio Decherchi; Andrea Cavalli
Journal: Chem Rev Date: 2020-10-02 Impact factor: 60.622

4. C-Terminal Plays as the Possible Nucleation of the Self-Aggregation of the S-Shape Aβ_11-42 Tetramer in Solution: Intensive MD Study.

Authors: Nguyen Thanh Tung; Philippe Derreumaux; Van V Vu; Pham Cam Nam; Son Tung Ngo
Journal: ACS Omega Date: 2019-06-25

5. Ligand Binding Path Sampling Based on Parallel Cascade Selection Molecular Dynamics: LB-PaCS-MD.

Authors: Hayato Aida; Yasuteru Shigeta; Ryuhei Harada
Journal: Materials (Basel) Date: 2022-02-17 Impact factor: 3.623

6. Insights into the binding and covalent inhibition mechanism of PF-07321332 to SARS-CoV-2 M^pro.

Authors: Son Tung Ngo; Trung Hai Nguyen; Nguyen Thanh Tung; Binh Khanh Mai
Journal: RSC Adv Date: 2022-01-28 Impact factor: 3.361

7. Effective estimation of the inhibitor affinity of HIV-1 protease via a modified LIE approach.

Authors: Son Tung Ngo; Nam Dao Hong; Le Huu Quynh Anh; Dinh Minh Hiep; Nguyen Thanh Tung
Journal: RSC Adv Date: 2020-02-21 Impact factor: 4.036

8. Benchmarking the ability of novel compounds to inhibit SARS-CoV-2 main protease using steered molecular dynamics simulations.

Authors: Rahul Singh; Vijay Kumar Bhardwaj; Pralay Das; Dhananjay Bhattacherjee; Grigory V Zyryanov; Rituraj Purohit
Journal: Comput Biol Med Date: 2022-04-29 Impact factor: 6.698

9. Design, synthesis, structure, in vitro cytotoxic activity evaluation and docking studies on target enzyme GSK-3β of new indirubin-3'-oxime derivatives.

Authors: Nguyen Trong Dan; Hoang Duc Quang; Vuong Van Truong; Do Huu Nghi; Nguyen Manh Cuong; To Dao Cuong; Tran Quoc Toan; Long Giang Bach; Nguyen Huu Thuan Anh; Nguyen Thi Mai; Ngo Thi Lan; Luu Van Chinh; Pham Minh Quan
Journal: Sci Rep Date: 2020-07-10 Impact factor: 4.379

10. Unbinding ligands from SARS-CoV-2 Mpro via umbrella sampling simulations.

Authors: Nguyen Minh Tam; Trung Hai Nguyen; Vu Thi Ngan; Nguyen Thanh Tung; Son Tung Ngo
Journal: R Soc Open Sci Date: 2022-01-26 Impact factor: 2.963