Literature DB >> 32484689

How Good is Jarzynski's Equality for Computer-Aided Drug Design?

Kiet Ho¹, Duc Toan Truong^1,2, Mai Suan Li³.

Abstract

Accurate determination of the binding affinity of the ligand to the receptor remains a difficult problem in computer-aided drug design. Here, we study and compare the efficiency of Jarzynski's equality (JE) combined with steered molecular dynamics and the linear interaction energy (LIE) method by assessing the binding affinity of 23 small compounds to six receptors, including β-lactamase, thrombin, factor Xa, HIV-1 protease (HIV), myeloid cell leukemia-1, and cyclin-dependent kinase 2 proteins. It was shown that Jarzynski's nonequilibrium binding free energy ΔGneqJar correlates with the available experimental data with the correlation levels R = 0.89, 0.86, 0.83, 0.80, 0.83, and 0.81 for six data sets, while for the binding free energy ΔGLIE obtained by the LIE method, we have R = 0.73, 0.80, 0.42, 0.23, 0.85, and 0.01. Therefore, JE is recommended to be used for ranking binding affinities as it provides accurate and robust results. In contrast, LIE is not as reliable as JE, and it should be used with caution, especially when it comes to new systems.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Substances：
Ligands

Year: 2020 PMID： 32484689 PMCID： PMC7590978 DOI： 10.1021/acs.jpcb.0c02009

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

In computer-aided drug design, the binding affinity between the receptor and the ligand can be classified according to several quantities, but the most important one is the binding free energy (ΔGbind) because it is directly related to the inhibition constant that can be measured experimentally. Many methods have been developed to compute the free energy difference between bound and unbound states. Among them, molecular docking[1] is the fastest but not sufficiently accurate. However, because of its high speed, it is commonly used in computer-aided drug design to select potential compounds from large databases. Molecular dynamics (MD)-based methods such as molecular mechanics Poisson–Boltzmann or generalized Born surface area (MM/PB(GB)SA),[2] linear interaction energy (LIE),[3] thermodynamics integration (TI),[4] and free energy perturbation (FEP)[5] are more accurate than the docking method. Different from other MD methods, FEP and TI are exact and, as expected, the most accurate[6] but much more time consuming. The underlying phenomenological description of FEP and TI is based on the equilibrium mechanics, and in the case of the system with more than 105 atoms, an equilibrium transition from the bound state to the unbound state is practically impossible. A less computationally demanding method is the LIE method,[3,7] in which it is assumed that the ligand–protein binding affinity linearly depends on the polar and nonpolar binding energies. LIE does not consume a lot of computational time because it takes into account only bound and unbound states, while the others mentioned above require knowledge of the transformation process, including transition/intermediate states.[7−9] The contribution from the electrostatic and van der Waals (vdW) interaction energy is obtained through the thermodynamic cycle.[10] In the past few decades, several groups have successfully applied the LIE method to predict new inhibitors of human dihydrofolate reductases[10] and to obtain the binding affinity of inhibitors of neuraminidase,[11] BACE-1,[12] cytochrome P450,[13,14] farnesoid X receptor,[15] and others.[16] Several improved versions of LIE were proposed.[17−19] After the appearance of the atomic force microscopy (AFM) technique in 1985, the steered MD (SMD) method[20−23] was used to study unbinding of the ligand from the protein pocket. Several groups[24−29] showed that SMD is accurate and still fast enough to deal with a large number of compounds. In comparison to MM/PBSA, the SMD method is not only similar in accuracy[25] but also requires less computational time. In the context of SMD-based drug design, it is important to notice that Colizzi et al.[30] and Mai et al.[24] reported that the rupture force (Fmax in the force–displacement/time profile) correlates with the experimental inhibition constant IC50 in such a way that the greater the rupture force, the higher the binding affinity. In addition, it has been shown[31] that the nonequilibrium work Wpull has a better correlation with experiment than the rupture force because it is a function of the entire process, while Fmax is computed only in a single state. Thus, both rupture force and nonequilibrium work can be used as a score function for ranking the binding affinity. Recently, it has been shown[32] that the rupture time and unbinding and binding barriers are also reliable measures for binding affinity. The SMD method becomes a promising tool in Computer-aided drug design[25] as it has been successfully applied to access the binding affinity for a large number of protein–ligand complexes, such as influenza virus, cancer target LSD1, and so forth.[33−37] Combining Jarzynski’s equality (JE)[38] and the Hummer–Szabo formalism,[39] we showed that[32] the nonequilibrium binding free energy ΔGneqJar is a good descriptor for binding affinity. Because this conclusion was obtained only for one data set, which includes 23 ligands in complex with β-lactamase protein, it remains unclear if it is valid for other systems. Therefore, one of our main goals is to check the validity of ΔGneqJar for ranking binding affinity for other complexes. It would be ideal to compare JE with exact methods such as FEP and TI. However, these methods are very laborious if we want to study a large number of protein–ligand complexes. Among MD-based methods, LIE is not only well tested for a number of systems[11−13,15,18,40−42] but also the fastest and, in terms of computational time, comparable to JE. Therefore, our second goal is to choose LIE and compare its effectiveness with JE. We conducted simulations for six sets of complexes that include 23 ligands bound to six different targets: β-lactamase, thrombin, factor Xa, HIV-1 protease (HIV), myeloid cell leukemia-1 (MCL-1), and cyclin dependent kinase 2 (CDK-2) proteins. The correlation level of ΔGneqJar with experimental data exceeds 0.80 for the six data sets, while the LIE method gives lower correlations of 0.73, 0.80, 0.42, 0.23, 0.85, and 0.01. Thus, ΔGneqJar is reliable for ranking binding affinities of all the studied systems and one can expect that this is true in all cases. However, ΔGLIEis less robust and the LIE method should be used with caution.

Materials and Methods

Materials

The atomic structures of all the studied complexes were collected from Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (https://www.rcsb.org/). The PDB identifiers of 23 ligands in complex with thrombin, factor Xa, HIV-1, MCL-1, CDK-2, proteins are shown in Tables S1–S5 in the Supporting Information. Their inhibition constant was experimentally measured using kinetic enzyme assay (see, e.g., Baum et al.[43] and references in the Supporting Information). Similar information on β-lactamase complexes is available in our previous work.[32] Note that thrombin (or factor IIa) is a serine protease that plays an important role in thrombosis and hemostasis because it converts the fibrinogen into fibril in blood clotting.[44] This enzyme occurs in the coagulation process by cleaving two peptide bonds of prothrombin (factor II)[45−48] which is synthesized in the liver.[49] The conversion of prothrombin to thrombin takes place at the penultimate stage in the cascade model of coagulation with the association of factor Xa protein.[44,46] At this stage, the platelet acts as a binding site for coagulation of the combined complexes.[50,51] The structure of thrombin includes two polypeptide chains, a light chain (A) and a heavy one (B). In humans, chains A and B have 36 and 259 residues, respectively.[52] According to the coagulation cascade, factor Xa protein is involved not only in extrinsic but also in intrinsic coagulation.[44,46] When interacting with factor Va protein, factor Xa activates the cleavage of prothrombin to thrombin; thus, thrombin will catalyze the conversion of fibrinogen to fibril. Thrombin inhibitors block the activity of thrombin, whereas factor Xa inhibitors decrease the thrombin production. The conformation of factor Xa includes two chains: a heavy chain with 303 amino acids and a light chain with 139 amino acids. Because of the structural similarities between factor Xa and thrombin protein, factor Xa inhibitors can also bind to thrombin, but this has not yet been confirmed by experiment and simulation. In addition, one factor Xa molecule is responsible for generating more than 1000 molecules of thrombins,[53,54] which makes factor Xa protein a more attractive target for the development of antithrombotic drugs than thrombin. HIV-1 protease has been known as the dimeric aspartic protease which cleaves the nascent polyproteins of HIV and plays a key role in viral replication.[55−57] HIV-1 protease inhibitors are a good example of the success of structure-based drug design in the AIDS treatment.[58,59] However, besides the high ratio of side effects, more and more HIV-1 mutants are found and linked toward HIV viral replication.[60,61] The development of new efficient HIV-1 protease inhibitors remains a challenge. MCL-1 protein, a member of the Bcl-2 family, is an important target in the treatment of human cancer.[62] Previous studies have shown that overexpression of MCL-1 can help cancerous cells escape death.[63,64] Amplification of MCL-1 has been observed not only in the lung, breast, pancreatic, and ovarian cancers but also in melanoma and leukemia.[65−71] In addition, overexpression of MCL-1 has triggered a resistance mechanism that is against a number of anticancer drugs including paclitaxel, vincristine, and gemcitabine.[72,73] The silencing of MCL-1 also potently induced the disappearance of subgroups of non-small-cell lung cancer cell lines.[74] However, no MCL-1 inhibitor has been passed into clinical trial. This problem prompts us to consider MCL-1 as a promising target for various types of cancers. CDK-2 protein is a member of 20 serine kinase proteins and plays a crucial role in cell division, transcription, and post-transcriptional modification.[75,76] CDK-2 is especially important as it can prevent cell cycle dysfunction.[77] In association with cyclin E and cyclin A, CDK-2 protein was observed in the regulation of the G1-S phase of the cell cycle. The idea of using CDK-2 inhibitors to influence cancer progression has been supported with much evidence.[78] Thus, CDK-2 is a potential drug target for anticancer therapy.

Experimental Binding Free Energy

Experimental binding free energy is related to the inhibition constant Ki by the following equation where Ki is measured in M. The Ki values, obtained in various experimental studies, are given in Tables S1–S5 for thrombin, factor Xa, HIV-1, MCL-1, and CDK-2 complexes, respectively. Ki of 23 β-lactamase complexes is available in our previous work.[32]

Steered MD Simulation

In SMD, an external force with a constant pulling speed v is applied to the ligand through a spring with a spring constant k. Then, the force acts on the ligand, F = k(Δx – vt), where Δx is the displacement of a pulled atom from its initial position. As in the typical AFM experiment, we chose k = 600 kJ/mol nm2. Similar to our previous studies,[24,32] we used the pulling speed v = 5 m/s because the rupture force[79] and work[31] depend on v, but the correlation between the experimental and simulation data on the binding affinity is not sensitive to it. SMD simulations were conducted using the GROMACS package.[80] It is well known that the rupture force is sensitive to the pulling direction,[24,81] which prompted us to determine which pulling path should be chosen. For this purpose, we followed the approach proposed earlier[24,81] that Caver software[82] was first employed to search for all possible pulling paths. Then, for each target, we selected a representative ligand to pull along these directions using SMD simulation and chose a path that corresponds to the lowest rupture force. The chosen direction was used for SMD simulation for the remaining compounds and was adapted along the z-axis for six targets, including thrombin, factor Xa, HIV-1, MCL-1, CDK-2, and β-lactamase (Figure ).

Figure 1

3D structure of typical complexes of thrombin, factor Xa, β-lactamase, HIV-1, MLC-1, and CDK-2 protein. The pulling direction along the z-axis in SMD simulation is shown. Plots were made using PyMol. A receptor–ligand complex was simulated with the CHARMM27 force field[83] and TIP3P water model.[84] A solvated box was set to ensure that the smallest distance between the protein and the boundary was large enough to prevent protein–protein self-interaction. The ligand conformation and the SwissParam server (http://www.swissparam.ch)[85] were used to create its topology. The system including the receptor–ligand complex, solvents, and counter ions was equilibrated through three steps. First, energy minimization was carried out using the steepest descent method and the criterion that convergence is achieved when the maximum force becomes less than 100 kJ/mol/nm. Then, position-restrained MD simulation was performed for 500 ps in the NVT ensemble with a velocity rescaling thermostat[86] followed by 500 ps in the NPT ensemble with a Parrinello–Rahman barostat[87] to make sure that the system was stable. Temperature and pressure, in our simulation, were set at 300 K and 1 atm, respectively. Twenty independent SMD trajectories of 600 ps were performed for each complex. We also restrained all Cα atoms of the receptor to avoid the drift of the protein under the external force. From the force–time/position profile, we collected the rupture force Fmax, a force needed for the protein–ligand dissociation. We can compute the pulling work as

Equation for Jarzynski’s Nonequilibrium Binding Free Energy ΔGneqJar

According to JE,[38] for an irreversible process, the difference between the equilibrium free energies of the two states can be calculated from the distribution of work performed for transformation between them. Extending JE to the case when the external force with a constant pulling speed is applied, Hummer and Szabo[88] showed that the time-dependent binding free energy is given by the following equationwhere ⟨...⟩ is the average over N trajectories, z is the time-dependent displacement, and W is the nonequilibrium or pulling work at time t, that is, W = Wpull(t), where Wpull is defined by eq . Equation means that we can truly extract an equilibrium quantity by assembling the external work of an infinite number of nonequilibrium processes. In this study, when the transformation is not slow enough and the number of SMD runs is finite, we can obtain only Jarzynski’s nonequilibrium binding free energy ΔGneqJar.[32] Therefore ΔGneqJar is defined by eq , but for the nonequilibrium case.

LIE Method

In LIE, the binding free energy of a ligand ΔGLIE can be expressed with two terms: polar and nonpolar contributions, ΔGLIE = ΔGpolar + ΔGnonpolar. The entropy is implicitly taken into account. In MD simulation, the polar and nonpolar energy are the electrostatic and vdW interaction energies, respectively. To express free energy in terms of potential energy, Åqvist et al.[3] adopted the linear approximation[89,90] for the electrostatic interaction, where ΔGpolar = ΔGelec ∼1/2Δ⟨Velec⟩. The exact scale factor 1/2 was subsequently changed to improve the accuracy.[40,41] The studies of Pratt–Chandler[91] and Ben-Naim–Marcus[92] also showed that we can replace ΔGnonpolar with a part of the difference in the vdW interaction energy. Therefore, an estimation of the binding free energy through a thermodynamic cycle in the LIE method is given by the following equationwhere Vl–sel and Vl–svdW denote the electrostatic and vdW interaction between the ligand and surrounding (protein, water, and counter ions), respectively. The interaction energy in the bound and free states is the energy at the time t = 0 and at the end of simulation, when the ligand is completely released from the binding site. In the bound state, the ligand is the most stable in the protein active site, whereas in the unbound state, the ligand is far from the active site and practically does not interact with the receptor. The radius of polar/nonpolar interaction is infinite, but using the cutoff parameters, protein–ligand’s association will be broken down if the pulling time is long enough for us to drive the ligand far away from the protein’s pocket than 1.4 nm. As shown below, the initial and final snapshots, obtained in SMD simulations, are sufficient for estimation of the LIE binding free energy, which is given in eq . However, to enhance structural sampling, for each SMD trajectory, initial and final snapshots were selected as initial conformations for performing short molecular dynamic simulation with 250 ps and 500 conformations were recorded for evaluating ΔGLIE.

Parameters for Estimation of ΔGLIE

To obtain ΔGLIE, in the original model[9] scale factor β = 1/2, γ = 0 and the value of α is found by fitting to experimental data or by the so-called semi-empirical method. However, one can expect that[41,42] electrostatic effects are different in aqueous and protein environments. Then, the polar scale coefficient β is different for these conditions and is denoted as βprotein and βwater. The same is true for the vdW interaction with the scale factors αprotein and αwater. Later, it was realized that the use of parameters separately for protein and water is not necessary. Instead, one can use four values of parameter β ranking from 0.35 to 0.5, which is linked with the number of charged groups of the ligand. Further studies[13,40,42,93] allowed the parametrization of α, β, and γ which are used to compute the absolute binding free energy. In this scheme, scale factors β of all complexes, which depend on the chemical structure of the ligand, are presented in Tables S6–S11. A compound that contains a charged group has β = 1/2, while for the dipolar compound without any hydroxyl groups, β = 0.43. The dipolar compound that has one hydroxyl group has β = 0.37 and β = 0.33 for the dipolar compounds with two or more hydroxyl groups. Following Ljungberg et al.,[94] we set γ = 0 (kcal/mol) and tried to find the best value of the α parameter that would provide the best correlation between modeling and experiment.

Results and Discussion

Rupture Force and Time

When the pulling spring began to move away from the anchored molecule, the force exerted on the ligand increased rapidly, as evident from the force–time and force–displacement profiles (Figures and S1). At this stage, the ligand can be considered as being restrained in a harmonic potential as the force almost linearly depends on the displacement. The appearance of the maximum force Fmax at time tmax demonstrated the rupture event after which the ligand–protein association is broken down. It was shown that the higher the rupture force Fmax, the stronger the binding affinity,[24,30,32] and this conclusion was obtained for many systems.

Figure 2

Top panel: Dependence of the force acting on the ligand on displacement (left) and time (right). Rupture force Fmax occurs at time tmax. Bottom panel: Displacement dependence of Jarzynski’s binding free energy, showing that the ligand binding/unbinding is a barrier crossing event with the unbinding barrier ΔGunbind‡ and the binding barrier ΔGbind‡. Over 23 thrombin complexes, 1SL3 has the highest Fmax of 1250.9 pN, while 3SV2 has the lowest Fmax of 754 pN (Table ). From the force–time profiles of 23 ligands complexed with factor Xa and MCL-1, we found that the rupture force ranked from 695.3 to 1199.2 pN and from 536.8 to 1408.4 pN, respectively (Tables S12 and S14). The correlation between rupture force and experimental binding free energies is R = 0.81, 0.78, and 0.80 for thrombin (Figure ), factor Xa (Figure S2), and MCL-1 (Figure S4), respectively. These correlation levels are close to R = 0.82 for the β-lactamase complexes.[32] As evident from Figures S3 and S5, a lower correlation was obtained for HIV-1 (R = 0.64) and CDK2 (R = 0.66), but, as will be shown below, the JE method remains reliable for these sets because Jarzynski’s nonequilibrium free energy highly correlates with experiment (Table ).

Table 1

SMD Results Obtained for 23 Thrombin Complexesa

no.	ligand PDB ID	complex PDB ID	ΔG_exp (kcal/mol)	F_max (pN)	W_pull (kcal/mol)	ΔG_neq^Jar (kcal/mol)	t_max (ps)	ΔG_unbind^‡ (kcal/mol)	ΔG_bind^‡ (kcal/mol)
1	170	1SL3	–16.3	1250.9 ± 135.5	146.3 ± 22.2	–74.7 ± 1.9	226.7 ± 35.5	111.2 ± 24.3	230.9 ± 40.0
2	IH2	1C4V	–14.8	996.1 ± 109.2	118.0 ± 12.6	–100.0 ± 1.8	205.0 ± 22.7	72.4 ± 15.8	190.0 ± 28.1
3	IH1	1C4U	–14.2	1235.8 ± 217.4	148.2 ± 35.3	–78.4 ± 1.9	267.8 ± 39.5	108.6 ± 37.0	256.3 ± 71.1
4	33U	2ZO3	–13.7	959.6 ± 136.5	113.0 ± 18.3	–77.6 ± 1.9	200.9 ± 27.6	66.9 ± 18.8	179.5 ± 35.8
5	177	1TA6	–12.5	901.0 ± 128.0	94.9 ± 15.8	–75.6 ± 1.8	195.5 ± 23.4	57.0 ± 17.3	151.6 ± 33.0
6	T76	1NT1	–12.2	1069.8 ± 189.7	118.6 ± 22.6	–72.2 ± 1.9	222.5 ± 41.6	84.8 ± 27.2	203.0 ± 49.7
7	BM9	1BMN	–11.6	969.4 ± 136.6	107.7 ± 18.0	–73.7 ± 1.9	203.1 ± 25.4	69.3 ± 19.2	176.7 ± 36.7
8	MID	1ETS	–11.3	859.2 ± 131.0	104.8 ± 17.2	–62.8 ± 1.9	204.3 ± 29.7	47.3 ± 15.0	151.8 ± 30.7
9	894	2JH6	–10.7	741.8 ± 108.1	77.6 ± 9.5	–55.7 ± 1.8	166.8 ± 19.1	39.4 ± 12.6	116.8 ± 20.2
10	23U	3DHK	–10.0	769.0 ± 148.2	86.5 ± 13.2	–59.7 ± 1.9	182.1 ± 21.0	42.2 ± 17.2	128.1 ± 29.9
11	64U	3DUX	–9.6	938.3 ± 119.0	100.2 ± 15.1	–66.8 ± 1.9	202.9 ± 28.8	61.2 ± 15.1	160.9 ± 28.5
12	22U	2ZC9	–9.2	736.8 ± 112.8	75.5 ± 9.0	–55.7 ± 1.9	161.5 ± 23.4	38.6 ± 13.0	113.7 ± 20.8
13	29U	2ZGX	–9.2	913.4 ± 143.9	98.6 ± 16.0	–66.7 ± 1.9	192.7 ± 30.2	61.2 ± 19.8	159.6 ± 35.0
14	895	2JH5	–9.0	808.7 ± 111.8	82.8 ± 8.9	–59.8 ± 1.9	170.6 ± 21.8	47.9 ± 12.4	130.5 ± 19.9
15	GR3	1AWH	–8.3	638.4 ± 88.3	69.5 ± 9.4	–51.6 ± 1.8	142.2 ± 21.8	29.1 ± 9.4	98.4 ± 17.0
16	00R	1D6W	–8.2	723.3 ± 158.0	84.0 ± 16.8	–62.5 ± 1.8	174.6 ± 36.5	37.2 ± 17.2	121.0 ± 31.5
17	B01	3SHC	–7.8	769.8 ± 130.1	83.0 ± 13.6	–59.3 ± 1.9	185.0 ± 24.8	40.8 ± 16.0	123.5 ± 28.0
18	P97	3SHA	–7.7	752.5 ± 126.0	80.5 ± 12.1	–53.1 ± 1.9	159.9 ± 26.3	41.6 ± 13.8	121.9 ± 24.6
19	19U	2ZFP	–7.1	879.3 ± 257.6	96.6 ± 42.6	–36.1 ± 1.9	192.9 ± 54.1	62.3 ± 40.0	158.5 ± 82.5
20	91U	3F68	–6.9	847.8 ± 164.1	92.6 ± 17.4	–47.0 ± 1.9	186.3 ± 34.4	52.1 ± 23.4	144.5 ± 39.8
21	M18	3EGK	–6.4	616.0 ± 86.3	61.3 ± 8.7	–45.4 ± 1.9	144.0 ± 13.4	26.5 ± 8.7	87.3 ± 17.1
22	99P	3P17	–6.1	692.7 ± 166.0	74.1 ± 18.1	–53.3 ± 1.6	173.0 ± 31.6	34.9 ± 18.7	108.5 ± 34.3
23	P05	3SV2	–5.8	754.4 ± 110.7	77.8 ± 12.3	–46.3 ± 1.9	174.1 ± 26.6	39.4 ± 12.6	117.0 ± 23.6

ΔGneqJar, ΔGunbind‡, and ΔGbind‡ were obtained using JE.

Figure 3

Table 2

Correlation between the Experimental Binding Free Energy ΔGexp with the Rupture Force Fmax, Pulling Work Wpull, Rupture Time tmax, Binding ΔGbind‡ and Unbinding ΔGunbind‡ Barriers, and Jarzynski’s Binding Free Energy ΔGneqJar, Obtained in the SMD Simulation for Six Complexesa

	thrombin	factor Xa	β-lactamase	HIV-1	MCL-1	CDK-2
F_max	0.81	0.78	0.82	0.64	0.80	0.66
W_pull	0.84	0.74	0.84	0.68	0.79	0.73
ln(t_max)	0.75	0.75	0.84	0.70	0.83	0.61
ΔG_unbind^‡	0.79	0.76	0.80	0.58	0.78	0.69
ΔG_bind^‡	0.81	0.77	0.83	0.63	0.77	0.68
ΔG_neq^Jar	0.86	0.83	0.89	0.80	0.83	0.81

Results for β-lactamase were taken from Truong et al and Li.[32]

Correlation between the experimental binding free energy ΔGexp and the rupture force Fmax, pulling work Wpull, rupture time tmax, binding ΔGbind‡ and unbinding ΔGunbind‡ barriers, and Jarzynski’s binding free energy ΔGneqJar, obtained in the SMD simulation for the thrombin complexes. The correlation coefficient R is also shown. ΔGneqJar, ΔGunbind‡, and ΔGbind‡ were obtained using JE. Results for β-lactamase were taken from Truong et al and Li.[32] Recently, based on the data set of 23 β-lactamase complexes, it has been found that[32] the rupture time tmax also correlates with binding affinity in such a way that the larger the tmax, the higher the binding affinity. An interesting question arises: does this conclusion remain true for other complexes? In order to answer this question, we estimated the rupture time for another five sets (Tables and S12–S15). For example, tmax varies between 144 and 267.8 ps for thrombin complexes (Table ), while for factor Xa, this range is narrower, from 168 to 249.7 ps (Table S12). The correlation level between simulation and experiment is R = 0.75 for both these data sets (Figures and S2). MCL-1 has a higher value of R = 0.81, while for CDK-2 and HIV-1, we obtained R = 0.62 and 0.69. Recall that the correlation between ln(tmax) and the experimental binding free energy is better for the β-lactamase complexes[32] with R = 0.84. Thus, for all the systems studied, the rupture time correlated with the binding affinity, and this result is reasonable because the stronger the binding, the longer it takes to extract the ligand from the binding site. More importantly, the rupture time can be used as a measure for binding affinity.

Correlation between Pulling Work and Binding Affinity

Pulling work was calculated using eq , and its time dependence is shown in Figure S1 for several systems. At the beginning, the work is almost zero because the ligand cannot move from the binding point. As the time increases, the work increases, reaching a stable value, which is called the nonequilibrium pulling work Wpull, necessary to drive the complex from a bound state to an unbound state. In the thrombin set, 1C4U has the greatest work of 148.2 kcal/mol, although its inhibition constant is ranked third (Table ). 1SL3, which holds the record for the inhibition constant, has slightly lower Wpull (146 kcal/mol), while Wpull of the second complex 1C1U is significantly lower (118 kcal/mol). In the case of factor Xa, the three strongest inhibitors, including 2P3T, 1MQ6, and 3CS7, have nonequilibrium work values of 115.4, 139.5, and 114.8 kcal/mol, respectively (Table S12). For the whole set, Wpull ranges from 73 to 139.5 kcal/mol. The nonequilibrium pulling work of HIV-1, MCL-1, and CDK-2 complexes is shown in Tables S13–S15. The correlation level between Wpull and ΔGexp is R = 0.84, 0.74, 0.68, 0.79, and 0.73 for thrombin, factor Xa, HIV-1, MCL-1, and CDK-2, respectively (Figures and S2–S5, Table ). For a set of 23 β-lactamase complexes, the corresponding correlation is 0.84.[32] The worst correlation was observed for HIV-1 (R = 0.68), but overall, for all data sets, the correlation between Wpull and ΔGexp is good. Thus, along with Fmax and tmax, Wpull can be used to rank binding affinity. In most of the cases, Wpull is a better indicator than Fmax because it has a higher correlation with experiment (Table ).

Nonequilibrium Binding and Unbinding Free Energy Barriers

The escape of a ligand from the protein binding site is the crossing over a free energy barrier. In the presence of the external force, using Hummer–Szabo eq , we can obtain the time-dependent free energy profiles (Figures and S1). The bound state at small times is separated from the unbound state at large times by the transition state (TS). The binding free energy barrier is defined as ΔGbind‡ = ΔGTS – ΔGunbound, while the unbinding barrier ΔGunbind‡ = ΔGTS – Gbound. The values of ΔGbind‡ and ΔGunbind‡ obtained in SMD simulation are shown in Tables and S12–S15 for the thrombin, factor Xa, HIV-1, MCL-1, and CDK-2, respectively. Over 23 thrombin complexes, 1SL3 has the highest ΔGunbind‡ (111.2 kcal/mol), while the lowest unbinding barrier (26.5 kcal/mol) belongs to 3EGK. For factor Xa complexes, it runs from 27.3 kcal/mol (2J94) to 101.4 kcal/mol (1MQ6). The correlation of ΔGunbind‡ with ΔGexp is pretty good with R = 0.79, 0.76, 0.58, 0.78, and 0.69 for thrombin (Figure ), factor Xa (Figure S2), HIV-1 (Figure S3), MCL-1 (Figure S4), and CDK-2 (Figure S5), respectively (see also Table ). These values are lower than R = 0.80 of β-lactamase.[32] Under nonequilibrium conditions, the binding process always passes through a barrier approximately two times higher than the unbinding barrier. This is consistent with the experimental fact that the process of ligand unbinding from a protein pocket is much faster than the binding process.[95] As expected, the nonequilibrium ΔGbind‡ also correlates well with ΔGexp with R = 0.81, 0.77, 0.63, 0.77, and 0.68 for thrombin (Figure ), factor Xa (Figure S2), HIV-1 (Figure S3), MCL-1 (Figure S4), and CDK-2 (Figure S5), respectively. Again, as in the unbinding case, the best correlation was obtained for β-lactamase[32] (R = 0.83). Although the correlation level of the nonequilibrium unbinding and binding barrier data with experiment is not as high as Jarzynski’s binding free energy (see below), we expect that a combination of these three quantities is useful for studying ligand binding.

Jarzynski’s Nonequilibrium Binding Free Energy

Using SMD simulation and eq , we obtained the time dependence of the binding free energy from each trajectory, as shown in Figure S1, for several systems. The binding free energy between the bound and unbound states is defined as ΔG at large time scales (Figure ). Averaging over all SMD trajectories, we obtained ΔGneqJar which characterizes the binding/unbinding process out of equilibrium because the pulling speed is not small and the number of trajectories is not large enough. ΔGneqJar of 23 β-lactamase complexes was obtained in the previous work,[32] showing that the correlation between ΔGneqJar and the experimental binding free energy is high (R = 0.89). This implies that the nonequilibrium binding free energy can be used to discern the relative binding affinities. Here, we want to further verify this conclusion for five more data sets. For thrombin, as shown in the experiment, 1SL3 has the highest binding affinity (Table ) but its ΔGneqJar (−74.7 kcal/mol) takes the fifth place. According to the ΔGexp rating, 1C4V is the second but has the lowest ΔGneqJar of −100 kcal/mol. 2ZFP has the highest Jarzynski’s nonequilibrium binding free energy of −36 kcal/mol. Twenty-three factor Xa complexes had the ΔGneqJar values that ranked between −96 (3CS7) and −62.1 kcal/mol (2J38) (Table S12). ΔGneqJar of HIV-, MCL-1, and CDK-2 complexes is given in Tables S13–S15. Because ΔGneqJar was obtained out of equilibrium, it is much lower that ΔGexp in all studied systems including the β-lactamase set.[32] As the pulling speed decreases, the rupture force and work decrease,[31,79] and therefore, the nonequilibrium binding free energy becomes smaller. Thus, in order to narrow the gap between simulation and experiment, we must reduce v, but this problem is left for further study. As can be seen from Figures and S2–S5, ΔGneqJar correlates with ΔGexp to a greater extent for the thrombin set (R = 0.86) compared with factor Xa (R = 0.83), HIV (R = 0.80), CDK-2 (R = 0.81), and MCL-1 (R = 0.83). However, these correlation levels are slightly lower than R = 0.89 of the β-lactamase set.[32] Interestingly, for all complexes, ΔGneqJar agrees with experiment better than other quantities (Table ).

Binding Free Energy Estimated by the LIE Method and Correlation with Experiment

A typical time dependence of the electrostatic and vdW interaction energy between the ligand and the protein is shown in Figure S6, where, for clarity, the interaction with the solvent was excluded. Obviously, the simulation time was long enough to turn off the ligand–protein interaction. To calculate ΔGLIE, we used eq with γ = 0, β given in Tables S6–S11 for the six sets and α ranged from 0 to 1. The average values of the polar and nonpolar interaction energies in the bound and unbound states for the six sets are shown in Tables and S16–S20 in the Supporting Information.

Table 3

Average Polar and Nonpolar Interaction Energy in the Bound and Free States for 23 β-Lactamase Complexesa

no.	AmpC PDB ID	ΔG_exp	⟨V^elec⟩_bound	⟨V^elec⟩_free	⟨V^vdW⟩_bound	⟨V^vdW⟩_free	ΔG_LIE α = 1
1	1XGJ	–8.2	–29.8 ± 4.9	–33.4 ± 2.5	–48 ± 5.8	–16.8 ± 3	–30 ± 11.2
2	2R9W	–7.0	–32.3 ± 5.9	–33.2 ± 2.7	–58 ± 7.3	–22.2 ± 3.1	–35.5 ± 13.2
3	1XGI	–6.7	–25.4 ± 5.3	–26.3 ± 2.7	–51.9 ± 5.4	–19.3 ± 2.6	–32.2 ± 11
4	2R9X	–6.7	–31.2 ± 4.8	–34.1 ± 2.8	–64.4 ± 4.9	–23 ± 3.2	–40.5 ± 10.6
5	4JXS	–6.5	–34 ± 5.9	–31.2 ± 3.1	–60.7 ± 5.4	–17.8 ± 3.3	–43.8 ± 11.7
6	4JXW	–6.5	–33.6 ± 6.9	–32.8 ± 2.9	–54.7 ± 6	–20.5 ± 3	–34.4 ± 12.2
7	1L2S	–6.3	–22.4 ± 2.7	–20.1 ± 2.1	–54.8 ± 3.4	–18 ± 2.4	–37.7 ± 7.5
8	4JXV	–6.2	–29.1 ± 5.4	–34.4 ± 3.1	–63.8 ± 6.9	–18.6 ± 3.4	–43.4 ± 13.1
9	2PU2	–6.1	–45.1 ± 7.3	–36.3 ± 3.2	–50.6 ± 5.7	–17.8 ± 3.4	–35.7 ± 12.6
10	4KZ4	–5.7	–25.4 ± 7	–21.6 ± 2.8	–47.1 ± 4.1	–15 ± 2.8	–33.5 ± 10.5
11	4KZA	–5.1	–32.5 ± 5.2	–24.4 ± 3	–51.1 ± 5.3	–13.2 ± 2.7	–40.9 ± 11.1
12	4KZ6	–4.3	–11.9 ± 2.8	–16 ± 2.2	–45.7 ± 4.1	–16.4 ± 2.4	–27.8 ± 8.3
13	3GRJ	–4.1	–16.1 ± 3.9	–21 ± 2.1	–33.2 ± 3.5	–10.5 ± 2.5	–21 ± 8.2
14	4KZ8	–3.8	–9.6 ± 1.9	–12.2 ± 1.6	–45.4 ± 3.6	–15.7 ± 1.9	–28.6 ± 6.9
15	4KZ3	–3.8	–25.9 ± 4.6	–25.3 ± 3	–39.2 ± 5.1	–10.3 ± 3	–29.1 ± 10.9
16	3GSG	–3.7	–14.5 ± 3	–20.4 ± 1.9	–43.5 ± 4.1	–15.4 ± 2.4	–25.9 ± 8.3
17	3GVB	–3.5	–23 ± 4.4	–18.3 ± 2.1	–41.9 ± 4.9	–14.2 ± 2.3	–29.3 ± 9.4
18	3GR2	–3.5	–16.6 ± 3	–21.7 ± 2.3	–40.9 ± 4.7	–14.9 ± 2.2	–24.1 ± 8.9
19	4KZ7	–3.4	–16.6 ± 4.2	–14.9 ± 2	–33.8 ± 3.5	–12.7 ± 2.3	–21.6 ± 8
20	2HDU	–3.2	–14.3 ± 3.5	–16.7 ± 2.1	–29.4 ± 4.4	–11.3 ± 2.1	–17.2 ± 8.6
21	3GV9	–2.9	–18.5 ± 4.7	–14.9 ± 2	–33.9 ± 3.9	–11.8 ± 2.3	–23.5 ± 8.8
22	4KZ5	–2.7	–15.9 ± 3.4	–18.3 ± 2.7	–48.1 ± 5.3	–18.2 ± 2.7	–29 ± 10.2
23	2HDR	–2.4	–18.7 ± 3.4	–21.5 ± 2.7	–25.5 ± 3.8	–6.2 ± 2.7	–18.3 ± 8.5

Absolute binding free energy ΔGLIE was obtained at an optimal value of α = 1.

Absolute binding free energy ΔGLIE was obtained at an optimal value of α = 1. The correlation between ΔGLIE and ΔGexp depends on α (Figure ). For MCL-1, β-lactamase, factor Xa, and CDK-2, the best correlations R = 0.85, 0.73, 0.42, and 0.01, respectively, were obtained at α = 1, while for thrombin and HIV-1, R = 0.80 at α = 0.16 and R = 0.23 at α = 0.43, respectively (Figure ). Thus, for all six data sets, the correlation, provided by the LIE method, is lower than JE (Table ), with the exception of MCL-1 for which the two methods gave almost the same R. In particular, R, obtained by LIE, is even lower than 0.5 for factor Xa, HIV, and CDK-2. The JE method provides a better fit with experiment than LIE because the former is based on the exact JE, while the latter accepts an approximation in which the binding free energy is a linear combination of the electrostatic and vdW interaction energies. In such an approximation, the entropy contribution was not estimated explicitly.

Figure 4

Dependence of R on parameter α for six data sets and the combined set.

Figure 5

Correlation between ΔGexp and ΔGLIE. The LIE binding free energy was computed at the α factor, shown in the figure. In all cases, we used γ = 0 and β, given in Tables S6–S11.

Table 4

Correlation Coefficient R between Experimental and Theoretical Binding Free Energies Obtained by JE and LIE Methods

data set	JE	LIE
β-lactamase	0.89	0.73
thrombin	0.86	0.80
factor Xa	0.83	0.42
HIV-1	0.80	0.23
MCL-1	0.83	0.85
CDK-2	0.81	0.01
all six sets	0.45	0.37
combined set of thrombin, factor Xa, β-lactamase, and MCL-1	0.88	0.43
combined set of HIV-1 and CDK-2	0.84	0.52
combined set of β-lactamase, thrombin, and factor Xa	0.93	0.85

Dependence of R on parameter α for six data sets and the combined set. Correlation between ΔGexp and ΔGLIE. The LIE binding free energy was computed at the α factor, shown in the figure. In all cases, we used γ = 0 and β, given in Tables S6–S11. Another disadvantage of LIE is that at α = 1, where the best correlation was achieved for β-lactamase, factor Xa, CDK-2, and MCL-1, the difference between ΔGLIE and ΔGexp is significant (Tables and S16, S19, and S20). This drawback becomes less pronounced for thrombin, where the best fit was obtained at α = 0.16 with ΔGLIE close to ΔGexp (Table S17) (a similar value of α was also chosen by other research groups[9,10]). To monitor the dependence of the difference between ΔGLIE and ΔGexp on α, we introduced the root mean square of the energy distance (RMSED) as followswhere N is the number of complexes in the data set. RMSED reaches a maximum at α = 1 and a minimum at αmin = 0.16, 0.12, 0.13, 0.35, 0.61, and 0.00 for β-lactamase, factor Xa, thrombin, HIV-1, MCL-1, and CDK-2, respectively (Figure S7 and Table S21). At αmin, we obtained R = 0.53, 0.17, 0.80, 0.23, 0.78, and 0.0 for β-lactamase, factor Xa, thrombin, HIV-1, MCL-1, and CDK-2, respectively. Therefore, the best agreement with the experiment was achieved for thrombin and MCL-1 not only in terms of high value of R but also from the point of view of the proximity of ΔGLIE to ΔGexp.

Correlation between Theory and Experiment for Combined Data Sets

As can be seen above, for the six data sets, JE gives better agreement with the experiment compared to the LIE method, in particular, for factor Xa. Combining six sets into one set of 138 complexes, we can show that the correlation level between ΔGneqJar and ΔGexp fell sharply to R = 0.45 (Figure ), which is well below the level of individual sets (Table ). For this large set, the best correlation between ΔGLIE and ΔGexp is achieved at α = 1 (Figure ) with R = 0.37 (Figure ), which is lower than that in JE case. However, in both cases, the correlation is poor, with R below 0.5.

Figure 6

Correlation between the ΔGneqJar (upper panel) and ΔGLIE (lower panel) with the experimental binding free energy for combined data sets. Straight lines refer to the linear fit for all six data sets (violet, correlation R), the combined set of thrombin, factor Xa, β-lactamase, and MCL-1 (orange, correlation R1), and the combined set of HIV-1 and CDK-2 (maroon, correlation R2). For JE, we obtained R = 0.45, R1 = 0.88, and R2 = 0.84, whereas for LIE, we obtained R = 0.37, R1 = 0.43, and R2 = 0.52. If we combined four data sets involving β-lactamase, thrombin, factor Xa, and MCL-1, then the correlation between ΔGneqJar and ΔGexp will become high (R = 0.88, Figure ), but ΔGLIE remains poorly correlated with experiment (R = 0.43). Similarly, for a kit that consists of the two remaining HIV and CDK-2 kits, the correlation with experiment is R = 0.84 and 0.52 for JE and LIE, respectively (Figure ). The best agreement between ΔGLIE and ΔGexp was obtained for a set that comprises β-lactamase, thrombin, and factor Xa complexes (R = 0.85, Figure S8). However, for this combined set, the JE method remains better than LIE, having R = 0.93 (Figure S8 and Table ). Therefore, for all four combined sets, JE gives much better agreement with experimental data than LIE (Table ). The main difference between the two methods is that ΔGneqJar is less sensitive to the size of the data set than ΔGLIE. The minimum root-mean-square deviation and correlation at αmin are shown in Table S21 for all four combinations. The largest deviation from the experiment was obtained for the set of 138 compounds (7.5 kcal/mol), while the best agreement was obtained for the set of thrombin, factor Xa, and β-lactamase (4.4 kcal/mol). The latter has the highest R = 0.85, which was achieved at αmin = 0.13 (Table S21). The remaining three combinations do not agree well with the experiment. Therefore, we advocate by comparing simulation with experiment for a given target but not for a set of several targets.

Conclusions

We studied nonequilibrium thermodynamics of small molecules in association with thrombin, factor Xa, HIV-1, MCL-1, and CDK-2 using JE and SMD. Similar to the β-lactamase case,[32] rating of ligand–protein binding affinities can be accessed by several quantities, among which Jarzynski’s nonequilibrium binding free energy is the most reliable. Because MD-based methods are the best tool for evaluating the equilibrium binding free energy, it is highly desirable to compare their ability in ranking binding affinity with the nonequilibrium JE approach. To start research in this direction, we have chosen the LIE method because it is computationally not expensive but still reliable in many cases.[11−13,15,18,40−42] Thus, the binding affinity of small molecules to six different proteins was studied using LIE. We found that in all cases, the correlation between the experimental and the simulation results is lower than that of JE, with the exception of MCL-1, where LIE (R = 0.85) is slightly better than JE (R = 0.83) (Table ). In particular, for HIV-1 and CDK-2, the LIE method gives R = 0.23 and 0.01, which are much lower than 0.80 and 0.81 obtained using JE (Table ). Because both ΔGLIE and ΔGneqJar were calculated using the data generated by the SMD simulation, the two methods consume approximately the same amount of computational time. Therefore, in terms of computation time, the two methods are equivalent, but JE has a clear advantage over LIE because it gives better agreement with experiment. From this point of view, JE in combination with SMD is a valuable tool for the computer-aided drug design. In addition, to rate the binding affinities, it is not necessary to carry out a time consuming equilibrium simulation because the nonequilibrium ΔGneqJar, which can be quickly calculated, is a good indicator for this purpose. It would be interesting to compare the effectiveness of JE in assessing binding affinity with other methods such as FEP, TI, MM-PBSA, MM-GBSA, and umbrella sampling. Our work in this direction is in progress.

83 in total

1. Calculations of antibody-antigen interactions: microscopic and semi-microscopic evaluation of the free energies of binding of phosphorylcholine analogs to McPC603.

Authors: F S Lee; Z T Chu; M B Bolger; A Warshel
Journal: Protein Eng Date: 1992-04

Review 2. Platelets and thrombin generation.

Authors: Dougald M Monroe; Maureane Hoffman; Harold R Roberts
Journal: Arterioscler Thromb Vasc Biol Date: 2002-09-01 Impact factor: 8.311

Review 3. The coagulation cascade: initiation, maintenance, and regulation.

Authors: E W Davie; K Fujikawa; W Kisiel
Journal: Biochemistry Date: 1991-10-29 Impact factor: 3.162

Review 4. Inhibitors of HIV-1 protease: a major success of structure-assisted drug design.

Authors: A Wlodawer; J Vondrasek
Journal: Annu Rev Biophys Biomol Struct Date: 1998

5. Ligand-receptor affinities computed by an adapted linear interaction model for continuum electrostatics and by protein conformational averaging.

Authors: Ariane Nunes-Alves; Guilherme Menegon Arantes
Journal: J Chem Inf Model Date: 2014-08-06 Impact factor: 4.956

6. A novel cdk2-selective inhibitor, SU9516, induces apoptosis in colon carcinoma cells.

Authors: M E Lane; B Yu; A Rice; K E Lipson; C Liang; L Sun; C Tang; G McMahon; R G Pestell; S Wadler
Journal: Cancer Res Date: 2001-08-15 Impact factor: 12.701

Review 7. Thrombin signalling and protease-activated receptors.

Authors: S R Coughlin
Journal: Nature Date: 2000-09-14 Impact factor: 49.962

8. Mcl-1 regulates survival and sensitivity to diverse apoptotic stimuli in human non-small cell lung cancer cells.

Authors: Lanxi Song; Domenico Coppola; Sandy Livingston; Doug Cress; Eric B Haura
Journal: Cancer Biol Ther Date: 2005-03-20 Impact factor: 4.742

Review 9. What is all that thrombin for?

Authors: K G Mann; K Brummel; S Butenas
Journal: J Thromb Haemost Date: 2003-07 Impact factor: 5.824

Review 10. The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities.

Authors: Samuel Genheden; Ulf Ryde
Journal: Expert Opin Drug Discov Date: 2015-04-02 Impact factor: 6.098