Literature DB >> 25937822

Hybrid Steered Molecular Dynamics Approach to Computing Absolute Binding Free Energy of Ligand-Protein Complexes: A Brute Force Approach That Is Fast and Accurate.

Abstract

Computing the free energy of binding a ligand to a protein is a difficult task of essential importance for which purpose various theoretical/computational approaches have been pursued. In this paper, we develop a hybrid steered molecular dynamics (hSMD) method capable of resolving one ligand–protein complex within a few wall-clock days with high enough accuracy to compare with the experimental data. This hSMD approach is based on the relationship between the binding affinity and the potential of mean force (PMF) in the established literature. It involves simultaneously steering n (n = 1, 2, 3, ...) centers of mass of n selected segments of the ligand using n springs of infinite stiffness. Steering the ligand from a single initial state chosen from the bound state ensemble to the corresponding dissociated state, disallowing any fluctuations of the pulling centers along the way, one can determine a 3n-dimensional PMF curve connecting the two states by sampling a small number of forward and reverse pulling paths. This PMF constitutes a large but not the sole contribution to the binding free energy. Two other contributors are (1) the partial partition function containing the equilibrium fluctuations of the ligand at the binding site and the deviation of the initial state from the PMF minimum and (2) the partial partition function containing rotation and fluctuations of the ligand around one of the pulling centers that is fixed at a position far from the protein. We implement this hSMD approach for two ligand–protein complexes whose structures were determined and whose binding affinities were measured experimentally: caprylic acid binding to bovine β-lactoglobulin and glutathione binding to Schistosoma japonicum glutathione S-transferase tyrosine 7 to phenylalanine mutant. Our computed binding affinities agree with the experimental data within a factor of 1.5. The total time of computation for these two all-atom model systems (consisting of 96K and 114K atoms, respectively) was less than one wall-clock week using 512 cores (32 Xeon E5-2680 processors).

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2015 PMID： 25937822 PMCID： PMC4411208 DOI： 10.1021/ct501162f

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Accurately computing the free energy of binding a ligand to a protein is a task of essential importance in biochemical and biophysical studies that still presents us considerable difficulty to overcome.[1−14] An effective approach in the current literature is to use the relationship[1,4,14] between the potential of mean force (PMF)[15−19] and the binding affinity. The equilibrium approaches, based on PMF or not, are not brute force in nature but require delicate choices of biasing/constraining potentials during the simulation processes. The nonequilibrium steered molecular dynamics (SMD)[20−38] approach, seemingly brute force, can be very efficient in sampling forced transition paths from the bound state to the dissociated state of the ligand but has not been used reliably for free-energy calculations with quantitative accuracy.[37] In this paper, we present a hybrid steered molecular dynamics (hSMD) approach that produces binding affinities in quantitative agreement with experimental measurements (within a factor of 1.5 in terms of the dissociation constant kD defined as the ligand concentration at which the holoprotein concentration equals the apoprotein concentration). The hSMD approach is based on the relationship between the PMF and the binding affinity in the established literature. The widely used SMD involves pulling one center of mass of one selection of the ligand atoms using a spring of finite, carefully chosen, stiffness. In contrast, the hSMD approach involves pulling n (n = 1, 2, 3, ...) centers of mass of n selected segments of the ligand (using n springs of infinite stiffness to disallow any fluctuation of the pulling centers along the way) to produce a 3n-dimensional (3n-D) PMF curve leading from the binding site on or inside the protein to the dissociated state in the bulk region far from the protein. This PMF difference between the bound state and the dissociated state gives a large (but not dominant) part of the absolute binding free energy. Another part of the hSMD approach is the equilibrium molecular dynamics (MD) sampling of the ligand’s fluctuations at the binding site that also contribute to the binding free energy. The third, final, part of the hSMD approach is SMD stretching and equilibrium MD sampling of the ligand in the dissociated state when one of the n pulling centers is fixed at a point in the bulk far from the protein. This contributes the final piece of the absolute binding free energy. We carry out applications of this hSMD approach to two ligand–protein complexes whose binding affinities were experimentally measured and whose crystal structures are available: OCA–GLB, caprylic acid (25 atoms, neutral) bound to bovine β-lactoglobulin; and GSH–SjGST(Y7F), glutathione (36 atoms, singly negatively charged) bound to Schistosoma japonicum glutathione S-transferase tyrosine 7 to phenylalanine mutant. The computing times required were approximately 62 wall-clock hours for OCA–GLB (all-atom model of 95 296 atoms) and 88 wall-clock hours for GSH–SjGST(Y7F) (all-atom model of 114 538 atoms), respectively, using 32 Xeon E5-2628 processors (512 cores) in parallel. In each of the two cases, the computed absolute binding free energy agrees well with the experimental data.

Methods

Absolute Binding Energy from the 3n-D PMF

Following the standard literature,[1,4] the binding affinity at one binding site iswhere c0 is the standard concentration. For clarity and for convenience of unit conversion, we use two different but equivalent forms: c0 = 1 M on the left-hand side and c0 = 6.02 × 10–4/Å3 on the right-hand side of eq 1. kB is the Boltzmann constant and T is the absolute temperature. The three-dimensional (3D) integrations are over the x-, y-, and z-coordinates of the ligand’s position r1 that can be chosen as the center of mass of one segment of or the whole ligand. The integral has the units of Å3 that renders the right-hand side dimensionless, as it should be. W[r1] is the 3D PMF. The subscripts “site” and “bulk” indicate that r1 is near the PMF minimum and r1 = r1∞ in the bulk region far from the protein, respectively. For a ligand whose size is not small and whose shape is not simple, the position of one segment center r1 will not be sufficient/efficient to represent its location and situation. Instead, the ligand can be better described with the positions (r1, r2, ..., r) of n centers of mass of its n chosen segments. Figure 1 shows an example of n = 3, where the positions (r1, r2, r3) of the three α-carbons of glutathione are chosen to quantify the location and situation of the ligand. These n positions fluctuate without being in any way biased/constrained during the equilibrium MD simulation of the bound state. They are steered during the SMD runs from the bound state to the dissociated state for constructing the 3n-D PMF W[r1, r2, ..., r] as a function of these positions. In the dissociated state, one of them will be fixed at r1 = r1∞ while all others (r2, ..., r) rotate and fluctuate according to the stochastic dynamics of the system without any other bias/constraint.

Figure 1

Glutathione (GSH) at the binding site. The coordinates are taken from the PDB (code 1U87). The three α-carbons of GSH are shown as yellow balls marked with their position vectors. The protein surface is shown as wire frames and the ligand is shown as licorice, all colored according to atom names. All graphics of this paper were rendered with VMD.[39] Note that in the relationship between the 3D and 3n-D PMFsthe 3(n – 1)-D integration over the (n – 1) positions (r2, ..., r) is effectively in a defined neighborhood of r1 because the ligand as one whole molecule dictates that the n centers cannot be stretched much farther from one another than its molecular size. When r1 is near the binding site, so will be (r2, ..., r). When the ligand is in the dissociated state, r1∞ needs to be so far from the protein that integration over (r2, ..., r) will be all in the region far from the protein. C is the normalization constant that will be canceled out in the following expressions. Making use of eq 2 twice in eq 1 (for the binding site and for the bulk), one has the following expression for the binding affinity. Now inserting the Boltzmann factor at a single state (r10, r20, ..., r) chosen from the bound state ensemble and the Boltzmann factor at the corresponding dissociated state (r1∞, r2∞, ..., r), the binding affinity can be expressed as three contributing factors: the partial partition function at the binding site Z, the PMF difference between two chosen states (r10, r20, ..., r) and (r1∞, r2∞, ..., r), and the partial partition function in the dissociated state Z. MathematicallyHere (r1∞, r2∞, ..., r) can be connected to (r10, r20, ..., r) via many curves in the 3n-D space, but the PMF is a function of state. Computation of the PMF difference between the two states can be achieved along a single curve passing through them both. The partial partition function Z of the bound state has the integration over all n centers and thus has the units of Å3. The partial partition function Z of the dissociated state has the integration over n – 1 centers and thus has the units of Å3. Again, the use of c0 = 6.02 × 10–4/Å3 on the right-hand side of eq 3 renders it a pure number as desired. The dissociation constant will conveniently be in the unit of M = moles per liter. The 3n-D PMF differenceis between one chosen bound state and its corresponding dissociated state. This PMF difference can be computed by means of the SMD simulations described in the latter part of this section. Note that the one chosen position of the ligand in the bound stateis the starting point for SMD runs. It is taken from the bound state ensemble of the system. It does not have to be the minimum of the PMF but any one state in its close neighborhood. Note that we take the collection of coordinate vectors, e.g., eq 8, as a single-row 1 × 3n matrix. Its transpose in the following eq 13 is a single-column 3n × 1 matrix. The one state chosen from the dissociated state ensembleis related to the SMD starting point by a large enough displacement in the 3n-D space.Here vd is the constant velocity of the SMD pulling and t is the time it takes to steer/pull the ligand from the binding site to the bulk. The reference point for the PMF, W[r1∞, r2∞, ..., r] = 0, is chosen as when the ligand is far from the protein. The absolute free energy of binding is With all this, one needs to compute three factors to determine the absolute free energy of binding. (1) The first factor is the PMF difference between one chosen bound state and its corresponding dissociated state that can be computed by running SMD simulations of pulling the ligand forward and backward along a 3n-D line connecting the two states. Note that the PMF is a function of state (a point in the 3n-D space) and the PMF difference is independent of the paths connecting the two end points. (2) The second factor is the partial partition function in the bound state that can be approximated as Gaussian in cases of strong binding or can be numerically evaluated by running equilibrium MD simulation. (3) The third factor is the partial partition function in the dissociated state that needs to be computed case by case for n = 2, 3, .... When the binding is tight, one can approximate the integral of eq 5 as Gaussian in the neighborhood of the PMF minimum. The coordinates of the minimum of a Gaussian distribution are equal to the average coordinates, of course, (⟨r1⟩, ⟨r2⟩, ..., ⟨r⟩). Carrying out the Gaussian integral, one hasHere the dimensionless quantity Δ/kBT gives a measure of how far (r10, r20, ..., r), the initial state chosen for SMD, is from the PMF minimum (⟨r1⟩, ⟨r2⟩, ..., ⟨r⟩). Det represents the determinant. Σ is the 3n × 3n matrix of the fluctuations/deviations of the pulling center coordinates δx1 = x1 – ⟨x1⟩, etc., in the bound state ensemble: Σ–1 is the inverse matrix of Σ which can be accurately evaluated by running equilibrium MD in the bound state of the ligand–protein complex. This approximation is generally valid if the ligand does not deviate much from the binding site. However, it is invalid if the binding is extremely weak or the binding site is not well localized. Then one would have to evaluate the 3n-D integral in eq 5 directly by numerical means. To decide whether the Gaussian approximation is suitable or not, one can evaluate how far from the Gaussian distribution is the distribution of the fluctuations at the binding site. Note that this evaluation does not require additional equilibrium MD runs in the bound state of the ligand–protein complex. The only computing effort needed is statistical analysis of the fluctuation data. For example, by computing the first to the fourth moments of the fluctuations and checking their relationships, one can readily determine the non-Gaussian-ness of the fluctuations. Unlike the partial partition function Z of the bound state, the computation of the partial partition function of the dissociated state, Z in eq 6, needs to be done individually for each case of n = 1, 2, .... In the following, we detail three cases: n = 1, 2, and 3.

One Center of Mass Is Steered

When one center of mass is steered, n = 1, Z1∞ = 1, all we have to do is to evaluate the fluctuations around the binding site (the 3 × 3 matrix Σ1) along with the PMF difference. In this case, we have the dissociate constant, within the Gaussian approximation of the fluctuations at the binding siteand the absolute free energy of binding the ligand to the protein It needs to be pointed out that the method of pulling one center of the ligand is only practical for small ligands of simple shapes. Furthermore, the Gaussian approximation of the ligand fluctuations at the binding site in eq 12 is inapplicable to cases of large fluctuations. For example, glycerol binds to GlpF inside the conducting channel[40] and its fluctuations inside the channel are large and non-Gaussian. In the computation of its binding affinity, some special attention needs to be paid to the integral of eq 5 instead of using the Gaussian approximation in eq 12.[41]

Two Centers of Mass Are Steered

When two centers of mass are steered, n = 2, one haswhich is an integration over the second steering center when the first steering center is fixed in a position r1∞ far from the protein. All one needs to do now is to evaluate the integral of eq 17 around the position (r1∞, r2∞). Over there, far from the protein, the ligand’s environment is spherically symmetrical around the position of the first pulling center r1∞ that is fixed while the second pulling center r2 is free to sample all space available. Therefore, the 3D integral becomes the following one-dimensional integral:where r = |r2 – r1∞| is the distance between the two pulling centers. W∞[r] here, as a function of r, is the PMF (reversible work) for stretching the ligand between the two pulling centers. It can be evaluated by conducting SMD runs of steering the second pulling center r2 to and from the first pulling center that is fixed at r1∞ along the axis passing through (r1∞, r2∞). In this, we have the absolute free energy of binding the ligand to the binding siteand the dissociation constantHere the partial partition function of the dissociated state Z2∞ is given in eq 18 and the partial partition function of the bound state is approximated below as Gaussian.where Σ2 is a 6 × 6 matrix of coordinate deviations defined in eq 14 and Δ2 is defined in eq 13 with n = 2. It is worth noting that, for protein–ligand complexes whose binding is strong, the computation of Z20 and Z2∞ can be carried out efficiently with sufficient accuracy. However, the computation of the PMF difference W[r10, r20] – W[r1∞, r2∞] may be difficult. Particularly, if the orientational entropy plays a large role, one needs to pull three or more centers of the ligand segments.

Three Centers of Mass Are Steered

When three centers of mass are steered, n = 3, we need to evaluate the integral of eq 6 around the position (r1∞, r2∞, r3∞) when the ligand is far from the protein. Over there, the ligand’s environment is spherically symmetrical around the position of the first pulling center r1∞ that is fixed while the second and the third pulling centers r2 and r3 are free to sample all space available. The PMF W[r1∞, r2, r3] is only dependent upon three of the six degrees of freedom contained in (r2, r3). Namelywhere r21 = |r2 – r1∞| and r31 = |r3 – r1∞| are the distances between the second/third pulling center and the first pulling center that is fixed at r1∞. θ is the angle between (r2 – r1∞) and (r3 – r1∞). Therefore, the six-dimensional integral of eq 6 becomes the following 3D integral: In terms of the 3D probability distribution ρ(r21, r31, θ), the integral becomeswhich can be evaluated by equilibrium sampling in the 3D space (r21, r31, θ) from MD runs with r1 being fixed at r1∞. θ∞ is the angle between (r2∞ – r1∞) and (r3∞ – r1∞), of course. If the equilibrium sampling in 3D cannot be achieved with sufficient accuracy, we can approximate the PMF as three separable terms:Correspondingly, we obtainHere γ is a factor negating the possible error introduced in eq 25. The PMF W∞[r21], as a function of the distance between the two pulling centers r21, can be evaluated by conducting SMD runs of steering the second pulling center r2 to and from the first pulling center that is fixed at r1∞ along the axis passing through (r1∞, r2∞). The PMF W∞[r31], as a function of the distance between the two pulling centers r31, can be evaluated by conducting SMD runs of steering the third pulling center r3 to and from the first pulling center that is fixed at r1∞ along the axis passing through (r1∞, r3∞). The θ-integral can be approximated by conducting equilibrium MD runs with the first pulling center fixed at r1∞ to sample angular distribution. Note that the direct sampling method in eq 24 is exact but it requires sufficient sampling in the 3D phase space (r21, r31, θ), which is not an easy task. The approximation eq 26 can be evaluated with much less sampling effort in the dissociated state, but it involves the approximation in eq 25 that is somewhat arbitrary. When the two methods produce similar results (as is the case in this work), one can be confident of the computation, of course. After all this, we have the free energy of binding the ligand to the binding site: The corresponding binding affinity is Note that, by pulling three centers of the ligand, it is easier to compute the PMF difference W[r10, r20, r30] – W[r1∞, r2∞, r3∞] as the overall orientation of the ligand is fixed along the pulling paths. Also, the computation of Z30 is not difficult for complexes of strong binding whose equilibrium fluctuations can be well approximated as Gaussian:where Σ3 is a 9 × 9 matrix of coordinate deviations defined in eq 14 and Δ3 is defined in eq 13 with n = 3. However, the computation of Z3∞ will be considerably more elaborate than Z2∞. Nevertheless, for long ligands of irregular shapes, pulling three or more centers should be the optimal method for accurately computing the binding free energy or binding affinity.

PMF from SMD Simulations

In an SMD[20] simulation of the current literature, one steers/pulls one center of mass of one selection of atoms, using a spring with a carefully chosen stiffness (spring constant). The use of a spring of finite stiffness introduces fluctuations and dissipation in the added degrees of freedom.[35] In this paper, we choose n segments (mutually exclusive n selections of atoms) of the ligand molecule for steering/pulling with n infinitely stiff springs (n = 1, 2, 3, ...). Namely, the n centers of mass of the chosen n segments will be controlled as functions of time twhile all the other degrees of freedom of the system are freely subject to stochastic dynamics. Here r = (x, y, z) is the center of mass coordinates of the ith segments. vd is the pulling velocity. The + and – signs are for the forward and reverse pulling paths, respectively. {r} denotes (r1, r2, ..., r), etc. We adopt the multisectional scheme of ref (41). The path from the bound state to the dissociated state is divided into a number of sections. Within a given section whose end states are marked as A and B, respectively, multiple forward and reverse pulling paths are sampled along which the work done to the system is recorded. The Gibbs free energy difference (namely, the PMF or the reversible work) is computed via the Brownian dynamics fluctuation–dissipation theorem (BD-FDT)[42] as follows:Here the brackets with subscript “F” and “R” represent the statistical average over the forward/reverse paths. W{ is the work done to the system along a forward path when the ligand is steered from A to r. W{ = W{ – W{ is the work for the part of a reverse path when the ligand is pulled from r to A. {r}, {r}, and {r} are the coordinates of the centers of mass of the steered segments of the ligand at the end state A, the general state r, and the end state B of the system, respectively. At each end of a section, A/B, the system is equilibrated for a time long enough to reach conditioned equilibrium while the steered centers are fixed at {r}/{r}. In this way, running SMD section by section, we map the PMF W[{r}] as a function of the steered centers along a chosen path from the ligand’s bound state to its dissociated state.

Simulation Parameters

In all the equilibrium MD and nonequilibrium SMD runs, we used the CHARMM36[43,44] force field for all intra- and intermolecular interactions. We implemented Langevin stochastic dynamics with NAMD[45] to simulate the systems at a constant temperature of 298 K and a constant pressure of 1 bar. Full electrostatics was implemented by means of particle mesh Ewald (PME) at 128 × 128 × 128. The time step was 1 fs for short-range interactions and 2 fs for long-range interactions. The PME was updated every 4 fs. The damping constant was 5/ps. Explicit solvent was represented with the TIP3P model. Selected α-carbons on α-helices and β-sheets far from the binding site are fixed to their crystal structure coordinates, fully respecting the experimentally determined ligand–protein structures. The pulling was along the z-axis at a speed of 2.5 Å/ns in all SMD runs. Namely, vd = (0, 0, 2.5 Å/ns).

Results

OCA Binding to LGB

The simulation system of the OCA–LGB complex[46] is illustrated in Figure 2, which was set up by taking the crystal structure of the protein–ligand complex from the PDB (code 3NQ9), putting it in the center of a box of water, neutralizing the system, and then salinating it with Na+ and Cl– ions to the concentration of 150 mM. Running equilibrium MD for 25 ns leads to the equilibrated system shown in Figure 2. In particular, the equilibrium structure of OCA bound to the protein is shown in Figure 2d, where the C8 (position vector r1) and C1 atoms (position vector r2) were highlighted with the green and red balloons, respectively. These two atoms were chosen as the two steering/pulling centers for the nonequilibrium SMD runs.

Figure 2

(a) The in silico system box of the OCA–LGB complex (95 296 atoms, 100 Å × 100 Å × 92 Å in dimension). Na+ and Cl– ions are represented by VDW (spheres) colored in yellow and cyan, respectively; water is represented by red and white dots; protein is represented by ribbons colored by residue types; and the ligand (not easily visible) is represented by licorices colored according to atom names. (b) Caprylic acid bound to bovine β-lactoglobulin. Caprylic acid, visible on top of the figure, is represented by licorice while protein is represented by ribbons and by CPK (ball-and-stick), both colored according to residue types. (c) Caprylic acid (in licorice colored in green) resides in the binding pocket of the protein (in surface representation colored according to atom names). (d) Caprylic acid in licorice representation colored according to atom names. The two groups (highlighted in red and green respectively) are selected for steering/pulling.

At the binding site, the two pulling centers (r1, r2) fluctuate inside the binding pocket with small amplitudes (shown in the Supporting Information, Figure S1). These fluctuations give us the following statistics at the binding site: the determinant of the deviation matrix Det(Σ2) = 2.42 × 10–1 Å12 and the deviation of the SMD initial state from the PMF minimum Δ2 = 1.68 kcal/mol. (a) The in silico system box of the OCA–LGB complex (95 296 atoms, 100 Å × 100 Å × 92 Å in dimension). Na+ and Cl– ions are represented by VDW (spheres) colored in yellow and cyan, respectively; water is represented by red and white dots; protein is represented by ribbons colored by residue types; and the ligand (not easily visible) is represented by licorices colored according to atom names. (b) Caprylic acid bound to bovine β-lactoglobulin. Caprylic acid, visible on top of the figure, is represented by licorice while protein is represented by ribbons and by CPK (ball-and-stick), both colored according to residue types. (c) Caprylic acid (in licorice colored in green) resides in the binding pocket of the protein (in surface representation colored according to atom names). (d) Caprylic acid in licorice representation colored according to atom names. The two groups (highlighted in red and green respectively) are selected for steering/pulling. Conducting multiple SMD runs starting from the initial state (r10, r20) to the final state (r1∞, r2∞) along the z-axis, we sampled four forward and four reverse pulling paths connecting the two states. The curves of work done to the system along the pulling paths are shown in the Supporting Information, Figure S2. From those work curves, we computed the PMF as a function of the z-displacement Δz shown in Figure 3a. This PMF curve gives us the PMF difference between the one chosen bound state and its corresponding dissociated state:

Figure 3

(a) PMF W[r1, r2] as a function of the ligand displacement from its binding site along the pulling path when two centers are steered away from the protein. (b) PMF W∞[r] in the dissociated state as a function of the distance r between the two steered centers (the C1 and C8 atoms). W∞[r0] = 0 where r0 = 8.40 Å is the distance between C1 and C8 atoms of OCA at the binding site.

Note that the PMF rises gradually all the way until Δz = 8 Å as the ligand is steered out of the binding pocket. After that, the PMF levels off. This behavior clearly reflects the hydrophobic nature of the deep binding pocket of LGB and that of OCA’s long hydrocarbon chain.[46] The van der Waals attraction between OCA and LGB is gradually reduced along the dissociation path, and meanwhile, the hydrophobic surfaces of LGB and OCA are increasingly exposed to water as they are steered apart from one another. These two together are responsible for binding OCA’s hydrocarbon chain inside LGB’s β-barrel, giving rise to a large part of the binding free energy. There are two other parts of the binding free energy: First, fluctuations of OCA inside the barrel as represented by Z20. Second, the rotation and fluctuations of OCA in the dissociated state represented by Z2∞. In the dissociated state, we fixed the C8 atom at r1∞ and steered the C1 atom (position vector r2) toward and away from C8, sampling four forward paths and four reverse paths (shown in the Supporting Information, Figure S2). From the work curves along those pulling paths, we obtained the PMF W∞[r] in the dissociated state as a function of the C1–C8 distance r shown in Figure 3b. Carrying out the integral of eq 18 using this PMF curve, we obtained the partial partition function in the dissociated state Z2∞ = 2.01 × 103 Å3. (a) PMF W[r1, r2] as a function of the ligand displacement from its binding site along the pulling path when two centers are steered away from the protein. (b) PMF W∞[r] in the dissociated state as a function of the distance r between the two steered centers (the C1 and C8 atoms). W∞[r0] = 0 where r0 = 8.40 Å is the distance between C1 and C8 atoms of OCA at the binding site. It is not surprising to note that the ligand conformation in the bound state represented by the distance between the two pulling centers (C1 and C8 atoms)is not optimal in the dissociated state. For a range of r values, the PMF has lower values: In the bulk region, away from the protein, the linear OCA hydrocarbon carbon chain will fluctuate more freely than in the binding pocket of LGB. These effects constitute a significant contribution to the binding free energy represented in the partial partition function Z2∞. Putting all the afore-presented results together into eq 19, we obtain the absolute binding energy of −5.7 ± 0.6 kcal/mol (corresponding to a dissociation constant of kD = 71 μM) that is in comparison with the in vitro result of −5.5 kcal/mol (kD = 92.6 μM).[47]

GSH Binding to SjGST(Y7F)

The simulation system box of the GSH–SjGST(Y7F) complex[48] was set up by taking the crystal structure of the protein–ligand complex from the PDB (code 1U87), putting it in the center of a box of water, neutralizing the system, and then salinating it with Na+ and Cl– ions to the concentration of 150 mM. Running equilibrium MD for 30 ns leads to the equilibrated system shown in Figure 4. In particular, the equilibrium structure of glutathione is shown in Figure 4d, where the three α-carbons CA1 (position vector r2), CA2 (position vector r1), and CA3 (position vector r3), were highlighted with the red, green, and blue balloons, respectively. These three atoms were chosen as the three steering/pulling centers for the nonequilibrium SMD runs.

Figure 4

At the binding site, the three pulling centers (r1, r2, r3) fluctuate inside the binding pocket with small amplitudes (shown in the Supporting Information, Figure S3). These fluctuations give us the following statistics at the binding site: The determinant of the deviation matrix Det(Σ3) = 2.53 × 10–11 Å18 and the deviation of the SMD initial state from the PMF minimum Δ3 = 5.78 kcal/mol. Conducting multiple SMD runs starting from the initial state (r10, r20, r30) to the final state (r1∞, r2∞, r3∞) along the z-axis, we sampled four forward and four reverse pulling paths connecting the two states. The curves of work done to the system along the pulling paths are shown in the Supporting Information, Figure S4. From those work curves, we computed the PMF as a function of the z-displacement (shown in Figure 5). This PMF curve gives us the PMF difference between the one chosen bound state and its corresponding dissociated state.

Figure 5

PMF W[r1, r2, r3] along the path of pulling GSH out of the binding pocket.

(a) The in silico system box (114 538 atoms, 100 Å × 100 Å × 113 Å in dimension). Na+ and Cl– ions are represented by VDW (spheres) colored in yellow and cyan respectively; water is represented by red (oxygen) and white (hydrogen) balls and sticks (CPK); protein is represented by ribbons colored according to residue types; and the ligand GSH (not easily visible) is represented by licorices colored according to atom names. (b) GSH (licorice colored according to atom names) in complex with the protein (ribbons colored according to residue types and CPK colored according to atom names). (c) GSH (licorice colored according to atom names) in the binding pocket. Here the protein surface (colored according to atom names) near or around the binding site is shown. (d) GSH in licorice with its three α-carbons bubbled in red (CA1), green (CA2), and blue (CA3) balloons. These three α-carbons are chosen as the three pulling centers. The orientation of GSH here is identical to (c). PMF W[r1, r2, r3] along the path of pulling GSH out of the binding pocket. Note that the shape of GSH is irregular. Pulling its center of mass (n = 1) would not be efficient for sampling the relevant phase space. Pulling three centers (n = 3) turned out to be effective as indicated in the PMF curve of Figure 5. The PMF rises most rapidly during the first 1.5 Å of displacement, showing effectiveness of pulling the three α-carbons of GSH simultaneously to separate its backbone from the protein. From 1.5 to 7 Å the oscillatory rise in PMF indicates separation of GSH side chains from the protein. After that, the PMF gradually rises some more and then levels off after 15 Å, indicating the ligand is in the bulk region (dissociated from the protein). Pulling three centers here is advantageous over pulling one center because the separation of the ligand from the protein, if pulled by one center, involves tight entanglement of the ligand’s fluctuations in conformation and in orientation with the fluctuations of the many residues at and near the binding site. By pulling three centers, the ligand conformation and orientation are not allowed to fluctuate. Their fluctuations are sampled in the dissociated state far from the protein. This disentanglement of two sets of fluctuations turned out to be very effective. In the dissociated state, we computed the partial partition function Z3∞ in two ways. First, we used eq 24 on the long time equilibrium fluctuations of CA1 (r2) and CA3 (r3) around CA2 (fixed at r1∞) shown in the Supporting Information, Figure S5. Second, we computed the PMFs in eq 26. Fixing CA2 at r1∞, we conducted SMD runs steering CA1 (r2) toward and away from CA2, sampling four forward and four reverse pulling paths (shown in the Supporting Information, Figure S6a). Likewise, steering CA3 (r3) toward and away from CA2, we sampled four forward and four reverse pulling paths (shown in the Supporting Information, Figure S6b). From those work curves, we extracted the PMF W∞[r21] as a function of the CA1–CA2 distance in Figure 6a and the PMF W∞[r31] as a function of the CA3–CA2 distance in Figure 6b. Meanwhile, we approximate W∞[θ] ≈ 0 and γ ≈ 1. With all these put together into eq 26, we obtained the partition function Z3∞ in the dissociated state. In two different ways described above, we obtained identical results for the partial partition function of the dissociated state, Z3∞ = 1.2 × 104 Å6.

Figure 6

(a) PMF W∞[r21] as a function of the CA1–CA2 distance in the dissociated state. (b) PMF W∞[r31] as a function of the CA3–CA2 distance in the dissociated state. CA2 is fixed at r1∞. CA1 (position vector r2) and CA3 (position vector r3) are steered to and from CA2 in (a) and (b), respectively. It is interesting to note that the PMF W∞[r31] (Figure 6b) has a deep, nearly harmonic well centered at a CA3–CA2 distance r31 ∼ |r3∞ – r1∞| = |r30 – r10| = 3.8 Å. This indicates that binding GSH to the protein does not significantly stretch CA3 from CA2. In contrast, the PMF W∞[r21] as a function of the CA1–CA2 distance (Figure 6a) behaves very differently. The PMF well is anharmonic and its minimum is lower than the PMF value at the CA1–CA2 distance |r2∞ – r1∞| = |r20 – r10| = 6.2 Å of the bound state. Binding GSH to the protein actually causes CA1 to stretch away from CA2 and reduces the fluctuations of CA1. This gives rise to a nonnegligible contribution to the binding free energy. Putting together the bound state fluctuations, the PMF difference, and the dissociated state fluctuations into eq 27, we obtain the absolute binding energy of −7.0 ± 0.9 kcal/mol (corresponding to a dissociation constant of kD = 8.2 μM) that is in comparison with the ITC result of −6.75 kcal/mol (kD = 13 μM).[49]

Discussion

In terms of a computational approach, we have developed a hybrid steered molecular dynamics approach for computing absolute binding energy from the PMF along a dissociation path. Applying this hSMD approach with high-performance parallel processing, one can achieve, within a few wall-clock days, the computation of the binding affinity of one ligand–protein complex with accuracy comparable with experimental measurements. The hSMD approach is “brute force” in the sense that one does not have to delicately devise biasing and constraining potentials during the course of simulations. Also, it does not involve sophisticated ways of removing the artifacts introduced by biasing/constraining the ligand in other PMF-based and non-PMF-based approaches. hSMD can be implemented straightforwardly by steering/pulling the n centers of mass of n chosen segments of a ligand using n infinitely stiff springs along one predetermined dissociate path, disallowing any deviations from the path. This use of a single path is correct because the PMF is a function of state. Therefore, the PMF difference between two states is independent of the paths connecting them. All other contributions, in addition to the PMF difference between the two end states of this one dissociation path, are rigorously accounted for in the partial partition functions of the bound and the dissociated states. The segments chosen to be steered, however, need to be the most stable parts of the ligand in the bound state because the coordinate integrations of these centers then can be approximated as Gaussian. The Gaussian approximation is expected to be valid for a wide range of ligand–protein complexes because there should always be at least one center of a ligand that does not deviate greatly from its bound-state coordinates but is tightly bound to the protein except for the cases of very weak binding. Another possible difficulty for the hSMD approach lies in the coordinate integration in the dissociated state when three segments’ centers of mass are steered. The approximation in eq 26 is based on the assumptions that two of the three pulling centers do not have significant contribution from the stereo collision between them and that the angle between the lines they form with the other center is approximately free to bend. When these two assumptions are not valid, the coordinate integration in eq 23 will necessitate further new schemes of approximation. In terms of biophysics, we have provided atomistic details in support of the binding mechanisms of two ligand–protein complexes elucidated in the experimental investigations.[46−49] For both OCA–GLB and GSH–SjGST(Y7F) complexes, our equilibrium MD simulations confirm the experimentally determined binding conformations. During the long time dynamics, the ligands were found to fluctuate with small deviations at the binding sites determined in the crystal structures of the ligand–protein complexes (see Figures 2b,c and 4b,c and the Supporting Information, Figures S1 and S3). Also, our computed dissociation constants agree with the experimentally measured values within a factor of 1.5. Therefore, it is expected that hSMD can be used to reliably predict binding affinities of ligand–protein complexes whose structures are available in the PDB without data for binding affinities yet and that the hSMD predictions would be validated by future experimental measurements. Finally, the agreement between the computed results in this work and the experimental data was based on the CHARMM force field, indicating its accuracy. The hSMD approach, however, is independent of which force field to use. It can be implemented with other force fields, of course, which may or may not produce similar results.

41 in total

1. Energetics of glycerol conduction through aquaglyceroporin GlpF.

Authors: Morten Ø Jensen; Sanghyun Park; Emad Tajkhorshid; Klaus Schulten
Journal: Proc Natl Acad Sci U S A Date: 2002-05-07 Impact factor: 11.205

2. Absolute free energy of binding of avidin/biotin, revisited.

Authors: Ignacio J General; Ralitsa Dragomirova; Hagai Meirovitch
Journal: J Phys Chem B Date: 2012-02-27 Impact factor: 2.991

Review 3. CHARMM: the biomolecular simulation program.

Authors: B R Brooks; C L Brooks; A D Mackerell; L Nilsson; R J Petrella; B Roux; Y Won; G Archontis; C Bartels; S Boresch; A Caflisch; L Caves; Q Cui; A R Dinner; M Feig; S Fischer; J Gao; M Hodoscek; W Im; K Kuczera; T Lazaridis; J Ma; V Ovchinnikov; E Paci; R W Pastor; C B Post; J Z Pu; M Schaefer; B Tidor; R M Venable; H L Woodcock; X Wu; W Yang; D M York; M Karplus
Journal: J Comput Chem Date: 2009-07-30 Impact factor: 3.376

4. Toward quantitative estimates of binding affinities for protein-ligand systems involving large inhibitor compounds: a steered molecular dynamics simulation route.

Authors: Paolo Nicolini; Diego Frezzato; Cristina Gellini; Marco Bizzarri; Riccardo Chelli
Journal: J Comput Chem Date: 2013-04-26 Impact factor: 3.376

5. Antigen-antibody interactions and structural flexibility of a femtomolar-affinity antibody.

Authors: Hiroaki Fukunishi; Jiro Shimada; Kenji Shiraishi
Journal: Biochemistry Date: 2012-03-13 Impact factor: 3.162

6. Exploring the free-energy landscapes of biological systems with steered molecular dynamics.

Authors: L Y Chen
Journal: Phys Chem Chem Phys Date: 2011-02-25 Impact factor: 3.676

7. Sugar transport across lactose permease probed by steered molecular dynamics.

Authors: Morten Ø Jensen; Ying Yin; Emad Tajkhorshid; Klaus Schulten
Journal: Biophys J Date: 2007-04-13 Impact factor: 4.033

8. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields.

Authors: K Vanommeslaeghe; E Hatcher; C Acharya; S Kundu; S Zhong; J Shim; E Darian; O Guvench; P Lopes; I Vorobyov; A D Mackerell
Journal: J Comput Chem Date: 2010-03 Impact factor: 3.376

Review 9. Theory of free energy and entropy in noncovalent binding.

Authors: Huan-Xiang Zhou; Michael K Gilson
Journal: Chem Rev Date: 2009-09 Impact factor: 60.622

10. Driven Metadynamics: Reconstructing Equilibrium Free Energies from Driven Adaptive-Bias Simulations.

Authors: Mahmoud Moradi; Emad Tajkhorshid
Journal: J Phys Chem Lett Date: 2013-05-17 Impact factor: 6.475

18 in total

1. Elongation affinity, activation barrier, and stability of Aβ42 oligomers/fibrils in physiological saline.

Authors: Roberto A Rodriguez; Liao Y Chen; Germán Plascencia-Villa; George Perry
Journal: Biochem Biophys Res Commun Date: 2017-04-17 Impact factor: 3.575

2. 1,3-propanediol binds deep inside the channel to inhibit water permeation through aquaporins.

Authors: Lili Yu; Roberto A Rodriguez; L Laurie Chen; Liao Y Chen; George Perry; Stanton F McHardy; Chih-Ko Yeh
Journal: Protein Sci Date: 2016-02 Impact factor: 6.725

3. Caver Web 1.0: identification of tunnels and channels in proteins and analysis of ligand transport.

Authors: Jan Stourac; Ondrej Vavra; Piia Kokkonen; Jiri Filipovic; Gaspar Pinto; Jan Brezovsky; Jiri Damborsky; David Bednar
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

4. Molecular dynamics study of human carbonic anhydrase II in complex with Zn(2+) and acetazolamide on the basis of all-atom force field simulations.

Authors: Thierry O Wambo; Liao Y Chen; Stanton F McHardy; Andrew T Tsin
Journal: Biophys Chem Date: 2016-05-18 Impact factor: 2.352

5. Affinity and path of binding xylopyranose unto E. coli xylose permease.

Authors: Thierry O Wambo; Liao Y Chen; Clyde Phelix; George Perry
Journal: Biochem Biophys Res Commun Date: 2017-10-12 Impact factor: 3.575

6. Computing the binding affinity of a ligand buried deep inside a protein with the hybrid steered molecular dynamics.

Authors: Oscar D Villarreal; Lili Yu; Roberto A Rodriguez; Liao Y Chen
Journal: Biochem Biophys Res Commun Date: 2016-12-26 Impact factor: 3.575

7. Computing Protein-Protein Association Affinity with Hybrid Steered Molecular Dynamics.

Authors: Roberto A Rodriguez; Lili Yu; Liao Y Chen
Journal: J Chem Theory Comput Date: 2015-09-08 Impact factor: 6.006

8. Molecular dynamics simulations on the effect of size and shape on the interactions between negative Au₁₈(SR)₁₄, Au₁₀₂(SR)₄₄ and Au₁₄₄(SR)₆₀ nanoparticles in physiological saline.

Authors: Oscar D Villareal; Roberto A Rodriguez; Lili Yu; Thierry O Wambo
Journal: Colloids Surf A Physicochem Eng Asp Date: 2016-08-20 Impact factor: 4.539

9. Thermodynamic Integration in 3n Dimensions without Biases or Alchemy for Protein Interactions.

Authors: Liao Y Chen
Journal: Front Phys Date: 2020-06-09

10. 1,3-Propanediol binds inside the water-conducting pore of aquaporin 4: Does this efficacious inhibitor have sufficient potency?

Authors: Lili Yu; Oscar D Villarreal; L Laurie Chen; Liao Y Chen
Journal: J Syst Integr Neurosci Date: 2016-01-23