We present a strategy for carrying out high-precision calculations of binding free energy and binding enthalpy values from molecular dynamics simulations with explicit solvent. The approach is used to calculate the thermodynamic profiles for binding of nine small molecule guests to either the cucurbit[7]uril (CB7) or β-cyclodextrin (βCD) host. For these systems, calculations using commodity hardware can yield binding free energy and binding enthalpy values with a precision of ∼0.5 kcal/mol (95% CI) in a matter of days. Crucially, the self-consistency of the approach is established by calculating the binding enthalpy directly, via end point potential energy calculations, and indirectly, via the temperature dependence of the binding free energy, i.e., by the van't Hoff equation. Excellent agreement between the direct and van't Hoff methods is demonstrated for both host-guest systems and an ion-pair model system for which particularly well-converged results are attainable. Additionally, we find that hydrogen mass repartitioning allows marked acceleration of the calculations with no discernible cost in precision or accuracy. Finally, we provide guidance for accurately assessing numerical uncertainty of the results in settings where complex correlations in the time series can pose challenges to statistical analysis. The routine nature and high precision of these binding calculations opens the possibility of including measured binding thermodynamics as target data in force field optimization so that simulations may be used to reliably interpret experimental data and guide molecular design.
We present a strategy for carrying out high-precision calculations of binding free energy and binding enthalpy values from molecular dynamics simulations with explicit solvent. The approach is used to calculate the thermodynamic profiles for binding of nine small molecule guests to either the cucurbit[7]uril (CB7) or β-cyclodextrin (βCD) host. For these systems, calculations using commodity hardware can yield binding free energy and binding enthalpy values with a precision of ∼0.5 kcal/mol (95% CI) in a matter of days. Crucially, the self-consistency of the approach is established by calculating the binding enthalpy directly, via end point potential energy calculations, and indirectly, via the temperature dependence of the binding free energy, i.e., by the van't Hoff equation. Excellent agreement between the direct and van't Hoff methods is demonstrated for both host-guest systems and an ion-pair model system for which particularly well-converged results are attainable. Additionally, we find that hydrogen mass repartitioning allows marked acceleration of the calculations with no discernible cost in precision or accuracy. Finally, we provide guidance for accurately assessing numerical uncertainty of the results in settings where complex correlations in the time series can pose challenges to statistical analysis. The routine nature and high precision of these binding calculations opens the possibility of including measured binding thermodynamics as target data in force field optimization so that simulations may be used to reliably interpret experimental data and guide molecular design.
The rational design of
high-affinity ligands, in applications like
drug discovery and supramolecular chemistry, depends on an understanding
of how interactions involving the ligand, receptor, and solvent contribute
to the overall observed affinity. Calorimetric studies, which break
down the binding affinity into enthalpic and entropic components,
have been helpful in this respect and have led to discussions of entropy–enthalpy
compensation[1−8] and the role of entropy and enthalpy in ligand design.[9−15] Molecular simulations offer additional insight, as they link atomistic
detail and binding thermodynamics, but, although decompositions of
computed binding free energies into enthalpic and entropic terms are
sometimes performed,[16−19] this practice is not common, perhaps because these terms are regarded
as being more difficult to compute. Importantly, simulations often
allow the thermodynamic components themselves to be further subdivided
to provide rich detail regarding the driving forces of the binding
interaction, as we recently demonstrated for the binding enthalpies
of a series of host–guest systems.[20]Here, we extend our previous work to describe what may be
termed
computational calorimetry, in which both a binding free energy and
a binding enthalpy are obtained self-consistently from the same set
of simulations. We focus on host–guest systems (Figure ) for which ample experimental
data are available[21−30] and extensive sampling can be achieved via GPU-enabled simulations.[31] For these systems, just a few days of computing
on commodity hardware yields numerical precision as good as or better
than the corresponding reported experimental uncertainties. This precision
is enabled, in part, by enhancements to the attach–pull–release
(APR) free energy framework, which include the addition of orientational
and conformational restraints that facilitate efficient simulations.
As the centerpiece of this study, we validate the self-consistency
of this approach by demonstrating excellent agreement between binding
enthalpies computed directly, via end-state mean potential energies,[16−20,32−34] and indirectly,
via a van’t Hoff fit of the binding free energy.[16−19,35] Furthermore, we compare the numerical
efficiency of these direct versus van’t Hoff approaches, determine
whether hydrogen mass repartitioning (HMR) simulations[36−38] can be used to speed the calculations without impacting their fidelity,
and compare the results from two widely used approaches to estimate
the statistical uncertainty of mean quantities from simulations with
correlated data sets: statistical inefficiency analysis[39] and blocking analysis.[40]
Figure 1
Structures
of the host (left) and guest (right) molecules studied
in this work. The protonation state used in this study is shown for
each guest and reflects the dominant species under experimental conditions.
Binding thermodynamic values were calculated for cucurbit[7]uril (CB7)
with all guests except Hex and for β-cyclodextrin (βCD)
with Hex.
Structures
of the host (left) and guest (right) molecules studied
in this work. The protonation state used in this study is shown for
each guest and reflects the dominant species under experimental conditions.
Binding thermodynamic values were calculated for cucurbit[7]uril (CB7)
with all guests except Hex and for β-cyclodextrin (βCD)
with Hex.Comparison of our results with
experiment reveals significant deviation.
The extensive sampling and uncertainty analysis we perform, coupled
with the relative simplicity of host guest systems, rules out sampling
error as a major contributor to the deviation. Rather, to build rational
explanations of binding, further testing and improvement of simulation
force fields will be necessary. The computational calorimetry framework
we describe will be helpful in performing such validation and development
studies.
Methods
Binding Systems
The model host–guest
systems we chose for this work, cucurbit[7]uril (CB7) and β-cyclodextrin
(βCD), exemplify key features of protein–ligand binding,
but they are far more computationally tractable due to their small
size. The two hosts differ in their conformational properties and
binding characteristics. CB7 systems feature high binding affinities[7]uril. Proc. Natl. Acad. Sci. U.
S. A.. 2002 ">24,7]uril with Bicyclo[2.2.2]octane and Adamantane
Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J. Am. Chem. Soc.. 2011 ">27,41] and thus require a rigorous approach
to deal with the steep energy landscape encountered when removing
the guest. Additionally, binding enthalpies have been previously calculated[20] for these systems using the same force field
parameters but with a different simulation box setup, thus providing
a convenient comparison to study the influence of the choice in setup,
as detailed below. Cyclodextrin host–guest systems are attractive
due to the wealth of experimental data available.[21−23] Although their
binding affinities tend to be more modest than those of CB7 systems,
their flexibility and large number of hydrogen-bond donors and acceptors
offer a potentially more realistic representation of protein–ligand
binding. A complicating issue with cyclodextrins is that many guest
molecules can bind in two orientations that do not exchange on the
microsecond time scale of the present simulations. For example, hexanoate
can bind to βCD with the carboxylate group near the primary
hydroxyl face (primary binding mode) or at the secondary hydroxyl
face (secondary binding mode). Thus, binding calculations must be
performed for both orientations and combined to yield a single value
(see Section ). Finally, we study an extremely simple model of binding, an ion-pair,
to validate our computational procedures at high precision.The following subsections describe the parametrization and system
building protocol. Each set of simulations is named according to the
binding pair; the subscript Temp indicates a temperature series used
for van’t Hoff binding enthalpy calculations, and the subscript
HMR indicates calculations using hydrogen mass repartitioning. Refer
to Figure and Table for illustrative
and technical summaries, respectively, of the simulation sets discussed
below.
Table 1
Simulation Sets Studied in This Work
simulation set
setup detailsa
values calculated
total time (ns)b
CB7-All8
I
ΔG: 8 guests at 300
K
35265
ΔH: 8 guests at 300 K
16000
CB7-B2Temp
I
ΔG: 282, 288,
294, 300, 306, 312, 318
K
111090
ΔH: 300 K
3000
CB7-B2HMR
I
ΔG: 300 K
13094
ΔH: 300 K
4000
βCD-HexTemp
I
ΔG: 280, 290, 300, 310,
320 Kc
174536
ΔH: 300 Kc
33849
βCD-HexHMR
I
ΔG: 298 Kc
16 955
ΔH: 298 Kc
16955
βCD-HexAlt
I
ΔG: 300 Kd
13 958
ΔH: 300 Kd
2000
K-ClSimple
II
ΔG: 300 K
4000
ΔH: 300 K
2000
K-ClTemp
III
ΔG: 280,
290, 300, 310, 320 K
69563
ΔH: 300 K
2312
Simulation setup types are as following:
I = Rectangular box with a 3 anchor particle restraint setup; II =
Truncated octahedron box with 1 distance restraint; III = Rectangular
box with a two anchor particle restraint setup.
Aggregate simulation time over the
entire simulation set is indicated for direct calculations of the
binding free energy and binding enthalpy. Note that the binding free
energy total time includes the binding enthalpy simulation time since
the latter can be included with the former.
Free energies and enthalpies were
determined for the guest in two orientations inside the βCD
host, requiring twice as many simulations. Additionally, the total
time listed for the ΔH includes the two free
energy calculations required to appropriately weight the contributions
from the two binding orientations.
Calculations were performed only
on the primary orientation, as these simulations served as a consistency
check.
Simulation setup types are as following:
I = Rectangular box with a 3 anchor particle restraint setup; II =
Truncated octahedron box with 1 distance restraint; III = Rectangular
box with a two anchor particle restraint setup.Aggregate simulation time over the
entire simulation set is indicated for direct calculations of the
binding free energy and binding enthalpy. Note that the binding free
energy total time includes the binding enthalpy simulation time since
the latter can be included with the former.Free energies and enthalpies were
determined for the guest in two orientations inside the βCD
host, requiring twice as many simulations. Additionally, the total
time listed for the ΔH includes the two free
energy calculations required to appropriately weight the contributions
from the two binding orientations.Calculations were performed only
on the primary orientation, as these simulations served as a consistency
check.
Cucurbituril
Host and Eight Guests
All CB7 simulation sets used the same
parameters as those described
by Fenley et al.[20] in order to maintain
consistency with that publication. Briefly, parameters for the CB7
host and the guests were assigned by the program Antechamber:[42] partial charges used the AM1-BCC method,[43,44] whereas bonded and nonbonded parameters came from GAFF.[45] All simulations (both bound and unbound) were
solvated in an orthorhombic box of 2210 TIP3P waters with dimensions
of approximately 36 × 36 × 52 Å3, with the
exception of the larger B11 guest, for which 2500 waters were used
in a box of dimensions 36 × 36 × 60 Å3.
The simulation with the MVN guest also included one molecule of protonated
Tris buffer in order to match experiment. Included in each simulation
box were three noninteracting anchor particles, which, in conjunction
with several restraints, aligned the host–guest system with
the long axis of the box. Hydrogen mass repartitioning was tested
for the association of CB7 with B2. For these CB7-B2HMR simulations, the parameter/topology file was modified with the Parmed.py
program (distributed with AMBER) to reallocate the solute atom masses
in accordance with previously described hydrogen mass repartitioning
schemes.[36−38] Solvent molecules were not modified.
β-Cyclodextrin Host and Hexanoate
Guest
Cyclodextrin parameters were taken from Cézard
et al.,[46] and hexanoate parameters used
RESP[47] partial charges from the R.E.D.
Server program[48,49] and other bonded/nonbonded parameters
from GAFF.[45] In an attempt to closely match
typical experiments, which include 10–20 mM phosphate buffer,
we included two molecules each of H2PO4– and HPO32– and neutralized
with an appropriate number of sodium ions. To our knowledge, there
are no established parameters for inorganic phosphate ions in the
AMBER community, so we began with parameters for existing bio-organic
phosphates. Partial charges for each species were determined with
a RESP fit while Lennard-Jones terms and bonded parameters were taken
from GAFF. Because we observed nonphysical aggregation of phosphate
and sodium ions in the initial simulations, we increased the Lennard-Jones
radius parameters for the phosphateoxygen atoms from their default
values by 0.29 and 0.33 Å for H2PO4– and HPO32–, respectively,
in accordance with conclusions from Steinbrecher et al.[50] This change eliminated the aggregation, but
we do not regard these parameters as optimal, and we note the caution
from Steinbrecher et al. in regards to the uncertainty of their suitability.
The entire system was solvated with 2210 TIP3P waters in an orthorhombic
box whose dimensions were approximately 36 × 36 × 52 Å3. As for the CB7 simulations, anchor particles were used to
orient the system along the long axis of the box. Calculations were
done with (βCD-HexHMR) and without (βCD-HexTemp, βCD-HexAlt) hydrogen mass repartitioning
of the solute.
Ion-Pair Test Case
As a toy system
that would allow unambiguous numerical convergence, and thus rigorously
test our computational setups, we studied the binding of two monatomic
ions in aqueous solution. We used a single K+ and Cl– ion pair, with cation parameters from Åqvist[51] and anion parameters from Dang,[52] but we modified the ion parameters to create a diatomic
system with favorable enough affinity to bind together during simulation.
In particular, although this combination is known to produce unphysical
ion pairing,[53] the binding was not sufficiently
tight to mimic a host–guest-like system, so we increased the
binding affinity by artificially changing the charges on each ion
from ±1.0 to ±1.3. This led to a fairly stable ion pair,
which persisted during simulation. The K-ClSimple simulation
set, which did not include orienting anchor particles (see Section ), was solvated
with 2000 TIP3P waters in an isotropic truncated octahedron. In contrast,
the K-ClTemp simulations, which did use orienting anchor
particles, were solvated with 1000 TIP3P waters in an orthorhombic
box with dimensions of approximately 25 × 25 × 50 Å3, where the long dimension of the box corresponds to the guest
pulling axis.
Binding Free Energy Calculations
Our general approach to free energy calculations extends the theoretical
framework described by Velez-Vega and Gilson,[54] in which the binding free energy is computed in terms of the sum
of the work, W, required to attach restraints to
the complex, pull the ligand from the binding site and then release
the attached restraints (see schematic summary in the left-hand panel
of Figure ):Here, rather
than performing a single dynamic
pulling calculation, we break the calculation into a series of independent
simulation windows. The collection of independent simulations can
still be viewed as spanning an attach–pull–release (APR)
process, although, due to their independent nature, the process could
be equally viewed as unrelease–push–unattach. Breaking
the free energy path into independent simulations has two advantages:
(1) because each simulation is independently built and equilibrated,
little or no memory effect is encountered from one point on the path
to the next; (2) the entire calculation can be distributed across
a large, heterogeneous computational cluster, in order to improve
turnaround. In the following subsections, we provide more detail on
the binding free energy calculations performed in this work, starting
with a thermodynamic integration (TI) approach for calculating the
binding free energy from the APR simulation data.
Figure 2
Depiction of the attach–pull–release
(APR) binding
free energy calculation (left) and the direct binding enthalpy calculation
(right). The data shown was obtained from the primary orientation
of the βCD-HexTemp simulation set at 300 K. Error
bars are 95% CI.
Depiction of the attach–pull–release
(APR) binding
free energy calculation (left) and the direct binding enthalpy calculation
(right). The data shown was obtained from the primary orientation
of the βCD-HexTemp simulation set at 300 K. Error
bars are 95% CI.
Thermodynamic
Integration Approach
Following the approach detailed by Velez-Vega
et al.,[54] we first define the chemical
potential of the
restrained host–guest system in the following mannerwhere β ≡ 1/RT, R is the gas constant, T is the
absolute temperature, C° is the standard concentration
(1 M or 1 particle/1660 Å3), and E(X) is the potential energy as a function of atomic
coordinates, X, defined in the reference frame of
the host. The equation includes two nonphysical summation terms that
define harmonic restraint potentials on the rotational and translational
coordinates, or pose, of the guest relative to the host, xp, and on a set of conformational coordinates internal to the
host and/or the guest, xc. The Np/Nc number of harmonic restraints
are described with force constants, kp/kc (which implicitly include a factor
of 1/2, according to the convention in AMBER), and target values, x0p/x0c, along with scaling
coefficients λp/λc, which are varied
from 0 to 1 in order to turn the restraints on and off in the course
of the free energy calculations. In addition, E(X) implicitly includes a set of energy terms that restrain
the overall translational and rotational coordinates of the host within
the lab frame, without influencing its conformational distribution.
These restraints remain in place and unchanged during the entire APR
process. Most simply, a single atom of the host may be restrained
to a lab-frame position. More generally, it may be convenient to keep
the host positioned in and aligned with the long axis of a rectangular
simulation box via restraints tied to a set of artificial particles
anchored in the lab frame (black restraints in the Figure S1A example).In the following sections, we will
refer to five host–guest states, which depend on the status
of the variable restraints: bound, attached, pulled, pulled-free,
and unbound. These states are illustrated in Figures (left panel) and S1. The bound state has the guest bound to the host but with no added
restraints. The attached state is the same, except that restraints
have been turned on. In the pulled state, the restraints are still
on, but the equilibrium value of the restraint that sets the host–guest
distance has been adjusted to keep the guest far from the host. In
the pulled-free state, the conformational restraints have been turned
off, but the pose restraints are still in force. In the unbound state,
the pose restraints are removed and the guest is considered to be
at the standard state concentration. The following sections explain
the process for moving the host–guest system through each of
these states.
Attachment Phase
This phase begins
with the system in the bound state. The bound state does not include rotational or translational restraints on the guest or conformational
restraints on either the host or the guest. Therefore, using the definition
in eq , the bound state
corresponds to λp = 0 and λc = 0.
With λp = 1 and λc = 1, the chemical
potential is defined as the attached state, as it now includes restraints
necessary to extract the guest (depicted in gray in the Figure S1B example). The force of attaching these
restraints is λ-dependent and can be found by taking the partial
derivative of μhg° with respect to λp and λc (intermediate steps given in Supporting Information).Note that the angle
brackets
indicate the mean value from the simulation at a given value of λp/λc. Also, the scaling of λp/λc should be performed simultaneously for the above
equation to be valid. To compute the work, we estimate the integral
over the force curve (Figure , left)from a series of simulations
with discrete λ steps from 0 to 1, as detailed in the Appendix.
Pulling
Phase
The geometry of
the host molecule and our restraint setup (Figure S1B) allows us to vary the target value of a single distance
restraint parameter, designated x0,1p in our general equations below,
in order to pull the guest away from the host while the other restraints
remain constant. At this point, all of the restraints are fully turned
on (λp = λc = 1), and we compute
the work of increasing x0,1p from its initial to its final value,
which is the pulled state (Figures and S1C). First, we write the mean force as the partial derivative of the
chemical potential with respect to the varying parameter, x0,1p (see intermediate steps in Supporting Information):Note that,
during the pulling
phase, usually x0,1p > ⟨x1p⟩, so the
present expression
will generally yield a positive work for pulling the guest from the
host, as expected for favorable binding. To find the work, we integrate
the force over the change in equilibrium restraint length from the
initial attached distance to a final pulled distance at which the
host and guest can be considered to be noninteracting.As in the attachment phase,
in practice the integration is estimated from simulations at discrete
steps in x0,1p.
Figure 3
Restraint scheme demonstrating the use of anchor
particles for
attach–pull–release (APR) free energy calculations.
P1–3 are anchor particles, H1–3 are host atoms, and
G1–2 are guest atoms. The ED1, EA1–2, and ET1–3 labels indicate distance, angle, and
torsion restraints, respectively, which modify only the host translational
and rotational degrees of freedom and are included implicitly in the
potential energy function, E(X),
as described in the main text. These restraints are held constant
throughout all simulation windows and do not perturb the host’s
conformational degrees of freedom. Indicated in gray with labels x1p, x2p, x3p, and xc are restraints
that are attached over a series of simulation windows and are subsequently
used to pull the guest out of the host cavity. In this example, x1p is a distance restraint, x2p and x3p are angle restraints,
and xc represents host conformational restraints
(of any harmonic form). In parentheses, the corresponding spherical
coordinate or Euler angle is indicated for each translational/rotational
pose restraint, as defined in the main text. See Figure S1 for a full illustration of the APR process.
Restraint scheme demonstrating the use of anchor
particles for
attach–pull–release (APR) free energy calculations.
P1–3 are anchor particles, H1–3 are host atoms, and
G1–2 are guest atoms. The ED1, EA1–2, and ET1–3 labels indicate distance, angle, and
torsion restraints, respectively, which modify only the host translational
and rotational degrees of freedom and are included implicitly in the
potential energy function, E(X),
as described in the main text. These restraints are held constant
throughout all simulation windows and do not perturb the host’s
conformational degrees of freedom. Indicated in gray with labels x1p, x2p, x3p, and xc are restraints
that are attached over a series of simulation windows and are subsequently
used to pull the guest out of the host cavity. In this example, x1p is a distance restraint, x2p and x3p are angle restraints,
and xc represents host conformational restraints
(of any harmonic form). In parentheses, the corresponding spherical
coordinate or Euler angle is indicated for each translational/rotational
pose restraint, as defined in the main text. See Figure S1 for a full illustration of the APR process.
Release
Phase
In the release
phase, all of the restraints added during the attachment phase must
be removed either by numerical integration or analytically. Those
restraints that perturbed the conformational distribution of the system,
either host or guest, are removed in precisely the same manner as
the attachment phase, except that now the guest is held far from the
host. In this case, only λc is varied, while the
scaling factor for the translational and rotational pose of the guest
relative to the host, λp, is held constant. The λ-dependent
force curve is integrated and multiplied by −1 to find the
work of releasing the Nc conformational
restraints.After removal of the conformational
restraints, the system is in the pulled-free state (Figure S1D). Note that, in cases where conformation-perturbing
restraints are not used, there is no distinction between the pulled
and pulled-free states.Finally, since the guest’s rotational
and translational
degrees of freedom are still restrained relative to the frame of reference
of the host, we calculate the work to release these restraints and
place the guest at standard concentration (the unbound state, depicted
in Figure S1E). This work is the difference
between the chemical potential of the pulled-free state and the sum
of the chemical potentials of the separate host and guest at standard
concentration:Here, Xh and Xg represent the coordinates of the host and
guest respectively, so that X = (Xh,Xg). The cancellation of
the energy potential term is dependent on the assumption that the
host and guest are far enough apart to be noninteracting so that E(X) = E(Xh) + E(Xg). For evaluation of the Wrelease–std term, it is helpful to reconsider the guest’s translational
degrees of freedom in the spherical coordinate system (r, θ, φ) and its orientation in terms of Euler angles
(a, b, c). Here,
we rewrite eq with
appropriate Jacobian terms and, in accordance with our example figures,
identify the two translational restraints as r and
θ and one rotational restraint as b:This equation can be evaluated semianalytically,
without the use of molecular simulations. Note that it may readily
be generalized to a calculation in which restraints are also applied
to φ, a, and/or c.
Multistate Bennett Acceptance Ratio Approach
One may use the same set of dynamics trajectories to compute Wattach, Wpull, and Wrelease–conf values by the MBAR estimator
approach.[55] (Wrelease–std is still computed semianalytically, as noted above.) This is done
by postprocessing the simulation trajectories to obtain the restraint
potential energy of each simulation frame in the restraint potential
of each simulation window. The freely available Pymbar implementation
of the MBAR estimator[55] can then be used
to determine the relative free energy along the entire free energy
path.
Multiple Non-exchanging Bound States
For some host–guest complexes, the guest can adopt multiple,
distinct orientations that do not readily convert without unbinding.
For example, an asymmetric guest may bind to β-cyclodextrin
head-in or head-out. In such cases, the overall binding free energy
can be computed from Nb separate binding
free energies of the Nb noninterconverting
configurations:Note that eq places greater weight on results of lower free energy,
so errors in the component free energies propagate nonlinearly. Therefore,
we estimate both the mean and the uncertainty of ΔGbind,all° by numerically sampling from a Gaussian distribution based on the
values of ΔGbind,° and their respective
uncertainties (see Appendix).
Binding Enthalpy Calculations
The
enthalpy calculations in this article are performed in one of two
ways, termed the direct and van’t Hoff methods,[16−19] as now detailed.
Direct Method
The binding enthalpy
is the difference between the partial molar enthalpies of the complex
and the free molecules. In the direct approach, this quantity is obtained
from differences in mean potential energy between simulations of the
bound and free states. (Note that the pressure–volume contribution
to the binding enthalpy is negligible for binding in solution.) Previously,
Fenley et al.[20] implemented this method
by a multi-box approach, which uses four separate simulations of the
free host, the free guest, the host–guest complex, and any
additional pure solvent needed to exactly balance the stoichiometry
of the bound and unbound simulations. Here, we instead use a single-box
approach, which involves simply subtracting the mean energy of the
host–guest system in the pulled-free state from that of the
bound complex (Figure , right)where U is defined as the
potential energy without any restraining potentials included. It should
be emphasized that, in these bound and pulled-free states, the restraining
potentials have no influence on the internal coordinates of the molecules.
In the pulled-free state, they only hold the host and guest far apart,
along an axis aligned with the long axis of the rectangular simulation
box. The two states considered here correspond to simulations in the
first (Figure S1A) and last (Figure S1D) simulation windows of the APR binding
free energy approach. As shown in Results,
this single-box method gives excellent agreement with the prior multi-box
approach.When Nb noninterchangeable
binding orientations exist, as for the present βCD systems,
where Nb = 2, the binding enthalpy of
the various orientations must be weighted by their respective free
energies to yield the overall binding enthalpy:Again, this combination step is carried out
using resampling from Gaussian distributions based on the mean and
SEM of each input value.
van’t Hoff Method
This approach
involves using the van’t Hoff equation or related forms to
extract the binding enthalpy and entropy from calculations of the
binding free energy (described above) at multiple temperatures. If
one makes the approximation that enthalpy and entropy are invariant
with temperature, then both values can be obtained by fitting to the
familiar integrated form of the equationvia
linear regression analysis on a plot of
ln(Keq) versus 1/T. Here,
however, we allowed for temperature dependence by explicitly including
a fitted change in the heat capacity, ΔC, using the following equation, where the
thermodynamic parameters were fitted with nonlinear curve fitting
tools:[35]Here
the ref subscripts indicate the reference
temperature for which the parameters are fitted. We used simulations
to compute ΔG(T) at either five or seven different temperatures, T, at uniform temperature intervals
and with a central temperature (i = 3 or i = 4, respectively) of 300 K; chose Tref = 300 K; and used the values of ΔG(T) at all
five or seven temperatures to fit the values of ΔH(Tref), ΔS(Tref), and ΔC(Tref) via nonlinear
fitting. It is worth remarking that this approach yielded a value
of ΔH(Tref) that
was statistically indistinguishable from that obtained by using eq as mentioned above.
On the other hand, a pure finite difference method, in which the entropy
or enthalpy is computed from just two free energy values at nearby
temperatures, results in very large uncertainties, as may be appreciated
by inspection of the data in the top panels of Figure . Thus, numerical uncertainty is substantially
decreased by using free energies computed at multiple temperatures,
as done here.
Figure 7
Demonstration of the temperature dependence of the binding free
energy (top) and the convergence of the van’t Hoff and direct
binding enthalpy calculations at 300 K (bottom). Data is shown for
the CB7-B2Temp (left) and βCD-HexTemp (right)
simulation sets. Error bars are 95% CI.
Quantifying Statistical
Uncertainty
We use numerical molecular simulations to estimate
the mean quantities
in the equations above, so the reported binding enthalpies and free
energies are estimates associated with statistical uncertainties,
in the sense that replicating the simulations with slightly different
starting conditions or random number seeds would yield somewhat different
means, given finite sampling. It is possible to estimate the uncertainty
of the mean without actually running time-consuming replicates so
long as time correlation within the simulation data is accounted for
properly.[56] We investigated two approaches.
First, the blocking method[40] iteratively
averages a time course data series (e.g., the potential energy, in
an enthalpy calculation) into successively larger blocks and calculates
the SEM of the block averages at each block size. The resulting plot
of block size versus SEM will, ideally, display a plateau feature
in the curve. The SEM value of the plateau region is an error estimate
that accounts for correlation in the data. However, it is not straightforward
to automate detection of a plateau, and, in some cases, there is no
clear plateau, as detailed in Section . Therefore, in order to automate the usage
of this method, we conservatively chose to use the maximum SEM value
obtained by the blocking analysis, even if it comes from the noisy
region of very large block sizes at the far right of the graph rather
than from a plateau region. Second, the statistical inefficiency[39] (StIn) method involves estimating the autocorrelation
function of the data series. The effective number of uncorrelated
data points is then found by dividing the total number of correlated
data points by the StIn value. The usual formula for the SEM of an
uncorrelated data series can then be used. Following what has been
presented in the literature, we couple the TI free energy calculations
with the blocking method and the MBAR calculations with the StIn method
(although any combination could be used). A comparison of these approaches
is presented in the Results. Further description
of the error propagation in each value calculated is given in the Appendix.
Restraint Setup
In order to perform
the binding free energy calculations described in this work, restraints
must be used to guide the host and guest apart along a free energy
path. In one case (K-ClSimple), we used a single distance
restraint between the ions, primarily as a reference benchmark for
more complicated but efficient restraint schemes that control the
position and orientation of the pulling axis. We discuss the restraint
setup details for each simulation set (Table ) in the following text.
Orientation
with Anchor Particles
Many of the simulations we present
include 2 or 3 noninteracting
anchor particles, which are positionally restrained to fixed Cartesian
coordinates. The anchor particles have zero charge, zero Lennard-Jones
radius and well-depth, and a mass of 220 Da. To these anchor particles
we attach harmonic distance, angle, and torsion restraints that position
and orient the host and guest. Figures , S1, and S2 diagram a typical anchor particle restraint setup for a
host–guest system and an ion-pair toy model. For the ion-pair
simulations, we use two restraints per ion: a distance restraint and
an angle restraint set to 180°. This combination effectively
locks all three translational degrees of freedom. Six restraints are
required to control the three translational and three rotational degrees
of freedom of the polyatomic hosts. These include one distance restraint,
two angle restraints, and three torsional restraints; similar restraint
sequences have been described previously.[57] It is critical to note that, in order for the binding free energy
results to be valid, these six restraints must be designed so that
they do not perturb the conformational distribution of the host molecule;
see, e.g., Figure S1A. These host restraints
are present during the entire APR process, from the beginning to the
end; thus, they are not imposed during the attach phase or detached
during the release phase. In all cases with anchor particles, except
the βCD-HexHMR simulation set, the host translational
and rotational restraints had the following force constants: distance
restraints used a 10.0 kcal/mol·Å2 force constant,
whereas angle and torsional restraints used 100 kcal/mol·rad2. A weaker distance restraint force constant of 5.0 kcal/mol·Å2 was tested for the βCD-HexHMR simulation
set, whereas the other force constants were identical, but no difference
was observed. The target equilibrium values of these host restraints
were set equal to the initial values of the input structure, which
was manually oriented in UCSF Chimera.[58]
Host Conformational Restraints
In all simulations involving CB7 and βCD, restraints were added
during the attachment phase to adjust and restrain the host conformation
in a manner that would facilitate sampling during the pulling phase.
This approach is conceptually similar to the confine-and-release method
that has been described for alchemical free energy calculations of
protein–ligand complexes.[59] The
CB7 host, which is a very stiff molecule with narrow entry portals,
poses a high energetic barrier to removing guests from its cavity.
We used 14 distance restraints, seven spanning each of the cavity
portals, to enlarge the cavity opening and reduce the energy barrier
for guest removal (Figure S3). The work
of adding these restraints is relatively easy to converge, and opening
the portal in this manner avoids sampling problems that would otherwise
occur during the pulling phase, as the guest pops like a cork from
the restrictive portal.[54,60] In contrast to the
CB7 molecule, βCD is very flexible and can get dragged into
distorted conformational substates for many nanoseconds as it clings
to the guest during the pulling-phase simulations; the associated
fluctuations and conformational trapping can make convergence difficult.
To alleviate this problem, 14 torsional restraints (two per glucopyranoside
monomer) were added to the β-cyclodextrin backbone during the
attachment phase of the free energy calculation. The effect of these
restraints is to maintain the symmetrical, canonical shape of the
βCD molecule. The work of adding these restraints is easy to
converge, as the host’s conformation does not fluctuate much
in the presence of the bound guest. Overall, the present restraint
schemes significantly improve the reproducibility of the calculated
binding free energies for the systems we have tested.Additionally,
the work of releasing such conformational restraints must be included
in the release phase, with the guest now far from the binding site.
We accomplished this by performing the exact same simulations as the
attachment phase except the guest is either held at a constant remote
distance (in the case of the CB7-B2Temp, CB7-B2HMR, and βCD-HexAlt simulation sets) or removed entirely
from the simulation (all other CB7 and βCD simulation sets).
In the latter cases, we added a single simulation window of the guest
in the pulled-free state with host conformational restraints released
(Figure S1D) in order to compute the direct
enthalpy.
Guest Restraints
Fewer restraints
were required to orient the guest molecules on the pulling axis. Control
of the necessary degrees of freedom was achieved with just two or
three restraints for the ion-pair and host–guest systems, respectively,
due to convenient properties of 180° angle restraints. We note
here that it was necessary to include constant flatwell restraints
during the attachment phase for the weaker affinity systems in order
to prevent occasional dissociation of the bound complex. An arbitrary
choice must be made about where to place the boundary of the bound
state in these cases.The details of the guest restraints and
chosen number of windows for each calculation are given in the Supporting Information. Some of the details vary
from one simulation set to another as we optimized our protocol during
the course of this investigation. However, none of the changes were
made due to concerns about the quality of a previously collected simulation
set; rather, we optimized our overly conservative initial approach
for the sake of greater computational efficiency.
Molecular Dynamics Simulation Details
All simulations
were performed using the pmemd, pmemd.MPI, and pmemd.cuda
programs from AMBER 14.[61] Production simulations
were run in the NPT ensemble, with temperature control using a Langevin
thermostat[62] with collision frequency 1.0
ps–1 and pressure control provided by the Monte
Carlo barostat[63] for all simulation sets
except the CB7-B2Temp, which used the Berendsen barostat[64] to check consistency with previous work[20] (note that we have not observed differences
between the two barostats). Direct space nonbonded interactions were
truncated with a 9.0 Å cutoff, whereas long-range electrostatics
were handled with the PME method[65,66] using default
AMBER settings. SHAKE constraints[67,68] were applied
to bonds involving hydrogen, and the simulation time step was set
to 2 fs, with the exception of HMR simulations, which used a 4 fs
time step. Anchor particles, when present, were maintained at the
desired coordinates using 100 kcal/mol·Å2 positional
restraints, except for the CB7-All8 and βCD-HexHMR sets, which used 50 kcal/mol·Å2. The equilibration
process consisted of the following steps: (1) minimization of the
initial system with positional restraints maintained on the anchor
particles, host, and guest, (2) 50 ps of NPT simulation at 50 K with
the same positional restraints as the minimization, (3) 1.0 ns of
NPT simulation with smooth heating from 50 K to the desired final
temperature without positional restraints on the host or guest, and
(4) 2.0 ns of NPT equilibration at the final temperature without positional
restraints. Two strategies were employed to determine the length of
the production simulations. For the CB7-B2Temp set, we
chose a fixed simulation length for each window involved in the APR
path. For all other simulation sets, we chose a variable simulation
length for each window that depended on either reaching a specified
error value in the forces or reaching a maximum simulation time, whichever
came first. The simulation windows for the bound and pulled-free states
were then extended to at least 1 μs when the direct enthalpy
calculation was required. For the HMR simulations, which have previously
been reported,[36−38] AMBER’s parmed.py program was used to edit
the standard parameter/topology file in order to increase hydrogen
masses to 3.024 Da and reduce the mass of neighboring heavy atoms
accordingly. The water molecules were not altered.
Results
Computational Calorimetry
Computational
calorimetry simulations were performed on nine host–guest systems
for which experimental data are available (Table ), using the attach–pull–release
(APR) method for the free energy and the direct, single-box method
for the binding enthalpy. The calculated binding thermodynamics display
a strong tendency to overestimate the measured favorability of the
binding free energy, the binding enthalpy, and the binding entropy
(Table and Figure ). The RMSD values
of these thermodynamic quantities, relative to experiment, are all
greater than 4 kcal/mol, with the largest RMSD, ∼8 kcal/mol,
observed for the binding free energies due to overestimation of both
the enthalpic and entropic contributions. On the other hand, the computed
binding free energy and enthalpy correlate well with experiment, with R2 values of 0.92 ± 0.09 and 0.87 ±
0.17, which suggests that there may be a systematic cause for the
large absolute deviations from experiment.
Table 2
Comparison of Experimental and Calculated
Binding Dataa
ΔGbind
ΔHbind
–TΔSbind
simulation set
host, guest
experiment
calculated
experiment
calculated
experiment
calculated
CB7-All8
CB7,
A1
–14.1 ± 0.2
–23.74 ± 0.29
–19.0 ± 0.5
–24.89 ± 0.43
4.9 ± 0.5
1.15 ± 0.52
CB7-All8
CB7, A2
–19.4 ± 0.1
–27.41 ± 0.27
–19.3 ± 0.5
–22.64 ± 0.60
–0.1 ± 0.6
–4.78 ± 0.66
CB7-All8
CB7, B2
–13.4 ± 0.1
–21.25 ± 0.22
–15.8 ± 0.2
–21.59 ± 0.41
2.4 ± 0.2
0.34 ± 0.47
CB7-All8
CB7, B5
–19.5 ± 0.2
–27.12 ± 0.36
–15.6 ± 0.5
–18.27 ± 0.48
–3.9 ± 0.6
–8.86 ± 0.60
CB7-All8
CB7, B11
–20.6 ± 0.5
–30.41 ± 0.81
–16.3 ± 0.5
–17.71 ± 0.48
–4.3 ± 0.6
–12.70 ± 0.94
CB7-All8
CB7, G8
–6.12 ± 0.12
–13.89 ± 0.29
–8.5 ± 0.6
–6.26 ± 0.48
2.4 ± 0.6
–7.63 ± 0.56
CB7-All8
CB7, G9
–7.43 ± 0.14
–18.49 ± 0.59
–3.8 ± 0.2
–11.68 ± 0.54
–3.6 ± 0.3
–6.81 ± 0.80
CB7-All8
CB7, MVN
–7.08 ± 0.14
–11.07 ± 0.68
–3.2 ± 0.2
–2.34 ± 0.53
–3.9 ± 0.1
–8.73 ± 0.86
βCD-HexTemp
βCD,
Hex
–2.28 ± 0.07
–3.86 ± 0.17
1.31 ± 0.08
–1.13 ± 0.66
–3.59 ± 0.03
–2.73 ± 0.68
RMSD
7.95
± 1.50
4.19 ± 1.55
5.40 ± 1.92
MSE
–7.48
± 1.84
–2.92 ±
2.02
–4.56
±
2.00
MUE
7.48
± 1.84
3.61 ± 1.47
4.75 ± 1.80
R2
0.92
± 0.09
0.87 ± 0.17
0.51 ± 0.49
Experiments were
all ITC measurements.
A1, A2, B2, B5, and B11[27] were at 298.15
K, G8 and G9[41] were at 300 K, and MVN[24] was at 298 K. Data for MVN is an averaged obtained
from the published value and two additional measurements via personal
communication. All simulations were performed at 300 K. The calculated
binding enthalpy was via the direct method. Calculated −TΔS was determined by subtracting
the binding enthalpy from the binding free energy, and the uncertainty
was determined by addition of the respective errors in quadrature.
Uncertainty values for the thermodynamic data given as 95% CI calculated
with the TI/blocking approach. The uncertainty in the error statistics
are also 95% CI, determined by resampling the data with replacement.
Figure 4
Comparison of the experimental
and calculated binding values for
the systems in Table . Error bars are 95% CI.
Comparison of the experimental
and calculated binding values for
the systems in Table . Error bars are 95% CI.Experiments were
all ITC measurements.
A1, A2, B2, B5, and B11[27] were at 298.15
K, G8 and G9[41] were at 300 K, and MVN[24] was at 298 K. Data for MVN is an averaged obtained
from the published value and two additional measurements via personal
communication. All simulations were performed at 300 K. The calculated
binding enthalpy was via the direct method. Calculated −TΔS was determined by subtracting
the binding enthalpy from the binding free energy, and the uncertainty
was determined by addition of the respective errors in quadrature.
Uncertainty values for the thermodynamic data given as 95% CI calculated
with the TI/blocking approach. The uncertainty in the error statistics
are also 95% CI, determined by resampling the data with replacement.The statistical uncertainties
of both the computed and the measured
thermodynamic quantities are on the order of 0.5 kcal/mol (Table ). As a consequence,
the deviations of the calculations from experiment cannot be attributed
to inadequate sampling during the molecular dynamics simulations.
This view is supported by the highly stable convergence graphs shown
in Figure ; indeed,
the binding free energies and enthalpies obtained with 100% of the
simulation data in all cases remain within the confidence intervals
obtained with only 10% of the data. Finally, inspection of the simulation
trajectories suggests that good conformational sampling has been achieved.
This is demonstrated in Figure , where we overlay the time-averaged structure for the bound
state of every host–guest system reported in Table . Due to averaging of the guest’s
orientation within the host, the extensive sampling results in a stick-like
appearance of the guests inside the host; the averaging also gives
the host a regular, symmetric appearance. In the case of CB7 with
the aromatic guests, this differs substantially from the elliptical
distortion characteristic of a typical instantaneous bound conformation.
Likewise, the time-averaged βCD structure displays a uniform
conformation for each glucopyranoside monomer, which contrasts with
a typical instantaneous binding pose in which each monomer flexibly
adopts its own conformation, producing an asymmetrical appearance.
The thorough convergence and small statistical uncertainties of these
results highlight the feasibility of performing numerically precise
computational calorimetry studies on host–guest systems.
Figure 5
Convergence
plot of the binding free energy and binding enthalpy
for the systems in Table . The guests are indicated in the legend at right. CB7 was
the host molecule for all guests except Hex, which bound to βCD.
Convergence is indicated by percentage of total data due to the varying
trajectory length of the simulation windows considered in the ΔGbind or ΔHbind calculations. Error bars are 95% CI.
Figure 6
Time-average structures for the bound state of the host–guest
systems presented in Table . (Left) Superposition of each CB7-All8 host–guest
system. (Right) Superposition of the βCD-HexTemp host–guest
system at 300 K in both the primary and secondary orientations.
Convergence
plot of the binding free energy and binding enthalpy
for the systems in Table . The guests are indicated in the legend at right. CB7 was
the host molecule for all guests except Hex, which bound to βCD.
Convergence is indicated by percentage of total data due to the varying
trajectory length of the simulation windows considered in the ΔGbind or ΔHbind calculations. Error bars are 95% CI.Time-average structures for the bound state of the host–guest
systems presented in Table . (Left) Superposition of each CB7-All8 host–guest
system. (Right) Superposition of the βCD-HexTemp host–guest
system at 300 K in both the primary and secondary orientations.One potential concern with the
present single-box direct approach
to computing binding enthalpies is that the guest in its pulled-free
state might not really be far enough from the host to be considered
noninteracting. We addressed this issue by comparing the present results
with those from our prior multi-box direct enthalpy calculations,[20] in which the noninteracting state is represented
by two separate simulations, one for the host and one for the guest,
so that host–guest interaction is eliminated in the unbound
state. Figure S4 confirms that the present
single-box results are indistinguishable from those of the multi-box
approach and thus supports our choice of separation distance as being
adequate. Additionally, to confirm that the calculated thermodynamic
values did not depend on our specific choice of atoms for the restraint
scheme, we performed an additional set of calculations, labeled βCD-HexAlt, in which an alternate set of atoms was chosen as the restraint
points. The results showed excellent agreement with the βCD-HexTemp values (see Table S2).
Direct and van’t Hoff Binding Enthalpy
The binding
enthalpy can be computed directly, i.e., from potential
energies, or via the van’t Hoff equation, which exploits the
temperature dependence of the binding free energy.[35] Since we can now calculate both binding free energies and
direct binding enthalpies at high precision, we have the opportunity
to test for self-consistency by comparing binding enthalpies obtained
directly with those obtained from binding free energy calculations
at several temperatures. We explored this in the CB7-B2Temp and βCD-HexTemp simulation sets, wherein binding
free energies were calculated at seven and five temperatures, respectively,
centered at 300 K (Figure , top). The consistency between the direct
and van’t Hoff binding enthalpy results, shown in Table , supports the validity
of both the binding free energies and the direct enthalpies. It is
also of interest that convergence plots of the binding enthalpies
from both approaches (Figure , bottom) show that the van’t Hoff approach is much
more sensitive to extending the simulation time and hence is significantly
more difficult to converge than the direct approach. Also, for both
systems tested, the final statistical uncertainty is much larger for
the van’t Hoff approach (Table ) and comes at a much higher computational cost in
terms of simulation time (Tables S1 and S2).
Table 3
Thermodynamic Binding Values for Validation
and Internal Consistencya
ΔGbind at
300 K
direct
ΔHbind at 300 K
van’t Hoff ΔHbind at 300 K
simulation set
host, guest
TI/Block
MBAR/StIn
Block
StIn
TI/Block
MBAR/StIn
CB7-All8
CB7, B2
–21.25 ± 0.22
–21.22 ± 0.26
–21.59 ± 0.41
–21.59 ± 0.36
CB7-B2Temp
CB7, B2
–21.32 ± 0.13
–21.36 ± 0.07
–21.65 ± 0.47
–21.65 ± 0.28
–20.78 ± 1.09
–20.66 ± 0.67
CB7-B2HMR
CB7, B2
–21.27 ± 0.41
–21.33 ± 0.12
–21.78 ± 0.44
–21.78 ± 0.31
βCD-HexTemp
βCD,
Hex
–3.86 ± 0.17
–3.86 ± 0.09
–1.13 ± 0.66
–1.11 ± 0.41
–0.81 ± 1.72
–0.38 ± 0.89
βCD-HexHMR
βCD, Hex
–3.98 ± 0.21
–3.92 ± 0.09
–0.27 ± 0.64
–0.26 ± 0.39
K-ClTemp
K, Cl
–1.94 ± 0.06
–1.94 ± 0.04
3.68 ± 0.28
3.68 ± 0.23
3.35 ± 0.55
3.58 ± 0.42
K-ClSimple
K, Cl
–1.92 ± 0.14
–1.86 ± 0.10
3.45 ± 0.39
3.45 ± 0.35
Uncertainty values are 95% CI.
Demonstration of the temperature dependence of the binding free
energy (top) and the convergence of the van’t Hoff and direct
binding enthalpy calculations at 300 K (bottom). Data is shown for
the CB7-B2Temp (left) and βCD-HexTemp (right)
simulation sets. Error bars are 95% CI.Uncertainty values are 95% CI.Careful inspection of Table shows that the uncertainty values produced
by the blocking
method are consistently larger than those from the StIn method. Although
our automated implementation of the blocking method is expected to
produce slightly larger values than an investigator-performed analysis
in which the SEM would be assigned based on a plateau in the SEM as
a function of block size, the large difference between the two uncertainty
estimates was surprising. This difference could lead to alternate
interpretations of the data. For example, when comparing the direct
and van’t Hoff binding enthalpies of CB7-B2Temp,
a two-sample t-test for equal means (α = 0.05)
would not indicate a significant difference based on the blocking
uncertainty values, but it would indicate a significant difference
when using the StIn uncertainties. This concern led us to develop
a toy model that would allow an even more precise validation of our
computational calorimetry methods (see Section ) and a deeper investigation into the uncertainty
estimation procedures (see Section ).
Modified Ion-Pair Test
Case
Although
the agreement between the direct and van’t Hoff binding enthalpies
for the two host–guest systems was strongly encouraging, we
wanted to test the comparison to even finer precision in order to
detect any potential subtle pathologies. Using a modified K-Cl ion
pair, which was tuned to exhibit favorable binding during simulation,
we performed several further simulation studies to validate the present
approach, as follows.First, we established that the use of
noninteracting anchor particles to align the pulling coordinate did
not influence the results. The K-ClSimple simulation set,
which used a single distance restraint in a large water box, produced
binding values statistically indistinguishable from the anchor particle
aligned K-ClTemp simulation set (see Table for results, and see Figure S2 for the anchor particle approach with ions). Additionally,
comparison between the K-ClSimple and K-ClTemp results suggests that use of a smaller rectangular simulation box
aligned to the pulling axis does not introduce error into the binding
results (Table ).
The binding free energies and direct enthalpies showed excellent agreement
to high precision, with confidence intervals of less than 0.15 and
0.40 kcal/mol, respectively. Our modified K-Cl system could be considered
particularly challenging due to the rather large charges that we placed
on the ions (±1.3). Thus, it is reassuring that the optimized
rectangular box (with alignment via anchor particles) did not incorrectly
treat the electrostatics of the system.Finally, we also observed
excellent agreement between the direct
and van’t Hoff binding enthalpy for this simple test case (Table ). The small uncertainties
in the van’t Hoff binding enthalpies, relative to those for
the full host–guest systems, can be attributed to the better
behaved temperature dependence of the binding free energy (Figure S5, top), and the convergence behavior
of both binding enthalpy methods is correspondingly smooth (Figure S5, bottom).Interestingly, we noticed
that, although the MBAR/StIn uncertainties
are consistently smaller than the TI/blocking uncertainties for the
K-Cl data, the ratio between the two values is smaller than the ratio
observed for the host–guest values (Table ). Although a slight overestimation of the
error is expected by our implementation of the blocking method, the
variability of the ratio between the two uncertainty estimates suggests
the difference might be influenced by the nature of the system being
simulated. Here, the simplicity of the K-Cl systems possibly minimizes
the apparent difference in the uncertainty.Taken together,
the results of the K-Cl simulations strongly suggest
that the theory and implementation of the APR computational calorimetry
framework is valid and numerically sound. As a consequence, it seems
likely that the modest deviations between the direct and van’t
Hoff binding enthalpies for the host–guest systems arise from
statistical, rather than methodological, error. Curiously, however,
there is still a systematic tendency for the blocking estimate of
error to exceed the StIn estimate. We investigate this further in
a subsequent section (see Section ).
Accelerating Computational
Calorimetry with
Hydrogen Mass Repartitioning
The HMR technique is appealing
because it allows one to double the simulation time for a given elapsed
wall-clock time, by going from a 2 fs to a 4 fs MD time step without
inducing numerical instabilities.[36] It
should be noted that, because we are working within the classical
approximation of statistical thermodynamics, the quantities that we
aim to compute are independent of the atomic masses. Therefore, any
error introduced by HMR would derive from changes in the numerical
properties of the simulations. In particular, increasing the time
step can increase error by increasing the mean spatial excursion the
atoms make in one time step; conversely, increasing the masses of
the fastest atoms, i.e., the hydrogens, reduces their mean velocities
and hence their mean excursions, for a given temperature, and thus
compensates, at least partially, for the increased time step. The
net effect could be to influence the values that the computed quantities
approach, in the limit of a long simulation. In addition, changing
the masses and time step influences the dynamics of the simulations[38] and hence can influence the rate of convergence.
The net effect of HMR on accuracy and precision is hard to predict.Therefore, we carried out an empirical evaluation of the feasibility
of speeding thermodynamic calculations, while maintaining reliable
results, by performing several binding free energy and binding enthalpy
calculations with HMR and a 4 fs time step. The results for the CB7-B2HMR and βCD-HexHMR simulations sets, given
in Table , indicate
generally excellent agreement between the standard approach and HMR.
(We did not carry out van’t Hoff binding enthalpy measurements,
given the excellent agreement between HMR and the traditional approach
for the free energy calculations at 300 K.) The one potential outlier
is the comparison of the direct binding enthalpy for βCD-Hex.
Although a t-test (α = 0.05) would not indicate
a significant difference between the binding enthalpy means for βCD-HexTemp and βCD-HexHMR when using blocking uncertainties,
the StIn uncertainties are smaller and thus indicate that the difference
in the means is potentially significant. In the following section,
we investigate this difference further, show that the difference is
due to random statistical chance, and offer evidence that the StIn
uncertainty estimate can sometimes be deceptively small.As
noted by others,[36] the speedup from
HMR does not always double the effective sampling rate due to alterations
in the viscosity of the system. Possible evidence of this effect can
be found in our data. For example, with a traditional 2 fs time step,
the blocking uncertainty values in the βCD-HexTemp direct binding enthalpy for the primary and secondary guest orientations
are 0.56 and 0.52 kcal/mol, respectively (Table S2). With HMR and the same simulation time (ns), and double
the number of data points due to reducing the interval between data
writing by one-half, the blocking uncertainty increases to 0.60 and
0.64 kcal/mol. (Note, however, that the real time expenditure of GPU
hours for HMR was roughly 60% of the traditional approach.) This suggests
that HMR may influence the effective sampling rate to some degree,
but the result could be due to random chance and thus requires further
study to make a definitive conclusion. The same trend is not always
evident for similar comparisons elsewhere in our data, possibly because
the effect is clouded by our evolving methodology for performing the
calculations.Overall, HMR does seem to offer an effective increase
in sampling
rate without impacting the fidelity of the calculations, but the effective
speed up is not always straightforward to determine.
Comparison of Uncertainty Measures
As discussed in
prior subsections, the two approaches we employ to
estimate uncertainty, blocking analysis and StIn, can give very similar
estimates for the SEM of a time-averaged quantity obtained from MD
simulations, especially for simple systems, such as our K-Cl model
(Figure , top). However,
we noticed that the two approaches can disagree substantially for
more complicated systems, such as a host–guest complex (Figure , bottom). In the
latter case, it appears that the StIn approach may register relatively
fast fluctuations while missing the consequences of slower processes,
leading to underestimates of the SEM.
Figure 8
Typical blocking curves for the potential
energy of the bound state
for a K-ClTemp simulation (top) and a βCD-HexTemp simulation (bottom). Error bars are an estimate of the
uncertainty in the SEM; see Flyvbjerg et al.[40] for further details. The SEM predicted by the statistical inefficiency
method (StIn) is indicated with a red line.
Typical blocking curves for the potential
energy of the bound state
for a K-ClTemp simulation (top) and a βCD-HexTemp simulation (bottom). Error bars are an estimate of the
uncertainty in the SEM; see Flyvbjerg et al.[40] for further details. The SEM predicted by the statistical inefficiency
method (StIn) is indicated with a red line.In the present context, it was of particular concern that
comparison
of the direct binding enthalpy values obtained from the βCD-HexTemp and βCD-HexHMR simulation sets (Table ) shows a discrepancy
in the uncertainty values, which could impact the conclusions drawn
about the difference in the means, as noted above. We traced the primary
difference between the two binding enthalpy values to a single source:
the average potential energy value of the bound state in the secondary
orientation. There were no structural indications to suggest that
the sampling had been biased, so we reran one replicate of this particular
simulation for each simulation set. The mean of the βCD-HexHMR replicate was less than 0.1 kcal/mol from the original
value, whereas the βCD-HexTemp replicate differed
from the original by 0.6 kcal/mol, and four additional βCD-HexTemp replicates differed from the original by 0.3–0.7
kcal/mol. Thus, it appeared that the original βCD-HexTemp mean value was simply a low probability, but reasonable, value from
the tail of the expected normal distribution of mean values. Using
the βCD-HexTemp replicates, we compared the observed
SEM value with the average blocking SEM value and average StIn SEM
value. The observed SEM for the six calculations (i.e., the standard
deviation of the means for the one original value and five additional
replicates) was 0.24 kcal/mol, the average blocking SEM was 0.20 kcal/mol,
and the average StIn SEM was 0.14 kcal/mol. The results show closer
agreement between the blocking SEM and the observed SEM and suggest
that there is no statistical difference between the βCD-HexTemp and βCD-HexHMR direct binding enthalpy
values (according to the t-test), whereas the StIn
uncertainty values would indicate a statistical difference.In order to test more systematically whether the StIn method underestimates
the uncertainty in some cases, we developed a Python script (Figure S6) that generates artificial simulation
data containing a short-correlation/large-variance process, mixed
with several long-correlation/small-variance processes. The resulting
time-series data (Figure S7, top) yield
blocking analysis graphs (Figure S7, bottom)
similar to those from real host–guest data (Figure , bottom). By using this method
to generate 500 distinct, artificial simulation data sets, we were
able to calculate the actual SEM as a function of
simulation time and use this to test the ability of the blocking and
StIn methods to estimate this observed SEM. In order to generate a
summary comparison, we averaged the results from blocking analysis
and StIn over the same 500 artificial time series (Figure ). Early in these runs (<10%
of the simulation length), both SEM estimators underestimate the observed
SEM. However, for the remainder of the data collection, the blocking
method closely tracks (and slightly overestimates by 10–15%)
the observed SEM, whereas the StIn method underestimates the SEM by
roughly 50%.
Figure 9
Comparison of the observed SEM (black) with the average
StIn SEM
(red) and average blocking SEM (blue) for a data set consisting of
500 artificial simulations, each containing 500 000 correlated
data points. The artificial simulations contain similar correlation
patterns to those observed for real host–guest data (compare Figure , bottom, with Figure S7).
Comparison of the observed SEM (black) with the average
StIn SEM
(red) and average blocking SEM (blue) for a data set consisting of
500 artificial simulations, each containing 500 000 correlated
data points. The artificial simulations contain similar correlation
patterns to those observed for real host–guest data (compare Figure , bottom, with Figure S7).A possible explanation for the underestimate by the StIn
method
can be traced to its approach for computing the integrated correlation
time. As detailed previously,[39] the tail
of the correlation function is generally dominated by noise, which,
if included in the integration, tends to make the computed correlation
time inaccurate. Following prior practice,[39] we addressed this issue by truncating the correlation function when
it first crosses zero. To check whether this approach misses useful
long-range correlation signals, we performed several tests on the
artificial data sets in which the correlation function was not truncated.
In each case, however, the estimated uncertainty was even smaller
than that obtained with the truncation approach, which itself underestimated
the observed uncertainty. Therefore, regardless of whether or not
truncation is used, the StIn method appears to be unreliable for the
data sets that we studied in this work. We suspect that better reliability
of the blocking method can be attributed to two features: (1) the
blocking method seems to be sensitive to long-correlation/small-variance
signals even when they are obscured by short-correlation/large-variance
signals, whereas the StIn method seems to recognize only the latter
in the same situation; (2) our automated implementation does not search
for a plateau, as in the canonical approach, but reports the largest
estimated SEM from the entire blocking curve. The second feature is
likely responsible for occasional overestimation of the SEM, but it
seems to be the simplest and most conservative method for automation.
Discussion
Examples
of binding affinity calculations are numerous in the literature since
they are typically the most sought after value in lead optimization
studies. Our computational calorimetry approach provides additional
insight simply by extending the end point simulation windows of the
APR binding free energy calculation and extracting the binding enthalpy
from the difference in mean potential energies. As detailed previously,[20] binding enthalpies obtained by the direct method
may also be easily decomposed to provide information on the force
field terms and/or structural elements that drive binding. As a larger
set of host–guest analyses is collected, a more detailed understanding
of the role each chemical moiety plays can be established.Comparison
of the calculated and experimental binding free energies, enthalpies,
and entropies shows good correlations but large systematic deviations.
Given the correlations, one might wonder whether the systematic deviations
should be regarded as problematic since, at least in the field of
computational ligand design, predicting a correct ranking of ligand
affinities is nearly as useful as predicting absolute affinities.
However, it is difficult to escape the conclusion that errors of up
to ∼10 kcal/mol point to significant problems with the force
field that require corrective action, especially given the high precision
of the results. Moreover, the types of errors observed here could
generate serious practical problems in the more complex setting of
protein–ligand binding. In particular, there is no reason to
expect that chemically varied ligand moieties interacting with varied
subpockets of a protein binding site will be uniformly scaled by whatever
factor is generating the systematic bias seen here. As a consequence,
there is no assurance of obtaining a good correlation with experiment
in such complex systems. Moreover, despite their simplicity, the host–guest
systems studied in this work share key attributes of protein–ligand
systems, including deep binding cavities with the potential to generate
structured water, steric barriers to entry and exit, conformational
fluctuations with long time scale correlations, and multiple binding
poses. We would argue, therefore, that careful force field optimization
to improve the agreement of absolute binding thermodynamics for host–guest
systems, in combination with more traditional target data, is a useful
path to improve the reliability of protein–ligand modeling
for drug discovery.
Self-consistent Calculations
of Binding Thermodynamics
The demonstration of good agreement
between the direct and van’t
Hoff binding enthalpies for host–guest systems, and of even
greater precision for the ion model systems, offers more than a consistency
check. Given that robust agreement between the direct and van’t
Hoff methods has seemed to be elusive for not only computational investigations[18] but also experimental studies,[69] there is value in a clear demonstration that rigorous consistency
can, in fact, be achieved. Second, consistency between the direct
and van’t Hoff binding enthalpy calculations arguably provides
strong support for the validity of the binding free energy method
employed. This suggests that, as an alternative to assessing error
by performing multiple free energy simulations which close a thermodynamic
cycle, as is frequently done for alchemical transformations,[70] one could instead compare direct and van’t
Hoff binding enthalpies along a single path. Whether this approach
is advantageous will depend on the specifics of the system and the
free energy calculation.The present study also sheds light
on the relative merits of the direct versus van’t Hoff binding
enthalpy methods. Our data clearly indicate that, for small host–guest
systems, the direct binding enthalpy requires less net simulation
time to reach a given level of statistical uncertainty. In addition,
the direct approach is far simpler to implement than the van’t
Hoff approach, at least when the system does not have multiple noninterconverting
configurations. The van’t Hoff approach may still be numerically
favorable for larger systems, where the potential energy fluctuations
are so large that reducing the uncertainty of the mean to a useful
level requires extremely long simulations. On the other hand, the
challenge of achieving binding free energies that are sufficiently
tightly converged to yield meaningful temperature derivatives is not
trivial; and the mere technical simplicity of the direct enthalpy
approach, which does not require a pathway or lambda windows, is itself
a powerful argument in its favor.
Automation,
Sampling, and Timing
With over 438 μs of total sampling
across 4434 independent
simulations, automation in nearly every aspect of this project was
necessary. This included generating the initial coordinates for each
simulation along the APR path, solvating the systems with precisely
the desired number of water molecules (a feature that is not currently
available in AMBER’s LEaP program), determining the correct
target values for restraints, and writing all of the input files necessary
to perform the simulations. Additionally, our heterogeneous set of
computational resources (desktop machines, local clusters, and national
clusters) motivated us to develop execution/queuing scripts that simultaneously
optimized usage of each resource while maintaining compatibility with
the generalized input scripts. Of particular importance was implementation
of an automated scheme to determine whether an individual free energy
simulation window had reached a predetermined uncertainty threshold
in the force uncertainty. Our initial simulation set (CB7-B2Temp) did not use this feature and, because all windows used the same
amount of sampling time, likely oversampled many of the easy-to-converge
windows near the pulled-free state. Implementation of our adaptive
scheme, which involved iteratively running a short 5 ns simulation
followed by checking the uncertainty value of the mean force, improved
the sampling efficiency of our other simulation sets. We also note
here that the free energy simulation times listed in Tables S1–S3 all include the entire direct binding
enthalpy simulations in which the two end point simulations were extended
to at least 1 μs each. If one were interested in only computing
the binding free energy, then this extra sampling at the end points
would not be required, so the aggregate simulation time for a free
energy calculation would be lower. Nonetheless, in the cases presented
here, removing the additional sampling from the binding free energy
simulations would not tilt the balance in favor of the van’t
Hoff binding enthalpy approach. For example, Table shows that the extended end point sampling
in the CB7-B2Temp simulation set contributed only, at most,
2.7% of the total sampling time for the van’t Hoff calculations.This study was feasible in large part to the availability of GPUs,
both the commodity and research-grade varieties. Typical simulation
speeds for the systems studied here ranged between 150 and 200 ns/day
without HMR and up to 350 ns/day with HMR, depending on which generation
of hardware was used. With a small cluster of GPUs, both binding enthalpy
and free energy values can be computed within a matter of days to
highly precise values, with 95% confidence intervals in the range
of 0.5 kcal/mol.As noted in the Methods section, each host
studied here posed a distinct challenge to the calculation of well-converged
binding free energies. For rigid CB7, the constrictive portal can
generate a cork popping problem, whereas flexible βCD can catch
on the exiting guest and be pulled into long-lasting distorted conformations.
We solved these problems by imposing restraints during the attach
phase and lifting them during the release phase, but a variety of
enhanced sampling technologies[71−84] might also have been brought to bear. Indeed, given broad current
interest in binding free energy calculations, the computational tractability
of these host–guest systems, and the fact that each host examined
here exemplifies a different type of sampling challenges, these could
be valuable test cases for testing and improving enhanced sampling
algorithms.
Directions
Broader
use of the methods
laid out here should be facilitated by the fact that our implementation
of the APR binding free energy method, and the associated direct calculation
of binding enthalpy, is naturally suited to run with nearly all simulation
packages that support restraints. No specialized code is required.
Automation of the simulation setup still remains a challenge, as it
is difficult to foresee a generalized approach for determining the
ideal pulling pathway and implementing it with an appropriate restraint
setup. However, in most cases, automation for a given host or protein
will require setup only for an initial guest or ligand, which can
then be used as a template for subsequent compounds.As noted
above, we expect a primary application of computational calorimetry
to be force field validation and development. In anticipation of this,
we are currently generating a larger data set of calculated host–guest
binding values, with multiple force field and water model variants.
We anticipate that this approach will be informative since most force
fields are not optimized against experimental binding data, even though
the calculation of binding affinities, in the context of computer-aided
drug design, is one of their most important and prevalent uses. Moreover,
the fact that a relatively advanced force field can yield serious
errors in binding thermodynamics, as clearly demonstrated here, means
that the experimental data traditionally used to adjust force field
parameters, such as the hydration free energies of small molecules
and the properties of neat liquids, do not suffice when one’s
goal is to compute the thermodynamics of noncovalent association.
We anticipate that the use of binding data as a force field optimization
target, in combination with more traditional targets such as ab initio quantum data, neat liquid properties, and hydration
free energies, will be a powerful new tool to improved simulation
accuracy.
Authors: Andrew D Scott; Chris Phillips; Alexander Alex; Maria Flocco; Andrew Bent; Amy Randall; Ronan O'Brien; Luminita Damian; Lyn H Jones Journal: ChemMedChem Date: 2009-12 Impact factor: 3.466
Authors: John E DeLorbe; John H Clements; Martin G Teresk; Aaron P Benfield; Hilary R Plake; Laura E Millspaugh; Stephen F Martin Journal: J Am Chem Soc Date: 2009-11-25 Impact factor: 15.419
Authors: Andrea Rizzi; Travis Jensen; David R Slochower; Matteo Aldeghi; Vytautas Gapsys; Dimitris Ntekoumes; Stefano Bosisio; Michail Papadourakis; Niel M Henriksen; Bert L de Groot; Zoe Cournia; Alex Dickson; Julien Michel; Michael K Gilson; Michael R Shirts; David L Mobley; John D Chodera Journal: J Comput Aided Mol Des Date: 2020-01-27 Impact factor: 3.686
Authors: Kin Sing Stephen Lee; Niel M Henriksen; Connie J Ng; Jun Yang; Weitao Jia; Christophe Morisseau; Armann Andaya; Michael K Gilson; Bruce D Hammock Journal: Arch Biochem Biophys Date: 2016-10-29 Impact factor: 4.013
Authors: David L Mobley; Caitlin C Bannan; Andrea Rizzi; Christopher I Bayly; John D Chodera; Victoria T Lim; Nathan M Lim; Kyle A Beauchamp; David R Slochower; Michael R Shirts; Michael K Gilson; Peter K Eastman Journal: J Chem Theory Comput Date: 2018-10-30 Impact factor: 6.006
Authors: Ido Y Ben-Shalom; Zhixiong Lin; Brian K Radak; Charles Lin; Woody Sherman; Michael K Gilson Journal: J Chem Theory Comput Date: 2020-11-18 Impact factor: 6.006