The ability of grand canonical Monte Carlo (GCMC) to create and annihilate molecules in a given region greatly aids the identification of water sites and water binding free energies in protein cavities. However, acceptance rates without the application of biased moves can be low, resulting in large variations in the observed water occupancies. Here, we show that replica-exchange of the chemical potential significantly reduces the variance of the GCMC data. This improvement comes at a negligible increase in computational expense when simulations comprise of runs at different chemical potentials. Replica-exchange GCMC is also found to substantially increase the precision of water binding free energies as calculated with grand canonical integration, which has allowed us to address a missing standard state correction.
The ability of grand canonical Monte Carlo (GCMC) to create and annihilate molecules in a given region greatly aids the identification of water sites and water binding free energies in protein cavities. However, acceptance rates without the application of biased moves can be low, resulting in large variations in the observed water occupancies. Here, we show that replica-exchange of the chemical potential significantly reduces the variance of the GCMC data. This improvement comes at a negligible increase in computational expense when simulations comprise of runs at different chemical potentials. Replica-exchange GCMC is also found to substantially increase the precision of water binding free energies as calculated with grand canonical integration, which has allowed us to address a missing standard state correction.
Aided by community
efforts such as the Drug Design Data Resource
(D3R) and Statistical Assessment of the Modeling of Proteins and Ligands
(SAMPL) challenges,[1−3] rigorous binding free energy calculations are becoming
increasingly feasible and reliable for use in drug discovery projects.[4] Recent studies using these methods have reported
promising results in predicting and explaining the selectivity of
small molecule inhibitors to protein targets.[5,6] In
addition to predicting the binding affinity of small molecules to
proteins, free energy calculations can also help validate hypotheses
in structure-based drug design. In particular, medicinal chemists
often seek to exploit the existence of water molecules in binding
sites by designing compounds that will displace water upon binding.[7,8] While there are a number of cases where the targeted displacement
of water molecules has resulted in more potent and/or specific compounds,[9−11] there are also examples where new compounds have failed to displace
a water molecule,[12,13] or in doing so have reduced the
affinity of the compound.[14,15] Binding free energy
calculations using all-atom simulations have the potential to help
medicinal chemists in this endeavor, as the calculated binding free
energies for water at particular sites have been shown to be indicative
of how “displaceable” a water molecule is.[16]One of the most popular ways to calculate
the absolute binding
free energy of water is via double decoupling (DD).[17−20] This method uses two sets of
simulations that gradually reduce the interaction energy between a
chosen water molecule and the rest of the system over a series of
nonphysical, alchemical, intermediate states. One
simulation alchemically decouples a water molecule from a particular
location in a protein and the other decouples a water molecule from
bulk solvent. When performing DD calculations on water molecules,
the bulk solvent calculation need only be performed once for a given
water model and set of simulation parameters (such as the length of
the nonbonded cutoff). The decoupling calculations in the protein
require the careful application of constraints and/or restraints to
keep the water molecule in question bound to a particular location
and to prevent other water molecules drifting into the site after
decoupling. Applying DD calculations to a bonded network of buried
water molecules would require not only cumbersome constraints but
also a separate decoupling simulation for each water in the network.
The water molecules should be decoupled in the order of weakest to
tightest bound, but establishing the correct order will require multiple
simulations, and the choice of pathway may increase the variance of
the results.Using binding free energy calculations of water
to drive rational
drug design is predicated on knowing the probable locations of water
within binding sites. While X-ray crystallography is the most well
used method of structure determination, locating water sites within
X-ray structures is fraught with difficulty. For instance, water molecules
may be artifactual,[21] may not appear at
a given level of resolution,[22−24] or may require special techniques
to find them.[25] There can even be poor
agreement between high-resolution structures, with studies finding
that an observed water has approximately only a 50% chance of being
within 1 Å of an independently resolved structure of the same
protein.[26,27] Molecular simulations with explicit water
offer the possibility of predicting the locations of water in protein
cavities that are consistent with the force field used in the free
energy calculation. However, despite the ever-increasing time scales
accessible by molecular dynamics (MD) simulations—brought about
by specialized processors and GPU accelerated code[28,29]—water molecules that are buried within proteins are systematically
under sampled in MD. Nuclear magnetic resonance studies have indicated
that the residence times of water in cavities that are inaccessible
to bulk water range from tens to hundreds of microseconds,[30,31] which is beyond the time scales typically explored by MD studies.We recently reported theoretical and methodological improvements
to the sampling technique known as grand canonical Monte Carlo (GCMC)
applied to water.[32] In essence, GCMC can
be considered as an enhanced sampling method, as molecules (in this
case water molecules) can be created and annihilated within a given
cavity, circumventing the kinetic barriers that would be encountered
in MD and, thus, aiding the determination of water sites. The recent
improvements to GCMC allowed for the determination of the absolute
binding free energy of highly coupled networks of water as well as
the relative stabilities of individual water molecules, all in a single
set of simulations. Key to the latest developments in GCMC was the
exploitation of simulations performed at a range of different chemical
potentials as part of a technique we refer to as grand canonical integration
(GCI). When applied to calculating the binding affinity of networks
of water molecules, GCI was both easier to execute and computationally
less intensive than sequential alchemical DD calculations. However,
the statistical uncertainty with GCI was found to be significantly
larger than for DD.Although GCMC sampling of water is orders
of magnitude more time
efficient than MD in sampling buried sites, GCMC suffers from its
own sampling difficulties that affect the precision of the free energy
calculations. In GCMC, Metropolis–Hastings acceptance rates
for insertions and deletions are low, often <1%.[33] This is due to the possibility of steric clashes when inserting
molecules into dense systems and large energy deficits when removing
waters from systems. An early solution to this issue is the cavity-bias
algorithm by Mezei,[33] where before every
insertion, a random set of points is uniformly sampled within the
GCMC simulation region, and an insertion is attempted on a randomly
selected point that does not clash with the system. This idea was
later developed by Woo et al.,[34] who used
a dynamically updated grid that kept track of the free space. Orientational
biases have also been applied to improve insertion rates.[34,35] A recurring issue with such Monte Carlo biasing schemes is whether
the increased time of computing and applying a bias does not nullify
the improvement in sampling efficiency.Based on the benefits
in sampling seen with temperature,[36] Hamiltonian,[37−39] and constant pH replica-exchange
methods,[40] here we apply replica-exchange
(RE) to the chemical potential in GCMC simulations. While biasing
schemes can suffer from increased computing time as a payoff for their
enhanced sampling, the addition of RE to the GCI protocol achieves
enhanced sampling with a negligible effect on the run-time. The variance
of these calculated free energies significantly decreased with RE,
which allowed for comparisons between GCI and double decoupling calculations
at a higher level of precision than was previously possible. This
facilitated a detailed examination of the GCI binding free energy
equation and prompted the development of an improved equation that,
unlike before, computes standard state binding free energies. Therefore,
the GCMC-based method reported here is more accurate and precise than
the previous,[32] with a statistical uncertainty
that is comparable to double decoupling. Thus, absolute binding free
energy calculations with GCI may be an “all in one”
solution for water-focused medicinal computational chemistry, as a
single set of simulations can determine a large number of properties
of the water molecules, pertinently indicating which of the waters,
or sets of waters, are most displaceable. GCMC is able to determine
locations of water molecules, binding free energies of networks, and
cooperativity in a single simulation.
Theory
Grand Canonical Monte Carlo
GCMC is the technique that
allows for the creation and annihilation of molecules within a simulation.
By construction, simulation statistics in GCMC are consistent with
the grand canonical ensemble, where it is imagined that the system
of interest can exchange molecules with a reservoir. The reservoir exists in a predetermined thermodynamic state that
is specified by its temperature and chemical potential, which accounts
for the average density and interaction energy of the molecular species
in the reservoir. This work is concerned with performing GCMC on water
molecules within protein cavities, such that the natural choice of
the reservoir is bulk water at room temperature.In Adams’s
formulation of GCMC,[41] the probability
to insert one molecule into the simulated system from the reservoir
is given byand the probability to remove one
molecule
from the system and add it to the reservoir is given bywhere N is the instantaneous
number of molecules of the chemical species in the system, ΔE is the change in potential energy of inserting or removing
the water molecule, β is the inverse temperature, and B is the applied Adams parameter,[41] a term that accounts for the chemical potential of the reservoir
and the volume of the simulated system. Originally defined by Adams
in terms of the excess chemical potential and average occupancy of
the system,[41]B is related
to the applied chemical potential, μ, and volume of the GCMC
region, V, viawhere Λ3 is the thermal wavelength
of the GCMC molecule.
Replica-Exchange GCMC
Guarnieri
and Mezei pioneered
the technique in which many independent GCMC simulations of water
are performed at a range of different B values—equivalently,
different chemical potentials.[42] By viewing
how and where the occupancy of water changed as a function of B, they obtained a semiquantitative map of the regions of
high water affinity. By analogy with ligand-protein binding assays,
we use the term “titration” to refer to a set of GCMC
simulations of the same system at different B values.
This titration technique was extended in our previous study where
the average number of water molecules as a function of B could be integrated to predict the binding free energies of water
networks.[32] This method is referred to
as GCI and is discussed in more detail below. Given the utility of
performing GCMC simulations at different chemical potentials, and
the sampling improvements gained by Hamiltonian RE in alchemical free
energy calculations,[37,38] RE of the B values
in GCMC simulations has the potential to improve the accuracy of the
free energies calculated with GCI.When running concurrent GCMC
simulations at a range of chemical potentials, the probability of
accepting a Metropolis–Hastings move that swaps the B values of the ith and jth simulations is given bywhere N and N are the
number of water molecules in the ith and jth replicas. This equation is similar to the Metropolis–Hastings
criteria in constant-pH RE[40] as well as
parallel tempering and Hamiltonian RE[36] and is derived in the Supporting Information (SI) by constructing an expanded ensemble that allows for chemical
potentials to vary. One would expect RE of the chemical potentials
to enhance the sampling in GCMC titrations, because simulations at
high chemical potentials, where insertions are more likely, can mix
with simulations at low chemical potentials, where deletions are more
likely.
Binding Free Energies with Grand Canonical Integration
GCI is a technique that integrates over the average number of water
molecules as a function of the B value to predict
the relative binding free energies of water molecules.[32] If at the lowest B value there
are no waters bound, then GCI can predict absolute binding free energies.
When applied to calculating the binding free energy of water networks,
the GCI method is more computationally efficient and easier to implement
than traditional double-decoupling techniques,[32] which require multiple simulations for each water molecule
in the network as well as the careful application of restraints or
constraints.Given that GCI uses the information from simulations
run at different B values, one can expect the statistical
uncertainty of the calculated free energies to improve with RE. Indeed,
as shown in the Results section and SI, the gain in precision afforded by RE revealed
that binding free energies calculated with the method described previously[32] were dependent on the volume of the GCMC region
and had statistically resolvable systematic differences with free
energies calculated by double decoupling. This prompted a re-examination
of the equation used to calculate binding free energies with GCI.
While the original GCI equation—for the free energy to transfer
water from ideal gas to the system of interest at a fixed volume—is
correct, the standard state corrections were missing for the binding
free energy. Derived in the SI, the equation
to calculate relative standard state binding free energies between
an initial average number of waters N and final average number N is given
bywhere B is the Adams
parameter for which there are an average of N waters, Vsys is the volume of the GCMC region, Vo is the standard state volume of bulk water, and μsol′ is the
excess chemical potential of the water in bulk water. For practical
purposes, μsol′ is the hydration free energy of the simulated water
model.In contrast to the previous binding energy equation,
this has a
term for the volume of the GCMC region, and no longer depends on the
factorial of N and N. As shown in the SI, these changes come from removing the implicit
assumption in the previous derivation that both the free energies
of the simulated system and reservoir had the same contribution from
the ideal gas.In addition to the theoretical derivation, the
effect of the volume
term of the GCI equation on the calculated binding free energies has
been empirically tested using a very well converged, idealized protein-ligand
system based on scytalone dehydratase. The binding free energies of
water molecules should be independent of the size of the GCMC subvolume,
and these results demonstrate this. These results are available in
the SI and confirm that eq is the correct form of the GCI
equation.The new GCI equation is also consistent with the following
equilibrium
condition for water binding:where μsys is the chemical
potential for water in the simulated system, and μsol is the chemical potential for water in bulk solvent. This relation
is also derived in the SI. Owing to the
previous implicit assumption about the ideal gas contribution to the
free energies, this was previously erroneously stated as the equality
of excess chemical potentials.[32]Eq makes
it straightforward to set the B value for a GCMC
simulation where water molecules in the system of interest are in
equilibrium with bulk water:Given
that the hydration free energy and density of bulk water
only need to be calculated once per water model, one can set the Adams
parameter, or chemical potential, prior to running a GCMC simulation,
where equilibrium with bulk water is desired.
Methods
Simulations were performed to (1) assess whether RE of B values
in GCMC titration simulations (RE-GCMC) improves the consistency and
accuracy of repeated GCMC simulations and free energies calculated
with GCI and (2) determine the degree of precision with which the
new GCMC methodology agrees with double decoupling in biomolecular
systems. The protein system bovine pancreatic trypsin inhibitor (BPTI)
was used to address these aims, chosen due to its prior use as a model
system.[43]BPTI was used to test the
ability of RE-GCMC to improve the sampling
of GCMC titration simulations and to compare free energies calculated
with GCI and RE to estimates from double decoupling. All simulations
focused on a small cavity in BPTI that binds to three water molecules.
In our previous investigations,[32] this
system was found to be the worst sampled of all 10 systems studied
and thus will most clearly highlight any improvements over the previous
methodology. The simulated structure was taken from protein data bank
(PDB) entry 5PTI and was prepared as previously described.[32] The protein backbone and side chain atoms were sampled over the
angles and dihedrals. Backbone moves within ProtoMS are performed
by rigid-body translations and rotations, centered on the intersection
of the Cα–N, and C=O bond vectors. The structure
was solvated inside a half-harmonically restrained, spherical droplet
of water, with a radius chosen such that the edge of the droplet was
approximately 15 Å away from the surface of the protein.All simulations were performed with the freely available Monte
Carlo simulation package ProtoMS, version 3.3.[44] The TIP4P water model[45] and
the AMBER14SB force-field[46] were used.
Nonbonded interactions were truncated, using a residue-based switching
cutoff, at 10 Å.
GCMC
In ProtoMS, GCMC insertion
and deletion moves
are only made inside a user-defined, cuboidal subvolume of the system.
Once inserted, water molecules can translate and rotate. Insertion,
deletion, and translation/rotation moves for GCMC are attempted with
equal probability. To keep track of the number of waters inside the
GCMC subvolume, a hard-wall constraint is applied to inserted water
molecules to prevent them from drifting away, and bulk solvent water
molecules are not allowed to enter from outside the subvolume.Water molecules were inserted and deleted within the box over the
three-water cavity shown in Figure . The box had dimensions 5.0 × 4.0 × 8.0
Å3 and an origin of (29.0, 5.0, −2.0) Å
in the reference frame of the PDB structure. Titration simulations
were run with integer B values between −31.0
and 0.0, inclusive. For equilibration, 1 million (M) GCMC-only moves
were performed with the rest of the system fixed, followed by 1 M
moves of GCMC and protein and solvent configuration sampling, where
each repeat was equilibrated independently. The production simulations
comprised 10 repeats of 100 M moves. When fully sampling the whole
system, protein, bulk solvent, and grand canonical insertion, deletion,
and translation/rotation sampling were trialled with the ratios 461:39:167:167:167,
respectively.
Figure 1
Cavity in the BPTI protein, with waters A, B, and C bound.
The
surface of the protein is shown with a transparent gray surface. The
GCMC box covering the water sites is delineated with black lines.
Cavity in the BPTI protein, with waters A, B, and C bound.
The
surface of the protein is shown with a transparent gray surface. The
GCMC box covering the water sites is delineated with black lines.To assess the impact of RE on
the consistency of the GCMC results,
simulations were performed without RE and with RE with B value exchanges attempted with randomly selected nearest neighbors
every 100,000, 200,000, 500,000, and 1 M MC moves.
Double Decoupling
For each water location found with
GCMC, decoupling simulations were performed to determine the binding
free energy of each water. Decoupling was performed over 16 alchemical
λ states, where the Lennard-Jones and Coulombic terms were scaled
simultaneously. Moves were split between protein, bulk water, and
decoupled water at a ratio of 402:98:1 respectively. The water molecules
were decoupled sequentially, from weakest to strongest bound. Where
the free energies of multiple waters are similar, calculations were
repeated with a different order of decouplings. 500,000 equilibration
and 40 M production moves were performed for each water at each λ
value. Each simulation was repeated four times. Soft-cores (soft66
in ProtoMS package)[47−49] were used for decoupling calculation with δ
= 0.2 and δ = 2.0. The free energy
to decouple the water from the system was determined using MBAR.[50]A harmonic restraint with a force constant
of 2 kcal mol–1 Å–2 was used
on the oxygen of the water being decoupled at all λ values.
A gas phase correction ofwherewas applied
to account for the removal of
the restraint from the decoupled system.[51] This is analogous to the volume term introduced in the GCI equation, eq . Prompted by the higher
precision obtained in RE-GCMC and unlike our previous study,[32] the free energy penalty of applying the harmonic
restraint in the bound simulation was calculated using Bennett’s
acceptance ratio method from 40,000 Monte Carlo simulation steps with
six equally spaced λ values of the restraint from 0 kcal mol–1 Å–2 to 2 kcal mol–1 Å–2. No symmetry correction was applied to
water molecules. More details on the location of the restraints and
the resulting free energy corrections are available in the SI.
Results
Replica-Exchange with BPTI
GCMC calculations were performed
on BPTI with and without RE in B. The titration results
from 10 repeats with no RE and the most frequent RE (attempted every
100,000 MC steps) are shown in Figure . The acceptance rate for the attempted exchanges between B replicas was >89% for all RE frequencies. More details
of this are available in the SI.
Figure 2
Ten repeats
of GCMC titration data for the BPTI system, without
RE (left) in B and RE every 100,000 steps (right).
Each point corresponds to the average number of water molecules at
a given B value. The first 200,000 MC steps have
been excluded as equilibration.
Ten repeats
of GCMC titration data for the BPTI system, without
RE (left) in B and RE every 100,000 steps (right).
Each point corresponds to the average number of water molecules at
a given B value. The first 200,000 MC steps have
been excluded as equilibration.Figure shows
that
RE has improved the repeatability of the titration data. The Kolmogorov–Smirnov[52] test was used to test the statistical significance
of this apparent reduction in uncertainty. All values of N from simulations with and without RE were median centered to focus
only on the spread of the Ns and not the relationship
with B. The distributions of median-centered N with and without RE were found to be significantly different
with a p-value of 0.0135. The relationship between B and N should be monotonically increasing
owing to the relationship between B value and average
water occupancy (see eqs and 2). The monotonicity of each individual
titration repeat, therefore, provides an additional quantifiable test
of sampling performance. The Kendall rank correlation coefficient,
denoted τ, is an appropriate measure of the montonicity as it
does not require a linear relationship between B and N. A perfect monotonically increasing relationship between B and N is indicated by τ = 1. The
mean τ with one standard error for the non-RE titration data
is 0.86 ± 0.01, compared to 0.98 ± 0.00 for RE-GCMC, averaged
over all RE frequencies. Hence, the titration data from RE-GCMC are
more physically sound than without RE.Using eq , the titration
data for all the BPTI simulations were used to calculate binding free
energies for each RE frequency. As the purpose is to evaluate the
extent to which the uncertainty in the GCI free energy decreases with
RE, Figure shows
the boxplots of the median-centered binding free energy for one, two,
and three waters, whose locations can be seen in Figure . The boxplots were generated
by bootstrap sampling the titration data and calculating the GCI binding
free energy of each sample. A bootstrap sample consisted of one randomly
sampled N value from the set of 10 repeats for each
of the 32 B values, and the titration curve was estimated
as previously described.[32] Both the interquartile
ranges and the range of the data (illustrated by the “whiskers”)
are larger for the protocols with no RE. The statistical significance
of this reduction in spread with RE was assessed using the bootstrap
samples of the free energy to bind three waters. The variance of the
binding free energy was calculated in batches of 10 bootstrap samples
for all RE data and separately for free energies calculated without
RE. Using these samples, we estimate that the probability that the
variance with RE is equal to or greater than the variance without
RE to be 2.6%. Thus, the reduction in the variance of calculated free
energies is statistically significant. This result is consistent both
with the increased monotonicity and the reduced titration variance
found for RE-GCMC. Therefore, RE-GCMC improves the repeatability of
individual titration simulations. There is no clear RE frequency that
is significantly better than the rest (see Figure S5).
Figure 3
Boxplot of the median-centered free energies for each protocol,
where errors have been calculated over 1000 bootstrapping steps of
10 repeats. In each case it is the free energy difference between
an empty GCMC region, to a one, two and three water network, respectively.
RE with GCI produces free energies that have a consistently tighter
distribution than GCI free energies calculated without RE.
Boxplot of the median-centered free energies for each protocol,
where errors have been calculated over 1000 bootstrapping steps of
10 repeats. In each case it is the free energy difference between
an empty GCMC region, to a one, two and three water network, respectively.
RE with GCI produces free energies that have a consistently tighter
distribution than GCI free energies calculated without RE.Figure compares
the binding free energies calculated for the three-waterBPTI network
using GCI and double decoupling. The standard deviation over the repeats
is used to illustrate the relative uncertainty of each method. The
distribution of the error of original GCI binding free energy equation[32] without RE overlaps with the distribution of
error with the new standard state formulation (eq ) and double decoupling. With the benefit
of the higher precision granted by RE, the standard state binding
free energy formulation of GCI is more clearly in agreement with the
standard state binding free energies as calculated via double decoupling.
Indeed, the standard deviation of GCI is comparable to, if not smaller
than, double decoupling. In addition to this, the GCI method is easier
to use than double decoupling, as all of the water binding free energies
are determined in a single, automated set of simulations. GCMC does
not require the a priori knowledge of both the number of waters and
the location of the waters that are needed for double decoupling simulations.
Double decoupling methods also require restraints or constraints within
the simulations. Both restraints and constraints are needed to reduce
the sampling of the water molecule as it is decoupled from the system,
but constraints are also able to prevent other nearby water molecules
occupying the site. GCMC is a significantly easier method to implement
and, with the addition of RE, is able to provide results with comparable
variance.
Figure 4
Binding free energies of the three water network in BPTI calculated
using different methods. To highlight the intrinsic uncertainty of
each method, the colored bars indicate the standard deviation, as
opposed to the standard error, over all repeats. Blue results are
from sequential decoupling of the water network, with decoupling
1 waters are decoupled in the order C–B–A and decoupling 2 in the order C–A–B. Red results
show the binding free energies calculated with the orginal formulation
of GCI binding free energy equation for both RE (frequency: 100,000)
and non-RE simulations, and the orange results are from the same simulations
determined using eq .
Binding free energies of the three water network in BPTI calculated
using different methods. To highlight the intrinsic uncertainty of
each method, the colored bars indicate the standard deviation, as
opposed to the standard error, over all repeats. Blue results are
from sequential decoupling of the water network, with decoupling
1 waters are decoupled in the order C–B–A and decoupling 2 in the order C–A–B. Red results
show the binding free energies calculated with the orginal formulation
of GCI binding free energy equation for both RE (frequency: 100,000)
and non-RE simulations, and the orange results are from the same simulations
determined using eq .The detailed breakdown of the
BPTI thermodynamic cycle is shown
in Figure . The binding
free energies of the water molecules in the BPTI system were determined
by decoupling calculations. As Figure shows, two water molecules appear to couple into the
BPTI system simultaneously (waters A and B), followed by a third molecule
at a higher chemical potential (water C). As waters A and B couple
in simultaneously, the decoupling of these water molecules has been
attempted in both orders (i.e., A then B, and B then A) with double
decoupling. The results are shown in Figure , and the closures of the thermodynamic cycles
(shown in red) are consistent to within error. The free energy of
the three water network is −21.81 (0.50), −21.67 (0.68),
and −21.84 (0.74) kcal mol–1 as calculated
by GCI with RE-GCMC and the two double decoupling pathways, respectively.
As GCI couples the first two waters in simultaneously, it is not possible
to decompose the free energy of the dimer to free energies for separate
waters. The GCI simulations were analyzed to look at all of the simulation
snapshots across all B values for structures where
only one water is present in the site of interest, and in the majority
of cases, water B is present more often than water A, suggesting that
it is the first to couple into the system, albeit with similar free
energies of binding. Performing the double decoupling simulation,
as water B has a larger binding free energy when coupling the dry
cavity than water A (−8.20 kcal mol–1 and
−5.85 kcal mol–1), respectively, double decoupling
suggests that water B will enter the protein system first. The largest
error between GCI and double decoupling occurs in the leg with the
order of water B then water A. The largest error of the cycle is the
decoupling leg of water B to a dry cavity, where the standard error
of calculations is 0.63.
Figure 5
Thermodynamic cycle of BPTI. Orange arrows,
GCMC results; blue
arrows, decoupling results. GCMC values are calculated using a RE
frequency of 100,000. Standard errors are shown in parentheses. Gray
waters indicate which waters have been turned “off”
in the system. Cycle closure energies are shown in bold red. All values
are recorded in kcal mol–1.
Thermodynamic cycle of BPTI. Orange arrows,
GCMC results; blue
arrows, decoupling results. GCMC values are calculated using a RE
frequency of 100,000. Standard errors are shown in parentheses. Gray
waters indicate which waters have been turned “off”
in the system. Cycle closure energies are shown in bold red. All values
are recorded in kcal mol–1.
Discussion and Conclusions
This work has used replica-exchange
(RE) of the chemical potential
(equivalently, Adams parameter) in grand canonical Monte Carlo (GCMC)
to calculate the absolute binding free energies of small networks
of water molecules to protein cavities. The use of RE was found to
significantly reduce the variance of the water occupancies at each
applied chemical potential, which resulted in a significant reduction
in variances of the absolute binding free energies as calculated with
grand canonical integration (GCI). The decreased variance of the GCMC
simulations facilitated the precise comparison of water binding free
energies calculated using double decoupling and the standard state
GCI binding free energy equation developed herein. The free energies
calculated from both methods were found to be in agreement.In this study, the absolute binding free energies calculated from
GCMC (using GCI) have lower statistical uncertainties than the free
energies calculated via alchemical decoupling (estimated using MBAR).
This difference becomes more pronounced the more water molecules are
decoupled in spite of the fact that, in our hands, free energy calculations
with GCMC require fewer CPU hours than alchemical decoupling in BPTI
and simpler applications of constraints/restraints. For binding free
energy calculations with greater numbers of water molecules, we expect
the gap between accuracy and performance to widen between GCMC and
alchemical methods.The extent to which the precision and speed
of the GCMC binding
free energy methods carries over from water molecules to small, drug-like
molecules remains to be seen. The use of GCMC with unbiased insertion
moves, as used here for water, becomes increasingly inefficient for
larger molecules due to higher probabilities of steric clashes. Nevertheless,
GCMC has already been successfully applied to small fragments, both
with unbiased insertion moves[53] and using
cavity bias.[54] The techniques developed
here, therefore, may hold promise for the rigorous calculation of
absolute binding free energies of small fragments via GCMC. Larger
molecules will no doubt require the application of configurational
biasing techniques,[33,35,55] although these do come with the additional computational effort
required to compute the bias. The use of RE for absolute free energy
calculations with GCI comes with a negligible increase in the computational
burden, as the exchange of information across CPU nodes is orders
of magnitude faster than the run-time of the molecular simulation.A curious feature of GCMC simulations is the requirement that the
volume must remain constant. This is so that the simulations correctly
sample the grand canonical (μVT) ensemble.
As protein-ligand binding affinities are typically measured under constant pressure conditions,
a natural question is to what extent GCMC-like techniques can be used
in constant-pressure simulations, such as those in the isothermal-isobaric
(NpT) ensemble. Despite the fact that the NpT-type ensembles require the number of particles to remain
constant, there are a number of Markov chain Monte Carlo techniques
that have insertion and deletion moves, in addition to volume moves,
as part of their sampling repertoire. For instance, the Gibbs-ensemble
Monte Carlo method[56,57] attempts to replicate the physically
impossible μpT ensemble by allowing two simulated
compartments to exchange volume and molecules or particles. Although
the volume and number of particles in either compartment can fluctuate,
the combined volume and the total number of particles must remain
constant. Other techniques employ the semigrand canonical ensemble
(see, for example, refs (58 or 59)), in which the composition of the system can change (i.e., allowing
different fragments to interconvert) under the constraint that the
total number of particles remains constant. The semigrand
canonical ensemble also requires the difference in
the chemical potentials of the interconverting molecules to remain
constant so that, when there are only two interconverting molecular
species, the semigrand canonical ensemble can be described as a NpTΔμ ensemble. In controlling the ratio of
two species, the Δμ parameter operates—in practice—very
much like the μ parameter in GCMC, so that semigrand canonical
simulations are amenable to enhanced sampling via the RE method described
herein.Absolute binding free energy calculations with GCMC
can calculate
both the location and binding free energy of networks of waters as
well as the relative stability of the individual waters. The methods
developed here reduce both the bias and variance of the predictions
made with GCI compared to its original implementation. Coupled with
its ease of use compared to double decoupling, we hope GCI will be
a powerful tool in structure-based design, particularly in cases where
one seeks to rigorously quantify the effect of structural waters on
ligand binding affinities.
Authors: Symon Gathiaka; Shuai Liu; Michael Chiu; Huanwang Yang; Jeanne A Stuckey; You Na Kang; Jim Delproposto; Ginger Kubish; James B Dunbar; Heather A Carlson; Stephen K Burley; W Patrick Walters; Rommie E Amaro; Victoria A Feher; Michael K Gilson Journal: J Comput Aided Mol Des Date: 2016-09-30 Impact factor: 3.686
Authors: Jian Yin; Niel M Henriksen; David R Slochower; Michael R Shirts; Michael W Chiu; David L Mobley; Michael K Gilson Journal: J Comput Aided Mol Des Date: 2016-09-22 Impact factor: 3.686
Authors: Ido Y Ben-Shalom; Charles Lin; Tom Kurtzman; Ross C Walker; Michael K Gilson Journal: J Chem Theory Comput Date: 2019-03-13 Impact factor: 6.006
Authors: Helena Käck; Kevin Doyle; Samantha J Hughes; Michael S Bodnarchuk; Hans Lönn; Amanda Van De Poël; Nicholas Palmer Journal: ACS Med Chem Lett Date: 2019-07-15 Impact factor: 4.345
Authors: Ido Y Ben-Shalom; Charles Lin; Brian K Radak; Woody Sherman; Michael K Gilson Journal: J Chem Theory Comput Date: 2021-11-11 Impact factor: 6.006
Authors: Gregory A Ross; Ariën S Rustenburg; Patrick B Grinaway; Josh Fass; John D Chodera Journal: J Phys Chem B Date: 2018-05-31 Impact factor: 2.991
Authors: Teresa Danielle Bergazin; Ido Y Ben-Shalom; Nathan M Lim; Sam C Gill; Michael K Gilson; David L Mobley Journal: J Comput Aided Mol Des Date: 2020-09-24 Impact factor: 3.686
Authors: Jessica L Thomaston; Marley L Samways; Athina Konstantinidi; Chunlong Ma; Yanmei Hu; Hannah E Bruce Macdonald; Jun Wang; Jonathan W Essex; William F DeGrado; Antonios Kolocouris Journal: Biochemistry Date: 2021-08-03 Impact factor: 3.321