Crystal N Nguyen1, Tom Kurtzman2,3, Michael K Gilson1. 1. Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego , 9500 Gilman Drive, La Jolla, California 92093-0736, United States. 2. Department of Chemistry, Lehman College, The City University of New York , 250 Bedford Park Blvd. West, Bronx, New York, New York 10468, United States. 3. Ph.D. Program in Chemistry, The Graduate Center of The City University of New York , New York 10016, United States.
Abstract
A number of computational tools available today compute the thermodynamic properties of water at surfaces and in binding pockets by using inhomogeneous solvation theory (IST) to analyze explicit-solvent simulations. Such methods enable qualitative spatial mappings of both energy and entropy around a solute of interest and can also be applied quantitatively. However, the entropy estimates of existing methods have, to date, been almost entirely limited to the first-order terms in the IST's entropy expansion. These first-order terms account for localization and orientation of water molecules in the field of the solute but not for the modification of water-water correlations by the solute. Here, we present an extension of the Grid Inhomogeneous Solvation Theory (GIST) approach which accounts for water-water translational correlations. The method involves rewriting the two-point density of water in terms of a conditional density and utilizes the efficient nearest-neighbor entropy estimation approach. Spatial maps of this second order term, for water in and around the synthetic host cucurbit[7]uril and in the binding pocket of the enzyme Factor Xa, reveal mainly negative contributions, indicating solute-induced water-water correlations relative to bulk water; particularly strong signals are obtained for sites at the entrances of cavities or pockets. This second-order term thus enters with the same, negative, sign as the first order translational and orientational terms. Numerical and convergence properties of the methodology are examined.
A number of computational tools available today compute the thermodynamic properties of water at surfaces and in binding pockets by using inhomogeneous solvation theory (IST) to analyze explicit-solvent simulations. Such methods enable qualitative spatial mappings of both energy and entropy around a solute of interest and can also be applied quantitatively. However, the entropy estimates of existing methods have, to date, been almost entirely limited to the first-order terms in the IST's entropy expansion. These first-order terms account for localization and orientation of water molecules in the field of the solute but not for the modification of water-water correlations by the solute. Here, we present an extension of the Grid Inhomogeneous Solvation Theory (GIST) approach which accounts for water-water translational correlations. The method involves rewriting the two-point density of water in terms of a conditional density and utilizes the efficient nearest-neighbor entropy estimation approach. Spatial maps of this second order term, for water in and around the synthetic host cucurbit[7]uril and in the binding pocket of the enzyme Factor Xa, reveal mainly negative contributions, indicating solute-induced water-water correlations relative to bulk water; particularly strong signals are obtained for sites at the entrances of cavities or pockets. This second-order term thus enters with the same, negative, sign as the first order translational and orientational terms. Numerical and convergence properties of the methodology are examined.
Molecular recognition in water cannot be understood purely in terms
of the direct interactions of the two molecules that bind each other.
Water itself plays an intimate role, such as by effectively weakening
charge–charge interactions, by competing with the solutes for
hydrogen-bonding opportunities, and by providing hydrophobic interactions
that drive the association of nonpolar parts of the solutes. Much
is known about the physical principles by which water influences binding,[1−12] and the effects of water can, to a large extent, be captured in
both implicit[13−21] and explicit[22−29] representations of water. However, there are also still puzzles
and gaps in our understanding of water’s role in molecular
recognition. For example, although the hydrophobic effect may be reasonably
well understood for simple nonpolar surfaces, many solutes, particularly
biomolecules like proteins and DNA, present surfaces with complicated
shapes and complex patterns of polarity, where water might have properties
quite different from water at simpler, better understood surfaces.
Thus, small surface clefts can lead to the formation of water clusters
with a reduced set of hydrogen-bond arrangements, making such waters
particularly easy to displace into bulk, where they regain many more
hydrogen bonding opportunities.[30−34] Classical notions of the hydrophobic effect[35−37] do not necessarily
envision such distinctive cases.In recent years, new computational
tools have come online to compute and visualize the structure and
thermodynamics of water at complex surfaces, by applying inhomogeneous
solvation theory (IST)[38−44] to simulations of solutes immersed in explicit water. Some of these
tools, such as WaterMap,[30,45] STOW,[46] and others[47−49] gain simplicity by modeling the density distribution
of water in terms of discrete hydration sites. Alternatively, Grid
Inhomogeneous Solvation Theory (GIST)[34,50−52] discretizes water density and thermodynamic contributions onto a
three-dimensional grid, in order to gain a more comprehensive and
quantitative description of solute hydration. Such tools offer insights
into the properties and role of surface water and can also be useful
in computer-aided drug design, by pointing out regions in a protein
binding pocket where a ligand may gain affinity by displacing thermodynamically
unfavorable water.[30,34,45,53]Implementations of IST have so far
concentrated almost entirely on calculating first-order contributions
to the entropy of water. The first-order translational entropy reflects,
in effect, the bumpiness of the density distribution of water induced
by the solute, while the first-order orientational entropy reflects
the degree to which the solute reduces the orientational freedom of
nearby water. However, it is likely that higher-order terms also contribute
significantly to the hydration entropy. Higher-order terms relate
to water–water correlations, which certainly exist in bulk
water[54,55] and are expected to be modified by the presence
of a solute. Note that greater correlation implies lower entropy and
that the higher-order entropy terms in the IST series expansion are
akin to the mutual information terms in the mutual information expansion[56,57] which has been used to estimate changes in the configurational entropy
of small molecules and proteins,[58−61] and have been tied to the multibody
expansion of IST.[62] Indeed, Velez-Vega
et al. have recently obtained encouraging results for pure water with
a novel method that treats water on an atomistic, rather than molecular,
basis and is based on the mutual information expansion.[63]Several prior studies have addressed the contributions of
water–water correlations to hydration entropy. An early work
used IST to study the contributions of water–water correlations
to the hydration entropy of methane, modeled as a single Lennard-Jones
atom, for which numerical convergence is facilitated by exploitation
of spherical symmetry.[64] Interestingly,
the signs of the water–water translational and orientational
terms were found to be negative (increased correlation relative to
bulk water) at room temperature but found to become positive at high
temperature, whereas the first order terms remained negative at all
temperatures considered. This study also tested an approximation in
which the water–water correlation function near the solute
was approximated by the bulk water–water correlation function;
however, the resulting entropies were found to deviate substantially
from those computed without the approximation. Nonetheless, small
molecule hydration free energies computed with this approximation
were shown to agree reasonably well with corresponding values obtained
by a reference free energy perturbation method,[65] and the same approximation was used to estimate water–water
correlations within clusters of water molecules buried at a protein–ligand
binding interface.[66] Two other studies
have taken an alternative approach to obtaining these numerically
challenging terms for water near small molecules[67] and protein binding sites:[52] instead of computing the water–water terms from actual correlation
data, they were instead heuristically set as proportional to the first
order entropy, where the constant of proportionality is on the order
of −0.3 to −0.5. Finally, a recent study used the water–water
correlations obtained from simulations of the system of interest,
rather than of bulk water, to compute the contributions of these correlations
to the hydration entropy contributions of pairs of waters buried in
protein cavities.[62] The water–water
correlation contribution was found to be negligible, however, apparently
because the waters were tightly bound to the protein and hence had
minimal fluctuations available to correlate with each other. Thus,
considerable work has already been done to study or estimate the contributions
of water–water correlations to the entropy of hydration. However,
correlations computed from simulations have not yet been used to quantify
or visualize second order terms for water at well-hydrated, complex
surfaces, such as the binding sites of host–guest systems and
proteins.The present paper addresses this challenge with a
three-dimensional grid formulation of the second-order translational
entropy of water, which is related to the two-point spatial correlation
function of water density. Rewriting the second order translational
entropy in terms of a conditional density, instead of a two-point
density, leads to an intuitively interpretable form of this term and
also facilitates spatial mapping and visualization. The present method
also makes use of the efficient nearest-neighbor entropy-estimation
approach.[50,67,68] We present
an evaluation of the numerical properties of the method and analyses
of the second-order translational entropy of water in and around the
synthetic host molecule cucurbit[7]uril and the blood clotting cascade
enzyme Factor Xa (FXa). These second order results are placed in the
context of the first-order translational and orientational entropy
around these solutes, and visualization is used to understand the
spatial distribution of the second order term, including molecular
features that generate particularly high water–water correlations
and hence particularly low second-order translational entropies.
Methods
Theory
Here, we
express the second order (or pairwise) translational entropy, Sww, in terms of conditional distribution functions,
then show how the resulting integrals can be estimated from simulation
data by using a three-dimensional grid to discretize space and then
estimating water densities with either a histogram or a nearest neighbor
method. We also review how the first order (or single-body) translational
entropy, Ssw, can be estimated with the
nearest neighbor method,[67] as this offers
numerical advantages over the histogram approach used previously.[50]
Pairwise Translational
Entropy in the Inhomogeneous Solvation Theory: Background
The solvation entropy of a flexible solute may be written aswhere ρsol(q) is the equilibrium
probability distribution function of the solute over its internal
coordinates q when it is in solution and ρvac(q) is the corresponding function for the
solute in vacuo, ΔSsol(q) is the solvation entropy the solute would have
if it were constrained in conformation q,[69] and kB is Boltzmann’s
constant. The first term in eq is the solvation entropy averaged over the Boltzmann ensemble
of solute configurations, and the second term is the change in solute
configurational entropy on being transferred from a vacuum to solvent.
Here, as in prior applications of IST, we consider the solvation of
a solute in a given conformation, q, or in a narrow
range of conformations, although one could, at some computational
cost, explore solvation over a range of conformations by applying
IST to each one separately and appropriately weighting each conformation.
For simplicity, we will refer to ΔSsol(q) as the solvation entropy and write ΔSsol instead of ΔSsol(q).For solvation in water, inhomogeneous
solvation theory (IST) provides an expansion of ΔSsol in terms of one-water, two-water, and higher order
terms.[38,39,41−43] The one-water term, Ssw, accounts for
the patterning of water density around the solute, while the two-water
term, ΔSww, accounts for the difference
between the two-point, water–water correlations around the
solute and those in bulk; subsequent terms capture still higher-order correlations.
Thus, neglecting terms above pairwise leads to the following approximation:The one-water term has been considered in
some detail in prior studies (see Introduction) and is considered here only in relation to its calculation via
a nearest-neighbor (NN) method (section 2.5). The pairwise term, which is of primary interest here, may be broken
down further into orientational and translational parts:[46,66]The present study focuses exclusively on the translational
part, ΔSwwtrans, which accounts for the solute-induced
change in the two-point correlation of water density, without reference
to the spatial orientation of the waters. Using a form provided by
Morita and Hiroike[42] and clearly rendered
in eq 12 of a subsequent paper by Lazaridis,[43] one may write this quantity asThe first equation of eq writes ΔSwwtrans as the difference
between the pairwise translational entropy of the solute-perturbed
water from the pairwise translational entropy of unperturbed (bulk)
water. The second and third equations provide expressions for the solute-perturbed
system and bulk water. Here, ρ is the number density of bulk
water, g(r) is the unitless ratio of
the number density of water at r in the presence of the
solute to the bulk density, , and g(r,r′) is the two-point correlation function in the presence
of the solute, , where ρ(r,r′)
is the two-point number density of water at locations r and r′. The solute is considered to be immobilized
in the lab frame of reference, so that coordinates r and r′ represent both lab-frame coordinates
and solute-frame coordinates. The integrals range over the whole system
or, at any rate, to points r and r′
far enough from the solute to effectively capture the entire effect
of the solute on the solvent distribution.The expression in
line three for the pairwise translational entropy of pure water uses
the corresponding distribution functions, go(r), go(r′),
and go(r,r′)
for unperturbed bulk water. Note, however, that in the absence of
the perturbing solute (or, in the terms of Morita and Hiroike, in
the absence of an external field), go(r) = go(r′)
= 1. We include these terms here to make clear that derivations and
algorithms which apply in the presence of a solute fixed in the lab
frame are equally applicable to pure water, where no solute is present.
Conditional Formulation of the Pairwise Translational
Entropy
We now develop an expression for Swwtrans in
terms of conditional densities. As noted in the prior subsection,
the same derivation carries through in the case of pure water, so
the pure water case is not included explicitly. The resulting formulation
enables an informative spatial decomposition and provides a convenient
basis for evaluating this correlation term with nearest-neighbor[51,68] or histogram methods. We first use the definitions in the prior
paragraph to write the starting expression in terms of number densities:Because,
in the canonical ensemble, ∫ρ(r) dr = N and ∫∫ρ(r,r′) dr dr′
= N(N – 1), where N is the number of water molecules, the second line in eq , which we call the nonlogarithmic
term, reduces to . For the sake of brevity, this simple term will be omitted from
subsequent expressions, except as otherwise noted. We now use the
fact that the two-point density may be written in terms of either
of the conditional two-point densities, ρ(r,r′) = ρ(r|r′)
ρ(r′) = ρ(r′|r) ρ(r), to rewrite Swwtrans, without
the nonlogarithmic term, asIntuitively,
the first term in the last line of eq reports how the presence of a water at r′ influences the entropy of the water in the system as a whole,
while the second term essentially subtracts out the entropy of the
water for the distribution not conditioned on the presence of water
at r′. In the absence of any correlation between
the density of water at r and at r′,
then ρ(r|r′) = ρ(r) and ρ(r′|r) = ρ(r′), so the integrands of both terms in the last line
of eq become equal,
and the volume elements at these locations will make zero contribution
to the pairwise entropy.The nearest neighbor entropy estimation
method provides averages of the form ⟨ln ρ⟩,[68] so it is of interest to express eq in a form which uses such averages.
We use the facts that ∫ρ(r) dr = N and ∫ρ(r|r′) dr = N – 1, to recognize
that and are
normalized probability density functions and hence can be cast as
probability weighting factors of the corresponding logarithm terms
in eq . Multiplying
and dividing the first and second terms in the last two lines of eq by the respective normalization
constants (N – 1) and N and
integrating the second term over r′ to obtain
a factor of N – 1 yieldswhere angle brackets indicate averages over the Boltzmann distribution.
Localized Perturbation Approximation
The integrals in the equations above range over the entire system,
but in practice we will often be most interested in the region of
solvent near the solute. For one thing, such a region may have particular
functional importance, such as binding of a ligand by the solute,
if it is a receptor. Furthermore, the influence of the solute on the
solvent distribution functions dies off with distance from the solute,[41,43] so distant regions contribute little, and ignoring distant regions
can make calculations more tractable without introducing much error.
Accordingly, it is useful to restrict the integrals over r′ to a local region of interest, K, such
as a receptor binding site. Furthermore, the contribution of the water
at r′ to Swwtrans is directly connected with
the conditional density function ρ(r|r′), which reports on the perturbation of water density at r by the presence of a water at r′, and
this perturbation also dies off with the distance of r from r′. As a consequence, it is often reasonable
to restrict the integrals over r to finite regions centered
on r′, which will be termed L. With these localizing approximations, eq may be written in either
of the following equivalent forms:In physical terms, eq provides the contribution to the
total pairwise entropy from the degree to which the water density
in regions L is
modified by conditioning on the presence of a water at each location r′ in K. Note that eq is recovered when the K and L regions span the entire system.
Using a 3D Grid to Estimate Distribution Functions from Simulations
with Explicit Water
Following the existing first-order GIST
approach,[51] we use a three-dimensional
grid to compute the spatial distribution of the second-order translational
entropy from explicit-water simulations for a region of interest,
such as a protein binding site. Thus, we discretize eq by breaking space into cubic voxels
of side-length Rvox and volume Vvox = Rvox3, which are assumed to be small
enough that a voxel can contain at most one water molecule. (Here,
the translational coordinates of a water molecule are identified with
those of its oxygen atom.) As a matter of notation, location r′ maps to voxel index k, and the
region K corresponds to a grid in the region of interest,
such as a binding site (Figure ), while r maps to voxel index l and the regions L, which are centered around r′, map to regions L, which are centered around
voxels k ∈ K; see Figure . The following subsections
describe a histogram method and a nearest-neighbor method of using
this discretization to estimate Sww,trans through
analysis of n frames,
or snapshots, from a molecular dynamics (MD) simulation of explicit
water molecules around the solute; an additional subsection explains
how the nonlogarithmic term in eq is computed.
Figure 1
Diagram of a receptor (green) in a water-filled
simulation box (blue), with grid corresponding to region K in pale pink and two examples of regions L in yellow (orange where overlapping with K). In each case, the L grid is centered in one K-grid voxel, k, which is highlighted in pink.
Diagram of a receptor (green) in a water-filled
simulation box (blue), with grid corresponding to region K in pale pink and two examples of regions L in yellow (orange where overlapping with K). In each case, the L grid is centered in one K-grid voxel, k, which is highlighted in pink.
Histogram Method
The histogram method uses MD data to estimate
the various densities in the first line of eq by counting the instances of water molecules
in voxels:Here, n and n are the numbers of MD frames for which a water is
found in voxels l and k, respectively,
and n = n are the numbers of MD frames for which a water is found in both
voxels k and l. Note that this approximation
treats the water densities as constant over the volume of each voxel,
and, as noted above, we have assumed that the voxels are small enough
that at most one water molecule can occupy a voxel. In going from
a double integral to a double sum over voxels, two factors of the
volume element Vvox appear. Thus, the
histogram method estimates Sww,trans asHere, the contribution of regions K and L to the pairwise translational part of the solvation entropy
is estimated in terms of the quantities summed over voxels k ∈ K. The contribution of voxel k, Vvox(scond,hist – shist), which is the product of the
voxel volume, Vvox, and the difference
between the conditional and unconditional entropy densities, is computed
as the probability of finding a water in voxel k, , multiplied by the per water normalized quantity scond,norm,hist – snorm,hist. Writing the pairwise translational entropy in terms of a sum over
voxels, in this manner, allows a spatial decomposition of this term
which is useful for generating graphical representations. In practice,
the expressions in eq are evaluated by analyzing the n available MD frames to determine n, the number of frames with a water in voxel k; n, the
number of frames with a water in voxel l; and n, the number
of frames with a water in both voxels l and k, where voxels k and l are required to reside in regions K and L, respectively.
Nearest-Neighbor Method
Starting
with the final expression in eq , we multiply and divide the first term by , which
is the mean number of waters on the L grid given a water in the k voxel,
and we multiply and divide the second term by N = ∫ ρ(r) dr, which is the mean number of waters on
the L grid without
any condition, in order to convert the corresponding number density
functions into probability density functions, much as previously done
in deriving eq . The
present expression for N is an approximation,
due to the discretization of r′ into voxels, so
the following expressions are, accordingly, also written as approximations.
We obtainHere,
⟨⟩ indicates an average over the volume of the L grid. These integrals are now discretized
via the approximations in eq :Here, ρ(r|k) represents
the number density of water at r conditioned on the presence
of a water in voxel k, and L is the L region associated
with, and typically centered on, voxel k. This may
be put into a form analogous to the histogram formulation in eq , as follows:Both average log terms inside eq can be found using the NN method, with the gamma correction
for the bias, as follows:Here, the index i runs over water instances and r is the distance between
water instance i and its nearest-neighbor water instance.The water instances are defined as follows. For the second expression
in eq , the water
coordinates found in all n MD frames are merged into one “superframe,”
containing Nn sets of
water coordinates, so that each of the N water molecules
in the system appears n times, each time with a different set of coordinates; each resulting
set of water coordinates in the superframe is considered one instance
of a water. The symbol ∑ indicates a sum over
the Nnwater
instances falling within the L grid, but the nearest neighbors of these water instances may
be water instances which fall just outside the L grid. For the conditional term in the first
line of eq , the water
coordinates in all n frames for which a water molecule is found in k are merged into one superframe, which is now conditioned on the
presence of a water at k, and each set of water coordinates
in this conditional superframe is considered one instance of a water.
The symbol ∑( indicates a sum over
the water instances in this superframe which fall within the L grid; again, the nearest
neighbors of these conditional water instances may be conditional
water instances which fall just outside the L grid.
Nonlogarithmic Term
The nonlogarithmic term in eq may readily be computed for regions K and L, within the grid approach. Thus,whereThis term reflects the difference
between the two-point density of water around the solute and the product
of the corresponding one-point densities. As detailed in the Results section, once referenced to bulk, its contribution
becomes negligible, at least for the cases analyzed here.
Referencing the Results to Bulk Water
As
explained in section 2.1.1, the derivation
and algorithms described above for water in the presence of a solute
fixed in the lab frame are equally applicable to pure water. In order
to reference the entropy of water in the presence of solute to pure
water values matched numerically to those computed in the presence
of the solute, we analyze a simulation of neat water with a small K grid near the center of the simulation box and use the
same grid spacing and L-grid parameters as used to
analyze water in the presence of the solute to compute the value of
TSww,bulktrans for the water on the K grid. This quantity is then
normalized by the mean number of water molecules on the K grid, for the pure-water calculation, to obtain Sww,bulktrans,norm, the mean pairwise translational entropy per water in bulk. Note
that the dimensions of the pure water K grid need
not be matched to the one used in the presence of solute: every voxel k is equivalent to every other, for the pure water case,
so there is no need to cover any particular region. Now, if the K region in the presence of solute contains an average of N waters, again in the presence
of solute, then one obtains the net pairwise entropy difference of
the waters in the presence of solute versus in bulk from the following
expression:In the formulations detailed above, each voxel k on the K grid is associated with an entropy density.
For example, for the nearest neighbor approach, this quantity is obtained
from eq as scond,NN – sNN. This density may be referenced to bulk by subtracting the corresponding
entropy density for bulk water; when the nonlogarithmic term is also
included, the resulting entropy difference density is . The corresponding
per-water normalized entropy difference is . These
voxel quantities may be visualized in terms of three-dimensional contours,
as illustrated in the Results section.
Validation of Grid Methods by Comparison with the Radial Distribution
Method for Pure Water
The applicability of the present methods
to pure water provides an opportunity to validate them by comparing
their results with those obtained by an independent computational
approach. We use the fact that the second order translational entropy
of a homogeneous liquid may be obtained from the one-dimensional radial
distribution function (RDF) centered on a given water in the pure
liquid:Formally, the upper limit of the integral, R, corresponds to the size of the entire system, but far
smaller values of R yield good approximations, because
water–water correlations decay quickly with values of r on the scale of the mean water–water distances,
as detailed in the Results. Here, g(r), the radial correlation function between
water molecules, is estimated from n MD frames aswhere n is the number of
water instances in the spherical shell of thickness Δr between r – Δr/2 and r + Δr/2. In practice,
we maximize statistical convergence by averaging the values of g(r) over all N waters
in a simulation of pure water. The result from eq was compared with that obtained by the grid
methods as described in the prior subsection.
NN Estimation
of the First Order Translational Entropy
The first order
translational entropy of the water around a solute is given byThe contribution of region K to Sswtrans may, for the sake of visualization and analysis, be decomposed into
contributions from voxels k ∈ K, as follows:where is the mean number of waters
in voxel k, so that , and ⟨ln ρ(r)⟩, the mean log density
of water in voxel k, may be computed by the nearest
neighbor method as
Implementation of Entropy Calculations
The K and L regions used for both the histogram
and NN entropy calculations were sized to fit within the simulation
boxes (see details below), and the grid spacings were 0.5 Å,
except as otherwise noted. The efficient ANN algorithm[70] was used to find nearest neighbors of water
instances. Because the difference between the unconditional density
and the density conditioned on a water in voxel k diminishes rapidly with distance from k, each voxel k was assigned its own cubic L grid, centered
around k, in order to efficiently capture the local
contributions of these differences to the overall second-order entropy.
The side-length of the L grid was set to twice the
desired cutoff distance within which correlations would be captured.
For example, a cutoff distance of 5 Å leads to a 10 Å L grid. The first-order orientational entropy was computed
with the previously described GIST NN method,[51] and the first order translational entropy was computed by the NN
method described in the subsection above.
Simulation
Details
The pure water calculations used in section 3.1 were carried out in cubic simulation boxes with
side lengths of about 32 Å and periodic boundary conditions.
The temperature was maintained at 300 K with the Langevin thermostat
and a collision frequency of 2.0 ps–1; pressure
was maintained at 1 atm by isotropic position scaling with a pressure
relaxation time of 0.5 ps. A 9 Å cutoff distance was employed
for all nonbonded interactions in the pairlist, and the particle-mesh
Ewald method was used to account for long ranged electrostatic interactions.
The simulations were run for 1.5 μs on a single GPU, with the
pmemd.cuda component of AMBER 12[71] or AMBER
14, with a 2 fs time step. The SHAKE algorithm[72] was used to constrain the lengths of all bonds involving
hydrogen atoms. Frames were saved for analysis every 0.5 ps.Parameters were assigned to cucurbit[7]uril (CB7) with the GAFF[73] force field and RESP[74] with the HF 6-31G* basis set. The starting structure was solvated
with 1096 water molecules, resulting in a 10 Å buffer of water
surrounding the solute, using the program Tleap in AmberTools.[75] The system was then briefly energy-minimized
with Cartesian positional restraints on CB7, with force constants
of 10 kcal/mol/ Å2. The minimization comprised 1500
steps of the steepest descents algorithm, followed by a maximum of
2000 steps of the conjugate gradient method. The crystal structure
of FXa complexed with the inhibitor ZK-807834 (alternatively called
CI-1031), from Protein Data Bank entry 1FJS,[76] was prepared
and simulated as described previously;[34] the ligand was deleted to leave an empty binding site, and two ions observed in the crystal structure (Ca2+ and
Cl–) were included as part of the solute and hence
were restrained along with the protein atoms, as detailed below. TIP3P[23] water molecules were added to solvate the protein
using the program Tleap,[77] and the AMBER99sb
force field[78] was used to assign the atomic
parameters. The final system comprised 29 338 atoms, including
the 8557 water molecules. The MD simulation started with an energy
minimization of 1500 steps by the steepest descents algorithm, followed
by conjugate gradient energy minimization for a maximum of 2000 steps.
During the initial minimization, all protein atoms were harmonically
restrained to their initial positions with a force constant of 100
kcal/mol/Å2. The second minimization further relaxed
the system by keeping only non-hydrogen protein atoms restrained with
the same force constant. The energy-minimized systems were heated
in increments of 50 K lasting 20 ps each, at constant volume and temperature.
After the system reached 300 K, it was then equilibrated for 10 ns
at a constant temperature and constant pressure of 1 atm. The equilibrated
system was further run for an additional 5 ns at constant volume,
and data were then collected at NVT. During the heating process and
thereafter, all solute atoms were harmonically restrained to their
energy-minimized positions with a force constant of 100 kcal/mol/Å2 for FXa and 10 kcal/mol/Å2 for CB7. Other
simulation parameters were the same as for the pure water simulations
described above. A 400 ns production simulation was run for FXa, with
coordinates saved every 1 ps, for a total of 400 000 snapshots.
For CB7, the production run was 600-ns-long, with coordinates saved
every 0.5 ps, for a total of 1 200 000 snapshots.
Results
Second Order Translational
Entropy of Pure Water
Radial Distribution Function
Method
Here, we study the second order translational entropy
of pure water, for which the RDF method is available for reference.
(It is worth noting that the RDF method is not applicable to the solute-water
systems, for which the grid methods were developed.) First, as a verification
of the present methodology, we used the RDF method, with spherical
shells of thickness Δr = 0.01 Å, and 200 000
MD frames, to estimate Swwtrans,RDF for the TIP3P, TIP4P,[23] TIP4PEW,[27]and SPC/E[25] water models; the results all agree with prior
calculations[79] to within 0.5%, with the
exception of TIP3P, for which the difference was 1.9%. (A replicate
TIP3P simulation yielded the same result.) No significant changes
were observed on extending any of the simulations to 400 000
frames. We then examined the RDF results as a function of the two
numerical parameters R, the upper limit of the integral
in eq , and Δr, the width of the spherical shells in eq . As detailed in Figure (left panel), for the TIP3P
water model, the entropy from the RDF method converges quickly as
the upper limit of the integral increases, changing by less than 0.3%
when R increases from 4 to 10 Å. This observation
supports the use of local L(k) grids
(section 2.2) of modest size. The RDF result
also depends on the width of the spherical shell, closely approaching
its presumed asymptote for values less than or equal to 0.1 Å
(Figure , right panel).
Intuitively, the entropy falls as the shells become thinner and thereby
reveal finer structure in the water distribution. This result suggests
that voxels larger than about 0.1 Å in linear dimension are likely
to overestimate the entropy somewhat, due to smoothing of the density
distributions.
Figure 2
Second-order water–water entropy, provided as a
free energy contribution TS (kcal/mol), computed for pure TIP3P water
with the RDF method. Left: entropy as a function of the distance cutoff, R in eq , for a spherical shell thickness, Δr in eq , of 0.001 Å. Right:
entropy as a function of spherical shell thickness, for a distance
cutoff of 10 Å; the two right-most points are for shell thicknesses
of 0.25 and 0.5 Å. These calculations used 50 000 MD frames;
extending to 200 000 frames changes the results by only a few
thousandths of a percent.
Second-order water–water entropy, provided as a
free energy contribution TS (kcal/mol), computed for pure TIP3P water
with the RDF method. Left: entropy as a function of the distance cutoff, R in eq , for a spherical shell thickness, Δr in eq , of 0.001 Å. Right:
entropy as a function of spherical shell thickness, for a distance
cutoff of 10 Å; the two right-most points are for shell thicknesses
of 0.25 and 0.5 Å. These calculations used 50 000 MD frames;
extending to 200 000 frames changes the results by only a few
thousandths of a percent.
Grid Methods
The grid methods,
both histogram and NN, were used to compute the same quantity, by
defining region K as a cube of side length 1.5 Å,
centered in a pure water simulation box and divided into cubic voxels k, each of side length Rvox.
Each voxel k was associated with its own cubic region L (see Figure ) centered on k and extending
a fixed distance R from k along each grid axis, so that the side of each region L was of length 2R. For the histogram method, the cubic L regions were divided into voxels l of
side-length Rvox, and the K and L grids were registered so that each k voxel was also an l voxel. However, not
every l voxel was a k voxel, because L extended outside K. The agreement of
the grid method with the RDF method is examined below, as a function
of the voxel size, Rvox, and the size
of the L grids, R.Dependence of second order pure water entropy on grid spacing. Left:
NN results for grid spacings (Rvox) of
0.25 (green), 0.5 (red), and 0.75 Å (blue), with the reference
RDF result (dashed line; computed with R = 15 Å,
Δr = 0.01 Å, and 200 000 MD frames).
Error bars for the 0.5 Å results indicate the standard deviations
across all 27 k voxels in this K region. Right: Analogous results from the histogram method; results
for a grid spacing of 0.25 Å were computed but are below the
scale of the graph. All calculations used L regions
with R = 8 Å, and
all entropies are reported as free energy contributions, TS, in kcal/mol.The NN method provides reasonably
well-converged results within 1–2 million MD frames, for voxel
sizes, Rvox, of 0.75 and 0.5 Å. Thus,
as shown in Figure (left), the mean of TSww across all 27 k voxels (solid lines) changes little with increasing simulation time
for both of these grid spacings. In addition, for 0.5 Å voxels
(red), the standard deviation of TSww across the 27 k voxels falls to 0.066 kcal/mol at 1 million frames and
0.05 kcal/mol at 2.3 million frames (red error bars). For 0.75 Å
voxels (blue), the standard deviation is 0.042 kcal/mol at 1 million
frames (data not shown). However, calculations with these grid spacings
converge to values that overestimate the reference RDF result somewhat,
particularly for a grid spacing of 0.75 Å (Figure , blue). Because the NN method does not make
use of the discretization of the L grid into voxels l, these systematic errors are attributed to the fact that
larger k voxels lead to greater smoothing of the conditional
probability density functions over the corresponding regions L. Going to a finer grid spacing, Rvox, of 0.25 Å yields ultimately a closer
approach to the reference RDF result (Figure , green), but convergence is notably slower.
The histogram method converges slowly with the number of MD frames
(Figure , right).
This is evident even for the relatively coarse 0.75 Å grid spacing,
and our 0.25 Å results are below the scale of the graph. In addition,
it is evident that the 0.75 Å result is converging to a result
that is further from the reference RDF result than the NN 0.75 Å
result. This difference presumably results from further smearing of
the conditional density functions, due to the added discretization of the L regions, which is not required
for the NN method.
Figure 3
Dependence of second order pure water entropy on grid spacing. Left:
NN results for grid spacings (Rvox) of
0.25 (green), 0.5 (red), and 0.75 Å (blue), with the reference
RDF result (dashed line; computed with R = 15 Å,
Δr = 0.01 Å, and 200 000 MD frames).
Error bars for the 0.5 Å results indicate the standard deviations
across all 27 k voxels in this K region. Right: Analogous results from the histogram method; results
for a grid spacing of 0.25 Å were computed but are below the
scale of the graph. All calculations used L regions
with R = 8 Å, and
all entropies are reported as free energy contributions, TS, in kcal/mol.
Dependence of second order pure water entropy on size
of L regions. Left: Comparison of NN results, for L regions with R of 4–10 Å, with the reference RDF result (dashed
line; R = 15 Å, Δr =
0.01 Å, and 200 000 MD frames). Right: Analogous results
from the histogram method. All results are for a grid spacing of 0.5
Å, and all entropies are reported as free energy contributions,
TS, in kcal/mol.As detailed in the Appendix, further comparison of the NN and histogram
methods, with grid spacings down to 0.15 Å, highlights the advantage
of NN over the histogram method, in terms of the rate of convergence
and the amount of sampling required to reach the reference RDF result.
The graphs in the Appendix suggest that the RDF result would be closely approximated
by the NN method at a grid spacing of 0.15 Å, given about 4 ×
106 snapshots, whereas the histogram method would require
far more sampling.The present calculations depend not only
on the grid spacing, but also on the size of the L region, in which the perturbation of water density by a
water in each voxel k is evaluated. It is thus appropriate
to evaluate the sensitivity of the second order entropy calculations
to the size of the L regions. Based on
the RDF study (Figure , left), one may anticipate that R ≥ 4 Å should suffice to capture most of the second-order
entropy for the TIP3P water model examined here. This expectation
is borne out by the insensitivity of the NN results to R, over a range of 4–10 Å
(Figure , left); in
addition, good numerical convergence is observed over this range.
It is worth noting that, because the L regions are
cubic, a value of R equal
to 5 Å, for example, includes locations as far as Å from the center of voxel k. On the other
hand, the poor convergence of the histogram method (Figure , right) makes it difficult
to assess the sensitivity of its results to the value of R.
Figure 4
Dependence of second order pure water entropy on size
of L regions. Left: Comparison of NN results, for L regions with R of 4–10 Å, with the reference RDF result (dashed
line; R = 15 Å, Δr =
0.01 Å, and 200 000 MD frames). Right: Analogous results
from the histogram method. All results are for a grid spacing of 0.5
Å, and all entropies are reported as free energy contributions,
TS, in kcal/mol.
On the basis of this analysis
of calculations for pure water, we selected a grid spacing of 0.5
Å as a good working compromise between speed and accuracy for
the cucurbituril and protein calculations described below, and we
set R, the size of the L regions, to 5 Å. The
bulk value used to compute the change in the second order entropy
is taken from a pure water calculation under the same conditions and
with 2 000 000 MD frames. In addition, given the poor
convergence of the histogram method, the following calculations use
only the NN method for the second order calculations.
Cucurbit[7]uril (CB7)
Visual Mapping of Second-Order
Translational Entropy
Figure visualizes the spatial distribution of the second-order
translation entropy, relative to bulk, in and around the synthetic
host molecule CB7, which we previously studied to first order in entropy.[51] As shown in the left panel, there is a large
region of water with reduced second order entropy density, relative
to bulk, centered around the symmetry axis of CB7 (yellow), with particular
enhancement in two toroids (pink) roughly level with the upper and
lower nitrogen–carbon rings of the host. These are regions
where more highly correlated water (hence reduced second-order entropy
relative to bulk) is present at relatively high number density, leading
to notably negative pairwise entropy density relative to bulk. On
the other hand, small loci of water with positive pairwise entropy
density relative to bulk (i.e., less correlated than bulk) are present
at seven symmetry-related sites around the relatively apolar exterior
of the host (green), and also at symmetry-related sites between the
carbonyl oxygens at the upper and lower portals of the host (green).
Figure 5
Spatial
maps of second order translational entropy density of water, relative
to bulk, in and around host CB7, based on 1 200 000
MD frames. Left: Entropy contoured at −0.02 kcal/mol/Å3 (pink), −0.01 kcal/mol/Å3(transparent
orange), and +0.005 kcal/mol/Å3 (green). Middle: same,
but omitting the contour at −0.01 kcal/mol/Å3 and additionally showing water number density contoured at 0.1 waters/Å3 (gray) and 0.175 waters/Å3 (magenta). Right:
rotated view, showing only entropy density contours at −0.02
and 0.005 kcal/mol/Å3 and number density contour at
0.01 waters/Å3, with the same colors. Entropies are
multiplied by temperature to convert to kcal/mol. These and all other
molecular graphics in this paper were generated with the program VMD.[80]
It is of interest to compare this distribution of pairwise entropy
density with the simple number density of water, which is contoured
in gray and magenta in the middle and right-hand panels of Figure . Whereas the number
density is highest in the central torus (magenta, middle panel), the
absence of a central torus in the entropy contours of the left-hand
panel indicates that the water molecules in the central torus are
not especially correlated with other water molecules. In contrast,
water molecules present at intermediate density in the upper and lower
tori (middle panel) are highly correlated and therefore appear prominently
in the entropy map (pink in the left-hand panel). Similarly, the regions
of elevated pairwise entropy (green) appear as a subset of the high
number-density regions contoured in gray (middle and right-hand panels).Spatial
maps of second order translational entropy density of water, relative
to bulk, in and around host CB7, based on 1 200 000
MD frames. Left: Entropy contoured at −0.02 kcal/mol/Å3 (pink), −0.01 kcal/mol/Å3(transparent
orange), and +0.005 kcal/mol/Å3 (green). Middle: same,
but omitting the contour at −0.01 kcal/mol/Å3 and additionally showing water number density contoured at 0.1 waters/Å3 (gray) and 0.175 waters/Å3 (magenta). Right:
rotated view, showing only entropy density contours at −0.02
and 0.005 kcal/mol/Å3 and number density contour at
0.01 waters/Å3, with the same colors. Entropies are
multiplied by temperature to convert to kcal/mol. These and all other
molecular graphics in this paper were generated with the program VMD.[80]Further insight may be obtained by examining the pairwise
entropy on a per-water-molecule basis, rather than on the basis of
entropy density, as shown in Figure . The contours are more irregular than those in Figure , because, unlike
the density-weighted entropies there, here the contours involve voxels
where the number density is low, so the numerical convergence is worse.
Nonetheless, it is interesting to observe that the most correlated
water molecules are those at the centers of the two portals, as highlighted
in the pink and clear orange contours. In contrast, the water in the
center of the host is not especially correlated, as indicated by the
lack of contouring there. This result is consistent with the absence
of a torus of entropy density in the left-hand panel of Figure .
Figure 6
Spatial map of second
order translational water entropy per water molecule, relative to
bulk, in and around host CB7. Pink: contour at −0.5 kcal/mol/water.
Transparent orange: −0.3 kcal/mol/water. No significant positive
regions were observed after setting aside voxels with inadequate sampling,
e.g., those with number densities less than 0.001 water molecules/Å3.
Spatial map of second
order translational water entropy per water molecule, relative to
bulk, in and around host CB7. Pink: contour at −0.5 kcal/mol/water.
Transparent orange: −0.3 kcal/mol/water. No significant positive
regions were observed after setting aside voxels with inadequate sampling,
e.g., those with number densities less than 0.001 water molecules/Å3.The strongly negative pairwise
entropy of water molecules at the centers of the portals (pink in Figure ) suggests that the
density distribution of water is strongly influenced by the presence
of a water at these locations, as discussed in section 2.1.2. Put differently, the density conditioned on
the presence of water in the middle of the portal should be quite
different from the unconditioned density. This is indeed the case,
as shown in Figure . Here, the left-hand panel shows the baseline density distribution
of water, with a strong central torus and a few other small loci of
high density, while the right-hand panel shows water density conditioned
on the presence of a water at the center of the portal (star in figure),
contoured at the same level as in the left-hand panel. Despite some
noise in the conditional contours, which results from a reduction
in the number of MD frames available for averaging, due to the condition,
one can clearly see a dramatic increase in water structure, with the
appearance of an additional torus below the star and a more jagged ring
of density above it. This increase in structure gives a conditional
entropy lower than the baseline unconditional entropy and hence a
negative contribution to the overall pairwise translational entropy
from the voxel corresponding to the star.
Figure 7
Number density of water,
contoured at 0.15 water molecules/Å3, in and around
CB7. Left: standard number density derived directly from simulation.
Right: number density conditioned on the presence of a water molecule
at the location marked by a star.
Number density of water,
contoured at 0.15 water molecules/Å3, in and around
CB7. Left: standard number density derived directly from simulation.
Right: number density conditioned on the presence of a water molecule
at the location marked by a star.
Quantifying the Second Order Translational
Entropy of Water in and around CB7
Binding of guest molecules
to the CB7 host leads to displacement of water molecules to the bulk
from regions in and near the binding cavity. The resulting change
in water thermodynamics makes a contribution to the overall thermodynamics
of binding, so it is of interest to examine the magnitude and sign
of these contributions. In a previous study, we used the histogram
method to study the first-order translational entropy, and the NN
method to study the orientational entropy, of water in the CB7 binding
cavity.[51] Here, we report the second-order
translational entropy for regions of interest in and around CB7 and
consider it in the context of the first-order terms. We also compare
the present NN implementation of the first order translational entropy
(section 2.5) with the prior histogram method.Three regions are considered (Figure ): the cavity (blue), which comprises the
complete interior of the host (including the torus, below), with upper
and lower cutoffs positioned at a trough in water density near the
level of the host’s carbonyl oxygens; the torus (red), which
is a subset of the cavity region holding only the central torus of
high-density water;[51] and the portals (cyan),
which extend 2 Å above and below the upper and lower borders,
respectively, of the cavity region and hold regions of increased water
density above the upper and lower rings of carbonyl oxygens. It should
be noted that the geometric definitions of the cavity and torus regions
used here have been refined, relative to those used previously,[51] so the first-order thermodynamic results for
them differ somewhat. In particular, the cavity is somewhat taller,
to reach the troughs in water density at the portals, and the torus
is slightly shorter, to capture only its peak density. The results
(Table ) are reported
as regional integrals, as indicated by the superscript “R”
in each case, and also on a per-water-molecule basis, which provides
information on the average properties of water in each region.
Figure 8
Regions of
interest in and around the CB7 host, shown in side (left) and top
(right) views, with the host (stick diagram, omitting hydrogen atoms),
and water density contoured at 0.10 water molecules/Å3 (gray). Central blue box: cavity region. Union of upper and lower
cyan boxes: portal region. Red box: torus region.
Table 1
Water Entropy Contributions, As Defined in Text, for
Regions in and near the Binding Cavity of the CB7 Host, and for the
Region of the FXa Binding Site Occupied by Ligand ZK-807834 (CI-1031)a
NK
TΔSwwR,trans,NN
TΔSswR,trans,NN
TΔSswR,orient,NN
cavity
7.54
–0.96
–5.17
–6.33
–0.13
–0.69
–0.84
torus
2.82
–0.24
–2.42
–2.41
–0.09
–0.86
–0.85
portal
25.85
–1.95
–9.66
–18.32
–0.08
–0.37
–0.71
FXa
16.34
–4.96
–15.31
–26.43
–0.30
–0.94
–1.62
The CB7 regions are defined in the text and in Figure . NK: mean number
of water molecules in each region. The first three entropy columns
(kcal/mol) are sums over the respective regions; the second three
entropy columns (kcal/mol/water molecule) are normalized by the numbers
of waters in the respective regions and thus report on the average
property of water in each region. The results are based on K and L grids with a 0.5 Å spacing
and L grids with RL =
5 Å, as noted in the text.
Regions of
interest in and around the CB7 host, shown in side (left) and top
(right) views, with the host (stick diagram, omitting hydrogen atoms),
and water density contoured at 0.10 water molecules/Å3 (gray). Central blue box: cavity region. Union of upper and lower
cyan boxes: portal region. Red box: torus region.For all three regions—cavity, torus, and portal—the
net second-order translational entropy has the same sign as the first-order
translational and orientational entropy terms. Thus, the second-order
term always reinforces, rather than partly cancels, the first order
term. This result contrasts with the heuristic assumption that the
second-order term tends to partly cancel the first-order one.[52,67] Although the second-order contribution is substantially smaller
than the first order terms (Table ), at 5–8% of their sum, it is large enough
to substantially influence the thermodynamics of hydration and of
guest-binding to this synthetic host molecule. It is worth noting
that the nonlogarithmic term contributes minimally to these second
order terms, 0.08, 0.04, and −0.12 kcal/mol, for the cavity,
torus, and portal regions, respectively.It is also of interest
that the first-order orientational term is about equal to the first-order
translational term for the cavity and torus regions, but it is about
double the first-order translational term for the portal. Presumably,
waters at the portal are particularly well-ordered because they form
hydrogen bonds with the carbonyl oxygens of the host.The entropic
properties of water molecules in the three regions may be characterized
by correcting for the mean number of waters in each region, as done
in the last three columns of Table . The most correlated waters, based on the value of
the second-order translational entropy per water, are found in the
cavity region. The degree of correlation within the torus is smaller,
and similar to that in the portal region. Since the torus is a subset
of the cavity, the high second-order entropy of the cavity water as
a whole clearly reflects the particularly high correlation of water
above and below the central torus, which is evident in the left panel
of Figure . The fact
that the first-order translational entropy per water of water in the
torus is lower (−0.86 kcal/mol/water) than that for the cavity
as a whole is simply a reflection of the fact that water is present
at higher density in the torus (left panel, Figure ). The least correlated waters are in the
portal region, although the difference relative to the torus is small.The CB7 regions are defined in the text and in Figure . NK: mean number
of water molecules in each region. The first three entropy columns
(kcal/mol) are sums over the respective regions; the second three
entropy columns (kcal/mol/water molecule) are normalized by the numbers
of waters in the respective regions and thus report on the average
property of water in each region. The results are based on K and L grids with a 0.5 Å spacing
and L grids with RL =
5 Å, as noted in the text.The second-order translational entropy is based on a six-dimensional
probability distribution function, the two-point water density ρ(r,r′), and thus can require long simulations
to achieve acceptable convergence. Reasonably good convergence is
obtained here for the various regions in and around CB7 (Figure , left), though the
portal graph, in particular, still has a visible trend after 600 ns
of simulation time (1 200 000 frames).
Figure 9
Convergence of second-order translational entropy
of water in and around host CB7 as a function of MD frames (left),
and of first order translational entropy for the cavity region, by
the NN and histogram methods (right). The regions are defined in the
text and Figure .
Finally,
it is worth commenting here on the NN implementation of the first
order translational entropy (section 2.5), a term we previously computed with the histogram method. For the
cavity region, both the NN and histogram method with a 0.5 Å
grid spacing converge well (Figure , right), but the histogram result is higher than the
NN result. In general, the NN method yields entropies for the three
regions 13–22% lower than those provided by the histogram method
(results not shown). This is because, unlike the NN method, the histogram
method approximates the water density as uniform within each voxel.
This approximation smooths the water density and thereby leads to
higher entropy estimates.Convergence of second-order translational entropy
of water in and around host CB7 as a function of MD frames (left),
and of first order translational entropy for the cavity region, by
the NN and histogram methods (right). The regions are defined in the
text and Figure .
Factor
Xa
Visual Mapping of Second-Order Translational Entropy
Water in and near the active site of the enzyme FXa adopts a complicated
distribution of increased and decreased translational correlation,
as manifest in contours of the second-order translational entropy
density, relative to bulk, shown in Figure (left panel). (Note that, although the
simulations analyzed here placed the protein in a large box of explicit
solvent with period boundary conditions, the K grid
covered only part of the surface, so the contours are localized.)
Indeed, every hollow in the surface of the enzyme is occupied by contours
of negative (pink) or positive (green) entropy density (Figure , left), relative
to bulk, whereas water at more convex regions of the surface shows
fewer entropic features. The tendency of this entropy term to be perturbed
in enclosed regions of the FXa surface is consistent with the fact
that, for CB7, the greatest perturbations also occurred within the
binding cavity.
Figure 10
Water properties in the active site of enzyme FXa. Left:
Second-order translational entropy density of water, scaled by T to yield kcal/mol/Å3, contoured at −0.02
(pink) and +0.005 (green) kcal/mol/Å3), with molecular
surface uniformly colored. Middle: Water number density contoured
at 0.175 waters/Å3, with protein surface colored by
element; blue-gray, carbon; blue, nitrogen; red, oxygen; hydrogens
not show. Right: Second-order entropy per water molecule (kcal/mol/water),
contoured at −0.5 kcal/mol/water (pink) and +0.16 kcal/mol/water
(green). Note that the number density contours were computed for a
somewhat larger region than the entropy density contours, especially
toward the bottom and right of the active site. All simulations represented
here are 400 ns long, with frames analyzed every 1 ps.
Water properties in the active site of enzyme FXa. Left:
Second-order translational entropy density of water, scaled by T to yield kcal/mol/Å3, contoured at −0.02
(pink) and +0.005 (green) kcal/mol/Å3), with molecular
surface uniformly colored. Middle: Water number density contoured
at 0.175 waters/Å3, with protein surface colored by
element; blue-gray, carbon; blue, nitrogen; red, oxygen; hydrogens
not show. Right: Second-order entropy per water molecule (kcal/mol/water),
contoured at −0.5 kcal/mol/water (pink) and +0.16 kcal/mol/water
(green). Note that the number density contours were computed for a
somewhat larger region than the entropy density contours, especially
toward the bottom and right of the active site. All simulations represented
here are 400 ns long, with frames analyzed every 1 ps.This patterning may be considered in the context
of the polarity of the enzyme’s surface and the number density
of water (Figure , middle). For example, a large, nonpolar pocket without a strong
elevation of number density (red arrow, Figure , middle) nonetheless has a substantial
increase in second-order translational entropy density (Figure , left), indicating
reduced water–water correlation. Accordingly, the water molecules
in this pocket also have a notable increase in the corresponding per
water entropy (Figure , right). Further examination of the figure may suggest a tendency
for elevated water–water entropy (decreased correlation) at
other patches of hydrophobic surface too, and a tendency for decreased
water–water entropy (increased correlation) in more polar clefts.
On the other hand, water in the relatively hydrophobic interior of
CB7 showed reduced, rather than increased, water–water translational
entropy, so any connection between hydrophobicity of the surface and
reduced water–water correlation cannot be viewed as universal.Concentration
of negative second-order translational entropy at the entry of a deep,
water-filled pocket in FXa. Left: second order translational entropy per water molecule contoured
at −2 kcal/mol/water. Right: second order translational entropy density, contoured at
−0.1 kcal/mol/A3.In the case of CB7, water at the center of the portal to
the binding-site cavity was found to have particularly low second-order
translational entropy per water molecule, implying a high degree of
water–water correlation (section 3.2.1). A similar phenomenon, though more pronounced, is also observed
in the case of FXa, as the entryway to a deep, water-filled cavity
is occupied by water with even lower values of the second-order translational
entropy than seen in the case of CB7, on a per water basis (Figure , left), and in
terms of entropy density (Figure , right). Presumably water at such locations is in
a particularly good position to influence the density distribution
within the nearby cavity or cleft. This was seen previously for CB7
(Figure ) and maybe
also be observed in the case of FXa. Thus, the number density distribution
of water in the deep cleft of FXa is very different when conditioned
on the presence of a water molecule in an entryway site of strongly
negative translational entropy per water (−3.5 kcal/mol/water,
red) than when it is unconditioned (Figure , pink conditioned vs blue unconditioned
density contours). The red site here is at the center of one of the
green regions in Figure .
Figure 11
Concentration
of negative second-order translational entropy at the entry of a deep,
water-filled pocket in FXa. Left: second order translational entropy per water molecule contoured
at −2 kcal/mol/water. Right: second order translational entropy density, contoured at
−0.1 kcal/mol/A3.
Figure 12
Comparison of unconditional (blue) number density distribution
of water with number density distribution conditioned on the presence
of a water at the red site (pink), in the active site cleft of FXa.
This is a side view, relative to the other FXa representations; front
and rear clipping and a transparent protein surface allow visualization
of density contours deep in the cleft. The water densities are contoured
at 0.1 water molecules/Å3; the red contour of second
order translational entropy is contoured at −3 kcal/mol/water
molecule.
Comparison of unconditional (blue) number density distribution
of water with number density distribution conditioned on the presence
of a water at the red site (pink), in the active site cleft of FXa.
This is a side view, relative to the other FXa representations; front
and rear clipping and a transparent protein surface allow visualization
of density contours deep in the cleft. The water densities are contoured
at 0.1 water molecules/Å3; the red contour of second
order translational entropy is contoured at −3 kcal/mol/water
molecule.
Contribution
of the Second Order Translational Entropy to Ligand Binding
Prior studies have considered the role of the first-order entropy
of water in the binding site of FXa to the binding affinity of drugs
and drug-like ligands to this enzyme.[34,45] The present
methodology now allows examination of the second-order translational
entropy of water in this region. We focus on the water displaced by
the ligand (PDB HET ID Z34) present in the binding site of FXa in the PDB entry
studied here, 1JFS. This ligand occupies part of the complicated active site region
(Figure , left)
and displaces water from the deep cleft that contains the most correlated
water (Figures , 12, and 13, right).
Figure 13
FXa with
cocrystallized ligand (PDB HET ID Z34, InChI Key NPNSVNGQJGRSNR-UHFFFAOYSA-N)
seen from outside of active site (left) and in a rotated view with
transparent, z-clipped protein surface to show penetration of the
ligand into the deep active site cleft also shown in Figure .
FXa with
cocrystallized ligand (PDB HET ID Z34, InChI Key NPNSVNGQJGRSNR-UHFFFAOYSA-N)
seen from outside of active site (left) and in a rotated view with
transparent, z-clipped protein surface to show penetration of the
ligand into the deep active site cleft also shown in Figure .At nearly −5 kcal/mol (Table ), the integrated water–water translational
entropy in the region occupied by this ligand is large enough to have
a major impact on the ligand affinity. This value is substantially
greater than the corresponding result for the largest regional integral
of the pairwise entropy for CB7, which is about −2 kcal/mol,
for the portal region (Table ), even though the number of waters in the portal region is
larger. Accordingly, the second-order entropy per water is about 3-fold
larger (more negative) in the binding site than in or around CB7.
The first order entropy terms are also greater (more negative) for
FXa than for CB7, on both a regional and a per water basis (Table ), but not by as large
a factor as the second-order term. As observed in the case of CB7,
the nonlogarithmic contribution to the second-order entropy is minimal,
at 0.20 kcal/mol. Also, again as for CB7, the second-order term contributes
with the same, negative, sign as the first-order term, thus reinforcing
it rather than balancing it.Convergence of translational water–water
entropy for region overlapped by ligand Z34 in the binding site of
FXa, reported in kcal/mol.Two potentially balancing sources of error in the second
order term should be noted. On one hand, this quantity is not fully
converged, as shown in Figure , and a longer simulation would apparently lead to
a somewhat less negative value. On the other hand, as discussed in section 2.3, the use of a 0.5 Å grid leads
to smoothing of the conditional water densities, and the use of a
finer grid, if numerically practical, would lead to a somewhat more
negative value. Thus, the combination of a finer grid and a much longer
simulation might shift the final result either up or down a bit, and
the result in Table is likely a reasonably good estimate of the desired quantity.
Figure 14
Convergence of translational water–water
entropy for region overlapped by ligand Z34 in the binding site of
FXa, reported in kcal/mol.
Discussion
We have described a grid-based
methodology, which uses inhomogeneous solvation theory to estimate
and regionally map the two-body translational contribution to the
hydration entropy of a solute of interest, using the data from a simulation
of the solute immersed in explicit water molecules. The present formulation
shows that the contribution to this two-body term associated with
location r′ tells the degree to which the first-order
translational entropy of water is changed when it is computed from
a density distribution conditioned on the presence of a water molecule
at r′. This contribution must also be weighted
by the density of water at r′. Integrating over r′ then yields the total second order translational
entropy, which may be referenced to the corresponding quantity for
bulk water. Our implementation of this idea combines a gridded discretization
of r′ with a nearest neighbor method that efficiently
computes the required conditional translational entropies. Evaluating
this second order entropy term is substantially more computationally
demanding than evaluating the first order translational and orientational
terms, but the numerical convergence obtained, both in terms of grid
spacing and simulation length, appears good enough to provide at least
semiquantitative information and intuition about how water–water
correlations contribute to the hydration entropy of solutes. It is
worth emphasizing that the NN method used here yields efficiency and
accuracy far beyond that afforded by a simple histogram-based method.The present method has several methodological parameters that affect
the trade-off between accuracy and the amount of data needed to achieve
a given level of numerical convergence. On the basis of the RDF study,
one would ideally use a grid spacing of about 0.1 Å to achieve
highly accurate correlation results. However, very long simulations
would be required to converge these calculations, so we have used
a grid spacing of 0.5 Å, which seems to afford a reasonable balance
between accuracy and convergence, based on the RDF study. However,
it will be of interest to press for further refinement, via either
improved algorithms or brute-force increases in simulation lengths.
Another relevant parameter is the size of the L grid positioned around each k voxel to capture the effects of a water at k on
the local water density distribution. For the TIP3P water model used
here, we obtained reasonable results with an L grid
of about 10 Å side length, which hence extends a minimum of 5
Å in all directions from voxel k. However, the
RDF of TIP3P water reaches bulk density at a rather short distance,
and other water models, whose RDF deviates from bulk at longer distances,
may require a larger L grid.We find that this
second order term contributes only 5–8% of the first order
terms to the hydration entropy associated with regions of interest
in and around the synthetic host molecule CB7 and in the active site
region of the enzyme FXa. Interestingly, the second-order term is
more significant for FXa, with its more polar and geometrically complex
binding site, than for CB7. Indeed, for FXa, the integral of the second
order term over the region occupied by a high-affinity, small molecule
inhibitor is about −5 kcal/mol, which is substantial on the
scale of the free energies of ligand-protein binding. Hence, the second-order
translational term probably makes a substantial contribution to the
change in hydration entropy on protein–ligand binding. It is
also worth remarking that the second order term contributes with the
same, negative, sign as the first order terms, in the cases studied
here; intuitively, the presence of the solute increases water–water
correlations and further decreases the entropy. This result appears
to be consistent with the room temperature results previously obtained
in a study of methane hydration,[64] but
less so with prior heuristic estimations in which the water–water
term is assumed to increase, rather than decrease, the hydration entropy.[52,67]Three-dimensional mapping of the second order entropy contribution
in and around CB7 and FXa suggests that this local quantity depends
not only on whether the nearby solute surface is polar or nonpolar,
but also on overall shape of the nearby surface. In particular, the
entryways to deep binding cavities have particularly low values of TΔSww,trans, indicating
a high degree of correlation with other waters. This makes intuitive
sense, as these are locations where the presence or absence of a water
can strongly effect how other waters are able to pack and fill the
nearby cavity, making for a large difference in the baseline and conditional
translational entropy of water. We anticipate that this “entryway
effect” will be important in many other systems as well, such
as at the entryways of nanotubes and of other deep protein binding
sites, like the active site gorge of the enzyme acetylcholinesterase.[81] Displacement of water from such locations, if
the density is significant there, is expected to be thermodynamically
favorable. Of course, it will be important to consider such second-order
contributions in the context of other contributions to the thermodynamics,
both entropic and enthalpic.The present methodology opens new
possibilities for exploring the thermodynamic properties of water
at molecular surfaces and thus has potential to deepen our understanding
of molecular recognition and to find practical application in various
aspects of molecular design, including drug design. There is also
room for methodological improvement. For this translational term,
it will be of interest to seek enhanced convergence with respect to
both grid spacing and simulation duration. In addition, it will be
of interest to extend these studies to the entropic consequences of
water–water orientational correlations, which may be substantial,
due to the cooperative formation of various different hydrogen-bonded
configurations of water molecules. The use of strong numerical methods,
like nearest-neighbor entropy estimation, combined with growing computer
power, should make for continued progress along these lines.
Authors: M Adler; D D Davey; G B Phillips; S H Kim; J Jancarik; G Rumennik; D R Light; M Whitlow Journal: Biochemistry Date: 2000-10-17 Impact factor: 3.162
Authors: Kamran Haider; Anthony Cruz; Steven Ramsey; Michael K Gilson; Tom Kurtzman Journal: J Chem Theory Comput Date: 2017-12-08 Impact factor: 6.006
Authors: Lieyang Chen; Anthony Cruz; Daniel R Roe; Andrew C Simmonett; Lauren Wickstrom; Nanjie Deng; Tom Kurtzman Journal: J Chem Theory Comput Date: 2021-04-08 Impact factor: 6.006