Literature DB >> 26636620

Spatial Decomposition of Translational Water-Water Correlation Entropy in Binding Pockets.

Crystal N Nguyen¹, Tom Kurtzman^2,3, Michael K Gilson¹.

Abstract

A number of computational tools available today compute the thermodynamic properties of water at surfaces and in binding pockets by using inhomogeneous solvation theory (IST) to analyze explicit-solvent simulations. Such methods enable qualitative spatial mappings of both energy and entropy around a solute of interest and can also be applied quantitatively. However, the entropy estimates of existing methods have, to date, been almost entirely limited to the first-order terms in the IST's entropy expansion. These first-order terms account for localization and orientation of water molecules in the field of the solute but not for the modification of water-water correlations by the solute. Here, we present an extension of the Grid Inhomogeneous Solvation Theory (GIST) approach which accounts for water-water translational correlations. The method involves rewriting the two-point density of water in terms of a conditional density and utilizes the efficient nearest-neighbor entropy estimation approach. Spatial maps of this second order term, for water in and around the synthetic host cucurbit[7]uril and in the binding pocket of the enzyme Factor Xa, reveal mainly negative contributions, indicating solute-induced water-water correlations relative to bulk water; particularly strong signals are obtained for sites at the entrances of cavities or pockets. This second-order term thus enters with the same, negative, sign as the first order translational and orientational terms. Numerical and convergence properties of the methodology are examined.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2015 PMID： 26636620 PMCID： PMC4819442 DOI： 10.1021/acs.jctc.5b00939

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Molecular recognition in water cannot be understood purely in terms of the direct interactions of the two molecules that bind each other. Water itself plays an intimate role, such as by effectively weakening charge–charge interactions, by competing with the solutes for hydrogen-bonding opportunities, and by providing hydrophobic interactions that drive the association of nonpolar parts of the solutes. Much is known about the physical principles by which water influences binding,[1−12] and the effects of water can, to a large extent, be captured in both implicit[13−21] and explicit[22−29] representations of water. However, there are also still puzzles and gaps in our understanding of water’s role in molecular recognition. For example, although the hydrophobic effect may be reasonably well understood for simple nonpolar surfaces, many solutes, particularly biomolecules like proteins and DNA, present surfaces with complicated shapes and complex patterns of polarity, where water might have properties quite different from water at simpler, better understood surfaces. Thus, small surface clefts can lead to the formation of water clusters with a reduced set of hydrogen-bond arrangements, making such waters particularly easy to displace into bulk, where they regain many more hydrogen bonding opportunities.[30−34] Classical notions of the hydrophobic effect[35−37] do not necessarily envision such distinctive cases. In recent years, new computational tools have come online to compute and visualize the structure and thermodynamics of water at complex surfaces, by applying inhomogeneous solvation theory (IST)[38−44] to simulations of solutes immersed in explicit water. Some of these tools, such as WaterMap,[30,45] STOW,[46] and others[47−49] gain simplicity by modeling the density distribution of water in terms of discrete hydration sites. Alternatively, Grid Inhomogeneous Solvation Theory (GIST)[34,50−52] discretizes water density and thermodynamic contributions onto a three-dimensional grid, in order to gain a more comprehensive and quantitative description of solute hydration. Such tools offer insights into the properties and role of surface water and can also be useful in computer-aided drug design, by pointing out regions in a protein binding pocket where a ligand may gain affinity by displacing thermodynamically unfavorable water.[30,34,45,53] Implementations of IST have so far concentrated almost entirely on calculating first-order contributions to the entropy of water. The first-order translational entropy reflects, in effect, the bumpiness of the density distribution of water induced by the solute, while the first-order orientational entropy reflects the degree to which the solute reduces the orientational freedom of nearby water. However, it is likely that higher-order terms also contribute significantly to the hydration entropy. Higher-order terms relate to water–water correlations, which certainly exist in bulk water[54,55] and are expected to be modified by the presence of a solute. Note that greater correlation implies lower entropy and that the higher-order entropy terms in the IST series expansion are akin to the mutual information terms in the mutual information expansion[56,57] which has been used to estimate changes in the configurational entropy of small molecules and proteins,[58−61] and have been tied to the multibody expansion of IST.[62] Indeed, Velez-Vega et al. have recently obtained encouraging results for pure water with a novel method that treats water on an atomistic, rather than molecular, basis and is based on the mutual information expansion.[63] Several prior studies have addressed the contributions of water–water correlations to hydration entropy. An early work used IST to study the contributions of water–water correlations to the hydration entropy of methane, modeled as a single Lennard-Jones atom, for which numerical convergence is facilitated by exploitation of spherical symmetry.[64] Interestingly, the signs of the water–water translational and orientational terms were found to be negative (increased correlation relative to bulk water) at room temperature but found to become positive at high temperature, whereas the first order terms remained negative at all temperatures considered. This study also tested an approximation in which the water–water correlation function near the solute was approximated by the bulk water–water correlation function; however, the resulting entropies were found to deviate substantially from those computed without the approximation. Nonetheless, small molecule hydration free energies computed with this approximation were shown to agree reasonably well with corresponding values obtained by a reference free energy perturbation method,[65] and the same approximation was used to estimate water–water correlations within clusters of water molecules buried at a protein–ligand binding interface.[66] Two other studies have taken an alternative approach to obtaining these numerically challenging terms for water near small molecules[67] and protein binding sites:[52] instead of computing the water–water terms from actual correlation data, they were instead heuristically set as proportional to the first order entropy, where the constant of proportionality is on the order of −0.3 to −0.5. Finally, a recent study used the water–water correlations obtained from simulations of the system of interest, rather than of bulk water, to compute the contributions of these correlations to the hydration entropy contributions of pairs of waters buried in protein cavities.[62] The water–water correlation contribution was found to be negligible, however, apparently because the waters were tightly bound to the protein and hence had minimal fluctuations available to correlate with each other. Thus, considerable work has already been done to study or estimate the contributions of water–water correlations to the entropy of hydration. However, correlations computed from simulations have not yet been used to quantify or visualize second order terms for water at well-hydrated, complex surfaces, such as the binding sites of host–guest systems and proteins. The present paper addresses this challenge with a three-dimensional grid formulation of the second-order translational entropy of water, which is related to the two-point spatial correlation function of water density. Rewriting the second order translational entropy in terms of a conditional density, instead of a two-point density, leads to an intuitively interpretable form of this term and also facilitates spatial mapping and visualization. The present method also makes use of the efficient nearest-neighbor entropy-estimation approach.[50,67,68] We present an evaluation of the numerical properties of the method and analyses of the second-order translational entropy of water in and around the synthetic host molecule cucurbit[7]uril and the blood clotting cascade enzyme Factor Xa (FXa). These second order results are placed in the context of the first-order translational and orientational entropy around these solutes, and visualization is used to understand the spatial distribution of the second order term, including molecular features that generate particularly high water–water correlations and hence particularly low second-order translational entropies.

Methods

Theory

Here, we express the second order (or pairwise) translational entropy, Sww, in terms of conditional distribution functions, then show how the resulting integrals can be estimated from simulation data by using a three-dimensional grid to discretize space and then estimating water densities with either a histogram or a nearest neighbor method. We also review how the first order (or single-body) translational entropy, Ssw, can be estimated with the nearest neighbor method,[67] as this offers numerical advantages over the histogram approach used previously.[50]

Pairwise Translational Entropy in the Inhomogeneous Solvation Theory: Background

The solvation entropy of a flexible solute may be written aswhere ρsol(q) is the equilibrium probability distribution function of the solute over its internal coordinates q when it is in solution and ρvac(q) is the corresponding function for the solute in vacuo, ΔSsol(q) is the solvation entropy the solute would have if it were constrained in conformation q,[69] and kB is Boltzmann’s constant. The first term in eq is the solvation entropy averaged over the Boltzmann ensemble of solute configurations, and the second term is the change in solute configurational entropy on being transferred from a vacuum to solvent. Here, as in prior applications of IST, we consider the solvation of a solute in a given conformation, q, or in a narrow range of conformations, although one could, at some computational cost, explore solvation over a range of conformations by applying IST to each one separately and appropriately weighting each conformation. For simplicity, we will refer to ΔSsol(q) as the solvation entropy and write ΔSsol instead of ΔSsol(q). For solvation in water, inhomogeneous solvation theory (IST) provides an expansion of ΔSsol in terms of one-water, two-water, and higher order terms.[38,39,41−43] The one-water term, Ssw, accounts for the patterning of water density around the solute, while the two-water term, ΔSww, accounts for the difference between the two-point, water–water correlations around the solute and those in bulk; subsequent terms capture still higher-order correlations. Thus, neglecting terms above pairwise leads to the following approximation:The one-water term has been considered in some detail in prior studies (see Introduction) and is considered here only in relation to its calculation via a nearest-neighbor (NN) method (section 2.5). The pairwise term, which is of primary interest here, may be broken down further into orientational and translational parts:[46,66]The present study focuses exclusively on the translational part, ΔSwwtrans, which accounts for the solute-induced change in the two-point correlation of water density, without reference to the spatial orientation of the waters. Using a form provided by Morita and Hiroike[42] and clearly rendered in eq 12 of a subsequent paper by Lazaridis,[43] one may write this quantity asThe first equation of eq writes ΔSwwtrans as the difference between the pairwise translational entropy of the solute-perturbed water from the pairwise translational entropy of unperturbed (bulk) water. The second and third equations provide expressions for the solute-perturbed system and bulk water. Here, ρ is the number density of bulk water, g(r) is the unitless ratio of the number density of water at r in the presence of the solute to the bulk density, , and g(r,r′) is the two-point correlation function in the presence of the solute, , where ρ(r,r′) is the two-point number density of water at locations r and r′. The solute is considered to be immobilized in the lab frame of reference, so that coordinates r and r′ represent both lab-frame coordinates and solute-frame coordinates. The integrals range over the whole system or, at any rate, to points r and r′ far enough from the solute to effectively capture the entire effect of the solute on the solvent distribution. The expression in line three for the pairwise translational entropy of pure water uses the corresponding distribution functions, go(r), go(r′), and go(r,r′) for unperturbed bulk water. Note, however, that in the absence of the perturbing solute (or, in the terms of Morita and Hiroike, in the absence of an external field), go(r) = go(r′) = 1. We include these terms here to make clear that derivations and algorithms which apply in the presence of a solute fixed in the lab frame are equally applicable to pure water, where no solute is present.

Conditional Formulation of the Pairwise Translational Entropy

We now develop an expression for Swwtrans in terms of conditional densities. As noted in the prior subsection, the same derivation carries through in the case of pure water, so the pure water case is not included explicitly. The resulting formulation enables an informative spatial decomposition and provides a convenient basis for evaluating this correlation term with nearest-neighbor[51,68] or histogram methods. We first use the definitions in the prior paragraph to write the starting expression in terms of number densities:Because, in the canonical ensemble, ∫ρ(r) dr = N and ∫∫ρ(r,r′) dr dr′ = N(N – 1), where N is the number of water molecules, the second line in eq , which we call the nonlogarithmic term, reduces to . For the sake of brevity, this simple term will be omitted from subsequent expressions, except as otherwise noted. We now use the fact that the two-point density may be written in terms of either of the conditional two-point densities, ρ(r,r′) = ρ(r|r′) ρ(r′) = ρ(r′|r) ρ(r), to rewrite Swwtrans, without the nonlogarithmic term, asIntuitively, the first term in the last line of eq reports how the presence of a water at r′ influences the entropy of the water in the system as a whole, while the second term essentially subtracts out the entropy of the water for the distribution not conditioned on the presence of water at r′. In the absence of any correlation between the density of water at r and at r′, then ρ(r|r′) = ρ(r) and ρ(r′|r) = ρ(r′), so the integrands of both terms in the last line of eq become equal, and the volume elements at these locations will make zero contribution to the pairwise entropy. The nearest neighbor entropy estimation method provides averages of the form ⟨ln ρ⟩,[68] so it is of interest to express eq in a form which uses such averages. We use the facts that ∫ρ(r) dr = N and ∫ρ(r|r′) dr = N – 1, to recognize that and are normalized probability density functions and hence can be cast as probability weighting factors of the corresponding logarithm terms in eq . Multiplying and dividing the first and second terms in the last two lines of eq by the respective normalization constants (N – 1) and N and integrating the second term over r′ to obtain a factor of N – 1 yieldswhere angle brackets indicate averages over the Boltzmann distribution.

Localized Perturbation Approximation

The integrals in the equations above range over the entire system, but in practice we will often be most interested in the region of solvent near the solute. For one thing, such a region may have particular functional importance, such as binding of a ligand by the solute, if it is a receptor. Furthermore, the influence of the solute on the solvent distribution functions dies off with distance from the solute,[41,43] so distant regions contribute little, and ignoring distant regions can make calculations more tractable without introducing much error. Accordingly, it is useful to restrict the integrals over r′ to a local region of interest, K, such as a receptor binding site. Furthermore, the contribution of the water at r′ to Swwtrans is directly connected with the conditional density function ρ(r|r′), which reports on the perturbation of water density at r by the presence of a water at r′, and this perturbation also dies off with the distance of r from r′. As a consequence, it is often reasonable to restrict the integrals over r to finite regions centered on r′, which will be termed L. With these localizing approximations, eq may be written in either of the following equivalent forms:In physical terms, eq provides the contribution to the total pairwise entropy from the degree to which the water density in regions L is modified by conditioning on the presence of a water at each location r′ in K. Note that eq is recovered when the K and L regions span the entire system.

Using a 3D Grid to Estimate Distribution Functions from Simulations with Explicit Water

Following the existing first-order GIST approach,[51] we use a three-dimensional grid to compute the spatial distribution of the second-order translational entropy from explicit-water simulations for a region of interest, such as a protein binding site. Thus, we discretize eq by breaking space into cubic voxels of side-length Rvox and volume Vvox = Rvox3, which are assumed to be small enough that a voxel can contain at most one water molecule. (Here, the translational coordinates of a water molecule are identified with those of its oxygen atom.) As a matter of notation, location r′ maps to voxel index k, and the region K corresponds to a grid in the region of interest, such as a binding site (Figure ), while r maps to voxel index l and the regions L, which are centered around r′, map to regions L, which are centered around voxels k ∈ K; see Figure . The following subsections describe a histogram method and a nearest-neighbor method of using this discretization to estimate Sww,trans through analysis of n frames, or snapshots, from a molecular dynamics (MD) simulation of explicit water molecules around the solute; an additional subsection explains how the nonlogarithmic term in eq is computed.

Figure 1

Diagram of a receptor (green) in a water-filled simulation box (blue), with grid corresponding to region K in pale pink and two examples of regions L in yellow (orange where overlapping with K). In each case, the L grid is centered in one K-grid voxel, k, which is highlighted in pink.

Histogram Method

The histogram method uses MD data to estimate the various densities in the first line of eq by counting the instances of water molecules in voxels:Here, n and n are the numbers of MD frames for which a water is found in voxels l and k, respectively, and n = n are the numbers of MD frames for which a water is found in both voxels k and l. Note that this approximation treats the water densities as constant over the volume of each voxel, and, as noted above, we have assumed that the voxels are small enough that at most one water molecule can occupy a voxel. In going from a double integral to a double sum over voxels, two factors of the volume element Vvox appear. Thus, the histogram method estimates Sww,trans asHere, the contribution of regions K and L to the pairwise translational part of the solvation entropy is estimated in terms of the quantities summed over voxels k ∈ K. The contribution of voxel k, Vvox(scond,hist – shist), which is the product of the voxel volume, Vvox, and the difference between the conditional and unconditional entropy densities, is computed as the probability of finding a water in voxel k, , multiplied by the per water normalized quantity scond,norm,hist – snorm,hist. Writing the pairwise translational entropy in terms of a sum over voxels, in this manner, allows a spatial decomposition of this term which is useful for generating graphical representations. In practice, the expressions in eq are evaluated by analyzing the n available MD frames to determine n, the number of frames with a water in voxel k; n, the number of frames with a water in voxel l; and n, the number of frames with a water in both voxels l and k, where voxels k and l are required to reside in regions K and L, respectively.

Nearest-Neighbor Method

Starting with the final expression in eq , we multiply and divide the first term by , which is the mean number of waters on the L grid given a water in the k voxel, and we multiply and divide the second term by N = ∫ ρ(r) dr, which is the mean number of waters on the L grid without any condition, in order to convert the corresponding number density functions into probability density functions, much as previously done in deriving eq . The present expression for N is an approximation, due to the discretization of r′ into voxels, so the following expressions are, accordingly, also written as approximations. We obtainHere, ⟨⟩ indicates an average over the volume of the L grid. These integrals are now discretized via the approximations in eq :Here, ρ(r|k) represents the number density of water at r conditioned on the presence of a water in voxel k, and L is the L region associated with, and typically centered on, voxel k. This may be put into a form analogous to the histogram formulation in eq , as follows:Both average log terms inside eq can be found using the NN method, with the gamma correction for the bias, as follows:Here, the index i runs over water instances and r is the distance between water instance i and its nearest-neighbor water instance. The water instances are defined as follows. For the second expression in eq , the water coordinates found in all n MD frames are merged into one “superframe,” containing Nn sets of water coordinates, so that each of the N water molecules in the system appears n times, each time with a different set of coordinates; each resulting set of water coordinates in the superframe is considered one instance of a water. The symbol ∑ indicates a sum over the Nn water instances falling within the L grid, but the nearest neighbors of these water instances may be water instances which fall just outside the L grid. For the conditional term in the first line of eq , the water coordinates in all n frames for which a water molecule is found in k are merged into one superframe, which is now conditioned on the presence of a water at k, and each set of water coordinates in this conditional superframe is considered one instance of a water. The symbol ∑( indicates a sum over the water instances in this superframe which fall within the L grid; again, the nearest neighbors of these conditional water instances may be conditional water instances which fall just outside the L grid.

Nonlogarithmic Term

The nonlogarithmic term in eq may readily be computed for regions K and L, within the grid approach. Thus,whereThis term reflects the difference between the two-point density of water around the solute and the product of the corresponding one-point densities. As detailed in the Results section, once referenced to bulk, its contribution becomes negligible, at least for the cases analyzed here.

Referencing the Results to Bulk Water

As explained in section 2.1.1, the derivation and algorithms described above for water in the presence of a solute fixed in the lab frame are equally applicable to pure water. In order to reference the entropy of water in the presence of solute to pure water values matched numerically to those computed in the presence of the solute, we analyze a simulation of neat water with a small K grid near the center of the simulation box and use the same grid spacing and L-grid parameters as used to analyze water in the presence of the solute to compute the value of TSww,bulktrans for the water on the K grid. This quantity is then normalized by the mean number of water molecules on the K grid, for the pure-water calculation, to obtain Sww,bulktrans,norm, the mean pairwise translational entropy per water in bulk. Note that the dimensions of the pure water K grid need not be matched to the one used in the presence of solute: every voxel k is equivalent to every other, for the pure water case, so there is no need to cover any particular region. Now, if the K region in the presence of solute contains an average of N waters, again in the presence of solute, then one obtains the net pairwise entropy difference of the waters in the presence of solute versus in bulk from the following expression:In the formulations detailed above, each voxel k on the K grid is associated with an entropy density. For example, for the nearest neighbor approach, this quantity is obtained from eq as scond,NN – sNN. This density may be referenced to bulk by subtracting the corresponding entropy density for bulk water; when the nonlogarithmic term is also included, the resulting entropy difference density is . The corresponding per-water normalized entropy difference is . These voxel quantities may be visualized in terms of three-dimensional contours, as illustrated in the Results section.

Validation of Grid Methods by Comparison with the Radial Distribution Method for Pure Water

The applicability of the present methods to pure water provides an opportunity to validate them by comparing their results with those obtained by an independent computational approach. We use the fact that the second order translational entropy of a homogeneous liquid may be obtained from the one-dimensional radial distribution function (RDF) centered on a given water in the pure liquid:Formally, the upper limit of the integral, R, corresponds to the size of the entire system, but far smaller values of R yield good approximations, because water–water correlations decay quickly with values of r on the scale of the mean water–water distances, as detailed in the Results. Here, g(r), the radial correlation function between water molecules, is estimated from n MD frames aswhere n is the number of water instances in the spherical shell of thickness Δr between r – Δr/2 and r + Δr/2. In practice, we maximize statistical convergence by averaging the values of g(r) over all N waters in a simulation of pure water. The result from eq was compared with that obtained by the grid methods as described in the prior subsection.

NN Estimation of the First Order Translational Entropy

The first order translational entropy of the water around a solute is given byThe contribution of region K to Sswtrans may, for the sake of visualization and analysis, be decomposed into contributions from voxels k ∈ K, as follows:where is the mean number of waters in voxel k, so that , and ⟨ln ρ(r)⟩, the mean log density of water in voxel k, may be computed by the nearest neighbor method as

Implementation of Entropy Calculations

The K and L regions used for both the histogram and NN entropy calculations were sized to fit within the simulation boxes (see details below), and the grid spacings were 0.5 Å, except as otherwise noted. The efficient ANN algorithm[70] was used to find nearest neighbors of water instances. Because the difference between the unconditional density and the density conditioned on a water in voxel k diminishes rapidly with distance from k, each voxel k was assigned its own cubic L grid, centered around k, in order to efficiently capture the local contributions of these differences to the overall second-order entropy. The side-length of the L grid was set to twice the desired cutoff distance within which correlations would be captured. For example, a cutoff distance of 5 Å leads to a 10 Å L grid. The first-order orientational entropy was computed with the previously described GIST NN method,[51] and the first order translational entropy was computed by the NN method described in the subsection above.

Simulation Details

The pure water calculations used in section 3.1 were carried out in cubic simulation boxes with side lengths of about 32 Å and periodic boundary conditions. The temperature was maintained at 300 K with the Langevin thermostat and a collision frequency of 2.0 ps–1; pressure was maintained at 1 atm by isotropic position scaling with a pressure relaxation time of 0.5 ps. A 9 Å cutoff distance was employed for all nonbonded interactions in the pairlist, and the particle-mesh Ewald method was used to account for long ranged electrostatic interactions. The simulations were run for 1.5 μs on a single GPU, with the pmemd.cuda component of AMBER 12[71] or AMBER 14, with a 2 fs time step. The SHAKE algorithm[72] was used to constrain the lengths of all bonds involving hydrogen atoms. Frames were saved for analysis every 0.5 ps. Parameters were assigned to cucurbit[7]uril (CB7) with the GAFF[73] force field and RESP[74] with the HF 6-31G* basis set. The starting structure was solvated with 1096 water molecules, resulting in a 10 Å buffer of water surrounding the solute, using the program Tleap in AmberTools.[75] The system was then briefly energy-minimized with Cartesian positional restraints on CB7, with force constants of 10 kcal/mol/ Å2. The minimization comprised 1500 steps of the steepest descents algorithm, followed by a maximum of 2000 steps of the conjugate gradient method. The crystal structure of FXa complexed with the inhibitor ZK-807834 (alternatively called CI-1031), from Protein Data Bank entry 1FJS,[76] was prepared and simulated as described previously;[34] the ligand was deleted to leave an empty binding site, and two ions observed in the crystal structure (Ca2+ and Cl–) were included as part of the solute and hence were restrained along with the protein atoms, as detailed below. TIP3P[23] water molecules were added to solvate the protein using the program Tleap,[77] and the AMBER99sb force field[78] was used to assign the atomic parameters. The final system comprised 29 338 atoms, including the 8557 water molecules. The MD simulation started with an energy minimization of 1500 steps by the steepest descents algorithm, followed by conjugate gradient energy minimization for a maximum of 2000 steps. During the initial minimization, all protein atoms were harmonically restrained to their initial positions with a force constant of 100 kcal/mol/Å2. The second minimization further relaxed the system by keeping only non-hydrogen protein atoms restrained with the same force constant. The energy-minimized systems were heated in increments of 50 K lasting 20 ps each, at constant volume and temperature. After the system reached 300 K, it was then equilibrated for 10 ns at a constant temperature and constant pressure of 1 atm. The equilibrated system was further run for an additional 5 ns at constant volume, and data were then collected at NVT. During the heating process and thereafter, all solute atoms were harmonically restrained to their energy-minimized positions with a force constant of 100 kcal/mol/Å2 for FXa and 10 kcal/mol/Å2 for CB7. Other simulation parameters were the same as for the pure water simulations described above. A 400 ns production simulation was run for FXa, with coordinates saved every 1 ps, for a total of 400 000 snapshots. For CB7, the production run was 600-ns-long, with coordinates saved every 0.5 ps, for a total of 1 200 000 snapshots.

Results

Second Order Translational Entropy of Pure Water

Radial Distribution Function Method

Here, we study the second order translational entropy of pure water, for which the RDF method is available for reference. (It is worth noting that the RDF method is not applicable to the solute-water systems, for which the grid methods were developed.) First, as a verification of the present methodology, we used the RDF method, with spherical shells of thickness Δr = 0.01 Å, and 200 000 MD frames, to estimate Swwtrans,RDF for the TIP3P, TIP4P,[23] TIP4PEW,[27]and SPC/E[25] water models; the results all agree with prior calculations[79] to within 0.5%, with the exception of TIP3P, for which the difference was 1.9%. (A replicate TIP3P simulation yielded the same result.) No significant changes were observed on extending any of the simulations to 400 000 frames. We then examined the RDF results as a function of the two numerical parameters R, the upper limit of the integral in eq , and Δr, the width of the spherical shells in eq . As detailed in Figure (left panel), for the TIP3P water model, the entropy from the RDF method converges quickly as the upper limit of the integral increases, changing by less than 0.3% when R increases from 4 to 10 Å. This observation supports the use of local L(k) grids (section 2.2) of modest size. The RDF result also depends on the width of the spherical shell, closely approaching its presumed asymptote for values less than or equal to 0.1 Å (Figure , right panel). Intuitively, the entropy falls as the shells become thinner and thereby reveal finer structure in the water distribution. This result suggests that voxels larger than about 0.1 Å in linear dimension are likely to overestimate the entropy somewhat, due to smoothing of the density distributions.

Figure 2

Second-order water–water entropy, provided as a free energy contribution TS (kcal/mol), computed for pure TIP3P water with the RDF method. Left: entropy as a function of the distance cutoff, R in eq , for a spherical shell thickness, Δr in eq , of 0.001 Å. Right: entropy as a function of spherical shell thickness, for a distance cutoff of 10 Å; the two right-most points are for shell thicknesses of 0.25 and 0.5 Å. These calculations used 50 000 MD frames; extending to 200 000 frames changes the results by only a few thousandths of a percent.

Grid Methods

The grid methods, both histogram and NN, were used to compute the same quantity, by defining region K as a cube of side length 1.5 Å, centered in a pure water simulation box and divided into cubic voxels k, each of side length Rvox. Each voxel k was associated with its own cubic region L (see Figure ) centered on k and extending a fixed distance R from k along each grid axis, so that the side of each region L was of length 2R. For the histogram method, the cubic L regions were divided into voxels l of side-length Rvox, and the K and L grids were registered so that each k voxel was also an l voxel. However, not every l voxel was a k voxel, because L extended outside K. The agreement of the grid method with the RDF method is examined below, as a function of the voxel size, Rvox, and the size of the L grids, R. Dependence of second order pure water entropy on grid spacing. Left: NN results for grid spacings (Rvox) of 0.25 (green), 0.5 (red), and 0.75 Å (blue), with the reference RDF result (dashed line; computed with R = 15 Å, Δr = 0.01 Å, and 200 000 MD frames). Error bars for the 0.5 Å results indicate the standard deviations across all 27 k voxels in this K region. Right: Analogous results from the histogram method; results for a grid spacing of 0.25 Å were computed but are below the scale of the graph. All calculations used L regions with R = 8 Å, and all entropies are reported as free energy contributions, TS, in kcal/mol. The NN method provides reasonably well-converged results within 1–2 million MD frames, for voxel sizes, Rvox, of 0.75 and 0.5 Å. Thus, as shown in Figure (left), the mean of TSww across all 27 k voxels (solid lines) changes little with increasing simulation time for both of these grid spacings. In addition, for 0.5 Å voxels (red), the standard deviation of TSww across the 27 k voxels falls to 0.066 kcal/mol at 1 million frames and 0.05 kcal/mol at 2.3 million frames (red error bars). For 0.75 Å voxels (blue), the standard deviation is 0.042 kcal/mol at 1 million frames (data not shown). However, calculations with these grid spacings converge to values that overestimate the reference RDF result somewhat, particularly for a grid spacing of 0.75 Å (Figure , blue). Because the NN method does not make use of the discretization of the L grid into voxels l, these systematic errors are attributed to the fact that larger k voxels lead to greater smoothing of the conditional probability density functions over the corresponding regions L. Going to a finer grid spacing, Rvox, of 0.25 Å yields ultimately a closer approach to the reference RDF result (Figure , green), but convergence is notably slower. The histogram method converges slowly with the number of MD frames (Figure , right). This is evident even for the relatively coarse 0.75 Å grid spacing, and our 0.25 Å results are below the scale of the graph. In addition, it is evident that the 0.75 Å result is converging to a result that is further from the reference RDF result than the NN 0.75 Å result. This difference presumably results from further smearing of the conditional density functions, due to the added discretization of the L regions, which is not required for the NN method.

Figure 3

Dependence of second order pure water entropy on grid spacing. Left: NN results for grid spacings (Rvox) of 0.25 (green), 0.5 (red), and 0.75 Å (blue), with the reference RDF result (dashed line; computed with R = 15 Å, Δr = 0.01 Å, and 200 000 MD frames). Error bars for the 0.5 Å results indicate the standard deviations across all 27 k voxels in this K region. Right: Analogous results from the histogram method; results for a grid spacing of 0.25 Å were computed but are below the scale of the graph. All calculations used L regions with R = 8 Å, and all entropies are reported as free energy contributions, TS, in kcal/mol.

Dependence of second order pure water entropy on size of L regions. Left: Comparison of NN results, for L regions with R of 4–10 Å, with the reference RDF result (dashed line; R = 15 Å, Δr = 0.01 Å, and 200 000 MD frames). Right: Analogous results from the histogram method. All results are for a grid spacing of 0.5 Å, and all entropies are reported as free energy contributions, TS, in kcal/mol. As detailed in the Appendix, further comparison of the NN and histogram methods, with grid spacings down to 0.15 Å, highlights the advantage of NN over the histogram method, in terms of the rate of convergence and the amount of sampling required to reach the reference RDF result. The graphs in the Appendix suggest that the RDF result would be closely approximated by the NN method at a grid spacing of 0.15 Å, given about 4 × 106 snapshots, whereas the histogram method would require far more sampling. The present calculations depend not only on the grid spacing, but also on the size of the L region, in which the perturbation of water density by a water in each voxel k is evaluated. It is thus appropriate to evaluate the sensitivity of the second order entropy calculations to the size of the L regions. Based on the RDF study (Figure , left), one may anticipate that R ≥ 4 Å should suffice to capture most of the second-order entropy for the TIP3P water model examined here. This expectation is borne out by the insensitivity of the NN results to R, over a range of 4–10 Å (Figure , left); in addition, good numerical convergence is observed over this range. It is worth noting that, because the L regions are cubic, a value of R equal to 5 Å, for example, includes locations as far as Å from the center of voxel k. On the other hand, the poor convergence of the histogram method (Figure , right) makes it difficult to assess the sensitivity of its results to the value of R.

Figure 4

On the basis of this analysis of calculations for pure water, we selected a grid spacing of 0.5 Å as a good working compromise between speed and accuracy for the cucurbituril and protein calculations described below, and we set R, the size of the L regions, to 5 Å. The bulk value used to compute the change in the second order entropy is taken from a pure water calculation under the same conditions and with 2 000 000 MD frames. In addition, given the poor convergence of the histogram method, the following calculations use only the NN method for the second order calculations.

Cucurbit[7]uril (CB7)

Visual Mapping of Second-Order Translational Entropy

Figure visualizes the spatial distribution of the second-order translation entropy, relative to bulk, in and around the synthetic host molecule CB7, which we previously studied to first order in entropy.[51] As shown in the left panel, there is a large region of water with reduced second order entropy density, relative to bulk, centered around the symmetry axis of CB7 (yellow), with particular enhancement in two toroids (pink) roughly level with the upper and lower nitrogen–carbon rings of the host. These are regions where more highly correlated water (hence reduced second-order entropy relative to bulk) is present at relatively high number density, leading to notably negative pairwise entropy density relative to bulk. On the other hand, small loci of water with positive pairwise entropy density relative to bulk (i.e., less correlated than bulk) are present at seven symmetry-related sites around the relatively apolar exterior of the host (green), and also at symmetry-related sites between the carbonyl oxygens at the upper and lower portals of the host (green).

Figure 5

Spatial maps of second order translational entropy density of water, relative to bulk, in and around host CB7, based on 1 200 000 MD frames. Left: Entropy contoured at −0.02 kcal/mol/Å3 (pink), −0.01 kcal/mol/Å3(transparent orange), and +0.005 kcal/mol/Å3 (green). Middle: same, but omitting the contour at −0.01 kcal/mol/Å3 and additionally showing water number density contoured at 0.1 waters/Å3 (gray) and 0.175 waters/Å3 (magenta). Right: rotated view, showing only entropy density contours at −0.02 and 0.005 kcal/mol/Å3 and number density contour at 0.01 waters/Å3, with the same colors. Entropies are multiplied by temperature to convert to kcal/mol. These and all other molecular graphics in this paper were generated with the program VMD.[80]

It is of interest to compare this distribution of pairwise entropy density with the simple number density of water, which is contoured in gray and magenta in the middle and right-hand panels of Figure . Whereas the number density is highest in the central torus (magenta, middle panel), the absence of a central torus in the entropy contours of the left-hand panel indicates that the water molecules in the central torus are not especially correlated with other water molecules. In contrast, water molecules present at intermediate density in the upper and lower tori (middle panel) are highly correlated and therefore appear prominently in the entropy map (pink in the left-hand panel). Similarly, the regions of elevated pairwise entropy (green) appear as a subset of the high number-density regions contoured in gray (middle and right-hand panels). Spatial maps of second order translational entropy density of water, relative to bulk, in and around host CB7, based on 1 200 000 MD frames. Left: Entropy contoured at −0.02 kcal/mol/Å3 (pink), −0.01 kcal/mol/Å3(transparent orange), and +0.005 kcal/mol/Å3 (green). Middle: same, but omitting the contour at −0.01 kcal/mol/Å3 and additionally showing water number density contoured at 0.1 waters/Å3 (gray) and 0.175 waters/Å3 (magenta). Right: rotated view, showing only entropy density contours at −0.02 and 0.005 kcal/mol/Å3 and number density contour at 0.01 waters/Å3, with the same colors. Entropies are multiplied by temperature to convert to kcal/mol. These and all other molecular graphics in this paper were generated with the program VMD.[80] Further insight may be obtained by examining the pairwise entropy on a per-water-molecule basis, rather than on the basis of entropy density, as shown in Figure . The contours are more irregular than those in Figure , because, unlike the density-weighted entropies there, here the contours involve voxels where the number density is low, so the numerical convergence is worse. Nonetheless, it is interesting to observe that the most correlated water molecules are those at the centers of the two portals, as highlighted in the pink and clear orange contours. In contrast, the water in the center of the host is not especially correlated, as indicated by the lack of contouring there. This result is consistent with the absence of a torus of entropy density in the left-hand panel of Figure .

Figure 6

Spatial map of second order translational water entropy per water molecule, relative to bulk, in and around host CB7. Pink: contour at −0.5 kcal/mol/water. Transparent orange: −0.3 kcal/mol/water. No significant positive regions were observed after setting aside voxels with inadequate sampling, e.g., those with number densities less than 0.001 water molecules/Å3. The strongly negative pairwise entropy of water molecules at the centers of the portals (pink in Figure ) suggests that the density distribution of water is strongly influenced by the presence of a water at these locations, as discussed in section 2.1.2. Put differently, the density conditioned on the presence of water in the middle of the portal should be quite different from the unconditioned density. This is indeed the case, as shown in Figure . Here, the left-hand panel shows the baseline density distribution of water, with a strong central torus and a few other small loci of high density, while the right-hand panel shows water density conditioned on the presence of a water at the center of the portal (star in figure), contoured at the same level as in the left-hand panel. Despite some noise in the conditional contours, which results from a reduction in the number of MD frames available for averaging, due to the condition, one can clearly see a dramatic increase in water structure, with the appearance of an additional torus below the star and a more jagged ring of density above it. This increase in structure gives a conditional entropy lower than the baseline unconditional entropy and hence a negative contribution to the overall pairwise translational entropy from the voxel corresponding to the star.

Figure 7

Number density of water, contoured at 0.15 water molecules/Å3, in and around CB7. Left: standard number density derived directly from simulation. Right: number density conditioned on the presence of a water molecule at the location marked by a star.

Quantifying the Second Order Translational Entropy of Water in and around CB7

Binding of guest molecules to the CB7 host leads to displacement of water molecules to the bulk from regions in and near the binding cavity. The resulting change in water thermodynamics makes a contribution to the overall thermodynamics of binding, so it is of interest to examine the magnitude and sign of these contributions. In a previous study, we used the histogram method to study the first-order translational entropy, and the NN method to study the orientational entropy, of water in the CB7 binding cavity.[51] Here, we report the second-order translational entropy for regions of interest in and around CB7 and consider it in the context of the first-order terms. We also compare the present NN implementation of the first order translational entropy (section 2.5) with the prior histogram method. Three regions are considered (Figure ): the cavity (blue), which comprises the complete interior of the host (including the torus, below), with upper and lower cutoffs positioned at a trough in water density near the level of the host’s carbonyl oxygens; the torus (red), which is a subset of the cavity region holding only the central torus of high-density water;[51] and the portals (cyan), which extend 2 Å above and below the upper and lower borders, respectively, of the cavity region and hold regions of increased water density above the upper and lower rings of carbonyl oxygens. It should be noted that the geometric definitions of the cavity and torus regions used here have been refined, relative to those used previously,[51] so the first-order thermodynamic results for them differ somewhat. In particular, the cavity is somewhat taller, to reach the troughs in water density at the portals, and the torus is slightly shorter, to capture only its peak density. The results (Table ) are reported as regional integrals, as indicated by the superscript “R” in each case, and also on a per-water-molecule basis, which provides information on the average properties of water in each region.

Figure 8

Table 1

Water Entropy Contributions, As Defined in Text, for Regions in and near the Binding Cavity of the CB7 Host, and for the Region of the FXa Binding Site Occupied by Ligand ZK-807834 (CI-1031)a

	N_K	TΔS_ww^R,trans,NN	TΔS_sw^R,trans,NN	TΔS_sw^R,orient,NN
cavity	7.54	–0.96	–5.17	–6.33	–0.13	–0.69	–0.84
torus	2.82	–0.24	–2.42	–2.41	–0.09	–0.86	–0.85
portal	25.85	–1.95	–9.66	–18.32	–0.08	–0.37	–0.71
FXa	16.34	–4.96	–15.31	–26.43	–0.30	–0.94	–1.62

The CB7 regions are defined in the text and in Figure . NK: mean number of water molecules in each region. The first three entropy columns (kcal/mol) are sums over the respective regions; the second three entropy columns (kcal/mol/water molecule) are normalized by the numbers of waters in the respective regions and thus report on the average property of water in each region. The results are based on K and L grids with a 0.5 Å spacing and L grids with RL = 5 Å, as noted in the text.

Regions of interest in and around the CB7 host, shown in side (left) and top (right) views, with the host (stick diagram, omitting hydrogen atoms), and water density contoured at 0.10 water molecules/Å3 (gray). Central blue box: cavity region. Union of upper and lower cyan boxes: portal region. Red box: torus region. For all three regions—cavity, torus, and portal—the net second-order translational entropy has the same sign as the first-order translational and orientational entropy terms. Thus, the second-order term always reinforces, rather than partly cancels, the first order term. This result contrasts with the heuristic assumption that the second-order term tends to partly cancel the first-order one.[52,67] Although the second-order contribution is substantially smaller than the first order terms (Table ), at 5–8% of their sum, it is large enough to substantially influence the thermodynamics of hydration and of guest-binding to this synthetic host molecule. It is worth noting that the nonlogarithmic term contributes minimally to these second order terms, 0.08, 0.04, and −0.12 kcal/mol, for the cavity, torus, and portal regions, respectively. It is also of interest that the first-order orientational term is about equal to the first-order translational term for the cavity and torus regions, but it is about double the first-order translational term for the portal. Presumably, waters at the portal are particularly well-ordered because they form hydrogen bonds with the carbonyl oxygens of the host. The entropic properties of water molecules in the three regions may be characterized by correcting for the mean number of waters in each region, as done in the last three columns of Table . The most correlated waters, based on the value of the second-order translational entropy per water, are found in the cavity region. The degree of correlation within the torus is smaller, and similar to that in the portal region. Since the torus is a subset of the cavity, the high second-order entropy of the cavity water as a whole clearly reflects the particularly high correlation of water above and below the central torus, which is evident in the left panel of Figure . The fact that the first-order translational entropy per water of water in the torus is lower (−0.86 kcal/mol/water) than that for the cavity as a whole is simply a reflection of the fact that water is present at higher density in the torus (left panel, Figure ). The least correlated waters are in the portal region, although the difference relative to the torus is small. The CB7 regions are defined in the text and in Figure . NK: mean number of water molecules in each region. The first three entropy columns (kcal/mol) are sums over the respective regions; the second three entropy columns (kcal/mol/water molecule) are normalized by the numbers of waters in the respective regions and thus report on the average property of water in each region. The results are based on K and L grids with a 0.5 Å spacing and L grids with RL = 5 Å, as noted in the text. The second-order translational entropy is based on a six-dimensional probability distribution function, the two-point water density ρ(r,r′), and thus can require long simulations to achieve acceptable convergence. Reasonably good convergence is obtained here for the various regions in and around CB7 (Figure , left), though the portal graph, in particular, still has a visible trend after 600 ns of simulation time (1 200 000 frames).

Figure 9

Convergence of second-order translational entropy of water in and around host CB7 as a function of MD frames (left), and of first order translational entropy for the cavity region, by the NN and histogram methods (right). The regions are defined in the text and Figure .

Finally, it is worth commenting here on the NN implementation of the first order translational entropy (section 2.5), a term we previously computed with the histogram method. For the cavity region, both the NN and histogram method with a 0.5 Å grid spacing converge well (Figure , right), but the histogram result is higher than the NN result. In general, the NN method yields entropies for the three regions 13–22% lower than those provided by the histogram method (results not shown). This is because, unlike the NN method, the histogram method approximates the water density as uniform within each voxel. This approximation smooths the water density and thereby leads to higher entropy estimates. Convergence of second-order translational entropy of water in and around host CB7 as a function of MD frames (left), and of first order translational entropy for the cavity region, by the NN and histogram methods (right). The regions are defined in the text and Figure .

Factor Xa

Visual Mapping of Second-Order Translational Entropy

Water in and near the active site of the enzyme FXa adopts a complicated distribution of increased and decreased translational correlation, as manifest in contours of the second-order translational entropy density, relative to bulk, shown in Figure (left panel). (Note that, although the simulations analyzed here placed the protein in a large box of explicit solvent with period boundary conditions, the K grid covered only part of the surface, so the contours are localized.) Indeed, every hollow in the surface of the enzyme is occupied by contours of negative (pink) or positive (green) entropy density (Figure , left), relative to bulk, whereas water at more convex regions of the surface shows fewer entropic features. The tendency of this entropy term to be perturbed in enclosed regions of the FXa surface is consistent with the fact that, for CB7, the greatest perturbations also occurred within the binding cavity.

Figure 10

Water properties in the active site of enzyme FXa. Left: Second-order translational entropy density of water, scaled by T to yield kcal/mol/Å3, contoured at −0.02 (pink) and +0.005 (green) kcal/mol/Å3), with molecular surface uniformly colored. Middle: Water number density contoured at 0.175 waters/Å3, with protein surface colored by element; blue-gray, carbon; blue, nitrogen; red, oxygen; hydrogens not show. Right: Second-order entropy per water molecule (kcal/mol/water), contoured at −0.5 kcal/mol/water (pink) and +0.16 kcal/mol/water (green). Note that the number density contours were computed for a somewhat larger region than the entropy density contours, especially toward the bottom and right of the active site. All simulations represented here are 400 ns long, with frames analyzed every 1 ps. This patterning may be considered in the context of the polarity of the enzyme’s surface and the number density of water (Figure , middle). For example, a large, nonpolar pocket without a strong elevation of number density (red arrow, Figure , middle) nonetheless has a substantial increase in second-order translational entropy density (Figure , left), indicating reduced water–water correlation. Accordingly, the water molecules in this pocket also have a notable increase in the corresponding per water entropy (Figure , right). Further examination of the figure may suggest a tendency for elevated water–water entropy (decreased correlation) at other patches of hydrophobic surface too, and a tendency for decreased water–water entropy (increased correlation) in more polar clefts. On the other hand, water in the relatively hydrophobic interior of CB7 showed reduced, rather than increased, water–water translational entropy, so any connection between hydrophobicity of the surface and reduced water–water correlation cannot be viewed as universal. Concentration of negative second-order translational entropy at the entry of a deep, water-filled pocket in FXa. Left: second order translational entropy per water molecule contoured at −2 kcal/mol/water. Right: second order translational entropy density, contoured at −0.1 kcal/mol/A3. In the case of CB7, water at the center of the portal to the binding-site cavity was found to have particularly low second-order translational entropy per water molecule, implying a high degree of water–water correlation (section 3.2.1). A similar phenomenon, though more pronounced, is also observed in the case of FXa, as the entryway to a deep, water-filled cavity is occupied by water with even lower values of the second-order translational entropy than seen in the case of CB7, on a per water basis (Figure , left), and in terms of entropy density (Figure , right). Presumably water at such locations is in a particularly good position to influence the density distribution within the nearby cavity or cleft. This was seen previously for CB7 (Figure ) and maybe also be observed in the case of FXa. Thus, the number density distribution of water in the deep cleft of FXa is very different when conditioned on the presence of a water molecule in an entryway site of strongly negative translational entropy per water (−3.5 kcal/mol/water, red) than when it is unconditioned (Figure , pink conditioned vs blue unconditioned density contours). The red site here is at the center of one of the green regions in Figure .

Figure 11

Concentration of negative second-order translational entropy at the entry of a deep, water-filled pocket in FXa. Left: second order translational entropy per water molecule contoured at −2 kcal/mol/water. Right: second order translational entropy density, contoured at −0.1 kcal/mol/A3.

Figure 12

Comparison of unconditional (blue) number density distribution of water with number density distribution conditioned on the presence of a water at the red site (pink), in the active site cleft of FXa. This is a side view, relative to the other FXa representations; front and rear clipping and a transparent protein surface allow visualization of density contours deep in the cleft. The water densities are contoured at 0.1 water molecules/Å3; the red contour of second order translational entropy is contoured at −3 kcal/mol/water molecule.

Contribution of the Second Order Translational Entropy to Ligand Binding

Prior studies have considered the role of the first-order entropy of water in the binding site of FXa to the binding affinity of drugs and drug-like ligands to this enzyme.[34,45] The present methodology now allows examination of the second-order translational entropy of water in this region. We focus on the water displaced by the ligand (PDB HET ID Z34) present in the binding site of FXa in the PDB entry studied here, 1JFS. This ligand occupies part of the complicated active site region (Figure , left) and displaces water from the deep cleft that contains the most correlated water (Figures , 12, and 13, right).

Figure 13

FXa with cocrystallized ligand (PDB HET ID Z34, InChI Key NPNSVNGQJGRSNR-UHFFFAOYSA-N) seen from outside of active site (left) and in a rotated view with transparent, z-clipped protein surface to show penetration of the ligand into the deep active site cleft also shown in Figure . At nearly −5 kcal/mol (Table ), the integrated water–water translational entropy in the region occupied by this ligand is large enough to have a major impact on the ligand affinity. This value is substantially greater than the corresponding result for the largest regional integral of the pairwise entropy for CB7, which is about −2 kcal/mol, for the portal region (Table ), even though the number of waters in the portal region is larger. Accordingly, the second-order entropy per water is about 3-fold larger (more negative) in the binding site than in or around CB7. The first order entropy terms are also greater (more negative) for FXa than for CB7, on both a regional and a per water basis (Table ), but not by as large a factor as the second-order term. As observed in the case of CB7, the nonlogarithmic contribution to the second-order entropy is minimal, at 0.20 kcal/mol. Also, again as for CB7, the second-order term contributes with the same, negative, sign as the first-order term, thus reinforcing it rather than balancing it. Convergence of translational water–water entropy for region overlapped by ligand Z34 in the binding site of FXa, reported in kcal/mol. Two potentially balancing sources of error in the second order term should be noted. On one hand, this quantity is not fully converged, as shown in Figure , and a longer simulation would apparently lead to a somewhat less negative value. On the other hand, as discussed in section 2.3, the use of a 0.5 Å grid leads to smoothing of the conditional water densities, and the use of a finer grid, if numerically practical, would lead to a somewhat more negative value. Thus, the combination of a finer grid and a much longer simulation might shift the final result either up or down a bit, and the result in Table is likely a reasonably good estimate of the desired quantity.

Figure 14

Convergence of translational water–water entropy for region overlapped by ligand Z34 in the binding site of FXa, reported in kcal/mol.

Discussion

We have described a grid-based methodology, which uses inhomogeneous solvation theory to estimate and regionally map the two-body translational contribution to the hydration entropy of a solute of interest, using the data from a simulation of the solute immersed in explicit water molecules. The present formulation shows that the contribution to this two-body term associated with location r′ tells the degree to which the first-order translational entropy of water is changed when it is computed from a density distribution conditioned on the presence of a water molecule at r′. This contribution must also be weighted by the density of water at r′. Integrating over r′ then yields the total second order translational entropy, which may be referenced to the corresponding quantity for bulk water. Our implementation of this idea combines a gridded discretization of r′ with a nearest neighbor method that efficiently computes the required conditional translational entropies. Evaluating this second order entropy term is substantially more computationally demanding than evaluating the first order translational and orientational terms, but the numerical convergence obtained, both in terms of grid spacing and simulation length, appears good enough to provide at least semiquantitative information and intuition about how water–water correlations contribute to the hydration entropy of solutes. It is worth emphasizing that the NN method used here yields efficiency and accuracy far beyond that afforded by a simple histogram-based method. The present method has several methodological parameters that affect the trade-off between accuracy and the amount of data needed to achieve a given level of numerical convergence. On the basis of the RDF study, one would ideally use a grid spacing of about 0.1 Å to achieve highly accurate correlation results. However, very long simulations would be required to converge these calculations, so we have used a grid spacing of 0.5 Å, which seems to afford a reasonable balance between accuracy and convergence, based on the RDF study. However, it will be of interest to press for further refinement, via either improved algorithms or brute-force increases in simulation lengths. Another relevant parameter is the size of the L grid positioned around each k voxel to capture the effects of a water at k on the local water density distribution. For the TIP3P water model used here, we obtained reasonable results with an L grid of about 10 Å side length, which hence extends a minimum of 5 Å in all directions from voxel k. However, the RDF of TIP3P water reaches bulk density at a rather short distance, and other water models, whose RDF deviates from bulk at longer distances, may require a larger L grid. We find that this second order term contributes only 5–8% of the first order terms to the hydration entropy associated with regions of interest in and around the synthetic host molecule CB7 and in the active site region of the enzyme FXa. Interestingly, the second-order term is more significant for FXa, with its more polar and geometrically complex binding site, than for CB7. Indeed, for FXa, the integral of the second order term over the region occupied by a high-affinity, small molecule inhibitor is about −5 kcal/mol, which is substantial on the scale of the free energies of ligand-protein binding. Hence, the second-order translational term probably makes a substantial contribution to the change in hydration entropy on protein–ligand binding. It is also worth remarking that the second order term contributes with the same, negative, sign as the first order terms, in the cases studied here; intuitively, the presence of the solute increases water–water correlations and further decreases the entropy. This result appears to be consistent with the room temperature results previously obtained in a study of methane hydration,[64] but less so with prior heuristic estimations in which the water–water term is assumed to increase, rather than decrease, the hydration entropy.[52,67] Three-dimensional mapping of the second order entropy contribution in and around CB7 and FXa suggests that this local quantity depends not only on whether the nearby solute surface is polar or nonpolar, but also on overall shape of the nearby surface. In particular, the entryways to deep binding cavities have particularly low values of TΔSww,trans, indicating a high degree of correlation with other waters. This makes intuitive sense, as these are locations where the presence or absence of a water can strongly effect how other waters are able to pack and fill the nearby cavity, making for a large difference in the baseline and conditional translational entropy of water. We anticipate that this “entryway effect” will be important in many other systems as well, such as at the entryways of nanotubes and of other deep protein binding sites, like the active site gorge of the enzyme acetylcholinesterase.[81] Displacement of water from such locations, if the density is significant there, is expected to be thermodynamically favorable. Of course, it will be important to consider such second-order contributions in the context of other contributions to the thermodynamics, both entropic and enthalpic. The present methodology opens new possibilities for exploring the thermodynamic properties of water at molecular surfaces and thus has potential to deepen our understanding of molecular recognition and to find practical application in various aspects of molecular design, including drug design. There is also room for methodological improvement. For this translational term, it will be of interest to seek enhanced convergence with respect to both grid spacing and simulation duration. In addition, it will be of interest to extend these studies to the entropic consequences of water–water orientational correlations, which may be substantial, due to the cooperative formation of various different hydrogen-bonded configurations of water molecules. The use of strong numerical methods, like nearest-neighbor entropy estimation, combined with growing computer power, should make for continued progress along these lines.

52 in total

1. Preparation, characterization, and the crystal structure of the inhibitor ZK-807834 (CI-1031) complexed with factor Xa.

Authors: M Adler; D D Davey; G B Phillips; S H Kim; J Jancarik; G Rumennik; D R Light; M Whitlow
Journal: Biochemistry Date: 2000-10-17 Impact factor: 3.162

2. Comparison of multiple Amber force fields and development of improved protein backbone parameters.

Authors: Viktor Hornak; Robert Abel; Asim Okur; Bentley Strockbine; Adrian Roitberg; Carlos Simmerling
Journal: Proteins Date: 2006-11-15

Review 3. Molecular modeling of hydration in drug design.

Authors: Ricardo L Mancera
Journal: Curr Opin Drug Discov Devel Date: 2007-05

Review 4. The role of water molecules in computational drug design.

Authors: Stephanie B A de Beer; Nico P E Vermeulen; Chris Oostenbrink
Journal: Curr Top Med Chem Date: 2010 Impact factor: 3.295

5. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril.

Authors: Crystal N Nguyen; Tom Kurtzman Young; Michael K Gilson
Journal: J Chem Phys Date: 2012-07-28 Impact factor: 3.488

6. Displacement of disordered water molecules from hydrophobic pocket creates enthalpic signature: binding of phosphonamidate to the S₁'-pocket of thermolysin.

Authors: L Englert; A Biela; M Zayed; A Heine; D Hangauer; G Klebe
Journal: Biochim Biophys Acta Date: 2010-07-01

7. Hydration in drug design. 2. Influence of local site surface shape on water binding.

Authors: C S Poornima; P M Dean
Journal: J Comput Aided Mol Des Date: 1995-12 Impact factor: 3.686

8. Thermodynamic properties of liquid water: an application of a nonparametric approach to computing the entropy of a neat fluid.

Authors: Lingle Wang; Robert Abel; Richard A Friesner; B J Berne
Journal: J Chem Theory Comput Date: 2009-06-09 Impact factor: 6.006

9. Thermodynamics of buried water clusters at a protein-ligand binding interface.

Authors: Zheng Li; Themis Lazaridis
Journal: J Phys Chem B Date: 2006-01-26 Impact factor: 2.991

10. How Can Hydrophobic Association Be Enthalpy Driven?

Authors: Piotr Setny; Riccardo Baron; J Andrew McCammon
Journal: J Chem Theory Comput Date: 2010-08-24 Impact factor: 6.006

12 in total

Review 1. Relationship between Solvation Thermodynamics from IST and DFT Perspectives.

Authors: Ronald M Levy; Di Cui; Bin W Zhang; Nobuyuki Matubayasi
Journal: J Phys Chem B Date: 2017-02-28 Impact factor: 2.991

2. Validating the Water Flooding Approach by Comparing It to Grand Canonical Monte Carlo Simulations.

Authors: Hanwool Yoon; Vesselin Kolev; Arieh Warshel
Journal: J Phys Chem B Date: 2017-10-02 Impact factor: 2.991

3. Utilizing Grand Canonical Monte Carlo Methods in Drug Discovery.

Authors: Michael S Bodnarchuk; Martin J Packer; Alexe Haywood
Journal: ACS Med Chem Lett Date: 2019-12-11 Impact factor: 4.345

4. The Role of Interfacial Water in Protein-Ligand Binding: Insights from the Indirect Solvent Mediated Potential of Mean Force.

Authors: Di Cui; Bin W Zhang; Nobuyuki Matubayasi; Ronald M Levy
Journal: J Chem Theory Comput Date: 2018-01-12 Impact factor: 6.006

5. Solvation Structure and Thermodynamic Mapping (SSTMap): An Open-Source, Flexible Package for the Analysis of Water in Molecular Dynamics Trajectories.

Authors: Kamran Haider; Anthony Cruz; Steven Ramsey; Michael K Gilson; Tom Kurtzman
Journal: J Chem Theory Comput Date: 2017-12-08 Impact factor: 6.006

6. Thermodynamic Decomposition of Solvation Free Energies with Particle Mesh Ewald and Long-Range Lennard-Jones Interactions in Grid Inhomogeneous Solvation Theory.

Authors: Lieyang Chen; Anthony Cruz; Daniel R Roe; Andrew C Simmonett; Lauren Wickstrom; Nanjie Deng; Tom Kurtzman
Journal: J Chem Theory Comput Date: 2021-04-08 Impact factor: 6.006