Literature DB >> 28527450

On the accuracy of one- and two-particle solvation entropies.

Abstract

Evaluating solvation entropies directly and combining with direct energy calculations is one way of calculating free energies of solvation and is used by Inhomogeneous Fluid Solvation Theory (IFST). The configurational entropy of a fluid is a function of the interatomic correlations and can thus be expressed in terms of correlation functions. The entropies in this work are directly calculated from a truncated series of integrals over these correlation functions. Many studies truncate all terms higher than the solvent-solute correlations. This study includes an additional solvent-solvent correlation term and assesses the associated free energy when IFST is applied to a fixed Lennard-Jones particle solvated in neon. The strength of the central potential is varied to imitate larger solutes. Average free energy estimates with both levels of IFST are able to reproduce the estimate made using the Free energy Perturbation (FEP) to within 0.16 kcal/mol. We find that the signal from the solvent-solvent correlations is very weak. Our conclusion is that for monatomic fluids simulated by pairwise classical potentials the correction term is relatively small in magnitude. This study shows it is possible to reproduce the free energy from a path based method like FEP, by only considering the endpoints of the path. This method can be directly applied to more complex solutes which break the spherical symmetry of this study.

Entities: Chemical

Year: 2017 PMID： 28527450 PMCID： PMC5438298 DOI： 10.1063/1.4983654

Source DB: PubMed Journal: J Chem Phys ISSN： 0021-9606 Impact factor: 3.488

INTRODUCTION

The ability to estimate the free energy difference between two defined states is a useful tool in computational chemistry. Direct applications are predicting the free energy of a solvation process, whether it may be testing a small molecule in a solvent to see if the pair is likely to be miscible or a larger system, for example, a peptide, protein, or protein-ligand complex. The latter has a direct implication to in silico drug designs. Being able to quantitatively measure such a change then allows the relative comparison of ligands for a given protein. For the protein-ligand complex, if the free energy of the bound state is less than the unbound state then the equilibrium will favour the bound state. There exist a number of methods of estimating changes in the free energy with computational simulations. These include the Free Energy Perturbation (FEP) and Thermodynamic Integration (TI). Both these methods rely on a well defined path from the reference state to the new state, but are very generally applicable and work with different levels of theory for the description of the physical system. Another method which has seen growing success is Inhomogeneous Fluid Solvation Theory (IFST), which does not need a well-defined path between the reference state and the new state; however, IFST can only calculate changes in the free energy in the context of a solvation process. IFST performs this by using a direct computation of the change in entropy due to solvation and combining this with a direct energy measurement to calculate the free energy. There are numerous examples of calculating and estimating such a change in free energy. This work attempts to expand on one method of measuring the solvation entropy change directly, namely the Mutual Information Expansion (MIE), and uses a k-Nearest Neighbours (KNN) estimator to evaluate an approximation to the change in the solvation entropy. This work is different from the majority of previous works, in that we truncate the MIE at a higher order, including an additional term. This additional term represents correlations between two solvent molecules in the presence of a solute, and such terms have been measured before in the context of water in protein binding pockets using Grid Inhomogeneous Solvation Theory (GIST). We seek to find quantitatively, the change in free energy associated with the extra information, and whether it is necessary to include this term in the MIE. The system under study in this work is fundamental and simple, a Lennard-Jones neon solvent with a fixed Lennard-Jones atom in the centre of the simulation cell which represents the solute in a solvation process. The changes in free energy will be compared to an equivalent FEP simulation. The comparison of FEP and IFST in this work is independent of the forcefield used. First, we will discuss the theory used in all calculations, then review the computational methods used, and input simulation parameters. The results will then be discussed and then analysed.

THEORY

Our calculations of the solvation free energy associated with fixing a Lennard-Jones atom at the centre of a neon simulation box (see Fig. 1) are based on the following:

FIG. 1.

Schematic of the canonical ensemble simulation for both IFST and FEP. The N solvent neons with the Lennard-Jones parameter surround the solute with the Lennard-Jones parameter (left). The solute interactions are turned off, while the volume and temperature remain constant (right). Both methods used in this study are measuring the change in the Helmholtz free energy, , between these two states. Step 1. A change in free energy in the canonical ensemble is given by a change in the Helmholtz free energy,where is the change in the internal energy of the system, T is the temperature of the system, and is the change in the entropy of the system. The changes here are the differences between the solute-liquid system and the bulk liquid system. Step 2. By using a molecular dynamics (MD) simulation with a parameterized forcefield, it is possible to estimate aswhere is the equilibrium expectation energy of a system of N neon atoms in a periodic cell of volume V at temperature T, and is the equilibrium expectation energy of a system of N neon atoms in the presence of a fixed Lennard-Jones atom at volume V and temperature T. Step 3. It is possible to write the total change in solvation entropy, in Eq. (1), as an expansion over correlations of one-body, two-body, three-body, and so on, as discussed by Baranyai and Evans for the homogeneous fluid and by Lazaridis for the inhomogeneous fluid. The theory for homogeneous fluids was extensively studied by Kirkwood, Nettleton, Raveche, and Wallace. For the system of N ideal atoms with coordinates and momenta , we have the N body distribution in positions and momentaand the total entropy of a bulk fluid is given bywith R the gas constant and h the Planck constant. Then, the separability of the momentum can be exploited to givewhich is written explicitly as The momentum terms are the same as those of an ideal gas where the one-body distribution of momenta is given bywhere k is the Boltzmann constant, ρ is the number density of the equivalent ideal gas, and m is the mass of the atom. Then,with λ the thermal wavelength of an atom in the liquid. For a vector of random variables , we may writewhere H is an information entropy and the notation H(X|Y) is the conditional entropy of X given Y. In a similar fashion, we may writegiving This expansion was performed for a homogeneous fluid by Wallace. It applies to the canonical ensemble and is non-local, meaning that it is only exact when integrated over the entire system volume. An ensemble invariant and a local form of the expansion are discussed by Baranyai and Evans. In the conditions of this study, the distinguishing terms between the local and non-local forms cancel, so here the non-local form is used for simplicity. In this case, the expansion has the following terms:We can write the part of the three-body correlation function which cannot be expressed multiplicatively by its marginal distributions aswhere each of these N particle correlation functions for k bodies can be written in terms of the k body density asFinally, we may write the excess entropy of the liquid aswe haveTherefore,where terms with an S denote an entropy and terms with an I denote a mutual information term. The expansion with these terms is true for a homogeneous fluid. For the simulations used in this work, we include the presence of the central solute. The central solute in our calculations is spherically symmetric, and our end goal is to generalise to any solute and solvent, so we must consider an inhomogeneous system. This was analytically performed by Lazaridis. Eqs. (12)–(14) are the relevant terms for the liquid, and for a system with a solute we may writewhere the |s argument indicates the presence of a solute. We can write the part of the two- and three-body correlation functions which cannot be expressed by its marginal distributions as Some slight differences exist between the homogeneous and inhomogeneous formulations, is not a mutual information term (denoted with an I rather than an S), as the marginal distributions vanish. We may then writeand subtracting the ideal gas terms, Eq. (18),To find the change in entropy from the addition of the solute, we calculate the excess entropy of solution, . This is then the excess entropy of the solute system, Eq. (26), minus the excess entropy of the neat liquid, Eq. (19),where we see the factors of −R from and cancel. It is at this point we choose a truncation of . The most severe truncation generates what we will call the conditional one particle entropy (C1PE),This is achieved by removing all integrals with a subscript greater than 1. If we include the two-body terms, then we have the conditional two particle entropy (C2PE),It is that this work is concerned with calculating. Step 4. Instead of directly integrating numerically the terms in Eqs. (29) and (30), we can instead convert the integral into a sum which converges to the integral asymptotically in the limit of infinite data. We can then extrapolate toward that infinite limit by measuring the sum for finite quantities of data and fitting an appropriate extrapolating function. The estimators we use in this study are KNN estimators and are formulated as follows. Inserting the explicit integral terms into Eq. (30) giveswe may separate out the denominators of the logarithm in the second term and then swap the coordinates r1 and r2 in one of those new integrals to giveand it is then possible to integrate over r2 in the third term givingIt is convenient to construct estimators to measure three distinct quantitiesgiving an expression for the conditional one- and two-particle entropies in terms of these estimators In general, we may state the information entropy of a probability density function over p random variables as the p-dimensional integralas the expectationHowever, in Eqs. (34)–(36), we have correlation functions in the logarithm rather than densities. We may then instead calculateThis quantity can then be approximated by a finite sum which is performed in Sec. II A.

k-Nearest Neighbours (KNN) estimators

To efficiently calculate the integrals in Eqs. (29) and (30), we used the k-nearest neighbours method which has shown previous success in calculating first order solvation entropies, dihedral entropies in small molecules, and entropies of water in protein binding pockets. The general estimator for N, p-dimensional objects across F frames of data is given bywhere is the Euler gamma function, is the Euler digamma function, and the symbol ≅ denotes an asymptotic equivalence in the limit of an infinite number of frames F. This estimator was initially worked on for the first nearest neighbour by Leonenko and extended for the kth nearest neighbour by Singh and has been used by various studies. This estimator is a limiting case of the adaptive anisotropic elliptic kernel estimators. The estimator for an H1 term, given F sufficiently uncorrelated MD frames containing N neon atoms, is given by the expressionwhich is used in Refs. 13 and 14. The next order expression is then In previous work, Eqs. (43) and (44) have been shown to fit a power law for increasing values of Fwith a and b some constants for the kth neighbour selected, and the asymptotic value of the entropy. b is a negative constant such thatTo extract this value, a power law can be fitted to H(F).

SIMULATION DETAILS

FEP and IFST were used to calculate the free energy change associated with adding a fixed Lennard-Jones particle to the centre of a neon simulation box in the canonical ensemble. The IFST free energy with only and with both and were calculated. All of the simulation data used in this study came from the same forcefield and MD simulations were all carried out in NAMD. Thus, the resulting free energies are directly comparable.

Systems setup

The neon MD parameters were taken from the CHARMM27 forcefield. There are no electrostatic interactions for this solvent-solute system, so all potential energy terms are in the form of a Lennard-Jones potentialwhere r is the distance between atoms i and j, and R is the radius of minimum potential energy for atom k, . 4 different solutes were taken which had varying Lennard-Jones parameters as shown in Table I. All solutes had the same R parameter as the neon solvent. Each solute was initially solvated with 900 neon atoms in a cubic box of edge length . Where reduced units are given for reference they are calculated as the temperature , pressure , length , and number density .

TABLE I.

Forcefield parameters used for each of the solute atoms where is the value for bulk neon.

Solute name	εs [kcal/mol]	R_k [Å]	L	ρ [NeÅ−3]	T (K)
SOL1	0.215	3.06	27.3760	0.043 82	25
SOL2	0.430	3.06	27.3710	0.043 84	25
SOL3	0.645	3.06	27.3707	0.043 84	25
SOL4	0.860	3.06	27.3685	0.043 85	25

Forcefield parameters used for each of the solute atoms where is the value for bulk neon. For the 12-6 LJ fluid, the triple point density of both the liquid and solid phases have been measured by previous studies to be in the ranges and , and the triple point has approximate temperature and pressure and , respectively. Thus, the simulated neon was in the liquid phase.

IFST

IFST equilibration

16 replicate equilibrations per solute type were performed in the NpT ensemble for 4 ns to find the natural densities of the simulations cells with each solute present. The resulting edge lengths are shown in Table I. A 2 ns equilibration was then performed for each replicate in the NVT ensemble at T = 25 K () and 1 atm () using the Langevin temperature control. An MD timestep of 2.0 fs was used. Electrostatic interactions were turned off as they were not present in the forcefield. van der Waals interactions were removed for separations over , and switching was used between and . The simulations took place with cubic periodic boundary conditions. This process was repeated for each of the 4 solutes and one bulk system with no solute.

IFST production

The production simulations were carried out in the NVT ensemble for 60 ns at 25 K () with the same MD parameters as the NVT equilibration. NAMD produces a trajectory (dcd) file, which is a set of system coordinate snapshots. A trajectory frame was saved every 500 fs such that neighbouring frames in the dcd file were not too strongly correlated; this is important for the KNN estimator when used later to extract entropies as commented in the previous work by Huggins. The energies were calculated by taking the average energy across all 16 repeats of the 60 ns production simulation, and these are shown in Table II along with the standard deviation across 16 repeats, .

TABLE II.

Average free energy results for each solute. is the standard deviation of the 16 energy results for each solute. is the standard deviation across all 16 repeats of the FEP free energy change . and are the standard deviations across all 16 repeats of the IFST free energy changes and , respectively. All units are in kcal/mol.

	ΔAFEP	ΔU	ΔS1\|s	ΔS2\|s	ΔA1\|s	ΔA2\|s
SOL1	1.036	−1.717	−0.541	<10−4	1.176	1.176
SOL2	1.820	−2.561	−0.589	<10−4	1.972	1.972
SOL3	2.438	−3.257	−0.694	<10−4	2.563	2.563
SOL4	2.975	−3.839	−0.773	<10−4	3.067	3.067

IFST free energy calculations

To calculate the change in free energy for IFST simulations, we need to calculate each component on the right hand side ofand was calculated using Eq. (2), and therefore the quantities and must be calculated using Eqs. (37) and (38). This then requires collecting a set of nearest neighbour distances and nearest pair distances from the respective MD data for the entropy estimators.

K-Dimensional (K-D) trees

Previous studies using IFST or GIST have used a grid of voxels or cut-cell approach to find nearest neighbour distances. For the pair nearest neighbour distances required for Eq. (44), this method was not practical. In order to increase the efficiency and allow the calculation of the pair terms, a K-Dimensional (K-D) tree method was used. This is covered in more detail in the supplementary material.

Free energy perturbation (FEP)

FEP protocol

The FEP simulation measures the change in free energy of removing a fixed solute from a box of neon. This simulation is parameterized by a variable such that when , the solute is fully present and when , the solute is fully annihilated. In this study, each forward and backward FEP simulation had N = 64 “λ-windows,” and each window has a different value of lambda which is labeled as degree of annihilation in the schedule displayed in Fig. 2. For the nth window out of a total of N windows, the value of λ in that window was generated with the expressionswhich satisfy ,,, and , where and are the forward and backward schedules, respectively. These curves were picked to sample the endpoints of the FEP simulation more heavily to “avoid the end point catastrophe.” The endpoints are when the solute is close to full annihilation in the forward simulation () or when the solute is just being created in the backward simulation (). A van der Waals soft core potential parameter of 5.0 was used. One energy reading was stored every 100 steps which equates to every 0.2 ps.

FIG. 2.

The lambda schedule for the forward and backward FEP simulations across the 64 lambda windows as generated by Eqs. (50) and (51).

FEP equilibration

The end points of the NVT equilibrations for the IFST simulations were used as starting points for the FEP simulations. Each solute had 16 replicates. Each simulation underwent a further 0.2 ns equilibration at T = 25 K () from this point to adjust to their individual values.

FEP simulation

For each λ-window, 0.469 ns of simulation was performed in the NVT ensemble at T = 25 K (). Then, considering the 64 windows in both directions the overall time was 60.1 ns, which is comparable to the 60 ns used for the IFST production run.

FEP calculations

The final change in the free energy from the FEP simulations is found by summing the changes in free energy between each pair of neighbouring λ windows. Then, the forward change isand the backward change isTo calculate the changes in free energy between the λ-windows, the Bennett Acceptance Ratio (BAR) method was used, which is included in the ParseFEP Plugin for Visual Molecular Dynamics (VMD) in which all FEP results for this study were calculated. The BAR estimator provides a calculated statistical error, and these errors were less than 0.007 kcal/mol for all of the 64 simulations.

RESULTS AND DISCUSSION

Results

Fig. 3 shows the solute-fluid radial distribution functions for all solutes and bulk neon.

FIG. 3.

The radial distribution functions for the solute to all fluid atoms. The fluid-fluid RDF is also plotted. For stronger central potentials, the peaks get higher and the troughs get deeper along with a consistent distortion in the second solvation shell.

FIG. 4.

The free energy estimates for the two IFST approximations and the FEP result at values of , and . Different levels of theory have been spaced for clarity. C1PE is the conditional one-particle entropy correction. C2PE is the conditional two-particle entropy correction. The error bars are the spread in the 16 repeats of each result. Table II shows the comparison of the average free energies from FEP and IFST calculations, and the IFST calculations show the conditional one-particle and conditional two-particle entropy values. The IFST free energies in Table II were calculated by Eqs. (48) and (49), the entropy terms were calculated by extrapolating Eq. (45) for the conditional one-particle entropies every 1000 frames from 1000 to 12 000, and Eq. (45) for conditional two-particle entropies, 16 repeats were taken at intervals of 1000, 2000, and 3000 frames and 2 repeats were extended to 1000, 2000, 3000, 4000, 5000, 7500, and 10 000 frames. Fig. 5 shows an example of the conditional two-particle entropy extrapolation for all solutes. The extrapolation routine used was from the gnuplot software, when fitting it weighted data points by the standard deviation of the 16 repeats of that point. For the conditional two-particle entropy fitting, where no standard deviation was available it was estimated to be 10/F kcal/mol, where F is the number of frames for that data point.

FIG. 5.

A plot demonstrating how the asymptotic values for the H and H estimators were extracted. A power law is fitted to the three data points for each solute. The constant in the limit of infinite frames is used as in Eq. (45).

Discussion

Fig. 3 shows the radial distribution functions (RDFs) from the central solute to all solvent atoms. The troughs become deeper and the peaks become higher for increased solute potential parameter. Also plotted is the RDF for the bulk solvent, which has the shallowest troughs and lowest peaks. There is a slight distortion in the second solvation shell, which becomes more pronounced with higher . This is likely to be a gentle crowding effect as the solvent molecules in the first solvation shell draw close to the solute, and the closest locations for second shell atoms correspond to the pits in the already tightly packed first solvation shell. The magnitude of the solvent-solvent correlation entropy was expected to be small in the system of monatomic species used in this study. It has been shown that in the Lennard-Jones system, the primary source of the solvent structure comes from packing, and this will correspond to an entropy associated with the volume exclusion of the central solute which is captured fully by the C1PE term. For more complex solvents, namely water, evidence already exists for the solvent-solvent correlations. Fig. 4 shows the calculations of the free energy of solute annihilation from all three methods of evaluation with the standard deviations of 16 repeats used as error bars. The average values of these estimates are displayed in Table II along with the standard deviations of the 16 repeats. The average values for all methods are in good agreement. The IFST results including the conditional two-particle correction have the same averages and standard deviations of the first order IFST. This demonstrates that the C2PE has no clear contribution to the free energy of this system. The FEP results have the lowest standard deviation of all results even though a comparable length of MD run was used. The increased uncertainty associated with the IFST results likely arises from fitting power laws to extract the asymptotic entropy. The standard deviation in the free energy is at least 3 times greater than that of the MD energies used, which indicates the added uncertainty is from the entropy. This could potentially be remedied by taking more data points during the extrapolation process at the expense of data processing time. Both the average IFST results are within 0.16 kcal/mol of the FEP results. These results indicate that it is possible to reconstruct a fairly accurate measurement of the free energy change of this kind of process by only using the start and end points of an equivalent FEP path. The results indicate that the entropy term contributes relatively less to the free energy of solute annihilation for solutes with larger ε.

CONCLUSIONS

The translational entropy associated with the solute-solvent correlation and the solvent-solvent correlation can be used to evaluate the free energy of solvation in the IFST framework for a system of neon atoms solvating a fixed central Lennard-Jones potential. Both IFST estimates reproduce the estimate by an equivalent FEP simulation to within around 0.16 kcal/mol and appear to consistently overestimate slightly. The IFST method has an advantage over FEP and can give a free energy estimate without having to define a path between the two systems. However, the accuracy associated with FEP estimates is greater, so it is a better tool for computation. We conclude that IFST in its current state is the better tool for physical interpretation as it avoids the non-physical lambda states utilized by FEP. We note that FEP is an already well-developed method, and IFST may yet have future improvements to increase its optimization. The conditional two-particle entropy term did not contribute a noticeable change in free energy for this system for any of the strengths of the solute potential tested. The IFST framework can give the spatial contributions of configurational entropy when analysed with voxel based methods (as shown in Fig. 3 of the supplementary material). This allows areas of interest around the solute to be highlighted. Although the two-particle terms in the MIE do not appear to be significant in this simple system, they may be significant in a system with a liquid water solvent with hydrogen bonding and charges. The methods used in this work may be extended to such a system. If the solvent-solvent interactions become very strong in a particular solvation process, it may be necessary to calculate the conditional three-particle entropyand the methodology used in this work could be extended to such a calculation. See supplementary material for a description of the implementation of K-D trees in this work, and for a spatial illustration of the C2PE contributions around a fixed solute. The data and codes used for calculations in this work will be available at https://doi.org/10.17863/CAM.9335

26 in total

1. Good practices in free-energy calculations.

Authors: Andrew Pohorille; Christopher Jarzynski; Christophe Chipot
Journal: J Phys Chem B Date: 2010-08-19 Impact factor: 2.991

Review 2. Calculation of protein-ligand binding affinities.

Authors: Michael K Gilson; Huan-Xiang Zhou
Journal: Annu Rev Biophys Biomol Struct Date: 2007

3. Adaptive anisotropic kernels for nonparametric estimation of absolute configurational entropies in high-dimensional configuration spaces.

Authors: Ulf Hensen; Helmut Grubmüller; Oliver F Lange
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2009-07-20

4. Statistical theory for the entropy of a liquid.

Authors:
Journal: Phys Rev A Gen Phys Date: 1989-05-01

5. Grid inhomogeneous solvation theory: hydration structure and thermodynamics of the miniature receptor cucurbit[7]uril.

Authors: Crystal N Nguyen; Tom Kurtzman Young; Michael K Gilson
Journal: J Chem Phys Date: 2012-07-28 Impact factor: 3.488

6. Unrestrained computation of free energy along a path.

Authors: Bradley M Dickson; He Huang; Carol Beth Post
Journal: J Phys Chem B Date: 2012-08-30 Impact factor: 2.991

7. Estimation of Solvation Entropy and Enthalpy via Analysis of Water Oxygen-Hydrogen Correlations.

Authors: Camilo Velez-Vega; Daniel J J McKay; Tom Kurtzman; Vibhas Aravamuthan; Robert A Pearlstein; José S Duca
Journal: J Chem Theory Comput Date: 2015-10-21 Impact factor: 6.006

8. Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods.

Authors: Vladimir Hnizdo; Jun Tan; Benjamin J Killian; Michael K Gilson
Journal: J Comput Chem Date: 2008-07-30 Impact factor: 3.376

9. Thermodynamic properties of liquid water: an application of a nonparametric approach to computing the entropy of a neat fluid.

Authors: Lingle Wang; Robert Abel; Richard A Friesner; B J Berne
Journal: J Chem Theory Comput Date: 2009-06-09 Impact factor: 6.006

10. Spatial Decomposition of Translational Water-Water Correlation Entropy in Binding Pockets.

Authors: Crystal N Nguyen; Tom Kurtzman; Michael K Gilson
Journal: J Chem Theory Comput Date: 2015-12-04 Impact factor: 6.006