Literature DB >> 25066823

Optimizing potentials for a liquid mixture: a new force field for a tert-butanol and water solution.

Michele Di Pierro¹, Mauro L Mugnai, Ron Elber.

Abstract

A technology for optimization of potential parameters from condensed-phase simulations (POP) is discussed and illustrated. It is based on direct calculations of the derivatives of macroscopic observables with respect to the potential parameters. The derivatives are used in a local minimization scheme, comparing simulated and experimental data. In particular, we show that the Newton trust region protocol allows for more accurate and robust optimization. We apply the newly developed technology to study the liquid mixture of tert-butanol and water. We are able to obtain, after four iterations, the correct phase behavior and accurately predict the value of the Kirkwood Buff (KB) integrals. We further illustrate that a potential that is determined solely by KB information, or the pair correlation function, is not necessarily unique.

Entities: Chemical Disease Gene Species

Year: 2014 PMID： 25066823 PMCID： PMC4306501 DOI： 10.1021/jp505401m

Source DB: PubMed Journal: J Phys Chem B ISSN： 1520-5207 Impact factor: 2.991

Introduction

Molecular dynamics (MD) is a useful tool to study molecular mechanisms in materials science and biophysics. Advancements in computer power and simulation techniques continuously raise the bar of what is possible to model; systems of millions of atoms can be studied with MD,[1] and the longest simulations can reach milliseconds.[2] New interesting applications are investigated, and new simulation challenges are found. As the world of molecular simulations grows in size and complexity, there is a growing demand for more accurate force fields capable of recovering subtle physical phenomena that are difficult to reproduce with simplified interaction models. The recent and continuous increase of simulation lengths allows us to compute converged statistical averages to be compared to experimental data that were inaccessible for simulations in the past.[3] The functional form of MD force fields remained essentially the same for decades. The energy function consists of bonding and nonbonding terms. The bonded interactions consist of two-, three-, and four-body interactions (respectively, bonds, angles, and torsions), and the nonbonded interactions are a sum of electrostatic forces between fixed-point charges placed on atom centers, and hard-core and dispersion forces modeled by Lennard-Jones (LJ) interaction. This functional form showed robustness and transferability, and is the method of choice of most simulation software. While many quantitative and qualitative observations support the validity of such MD force fields, it is certainly possible to improve their functional form. There are many ongoing efforts in this direction; the addition of polarization terms[4] and the addition of statistical potentials[5] are examples of such efforts. In this paper we consider the standard functional form for the MD force field, and we focus on the process of the choice of the parameters optimized against condensed-phase simulations. This idea was put forward by Jorgensen in the OPLS force field.[6] Our contribution is in the design of an automated refinement algorithm that, in principle, can handle a large number of parameters. Our algorithm is not restricted to a particular choice of a functional form; however, our software[7] is designed to work with the functional form and the data structure of the MOIL program.[8] Choosing the optimal set of parameters is important and attracted a considerable amount of manual and partially automated work in the past. Widely used parameter sets (OPLS,[6] AMBER,[9] CHARMm[10]) have been subject to continuous updates and refinements; some updates are improvements on biopolymer models (peptides[11]), while others are additions of parameters for some new small molecules.[12] The development of force field parameters for a small molecule typically involves multiple stages. It involves quantum mechanical calculations (usually in the gas phase) to fit molecular mechanics parameters and condensed-phase calculations to fit parameters so that thermodynamic properties can be reproduced. Sometimes, properties of liquid mixtures are not well reproduced by the parameters developed to describe a single-component system. It is therefore desired to address liquid mixtures more directly and to consider theories and algorithms tailored for these systems. There are a few examples of theories that capture properties of solutions in a relatively small number of parameters. The Kirkwood–Buff (KB) integrals[13] summarize a set of experimental observables characterizing liquid mixtures and are useful targets of optimization of potential parameters.[12,14,15] Di Pierro and Elber have recently published an automated method to refine parameters of force fields using as targets experimental observables that can be predicted with computational statistical mechanics. A prime advantage of the technology is its ability to handle the coupled optimization of a large number of potential parameters. We named our algorithm POP (Parameter OPtimization),[7] and we illustrated its effectiveness on hundreds of parameters, exploring potentials for peptide folding in aqueous solutions. Independently, and at the same time, Wang et al. published a similar algorithm and used it to refine a potential for liquid water,[16] illustrating the general applicability of the method. In the present paper, we combine POP and the observables of KB theory to optimize the potential for liquid mixtures. We also discuss enhancements to the original POP algorithm that enable faster and more accurate convergence to the desired set of parameters. We use the POP method to improve the current force field for tert-butanol (TBA) in aqueous solution (see Figure 1). Our starting point is the OPLS united-atom (OPLSUA) parameters for TBA[17] and the TIP3P[18] water model. We develop a new set of parameters only for TBA. We retain the same water model that was tested comprehensively by now on a very large number of systems. We seek a set of TBA parameters that better reproduce the KB integrals estimated from experiments[19] over a range of different concentrations. While optimization for TBA–water mixtures according to KB integrals have been done in the past,[15] the present study is automated, producing high-quality potentials, and makes it possible to address questions about the uniqueness of the results.

Figure 1

United-atom model of TBA; note that each methyl group is represented by a single united atom.

United-atom model of TBA; note that each methyl group is represented by a single united atom. This paper proceeds as follows. In the Methods Section, we revisit the theory of POP and introduce an improved optimization algorithm. We then discuss the KB theory and its application in MD simulations; last, we focus on its use in the context of the POP algorithm. In the Results and Discussion section, we develop a new force field for liquid mixtures of TBA and water. Discussion and conclusions are left for the final section.

Methods Section

POP Algorithm

We denote an experimental measurement of an observable O by Oexp. The measured quantity corresponds to the (canonical) ensemble average of a certain function of the phase space (positions are collectively indicated by R and momenta by P) that may or may not depend on the force field parameters πThe ensemble average of the observable always depends on the set of parameters π through the exponential weight, and of course, it depends on the macroscopic constraints of the system (number of particles N, volume V, and temperature of the thermal reservoir in contact with the system T). In practice, the ensemble average above can be substituted by a time average over a trajectoryprovided that the system is ergodic and that the dynamics reproduces the canonical sampling (e.g., isokinetic dynamics[100]). One way to validate the results of a simulation is to measure how much computed observables differ from experimental measurements. Here, we optimize the parameters in the MD force field in order minimize the discrepancy between computed and experimental observables. Given N experimental observations, we define our target function to be where wi are constant weights which are ones in the present example. Other choices of the target function are possible, provided that the target function is differentiable. Ideally, the target function Θ should be zero. The optimal set of parameters is π*, such that We minimize the target function using a trust region Newton method.[21] To do so, we need the gradient vector and the Hessian matrix (or some approximation of it[22]) in parameter space. We have shown in a previous paper that any derivative of the target function can be calculated as a single ensemble average;[7] the calculation is analytical and affected only by the statistical error associated with the ensemble average. The gradient vector iswhile the Hessian matrix isNote that the Hessian matrix, while symmetric by construction, is in general indefinite. Using the gradient and the Hessian (calculated for a given parameter set π0), we can build a quadratic model for the target function Θ(π) in a neighborhood of the point π0; the quadratic model m(p) is a function of the displacement vector p = π – π0. The quadratic model is accurate in a neighborhood of π0; we characterize this region of the parameter space by the space contained in a spherical domain of radius Δ. Later on, we will explain how the radius can be iteratively updated. We minimize the target function by iteratively updating the parameter setwhere the increment p is chosen by solving the subproblem The subproblem can be solved in approximated way[22] or exactly; here, given the dimension of the parameter space and having calculated the Hessian, we find the exact solution of the subproblem following the method of Moré and Sorensen[21] (Appendix A). One specific problem of full space optimization of MD force fields lies in the range of values of different parameters; some parameter ranges of values are in the hundreds of thousands (e.g., some van der Waals parameters), while others are of order one (e.g., torsion coefficients). The difference of several orders of magnitude presents a significant challenge for simple optimization algorithms like steepest descent. In order to make adjustments that are homogeneous, we introduce a scaling matrix D such that every element of the vector γ = Dπ is of order one. The matrix D is a diagonal matrixIn this way, all of the parameters are scaled to be in the range of −1 to 1. We can now solve the subproblem in an elliptical trust region The parameters that we optimize include the atomic (partial) charges within a molecule. An obvious constraint on the space of the optimization is the preservation of the molecular charge, that is, the total molecular charge must not change upon optimization. This is in contrast to other parameters such as bond length. We impose charge conservation as a linear constraint. We write the total charge as Q = ∑ nq, where q is the partial charge associated with the atom type l, there are n atoms of type l, and the total number of atom types is m. Keeping the total charge constant means satisfying the linear constraint Q = Q0, where k, as before, is the iteration index. Exploiting the linearity of the constraint, we can fix the total charge by projecting the increment p on the hyperplane of constant charge defined by the constraint Q We now outline the strategy to update the trust region radius Δ.[22] The choice of radius is based on the agreement between the model function m and the target function Θ. Given a step p at iteration k, such agreement can be measured by the following ratio The predicted reduction is always positive; therefore, if ρ is negative, then Θ(π + p) is greater than the current value Θ(π), and the step must be rejected. If ρ is close to 1, there is good agreement between the model and target function; therefore, it is safe to expand the trust region. If ρ is positive but not close to 1, we do not alter the trust region; if ρ is close to 0, then we shrink the trust region (see Appendix A for more details).

KB Theory

The KB theory of fluid mixtures[13] relates some integrals of the pair correlation functions (microscopic observables) computed in the grand-canonical ensemble to derivatives of the chemical potential, isothermal compressibility, and partial molar volumes (macroscopic quantities). A detailed derivation of the theory can be found elsewhere.[13,23] Here, we present the key concepts for our application. The use of KB to optimize potential parameters was put forward by Smith.[24] The optimization using Newton–Raphson in a trusted region is our contribution. Let us consider a binary mixture of two chemical species, chemical species S1 and chemical species S2. The symbols “A” and “B” can either be S1 or S2. Let us define the following pair correlation function The averages are performed in the grand-canonical ensemble (holding fixed the reservoir temperature T, the volume V, and the chemical potentials of the two chemical species S1 and S2, μ1 and μ2, respectively). This function expresses the joint probability of finding the center of mass of a molecule of species A (we indicate its position by r⃗) at r⃗1 and the center of mass of a molecule of species B (we indicate its position by r⃗) at r⃗2, relative to the probability of the two independent events. Note that, even though only the centers of mass appear in eq 13, the theory is general; it does not require spherically symmetric molecules. Indeed, the internal degrees of freedom of A and B molecules, as well as their overall orientation, are accounted for in the ensemble average.[13,23] Let us assume that the system is homogeneous and therefore that the probability of finding a molecule in a specific place is constant anywhere in the system. In this case, we can rewrite eq 13 asLet us now define r⃗ = r⃗1 – r⃗2 and r⃗ = r⃗ – r⃗. If the probability of finding the center of mass of B-type molecules around a molecule of species A depends only on their distance and not the orientation of the vector that connects them (i.e., the system, averaged over its internal degrees of freedom and the orientations of the molecules, is isotropic), then we can rewrite eq 14where we transformed the Dirac’s delta from Cartesian coordinates to polar coordinates. The key object in the KB theory is the so-called KB integralThe meaning of this quantity becomes clearer if we rewrite eq 16 asThe left-hand side of eq 17 is the so-called excess coordination number. The integrand on the right-hand side of eq 17 has two terms: first the conditional probability of finding a molecule of species B around a molecule of species A and second the probability of finding the molecule of species B. The integral gives the excess (or shortage) of molecules of species B in volume V around a molecule of species A with respect to the average number of B-type molecules in the same volume V. Obviously, at large distances (for small solutes, typically a few nanometers), the correlation between A-type and B-type molecules is lost (i.e., ρABμ(r) ρAμρBμ), and the integrand of eq 17 is zero. This means that the KB integral carries local, microscopic information that can be evaluated with a MD simulation. At the same time, it is possible to show[13,23] that the KB integral (eq 16) is equal towhere NA and NB are the number of molecules of type A and B, respectively, and δAB is the usual Kronecker’s delta. This equation expresses the connection with thermodynamics. The fluctuation of numbers of particles in the system is a macroscopic object, and it can be expressed in terms of derivatives of the chemical potential of A-type molecules with respect to the number of particles of species B, the isothermal compressibility, and the partial molar volumes of the two species.[13,23] These quantities can be measured experimentally. It is also possible to extract the KB integrals from these thermodynamical quantities.[25] Therefore, KB theory provides a useful protocol to analyze MD simulation and connect the results with measurable quantities; it requires extracting from the trajectory the pair correlation function and its integral, which are routinely computed. Indeed, it has been found in numerous applications in recent years, particularly in the context of force field parametrization.[12,14,15,26] Nevertheless, there are some caveats. First of all, MD simulations are performed at a constant number of particles, therefore, in the canonical, not in the grand-canonical ensemble. The connection between the KB integral and the measurable quantities relies upon eq 18, which in the canonical ensemble would be[23]where N1 and N2 are the fixed number of particles of species S1 and S2, respectively. Obviously, in this case, the connection with the chemical potential would be lost. How can we compute a grand-canonical average from a canonical simulation? A possible way is to compute the KB integral in a volume V′ that is much smaller than the volume of the system. In such a volume the number of molecules fluctuates. The rest of the system acts as a molecular reservoir. This procedure is correct as long as the pair correlation function in eq 16 decays to 1 within V′. This leads to the second caveat; sometimes the pair correlation function decays to 1 very slowly, and we need to account properly for its long tail. A careless truncation may have a bad impact on the evaluation of the KB integral. To understand why, let us consider the case in which the KB integral in eq 16 is computed in a spherical domain of radius RCIf at RC the pair correlation function is not 1, we are neglecting a contribution to the integral that might potentially be very large as it is multiplied by the square of the radius. Ganguly and van der Vegt[27] have investigated these caveats and proposed empirical corrections to the KB integrals to alleviate these problems. Others[28] have carried out the calculation of KB integrals using the adaptive resolution scheme for MD.[29] In our case, we decided to compute the integral of the pair correlation in the NVT ensemble without corrections, but we computed the integral of the pair correlation function up to 20 Å, which is large compared with what is commonly found in the literature.[12,14,15,27] In this way, we can check whether the KB integral has indeed reached a plateau. To ensure that the system can be considered grand-canonical, we run the simulations in cubic systems of roughly a 65 Å box length. This ensures that the reservoir is around 7 times larger than the volume V′ in which the pair correlation function is computed.

KB Observable Functions

In this paper, we use the method described above to optimize the MD force field using the three KB integrals of a binary liquid mixture as a target for optimization. We write an observable function associated with the KB integrals. We derive the observable function from the definition of the KB integral; using eq 15 for the pair correlation function but computed in the canonical ensemble, we obtain The function inside of the canonical ensemble average is our observable; in this case, it only depends on the set of coordinates R. We define our observable functions to bewhere ρ1 = N1/V and ρ2 = N2/V. Let us definethat is, the number of molecules of species B that are at a distance less than RC from a molecule of species A averaged on all of the molecules of species A. Similarly, we can definethe average number of molecules in a spherical volume of radius RC. We can rewrite eq 22 asTherefore, the observables in eq 23 measure the excess (or shortage) of a molecule of type A around a molecule of type B compared to the average in the system. The observables do not carry explicit parameter dependence, which simplifies the expressions for the gradient and the Hessian. While the observable function depends explicitly only on the coordinates, it is clear that its ensemble average depends on the composition of the liquid mixture. The ensemble average is a function of the variablesif we introduce the molar fractionthe same set of variables can be written as In the following, we are only interested in changes in parameters and molar fractions, and we will therefore drop the other variables; clearly, it is to be intended that experiment and computation are performed under the same conditions. For multiple mole fractions Nx, the target function is

Computational Details

Optimization

For the MD simulations, mixtures of TBA and water were prepared at the mole fractions of TBA in Table 1. The model used for water was TIP3P,[18] and it remained fixed throughout the process of potential refinement. The starting model chosen for TBA was the OPLSUA.[17] The system was prepared to have density consistent with that of experiment (error of about 1%;[30] see Table 1). Particle Mesh Ewald (PME)[31] was used to account for long-range electrostatic interactions with a grid of 64 × 64 × 64. Short-range electrostatic interactions were calculated by real space summation up to a cutoff of 9.5 Å; the same cutoff was used for van der Waals interactions. Periodic boundary conditions were applied. The equations of motion were integrated using the multiple time step integrator RESPA[32] with a time step of 1 fs. Short-range forces were updated every femtosecond, while long-range interactions were calculated every 4 fs according to the protocol in MOIL described in ref (8). The sampling in the NVT ensemble was enforced by rescaling the velocities (isokinetic ensemble[20]). The temperature was set to be 300 K in all of the simulations. The experimental results used in the target function are those in ref (19).

Table 1

Molar Concentration of TBA, Density, Volumes, and Number of Molecules of Each One of the Systems Simulateda

	concentrations
TBA molar concentration	0.04	0.10	0.14	0.17	0.20	0.30
density g/cm³	0.9707	0.9357	0.9146	0.9010	0.8836	0.8577
volume Å³	64.079³	64.393³	64.723³	64.987³	65.255³	66.170³
number of TBA molecules	304	637	808	919	1000	1289
number of water molecules	7290	5733	4968	4488	4096	3006

The parameters were chosen such that the densities were within 1% of the experimental values reported in ref (30) at 308.15 K.

The parameters were chosen such that the densities were within 1% of the experimental values reported in ref (30) at 308.15 K. An iteration of the optimization of the potential parameters includes a series of MD simulations to collect a sample of structures. An ensemble of structures computed with a particular force field is analyzed to calculate the new parameter set. The new parameter set and potential are used in a successive MD simulation, from which we collect new structures that are analyzed again. For the first three iterations, the target function used wasat the single mole fraction of TBA of x = 0.2. For the last iteration, we used the target functionwith mole fractions of TBA of x1 = 0.04 and x2 = 0.10. The optimization process was stopped at the IV iteration, where the experimental observables matched the simulated quantities within acceptable error bars. All of the MD simulations were performed using the software package MOIL in its GPU variant.[8] The analysis of the structures, including the calculation of the gradient and Hessian and the updating of the parameter set, was performed using the software POP[7] included in MOIL. The KB integrals were calculated up to a cutoff distance of 20 Å. For each iteration, a simulation of 60 ns was performed. We discarded the initial equilibration phase (10 ns), and from the last 50 ns, we collected 4990 equally spaced (in time) structures. The structures were used in the calculations for parameter optimization by POP.

Validation

To validate our potential, we examined the performance of the newly developed model over a range of mixtures at different concentrations. We prepared six systems at mole fractions for which experimental results for the Kirkwood integrals are known[19] (see Table 1). Each of the systems was simulated for 60 ns with the same setup as that described in the optimization paragraph, and the experimental observables were compared to the simulations.

Force Fields

The starting force field for TBA is OPLSUA.[17] In this force field, TBA is composed of six particles because each of the methyl groups is treated as a single particle without internal degrees of freedom. Here, we report the OPLSUA force field and the optimized force field, which we will refer to as POP4ff. The complete potential is a sum of bonding and nonbonding terms The functional form for bonded terms (bonds, angles, and torsions) is The nonbonded terms (LJ and electrostatic) arewith combination rules Our software utilizes the equivalent formulation Some of the results will be presented with respect to parameters A and B. Angles and bonds parameters were not optimized and are therefore shared between OPLSUA and POP4ff. The torsions parameters were optimized in POP4ff but are very similar to the ones of OPLSUA; POP4ff final parameters were (K1, K2, K3) = (0.0001, −0.0003, 0.3258), while OPLSUA parameters were (K1, K2, K3) = (0, 0, 0.325). Hence, in practice, only the amplitude of the three-fold rotation is different from zero.

Results and Discussion

TBA is a tertiary alcohol. Unlike the other butyl alcohols, TBA is miscible in water at any proportion and any temperature;[33] it is also the largest monohydric alcohol to be fully soluble.[34] TBA–water mixtures exhibit many anomalous physical properties. Solutions of TBA in water show an anomalously large volume contraction, indicating that the trimethyl groups must be somehow easily accommodated into the water structure.[35] There is evidence that TBA when added to water in solvating peptides behaves as a helix promoter.[36] The peculiar characteristics of TBA motivate us to apply our newly developed procedure to investigate it and improve current potentials. We first simulated the mixture of TBA and water using the parameters of the OPLSUA force field. After equilibration, phase separation is evident by visual inspection, as illustrated in Figure 2. This is a known effect as force fields optimized to reproduce pure liquid properties often exhibit too much self-aggregation when observed in solution.[37] This system was the starting point of our optimization. In the same figure, we show the same system once equilibrated with the optimized force field of the fourth iteration; we will refer to this force field as POP4ff.

Figure 2

Snapshots of the simulation box; TBA is in green. On the left, we show an equilibrium snapshot of the mixture of TBA and water at 0.20 TBA mole fraction using the OPLSUA force field. Phase separation is evident by visual inspection. On the right is the same system after equilibration using POP4ff; by visual inspection, the solution is mixed. In the optimization, we used KB integral extracted from small-angle X-ray scattering.[19] In calculating the KB integrals from MD data, we assumed the position of the oxygen atom to be the position of water molecule; similarly, the position of the TBA molecule was assumed to be the one of the central carbon. In Figure 3, we show the KB integrals for POP4ff as a function of the cutoff RC for all of the concentrations tested in our simulations. At a distance of 20 Å the integrals are approaching a plateau, suggesting that we are close to the region where the integral is converged.

Figure 3

The value of the KB integral of (A) TBA–TBA as a function of RC (see eq 21), (B) TBA–water, and (C) water–water. The mole fractions in Table 1 are displayed in different colors. At 20 Å, most of the curves are close to a plateau, indicating convergence. Lee and Van der Vegt[15] already used the KB theory to develop a force field for TBA, obtaining good results. They used the LJ parameters from GROMOS[38] and SPC[39] as a water model. They tuned the dipole moment of the TBA molecule so that they could better reproduce the KB integrals over a range of concentrations. Their protocol, while successful, shows the typical limitation of current force field development; they could adjust only a few parameters at the time. The choice of those parameters was left to chemical intuition. We repeated the optimization of the force field in an automated procedure using KB integrals as the optimization target for the POP algorithm. All of the parameters (excluding bonds and angles) of the model were subject to automated optimization following the gradient. The TBA molecule, in the united-atom model, is composed of three torsions (of one torsion type) and six atoms (of four different atom types). In OPLSUA, polar hydrogen atoms have a zero van der Waals radius. We kept this convention, and we did not optimize those parameters. The total number of parameters under optimization was 13. Four iterations of the optimization procedure were conducted. The progression in approaching the experimental values through the optimization iterations is shown in Figure 4.

Figure 4

(A) KB integral for TBA–TBA at a mole fraction of TBA 0.2 as a function of the optimization iteration. The first data point corresponds to the KB integral calculated with the force field OPLSUA.[17] The data point IV corresponds to the final force field POP4ff, and the experimental value[19] is represented by the black horizontal line. In the inset, the same data are shown with a magnified scale for the last four data points. The error bars, computed with block analysis,[20] are sometimes below the size of the point. (B,C) Same data for TBA–water and water–water; even in this case, the final force field reproduces the experimental value. The first three iterations were conducted using as a target the KB integrals with the mole fraction of TBA at 0.2. The first optimization step significantly improved the original OPLSUA force field, as shown in Figure 4. Iterations II and III yielded smaller but significant improvements. After the third iteration, the optimization procedure produced only minor improvements. We therefore decided to use a more informative target function. Inspection of the results over a broader range of concentrations suggested using as targets the KB integrals at TBA mole fractions of 0.04 and 0.10. The experimental KB integral of species TBA–TBA exhibits a global minimum at a TBA mole fraction of 0.04 and a global maximum at a TBA mole fraction of 0.10 (see squares in Figure 5). This feature is missing in the force field obtained after the third iteration (Figure 5). Iteration IV produced a significant improvement over the whole concentration range. The force field produced by the fourth iteration (POP4ff) reproduces very well the three KB integrals over a wide concentration range (0.04–0.3), outperforming both OPLSUA and the force field developed by Lee and Van Der Vegt. The KB integrals for the different force fields as functions of the concentration are shown in Figure 5. Note that only TBA mole fractions of 0.2, 0.04, and 0.1 were used at any time in the optimization, leaving us ample data for meaningful testing.

Figure 5

KB integrals as a function of TBA mole fraction. (A) TBA–TBA; (B) TBA–water; (C) water–water. The blue line shows the experimental results from ref (19), the red line shows results from Lee and Van Der Vegt potential,[15] the green line shows computational results from force field POP3ff, and the purple line shows computational results from force field POP4ff. The pair correlation functions of TBA-TBA, TBA–water and water–water are shown in Figure 6. It is clear that the system is now well mixed because long-range correlations are absent. The pair correlation functions of TBA–TBA and TBA–water of OPLSUA (yellow in Figure 6) and POP4ff (purple in Figure 6) are significantly different; OPLSUA shows two nearby peaks, whereas POP4ff shows a single smooth peak. The pair correlation functions of force fields POP3ff (green in Figure 6) and POP4ff deviate only slightly. We did not include any information about the shape of the pair correlation functions in the target function; this change is a byproduct of the optimization procedure. Whether this is correct or not is difficult to say because we do not know the pair correlation function from experiment, only its integral, which hides such features.

Figure 6

(A) Pair correlation function for species TBA–TBA; the yellow curve is the pair correlation function computed with the OPLS force field, the green curve is the pair correlation function computed with POP3ff, and the purple curve is the pair correlation function computed with force field POP4ff. (B,C) The same information as (A) for species TBA–water and water–water. All of the results were obtained at a TBA mole fraction of 0.20. The different interaction types that are optimized at once do not contribute in the same way to the optimization. The sensitivity vector g is defined as the gradient in parameter space of the target function in eq 24; it is a local feature of the target function, and it changes as the parameter set is improved by the optimization process. To provide useful information and following the procedure described in the Methods Section, we multiply the gradient by the scaling matrix D–1, enforcing homogeneity in parameter space (see eq 9). We also normalize the sensitivity vector. In Table 4, we report the 13 different components of the normalized and scaled sensitivity obtained after POP analysis of the simulations carried out with the OPLSUA parameters, that is

Table 4

Scaled and Normalized Sensitivitiesa of the 13 Parameters Used in the Optimization

Torsion
K₁	K₁	K₁
1.9 × 10^–5	1.9 × 10^–5	1.9 × 10^–5

See eq 25.

Bonded terms are the standard OPLS force field. Angles and bonds were not optimized; the change in torsions parameters was found to be small during the calculations, and their adjustment is ignored. See eq 25. Initially, the KB integrals are insensitive to the torsions parameters, mildly sensitive to the LJ parameters (as was also noted by Lee and van der Vegt[15] by direct testing), and highly sensitive to the charge distribution. It has been noted[37] that to properly mimic the behavior of liquid mixtures, tuning only the dipole moment of the solute is not sufficient; it is required to find a solute charge distribution that represents higher-order moments of the charge distribution. Indeed, we find that this distribution is the most important feature affecting the KB integrals. The dipole moment of TBA for OPLSUA is 2.28 D, and in our optimized force fields, it is 3.20 D. The difference in dipole moment is the result of a redistribution of partial charges involving all of the atoms; in OPLSUA, the methyl groups are neutral, and the positive charge is on the central carbon, while in our potential, the central carbon is almost neutral, and each of the methyl groups carry a small positive charge (see Table 3).

Table 3

Nonbonded Parameters for United-Atom TBA for the Force Fields OPLSUA and POP4ff

atom Type	q (e)	ε (kcal/mol)	σ (Å)
OPLS Parameters
C	0.265	0.050	3.800
CH₃	0	0.160	3.910
O	–0.700	0.170	3.070
H	0.435	0	0
POP Parameters
C	0.04670	0.0372	3.9899
CH₃	0.10900	0.1271	4.0618
O	–0.58197	0.1566	3.1104
H	0.20827	0	0

After the third iteration and using a target function that includes information on two concentrations, the sensitivity is significantly different. After the adjustment of the charge distribution, the improved force field shows the highest sensitivity to the LJ parameters. The last optimization step was indeed mainly a readjustment of the LJ parameters. Finally, we extracted the angular dependence of the distribution of TBA and water around a central TBA molecule (see Figure 7). The blue dots represent regions with high water density and the green dots regions with high TBA density. The figure is roughly symmetric for rotations of 120° around the C–O axis of TBA. Because of steric repulsion, the high densities are in the grooves between these atoms. A region with high density of water is situated just under the methyl groups; contrary to what could be intuitive, the methyl groups, which are usually considered hydrophobic, are found to be well hydrated. Around the hydroxyl group, TBA tends to stay closer than water. Our model of TBA does not show hydrophobic interactions between methyl groups.

Figure 7

Regions with the highest density of TBA (green) and water (blue) around a central TBA molecule. The densities are measured on a grid of size 0.5 Å. As before, we assumed the position of the TBA molecule to be the position of the central carbon and the position of the oxygen to represent the center of the water molecule. We color by green the cells that have a density of TBA larger than 85% of the maximum density of TBA measured. We color by blue the cells that have a density of water larger than 75% of the maximum density of water. The methyl groups are well hydrated, as shown by the presence of a region of space with the high density of water just under those groups. Methyl groups of TBA are also found close to the hydroxyl group of other TBA molecules. The presence of both water and TBA around both the hydroxyl group and the methyl groups indicates the absence of hydrophobic interactions between methyl groups.

Conclusions

We provided a simple systematic procedure to optimize force fields to reproduce properties of liquid mixtures connected to the KB integrals. We made a useful improvement to the original POP algorithm. The first version of POP was using gradient descent as a minimization algorithm. The introduction of the trust region Newton algorithm has improved the performance of POP in many ways. First of all, the Newton algorithm is known to have better convergence properties. Also, the concept of the trust region provides an easy and efficient way to assess the quality of the quadratic model used in the minimization. Finally, the use of a hyperelliptical trust region takes into account the different scales of magnitude present in the parameter set and allows modifications to the parameters that are homogeneous on a relative scale. We remark that the results shown in this paper were achieved with just four iterations of parameter adjustments. We developed a new force field for TBA that approximates the behavior of mixtures of TBA with water better than force fields currently available, as shown by comparison to the experimental KB values. In the first three iterations of the optimization, we included in the target function only the KB integrals at one concentration (0.20 mole fraction of TBA). These iterations showed larger sensitivity toward the partial charges of TBA. In the last step of the optimization, we included in the target function the KB integrals of two lower concentrations (0.04 and 0.10 mole fraction of TBA). In this case, the charges, already optimized, did not change significantly, while most of the sensitivity was to LJ parameters. Even though our algorithm allowed for changes in torsional parameters, they remained essentially unchanged from the original OPLSUA parameters. Lee and Van der Vegt[15] observed in the past that the KB integrals depend more on a partial charge distribution than on LJ parameters. Similar observations were made in the context of urea parametrization with KB integrals.[12] Our results for the first three iterations confirm, strengthen, and quantify these previous observations (see Table 3 for sensitivities to different force field parameters) as they are the consequence of an automated optimization of all of the parameters at once. Nevertheless, we also observed that a fourth step of optimization of LJ parameters was necessary to improve the force field beyond what we obtained from the first three iterations (see Figure 5). Lee and Van Der Vegt[15] used the GROMOS[38] force field, which has a more general (and complex) type of LJ interaction. The pair interaction parameters A and B are not separable to single-atom parameters (i.e., A=A·A) but depend instead on both indices. We illustrate here that the decomposable presentation, with a smaller number of parameters, works well. As a practical consequence, we note that separating the parameters describing pairs of interactions to products of two single-particle parameters makes it easier to apply the Ewald sum for LJ interactions and to obtain a more accurate description of long-range forces. Other differences between the POP4ff force field and the force field developed by Lee and Van De Vegt[15] lie in the values of the parameters that in some cases are strikingly different; for example, the radius for the central carbon is ∼6 Å in the force field that they used, while in POP4ff, the same carbon atom has a dimension that deviates only slightly from the original OPLS atom type (∼4 Å). The charge distribution is also very different because they chose to keep the methyl groups neutral. The fact that such diverse force fields produce similar results is a warning sign against optimizing liquid potentials solely according to KB data. The results are unlikely to be unique. This is also reflected in the comparison of two different potentials (POP3ff and control) reported in Appendix B. Therefore, when we refine the parameters using KB integrals, we need either to make sure that the changes to the force field are minimal or to use a larger pool of observables to ensure compatibility toward other observables. Finally, we want to stress two important features of our optimization method. First, our method minimizes the target function with minimal changes to the parameter set. This is an important feature, given that force field parameters were already optimized extensively in the past. If we need to adjust these parameters against a new set of data, small adjustments are to be preferred because they are less likely to perturb results of previous refinement of the current set of potential parameters. Second, our method does not require larger simulation time if the set of observables used for the optimization is increased. The change in the parameters is obtained as the postprocessed analysis of one MD trajectory. This will make it easier and faster to optimize the force field against a larger pool of observables whenever such a large pool is available.

Table 2

Bonded parameters for United-Atom TBA for the Force Fields OPLSUA and POP4ffa

bonds	k_b (kcal/mol Å²)	r₀ (Å)
O–H	553.0	0.945
C–O	320.0	1.430
C–CH₃	268	1.530

Bonded terms are the standard OPLS force field. Angles and bonds were not optimized; the change in torsions parameters was found to be small during the calculations, and their adjustment is ignored.

Table 5

Nonbonded Parameters for POP3ffa

atom type	q (e)	ε (kcal/mol)	σ (Å)
C	0.04670 (0.45702)	0.0499 (0.1155)	3.8010 (4.0471)
CH₃	0.10920 (0)	0.1595 (0.0645)	3.9119 (4.3646)
O	–0.58180 (−0.56038)	0.1696 (0.5831)	3.0695 (2.5736)
H	0.20750 (0.10336)	0 (0)	0 (0)

In brackets are the parameters for control.

21 in total

1. Development and testing of a general amber force field.

Authors: Junmei Wang; Romain M Wolf; James W Caldwell; Peter A Kollman; David A Case
Journal: J Comput Chem Date: 2004-07-15 Impact factor: 3.376

Review 2. Empirical force fields for biological macromolecules: overview and issues.

Authors: Alexander D Mackerell
Journal: J Comput Chem Date: 2004-10 Impact factor: 3.376

3. A Kirkwood-Buff derived force field for the simulation of aqueous guanidinium chloride solutions.

Authors: Samantha Weerasinghe; Paul E Smith
Journal: J Chem Phys Date: 2004-08-01 Impact factor: 3.488

4. A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6.

Authors: Chris Oostenbrink; Alessandra Villa; Alan E Mark; Wilfred F van Gunsteren
Journal: J Comput Chem Date: 2004-10 Impact factor: 3.376

5. Convergence of Sampling Kirkwood-Buff Integrals of Aqueous Solutions with Molecular Dynamics Simulations.

Authors: Pritam Ganguly; Nico F A van der Vegt
Journal: J Chem Theory Comput Date: 2013-02-28 Impact factor: 6.006

6. Kirkwood-Buff Analysis of Liquid Mixtures in an Open Boundary Simulation.

Authors: Debashish Mukherji; Nico F A van der Vegt; Kurt Kremer; Luigi Delle Site
Journal: J Chem Theory Comput Date: 2012-02-03 Impact factor: 6.006

Review 7. CHARMM: the biomolecular simulation program.

Authors: B R Brooks; C L Brooks; A D Mackerell; L Nilsson; R J Petrella; B Roux; Y Won; G Archontis; C Bartels; S Boresch; A Caflisch; L Caves; Q Cui; A R Dinner; M Feig; S Fischer; J Gao; M Hodoscek; W Im; K Kuczera; T Lazaridis; J Ma; V Ovchinnikov; E Paci; R W Pastor; C B Post; J Z Pu; M Schaefer; B Tidor; R M Venable; H L Woodcock; X Wu; W Yang; D M York; M Karplus
Journal: J Comput Chem Date: 2009-07-30 Impact factor: 3.376