Literature DB >> 28418655

Accuracy Test of the OPLS-AA Force Field for Calculating Free Energies of Mixing and Comparison with PAC-MAC.

Augustinus J M Sweere¹, Johannes G E M Fraaije¹.

Abstract

We have calculated the excess free energy of mixing of 1053 binary mixtures with the OPLS-AA force field using two different methods: thermodynamic integration (TI) of molecular dynamics simulations and the Pair Configuration to Molecular Activity Coefficient (PAC-MAC) method. PAC-MAC is a force field based quasi-chemical method for predicting miscibility properties of various binary mixtures. The TI calculations yield a root mean squared error (RMSE) compared to experimental data of 0.132 kBT (0.37 kJ/mol). PAC-MAC shows a RMSE of 0.151 kBT with a calculation speed being potentially 1.0 × 104 times greater than TI. OPLS-AA force field parameters are optimized using PAC-MAC based on vapor-liquid equilibrium data, instead of enthalpies of vaporization or densities. The RMSE of PAC-MAC is reduced to 0.099 kBT by optimizing 50 force field parameters. The resulting OPLS-PM force field has a comparable accuracy as the OPLS-AA force field in the calculation of mixing free energies using TI.

Entities: Chemical Disease Gene

Year: 2017 PMID： 28418655 PMCID： PMC5425945 DOI： 10.1021/acs.jctc.6b01106

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Molecular simulations have made significant contributions in the prediction of chemical processes and thermodynamic properties with applications ranging from protein folding[1] to enhanced oil recovery.[2] The interactions between atoms, which induce motion in classical molecular dynamics (MD) simulations or translation in Monte Carlo (MC) simulations, are calculated using force fields. In general, force fields consist of simplified pairwise potential energy functions with associated atom-type dependent parameters.[3] Various force fields have been developed differing in genericity, coarsening, accuracy, fitted properties, and the used set of molecules within the optimization.[4] We mention three well-known force fields: AMBER, COMPASS, and OPLS. The AMBER force field is designed for simulation of proteins and nucleic acids.[5] AMBER was originally developed as a united-atom force field; however, a more accurate all-atom representation was published 2 years later.[6] An extended version, including parameters optimized for organic molecules, is known as the general AMBER force field (GAFF).[7] Within both AMBER and GAFF, the van der Waals and hydrogen bonding parameters are obtained from crystal structures and lattice energies. The atomic partial charges, required for electrostatic interactions, are derived using ab initio quantum mechanics. The all-atom COMPASS and COMPASS II force fields are developed for simulations of organic molecules, inorganic small molecules, and polymers.[4,8] The van der Waals parameters are obtained by fitting enthalpy of vaporizations and densities, calculated using MD simulations, to experimental data. The atomic partial charges are derived using ab initio quantum mechanics and empirically adjusted to take hydrogen bonding effects into account. The OPLS force field (published in a united-atom version[9] and an all-atom version[10]) and improved OPLS3 force field[11] are created for liquid simulations containing organic molecules and proteins. The van der Waals parameters are, comparable to COMPASS, optimized using experimental liquid properties, mainly enthalpy of vaporizations and densities. The atomic partial charges are derived using quantum calculations as well as using experimental condensed-phase properties. A reliable MD or MC simulation fully depends on the quality of the force field.[3] The accuracy of various force fields is extensively tested by Van der Spoel et al. for various thermodynamic properties.[12] The OPLS-AA force field outperforms GAFF in the prediction of experimental enthalpy of vaporization with a measured root-mean-square error (RMSE) of 6.5 kJ/mol versus 10.6 kJ/mol. For the calculation of hydration free energies and solvation free energies using nonpolarizable force fields, the errors are slightly lower: Jorgensen et al.[13] obtained a minimum average unsigned error (AUE) of 1.03 kcal/mol (= 4.3 kJ/mol), Mobley et al.[14,15] obtained a minimum RMSE of 1.00 kcal/mol (= 4.2 kJ/mol), Sherman et al.[16] obtained for OPLS_2005 an AUE of 1.10 kcal/mol (= 4.6 kJ/mol), and Van der Spoel[17,18] obtained for GAFF a RMSE of 3.7 kJ/mol. For an accurate calculation of mixing free energies, the error should be significantly lower. The difference in free energy between an ideal binary mixture and a phase separated system is maximum 1.7 kJ/mol at 298 K.[19] An extensive accuracy test of mixing free energies calculated using MD or MC is, to our knowledge, not published yet. Jedlovszky et al. did perform mixing free energy calculations of several important binary mixtures using various force fields.[20−22] Although the maximum absolute error was always lower than 1 kBT (= 2.5 kJ/mol at 298 K), the weakness of force fields was exposed. A mixture of acetone and water turned out to be immiscible at 298 K for all considered combinations of force fields with the exception of the combination of the acetone model developed by Pereyra, Asar, and Carignano[23] with the TIP5P-E water model.[20] Also none of the force field combinations were able to predict a positive entropy of mixing at a DMSO mole fraction below 0.8 for a mixture of DMSO and water.[22] A disadvantage of MD and MC simulations is the extensive computational time which is often required to obtain precise results. A less time-consuming approach to obtain mixing free energies is the use of methods based on the quasi-chemical approximation of Guggenheim.[24] The method is described in several textbooks.[25] In short, quasi-chemical models represent condensed molecular mixtures as a collection of independent interacting pairs of sites or molecules. The probability to obtain an interacting pair is derived from the theory of chemical reactions. The quasi-chemical approximation is basically an improvement of the regular solution model which assumes a random distribution of interacting pairs of sites or molecules.[26] The two best known quasi-chemical methods are COSMO-RS[27] and UNIFAC.[28] The COSMO-RS model represents liquid mixtures as a collection of interacting charged surface panels, whereas the UNIFAC model represents liquid mixtures as a collection of interacting functional groups. The parameters of both methods are optimized by fitting experimental data. We recently published a quasi-chemical method based on force fields: the Pair Configuration to Molecular Activity Coefficient (PAC-MAC) model.[29] The PAC-MAC model represents molecular mixtures as a collection of interacting molecular pairs. Activity coefficients and related mixing free energies are calculated using a large set of sampled molecular pair configurations.[29,30] The RMSE of PAC-MAC in the calculation of mixing free energies using the OPLS-AA force field turned out to be 0.153 kBT (or 0.43 kJ/mol), and the calculation time is only a fraction of the required simulation time of corresponding MD or MC simulations. Since only two-body interactions are calculated within PAC-MAC, in order to reduce calculation time, we expect the accuracy of MD and MC, which involve N-body interactions, to be higher. The presented research within this paper consists of three different components: 1. We perform an accuracy test of the OPLS-AA force field based on mixing free energies calculated using thermodynamic integration (TI) of MD simulations for an extensive set of equimolar binary mixtures and compare the results with the PAC-MAC model. 2. We improve the accuracy of the PAC-MAC method by optimizing force field parameters and PAC-MAC specific parameters. 3. We check whether or not the optimized force field parameters induce accuracy improvement of the mixing free energies calculated using TI of MD simulations. We reach a few general conclusions. First, a large data set of mixing free energies is useful for comparison between quasi-chemical thermodynamics and force field simulations and could be a new tool for further analysis of force field performance. Second, PAC-MAC as a force field based quasi-chemical method performs, after optimization, comparable to existing quasi-chemical thermodynamics. The performance is achieved without the need for quantum calculations (COSMO-RS[27]) or group assignments (UNIFAC[28]) but relies on well-established force field parameters. The amount of additional empirical parameters is very little in comparison with other quasi-chemical methods, especially in comparison with UNIFAC which contains hundreds of parameters optimized using experimental miscibility data. Subsequently, TI of MD simulations on the same data set indicates that mixtures of most chemical classes are modeled quite well by the OPLS-AA force field, even though such experimental data was not used in the original force field parameter optimization. Mixtures containing water are an exception. We indicate that a very modest modification of water charge assignment, by reduction of 6.4% in partial atomic charges, is already enough to improve the results on average by 0.3 kBT. Finally, PAC-MAC could potentially serve as a fast proxy method for MD simulations, with a correlation coefficient of 0.82 between both methods. Such correlation could already be sufficient for the use of PAC-MAC as a fast indicator for new intermolecular force field parameters in case of missing interactions. Full replacement of MD, for force field parameter improvement, is a tantalizing prospect but not achieved fully here. We briefly consider quantitatively to what extent correlations need to be improved further to achieve such goal in future research. The article is organized as follows. Within “Theoretical Basis”, we first explain the theory of thermodynamic integration for the calculation of free energies of mixing and then briefly summarize the PAC-MAC method. Subsequently, within “Results and Discussion”, we first test the accuracy of the OPLS-AA force field in the prediction of mixing free energies using TI of MD simulations. Then, we optimize a set of force field parameters using the PAC-MAC method. Finally, we calculate the predictive capacity of the obtained OPLS-PM force field using TI. All experimental and calculated free energies of mixing for 1053 binary mixtures are presented in the Supporting Information.

Theoretical Basis

In this section, we discuss the approach to calculate excess mixing free energies using TI of MD simulations. Also the formulation of the PAC-MAC method is briefly explained. For more information concerning the PAC-MAC method, we refer to two previously published articles.[29,30]

Thermodynamic Integration

We calculate the molar excess Gibbs free energy of mixing for an equimolar binary mixture of molecules A and B, so x = x = 0.5. The excess Gibbs free energy of mixing is given byin which the excess Helmholtz free energy of mixing Fex and the excess molar volume V are given by V, V, and V in eq represent the molar volumes of respectively pure component A, pure component B, and a mixture of A and B with equal mole fractions. The molar volumes are calculated using MD simulations in the NPT ensemble with a barostat set to a virtual pressure p of 1 atm. Fgas in eq represents the excess Helmholtz free energy caused by volume expansion or compression of an ideal gas mixture:The other free energy terms in eq are calculated using thermodynamic integration by switching on intermolecular interactions using a coupling parameter λ[17,20−22,31,32]in which ⟨...⟩λ indicates an ensemble average at a given value for λ, and U(λ) represents the total potential energy in a simulated frame of system I ∈ {A,B,AB}. The decoupled state, in which the molecules behave as an ideal gas, refers to λ = 0, and the coupled state, in which the molecules fully interact, refers to λ = 1. To avoid singularities, caused by the repulsive r–12 term in the Lennard-Jones potential, we choose a scaling of λ4 and λ2 for respectively the van der Waals and electrostatic interaction between two atoms i and j separated by a distance r(32)where ε and σ in eq are respectively the energy and distance parameter of the Lennard-Jones potential between atom i and atom j, and q in eq is the partial charge of atom i. Summation of uVdW (r,λ) and uElec (r,λ) over all combinations of atoms i and j in a simulated frame, using a cutoff radius of 12.5 Å, results in the total van der Waals and electrostatic energy:The integrand of eq can be split in a van der Waals and electrostatic part because other potential energy terms (bond, angle, and torsion energies) are not a function of λin which ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ represent an ensemble average of the total inter- and intramolecular van der Waals and electrostatic energies at a given value for λ. The ensemble average potential energies are evaluated using MD simulations in the NVT ensemble[20−22] at 11 evenly spaced values for λ: 0.0, 0.1, ..., 0.9, and 1.0. The integral of eq is obtained using a “not-a-knot” spline fitted through ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ.[33] Furthermore, the block averaging approach is used to predict the standard deviation of the calculated ensemble average potential energies and subsequently the standard deviation of the calculated mixing free energy.[34]

Simulation Setup

All MD simulations are performed using the Culgi software.[35] The force field parameters, used to calculate interactions between the atoms, are taken from the OPLS-AA force field.[10] Our protocol to obtain the excess mixing free energy of an equimolar binary mixture of molecules A and B is summarized below: 1. A cubic box, with an edge length of 30 Å, is filled with randomly oriented molecule A to an amount wherein the density is equal to the experimental density of A. If the total amount of molecules in the box is less than 120, then the edge length is increased to an extent at which 120 molecules are required for a density equal to the experimental density. 2. The potential energy of the box is minimized using the Quick-Min method for a maximum of 10000 steps.[36] An atom-based cutoff radius is set to 10.0 Å. 3. The system is equilibrated for 25 ps in the NVT ensemble. An Andersen thermostat is used to control the simulation temperature at the desired temperature.[37] A group-based cutoff is used to truncate long-range van der Waals and electrostatic interactions. The cutoff radius is set to 12.5 Å. We are aware that the use of a particle mesh Ewald (PME), to better incorporate long-range electrostatics, is more common in MD simulations.[12,14,17] However, a PME would increase the total calculation time by over a factor of 3. Moreover, the use of neutral subunits within the molecules in OPLS-AA is well suited for a group-based cutoff.[10] We expect the added value of a PME to be negligible in comparison with a group-based cutoff of 12.5 Å, in accordance with the results obtained by Piana et al.[38] 4. The equilibration of the system is continued for 25 ps in the NPT ensemble. The simulation pressure is set to 1.0 atm, controlled by the Andersen barostat.[37] 5. A production simulation is performed for 50 ps in the NPT ensemble to calculate the average box volume. The ensemble average box volume is used to calculate the molar volume V in eq and eq . 6. The box volume is rescaled and fixed for the upcoming MD simulations to the ensemble average box volume calculated in step 5. 7. The system is equilibrated for 25 ps in the NVT ensemble using a coupling parameter of λ = 1.0 for the van der Waals and electrostatic potential given in eq and eq respectively. 8. A production simulation is performed for 25 ps in the NVT ensemble using a coupling parameter of λ = 1.0 to calculate ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ in eq . 9. Steps 7 and 8 are repeated for other coupling parameters λ ∈ {0.9, 0.8, ..., 0.1}. 10. The free energy difference ΔF between a fully interacting condensed system and an ideal gas system is calculated using eq . 11. Steps 1–10 are repeated for a box filled with only molecule B and for a box filled with an equal amount of molecules A and B. 12. The excess molar Gibbs and Helmholtz free energy of mixing are calculated using respectively eq and eq .

PAC-MAC

The theoretical basis of PAC-MAC has been extensively explained in previous papers.[29,30] We present a general overview of the method in this section. The PAC-MAC method consists of three consecutive steps briefly explained below: surface generation, pair sampling, and free energy minimization.

Surface Generation

First, the surface of a molecule is defined by the outer surface of spheres around the atomic nuclei in the molecule. The radius of these spheres is given byin which σ is the Lennard-Jones distance parameter of atom k, and sseg is an atom-type independent multiplication factor with an optimized value of 0.62.[30] Subsequently, the generated molecular surface of a molecule I is divided in L surface panels with an area of about 0.5 Å2 each.

Pair Sampling

Second, a sampling of pair configurations is performed between all possible combinations of two molecules in the mixture. A pair configuration represents a possible composition of two nearest neighboring molecules in a condensed phase; therefore, the sampling is performed within the first coordination shell of the molecules. The total amount of sampled pair configurations between two molecules is m = 5 × 104, by default. Of each pair configuration i ∈ [1..m] between molecules I and J, we track two properties for the last step of the PAC-MAC method. The first property is the intermolecular energy w calculated using the OPLS-AA force field,[10] and the second property is a depiction of the surface panels j which are covered (o = 1) and uncovered (o = 0) by the neighboring molecule. See Figure for an illustration of the pair sampling procedure.

Figure 1

Examples of sampled pair configurations with occupied surface panels (black colored) and intermolecular energy for ethanol–ethanol (1), ethanol–acetonitrile (2), acetonitrile–ethanol (3), and acetonitrile–acetonitrile (4).

Free Energy Minimization

In PAC-MAC, the expression for the free energy of mixing is based on the quasi-chemical approximation of Guggenheim[24,39] and is given byThe 5 contributing free energy terms in eq are given by the entropy of mixing of an ideal solution (Fid), a modified Staverman–Guggenheim correction term (FSG),[30,40] the combinatorial expression according to the quasi-chemical approximation (Fcomb), the entropic contribution due to inhomogeneities in the occupation of the molecular surface (Fvac), and the total intermolecular energy (Fint). The expressions of the free energy terms in eq are, for a binary mixture containing molecules A and B, given byin which x, φ, and y are respectively the mole, volume, and coordination fraction of component I ∈ {A,B} in the mixture, z represents the average coordination number, x is the fraction of pair configuration i between molecules I and J in the mixture, A represents the surface area of surface panel j in component I, ⟨A⟩ is the average occupied area in an interaction on molecule I, xvac represents the unoccupied area fraction of surface panel j on molecule I, and xvac is the total unoccupied surface fraction of a molecule. Only the latter variable, xvac, included in eq , is an empirically obtained general parameter equal to 0.6. The expression for the free energy, eq , is minimized subject to the constraint stating the occupation fraction of each surface panel to be equal to the probability that a panel is covered by neighboring molecules:Finally, various thermodynamic miscibility properties can be obtained from the minimized expression for the free energy. Examples of these properties are activity coefficients, Flory–Huggins χ-interaction parameters, and vapor–liquid equilibrium phase diagrams. We refer to the Supporting Information of a previously published paper[30] for more details regarding the derivation of the activity coefficients. Furthermore, an analytical expression is derived for the calculation of the standard deviation of the obtained mixing free energy. Assuming the set of sampled pair configurations i ∈ [1..m] between molecules I ∈ {A,B} and J ∈ {A,B} to be uncorrelated, then the following relation holds for the variance of the mixing free energy:[41]

Results and Discussion

The excess Gibbs free energy of mixing, calculated using TI and eq , is compared with experimental data from the NIST database.[42] For a binary mixture, containing molecules A and B, the experimental excess free energy of mixing is calculated usingwith xliq and γ being respectively the mole fraction and activity coefficient of component I in the liquid phase. The activity coefficient of component I is calculated using the extended Raoult’s Law assuming gaseous idealityin which xgas represents the mole fraction of component I in the vapor phase, and P(xliq) is the vapor pressure of the binary mixture at mole fraction xliq taken from experimental vapor–liquid equilibrium diagrams at constant temperature. The used subset of the NIST database,[42] containing 1092 experimental excess free energies of mixing of equimolar binary mixtures, is presented in a previously published article.[30] Mixing free energies are not calculated with TI for 39 binary mixtures because intramolecular force field parameters are missing (32 binary mixtures) or the experimental vapor pressure (>5 atm) is far above the simulation pressure of 1 atm resulting in gas formation within the MD simulation (7 binary mixtures). Although the use of a higher simulation pressure would be justified in the latter case, because the pressure–volume work term pV is often negligible in condensed phase systems,[43] we decided to be consistent in the used pressure of 1 atm and remove the 7 mixtures from the data set. The combinations of molecules in the remaining set of 1053 binary mixtures are shown in Figure . The data set contains 159 different molecules which are clustered in 9 categories. The category “molecules containing oxygen (excluding alcohols)” contains 11 ketones, 4 aldehydes, 16 ethers, and 11 esters. The category “molecules containing nitrogen” contains 8 amines, 3 nitriles, and 3 nitrogen-containing aromatic compounds. Dimethyl disulfide, 2-ethoxyethanol, and hexafluoroisopropanol are three examples of the 19 molecules with different or multiple functional groups.

Figure 2

Combinations of molecules in the experimental data set containing 1053 binary mixtures. The 159 different molecules are clustered in 9 categories.

Combinations of molecules in the experimental data set containing 1053 binary mixtures. The 159 different molecules are clustered in 9 categories. A comparison of the calculated excess Gibbs free energy of mixing, by TI performed on MD simulations using the OPLS-AA force field, with experimental data for 1053 binary mixtures is shown in Figure . The error bar attached to each point indicates the predicted standard deviation of the calculation, obtained using block averaging.[34] The root mean squared error (RMSE) is calculated usingin which n is the amount of data points (n = 1053), and h is the index of a data point. The experimental and calculated data are presented in the Supporting Information. Note that mixtures with an obtained Gex above 0.69 kBT are metastable since this is the upper limit for binary mixtures to initiate phase separation. Phase separation is not observed in the performed MD simulations because of the relatively large surface tension that has to be overcome due to the limited size of the box.

Figure 3

Scatterplot of the excess Gibbs free energy of mixing: TI using the OPLS-AA force field versus experimental data for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations. Figure clearly shows correlation between the free energy obtained using TI and experimental data. The obtained correlation coefficient of 0.79 is lower than the correlation coefficient for the prediction of densities (ρ = 0.99) and enthalpies of vaporization (ρ = 0.94) using OPLS-AA.[12] However, this is expected for two reasons. First, the OPLS-AA force field parameters are optimized using experimental data of mainly enthalpies of vaporization and densities and not using experimental free energies of mixing. As a consequence, greater predictability is expected for enthalpies of vaporization and densities compared to mixing free energies. Second, the range of values for the excess free energy of mixing is much smaller than the range of values for the enthalpy of vaporization. The calculated mixing free energies vary between −0.5 kBT and 1.0 kBT, whereas the calculated enthalpies of vaporization are up to 40 kBT.[12] Therefore, a small absolute error can cause a big relative error. On the other hand, because of the smaller range of values, the observed RMSE of 0.132 kBT (or 0.37 kJ/mol) for the prediction of excess mixing free energies is much smaller than the RMSE of 6.5 kJ/mol for the prediction of enthalpies of vaporization. We find it remarkable that OPLS-AA gives such a small error, given that the data set was not included in the original parametrization. The accuracy of the OPLS-AA force field is even higher than indicated by the observed error, since the standard deviation of the TI calculation itself affects the observed RMSE. Two types of calculation errors contribute to the observed deviations from experiments: the error introduced by wrong values of the force field parameters and the standard deviation of the calculation itself. The average observed error is defined by RMSEobs, the average effective error induced by the force field is defined by RMSEeff, and the average predicted standard deviation of the TI calculations is defined by RMSEstd. We can derive a relation between the three RMSEs based on three assumptions. First, the standard deviations of the TI calculations are assumed to be independent of the errors induced by the force field. Second, the standard deviations of the TI calculations and the errors induced by the force field are assumed to be the only factors affecting the observed errors. Finally, the standard deviations of the TI calculations are assumed to be unbiased. If the three proposed assumptions are fulfilled, then the following relation holds between RMSEobs, RMSEeff, and RMSEstd:The root mean squared standard deviation of the TI calculations equals 0.064 kBT resulting in an effective RMSE for the OPLS-AA force field of 0.115 kBT if infinitely long MD simulations are performed. As shown in Figure , the OPLS-AA force field, as calculated by TI, performs quite well over all categories, except for binary mixtures containing water. The mixing free energies of binary mixtures containing water are, in general, overestimated by molecular simulations. The bias is caused by the used flexible TIP3P water model[44] for which the force field parameters are optimized using the enthalpy of vaporization and density of pure liquid water, so experimental miscibility data is not taken into account.[45] Jin et al. show a similar systematic overestimation of hydration free energies calculated using the AMBER force field and a SPC/E water model.[46] The largest overestimation is obtained for hexafluoroisopropanol–water at 298.15 K (Gex experimental: 0.026 kBT, Gex TI: 0.850 kBT). The representation of the strongly electron withdrawing trifluoromethyl groups by nonpolarizable partial charges of the OPLS-AA force field might also affect the results in this case. Another big outlier is 2,2,2-trifluoroethanol–tetrahydrofuran at 298.144 K (Gex experimental: −0.586 kBT, Gex TI: −0.009 kBT). In accordance with the obtained overestimation for hexafluoroisopropanol−water, this result might have been affected by a weak representation of trifluoromethyl groups within the OPLS-AA force field as well. In accordance with the results of Jedlovszky et al.,[20] we also observe acetone to be incorrectly immiscible with water at equal mole fraction. The observed mixing free energy is a positive 0.263 ± 0.083 kJ/mol at a temperature of 308 K. The Helmholtz free energy of mixing for an equal molar acetone–methanol mixture equals −1.075 ± 0.085 kJ/mol. Also this value is in accordance with the results of simulations performed by Jedlovszky et al., even though they fixed the internal coordinates of the atoms within the molecules; they calculated a Helmholtz free energy of mixing of −0.97 kJ/mol using the comparable OPLS-UA force field.[21] The PAC-MAC method predicts mixing free energies assuming incompressible fluids, so the excess molar volume V equals 0. As a consequence, there is no distinction between Gibbs and Helmholtz free energy of mixing in PAC-MAC: Fex = Gex. The pV term is usually negligible in condensed phase systems.[43] This is confirmed by the performed MD simulations: the observed root mean squared value of pV equals 4.4 × 10–5kBT (or 0.13 J/mol). In a previous publication, the total observed RMSE of PAC-MAC is proven to be 0.153 kBT, with a correlation coefficient of 0.695, for the calculation of excess free energies in comparison with experimental data.[30]Figure shows comparable results for the subset of 1053 binary mixtures. Of note is that Figure is almost the same as Figure 9 from our previously published paper.[30] We included Figure for easier comparison with the information from the MD simulations. There are two differences. First, the used database of binary mixtures for Figure is a subset of the used database for Figure 9 in ref (30). Second, each data point in Figure contains an error bar, indicating the predicted standard deviation of the calculation.

Figure 4

Scatterplot of the excess Gibbs free energy of mixing: PAC-MAC using the OPLS-AA force field versus experimental data for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations. It is shown in Figure that the standard deviation is usually within the size of a dot. The values calculated using PAC-MAC are precise: the root mean squared standard deviation equals 0.014 kBT. Therefore, the standard deviation only slightly reduces the effective RMSE to 0.150 kBT, calculated using eq , for an infinite amount of sampled pair configurations. The effective RMSE of PAC-MAC is 30% higher than the effective RMSE of TI caused by several assumptions made within the model. The three assumptions with the highest impact are neglecting multiple body interactions, neglecting interactions beyond the first coordination shell, and using an expression for the free energy which is not exact. MD simulations, which contain no additional assumptions besides the force field, are therefore always more accurate than PAC-MAC using the same force field. The benefit of the assumptions made within PAC-MAC is a great reduction of calculation time: a Gex calculation using PAC-MAC takes 10 min on a single core (Intel Xeon E5-2620), whereas the same calculation using TI takes 5 days. So the calculation speed is increased with a factor 700 at the expense of a 14% reduction in accuracy. In principle, the reduction of calculation time is even much greater. A Gex calculation using TI requires on average the calculation of 1.0 × 1010 molecular pair interactions, whereas a Gex calculation using PAC-MAC requires the calculation of only 2.0 × 105 molecular pair interactions. The calculation of the molecular pair interactions takes about 20% of the total calculation time of PAC-MAC. If we assume that the calculation time of a molecular pair interaction in PAC-MAC is equal to the calculation time of a molecular pair interaction in MD, then the calculation of mixing free energies can potentially be 1.0 × 104 times faster using PAC-MAC than using TI. Furthermore, in accordance with TI and shown in Figure , also PAC-MAC in general overestimates mixing free energies of binary mixtures containing water. The biggest outlier is again obtained for a mixture of hexafluoroisopropanol–water at 298.15K (Gex experimental: 0.026 kBT, Gex PAC-MAC: 0.712 kBT). Correlation in the predicted mixing free energy is expected since both methods use the same OPLS-AA force field. A proof of correlation between the results obtained by TI and PAC-MAC is given in Figure .

Figure 5

Scatterplot of the excess Gibbs free energy of mixing: TI versus PAC-MAC using the OPLS-AA force field for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations. Figure shows a correlation coefficient of 0.815 between TI and PAC-MAC for the calculation of mixing free energies. Moreover, no bias for mixtures containing water is observed. Indicating that the overestimation of Gex by PAC-MAC, shown in Figure , is, in accordance with TI, also caused by the TIP3P water force field and not by other assumptions made within the model. The high correlation between TI and PAC-MAC offers opportunities to optimize force field parameters, required in MD simulation, using PAC-MAC as a quick auxiliary method. An accurate predictive capacity is essential for a good auxiliary method.[47,48] Our protocol to optimize force field parameters consists of three consecutive steps: 1. The correlation between TI and PAC-MAC is increased in order to improve PAC-MAC as the predictive auxiliary method. 2. Force field parameters are optimized by minimizing the RMSE of PAC-MAC in comparison with experimental data. 3. The accuracy of the optimized force field parameters in the prediction of mixing free energies is tested using TI. The three above-mentioned steps are explained hereinafter.

Increasing the Correlation between TI and PAC-MAC

The correlation between TI and PAC-MAC is further increased by optimizing the xvac parameter in eq and by introducing 16 atom-type dependent sseg instead of a single general sseg parameter equal to 0.62 in eq . The different atom-types k are taken from the Dreiding force field[49] and are expressed in Daylight SMARTS notation.[50] The sseg and xvac parameters are optimized by minimizing the RMSE of PAC-MAC compared to TI, in the calculation of excess free energies of mixing, using the Gauss–Newton algorithm. An upper and lower limit for sseg is set to 0.8 and 0.4, respectively, to avoid unphysical values. An overview of the optimized sseg and xvac parameters is given in Table .

Table 1

Optimization of sseg and xvac Parametersa

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the parameter is present.

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the parameter is present. Figure shows a reduction of the RMSE and an increase in the correlation coefficient to respectively 0.109 kBT and 0.861 by using the optimized parameters shown in Table .

Figure 6

Scatterplot of the excess Gibbs free energy of mixing: TI versus PAC-MAC using the OPLS-AA force field and optimized sseg and xvac parameters for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations.

Optimization of Force Field Parameters

Subsequently, the accuracy of the PAC-MAC model is increased by optimizing force field parameters given the sseg and xvac parameters presented in Table . The tuned parameters contain 16 Lennard-Jones ε-values and 16 Lennard-Jones σ-values of the Dreiding atom-types k presented in Table . Note that the atom-type [H][!#7;!#8;!#9] refers to a hydrogen atom unable to form hydrogen bonds.[50] Hydrogen atoms attached to oxygen or nitrogen keep Lennard-Jones parameters ε = 0.0 kcal/mol and σ = 0.0 Å, in accordance with the OPLS-AA force field. Also atomic partial charges of the 18 most observed OPLS-AA charge groups in the used data set are optimized by scaling all partial charges of the atoms within the neutral charge group with a factor tCG. Other charge groups keep their original OPLS-AA atomic partial charges. A comparable optimization is performed by Jin et al. to fit hydration free energies using 40 atom-type dependent dispersion term scaling factors.[46] All 50 parameters are optimized by minimizing the RMSE of PAC-MAC compared to experimental data, in the calculation of excess free energies of mixing, using the Gauss–Newton algorithm. The optimization of too many parameters might result in overfitting. We comply with the rule of thumb that the number of data points should be at least 5 times bigger than the number of parameters.[51] Moreover, the optimized parameters may deviate a maximum of 20% from the initial parameters to avoid unphysical values. An overview of the optimized Lennard-Jones ε- and σ-parameters is given in Table .

Table 2

Optimization of Lennard-Jones ε- and σ-Parametersa

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the atom-type is present.

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the atom-type is present. An overview of the 18 most observed OPLS-AA charge groups in the used data set and corresponding optimized scaling factors tCG is given in Table .

Table 3

Optimization of the Atomic Partial Charge Scaling Factors tCG of the 18 Most Observed OPLS-AA Charge Groups in the Used Data seta

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the OPLS-AA charge group is present.

The atom-types k are written in Daylight SMARTS notation. The column “# In dataset” contains the amount of binary mixtures in which the OPLS-AA charge group is present. The obtained results for PAC-MAC using the 50 optimized force field parameters, shown in Table and Table and from now on referred to as the OPLS-PM force field, are presented in Figure .

Figure 7

Scatterplot of the excess Gibbs free energy of mixing: PAC-MAC using force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations. As shown in Figure , the total RMSE is reduced from 0.151 kBT to 0.099 kBT by the OPLS-PM force field. Binary mixtures containing water show, with the exception of a single data point, a big improvement; the positive bias obtained in Figure is not visible anymore. The impact is remarkable since the force field of water is only moderately modified (the atomic charges are reduced by 6.4%) indicating that small changes can have big effects. The calculated mixing free energies of the 71 binary mixtures containing water are on average reduced by 0.30 kBT using the OPLS-PM force field. Internal calculations showed that the change in van der Waals interactions and electrostatic interactions contributed respectively 31% and 69% to the average reduction of Gex. So the reduced atomic charges of the water molecule have the biggest impact on the change in calculated mixing free energies of the binary mixtures containing water. In accordance with Figure and Figure , the biggest outlier is again a mixture of hexafluoroisopropanol–water at 298.15 K (Gex experimental: 0.026 kBT, Gex PAC-MAC: 0.935 kBT). Hexafluoroisopropanol occurs only once in the used data set, and the atomic charges of the −CF3 groups remain unchanged since they are not present in the 18 most observed OPLS-AA charge groups; so the error is mainly influenced by the water force field. The calculated Gex increases from 0.712 kBT to 0.935 kBT as a sacrifice to reduce the error of the other 70 binary mixtures containing water. The adapted force field parameters have an influence on the properties calculated using MD simulations. Internal MD calculations show that the calculated density of water at 298.15 K is reduced from 1065.6 ± 1.7 kg/m3 for the flexible TIP3P model to 938.1 ± 1.4 kg/m3 for the OPLS-PM force field (experimental:[52] 997.06 ± 0.01 kg/m3). The calculated enthalpy of vaporization of water at 298.15 K is reduced from 49.05 ± 0.04 kJ/mol for the flexible TIP3P model to 35.94 ± 0.02 kJ/mol for the OPLS-PM force field (experimental:[53] 43.90 ± 0.04 kJ/mol).

Accuracy Test of the Optimized Force Field Parameters

It is tested whether or not the OPLS-PM force field is more accurate than the OPLS-AA force field in the calculation of Gex using TI. We use the obtained densities from the performed MD simulations with the OPLS-AA force field for the MD simulations with the OPLS-PM force field, because the OPLS-AA force field is optimized using experimental densities whereas the OPLS-PM force field is not. So only steps 7–12 are repeated of the protocol described in “Simulation Setup”. A comparison of the calculated excess Gibbs free energies of mixing, using force field parameters shown in Table and Table , with experimental data for 1053 binary mixtures is shown in Figure .

Figure 8

Scatterplot of the excess Gibbs free energy of mixing: TI using force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures. Error bars indicate the predicted standard deviation of the calculations. The OPLS-PM force field (Figure ) shows comparable results as the OPLS-AA force field (Figure ). The RMSE is slightly reduced by OPLS-PM (respectively 0.128 kBT versus 0.132 kBT); however, the correlation coefficient is also slightly reduced (respectively 0.769 versus 0.791). A RMSE of 0.128 kBT is expected since the obtained RMSE of 0.109 kBT for TI in comparison with PAC-MAC, shown in Figure , is the limiting accuracy (assuming PAC-MAC to predict experimental data perfectly). The following relation holds between the obtained RMSEs, assuming PAC-MAC and TI to be unbiased[41]in which ρ equals 0.497 and represents the correlation coefficient between the errors of PAC-MAC and TI in comparison with experimental data. RMSEPM–TI represents the deviation between PAC-MAC and TI, which is increased from 0.109 kBT to 0.117 kBT after the optimization of 50 force field parameters. Furthermore, RMSEPM–exp represents the deviation between PAC-MAC and experimental data, which is equal to 0.099 kBT and is shown in Figure . Finally, RMSETI–exp represents the deviation between TI and experimental data. Equation is presented graphically in Figure .

Figure 9

Graphical presentation of eq . Including values 0.099 kBT and 0.117 kBT for respectively RMSEPM–exp and RMSEPM–TI.

Graphical presentation of eq . Including values 0.099 kBT and 0.117 kBT for respectively RMSEPM–exp and RMSEPM–TI. Solving eq results in a value for RMSETI–exp equal to 0.129 kBT, supporting the obtained RMSE in Figure . The accuracy of Gex calculations with TI is hardly changed using the parameters shown in Table and Table . The correlation between PAC-MAC and TI has to be increased in order to use PAC-MAC as a force field optimization engine. On the other hand, using PAC-MAC, it is possible to obtain force field parameters with a comparable accuracy as the OPLS-AA force field for the calculation of Gex, without multiple iterations of time-consuming MD simulations.[54] So, if intermolecular OPLS-AA force field parameters are missing in a molecule, then PAC-MAC can be used as the auxiliary method to quickly estimate the missing force field parameters with comparable accuracy as the OPLS-AA force field. The 71 binary mixtures containing water are heavily affected by the adapted force field: the OPLS-AA force field mainly overestimates the mixing free energy, whereas the OPLS-PM force field mainly underestimates the mixing free energy (with the exception of the earlier discussed mixture hexafluoroisopropanol–water). The big impact again confirms the statement that small changes to the force field parameters can have big effects in the outcome. Figure illustrates two requirements to further increase the accuracy of the force field for the prediction of mixing free energies using PAC-MAC as the correlated auxiliary method. First, a higher correlation between PAC-MAC and TI will result in a reduction of RMSETI–exp. The correlation can be increased by using a physical approach or by using a statistical approach. In the physical approach, the PAC-MAC method is brought closer to TI by reducing disparities between both methods, for example by taking three-body interactions or long-range interactions into account. In the statistical approach, the correlation is increased by increasing the number of fitting parameters shown in Table . Perfect correlation will be obtained if the amount of fitting parameters is at least equal to the amount of experimental data points, for example by using binary mixture dependent sseg and xvac parameters. This statistical approach is used by Van Westen et al. to optimize force field parameters using the PC-SAFT equation of state as the correlated auxiliary function.[48] Second, a higher accuracy of the PAC-MAC method will result in a reduction of RMSETI–exp. Again, a physical or statistical approach can be used. In the physical approach, the PAC-MAC method is modified to get closer to reality not only by taking three-body interactions or long-range interactions into account but also by improving potential energy functions. In the statistical approach, the correlation is increased by increasing the number of tunable force field parameters shown in Table and Table . One can make a quantitative estimate whether such a goal is achievable. Equation can be used to calculate the required accuracy of PAC-MAC calculations to achieve a desired reduction in error. For example, suppose that PAC-MAC could be improved to the same level as the best UNIFAC methods (i.e., a RMSE of about 0.07 kBT compared with experiment). Perhaps such an improvement is not unrealistic, given prospects of including better packing models and potentially three-body interactions. In such cases, RMSEPM–TI could potentially be reduced to 0.08 kBT and RMSEPM–exp to 0.07 kBT, and then it follows that RMSETI–exp would be as low as 0.09 kBT. This is a tantalizing improvement of 30% in comparison with the current OPLS-AA force field. Such investigation is left for further research.

Conclusion

In this paper, excess free energies of mixing are calculated, by thermodynamic integration of MD simulations using the OPLS-AA force field, for an extensive and diverse set of 1053 binary mixtures. The obtained results are compared with Gex obtained from experimental data and the PAC-MAC method. Correlation with experimental data is obtained, and the observed RMSE of 0.132 kBT (and an effective RMSE, for infinitely long MD simulations, of 0.115 kBT) is much smaller than a RMSE of 2.6 kBT observed for the enthalpy of vaporization[12] or a minimum RMSE of 1.5 kBT observed for the free energy of solvation.[17,18] Furthermore, the flexible TIP3P water force field is proven to overestimate mixing free energies. The error of the PAC-MAC method is higher than the error of TI since PAC-MAC contains, besides the chosen force field, several other assumptions in comparison with MD simulations. However, an observed effective RMSE of 0.150 kBT is only 30% above TI indicating that the assumptions made within the method are plausible. The benefit of the small loss in accuracy is a potential 1.0 × 104 times faster calculation, making PAC-MAC a quick alternative for TI in the calculation of mixing free energies using classical force fields. The calculation speed of PAC-MAC allows us to increase its accuracy by optimizing force field parameters based on experimental miscibility data instead of enthalpies of vaporization or densities. The RMSE is reduced to 0.099 kBT using the optimized OPLS-PM force field. Especially the prediction of mixing free energies of mixtures containing water is greatly increased by the adapted water force field: in contrast to the TIP3P force field, a positive bias is not observed. The OPLS-PM force field, optimized without multiple iterations of time-consuming MD simulations,[54] also shows comparable accuracy as the OPLS-AA force field in the prediction of mixing free energies using TI of MD simulations. By increasing the correlation between PAC-MAC and TI or by increasing the accuracy of PAC-MAC, it should be possible to obtain even better force field parameters using PAC-MAC as an auxiliary function.

24 in total

1. Accuracy of free energies of hydration using CM1 and CM3 atomic charges.

Authors: Marina Udier-Blagović; Patricia Morales De Tirado; Shoshannah A Pearlman; William L Jorgensen
Journal: J Comput Chem Date: 2004-08 Impact factor: 3.376

2. Development and testing of a general amber force field.

Authors: Junmei Wang; Romain M Wolf; James W Caldwell; Peter A Kollman; David A Case
Journal: J Comput Chem Date: 2004-07-15 Impact factor: 3.376

3. How fast-folding proteins fold.

Authors: Kresten Lindorff-Larsen; Stefano Piana; Ron O Dror; David E Shaw
Journal: Science Date: 2011-10-28 Impact factor: 47.728

4. Practical Aspects of Free-Energy Calculations: A Review.

Authors: Niels Hansen; Wilfred F van Gunsteren
Journal: J Chem Theory Comput Date: 2014-06-06 Impact factor: 6.006

5. Prediction of Absolute Solvation Free Energies using Molecular Dynamics Free Energy Perturbation and the OPLS Force Field.

Authors: Devleena Shivakumar; Joshua Williams; Yujie Wu; Wolfgang Damm; John Shelley; Woody Sherman
Journal: J Chem Theory Comput Date: 2010-04-14 Impact factor: 6.006

6. COMPASS II: extended coverage for polymer and drug-like molecule databases.

Authors: Huai Sun; Zhao Jin; Chunwei Yang; Reinier L C Akkermans; Struan H Robertson; Neil A Spenley; Simon Miller; Stephen M Todd
Journal: J Mol Model Date: 2016-01-27 Impact factor: 1.810

7. Hierarchical atom type definitions and extensible all-atom force fields.

Authors: Zhao Jin; Chunwei Yang; Fenglei Cao; Feng Li; Zhifeng Jing; Long Chen; Zhe Shen; Liang Xin; Sijia Tong; Huai Sun
Journal: J Comput Chem Date: 2015-11-05 Impact factor: 3.376

8. Comparison of charge models for fixed-charge force fields: small-molecule hydration free energies in explicit solvent.

Authors: David L Mobley; Elise Dumont; John D Chodera; Ken A Dill
Journal: J Phys Chem B Date: 2007-02-10 Impact factor: 2.991