We have calculated the excess free energy of mixing of 1053 binary mixtures with the OPLS-AA force field using two different methods: thermodynamic integration (TI) of molecular dynamics simulations and the Pair Configuration to Molecular Activity Coefficient (PAC-MAC) method. PAC-MAC is a force field based quasi-chemical method for predicting miscibility properties of various binary mixtures. The TI calculations yield a root mean squared error (RMSE) compared to experimental data of 0.132 kBT (0.37 kJ/mol). PAC-MAC shows a RMSE of 0.151 kBT with a calculation speed being potentially 1.0 × 104 times greater than TI. OPLS-AA force field parameters are optimized using PAC-MAC based on vapor-liquid equilibrium data, instead of enthalpies of vaporization or densities. The RMSE of PAC-MAC is reduced to 0.099 kBT by optimizing 50 force field parameters. The resulting OPLS-PM force field has a comparable accuracy as the OPLS-AA force field in the calculation of mixing free energies using TI.
We have calculated the excess free energy of mixing of 1053 binary mixtures with the OPLS-AA force field using two different methods: thermodynamic integration (TI) of molecular dynamics simulations and the Pair Configuration to Molecular Activity Coefficient (PAC-MAC) method. PAC-MAC is a force field based quasi-chemical method for predicting miscibility properties of various binary mixtures. The TI calculations yield a root mean squared error (RMSE) compared to experimental data of 0.132 kBT (0.37 kJ/mol). PAC-MAC shows a RMSE of 0.151 kBT with a calculation speed being potentially 1.0 × 104 times greater than TI. OPLS-AA force field parameters are optimized using PAC-MAC based on vapor-liquid equilibrium data, instead of enthalpies of vaporization or densities. The RMSE of PAC-MAC is reduced to 0.099 kBT by optimizing 50 force field parameters. The resulting OPLS-PM force field has a comparable accuracy as the OPLS-AA force field in the calculation of mixing free energies using TI.
Molecular simulations
have made significant contributions in the
prediction of chemical processes and thermodynamic properties with
applications ranging from protein folding[1] to enhanced oil recovery.[2] The interactions
between atoms, which induce motion in classical molecular dynamics
(MD) simulations or translation in Monte Carlo (MC) simulations, are
calculated using force fields. In general, force fields consist of
simplified pairwise potential energy functions with associated atom-type
dependent parameters.[3] Various force fields
have been developed differing in genericity, coarsening, accuracy,
fitted properties, and the used set of molecules within the optimization.[4] We mention three well-known force fields: AMBER,
COMPASS, and OPLS.The AMBER force field is designed for simulation
of proteins and
nucleic acids.[5] AMBER was originally developed
as a united-atom force field; however, a more accurate all-atom representation
was published 2 years later.[6] An extended
version, including parameters optimized for organic molecules, is
known as the general AMBER force field (GAFF).[7] Within both AMBER and GAFF, the van der Waals and hydrogen bonding
parameters are obtained from crystal structures and lattice energies.
The atomic partial charges, required for electrostatic interactions,
are derived using ab initio quantum mechanics.The all-atom
COMPASS and COMPASS II force fields are developed
for simulations of organic molecules, inorganic small molecules, and
polymers.[4,8] The van der Waals parameters are obtained
by fitting enthalpy of vaporizations and densities, calculated using
MD simulations, to experimental data. The atomic partial charges are
derived using ab initio quantum mechanics and empirically adjusted
to take hydrogen bonding effects into account.The OPLS force
field (published in a united-atom version[9] and an all-atom version[10]) and improved
OPLS3 force field[11] are
created for liquid simulations containing organic molecules and proteins.
The van der Waals parameters are, comparable to COMPASS, optimized
using experimental liquid properties, mainly enthalpy of vaporizations
and densities. The atomic partial charges are derived using quantum
calculations as well as using experimental condensed-phase properties.A reliable MD or MC simulation fully depends on the quality of
the force field.[3] The accuracy of various
force fields is extensively tested by Van der Spoel et al. for various
thermodynamic properties.[12] The OPLS-AA
force field outperforms GAFF in the prediction of experimental enthalpy
of vaporization with a measured root-mean-square error (RMSE) of 6.5
kJ/mol versus 10.6 kJ/mol. For the calculation of hydration free energies
and solvation free energies using nonpolarizable force fields, the
errors are slightly lower: Jorgensen et al.[13] obtained a minimum average unsigned error (AUE) of 1.03 kcal/mol
(= 4.3 kJ/mol), Mobley et al.[14,15] obtained a minimum
RMSE of 1.00 kcal/mol (= 4.2 kJ/mol), Sherman et al.[16] obtained for OPLS_2005 an AUE of 1.10 kcal/mol (= 4.6 kJ/mol),
and Van der Spoel[17,18] obtained for GAFF a RMSE of 3.7
kJ/mol.For an accurate calculation of mixing free energies,
the error
should be significantly lower. The difference in free energy between
an ideal binary mixture and a phase separated system is maximum 1.7
kJ/mol at 298 K.[19] An extensive accuracy
test of mixing free energies calculated using MD or MC is, to our
knowledge, not published yet. Jedlovszky et al. did perform mixing
free energy calculations of several important binary mixtures using
various force fields.[20−22] Although the maximum absolute error was always lower
than 1 kBT (= 2.5 kJ/mol
at 298 K), the weakness of force fields was exposed. A mixture of
acetone and water turned out to be immiscible at 298 K for all considered
combinations of force fields with the exception of the combination
of the acetone model developed by Pereyra, Asar, and Carignano[23] with the TIP5P-E water model.[20] Also none of the force field combinations were able to
predict a positive entropy of mixing at a DMSO mole fraction below
0.8 for a mixture of DMSO and water.[22]A disadvantage of MD and MC simulations is the extensive computational
time which is often required to obtain precise results. A less time-consuming
approach to obtain mixing free energies is the use of methods based
on the quasi-chemical approximation of Guggenheim.[24] The method is described in several textbooks.[25] In short, quasi-chemical models represent condensed
molecular mixtures as a collection of independent interacting pairs
of sites or molecules. The probability to obtain an interacting pair
is derived from the theory of chemical reactions. The quasi-chemical
approximation is basically an improvement of the regular solution
model which assumes a random distribution of interacting pairs of
sites or molecules.[26] The two best known
quasi-chemical methods are COSMO-RS[27] and
UNIFAC.[28] The COSMO-RS model represents
liquid mixtures as a collection of interacting charged surface panels,
whereas the UNIFAC model represents liquid mixtures as a collection
of interacting functional groups. The parameters of both methods are
optimized by fitting experimental data.We recently published
a quasi-chemical method based on force fields:
the Pair Configuration to Molecular Activity Coefficient (PAC-MAC)
model.[29] The PAC-MAC model represents molecular
mixtures as a collection of interacting molecular pairs. Activity
coefficients and related mixing free energies are calculated using
a large set of sampled molecular pair configurations.[29,30] The RMSE of PAC-MAC in the calculation of mixing free energies using
the OPLS-AA force field turned out to be 0.153 kBT (or 0.43 kJ/mol), and the calculation time
is only a fraction of the required simulation time of corresponding
MD or MC simulations. Since only two-body interactions are calculated
within PAC-MAC, in order to reduce calculation time, we expect the
accuracy of MD and MC, which involve N-body interactions, to be higher.The presented research within this paper consists of three different
components:1. We perform an accuracy test of the OPLS-AA force
field based
on mixing free energies calculated using thermodynamic integration
(TI) of MD simulations for an extensive set of equimolar binary mixtures
and compare the results with the PAC-MAC model.2. We improve
the accuracy of the PAC-MAC method by optimizing
force field parameters and PAC-MAC specific parameters.3. We
check whether or not the optimized force field parameters
induce accuracy improvement of the mixing free energies calculated
using TI of MD simulations.We reach a few general conclusions.
First, a large data set of
mixing free energies is useful for comparison between quasi-chemical
thermodynamics and force field simulations and could be a new tool
for further analysis of force field performance. Second, PAC-MAC as
a force field based quasi-chemical method performs, after optimization,
comparable to existing quasi-chemical thermodynamics. The performance
is achieved without the need for quantum calculations (COSMO-RS[27]) or group assignments (UNIFAC[28]) but relies on well-established force field parameters.
The amount of additional empirical parameters is very little in comparison
with other quasi-chemical methods, especially in comparison with UNIFAC
which contains hundreds of parameters optimized using experimental
miscibility data. Subsequently, TI of MD simulations on the same data
set indicates that mixtures of most chemical classes are modeled quite
well by the OPLS-AA force field, even though such experimental data
was not used in the original force field parameter optimization. Mixtures
containing water are an exception. We indicate that a very modest
modification of water charge assignment, by reduction of 6.4% in partial
atomic charges, is already enough to improve the results on average
by 0.3 kBT. Finally,
PAC-MAC could potentially serve as a fast proxy method for MD simulations,
with a correlation coefficient of 0.82 between both methods. Such
correlation could already be sufficient for the use of PAC-MAC as
a fast indicator for new intermolecular force field parameters in
case of missing interactions. Full replacement of MD, for force field
parameter improvement, is a tantalizing prospect but not achieved
fully here. We briefly consider quantitatively to what extent correlations
need to be improved further to achieve such goal in future research.The article is organized as follows. Within “Theoretical Basis”, we first explain the theory of
thermodynamic integration for the calculation of free energies of
mixing and then briefly summarize the PAC-MAC method. Subsequently,
within “Results and Discussion”,
we first test the accuracy of the OPLS-AA force field in the prediction
of mixing free energies using TI of MD simulations. Then, we optimize
a set of force field parameters using the PAC-MAC method. Finally,
we calculate the predictive capacity of the obtained OPLS-PM force
field using TI. All experimental and calculated free energies of mixing
for 1053 binary mixtures are presented in the Supporting Information.
Theoretical Basis
In this section, we discuss the approach to calculate excess mixing
free energies using TI of MD simulations. Also the formulation of
the PAC-MAC method is briefly explained. For more information concerning
the PAC-MAC method, we refer to two previously published articles.[29,30]
Thermodynamic
Integration
We calculate the molar excess
Gibbs free energy of mixing for an equimolar binary mixture of molecules A and B, so x = x = 0.5. The excess Gibbs free energy of mixing is given byin which the excess Helmholtz
free energy
of mixing Fex and the excess molar volume V are given byV, V, and V in eq represent the molar volumes of
respectively pure component A, pure component B, and a mixture of A and B with equal mole fractions. The molar volumes are calculated using
MD simulations in the NPT ensemble with a barostat set to a virtual
pressure p of 1 atm. Fgas in eq represents
the excess Helmholtz free energy caused by volume expansion or compression
of an ideal gas mixture:The other free energy terms in eq are calculated using thermodynamic
integration by switching on intermolecular interactions using a coupling
parameter λ[17,20−22,31,32]in which ⟨...⟩λ indicates
an ensemble average at a given value for λ, and U(λ) represents the total
potential energy in a simulated frame of system I ∈ {A,B,AB}. The decoupled state, in which the molecules behave as an ideal
gas, refers to λ = 0, and the coupled state, in which the molecules
fully interact, refers to λ = 1. To avoid singularities, caused
by the repulsive r–12 term in the
Lennard-Jones potential, we choose a scaling of λ4 and λ2 for respectively the van der Waals and electrostatic
interaction between two atoms i and j separated by a distance r(32)where ε and σ in eq are respectively the energy
and distance
parameter of the Lennard-Jones potential between atom i and atom j, and q in eq is the partial charge of atom i. Summation of uVdW (r,λ) and uElec (r,λ) over all combinations of
atoms i and j in a simulated frame,
using a cutoff radius of 12.5 Å, results in the total van der
Waals and electrostatic energy:The integrand of eq can be split in a van der Waals and electrostatic
part because other potential energy terms (bond, angle, and torsion
energies) are not a function of λin which ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ represent
an ensemble average of the total inter- and intramolecular van der
Waals and electrostatic energies at a given value for λ. The
ensemble average potential energies are evaluated using MD simulations
in the NVT ensemble[20−22] at 11 evenly spaced values for λ: 0.0, 0.1,
..., 0.9, and 1.0. The integral of eq is obtained using a “not-a-knot” spline
fitted through ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ.[33] Furthermore, the block
averaging approach is used to predict the standard deviation of the
calculated ensemble average potential energies and subsequently the
standard deviation of the calculated mixing free energy.[34]
Simulation Setup
All MD simulations
are performed using
the Culgi software.[35] The force field parameters,
used to calculate interactions between the atoms, are taken from the
OPLS-AA force field.[10] Our protocol to
obtain the excess mixing free energy of an equimolar binary mixture
of molecules A and B is summarized
below:1. A cubic box, with an edge length of 30 Å, is
filled with randomly oriented molecule A to an amount
wherein the density is equal to the experimental density of A. If the total amount of molecules in the box is less than
120, then the edge length is increased to an extent at which 120 molecules
are required for a density equal to the experimental density.2. The potential energy of the box is minimized using the Quick-Min
method for a maximum of 10000 steps.[36] An
atom-based cutoff radius is set to 10.0 Å.3. The system
is equilibrated for 25 ps in the NVT ensemble. An
Andersen thermostat is used to control the simulation temperature
at the desired temperature.[37] A group-based
cutoff is used to truncate long-range van der Waals and electrostatic
interactions. The cutoff radius is set to 12.5 Å. We are aware
that the use of a particle mesh Ewald (PME), to better incorporate
long-range electrostatics, is more common in MD simulations.[12,14,17] However, a PME would increase
the total calculation time by over a factor of 3. Moreover, the use
of neutral subunits within the molecules in OPLS-AA is well suited
for a group-based cutoff.[10] We expect the
added value of a PME to be negligible in comparison with a group-based
cutoff of 12.5 Å, in accordance with the results obtained by
Piana et al.[38]4. The equilibration
of the system is continued for 25 ps in the
NPT ensemble. The simulation pressure is set to 1.0 atm, controlled
by the Andersen barostat.[37]5. A
production simulation is performed for 50 ps in the NPT ensemble
to calculate the average box volume. The ensemble average box volume
is used to calculate the molar volume V in eq and eq .6. The box volume is rescaled and fixed for the upcoming
MD simulations
to the ensemble average box volume calculated in step 5.7.
The system is equilibrated for 25 ps in the NVT ensemble using
a coupling parameter of λ = 1.0 for the van der Waals and electrostatic
potential given in eq and eq respectively.8. A production simulation is performed for 25 ps in the NVT ensemble
using a coupling parameter of λ = 1.0 to calculate ⟨UVdW(λ)⟩λ and ⟨UElec(λ)⟩λ in eq .9. Steps 7 and
8 are repeated for other coupling parameters λ
∈ {0.9, 0.8, ..., 0.1}.10. The free energy difference
ΔF between a fully
interacting condensed system and
an ideal gas system is calculated using eq .11. Steps 1–10 are repeated
for a box filled with only molecule B and for a box
filled with an equal amount of molecules A and B.12. The excess molar Gibbs and Helmholtz free energy
of mixing
are calculated using respectively eq and eq .
PAC-MAC
The theoretical basis of PAC-MAC has been extensively
explained in previous papers.[29,30] We present a general
overview of the method in this section. The PAC-MAC method consists
of three consecutive steps briefly explained below: surface generation,
pair sampling, and free energy minimization.
Surface Generation
First, the surface of a molecule
is defined by the outer surface of spheres around the atomic nuclei
in the molecule. The radius of these spheres is given byin which σ is the Lennard-Jones distance parameter
of atom k, and sseg is
an atom-type independent
multiplication factor with an optimized value of 0.62.[30] Subsequently, the generated molecular surface
of a molecule I is divided in L surface panels with an area of about 0.5
Å2 each.
Pair Sampling
Second, a sampling
of pair configurations
is performed between all possible combinations of two molecules in
the mixture. A pair configuration represents a possible composition
of two nearest neighboring molecules in a condensed phase; therefore,
the sampling is performed within the first coordination shell of the
molecules. The total amount of sampled pair configurations between
two molecules is m = 5 × 104, by
default. Of each pair configuration i ∈ [1..m] between molecules I and J, we track two properties for the last step of the PAC-MAC method.
The first property is the intermolecular energy w calculated using the OPLS-AA force field,[10] and the second property is a depiction of the
surface panels j which are covered (o = 1) and uncovered (o = 0) by the neighboring molecule. See Figure for an illustration
of the pair sampling procedure.
Figure 1
Examples of sampled pair configurations
with occupied surface panels
(black colored) and intermolecular energy for ethanol–ethanol
(1), ethanol–acetonitrile (2), acetonitrile–ethanol
(3), and acetonitrile–acetonitrile (4).
Examples of sampled pair configurations
with occupied surface panels
(black colored) and intermolecular energy for ethanol–ethanol
(1), ethanol–acetonitrile (2), acetonitrile–ethanol
(3), and acetonitrile–acetonitrile (4).
Free Energy Minimization
In PAC-MAC, the expression
for the free energy of mixing is based on the quasi-chemical approximation
of Guggenheim[24,39] and is given byThe 5 contributing free energy
terms in eq are given
by the entropy of mixing of an ideal solution (Fid), a modified Staverman–Guggenheim correction term
(FSG),[30,40] the combinatorial
expression according to the quasi-chemical approximation (Fcomb), the entropic contribution due to inhomogeneities
in the occupation of the molecular surface (Fvac), and the total intermolecular energy (Fint). The expressions of the free energy terms in eq are, for a binary mixture
containing molecules A and B, given
byin which x, φ, and y are respectively
the mole, volume, and coordination fraction of component I ∈ {A,B} in the mixture, z represents the average coordination number, x is the fraction of pair configuration i between molecules I and J in the mixture, A represents the
surface area of surface panel j in component I, ⟨A⟩ is the average occupied area
in an interaction on molecule I, xvac represents the unoccupied area fraction of surface panel j on molecule I, and xvac is the total unoccupied surface fraction of a molecule.
Only the latter variable, xvac, included
in eq , is an empirically
obtained general parameter equal to 0.6. The expression for the free
energy, eq , is minimized
subject to the constraint stating the occupation fraction of each
surface panel to be equal to the probability that a panel is covered
by neighboring molecules:Finally, various thermodynamic miscibility
properties can be obtained from the minimized expression for the free
energy. Examples of these properties are activity coefficients, Flory–Huggins
χ-interaction parameters, and vapor–liquid equilibrium
phase diagrams. We refer to the Supporting Information of a previously
published paper[30] for more details regarding
the derivation of the activity coefficients.Furthermore, an
analytical expression is derived for the calculation of the standard
deviation of the obtained mixing free energy. Assuming the set of
sampled pair configurations i ∈ [1..m] between molecules I ∈ {A,B} and J ∈ {A,B} to be uncorrelated, then the following
relation holds for the variance of the mixing free energy:[41]
Results and Discussion
The excess
Gibbs free energy of mixing, calculated using TI and eq , is compared with experimental
data from the NIST database.[42] For a binary
mixture, containing molecules A and B, the experimental excess free energy of mixing is calculated usingwith xliq and γ being respectively the mole fraction and
activity coefficient of component I in the liquid
phase. The activity coefficient of component I is
calculated using the extended Raoult’s Law assuming gaseous
idealityin which xgas represents
the mole fraction of component I in the vapor phase,
and P(xliq) is the vapor pressure
of the binary mixture at mole fraction xliq taken from experimental vapor–liquid equilibrium diagrams
at constant temperature.The used subset of the NIST database,[42] containing 1092 experimental excess free energies
of mixing of equimolar
binary mixtures, is presented in a previously published article.[30] Mixing free energies are not calculated with
TI for 39 binary mixtures because intramolecular force field parameters
are missing (32 binary mixtures) or the experimental vapor pressure
(>5 atm) is far above the simulation pressure of 1 atm resulting
in
gas formation within the MD simulation (7 binary mixtures). Although
the use of a higher simulation pressure would be justified in the
latter case, because the pressure–volume work term pV is often negligible in condensed phase systems,[43] we decided to be consistent in the used pressure
of 1 atm and remove the 7 mixtures from the data set.The combinations
of molecules in the remaining set of 1053 binary
mixtures are shown in Figure . The data set contains 159 different molecules which are
clustered in 9 categories. The category “molecules containing
oxygen (excluding alcohols)” contains 11 ketones, 4 aldehydes,
16 ethers, and 11 esters. The category “molecules containing
nitrogen” contains 8 amines, 3 nitriles, and 3 nitrogen-containing
aromatic compounds. Dimethyl disulfide, 2-ethoxyethanol, and hexafluoroisopropanol
are three examples of the 19 molecules with different or multiple
functional groups.
Figure 2
Combinations of molecules in the experimental data set
containing
1053 binary mixtures. The 159 different molecules are clustered in
9 categories.
Combinations of molecules in the experimental data set
containing
1053 binary mixtures. The 159 different molecules are clustered in
9 categories.A comparison of the calculated
excess Gibbs free energy of mixing,
by TI performed on MD simulations using the OPLS-AA force field, with
experimental data for 1053 binary mixtures is shown in Figure . The error bar attached to
each point indicates the predicted standard deviation of the calculation,
obtained using block averaging.[34] The root
mean squared error (RMSE) is calculated usingin which n is the amount
of data points (n = 1053), and h is the index of a data point. The experimental and calculated data
are presented in the Supporting Information. Note that mixtures with an obtained Gex above 0.69 kBT are
metastable since this is the upper limit for binary mixtures to initiate
phase separation. Phase separation is not observed in the performed
MD simulations because of the relatively large surface tension that
has to be overcome due to the limited size of the box.
Figure 3
Scatterplot of the excess
Gibbs free energy of mixing: TI using
the OPLS-AA force field versus experimental data for 1053 binary mixtures.
Error bars indicate the predicted standard deviation of the calculations.
Scatterplot of the excess
Gibbs free energy of mixing: TI using
the OPLS-AA force field versus experimental data for 1053 binary mixtures.
Error bars indicate the predicted standard deviation of the calculations.Figure clearly
shows correlation between the free energy obtained using TI and experimental
data. The obtained correlation coefficient of 0.79 is lower than the
correlation coefficient for the prediction of densities (ρ =
0.99) and enthalpies of vaporization (ρ = 0.94) using OPLS-AA.[12] However, this is expected for two reasons. First,
the OPLS-AA force field parameters are optimized using experimental
data of mainly enthalpies of vaporization and densities and not using
experimental free energies of mixing. As a consequence, greater predictability
is expected for enthalpies of vaporization and densities compared
to mixing free energies. Second, the range of values for the excess
free energy of mixing is much smaller than the range of values for
the enthalpy of vaporization. The calculated mixing free energies
vary between −0.5 kBT and 1.0 kBT, whereas
the calculated enthalpies of vaporization are up to 40 kBT.[12] Therefore,
a small absolute error can cause a big relative error. On the other
hand, because of the smaller range of values, the observed RMSE of
0.132 kBT (or 0.37 kJ/mol)
for the prediction of excess mixing free energies is much smaller
than the RMSE of 6.5 kJ/mol for the prediction of enthalpies of vaporization.
We find it remarkable that OPLS-AA gives such a small error, given
that the data set was not included in the original parametrization.The accuracy of the OPLS-AA force field is even higher than indicated
by the observed error, since the standard deviation of the TI calculation
itself affects the observed RMSE. Two types of calculation errors
contribute to the observed deviations from experiments: the error
introduced by wrong values of the force field parameters and the standard
deviation of the calculation itself. The average observed error is
defined by RMSEobs, the average effective error induced
by the force field is defined by RMSEeff, and the average
predicted standard deviation of the TI calculations is defined by
RMSEstd. We can derive a relation between the three RMSEs
based on three assumptions. First, the standard deviations of the
TI calculations are assumed to be independent of the errors induced
by the force field. Second, the standard deviations of the TI calculations
and the errors induced by the force field are assumed to be the only
factors affecting the observed errors. Finally, the standard deviations
of the TI calculations are assumed to be unbiased. If the three proposed
assumptions are fulfilled, then the following relation holds between
RMSEobs, RMSEeff, and RMSEstd:The root mean squared standard
deviation of the TI calculations equals 0.064 kBT resulting in an effective RMSE for the
OPLS-AA force field of 0.115 kBT if infinitely long MD simulations are performed.As shown in Figure , the OPLS-AA force field, as calculated by TI, performs quite well
over all categories, except for binary mixtures containing water.
The mixing free energies of binary mixtures containing water are,
in general, overestimated by molecular simulations. The bias is caused
by the used flexible TIP3P water model[44] for which the force field parameters are optimized using the enthalpy
of vaporization and density of pure liquid water, so experimental
miscibility data is not taken into account.[45] Jin et al. show a similar systematic overestimation of hydration
free energies calculated using the AMBER force field and a SPC/E water
model.[46] The largest overestimation is
obtained for hexafluoroisopropanol–water at 298.15 K (Gex experimental: 0.026 kBT, Gex TI: 0.850 kBT). The representation of
the strongly electron withdrawing trifluoromethyl groups by nonpolarizable
partial charges of the OPLS-AA force field might also affect the results
in this case.Another big outlier is 2,2,2-trifluoroethanol–tetrahydrofuran
at 298.144 K (Gex experimental: −0.586 kBT, Gex TI: −0.009 kBT). In accordance with the obtained overestimation for hexafluoroisopropanol−water,
this result might have been affected by a weak representation of trifluoromethyl
groups within the OPLS-AA force field as well.In accordance
with the results of Jedlovszky et al.,[20] we also observe acetone to be incorrectly immiscible
with water at equal mole fraction. The observed mixing free energy
is a positive 0.263 ± 0.083 kJ/mol at a temperature of 308 K.
The Helmholtz free energy of mixing for an equal molar acetone–methanol
mixture equals −1.075 ± 0.085 kJ/mol. Also this value
is in accordance with the results of simulations performed by Jedlovszky
et al., even though they fixed the internal coordinates of the atoms
within the molecules; they calculated a Helmholtz free energy of mixing
of −0.97 kJ/mol using the comparable OPLS-UA force field.[21]The PAC-MAC method predicts mixing free
energies assuming incompressible
fluids, so the excess molar volume V equals 0. As a consequence, there is no distinction
between Gibbs and Helmholtz free energy of mixing in PAC-MAC: Fex = Gex. The pV term is usually negligible in condensed phase systems.[43] This is confirmed by the performed MD simulations:
the observed root mean squared value of pV equals 4.4 × 10–5kBT (or 0.13 J/mol).In
a previous publication, the total observed RMSE of PAC-MAC is
proven to be 0.153 kBT, with a correlation coefficient of 0.695, for the calculation of
excess free energies in comparison with experimental data.[30]Figure shows comparable results for the subset of 1053 binary mixtures.
Of note is that Figure is almost the same as Figure 9 from our previously published paper.[30] We included Figure for easier comparison with the information from the MD simulations.
There are two differences. First, the used database of binary mixtures
for Figure is a subset
of the used database for Figure 9 in ref (30). Second, each data point in Figure contains an error bar, indicating
the predicted standard deviation of the calculation.
Figure 4
Scatterplot of the excess
Gibbs free energy of mixing: PAC-MAC
using the OPLS-AA force field versus experimental data for 1053 binary
mixtures. Error bars indicate the predicted standard deviation of
the calculations.
Scatterplot of the excess
Gibbs free energy of mixing: PAC-MAC
using the OPLS-AA force field versus experimental data for 1053 binary
mixtures. Error bars indicate the predicted standard deviation of
the calculations.It is shown in Figure that the standard
deviation is usually within the size of
a dot. The values calculated using PAC-MAC are precise: the root mean
squared standard deviation equals 0.014 kBT. Therefore, the standard deviation only slightly
reduces the effective RMSE to 0.150 kBT, calculated using eq , for an infinite amount of sampled pair
configurations. The effective RMSE of PAC-MAC is 30% higher than the
effective RMSE of TI caused by several assumptions made within the
model. The three assumptions with the highest impact are neglecting
multiple body interactions, neglecting interactions beyond the first
coordination shell, and using an expression for the free energy which
is not exact. MD simulations, which contain no additional assumptions
besides the force field, are therefore always more accurate than PAC-MAC
using the same force field.The benefit of the assumptions made
within PAC-MAC is a great reduction
of calculation time: a Gex calculation
using PAC-MAC takes 10 min on a single core (Intel Xeon E5-2620),
whereas the same calculation using TI takes 5 days. So the calculation
speed is increased with a factor 700 at the expense of a 14% reduction
in accuracy. In principle, the reduction of calculation time is even
much greater. A Gex calculation using
TI requires on average the calculation of 1.0 × 1010 molecular pair interactions, whereas a Gex calculation using PAC-MAC requires the calculation of only 2.0 ×
105 molecular pair interactions. The calculation of the
molecular pair interactions takes about 20% of the total calculation
time of PAC-MAC. If we assume that the calculation time of a molecular
pair interaction in PAC-MAC is equal to the calculation time of a
molecular pair interaction in MD, then the calculation of mixing free
energies can potentially be 1.0 × 104 times faster
using PAC-MAC than using TI.Furthermore, in accordance with
TI and shown in Figure , also PAC-MAC in general overestimates
mixing free energies of binary mixtures containing water. The biggest
outlier is again obtained for a mixture of hexafluoroisopropanol–water
at 298.15K (Gex experimental: 0.026 kBT, Gex PAC-MAC: 0.712 kBT). Correlation in the predicted mixing free energy is expected since
both methods use the same OPLS-AA force field. A proof of correlation
between the results obtained by TI and PAC-MAC is given in Figure .
Figure 5
Scatterplot of the excess
Gibbs free energy of mixing: TI versus
PAC-MAC using the OPLS-AA force field for 1053 binary mixtures. Error
bars indicate the predicted standard deviation of the calculations.
Scatterplot of the excess
Gibbs free energy of mixing: TI versus
PAC-MAC using the OPLS-AA force field for 1053 binary mixtures. Error
bars indicate the predicted standard deviation of the calculations.Figure shows a
correlation coefficient of 0.815 between TI and PAC-MAC for the calculation
of mixing free energies. Moreover, no bias for mixtures containing
water is observed. Indicating that the overestimation of Gex by PAC-MAC, shown in Figure , is, in accordance with TI, also caused
by the TIP3P water force field and not by other assumptions made within
the model.The high correlation between TI and PAC-MAC offers
opportunities
to optimize force field parameters, required in MD simulation, using
PAC-MAC as a quick auxiliary method. An accurate predictive capacity
is essential for a good auxiliary method.[47,48] Our protocol to optimize force field parameters consists of three
consecutive steps:1. The correlation between TI and PAC-MAC
is increased in order
to improve PAC-MAC as the predictive auxiliary method.2. Force
field parameters are optimized by minimizing the RMSE
of PAC-MAC in comparison with experimental data.3. The accuracy
of the optimized force field parameters in the
prediction of mixing free energies is tested using TI.The three
above-mentioned steps are explained hereinafter.
Increasing the Correlation
between TI and PAC-MAC
The
correlation between TI and PAC-MAC is further increased by optimizing
the xvac parameter in eq and by introducing 16 atom-type
dependent sseg instead of a single general sseg parameter equal to 0.62 in eq . The different atom-types k are taken from the Dreiding force field[49] and are expressed in Daylight SMARTS notation.[50] The sseg and xvac parameters are optimized by minimizing the RMSE of
PAC-MAC compared to TI, in the calculation of excess free energies
of mixing, using the Gauss–Newton algorithm. An upper and lower
limit for sseg is set to 0.8 and 0.4, respectively,
to avoid unphysical values. An overview of the optimized sseg and xvac parameters is given
in Table .
Table 1
Optimization of sseg and xvac Parametersa
The
atom-types k are written in Daylight SMARTS notation.
The column “# In
dataset” contains the amount of binary mixtures in which the
parameter is present.
The
atom-types k are written in Daylight SMARTS notation.
The column “# In
dataset” contains the amount of binary mixtures in which the
parameter is present.Figure shows a
reduction of the RMSE and an increase in the correlation coefficient
to respectively 0.109 kBT and 0.861 by using the optimized parameters shown in Table .
Figure 6
Scatterplot of the excess
Gibbs free energy of mixing: TI versus
PAC-MAC using the OPLS-AA force field and optimized sseg and xvac parameters for 1053
binary mixtures. Error bars indicate the predicted standard deviation
of the calculations.
Scatterplot of the excess
Gibbs free energy of mixing: TI versus
PAC-MAC using the OPLS-AA force field and optimized sseg and xvac parameters for 1053
binary mixtures. Error bars indicate the predicted standard deviation
of the calculations.
Optimization of Force Field Parameters
Subsequently,
the accuracy of the PAC-MAC model is increased by optimizing force
field parameters given the sseg and xvac parameters presented in Table . The tuned parameters contain
16 Lennard-Jones ε-values and
16 Lennard-Jones σ-values of the
Dreiding atom-types k presented in Table . Note that the atom-type [H][!#7;!#8;!#9]
refers to a hydrogen atom unable to form hydrogen bonds.[50] Hydrogen atoms attached to oxygen or nitrogen
keep Lennard-Jones parameters ε = 0.0 kcal/mol and σ =
0.0 Å, in accordance with the OPLS-AA force field.Also
atomic partial charges of the 18 most observed OPLS-AA charge groups
in the used data set are optimized by scaling all partial charges
of the atoms within the neutral charge group with a factor tCG. Other charge groups keep their original
OPLS-AA atomic partial charges. A comparable optimization is performed
by Jin et al. to fit hydration free energies using 40 atom-type dependent
dispersion term scaling factors.[46]All 50 parameters are optimized by minimizing the RMSE of PAC-MAC
compared to experimental data, in the calculation of excess free energies
of mixing, using the Gauss–Newton algorithm. The optimization
of too many parameters might result in overfitting. We comply with
the rule of thumb that the number of data points should be at least
5 times bigger than the number of parameters.[51] Moreover, the optimized parameters may deviate a maximum of 20%
from the initial parameters to avoid unphysical values. An overview
of the optimized Lennard-Jones ε- and σ-parameters is given in Table .
Table 2
Optimization of Lennard-Jones ε- and σ-Parametersa
The atom-types k are written in Daylight SMARTS notation. The column “#
In
dataset” contains the amount of binary mixtures in which the
atom-type is present.
The atom-types k are written in Daylight SMARTS notation. The column “#
In
dataset” contains the amount of binary mixtures in which the
atom-type is present.An
overview of the 18 most observed OPLS-AA charge groups in the
used data set and corresponding optimized scaling factors tCG is given in Table .
Table 3
Optimization of the
Atomic Partial
Charge Scaling Factors tCG of the 18 Most Observed
OPLS-AA Charge Groups in the Used Data seta
The atom-types k are written in Daylight
SMARTS notation. The column “# In
dataset” contains the amount of binary mixtures in which the
OPLS-AA charge group is present.
The atom-types k are written in Daylight
SMARTS notation. The column “# In
dataset” contains the amount of binary mixtures in which the
OPLS-AA charge group is present.The obtained results for PAC-MAC using the 50 optimized force field
parameters, shown in Table and Table and from now on referred to as the OPLS-PM force field, are presented
in Figure .
Figure 7
Scatterplot
of the excess Gibbs free energy of mixing: PAC-MAC
using force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures.
Error bars indicate the predicted standard deviation of the calculations.
Scatterplot
of the excess Gibbs free energy of mixing: PAC-MAC
using force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures.
Error bars indicate the predicted standard deviation of the calculations.As shown in Figure , the total RMSE is reduced from 0.151 kBT to 0.099 kBT by the OPLS-PM force field. Binary
mixtures containing
water show, with the exception of a single data point, a big improvement;
the positive bias obtained in Figure is not visible anymore. The impact is remarkable since
the force field of water is only moderately modified (the atomic charges
are reduced by 6.4%) indicating that small changes can have big effects.
The calculated mixing free energies of the 71 binary mixtures containing
water are on average reduced by 0.30 kBT using the OPLS-PM force field. Internal calculations
showed that the change in van der Waals interactions and electrostatic
interactions contributed respectively 31% and 69% to the average reduction
of Gex. So the reduced atomic charges
of the water molecule have the biggest impact on the change in calculated
mixing free energies of the binary mixtures containing water.In accordance with Figure and Figure , the biggest outlier is again a mixture of hexafluoroisopropanol–water
at 298.15 K (Gex experimental: 0.026 kBT, Gex PAC-MAC: 0.935 kBT). Hexafluoroisopropanol occurs only once in the used data set, and
the atomic charges of the −CF3 groups remain unchanged
since they are not present in the 18 most observed OPLS-AA charge
groups; so the error is mainly influenced by the water force field.
The calculated Gex increases from 0.712 kBT to 0.935 kBT as a sacrifice to reduce the error
of the other 70 binary mixtures containing water.The adapted
force field parameters have an influence on the properties
calculated using MD simulations. Internal MD calculations show that
the calculated density of water at 298.15 K is reduced from 1065.6
± 1.7 kg/m3 for the flexible TIP3P model to 938.1
± 1.4 kg/m3 for the OPLS-PM force field (experimental:[52] 997.06 ± 0.01 kg/m3). The calculated
enthalpy of vaporization of water at 298.15 K is reduced from 49.05
± 0.04 kJ/mol for the flexible TIP3P model to 35.94 ± 0.02
kJ/mol for the OPLS-PM force field (experimental:[53] 43.90 ± 0.04 kJ/mol).
Accuracy Test of the Optimized
Force Field Parameters
It is tested whether or not the OPLS-PM
force field is more accurate
than the OPLS-AA force field in the calculation of Gex using TI. We use the obtained densities from the performed
MD simulations with the OPLS-AA force field for the MD simulations
with the OPLS-PM force field, because the OPLS-AA force field is optimized
using experimental densities whereas the OPLS-PM force field is not.
So only steps 7–12 are repeated of the protocol described in
“Simulation Setup”. A comparison
of the calculated excess Gibbs free energies of mixing, using force
field parameters shown in Table and Table , with experimental data for 1053 binary mixtures is shown
in Figure .
Figure 8
Scatterplot
of the excess Gibbs free energy of mixing: TI using
force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures. Error
bars indicate the predicted standard deviation of the calculations.
Scatterplot
of the excess Gibbs free energy of mixing: TI using
force field parameters shown in Table and Table versus experimental data for 1053 binary mixtures. Error
bars indicate the predicted standard deviation of the calculations.The OPLS-PM force field (Figure ) shows comparable
results as the OPLS-AA force field
(Figure ). The RMSE
is slightly reduced by OPLS-PM (respectively 0.128 kBT versus 0.132 kBT); however, the correlation coefficient
is also slightly reduced (respectively 0.769 versus 0.791). A RMSE
of 0.128 kBT is expected
since the obtained RMSE of 0.109 kBT for TI in comparison with PAC-MAC, shown in Figure , is the limiting accuracy
(assuming PAC-MAC to predict experimental data perfectly). The following
relation holds between the obtained RMSEs, assuming PAC-MAC and TI
to be unbiased[41]in which ρ equals
0.497 and represents
the correlation coefficient between the errors of PAC-MAC and TI in
comparison with experimental data. RMSEPM–TI represents
the deviation between PAC-MAC and TI, which is increased from 0.109 kBT to 0.117 kBT after the optimization of 50 force
field parameters. Furthermore, RMSEPM–exp represents
the deviation between PAC-MAC and experimental data, which is equal
to 0.099 kBT and is shown
in Figure . Finally,
RMSETI–exp represents the deviation between TI and
experimental data. Equation is presented graphically in Figure .
Figure 9
Graphical presentation
of eq . Including
values 0.099 kBT and
0.117 kBT for respectively
RMSEPM–exp and RMSEPM–TI.
Graphical presentation
of eq . Including
values 0.099 kBT and
0.117 kBT for respectively
RMSEPM–exp and RMSEPM–TI.Solving eq results
in a value for RMSETI–exp equal to 0.129 kBT, supporting the obtained
RMSE in Figure .The accuracy of Gex calculations with
TI is hardly changed using the parameters shown in Table and Table . The correlation between PAC-MAC and TI
has to be increased in order to use PAC-MAC as a force field optimization
engine. On the other hand, using PAC-MAC, it is possible to obtain
force field parameters with a comparable accuracy as the OPLS-AA force
field for the calculation of Gex, without
multiple iterations of time-consuming MD simulations.[54] So, if intermolecular OPLS-AA force field parameters are
missing in a molecule, then PAC-MAC can be used as the auxiliary method
to quickly estimate the missing force field parameters with comparable
accuracy as the OPLS-AA force field.The 71 binary mixtures
containing water are heavily affected by
the adapted force field: the OPLS-AA force field mainly overestimates
the mixing free energy, whereas the OPLS-PM force field mainly underestimates
the mixing free energy (with the exception of the earlier discussed
mixture hexafluoroisopropanol–water). The big impact again
confirms the statement that small changes to the force field parameters
can have big effects in the outcome.Figure illustrates
two requirements to further increase the accuracy of the force field
for the prediction of mixing free energies using PAC-MAC as the correlated
auxiliary method. First, a higher correlation between PAC-MAC and
TI will result in a reduction of RMSETI–exp. The
correlation can be increased by using a physical approach or by using
a statistical approach. In the physical approach, the PAC-MAC method
is brought closer to TI by reducing disparities between both methods,
for example by taking three-body interactions or long-range interactions
into account. In the statistical approach, the correlation is increased
by increasing the number of fitting parameters shown in Table . Perfect correlation will be
obtained if the amount of fitting parameters is at least equal to
the amount of experimental data points, for example by using binary
mixture dependent sseg and xvac parameters. This statistical approach is used by Van
Westen et al. to optimize force field parameters using the PC-SAFT
equation of state as the correlated auxiliary function.[48] Second, a higher accuracy of the PAC-MAC method
will result in a reduction of RMSETI–exp. Again,
a physical or statistical approach can be used. In the physical approach,
the PAC-MAC method is modified to get closer to reality not only by
taking three-body interactions or long-range interactions into account
but also by improving potential energy functions. In the statistical
approach, the correlation is increased by increasing the number of
tunable force field parameters shown in Table and Table . One can make a quantitative estimate whether such
a goal is achievable. Equation can be used to calculate the required accuracy of PAC-MAC
calculations to achieve a desired reduction in error. For example,
suppose that PAC-MAC could be improved to the same level as the best
UNIFAC methods (i.e., a RMSE of about 0.07 kBT compared with experiment). Perhaps such
an improvement is not unrealistic, given prospects of including better
packing models and potentially three-body interactions. In such cases,
RMSEPM–TI could potentially be reduced to 0.08 kBT and RMSEPM–exp to 0.07 kBT, and then
it follows that RMSETI–exp would be as low as 0.09 kBT. This is a tantalizing improvement
of 30% in comparison with the current OPLS-AA force field. Such investigation
is left for further research.
Conclusion
In
this paper, excess free energies of mixing are calculated, by
thermodynamic integration of MD simulations using the OPLS-AA force
field, for an extensive and diverse set of 1053 binary mixtures. The
obtained results are compared with Gex obtained from experimental data and the PAC-MAC method. Correlation
with experimental data is obtained, and the observed RMSE of 0.132 kBT (and an effective RMSE,
for infinitely long MD simulations, of 0.115 kBT) is much smaller than a RMSE of 2.6 kBT observed for the enthalpy
of vaporization[12] or a minimum RMSE of
1.5 kBT observed for
the free energy of solvation.[17,18] Furthermore, the flexible
TIP3P water force field is proven to overestimate mixing free energies.The error of the PAC-MAC method is higher than the error of TI
since PAC-MAC contains, besides the chosen force field, several other
assumptions in comparison with MD simulations. However, an observed
effective RMSE of 0.150 kBT is only 30% above TI indicating that the assumptions made within
the method are plausible. The benefit of the small loss in accuracy
is a potential 1.0 × 104 times faster calculation,
making PAC-MAC a quick alternative for TI in the calculation of mixing
free energies using classical force fields.The calculation
speed of PAC-MAC allows us to increase its accuracy
by optimizing force field parameters based on experimental miscibility
data instead of enthalpies of vaporization or densities. The RMSE
is reduced to 0.099 kBT using the optimized OPLS-PM force field. Especially the prediction
of mixing free energies of mixtures containing water is greatly increased
by the adapted water force field: in contrast to the TIP3P force field,
a positive bias is not observed.The OPLS-PM force field, optimized
without multiple iterations
of time-consuming MD simulations,[54] also
shows comparable accuracy as the OPLS-AA force field in the prediction
of mixing free energies using TI of MD simulations. By increasing
the correlation between PAC-MAC and TI or by increasing the accuracy
of PAC-MAC, it should be possible to obtain even better force field
parameters using PAC-MAC as an auxiliary function.
Authors: Huai Sun; Zhao Jin; Chunwei Yang; Reinier L C Akkermans; Struan H Robertson; Neil A Spenley; Simon Miller; Stephen M Todd Journal: J Mol Model Date: 2016-01-27 Impact factor: 1.810