Literature DB >> 30763086

Deriving Force-Field Parameters from First Principles Using a Polarizable and Higher Order Dispersion Model.

Abstract

In this work we propose a strategy based on quantum mechanical (QM) calculations to parametrize a polarizable force field for use in molecular dynamics (MD) simulations. We investigate the use of multiple atoms-in-molecules (AIM) strategies to partition QM determined molecular electron densities into atomic subregions. The partitioned atomic densities are subsequently used to compute atomic dispersion coefficients from effective exchange-hole-dipole moment (XDM) calculations. In order to derive values for the repulsive van der Waals parameters from first principles, we use a simple volume relation to scale effective atomic radii. Explicit inclusion of higher order dispersion coefficients was tested for a series of alkanes, and we show that combining C6 and C8 attractive terms together with a C11 repulsive potential yields satisfying models when used in combination with our van der Waals parameters and electrostatic and bonded parameters as directly obtained from quantum calculations as well. This result highlights that explicit inclusion of higher order dispersion terms could be viable in simulation, and it suggests that currently available QM analysis methods allow for first-principles parametrization of molecular mechanics models.

Entities: Chemical Disease Gene

Year: 2019 PMID： 30763086 PMCID： PMC6581419 DOI： 10.1021/acs.jctc.8b01105

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Atomistic force fields are becoming increasingly important for explaining events in biological systems and for molecular design.[1−3] Through the use of GPU accelerated computing and specialized hardware it is now possible to simulate large biologically relevant systems on unprecedented time scales.[4,5] These simulations are reliant on the performance of the underlying force-field description of the relevant molecular interactions.[6,7] Next-generation atomistic force fields are polarizable force fields, which explicitly model electron density deformation by external electric fields stemming from environmental effects.[8,9] While still employing simple point potential functions, polarizable models can help improve our ability to describe typical properties of interest.[10] Parametrizing molecules for force fields can be laborious, particularly for polarizable force fields, which contain additional parameters to describe the inducible dipole moments.[11] In addition most force-field optimizations are underdetermined by nature as they typically require calibrating too many parameters on a relatively small set of observables. The ability of biomolecular force fields to describe processes and equilibria of interest is usually measured by the ability to reproduce experimentally determined observables in condensed and hydrated phases.[7] Considering that the number of model compounds for which high-quality experimental data are available can be limited,[12] this optimization problem cannot be simply solved by a broad extension of calibration data sets. As a consequence, minimizing the amount of free variables in the parameter optimization step can strongly accelerate force-field development. For that purpose bonded parameters describing bond stretching and angle bending have been directly and successfully derived in several automated parametrization protocols[12,13] from quantum mechanical (QM) Hessian calculations. Electrostatic parameters can be derived from a gas-phase ground-state calculation, either by fitting the molecular electrostatic potential on a grid around the compound resulting in electrostatic potential (ESP) derived charges[12−14] or by partitioning the computed electron density in an atoms-in-molecules (AIM) manner, where the difference between the effective number of electrons and the nuclear charge determines the partial charge of the atoms.[15−19] We recently showed that the additional electrostatic parameters in nonadditive force fields (polarizabilities of the inducible dipole moments) can be derived from combined QM/MM calculations in the condensed phase, and we successfully applied this approach in first parametrization studies.[20−22] A complicating factor in many force-field calibration efforts is the optimization of van der Waals parameters, which are typically fitted based on condensed-phase properties of a training set of molecules. Applying response theory to determine values for the attractive C6 dispersion parameter for use in molecular simulation is computationally intensive and impractical for larger molecules.[23] Moreover a partitioning into atom–atom contributions from a molecular dispersion strength is required, which can be nontrivial. As a result most of the available force fields use tabulated van der Waals parameters that are assigned based on local atom typing.[7,24−27] While significant progress has been made in automated force-field parametrization through smart site typing[28−30] and automatic calibration through gradient descent,[20,31] covering a large part of possible chemical space for, e.g., drug-like compounds is still difficult for calibration-based approaches with many free parameters. However, recent work by Cole et al.[32,33] to determine dispersion parameters for classical force fields from atomic reference values scaled based on AIM derived atomic volumes shows promise as an ab initio approach to estimate attractive van der Waals parameters. Recently Mohebifar et al.[34] used the exchange-hole–dipole moment (XDM) model and density functional theory (DFT) calculations at the PBE0/aug-cc-pVTZ level of theory to determine dispersion constants of 88 compounds. These constants were found to be systematically and significantly lower than the values currently used in biomolecular force fields.[34−37] This raises the question of whether many of these force-field parameter sets are currently in a suboptimal part of parameter space simply because traditional calibration efforts work on relatively small databases of compounds (due to time and/or experimental data quality constraints), where the minimal number of free parameters in the calibration is in the order of the number of relevant independent experimental observations. To answer this question, several complications in parameter optimization have to be kept in mind. Higher order dispersion terms that comprise interactions involving higher order multipole moments can contribute up to more than 30% of the total dispersion energy and cannot be ignored.[35] However, it would be hard to justify adding additional dispersion terms and associated free parameters (C8, C10) when already facing an underdetermined calibration problem. As a result C6 dispersion coefficients in force fields have to be large enough to compensate for the omission of higher order terms.[34,37] In addition, due to the omission of explicit polarization treatment, additive force fields partially encode polarization effects in their dispersion parameters. While the partial charge model in static-charge force fields is determined under polarizing conditions (e.g., from COSMO calculations[38]), a re-distribution of charges cannot fully envelop the full scope of polarization space (for example when using partial charges to describe a polarized flat molecule such as benzene). This leaves a potentially substantial role for explicitly polarizable models in further minimizing the number of free parameters in force-field calibration, as a more appropriate balance between electrostatic and dispersion forces could be achieved. A polarizable model can also aid in preventing possible inconsistencies when determining properties and parameters either in vacuum and/or condensed-phase environments. In the current work, nonbonded parameters are obtained from gas-phase electronic structure calculations, following the principle that any density shift under influence of electric fields should be (solely) modeled by explicitly treating electronic polarization. The set of (fixed) partial charges used in this study is hence determined in vacuo, preventing double counting of polarization contributions. When using a dielectric continuum with given dielectric permittivity (ϵmedium) to model solvation, one should take into account the self-polarization and distortion contributions, which cannot be represented well in a nonpolarizable force field. Therefore, as reported by Cole et al.,[32] the choice of ϵmedium may not represent solvation properly but rather reflect a state in which the contribution of polarization and distortion cancel out, leading to effective inclusion of induction effects. Similarly, if a condensed-phase environment is described by including static Bq (MM) charges in the QM Hamiltonian, one should consider that the static charges from force fields are typically corrected for polarization and distortion effects and are therefore producing a too small dipole moment of the solvent molecules included in the quantum calculation.[39] In this work we investigate the applicability of using a polarizable model that explicitly includes a higher order (C8) dispersion term together with a force-field parameter set directly derived from quantum calculations, to achieve a physical description of dispersion effects while simultaneously minimizing the number of free variables in parameter optimization. In order to quantum mechanically determine van der Waals parameters, we partition computed molecular electron densities over the available atoms such that we can use the XDM model to estimate dispersion coefficients C6 and C8 from atomic-based densities (ρa):[36] Calculating C6 and C8 for a pair of atoms i and j requires their multipole moment integral M2 and their static atomic polarizability α or α (derived here as described below). Repulsive van der Waals parameters can be estimated quantum mechanically by using (tabulated) free and (computed) in-molecule atomic volumes obtained from QM calculations on isolated atoms and ρa, respectively, to determine values for AIM static polarizabilities α:[32,35] Note that this approximation to derive α is inherently built into XDM. Using the Slater–Kirkwood approximation that links polarizability to atomic radii (r0),[40] Miller derived an approximation (eq ) to compute effective van der Waals radii from polarizabilities,[41] where the prefactor is empirically derived to reproduce van der Waals radii tabulated by Bondi.[42] Atomic repulsive van der Waals parameters C for use in combination with C6 and C8 can be directly determined from the zero-energy point at the van der Waals potential energy function as employed by us, eq . Note that use of different values for the exponent x in the repulsive van der Waals term are evaluated in this work. For the atomic decomposition of QM determined electron densities, we rely here on AIM approaches to obtain ρa’s from the molecular density. AIM methods assign each atom a weight to each of the grid points in the underlying density integration grid, under the condition that the division of electron density on each grid point is complete.[18] All subsequent analyses can therefore be performed at an atomic level, as required for a fully atomistic force field. Popular choices for AIM partitioning schemes include the original Hirshfeld method,[15] in which weights are assigned to each atom (for each integration grid point) based on the local density stemming from a pro-atom placed at the nuclear site. A clear disadvantage of Hirshfeld paritioning is that initial pro-atomic charge states are arbitrarily set and may be wrongly assigned.[18] Iterative Hirshfeld (Hirshfeld-I)[16] introduces an iterative scheme to overcome this limitation and interpolates the value for the pro-atomic states by using a linear combination of the two pro-atoms with closest values for their net atomic charge (as compared to the value obtained in the last iteration). We also evaluated use of the iterative-stockholder analysis (ISA) scheme,[43] where instead of employing a predefined functional form of the pro-atom, a spherical average of the local atomic density is used. As an additional AIM method we tested use of mimimal-basis-iterative-stockholder (MBIS),[17] in which the pro-atom density is expressed as a series of s-type slater functions. A more detailed description of the AIM methods considered here is given in a recently published review by Heidar-Zadeh et al.[18] As part of our efforts we compare the use of these four AIM schemes to partition molecular electron densities into atomic ones. The performance of the different schemes in combination with XDM is evaluated for a series of alkane model compounds (listed in Table ), which predominantly interact via dispersion. Encouragingly, we find that an optimal part of model parameter space for force-field calibration can be directly derived from quantum calculations. Note that in order to determine charge populations and assign partial atomic charges for our final model, we found here a fifth AIM scheme to be most suitable, i.e., DDEC3.[44,45] In this scheme, core and valence electrons are handled separately, and therefore it provides the best performing set of atomic charges in this work. DDEC3 was only used for partial charge assignment here, as its introduction into a XDM dispersion analysis is currently not feasible yet.

Table 1

Simulation Conditions for the 11 Alkane Systems Considered: Values for the Reaction-Field Dielectric Permittivity (ϵ0) and Simulation Temperature (T, in K; Based on the Measurement Temperature of the Density (ρ) and Heat of Vaporization (ΔHvap))a

compound	ϵ₀	T(H_vap)	T(ρ)
propane	1.80	231	231
butane	1.77	273	273
pentane	1.84	298	293
hexane	1.88	298	298
heptane	1.91	298	298
octane	1.95	298	298
nonane	1.96	298	298
isobutane	1.83	261	261
isopentane	1.85	298	293
3-methylpentane	1.89	298	298
3-ethylpentane	1.94	298	298

Reference data and measurement temperature taken from ref (64 and 65).

Methods

Polarizable System

An explicit polarizable model is used in the molecular dynamics (MD) simulations performed in this work. Inducible dipole moments were added to all atoms (including hydrogens) using the charge-on-spring (COS) model,[46,47] where positions of the massless virtual COS sites are iteratively optimized each time step in the simulation according to the local electric field, assuming isotropic and linear response. Virtual sites with a net charge of −8.0 e are used, neutralized by a counter charge of +8.0 e added to the polarizable atom in question. A Thole or other damping model is not used, and therefore all dipole interactions between 1,2 and 1,3 neighbors are excluded. During the local electric field calculations, 1,4 interactions are excluded as well, in accordance with the GROMOS implementation of the COS model.[46,48] Force constants of the springs connecting the COS and polarizable atoms are determined using atomic polarizabilities documented by Miller, with values of sp3 carbon and hydrogen atoms set to 1.061 × 10–3 and 0.387 × 10–3 nm3, respectively.[41]

Type Annotations

To enable accounting for the effect of local chemical environments on values for force-field parameters while keeping our model simple and transferable, atoms with similar chemical environment were clustered into type annotations for all bonded, van der Waals, and static electrostatic parameters. For the atomic charges we considered a chemical environment up to two bonds away to be identical and encoded this into local environment subgraphs.[49] For dispersion parameters the local environment subgraphs were reduced to a single-bond neighbor graph, significantly reducing the number of van der Waals types considered. Angle and covalent bond type assignments were based on the specific elements under consideration and did not include additional environmental rules, but dihedral assignments used van der Waals types instead, resulting in an exhaustive set of dihedral parameters.

Bonded Parameters

For the determination of bonded parameters in an automated fashion, geometry optimizations were performed and Hessians were calculated at the B3LYP/aug-cc-pVTZ level of theory using NWChem 6.8,[50] for all alkane molecules under consideration. Harmonic force constants for bonds and angles were computed using a Hessian matrix projection according to the modified Seminario approach.[51] Zero-energy bond lengths and angles were derived directly from the geometry-optimized coordinate sets. Force constants and zero-energy bond length and angle values were averaged for each respective bonded type. Because a completely new set of interaction parameters was used here, we needed to reoptimize the dihedral profiles. A GROMOS dihedral potential energy term was assigned to every dihedral type, with multiplicity and phase shift set to 3 and 0, respectively.[7] We performed dihedral potential energy surface scans at the B3LYP/aug-cc-pVTZ level of theory using GAMESS-US (version 2014) to determine the force constant associated with the amplitude.[52] A dispersion correction was introduced in these calculations using Grimme’s D3 model.[53] For each dihedral scan, 15° angle increments were applied and optimized geometries from unconstrained geometry optimizations at the same level of theory were used as starting structure by prerotating to a given dihedral angle value. Z-matrix constraints were used to keep dihedral angles constant during dihedral-scan geometry optimizations. Force constants for each dihedral were fitted by linear regression to the difference profile of the lowest and highest energy states, as obtained from the scan for the smallest alkane molecule in which the dihedral type was occurring.

XDM Calculations and Partial Charge Assignment

Gas-phase optimized coordinates (determined at the B3LYP/6-31G* level of theory) were downloaded from the ATB for a set of 11 small molecule training compounds.[12] Subsequently this starting geometry was refined using NWChem 6.8,[50] using a B3LYP functional and an augmented Dunning basis set aug-cc-pVTZ.[54,55] Because the postg program[56,57] was employed for the XDM postprocessing of the resulting densities, we stored them in Molden files and subsequently converted these to the postg recommended .wfx format using Molden2AIM.[56−58] The published version of postg only provides the option of Hirshfeld partitioning for AIM analysis prior to the calculation of atomic exchange dipole holes.[15] Therefore, postg was modified to accept an external Becke grid with corresponding Hirshfeld weights. Horton version 2.1[59] was used for the density partitioning following the Hirshfeld, Hirshfeld-I,[16] ISA,[43] and MBIS methods.[17] Both Hirshfeld and Hirshfeld-I require a pro-atom database with electron densities of the isolated elements in multiple charge states. For consistency these were generated using the same functional (B3LYP) and aug-cc-pVTZ basis set as used in the molecular density calculations. Partial charges were directly derived from the difference between the effective number of electrons assigned during AIM partitioning and the nuclear charge, making the charge model consistent with the dispersion parameter assignment.

MD Simulations

Pure liquid MD simulations of the alkane compounds were performed using GROMOS11 (release 1.4.0),[48] which was modified to include higher order dispersion coefficients. Starting geometries for all molecules were taken from the ATB,[12] and 1024 molecules were packed into periodic cubic boxes using the GROMOS++ tool ran_box.[12,60] After steepest descent energy minimization, initial atomic velocities were generated based on a Maxwell–Boltzmann distribution at 50 K and the systems were subsequently slowly heated in five subsequent NVT simulations of 20 ps each to reach the target simulation temperature (Table ). After this thermalization of the system, 1.5 ns of pre-equilibration time was used under production conditions. In all simulations, leap-frog integration was used to integrate the equations of motion with a 2 fs time step. Production simulations lasted in total 5 ns, and energies and coordinates were stored every 100 and 500 time steps, respectively. Constant temperature was maintained with a Berendsen weak coupling thermostat and a corresponding coupling constant of 0.1 ps.[61] For systems for which experimental densities (ρ) were not available at the same temperature as heats of varporization (ΔHvap), independent simulations were run at each temperature. In the production (and preproduction) simulations, pressure was coupled weakly to 1 atm using a Berendsen barostat[61] with a 0.5 ps coupling constant and an isothermal compressibility set to 4.575 × 10–4 kJ mol–1 nm–3. Center of mass motion of the systems was removed every 1000 time steps in a linear manner. All bond lengths were constrained using SHAKE with a relative geometric tolerance criterion of 10–4.[62] Interaction energies and forces were computed using a charge group based cutoff and a multiple time step algorithm: they were evaluated every time step for pairs of atoms within an inner cutoff of 0.8 nm, while contributions from pairs of atoms within an intermediate range (between 0.8 and 1.4 nm) were computed every five time steps. Long range electrostatic interactions beyond 1.4 nm were treated by the reaction-field approach, using dielectric constants for the pure liquids simulated as listed in Table .[63] Simulations of single-model compounds in the gas phase were coupled to a stochastic bath to maintain the target simulation temperature. The internal friction coefficient was set to 24 ps–1. Gas-phase production runs were in total 20 ns. Equation shows how for each alkane the enthalpy of vaporization (ΔHvap) was computed using the averaged potential energy from a condensed phase (Uconpot) and gas-phase simulation (Ugaspot), where Nmol, R, and T are the number of molecules in simulation, gas constant, and temperature of the system, respectively. Average potential energies were extracted from the energy trajectories using GROMOS++ tool ene_ana.[60]

Results and Discussion

Choice of AIM Method

To quantum mechanically derive model parameters at the atomic level of detail, we used and compared four popular and/or state-of-the-art AIM schemes. The first method considered is Hirshfeld partioning,[15] in which pro-atom densities of the isolated atoms are used to (noniteratively) partition the molecular density into atomic contributions. It is known that the pro-atom choice in Hirshfeld partitioning typically leads to an underestimated charge separation.[18] Consequently the alkanes considered in this work show relatively small values for the individual atomic charges when employing the Hirshfeld partitioning scheme, with partial charges ranging between −0.087 and 0.027 e, Supporting Information Table S1. Similarly, the corresponding dispersion parameters calculated from XDM show a small spread for different carbon atom types (Table ), consistent with results presented by Mohebifar et al.[34]

Table 2

C6 and C8 Dispersion Constants Calculated Using Four Different AIM Methods (Hirshfeld, Hirshfeld-I, MBIS, and ISA) for the Four van der Waals Atom Types Considered in This Worka

	C₆	C₈	r₀
Hirshfeld
HC	2.41	52.9	0.1275
CH1	18.49	675.1	0.1709
CH2	19.67	706.4	0.1698
CH3	21.11	748.0	0.1687
Hirshfeld-I
HC	1.88	38.4	0.1234
CH1	16.88	594.4	0.1688
CH2	24.72	1050.3	0.1766
CH3	33.99	1574.2	0.1820
MBIS
HC	0.99	14.3	0.1090
CH1	19.49	766.3	0.1714
CH2	26.82	1140.4	0.1766
CH3	37.65	1695.2	0.1820
ISA
HC	2.17	44.3	0.1231
CH1	9.60	255.0	0.1516
CH2	13.30	363.4	0.1576
CH3	20.06	624.3	0.1654

HC denotes a generic alkane hydrogen, while CHx denotes an sp3 carbon with x hydrogens bound to it. Column r0 lists predicted atomic radii derived from static polarizabilities using the Slater–Kirkwood approximation. C6 and C8 are given in atomic units; r0 in nanometer. In the iterative-Hirshfeld scheme, the atomic-based electron densities are determined in an iterative approach in which atomic weights at the integration grid points are updated every iteration based on the results of the previous one.[18] This approach has been shown to result in an improved description of molecular electrostatic potentials.[18] With Hirshfeld-I we find distinctly different dispersion coefficients for the carbon atoms depending on the number of attached hydrogens, Table . A larger part of the hydrogen–carbon boundary region is assigned to the carbon atoms resulting in significantly larger estimates for the negative partial charges and the dispersion coefficients of the carbons with multiple hydrogen atoms attached, when compared to the values obtained using noniterative Hirshfeld (Tables and S1). Minimal-basis-iterative-stockholder was suggested to be an improvement over Hirshfeld in partitioning electron densities of oxides, where the dependency on a pro-atom database was removed.[17] While direct performance of XDM on MBIS partitioned densities has to our knowledge not been reported in literature, results from previous work in which the Tkatchenko–Scheffler method was used to derive dispersion coefficients[66] suggest that XDM may also benefit from using this relatively new partitioning method. With MBIS we find slightly stronger charge separation between the carbon and hydrogen atoms when compared to Hirshfeld-I, resulting in larger partial charges for most atom types (Table S1) and smaller dispersion coefficients for the hydrogens (Table ). We also tested the iterative-stockholder analysis partitioning scheme, which was developed independently from Hirshfeld-I. Table shows that dispersion is highly underestimated by ISA compared to the other AIM partitioning schemes, probably due to the spherical averaged pro-atom densities used in ISA. Because of this discrepancy in the dispersion calculations, ISA was not considered further in deriving alkane models for the pure liquid simulations in the last part of this work. Similarly to applying damping functions, the evaluation of van der Waals interactions relies on using atomic van der Waals radii, which describe the distance at which the van der Waals interaction energy between two particles is zero. Values for these radii are not trivial to derive.[35] Typically they have been calibrated against pure liquid reference data, but having well-defined starting points for calibration is preferential. For that purpose we used eqs and 4 to quantum mechanically compute radii as starting points, in combination with the four different AIM partitioning schemes. The results are listed in Table and show in general a smaller variation in the calculated radii among the different AIM schemes than observed for the computed dispersion parameters.

Pure Liquid Performance

To verify the use of an additional dispersion term and of the van der Waals parameters derived from the QM, AIM, and XDM calculations, we performed pure liquid simulations of alkanes because their properties predominantly depend on the strength of involved dispersion interactions. In MD simulations the repulsive part of the van der Waals potential energy function is typically modeled with a C12 asymptotic function, chosen for computational efficiency. Here we opted to test which exponential term gave the best performance in combination with a C6 and C8 potential. We tested C11 to C14 repulsive functions (i.e., x in eq = 11, 12, 13, or 14) for a first selection of four alkanes (propane, butane, hexane, and isobutane) and for each of the AIM methods under consideration, Tables S2 and S3. From all 12 models, Hirshfeld-I combined with a C11 repulsive scheme appears the most promising with a root-mean-square deviation (rmsd) of computed values from experiment of 21.4 kg m3 and 1.15 kJ mol–1 for density (Table S2) and heat of vaporization (Table S3), respectively. While the C11 repulsive term is not commonly used in molecular simulation, there is a theoretical basis for this repulsive shape, as higher order dispersion calculations on hydrogen show that the odd indexed (e.g., C11 and C13) are repulsive, stemming from third order perturbation theory.[67] As summarized in Table , Hirschfeld-I thus outperformed the other tested combinations (Table ), in line with our finding that Hirschfeld-I performs well in predicting molecular C6 values when compared to the corresponding experimental (DOSD oscillator) estimates,[68]Table S4.

Table 3

Pure-Liquid Simulation Results for a Set of Four Alkanes, Using Dispersion Parameters Obtained from XDM and Our Predicted van der Waals Radiia

	ρ(exp)	ρ(sim)	ΔH_vap(exp)	ΔH_vap(sim)
Hirshfeld (C₁₂)
propane	582.5	565.4	18.87	20.04
butane	600.7	564.6	22.4	22.89
hexane	654.8	606.2	31.6	30.11
isobutane	594.0	587.8	21.42	23.50

rmsd		31.6		1.43
Hirshfeld-I (C₁₁)
propane	582.5	576.4	18.87	20.08
butane	600.7	569.0	22.4	22.58
hexane	654.8	627.2	31.6	31.28
isobutane	594.0	599.3	21.42	23.35

rmsd		21.4		1.15
MBIS (C₁₁)
propane	582.5	646.7	18.87	20.55
butane	600.7	618.4	22.4	22.82
hexane	654.8	712.7	31.6	33.52
isobutane	594.0	684.2	21.42	24.59

rmsd		63.1		2.0

For each of the AIM methods tested, use of the best performing repulsive exponential shape is listed (indicated in parentheses per AIM method). Experimental (exp)[64,65] and calculated values (sim) for densities (ρ) and heats of vaporization (ΔHvap) are given in kg m–3 and kJ mol–1, respectively, together with values for the root-mean-square deviation (rmsd) from experiment. Encouraged by the results for our set of four training compounds, we tested model performance for a larger range of alkanes, including multiple branched alkanes (Table S5). In general the computed values for ΔHvap are close to experiment (with a maximum deviation of 1.9 kJ mol–1), implying that we successfully predicted a region of parameter space with balanced interaction energies. However, for several of the branched alkanes (isopentane, 3-methylpentane, and 3-ethylpentane), calculated densities are significantly off from experiment with a rmsd of 50 kg m–1, Table S5. We traced this discrepancy to electrostatic interactions. The Hirshfeld-I model is dipole agnostic, and the resulting static molecular dipole moments were too large compared to their experimental estimates (approximately 0.3 D for the branched alkanes, while the value should be on the order of 0 D). Upon switching to the DDEC3 AIM method for determination of partial charges,[44,45] the in-molecule charge separations become less pronounced. In the DDEC3 scheme, core and valence electrons are explicit, which results in a more stable set of charges. Unsurprisingly our partial charges (Table S1) are similar to the ones used in the CHARMM Drude oscillator (DO) based polarizable force field for alkanes,[69] which were obtained from electrostatic potential based fitting and range from +0.095 to −0.177 e. For us, in terms of reproducing experimental values for density and heat of vaporization, DDEC3 was superior for charge determination over all other AIM methods tested. Using the DDEC3 derived point charge model, the results of our simulations are consistent with experiment (Table ), with a rmsd from experiment of only 19.9 kg m–3 and 0.64 kJ mol–1 for the density and heat of vaporization, respectively.

Table 4

Pure Liquid Performance of Our Hirshfeld-I Decomposition Based Model Using a C11 Repulsive Shape and Partial Atomic Charges Obtained with the DDEC3 AIM Methoda

compound	ρ(exp)	ρ(sim)	ΔH_vap(exp)	ΔH_vap(sim)
propane	582.5	574.8	18.87	19.60
butane	600.7	580.1	22.4	22.82
pentane	626.2	606.1	26.43	26.47
hexane	654.8	634.8	31.6	31.55
heptane	679.4	660.2	36.65	37.02
octane	698.6	679.7	41.49	42.01
nonane	713.8	695.0	46.4	46.45
iso-butane	594.0	599.2	21.42	23.1
isopentane	620.1	594.8	24.85	25.26
3-methylpentane	659.8	633.7	30.28	29.69
3-ethylpentane	693.8	668.2	35.22	35.30

rmsd		19.9		0.64

Experimental (exp)[64,65] and calculated (sim) values for the densities and heats of vaporization are given in kg m–3 and kJ mol–1, respectively.

Experimental (exp)[64,65] and calculated (sim) values for the densities and heats of vaporization are given in kg m–3 and kJ mol–1, respectively. Tables S6 and S7 compare the performance of our final model with other COS or Drude oscillator based polarizable force fields for hydrocarbons, i.e., the CHARMM-DO[69] and the GROMOS 45A3/COS alkane models.[70] Both were calibrated to reproduce experimental values for bulk liquid properties, and from a comparison with experiment, they perform slightly better than our model for the alkanes that were included both in this work and in ref (69) or (70) (Tables S6 and S7). We also find that both the CHARMM-DO and GROMOS 45A3/COS[70] force fields comprise effective C6 parameters that are significantly larger than those calculated by XDM (cf. Table S8), as was reported earlier by Mohebifar et al.[34] In terms of fitted and calculated effective radii, our model utilizes the same part of parameter space as the CHARMM-DO force field, in particular for the CH2 and CH3 groups; see Table S8 as well.

Conclusions

In this work we present a strategy to develop an all-atom force field in which every parameter is determined from first principles, including higher order van der Waals constants. For a set of 11 linear and branched alkanes we found that by adding a single higher order dispersion term (C8) to a polarizable model and using a C11 repulsive potential energy term, we reproduce experimental estimates for thermodynamic pure liquid properties of the alkanes with a rmsd of 19.9 kg m3 and 0.64 kJ mol–1 for density and heat of vaporization, respectively, when using a set of model parameters that was completely derived from (a series of) quantum calculations. For that purpose we included application of the XDM model to compute dispersion coefficients and a Slater–Kirkwood approach to determine atomic radii. To derive AIM values for these radii and coefficients, we found the Hirschfeld-I partitioning scheme well-suited to obtain van der Waals parameters. In a next study we will examine whether XDM dispersion calculations have sufficient accuracy to correctly predict dispersion parameters for a large variety of molecules of interest, including other model compounds for typical biomolecular building blocks. The notion that our strategy can provide high-quality starting points required for automated calibration of novel force fields (e.g., by using automated calibration procedures[71]) is encouraging. Furthermore we show that it is possible to capture higher order dispersion in a relatively simple and efficient model, where required changes to force-field and simulation software are minimal. According to recent studies in literature, effects from higher order dispersion may be important for a correct model description of, for example, the folding of disordered protein domains, which is an additional motivation for including such a term in our force fields.[72,73] While our work clearly demonstrates that it is possible to capture higher order dispersion terms explicitly in molecular simulation, it is undeniable that classical force fields with only a C6 dispersion term have been successful for years. We speculate that the reasons that C6 attractive potentials can be successful are 4-fold: (i) there is no exact definition for atomic radii that should be used, and therefore effective radii can be slightly lowered to deepen well depths (ϵ). (ii) C6 values can be raised to effectively include higher order dispersion coefficients, as addressed by Mohebifar et al..[34] (iii) The choice of combination rules can have a large impact on effective well depths for heterogeneous atom pairs. In this work we used additive combination rules where the total van der Waals distance between two heterogeneous atoms is the sum of their invididual radii (σ = r + r). However, in other force fields (e.g., GROMOS and OPLS) geometric combination rules for the repulsive part are used (, which effectively lowers van der Waals distances (σ) for heterogeneous atom pairs. Finally, (iv) the shape of the repulsive function is purely empirical, as exchange interactions cannot be mapped to simple classical mechanics functions. Having switched to C11 in this work provides additional damping of near- to medium-ranged interactions, thereby effectively reducing ϵ. Similarly, choosing a C13 or higher repulsive potential would deepen effective well depths, compensating for the omission of higher order dispersion terms. Here we showed that for dispersion driven systems inclusion of higher order dispersion terms can lead to efficient force-field parametrization. Therefore such an approach may be advantageous when both C6 and C8 can be calculated efficiently for the molecules of interest. In addition we speculate that by following a physical regime of parameter space for all atoms, we may create more coherent interactions between heterogeneous atom types, possibly reducing the number of special interaction parameters that are currently used in force fields.

5 in total

1. Data-Driven Mapping of Gas-Phase Quantum Calculations to General Force Field Lennard-Jones Parameters.

Authors: Sophie M Kantonen; Hari S Muddana; Michael Schauperl; Niel M Henriksen; Lee-Ping Wang; Michael K Gilson
Journal: J Chem Theory Comput Date: 2020-01-17 Impact factor: 6.006

2. A collection of forcefield precursors for metal-organic frameworks.

Authors: Taoyi Chen; Thomas A Manz
Journal: RSC Adv Date: 2019-11-13 Impact factor: 4.036

3. Deriving a Polarizable Force Field for Biomolecular Building Blocks with Minimal Empirical Calibration.

Authors: Koen M Visscher; Daan P Geerke
Journal: J Phys Chem B Date: 2020-02-19 Impact factor: 2.991

4. New scaling relations to compute atom-in-material polarizabilities and dispersion coefficients: part 1. Theory and accuracy.

Authors: Thomas A Manz; Taoyi Chen; Daniel J Cole; Nidia Gabaldon Limas; Benjamin Fiszbein
Journal: RSC Adv Date: 2019-06-19 Impact factor: 4.036

5. Atomic Partitioning of the MPn (n = 2, 3, 4) Dynamic Electron Correlation Energy by the Interacting Quantum Atoms Method: A Fast and Accurate Electrostatic Potential Integral Approach.

Authors: Mark A Vincent; Arnaldo F Silva; Paul L A Popelier
Journal: J Comput Chem Date: 2019-08-02 Impact factor: 3.376

5 in total