Literature DB >> 35675713

Replica-Exchange Enveloping Distribution Sampling Using Generalized AMBER Force-Field Topologies: Application to Relative Hydration Free-Energy Calculations for Large Sets of Molecules.

Salomé R Rieder¹, Benjamin Ries¹, Kay Schaller¹, Candide Champion¹, Emilia P Barros¹, Philippe H Hünenberger¹, Sereina Riniker¹.

Abstract

Free-energy differences between pairs of end-states can be estimated based on molecular dynamics (MD) simulations using standard pathway-dependent methods such as thermodynamic integration (TI), free-energy perturbation, or Bennett's acceptance ratio. Replica-exchange enveloping distribution sampling (RE-EDS), on the other hand, allows for the sampling of multiple end-states in a single simulation without the specification of any pathways. In this work, we use the RE-EDS method as implemented in GROMOS together with generalized AMBER force-field (GAFF) topologies, converted to a GROMOS-compatible format with a newly developed GROMOS++ program amber2gromos, to compute relative hydration free energies for a series of benzene derivatives. The results obtained with RE-EDS are compared to the experimental data as well as calculated values from the literature. In addition, the estimated free-energy differences in water and in vacuum are compared to values from TI calculations carried out with GROMACS. The hydration free energies obtained using RE-EDS for multiple molecules are found to be in good agreement with both the experimental data and the results calculated using other free-energy methods. While all considered free-energy methods delivered accurate results, the RE-EDS calculations required the least amount of total simulation time. This work serves as a validation for the use of GAFF topologies with the GROMOS simulation package and the RE-EDS approach. Furthermore, the performance of RE-EDS for a large set of 28 end-states is assessed with promising results.

Entities: Chemical

Mesh：

Substances：
Water

Year: 2022 PMID： 35675713 PMCID： PMC9241072 DOI： 10.1021/acs.jcim.2c00383

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 6.162

Introduction

In recent years, free-energy calculations (either absolute or relative) based on classical molecular dynamics (MD) simulations have started to play an increasingly important role in the field of computer-aided drug design.[1−9] There exist many well-established pairwise free-energy methods such as thermodynamic integration (TI),[10] free-energy perturbation (FEP),[11] Bennett’s acceptance ratio (BAR),[12] and multistate BAR (MBAR).[13] Approaches such as multisite λ-dynamics[14−16] and enveloping distribution sampling (EDS)[17,18] enable the calculation of pairwise free-energy differences for multiple end-states from a single simulation. While λ-dynamics uses the coupling parameter λ as a dynamic variable to connect the end-states, EDS is a pathway-independent method that samples a reference state “enveloping” all end-states. Recently, replica-exchange EDS (RE-EDS)[19−21] and accelerated EDS (A-EDS)[22,23] have been developed as extensions of EDS to simplify the parameter optimization and improve the performance of EDS. Both methods are implemented in the GROMOS software package.[24] Apart from the free-energy method, the quality of the underlying force field is crucial for the accuracy of free-energy calculations and of MD simulations in general.[25−28] In the past years, various tools have been developed to automate the otherwise laborious task of topology generation for small molecule ligands, such as antechamber,[29−31] the automated topology builder (ATB),[32,33] the fragment-based CombiFF,[34,35] general automated atomic model parametrization (GAAMP),[36,37] LigParGen,[38] open force field (SMIRNOFF and OpenFF),[39,40] ParamChem,[41−43] PRODRG,[44] R.E.D.,[45] or SwissParam.[46] MD simulation engines such as AMBER,[47−49] CHARMM,[50,51] GROMACS,[52,53] GROMOS,[24,54] or OpenMM[55,56] require specific file formats to describe the system topology and coordinates. In addition, there are also small differences in the functional form of the force fields or in the units used by different MD engines.[28] In many cases, tools are already available to translate between some of the different file formats, enabling, e.g., the use of AMBER topologies with GROMACS.[57−61] The calculation of (absolute or relative) hydration free energies serves as a straightforward test case to assess and compare the quality of different free-energy methods and to validate force fields.[62,63] Thanks to databases such as FreeSolv,[64,65] the Minnesota solvation database,[66] or the ATB server,[32,33] ample reference data is available for both experimental results as well as calculated values obtained with different force fields and free-energy methods. Furthermore, the calculation of hydration free energies is computationally far less expensive than, for example, that of binding free energies. In this work, a newly introduced GROMOS++[67] program amber2gromos, developed by the authors of this study, is described. It translates a topology from the AMBER prmtop[68] file format to a GROMOS topology, enabling users of the GROMOS MD engine to simulate systems with the AMBER or generalized AMBER (GAFF[30]) force fields. Extension to the OpenFF[39,40] family of force fields is straightforward. In the following, the underlying differences between the AMBER and GROMOS force fields are discussed, and the necessary conversions are described in detail. The correctness of the topology conversion is validated by comparison of single-molecule simulations in vacuum using GROMACS or GROMOS. Furthermore, two sets of small benzene derivatives are assembled from the FreeSolv[64,65] database: a small set of six molecules, labeled A, and a larger set of 28 molecules, labeled B. For these molecules, relative hydration free energies are calculated with TI[10] in GROMACS (set A only) and with RE-EDS[19−21] in GROMOS (sets A and B). The results are compared to each other and to the experimental and calculated values reported in the FreeSolv[64,65] database.

Theory

Differences between the AMBER and GROMOS Force Fields

The use of an automation tool such as AmberTools[49] (i.e., antechamber(29−31) and tleap) simplifies the process of topology generation for small organic molecules considerably. In order to use GAFF[30] topologies in GROMOS, they have to be converted to a GROMOS-compatible file format. There are several differences between the AMBER/GAFF and GROMOS[69−71] force fields. First, GAFF is an all-atom force field, whereas most GROMOS (compatible) force fields use united atoms (i.e., implicit hydrogens) for the aliphatic CH groups, to reduce the computational cost.[28] The GROMOS force fields are usually parametrized with the simple point-charge (SPC)[72] water model, whereas the AMBER force-field family is parametrized with the TIP3P[73] water model.[28] A minor difference is the use of different units, e.g., nm, degrees, and kJ mol–1 in GROMOS versus Å, radians, and kcal mol–1 in AMBER.[49,74] Second, there are several differences in the potential-energy function, i.e., the functional form of the force fields. Here, the subscripts “A” (AMBER) and “G” (GROMOS) are used to distinguish the terms/parameters of the two force-field families. In AMBER, harmonic bond stretching and bond-angle bending terms are used[28,30,49]where d is the distance between two bonded atoms, K is the harmonic bond force constant, d0, is the equilibrium distance, θ is the angle formed by three bonded atoms, K is the harmonic angle force constant, and θ0, is the reference bond angle. In GROMOS, harmonic bond stretching and bond-angle bending are also implemented. However, quartic bond stretching and cosine-harmonic bond-angle bending are used by default to increase computational efficiency.[28] They are defined as follows[75]with the parameters being defined analogously to the AMBER parameters. It is important to note that for AMBER/GAFF, the factor 1/2 in the harmonic bond stretching and bond-angle bending equations is already included in the force constants K and K (compare eqs and 2 with eqs and 4).[30,49] The harmonic force constants can be converted to the quartic and cosine-harmonic force constants, respectively, as[75]where kB is the Boltzmann constant, and Teff is an effective absolute temperature for the conversion (e.g., 300 K).[75] Another difference between AMBER and GROMOS is the potential-energy function used for the out-of-plane distortions. In AMBER, the same function is used for both proper and improper dihedral changes[28,30,49]In contrast, GROMOS uses different functional forms for proper and improper dihedral changes[28,75]The improper term is also used for out-of-tetrahedron distortions around the CH1 united atom. Both force-field families use the Lennard-Jones functional form for the van der Waals interactions[30,49,75]where r is the (minimum-image) distance between atoms i and j, A and C12, are the repulsion coefficients, and B and C6, are the dispersion coefficients. There are, however, differences in the combination rules used (geometric in GROMOS and Lorentz–Berthelot in AMBER[28]) and the handling of third-neighbor interactions. In AMBER/GAFF, the Lennard-Jones 1,4-interactions are scaled by a factor 1/2,[30,49] whereas GROMOS force fields contain a special set of parameters[76] (CS12 and CS6) for such interactions, typically involving a reduced repulsion coefficient. Using a scaling factor or reduced interaction parameters for third-neighbor interactions avoids having a too large repulsion in gauche conformations (relative to trans conformations).[75] In AMBER/GAFF, electrostatic 1,4-interactions are also scaled by a factor 1/1.2.[30,49] Such a scaling is not applied in GROMOS, although in some cases, third neighbors are excluded completely, for example for atoms that are in or attached to an aromatic ring.[76] Furthermore, in AMBER/GAFF, the factor[68]is included in the atomic charges for computational efficiency. Here, k is Coulomb’s constant, and ε0 is the permittivity of vacuum.

Relative Hydration Free Energies

The hydration free energy quantifies the free-energy change when a molecule is transferred from gas to water.[62,77] In this work, relative hydration free energies ΔΔGhyd are used to compare different free-energy methods against each other and against experimental values (Figure ). For three molecules i, j, and k, it holds that[78]where ΔGhyd is the hydration free energy of molecule i, ΔGvac is the free-energy difference between molecules i and j in vacuum, ΔGwat is the free-energy difference between the two molecules in water, and ΔΔG is the hydration free-energy difference between the two molecules (relative hydration free energy).

Figure 1

Thermodynamic cycle to calculate relative hydration free energies ΔΔGhyd for three small molecules i (aniline), j (1,3-dichlorobenzene), and k (anisole). Here, ΔGhyd is the hydration free energy of molecule i, ΔGvac is the free-energy difference between molecules i and j in vacuum, ΔGwat is the free-energy difference between the two molecules in water, and ΔΔG is the hydration free-energy difference between the two molecules (relative hydration free energy). The free-energy difference between two molecules can be calculated from multiple pairwise simulations (as shown on the left, e.g., with TI) or from one simulation with multiple molecules (as shown on the right, e.g., with RE-EDS). In classical MD simulations, hydration free energies are typically calculated with so-called alchemical free-energy methods.[64,79−81] Such methods transform a molecule (or its interaction with the environment) into another one via nonphysical pathways.[64] The following sections give a brief overview of the two free-energy methods used in the present study.

Thermodynamic Integration (TI)

TI is a well-established method to calculate free-energy differences.[10] For two end-states A and B, and using a linear coupling scheme, the potential energy of the system is defined asAt λ = 0 and λ = 1, the potential energy corresponds to that of end-state A and end-state B, respectively. This defines a λ-dependent path between the two end-states. After carrying out independent simulations at discrete λ-values between 0 and 1, the free-energy difference between states A and B can be estimated as[10]

Replica-Exchange Enveloping Distribution Sampling (RE-EDS)

RE-EDS is a multistate free-energy method,[19−21] which combines Hamiltonian replica exchange[82,83] (RE) with enveloping distribution sampling (EDS).[17,18] In EDS, a reference state V is defined based on N end-states aswhere s is the smoothness parameter (s > 0), E is a vector of energy offsets, and β = 1/(kBT). At high s-values (∼1.0), due to the negative exponent, the end-state with the lowest value of V(r) – E contributes the most to the sampling of V. As s decreases, it “smoothes” the potential-energy landscape such that all end-states contribute to the reference state, leading to an unphysical intermediate situation,[19] referred to as “undersampling”.[84] The energy offsets control the contribution of the end-states to the reference-state potential. Optimal energy offsets ensure equal weights of all end-states in the reference state. The free-energy difference between any pair of end-states can then be obtained from a single simulation as[17,18]For EDS simulations, it is essential to choose an optimal s-value along with optimal energy offsets to achieve adequate sampling of all end-states, which is nontrivial for more than two end-states.[84] RE-EDS enhances the sampling and simplifies the parameter optimization by simulating several replicas with different s-values in parallel and performing replica exchanges between them.[19] In recent studies, RE-EDS was successfully used to calculate relative binding and hydration free energies for molecules containing relatively large structural changes (i.e., R-group modifications and core-hopping transformations such as ring opening/closing and ring size changes).[21,85]

Methods

Implementation of amber2gromos

The amber2gromos program is a novel C++[86] tool integrated into the GROMOS++ package of programs.[67] It converts AMBER topologies to a GROMOS-compatible file format. For a given AMBER topology, the program parses the input topology file, converts the force-field parameters, and outputs a GROMOS topology. The conversion steps of the program are outlined in the Supporting Information. An example of an AMBER topology for aniline and the resulting GROMOS topology translated with amber2gromos can be found in the Supporting Information in Listings 1 and 2, respectively. The conversion process described in the Supporting Information generates a valid GROMOS topology from a given AMBER topology. However, there is still a difference between AMBER and GROMOS in the handling of the 1,4-electrostatic interactions (i.e., scaling by a factor 1/1.2[30,49] in AMBER). As 1,4-electrostatic interactions are not scaled in the GROMOS force fields, the scaling option is not supported within a GROMOS topology. Therefore, some changes had to be made to the GROMOS[24] source code. A new block type “AMBER” was added to the GROMOS input file, which contains a switch (0 = off, 1 = on) for the scaling of the electrostatic third-neighbor interactions together with the scaling parameter (e.g., 1.2, for scaling by 1/1.2). When the switch is off, the scaling parameter is set to 1.0 so that no scaling is applied. When a reaction field (RF) correction[87] is used in GROMOS to account for long-range electrostatic interactions, these interactions are calculated as[28,75]where q and q are the charges of atoms i and j, ε0 is the permittivity of vacuum, and R is the cutoff distance.[28,75,76,87] Here, C is a constant characterizing the effect of the RF continuum given by[75,87]where ε is the relative permittivity of the medium in which the simulation is performed, ε is the RF permittivity, and κ is the inverse Debye screening length.[75] The electrostatic interactions in the GROMOS[24,88] source code were changed towhere s corresponds to the AMBER scaling factor when i and j are third neighbors, and 1.0 otherwise. Note that in this expression, only the direct Coulombic interactions are scaled.[88,89] As the AMBER force fields do not use charge groups, by default, all the atoms in a molecule/residue are assigned to the same charge group for the simulations with RF electrostatics[87] in GROMOS (which is only appropriate for small molecules). In addition, there is also the option to consider each atom as defining its own charge group (resulting in an atom-based cutoff). The user can also prepare a so-called charge-group file, defining which (consecutive) atoms should be assigned to the same charge group. There are currently ongoing efforts to combine RE-EDS with the shifting-function based scheme developed by Kubincóva et al.[90] such that an atom-based cutoff can be employed in the RF electrostatics without cutoff artifacts. This will become the default choice when using the AMBER (and OpenFF) force fields in GROMOS in the near future, mitigating the requirement to define charge groups altogether.

RE-EDS Pipeline

Recently, Ries et al.[21] presented an improved pipeline to perform relative free-energy calculations with RE-EDS. The RE-EDS pipeline can be divided into three main phases: parameter exploration, parameter optimization, and production. The parameter exploration phase is used to generate relevant configurations for all end-states, to determine a lower bound for the s-values that ensures undersampling, and to obtain initial estimates for the energy offsets. During the parameter optimization, the distribution of s-values and the energy offsets E are refined such that all end-states are sampled nearly equally at s = 1. Finally, a production run is performed to calculate the free-energy differences between all pairs of end-states simultaneously. The entire workflow can be executed using the Python3[91]reeds module.[92] In this work, the RE-EDS pipeline was applied to the calculation of relative hydration free energies of benzene derivatives.

Data Sets

The main goals of this work were to validate the topology conversion with amber2gromos and the accompanying changes in the GROMOS MD engine, as well as to investigate the performance of RE-EDS for systems with many end-states. To this end, two sets of benzene derivatives with experimental hydration free energies in the FreeSolv[64,65] database were selected. This test system was chosen due to the common benzene core, the relatively small size (which limits deviations due to sampling issues), and the availability of calculated and experimental reference data. The potential of RE-EDS to handle larger perturbations has already been demonstrated in previous publications.[21,85]

Set A: Six Benzene Derivatives

As a first test set, six benzene derivatives were selected (Figure ). The number of end-states was deliberately kept small to efficiently test the implementation. The mol2 and frcmod files provided by FreeSolv[64,65] were used to generate AMBER topologies using tleap (AmberTools16).[49] The topologies were then converted to GROMOS format using amber2gromos and to GROMACS[53,93] format using ParmEd.[57] The force-field parameters of the original AMBER topologies and of the generated GROMOS/GROMACS topologies were compared manually. In addition, a single energy evaluation was performed for the individual molecules in vacuum/water using both GROMOS and GROMACS. It was verified that the covalent and nonbonded energy terms calculated by the two different MD engines were nearly identical. Longer MD simulations of 5 ns were then carried out for each molecule in vacuum to compare properties such as the system temperature and the different energy terms.

Figure 2

Set A consists of six benzene derivatives, selected from the FreeSolv[64,65] database. The molecule indices, the FreeSolv identifiers, the SMILES strings, and the names of the molecules can be found in Table S1 in the Supporting Information. Next, the free-energy differences in vacuum/water were calculated for all 15 molecule pairs from RE-EDS simulations containing six end-states. Complementary single-topology pairwise TI calculations were performed using GROMACS. The relative hydration free energies ΔΔGhyd = ΔGwat – ΔGvac were calculated from the RE-EDS and TI calculations. They were compared to the relative hydration free energies ΔΔGhyd = ΔGhyd – ΔGhyd obtained from the experimental and calculated hydration free energies reported in the FreeSolv[64,65] database.

Set B: 28 Benzene Derivatives

To investigate the performance of RE-EDS for a larger number of end-states, the set of benzene derivatives was increased to 28 (Figure ). This extended test set serves as a proof of principle that RE-EDS can be used to calculate free-energy differences for systems with larger numbers of end-states. It is currently the largest set of end-states considered in an RE-EDS simulation. Previous studies involved systems with five,[21] six,[85] nine,[20] and ten[19,85] end-states. The free-energy differences were calculated for all 378 molecule pairs in vacuum/water from RE-EDS simulations containing 28 end-states. Analogous to set A, the relative hydration free energies from the RE-EDS calculation were compared to the hydration free energies reported in the FreeSolv[64,65] database.

Figure 3

Set B consists of 28 benzene derivatives, selected from the FreeSolv[64,65] database. Set B was further subdivided into subset Ba (B1–B14) and subset Bb (B1 along with B15–B28). The molecule indices, the FreeSolv identifiers, the SMILES strings, and the names of the molecules can be found in Table S2 in the Supporting Information. To further investigate the performance of RE-EDS for such a large set of end-states, set B was subdivided into two smaller subsets, Ba and Bb. Subset Ba consisted of molecules B1–B14, and subset Bb consisted of molecule B1 together with molecules B15–B28. For the two subsets, RE-EDS simulations were carried out in vacuum/water to calculate the relative hydration free energies for all end-state pairs in both sets. The relative hydration free energies between the molecule pairs j-k that were not in the same subset were calculated via molecule B1, which was present in both subsets, as

Simulation Details

The AMBER topologies were generated using AmberTools16[49] and the GAFF 1.7[30] force field with the mol2 and frcmod files provided in the FreeSolv[64,65] database as a starting point. The atomic charges were generated using the AM1-BCC[94,95] approach. The input files for the single-topology TI calculations in GROMACS were prepared with FESetup.[96] The input files for the GROMOS RE-EDS simulations were prepared using amber2gromos as well as the GROMOS++[67] programs pdb2g96, red_top, and prep_eds. The molecules were aligned for the RE-EDS simulations using the RDKit[97] (details in Figures S2 and S8 in the Supporting Information). Since the coordinates of all molecules are present separately in the system, the molecules are, in principle, able to drift away from each other during a simulation. To ensure that the molecules remain well-aligned during the whole simulation, atomic distance restraints were applied. The pairwise distance restraints were generated with RestraintMaker.[85]RestraintMaker chooses reference distances r0 between restrained atoms according to the input alignment. For some molecule pairs, the ring atoms did not perfectly overlap in the initial alignment. The reference distances assigned by RestraintMaker were manually set to 0 for those pairs. Four atoms of each molecule were restrained to four atoms of two other molecules, forming a chain of pairwise distance restraints. For set B with 28 molecules, the chain arrangement allowed for relatively large deviations between the molecules furthest apart in the chain. Therefore, additional distance restraints were manually added for four molecule pairs on opposite sides of the chain. The workflow to generate the input files for RE-EDS simulations with GAFF[30] parameters is illustrated in Figure .

Figure 4

Schematic illustration of the RE-EDS input file preparation. The input files (topology, perturbed topology, coordinates, and distance restraints) for the RE-EDS simulations in GROMOS were created from the frcmod and mol2 files of the FreeSolv[64,65] database. This workflow can easily be extended to also perform the molecule parametrization (i.e., to generate mol2 and frcmod files) using antechamber and parmchk.[29−31] The RE-EDS simulations were performed with a modified version of GROMOS[24,98] 1.5.0 and the open-source Python3[91]reeds module.[92] The TI simulations were performed in GROMACS[53,93] version 2016.6. For the simulations in water, the TIP3P water model[73] was used. A single cutoff radius of 1.2 nm was used for the calculation of the nonbonded interactions. The integration time step was set to 2 fs, and the pairlist was updated every five steps. Long-range nonbonded interactions were calculated in GROMOS using a reaction-field correction[87] with εRF = 1 for the simulations in vacuum and εRF = 78.5 for the simulations in water.[99,100] In GROMACS, the long-range nonbonded interactions were calculated using a plain cutoff in vacuum and smooth particle mesh Ewald (SPME)[101−103] in water with a grid spacing of 0.1 nm and an interpolation of order 6. To maintain the temperature at 298.15 K and the pressure at 0.06102 kJ mol–1 nm–3 (≈1 atm), a Berendsen thermostat and barostat[104] were used in GROMOS for the simulations in water. For the GROMACS TI calculations and the RE-EDS calculations in vacuum, the leapfrog stochastic dynamics integrator was used, so that no temperature scaling was necessary. In the TI calculations, the pressure in water was kept constant at 1.01325 bar (≈1 atm) using a Parrinello–Rahman barostat.[105] All bonds were constrained with the LINCS algorithm[106] (for the TI calculations, 12 order) or the SHAKE algorithm[107] (for the RE-EDS calculations, relative tolerance 10–4), respectively, and harmonic bond-angle bending was employed. In the RE-EDS simulations in GROMOS, the force constant for the distance restraints was set to 5000 kJ/(mol·nm2).[85] The calculated hydration free energies reported in the FreeSolv[64,65] database were obtained from alchemical MBAR[13] simulations in GROMACS with 20 λ-values, each 5 ns long. In the first five intermediate states, the electrostatic interactions were modified, and in the last 15 states, the Lennard-Jones terms were changed.[65] In this work, the relative hydration free energies for set A were obtained from pairwise TI simulations and from RE-EDS simulations in vacuum/water. For set B, they were determined from RE-EDS simulations in vacuum/water. Here, it is important to note that the three methods used different pathways of the thermodynamic cycle to calculate the relative hydration free energies. Considering Figure , the MBAR calculations[64,65] correspond to the yellow pathways, the TI calculations correspond to the pink pathways, and the multistate RE-EDS calculations correspond to the blue pathways. The input files for the RE-EDS simulations can be found at https://github.com/rinikerlab/reeds/tree/main/examples/systems/benzenes_amber2gromos.

Set A

To obtain the pairwise free-energy differences in vacuum/water with TI calculations in GROMACS, 21 λ-values were used in vacuum, and 27 λ-values were used in water. In vacuum, they were spread in steps of 0.05 between 0 and 1. In water, they were spread in steps of 0.05 between 0.1 and 0.9 and more densely in steps of 0.02 around the extreme values (i.e., between 0 and 0.1 and between 0.9 and 1). The values were spread more densely around the extreme values for the simulations in water to smooth discontinuities in the ⟨∂V/∂λ⟩-curves. Such discontinuities were mainly observed between 0.0 and 0.1 but also occurred between 0.9 and 1.0 for molecule pairs A1 - A2, A1 - A3, and A2 - A3 (Figure S3 in the Supporting Information). The masses of the end-state atoms were not perturbed but kept to the masses of the first end-state. After a short steepest-descent energy minimization with maximally 5000 steps, the systems were equilibrated for 0.5 ns. The free-energy differences were calculated from five independent production runs in vacuum/water. The production runs were 5 ns long at each λ-point. To generate relevant configurations of all end-states for the RE-EDS calculations, six EDS simulations in vacuum/water of 2 ns length were carried out at s = 1. The energy offsets were biased toward one of the end-states (molecules) in each of the simulations by setting one energy offset to 500 kJ mol–1 and the others to −500 kJ mol–1. The optimized configurations were used for the starting-state mixing (SSM) approach.[21] Simultaneously, 21 EDS simulations of 0.2 ns length with s-values logarithmically distributed between 1 and 10–5 were used to determine the lower bound of s for the RE-EDS simulations, which were set to 0.0178 in vacuum and 0.01 in water. RE-EDS simulations of 0.8 ns length were carried out to estimate the energy offsets with 11 replicas in vacuum and 12 replicas in water. Next, the s-distribution was optimized to achieve frequent round trips. Following the SSM approach,[21] the s-values were redistributed to include a replica with s = 1 and optimized initial coordinates for each end-state (in total 12 replicas in vacuum, 13 in water). In vacuum, four replicas were added after one s-optimization step of 0.5 ns length. In water, two s-optimization steps of 0.5 and 1.0 ns lengths, respectively, were used, adding in total eight replicas. To achieve good sampling of all end-states, the initial energy offsets in water were rebalanced over two 0.5 ns RE-EDS simulations. Finally, the free-energy differences were calculated from five independent production runs of 0.5 ns in vacuum (11 replicas) and water (16 replicas). For the production runs, the additional replicas with s = 1, which were added during the optimization phase, were removed again.

Set B

For set B, 28 EDS simulations of 2 ns length were carried out to generate optimized configurations. The setup for the lower bound search was analogous to set A, and the values were set to 0.01 in vacuum and 0.0056 in water. For the energy offset estimation, 34 replicas were used in vacuum, and 35 replicas were used in water with 0.8 ns length. The s-optimization steps were analogous to set A, adding four replicas in vacuum and eight replicas in water. For set B, rebalancing was also required in vacuum. Both in vacuum and in water, four rebalancing steps were used, of 0.5 ns length each. The production run was 1 ns long in vacuum (34 replicas) and 2 ns long in water (39 replicas). As for set A, the obtained free-energy differences were averaged over five independent production runs. To generate optimized coordinates, 14 EDS simulations were performed for subset Ba, and 15 EDS simulations were performed for subset Bb. The lower bounds for s were 0.01 for subset Ba in vacuum, 0.032 for subset Bb in vacuum, and 0.01 for both subsets in water. For the energy offset estimation, 20 (Ba, vacuum), 19 (Bb, vacuum), 20 (Ba, water), and 21 (Bb, water) replicas were used. For both subsets in vacuum, one s-optimization step was sufficient. For subset Ba in water, two s-optimization steps of 0.5 and 1.0 ns, respectively, were necessary, while for subset Bb in water, only one s-optimization step was needed. In vacuum, three energy offset rebalancing steps were carried out for both subsets. In water, four rebalancing steps were used for subset Ba, and three steps were used for subset Bb. Finally, the production runs were 1 ns long in vacuum (20 replicas for Ba, 19 replicas for Bb) and 2 ns long in water (24 replicas for Ba, 21 replicas for Bb). Again, the obtained free-energy differences were averaged over five independent production runs.

Analysis

The analysis of the simulations was carried out using GROMOS++[67] and PyGromosTools.[108] The following Python packages were used for visualization and analysis: Matplotlib,[109] mpmath,[110] NumPy,[111] Pandas,[112] SciPy,[113] and Seaborn.[114] For all sets/subsets, the root-mean-square error (RMSE), the mean absolute error (MAE), and the Spearman[115] correlation coefficient were calculated between the different simulation methods and the experimental values.

Results

Validation of amber2gromos

For the molecules of set A, the manual topology comparison showed that the topologies generated by amber2gromos were almost identical to the GROMACS topologies generated by ParmEd[57] (apart from differences in units, functional forms, and slight numerical differences). Apart from negligible numerical differences, the potential-energy terms of the 0 integration step in vacuum and water were identical for the simulations in GROMOS and in GROMACS. After this initial validation, simulations of 5 ns length were performed in vacuum. The distributions of different energy terms as well as the system temperature were compared. They were all qualitatively similar, with almost identical mean values (Figure and Figure S1 in the Supporting Information).

Figure 5

Potential-energy distributions of single-molecule simulations of set A based on 5 ns vacuum simulations in GROMOS[24] (orange bars) and GROMACS[53] (pink lines). The topologies for the simulations are based on the AMBER topologies taken from the FreeSolv[64,65] database. The GROMACS topologies were translated with ParmEd,[57] and the GROMOS topologies were converted with amber2gromos. Energies were written every 100 time steps (i.e., every 200 fs), and the first 1.25 ns of the simulations were discarded as equilibration.

Calculation of Relative Hydration Free Energies for Set A

For set A, the 15 pairwise free-energy differences in vacuum/water obtained from the RE-EDS calculations were first compared to the ones from the TI calculations in GROMACS. With an RMSE of 1.9 kJ mol–1 (vacuum) and 2.0 kJ mol–1 (water) and an MAE of 1.6 kJ mol–1 (vacuum) and 1.7 kJ mol–1 (water), the results agreed well. The Spearman correlation coefficient was 1.00 for the vacuum and the water simulations (Figure S4 in the Supporting Information). Next, the relative hydration free energies from the different sources (i.e., TI in GROMACS, RE-EDS in GROMOS, MBAR in GROMACS,[65] and experimental[64,65]) were compared. The results from all three simulation methods agreed well with the other calculated results, as well as with the experimental values (Figure and Table ). The RMSEs against the experimental results were 2.4 kJ mol–1 (TI), 2.4 kJ mol–1 (RE-EDS), and 3.1 kJ mol–1 (MBAR). The corresponding MAEs were 2.0 kJ mol–1 (TI), 2.1 kJ mol–1 (RE-EDS), and 2.7 kJ mol–1 (MBAR), respectively. The calculated relative hydration free energies also had a high correlation with the experimental results, with Spearman correlation coefficients ranging between 0.93 and 0.94. The full details are provided in Table S3 in the Supporting Information.

Figure 6

Table 1

Set A: Overview of Statistical Metrics (RMSE, MAE, and Spearman Correlation Coefficients) with Respect to the Experimental Results and Total Simulation Time for the Different Free-Energy Methodsa

	ΔΔG_hyd^MBAR(64,65)	ΔΔG_hyd^TI	ΔΔG_hyd^RE-EDS
RMSE [kJ mol^–1]	3.1 ± 0.4	2.4 ± 0.4	2.4 ± 0.3
MAE [kJ mol^–1]	2.7 ± 0.3	2.0 ± 0.4	2.1 ± 0.3
r_Spearman	0.94	0.93	0.94
t_preparation [ns]	18	360	101.3
t_production [ns]	600	3600	13.5

The RE-EDS and TI results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. The uncertainties of the RMSE and MAE values were estimated from the distribution of RMSE and MAE when a random selection of up to four molecules was removed from the calculations (5000 repetitions). The accumulated simulation time is split into preparation (pre-processing, equilibration) and production time. Table S3 in the Supporting Information contains the complete table.

Comparison of the relative hydration free energies of set A: RE-EDS (ΔΔGhydRE-EDS) versus MBAR (ΔΔGhydMBAR) (left) and RE-EDS versus experiment (ΔΔGhydexp) (right) as reported by the FreeSolv[64,65] database. The gray diagonal lines correspond to perfect alignment within ±4.184 kJ mol–1 (±1 kcal mol–1). The RE-EDS results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. The numerical values are provided in Table S3 in the Supporting Information. A plot of the deviations from experiment for the different methods is shown in Figure S7 in the Supporting Information. All pairwise comparisons between the different simulation methods and the experimental results are provided in Figure S5. The RE-EDS and TI results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. The uncertainties of the RMSE and MAE values were estimated from the distribution of RMSE and MAE when a random selection of up to four molecules was removed from the calculations (5000 repetitions). The accumulated simulation time is split into preparation (pre-processing, equilibration) and production time. Table S3 in the Supporting Information contains the complete table. While all three free-energy methods achieve comparable and accurate results, the RE-EDS calculations require by far the lowest accumulated simulation time. The total simulation time (preprocessing and production) for the RE-EDS calculations was about 115 ns. For the TI calculations, the total simulation time (equilibration and production) was 3960 ns. This could, of course, be reduced by calculating only the minimal number of required pairwise free-energy differences (i.e., N–1, which is five for set A). The production time could likely also be reduced to a total of 2–3 ns without affecting the convergence significantly. Both measures would reduce the total simulation time required for the TI calculations to about 600–840 ns. However, even then, the total simulation time would still be more than five times longer than for the RE-EDS calculations. The calculated values reported in FreeSolv[64,65] required 618 ns total simulation time (equilibration and production). Also here, the simulation time was chosen with convergence in mind and could probably be reduced. However, even with 2–3 ns production runs, the total simulation time would still be about 2–3 times the simulation time required by RE-EDS. Plots of the convergence of the free-energy calculations with RE-EDS as well as the λ-curves for the TI simulations can be found in Figures S6 and S3 in the Supporting Information.

Calculation of Relative Hydration Free Energies for Set B

For set B, we found again excellent agreement between the results obtained from the RE-EDS simulations and the calculated and experimental values reported in the FreeSolv[64,65] database (Figure and Table ). The RMSE against the experimental results was 2.6 kJ mol–1 for RE-EDS and 2.0 kJ mol–1 for MBAR.[64,65] The corresponding MAEs were 2.2 kJ mol–1 and 1.6 kJ mol–1, respectively. The correlation with experiment was high for both methods, with rSpearmanRE-EDS = 0.89 and rSpearmanMBAR = 0.92. The agreement between the two simulation methods was also good, with an RMSE of 1.0 kJ mol–1, an MAE of 0.7 kJ mol–1, and a Spearman correlation coefficient of 0.99. For RE-EDS, molecule pairs B1 - B4, B4 - B8, B4 - B9, B4 - B23, B4 - B25, B6 - B8, and B6 - B9 showed deviations above 5.5 kJ mol–1, the largest for molecule pair B4 - B9 with 6.7 kJ mol–1 (Figure S12 in the Supporting Information). For MBAR, molecule pairs B1 - B4, B1 - B6, B4 - B8, B4 - B25, B4 - B27, and B4 - B28 deviated by more than 4.3 kJ mol–1 from experiment. Here, the highest absolute deviation was observed for molecule pair B1 - B4 with 4.9 kJ mol–1. The Spearman correlation coefficient of the absolute deviations from experiment for the two simulation methods was relatively high at 0.88, indicating that some of the deviations might be related to shortcomings in the force field or in the experimental determination.

Figure 7

Table 2

Set B: Overview of Statistical Metrics (RMSE, MAE, and Spearman Correlation Coefficients) with Respect to the Experimental Results and Total Simulation Time for the Different Free-Energy Methodsa

	ΔΔG_hyd^MBAR(64,65)	ΔΔG_hyd^RE-EDS	ΔΔG_hyd^{RE-EDS,subsets}
RMSE [kJ mol^–1]	2.0 ± 0.2	2.6 ± 0.3	2.4 ± 0.3
MAE [kJ mol^–1]	1.6 ± 0.2	2.2 ± 0.2	2.0 ± 0.2
r_Spearman	0.92	0.89	0.90
t_preparation [ns]	84 ns	549 ns	514 ns
t_production [ns]	2800 ns	112 ns	129 ns

The RE-EDS and TI results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. For RE-EDS, both the results for the full set B and for the combined subsets Ba and Bb are reported. The uncertainties of the RMSE and MAE values were estimated from the distribution of RMSE and MAE when a random selection of up to 26 molecules was removed from the calculations (5000 repetitions). The accumulated simulation time is split into preparation (preprocessing, equilibration) and production time. Table S4 in the Supporting Information contains the complete table.

Comparison of the relative hydration free energies of set B: RE-EDS (ΔΔGhydRE-EDS) versus MBAR (ΔΔGhydMBAR) (left) and RE-EDS versus experiment (ΔΔGhydexp) (right) as reported by the FreeSolv[64,65] database. The gray diagonal lines correspond to perfect alignment within ±4.184 kJ mol–1 (±1 kcal mol–1). The ΔΔGhyd values are colored according to end-state i (i.e., the “reference molecule” for the calculation). The RE-EDS results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. The numerical values are provided in Table S4 in the Supporting Information. A plot of the deviations from experiment for the different methods is shown in Figure S12 in the Supporting Information. All pairwise comparisons between the different simulation methods and the experimental results are provided in Figure S9. The RE-EDS and TI results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. For RE-EDS, both the results for the full set B and for the combined subsets Ba and Bb are reported. The uncertainties of the RMSE and MAE values were estimated from the distribution of RMSE and MAE when a random selection of up to 26 molecules was removed from the calculations (5000 repetitions). The accumulated simulation time is split into preparation (preprocessing, equilibration) and production time. Table S4 in the Supporting Information contains the complete table. To investigate the efficiency and accuracy of RE-EDS free-energy calculations for a larger number of molecules, the RE-EDS results of set B were compared to the ones obtained from RE-EDS simulations of subsets Ba and Bb (Figure and Table ). With an RMSE against the experimental values[64,65] of 2.4 kJ mol–1, an MAE of 2.0 kJ mol–1, and a Spearman correlation coefficient of 0.90, the combined results from the two separate RE-EDS pipelines (i.e., Ba and Bb) were marginally more accurate. The agreement with the MBAR results[64,65] was slightly higher with an RMSE of 0.8 kJ mol–1 and an MAE of 0.5 kJ mol–1 (Figure S9 in the Supporting Information). The RMSE between the RE-EDS results for set B versus subsets Ba and Bb was 0.4 kJ mol–1 with perfect correlation (rSpearman = 1.00). The slightly higher agreement of the results obtained from the two subsets with the experimental and MBAR results indicates that there is a small “diffusion effect” for this system when more molecules are added to the simulation. As more end-states are added to a system, the number of frames where an end-state contributes maximally to the reference state decreases. Additionally, more s-values are required to obtain frequent round trips and more energy offset rebalancing iterations are needed to obtain approximately equal sampling of all the end-states.

Figure 8

Comparison of the relative hydration free energies with RE-EDS of subsets Ba and Bb (ΔΔGhydRE-EDS,subsets) compared to the full set B (ΔΔGhydRE-EDS) (left) and compared to experiment (ΔΔGhydexp) (right) as reported by the FreeSolv[64,65] database. The gray diagonal lines correspond to perfect alignment within ±4.184 kJ mol–1 (±1 kcal mol–1). The ΔΔGhyd values are colored according to end-state i (i.e., the “reference molecule” for the calculation). The RE-EDS results were averaged over five independent production runs in vacuum/water, and the errors of the ΔG values correspond to the standard deviation over the five repeats. The error estimate of the ΔΔG values was calculated via Gaussian error propagation. The numerical values are provided in Table S4 in the Supporting Information. A plot of the deviations from experiment for the different methods is shown in Figure S12 in the Supporting Information. All pairwise comparisons between the different simulation methods and the experimental results are provided in Figure S9. Also here, the relatively small simulation time required for the RE-EDS calculations can be highlighted. The total simulation time for the RE-EDS simulations of set B was about 661 ns, compared to 2884 ns for MBAR.[64,65] If pairwise TI simulations in vacuum/water had been used as for set A, the total simulation time would have been between 7128 ns (minimal 27 pairs = N–1) and 99627 ns (all pairs). The total simulation time for the RE-EDS pipelines of subsets Ba and Bb combined was about 643 ns. This is slightly shorter than the simulation length for the full set B, mainly due to the fact that for subset Bb in water, only four instead of eight replicas were added during the s-optimization, and one less rebalancing step was required for both subsets in vacuum and for Bb in water. Plots of the convergence of the free-energy calculations with RE-EDS can be found in Figures S10 and S11 in the Supporting Information.

Conclusion and Outlook

In this work, the GROMOS++ program amber2gromos was introduced to convert topologies from the AMBER to the GROMOS file format. An overview of the differences between AMBER and GROMOS force fields was presented together with a description of the conversion of the AMBER topology parameters to GROMOS topology parameters and the necessary slight modification to the GROMOS source code. A workflow was outlined to prepare topology, coordinate, and distance restraint input files for RE-EDS free-energy calculations with GAFF parameters in GROMOS. The extension of this workflow to the OpenFF family of force fields is straightforward. Two sets of benzene derivatives were selected from the FreeSolv database with six (set A) and 28 molecules (set B). Set A was used to validate the implementation of amber2gromos and the related source-code changes to the GROMOS MD engine. The generated GROMOS topologies for the six benzene derivatives were compared to GROMACS topologies generated by ParmEd from the same AMBER topologies. Single-molecule simulations in vacuum performed in GROMOS and in GROMACS showed nearly identical energy and temperature distributions. Finally, relative hydration free energies were calculated for both sets. For set A, both TI and RE-EDS simulations were carried out in vacuum/water to estimate the 15 pairwise free-energy differences. These results were compared to the relative hydration free energies obtained from experiment as well as the hydration free energies reported in the FreeSolv database (calculated with MBAR). Overall, an excellent agreement was observed between the different free-energy methods and with experiment. While all methods delivered highly accurate results, the RE-EDS calculations required the least amount of total simulation time. The system size was increased to 28 molecules in set B to challenge the RE-EDS pipeline. Again, the results agreed well with the ones from MBAR and with the experimental values. To test if it is more efficient to use a large set of molecules or two subsets with a shared molecule, set B was divided into two subsets: Ba (molecules B1–B14) and Bb (molecule B1 and molecules B15–B28). While both the results and simulation time of the two RE-EDS approaches were almost identical, smaller subsets may offer some advantages in practice. RE-EDS simulations are, in principle, highly parallelizable, as large parts of both the replicas and the interactions within the replicas can be carried out independently with relatively infrequent communication. Nevertheless, as more molecules/replicas are added to the system, the wall-clock time of the simulations increases (more interactions to calculate, larger communication overhead, more replicas). Using two subsets decreased the elapsed real-time of the RE-EDS pipeline. Further research will be needed to determine optimal splits of data sets into subsets as well as the choice of the common molecule(s). The aim will be to find a balance between avoiding a diffusion effect from too many end-states in one system and error propagation due to too small subsets or suboptimal common molecule(s). Overall, it has been shown that hydration free-energy calculations with RE-EDS and GAFF parameters executed in GROMOS accurately reproduce both experimental values and results obtained with different free-energy estimators and MD engines. While the molecules of the chosen data sets were relatively small and contained a well-defined common benzene core, previous studies successfully used RE-EDS to calculate binding and hydration free energies for molecule sets involving larger structural changes such as R-group modifications, ring opening/closing, and ring size changes. In future work, GAFF parametrized topologies will be used to perform binding free-energy calculations with RE-EDS in GROMOS.

Data and Software Availability

The input files for the RE-EDS simulations can be found at https://github.com/rinikerlab/reeds/tree/main/examples/systems/benzenes_amber2gromos. The GROMOS software package and the GROMOS++ package of programs can be downloaded for free at http://gromos.net/, AmberTools can be downloaded at https://ambermd.org/AmberTools.php, and GROMACS can be downloaded at http://www.gromacs.org/. The Python code for PyGromosTools and the RE-EDS pipeline are freely available at https://github.com/rinikerlab. amber2gromos will be part of the next scheduled release of GROMOS.

65 in total

1. SwissParam: a fast force field generation tool for small organic molecules.

Authors: Vincent Zoete; Michel A Cuendet; Aurélien Grosdidier; Olivier Michielin
Journal: J Comput Chem Date: 2011-05-03 Impact factor: 3.376

Review 2. Force fields for classical molecular dynamics.

Authors: Luca Monticelli; D Peter Tieleman
Journal: Methods Mol Biol Date: 2013

3. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set.

Authors: Bruno A C Horta; Pascal T Merz; Patrick F J Fuchs; Jozica Dolenc; Sereina Riniker; Philippe H Hünenberger
Journal: J Chem Theory Comput Date: 2016-07-08 Impact factor: 6.006

4. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery.

Authors: Tai-Sung Lee; Bryce K Allen; Timothy J Giese; Zhenyu Guo; Pengfei Li; Charles Lin; T Dwight McGee; David A Pearlman; Brian K Radak; Yujun Tao; Hsu-Chun Tsai; Huafeng Xu; Woody Sherman; Darrin M York
Journal: J Chem Inf Model Date: 2020-09-16 Impact factor: 4.956

5. Automation of the CHARMM General Force Field (CGenFF) II: assignment of bonded parameters and partial atomic charges.

Authors: K Vanommeslaeghe; E Prabhu Raman; A D MacKerell
Journal: J Chem Inf Model Date: 2012-11-28 Impact factor: 4.956

6. FreeSolv: a database of experimental and calculated hydration free energies, with input files.

Authors: David L Mobley; J Peter Guthrie
Journal: J Comput Aided Mol Des Date: 2014-06-14 Impact factor: 3.686

7. Multi-Site λ-dynamics for simulated Structure-Activity Relationship studies.

Authors: Jennifer L Knight; Charles L Brooks
Journal: J Chem Theory Comput Date: 2011-09-13 Impact factor: 6.006

8. Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database.

Authors: Guilherme Duarte Ramos Matos; Daisy Y Kyu; Hannes H Loeffler; John D Chodera; Michael R Shirts; David L Mobley
Journal: J Chem Eng Data Date: 2017-04-24 Impact factor: 2.694

9. Optimized Lennard-Jones Parameters for Druglike Small Molecules.

Authors: Eliot Boulanger; Lei Huang; Chetan Rupakheti; Alexander D MacKerell; Benoît Roux
Journal: J Chem Theory Comput Date: 2018-05-07 Impact factor: 6.006

Review 10. SciPy 1.0: fundamental algorithms for scientific computing in Python.

Authors: Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt
Journal: Nat Methods Date: 2020-02-03 Impact factor: 28.547