Literature DB >> 24803863

Multiscale Free Energy Simulations: An Efficient Method for Connecting Classical MD Simulations to QM or QM/MM Free Energies Using Non-Boltzmann Bennett Reweighting Schemes.

Gerhard König¹, Phillip S Hudson², Stefan Boresch³, H Lee Woodcock².

Abstract

THE RELIABILITY OF FREE ENERGY SIMULATIONS (FES) IS LIMITED BY TWO FACTORS: (a) the need for correct sampling and (b) the accuracy of the computational method employed. Classical methods (e.g., force fields) are typically used for FES and present a myriad of challenges, with parametrization being a principle one. On the other hand, parameter-free quantum mechanical (QM) methods tend to be too computationally expensive for adequate sampling. One widely used approach is a combination of methods, where the free energy difference between the two end states is computed by, e.g., molecular mechanics (MM), and the end states are corrected by more accurate methods, such as QM or hybrid QM/MM techniques. Here we report two new approaches that significantly improve the aforementioned scheme; with a focus on how to compute corrections between, e.g., the MM and the more accurate QM calculations. First, a molecular dynamics trajectory that properly samples relevant conformational degrees of freedom is generated. Next, potential energies of each trajectory frame are generated with a QM or QM/MM Hamiltonian. Free energy differences are then calculated based on the QM or QM/MM energies using either a non-Boltzmann Bennett approach (QM-NBB) or non-Boltzmann free energy perturbation (NB-FEP). Both approaches are applied to calculate relative and absolute solvation free energies in explicit and implicit solvent environments. Solvation free energy differences (relative and absolute) between ethane and methanol in explicit solvent are used as the initial test case for QM-NBB. Next, implicit solvent methods are employed in conjunction with both QM-NBB and NB-FEP to compute absolute solvation free energies for 21 compounds. These compounds range from small molecules such as ethane and methanol to fairly large, flexible solutes, such as triacetyl glycerol. Several technical aspects were investigated. Ultimately some best practices are suggested for improving methods that seek to connect MM to QM (or QM/MM) levels of theory in FES.

Entities: Chemical Disease Gene Species

Year: 2014 PMID： 24803863 PMCID： PMC3985817 DOI： 10.1021/ct401118k

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Free energy simulations (FES) have become an indispensable tool in biophysics. Application of FES has become amazingly broad and includes free energy calculations of ligand binding,[1−6] solvation,[7,8] protein mutation,[9,10] pKa,[11−15] redox potentials,[16,17] and more. Although the application of FES has become more routine, there are still two fundamental prerequisites: the accurate description of inter- and intramolecular interactions, and adequate sampling of all relevant conformational degrees of freedom.[18,19] For example, the recent SAMPL blind prediction competitions[20−28] have highlighted the need for proper treatment of polarization, high quality charges, and bonded parameters, as well as the necessity of extensive conformational sampling. One logical approach to improving the treatment of inter- and intramolecular interactions in FES is to move beyond molecular mechanical (MM) methods; in fact, the application of quantum mechanical (QM) and QM/MM techniques in FES is now the source of intense interest.[29−42] However, the requirement for adequate conformational sampling presents a major challenge. Typically, free energy differences between two states are computed via a series of molecular dynamics (MD) or Monte Carlo (MC) simulations, each consisting of typically 105–107 energy and force calculations. While this level of computational effort is routine in MM simulations, it quickly becomes prohibitive if ab initio or density functional theory (DFT) is required. Semiempirical QM (SQM) or empirical valence bond (EVB) approaches can reduce the computational effort;[43−50] however, these come with well-known accuracy limitations.[51−54] [For the remainder of this manuscript, we will use the following abbreviations: QM/MM refers to ab initio or DFT (e.g., HF, DFT, MP2, etc.) based Hamiltonians coupled to MM; SQM/MM indicates semiempirical QM (e.g., SCC-DFTB, AM1, PM3, etc.) based Hamiltonians coupled to MM; and (S)QM/MM denotes the use of either semiempirical or EVB methods. For convenience, we subsume all of the these approaches as “QM” since the FES methods described herein are independent of Hamiltonian.] Highlighting these weaknesses is an active resurgence in the development of improved SQM potentials,[51,55−57] but higher level ab initio/DFT methods remain essential where accurate results are desired and/or unique chemical environments encountered.[58−61] So-called indirect schemes for FES employing QM Hamiltonians remove some of the contradictory requirements of accuracy and sufficient sampling. This technique was largely pioneered by Gao and co-workers and Warshel and co-workers,[43−47] with generalizations and extensions made by numerous others.[48,62−70] The basic premise behind indirect FES is the use of a thermodynamic cycle to calculate the free energy between two states, 0 and 1, at a high level of theory in three steps: i.e., Here, high denotes the use of an accurate method, typically QM or QM/MM, which in general is too expensive for large scale MD simulations. The label low, on the other hand, denotes a level of theory at which MD simulations can be carried out easily; this could be plain MM or SQM/MM. Thus, expensive QM calculations are only required in steps (i) and (iii) of the thermodynamic cycle, whereas the transformation (ii) 0low → 1low is carried out at, e.g., the MM level. Aside from lowering the computational cost, this makes it possible to use specialized techniques, such as soft-core potentials, to avoid the so-called van der Waals end point problem.[71,72] Indirect approaches reduce the computational complexity of QM FES significantly since in practice no QM MD simulations are carried out at all. The free energy differences ΔA(0low → 0high) and ΔA(1low → 1high) are typically computed by free energy perturbation (FEP, also known as Thermodynamic Perturbation or Zwanzig’s exponential formula),[73] using only a subset of configurations at the MM end states, for which the QM energy is computed. However, if the potential energy surfaces of the MM and the QM description of states 0 and 1 are significantly different, configurations sampled by MM will not be representative of those at QM. In such situations FEP does not converge and the free energy differences for steps (i) and (iii) above will be inaccurate and “noisy”. Yang and co-workers,[74] as well as Rod and Ryde,[64,65] circumvented the problem by “freezing” the QM region [QM region denotes those atoms which at the high level of theory are computed by ab initio or DFT] during the pure MM calculations; these fixed atoms interact with the MM atoms through assigned electrostatic potential (ESP) derived point charges. The dual-level strategy for QM/MM calculations by Moliner, Tuñón, et al. provides an alternative to frozen QM regions; however, while the method was used in computations of kinetic isotope effects, it has not been used in “traditional” QM/MM free energy simulations.[75,76] Warshel and co-workers have criticized the use of frozen QM regions for quite some time;[77,78] clearly, any entropic contributions from that region will not be accounted for. To overcome the large differences between the MM and QM potential energy surfaces, Warshel and co-workers replace the MM description with an EVB one that is specifically parametrized to reproduce the QM target states.[42,77] Recently, Heimdal and Ryde also pointed out the limitations of frozen QM regions.[79] However, despite using either specially parametrized force fields or a SQM/MM description of the quantum region, the FEP steps connecting the low and high levels of theory (i, iii) converged poorly, or at least very slowly, when a flexible QM region was employed. Recent work by Essex and co-workers avoided convergence problems of FEP by inserting differences in interaction energy rather than total energy differences into the FEP formula.[80,81] In classical FES it has been known for quite some time[82−85] that FEP is much less efficient than Bennett’s acceptance ratio (BAR) method.[86] Specifically, BAR can often be used to compute a free energy difference in a single step where other methods, such as FEP or thermodynamic integration (TI),[87] require intermediate steps.[88] FEP and BAR are examples of what has been referred to as one- and two-sided methods to compute free energy differences, respectively.[89] Two-sided methods require simulations at both end points; i.e., in the present context simulations with both the MM and the QM Hamiltonian. Since calculations with the latter are prohibitively expensive, BAR so far seems not to be used in QM FES. In some cases, Warshel and co-workers employed another two-sided method, linear response approximation.[41,42,90] It follows from the overview just given that an ideal (indirect) QM FES employs a flexible QM region and computes the free energy differences for steps (i) and (iii) above accurately and precisely by employing a BAR-like approach. One possible way to avoid the costs of directly simulating the QM end states, as required for BAR, is to generate them “virtually” from MM or SQM/MM simulations. Recently, it was demonstrated how to conduct such calculations by regarding low level simulations as a special case of high level simulations in the presence of an unusual biasing potential. By accounting for those biasing potentials, it is possible to obtain free energy differences between the virtual high level end states; an approach referred to as Non-Boltzmann-Bennett (NBB).[91] In ref (91), the utility of this approach was illustrated for two classical implicit solvent models with significantly different computational costs, saving about a factor of 10 in computer time compared to using the more accurate, but expensive, model throughout. Herein, we describe how NBB can be combined with an indirect QM FES approach without actually having to carry out expensive QM simulations. As in standard indirect schemes, it is enough to recompute energies of selected configurations sampled with a low level of theory, e.g., MM, at the desired QM level. Since this is a postprocessing step and individual configurations are independent of each other, these calculations are embarrassingly parallel. In the current work, technical details of this new approach (i.e., QM-NBB) are presented with particular emphasis on methodological and technical aspects of FES methods that connect MM and QM levels of theory. In addition to utilizing NBB in indirect QM FES schemes, we also extend FEP to incorporate unusual biasing potentials; this results in a formulation of FEP better suited to connecting two levels of theory (NB-FEP). QM-NBB is then applied to a number of test cases in explicit and implicit solvent with both absolute and relative solvation free energy differences computed. The solutes studied range from frequently used model compounds, such as ethane and methanol, to flexible molecules of up to 30 atoms, e.g., bis-2-chloroethylether and triacetyl glycerol. Further, QM-NBB is compared critically to both the standard FEP based indirect scheme and the newly developed NB-FEP approach. A first large-scale application of the methodology described here to the blind hydration free energy test set of the SAMPL4 competition is reported elsewhere, leading to very good results (König et al., Predicting hydration free energies with a hybrid QM/MM approach: An evaluation of implicit and explicit solvation models in SAMPL4, submitted to J. Comput. Aided. Mol. Des.).

Theory

Standard Methods To Compute Free Energy Differences

Given two states 0 and 1, the free energy difference between them can be computed according toHere kB is Boltzmann’s constant, T the temperature, and U0 and U1 are the potential energies of coordinates evaluated for states 0 and 1, respectively. The angular brackets ⟨⟩0 denote an ensemble average obtained for state 0, i.e., averaging over frames in a trajectory generated in a simulation corresponding to state 0. Equation 2 forms the basis of FEP or thermodynamic perturbation and is commonly attributed to Zwanzig,[92] although the method can be traced back far earlier.[89,93] Over the past decade, an extension to FEP suggested originally by Bennett[86] and, hence, usually referred to as Bennett’s acceptance ratio method (BAR) has become increasingly popular.[82−85,89,94] In contrast to FEP, one needs simulations at both states to compute ΔA(0 → 1).where f(x) denotes the Fermi function f(x) = (1 + exp(x/(kBT)))−1 andHere, Q0 and Q1 are the canonical partition functions of the two states and N0 and N1 are the number of data points used to compute the ensemble averages for states 0 and 1, respectively. Equation 3 is iterated until the conditionis fulfilled. With C determined in this manner, one immediately obtains As already mentioned in the Introduction, BAR has been shown to be much more efficient as compared to FEP for classical FES; i.e., fewer intermediate states are needed to compute a free energy difference of interest.[82−85,89] The need for simulations of both end states, however, has so far prevented the use of BAR to connect MM and QM calculations as MD simulations of sufficient length at the QM end state are too expensive. In classical FES, there is a third, widely used method, thermodynamic integration (TI);[87] however, in connection with QM it is less frequently used and thus not a focus of the current work.[33]

Unusual Biasing Potentials

Recently, it was shown that results obtained with a relatively cheap (i.e., low computational effort) potential energy function, Ulow, can be viewed as higher quality results (i.e., more computationally demanding), Uhigh, in the presence of a biasing potential; Vb = Ulow – Uhigh.[91] Classical simulations are the most common basis for sampling conformational space, hence Ulow = UMM, whereas an improved target for FES results would be QM potentials; Uhigh = UQM. Thus, in the context of QM FES we consider an MM simulation in the presence of the following biasing potential: Torrie and Valleau showed how to obtain an unbiased ensemble average ⟨X⟩ of some property X from simulations of a biased state:[95]where β has the usual meaning of 1/kBT and we use the notation ⟨⟩b to indicate that the ensemble averages on the right-hand side of eq 8 are evaluated from simulations of the biased state.

“Non-Boltzmann” Free Energy Methods

Non-Boltzmann Bennett

Applying eq 8 to eq 3 (i.e., BAR) leads to what we call Non-Boltzmann Bennett (NBB).[91] In the context of classical biasing potentials (e.g., accelerated molecular dynamics), this method has also been referred to as weighted BAR.[96] Reference (91) contains several successful examples of free energy simulations based on biased states (i.e., U = Ubiased). In the current work, the computationally cheap MM potential energy function (Ulow = UMM) is used for the exploration of phase space with the QM energy being the ultimate target (Uhigh = UQM). Thus, regular classical MD simulations are followed by an analysis of the trajectories at the more exact but computationally demanding QM level of theory. This leads to the equivalent of eq 3 for NBB:The notation follows that of eq 3; however, the additional subscript b indicates that the ensemble averages were obtained in the presence of an unusual biasing potential (eq 7). To use eq 9 it is necessary to evaluate three quantities for each frame of the trajectories: U0, U1, and V0b for state 0 and U0, U1, and V1b for state 1. Note that U0 and U1 denote the energies without the biasing potential, i.e., both the MM and QM energies of interest. The workflow for NBB is illustrated in Figure 1. In the first step, an MD simulation is conducted for each of the end states, saving coordinates to trajectories at regular intervals. For each frame of the trajectory, state i, the potential energies are evaluated using both MM and QM. The difference between these energies is the biasing potential Vb (i = 0,1) and is required for NBB. Calculating Vb at every step of the simulation would obviously be cost prohibitive; however, since typically only every hundredth or thousandth MD step is saved, the expensive QM calculations are needed for only a small fraction of the total simulation steps; this greatly reduces the computational cost. In fact, ideally the saving frequency should be long enough to ensure that consecutive data points are statistically independent. In the remainder of this paper, we refer to the use of NBB to connect MM and QM energy surfaces as “QM-NBB” or as reweighting from MM to QM.

Figure 1

Workflow for using non-Boltzmann Bennett in the hybrid QM/MM free energy simulation approach.

Non-Boltzmann FEP

In this study we also tested the utility of reweighting from MM to QM based on FEP, which in analogy to NBB we refer to as NB-FEP. Applying eq 8 to eq 2 givesAs in the case for QM-NBB, U0 and U1 are the re-evaluated energies without the biasing potentials; i.e., in the present case the QM energies. Note that the idea of carrying out, e.g., FEP based on simulations in the presence of a biasing potential is hardly new. It was first described by Straatsma and McCammon in 1994[97] and has been gradually rediscovered recently.[91,98] However, in most cases the purpose of the biasing potential was to overcome barriers; the only applications of unusual biasing potentials as employed in this work we are aware of are ref (91) and, to some extent, ref (3).

Practical Aspects

Implicit Solvent QM Calculations

Calculating solvation free energies using an implicit solvent (IS) model is a special case. The free energy difference between the solute in the gas phase and IS can be computed in one step (i.e., no intermediate states), in particular, if BAR is used.[88,99] Thus eq 9 can be used directly. States 0 and 1 represent gas phase and IS, respectively, with separate simulations (e.g., MM and MM/IS) in addition to re-evaluated QM and QM/IS energies required for both. This leads to the generic symbols in eqs 9 and 10 having the following concrete meaning: U0 = UQM, U1 = UQM/IS, and the biasing potentials are V0 = UMM – UQM, V1 = UMM/ – UQM/IS. In all IS calculations presented here, all atoms are treated by either MM or QM (i.e., MM/IS, QM/IS).

Reweighting in Indirect QM FES

In practice, FES usually require intermediate λ states; i.e., one needs to carry out simulations at λ = λ0, λ1, ..., λ, λ, with λ0 = 0 and λ = 1. Since the free energy is a state function, one does not have to reweight every λ-state; it is sufficient to carry out the reweighting only at the end states. This is the basis of the indirect approach to QM FES; Scheme 1. The corresponding indirect QM-NBB scheme can be seen in Figure 2. We outline the approach for end state 0. For the analogous steps at state 1 replace λ0 by λ and λ1 by λ. The MM simulation at, e.g., λ0 is considered a QM simulation in the presence of the biasing potential Vb = UMM – UQM. Together with an MM simulation at λ1, NBB is used to compute the free energy difference ΔA(λ0, QM/MM → λ1, MM). This should be contrasted with the usual FEP based indirect scheme where ΔA(λ0, MM → λ1, MM) would be computed by some method and combined with ΔA(λ0, MM → λ0, QM/MM).

Scheme 1

Typical Thermodynamic Cycle Used in Indirect Free Energy Calculations

Figure 2

Illustration of the QM-NBB scheme applied to indirect alchemical FES (i.e., reweighting only the end states). Gray nodes represent simulated states, and white nodes are virtual states that are generated through reweighting (thin arrows). Except for the first and the last free energy step, all free energy calculations are performed with regular BAR (eq 3); i.e., without reweighting. The first and the last free energy calculation use NBB to calculate the free energy difference between a virtual QM state and a simulated MM state.

Methods

Calculations in Explicit Solvent

All explicit solvent simulations were conducted with CHARMM,[100,101] using the CHARMM22[102] force field. The QM and QM/MM calculations were performed with Q-Chem[103] based on the Q-Chem/CHARMM interface.[104]

Calculation of Alchemical Intermediate States with QM/MM

Many applications of FES involve alchemical mutations of one molecule to another; e.g., to compare binding affinities of two ligands to a particular target or calculate relative solvation free energy differences. Such alchemical mutations are usually realized by mixing the potential energy functions (U) of both end states 0 and 1 as a function of the coupling parameter λ (e.g., Uλ = (1 – λ)U0 + λU1) to form artificial intermediate states between the two end states. In order to evaluate the potential energies U0 and U1, the coordinates of the atoms of both end states have to be defined. This can be achieved in two possible ways. The first strategy involves a single topology setup,[105,106] where the bonded and nonbonded parameters of atoms are changed according to λ (e.g., in the mutation of ethane to methanol where the C–C bond length is slowly changed to the C–O bond length). While this approach is viable in MM, it is more problematic for QM or QM/MM approaches. The second approach, dual topology,[105,106] is illustrated for the mutation of ethane to methanol in Figure 3. It involves a hybrid molecule that contains three sets of coordinates: (a) the common environment (atoms that are the same in both end states, e.g., for ethane–methanol the first methyl group shown in black); (b) atoms that exist only in the initial state 0 (e.g., the second methyl group in ethane, shown in blue); and (c) atoms that exist only in the final state 1 (e.g., the hydroxyl group in methanol, shown in red). It is important to ensure that there are no interactions between groups b and c. This approach is easy to implement in QM calculations, as the potential energy evaluations are only performed using coordinates of the pure end states (i.e., to produce the initial state 0, groups a and b are used, while for the final state 1 groups a and c are required).

Figure 3

Dual topology setup of a mutation from ethane to methanol. Starting from the hybrid molecule (middle), it is possible to calculate the potential energy of both end states by ignoring all atoms corresponding to the other end state. The system is divided into three groups: The common environment that is present in both end states (black); atoms that only exist in the ethane initial state (blue); and atoms that only exist in the methanol final state (red). The last two groups do not interact with each other.

Ethane–Methanol

Solvation free energy differences between ethane and methanol were calculated using the standard thermodynamic cycle.[107] The dual topology hybrid scheme was implemented using the MSCALE module[108] of CHARMM and follows the recommendations by Boresch and Karplus.[106,109] For the simulations, each energy evaluation was divided into three tasks: calculate energetic contributions of all (a) bond, angle, and Urey–Bradley terms from the full hybrid molecule, UcommonbondsMM (this was done in the “main” MSCALE process to maintain the connectivity of the hybrid molecule); (b) dihedral angle and nonbonded contributions corresponding to state 0, U0MM (i.e., all atoms that are not part of ethane or the common environment were deleted); and (c) dihedral angle and nonbonded contributions corresponding to state 1, U1MM (i.e., all atoms that are not part of methanol or the common environment were deleted). The λ states were generated by mixing those three energy contributions according to UλMM = UcommonbondsMM + (1 – λ)U0MM + λU1MM. To calculate the biasing potential, Vb, individual potential energies were calculated with Q-Chem and CHARMM based on input files generated by the Q-Chem/CHARMM interface.[104] B3LYP/6-31G* was used to describe the solute in both gas phase and explicit solvent QM/MM calculations (solvent was treated classically with the TIP3P water model). Each frame of the trajectory was calculated as follows: (a) by removing all atoms not corresponding to the initial state 0 (ethane) and calculating the potential energy, U0QM, and (b) by removing all atoms not corresponding to the final state 1 (methanol) and calculating the potential energy, U1QM. To calculate the potential energy of λ states, the two terms were mixed according to UλQM = (1 – λ)U0QM + λU1QM. Of course it is not necessary to compute the terms that are multiplied with zero at the corresponding end state. Unfortunately, at the time those calculations were conducted, periodic boundary conditions (PBC) and Particle Mesh Ewald (PME) calculations were not supported by Q-Chem. Therefore, the approach to calculate UλQM as outlined in the last paragraph could only be used in the gas phase. Instead, the explicit solvent QM/MM calculations in Q-Chem used a single box of water molecules that were centered around the solute for each frame of the trajectory (since no PBCs were used, we refer to this energy as UnopbcQM/MM). To make the calculation of Vb possible, we also performed MM calculations with CHARMM in exactly the same setup (i.e., without PBC and water molecules centered around the solute, using a cutoff of 999 Å; UnopbcMM). Thus, Vb = UnopbcMM – UnopbcQM/MM for the solvent trajectories. To include the effects from periodic boundary conditions in the FES, each U in the QM-NBB calculations consisted of UQM/MM = UwithpbcMM – Vb, where UwithpbcMM is the potential energy of the hybrid molecule as used in the simulation (i.e., with PME using CHARMM). This assumes that long-range polarization effects are minimal. To generate potential energies for each λ state, UQM/MM was evaluated once for the initial state 0 (U0QM/MM) and once for the final state 1 (U1QM/MM), leading to UλQM/MM = (1 – λ)U0QM/MM + λU1QM/MM for simulations in solution. Correspondingly, the biasing potential for each lambda state Vλb is given by Vλb = (1 – λ)Unopbc0MM + λUnopbc1MM – (1 – λ)Unopbc0QM/MM – λUnopbc1QM/MM, where the indices 0 and 1 indicate which end state is used. Gas phase simulations were conducted with Langevin dynamics, using a friction coefficient of 5 ps–1 on all atoms and random forces according to a target temperature of 300 K. In solution, we used 862 water molecules and an octahedral box that was cut from a cube with a side length of 32.168 Å. The temperature was maintained at about 300 K by a Nosé-Hoover thermostat.[110] Lennard-Jones interactions were switched off between 10 and 12 Å, while electrostatic interactions were computed with the PME method.[111] Three different time steps were evaluated: 0.5, 1, and 2 fs. For the last two time steps (1 and 2 fs), we also compared simulations with and without SHAKE on all hydrogen atoms. In the gas phase, the cutoff radius was set to 998 Å. Free energy differences were calculated based on simulations of 5 ns in gas phase and 1 ns in solution. Trajectories were written every 100 steps in the gas phase and every 20 steps in solution. For the free energy calculations, 5 λ points were employed in the gas phase (0.00, 0.25, 0.50, 0.75, and 1.00) and 11 in solution (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0). The standard deviations of the free energy results were determined by repeating each simulation four times, starting with different initial random velocities.

Absolute Solvation Free Energies

The classical absolute solvation free energies of ethane and methanol were calculated by turning off all nonbonded interactions of the solute in both gas phase and solution. Since turning off both inter- and intramolecular interactions at the QM level is nontrivial, the indirect FES approach was employed. The alchemical mutation was done in two steps: first, all charges of the solute were set to zero using 6 λ states in gas phase (λ = 0.00, 0.05, 0.15, 0.40, 0.80, and 1.00) and 12 in solution (λ = 0.00, 0.05, 0.10, 0.20, ..., 0.90, and 1.00). QM/MM potential energy calculations were carried out only at λ = 0.00 and λ = 0.05 of the uncharging step. In the second step, all Lennard-Jones interactions of the solute were set to zero, using 7 λ states in gas phase (λ = 0.00,0.15, 0.35, 0.65, 0.80, 0.90, and 1.00) and 13 in solution (λ = 0.00, 0.05, 0.10, 0.20, ..., 0.90, 0.95, and 1.00). Soft core potentials, as implemented in the PERT module of CHARMM, were employed to avoid the end point problem. Free energy differences were calculated based on simulations of 50 ns in gas phase and 0.5 ns in solution with coordinates saved every 1000 steps and 20 steps, respectively. Simulations were repeated in triplicate with different random seeds to compute standard deviations. Gas phase simulations were conducted with Langevin dynamics, using a friction coefficient of 5 ps–1 on all atoms and random forces according to a target temperature of 300 K. In solution, we used 1492 water molecules and an octahedral box that was cut from a cube with a side length of 38.604 Å. The temperature was maintained at about 300 K by a Nosé-Hoover thermostat. Lennard-Jones interactions were switched off between 10 and 12 Å, while electrostatic interactions were computed with PME. Both molecules were equilibrated for 0.1 ns using constant pressure and, prior to production, each λ point was further equilibrated for 0.1 ns using constant volume.

Calculations in Implicit Solvent (IS)

The reweighting formalism described in Sections 2.2 and 2.4 can be directly applied to solvation free energy calculations employing IS models. Analogous to previous work,[91] we carry out classical gas phase and GBMV[112] IS simulations. Subsequently, the solvation free energy differences were reweighted via NBB to approximate results obtained using DFT (e.g., M06-2X,[113] B3LYP[114,115]) and QM/IS models (e.g., SMD,[116,117] SM8,[103,118] SM12[103,119]). To demonstrate the applicability of QM-NBB, we chose model compounds from two groups. First, a set of amino acid side chain analogues that are neutral at pH = 7 were chosen; these have been established as a gauge for the accuracy and efficiency of free energy simulations.[7,120,121] Second, in a study combining BAR with implicit solvent models Mobley and co-workers pointed out a number of compounds where contributions from solute entropy were expected to be significant;[99] this list included methyl formate, 2-methoxy phenol, bis-2-chloroethylether, 1-octanol, phenyl trifluoroethyl ether, and triacetyl glycerol, which we, therefore, also include in our set. The MM simulations on which the reweighting was based employed the CHARMM36 generalized (CGenFF) and protein force fields.[122,123]

Static QM/IS Calculations

In addition to solvation free energy differences obtained from MD simulations, we also computed solvation free energy differences based on single coordinate frames. The starting geometry of each solute, taken from the Supporting Information of ref (99), was minimized at the M06-2X/6-31G* level of theory using GAMESS-US[113−115,117] with and without the SMD IS model.[116] ΔAsolv is the difference between the minimized energy in the gas phase and in the presence of the SMD model. These data are referred to as “static” SMD results, labeled just “SMD” in Table 5.

Table 5

Simulation Details for and Results of Absolute Solvation Free Energy Difference Calculations Based on Implicit Solvent Modelsa

compound	atomsb	no. ptsc	Δt, psd	exp.e	GBMVf	SMDg	SMD,NBBh	SMD,FEP^tradi	NB-FEP,fwj	NB-FEP,bwk
methane	5/1	5000	20	1.99	1.35	2.23	2.17 ± 0.03	2.16 ± 0.07	2.16	–2.17
ethane	8/2	15 000	20	1.83	1.33	1.83	1.76 ± 0.03	1.74 ± 0.11	1.77	–1.75
propane	11/3	5000	20	1.96	1.37	1.91	1.88 ± 0.01	1.61 ± 0.27	1.88	–1.88
i-butane	14/4	5000	20	2.32	1.47	2.22	2.21 ± 0.01	2.27 ± 0.36	2.20	–2.21
n-butane	14/4	10 000	40	2.07	1.52	2.11	2.09 ± 0.03	1.87 ± 0.28	2.05	–2.12
methanol	6/2	15 000	20	–5.10	–5.27	–3.88	–4.00 ± 0.09	–3.98 ± 0.31	–4.06	3.92
ethanol	9/3	5000	20	–5.00	–4.96	–3.60	–3.85 ± 0.06	–5.20 ± 0.90	–3.71	3.97
methanethiol	6/2	5000	20	–1.24	–0.29	–0.88	–0.78 ± 0.08	–1.16 ± 0.26	–0.76	0.79
ethyl-methylsulfide	12/4	5000	20	–1.50	1.09	–0.33	–0.30 ± 0.09	0.08 ± 0.26	–0.36	0.26
methylformate	8/4	12 500	40	–2.78	–6.39	–1.62	–1.67 ± 0.07	–1.78 ± 1.59	–1.63	1.67
2-methoxyphenol	17/9	5000	50	–5.57	–4.42	–1.06	–3.33 ± 0.13	–2.67 ± 0.33	–3.39	3.05
bis-2-chloroethylether	15/7	10 000	50	–4.23	–3.04	–5.24	–4.02 ± 0.49	–2.34 ± 1.27	–4.46	4.07
1-octanol	27/9	5000	60	–4.09	–3.62	–1.88	–2.42 ± 0.28	–2.50 ± 0.65	–2.46	2.35
phenyl-trifluoroethyl-ether	19/12	5000	60	–1.29	–2.88	–1.60	–0.57 ± 0.13	–3.21 ± 2.33	–0.67	0.56
triacetylglycerol	29/15	5000	100	–8.84	–14.55	–7.21	–6.37 ± 0.36	–4.02 ± 2.70	–6.09	6.54
acetamide	9/4	5000	20	–9.68	–8.95	–7.96	–7.98 ± 0.28	–9.02 ± 0.58	–7.18	8.57
propionamide	12/5	5000	20	–9.38	–8.56	–7.43	–7.15 ± 0.29	–7.65 ± 0.55	–7.32	7.05
4-methylimidazole	12/6	5000	20	–10.27	–11.27	–7.81	–8.05 ± 0.18	–8.81 ± 0.41	–8.01	8.08
toluene	15/7	5000	20	–0.89	0.11	–0.14	–0.12 ± 0.04	0.59 ± 0.90	–0.15	0.10
p-cresol	16/8	5000	20	–6.13	–4.46	–3.41	–3.69 ± 0.03	–4.30 ± 0.87	–3.75	3.58
3-methylindole	19/9	5000	20	–5.88	–5.50	–3.64	–3.35 ± 0.11	–3.50 ± 0.50	–3.45	3.25

All solvation free energies are in kcal/mol.

Number of atoms/number of non-hydrogen atoms.

Total number of conformations used to compute ΔAsolv by the various methods.

Time interval for saving conformations.

Experimental ΔAsolv taken from the Supporting Information ref (99).

ΔAsolv based on the classical GBMV implicit solvent model calculated with BAR.

“Static” ΔAsolv calculated with the quantum chemical SMD implicit solvent model based on a single conformation.

ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NBB.

ΔAsolvFEP based on the quantum chemical SMD implicit solvent model calculated from the classical GBMV result plus corrections between classical and quantum chemical description computed with FEP; cf. eq 11 and Scheme 1.

ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NB-FEP in the forward direction

ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NB-FEP in the backward direction.

Data Generation

CHARMM version c38b1 was used to carry out Langevin dynamics simulations of the model compounds in the gas phase and in GBMV IS. A friction coefficient of 5 ps–1 was applied to all atoms. All interactions were included in both gas phase and GBMV simulations; i.e., there was no truncation of nonbonded interactions. The time step in all simulations was 0.5 fs, and the molecules were fully flexible; no bond-length constraints were used. For each solute, at least 5000 coordinate sets were saved in the gas phase and in solution; see the third column in Table 5. The time interval for saving consecutive coordinates Δt is specified in the fourth column of Table 5. The total simulation length in gas phase and implicit solvent, respectively, for a particular compound was, therefore, the number of coordinate sets times Δt; it ranged from 100 ns (e.g., propane) to 500 ns (bis-2-chloroethylether, triacetylglycerol).

Analysis

For each of the coordinates saved, both classical and QM energies were recomputed (gas phase and IS - GBMV, SMx (x = D, 8, 12)). Benchmark results for SMx models were examined and the M06-2X/6-31G* level of theory was determined appropriate for SMD and SM8 IS models while B3LYP/6-31G* showed good performance for the SM12 model. All SMD calculations were carried out with GAMESS-US[117] while SM8/SM12 results were generated with Q-Chem.[103] Initially, classical simulation data was used to estimate the solvation free energy differences (i.e., gas phase → GBMV); referred to as “GBMV”. Second, NBB was applied to the forward and backward energy differences obtained from QM/IS calculations, treating the differences between MM and QM energies for the respective reference state as the “biasing” potential. These results are referred to as “SMD,NBB”, “SM8,NBB”, etc. In addition to using QM-NBB, the raw data were also evaluated by FEP and NB-FEP (SMD only). For these, the standard thermodynamic cycle of indirect QM FES (Scheme 1) was used; applied to the calculation of ΔAsolv for a solute X using an IS model, the scheme shown in Figure 2 simplifies toHere, ΔA2 is the classical GBMV result. ΔA1 and ΔA3 were computed by FEP, perturbing from the end points of the MM FES to the corresponding QM target states. We refer to results obtained in this manner as “traditional” FEP (FEPtrad). Next, based on classical gas phase and GBMV simulations, NB-FEP (eq 10) was applied to estimate both the “forward” (FW; QM → QM/IS) and “backward” (BW; QM/IS → QM) directions, respectively. Similar to QM-NBB, the difference between MM and QM energies for the respective reference state was considered the biasing potential, Vb.

Estimating Error

All free energy differences, regardless of method, reported in Table 5 are the values obtained from the full data set. The standard deviations reported were calculated as follows. Free energy differences for blocks of 1000 coordinate frames were calculated, and the standard deviation of these block averages is reported. Given that the coordinate frames should already be almost statistically independent (Δt ≥ 20 fs), this is likely to overestimate the statistical error; yet the number gives some feeling for the variability of the data. We also always compared ΔAsolv computed from the full data and the mean value of the block averages. If the calculations are converged, the two values should be identical. Any discrepancies between these two numbers indicate that the results are not (fully) converged.

Results and Discussion

Calculations in Explicit Solvent

Relative Solvation Free Energy Difference between Ethane and Methanol

Many applications of FES involve alchemical mutations of one molecule to another to predict relative free energies. Further, many systems of interest are poorly described by current classical force fields. Therefore, the capability to efficiently and accurately conduct QM alchemical FES is of critical importance. Here, we evaluate the performance of the proposed QM-NBB approach for a simple mutation of ethane to methanol in water. Calculation of the corresponding solvation free energy difference (ΔΔAsolv) has become somewhat of a benchmark for FES.[124] A variety of MM force fields yield highly accurate results within very short simulation lengths. Therefore, this system is a useful test for computationally expensive QM free energy methods, as different approaches can be evaluated within reasonable time. In Table 1 we compare the results of FES based on MM with BAR (MM-BAR, left) and the QM-NBB approach (QM-NBB, middle). The first row gives the free energy difference between ethane and methanol in the gas phase (ΔAgas), while the second row represents the corresponding free energy change in aqueous solution (ΔAH). Subtracting ΔAgas from ΔAH leads to the solvation free energy difference, ΔΔAsolv, which is shown in the last row together with the experimental reference result (Exp., rightmost column).[125] Both MM-BAR and QM-NBB are in excellent agreement with experiment (deviations of 0.04 and 0.02 kcal/mol). While this small difference in accuracy between MM-BAR and QM-NBB is probably fortuitous, it is still an indicator that QM-NBB FES can lead to improved accuracy even in cases where the MM parameters are well developed.

Table 1

Free Energy Differences between Ethane and Methanol (kcal/mol)

	MM-BAR	QM-NBB	Exp.a
ΔA_gas	+6.02 ± 0.01	–22517.95 ± 0.01	-
ΔA_H₂O	–0.86 ± 0.02	–22524.91 ± 0.04	-
ΔΔA_solvb	–6.89 ± 0.02	–6.96 ± 0.01	–6.93

Experiment: ref (125).

Relative solvation free energy difference: ΔΔAsolv = ΔAH – ΔAgas.

Experiment: ref (125). Relative solvation free energy difference: ΔΔAsolv = ΔAH – ΔAgas. Obviously, the QM data for ΔAgas and ΔAH are orders of magnitude larger than the corresponding MM-BAR results. This reflects the differences in internal energies between ethane and methanol (i.e., the energetic costs of creating the atoms in the respective method). The major contribution to the internal energy in QM methods arises from the interactions between core electrons and the nuclei. In particular for systems consisting of different numbers of atoms, this results in large differences in internal energies. However, such electron–nuclei interactions are not present in MM methods, since they do not contribute to the chemical bond. In other words, the apparent discrepancies between MM and QM single free energy differences ΔAgas and ΔAH reflect the different reference states of the methods. It should be pointed out that the single free energy differences for MM would also be very different if another force field had been used (cf., Figure 1 of ref (88)). Strictly speaking, only the difference between ΔAgas from ΔAH, which leads to ΔΔAsolv in the last row, is of relevance, as the effect of the arbitrary reference state cancels out. The only practical ramification of the large values in QM is that the computer codes for NBB or BAR have to be stable numerically. This is easily accomplished by factoring out large offsets and/or using a suitable starting value for C in eqs 3 and 9. In Table 2, we compare experimental solvation free energies (ΔΔAsolvExp, first column) with QM-NBB results (ΔΔAsolvQM-NBB, second column) and alternative free energy methods to analyze the same set of QM potential energy data (columns three and four). All computational results presented in this section are based on the same set of trajectories. Thus, any errors resulting from sampling should be consistent and provide a fair test environment for a relative evaluation of accuracy and precision.

Table 2

Comparison of Relative Solvation Free Energies for Ethane and Methanol from Several Approaches Based on the Same Set of QM Potential Energy Dataa

	ΔΔA_solv^Expb	ΔΔA_solv^QM-NBBc	ΔΔA_solv^QM-BARd	ΔΔA_solv^{FEP^trad}e
ethane–methanol	–6.93	–6.96 ± 0.04	–6.09 ± 0.02	–7.14 ± 0.09

All energies are reported in kcal/mol.

Experiment: ref (125).

QM-NBB.

QM-BAR (i.e., no reweighting is employed for QM data).

Zwanzig’s equation (i.e., the traditional FEP approach).

All energies are reported in kcal/mol. Experiment: ref (125). QM-NBB. QM-BAR (i.e., no reweighting is employed for QM data). Zwanzig’s equation (i.e., the traditional FEP approach). One conceivable alternative to QM-NBB consists of using BAR with the QM potential energy data (which does not involve reweighting). This approach assumes that all frames in the MM trajectory are generated with the correct Boltzmann probability in regard to the QM energy surface. The results for ΔΔAsolvQM-BAR are shown in the third column of Table 2. As can be seen, the omission of any kind of reweighting step in the workflow leads to errors of about 0.8 kcal/mol. This large deviation illustrates that a correction for the change of probabilities from the MM to the QM energy surface is absolutely necessary. Interestingly, the standard deviation of ΔΔAsolvQM-BAR is the same as for pure MM (c.f. last row of MM-BAR in Table 1). This indicates that QM-NBB suffers from some loss of precision (σ 0.04 vs 0.02) and that it is not a result of using QM energies instead of MM but rather an effect of changing the weights of the frames with NBB. The second alternative consists of using Zwanzig’s exponential FEP formula[92] instead of NBB; i.e., the traditional approach according to Scheme 1. In contrast to BAR or NBB that employ two trajectories per FES, FEP uses only a single trajectory, which lowers computational costs. The corresponding results are shown in the rightmost column of Table 2 (ΔΔAsolvFEP). As can be seen, the use of FEP leads to a deviation of 0.21 kcal/mol from experiment. While this deviation might be considered acceptable for standard applications of FES, it is an order of magnitude higher than the QM-NBB deviation (0.03 kcal/mol). In addition, the standard deviation (0.09 kcal/mol) is more than twice that of QM-NBB (0.04 kcal/mol). The poor performance of FEP here agrees with recent observations for MM FES,[82−84] where FEP was considerably less accurate and precise than all other free energy methods.

Influence of Time Step and SHAKE

One main difference between MM and QM/MM is the treatment of chemical bonds. Most MM simulations replace the Morse potential of bonds by a harmonic potential. Far from the equilibrium bond length, the errors incurred by this approximation can be considerable. One common way to avoid this problem consists of using SHAKE to keep bond lengths fixed at their equilibrium distances, thus avoiding errors from the harmonic approximation. As a side effect, SHAKE also allows the use of larger time steps in molecular dynamics simulations, leading to improved sampling and thus higher efficiency. One potential drawback to this approach, however, comes from the neglect of anharmonicity, i.e., nonharmonic behavior upon bond stretching. Table 3 reports data used to analyze the effects of SHAKE and simulation time step on the accuracy of QM-NBB FES. In particular, we compare the performance of simulations that use 0.5 and 1 fs time steps without SHAKE (ΔΔAsolv0.5fs and ΔΔAsolv1.0fs) as well as simulations with 1 and 2 fs time steps with SHAKE (ΔΔAsolv1.0fs/SHAKE and ΔΔAsolv2.0fs/SHAKE). Notably, 2 fs time steps are only possible when using SHAKE, so the two sets of 1 fs time step simulations serve as a control to possible errors resulting from SHAKE.

Table 3

QM-NBB Solvation Free Energy Results from Simulations with Different Time Steps (δt) and with and without SHAKEa

	ΔΔA_solv^0.5fsb	ΔΔA_solv^1.0fsc	ΔΔA_solv^1.0fs/SHAKEd	ΔΔA_solv^2.0fs/SHAKEe	ΔΔA_solv^Expf
ethane–methanol	–6.96 ± 0.06	–6.96 ± 0.04	–6.65 ± 0.02	–6.45 ± 0.02	–6.93

All energies are reported in kcal/mol.

QM-NBB, δt = 0.5 fs, No SHAKE.

QM-NBB, δt = 1.0 fs, No SHAKE.

QM-NBB, δt = 1.0 fs, SHAKE.

QM-NBB, δt = 2.0 fs, SHAKE.

Experiment: ref (125).

All energies are reported in kcal/mol. QM-NBB, δt = 0.5 fs, No SHAKE. QM-NBB, δt = 1.0 fs, No SHAKE. QM-NBB, δt = 1.0 fs, SHAKE. QM-NBB, δt = 2.0 fs, SHAKE. Experiment: ref (125). As illustrated in Table 2, using time steps of 0.5 and 1 fs without SHAKE (first two columns) has negligible effect on accuracy. Notably, there is a small difference in terms of precision (standard deviation of 0.06 vs 0.04 kcal/mol), which likely can be attributed to sampling since simulation time doubles when going from 0.5 fs →1.0 fs. Therefore, a 1 fs time step is recommended in the underlying simulations used in QM-NBB FES; however, when high frequency bond stretching is expected to significantly contribute to free energy differences, a smaller time step may be required. To determine the effect of SHAKE, two 1 fs time step simulations are employed (with and without SHAKE, second and third column, respectively). While the deviation from experimental results is small for the FES without SHAKE (0.03 kcal/mol), the error becomes significantly higher when using SHAKE (0.28 kcal/mol). Notably, a similar error is found for MM-BAR FES of the same trajectories, where the computed solvation free energy difference is −6.73 kcal/mol (error of 0.20 kcal/mol). However, the accuracy deteriorates further when a time step of 2 fs with SHAKE is used, yielding an error of almost 0.5 kcal/mol. This is significantly higher than the error of the underlying MM-BAR FES, which yields a result of −6.76 kcal/mol (error of 0.17 kcal/mol). This suggests that the difference in fixed versus flexible X–H bond treatment (X = N, C, O) can have a significant effect on the free energy when QM-based FES are employed. To gain additional insight into errors associated with SHAKE, both standard harmonic and anharmonic gas phase QM calculations were carried out on ethane and methanol using the transition-optimized shifted Hermite (TOSH) method.[126] From Table 2, the energetic penalty of using SHAKE on the 1 fs simulations is 0.31 kcal/mol. This appears to be a result of limiting high frequency bond stretching in two systems where the effects do not cancel; i.e., a larger effect in methanol due to its O–H bond. Calculations at the B3LYP/6-31G* level of theory seemingly confirm this. Examining ΔStotal (= StotalAnharm – StotalHarm) reveals that 0.21 kcal/mol of entropy is gained for methanol upon accounting for anharmonicity whereas 0.09 kcal/mol is lost in ethane. This yields a total effect of 0.30 kcal/mol, in near perfect agreement with QM-NBB results (vide supra). This should serve as another point of caution for approaches that connect MM ↔ QM using either fixed or restrained QM regions.[77−79]

Absolute Solvation Free Energies of Ethane and Methanol

The calculation of absolute solvation free energies involves the gradual deactivation of solute–solvent interactions until the solute is in a noninteracting ideal gas state. This process requires scaling all solute–solvent interactions and naturally leads to the application of the indirect FES scheme. Table 4 reports the performance of both MM and QM/MM based results for ethane and methanol. The MM results are shown in the first column, while the QM-NBB results are shown in the second. Again, the difference between MM and QM/MM is statistically significant. Both methods show good agreement with experiment; root-mean-square deviations, RMSD, are about 0.4 and 0.2 kcal/mol, respectively (see last row of Table 4). Of note, RMSD results indicate that the previously observed excellent experimental agreement (ΔΔAsolv, Table 2) was fortuitous and likely a result of cancellation of errors. Nevertheless, it is also clearly demonstrated that significant error reduction can be realized when applying QM-NBB FES.

Table 4

Absolute solvation free energies for ethane and methanola

	ΔA_solv^MM–BARb	ΔA_solv^QM-NBBc	ΔA_solv^Expd
ethane	2.29 ± 0.10	2.03 ± 0.10	1.83
methanol	–4.68 ± 0.03	–4.82 ± 0.04	–5.10
RMSDe	0.42	0.22

All energies are reported in kcal/mol.

MM-BAR FES.

QM-NBB, indirect FES approach.

Experiment: ref (125).

RMSD from experimental results.

All energies are reported in kcal/mol. MM-BAR FES. QM-NBB, indirect FES approach. Experiment: ref (125). RMSD from experimental results.

Calculations in Implicit Solvent

In Table 5 we report ΔAsolv for 21 compounds, ranging in size from 5 to 29 atoms; see the second column in the table. Aside from the experimental values, taken from the Supporting Information of ref (99), we report computed free energy differences obtained by BAR based on the MM gas phase and GBMV raw data, by DFT/SMD based on a single coordinate set, and by QM-NBB based on the reweighting to DFT/SMD, as well as DFT/SMD results obtained by FEP and NB-FEP (cf. Methods). Results obtained by reweighting to DFT/SM8 and DFT/SM12 are shown in Table 6; cf. below.

Table 6

Simulation Results of Absolute Solvation Free Energy Difference Calculations Based on QM Implicit Solvent Models SMD, SM8, and SM12a

	exp.	GBMV	SMD,NBB	SM8,NBB	SM12,NBB
methane	1.99	1.35	2.17	1.72 ± 0.01	1.33 ± 0.01
ethane	1.83	1.33	1.76	1.12 ± 0.01	0.82 ± 0.01
propane	1.96	1.37	1.88	1.12 ± 0.01	0.85 ± 0.01
i-butane	2.32	1.47	2.21	1.42 ± 0.01	1.12 ± 0.01
n-butane	2.07	1.52	2.09	1.21 ± 0.01	0.97 ± 0.01
methanol	–5.10	–5.27	–4.00	–4.88 ± 0.01	–5.02 ± 0.02
ethanol	–5.00	–4.96	–3.85	–4.71 ± 0.07	–4.91 ± 0.08
methanethiol	–1.24	–0.29	–0.78	–0.50 ± 0.01	–1.11 ± 0.01
ethyl-methylsulfide	–1.50	1.09	–0.30	–0.42 ± 0.05	–0.70 ± 0.03
methyl formate	–2.78	–6.39	–1.67	–2.56 ± 0.04	–3.12 ± 0.03
2-methoxy phenol	–5.57	–4.42	–3.33	–5.40 ± 0.09	–6.14 ± 0.04
bis-2-chloroethylether	–4.23	–3.04	–4.02	–3.66 ± 0.14	–4.16 ± 0.11
1-octanol	–4.09	–3.62	–2.42	–3.45 ± 0.05	–3.49 ± 0.04
phenyl-trifluoroethyl-ether	–1.29	–2.88	–0.57	–1.89 ± 0.06	–2.26 ± 0.09
triacetyl glycerol	–8.84	–14.55	–6.37	–9.03 ± 0.19	–9.71 ± 0.14
acetamide	–9.68	–8.95	–7.98	–10.93 ± 0.17	–10.89 ± 0.04
propionamide	–9.38	–8.56	–7.15	–10.57 ± 0.30	–10.58 ± 0.10
4-methylimidazole	–10.27	–11.27	–8.05	–9.18 ± 0.19	–8.84 ± 0.08
toluene	–0.89	0.11	–0.12	–0.95 ± 0.01	–1.17 ± 0.01
p-cresol	–6.13	–4.46	–3.69	–5.35 ± 0.04	–5.58 ± 0.02
3-methylindole	–5.88	–5.50	–3.35	–4.66 ± 0.05	–4.79 ± 0.01
RMSDb		1.79	1.47	0.76	0.85

All solvation free energies are in kcal/mol.

RMSD of the solvation free energy compared to the experimental result.

All solvation free energies are in kcal/mol. Number of atoms/number of non-hydrogen atoms. Total number of conformations used to compute ΔAsolv by the various methods. Time interval for saving conformations. Experimental ΔAsolv taken from the Supporting Information ref (99). ΔAsolv based on the classical GBMV implicit solvent model calculated with BAR. “Static” ΔAsolv calculated with the quantum chemical SMD implicit solvent model based on a single conformation. ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NBB. ΔAsolvFEP based on the quantum chemical SMD implicit solvent model calculated from the classical GBMV result plus corrections between classical and quantum chemical description computed with FEP; cf. eq 11 and Scheme 1. ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NB-FEP in the forward direction ΔAsolv based on the quantum chemical SMD implicit solvent model calculated with NB-FEP in the backward direction. All solvation free energies are in kcal/mol. RMSD of the solvation free energy compared to the experimental result. The classical force field combined with the GBMV implicit solvation model (column “GBMV” in Table 5) does reasonably well for most compounds; however, there are some severe errors, such as for ethyl-methylsulfide, methyl formate, and triacetyl glycerol. The statistical error of all results is extremely low (<0.1 kcal/mol in all cases, data not shown); thus, the results reflect the strengths and weaknesses of the classical force field and solvation model. Overall, the results obtained with DFT and the SMD solvation model, based both on single configurations (column “SMD”) and QM-NBB (column “SMD,NBB”), are in better agreement with experiment. There are fewer huge outliers compared to MM/GBMV; yet, the results for alcohols are somewhat disappointing and unexpected. Much better agreement with experiment is obtained with the SM8 and SM12 solvation models, cf. Table 6. The last line in this table lists the RMSDs of the computed ΔAsolv results relative to experiment. While SMD (1.47 kcal/mol) is an improvement over GBMV (1.79 kcal/mol), it compares poorly to the values of 0.76 and 0.85 kcal/mol for SM8 and SM12. Interestingly, the newer SM12 model fares slightly worse than its predecessor SM8 for this particular set of 21 compounds; however, this could be a consequence of the functional and grid employed (B3LYP (SG-1) vs M06-2X (99,590), respectively). The focus of this study is not a critical evaluation of implicit solvation models. Instead, it is more interesting to note that introducing flexibility has a large influence on results in some cases, e.g., bis-2-chloroethylether or phenyl-trifluoroethyl-ether, where static and MD based results differ by over 1 kcal/mol. Overall, the QM-NBB results are in slightly better agreement with experiment (i.e., SMD vs SMD,NBB results in Table 5). The standard deviation is quite low in most cases; the largest variation of results was observed for bis-2-chloroethylether. However, the difference between the free energy difference obtained for the full data set and the average of the individual blocks (cf. Methods) is less than 0.1 kcal/mol for all solutes, i.e., well below the statistical error estimated based on the standard deviation of the block results. This indicates that the results are well converged; e.g., for bis-2-chloroethylether, although the standard deviation is ±0.49 kcal/mol, the discrepancy is only 0.02 kcal/mol (data not shown). While our QM-NBB approach performs well and leads to converged results in all cases, the standard indirect approach (FEPtrad) based on FEP from MM → QM (eq 11) is problematic. It does work for small molecules, e.g., methane or ethane. In those cases, the FEPtrad result (see column “SMD,FEPtrad”) agrees with the QM-NBB result, the standard deviation is low, and the free energy difference obtained from all data agrees with the average of the block results (data not shown). However, already for, e.g., propane with FEPtrad, the quality of the result is noticeably poorer than QM-NBB; differing by more than 0.2 kcal/mol with considerably higher standard deviation (±0.27 kcal/mol for FEPtrad vs ±0.01 for QM-NBB). For larger molecules, e.g., toluene, the FEPtrad results are unusable. The standard deviation is now almost 1 kcal/mol, while the free energy difference obtained from all data (0.59 kcal/mol) is quite different from the average of the block results (0.1 kcal/mol, data not shown); both values differ noticeably from the QM-NBB result (−0.12 kcal/mol). By contrast, NB-FEP performs surprising well. In most cases, the NB-FEP results agree within error bars with QM-NBB; the QM-NBB result is usually close to the average value of the forward and backward NB-FEP result. However, for larger molecules the hysteresis between forward and backward results increases. The outlier is acetamide, where the hysteresis is over 1 kcal/mol; this is the only case where the forward and backward result cannot be reconciled based on the statistical error estimate. Given that the computational effort of doing NB-FEP in both forward and backward direction is comparable to that of QM-NBB, the latter is clearly advantageous. Several of the compounds studied can adopt multiple conformations that require sufficient sampling, the simplest example being butane. In ref (94) the authors observed that the central dihedral angle of butane was poorly sampled even in simulations of 1 ns length, cf. their Figure 7. Figure 4 shows the degree of sampling of the central dihedral angle of butane in the data used for the reweighting. The red line is a reference histogram obtained from a separate 500 ns simulation, whereas the green curve displays the histogram obtained from the 10 000 data points (gas phase) used in the free energy simulations. The two curves are practically identical. The data underlying our FES (green curve) should be contrasted to the amount of sampling achieved in a simulation of just 50 000 MD steps, saving every 10th step (blue line). One gauche minimum is not sampled at all, and the other gauche minimum and the trans minimum have wrong weights. Of course, 50 000 MD steps may seem ridiculously short, but a brute force simulation with a QM potential as used in this work for reweighting (DFT - M06-2X, SMD, or SM8 IS model) would already be a significant computational undertaking. Performing a 1 ns simulation, the simulation length used in the study by Shirts et al.[94] would be prohibitively expensive even today. The present implicit solvent work relied on regular long MD simulations for sufficient sampling, as the MM simulations are cheap compared to re-evaluation at the QM level. However, the approach would work equally well (or even better) if enhanced sampling techniques would have been used to generate conformations for reweighting; evaluation of this is already underway.

Figure 4

Sampling of butane’s conformational space. The x-axis is butane’s central dihedral angle while the y-axis is probability.

Conclusions

Two new approaches (QM-NBB and NB-FEP) are presented for effectively connecting MM simulations with QM calculations to determine free energy differences. Both methods are based on the use of “unusual” biasing potentials to obtain more accurate results; i.e., simulations carried out at low levels of theory (MM) in conjunction with high level (QM) potential energy evaluations. QM-NBB is initially applied to calculate both absolute and relative solvation free energy differences of ethane and methanol in explicit solvent. Our results demonstrate that the reweighting step is necessary. Just inserting the QM energies into regular BAR (“QM-BAR” in Table 2) leads to a clearly erroneous result. This can be explained by the differences in the underlying MM and QM potential energy surfaces; i.e., the ensemble of “important” states on the two surfaces are significantly different. The effect of those differences can be aggravated by constraining degrees of freedom with SHAKE (Table 3). In this case, the results became incorrect because the MM and QM description of the system did not match. Quite generally, the potential energy surface at which sampling is carried out has to be, to some degree, representative of the QM surface, with mismatches being caused by a variety of energetic components; e.g., electrostatics, harmonic approximates, and more. While two-sided methods such as BAR/NBB are much more efficient than, e.g., FEP, some minimal amount of overlap is required. However, if sampling is adequate, QM-NBB leads to a more accurate and precise result than the traditional indirect scheme coupled with FEP (ΔΔAsolvFEP in Table 2). The computation of solvation free energies for a diverse series of 21 compounds further validates the usefulness of these new approaches. The employed compounds range from small molecules to fairly large, flexible solutes, such as triacetyl glycerol, and, therefore, can be considered representative of practical FES applications. Triacetyl glycerol is an example of a compound where force fields result in poor solvation free energies (cf. “GBMV” in Table 5 and Supporting Information of ref (99)). The traditional FEP approach for QM/MM FES also leads to inaccurate results, as it becomes numerically unstable or completely unusable for systems much larger than 10 atoms. NB-FEP greatly improved performance compared to standard FEP; however, numerical inconsistencies were still observed. Thus, QM-NBB is established as the preferred method to connect MM to QM (or QM/MM) levels of theory. This is clearly reflected by comparisons to experiment. The combination of a classical force field with BAR and the GBMV implicit solvent model leads to a RMSD of 1.79 kcal/mol while the use of QM-NBB with SM8 reduces this RMSD to 0.76 kcal/mol. To put those numbers into perspective, the best results of the SAMPL0 to SAMPL2 prediction competitions exhibited RMSDs between 1.3 and 3.6 kcal/mol.[21,24,127,128] While the quality of traditional MM FES strongly depends on the selected parameters (in particular charges),[129] conventional QM solvation free energy evaluations based on a single conformation can suffer from not accounting for solute entropy. QM-NBB overcomes these weaknesses by combining the strengths of both approaches: efficient sampling from MM and accurate intra- and intermolecular interactions from QM. Thus, QM-NBB with the right choice of QM level of theory holds significant promise as an “affordable” method for calculating highly accurate free energies, e.g., on par with or better than standard methods currently being employed.

80 in total

1. Alchemical free energy calculations and multiple conformational substates.

Authors: Martin Leitgeb; Christian Schröder; Stefan Boresch
Journal: J Chem Phys Date: 2005-02-22 Impact factor: 3.488

Review 2. Advances in methods and algorithms in a modern quantum chemistry program package.

Authors: Yihan Shao; Laszlo Fusti Molnar; Yousung Jung; Jörg Kussmann; Christian Ochsenfeld; Shawn T Brown; Andrew T B Gilbert; Lyudmila V Slipchenko; Sergey V Levchenko; Darragh P O'Neill; Robert A DiStasio; Rohini C Lochan; Tao Wang; Gregory J O Beran; Nicholas A Besley; John M Herbert; Ching Yeh Lin; Troy Van Voorhis; Siu Hung Chien; Alex Sodt; Ryan P Steele; Vitaly A Rassolov; Paul E Maslen; Prakashan P Korambath; Ross D Adamson; Brian Austin; Jon Baker; Edward F C Byrd; Holger Dachsel; Robert J Doerksen; Andreas Dreuw; Barry D Dunietz; Anthony D Dutoi; Thomas R Furlani; Steven R Gwaltney; Andreas Heyden; So Hirata; Chao-Ping Hsu; Gary Kedziora; Rustam Z Khalliulin; Phil Klunzinger; Aaron M Lee; Michael S Lee; Wanzhen Liang; Itay Lotan; Nikhil Nair; Baron Peters; Emil I Proynov; Piotr A Pieniazek; Young Min Rhee; Jim Ritchie; Edina Rosta; C David Sherrill; Andrew C Simmonett; Joseph E Subotnik; H Lee Woodcock; Weimin Zhang; Alexis T Bell; Arup K Chakraborty; Daniel M Chipman; Frerich J Keil; Arieh Warshel; Warren J Hehre; Henry F Schaefer; Jing Kong; Anna I Krylov; Peter M W Gill; Martin Head-Gordon
Journal: Phys Chem Chem Phys Date: 2006-06-12 Impact factor: 3.676

3. Accurate and efficient corrections for missing dispersion interactions in molecular simulations.

Authors: Michael R Shirts; David L Mobley; John D Chodera; Vijay S Pande
Journal: J Phys Chem B Date: 2007-10-19 Impact factor: 2.991

4. A blind challenge for computational solvation free energies: introduction and overview.

Authors: J Peter Guthrie
Journal: J Phys Chem B Date: 2009-04-09 Impact factor: 2.991

5. pKa calculations in solution and proteins with QM/MM free energy perturbation simulations: a quantitative test of QM/MM protocols.

Authors: Demian Riccardi; Patricia Schaefer; Qiang Cui
Journal: J Phys Chem B Date: 2005-09-22 Impact factor: 2.991

6. Thermodynamic integration to predict host-guest binding affinities.

Authors: Morgan Lawrenz; Jeff Wereszczynski; Juan Manuel Ortiz-Sánchez; Sara E Nichols; J Andrew McCammon
Journal: J Comput Aided Mol Des Date: 2012-02-16 Impact factor: 3.686

Review 7. Computations of standard binding free energies with molecular dynamics simulations.

Authors: Yuqing Deng; Benoît Roux
Journal: J Phys Chem B Date: 2009-02-26 Impact factor: 2.991

8. Using Selectively Applied Accelerated Molecular Dynamics to Enhance Free Energy Calculations.

Authors: Jeff Wereszczynski; J Andrew McCammon
Journal: J Chem Theory Comput Date: 2010-10-13 Impact factor: 6.006

9. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles.

Authors: Robert B Best; Xiao Zhu; Jihyun Shim; Pedro E M Lopes; Jeetain Mittal; Michael Feig; Alexander D Mackerell
Journal: J Chem Theory Comput Date: 2012-07-18 Impact factor: 6.006

10. Determinants of reactivity and selectivity in soluble epoxide hydrolase from quantum mechanics/molecular mechanics modeling.

Authors: Richard Lonsdale; Simon Hoyle; Daniel T Grey; Lars Ridder; Adrian J Mulholland
Journal: Biochemistry Date: 2012-02-10 Impact factor: 3.162

48 in total

1. An Estimation of Hybrid Quantum Mechanical Molecular Mechanical Polarization Energies for Small Molecules Using Polarizable Force-Field Approaches.

Authors: Jing Huang; Ye Mei; Gerhard König; Andrew C Simmonett; Frank C Pickard; Qin Wu; Lee-Ping Wang; Alexander D MacKerell; Bernard R Brooks; Yihan Shao
Journal: J Chem Theory Comput Date: 2017-01-24 Impact factor: 6.006

2. Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM.

Authors: Jia-Ning Wang; Wei Liu; Pengfei Li; Yan Mo; Wenxin Hu; Jun Zheng; Xiaoliang Pan; Yihan Shao; Ye Mei
Journal: J Chem Theory Comput Date: 2021-02-16 Impact factor: 6.006

3. Perspective: Quantum mechanical methods in biochemistry and biophysics.

Authors: Qiang Cui
Journal: J Chem Phys Date: 2016-10-14 Impact factor: 3.488

4. Comparison of Methods To Reweight from Classical Molecular Simulations to QM/MM Potentials.

Authors: Eric C Dybeck; Gerhard König; Bernard R Brooks; Michael R Shirts
Journal: J Chem Theory Comput Date: 2016-03-23 Impact factor: 6.006

5. Predicting hydration free energies with a hybrid QM/MM approach: an evaluation of implicit and explicit solvation models in SAMPL4.

Authors: Gerhard König; Frank C Pickard; Ye Mei; Bernard R Brooks
Journal: J Comput Aided Mol Des Date: 2014-02-07 Impact factor: 3.686

6. Absolute binding free energies for octa-acids and guests in SAMPL5 : Evaluating binding free energies for octa-acid and guest complexes in the SAMPL5 blind challenge.

Authors: Florentina Tofoleanu; Juyong Lee; Frank C Pickard Iv; Gerhard König; Jing Huang; Minkyung Baek; Chaok Seok; Bernard R Brooks
Journal: J Comput Aided Mol Des Date: 2016-09-30 Impact factor: 3.686

7. An efficient protocol for obtaining accurate hydration free energies using quantum chemistry and reweighting from molecular dynamics simulations.

Authors: Frank C Pickard; Gerhard König; Andrew C Simmonett; Yihan Shao; Bernard R Brooks
Journal: Bioorg Med Chem Date: 2016-08-22 Impact factor: 3.641

8. Toward polarizable AMOEBA thermodynamics at fixed charge efficiency using a dual force field approach: application to organic crystals.

Authors: Ian J Nessler; Jacob M Litman; Michael J Schnieders
Journal: Phys Chem Chem Phys Date: 2016-11-09 Impact factor: 3.676

9. QM/MM free energy simulations: recent progress and challenges.

Authors: Xiya Lu; Dong Fang; Shingo Ito; Yuko Okamoto; Victor Ovchinnikov; Qiang Cui
Journal: Mol Simul Date: 2016-07-05 Impact factor: 2.178

10. Computation of Hydration Free Energies Using the Multiple Environment Single System Quantum Mechanical/Molecular Mechanical Method.

Authors: Gerhard König; Ye Mei; Frank C Pickard; Andrew C Simmonett; Benjamin T Miller; John M Herbert; H Lee Woodcock; Bernard R Brooks; Yihan Shao
Journal: J Chem Theory Comput Date: 2015-12-11 Impact factor: 6.006