Literature DB >> 29438621

Explicit Solvation Matters: Performance of QM/MM Solvation Models in Nucleophilic Addition.

Jelle M Boereboom¹, Paul Fleurat-Lessard², Rosa E Bulo¹.

Abstract

Nucleophilic addition onto a carbonyl moiety is strongly affected by solvent, and correctly simulating this solvent effect is often beyond the capability of single-scale quantum mechanical (QM) models. This work explores multiscale approaches for the description of the reversible and highly solvent-sensitive nucleophilic N|···C═O bond formation in an Me2N-(CH2)3-CH═O molecule. In the first stage of this work, we rigorously compare and test four recent quantum mechanical/molecular mechanical (QM/MM) explicit solvation models, employing a QM description of water molecules in spherical regions around both the oxygen and the nitrogen atom of the solute. The accuracy of the models is benchmarked against a reference QM simulation, focusing on properties of the solvated Me2N-(CH2)3-CH═O molecule in its ring-closed form. In the second stage, we select one of the models (continuous adaptive QM/MM) and use it to obtain a reliable free energy profile for the N|···C bond formation reaction. We find that the dual-sphere approach allows the model to accurately account for solvent reorganization along the entire reaction path. In contrast, a simple microsolvation model cannot adapt to the changing conditions and provides an incorrect description of the reaction process.

Entities: Chemical Disease Gene Species

Year: 2018 PMID： 29438621 PMCID： PMC6023263 DOI： 10.1021/acs.jctc.7b01206

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Compared to the gas phase, the mechanism of a chemical reaction in solvent can be strongly affected by the interactions between the solute and the solvent molecules. More than a century ago, the pioneering work of Berthelot[1] and Menshutkin[2] paved the way for a better understanding of these solvent effects, but despite this and many more advances,[3−5] the atomistic details of solvent effects cannot yet be resolved experimentally. In contrast, atomistic aspects of chemical mechanisms can be intimately studied with computer simulations, and recent progress allows solvent inclusion into many molecular models used.[6−12] Such an atomistic description of a reaction in solution requires rigorous sampling of all possible configurations of the solvent molecules around the solute. This sampling can be achieved through molecular dynamics (MD) simulations, but as the size of the molecular model increases, the approach becomes very arduous. The efficiency of MD simulations of large solute–solvent systems can be greatly improved with a multiscale approach. Microsolvation is the simplest multiscale solvation model; it includes explicitly only the solvent molecules in the first solvation shell using a quantum mechanical (QM) description, while the long-range effects are approximated by implicit solvation.[13] A more complex model uses the same QM description for the closest solvent molecules but describes the rest of the system explicitly with molecular mechanics (MM). The latter (QM/MM) models have been used since the 1970s and were recognized in 2013 with the Nobel Prize in Chemistry. Originally the systems studied with QM/MM were biomolecular proteins or other relatively rigid systems.[14,15] In such applications, the composition of the region of interest does not change over time. Nowadays, there is an increasing interest in extending the application of these QM/MM models to highly diffusive systems (e.g., homogeneous/heterogeneous catalysis and chemistry of solvated systems).[16−30] Several QM/MM solvation models have been developed for this purpose, but so far, there has not been one model that clearly outperforms the others.[31] In this work, we first benchmark the state of the art in multiscale solvation models and then go on to perform accurate simulations of a class of particularly solvent-sensitive reactions. Chemical reactions involving charged intermediates are especially strongly affected by solvent. The intermediates are stabilized in polar protic solvents, while they are destabilized in apolar solvents, often yielding different mechanisms. Acid-catalyzed nucleophilic substitutions, for example, usually proceed through a nucleophilic addition onto the carbonyl moiety followed by the elimination of a different nucleophile in a subsequent step.[32] However, a direct S2 mechanism has also been observed (in polar solvents),[33−36] and computations have comfirmed that the latter mechanism benefits from the stabilization of negative charge on the oxygen atom as the nucleophile approaches the carbonyl carbon. Bürgi and Dunitz identified crystal structures featuring a nucleophilic tertiary amino group and a carbonyl moiety separated by remarkably short distances.[37] The unusual interaction (Figure ) has since then been observed in many biological systems (e.g., as a central part of a new class of HIV inhibitors)[38,39] and has been exploited in the development of heterogeneous catalysts for the enantioselective hydrogenation of α-functionalized ketones.[40,41] But foremost, it can be seen as a transition state analogue in the nucleophilic addition of an amine to a carbonyl moiety. Indeed, the interaction is known to be stable only in the presence of polar protic solvents (Figure ).[42]

Figure 1

Lewis structures of the NCO molecule in only implicit solvent (left) and the NCO molecule in implicit solvent and three explicit water molecules (right).

Lewis structures of the NCO molecule in only implicit solvent (left) and the NCO molecule in implicit solvent and three explicit water molecules (right). Previous works[42,43] have focused on understanding the N|···C=O interaction at the electronic level. It was found that this interaction has a partial dative character (nN → πCO*), but this participation is quite low, less than 0.18 electrons. Instead, a four electron/three center bonding model is more suited. As the N–C distance decreases, the donation of electrons into the N–C bond stems not only from the lone pair of the nitrogen atom but from the C=O bond as well. Overall, the combined effects result in a polarized carbonyl moiety (N|···C+–O–), which explains the observed stabilization by protic solvents; they stabilize the ionic C+–O– bond. One study[42] employed a single molecule model system containing both moieties (denoted NCO molecule, Figure ). Indeed, in this system, the absence of solvent does not yield stable N|···C distances that correspond to a bond. Step-wise introduction of water molecules in the model yields a decreasing N|···C bond length after geometry optimization. This demonstrates the remarkable sensitivity to solvation of this nucleophilic substitution mimic. Calculations of the electronic structure of the NCO molecule in its N–C bonded state, using both microsolvation models and QM/MM electrostatic embedding, revealed that a QM description of only the solute molecule is not sufficient to correctly describe the electronic structure of the weak bond.[44] This makes the system ideally suited to benchmark QM/MM solvation models that can deal with diffusion of water molecules in the QM region. Pilmé et al. obtained binding energies for the N|···C+–O– bond,[42] using a bimolecular model containing Me3N|···H2CO, only one explicit water molecule and an implicit solvent model. Static electronic structure calculations yielded binding energies of 9 kcal/mol (B3LYP) and 11 kcal/mol (CCSD(T)), which is about twice as strong as an average hydrogen bond in water. In this work, we aim to improve on this result by representing the aqueous solvent at varying levels of accuracy. We use the NCO molecule in Figure , which has the advantage that whenever the N|···C distance becomes large the different moieties do not diffuse away from each other. The QM/MM solvation models tested in this work all divide the system into an active (A) region and an environment (E) region. Molecules in the A-region are treated QM, and molecules in the E-region are treated MM. More complex models also define a third region: the transition (T) region. This region separates the A- and E-regions, and molecules inside the T-region have fractional QM character (depending on their distance to the QM center). In Figure , a schematic representation of a QM/MM simulation is shown for NCO solvated in water. Note that the A-region is defined by two spheres around the nitrogen and oxygen of the central NCO molecule. The two spheres can overlap or separate depending on the geometry of the system. The dual-sphere adaptive QM/MM scheme was introduced 10 years ago by Heyden et al.[18] and constitutes a vast improvement over previous multiscale applications, in which a single large QM sphere needs to encapsulate all reactive chemical entities as well as the surrounding solvent. It is therefore noteworthy that the present work is the first application of the suggested dual-sphere scheme. For each of the two central atoms, the T-region is the volume between two spheres of different radii centered on the same atom (either nitrogen or oxygen of the NCO molecule). We define the E-region as the remaining volume, after exclusion of the A- and T-regions.

Figure 2

Schematic representation of a QM/MM description of the closed state of NCO solvated in water (left) and the open state of NCO solvated in water. The system is partitioned into three regions: A-region [orange], T-region [yellow], and E-region [white] around the central NCO molecule. Ball and stick water molecules are QM, and MM molecules are depicted by thick lines. The QM character of the solvent molecules is determined by their distance to the nitrogen and oxygen of the NCO molecule (r1 and r2). In summary, we aim to obtain an accurate free energy profile for the revesible intramolecular nucleophilic addition of an amine moiety to a carbonyl group, using multiscale solvation models. After benchmarking several of these models, we select two for the final computation of the reaction free energies (an explicit and an implicit solvation model) and discuss their performance in the simulation of the reaction mechanism. The work is organized as follows: In Section , we briefly describe our methodology and list the computational details. In Section , the different models are benchmarked on the structural features of the bonded (closed) state (Figure , left). Finally, in Section , the best performing explicit solvation model is used to obtain an accurate free energy profile. A comparison is made with a microsolvation model that only includes five explicit water molecules. A summary and conclusion can be found in Section .

Methodology

In this section, we provide a very brief overview of the multiscale solvation methods used, summarize our computational details, and discuss the relative computational costs of the different solvation models.

Multiscale Solvation

Simulation of complex systems often requires a multilevel description that treats a region of interest (active or A-region) with a quantum mechanical (QM) method and embeds it in an environment (E) region that is treated in a more approximate way. The approximate description can be a mean field (implicit) solvation method or it can be explicit molecular mechanics (MM). The latter combination of QM and MM (QM/MM) was originally designed for the simulation of biological systems, in which case most challenges arise from the description of covalent bonds across the boundary between the QM and MM zones. The QM/MM description of smaller solvated molecules does not require a boundary placed across covalent bonds. In order to account for solvent diffusion, however, aditional measures are required. Molecules need to be either reassigned to the A- or E-region every single MD time-step, or they need to be constrained to their respective regions. Throughout this work, we refer to the former as adaptive QM/MM models and to the latter as restrictive QM/MM models. In this work, we employ five different multiscale models that include solvation effects on the NCO solute molecule. Three of them are adaptive QM/MM models: DAS[19] (continuous switching between QM and MM in T-region), abrupt[19] (instantaneous switch between QM and MM), and buffered-force.[20] The latter switches instantaneously between QM and MM, but the molecules in the A-region feel a QM interaction with the molecules in the T- region, whereas molecules in the T-region in turn feel an MM interaction with molecules in the A-region. The fourth model is a restrictive QM/MM model; FIRES[21] restricts MM solvent molecules from penetrating the QM region by a quadratic wall that is positioned where the outermost QM solvent molecule is located. The fifth model is not a QM/MM model. It is a simple microsolvation model that uses a continuum description to simulate the interaction with the E-region. It is also a restrictive model, in the sense that five explicit solvent molecules are restricted to a small region near the NCO molecule. A comprehensive overview of all solvation models is provided in the Supporting Information, and for most of the models, more in-depth descriptions are available in refs (45 and 46). In all QM/MM simulations, we deploy a dual-sphere scheme that guarantees QM solvation of both the amine and the carbonyl moiety. We note that this is the first application of the dual-sphere scheme, which was first proposed by Heyden et al.[18]

Computational Details

All MD simulations are performed with FlexMD,[47] which is distributed with the ADF program package.[48,49] FlexMD is a python wrapper around several molecular modeling packages. These packages provide the required QM or MM forces used to propagate the system. The MD propagation is handled by the atomistic simulation environment (ASE).[50] In order to accurately describe our NCO model system in the open state (Figure , right), we need a cubic box with a minimum dimension of 18 Å. MD simulations of a box with this dimension containing only QM molecules are very costly, but a QM/MM model can simulate significantly larger systems. Our model system is a 30.8 Å cubic box containing the NCO molecule and 913 water molecules. For the QM/MM simulations, we choose an A-region that consists of two QM spheres with a radius of 4.1 Å around the NCO solute molecule (centered on nitrogen and oxygen). In this manner, the two hydrophilic moieties are both solvated by QM water, even when the system is in the open state. For DAS and buffered-force there is also a T-region (or buffer region) with a thickness of 0.9 Å around the A-region spheres, yielding a maximum QM radius of 5.0 Å around the two central atoms. More details on the dual-sphere scheme[18] can be found in the Supporting Information. In an average DAS simulation, the dual-sphere approach assigns (on average) 29 water molecules to the QM set of the most demanding DAS QM/MM partition (5.0 Å radius). A single-sphere DAS computation with the same accuracy (with a radius large enough that it would encompass all water molecules within a radius of 5.0 Å around the oxygen and the nitrogen atom when the system is in the open state) assigns 76 water molecules to the QM set of the most demanding partition. More details regarding the relative efficiency of a dual-sphere simulation versus a single-sphere simulation can be found in the Supporting Information. Our molecular model system was pre-equilibrated with the REAXFF potential[51,52] at 1 atm for 20 ps (time step 0.25 fs). The PM6-DH+ functional[53] as implemented in the MOPAC program[54] was then used for the QM part of the system, whereas the MM part was calculated with REAXFF[51,52] as implemented in the ADF program package.[48,49] REAXFF was chosen because it reproduces the experimental water-in-water radial distribution function well.[31] For the interactions across the periodic boundary, the REAXFF code uses the particle mesh Ewald method.[55] We use mechanical embedding for the QM/MM interaction according to the IMOMM scheme.[56,57] We selected this scheme to avoid the aggregation of molecules near the QM/MM boundary observed in electrostatic embedding simulations[20] and extensively tested its performance for the accurate description of the solvated NCO molecule (see Section ). A direct comparison of mechanical embedding with electrostatic embedding can be found in the Supporting Information. The reference molecular system (fully PM6-DH+) is a 14.2 Å cubic box containing the NCO solute molecule and 87 water molecules. The microsolvation system is nonperiodic and contains five explicit water molecules close to the oxygen atom of the NCO solute molecule. The long-range effects of water are computed with the PCM model[13,58] with a dielectric constant of 78.4 in the static calculations and by the COSMO model[59] with a dielectric constant of 78.4 in the MD simulations. The five water molecules are restricted near the NCO molecules with a half-harmonic wall placed at 4.1 Å from the NCO oxygen atom. We positioned this wall based on a priori knowledge of the desired water structure. Results on the effect of different cluster sizes and different wall positions on electronic structure, geometries, and free energy differences can be found in the Supporting Information. To benchmark the solvation models on the description of the closed NCO state (Figure , left), we run five different MD simulations with each model (including the reference), all with randomly generated starting velocities and a time step of 0.5 fs. The first 10 ps of each of these 30 simulations is considered equilibration. The subsequent 10 ps of the simulations are used for analysis. All simulations are in the canonical ensemble (NVT) using a Langevin thermostat with a friction of 0.05 au for DAS, FIRES, reference and microsolvation, and a friction of 0.5 au for abrupt and buffered-force to avoid overheating (see Supporting Information). To analyze the electronic structure of the NCO molecule with different solvation models and different A-region sizes, we compute the Mulliken charges on the oxygen and nitrogen atoms. We use 100 snapshots taken from the equilibrated reference QM simulation and gradually increase the size of two spheres around the two atoms that contain QM water molecules. To assess the affect of geometry on the NCO electronic structure, we also compute the average Mulliken charges from 1000 geometries from simulations equilibrated with the solvation models. The snapshots are separated by time-intervals of 500 fs (QM reference) and 50 fs (solvation models), such that the available 50 ps of trajectory are optimally represented. To analyze the molecular structure of the NCO molecule after equilibration with the different solvation models, we compute the root mean square deviation (ϵRMSD) of the heavy atoms in an average geometry of the NCO molecule from an average QM geometry. The average geometries are computed using 1000 snapshots from the respective simulations. Reaction free energy profiles are obtained using metadynamics to simulate the rare event of opening and closing the N|···C bond. Background information on metadynamics can be found in the Supporting Information or in several excellent reviews.[60,61] In order to keep these metadynamics simulations affordable, we restrict the number of collective variables (CV) to two. As CV for this system we choose (1) the bond distance between the nitrogen of the tertiary amine-group and the carbon of the aldehyde-group and (2) the combination of two dihedral angles () (see Supporting Information) that together describe the twist of the 5-ring. The first CV was chosen because the bond distance between the nitrogen and the carbon of the aldehyde group is the main coordinate that differentiates between the closed and open states of NCO (Figure , left and right, respectively). The second CV was chosen to conveniently explore the potential energy surface in the open state and avoid any hysteresis. The Gaussians that form the history dependent bias are deposited every 100 MD steps and have a height of 0.1 kcal/mol. The width of the Gaussians along CV1 is 0.1 Å, and along CV2 the width is 18°. In order to obtain statistically meaningful free energy profiles, we use 10 different metadynamics simulations that all have different starting velocities. Convergence of the result with the number of simulations is quantified by the error of the mean. The typical length of a single metadynamics simulation is around 125 ps (with a time step of 0.5 fs). The metadynamics simulations all start with an equilibrated closed state structure. Based on an examination of different criteria, we determine the Helmholt free energy barrier for the bond breaking reaction for the opening of the N|···C bond from the deposited Gaussians when our simulations reach an N|···C distance of 2.50 Å. We estimate the free energy barrier to correspond to an N|···C distance of 2.25 Å. Since only Gaussians corresponding to a single well are used to compute the barrier height, the exact barrier location cannot be determined. Other choices yield barrier heights that deviate no more than 0.3 kcal/mol within one simulation. To determine the barrier for the backward reaction, we sum the deposited Gaussians when the N|···C distance reaches 2.0 Å after having been completely open (N|···C distance of more than 5.0 Å). The free energy difference between the closed and the open state of the NCO molecule is defined as the difference between the barriers of the forward and the backward reaction. The geometry optimizations for different microsolvation models are computed with MP2,[62,63] PBE-D3,[64,65] and PM6-DH+.[53] Both the MP2 and PBE-D3 optimizations use a 6-31+G(d,p) basis set[66,67] using Gaussian09, revision D.01.[68] The geometry optimizations with the PM6-DH+ functional were performed with MOPAC.[54]

Simulation Timings

As mentioned above, the complexity of the solvation models differs. The microsolvation model is by far the least time-consuming because it explicitly describes only the NCO molecule and five water molecules. The most efficient adaptive QM/MM models (abrupt and buffered-force) are approximately 30 times more computationally intensive, as can be seen in Table . The difference in cost between abrupt and buffered-force is caused by the number of QM solvent molecules, which is larger in the buffered-force model. The cost of FIRES is roughly the same as the cost of the most efficient adaptive QM/MM model. The most time-consuming model is DAS, which calculates forces over on average 15 different QM/MM partitions each time step. When using four parallel CPU cores, it takes around six times longer than abrupt (instead of an optimal factor of 3.75). Note that the different QM/MM force calculations are trivially parallel. With our semiempirical QM description, the individual force calculations are fast, and as a result, the overhead of the master process causes a loss in speedup. With more time-consuming force calculations (e.g., DFT), the scaling is linear with the number of cores. For all other models we use one core per simulation.

Table 1

Relative Timings Compared to Abrupt Model of PM6-DH+/REAXFF Simulations of Periodic Box Containing One Me2N–(CH2)3–CH=O Molecule and 913 Water Moleculesa

	Relative timing per step
Microsolvation	0.03
DASa	6.34
Abrupt	1
Buffered-force	1.12
FIRES	1.10

These simulations are run in parallel on four cores, whereas all other simulations are run in serial.

Results and Discussion

In Section , we first test the accuracy of our QM description (PM6-DH+) of the NCO molecule. We then benchmark the five different multiscale solvation models described in Section based on the properties of the NCO molecule in the solvent-sensitive closed state (Figure , right). In Section , we use the best explicit solvation model to compute an accurate free energy profile of the reversible nucleophilic addition of the amine to the carbonyl (N|···C+–O–). We compare the results with those from a microsolvation model, and we show that long-range interactions with explicit water molecules are needed to obtain a reliable free energy profile.

Structure of Closed State

As mentioned in the Introduction, solvation is crucial for the stability of the closed state of the NCO molecule (Figure , right) since the ionic C+–O– entity needs to be stabilized by solvation. In Section , we review the solvent sensitivity of the NCO molecule and test how well the solvated system is described at different levels of QM theory. In Section , we benchmark the multiscale solvation models on a range of properties of the closed state.

Test of QM Description

Using simple microsolvation models, we examine the effect of implicit and explicit solvations on the N–C bond at different levels of theory: Møller–Plesset perturbation theory to the second order (MP2), Density Functional Theory with the PBE-D3 functional, and the semiempirical PM6 functional with dispersion corrections (PM6-DH+). Without any solvation, the optimized N–C distances with MP2 (2.66 Å), PM6-DH+ (2.77 Å), and PBE-D3 (2.48 Å) do not correspond to a chemical bond (Table ). As expected, the introduction of explicit water reduces the N|···C distance considerably (Figure ). The PCM model for implicit solvation also stabilizes the N|···C bond, especially when the number of explicit water molecules equals zero or one. For all levels of theory, the optimized N|···C bond length converges (to roughly 1.6 Å) with three explicit water molecules in a PCM environment. Note that the implicit solvent model has a bigger effect with MP2 and PM6-DH+ than it does with PBE-D3.

Table 2

Effect of Implicit and Explicit Water on N|···CO Distance for Different Levels of Theory: MP2, DFT with PBE-D3 Functional, and Semiempirical PM6-DH+ Functionala

	gas phase			with PCM
n H₂O	MP2 (Å)	PM6-DH+ (Å)	PBE-D3 (Å)	MP2 (Å)	PM6-DH+ (Å)	PBE-D3 (Å)
0	2.66	2.77	2.48	1.71	1.64	2.06
1	2.50	1.72	2.30	1.65	1.63	1.79
3	1.64	1.65	1.98	1.59	1.61	1.65
4	1.64	1.67	1.73	1.59	1.61	1.65

We use the MP2, PBE-D3, and PM6-DH+ distances (in bold) with four explicit water and PCM as a reference here.

We use the MP2, PBE-D3, and PM6-DH+ distances (in bold) with four explicit water and PCM as a reference here. The different levels of theory do not greatly affect the spatial distribution of the coordinated water molecules. In all cases, the first three water molecules are directly hydrogen bonded to the solute oxygen, and the fourth water molecule forms a bridge between two of them. The average deviation ⟨ϵRMSD⟩ of all H2O (plus the C=O) coordinates from the reference MP2 coordinates is not large (PM6-DH+: 0.34 Å, PBE-D3:0.47 Å). Comparison of the geometries of the NCO molecule itself shows that in all cases the N|···C distance computed with PM6-DH+ is closer to the MP2 result than the distance computed with PBE-D3 (Table ). Upon microsolvation (nH2O = 4, PCM), PM6-DH+ reproduces both the N–C and C=O bonds very well (Table ), with very small root mean square deviations (N–C: 0.009 Å, C=O: 0.003 Å) compared to PBE-D3 (N–C: 0.028 Å, C=O: 0.004 Å). The deviation of the ring geometry, however, is considerably larger (PM6-DH+: 0.152, PBE-D3:0.025).

Table 3

Root Mean Square Deviation Values (ϵRMSD) of NCO Geometry Optimized in Presence of Four H2O and PCMa

	ϵ_RMSD (Å)
	N\|···C	C=O	backbone
PM6-DH+	0.009	0.003	0.152
PBE-D3	0.028	0.004	0.026

Left: Heavy atoms. Middle: Only the two atoms in the N–C bond. Right: Only the two atoms in the C=O bond.

Left: Heavy atoms. Middle: Only the two atoms in the N–C bond. Right: Only the two atoms in the C=O bond. The potential energy barriers for the N–C bond breaking reaction computed with PBE-D3 and PM6-DH+ also deviate from the MP2 reference value. Using the same molecular model as above (nH2O = 4, PCM), we find that the PBE-D3 barrier is lower than the MP2 reference (MP2:12.4 kcal/mol, PBE-D3:8.5 kcal/mol), while the PM6-DH+ barrier is higher than the MP2 reference with approximately the same amount (PM6-DH+: 16.5 kcal/mol). More information on the potential energy barriers can be found in the Supporting Information. Overall, the performances of PBE-D3 and PM6-DH+ are similar, and we can use the efficient PM6-DH+ description for the lengthy simulations in this work. The data in Table was used to guide the construction of the microsolvation model used in the remainder of this study (nH2O = 5, PCM).

Benchmark Solvation Models

Now that we established that the PM6-DH+ functional can be used to describe the N|···C interaction, we assess the performance of the five different solvation models discussed in Section (four explicit solvation and one microsolvation model). The assessment is based on three criteria: (i) the electronic structure of the NCO molecule, (ii) the local molecular structure near the NCO molecule, and (iii) the global structure of the solvent.

Electronic Structure

To address the effect of a finite-size QM region on the electronic structure of the solute, we compute the Mulliken charges on the oxygen and the nitrogen atom (qO and qN, respectively) for NCO–water clusters with an increasing number of water molecules around the two atoms (Figure ). The gas phase charges (thick lines, round markers) represent the electronic structure in mechanical embedding simulations. The PCM charges (thin line, square markers) represent the electronic structure in microsolvation simulations. In both cases the charges converge to the reference QM value (black line). In the case of the gas-phase clusters, the charges converge slowly with cluster size. If we define a convergence criterion of 0.01e (which is very small with respect to the standard deviation of 0.03e), then the error of the charge on the oxygen atom (ΔqO) is smaller than this criterion only for clusters of 19 water molecules or more. The PCM charges of the same clusters converge to the QM result much faster (ΔqO = −0.01e with only three water molecules), which supports our selected microsolvation model of five explicit water molecules.

Figure 3

Convergence of the Mulliken charges on the oxygen and nitrogen atom of the NCO molecule with increasing size of the water cluster. The charges are averaged over 100 geometries extracted from the fully QM simulation. The large dots represent atomic charges in 1000 structures from a simulation equilibrated in each solvation model, and the vertical lines represent the corresponding average number of QM water molecules. With increasing gas-phase cluster size, the charge on the oxygen atom becomes more negative, while at the same time the charge on the nitrogen atoms becomes more positive. This suggests that more charge is transferred from the nitrogen atom to the oxygen atom as the cluster grows. We thus (carefully) propose that the bond between the oxygen and the nitrogen atom increases in strength as the number of water molecules increases. Interestingly, in implicit solvent very small clusters overestimate the negative charge on the oxygen atom, as well (mostly) as the positive charge on the nitrogen atom. We propose that small microsolvated clusters transfer too many electrons from N to O and therefore overestimate the bond strength. In Figure , vertical lines represent the average number of QM water molecules after equilibration with the different solvation models. Based on the convergence behavior of the gas phase and PCM charges, only buffered-force and microsolvation include an adequate number of water molecules (buffered-force: n = 21, ΔqO < 0.01e), while the QM regions of the other models appear to be too small. However, we are interested in the description of the electron density during an MD simulation: This involves equilibration of the clusters with the respective solvation models and alters the geometries and the charges. The geometry changes generated by the microsolvation model amplify the already present overestimation of the charge transfer from N to O (ΔqO = −0.02e). Surprisingly, the buffered-force geometries invoke an even stronger deviation but in the opposite direction (ΔqO = +0.03e). The charges obtained with the three remaining QM/MM models exhibit more modest effects of the geometry change, coincidentally bringing them closer to the reference value. The charges produced with the DAS model reflect the reference charges extremely well, with the error well below the convergence criterion (ΔqO < 0.01e).

Geometry of the NCO Molecule

We also benchmark the equilibrated geometry of our model system in the closed state with respect to a reference simulation that treats all molecules QM. The average N|···C and C=O distances in the NCO molecule for the five different solvation models are extracted from the simulations (Table , left side), together with the standard deviations. The first thing to note is a correlation between the N|···C and the C=O distances (see Supporting Information), meaning that the stronger (shorter) the N|···C bond is, the weaker (longer) the C=O bond becomes. All the models yield average N|···C distances that correspond to a bonded N|···C pair and C=O distances that are about 8 to 9.5% longer than a typical C=O bond (e.g., acetaldehyde: 1.22 Å). That means that the CO bond is more polarized, and the negatively charged oxygen atom needs to be stabilized by the solvent.[42] The average distances are a bit larger than our reference results with the static model (1.61 Å) due to anharmonic thermal bond elongation. The abrupt solvation model yields the largest deviation of the N|···C bond (1.66 Å versus 1.64 Å). In our setup, this model has a smaller effective QM region than DAS and buffered-force, and we expect that it is the lack of electrostatic interaction with a second solvation shell that slightly destabilizes the N|···C bond. The microsolvation model yields a slightly shorter N–C bond (1.63 Å) than the reference and a slightly longer C=O bond (1.34 Å). This result is in agreement with our earlier observation that microsolvation appears to overestimate the strength of the bond. The root mean square deviations (ϵRMSD) of all the heavy atoms in the NCO molecule from the reference (Table , center) strikingly yield very small values with abrupt and FIRES (0.04 Å), while the accuracies of the remaining three models are comparable (0.13 Å). Taking into account the N–C distances, the C=O distances, and the ϵRMSD values, FIRES yields the best NCO molecular structure out of all the models. DAS and buffered-force provide high accuracy particularly in the N|···C=O region. The abrupt model appears to underestimate the N–C interaction, and the microsolvation model may overestimate the strength of this bond.

Table 4

N|···C and C=O Distances (with standard deviation) in NCO Molecule (left), Root Mean Square Deviation (ϵRMSD) of Average Geometry of Backbone Atoms from Average Reference Structure (center), and Average Number of Hydrogen Bonds toward Oxygen of Aldehyde Group with Standard Deviation (right) for Different Solvation Models Studied in This Work

	N\|···C distance (Å)	C=O distance (Å)	ϵ_RMSD (Å)	Average number of hydrogen bonds
Reference	1.64 ± 0.05	1.33 ± 0.03	0	2.76 ± 0.43
DAS	1.64 ± 0.05	1.33 ± 0.03	0.13	2.95 ± 0.47
Abrupt	1.66 ± 0.05	1.32 ± 0.03	0.04	3.07 ± 0.56
Buffered-force	1.63 ± 0.05	1.33 ± 0.03	0.13	3.13 ± 0.68
FIRES	1.64 ± 0.05	1.32 ± 0.03	0.04	2.75 ± 0.70
Microsolvation	1.63 ± 0.05	1.34 ± 0.03	0.12	3.01 ± 0.41

Local Solvent Structure

The first solvation shell contributes approximately three hydrogen bonds to the oxygen of the NCO molecule (Table , right), but there is quite some fluctuation between the models. Interestingly, these subtleties in the short-range solvation around the NCO molecule do not appear to affect the average N|···C bond length. Abrupt and buffered-force yield structures with slightly more hydrogen bonds directed toward the C=O oxygen than the other models. In Figure , the radial distribution g(r) of oxygen atoms of water (O) around the oxygen of the NCO molecule (O*) is depicted. On the left, the results for the three adaptive QM/MM models are shown, and the positions of the active, transition, and environment region are indicated. The first solvation shell computed with abrupt and buffered-force models is less sharp than that of the DAS model. The latter is in almost perfect agreement with the reference. This agreement can be related to the Langevin friction term, which can be relaxed when using the DAS model. The friction terms in abrupt and buffered-force are 1 order of magnitude larger to correct for the local heating near the QM/MM boundary. In order to keep a Boltzmann distribution of the velocities during a simulation, the Langevin thermostat counteracts the imposed friction with a proportional stochastic term. This stochastic part of the thermostat causes more randomized geometries, hence broadening the first solvation shell. The FIRES model can also afford a small friction constant, and indeed, the first peak in the g(r) obtained with the FIRES model (Figure , right) is nearly as accurate as the DAS result. The first solvation shell in the g(r) obtained with the microsolvation model also matches quite well with that of the reference. It should be noted that this agreement is highly sensitive to the position of the wall, which was selected based on a priori knowledge of the solvent structure. Combining all results from Table and Figure , the DAS and FIRES solvation models describe the first solvation shell very accurately.

Figure 4

Radial distribution function (g(r)) from the oxygen of the NCO molecule to the oxygens atoms of the water molecules for the three adaptive QM/MM models, DAS, abrupt, and buffered-force (left), and the two restrictive models, FIRES and microsolvation (right). The reference is a full QM (PM6-DH+) simulation of a 14.2 Å cubic box containing the NCO molecule and 87 water molecules. The position of the QM and MM boundaries are indicated (left), and for the FIRES method, the position of the spherical wall is indicated. The average (and median) of the wall is located at 4.245 Å, and the shaded rectangle indicates 90% of the wall distances.

Global Solvent Structure

Beyond the first solvation shell, the O*–O radial distribution obtained with buffered-force severely deviates from the other models. This deviation is caused by the violation of total momentum in our system. This results in unidirectional forces on molecules on both sides of the boundary (as opposed to opposite forces for the other adaptive QM/MM models). These forces push both molecules out/away from the A-region, causing a small depletion near the A/T boundary and an aggregation of molecules just outside the buffer region. This can be better understood if we consider a QM molecule, denoted by 1, close to the A/T boundary and a second molecule, denoted by 2, in the T-region (Figure ). The equilibrium distance between water molecules is smaller with the QM description than with the MM description.[31] With buffered-force, the force exerted on molecule 1 by molecule 2 is described quantum mechanically (partition 1, Figure , solid arrow), but the exertion on molecule 2 by molecule 1 is described by the MM method (partition 2, solid arrow). This means that molecule 1 is attracted to molecule 2, but molecule 2 does not feel this attraction to molecule 1. As a result, the QM molecule moves closer to the MM molecule (in other words closer to the boundary), thereby pushing molecule 2 further away. Therefore, both molecules experience a force away from the QM center of the simulation (Figure , solid arrows). The net effect is a shift to a new equilibrium with low density at the A/T boundary, causing the small depletion of the QM region and an aggregation just outside the T-region. The diminished number of QM water molecules in the second solvation shell is something that buffered-force has in common with abrupt (abrupt does have water molecules in the second solvation shell, but they are all MM). This common feature may explain why the coordination of the first solvation shell water molecules to the C=O oxygen is too large with these models (Table ).

Figure 5

Schematic representation of the two different partitions for which buffered-force calculates the forces each time step. QM molecules are depicted as ball and stick and MM molecules as thick lines. Background colors are used to indicate the different regions: orange (A-region), yellow (T-region), and white (environment). The solvent structure beyond the first solvation shell obtained with all the adaptive QM/MM models is less defined than the reference structure. We attribute this to the difference in interaction between the QM and MM solvent molecules, which leads to a loss of structure near the QM/MM boundary. It can be seen in Figure (right) that both restrictive models yield a more defined solvent structure beyond the first solvation peak. The microsolvation model has no explicit solvent beyond a 4 Å radius and can therefore not properly describe it. The g(r) obtained with the FIRES model matches the reference very well, apart from a small aggregation of molecules near the position of the wall. This aggregation is a known consequence of FIRES, but in this instance, it is not dramatic because the position of the wall coincides roughly with this second solvation peak. The aggregation is an entropic effect of the penetration of MM water molecules into the QM region, which is a consequence of the soft (elastic) wall that is used.[45] Overall, global water structure in a QM/MM simulation will always differ from a QM reference due to the inherent difference between a QM and an MM description. More details on this can be found in the Supporting Information in the discussion of the g(r) of oxygen around the nitrogen atom of the NCO molecule. Considering all the data presented above, the global water structure is best described by FIRES, with DAS following as a close second.

Summary

Out of the four models to describe explicit solvation, two models are less accurate than the others; the abrupt model describes the structure of the closed state of the NCO molecule less accurately than all other models, and buffered-force yields deviations in the O*–O g(r) caused by a violation of Newton’s third law. Since the buffered-force model describes the closed state NCO structure accurately, we cannot ascertain whether it can properly simulate the N|···C+–O– bond breaking process. To be on the safe side, however, we eliminate this model. This leaves two models, DAS and FIRES, that describe the solvated closed N|···C state of the NCO molecule accurately. The microsolvation model may slightly overestimate the strength of the N|···C bond, but the observed effects on the charges and bond lengths are small. This leaves a pressing question: Is the interaction with explicit water molecules across long distances necessary to obtain accurate reaction energies for the N|···C+–O– bond? In the next section we compute the reaction free energy profile with one of the two well-performing explicit solvation models assessed in this section and compare it to the profile obtained in the same manner with microsolvation. We selected DAS as the representative explicit solvation model.

The N|···C+–O– Bond Breaking Process

Metadynamics[69−72] is used to simulate the rare event of opening and closing the N|···C bond. In Figure , the N|···C distance (in red) and a running average (over 100 time steps) of the number of hydrogen bonds toward the oxygen of the NCO molecule (in blue) are shown for one typical metadynamics simulation with both the DAS and microsolvation model. It can be seen that when the N|···C distance is in the bonded regime, there are two to three hydrogen bonds toward the oxygen of the NCO molecule, and there is a negative correlation (ρ = −0.76, see Supporting Information for the expression used) between the N|···C bond length and the number of hydrogen bonds. This means that when the N|···C distance increases, the number of hydrogen bonds decreases at the same time. This value is approximately 1 when the N|···C bond length is larger than 3 Å.

Figure 6

N|···C distance and average number of hydrogen bonds toward the oxygen of the aldehyde group of the NCO molecule of a typical metadynamics run for the DAS and microsolvation models.

Free Energy Barrier

DAS yields a barrier of 10.7 ± 0.8 kcal/mol, whereas the microsolvation model yields a barrier of 12.6 ± 0.3 kcal/mol (Figure , Table ). The difference in barrier height is in line with the observation in Section that the microsolvation model appears to overestimate the N|···C interaction. If this electronic effect is indeed the cause of the free energy difference, then the difference must also be present in the potential energy contribution to the barrier. As a rough estimate of the potential energy barriers, we extracted the average potential energy value associated with the structures that correspond to the transition state and compared it to the average value extracted for the closed state structures. We found that the average potential energy barrier is 1.90 kcal/mol higher with the microsolvation model than with adaptive QM/MM (see Supporting Information for absolute values). This is approximately equal to the difference in free energy barriers (1.86 kcal/mol), supporting our assumption that the observed difference in free energy barriers comes from the bond potential energy.

Figure 7

Schematic representation of the free energy profile of the N|···C+–O– bond breaking reaction. Typical snapshots of a DAS simulation are shown for the closed, transition, and open state of the NCO molecule.

Table 5

Barrier Height (ΔF‡) of Breaking of N|···C Bond and Free Energy Difference (ΔF) of N|···C+–O– Interaction with DAS and Microsolvation Modelsa

	ΔF^‡ [kcal/mol]	s_mean of ΔF^‡[kcal/mol]	ΔF [kcal/mol]	s_mean of ΔF [kcal/mol]

DAS	10.71	0.78	5.94	0.51
Microsolvation	12.57	0.27	9.58	0.32

Also the standard error of the mean is tabulated:

Also the standard error of the mean is tabulated: Schematic representation of the free energy profile of the N|···C+–O– bond breaking reaction. Typical snapshots of a DAS simulation are shown for the closed, transition, and open state of the NCO molecule.

Free Energy Difference

The computed free energy difference Δ between the open and the closed states (Table ) obtained with the microsolvation model (9.6 ± 0.3 kcal/mol) agrees well with the previously obtained (static) binding energies of ref (42). In explicit solvent, the binding energy of 5.9 ± 0.5 kcal/mol is about 40% weaker than the previously obtained value,[42] but it is still about 20% stronger than an average hydrogen bond in water. The error in the average value of ΔG obtained with the microsolvation model (Table ) is smaller than that of the DAS model (0.32 vs 0.51 kcal/mol), but this is mainly caused by the improved statistics. Out of the 10 metadynamics simulations, only 7 trajectories did recross for DAS after 250,000 steps. We disregarded the three nonrecrossed trajectories, which may mean that the open state for DAS is even more stable than our results indicate. In contrast to the barrier heights, the free energies difference is much smaller with the DAS model than with the microsolvation model. Part of this effect is again due to the energetic overstabilization of the closed state by the microsolvation model (∼2 kcal/mol). Examination of the trajectores also reveals an energetic destabilization of the open state in the microsolvation model. We observed an undercoordination of the nitrogen atom in the NCO molecule compared to the DAS result (Table ). In the open state, the number of N-coordinated water computed with DAS equals 0.78, while with the microsolvation model only 0.59 water molecules are coordinated (Table ). This problem is unique to the microsolvation model since the single wall around the oxygen atom of the NCO molecule prevents water from coordinating to the nitrogen atom in the open state. The single wall around the oxygen atom of the NCO molecule prevents water from coordinating to the nitrogen atom in the open state. On the other hand, the dual-sphere approach adopted in the DAS simulations allows QM solvation of both the oxygen and the nitrogen atoms throughout the reaction process. The average potential energy obtained with the microsolvation model for frames where one water molecule is hydrogen bonded to nitrogen is 5.5 kcal/mol lower than for frames without such a water. The average NCO geometries are slightly different for frames with and without a hydrogen-bonded water (average N–C distance 4.75 and 4.91 Å, respectively), but we feel confident that we can attribute the larger part of the energetic difference to the difference in hydrogen bonds. We can now quantify the relative destabilization of the microsolvated open state by this coordination effect. On average, the number of hydrogen-bonded water molecules to nitrogen is only 0.19 lower for the microsolvation model compared with DAS, indicating that the energetic destabilization is around 19% of the value of the hydrogen bond (5.5 kcal/mol). We then estimate the destabilization by the microsolvation model to be a little over 1 kcal/mol. In combination with the ∼2 kcal/mol overstabilization of the closed state, this adequately explains the observed total discrepancy from the DAS result of 3.64 kcal/mol.

Table 6

Average Number of Hydrogen-Bonded Water to Nitrogen Atom in Closed, Transition, and Open States of NCO Molecule for DAS and Microsolvation Modela

	H-bonds to N
	DAS	Microsolvation
Closed state (<2.25 Å)	0.00	0.00
Transition state (2.25 ≤ 3.00 Å)	0.05	0.07
Open state (>3.00 Å)	0.78	0.59

Closed state: N|···C < 2.25 Å. Open state: N|···C > 3.00 Å. Transition State: 2.25 Å < N|···C < 3.00 Å. Only frames from a completed opening or closing event are included.

Summary

In summary, the majority of the difference between the free energy profiles obtained with the DAS and the microsolvation model is caused by an overstabilization of the N–C bond by the polarizable medium of the microsolvation model. An additional difference stems from the nonadaptive nature of the microsolvation model, which cannot adjust to changing solvation preferences. An important advantage of adaptive QM/MM models is therefore that they can provide water molecules for coordination in any desired location. The explicit solvation provided by these models thus affects the quality of the solute description along the reaction path in an indirect manner by providing a reservoir of water molecules that can be incorporated in the first solvation shell. We can extrapolate our findings to intermediate states in nucleophilic substitution reactions featuring very polarized transition states. Water molecules coordinating the lone pair of the nucleophile need to be removed, which is energetically unfavorable. The reaction can be made more favorable if the nucleophilic moiety is located in a solvent-free environment. Such a situation can conceivably be achieved in the active site of a protein or in a well-designed molecular catalyst.

Summary and Conclusion

We compare the performance of five different multiscale models to accurately describe the effect of explicit solvation on a sensitive probe molecule that mimics the transition state of a nucleophilic substitution reaction. The explicit solvation models DAS, abrupt, buffered-force, and FIRES employ two spherical QM regions, encompassing the two reactive moieties in the Me2N–(CH2)3–CH=O (NCO) molecule, while the microsolvation model confines its five QM water molecules near the C=O moiety at all times. Among the adaptive models, the abrupt model does not describe the structure of the closed NCO molecule as accurately as the other models, most likely due to the effectively small QM region. Buffered-force yields a discrepancy in the solvent structure caused by a violation of Newton’s third law. In addition, abrupt and buffered-force require strong thermostats, which affect the structure of the first solvation shell. DAS yields excellent agreement with our reference simulation for the structure of the closed N|···C state. FIRES (a restrictive QM/MM model) also provides a good description of the geometry of NCO in the closed state, and the solvent structure is in excellent agreement with the reference. The microsolvation model appears to overestimate the interaction in the N|···C bond. Metadynamics simulations with the DAS and microsolvation models reveal that the free energy barriers for the N|···C bond opening, as well as the free energy differences between the closed and the open state, are affected differently by the two models. The dual-sphere explicit solvation of DAS allows a superior description of solvent rearrangement along the entire reaction path. This yields a binding free energy of 6 kcal/mol, which is about 40% lower than the binding energy obtained with a microsolvation model. This value is still roughly 20% stronger than an average hydrogen bond.

36 in total

1. Generalized Gradient Approximation Made Simple.

Authors:
Journal: Phys Rev Lett Date: 1996-10-28 Impact factor: 9.161

2. Escaping free-energy minima.

Authors: Alessandro Laio; Michele Parrinello
Journal: Proc Natl Acad Sci U S A Date: 2002-09-23 Impact factor: 11.205

3. Conservative Algorithm for an Adaptive Change of Resolution in Mixed Atomistic/Coarse-Grained Multiscale Simulations.

Authors: Andreas Heyden; Donald G Truhlar
Journal: J Chem Theory Comput Date: 2008-02 Impact factor: 6.006

4. Assessing the accuracy of metadynamics.

Authors: Alessandro Laio; Antonio Rodriguez-Fortea; Francesco Luigi Gervasio; Matteo Ceccarelli; Michele Parrinello
Journal: J Phys Chem B Date: 2005-04-14 Impact factor: 2.991

5. S(N)2 Mechanism for Alcoholysis, Aminolysis, and Hydrolysis of Acetyl Chloride.

Authors: T. William Bentley; Gareth Llewellyn; J. Anthony McAlister
Journal: J Org Chem Date: 1996-11-01 Impact factor: 4.354

6. Hamiltonian adaptive resolution simulation for molecular liquids.

Authors: Raffaello Potestio; Sebastian Fritsch; Pep Español; Rafael Delgado-Buscalioni; Kurt Kremer; Ralf Everaers; Davide Donadio
Journal: Phys Rev Lett Date: 2013-03-05 Impact factor: 9.161

7. Toward Hamiltonian Adaptive QM/MM: Accurate Solvent Structures Using Many-Body Potentials.

Authors: Jelle M Boereboom; Raffaello Potestio; Davide Donadio; Rosa E Bulo
Journal: J Chem Theory Comput Date: 2016-07-11 Impact factor: 6.006

8. Development of a ReaxFF reactive force field for glycine and application to solvent effect and tautomerization.

Authors: Obaidur Rahaman; Adri C T van Duin; William A Goddard; Douglas J Doren
Journal: J Phys Chem B Date: 2010-12-17 Impact factor: 2.991

9. Multiscale Modeling of Chemistry in Water: Are We There Yet?

Authors: Rosa E Bulo; Carine Michel; Paul Fleurat-Lessard; Philippe Sautet
Journal: J Chem Theory Comput Date: 2013-11-06 Impact factor: 6.006

10. An unusual functional group interaction and its potential to reproduce steric and electrostatic features of the transition states of peptidolysis.

Authors: Arnaud Gautier; Delphine Pitrat; Jens Hasserodt
Journal: Bioorg Med Chem Date: 2006-02-07 Impact factor: 3.641

7 in total

1. Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM.

Authors: Jia-Ning Wang; Wei Liu; Pengfei Li; Yan Mo; Wenxin Hu; Jun Zheng; Xiaoliang Pan; Yihan Shao; Ye Mei
Journal: J Chem Theory Comput Date: 2021-02-16 Impact factor: 6.006

Explicit Solvation Matters: Performance of QM/MM Solvation Models in Nucleophilic Addition.

Introduction

Methodology

Multiscale Solvation

Computational Details

Simulation Timings

Results and Discussion

Structure of Closed State

Test of QM Description

Benchmark Solvation Models

Electronic Structure

Geometry of the NCO Molecule

Local Solvent Structure

Global Solvent Structure

Summary

The N|···C+–O– Bond Breaking Process

Free Energy Barrier

Free Energy Difference

Summary

Summary and Conclusion

1. Generalized Gradient Approximation Made Simple.

2. Escaping free-energy minima.

3. Conservative Algorithm for an Adaptive Change of Resolution in Mixed Atomistic/Coarse-Grained Multiscale Simulations.

4. Assessing the accuracy of metadynamics.

5. S(N)2 Mechanism for Alcoholysis, Aminolysis, and Hydrolysis of Acetyl Chloride.

6. Hamiltonian adaptive resolution simulation for molecular liquids.

7. Toward Hamiltonian Adaptive QM/MM: Accurate Solvent Structures Using Many-Body Potentials.

8. Development of a ReaxFF reactive force field for glycine and application to solvent effect and tautomerization.

9. Multiscale Modeling of Chemistry in Water: Are We There Yet?

10. An unusual functional group interaction and its potential to reproduce steric and electrostatic features of the transition states of peptidolysis.

1. Accelerated Computation of Free Energy Profile at Ab Initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semiempirical Reference Potential. 4. Adaptive QM/MM.

2. How accurate are approximate quantum chemical methods at modelling solute-solvent interactions in solvated clusters?

3. A Many-Body, Fully Polarizable Approach to QM/MM Simulations.

4. Adaptive-Partitioning Multilayer Dynamics Simulations: 1. On-the-Fly Switch between Two Quantum Levels of Theory.

5. Adaptive QM/MM for Molecular Dynamics Simulations: 5. On the Energy-Conserved Permuted Adaptive-Partitioning Schemes.

6. How solvent determines the molecular reactive conformation and the selectivity: Solvation spheres and energy.

7. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems.