Literature DB >> 25321186

Multiple environment single system quantum mechanical/molecular mechanical (MESS-QM/MM) calculations. 1. Estimation of polarization energies.

Alexander J Sodt¹, Ye Mei, Gerhard König, Peng Tao, Ryan P Steele, Bernard R Brooks, Yihan Shao.

Abstract

In combined quantum mechanical/molecular mechanical (QM/MM) free energy calculations, it is often advantageous to have a frozen geometry for the quantum mechanical (QM) region. For such multiple-environment single-system (MESS) cases, two schemes are proposed here for estimating the polarization energy: the first scheme, termed MESS-E, involves a Roothaan step extrapolation of the self-consistent field (SCF) energy; whereas the other scheme, termed MESS-H, employs a Newton-Raphson correction using an approximate inverse electronic Hessian of the QM region (which is constructed only once). Both schemes are extremely efficient, because the expensive Fock updates and SCF iterations in standard QM/MM calculations are completely avoided at each configuration. They produce reasonably accurate QM/MM polarization energies: MESS-E can predict the polarization energy within 0.25 kcal/mol in terms of the mean signed error for two of our test cases, solvated methanol and solvated β-alanine, using the M06-2X or ωB97X-D functionals; MESS-H can reproduce the polarization energy within 0.2 kcal/mol for these two cases and for the oxyluciferin-luciferase complex, if the approximate inverse electronic Hessians are constructed with sufficient accuracy.

Entities: Chemical Disease

Mesh：

Substances：
beta-Alanine
Methanol

Year: 2014 PMID： 25321186 PMCID： PMC4353191 DOI： 10.1021/jp5072296

Source DB: PubMed Journal: J Phys Chem A ISSN： 1089-5639 Impact factor: 2.781

Introduction

In the last two decades, combined quantum mechanical molecular mechanical (QM/MM) calculations have become increasingly popular in the computational study of molecular solvation, catalytic or enzymatic reactions,[1−7] and ligand binding.[8,9] In these calculations, typically the solute, the reactive region, or the active site is studied by quantum mechanics (QM) methods, whereas the environment (solvent molecules or the non-QM portion of the macromolecule) is described with molecular mechanics (MM) force fields. In pure QM calculations, it is often sufficient to locate a few stationary points (local minima and saddle points) on the potential energy surface.[10,11] When it comes to QM/MM potential energy surfaces, such stationary points can still provide some useful insights into the reaction mechanisms or into the chemical nature of ligand–receptor binding. But only a small number of degrees of freedom (that are predominantly associated with QM atoms) connect these stationary points to each other. At a finite temperature, many other degrees of freedom are also accessible and can thus have non-negligible contributions to the free energy of interest, such as the solvation free energy, reaction free energy, or binding free energy. So it is essential to adequately sample all the relevant degrees of freedom. The sampling of these QM and MM degrees of freedom can be achieved via QM/MM molecular dynamics simulations, which in practice fall into at least two general categories. In the first category, one performs conventional simulations using QM/MM forces throughout the trajectory, meaning that the QM/MM energy and gradient have to be computed at every geometry step. Though such QM/MM MD simulations have proven to yield higher accuracy than classical MM simulations, they are, unfortunately, also orders of magnitude more expensive. For the three test cases in this work, for example, a single QM/MM MD step (using the B3LYP/6-31+G* level of theory for the QM region) would take 50 times longer than a MM step for solvated methanol, 400 times longer for solvated β-alanine, and 4500 times longer for oxyluciferin in its luciferase complex. For many applications, where an even larger QM region and/or a more sophisticated QM method are desired, a conventional QM/MM MD simulation remains quite expensive, even after much recent and continuing work on algorithmic developments to speed up QM calculations.[12−15] In the second category, one performs “indirect” QM/MM simulations.[16−22] In these simulations, the conformational space is sampled using a MM force field (and thus with a higher computational speed). Then a subset of configurations along the MM trajectory are subjected to QM/MM energy/gradient evaluations, to account for the free energy differences between QM/MM and MM surfaces. This strategy works best if the QM/MM configurations are well represented by the MM ensemble, which requires considerable “overlap” between the QM/MM potential energy surface and the MM potential energy surface. In cases where the overlap is insufficient, the free energy can either fail to converge or converge to incorrect values. To alleviate this problem, several groups have proposed “freezing” some or all internal degrees of freedom within the QM region[16−18,23,24] during the MM simulations. Such freezing of QM degrees of freedom has shown to improve the convergence of free energy calculations, but it might also lose accuracy due to the omission of any entropic contributions in the QM region.[25,26] This deficiency can potentially be mitigated with the introduction of three separate energy corrections: (a) an enthalpic correction which corresponds to calculating the shift of the energy minimum due to the constraints on the QM degrees of freedom; (b) a vibrational entropic correction based on a subsystem Hessian[27] (the same Hessian as in the enthalpic term); (c) a Jacobian term. These energy corrections are discussed in detail in separate publications.[22,28] In addition, systems with multiple rotational states can be addressed using the technique developed by Straatsma and McCammon.[29] In this work, we shall focus on “indirect” QM/MM simulations where all internal degrees of freedom in the QM region are frozen in the MM simulations. In other words, the QM region simply becomes a rigid body. The subset of configurations coming out of this MM trajectory share a single QM geometry whereas the MM environment can be quite different. The main task then is to compute the QM/MM energy for such “multiple environment single system” (MESS) configurations. We shall show in this work that, when the QM method in use is Kohn–Sham density functional theory (DFT),[30−35] it is possible to estimate such MESS-QM/MM energies with a small computational cost and a reasonably high accuracy. This article is structured as follows. In section 2, the underlying theory is formulated. The MESS-QM/MM energy is divided into three terms: (a) a gas-phase energy; (b) a first-order term, which describes the interaction of the gas-phase electron density and the MM electrostatic potential; (c) the QM/MM polarization energy, which corresponds to the second-order and higher terms. In the same section, two different schemes will be proposed for estimating the QM/MM polarization energy: MESS-E, which is based on a Roothaan step extrapolation, and MESS-H, which is based on a Newton–Raphson energy correction using an inverse electronic Hessian in a subspace representation. In section III, technical details will be provided about the implementation of these two schemes for MESS-QM/MM polarization energy estimation and about our test systems. In section IV, the results will be presented and discussed. Concluding remarking are made in section V.

THEORY

QM/MM Polarization Energy

Gas-Phase Electronic Structure

Let us consider a QM system that consists of NQM atoms and is subjected to an external potential v(r). Within the Kohn–Sham density functional theory using pure, hybrid or range-separated functionals, the total energy of such a system is given by a functional of the one-particle density matrixFor pure functionals, this energy functional involves only the diagonal part of the one particle density matrix, namely the electron density ρ(r). Hybrid functionals and range-separated functionals involve the off-diagonal elements as well through the Hartree–Fock exchange energy contributions. From a set of orthonormal Kohn–Sham molecular orbitals (MO), ψ, which are often approximated as linear combinations of atom-centered basis functions, ϕμ,one can write the one-particle density matrix asHere Pμν isand is often also referred to as the density matrix. In a DFT calculation, one optimizes the MOs in eq 2 via a self-consistent field (SCF) iterative procedure to obtain the lowest energy for the system in eq 1. Within each iteration, from the current set of orthonormal MOs, one rotates the orbitalswhere the unoccupied MOs, ψ, are mixed into the occupied ones, ψ. The actual free variables in the energy functional in eq 1 are thus the orbital rotations,The iterative procedure continues until it reaches a set of optimized MOs that make the energy stationary with respect to the orbital rotations In the gas phase, the external potential v(r) only includes the nuclear attraction,which is summed over all nuclei, A, with charges Z. This potential can be represented in the atomic basisThe MO coefficients optimized under this gas phase external electrostatic potential will be denoted by C(0), and the corresponding gas-phase density matrix byand the Fock matrix, Fμν = ∂E/∂Pμν, by F(0). The gas-phase Kohn–Sham energy isand the gas-phase electronic density is

MM Perturbation

In QM/MM calculations, the external potential includes both the nuclear attraction, v(0)(r) in eq 9, and an additional electrostatic potential due to a set of NMM MM atoms,In the simplest cases (as in the examples in section IV), where the MM atoms are represented by point charges, {q, B = 1, ..., NMM}, the additional potential isIts matrix representation is This external electrostatic potential from the MM atoms perturbs the energy of the system from E(0), its gas-phase value:where the first-order change is a simple sum of electronic and nuclear terms:where the minus signs reflect the negative charge from electrons. To compute this first-order energy change, the dominant computational cost comes from the evaluation of δhin eq 16. The second-order change, δE(2), plus all higher order terms is the QM/MM polarization energy. It is directly caused by the polarization of the Kohn–Sham orbitals (and therefore the one-particle density matrix) by the MM electrostatic potential. Essentially, the Kohn–Sham orbitals will relax within the additional MM electrostatic potential in eq 15, stabilize the system, and lower the energy. In the next two subsections, we will describe two approximate schemes, MESS-E and MESS-H, for estimating δE(2), the second-order term of the QM/MM polarization energy.

Scheme MESS-E: Roothaan Step Extrapolation

Similar to the dual-basis projection technique for DFT and MP2 calculations,[36−38] the three extrapolations in density functional triple jumping,[39] and the Roothaan step correction in the absolutely localized molecule orbital (ALMO) based energy decomposition analysis,[40,41] one Roothaan step[42] is taken in the MESS-E scheme. Here we write the perturbed Fock matrix aswhere F(0) is the gas-phase Fock matrix, and δh is the atomic basis representation of the electrostatic potential due to MM atoms (see eq 16). In a Roothaan step, the perturbed Fock matrix is diagonalizedwhere S is the overlap matrixFrom the eigenvectors, which are the new set of MO coefficients, C(1), one constructs a new density matrix Then one can approximate the QM/MM polarization energy as a Roothaan step extrapolationwhere δP(1) is the projected response in the density matrixThe leading contributions to the MESS-E energy, eq 23, are clearly second-order: in the F(0)·δP(1) term, the gas-phase Fock matrix, F(0), can only couple to the occupied–occupied (oo) and virtual–virtual (vv) blocks of δP(1), both of which are quadratic in orbital response and thus in δh. In the δh·δP(1) term, δh couples to the occupied-virtual (ov) and virtual-occupied (vo) blocks of δP(1), both of which are first order in orbital response (and thus in δh). With the F(0)·δP(1) term, one removes a fractional number of electrons from occupied Kohn–Sham orbitals and puts them into (higher-energy) virtual orbitals. As a result, this term is always positive. To achieve a net stabilization of the QM system within the MM potential, the δh·δP(1) term must be negative and have a larger absolute value. In our test cases, δh·δP(1) was found to be roughly twice as large as F(0)·δP(1) (with an opposite sign). It should be noted that, in the computation of this second-order energy change for each MM environment, one avoids the expensive Fock builds and only performs a diagonalization of the perturbed Fock matrix (eq 20) and a few simple matrix operations (eqs 19, 22 and 23).

Scheme MESS-H: Newton–Raphson Correction with an Approximate Hessian

As shown in eq 8, the gradient of the gas-phase Kohn–Sham energy with respect to orbital rotations vanishes at the set of optimized gas-phase Kohn–Sham orbitals. When the MM electrostatic potential in eq 15 is applied, the gradient is no longer zero:where δh is the MO representation of the MM potential If the electronic Hessian of the gas-phase energy, i.e., the set of second derivatives with respect to orbital rotations, H(0), is readily available, one can use its inverse, H(0),–1, to approximate the QM/MM polarization energy with a Newton–Raphson energy correctionwhich corresponds to a shift of the energy minimum with the following orbital response due to the external MM potential,Here one only needs to compute the gas-phase electronic Hessian once and then store its inverse in memory or disk. In subsequent MESS-H calculations, for each MM environment, one simply reads in the inverse Hessian back from the storage and performs two simple matrix operations (eqs26and27). We note that the inverse electronic Hessian is closely related to the linear response kernel in DFT, (δ2E)/(δν(r) δν(r′)), which has been studied analytically and numerically.[43−54] One can and should avoid computing the exact gas-phase electronic Hessian, H(0), for three practical considerations: (a) it might be too expensive to compute the exact Hessian for systems with a large number of QM atoms and/or a large basis set, (b) it might be unnecessary to compute the exact value of the second-order term whereas higher-order terms are completely neglected, and most importantly, (c) the commonly used stability analysis,[55]which usually involves using Davidson’s iterative subspace method[56,57]to solve for the lowest eigenvalues of the electronic Hessian, provides a practical way to approximate the gas-phase electronic Hessian and therefore its inverse within a subspace representation. Namely, if the Davidson procedure yields the lowest m eigenvectors, HU = Uλ, we can approximate the inverse Hessian as ∑(λ)−1UUT. Of course, this is based on the positive definiteness of a gas-phase electronic Hessian (assuming that the gas-phase Kohn–Sham energy is indeed optimized to its minimum), implying smaller and smaller contributions to the inverse Hessian from the eigenvectors as their eigenvalues increase.

COMPUTATIONAL DETAILS

The two schemes for estimating MESS QM/MM polarization energies, MESS-E and MESS-H, are implemented within the Q-Chem/CHARMM interface,[58] using version c38b2 of CHARMM[59] and a development version of Q-Chem 4.2.[60,61] Three test cases employed in this work are shown in Figure 1:For the first two test cases, the solute molecules (methanol or β-alanine) are described with DFT methods (B3LYP,[65−67] M06-2X,[68] and ωB97X-D[69] functionals with 6-31+G* and 6-311++G** basis sets[70,71]) and water molecules with TIP3P model.[72] Constant-volume rigid body molecular dynamics simulations are performed for both systems at 298 K for 100 ps for an initial equilibration, and then for 1 ns during which configurations are collected at 1 ps intervals, leading to 1000 configurations for both systems.

Figure 1

Three test systems in this work: (a) methanol and (b) β-alanine, both solvated in a cubic box of classical water molecules, and (c) oxyluciferin in its complex with luciferase. In the luciferase binding pocket, there is also an AMP molecule.

methanol solvated in a cubic box with 1000 water molecules β-alanine solvated in a cubic box with 1000 water molecules the oxyluciferin–luciferase complex[62−64] Three test systems in this work: (a) methanol and (b) β-alanine, both solvated in a cubic box of classical water molecules, and (c) oxyluciferin in its complex with luciferase. In the luciferase binding pocket, there is also an AMP molecule. For the oxyluciferin–luciferase complex, we started with the 2D1R protein data bank structure,[62] which includes 540 protein residues, the oxyluciferin substrate, and an AMP molecule also in the binding pocket. 626 water molecules are included to solvate the protein surface. The oxyluciferin substrate is also described with DFT methods, whereas the protein residues and AMP (10 307 atoms in total) are described with CHARMM c22 protein force field and CGenFF force field.[73] Constant-volume rigid body molecular dynamics simulations are performed for both systems at 298 K for 10 ps for an initial equilibration, and then for 100 ps. With one configuration collected every 1 ps, this leads to 100 configurations in total. In all these calculations, the core region (methanol, β-alanine, or oxyluciferin) is constrained as a rigid body through the “CONS FIX” command in CHARMM, or can alternatively be enforced through the “SHAPE” command.[74] As a result, the core region retains the same geometry for all 100 or 1000 configurations used in subsequent MESS-E or MESS-H polarization energy estimations. A “MESS” option is added to the Q-Chem interface in CHARMM, where “NROOTS n” is used to specify how many lowest eigenvalues/eigenvectors of the stability matrix (i.e., electronic Hessian) to be computed and used for constructing its inverse. The inverse electronic Hessian is constructed only once for the gas-phase density and saved to disk at the starting of the MESS-QM/MM calculation. For each configuration, we evaluate the MM potential for the QM atoms (eq 16), and then compute MESS-E and MESS-H estimations (the latter uses the inverse Hessian read in from disk) for the QM/MM polarization energy. To assess the accuracy of MESS-E and MESS-H energies, we also perform SCF calculations to fully converge DFT energies for each configuration. This is done only for the purpose of comparison, and it will be avoided in actual MESS-QM/MM calculations.

Results and Discussion

Solvated Methanol

The results for the solvated methanol systems are shown in Figure 2 and Table 1. The QM/MM polarization energy are confirmed to be all negative (i.e., stabilizing the system) and, with a range of −5 to 0 kcal/mol, its contribution to the total energy are thus non-negligible.

Figure 2

Table 1

Errors in the Estimated QM/MM Polarization Energy for a Methanol Molecule Solvated in a Cubic Box of 1000 Water Moleculesa

		6-31+G* basis				6-311++G** basis
functional	scheme	MSE	RMS	MAX	REL	MSE	RMS	MAX	REL
B3LYP	MESS-E	–0.386	0.432	1.339	20.6%	–0.420	0.470	1.459	21.2%
	MESS-H(15)	0.066	0.071	0.173	4.0%	0.101	0.107	0.234	5.7%
	MESS-H(30)	–0.016	0.033	0.176	1.0%	0.019	0.032	0.112	1.5%
	MESS-H(60)	–0.029	0.044	0.216	1.4%	–0.030	0.048	0.241	1.5%
M06-2X	MESS-E	–0.047	0.084	0.358	3.3%	–0.078	0.113	0.462	4.3%
	MESS-H(15)	0.090	0.097	0.230	5.2%	0.140	0.150	0.356	7.7%
	MESS-H(30)	–0.014	0.032	0.170	1.0%	0.028	0.039	0.118	1.9%
	MESS-H(60)	–0.025	0.041	0.205	1.3%	–0.026	0.043	0.218	1.3%
ωB97X-D	MESS-E	0.048	0.071	0.209	3.8%	0.048	0.075	0.223	3.7%
	MESS-H(15)	0.067	0.072	0.173	4.0%	0.095	0.103	0.247	5.4%
	MESS-H(30)	–0.010	0.030	0.151	1.0%	0.014	0.031	0.122	1.4%
	MESS-H(60)	–0.027	0.042	0.208	1.3%	–0.028	0.046	0.229	1.4%

Mean signed errors (MSE), root mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies estimated with MESS-E and MESS-H schemes are listed, together with the average of the relative unsigned error (REL). For the MESS-H calculations, the number within the parentheses refers to the number of eigenvectors used in the construction of the inverse of the electronic Hessian. Averaged over 1000 configurations from a 1 ns trajectory.

Estimated MESS-QM/MM polarization energies (y-axis) versus the exact values (x-axis) for 1000 configurations of a single methanol molecule solvated in a cubic box of 1000 water molecules. All energies are in kcal/mol. The MESS-H(15), MESS-H(30), and MESS-H(60) estimations started by solving iteratively for 15, 30, or 60 eigenvalues/eigenvectors of the electronic Hessian. B3LYP results are shown in the left four panels, M06-2X results in the middle, and ωB97X-D results in the right four panels. The gray lines correspond to a hypothetical perfect estimation. The red lines are linear fits of the estimated energy values against the exact values, with R2 values and slopes listed at the corner of each panel. Mean signed errors (MSE), root mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies estimated with MESS-E and MESS-H schemes are listed, together with the average of the relative unsigned error (REL). For the MESS-H calculations, the number within the parentheses refers to the number of eigenvectors used in the construction of the inverse of the electronic Hessian. Averaged over 1000 configurations from a 1 ns trajectory. With all three functionals (B3LYP, M06-2X, and ωB97X-D), the QM/MM polarization energy estimated with the MESS-E scheme, as shown in Figure 2, correlates very well with the exact values, with R2 values greater than 0.996. The fitted slope with the B3LYP calculations in Figure 2 is 1.216, suggesting that MESS-E tends to overestimate the polarization energy by roughly 20%, which corresponds to a 20.6% relative error (REL) and a −0.386 kcal/mol mean signed error (MSE) in Table 1. The maximum error (MAX) is 1.339 kcal/mol, which is slightly above the desired chemical accuracy of 1 kcal/mol. In contrast, the MESS-E scheme only overestimates the M06-2X values by 3% (slope 1.030, REL 3.3%, MSE −0.047 kcal/mol) and actually slightly overestimates the ωB97X-D values by 3–4% (slope 0.977, REL 3.8%, MSE 0.048 kcal/mol). It is very encouraging that, without performing even a single Fock build at each configuration, MESS-E can reproduce the M06-2X and ωB97X-D polarization energies with an average error less than 0.1 kcal/mol and a maximum error of 0.2–0.4 kcal/mol (0.358 kcal/mol with M06-2X and 0.209 kcal/mol with ωB97X-D). When applying the MESS-H scheme to the solvated methanol, we first solved for 15, 30, or 60 eigenvectors of the electronic Hessian of the gas-phase molecule. We will refer to these calculations as MESS-H(15), MESS-H(30), and MESS-H(60) in subsequent discussions. As shown in Figure 2, the estimated MESS-H energies all correlate well with the exact values, with R2 values greater than 0.998. MESS-H(15) slightly underestimates the polarization energy by 4–5%, with MSE values of 0.066 kcal/mol (B3LYP), 0.090 kcal/mol (M06-2X), and 0.067 kcal/mol (ωB97X-D). MESS-H(30) has a higher initial cost, because it requests twice as many eigenvectors for the gas-phase inverse Hessian construction, and it also leads to more accurate energies. MESS-H(30) energies are only 1% larger than the exact values, and the MSE values are −0.016 kcal/mol (B3LYP), −0.014 kcal/mol (M06-2X), and −0.010 kcal/mol (ωB97X-D). However, the results do not improve further upon requesting even more eigenvectors. As shown by the slopes in Figure 2 and the error values in Table 1, MESS-H(60) actually leads to slightly larger errors than MESS-H(30). Overall, all these MESS-H models led to highly accurate energies with all three functionals, with MSE errors smaller than 0.1 kcal/mol, and maximum errors of 0.15–0.23 kcal/mol. The MESS-E and MESS-H results with the larger 6-311++G** basis set are also listed in Table 1, which shows roughly the same performance as the smaller 6-31+G* basis set. MESS-E energies are quite accurate in M06-2X and ωB97X-D calculations, with MSE errors of −0.078 kcal/mol (M06-2X) and 0.048 kcal/mol (ωB97X-D) and maximum errors of 0.462 kcal/mol (M06-2X) and 0.223 kcal/mol (ωB97X-D). The MESS-E errors are also much larger in B3LYP calculations. On the other hand, MESS-H(15), MESS-H(30), and MESS-H(60) all produce energies with MSE values of no more than 0.14 kcal/mol and maximum errors of no more than 0.36 kcal/mol. With both basis sets, the best performance occurs with MESS-H(30), where the number of requested eigenvectors is around twice the number of electrons—18 for methanol.

Solvated β-Alanine

The results for solvated β-alanine are shown in Figure 3 and Table 2. The QM/MM polarization energy is more significant for this system: −8 to −1 kcal/mol with the three functionals.

Figure 3

Table 2

Errors in the Estimated QM/MM Polarization Energy for the β-Alanine Molecule Solvated in a Cubic Box of 1000 Water Moleculesa

		6-31+G* basis				6-311++G** basis
functional	scheme	MSE	RMS	MAX	REL	MSE	RMS	MAX	REL
B3LYP	MESS-E	–0.982	1.067	3.046	24.6%	–1.041	1.130	3.224	24.5%
	MESS-H(30)	0.509	0.534	1.159	13.3%	0.711	0.739	1.418	17.4%
	MESS-H(60)	0.199	0.210	0.425	5.3%	0.336	0.351	0.687	8.3%
	MESS-H(90)	0.062	0.075	0.206	1.8%	0.174	0.186	0.405	4.4%
M06-2X	MESS-E	–0.194	0.270	1.189	5.2%	–0.241	0.314	1.355	5.9%
	MESS-H(30)	0.528	0.551	1.048	14.0%	0.669	0.696	1.272	16.7%
	MESS-H(60)	0.206	0.218	0.462	5.5%	0.332	0.348	0.701	8.4%
	MESS-H(90)	0.081	0.094	0.244	2.3%	0.165	0.176	0.421	4.2%
ωB97X-D	MESS-E	0.060	0.161	0.601	3.6%	0.079	0.175	0.612	3.7%
	MESS-H(30)	0.484	0.508	1.088	12.8%	0.623	0.649	1.316	15.5%
	MESS-H(60)	0.162	0.174	0.412	4.4%	0.274	0.292	0.686	6.9%
	MESS-H(90)	0.049	0.070	0.306	1.6%	0.127	0.142	0.372	3.3%

Estimated MESS-QM/MM polarization energies (y-axis) versus the exact values (x-axis) for 1000 configurations of a single β-alanine molecule solvated in a cubic box of 1000 water molecules. All energies are in kcal/mol. The MESS-H(30), MESS-H(60), and MESS-H(90) estimations started by solving iteratively for 30, 60, or 90 eigenvalues/eigenvectors of the electronic Hessian. B3LYP results are shown in the left four panels, M06-2X results in the middle, and ωB97X-D results in the right four panels. The gray lines correspond to a hypothetical perfect estimation. The red lines are linear fits of the estimated energy values against the exact values, with R2 values and slopes listed at the corner of each panel. Mean signed errors (MSE), root mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies estimated with MESS-E and MESS-H schemes are listed, together with the average of the relative unsigned error (REL). For the MESS-H calculations, the number within the parentheses refers to the number of eigenvectors used in the construction of the inverse of the electronic Hessian. Averaged over 1000 configurations from a 1 ns trajectory. MESS-E energies correlate well with the exact values, with R2 values of 0.989 (B3LYP) and 0.992 (M06-2X and ωB97X-D). In B3LYP calculations, MESS-E again significantly overestimates the polarization energy (MSE −0.982 kcal/mol with 6-31+G* basis and −1.041 kcal/mol with 6-311++G** basis; REL 24.5% or 24.6% with the two basis sets). Even more significant are the maximum errors, which are 3.046 kcal/mol with 6-31+G* and 3.224 kcal/mol with 6-311++G**, and thus these MESS-E values should not be used without rescaling. In M06-2X calculations, MESS-E energies are again relatively more accurate, with the MSE values with M06-2X functionals being −0.194 and −0.241 kcal/mol, respectively. Nevertheless, the maximum errors are still as large as 1.189 and 1.355 kcal/mol. The MESS-E energies are much more accurate in ωB97X-D calculations, with MSE values as small as 0.060 and 0.079 kcal/mol. But, with maximum errors of 0.601 and 0.612 kcal/mol, the MESS-E energy values should be used with caution, if not rescaled. The MESS-H calculations all tend to underestimate the polarization energy here, and they produce increasingly more accurate results as more eigenvectors are requested. The MSE values are 0.484–0.711 kcal/mol with MESS-H(30) and are reduced to 0.162–0.336 kcal/mol with MESS-H(60), and further down to 0.049–0.174 kcal/mol with MESS-H(90). At the same time, the maximum errors are reduced from 1.088 to 1.418 kcal/mol with MESS-H(30) to 0.412–0.701 with MESS-H(60) and finally to 0.206–0.421 kcal/mol with MESS-H(90). The relative errors goes down from 12.8−17.4% and 4.4–8.4% and finally to 1.6–4.4%. The best performance here is observed with MESS-H(90), for which the number of requested eigenvectors is again about twice the number of electrons—48 for β-alanine.

Oxyluciferin–Luciferase Complex

As shown in Figure 4, the QM/MM polarization energies range from −5 to −2 kcal/mol with the three functionals and the 6-31+G* basis set. With slopes of 1.858, 1.364, and 1.184 for three functionals, the MESS-E scheme now even more significantly overestimates the polarization energy, when compared to two previous cases (methanol and β-alanine).

Figure 4

Estimated MESS-QM/MM polarization energies (y-axis) versus the exact values (x-axis) for 100 configurations of an oxyluciferin molecule embedded in luciferase. All energies are in kcal/mol. The MESS-H(120), MESS-H(180), and MESS-H(240) estimations started by solving iteratively for 120, 180, or 240 eigenvalues/eigenvectors of the electronic Hessian. B3LYP results are shown in the left four panels, M06-2X results in the middle, and ωB97X-D results in the right four panels. The gray lines correspond to a hypothetical perfect estimation. The red lines are linear fits of the estimated energy values against the exact values, with R2 values and slopes listed at the corner of each panel. As show in Table 3, in B3LYP calculations, the MSE (over 100 configurations) is now as large as −2.731 kcal/mol (REL 82.7%), with a maximum error of 5.266 kcal/mol. This improves in M06-2X calculations, where the MSE is −1.114 kcal/mol, REL is 35.2%, and the maximum error is 2.129 kcal/mol. Even in ωB97X-D calcualtions, where MESS-E performs the best out of the three functionals, the MSE is still as large as −0.558 kcal/mol, and the maximum error is 1.219 kcal/mol, and the relative error is 17.6%. So for all three functionals, the MESS-E energy cannot be used without rescaling.

Table 3

Errors in the Estimated QM/MM Polarization Energy for the Oxyluciferin–Luciferase Complexa

		6-31+G* basis
functional	scheme	MSE	RMS	MAX	REL
B3LYP	MESS-E	–2.731	2.874	5.266	82.7%
	MESS-H(120)	0.251	0.254	0.341	7.9%
	MESS-H(180)	0.174	0.176	0.237	5.5%
	MESS-H(240)	0.136	0.137	0.190	4.3%
M06-2X	MESS-E	–1.114	1.169	2.129	35.2%
	MESS-H(120)	0.222	0.224	0.296	7.3%
	MESS-H(180)	0.149	0.150	0.208	4.9%
	MESS-H(240)	0.104	0.105	0.161	3.4%
ωB97X-D	MESS-E	–0.558	0.605	1.219	17.6%
	MESS-H(120)	0.185	0.187	0.258	6.1%
	MESS-H(180)	0.113	0.115	0.176	3.7%
	MESS-H(240)	0.077	0.079	0.129	2.5%

The oxyluciferin substrate is the QM region, and the luciferase is described with the CHARMM force field. Mean signed errors (MSE), root mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies estimated with MESS-E and MESS-H schemes are listed, together with the average of the relative unsigned error (REL). Averaged over 100 configurations from a 100 ps trajectory. The MESS-H scheme has reliably led to more accurate energies with an increasing number of requested eigenvalues for the electronic Hessian. MESS-H(120) led to MSE values around 0.2 kcal/mol (0.251 kcal/mol (B3LYP), 0.222 kcal/mol (M06-2X), and 0.185 kcal/mol (ωB97X-D)) and maximum errors around 0.3 kcal/mol (0.341 kcal/mol (B3LYP), 0.296 kcal/mol (M06-2X), and 0.258 kcal/mol (ωB97X-D)). These errors are reduced roughly by one-third with MESS-H(180), and by one-half with MESS-H(240). Indeed, MESS-H(240) led to MSE values around 0.1 kcal/mol (0.136 kcal/mol (B3LYP), 0.104 kcal/mol (M06-2X), 0.077 kcal/mol (ωB97X-D)). Meanwhile, the MESS-H(240) maximum errors are all below 0.2 kcal/mol (0.190 kcal/mol (B3LYP), 0.161 kcal/mol (M06-2X), 0.129 kcal/mol (ωB97X-D)). Here, the number of requested eigenvectors, 240, is again roughly twice that of the number of electrons—128 for oxyluciferin.

Further Analysis of the MESS-E Scheme

So far, we have observed the following general trend for the errors in the MESS-E energies: (i) ωB97X-D < M06-2X ≪ B3LYP and (ii) methanol < β-alanine < oxyluciferin. On one end, MESS-E performs very well for solvated methanol, with MSE values less than 0.1 kcal/mol with M06-2X and ωB97X-D functionals. On the other end, it significantly overestimates the QM/MM polarization energy for oxyluciferin in luciferase: with B3LYP, the MSE is as large as −2.731 kcal/mol. Here we shall attempt to unravel the underlying reason for this rather uneven performance. Because MESS-E almost always overestimates the polarization energy, the exception being ωB97X-D calculations on solvated methanol and β-alanine, let us introduce a tuning parameter, λ, into eq 19, which then becomesBy diagonalizing this Fock matrix,we obtain a set of MO coefficients, C(λ), which change continuously with parameter λ. From these MOs, we can compute the corresponding density matrix, P(λ), electron density, ρ(λ), and Kohn–Sham DFT energy,Note that E(λ) provides an upperbound to E(full), the actual DFT energy within a given MM potential, which corresponds to full converged MOs, C(full), and its corresponding electron density, ρ(full). To assess the degree with which the occupied Kohn–Sham orbitals and thus the electron density respond to the applied MM potential, we computed the difference between ρ(λ) and the full-converged SCF density ρ(full):and plotted them and E(λ) values against λ (ranging from 0 to 1.5) in Figure 5 for the first configuration of each test system with all three functionals. There, in the panels with energy curves, the energy values are relative to the full-converged DFT energy, so that the values at λ = 0 are (the absolute value of) the exact polarization energies.

Figure 5

Deviation in the electron density and in DFT energies from the fully converged solutions versus the λ parameter. The deviation in the electron density is measured with Δn, which is defined in eq 32. The energy values are relative to the fully converged DFT energy, so that the values at λ = 0 are (the absolute value of) the exact polarization energies. The plotted energies contain extra 2-electron and exchange–correlation contributions that are not included in the MESS-E energies. The left panels correspond to results for solvated methanol, the middle panels for solvated β-alanine, and the right panels for the oxyluciferin–luciferase complex. It is clear in Figure 5 that, given a test system and a functional, the Δn curve and the E(λ) curve reach minimum at the same λ values. And in all cases, the minimum occurs where λ is smaller than 1. This indicates that δP(1) always overestimates the response in the density matrix. In ωB97X-D calculations, the energy reaches a minimum of 0.325 kcal/mol at λ = 0.9 for solvated methanol, 0.919 kcal/mol at λ = 0.8 for solvated β-alanine, and 0.497 kcal/mol at λ = 0.625 for the oxyluciferin–luciferase complex. In M06-2X calculations, the minimums are 0.296 kcal/mol at λ = 0.875 (methanol), 0.873 kcal/mol at λ = 0.775 (β-alanine), and 0.482 kcal/mol at λ = 0.55 (oxyluciferin). In B3LYP calculations, the minimums are 0.31 kcal/mol at λ = 0.725 (methanol), 1.004 kcal/mol at λ = 0.625 (β-alanine), and 0.613 kcal/mol at λ = 0.4 (oxyluciferin). The total energy within the MESS-E scheme,is a sum of gas-phase energy, E(0), the first-order energy change, E(1) (eq 18), and the MESS-E approximation to the polarization energy, EMESS-E(2) (eq 23). Compared to E(λ) at λ = 1 in Figure 5, the MESS-E total energy does not include (i) 2-electron integral contributions, δP(1)·II·δP(1), and (ii) their DFT exchange–correlation counterparts. As a result, the MESS-E energies are no longer upper bounds to the full-converged DFT energy. Due to its dependence on δP(1), which overestimates the response in the density matrix, the MESS-E formula in eq 23 understandably tends to overestimate the QM/MM polarization energy. In Figure 6, the MSE in the MESS-E energies from Tables 1–3 are plotted against the λ values above. There is a clear trend: the smaller the λ value is at the minimum, the larger errors we find in the MESS-E energies.

Figure 6

MSE in the MESS-E energies from Tables 1–3 versus the λ values at the minimums in Figure 5.

Conclusions

In this work, we proposed two schemes, MESS-E and MESS-H, for a fast estimation of the MESS-QM/MM polarization energies and applied them to three test systems: solvated methanol, solvated β-alanine, and the luciferin–luciferase complex. The main findings are as follows: The MESS-E scheme in general overestimates the polarization energy, and the errors follow the trends ωB97X-D < M06-2X ≪ B3LYP and methanol < β-alanine < oxyluciferin. In the best cases, solvated methanol with the M06-2X or ωB97X-D functionals, for example, the MSE is smaller than 0.1 kcal/mol. In the worst cases, such as the luciferin–luciferase complex with the B3LYP functional, the MSE is as large as −2.731 kcal/mol. This significant difference in its performance is suggested to arise from the overestimation of the density matrix response in the MESS-E scheme, and a λ parameter was introduced to assess the degree of this overestimation. The MESS-H scheme can reliably estimate the polarization energy. The average errors can be around or even below 0.2 kcal/mol, if the number of eigenvectors used to construct the inverse electronic Hessian is at least about twice the number of electrons in the QM region. MESS-E and MESS-H schemes only require the evaluation of the MM electrostatic potential and completely avoid the SCF iterations at each configuration, especially the expensive updating of Coulomb, Hartree–Fock exchange, and DFT exchange–correlation matrices, and thus they can be performed with only a tiny fraction of the cost associated with standard QM/MM calculations. This work is limited in a number of ways: Although we have demonstrated the accuracy of the two MESS schemes, especially MESS-H, for individual configurations of our test systems, it remains to be seen how much error these approximations cause to the overall free energies; Though a λ parameter has been introduced to assess the overestimation of the density response in the MESS-E scheme, the dependence of the optimal λ values on the molecule, on the QM method and on the external MM environment needs to be explored in more detail before developing a MESS-E scheme with proper rescaling that is universally applicable and the least arbitrary. Specifically, there has been some clear indication that the optimal λ values as shown in Figure 5 are correlated with the HOMO–LUMO gaps. For example, B3LYP, which has the smallest fraction for the Hartree–Fock exact exchange (20%) out of the three functionals, leads to the smallest HOMO–LUMO gaps and then the smallest optimal λ values. In many cases, the gas-phase electron structure is not necessarily the best reference for the MESS-E and MESS-H calculations. Other references, a single configuration with external MM potential or an average over multiple configurations, should be tested. Although the QM/MM polarization energy can be estimated within a reasonable accuracy, one might want to estimate the QM/MM energy gradient as well, which is required to compute the enthalpic correction to the free energy due to the constraint force. It remains to be seen whether the QM/MM gradient is more sensitive to the error in Kohn–Sham orbitals than the energy. The MM electrostatic potential used in the solvated methanol or β-alanine calculations comes only from MM atoms in the central unit cell. Although their images in PBC simulations are not expected to significantly add to the polarization of the QM region, especially if a significantly large unit cell is used (as the case with this work), an extension of MESS-E and MESS-H schemes to QM/MM calculations with a periodic boundary condition[75−77] might be desirable. For a macromolecular system with tens of thousands of (if not more) MM atoms, as in our test case of the oxyluciferin–luciferase complex, the cost of evaluating the MM electrostatic potential in the QM region can be high. For example, for each MM configuration in the luciferase calculation with B3LYP/6-31+G* level of theory, it still takes about 8 s to evaluate the MM electrostatic potential on a single Intel Xeon E5420 2.5 GHz processor (whereas it would have required about 150 s to fully converge SCF within 7 cycles starting from gas-phase molecular orbitals). Some techniques in the J-engine[78,79] and quantum Ewald mesh[80] methods can potentially be adapted to help speed up the evaluation of MM potential. At this moment, the MESS-E and MESS-H schemes are only applicable to Hartree–Fock, pure DFT, hybrid DFT, and range-separated DFT. One has yet to extend it to double-hybrid DFT or post-Hartree–Fock methods. We have limited our study to fixed point-charge force fields on the MM atoms, and therefore we have completely neglected the “back” polarization effects, where the QM nuclei/electrons also polarize the MM atoms. Currently, we are actively pursuing work to address some of these limitations.

47 in total

1. Perspective on density functional theory.

Authors: Kieron Burke
Journal: J Chem Phys Date: 2012-04-21 Impact factor: 3.488

2. Frontiers in electronic structure theory.

Authors: C David Sherrill
Journal: J Chem Phys Date: 2010-03-21 Impact factor: 3.488

3. Density functional triple jumping.

Authors: Jia Deng; Andrew T B Gilbert; Peter M W Gill
Journal: Phys Chem Chem Phys Date: 2010-07-07 Impact factor: 3.676

4. On the Convergence of QM/MM Energies.

Authors: LiHong Hu; Pär Söderhjelm; Ulf Ryde
Journal: J Chem Theory Comput Date: 2011-02-07 Impact factor: 6.006

Review 5. Advances in methods and algorithms in a modern quantum chemistry program package.

Authors: Yihan Shao; Laszlo Fusti Molnar; Yousung Jung; Jörg Kussmann; Christian Ochsenfeld; Shawn T Brown; Andrew T B Gilbert; Lyudmila V Slipchenko; Sergey V Levchenko; Darragh P O'Neill; Robert A DiStasio; Rohini C Lochan; Tao Wang; Gregory J O Beran; Nicholas A Besley; John M Herbert; Ching Yeh Lin; Troy Van Voorhis; Siu Hung Chien; Alex Sodt; Ryan P Steele; Vitaly A Rassolov; Paul E Maslen; Prakashan P Korambath; Ross D Adamson; Brian Austin; Jon Baker; Edward F C Byrd; Holger Dachsel; Robert J Doerksen; Andreas Dreuw; Barry D Dunietz; Anthony D Dutoi; Thomas R Furlani; Steven R Gwaltney; Andreas Heyden; So Hirata; Chao-Ping Hsu; Gary Kedziora; Rustam Z Khalliulin; Phil Klunzinger; Aaron M Lee; Michael S Lee; Wanzhen Liang; Itay Lotan; Nikhil Nair; Baron Peters; Emil I Proynov; Piotr A Pieniazek; Young Min Rhee; Jim Ritchie; Edina Rosta; C David Sherrill; Andrew C Simmonett; Joseph E Subotnik; H Lee Woodcock; Weimin Zhang; Alexis T Bell; Arup K Chakraborty; Daniel M Chipman; Frerich J Keil; Arieh Warshel; Warren J Hehre; Henry F Schaefer; Jing Kong; Anna I Krylov; Peter M W Gill; Martin Head-Gordon
Journal: Phys Chem Chem Phys Date: 2006-06-12 Impact factor: 3.676

Review 6. QM/MM studies of enzymes.

Authors: Hans Martin Senn; Walter Thiel
Journal: Curr Opin Chem Biol Date: 2007-02-16 Impact factor: 8.822

Review 7. CHARMM: the biomolecular simulation program.

Authors: B R Brooks; C L Brooks; A D Mackerell; L Nilsson; R J Petrella; B Roux; Y Won; G Archontis; C Bartels; S Boresch; A Caflisch; L Caves; Q Cui; A R Dinner; M Feig; S Fischer; J Gao; M Hodoscek; W Im; K Kuczera; T Lazaridis; J Ma; V Ovchinnikov; E Paci; R W Pastor; C B Post; J Z Pu; M Schaefer; B Tidor; R M Venable; H L Woodcock; X Wu; W Yang; D M York; M Karplus
Journal: J Comput Chem Date: 2009-07-30 Impact factor: 3.376

8. Variational and perturbative formulations of quantum mechanical/molecular mechanical free energy with mean-field embedding and its analytical gradients.

Authors: Takeshi Yamamoto
Journal: J Chem Phys Date: 2008-12-28 Impact factor: 3.488

9. The implementation of a fast and accurate QM/MM potential method in Amber.

Authors: Ross C Walker; Michael F Crowley; David A Case
Journal: J Comput Chem Date: 2008-05 Impact factor: 3.376

10. Dynamics on the electronically excited state surface of the bioluminescent firefly luciferase-oxyluciferin system.

Authors: Chang-ik Song; Young Min Rhee
Journal: J Am Chem Soc Date: 2011-07-19 Impact factor: 15.419

7 in total

1. Efficient treatment of induced dipoles.

Authors: Andrew C Simmonett; Frank C Pickard; Yihan Shao; Thomas E Cheatham; Bernard R Brooks
Journal: J Chem Phys Date: 2015-08-21 Impact factor: 3.488

2. LICHEM: A QM/MM program for simulations with multipolar and polarizable force fields.

Authors: Eric G Kratz; Alice R Walker; Louis Lagardère; Filippo Lipparini; Jean-Philip Piquemal; G Andrés Cisneros
Journal: J Comput Chem Date: 2016-01-18 Impact factor: 3.376

3. An Estimation of Hybrid Quantum Mechanical Molecular Mechanical Polarization Energies for Small Molecules Using Polarizable Force-Field Approaches.

Authors: Jing Huang; Ye Mei; Gerhard König; Andrew C Simmonett; Frank C Pickard; Qin Wu; Lee-Ping Wang; Alexander D MacKerell; Bernard R Brooks; Yihan Shao
Journal: J Chem Theory Comput Date: 2017-01-24 Impact factor: 6.006

4. Numerical study on the partitioning of the molecular polarizability into fluctuating charge and induced atomic dipole contributions.

Authors: Ye Mei; Andrew C Simmonett; Frank C Pickard; Robert A DiStasio; Bernard R Brooks; Yihan Shao
Journal: J Phys Chem A Date: 2015-05-18 Impact factor: 2.781

5. Fragmentation Method for Computing Quantum Mechanics and Molecular Mechanics Gradients for Force Matching: Validation with Hydration Free Energy Predictions Using Adaptive Force Matching.

Authors: Dong Zheng; Ying Yuan; Feng Wang
Journal: J Phys Chem A Date: 2022-04-14 Impact factor: 2.944

6. Computation of Hydration Free Energies Using the Multiple Environment Single System Quantum Mechanical/Molecular Mechanical Method.

Authors: Gerhard König; Ye Mei; Frank C Pickard; Andrew C Simmonett; Benjamin T Miller; John M Herbert; H Lee Woodcock; Bernard R Brooks; Yihan Shao
Journal: J Chem Theory Comput Date: 2015-12-11 Impact factor: 6.006

7. Performing Molecular Dynamics Simulations and Computing Hydration Free Energies on the B3LYP-D3(BJ) Potential Energy Surface with Adaptive Force Matching: A Benchmark Study with Seven Alcohols and One Amine.

Authors: Dong Zheng; Feng Wang
Journal: ACS Phys Chem Au Date: 2021-07-21

7 in total