In combined quantum mechanical/molecular mechanical (QM/MM) free energy calculations, it is often advantageous to have a frozen geometry for the quantum mechanical (QM) region. For such multiple-environment single-system (MESS) cases, two schemes are proposed here for estimating the polarization energy: the first scheme, termed MESS-E, involves a Roothaan step extrapolation of the self-consistent field (SCF) energy; whereas the other scheme, termed MESS-H, employs a Newton-Raphson correction using an approximate inverse electronic Hessian of the QM region (which is constructed only once). Both schemes are extremely efficient, because the expensive Fock updates and SCF iterations in standard QM/MM calculations are completely avoided at each configuration. They produce reasonably accurate QM/MM polarization energies: MESS-E can predict the polarization energy within 0.25 kcal/mol in terms of the mean signed error for two of our test cases, solvated methanol and solvated β-alanine, using the M06-2X or ωB97X-D functionals; MESS-H can reproduce the polarization energy within 0.2 kcal/mol for these two cases and for the oxyluciferin-luciferase complex, if the approximate inverse electronic Hessians are constructed with sufficient accuracy.
In combined quantum mechanical/molecular mechanical (QM/MM) free energy calculations, it is often advantageous to have a frozen geometry for the quantum mechanical (QM) region. For such multiple-environment single-system (MESS) cases, two schemes are proposed here for estimating the polarization energy: the first scheme, termed MESS-E, involves a Roothaan step extrapolation of the self-consistent field (SCF) energy; whereas the other scheme, termed MESS-H, employs a Newton-Raphson correction using an approximate inverse electronic Hessian of the QM region (which is constructed only once). Both schemes are extremely efficient, because the expensive Fock updates and SCF iterations in standard QM/MM calculations are completely avoided at each configuration. They produce reasonably accurate QM/MM polarization energies: MESS-E can predict the polarization energy within 0.25 kcal/mol in terms of the mean signed error for two of our test cases, solvated methanol and solvated β-alanine, using the M06-2X or ωB97X-D functionals; MESS-H can reproduce the polarization energy within 0.2 kcal/mol for these two cases and for the oxyluciferin-luciferase complex, if the approximate inverse electronic Hessians are constructed with sufficient accuracy.
In
the last two decades, combined quantum mechanical molecular
mechanical (QM/MM) calculations have become increasingly popular in
the computational study of molecular solvation, catalytic or enzymatic
reactions,[1−7] and ligand binding.[8,9] In these calculations, typically
the solute, the reactive region, or the active site is studied by
quantum mechanics (QM) methods, whereas the environment (solvent molecules
or the non-QM portion of the macromolecule) is described with molecular
mechanics (MM) force fields.In pure QM calculations, it is
often sufficient to locate a few
stationary points (local minima and saddle points) on the potential
energy surface.[10,11] When it comes to QM/MM potential
energy surfaces, such stationary points can still provide some useful
insights into the reaction mechanisms or into the chemical nature
of ligand–receptor binding. But only a small number of degrees
of freedom (that are predominantly associated with QM atoms) connect
these stationary points to each other. At a finite temperature, many
other degrees of freedom are also accessible and can thus have non-negligible
contributions to the free energy of interest, such as the solvation
free energy, reaction free energy, or binding free energy. So it is
essential to adequately sample all the relevant degrees of freedom.The sampling of these QM and MM degrees of freedom can be achieved
via QM/MM molecular dynamics simulations, which in practice fall into
at least two general categories. In the first category, one performs
conventional simulations using QM/MM forces throughout the trajectory,
meaning that the QM/MM energy and gradient have to be computed at
every geometry step. Though such QM/MM MD simulations have proven
to yield higher accuracy than classical MM simulations, they are,
unfortunately, also orders of magnitude more expensive. For the three
test cases in this work, for example, a single QM/MM MD step (using
the B3LYP/6-31+G* level of theory for the QM region) would take 50
times longer than a MM step for solvated methanol, 400 times longer
for solvated β-alanine, and 4500 times longer for oxyluciferin
in its luciferase complex. For many applications, where an even larger
QM region and/or a more sophisticated QM method are desired, a conventional
QM/MM MD simulation remains quite expensive, even after much recent
and continuing work on algorithmic developments to speed up QM calculations.[12−15]In the second category, one performs “indirect”
QM/MM
simulations.[16−22] In these simulations, the conformational space is sampled using
a MM force field (and thus with a higher computational speed). Then
a subset of configurations along the MM trajectory are subjected to
QM/MM energy/gradient evaluations, to account for the free energy
differences between QM/MM and MM surfaces. This strategy works best
if the QM/MM configurations are well represented by the MM ensemble,
which requires considerable “overlap” between the QM/MM
potential energy surface and the MM potential energy surface. In cases
where the overlap is insufficient, the free energy can either fail
to converge or converge to incorrect values. To alleviate this problem,
several groups have proposed “freezing” some or all
internal degrees of freedom within the QM region[16−18,23,24] during the MM simulations.Such freezing of QM degrees of freedom has shown to improve the
convergence of free energy calculations, but it might also lose accuracy
due to the omission of any entropic contributions in the QM region.[25,26] This deficiency can potentially be mitigated with the introduction
of three separate energy corrections: (a) an enthalpic correction
which corresponds to calculating the shift of the energy minimum due
to the constraints on the QM degrees of freedom; (b) a vibrational
entropic correction based on a subsystem Hessian[27] (the same Hessian as in the enthalpic term); (c) a Jacobian
term. These energy corrections are discussed in detail in separate
publications.[22,28] In addition, systems with multiple
rotational states can be addressed using the technique developed by
Straatsma and McCammon.[29]In this
work, we shall focus on “indirect” QM/MM
simulations where all internal degrees of freedom in the QM region
are frozen in the MM simulations. In other words, the QM region simply
becomes a rigid body. The subset of configurations coming out of this
MM trajectory share a single QM geometry whereas the MM environment
can be quite different. The main task then is to compute the QM/MM
energy for such “multiple environment single system”
(MESS) configurations. We shall show in this work that, when the QM
method in use is Kohn–Sham density functional theory (DFT),[30−35] it is possible to estimate such MESS-QM/MM energies with a small
computational cost and a reasonably high accuracy.This article
is structured as follows. In section 2,
the underlying theory is formulated. The MESS-QM/MM
energy is divided into three terms: (a) a gas-phase energy; (b) a
first-order term, which describes the interaction of the gas-phase
electron density and the MM electrostatic potential; (c) the QM/MM
polarization energy, which corresponds to the second-order and higher
terms. In the same section, two different schemes will be proposed
for estimating the QM/MM polarization energy: MESS-E, which is based
on a Roothaan step extrapolation, and MESS-H, which is based on a
Newton–Raphson energy correction using an inverse electronic
Hessian in a subspace representation. In section III, technical details will be provided about the implementation
of these two schemes for MESS-QM/MM polarization energy estimation
and about our test systems. In section IV, the
results will be presented and discussed. Concluding remarking are
made in section V.
THEORY
QM/MM Polarization Energy
Gas-Phase
Electronic Structure
Let
us consider a QM system that consists of NQM atoms and is subjected to an external potential v(r). Within the Kohn–Sham density functional
theory using pure, hybrid or range-separated functionals, the total
energy of such a system is given by a functional of the one-particle
density matrixFor pure functionals, this energy functional
involves only the diagonal part of the one particle density matrix,
namely the electron density ρ(r). Hybrid functionals
and range-separated functionals involve the off-diagonal elements
as well through the Hartree–Fock exchange energy contributions.From a set of orthonormal Kohn–Sham molecular orbitals (MO),
ψ, which are often approximated
as linear combinations of atom-centered basis functions, ϕμ,one can write the one-particle density
matrix
asHere Pμν isand is often
also referred to as the density
matrix.In a DFT calculation, one optimizes the MOs in eq 2 via a self-consistent field (SCF) iterative procedure
to
obtain the lowest energy for the system in eq 1. Within each iteration, from the current set of orthonormal MOs,
one rotates the orbitalswhere the unoccupied MOs, ψ, are mixed into the occupied ones, ψ. The actual free variables in the energy
functional
in eq 1 are thus the orbital rotations,The iterative procedure continues until it
reaches a set of optimized MOs that make the energy stationary with
respect to the orbital rotationsIn the gas phase, the external potential v(r) only includes the nuclear attraction,which is summed over all
nuclei, A, with charges Z. This
potential can be represented in the atomic basisThe MO
coefficients optimized under this gas
phase external electrostatic potential will be denoted by C(0), and the corresponding gas-phase density
matrix byand the Fock matrix, Fμν = ∂E/∂Pμν, by F(0). The gas-phase Kohn–Sham energy isand the gas-phase electronic density is
MM Perturbation
In QM/MM calculations,
the external potential includes both the nuclear attraction, v(0)(r) in eq 9, and an additional electrostatic potential due to a set of NMM MM atoms,In the simplest cases (as in the examples
in section IV), where the MM atoms are represented
by point charges, {q, B = 1, ..., NMM}, the additional potential isIts matrix representation isThis external electrostatic potential
from the MM atoms perturbs the energy of the system from E(0), its gas-phase value:where the first-order change is a simple sum
of electronic and nuclear terms:where the minus signs reflect the
negative
charge from electrons. To compute this first-order energy
change, the dominant computational cost comes from the evaluation
of δhin eq 16.The second-order
change, δE(2), plus all higher order
terms is the QM/MM polarization energy. It
is directly caused by the polarization of the Kohn–Sham orbitals
(and therefore the one-particle density matrix) by the MM electrostatic
potential. Essentially, the Kohn–Sham orbitals will relax within
the additional MM electrostatic potential in eq 15, stabilize the system, and lower the energy. In the next two subsections,
we will describe two approximate schemes, MESS-E and MESS-H, for estimating δE(2), the second-order term of the QM/MM
polarization energy.
Scheme MESS-E: Roothaan Step
Extrapolation
Similar to the dual-basis projection technique
for DFT and MP2
calculations,[36−38] the three extrapolations in density functional triple
jumping,[39] and the Roothaan step correction
in the absolutely localized molecule orbital (ALMO) based energy decomposition
analysis,[40,41] one Roothaan step[42] is taken in the MESS-E scheme.Here we write the perturbed
Fock matrix aswhere F(0) is
the gas-phase Fock matrix, and δh is the atomic
basis representation of the electrostatic potential due to MM atoms
(see eq 16). In a Roothaan step, the perturbed
Fock matrix is diagonalizedwhere S is the overlap matrixFrom the
eigenvectors, which are the new set
of MO coefficients, C(1), one constructs
a new density matrixThen one can approximate
the QM/MM polarization energy as a Roothaan
step extrapolationwhere δP(1) is the projected response in the density matrixThe leading contributions to the MESS-E energy,
eq 23, are clearly second-order: in the F(0)·δP(1) term, the gas-phase Fock matrix, F(0), can only couple to the occupied–occupied (oo) and virtual–virtual
(vv) blocks of δP(1), both of which
are quadratic in orbital response and thus in δh. In the δh·δP(1) term, δh couples to the occupied-virtual
(ov) and virtual-occupied (vo) blocks of δP(1), both of which are first order in orbital response
(and thus in δh).With the F(0)·δP(1) term,
one removes a fractional number of electrons
from occupied Kohn–Sham orbitals and puts them into (higher-energy)
virtual orbitals. As a result, this term is always positive. To achieve
a net stabilization of the QM system within the MM potential, the δh·δP(1) term
must be negative and have a larger absolute value. In our test cases, δh·δP(1) was found to be roughly
twice as large as F(0)·δP(1) (with an opposite sign).It should be noted
that, in the computation of this second-order
energy change for each MM environment, one avoids the expensive Fock
builds and only performs a diagonalization of the perturbed Fock matrix (eq 20) and a few simple matrix operations (eqs 19, 22 and 23).
Scheme MESS-H: Newton–Raphson
Correction
with an Approximate Hessian
As shown in eq 8, the gradient of the gas-phase Kohn–Sham energy with
respect to orbital rotations vanishes at the set of optimized gas-phase
Kohn–Sham orbitals. When the MM electrostatic potential in
eq 15 is applied, the gradient is no longer
zero:where δh is the MO representation of the MM potentialIf the
electronic Hessian of the gas-phase
energy, i.e., the set of second derivatives with respect to orbital
rotations, H(0), is readily available, one can
use its inverse, H(0),–1, to approximate the
QM/MM polarization energy with a Newton–Raphson energy correctionwhich corresponds to a shift of the energy
minimum with the following orbital response due to the external MM
potential,Here one only needs to compute the
gas-phase electronic Hessian once and then store its inverse in memory
or disk. In subsequent MESS-H calculations, for each MM environment,
one simply reads in the inverse Hessian back from the storage and
performs two simple matrix operations (eqs26and27). We note that the inverse electronic Hessian is closely related
to the linear response kernel in DFT, (δ2E)/(δν(r) δν(r′)), which has been
studied analytically and numerically.[43−54]One can and should avoid computing the exact gas-phase electronic
Hessian, H(0), for three practical considerations:
(a) it might be too expensive to compute the exact Hessian for systems
with a large number of QM atoms and/or a large basis set, (b) it might
be unnecessary to compute the exact value of the second-order term
whereas higher-order terms are completely neglected, and most importantly,
(c) the commonly used stability analysis,[55]which usually involves using Davidson’s iterative subspace method[56,57]to solve for the lowest eigenvalues of the electronic Hessian,
provides a practical way to approximate the gas-phase electronic Hessian
and therefore its inverse within a subspace representation. Namely, if the Davidson procedure yields the lowest m eigenvectors, HU = Uλ, we can approximate the inverse Hessian as ∑(λ)−1UUT. Of course, this is based on the positive definiteness of
a gas-phase electronic Hessian (assuming that the gas-phase Kohn–Sham
energy is indeed optimized to its minimum), implying smaller and smaller
contributions to the inverse Hessian from the eigenvectors as their
eigenvalues increase.
COMPUTATIONAL DETAILS
The two schemes for estimating MESS QM/MM polarization energies,
MESS-E and MESS-H, are implemented within the Q-Chem/CHARMM interface,[58] using version c38b2 of CHARMM[59] and a development version of Q-Chem 4.2.[60,61]Three test cases employed in this work are shown in Figure 1:For the first two test cases,
the solute molecules (methanol
or β-alanine) are described with DFT methods (B3LYP,[65−67] M06-2X,[68] and ωB97X-D[69] functionals with 6-31+G* and 6-311++G** basis
sets[70,71]) and water molecules with TIP3P model.[72] Constant-volume rigid body molecular dynamics
simulations are performed for both systems at 298 K for 100 ps for
an initial equilibration, and then for 1 ns during which configurations
are collected at 1 ps intervals, leading to 1000 configurations for
both systems.
Figure 1
Three test systems in this work: (a) methanol and (b)
β-alanine,
both solvated in a cubic box of classical water molecules, and (c)
oxyluciferin in its complex with luciferase. In the luciferase binding
pocket, there is also an AMP molecule.
methanol solvated in a cubic
box with 1000 water moleculesβ-alanine solvated in a
cubic box with 1000 water moleculesthe oxyluciferin–luciferase
complex[62−64]Three test systems in this work: (a) methanol and (b)
β-alanine,
both solvated in a cubic box of classical water molecules, and (c)
oxyluciferin in its complex with luciferase. In the luciferase binding
pocket, there is also an AMP molecule.For the oxyluciferin–luciferase complex, we started
with
the 2D1R protein
data bank structure,[62] which includes 540
protein residues, the oxyluciferin substrate, and an AMP molecule
also in the binding pocket. 626 water molecules are included to solvate
the protein surface. The oxyluciferin substrate is also described
with DFT methods, whereas the protein residues and AMP (10 307
atoms in total) are described with CHARMM c22 protein force field
and CGenFF force field.[73] Constant-volume
rigid body molecular dynamics simulations are performed for both systems
at 298 K for 10 ps for an initial equilibration, and then for 100
ps. With one configuration collected every 1 ps, this leads to 100
configurations in total.In all these calculations, the core
region (methanol, β-alanine,
or oxyluciferin) is constrained as a rigid body through the “CONS
FIX” command in CHARMM, or can alternatively be enforced through
the “SHAPE” command.[74] As
a result, the core region retains the same geometry for all 100 or
1000 configurations used in subsequent MESS-E or MESS-H polarization
energy estimations.A “MESS” option is added to
the Q-Chem interface
in CHARMM, where “NROOTS n” is used to specify how many
lowest eigenvalues/eigenvectors of the stability matrix (i.e., electronic
Hessian) to be computed and used for constructing its inverse. The
inverse electronic Hessian is constructed only once for the gas-phase
density and saved to disk at the starting of the MESS-QM/MM calculation.
For each configuration, we evaluate the MM potential for the QM atoms
(eq 16), and then compute MESS-E and MESS-H
estimations (the latter uses the inverse Hessian read in from disk)
for the QM/MM polarization energy. To assess the accuracy of MESS-E
and MESS-H energies, we also perform SCF calculations to fully converge
DFT energies for each configuration. This is done only for the purpose
of comparison, and it will be avoided in actual MESS-QM/MM calculations.
Results and Discussion
Solvated Methanol
The results for the
solvated methanol systems are shown in Figure 2 and Table 1. The QM/MM polarization energy
are confirmed to be all negative (i.e., stabilizing the system) and,
with a range of −5 to 0 kcal/mol, its contribution to the total
energy are thus non-negligible.
Figure 2
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 1000 configurations
of a single methanol molecule solvated in a cubic box of 1000 water
molecules. All energies are in kcal/mol. The MESS-H(15), MESS-H(30),
and MESS-H(60) estimations started by solving iteratively for 15,
30, or 60 eigenvalues/eigenvectors of the electronic Hessian. B3LYP
results are shown in the left four panels, M06-2X results in the middle,
and ωB97X-D results in the right four panels. The gray lines
correspond to a hypothetical perfect estimation. The red lines are
linear fits of the estimated energy values against the exact values,
with R2 values and slopes listed at the
corner of each panel.
Table 1
Errors in the Estimated QM/MM Polarization
Energy for a Methanol Molecule Solvated in a Cubic Box of 1000 Water
Moleculesa
6-31+G*
basis
6-311++G**
basis
functional
scheme
MSE
RMS
MAX
REL
MSE
RMS
MAX
REL
B3LYP
MESS-E
–0.386
0.432
1.339
20.6%
–0.420
0.470
1.459
21.2%
MESS-H(15)
0.066
0.071
0.173
4.0%
0.101
0.107
0.234
5.7%
MESS-H(30)
–0.016
0.033
0.176
1.0%
0.019
0.032
0.112
1.5%
MESS-H(60)
–0.029
0.044
0.216
1.4%
–0.030
0.048
0.241
1.5%
M06-2X
MESS-E
–0.047
0.084
0.358
3.3%
–0.078
0.113
0.462
4.3%
MESS-H(15)
0.090
0.097
0.230
5.2%
0.140
0.150
0.356
7.7%
MESS-H(30)
–0.014
0.032
0.170
1.0%
0.028
0.039
0.118
1.9%
MESS-H(60)
–0.025
0.041
0.205
1.3%
–0.026
0.043
0.218
1.3%
ωB97X-D
MESS-E
0.048
0.071
0.209
3.8%
0.048
0.075
0.223
3.7%
MESS-H(15)
0.067
0.072
0.173
4.0%
0.095
0.103
0.247
5.4%
MESS-H(30)
–0.010
0.030
0.151
1.0%
0.014
0.031
0.122
1.4%
MESS-H(60)
–0.027
0.042
0.208
1.3%
–0.028
0.046
0.229
1.4%
Mean signed
errors (MSE), root
mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in
the QM/MM polarization energies estimated with MESS-E and MESS-H schemes
are listed, together with the average of the relative unsigned error
(REL). For the MESS-H calculations, the number within the parentheses
refers to the number of eigenvectors used in the construction of the
inverse of the electronic Hessian. Averaged over 1000 configurations
from a 1 ns trajectory.
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 1000 configurations
of a single methanol molecule solvated in a cubic box of 1000 water
molecules. All energies are in kcal/mol. The MESS-H(15), MESS-H(30),
and MESS-H(60) estimations started by solving iteratively for 15,
30, or 60 eigenvalues/eigenvectors of the electronic Hessian. B3LYP
results are shown in the left four panels, M06-2X results in the middle,
and ωB97X-D results in the right four panels. The gray lines
correspond to a hypothetical perfect estimation. The red lines are
linear fits of the estimated energy values against the exact values,
with R2 values and slopes listed at the
corner of each panel.Mean signed
errors (MSE), root
mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in
the QM/MM polarization energies estimated with MESS-E and MESS-H schemes
are listed, together with the average of the relative unsigned error
(REL). For the MESS-H calculations, the number within the parentheses
refers to the number of eigenvectors used in the construction of the
inverse of the electronic Hessian. Averaged over 1000 configurations
from a 1 ns trajectory.With all three functionals (B3LYP, M06-2X, and ωB97X-D),
the QM/MM polarization energy estimated with the MESS-E scheme, as
shown in Figure 2, correlates very well with
the exact values, with R2 values greater
than 0.996. The fitted slope with the B3LYP calculations in Figure 2 is 1.216, suggesting that MESS-E tends to overestimate
the polarization energy by roughly 20%, which corresponds to a 20.6%
relative error (REL) and a −0.386 kcal/mol mean signed error
(MSE) in Table 1. The maximum error (MAX) is
1.339 kcal/mol, which is slightly above the desired chemical accuracy
of 1 kcal/mol. In contrast, the MESS-E scheme only overestimates the
M06-2X values by 3% (slope 1.030, REL 3.3%, MSE −0.047 kcal/mol)
and actually slightly overestimates the ωB97X-D values by 3–4%
(slope 0.977, REL 3.8%, MSE 0.048 kcal/mol). It is very encouraging
that, without performing even a single Fock build at each configuration,
MESS-E can reproduce the M06-2X and ωB97X-D polarization energies
with an average error less than 0.1 kcal/mol and a maximum error of
0.2–0.4 kcal/mol (0.358 kcal/mol with M06-2X and 0.209 kcal/mol
with ωB97X-D).When applying the MESS-H scheme to the
solvated methanol, we first
solved for 15, 30, or 60 eigenvectors of the electronic Hessian of
the gas-phase molecule. We will refer to these calculations as MESS-H(15),
MESS-H(30), and MESS-H(60) in subsequent discussions. As shown in
Figure 2, the estimated MESS-H energies all
correlate well with the exact values, with R2 values greater than 0.998. MESS-H(15) slightly underestimates
the polarization energy by 4–5%, with MSE values of 0.066 kcal/mol
(B3LYP), 0.090 kcal/mol (M06-2X), and 0.067 kcal/mol (ωB97X-D).
MESS-H(30) has a higher initial cost, because it requests twice as
many eigenvectors for the gas-phase inverse Hessian construction,
and it also leads to more accurate energies. MESS-H(30) energies are
only 1% larger than the exact values, and the MSE values are −0.016
kcal/mol (B3LYP), −0.014 kcal/mol (M06-2X), and −0.010
kcal/mol (ωB97X-D). However, the results do not improve further
upon requesting even more eigenvectors. As shown by the slopes in
Figure 2 and the error values in Table 1, MESS-H(60) actually leads to slightly larger errors
than MESS-H(30). Overall, all these MESS-H models led to highly accurate
energies with all three functionals, with MSE errors smaller than
0.1 kcal/mol, and maximum errors of 0.15–0.23 kcal/mol.The MESS-E and MESS-H results with the larger 6-311++G** basis
set are also listed in Table 1, which shows
roughly the same performance as the smaller 6-31+G* basis set. MESS-E
energies are quite accurate in M06-2X and ωB97X-D calculations,
with MSE errors of −0.078 kcal/mol (M06-2X) and 0.048 kcal/mol
(ωB97X-D) and maximum errors of 0.462 kcal/mol (M06-2X) and
0.223 kcal/mol (ωB97X-D). The MESS-E errors are also much larger
in B3LYP calculations. On the other hand, MESS-H(15), MESS-H(30),
and MESS-H(60) all produce energies with MSE values of no more than
0.14 kcal/mol and maximum errors of no more than 0.36 kcal/mol. With
both basis sets, the best performance occurs with MESS-H(30), where
the number of requested eigenvectors is around twice the number of
electrons—18 for methanol.
Solvated
β-Alanine
The results
for solvated β-alanine are shown in Figure 3 and Table 2. The QM/MM polarization
energy is more significant for this system: −8 to −1
kcal/mol with the three functionals.
Figure 3
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 1000 configurations
of a single β-alanine molecule solvated in a cubic box of 1000
water molecules. All energies are in kcal/mol. The MESS-H(30), MESS-H(60),
and MESS-H(90) estimations started by solving iteratively for 30,
60, or 90 eigenvalues/eigenvectors of the electronic Hessian. B3LYP
results are shown in the left four panels, M06-2X results in the middle,
and ωB97X-D results in the right four panels. The gray lines
correspond to a hypothetical perfect estimation. The red lines are
linear fits of the estimated energy values against the exact values,
with R2 values and slopes listed at the
corner of each panel.
Table 2
Errors in the Estimated QM/MM Polarization
Energy for the β-Alanine Molecule Solvated in a Cubic Box of
1000 Water Moleculesa
6-31+G*
basis
6-311++G**
basis
functional
scheme
MSE
RMS
MAX
REL
MSE
RMS
MAX
REL
B3LYP
MESS-E
–0.982
1.067
3.046
24.6%
–1.041
1.130
3.224
24.5%
MESS-H(30)
0.509
0.534
1.159
13.3%
0.711
0.739
1.418
17.4%
MESS-H(60)
0.199
0.210
0.425
5.3%
0.336
0.351
0.687
8.3%
MESS-H(90)
0.062
0.075
0.206
1.8%
0.174
0.186
0.405
4.4%
M06-2X
MESS-E
–0.194
0.270
1.189
5.2%
–0.241
0.314
1.355
5.9%
MESS-H(30)
0.528
0.551
1.048
14.0%
0.669
0.696
1.272
16.7%
MESS-H(60)
0.206
0.218
0.462
5.5%
0.332
0.348
0.701
8.4%
MESS-H(90)
0.081
0.094
0.244
2.3%
0.165
0.176
0.421
4.2%
ωB97X-D
MESS-E
0.060
0.161
0.601
3.6%
0.079
0.175
0.612
3.7%
MESS-H(30)
0.484
0.508
1.088
12.8%
0.623
0.649
1.316
15.5%
MESS-H(60)
0.162
0.174
0.412
4.4%
0.274
0.292
0.686
6.9%
MESS-H(90)
0.049
0.070
0.306
1.6%
0.127
0.142
0.372
3.3%
Mean signed errors (MSE), root
mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in
the QM/MM polarization energies estimated with MESS-E and MESS-H schemes
are listed, together with the average of the relative unsigned error
(REL). For the MESS-H calculations, the number within the parentheses
refers to the number of eigenvectors used in the construction of the
inverse of the electronic Hessian. Averaged over 1000 configurations
from a 1 ns trajectory.
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 1000 configurations
of a single β-alanine molecule solvated in a cubic box of 1000
water molecules. All energies are in kcal/mol. The MESS-H(30), MESS-H(60),
and MESS-H(90) estimations started by solving iteratively for 30,
60, or 90 eigenvalues/eigenvectors of the electronic Hessian. B3LYP
results are shown in the left four panels, M06-2X results in the middle,
and ωB97X-D results in the right four panels. The gray lines
correspond to a hypothetical perfect estimation. The red lines are
linear fits of the estimated energy values against the exact values,
with R2 values and slopes listed at the
corner of each panel.Mean signed errors (MSE), root
mean square errors (RMS), maximum errors (MAX), all in kcal/mol, in
the QM/MM polarization energies estimated with MESS-E and MESS-H schemes
are listed, together with the average of the relative unsigned error
(REL). For the MESS-H calculations, the number within the parentheses
refers to the number of eigenvectors used in the construction of the
inverse of the electronic Hessian. Averaged over 1000 configurations
from a 1 ns trajectory.MESS-E energies correlate well with the exact values, with R2 values of 0.989 (B3LYP) and 0.992 (M06-2X
and ωB97X-D). In B3LYP calculations, MESS-E again significantly
overestimates the polarization energy (MSE −0.982 kcal/mol
with 6-31+G* basis and −1.041 kcal/mol with 6-311++G** basis;
REL 24.5% or 24.6% with the two basis sets). Even more significant
are the maximum errors, which are 3.046 kcal/mol with 6-31+G* and
3.224 kcal/mol with 6-311++G**, and thus these MESS-E values should
not be used without rescaling. In M06-2X calculations, MESS-E energies
are again relatively more accurate, with the MSE values with M06-2X
functionals being −0.194 and −0.241 kcal/mol, respectively.
Nevertheless, the maximum errors are still as large as 1.189 and 1.355
kcal/mol. The MESS-E energies are much more accurate in ωB97X-D
calculations, with MSE values as small as 0.060 and 0.079 kcal/mol.
But, with maximum errors of 0.601 and 0.612 kcal/mol, the MESS-E energy
values should be used with caution, if not rescaled.The MESS-H
calculations all tend to underestimate the polarization
energy here, and they produce increasingly more accurate results as
more eigenvectors are requested. The MSE values are 0.484–0.711
kcal/mol with MESS-H(30) and are reduced to 0.162–0.336 kcal/mol
with MESS-H(60), and further down to 0.049–0.174 kcal/mol with
MESS-H(90). At the same time, the maximum errors are reduced from
1.088 to 1.418 kcal/mol with MESS-H(30) to 0.412–0.701 with
MESS-H(60) and finally to 0.206–0.421 kcal/mol with MESS-H(90).
The relative errors goes down from 12.8−17.4% and 4.4–8.4%
and finally to 1.6–4.4%. The best performance here is observed
with MESS-H(90), for which the number of requested eigenvectors is
again about twice the number of electrons—48 for β-alanine.
Oxyluciferin–Luciferase Complex
As
shown in Figure 4, the QM/MM polarization
energies range from −5 to −2 kcal/mol with the three
functionals and the 6-31+G* basis set. With slopes of 1.858, 1.364,
and 1.184 for three functionals, the MESS-E scheme now even more significantly
overestimates the polarization energy, when compared to two previous
cases (methanol and β-alanine).
Figure 4
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 100 configurations
of an oxyluciferin molecule embedded in luciferase. All energies are
in kcal/mol. The MESS-H(120), MESS-H(180), and MESS-H(240) estimations
started by solving iteratively for 120, 180, or 240 eigenvalues/eigenvectors
of the electronic Hessian. B3LYP results are shown in the left four
panels, M06-2X results in the middle, and ωB97X-D results in
the right four panels. The gray lines correspond to a hypothetical
perfect estimation. The red lines are linear fits of the estimated
energy values against the exact values, with R2 values and slopes listed at the corner of each panel.
Estimated MESS-QM/MM polarization energies
(y-axis)
versus the exact values (x-axis) for 100 configurations
of an oxyluciferin molecule embedded in luciferase. All energies are
in kcal/mol. The MESS-H(120), MESS-H(180), and MESS-H(240) estimations
started by solving iteratively for 120, 180, or 240 eigenvalues/eigenvectors
of the electronic Hessian. B3LYP results are shown in the left four
panels, M06-2X results in the middle, and ωB97X-D results in
the right four panels. The gray lines correspond to a hypothetical
perfect estimation. The red lines are linear fits of the estimated
energy values against the exact values, with R2 values and slopes listed at the corner of each panel.As show in Table 3, in B3LYP calculations,
the MSE (over 100 configurations) is now as large as −2.731
kcal/mol (REL 82.7%), with a maximum error of 5.266 kcal/mol. This
improves in M06-2X calculations, where the MSE is −1.114 kcal/mol,
REL is 35.2%, and the maximum error is 2.129 kcal/mol. Even in ωB97X-D
calcualtions, where MESS-E performs the best out of the three functionals,
the MSE is still as large as −0.558 kcal/mol, and the maximum
error is 1.219 kcal/mol, and the relative error is 17.6%. So for all
three functionals, the MESS-E energy cannot be used without rescaling.
Table 3
Errors in the Estimated QM/MM Polarization
Energy for the Oxyluciferin–Luciferase Complexa
6-31+G*
basis
functional
scheme
MSE
RMS
MAX
REL
B3LYP
MESS-E
–2.731
2.874
5.266
82.7%
MESS-H(120)
0.251
0.254
0.341
7.9%
MESS-H(180)
0.174
0.176
0.237
5.5%
MESS-H(240)
0.136
0.137
0.190
4.3%
M06-2X
MESS-E
–1.114
1.169
2.129
35.2%
MESS-H(120)
0.222
0.224
0.296
7.3%
MESS-H(180)
0.149
0.150
0.208
4.9%
MESS-H(240)
0.104
0.105
0.161
3.4%
ωB97X-D
MESS-E
–0.558
0.605
1.219
17.6%
MESS-H(120)
0.185
0.187
0.258
6.1%
MESS-H(180)
0.113
0.115
0.176
3.7%
MESS-H(240)
0.077
0.079
0.129
2.5%
The oxyluciferin
substrate
is the QM region, and the luciferase is described with the CHARMM
force field. Mean signed errors (MSE), root mean square errors (RMS),
maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies
estimated with MESS-E and MESS-H schemes are listed, together with
the average of the relative unsigned error (REL). Averaged over 100
configurations from a 100 ps trajectory.
The oxyluciferin
substrate
is the QM region, and the luciferase is described with the CHARMM
force field. Mean signed errors (MSE), root mean square errors (RMS),
maximum errors (MAX), all in kcal/mol, in the QM/MM polarization energies
estimated with MESS-E and MESS-H schemes are listed, together with
the average of the relative unsigned error (REL). Averaged over 100
configurations from a 100 ps trajectory.The MESS-H scheme has reliably led to more accurate
energies with
an increasing number of requested eigenvalues for the electronic Hessian.
MESS-H(120) led to MSE values around 0.2 kcal/mol (0.251 kcal/mol
(B3LYP), 0.222 kcal/mol (M06-2X), and 0.185 kcal/mol (ωB97X-D))
and maximum errors around 0.3 kcal/mol (0.341 kcal/mol (B3LYP), 0.296
kcal/mol (M06-2X), and 0.258 kcal/mol (ωB97X-D)). These errors
are reduced roughly by one-third with MESS-H(180), and by one-half
with MESS-H(240). Indeed, MESS-H(240) led to MSE values around 0.1
kcal/mol (0.136 kcal/mol (B3LYP), 0.104 kcal/mol (M06-2X), 0.077 kcal/mol
(ωB97X-D)). Meanwhile, the MESS-H(240) maximum errors are all
below 0.2 kcal/mol (0.190 kcal/mol (B3LYP), 0.161 kcal/mol (M06-2X),
0.129 kcal/mol (ωB97X-D)). Here, the number of requested eigenvectors,
240, is again roughly twice that of the number of electrons—128
for oxyluciferin.
Further Analysis of the MESS-E
Scheme
So far, we have observed the following general trend
for the errors
in the MESS-E energies: (i) ωB97X-D < M06-2X ≪ B3LYP
and (ii) methanol < β-alanine < oxyluciferin. On one end,
MESS-E performs very well for solvated methanol, with MSE values less
than 0.1 kcal/mol with M06-2X and ωB97X-D functionals. On the
other end, it significantly overestimates the QM/MM polarization energy
for oxyluciferin in luciferase: with B3LYP, the MSE is as large as
−2.731 kcal/mol. Here we shall attempt to unravel the underlying
reason for this rather uneven performance.Because MESS-E almost
always overestimates the polarization energy, the exception being
ωB97X-D calculations on solvated methanol and β-alanine,
let us introduce a tuning parameter, λ, into eq 19, which then becomesBy diagonalizing this Fock matrix,we obtain a set of MO coefficients, C(λ), which change continuously with parameter
λ. From these MOs, we can compute the corresponding density
matrix, P(λ), electron density,
ρ(λ), and Kohn–Sham DFT energy,Note that E(λ) provides
an upperbound to E(full), the actual DFT energy within
a given MM potential, which corresponds to full converged MOs, C(full), and its corresponding electron density,
ρ(full).To assess the degree with which the
occupied Kohn–Sham orbitals
and thus the electron density respond to the applied MM potential,
we computed the difference between ρ(λ) and
the full-converged SCF density ρ(full):and plotted them and E(λ)
values against λ (ranging from 0 to 1.5) in Figure 5 for the first configuration of each test system
with all three functionals. There, in the panels with energy curves,
the energy values are relative to the full-converged DFT energy, so
that the values at λ = 0 are (the absolute value of) the exact
polarization energies.
Figure 5
Deviation in the electron density and in DFT energies
from the
fully converged solutions versus the λ parameter. The deviation
in the electron density is measured with Δn, which is defined in eq 32. The energy values
are relative to the fully converged DFT energy, so that the values
at λ = 0 are (the absolute value of) the exact polarization
energies. The plotted energies contain extra 2-electron and exchange–correlation
contributions that are not included in the MESS-E energies. The left
panels correspond to results for solvated methanol, the middle panels
for solvated β-alanine, and the right panels for the oxyluciferin–luciferase
complex.
Deviation in the electron density and in DFT energies
from the
fully converged solutions versus the λ parameter. The deviation
in the electron density is measured with Δn, which is defined in eq 32. The energy values
are relative to the fully converged DFT energy, so that the values
at λ = 0 are (the absolute value of) the exact polarization
energies. The plotted energies contain extra 2-electron and exchange–correlation
contributions that are not included in the MESS-E energies. The left
panels correspond to results for solvated methanol, the middle panels
for solvated β-alanine, and the right panels for the oxyluciferin–luciferase
complex.It is clear in Figure 5 that, given a test
system and a functional, the Δn curve and the E(λ) curve reach minimum at the same λ values.
And in all cases, the minimum occurs where λ is smaller than
1. This indicates that δP(1) always
overestimates the response in the density matrix.In ωB97X-D
calculations, the energy reaches a minimum of
0.325 kcal/mol at λ = 0.9 for solvated methanol, 0.919 kcal/mol
at λ = 0.8 for solvated β-alanine, and 0.497 kcal/mol
at λ = 0.625 for the oxyluciferin–luciferase complex.
In M06-2X calculations, the minimums are 0.296 kcal/mol at λ
= 0.875 (methanol), 0.873 kcal/mol at λ = 0.775 (β-alanine),
and 0.482 kcal/mol at λ = 0.55 (oxyluciferin). In B3LYP calculations,
the minimums are 0.31 kcal/mol at λ = 0.725 (methanol), 1.004
kcal/mol at λ = 0.625 (β-alanine), and 0.613 kcal/mol
at λ = 0.4 (oxyluciferin).The total energy within the
MESS-E scheme,is a sum of gas-phase energy, E(0), the
first-order energy change, E(1) (eq 18), and the MESS-E approximation
to the polarization energy, EMESS-E(2) (eq 23). Compared to E(λ) at λ = 1
in Figure 5, the MESS-E total energy does not
include (i) 2-electron integral contributions, δP(1)·II·δP(1), and (ii) their DFT exchange–correlation counterparts.
As a result, the MESS-E energies are no longer upper bounds to the
full-converged DFT energy. Due to its dependence on δP(1), which overestimates the response in the density matrix,
the MESS-E formula in eq 23 understandably tends
to overestimate the QM/MM polarization energy. In Figure 6, the MSE in the MESS-E energies from Tables 1–3 are plotted against
the λ values above. There is a clear trend: the smaller the
λ value is at the minimum, the larger errors we find in the
MESS-E energies.
Figure 6
MSE in the MESS-E energies from Tables 1–3 versus the λ values
at the
minimums in Figure 5.
MSE in the MESS-E energies from Tables 1–3 versus the λ values
at the
minimums in Figure 5.
Conclusions
In this work, we proposed
two schemes, MESS-E and MESS-H, for a
fast estimation of the MESS-QM/MM polarization energies and applied
them to three test systems: solvated methanol, solvated β-alanine,
and the luciferin–luciferase complex. The main findings are
as follows:The MESS-E scheme in general
overestimates the polarization energy, and the errors follow the trends
ωB97X-D < M06-2X ≪ B3LYP and methanol < β-alanine
< oxyluciferin. In the best cases, solvated methanol with the M06-2X
or ωB97X-D functionals, for example, the MSE is smaller than
0.1 kcal/mol. In the worst cases, such as the luciferin–luciferase
complex with the B3LYP functional, the MSE is as large as −2.731
kcal/mol. This significant difference in its performance is suggested
to arise from the overestimation of the density matrix response in
the MESS-E scheme, and a λ parameter was introduced to assess
the degree of this overestimation.The MESS-H scheme can reliably
estimate the polarization energy. The average errors can be around
or even below 0.2 kcal/mol, if the number of eigenvectors used to
construct the inverse electronic Hessian is at least about twice the
number of electrons in the QM region.MESS-E and MESS-H schemes only
require the evaluation of the MM electrostatic potential and completely
avoid the SCF iterations at each configuration, especially the expensive
updating of Coulomb, Hartree–Fock exchange, and DFT exchange–correlation
matrices, and thus they can be performed with only a tiny fraction
of the cost associated with standard QM/MM calculations.This work is limited in a number of ways:Although we
have demonstrated
the accuracy of the two MESS schemes, especially MESS-H, for individual
configurations of our test systems, it remains to be seen how much
error these approximations cause to the overall free energies;Though a λ
parameter has
been introduced to assess the overestimation of the density response
in the MESS-E scheme, the dependence of the optimal λ values
on the molecule, on the QM method and on the external MM environment
needs to be explored in more detail before developing a MESS-E scheme
with proper rescaling that is universally applicable and the least
arbitrary. Specifically, there has been some clear indication that
the optimal λ values as shown in Figure 5 are correlated with the HOMO–LUMO gaps. For example, B3LYP,
which has the smallest fraction for the Hartree–Fock exact
exchange (20%) out of the three functionals, leads to the smallest
HOMO–LUMO gaps and then the smallest optimal λ values.In many cases, the
gas-phase
electron structure is not necessarily the best reference for the MESS-E
and MESS-H calculations. Other references, a single configuration
with external MM potential or an average over multiple configurations,
should be tested.Although the QM/MM polarization
energy can be estimated within a reasonable accuracy, one might want
to estimate the QM/MM energy gradient as well, which is required to
compute the enthalpic correction to the free energy due to the constraint
force. It remains to be seen whether the QM/MM gradient is more sensitive
to the error in Kohn–Sham orbitals than the energy.The MM electrostatic
potential
used in the solvated methanol or β-alanine calculations comes
only from MM atoms in the central unit cell. Although their images
in PBC simulations are not expected to significantly add to the polarization
of the QM region, especially if a significantly large unit cell is
used (as the case with this work), an extension of MESS-E and MESS-H
schemes to QM/MM calculations with a periodic boundary condition[75−77] might be desirable.For a macromolecular system with
tens of thousands of (if not more) MM atoms, as in our test case of
the oxyluciferin–luciferase complex, the cost of evaluating
the MM electrostatic potential in the QM region can be high. For example,
for each MM configuration in the luciferase calculation with B3LYP/6-31+G*
level of theory, it still takes about 8 s to evaluate the MM electrostatic
potential on a single Intel Xeon E5420 2.5 GHz processor (whereas
it would have required about 150 s to fully converge SCF within 7
cycles starting from gas-phase molecular orbitals). Some techniques
in the J-engine[78,79] and quantum Ewald mesh[80] methods can potentially be adapted to help speed
up the evaluation of MM potential.At this moment, the MESS-E and
MESS-H schemes are only applicable to Hartree–Fock, pure DFT,
hybrid DFT, and range-separated DFT. One has yet to extend it to double-hybrid
DFT or post-Hartree–Fock methods.We have limited our study to
fixed point-charge force fields on the MM atoms, and therefore we
have completely neglected the “back” polarization effects,
where the QM nuclei/electrons also polarize the MM atoms.Currently, we are actively pursuing work
to address some of these
limitations.
Authors: Yihan Shao; Laszlo Fusti Molnar; Yousung Jung; Jörg Kussmann; Christian Ochsenfeld; Shawn T Brown; Andrew T B Gilbert; Lyudmila V Slipchenko; Sergey V Levchenko; Darragh P O'Neill; Robert A DiStasio; Rohini C Lochan; Tao Wang; Gregory J O Beran; Nicholas A Besley; John M Herbert; Ching Yeh Lin; Troy Van Voorhis; Siu Hung Chien; Alex Sodt; Ryan P Steele; Vitaly A Rassolov; Paul E Maslen; Prakashan P Korambath; Ross D Adamson; Brian Austin; Jon Baker; Edward F C Byrd; Holger Dachsel; Robert J Doerksen; Andreas Dreuw; Barry D Dunietz; Anthony D Dutoi; Thomas R Furlani; Steven R Gwaltney; Andreas Heyden; So Hirata; Chao-Ping Hsu; Gary Kedziora; Rustam Z Khalliulin; Phil Klunzinger; Aaron M Lee; Michael S Lee; Wanzhen Liang; Itay Lotan; Nikhil Nair; Baron Peters; Emil I Proynov; Piotr A Pieniazek; Young Min Rhee; Jim Ritchie; Edina Rosta; C David Sherrill; Andrew C Simmonett; Joseph E Subotnik; H Lee Woodcock; Weimin Zhang; Alexis T Bell; Arup K Chakraborty; Daniel M Chipman; Frerich J Keil; Arieh Warshel; Warren J Hehre; Henry F Schaefer; Jing Kong; Anna I Krylov; Peter M W Gill; Martin Head-Gordon Journal: Phys Chem Chem Phys Date: 2006-06-12 Impact factor: 3.676
Authors: B R Brooks; C L Brooks; A D Mackerell; L Nilsson; R J Petrella; B Roux; Y Won; G Archontis; C Bartels; S Boresch; A Caflisch; L Caves; Q Cui; A R Dinner; M Feig; S Fischer; J Gao; M Hodoscek; W Im; K Kuczera; T Lazaridis; J Ma; V Ovchinnikov; E Paci; R W Pastor; C B Post; J Z Pu; M Schaefer; B Tidor; R M Venable; H L Woodcock; X Wu; W Yang; D M York; M Karplus Journal: J Comput Chem Date: 2009-07-30 Impact factor: 3.376
Authors: Andrew C Simmonett; Frank C Pickard; Yihan Shao; Thomas E Cheatham; Bernard R Brooks Journal: J Chem Phys Date: 2015-08-21 Impact factor: 3.488
Authors: Eric G Kratz; Alice R Walker; Louis Lagardère; Filippo Lipparini; Jean-Philip Piquemal; G Andrés Cisneros Journal: J Comput Chem Date: 2016-01-18 Impact factor: 3.376
Authors: Jing Huang; Ye Mei; Gerhard König; Andrew C Simmonett; Frank C Pickard; Qin Wu; Lee-Ping Wang; Alexander D MacKerell; Bernard R Brooks; Yihan Shao Journal: J Chem Theory Comput Date: 2017-01-24 Impact factor: 6.006
Authors: Ye Mei; Andrew C Simmonett; Frank C Pickard; Robert A DiStasio; Bernard R Brooks; Yihan Shao Journal: J Phys Chem A Date: 2015-05-18 Impact factor: 2.781
Authors: Gerhard König; Ye Mei; Frank C Pickard; Andrew C Simmonett; Benjamin T Miller; John M Herbert; H Lee Woodcock; Bernard R Brooks; Yihan Shao Journal: J Chem Theory Comput Date: 2015-12-11 Impact factor: 6.006