Timothy J Giese1, Haoyuan Chen1, Ming Huang2, Darrin M York1. 1. BioMaPS Institute and Department of Chemistry and Chemical Biology, Rutgers University , Piscataway, New Jersey 08854-8087, United States. 2. BioMaPS Institute and Department of Chemistry and Chemical Biology, Rutgers University , Piscataway, New Jersey 08854-8087, United States ; Scientific Computation, University of Minnesota , 207 Pleasant Street SE, Minneapolis, Minnesota 55455-0431, United States.
Abstract
We parametrize a linear-scaling quantum mechanical force field called mDC for the accurate reproduction of nonbonded interactions. We provide a new benchmark database of accurate ab initio interactions between sulfur-containing molecules. A variety of nonbond databases are used to compare the new mDC method with other semiempirical, molecular mechanical, ab initio, and combined semiempirical quantum mechanical/molecular mechanical methods. It is shown that the molecular mechanical force field significantly and consistently reproduces the benchmark results with greater accuracy than the semiempirical models and our mDC model produces errors twice as small as the molecular mechanical force field. The comparisons between the methods are extended to the docking of drug candidates to the Cyclin-Dependent Kinase 2 protein receptor. We correlate the protein-ligand binding energies to their experimental inhibition constants and find that the mDC produces the best correlation. Condensed phase simulation of mDC water is performed and shown to produce O-O radial distribution functions similar to TIP4P-EW.
We parametrize a linear-scaling quantum mechanical force field called mDC for the accurate reproduction of nonbonded interactions. We provide a new benchmark database of accurate ab initio interactions between sulfur-containing molecules. A variety of nonbond databases are used to compare the new mDC method with other semiempirical, molecular mechanical, ab initio, and combined semiempirical quantum mechanical/molecular mechanical methods. It is shown that the molecular mechanical force field significantly and consistently reproduces the benchmark results with greater accuracy than the semiempirical models and our mDC model produces errors twice as small as the molecular mechanical force field. The comparisons between the methods are extended to the docking of drug candidates to the Cyclin-Dependent Kinase 2 protein receptor. We correlate the protein-ligand binding energies to their experimental inhibition constants and find that the mDC produces the best correlation. Condensed phase simulation of mDCwater is performed and shown to produce O-O radial distribution functions similar to TIP4P-EW.
Linear-scaling
quantum-mechanical (LS-QM) methods[1] overcome
the computational bottlenecks of traditional QM
methods without resorting to the use of additional parametrized functions.
As a result, they offer a means for modeling system sizes much larger
than would otherwise be practical while largely retaining the accuracy
of a standard implementation. There are numerous recent examples that
further develop LS-QM methods in an effort to make them more affordable;[2−7] however, the widespread use of LS-QM methods continue to be hampered
by their high cost relative to combined quantum mechanical/molecular
mechanics (QM/MM) or traditional force fields. Furthermore, the LS-QM
literature has generally placed greater focus on the benchmark timings
of ad hoc systems or on the application of LS-QM methods without having
convincingly demonstrated that their added benefit justifies their
higher cost. For example, if one must consider relatively large systems
to reap the benefits of a LS-QM method, then one will pragmatically
choose to use small basis sets and inexpensive Hamiltonians, such
as B3LYP/6-31G* or a semiempirical/tight-binding model. But even if
a LS-QM method at this level was demonstrated to be computationally
feasible for a real application, one may find the accuracy unacceptable
or inferior to cheaper parametric alternatives. The GAFF MM force
field[8,9] after all, is many times faster than even
the fastest semiempirical method and can, for many problems, model
intermolecular interactions with acceptable accuracy even though it
does not explicitly treat electronic polarization. The utility of
LS-QM methods have thus been highlighted by applications that require
a detailed description of the electron density or molecular orbitals
(MOs), including, the description of biomolecular electrostatic potentials,[10] electron transfer[11−21] and excitation[22] in large systems, and
inferring enzyme specificity from frontier orbitals.[23] Some of the most promising examples that demonstrate the
utility of LS-QM methods have been provided by Merz, whom has used
them to identify misfolded proteins,[24] compute
NMR chemical shifts,[25,26] decompose protein–ligand
interactions,[27] and perform ligand-binding
and drug design.[28−31] Others have used LS-QM methods to examine interactions within clusters,[32−36] compute chemical reaction energies,[37−40] and perform drug screening.[41−43]Linear-scaling quantum mechanical force fields (QMFFs) are
an attempt
to circumvent the above criticism by supplementing the LS-QM energy
with parametrized interactions. The parametrized functional form provides
the means to make a concerted improvement to the method’s accuracy
and performance while significantly reducing, if not entirely eliminating,
their “break-even point”; i.e., the system size required
to yield a performance benefit.There are two categories of
QMFFs that have found chemical application
to molecular systems: (1) MO-based models, such as X-Pol[44−48] and the water-specific XP3P model,[49] and
(2) electron density-based models, e.g. SIBFA, GEM, and QMPFF.[50−61] The MO-based QMFFs, such as those proposed by Gao,[44] achieve their performance by approximating the wave function
with a Hartree product of antisymmetrized fragment determinants.[62] This approximation results in a block diagonal
Fock matrix that can be diagonalized fragment-by-fragment. The number
of blocks increases with the size of the system, but the size of each
block does not, so the scaling of the Fock matrix diagonalization
reduces from O(N3) to O(N). Lennard-Jones (LJ) or Buckingham
potentials are then used to recuperate the energetic effect of interfragment
MO coupling.[19,45,46,62−66] The density-based QMFFs avoid Fock matrix diagonalization
altogether; the interfragment interactions are computed from the overlap
and Coulomb interactions of an accurate density prefitted to ab initio
and is allowed to respond to the environment according to the principle
of chemical potential equalization. We recently introduced[67] a modified divide-and-conquer (mDC) QMFF model
which made an attempt at unifying the orbital- and density-based QMFFs.
In that model, the density resulting from fragment MOs were used to
interact fragments with a density-overlap van der Waals (vdW) model.[67,68]The present work is a parametrization of a MO-based mDC QMFF
which,
like X-Pol, models the interfragment vdW interactions with simple
LJ functions; however, there are two distinguishing features that
differentiate mDC from X-Pol. The interfragment X-Pol interactions
are treated with a QM/MM interaction potential; whereas, the mDC interactions
are not. A QM/MM potential does not produce symmetric interactions
in the sense that the interaction depends upon which residue is considered
the “QM region.” X-Pol therefore recomputes the interfragment
interactions for each QM fragment and averages the energy and forces.
The interfragment mDC interactions are symmetric and are therefore
computed once. Second, the mDC fragment densities are not limited
to being those used in a QM/MM potential. A QM/MM treatment of DFTB2
and DFTB3 densities would yield atomic charges only; however, this
severely degrades the accuracy of these methods. The ability of DFTB2
and DFTB3 to reproduce hydrogen bond angles is a result of their tight-binding
coupling matrix elements, not their charges. The Hartree–Product
approximation removes these coupling elements and their effect must
be recovered by performing electrostatics with higher-order atomic
multipoles.[67]In this work, we parametrize
the mDC method for nonbonded interactions;
we supply a new database, Sulfur Set 7 (SS7), of accurate nonbond
interactions involving sulfur; we apply the parametrized mDC method
to the screening of Cyclin-Dependent Kinase 2 (CDK2) protein receptor
drug candidates;[41] and we explore the ability
of mDC to simultaneously reproduce small-to-medium sized water clusters
and the condensed phase water radial distribution function (RDF) computed
from molecular dynamics simulation. A series of standard nonbond interaction
databases are used to compare the parametrized mDC model to MM force
fields, semiempirical models, inexpensive ab initio Hamiltonians,
and a semiempirical QM/MM method. The comparisons quantify the poor
accuracy of standard semiempirical models relative to a MM force field
and reveal the high accuracy of mDC. The CDK2 drug screening exercise
is repeated with MM and semiempirical QM/MM to further demonstrate
the benefits of our QMFF. The variety of molecular interactions examined
in this work and the number of other methods we compare mDC against
extend significantly beyond other recent investigations.[48]
Methods
Linear-Scaling
mDC Method
The mDC
total energy iswhere the
first term is a summation of fragment
ab initio energies, Cσ are the σ-spin MO
coefficients for the A’th fragment, and R are the nuclear positions of
the atoms in fragment A. The remaining terms describe
the interactions between the fragments. The second term is the interfragment
electrostatic energy, whereare atomic multipole moments on atom a, C(r) is a real
regular solid harmonic, ρ(r) is an atom-partitioned density, Z is a nuclear plus core electron charge,
andis a “multipolar potential”
arising from point-multipole Coulomb interactions. The primed summation
indicates that intrafragment electrostatics are excluded because those
Coulomb interactions are already considered in the ab initio calculation
of E.The last
two terms in eq 1 are an interfragment LJ energy
and a standard molecular mechanical (MM) bonded energy, respectively.
Unlike a standard MM Hamiltonian, Ebonded includes corrections only for those bonds, angles, and dihedrals
that cross the boundary between two covalently bonded fragments. The
systems examined in the present work involve intermolecular interactions
between nonbonded molecules and the intermolecular interaction between
drug ligands and a rigid protein; therefore, there is no need to further
elaborate on Ebonded here.The computational
advantage afforded by eq 1 arises from having
treated the MOs of each fragment as if they did
not overlap with those of any other fragment. This approximation allows
one to solve for the MO coefficients from a series of small generalized
eigenvalue problems (proportional to the size of a fragment)rather than solving a single eigenvalue problem
for the entire system. Although the lack of interfragment coupling
between the MOs is a significant approximation, the remaining interactions
in eq 1 are parametrized to reproduce high-level
ab initio data. The lack of explicit interfragment MO coupling matrix
elements does not imply that the fragments are uncoupled. The interfragment
coupling occurs through the interaction of their atomic multipoles
which are determined from the fragment electron densities within the
self-consistent-field (SCF) procedure. The σ-spin Fock matrix
for region A with inclusion of this coupling iswhereis the spin-resolved AO density matrix of
fragment A, and nσ is the occupation number of σ-spin orbital k in fragment A.Our linear-scaling framework
minimizes the mDC energy by performing
a double-SCF procedure.[44,45,69,70] In brief, each fragment is provided
a set of atom-centered external multipolar potentials p and is instructed to compute a converged fragment energy E and return a new set of atomic
multipole moments q. The linear-scaling framework evaluates
the interfragment electrostatics to transform the moments q into potentials p, and the procedure is repeated until
the total energy and q stop changing. Once convergence
is met, the atomic gradients are computed fromThe interface between the linear-scaling
framework
and the underlying Hamiltonian must extract or calculate the atomic
multipole moments q from the ab initio density, and it
must correct the model’s Fock matrix [eq 5] by evaluating the derivative ∂q/∂Pσ |.In the present work, we use the DFTB3 Hamiltonian (parametrization
“3ob”) described in ref (71). DFTB3 happens to use atomic charges in the
calculation of the energy, but our interface to DFTB3 is free to choose
a different set of q’s that are used to interact
a fragment with the outside world. This is a particularly useful feature
when using DFTB3 because its success in achieving good hydrogen bond
(H-bond) angles is largely a result of the intermolecular tight-binding
matrix elements[67] (e.g., Figure 1). The linear-scaling framework eliminates these
elements, so our approach is to recapture the angular dependence via
the electrostatic interactions by increasing the order of the atomic
multipoles. A mDC fragment can thus be thought as having two representations
of the atomic multipoles: a set of atomic charges that DFTB3 happens
to use internally and a set of q’s that our interface
constructs from the fragment’s total density matrix P = Pα + Pβ.
Figure 1
Comparison between optimized TIP3P/GAFF,
DFTB3, mDC, and B3LYP/6-31++G**
water–aspargine H-bond angle as a function of N···O
separation (3 and 4 Å). The TIP3P water is coplanar with the
ASN amine group. The DFTB3 water is angled in a manner similar to
B3LYP near the energy minimum but reverts to a TIP3P-like planar structure
when the inter-residue overlap becomes small. The mDC angle is in
close agreement with B3LYP even at larger separations than those shown.
Comparison between optimized TIP3P/GAFF,
DFTB3, mDC, and B3LYP/6-31++G**
water–aspargine H-bond angle as a function of N···O
separation (3 and 4 Å). The TIP3P water is coplanar with the
ASN amine group. The DFTB3water is angled in a manner similar to
B3LYP near the energy minimum but reverts to a TIP3P-like planar structure
when the inter-residue overlap becomes small. The mDC angle is in
close agreement with B3LYP even at larger separations than those shown.We decompose the DFTB3 density
into atomic components as follows:where χ(r) = χ(r)Y(Ω) is an AO basis function, andis a fraction between 0 and 1 and holds the
property f = 1 – f. If f = 1/2, then one would say that eq 8 is a Mulliken partitioning; however, we allow for
a bias as a function of atom-pair and Mulliken bond-orderfs and fd are parameters corresponding to
the fractions that we found
most agreeable when atoms a and b are connected by a single bond and double bond, respectively; and bs and bd are the Mulliken
single- and double-bond orders, i.e., fs ≡ f (bs) and fd ≡ f (bd). Had we expressed the partition using Wigner bond indices, bs and bd would take
values 1 and 2, respectively; however, Mulliken bond orders yield
convenient expressions for the Fock matrix corrections.is a smooth polynomial used to switch f from fs to fd as the bond order increases.The atomic multipoles
are obtained by inserting eq 8 into eq 2 while restricting the contributions
of the two-center densities to charge onlywhere theare treated as a parameters. For an sp-basis, there
are two parameters: M(1) and M(2), which control the magnitude of the
dipole and quadrupole contributions, respectively. When the multipole
expansion is limited to quadrupoles, an spd-basis
formally requires three additional parameters: M(2), M(1), and M(2); however, we constrain these additional parameters to the values
of M(2), M(1), and M(2), respectively.If the fragment is
not covalently bonded to any other fragment,
which is the predominant scenario in the present work, then eq 12 is the expression for the multipoles, i.e., q = q′. The CDK2 protein docking results described in this paper, however,
use a fixed protein structure that has been fragmented by residue.
Connection atoms are used to cap the resulting “dangling bonds”,
and we have found it necessary to treat the atomic multipoles at the
fragmentation boundaries specially, as described in Figure 2. For the purposes of the present work, we found
it acceptable to use hydrogens for the connection atoms. The charges
of the frontier atoms, i.e., the connection and boundary atoms, need
to be constrained to yield reliable SCF convergence. Although it is
possible to modify the ab initio method by introducing additional
constraints into the density matrix, it is much simpler and less intrusive
to compute the unconstrained atomic multipole moments from eq 12, manually change the charges of the frontier atoms,
and renormalize the remaining charges in the fragment, i.e.,where qmm, is a fixed molecular mechanical
charge andis the charge excess to be evenly distributed
among the nonfrontier atoms in the region after altering the charges
of the frontier atoms. Na,, Nb,, and Nc, are the total number of
atoms, boundary atoms, and connection atoms in fragment A, respectively. The summations over b and c in eq 15 are meant to encompass
the boundary and connection atoms, respectively. With this scheme,
the derivative required to correct the Fock matrix requires one additional
chain-rule:
Figure 2
Illustration of the relationship between residues
and fragments
and the relationship between boundary atoms and connection atoms in
the mDC method. Atoms C and D are connected by a covalent bond; therefore,
fragments 1 and 2 include pseudoatom D* and C*, respectively. When
fragment 1 is queried for the new set of multipole moments of atoms
A, B, and C; the charges of atoms C and D* are summed; the multipole
moments of atom D* are set to zero; the charge of atom C is assigned
a predefined fixed MM charge; and the excess charge from the sum is
evenly distributed to atoms A and B. An analogous procedure is performed
for fragment 2 to fix the charges of atoms C* and D. Thus, the charges
of atoms C and D are fixed and independent of the ab initio density.
Illustration of the relationship between residues
and fragments
and the relationship between boundary atoms and connection atoms in
the mDC method. Atoms C and D are connected by a covalent bond; therefore,
fragments 1 and 2 include pseudoatom D* and C*, respectively. When
fragment 1 is queried for the new set of multipole moments of atoms
A, B, and C; the charges of atoms C and D* are summed; the multipole
moments of atom D* are set to zero; the charge of atom C is assigned
a predefined fixed MM charge; and the excess charge from the sum is
evenly distributed to atoms A and B. An analogous procedure is performed
for fragment 2 to fix the charges of atoms C* and D. Thus, the charges
of atoms C and D are fixed and independent of the ab initio density.
Model
Hamiltonians
DFTB3
This method is described in
ref (71) and performed
using a
standard SCF procedure with the recently developed “3ob”
parametrization. We do not use the “H–H-mod”,
“H–N-mod”, nor spin-polarization corrections
described in ref (71) because these are intended to improve atomization energies, proton
affinities, and high-spin open-shell electronic configurations that
are not relevant here.
DFTB2
This method (sometimes called
SCC-DFTB) is described
in ref (72) and performed
using the “mio” parametrization. We do not use the γh correction that appears in more recent works[73−75] which has been shown in improve H-bond strengths. Nor do we supplement
DFTB2 with a London dispersion model for select nonbonded interactions,
which has been shown to improve the description of weakly bound complexes.[76]
PM6 and AM1
These semiempirical
methods are described
in refs (77) and (78), respectively; as implemented
in Gaussian 09.[79]
B3LYP
This method
refers to the B3LYP/6-31G* density
functional model implemented in Gaussian 09.[79]
GAFF
This comprises the general Amber force field[8,9] and TIP3P water. The CDK2 protein structure made use of the ff99
Amber force field[80] supplemented with parameters
from ref (81) for the
phosphotyrosine with a single protonated phosphate group. Figure 7 makes use of the TIP4P-EW water model.[82]
Figure 7
Comparison of water O–O radial distribution functions
(a),
water cluster binding energies (b), and the relative ordering of clusters
(c–e). The mue’s in b are mean unsigned errors of ΔE (kcal/mol). The R’s in c–e
are Pearson correlation coefficients.
MNDO/MM
Semiempirical QM/MM calculations
performed
with Amber 12.[9] Clusters of N monomers were geometry optimized N times to produce N relative energies and coordinate root-mean-square deviations
relative to the reference structure. The average relative energy and
coordinate root-mean-square is then used to avoid the ambiguity assigning
the QM region. This procedure is not used by X-Pol;
it is merely our attempt to compare to a traditional semiempirical
QM/MM method. The CDK2 calculations treat the ligand and protein with
MNDO and Amber ff99, respectively.
Reference
Data
S22
A database of 22 dimers is taken from ref (83). The set contains small
to relatively large complexes chosen to represent hydrogen bonding,
dispersion, and mixed electrostatic-dispersion interactions. The S22
reference is not a collection from a consistent level of theory but
is instead a collection of the best available calculations that could
be afforded at the time of its construction, and the reader should
refer to ref (83) for
full details. Generally speaking, most of the energies are an approximate
CCSD(T) complete basis extrapolation (CBS) from counterpoise (CP)
corrected calculations. Both the monomers are dimers are geometry
optimized.
JSCH-2005
A collection of 124 nucleobase
and amino
acid complexes compiled from several publications.[83] In this work, we refer only to a small subset of the JSCH-2005
database where optimized geometries of hydrogen bonded or stacked
nucleobases are available. In particular, we use the A···T
S1, A···T WC, G···C S, G···C
wc, mA···mT S, and mG···mC S complexes
from ref (84) [CCSD(T)/CBS//MP2/TZVPP]
and the G···A1, G···A1 pl, G···A2,
G···A2 pl, G···A3, and G···A4
complexes from ref (85) [MP2/CBS//MP2/cc-pVTZ]. Both the monomers and dimers are geometry
optimized.
SCAI
A set of 24 pairs of amino
acid side chain interactions
whose structures as taken from X-ray crystal structures.[86] The reference data was obtained[86] at CCSD(T)/CBS//TPSS/TZVP. The dimer geometries in this
database are constructed by optimizing the hydrogen positions while
fixing the heavy atom positions to the experimental structure. The
monomer geometries are taken from the partially optimized dimer structure.
This procedure is performed for each of the methods compared, as opposed
to having used the reference geometries without further optimization.
S66
A set of 66 complexes[87] considered
to be a more representative sampling of the distribution
of interaction motifs than the S22 database. Unlike S22, S66 uses
a consistent reference methodology for geometry optimizations (MP2/cc-pVTZ)
and single point energy refinement using a procedure described in
detail below. Furthermore, the S66 monomer structures are evaluated
using the dimer optimized structures instead of having included the
monomer deformation energy. Comparison to S66 will be made in two
different ways: “S66 (Fixed)” and “S66 (Procedure)”,
where S66 (Fixed) computes the interactions using the reference geometries
and S66 (Procedure) optimizes the dimers for each method and uses
the monomer structures from the resulting optimizations.
SS7
A new set of 7 sulfur-containing complexes (sulfur
set 7) presented as a part of this work (Figure 3). These complexes were chosen by extracting sulfur-containing residues
from a variety of crystal structures deposited in the PDB databank,
modeling the residues with small molecules, and geometry optimizing
for a minimum. In other words, the complexes are not guaranteed to
represent a global minimum configuration nor do they remain in the
orientation observed in the crystal structures. The protocol we follow
is analogous to S66: the dimers are optimized at MP2/cc-pVTZ and the
interaction energy is computed using the monomer structures from the
optimized dimer. The single point refinement energy iswhere all energies
are CP-corrected; aXZ denotes the aug-cc-pVXZ basis; ΔΔCCSD(T)
is the correlation energy difference between CCSD(T) and MP2; and
ΔMP2/CBS is the MP2 correlation energy in the CBS limit, as
estimated from Halkier’s two-point extrapolation formula[88]Comparisons to SS7 will be denoted “SS7
(Fixed)” and “SS7 (Procedure)” using the same
protocol described for S66.
Figure 3
Dimers used in the SS7 database. The interaction
energies (kcal/mol)
using the multilevel method described in the text are (a) −3.652,
(b) −3.737, (c) −6.036, (d) −6.473, (e) −3.683,
(f) −2.643, and (g) −5.191. Coordinates are provided
in the Supporting Information.
Dimers used in the SS7 database. The interaction
energies (kcal/mol)
using the multilevel method described in the text are (a) −3.652,
(b) −3.737, (c) −6.036, (d) −6.473, (e) −3.683,
(f) −2.643, and (g) −5.191. Coordinates are provided
in the Supporting Information.
Wat-2009
The reference structures
and binding energies
(CP-corrected MP2/CBS with a CCSD(T)/aDZ correction) of (H2O)n = 2–6 and
8 were taken from the database compiled[89−91] in ref (92).
Wat-2011
The reference
structures and binding energies
(RI-MP2/CBS with a CCSD(T)/aDZ correction) of (H2O)n = 2–10 where taken
from ref (93). Unlike
Wat-2009, these energies do not explicitly include CP-corrections,
but the CBS extrapolation was parametrized to well-reproduce CP-corrected
ΔE energies for a select set of clusters. As
a result, the water cluster ΔE binding energies
in this database are similar to those in the Wat-2009 database. The
purpose for using Wat-2011 in the present work is to examine the relative
ordering of clusters of a given cluster size (ΔΔE’s); a purpose for which the Wat-2009 database does
not adequately fullfill. It is not immediately clear to what extent
explicit CP-corrections might effect the ΔΔE’s, but we suspect that the general trends in ordering will
be similar. We therefore limit our analysis to Pearson correlations
of ΔΔE’s which tend to be either
“good” or “very bad.”
Parametrization
The mDC model makes
use of A and B LJ parameters (two parameters
per atom-type using Lorentz–Berthelot combining rules); Msp(1) and Mpp(2) multipole parameters (two parameters per
atomic number); fs, fd, bs, and bd charge mapping parameters (four parameters per atomic number pair);
and the boundary atom qmm, MM point charges, which are obtained along with the atom-type
assignments from the AmberTools suite of packages.[9] Although we refer to bs and bd as parameters, their values are the Mulliken
single- and double-bond orders for the pair of atoms and are of use
only if the fractions fs and fd are different. A full list of H, C, N, O, and S parameters
are provided as Supporting Information.The multipole and charge mapping parameters were adjusted by comparing
the mDC and B3LYP/6-311++G** electrostatic potentials evaluated on
the solvent accessible surface for a large set of molecules. Some
examples are shown in Figure 4. Fitting the
parameters “by eye” to match the electrostatic potentials
on the solvent accessible surface proved more reliable than having
adjusted them to minimize the molecular dipole and quadrupole moments.
The molecular multipole moments do improve with this procedure, however,
as is shown in Table 1.
Figure 4
Electrostatic potential
difference maps between B3LYP/6-311++G**
and mDC (top row) and DFTB3 (bottom row) evaluated on the solvent
accessible surface of acetic acid, cytosine, and tryptophan. Blue
and red indicate that the model requires more negative charge and
more positive charge, respectively, relative to B3LYP/6-311++G**.
Green indicates agreement between the electrostatic potential of the
model and B3LYP/6-311++G**. The colors are bounded by the range of
±0.003 au.
Table 1
Multipole
Moment Comparison between
B3LYP/6-311++G**//B3LYP/6-31++G** and Various Models for Amino Acids,
Their Side Chains, and Nucleobasesa
⟨|μ(1) – μref(1)|⟩
⟨| μ(2) – μref(2)|⟩
category
N
⟨μref(1)⟩
mDC
DFTB3
DFTB2
GAFF
⟨μref(2)⟩
mDC
DFTB3
DFTB2
GAFF
side-chains
20
1.36
0.11
0.13
0.17
0.16
8.68
1.19
1.90
2.08
1.22
amino acids
20
2.12
0.15
0.17
0.19
0.48
29.90
1.97
2.45
2.78
5.51
nucleobases
10
1.98
0.07
0.21
0.36
0.25
14.58
1.22
3.26
3.77
2.29
total
50
1.79
0.12
0.16
0.22
0.31
18.35
1.51
2.39
2.70
3.15
The model multipole moments are
computed at the B3LYP/6-31++G** optimized geometries. (1) and (2) are the molecular dipole and quadrupole multipole
moment vectors of length 3 and 5, respectively. ⟨μref(⟩ is the magnitude of the multipole moment vector averaged
over molecules and ⟨|( – ref(|⟩ is the magnitude of the multipole moment error vector
averaged over molecules. Boldface is used to highlight the best value
among the models.
Electrostatic potential
difference maps between B3LYP/6-311++G**
and mDC (top row) and DFTB3 (bottom row) evaluated on the solvent
accessible surface of acetic acid, cytosine, and tryptophan. Blue
and red indicate that the model requires more negative charge and
more positive charge, respectively, relative to B3LYP/6-311++G**.
Green indicates agreement between the electrostatic potential of the
model and B3LYP/6-311++G**. The colors are bounded by the range of
±0.003 au.The model multipole moments are
computed at the B3LYP/6-31++G** optimized geometries. (1) and (2) are the molecular dipole and quadrupole multipole
moment vectors of length 3 and 5, respectively. ⟨μref(⟩ is the magnitude of the multipole moment vector averaged
over molecules and ⟨|( – ref(|⟩ is the magnitude of the multipole moment error vector
averaged over molecules. Boldface is used to highlight the best value
among the models.The LJ
parameters were optimized to reduce the errors in the databases
shown in Figure 5 and Table 2.
Figure 5
Comparison of mDC with other Hamiltonians using several databases
of intermolecular interactions.
Table 2
Statistical Summary of mDC and Other
Hamiltonians for Several Databases of Intermolecular Interactionsa
model
DB
N
property
mDC
DFTB3
DFTB2
GAFF
AM1
PM6
B3LYP
MNDO/MM
S22
22
mue
0.699
3.012
3.338
1.588
4.401
3.226
1.336
2.504
mse
0.184
3.012
3.338
0.546
4.361
3.226
0.016
2.393
R
0.989
0.964
0.973
0.953
0.910
0.973
0.972
0.939
crms
0.180
0.579
0.277
0.255
0.924
0.407
0.843
0.467
S66 (Fixed)
66
mue
0.545
2.737
2.985
0.913
5.141
2.666
2.252
2.195
mse
0.275
2.737
2.985
0.708
5.141
2.666
1.332
2.183
R
0.982
0.957
0.973
0.958
0.521
0.965
0.930
0.899
S66
(Procedure)
66
mue
0.532
2.363
2.399
0.825
3.061
2.006
1.836
1.543
mse
–0.130
2.313
2.353
0.192
3.041
1.934
–0.142
1.505
R
0.988
0.961
0.968
0.960
0.890
0.949
0.940
0.952
crms
0.237
1.774
1.484
0.330
0.976
0.693
0.723
0.532
SS7 (Fixed)
7
mue
0.657
3.184
2.885
1.547
3.573
3.015
2.121
2.034
mse
0.657
3.184
2.885
1.547
3.573
3.015
2.121
2.034
R
0.964
0.452
0.346
0.836
0.799
0.585
0.869
0.815
SS7
(Procedure)
7
mue
0.393
2.817
2.248
1.039
2.549
2.070
1.418
1.321
mse
0.361
2.817
2.089
0.904
2.549
1.848
1.418
1.321
R
0.984
0.433
–0.041
0.874
0.903
0.472
0.901
0.908
crms
0.251
0.361
0.363
0.620
0.707
0.523
0.516
0.701
SCAI
23
mue
0.819
2.843
2.745
2.011
4.249
2.518
2.361
2.990
mse
0.668
2.843
2.745
1.940
4.249
2.518
0.646
2.987
R
1.000
0.998
0.998
0.997
0.995
0.999
0.999
0.995
JSCH-2005
12
mue
0.697
6.582
6.293
3.417
10.488
7.346
3.052
5.991
mse
0.096
5.825
5.637
1.788
10.488
7.346
–1.493
5.991
R
0.989
0.754
0.784
0.749
0.756
0.786
0.750
0.697
crms
0.183
1.005
0.644
0.507
1.294
0.713
1.023
0.509
Wat-2009
9
mue
0.352
3.424
13.547
1.890
4.354
6.223
20.499
4.338
mse
–0.100
3.424
13.547
–1.456
4.253
6.223
–20.499
4.338
R
1.000
0.999
1.000
0.998
0.995
0.998
0.998
0.998
crms
0.073
0.121
0.106
0.309
1.029
0.554
0.130
0.449
The terms mue
and mse are the
mean unsigned and mean signed error in ΔE (kcal/mol). R is the Pearson correlation coefficient between the reference
and model ΔE (unitless). crms is the average
coordinate root mean square deviation of the optimized dimer structures
to the reference structures (Å). Boldface is used to highlight
the best value amongst the models.
Comparison of mDC with other Hamiltonians using several databases
of intermolecular interactions.The terms mue
and mse are the
mean unsigned and mean signed error in ΔE (kcal/mol). R is the Pearson correlation coefficient between the reference
and model ΔE (unitless). crms is the average
coordinate root mean square deviation of the optimized dimer structures
to the reference structures (Å). Boldface is used to highlight
the best value amongst the models.
Drug Docking
Figure 6 displays the correlation between calculated protein–drug
binding affinity scores and the experimentally determined[94−96] protein inhibition constant (IC50) for a set of drug ligands to
the Cyclin-Dependent Kinase 2 (CDK2) protein receptor. The CDK2 structure[97] was taken from chain A of PDBID: 1H1S. Hydrogens were
added and geometry optimized with Amber ff99. The autodock program[98] was used to dock 73 ligands to the protein structure
using autodock’s default options. The ligands were chosen as
a subset of those listed in ref (41) subject to the availability of DFTB3 parameters.
One of these ligands, ligand 29 in ref (41) is chosen to be a “lead”, i.e.,
a ligand that has been identified as well-inhibiting the function
of the protein. The goal then, is to identify additional candidate
ligands that are structurally similar to the lead ligand but are predicted
to inhibit the protein function with greater efficacy. If one assumes
that ligands with a higher binding affinity to the receptor are generally
more potent, then one would expect to observe a strong correlation
between the ligand–receptor binding and the experimentally
determined IC50. We compute the protein–ligand gas-phase interaction
energy and compare the result to the experimental IC50s. These calculations
ignore the desolvation of the drug upon binding to the receptor and
the change in entropy associated with sterically inhibiting rotatable
bonds. We also ignore the configurational averaging that would occur
in dynamics and we freeze the CDK2 atoms to the 1H1S X-ray crystal
structure during geometry optimization.
Figure 6
(left) Base structure
common to all ligands. The atom marked “N*”
is used to compare the position of a bound ligand to the lead structure,
as explained in the text. (center) Lead ligand bound to the CDK2 protein.
(right) Correlation between the experimental protein inhibition constant
measured for each ligand and the assigned score relative to the lead
structure. Rao indicate the gas-phase DC-DFT results taken from ref (41) computed using a small
model system of CDK2.
(left) Base structure
common to all ligands. The atom marked “N*”
is used to compare the position of a bound ligand to the lead structure,
as explained in the text. (center) Lead ligand bound to the CDK2 protein.
(right) Correlation between the experimental protein inhibition constant
measured for each ligand and the assigned score relative to the lead
structure. Rao indicate the gas-phase DC-DFT results taken from ref (41) computed using a small
model system of CDK2.What remains to be described is the procedure that we performed
to generate the bound structures used to compute the binding affinity
scores. This procedure begins with the use of autodock, which binds
a ligand to the receptor many times, clusters the results, and returns
the coordinates of a docked ligand that is representative of each
cluster. Each ligand is allowed to produce up to 20 clusters and thus
we obtain up to 20 structures for each ligand and 632 structures in
total for the set of 73 ligands. The ligand geometry for each of the
632 structures is gas-phase optimized with Amber ff99 for 500 steps
while freezing the protein coordinates. The resulting Amber ff99 protein–ligand
interaction energy is then stored. 100 steps of gas phase minimization
are then performed for each of the 632 structures with mDC, and the
final mDC gas-phase protein–ligand interaction energy is stored.
The MNDO/MM results were computed using this same procedure. Although
we have, for a given method, 632 geometry optimized structures, not
all of them are geometrically relevant. For example, a structure is
not relevant if it is not bound in the receptor pocket nor if the
orientation of the ligand has been flipped so that the ligand substituents
have occupied the receptor pocket. In other words, we only consider
a structure to be relevant if it binds to the protein in the same
way as the lead structure. To quantify this, we choose an atom common
to all ligands (labeled N* in Figure 6) and
measure the displacement of this atom relative to the minimum energy
lead structure. We assign one structure to each of the 73 ligands
for each method by choosing the structure that binds most similarly
to the lead structure as measured by the N* atom unless the ligand
has two or more structures that are within 3 Å of the lead structure,
in which case we choose the lower energy structure.Each of
the 73 ligands is now represented by a single bound structure
which is used to compute a “score”The score relative to the lead ligand, ligand
29, iswhich are shown in Figure 6.The data
labeled “Rao” in Figure 6 are
the gas phase results reported in ref (41), who represented CDK2
with a 1000–1100 atom model system (in comparison to the approximately 4900 atom system used in this work)
treated with an extended ONIOM method.[99,100] In their
work, 300–380 atoms were computed with ωB97X-D/6-311++G**
or XYG3/6-311++G** and the remaining atoms computed with PM6.
Simulation Protocol
Figure 7 displays the O–O
RDF of water computed from molecular dynamics simulation using TIP3P,
TIP4P-EW, and mDC. Each model was simulated with the same protocol.
The system is composed of 512 waters in a cubic box. The simulation
was performed using the canonical ensemble with a water density of
0.996 g/cm3 and a temperature of 298 K for 3.2 ns with
a time step of 1 fs. The first 200 ps of simulation was discarded
as equilibration. An 8 Å nonbond cutoff was employed along with
particle mesh Ewald (PME) to account for the long-range electrostatic
interactions. The PME method used a 1 point/Å fast Fourier transform
grid and interpolation was performed with sixth-order Cardinal B-splines.
Shake was used to constrain the TIP3P and TIP4P models to the experimental
gas-phase water geometry and the mDC model to the gas-phase DFTB3
geometry. Langevin dynamics was used to control the temperature with
a collision frequency of 5 ps–1. All simulations
were performed using the SANDER program within Amber.[9] The implementation of mDC within SANDER required us to
implement a new PME method generalized for use with multipolar charge
densities. The generalization is based on the spherical tensor gradient
operator, but we must defer a complete description of the new PME
method to future work.Comparison of water O–O radial distribution functions
(a),
water cluster binding energies (b), and the relative ordering of clusters
(c–e). The mue’s in b are mean unsigned errors of ΔE (kcal/mol). The R’s in c–e
are Pearson correlation coefficients.
Results and Discussion
Atomic multipoles
are included into the mDC model to reproduce
H-bond angles (Figure 1 and Wat-2009) and improve
electrostatic potentials (Figure 4) and molecular
multipole moments (Table 1). In ref (67), we noted that traditional
DFTB methods produced good H-bond angles even though their electrostatics
involved atomic charges only. We were thus lead to hypothesize that
this was achieved through the multipolar character of the tight-binding
coupling matrix elements. The DFTB3hydrogen bond geometries shown
in Figure 1 support this. When the hydrogen
bond distance is small, the DFTB3 H-bond angle agrees with B3LYP/6-31++G**;
however, at larger separations where the tight-binding matrix elements
become small, it adopts a geometry similar to TIP3P.Figure 4 illustrates a few examples where
the use of multipoles improves the description of electrostatic potentials.
The largest discrepancies between DFTB3 and B3LYP/6-311++G** occur
in the description of π-bond electrons, sp3 oxygen
and sulfur lone pairs, and sp2 nitrogen lone pairs. These
differences are largely eliminated with mDC and result in an overall
improved description of molecular multipole moments, as shown in Table 1.The mDC LJ interactions were parametrized
by perturbing the values
used in GAFF to reduce the errors in the databases shown in Figure 5 and Table 2. mDC thus exhibits
the smallest mue and largest correlation to the high-level reference
data among the methods shown. One observes a significant amount of
variation between the other models in Figure 5 for interactions stronger than 10 kcal/mol, which involve either
multiple H-bonds or nucleobase stacking.The S66 (Procedure)
crms value in Table 2 for DFTB3 and DFTB2 are
skewed because the lack of a dispersion
model causes the benzene–ethene and benzene–benzene
π···π complexes to dissociate until the
intermolecular forces become sufficiently small to terminate the geometry
optimizer. If we exclude those two molecules from the DFTB3 and DFTB2
statistics, their crms drop to 0.516 and 0.506 Å, respectively.
The semiempirical models cause several of the stacked nucleobases
in the JSCH-2005 database to devolve into hydrogen bonded complexes
upon geometry optimization. The inclusion of a dispersion model could
aid those cases as well.[101] Empirical dispersion
corrections for DFTB2,[76,102] NDDO-based semiempirical methods,[103,104] and even ab initio methods[105,106] have been proposed;
however, we limit the scope of our comparisons to the commonly used
variant of these models to make the comparisons more useful to the
wider set of users instead of the niche set of model developers. Furthermore,
the series of databases that we consider are not biased toward dispersion-dominated
complexes; we feel our comparisons are made “fair” by
using the models consistently, as opposed to affording special treatment
to the outliers in an attempt to make the models appear artificially
tidy.Among the semiempirical models, AM1 produces the worst
nonbonded
interaction energies, whereas DFTB3, DFTB2, and PM6 are of comparable
quality. The reader may also note that the GAFF mue’s are roughly
twice as small as the semiempirical models. This is not unexpected;
it has been previously noted that nonbond interactions are problematic
for these methods.[107] The MNDO/MM relative
energies are slightly better than the semiempirical methods, but their
geometrical errors are similar.The parametrized mDC method
is applied to the prototype study of
CDK2 drug screening, shown in Figure 6. CDK2
has been extensively studied due to the role it plays in tumor formation.
The attention that it thus has received provides the computational
community with an array of experimental results which make this system
attractive to those developing new linear-scaling quantum models.
The experimental results are used to validate a computational procedure
by presuming that a drug has been identified as a good candidate for
protein inhibition and that one is tasked with identifying similar
compounds that one predicts to inhibit protein function with greater
efficacy.The work of Rao[41] is the
most recent
study to use CDK2 to validate their computational protocols. With
their method, they found an amazing R2 correlation of 0.86 upon considering solvation and entropic effects.
When only gas-phase interaction energies are used for the subset of
drugs that we consider in this work, their R2 drops to 0.83; however, this is still exceptionally good.
For comparison, the fragment molecular orbital method (FMO) has recently
been applied with MP2/6-31G* in a related study that, among other
things, correlates the CDK2 experimental binding affinities to ab
initio single-point calculations for a set 14 ligands with known X-ray
crystal structures.[43] In that work,[43] the authors found a good R2 correlation of 0.68 between their calculations and the experimental
free energy of binding. A similar study of another system[42] employed the ONIOM3 method B3LYP/6-31G*:HF/3-21G:PM3
and produced a R2 correlation of 0.64.
AM1-DC has also been used[30] in the screening
of zinc metalloenzymes to yield a correlation of 0.69.We observe
a R2 correlation of 0.71
using mDC. MM and MNDO/MM calculations were performed to compare mDC
with other methods that use the same CDK2 model and computational
protocol. MM and MNDO/MM produce R2 correlations
of 0.56 and 0.63, respectively. There does not appear to be a clear
relationship between the quality of nonbond interactions (Figure 5 and Table 2) and the model’s
correlation to the experimental CDK2 IC50s; the MNDO/MM energies correlate
better than MM to the experimental IC50s even though MM more accurately
reproduces the databases of high-level ab initio nonbond interactions.
One may attribute this observation to the explicit treatment of polarization
within mDC and MNDO/MM. Such attribution from this sole observation
is speculative, in part, because semiempirical model polarizabilities
are underestimated[108] by approximately
30%. Previous studies have also concluded, however, that the polarization
energy plays a key role on the relative ordering of inhibitor interactions
with focal adhesion kinase[109] and upon
the binding of ligands to the trypsin protein.[110] Our results add support to their conclusions. QMFF’s
strive to achieve high accuracy in environments of varying degrees
of heterogeneity, so we are encouraged to find that mDC yields more
accurate small molecule cluster interactions and produces
greater correlation with experiment when applied to large biomolecular
systems.A mDC calculation of CDK2 takes less than 10 s on a
desktop computer.
Rao’s method requires about a day of computation;[41] however, we emphasize in this comparison that
we use a system with more than four times as many atoms than used
by Rao. A more detailed analysis of mDC timings is provided in ref (67). The computational efficiency
of mDC offers the possibility of performing configurational sampling,
which may improve the correlation between the computed CDK2 interactions
with the experimental IC50s. In order to perform those simulations,
one would need to parametrize the MM bonded energy terms that cross
the fragment boundaries. We must defer the parametrization and analysis
of the bonded interactions to future work.Given that semiempirical
models underestimate molecular polarizabilities,
one may question if the quality of the mDC interactions degrade upon
examining larger clusters or when used in condensed phase environments.
To address this, Figure 7b displays the ΔE binding energy of water clusters up to ten waters. We
limit our comparisons in Figure 7 to mDC and
the MM water models because the errors observed in the Wat-2009 database
are only amplified as more and larger clusters are examined. By this,
we mean that each method in Figure 5 is observed
to have a characteristic slope and that slope is largely unchanged
when extended to Wat-2011. We therefore observe similar mDC and TIP3P
mue’s with Wat-2011 and Wat-2009. The TIP4P-EW clusters are
bound too strongly because of its use of an enhanced dipole moment.The TIP3P ΔΔE’s in Figure 7c–e do not correlate well with ab initio
because the higher-energy structures have a propensity to devolve
into lower-energy configurations. This symptom may be directly related
to the lack of structure observed in TIP3P RDFs. The TIP4P-EW model
ΔΔE’s correlate well for the larger
clusters, but not for (H2O)5. One possible reason
for this could be because the dipole moment of water increases as
one goes from small to large clusters, thus making the enhanced dipole
moment of TIP4P-EW inadequate for describing the higher-energy geometrical
configurations of smaller-sized clusters. Devising an unambiguous
computational experiment to test this hypothesis would require some
clever partitioning of the electron density; however, we note that
the average B3LYP/6-31G* Mulliken charge of oxygen computed for a
series of water clusters increases with the size of the cluster and
then begins to stabilize around 5 or 6 waters. This explanation therefore
seems plausible. mDC correlates well with the reference data largely
due to its tendency to maintain the desired H-bond geometry upon optimization.Figure 7a compares the mDC, TIP3P, and TIP4P-EW
O–O RDF to the recent experimental result taken from ref (111). Both TIP4P-EW and mDC
display good agreement to experiment; whereas TIP3P suffers from a
well-known lack of structure beyond the first solvation peak. The
mDC first solvation peak is too high, but is similar to TIP4P-EW.
Although the location of mDC’s first solvation peak is in good
agreement with experiment, it appears that its repulsive wall is slightly
too steep at short separations. The use of an exponential-like repulsive
wall, as was used in ref (67), may improve this. The mDC simulation results provided
here are very preliminary. mDC extensively relies
on the use of atomic multipoles and the implementation of mDC into
the SANDER[9] program required us to derive
and implement a new generalization of the PME method based on the
spherical tensor gradient operator for multipolar charge densities.
The details of the new PME method and further analysis of condensed
phase properties is currently in preparation. The 3.2 ns mDC simulation
of 512 waters required approximately 37 h per ns of dynamics using
a 1 fs time step on a desktop workstation.
Conclusions
In this work, we parametrized a linear-scaling quantum force field
for nonbonded interactions. Much effort was spent on the comparison
of the method with standard MM force fields, semiempirical models,
and relatively inexpensive ab initio Hamiltonians for a wide variety
of nonbond interactions using a series of benchmark databases. As
a part of this comparison, we provided a new benchmark database (SS7)
of accurate nonbond interactions between sulfur-containing dimers.
The comparisons reveal that the GAFF MM force field significantly
and consistently reproduced nonbond energies with far greater accuracy
than the DFTB2-mio, DFTB3-3ob, AM1, and PM6 semiempirical models and
the MNDO/MM method. The parametrized mDC model is shown to produce
errors approximately half those of GAFF. The benefit of mDC relative
to MM and MNDO/MM was further demonstrated by correlating CDK2 ligand
binding energies to their experimental inhibition constants. It was
shown that mDC correlated the data with a R2 of 0.71; whereas MM and MNDO/MM produced correlations of 0.56 and
0.63, respectively.In the process of parametrizing the mDC
model, we made several
observations using GAFF, DFTB3, and ab initio that may influence the
future development tight-binding models. We observe that DFTB3′s
good hydrogen bond angles are only produced when there is significant
overlap between fragments, but quickly decay into MM-like configurations
when they are separated. We interpret this as the modeling of multipole
interactions from within the tight-binding matrix instead of through
second-order electrostatics. Furthermore, we observe systematic discrepancies
between ab initio and DFTB3 electrostatic potentials in molecules
containing π-bonds, sp3 oxygen and sulfur lone pairs,
and sp2 nitrogen lone pairs. We are able to include the
higher-order atomic multipoles within mDC without making intrusive
modifications to DFTB3; whereas the extension of DFTB3 to atomic multipoles
would be a significant undertaking.
Authors: Ian R Hardcastle; Christine E Arris; Johanne Bentley; F Thomas Boyle; Yuzhu Chen; Nicola J Curtin; Jane A Endicott; Ashleigh E Gibson; Bernard T Golding; Roger J Griffin; Philip Jewsbury; Jerome Menyerol; Veronique Mesguiche; David R Newell; Martin E M Noble; David J Pratt; Lan-Zhen Wang; Hayley J Whitfield Journal: J Med Chem Date: 2004-07-15 Impact factor: 7.446
Authors: Lawrie B Skinner; Congcong Huang; Daniel Schlesinger; Lars G M Pettersson; Anders Nilsson; Chris J Benmore Journal: J Chem Phys Date: 2013-02-21 Impact factor: 3.488
Authors: Ashleigh E Gibson; Christine E Arris; Johanne Bentley; F Thomas Boyle; Nicola J Curtin; Thomas G Davies; Jane A Endicott; Bernard T Golding; Sharon Grant; Roger J Griffin; Philip Jewsbury; Louise N Johnson; Veronique Mesguiche; David R Newell; Martin E M Noble; Julie A Tucker; Hayley J Whitfield Journal: J Med Chem Date: 2002-08-01 Impact factor: 7.446
Authors: Maria T Panteva; Thakshila Dissanayake; Haoyuan Chen; Brian K Radak; Erich R Kuechler; George M Giambaşu; Tai-Sung Lee; Darrin M York Journal: Methods Enzymol Date: 2015-01-22 Impact factor: 1.600
Authors: Tai-Sung Lee; Bryce K Allen; Timothy J Giese; Zhenyu Guo; Pengfei Li; Charles Lin; T Dwight McGee; David A Pearlman; Brian K Radak; Yujun Tao; Hsu-Chun Tsai; Huafeng Xu; Woody Sherman; Darrin M York Journal: J Chem Inf Model Date: 2020-09-16 Impact factor: 4.956