Literature DB >> 24803856

Parametrization of an Orbital-Based Linear-Scaling Quantum Force Field for Noncovalent Interactions.

Timothy J Giese¹, Haoyuan Chen¹, Ming Huang², Darrin M York¹.

Abstract

We parametrize a linear-scaling quantum mechanical force field called mDC for the accurate reproduction of nonbonded interactions. We provide a new benchmark database of accurate ab initio interactions between sulfur-containing molecules. A variety of nonbond databases are used to compare the new mDC method with other semiempirical, molecular mechanical, ab initio, and combined semiempirical quantum mechanical/molecular mechanical methods. It is shown that the molecular mechanical force field significantly and consistently reproduces the benchmark results with greater accuracy than the semiempirical models and our mDC model produces errors twice as small as the molecular mechanical force field. The comparisons between the methods are extended to the docking of drug candidates to the Cyclin-Dependent Kinase 2 protein receptor. We correlate the protein-ligand binding energies to their experimental inhibition constants and find that the mDC produces the best correlation. Condensed phase simulation of mDC water is performed and shown to produce O-O radial distribution functions similar to TIP4P-EW.

Entities: Chemical Disease Gene Species

Year: 2014 PMID： 24803856 PMCID： PMC3985928 DOI： 10.1021/ct401035t

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Linear-scaling quantum-mechanical (LS-QM) methods[1] overcome the computational bottlenecks of traditional QM methods without resorting to the use of additional parametrized functions. As a result, they offer a means for modeling system sizes much larger than would otherwise be practical while largely retaining the accuracy of a standard implementation. There are numerous recent examples that further develop LS-QM methods in an effort to make them more affordable;[2−7] however, the widespread use of LS-QM methods continue to be hampered by their high cost relative to combined quantum mechanical/molecular mechanics (QM/MM) or traditional force fields. Furthermore, the LS-QM literature has generally placed greater focus on the benchmark timings of ad hoc systems or on the application of LS-QM methods without having convincingly demonstrated that their added benefit justifies their higher cost. For example, if one must consider relatively large systems to reap the benefits of a LS-QM method, then one will pragmatically choose to use small basis sets and inexpensive Hamiltonians, such as B3LYP/6-31G* or a semiempirical/tight-binding model. But even if a LS-QM method at this level was demonstrated to be computationally feasible for a real application, one may find the accuracy unacceptable or inferior to cheaper parametric alternatives. The GAFF MM force field[8,9] after all, is many times faster than even the fastest semiempirical method and can, for many problems, model intermolecular interactions with acceptable accuracy even though it does not explicitly treat electronic polarization. The utility of LS-QM methods have thus been highlighted by applications that require a detailed description of the electron density or molecular orbitals (MOs), including, the description of biomolecular electrostatic potentials,[10] electron transfer[11−21] and excitation[22] in large systems, and inferring enzyme specificity from frontier orbitals.[23] Some of the most promising examples that demonstrate the utility of LS-QM methods have been provided by Merz, whom has used them to identify misfolded proteins,[24] compute NMR chemical shifts,[25,26] decompose protein–ligand interactions,[27] and perform ligand-binding and drug design.[28−31] Others have used LS-QM methods to examine interactions within clusters,[32−36] compute chemical reaction energies,[37−40] and perform drug screening.[41−43] Linear-scaling quantum mechanical force fields (QMFFs) are an attempt to circumvent the above criticism by supplementing the LS-QM energy with parametrized interactions. The parametrized functional form provides the means to make a concerted improvement to the method’s accuracy and performance while significantly reducing, if not entirely eliminating, their “break-even point”; i.e., the system size required to yield a performance benefit. There are two categories of QMFFs that have found chemical application to molecular systems: (1) MO-based models, such as X-Pol[44−48] and the water-specific XP3P model,[49] and (2) electron density-based models, e.g. SIBFA, GEM, and QMPFF.[50−61] The MO-based QMFFs, such as those proposed by Gao,[44] achieve their performance by approximating the wave function with a Hartree product of antisymmetrized fragment determinants.[62] This approximation results in a block diagonal Fock matrix that can be diagonalized fragment-by-fragment. The number of blocks increases with the size of the system, but the size of each block does not, so the scaling of the Fock matrix diagonalization reduces from O(N3) to O(N). Lennard-Jones (LJ) or Buckingham potentials are then used to recuperate the energetic effect of interfragment MO coupling.[19,45,46,62−66] The density-based QMFFs avoid Fock matrix diagonalization altogether; the interfragment interactions are computed from the overlap and Coulomb interactions of an accurate density prefitted to ab initio and is allowed to respond to the environment according to the principle of chemical potential equalization. We recently introduced[67] a modified divide-and-conquer (mDC) QMFF model which made an attempt at unifying the orbital- and density-based QMFFs. In that model, the density resulting from fragment MOs were used to interact fragments with a density-overlap van der Waals (vdW) model.[67,68] The present work is a parametrization of a MO-based mDC QMFF which, like X-Pol, models the interfragment vdW interactions with simple LJ functions; however, there are two distinguishing features that differentiate mDC from X-Pol. The interfragment X-Pol interactions are treated with a QM/MM interaction potential; whereas, the mDC interactions are not. A QM/MM potential does not produce symmetric interactions in the sense that the interaction depends upon which residue is considered the “QM region.” X-Pol therefore recomputes the interfragment interactions for each QM fragment and averages the energy and forces. The interfragment mDC interactions are symmetric and are therefore computed once. Second, the mDC fragment densities are not limited to being those used in a QM/MM potential. A QM/MM treatment of DFTB2 and DFTB3 densities would yield atomic charges only; however, this severely degrades the accuracy of these methods. The ability of DFTB2 and DFTB3 to reproduce hydrogen bond angles is a result of their tight-binding coupling matrix elements, not their charges. The Hartree–Product approximation removes these coupling elements and their effect must be recovered by performing electrostatics with higher-order atomic multipoles.[67] In this work, we parametrize the mDC method for nonbonded interactions; we supply a new database, Sulfur Set 7 (SS7), of accurate nonbond interactions involving sulfur; we apply the parametrized mDC method to the screening of Cyclin-Dependent Kinase 2 (CDK2) protein receptor drug candidates;[41] and we explore the ability of mDC to simultaneously reproduce small-to-medium sized water clusters and the condensed phase water radial distribution function (RDF) computed from molecular dynamics simulation. A series of standard nonbond interaction databases are used to compare the parametrized mDC model to MM force fields, semiempirical models, inexpensive ab initio Hamiltonians, and a semiempirical QM/MM method. The comparisons quantify the poor accuracy of standard semiempirical models relative to a MM force field and reveal the high accuracy of mDC. The CDK2 drug screening exercise is repeated with MM and semiempirical QM/MM to further demonstrate the benefits of our QMFF. The variety of molecular interactions examined in this work and the number of other methods we compare mDC against extend significantly beyond other recent investigations.[48]

Methods

Linear-Scaling mDC Method

The mDC total energy iswhere the first term is a summation of fragment ab initio energies, Cσ are the σ-spin MO coefficients for the A’th fragment, and R are the nuclear positions of the atoms in fragment A. The remaining terms describe the interactions between the fragments. The second term is the interfragment electrostatic energy, whereare atomic multipole moments on atom a, C(r) is a real regular solid harmonic, ρ(r) is an atom-partitioned density, Z is a nuclear plus core electron charge, andis a “multipolar potential” arising from point-multipole Coulomb interactions. The primed summation indicates that intrafragment electrostatics are excluded because those Coulomb interactions are already considered in the ab initio calculation of E. The last two terms in eq 1 are an interfragment LJ energy and a standard molecular mechanical (MM) bonded energy, respectively. Unlike a standard MM Hamiltonian, Ebonded includes corrections only for those bonds, angles, and dihedrals that cross the boundary between two covalently bonded fragments. The systems examined in the present work involve intermolecular interactions between nonbonded molecules and the intermolecular interaction between drug ligands and a rigid protein; therefore, there is no need to further elaborate on Ebonded here. The computational advantage afforded by eq 1 arises from having treated the MOs of each fragment as if they did not overlap with those of any other fragment. This approximation allows one to solve for the MO coefficients from a series of small generalized eigenvalue problems (proportional to the size of a fragment)rather than solving a single eigenvalue problem for the entire system. Although the lack of interfragment coupling between the MOs is a significant approximation, the remaining interactions in eq 1 are parametrized to reproduce high-level ab initio data. The lack of explicit interfragment MO coupling matrix elements does not imply that the fragments are uncoupled. The interfragment coupling occurs through the interaction of their atomic multipoles which are determined from the fragment electron densities within the self-consistent-field (SCF) procedure. The σ-spin Fock matrix for region A with inclusion of this coupling iswhereis the spin-resolved AO density matrix of fragment A, and nσ is the occupation number of σ-spin orbital k in fragment A. Our linear-scaling framework minimizes the mDC energy by performing a double-SCF procedure.[44,45,69,70] In brief, each fragment is provided a set of atom-centered external multipolar potentials p and is instructed to compute a converged fragment energy E and return a new set of atomic multipole moments q. The linear-scaling framework evaluates the interfragment electrostatics to transform the moments q into potentials p, and the procedure is repeated until the total energy and q stop changing. Once convergence is met, the atomic gradients are computed fromThe interface between the linear-scaling framework and the underlying Hamiltonian must extract or calculate the atomic multipole moments q from the ab initio density, and it must correct the model’s Fock matrix [eq 5] by evaluating the derivative ∂q/∂Pσ |. In the present work, we use the DFTB3 Hamiltonian (parametrization “3ob”) described in ref (71). DFTB3 happens to use atomic charges in the calculation of the energy, but our interface to DFTB3 is free to choose a different set of q’s that are used to interact a fragment with the outside world. This is a particularly useful feature when using DFTB3 because its success in achieving good hydrogen bond (H-bond) angles is largely a result of the intermolecular tight-binding matrix elements[67] (e.g., Figure 1). The linear-scaling framework eliminates these elements, so our approach is to recapture the angular dependence via the electrostatic interactions by increasing the order of the atomic multipoles. A mDC fragment can thus be thought as having two representations of the atomic multipoles: a set of atomic charges that DFTB3 happens to use internally and a set of q’s that our interface constructs from the fragment’s total density matrix P = Pα + Pβ.

Figure 1

Comparison between optimized TIP3P/GAFF, DFTB3, mDC, and B3LYP/6-31++G** water–aspargine H-bond angle as a function of N···O separation (3 and 4 Å). The TIP3P water is coplanar with the ASN amine group. The DFTB3 water is angled in a manner similar to B3LYP near the energy minimum but reverts to a TIP3P-like planar structure when the inter-residue overlap becomes small. The mDC angle is in close agreement with B3LYP even at larger separations than those shown. We decompose the DFTB3 density into atomic components as follows:where χ(r) = χ(r)Y(Ω) is an AO basis function, andis a fraction between 0 and 1 and holds the property f = 1 – f. If f = 1/2, then one would say that eq 8 is a Mulliken partitioning; however, we allow for a bias as a function of atom-pair and Mulliken bond-orderfs and fd are parameters corresponding to the fractions that we found most agreeable when atoms a and b are connected by a single bond and double bond, respectively; and bs and bd are the Mulliken single- and double-bond orders, i.e., fs ≡ f (bs) and fd ≡ f (bd). Had we expressed the partition using Wigner bond indices, bs and bd would take values 1 and 2, respectively; however, Mulliken bond orders yield convenient expressions for the Fock matrix corrections.is a smooth polynomial used to switch f from fs to fd as the bond order increases. The atomic multipoles are obtained by inserting eq 8 into eq 2 while restricting the contributions of the two-center densities to charge onlywhere theare treated as a parameters. For an sp-basis, there are two parameters: M(1) and M(2), which control the magnitude of the dipole and quadrupole contributions, respectively. When the multipole expansion is limited to quadrupoles, an spd-basis formally requires three additional parameters: M(2), M(1), and M(2); however, we constrain these additional parameters to the values of M(2), M(1), and M(2), respectively. If the fragment is not covalently bonded to any other fragment, which is the predominant scenario in the present work, then eq 12 is the expression for the multipoles, i.e., q = q′. The CDK2 protein docking results described in this paper, however, use a fixed protein structure that has been fragmented by residue. Connection atoms are used to cap the resulting “dangling bonds”, and we have found it necessary to treat the atomic multipoles at the fragmentation boundaries specially, as described in Figure 2. For the purposes of the present work, we found it acceptable to use hydrogens for the connection atoms. The charges of the frontier atoms, i.e., the connection and boundary atoms, need to be constrained to yield reliable SCF convergence. Although it is possible to modify the ab initio method by introducing additional constraints into the density matrix, it is much simpler and less intrusive to compute the unconstrained atomic multipole moments from eq 12, manually change the charges of the frontier atoms, and renormalize the remaining charges in the fragment, i.e.,where qmm, is a fixed molecular mechanical charge andis the charge excess to be evenly distributed among the nonfrontier atoms in the region after altering the charges of the frontier atoms. Na,, Nb,, and Nc, are the total number of atoms, boundary atoms, and connection atoms in fragment A, respectively. The summations over b and c in eq 15 are meant to encompass the boundary and connection atoms, respectively. With this scheme, the derivative required to correct the Fock matrix requires one additional chain-rule:

Figure 2

Illustration of the relationship between residues and fragments and the relationship between boundary atoms and connection atoms in the mDC method. Atoms C and D are connected by a covalent bond; therefore, fragments 1 and 2 include pseudoatom D* and C*, respectively. When fragment 1 is queried for the new set of multipole moments of atoms A, B, and C; the charges of atoms C and D* are summed; the multipole moments of atom D* are set to zero; the charge of atom C is assigned a predefined fixed MM charge; and the excess charge from the sum is evenly distributed to atoms A and B. An analogous procedure is performed for fragment 2 to fix the charges of atoms C* and D. Thus, the charges of atoms C and D are fixed and independent of the ab initio density.

Model Hamiltonians

DFTB3

This method is described in ref (71) and performed using a standard SCF procedure with the recently developed “3ob” parametrization. We do not use the “H–H-mod”, “H–N-mod”, nor spin-polarization corrections described in ref (71) because these are intended to improve atomization energies, proton affinities, and high-spin open-shell electronic configurations that are not relevant here.

DFTB2

This method (sometimes called SCC-DFTB) is described in ref (72) and performed using the “mio” parametrization. We do not use the γh correction that appears in more recent works[73−75] which has been shown in improve H-bond strengths. Nor do we supplement DFTB2 with a London dispersion model for select nonbonded interactions, which has been shown to improve the description of weakly bound complexes.[76]

PM6 and AM1

These semiempirical methods are described in refs (77) and (78), respectively; as implemented in Gaussian 09.[79]

B3LYP

This method refers to the B3LYP/6-31G* density functional model implemented in Gaussian 09.[79]

GAFF

This comprises the general Amber force field[8,9] and TIP3P water. The CDK2 protein structure made use of the ff99 Amber force field[80] supplemented with parameters from ref (81) for the phosphotyrosine with a single protonated phosphate group. Figure 7 makes use of the TIP4P-EW water model.[82]

Figure 7

Comparison of water O–O radial distribution functions (a), water cluster binding energies (b), and the relative ordering of clusters (c–e). The mue’s in b are mean unsigned errors of ΔE (kcal/mol). The R’s in c–e are Pearson correlation coefficients.

MNDO/MM

Semiempirical QM/MM calculations performed with Amber 12.[9] Clusters of N monomers were geometry optimized N times to produce N relative energies and coordinate root-mean-square deviations relative to the reference structure. The average relative energy and coordinate root-mean-square is then used to avoid the ambiguity assigning the QM region. This procedure is not used by X-Pol; it is merely our attempt to compare to a traditional semiempirical QM/MM method. The CDK2 calculations treat the ligand and protein with MNDO and Amber ff99, respectively.

Reference Data

S22

A database of 22 dimers is taken from ref (83). The set contains small to relatively large complexes chosen to represent hydrogen bonding, dispersion, and mixed electrostatic-dispersion interactions. The S22 reference is not a collection from a consistent level of theory but is instead a collection of the best available calculations that could be afforded at the time of its construction, and the reader should refer to ref (83) for full details. Generally speaking, most of the energies are an approximate CCSD(T) complete basis extrapolation (CBS) from counterpoise (CP) corrected calculations. Both the monomers are dimers are geometry optimized.

JSCH-2005

A collection of 124 nucleobase and amino acid complexes compiled from several publications.[83] In this work, we refer only to a small subset of the JSCH-2005 database where optimized geometries of hydrogen bonded or stacked nucleobases are available. In particular, we use the A···T S1, A···T WC, G···C S, G···C wc, mA···mT S, and mG···mC S complexes from ref (84) [CCSD(T)/CBS//MP2/TZVPP] and the G···A1, G···A1 pl, G···A2, G···A2 pl, G···A3, and G···A4 complexes from ref (85) [MP2/CBS//MP2/cc-pVTZ]. Both the monomers and dimers are geometry optimized.

SCAI

A set of 24 pairs of amino acid side chain interactions whose structures as taken from X-ray crystal structures.[86] The reference data was obtained[86] at CCSD(T)/CBS//TPSS/TZVP. The dimer geometries in this database are constructed by optimizing the hydrogen positions while fixing the heavy atom positions to the experimental structure. The monomer geometries are taken from the partially optimized dimer structure. This procedure is performed for each of the methods compared, as opposed to having used the reference geometries without further optimization.

S66

A set of 66 complexes[87] considered to be a more representative sampling of the distribution of interaction motifs than the S22 database. Unlike S22, S66 uses a consistent reference methodology for geometry optimizations (MP2/cc-pVTZ) and single point energy refinement using a procedure described in detail below. Furthermore, the S66 monomer structures are evaluated using the dimer optimized structures instead of having included the monomer deformation energy. Comparison to S66 will be made in two different ways: “S66 (Fixed)” and “S66 (Procedure)”, where S66 (Fixed) computes the interactions using the reference geometries and S66 (Procedure) optimizes the dimers for each method and uses the monomer structures from the resulting optimizations.

SS7

A new set of 7 sulfur-containing complexes (sulfur set 7) presented as a part of this work (Figure 3). These complexes were chosen by extracting sulfur-containing residues from a variety of crystal structures deposited in the PDB databank, modeling the residues with small molecules, and geometry optimizing for a minimum. In other words, the complexes are not guaranteed to represent a global minimum configuration nor do they remain in the orientation observed in the crystal structures. The protocol we follow is analogous to S66: the dimers are optimized at MP2/cc-pVTZ and the interaction energy is computed using the monomer structures from the optimized dimer. The single point refinement energy iswhere all energies are CP-corrected; aXZ denotes the aug-cc-pVXZ basis; ΔΔCCSD(T) is the correlation energy difference between CCSD(T) and MP2; and ΔMP2/CBS is the MP2 correlation energy in the CBS limit, as estimated from Halkier’s two-point extrapolation formula[88]Comparisons to SS7 will be denoted “SS7 (Fixed)” and “SS7 (Procedure)” using the same protocol described for S66.

Figure 3

Dimers used in the SS7 database. The interaction energies (kcal/mol) using the multilevel method described in the text are (a) −3.652, (b) −3.737, (c) −6.036, (d) −6.473, (e) −3.683, (f) −2.643, and (g) −5.191. Coordinates are provided in the Supporting Information.

Wat-2009

The reference structures and binding energies (CP-corrected MP2/CBS with a CCSD(T)/aDZ correction) of (H2O)n = 2–6 and 8 were taken from the database compiled[89−91] in ref (92).

Wat-2011

The reference structures and binding energies (RI-MP2/CBS with a CCSD(T)/aDZ correction) of (H2O)n = 2–10 where taken from ref (93). Unlike Wat-2009, these energies do not explicitly include CP-corrections, but the CBS extrapolation was parametrized to well-reproduce CP-corrected ΔE energies for a select set of clusters. As a result, the water cluster ΔE binding energies in this database are similar to those in the Wat-2009 database. The purpose for using Wat-2011 in the present work is to examine the relative ordering of clusters of a given cluster size (ΔΔE’s); a purpose for which the Wat-2009 database does not adequately fullfill. It is not immediately clear to what extent explicit CP-corrections might effect the ΔΔE’s, but we suspect that the general trends in ordering will be similar. We therefore limit our analysis to Pearson correlations of ΔΔE’s which tend to be either “good” or “very bad.”

Parametrization

The mDC model makes use of A and B LJ parameters (two parameters per atom-type using Lorentz–Berthelot combining rules); Msp(1) and Mpp(2) multipole parameters (two parameters per atomic number); fs, fd, bs, and bd charge mapping parameters (four parameters per atomic number pair); and the boundary atom qmm, MM point charges, which are obtained along with the atom-type assignments from the AmberTools suite of packages.[9] Although we refer to bs and bd as parameters, their values are the Mulliken single- and double-bond orders for the pair of atoms and are of use only if the fractions fs and fd are different. A full list of H, C, N, O, and S parameters are provided as Supporting Information. The multipole and charge mapping parameters were adjusted by comparing the mDC and B3LYP/6-311++G** electrostatic potentials evaluated on the solvent accessible surface for a large set of molecules. Some examples are shown in Figure 4. Fitting the parameters “by eye” to match the electrostatic potentials on the solvent accessible surface proved more reliable than having adjusted them to minimize the molecular dipole and quadrupole moments. The molecular multipole moments do improve with this procedure, however, as is shown in Table 1.

Figure 4

Table 1

Multipole Moment Comparison between B3LYP/6-311++G**//B3LYP/6-31++G** and Various Models for Amino Acids, Their Side Chains, and Nucleobasesa

			⟨\|μ⁽¹⁾ – μ_ref⁽¹⁾\|⟩					⟨\| μ⁽²⁾ – μ_ref⁽²⁾\|⟩
category	N	⟨μ_ref⁽¹⁾⟩	mDC	DFTB3	DFTB2	GAFF	⟨μ_ref⁽²⁾⟩	mDC	DFTB3	DFTB2	GAFF
side-chains	20	1.36	0.11	0.13	0.17	0.16	8.68	1.19	1.90	2.08	1.22
amino acids	20	2.12	0.15	0.17	0.19	0.48	29.90	1.97	2.45	2.78	5.51
nucleobases	10	1.98	0.07	0.21	0.36	0.25	14.58	1.22	3.26	3.77	2.29
total	50	1.79	0.12	0.16	0.22	0.31	18.35	1.51	2.39	2.70	3.15

The model multipole moments are computed at the B3LYP/6-31++G** optimized geometries. (1) and (2) are the molecular dipole and quadrupole multipole moment vectors of length 3 and 5, respectively. ⟨μref(⟩ is the magnitude of the multipole moment vector averaged over molecules and ⟨|( – ref(|⟩ is the magnitude of the multipole moment error vector averaged over molecules. Boldface is used to highlight the best value among the models.

Electrostatic potential difference maps between B3LYP/6-311++G** and mDC (top row) and DFTB3 (bottom row) evaluated on the solvent accessible surface of acetic acid, cytosine, and tryptophan. Blue and red indicate that the model requires more negative charge and more positive charge, respectively, relative to B3LYP/6-311++G**. Green indicates agreement between the electrostatic potential of the model and B3LYP/6-311++G**. The colors are bounded by the range of ±0.003 au. The model multipole moments are computed at the B3LYP/6-31++G** optimized geometries. (1) and (2) are the molecular dipole and quadrupole multipole moment vectors of length 3 and 5, respectively. ⟨μref(⟩ is the magnitude of the multipole moment vector averaged over molecules and ⟨|( – ref(|⟩ is the magnitude of the multipole moment error vector averaged over molecules. Boldface is used to highlight the best value among the models. The LJ parameters were optimized to reduce the errors in the databases shown in Figure 5 and Table 2.

Figure 5

Comparison of mDC with other Hamiltonians using several databases of intermolecular interactions.

Table 2

Statistical Summary of mDC and Other Hamiltonians for Several Databases of Intermolecular Interactionsa

			model
DB	N	property	mDC	DFTB3	DFTB2	GAFF	AM1	PM6	B3LYP	MNDO/MM
S22	22	mue	0.699	3.012	3.338	1.588	4.401	3.226	1.336	2.504
		mse	0.184	3.012	3.338	0.546	4.361	3.226	0.016	2.393
		R	0.989	0.964	0.973	0.953	0.910	0.973	0.972	0.939
		crms	0.180	0.579	0.277	0.255	0.924	0.407	0.843	0.467
S66 (Fixed)	66	mue	0.545	2.737	2.985	0.913	5.141	2.666	2.252	2.195
		mse	0.275	2.737	2.985	0.708	5.141	2.666	1.332	2.183
		R	0.982	0.957	0.973	0.958	0.521	0.965	0.930	0.899
S66 (Procedure)	66	mue	0.532	2.363	2.399	0.825	3.061	2.006	1.836	1.543
		mse	–0.130	2.313	2.353	0.192	3.041	1.934	–0.142	1.505
		R	0.988	0.961	0.968	0.960	0.890	0.949	0.940	0.952
		crms	0.237	1.774	1.484	0.330	0.976	0.693	0.723	0.532
SS7 (Fixed)	7	mue	0.657	3.184	2.885	1.547	3.573	3.015	2.121	2.034
		mse	0.657	3.184	2.885	1.547	3.573	3.015	2.121	2.034
		R	0.964	0.452	0.346	0.836	0.799	0.585	0.869	0.815
SS7 (Procedure)	7	mue	0.393	2.817	2.248	1.039	2.549	2.070	1.418	1.321
		mse	0.361	2.817	2.089	0.904	2.549	1.848	1.418	1.321
		R	0.984	0.433	–0.041	0.874	0.903	0.472	0.901	0.908
		crms	0.251	0.361	0.363	0.620	0.707	0.523	0.516	0.701
SCAI	23	mue	0.819	2.843	2.745	2.011	4.249	2.518	2.361	2.990
		mse	0.668	2.843	2.745	1.940	4.249	2.518	0.646	2.987
		R	1.000	0.998	0.998	0.997	0.995	0.999	0.999	0.995
JSCH-2005	12	mue	0.697	6.582	6.293	3.417	10.488	7.346	3.052	5.991
		mse	0.096	5.825	5.637	1.788	10.488	7.346	–1.493	5.991
		R	0.989	0.754	0.784	0.749	0.756	0.786	0.750	0.697
		crms	0.183	1.005	0.644	0.507	1.294	0.713	1.023	0.509
Wat-2009	9	mue	0.352	3.424	13.547	1.890	4.354	6.223	20.499	4.338
		mse	–0.100	3.424	13.547	–1.456	4.253	6.223	–20.499	4.338
		R	1.000	0.999	1.000	0.998	0.995	0.998	0.998	0.998
		crms	0.073	0.121	0.106	0.309	1.029	0.554	0.130	0.449

The terms mue and mse are the mean unsigned and mean signed error in ΔE (kcal/mol). R is the Pearson correlation coefficient between the reference and model ΔE (unitless). crms is the average coordinate root mean square deviation of the optimized dimer structures to the reference structures (Å). Boldface is used to highlight the best value amongst the models.

Comparison of mDC with other Hamiltonians using several databases of intermolecular interactions. The terms mue and mse are the mean unsigned and mean signed error in ΔE (kcal/mol). R is the Pearson correlation coefficient between the reference and model ΔE (unitless). crms is the average coordinate root mean square deviation of the optimized dimer structures to the reference structures (Å). Boldface is used to highlight the best value amongst the models.

Drug Docking

Figure 6 displays the correlation between calculated protein–drug binding affinity scores and the experimentally determined[94−96] protein inhibition constant (IC50) for a set of drug ligands to the Cyclin-Dependent Kinase 2 (CDK2) protein receptor. The CDK2 structure[97] was taken from chain A of PDBID: 1H1S. Hydrogens were added and geometry optimized with Amber ff99. The autodock program[98] was used to dock 73 ligands to the protein structure using autodock’s default options. The ligands were chosen as a subset of those listed in ref (41) subject to the availability of DFTB3 parameters. One of these ligands, ligand 29 in ref (41) is chosen to be a “lead”, i.e., a ligand that has been identified as well-inhibiting the function of the protein. The goal then, is to identify additional candidate ligands that are structurally similar to the lead ligand but are predicted to inhibit the protein function with greater efficacy. If one assumes that ligands with a higher binding affinity to the receptor are generally more potent, then one would expect to observe a strong correlation between the ligand–receptor binding and the experimentally determined IC50. We compute the protein–ligand gas-phase interaction energy and compare the result to the experimental IC50s. These calculations ignore the desolvation of the drug upon binding to the receptor and the change in entropy associated with sterically inhibiting rotatable bonds. We also ignore the configurational averaging that would occur in dynamics and we freeze the CDK2 atoms to the 1H1S X-ray crystal structure during geometry optimization.

Figure 6

(left) Base structure common to all ligands. The atom marked “N*” is used to compare the position of a bound ligand to the lead structure, as explained in the text. (center) Lead ligand bound to the CDK2 protein. (right) Correlation between the experimental protein inhibition constant measured for each ligand and the assigned score relative to the lead structure. Rao indicate the gas-phase DC-DFT results taken from ref (41) computed using a small model system of CDK2. What remains to be described is the procedure that we performed to generate the bound structures used to compute the binding affinity scores. This procedure begins with the use of autodock, which binds a ligand to the receptor many times, clusters the results, and returns the coordinates of a docked ligand that is representative of each cluster. Each ligand is allowed to produce up to 20 clusters and thus we obtain up to 20 structures for each ligand and 632 structures in total for the set of 73 ligands. The ligand geometry for each of the 632 structures is gas-phase optimized with Amber ff99 for 500 steps while freezing the protein coordinates. The resulting Amber ff99 protein–ligand interaction energy is then stored. 100 steps of gas phase minimization are then performed for each of the 632 structures with mDC, and the final mDC gas-phase protein–ligand interaction energy is stored. The MNDO/MM results were computed using this same procedure. Although we have, for a given method, 632 geometry optimized structures, not all of them are geometrically relevant. For example, a structure is not relevant if it is not bound in the receptor pocket nor if the orientation of the ligand has been flipped so that the ligand substituents have occupied the receptor pocket. In other words, we only consider a structure to be relevant if it binds to the protein in the same way as the lead structure. To quantify this, we choose an atom common to all ligands (labeled N* in Figure 6) and measure the displacement of this atom relative to the minimum energy lead structure. We assign one structure to each of the 73 ligands for each method by choosing the structure that binds most similarly to the lead structure as measured by the N* atom unless the ligand has two or more structures that are within 3 Å of the lead structure, in which case we choose the lower energy structure. Each of the 73 ligands is now represented by a single bound structure which is used to compute a “score”The score relative to the lead ligand, ligand 29, iswhich are shown in Figure 6. The data labeled “Rao” in Figure 6 are the gas phase results reported in ref (41), who represented CDK2 with a 1000–1100 atom model system (in comparison to the approximately 4900 atom system used in this work) treated with an extended ONIOM method.[99,100] In their work, 300–380 atoms were computed with ωB97X-D/6-311++G** or XYG3/6-311++G** and the remaining atoms computed with PM6.

Simulation Protocol

Figure 7 displays the O–O RDF of water computed from molecular dynamics simulation using TIP3P, TIP4P-EW, and mDC. Each model was simulated with the same protocol. The system is composed of 512 waters in a cubic box. The simulation was performed using the canonical ensemble with a water density of 0.996 g/cm3 and a temperature of 298 K for 3.2 ns with a time step of 1 fs. The first 200 ps of simulation was discarded as equilibration. An 8 Å nonbond cutoff was employed along with particle mesh Ewald (PME) to account for the long-range electrostatic interactions. The PME method used a 1 point/Å fast Fourier transform grid and interpolation was performed with sixth-order Cardinal B-splines. Shake was used to constrain the TIP3P and TIP4P models to the experimental gas-phase water geometry and the mDC model to the gas-phase DFTB3 geometry. Langevin dynamics was used to control the temperature with a collision frequency of 5 ps–1. All simulations were performed using the SANDER program within Amber.[9] The implementation of mDC within SANDER required us to implement a new PME method generalized for use with multipolar charge densities. The generalization is based on the spherical tensor gradient operator, but we must defer a complete description of the new PME method to future work. Comparison of water O–O radial distribution functions (a), water cluster binding energies (b), and the relative ordering of clusters (c–e). The mue’s in b are mean unsigned errors of ΔE (kcal/mol). The R’s in c–e are Pearson correlation coefficients.

Results and Discussion

Atomic multipoles are included into the mDC model to reproduce H-bond angles (Figure 1 and Wat-2009) and improve electrostatic potentials (Figure 4) and molecular multipole moments (Table 1). In ref (67), we noted that traditional DFTB methods produced good H-bond angles even though their electrostatics involved atomic charges only. We were thus lead to hypothesize that this was achieved through the multipolar character of the tight-binding coupling matrix elements. The DFTB3 hydrogen bond geometries shown in Figure 1 support this. When the hydrogen bond distance is small, the DFTB3 H-bond angle agrees with B3LYP/6-31++G**; however, at larger separations where the tight-binding matrix elements become small, it adopts a geometry similar to TIP3P. Figure 4 illustrates a few examples where the use of multipoles improves the description of electrostatic potentials. The largest discrepancies between DFTB3 and B3LYP/6-311++G** occur in the description of π-bond electrons, sp3 oxygen and sulfur lone pairs, and sp2 nitrogen lone pairs. These differences are largely eliminated with mDC and result in an overall improved description of molecular multipole moments, as shown in Table 1. The mDC LJ interactions were parametrized by perturbing the values used in GAFF to reduce the errors in the databases shown in Figure 5 and Table 2. mDC thus exhibits the smallest mue and largest correlation to the high-level reference data among the methods shown. One observes a significant amount of variation between the other models in Figure 5 for interactions stronger than 10 kcal/mol, which involve either multiple H-bonds or nucleobase stacking. The S66 (Procedure) crms value in Table 2 for DFTB3 and DFTB2 are skewed because the lack of a dispersion model causes the benzene–ethene and benzene–benzene π···π complexes to dissociate until the intermolecular forces become sufficiently small to terminate the geometry optimizer. If we exclude those two molecules from the DFTB3 and DFTB2 statistics, their crms drop to 0.516 and 0.506 Å, respectively. The semiempirical models cause several of the stacked nucleobases in the JSCH-2005 database to devolve into hydrogen bonded complexes upon geometry optimization. The inclusion of a dispersion model could aid those cases as well.[101] Empirical dispersion corrections for DFTB2,[76,102] NDDO-based semiempirical methods,[103,104] and even ab initio methods[105,106] have been proposed; however, we limit the scope of our comparisons to the commonly used variant of these models to make the comparisons more useful to the wider set of users instead of the niche set of model developers. Furthermore, the series of databases that we consider are not biased toward dispersion-dominated complexes; we feel our comparisons are made “fair” by using the models consistently, as opposed to affording special treatment to the outliers in an attempt to make the models appear artificially tidy. Among the semiempirical models, AM1 produces the worst nonbonded interaction energies, whereas DFTB3, DFTB2, and PM6 are of comparable quality. The reader may also note that the GAFF mue’s are roughly twice as small as the semiempirical models. This is not unexpected; it has been previously noted that nonbond interactions are problematic for these methods.[107] The MNDO/MM relative energies are slightly better than the semiempirical methods, but their geometrical errors are similar. The parametrized mDC method is applied to the prototype study of CDK2 drug screening, shown in Figure 6. CDK2 has been extensively studied due to the role it plays in tumor formation. The attention that it thus has received provides the computational community with an array of experimental results which make this system attractive to those developing new linear-scaling quantum models. The experimental results are used to validate a computational procedure by presuming that a drug has been identified as a good candidate for protein inhibition and that one is tasked with identifying similar compounds that one predicts to inhibit protein function with greater efficacy. The work of Rao[41] is the most recent study to use CDK2 to validate their computational protocols. With their method, they found an amazing R2 correlation of 0.86 upon considering solvation and entropic effects. When only gas-phase interaction energies are used for the subset of drugs that we consider in this work, their R2 drops to 0.83; however, this is still exceptionally good. For comparison, the fragment molecular orbital method (FMO) has recently been applied with MP2/6-31G* in a related study that, among other things, correlates the CDK2 experimental binding affinities to ab initio single-point calculations for a set 14 ligands with known X-ray crystal structures.[43] In that work,[43] the authors found a good R2 correlation of 0.68 between their calculations and the experimental free energy of binding. A similar study of another system[42] employed the ONIOM3 method B3LYP/6-31G*:HF/3-21G:PM3 and produced a R2 correlation of 0.64. AM1-DC has also been used[30] in the screening of zinc metalloenzymes to yield a correlation of 0.69. We observe a R2 correlation of 0.71 using mDC. MM and MNDO/MM calculations were performed to compare mDC with other methods that use the same CDK2 model and computational protocol. MM and MNDO/MM produce R2 correlations of 0.56 and 0.63, respectively. There does not appear to be a clear relationship between the quality of nonbond interactions (Figure 5 and Table 2) and the model’s correlation to the experimental CDK2 IC50s; the MNDO/MM energies correlate better than MM to the experimental IC50s even though MM more accurately reproduces the databases of high-level ab initio nonbond interactions. One may attribute this observation to the explicit treatment of polarization within mDC and MNDO/MM. Such attribution from this sole observation is speculative, in part, because semiempirical model polarizabilities are underestimated[108] by approximately 30%. Previous studies have also concluded, however, that the polarization energy plays a key role on the relative ordering of inhibitor interactions with focal adhesion kinase[109] and upon the binding of ligands to the trypsin protein.[110] Our results add support to their conclusions. QMFF’s strive to achieve high accuracy in environments of varying degrees of heterogeneity, so we are encouraged to find that mDC yields more accurate small molecule cluster interactions and produces greater correlation with experiment when applied to large biomolecular systems. A mDC calculation of CDK2 takes less than 10 s on a desktop computer. Rao’s method requires about a day of computation;[41] however, we emphasize in this comparison that we use a system with more than four times as many atoms than used by Rao. A more detailed analysis of mDC timings is provided in ref (67). The computational efficiency of mDC offers the possibility of performing configurational sampling, which may improve the correlation between the computed CDK2 interactions with the experimental IC50s. In order to perform those simulations, one would need to parametrize the MM bonded energy terms that cross the fragment boundaries. We must defer the parametrization and analysis of the bonded interactions to future work. Given that semiempirical models underestimate molecular polarizabilities, one may question if the quality of the mDC interactions degrade upon examining larger clusters or when used in condensed phase environments. To address this, Figure 7b displays the ΔE binding energy of water clusters up to ten waters. We limit our comparisons in Figure 7 to mDC and the MM water models because the errors observed in the Wat-2009 database are only amplified as more and larger clusters are examined. By this, we mean that each method in Figure 5 is observed to have a characteristic slope and that slope is largely unchanged when extended to Wat-2011. We therefore observe similar mDC and TIP3P mue’s with Wat-2011 and Wat-2009. The TIP4P-EW clusters are bound too strongly because of its use of an enhanced dipole moment. The TIP3P ΔΔE’s in Figure 7c–e do not correlate well with ab initio because the higher-energy structures have a propensity to devolve into lower-energy configurations. This symptom may be directly related to the lack of structure observed in TIP3P RDFs. The TIP4P-EW model ΔΔE’s correlate well for the larger clusters, but not for (H2O)5. One possible reason for this could be because the dipole moment of water increases as one goes from small to large clusters, thus making the enhanced dipole moment of TIP4P-EW inadequate for describing the higher-energy geometrical configurations of smaller-sized clusters. Devising an unambiguous computational experiment to test this hypothesis would require some clever partitioning of the electron density; however, we note that the average B3LYP/6-31G* Mulliken charge of oxygen computed for a series of water clusters increases with the size of the cluster and then begins to stabilize around 5 or 6 waters. This explanation therefore seems plausible. mDC correlates well with the reference data largely due to its tendency to maintain the desired H-bond geometry upon optimization. Figure 7a compares the mDC, TIP3P, and TIP4P-EW O–O RDF to the recent experimental result taken from ref (111). Both TIP4P-EW and mDC display good agreement to experiment; whereas TIP3P suffers from a well-known lack of structure beyond the first solvation peak. The mDC first solvation peak is too high, but is similar to TIP4P-EW. Although the location of mDC’s first solvation peak is in good agreement with experiment, it appears that its repulsive wall is slightly too steep at short separations. The use of an exponential-like repulsive wall, as was used in ref (67), may improve this. The mDC simulation results provided here are very preliminary. mDC extensively relies on the use of atomic multipoles and the implementation of mDC into the SANDER[9] program required us to derive and implement a new generalization of the PME method based on the spherical tensor gradient operator for multipolar charge densities. The details of the new PME method and further analysis of condensed phase properties is currently in preparation. The 3.2 ns mDC simulation of 512 waters required approximately 37 h per ns of dynamics using a 1 fs time step on a desktop workstation.

Conclusions

In this work, we parametrized a linear-scaling quantum force field for nonbonded interactions. Much effort was spent on the comparison of the method with standard MM force fields, semiempirical models, and relatively inexpensive ab initio Hamiltonians for a wide variety of nonbond interactions using a series of benchmark databases. As a part of this comparison, we provided a new benchmark database (SS7) of accurate nonbond interactions between sulfur-containing dimers. The comparisons reveal that the GAFF MM force field significantly and consistently reproduced nonbond energies with far greater accuracy than the DFTB2-mio, DFTB3-3ob, AM1, and PM6 semiempirical models and the MNDO/MM method. The parametrized mDC model is shown to produce errors approximately half those of GAFF. The benefit of mDC relative to MM and MNDO/MM was further demonstrated by correlating CDK2 ligand binding energies to their experimental inhibition constants. It was shown that mDC correlated the data with a R2 of 0.71; whereas MM and MNDO/MM produced correlations of 0.56 and 0.63, respectively. In the process of parametrizing the mDC model, we made several observations using GAFF, DFTB3, and ab initio that may influence the future development tight-binding models. We observe that DFTB3′s good hydrogen bond angles are only produced when there is significant overlap between fragments, but quickly decay into MM-like configurations when they are separated. We interpret this as the modeling of multipole interactions from within the tight-binding matrix instead of through second-order electrostatics. Furthermore, we observe systematic discrepancies between ab initio and DFTB3 electrostatic potentials in molecules containing π-bonds, sp3 oxygen and sulfur lone pairs, and sp2 nitrogen lone pairs. We are able to include the higher-order atomic multipoles within mDC without making intrusive modifications to DFTB3; whereas the extension of DFTB3 to atomic multipoles would be a significant undertaking.

92 in total

1. N2-substituted O6-cyclohexylmethylguanine derivatives: potent inhibitors of cyclin-dependent kinases 1 and 2.

Authors: Ian R Hardcastle; Christine E Arris; Johanne Bentley; F Thomas Boyle; Yuzhu Chen; Nicola J Curtin; Jane A Endicott; Ashleigh E Gibson; Bernard T Golding; Roger J Griffin; Philip Jewsbury; Jerome Menyerol; Veronique Mesguiche; David R Newell; Martin E M Noble; David J Pratt; Lan-Zhen Wang; Hayley J Whitfield
Journal: J Med Chem Date: 2004-07-15 Impact factor: 7.446

2. Electrostatic polarization is critical for the strong binding in streptavidin-biotin system.

Authors: Ye Mei; Yong L Li; Juan Zeng; John Z H Zhang
Journal: J Comput Chem Date: 2012-03-30 Impact factor: 3.376

3. Toward a Separate Reproduction of the Contributions to the Hartree-Fock and DFT Intermolecular Interaction Energies by Polarizable Molecular Mechanics with the SIBFA Potential.

Authors: Jean-Philip Piquemal; Hilaire Chevreau; Nohad Gresh
Journal: J Chem Theory Comput Date: 2007-05 Impact factor: 6.006

4. Linear-Scaling Time-Dependent Density Functional Theory Based on the Idea of "From Fragments to Molecule".

Authors: Fangqin Wu; Wenjian Liu; Yong Zhang; Zhendong Li
Journal: J Chem Theory Comput Date: 2011-09-29 Impact factor: 6.006

5. A new method for direct calculation of total energy of protein.

Authors: Xiao He; John Z H Zhang
Journal: J Chem Phys Date: 2005-01-15 Impact factor: 3.488

6. Improvement of semiempirical response properties with charge-dependent response density.

Authors: Timothy J Giese; Darrin M York
Journal: J Chem Phys Date: 2005-10-22 Impact factor: 3.488

7. A generalized many-body expansion and a unified view of fragment-based methods in electronic structure theory.

Authors: Ryan M Richard; John M Herbert
Journal: J Chem Phys Date: 2012-08-14 Impact factor: 3.488

8. Benchmark oxygen-oxygen pair-distribution function of ambient water from x-ray diffraction measurements with a wide Q-range.

Authors: Lawrie B Skinner; Congcong Huang; Daniel Schlesinger; Lars G M Pettersson; Anders Nilsson; Chris J Benmore
Journal: J Chem Phys Date: 2013-02-21 Impact factor: 3.488

9. Probing the ATP ribose-binding domain of cyclin-dependent kinases 1 and 2 with O(6)-substituted guanine derivatives.

Authors: Ashleigh E Gibson; Christine E Arris; Johanne Bentley; F Thomas Boyle; Nicola J Curtin; Thomas G Davies; Jane A Endicott; Bernard T Golding; Sharon Grant; Roger J Griffin; Philip Jewsbury; Louise N Johnson; Veronique Mesguiche; David R Newell; Martin E M Noble; Julie A Tucker; Hayley J Whitfield
Journal: J Med Chem Date: 2002-08-01 Impact factor: 7.446

10. An Efficient a Posteriori Treatment for Dispersion Interaction in Density-Functional-Based Tight Binding.

Authors: Lyuben Zhechkov; Thomas Heine; Serguei Patchkovskii; Gotthard Seifert; Helio A Duarte
Journal: J Chem Theory Comput Date: 2005-09 Impact factor: 6.006

16 in total

1. Charge-dependent many-body exchange and dispersion interactions in combined QM/MM simulations.

Authors: Erich R Kuechler; Timothy J Giese; Darrin M York
Journal: J Chem Phys Date: 2015-12-21 Impact factor: 3.488

2. Multiscale methods for computational RNA enzymology.

Authors: Maria T Panteva; Thakshila Dissanayake; Haoyuan Chen; Brian K Radak; Erich R Kuechler; George M Giambaşu; Tai-Sung Lee; Darrin M York
Journal: Methods Enzymol Date: 2015-01-22 Impact factor: 1.600

3. Nucleic acid reactivity: challenges for next-generation semiempirical quantum models.

Authors: Ming Huang; Timothy J Giese; Darrin M York
Journal: J Comput Chem Date: 2015-05-06 Impact factor: 3.376

4. Improved parameterization of interatomic potentials for rare gas dimers with density-based energy decomposition analysis.

Authors: Nengjie Zhou; Zhenyu Lu; Qin Wu; Yingkai Zhang
Journal: J Chem Phys Date: 2014-06-07 Impact factor: 3.488

Review 5. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications.

Authors: Anders S Christensen; Tomáš Kubař; Qiang Cui; Marcus Elstner
Journal: Chem Rev Date: 2016-04-13 Impact factor: 60.622

6. VR-SCOSMO: A smooth conductor-like screening model with charge-dependent radii for modeling chemical reactions.

Authors: Erich R Kuechler; Timothy J Giese; Darrin M York
Journal: J Chem Phys Date: 2016-04-28 Impact factor: 3.488

7. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery.

Authors: Tai-Sung Lee; Bryce K Allen; Timothy J Giese; Zhenyu Guo; Pengfei Li; Charles Lin; T Dwight McGee; David A Pearlman; Brian K Radak; Yujun Tao; Hsu-Chun Tsai; Huafeng Xu; Woody Sherman; Darrin M York
Journal: J Chem Inf Model Date: 2020-09-16 Impact factor: 4.956

8. Quantum mechanical force fields for condensed phase molecular simulations.

Authors: Timothy J Giese; Darrin M York
Journal: J Phys Condens Matter Date: 2017-08-17 Impact factor: 2.333

9. Ambient-Potential Composite Ewald Method for ab Initio Quantum Mechanical/Molecular Mechanical Molecular Dynamics Simulation.

Authors: Timothy J Giese; Darrin M York
Journal: J Chem Theory Comput Date: 2016-05-26 Impact factor: 6.006

10. Intermolecular interactions in the condensed phase: Evaluation of semi-empirical quantum mechanical methods.

Authors: Anders S Christensen; Jimmy C Kromann; Jan H Jensen; Qiang Cui
Journal: J Chem Phys Date: 2017-10-28 Impact factor: 3.488