Literature DB >> 24803867

Quantum Mechanical Calculation of Noncovalent Interactions: A Large-Scale Evaluation of PMx, DFT, and SAPT Approaches.

Amanda Li¹, Hari S Muddana², Michael K Gilson².

Abstract

Quantum mechanical (QM) calculations of noncovalent interactions are uniquely useful as tools to test and improve molecular mechanics force fields and to model the forces involved in biomolecular binding and folding. Because the more computationally tractable QM methods necessarily include approximations, which risk degrading accuracy, it is essential to evaluate such methods by comparison with high-level reference calculations. Here, we use the extensive Benchmark Energy and Geometry Database (BEGDB) of CCSD(T)/CBS reference results to evaluate the accuracy and speed of widely used QM methods for over 1200 chemically varied gas-phase dimers. In particular, we study the semiempirical PM6 and PM7 methods; density functional theory (DFT) approaches B3LYP, B97-D, M062X, and ωB97X-D; and symmetry-adapted perturbation theory (SAPT) approach. For the PM6 and DFT methods, we also examine the effects of post hoc corrections for hydrogen bonding (PM6-DH+, PM6-DH2), halogen atoms (PM6-DH2X), and dispersion (DFT-D3 with zero and Becke-Johnson damping). Several orders of the SAPT expansion are also compared, ranging from SAPT0 up to SAPT2+3, where computationally feasible. We find that all DFT methods with dispersion corrections, as well as SAPT at orders above SAPT2, consistently provide dimer interaction energies within 1.0 kcal/mol RMSE across all systems. We also show that a linear scaling of the perturbative energy terms provided by the fast SAPT0 method yields similar high accuracy, at particularly low computational cost. The energies of all the dimer systems from the various QM approaches are included in the Supporting Information, as are the full SAPT2+(3) energy decomposition for a subset of over 1000 systems. The latter can be used to guide the parametrization of molecular mechanics force fields on a term-by-term basis.

Entities: CellLine Chemical Disease Gene Mutation Species

Year: 2014 PMID： 24803867 PMCID： PMC3985464 DOI： 10.1021/ct401111c

Source DB: PubMed Journal: J Chem Theory Comput ISSN： 1549-9618 Impact factor: 6.006

Introduction

Noncovalent interactions are of fundamental importance to biomolecular systems, as they help determine the structures and functions of proteins and nucleic acids and play a central role in molecular recognition. A reliable representation of noncovalent interactions therefore is critically important to computational modeling of biomolecules, with applications that include rational drug design and protein engineering.[1,2] In molecular simulations, noncovalent interactions are typically modeled by the nonbonded terms in an empirical force field.[3−9] These account for electrostatic and van der Waals interactions and may also include terms to account for time-varying changes in electronic polarization during the simulation.[10] Although the parameters in an empirical force field are typically adjusted to optimize agreement with experimental data, growing computer power and a shortage of suitable experimental data are also driving increased use of quantum mechanical (QM) calculations to parametrize and test force fields.[11−15] In addition, concerns regarding the accuracy of empirical force fields[16,17] are motivating the direct application of QM methods to the study of noncovalent binding in host–guest[18,19] and protein–ligand[20−22] systems. It would be ideal if such applications could take advantage of the highly accurate QM approach often viewed as the gold standard for computing noncovalent interactions, that is, counterpoise-corrected couple-cluster theory with single, double, and perturbative triple excitations extrapolated to the completed basis set limit.[23] However, the computational demands of such CCSD(T)/CBS CP calculations make them too time consuming for routine use in force field parametrization and prohibit direct application to biomolecular systems. As a consequence, a range of other QM methods have been developed. Because all of these methods make approximations for the sake of computational efficiency, it becomes essential to evaluate their accuracy. While there are many studies that rely on high accuracy reference results for relevant molecular systems,[24−28] there is still a need for broader comparative validation studies that will provide users and developers with a perspective of the strengths, weaknesses, and trade-offs among the various QM approaches and their applicability to specific classes of noncovalent interactions. Here, therefore, we contribute a systematic assessment of accuracy and speed for a range of QM methods using a reference data set of over 1200 gas-phase dimers, for which CCSD(T)/CBS CP reference energies are publicly available in the Benchmark Energy and Geometry Database (BEGDB).[29] The categories of the QM method examined are semiempirical, density functional theory (DFT) with and without dispersion corrections, and symmetry-adapted perturbation theory (SAPT). The semiempirical PM6[30] and PM7[31] methods, the most computationally efficient methods tested here, rely on empirically adjustable parameters and are often combined with additional interaction terms. We examine the PM6 method, with post hoc corrections for dispersion and hydrogen bonding interactions (PM6-DH2,[32,33] PM6-DH+[34]), and halogen interactions (PM6-DH2X[35]). The PM7 method, which is based on PM6, is also included without additional corrections, as its parametrization strategy accounts for such interactions. We test the widely used DFT functionals B3LYP,[36,37] B97-D,[38] and M062X,[39] with and without added dispersion corrections,[40] as well as the ωB97X-D[41] functional, which includes its own correction for dispersion. Finally, we test SAPT,[42] which is distinct from the PMx and DFT approaches in that it is applicable only to the calculation of noncovalent interactions (e.g., it cannot be applied to geometry optimizations) and that it provides an informative decomposition of the overall interaction energy into electrostatic, induction, exchange, and dispersion components. In SAPT, the interaction energy is computed as an expansion of perturbative terms, and we examine the SAPT0, SAPT2, SAPT2+, SAPT2+(3), and SAPT2+3 truncations.[26] We also explore the potential for the fast SAPT0 (fSAPT0) method to afford accurate results through empirical scaling of its energy terms, much as done previously in a smaller study,[43] and make available the detailed energy decompositions afforded by SAPT across all of the test systems, as these can be useful to guide force field parametrization.[44] The present study provides a unique perspective of the reliability and efficiency of a broad range of QM methods and should be a useful guide to their selection and further improvement.

Methods

Benchmark Data Sets

A growing collection of benchmark data sets provides high quality geometries and interaction energies for noncovalent complexes.[26] Here, we use several data sets (Table 1) from the BEGDB to explore the accuracy of various QM methods for noncovalent interactions spanning a range of system types and sizes. We study a total of 1266 dimers, ranging in size from 20 electrons in 4 atoms to 478 electrons in 101 atoms (Figure 1). These various BEGDB data sets probe different classes of noncovalent interactions. In particular, the S22×5[45] and S66×8[46] data sets both contain noncovalent complexes categorized as hydrogen bonded (electrostatics dominated), dispersion dominated, or mixed electrostatic and dispersive. X40×10[47] focuses on complexes with halogen interactions. Ionic[48] contains systems with charged hydrogen bonds. SCAI[49] and JSCH[23] contain amino acid and nucleic acid complexes. The L7[50] data set contains even larger extended complexes; all of them containing greater than 200 electrons. Several of these data sets, S22×5, S66×8,[46] X40×10,[47] and Ionic, contain geometries generated along a dissociation path relative to the equilibrium geometry and thus include many nonequilibrium conformations. Because the aug-cc-pVTZ basis set[51−54] used in the present study is not applicable to iodine, we omit the nine iodine-containing complexes in X40×10 and term the reduced data set X31×10. Lastly, we also include the A24[55] data set, which contains small complexes sized to enable comparisons of more accurate approaches that would otherwise be unfeasible for larger complexes. BEGDB provides counterpoise-corrected CCSD(T)/CBS interaction energies for all of these data sets, except for JSCH, which contains energies evaluated using CCSD(T)/CBS and MP2/CBS without counterpoise (CP) correction, and L7, which uses QCISD(T)/CBS CP. It is also worth noting that there are variations within and across these BEGDB data sets in both the basis sets and extrapolation schemes employed to obtain the CBS results. Such details are not trivial and can produce discrepancies as large as 0.7 kcal/mol, as elaborated in the Results section.

Table 1

BEGDB Data Sets Used in the Present Studya[27]

equilibrium data sets
data set	description	number of structures	geometry optimization level	reference energy level
A24[55]	small complexes of 7–11 atoms	24	CCSD(T)/CBS CP or noCP	CCSD(T)/CBS CP
S22[23]	small complexes of 8–26 atoms	22	MP2/cc-pVTZ CP noCP or CCSD(T)/cc-pV(T/Q)Z noCP	CCSD(T)/CBS CP
S66[46]	small complexes of 6–18 atoms	66	MP2/cc-pVTZ CP	CCSD(T)/CBS CP
X40[47]	complexes with halogenated molecules	40	MP2/cc-pVTZ CP	CCSD(T)/CBS CP
SCAI[49]	amino acid side-chain complexes of 22–32 atoms	24	DFT TPSS/TZVP noCP	CCSD(T)/CBS CP (D→T)
JSCH[23]	124 nucleobase complexes and 19 amino acid complexes of 29–41 atoms	143	artificial geometries, NMR structures, crystal structures, X-ray structures, MP2/cc-pVTZ noCP, or MP2/TZVPP noCP	CCSD(T)/CBS noCP or MP2/CBS noCP
L7[50]	large complexes of 48–112 atoms	7	DFT-D TPSS-D/TZVP or other	QCISD(T)/CBS CP or CCSD(T)/CBS CP

Note that the names match those on the BEGDB Web site, which are not necessarily consistent with the corresponding publications. For example, the X40, X40×10, and Ionic data sets have also been referred to as “Halogens”, “Halogensx10”, and “Charged HB”.

Figure 1

Sizes of dimers studied. For nonequilibrium data sets, only one point is shown per dimer system. The larger L7 data set is included in the inset.

Sizes of dimers studied. For nonequilibrium data sets, only one point is shown per dimer system. The larger L7 data set is included in the inset. Note that the names match those on the BEGDB Web site, which are not necessarily consistent with the corresponding publications. For example, the X40, X40×10, and Ionic data sets have also been referred to as “Halogens”, “Halogensx10”, and “Charged HB”.

Computational Methods

Semiempirical PMx energy calculations were carried out with MOPAC2012.[56] The PM6[30] methods were examined with corrections for dispersion, hydrogen bonding, and halogen interactions. PM6-DH2[32] and PM6-DH+[34] differ in the hydrogen-bonding correction used, with the latter having improved long-range behavior. PM6-DH2X[35] adds an empirical repulsive correction for halogen interactions to the same dispersion and hydrogen-bonding corrections implemented in PM6-DH2. The more recent PM7[31] method is parametrized against a larger reference data set than that used for PM6 and includes its own terms to account for dispersion and hydrogen bonding. The DFT calculations with CP correction were carried out with the aug-cc-pVTZ basis set in revision C.01 of Gaussian 09.[57] Where SCF calculations failed to converge using default run parameters, the keyword Integral = (Acc2E = 12) was used to increase the two-electron integral accuracy. B3LYP,[36,37] B97-D,[38] M062X,[39] and ωB97X-D[41] functionals were selected based on previously assessed performance for noncovalent interactions.[58] The D3 dispersion correction was applied using DFT-D3, version 3.0,[40] to B3LYP, B97-D, and M062X using the default parameters. These were optimized using a different basis set, (aug-)def2-QZVP. However, we have found that for the S22 data set, aug-cc-pVTZ and (aug-)def2-QZVP, without any dispersion correction, give results that are within 0.18 kcal/mol RMSE of each other, across all of the methods examined in the present study. We also observed that using the DFT-D3 parameters optimized for (aug-)def2-QZVP reduced the RMSE across all data sets by up to 0.12 kcal/mol, compared with using those optimized for (aug-)def2-TZVPP, the only other basis set option currently available. Becke–Johnson damping for the D3 correction[59] (D3BJ) was also tested for B3LYP and B97-D using parameters optimized for (aug-)def2-QZVP; there is no such correction available for M062X. The original “zero-damping” D3 corrections are so-called because they employ a damping function for which the dispersion energy approaches zero with small internuclear separations. We note that, with the exception of B3LYP, all the functionals already contain some treatment of dispersion. B97-D is the B97 functional with the D2 dispersion correction. M062X is already parametrized to account for dispersion. ωB97X-D utilizes its own specialized empirical dispersion correction. The SAPT[42] energy calculations at varying orders (SAPT0, SAPT2, SAPT2+, SAPT2+(3), SAPT2+3) were carried out with PSI4.[60] In SAPT, the total interaction is computed as a sum of energy terms that are each classified as resulting from electrostatic, exchange, induction, or dispersion effects. The specific truncations of the SAPT expansion are detailed in Table S1 of the Supporting Information. Due to memory limitations, only lower order SAPT calculations were completed for larger systems. Thus, L7 was evaluated only with SAPT0. SCAI was evaluated at orders through SAPT2+. JSCH was evaluated through SAPT2 with the exception of nine amino acid pair geometries (F30–K46, F30–L33, F30–Y13, F30–F49, F30–Y4, F49–K46, F49–V5, F49–W37, and F49–Y4) taken from a rubredoxin crystal structure[61] (PDB 1RB9), for which only SAPT0 calculations were completed. On the opposite end of the system size spectrum, SAPT orders up to SAPT2+3 were calculated for A24. All other data sets (S22×5, S66×8, Ionic, and X31×10) were evaluated through SAPT2+(3).

Evaluation of Accuracy and Computational Speed

We use the root mean squared error (RMSE) as the primary metric of error in comparing the various computational methods. However, the mean signed error (MSE) is also provided to further characterize the performance of each method; a negative MSE indicates that a method overestimates the attractive interactions of a noncovalent dimer. Relative error is often reported in the literature, presumably because errors are thought to increase with interaction energy. Here, however, we saw no correlation between error and interaction (R2 < 0.2 for all methods), so relative errors are not reported. Timing studies were carried out on eight CPUs of a 16-CPU node, which was dedicated entirely to the calculation being timed. The timings of DFT and SAPT methods were examined by applying them in triplicate to each system in the A24 data set and noting the shortest of the three wall-clock times reported as elapsed “real” time by the Unix time command. This timing approach accounts for the efficiency with which each method uses the eight available CPUs. The timings for DFT with D3 dispersion correction are recorded without the add-on correction, as it requires negligible resources compared to the main calculation.

Linear Scaling of SAPT0 Energy Terms

The SAPT0 interaction energy is the sum of seven energy terms, as detailed in Table S1 of the Supporting Information. In the fSAPT0 schemes a separate scaling factor is applied to some or all of these terms. The linear scaling factors for the SAPT0 energy terms were determined by randomly splitting the systems in all combined data sets, except L7, into two equal subsets. One subset was used for training and the other for testing. Thus, we applied multiple linear regression of the SAPT0 energy terms to the corresponding CCSD(T)/CBS CP reference energies for the training set to obtain a set of fitted coefficients. We then used these coefficients to compute interaction energies for the test set and computed correlation coefficients and RMSE for the test set results. This procedure was repeated 1000 times, with different random selections of the training and testing subsets. Three different fitting schemes were tested: fSAPT0(1) scales all SAPT0 energy terms; fSAPT0(2) scales only the two dispersion terms, Edisp(20) and Eexch–disp(20), treated independently; and fSAPT0(3) scales only the sum of the two dispersion terms, Edisp(20) and Eexch–disp(20). We also tried applying scaling factors to SAPT2 thru SAPT2+(3), but this did not lead to significant improvements in accuracy.

Results

We tested a spectrum of quantum mechanical methods, spanning semiempirical (PMx), DFT, and SAPT, by comparing their results with reference interaction energies for a collection of noncovalent complexes in the gas phase. The reference collection, which comprised the A24, Ionic, JSCH, L7, SCAI, S22×5, S66×8, and X31×10 data sets from BEGDB, totals 1266 entries and includes a variety of molecules—nonpolar, polar, ionized, and halogenated—in equilibrium and nonequilibrium geometries. The Supporting Information provides the interaction energies, calculated using the various QM methods for all the systems studied, along with the corresponding BEGDB reference energies. The present quantum mechanical results were compared with the highest-accuracy reference energies available in the BEGDB for the data sets used. These were generated with CCSD(T)/CBS calculations including counterpoise corrections, except as noted in the Methods section. It is worth noting that the reference energies in the S22[23] and S66[46] data sets, which are more limited versions of the S22×5 and S66×8 data sets used here, were recently revised based on larger basis sets, additional points for the CBS extrapolation, or both. The more rigorous results differ from those originally published by 0.2 and 0.1 kcal/mol RMSE, respectively, with maximum unsigned errors of up to 0.7 kcal/mol. Given that the reference energies were not computed at such a high level, they also presumably have errors similar in magnitude. This uncertainty in the reference energies implies that the present study cannot meaningfully resolve errors less than about 0.2 kcal/mol. The following subsections provide an overview of the results, followed by more detailed discussions of the PMx, DFT, and SAPT approaches. Timing comparisons are then given for the DFT and SAPT methods. Finally, we show that a simple scaling of SAPT0 energy components offers substantially improved accuracy at minimal computational cost.

Overview

As shown in Figure 2 and tabulated in Table 2, the RMSE values of the various quantum methods averaged across all data sets range from 0.52 to 3.76 kcal/mol. The methods that yield the lowest overall errors are SAPT2+(3) and M062X, both with and without its dispersion correction, but a number of methods also yield overall RMSE values within 1 kcal/mol. The methods that yield the highest overall errors are the semiempirical (PMx) methods, B3LYP without dispersion correction, and SAPT0. The MSE values are more informative; they show that both the PMx and DFT methods without dispersion corrections tend to provide interaction energies more positive (less favorable) than the reference results, while the SAPT methods tend to provide interaction energies more negative (more favorable) than the reference results. As expected, supplementing the DFT methods with negative dispersion energy terms reduces the tendency to overestimate the interaction energy; the resulting improvement is particularly striking in going from B3LYP to B3LYP-D3.

Figure 2

Table 2

Ranking of QM Methods by RMSE (kcal/mol) for Combined and Individual Benchmark Data Setsa

Dashed lines mark RMSE levels of 0.50 and 1.00 kcal/mol. Higher orders of SAPT calculations were not completed for some data sets due to memory limitations. A24 was evaluated at orders through SAPT2+3, SCAI through SAPT2+, JSCH through SAPT2, and all other data sets through SAPT2+(3).

SAPT2 calculations were only evaluated for 134 out of 143 of the JCSH systems.

Evaluation of QM methods for combined and individual benchmark data sets. Errors evaluated relative to CCSD(T)/CBS CP energies. SAPT2 calculations were only evaluated for 134 out of 143 of the JCSH systems. Dashed lines mark RMSE levels of 0.50 and 1.00 kcal/mol. Higher orders of SAPT calculations were not completed for some data sets due to memory limitations. A24 was evaluated at orders through SAPT2+3, SCAI through SAPT2+, JSCH through SAPT2, and all other data sets through SAPT2+(3). SAPT2 calculations were only evaluated for 134 out of 143 of the JCSH systems. The performance of the quantum approaches varies significantly across the data sets, as shown in Figure 2. Perhaps most striking is that the PMx methods provide substantially better relative results for the S22×5, S66×8, SCAI, and JCSH data sets and worse results for A24, Ionic, and X31×10. The problems for A24 and X31×10 appear to arise largely from errors associated specifically with halogenated molecules. The Ionic data set includes no halogens, however, and we speculate that the errors here may trace to the lack of ionized hydrogen-bonded complexes in the data sets used to parametrize the PMx methods. In addition, the minimal basis sets used in the PMx methods may have difficulty accounting for the strong polarization effects present in such ionized complexes. Other than B3LYP, lower-order SAPT, and PMx, all methods are within 1 kcal/mol RMSE of the reference energies for all sets except JSCH. For JSCH, all approaches yield larger errors (note the scale of the vertical axis in Figure 2c). This is perhaps not surprising because the JSCH data set contains the largest dimer systems, and one may expect larger systems to effectively include more interactions, each potentially associated with some level of error (Figure 1). Ranking the methods according to their overall accuracy and their accuracy on each data set (Table 2) shows that although certain methods remain near the top of the rankings across the board, the detailed ordering of the methods varies across data sets. The scatter plots in Figure 3 provide further insight into the performance of the various approaches. All the methods tested provide excellent correlation with the reference energies (R2 > 0.86), and not surprisingly, those with the largest RMSE values (Figure 2) also yield the lowest R2 values (Figure 3). This analysis also allows further characterization of the errors associated with some of the methods. First, the PMx scatter plots include outliers arranged in smooth arcs. Further analysis indicates that each arc corresponds to the dissociation curve of one dimer system, and the dimer systems that generate these arcs are ones for which the PMx method yields idiosyncratically high errors, as discussed below. Second, most of the errors of the B3LYP method are associated with dimer systems whose interactions are primarily dispersive, as indicated by the red cluster of off-diagonal points. These errors are largely corrected by addition of the D3 dispersion correction. In contrast, the tendency of the SAPT methods to overestimate dimer affinities is largely independent of interaction type, as points of all colors are found below the lines of identity in the SAPT scatter plots. Third, adding a dispersion correction to the DFT methods uniformly improves the correlation, and the D3 correction performs somewhat better than D2, where the comparison is made (B97-D versus B97-D3). Finally, the DFT methods have a weak tendency to overestimate the unfavorable energy of the most repulsive interactions, while the PMx and SAPT methods tend to assign overly favorable energies in these cases. These repulsive interactions tend to have intermediate electrostatic dispersive character, as indicated by the cyan color of these points.

Figure 3

Correlation of QM methods with CCSD(T)/CBS CP. We present only those data sets to which all methods could be applied, i.e., A24, Ionic, S22×5, S66×8, and X31×10. Each entry is colored by interaction type character (see spectrum below), which is defined as |Edisp/Eelst|, where Edisp and Eelst are the total electrostatic and dispersion energy components taken from the SAPT2+(3) calculations. It is of interest to examine how the performance of the various methods depends on whether they are applied to equilibrium versus nonequilibrium geometries. Data sets S22×5, S66×8, X31×10, and Ionic make such comparisons possible, as they contain dissociation curves for a total of 134 dimer systems (see Methods). Figure 4 compares the MSE and RMSE for each method at close, equilibrium, and far separations, defined here as 90%, 100%, and 200% of the equilibrium separations, respectively. The rankings of the methods for equilibrium geometries correlate well with the rankings for the close geometries but poorly with rankings at far separations (Table 3).

Figure 4

Evaluation of QM methods for equilibrium and nonequilibrium geometries.

Table 3

Ranking of QM Methods by RMSE (kcal/mol) for Equilibrium and Nonequilibrium Geometriesa

Dashed lines mark RMSE levels of 0.5 and 1.0 kcal/mol.

Evaluation of QM methods for equilibrium and nonequilibrium geometries. Dashed lines mark RMSE levels of 0.5 and 1.0 kcal/mol. Given that the long-range interactions are smaller in absolute terms, this observation suggests that a study of equilibrium geometries suffices to determine which methods work best overall. On the other hand, the errors rise at a short distance for all methods, so that none provide excellent accuracies for the close geometries, and only M062X has an RMSE below 0.5 kcal/mol. At far separations, all methods are within 1.0 kcal/mol RMSE, and several fall under 0.2 kcal/mol RMSE, which is comparable to the size of errors associated with basis set choice in computing the CCSD(T) correction, as discussed above.

Semiempirical PMx Methods

The semiempirical PMx methods are roughly comparable in accuracy to the DFT-D3 and higher-order SAPT methods for the S22×5, S66×8, and JSCH data sets but is considerably less accurate for A24, Ionic, and X31×10, much as previously noted.[27] We conjecture that this difference stems in part from the fact that the PMx methods considered here, as well as their corrections (e.g., DH+), were parametrized using systems similar in character to those in the S22×5, S66×8, and JSCH data sets. In addition, as suggested by the scatter plots in Figure 3, some of the larger errors of the PMx methods are associated with specific systems for which they give idiosyncratically poor results. In particular, the PMx methods supply problematic results for bromobenzene···trimethylamine and systems containing HF, i.e., HF dimer, HF···methane, HF···methanol, and HF···methylamine. Accordingly, omitting these problematic cases significantly improves the RMSE of the PMx methods by 0.8–1.8 kcal/mol for the A24 and X31×10 data sets and by 0.2–0.7 kcal/mol across the full reference collection of data sets, as shown in Table 4. The bromine–nitrogen problem, as found here in the bromobenzene···trimethylamine system, is a known issue for the PM6 method and is improved by the halogen (“X”) correction for PM6 or by going to the PM7 method. However, we have not found previous comments on the issue for HF, and we are not aware of a correction for it. The fact that HF is problematic for all of the PMx methods is evident from the fact that the corresponding RMSE values for the A24 data set, which lacks the bromobenzene···trimethylamine system, improve by 0.5–0.9 kcal/mol when only the HF systems are omitted. When the bromobenzene···trimethylamine and HF-containing systems are omitted from the PMx results, their RMSE values across all systems fall to 1.4–1.6 kcal/mol (Table 4, bottom row, right). However, this improvement does not significantly change their position in the rankings in Table 2.

Table 4

Evaluation of PMx Methods with and without Problematic Dimer Casesa

	original				no bromobenzene trimethylamine or HF
	PM6-DH+	PM6-DH2	PM6-DH2X	PM7	PM6-DH+	PM6-DH2	PM6-DH2X	PM7
A24	1.86	1.83	1.83	2.31	0.88	0.82	0.82	0.93
X31×10	3.68	3.69	3.03	2.69	1.88	1.92	2.26	1.72
all	2.16	2.20	1.89	1.76	1.44	1.48	1.59	1.52

Errors are presented as RMSE values, in kcal/mol.

Errors are presented as RMSE values, in kcal/mol. No single PMx method emerges as the most reliable from these data. For example, although PM7 has a slightly better RMSE across the entire data collection than the corrected PM6 methods, it is not clear how significant this difference is, as its relative performance is quite inconsistent across the separate data sets (Figure 2). The PM6-DH2 and PM6-DH2X methods are equivalent for all systems except those containing halogens and thus produce identical results for S22×5, S66×8, Ionic, SCAI, and JSCH data sets. As expected, using PM7 or applying the “X” correction provides significant improvement in RMSE for the halogen-containing X31×10 data set. However, removing the specific problem systems mentioned in the prior paragraph essentially eliminates the advantage of these more advanced methods. The utility of the “X” correction, which is specifically designed to improve the treatment of halogens, may be examined more closely by comparing the various PMx methods for the full X40×10 data set, for which all systems contain at least one halogen atom, and which includes both equilibrium and nonequilibrium distances. PM6-DH2X is more accurate than PM6-DH2 at all distances, except that the “X” correction generates a particularly large error (22.6 kcal/mol RMSE) for dimers at 80% of their equilibrium distances, as shown Table S2 of the Supporting Information. Interestingly, when the iodine-containing systems are omitted, to create the X31×10 subset of X40×10, the “X” correction yields improved or equal results at all distances, and the anomalously high error at short range is absent. Thus, although the “X” correction yields an overall improvement, it seems problematic for the particular case of short-ranged interactions involving iodine. The PM7 method lacks this short-range anomaly but is somewhat less accurate for the iodinated compounds at longer ranges.

DFT with and without Dispersion Corrections

The DFT methods that incorporate some treatment of dispersion show good overall performance, with RMSE values ranging from 0.52 to 0.83 kcal/mol. In contrast, uncorrected B3LYP yields a rather large RMSE of 3.76 kcal/mol, and its largest errors are associated chiefly with dispersive systems (red in Figure 2), for which the method underestimates the attractive forces. Supplementing B3LYP with attractive D3 dispersion corrections markedly improves the overall RMSE across all test systems to 0.68 kcal/mol with D3 and to 0.55 kcal/mol with D3BJ. For the B3LYP functional, the BJ-damped version of the D3 correction typically produces results closer to reference compared with zero-damped D3. Interestingly, there is no marked improvement going from the two-body D2 correction to the three-body D3 correction in the context of the B97-D method, even for the larger systems in the JSCH and SCAI data sets, where three-body contributions are expected to be more important. Furthermore, while B97-D3BJ has a lower overall RMSE across all systems compared with B97-D3 (0.65 kcal/mol compared to 0.79 kcal/mol), the former produces higher RMSE values for the Ionic and SCAI data sets. Thus, it is difficult to gauge the benefit of applying BJ-damping over zero-damping for B97-D. The uncorrected M062X method performs equally well for electrostatic and dispersive systems (Figure 3), but its accuracy appears to be slightly improved by supplementing it with the D3 dispersion term (Figure 2). The ωB97X-D method includes its own dispersion correction distinct from D2 or D3, and this method ranks well across all the data sets. It is perhaps worth noting that the counterpoise corrections for all for DFT methods (B3LYP, B97-D, M062X, and ωB97X-D) are small, averaging 0.15 kcal/mol across all methods and systems. The mean correction rises only slightly to 0.21 kcal/mol for nonequilibrium systems at close range (90% of equilibrium separation). These corrections are small, in the sense that they are similar in magnitude to the uncertainty in the reference energies used here, as discussed above. Finally, it is worth noting that the low errors observed for D3-corrected DFT functionals in S22×5 are, perhaps, unsurprising, because the same molecules in similar geometries were included in the data set used to parametrize DFT-D3.

SAPT

The accuracy of the SAPT approach tends to increase with order, as expected, and the higher orders are comparable in accuracy to the best DFT methods (Figure 2). The trend of increased accuracy with increased order holds for all individual data sets, except Ionic, for which SAPT2 yields a lower RMSE than SAPT2+. Because SAPT2+ differs from SAPT2 by only two dispersion terms, it is interesting that the inclusion of these terms seemingly degrades accuracy here. Perhaps the excellent performance of SAPT2 for this particular data set results from a fortuitous cancellation of errors. It is worth noting that all orders of SAPT tend to overestimate attractive forces, regardless of system character, as evident from the negative MSE values in Figure 2 and by inspection of the scatter plots in Figure 3. This overestimation is particularly marked for SAPT0, suggesting the presence of a systematic error that might be mitigated by a post-calculation correction. Finally, because the SAPT energy components can be useful for tuning individual force field terms,[44] we provide in the Supporting Information the detailed SAPT2+(3) decompositions for all the dimer systems studied here.

Timing Analysis

We used the A24 data set to compare the computational speeds of the various methods. The PMx methods all finished in less than 0.02s real (wall clock) time on a single CPU, making them over 1000 times faster than the DFT or SAPT methods. The latter were timed for all A24 systems on eight dedicated CPUs. Figure 5 plots real time against system size, as measured by the number of atoms, while Figure 6 plots the trade-off between accuracy and computer time. Overall, SAPT0 is clearly the fastest approach, SAPT2+(3) is the slowest, and the DFT timings are rather similar to each other and to SAPT2+. The level of accuracy broadly correlates with computer time, except in the case of uncorrected B3LYP.

Figure 5

Scaling of calculation time with system size. Results are presented for the A24 data set, where system size is measured by the number of atoms in each dimer.

Figure 6

Trade-off between accuracy and calculation time. Accuracies are presented as RMSE across the A24, Ionic, S22×5, S66×8, and X31×10 data sets, while calculation times are averaged for the A24 data set alone. Note that the post hoc D3 dispersion corrections and the SAPT0 fitting require negligible calculation time.

Scaling of calculation time with system size. Results are presented for the A24 data set, where system size is measured by the number of atoms in each dimer. Trade-off between accuracy and calculation time. Accuracies are presented as RMSE across the A24, Ionic, S22×5, S66×8, and X31×10 data sets, while calculation times are averaged for the A24 data set alone. Note that the post hoc D3 dispersion corrections and the SAPT0 fitting require negligible calculation time. The scaling of computer time with system size was examined by fitting the timings for each method to a power model of the form t = anb, where t is real time and n is the number of atoms or electrons. The curve fits are detailed in Table S3 of the Supporting Information. As shown in Figure 5, all of the DFT methods except ωB97X-D have exponents of about 2.5 and prefactors of about 0.4. The ωB97X-D DFT method appears to scale rather differently, as its exponent and prefactor are 1.65 and 3.13, respectively. On the other hand, the R2 value of its fit to the power model is only 0.67, so its scaling behavior is not clearly defined by these data. The exponents for the SAPT methods increase with order. SAPT0 scales as the number of atoms to the power 1.49, while the SAPT2+(3) time varies as the 2.44 power. Analogous trends across the methods are observed when one fits the timings to the number of electrons in the system, rather than the number of atoms. Perhaps surprisingly, however, the R2 values of the fits are much lower, as evident in Table S3 of the Supporting Information.

Linear Scaling of SAPT0 Energy Terms

Of the methods tested here, SAPT0 is faster than all but the semiempirical PMx methods, as detailed above. The fact that it decomposes the total dimer interaction energy into seven contributions, which capture aspects of electrostatics, exchange, induction, and dispersion, provides an opportunity to try generating a fast method with improved accuracy by scaling these contributions, as detailed in Methods. Table 5 lists the means and standard deviations of the resulting scaling coefficients for the SAPT0 terms across the 1000 different training sets and the mean and standard deviations of the RMSE and R2 values when the trained coefficients are applied to the respective test sets. Most of the scaling coefficients are near unity; the term that requires the most scaling is the Eexch–disp(20) term. Scaling all terms in fSAPT0(1) produces the lowest test set RMSE, followed by scaling the dispersion terms individually in fSAPT0(2), and then by scaling the summed dispersion terms in fSAPT0(3). The fact that these results are obtained on test sets not used to set the coefficients means that the improvement in performance for the more highly parametrized models do not reflect overfitting. The accuracy of the three scaling schemes is also compared with the various QM methods in Figures 2 and 3 and Tables 2 and 3. Across all data sets, except L7, applying scaling factors to the SAPT0 terms reduces the RMSE from 1.58 kcal/mol to as low as 0.47 kcal/mol and corrects the tendency of SAPT0 to overestimate the attractive nature of the dimer interactions. Indeed, the fitted SAPT0 results approach the accuracy of the DFT methods, with the differences within the estimates of CCSD(T)/CBS basis set choice errors (above). Note that this improvement in SAPT0, through the application of simple scaling factors, incurs negligible additional computational cost, so that the fSAPT0 scaling methods provide a particularly favorable combination of efficiency and accuracy, as shown in Figure 6. Figure S1 of the Supporting Information furthermore examines the accuracy of fSAPT0, as well as the other QM methods, for the large noncovalent complexes of the L7 data set; the results are generally consistent with those obtained for the other data sets. The energy components of the fitted SAPT0 method still correlate well with the corresponding energy components calculated at the SAPT2+(3) level, as detailed in Table S4 of the Supporting Information. The good agreement suggests that the energy decomposition derived using the scaled terms is still physically meaningful.

Table 5

Linear Scaling Factors for SAPT0/aug-cc-pVTZ Energy Termsa

	fSAPT0(1)	fSAPT0(2)	fSAPT0(3)
E_elst,r⁽¹⁰⁾	1.01 ± 0.02	1.00b	1.00b
E_exch⁽¹⁰⁾	1.02 ± 0.02	1.00b	1.00b
E_ind,r⁽²⁰⁾	0.76 ± 0.08	1.00b	1.00b
E_exch–ind,r⁽²⁰⁾	0.70 ± 0.08	1.00b	1.00b
δE_HF,r⁽²⁾	1.06 ± 0.08	1.00b	1.00b
E_disp⁽²⁰⁾	0.93 ± 0.01	0.96 ± 0.02	0.76c ± 0.01
E_exch–disp⁽²⁰⁾	1.7 ± 0.2	2.1 ± 0.2	0.76c ± 0.01
test RMSE	0.66 ± 0.06	0.82 ± 0.05	0.93 ± 0.04
test R²	0.995 ± 0.009	0.993 ± 0.001	0.992 ± 0.001

Three different fitting schemes were tested: fSAPT0(1) scales all terms; fSAPT0(2) scales only the two dispersion terms, Edisp(20) and Eexch–disp(20), treated independently; and fSAPT0(3) scales only the sum of the two dispersion terms, Edisp(20) and Eexch–disp(20). The scaling factors were determined over 1000 iterations of multiple linear regression on randomly selected training subsets of the dimer systems, while RMSE and R2 were evaluated over the same iterations using test subsets comprising all dimer systems not included in the training subset. Training and test subsets were equal in size.

Not fitted.

Both dispersion terms share a single fitted coefficient.

Discussion

The present study systematically evaluates the accuracy and speed of a broad range of electronic structure methods for estimating noncovalent interaction energies. Methods spanning PMx, DFT, and SAPT were applied to over 1200 geometries of gas-phase dimers drawn from the BEGDB resource, which is tailored to probe a variety of interaction motifs relevant to biomolecules and drug-like compounds. These results offer useful guidance regarding which methods are most suitable for various types of applications where “gold-standard” CCSD(T)/CBS CP calculations are too time consuming or impractical, as now discussed. The PMx methods studied here are dramatically faster than both the DFT and SAPT approaches, and they are more readily applied to larger molecules. However, they are in general less accurate, particularly for halogenated and ionic molecules, as well as for a few types of systems with idiosyncratic results.[62] Perhaps surprisingly, none of the various PMx methods tested here are clearly superior to the others in terms of overall accuracy. The DFT methods are slower and more difficult to apply to large systems, but they can achieve high accuracy, so long as dispersion is accounted for, either implicitly, as in M062X, or via an add-on term, as in B97-D3. The performance of the SAPT approach depends strongly on the order of the SAPT expansion. The SAPT2+ and SAPT2+(3) orders span the range of accuracy seen for the dispersion-corrected DFT methods. However, while the speed of the SAPT2+ method is comparable to that of the DFT methods, the more accurate SAPT2+(3) is considerably slower. It is also worth noting that, at least in the current PSI4 software, the memory requirements of SAPT at orders higher than SAPT0 can become problematic for the larger systems examined here. The lowest order of SAPT, SAPT0, is similar in accuracy to the PMx methods but significantly slower. However, we find that a simple empirical scaling of one or more SAPT0 energy terms leads to accuracy approaching that of the best DFT methods, at far less computational cost. With further development, an empirically adjusted SAPT0 approach might provide a powerful alternative to DFT methods for the study of noncovalent interactions in larger systems. The results of this study have implications for improving the treatment of noncovalent interactions in molecular modeling, as QM calculations are used to guide the development of force fields for simulation and may even replace force fields in some applications. The more accurate DFT and DFT-D3 methods maybe most suitable for force field parametrization given their reliability and consistency across many types of molecular systems and the fact that their moderate computational cost is not a major liability for this application. Despite the high speed of the PMx methods, their lower accuracy, especially for ionic systems and halogens, along with occasional idiosyncratic performance, makes them less suitable for parametrization of force fields. However, continued development of such semiempirical methods, including training on broader data sets, remains promising. In addition, these methods may already be more accurate than typical simulation force fields, so their high speed makes them a reasonable choice for direct modeling of biomolecular systems. The higher order SAPT methods are about as accurate as DFT but are relatively slow, while SAPT0 is fast but inaccurate. Interestingly, the scaled SAPT0 method offers a promising blend of accuracy and computational speed, especially for larger molecular systems. In addition, the present scaling approach is relatively simple, and more sophisticated schemes that account for geometry and chemistry might be even more accurate at minimal computational cost.

43 in total

Review 1. Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions.

Authors: W Wang; O Donini; C M Reyes; P A Kollman
Journal: Annu Rev Biophys Biomol Struct Date: 2001

Review 2. Force fields for protein simulations.

Authors: Jay W Ponder; David A Case
Journal: Adv Protein Chem Date: 2003

Review 3. Empirical force fields for biological macromolecules: overview and issues.

Authors: Alexander D Mackerell
Journal: J Comput Chem Date: 2004-10 Impact factor: 3.376

4. Unexpectedly strong energy stabilization inside the hydrophobic core of small protein rubredoxin mediated by aromatic residues: correlated ab initio quantum chemical calculations.

Authors: Jirí Vondrásek; Lada Bendová; Vojtech Klusák; Pavel Hobza
Journal: J Am Chem Soc Date: 2005-03-02 Impact factor: 15.419

5. Large-scale validation of a quantum mechanics based scoring function: predicting the binding affinity and the binding mode of a diverse set of protein-ligand complexes.

Authors: Kaushik Raha; Kenneth M Merz
Journal: J Med Chem Date: 2005-07-14 Impact factor: 7.446

6. Benchmark database of accurate (MP2 and CCSD(T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs.

Authors: Petr Jurecka; Jirí Sponer; Jirí Cerný; Pavel Hobza
Journal: Phys Chem Chem Phys Date: 2006-03-07 Impact factor: 3.676

7. Semiempirical GGA-type density functional constructed with a long-range dispersion correction.

Authors: Stefan Grimme
Journal: J Comput Chem Date: 2006-11-30 Impact factor: 3.376

8. Assessment of standard force field models against high-quality ab initio potential curves for prototypes of pi-pi, CH/pi, and SH/pi interactions.

Authors: C David Sherrill; Bobby G Sumpter; Mutasem O Sinnokrot; Michael S Marshall; Edward G Hohenstein; Ross C Walker; Ian R Gould
Journal: J Comput Chem Date: 2009-11-15 Impact factor: 3.376

9. Long-range corrected hybrid density functionals with damped atom-atom dispersion corrections.

Authors: Jeng-Da Chai; Martin Head-Gordon
Journal: Phys Chem Chem Phys Date: 2008-09-29 Impact factor: 3.676

10. Optimization of parameters for semiempirical methods V: modification of NDDO approximations and application to 70 elements.

Authors: James J P Stewart
Journal: J Mol Model Date: 2007-09-09 Impact factor: 1.810

12 in total

1. AMOEBA+ Classical Potential for Modeling Molecular Interactions.

Authors: Chengwen Liu; Jean-Philip Piquemal; Pengyu Ren
Journal: J Chem Theory Comput Date: 2019-06-11 Impact factor: 6.006

2. Nucleic acid reactivity: challenges for next-generation semiempirical quantum models.

Authors: Ming Huang; Timothy J Giese; Darrin M York
Journal: J Comput Chem Date: 2015-05-06 Impact factor: 3.376

Review 3. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications.

Authors: Anders S Christensen; Tomáš Kubař; Qiang Cui; Marcus Elstner
Journal: Chem Rev Date: 2016-04-13 Impact factor: 60.622

10. Cation-π interactions in protein-ligand binding: theory and data-mining reveal different roles for lysine and arginine.

Authors: Kiran Kumar; Shin M Woo; Thomas Siu; Wilian A Cortopassi; Fernanda Duarte; Robert S Paton
Journal: Chem Sci Date: 2018-01-31 Impact factor: 9.825

Quantum Mechanical Calculation of Noncovalent Interactions: A Large-Scale Evaluation of PMx, DFT, and SAPT Approaches.

Introduction

Methods

Benchmark Data Sets

Computational Methods

Evaluation of Accuracy and Computational Speed

Linear Scaling of SAPT0 Energy Terms

Results

Overview

Semiempirical PMx Methods

DFT with and without Dispersion Corrections

SAPT

Timing Analysis

Linear Scaling of SAPT0 Energy Terms

Discussion

Review 1. Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions.

Review 2. Force fields for protein simulations.

Review 3. Empirical force fields for biological macromolecules: overview and issues.

4. Unexpectedly strong energy stabilization inside the hydrophobic core of small protein rubredoxin mediated by aromatic residues: correlated ab initio quantum chemical calculations.

5. Large-scale validation of a quantum mechanics based scoring function: predicting the binding affinity and the binding mode of a diverse set of protein-ligand complexes.

6. Benchmark database of accurate (MP2 and CCSD(T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs.

7. Semiempirical GGA-type density functional constructed with a long-range dispersion correction.

8. Assessment of standard force field models against high-quality ab initio potential curves for prototypes of pi-pi, CH/pi, and SH/pi interactions.

9. Long-range corrected hybrid density functionals with damped atom-atom dispersion corrections.

10. Optimization of parameters for semiempirical methods V: modification of NDDO approximations and application to 70 elements.

1. AMOEBA+ Classical Potential for Modeling Molecular Interactions.

2. Nucleic acid reactivity: challenges for next-generation semiempirical quantum models.

Review 3. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications.

4. Attractive Interactions between Heteroallenes and the Cucurbituril Portal.

Review 5. Enhanced semiempirical QM methods for biomolecular interactions.

6. Discovering the stacking landscape of a pyridine-pyridine system.

Review 7. Quantum Chemical Modeling of Hydrogen Bonding in Ionic Liquids.

8. Conformational design concepts for anions in ionic liquids.

Review 9. Recent Progress in Treating Protein-Ligand Interactions with Quantum-Mechanical Methods.

10. Cation-π interactions in protein-ligand binding: theory and data-mining reveal different roles for lysine and arginine.