Literature DB >> 31136709

Minimally Empirical Double-Hybrid Functionals Trained against the GMTKN55 Database: revDSD-PBEP86-D4, revDOD-PBE-D4, and DOD-SCAN-D4.

Golokesh Santra¹, Nitai Sylvetsky¹, Jan M L Martin¹.

Abstract

We present a family of minimally empirical double-hybrid DFT functionals parametrized against the very large and diverse GMTKN55 benchmark. The very recently proposed ωB97M(2) empirical double hybrid (with 16 adjustable parameters) has the lowest WTMAD2 (weighted mean absolute deviation over GMTKN55) ever reported at 2.19 kcal/mol. However, refits of the DSD-BLYP and DSD-PBEP86 spin-component-scaled, dispersion-corrected double hybrids can achieve WTMAD2 values as low as 2.33 with the very recent D4 dispersion correction (2.42 kcal/mol with the D3(BJ) dispersion term) using just a handful of adjustable parameters. If we use full DFT correlation in the initial orbital evaluation, the xrevDSD-PBEP86-D4 functional reaches WTMAD2 = 2.23 kcal/mol, statistically indistinguishable from ωB97M(2) but using just four nonarbitrary adjustable parameters (and three semiarbitrary ones). The changes from the original DSD parametrizations are primarily due to noncovalent interaction energies for large systems, which were undersampled in the original parametrization set. With the new parametrization, same-spin correlation can be eliminated at minimal cost in performance, which permits revDOD-PBEP86-D4 and revDOD-PBE-D4 functionals that scale as N4 or even N3 with the size of the system. Dependence of WTMAD2 for DSD functionals on the percentage of HF exchange is roughly quadratic; it is sufficiently weak that any reasonable value in the 64% to 72% range can be chosen semiarbitrarily. DSD-SCAN and DOD-SCAN double hybrids involving the SCAN nonempirical meta-GGA as the semilocal component have also been considered and offer a good alternative if one wishes to eliminate either the empirical dispersion correction or the same-spin correlation component. noDispSD-SCAN66 achieves WTMAD2 = 3.0 kcal/mol, compared to 2.7 kcal/mol for DOD-SCAN66-D4. However, the best performance without dispersion corrections (WTMAD2 = 2.8 kcal/mol) is reached by revωB97X-2, a slight reparametrization of the Chai-Head-Gordon range-separated double hybrid. Finally, in the context of double-hybrid functionals, the very recent D4 dispersion correction is clearly superior over D3(BJ).

Entities: Chemical

Year: 2019 PMID： 31136709 PMCID： PMC9479158 DOI： 10.1021/acs.jpca.9b03157

Source DB: PubMed Journal: J Phys Chem A ISSN： 1089-5639 Impact factor: 2.944

Introduction

Large and chemically diverse standardized reference data sets play a crucial role in the validation of new approximate computational chemistry methods (not just density functional methods but also semiempirical molecular orbital methods (e.g., ref (1)), composite wave function ab initio schemes,[2−6] and machine learning-assisted approaches[7]). If these approaches are devoid of empirical parameters (such as the “non-empirical” DFT functionals PBE,[8] TPSS,[9] and SCAN,[10] as well as the ccCA,[11] W1,[5,12] and W1–F12[6] approaches), then the purpose of these data sets is only validation. If the methods include empirical parameters, however (such as in “empirical” DFT functionals, e.g., B97-1,[13] HCTH,[14] BMK,[15] M06,[16] MN15,[17] and many others), then such data sets take on the additional role of “training sets” or parametrization sets. In the earliest days, small sets of experimental data were used for this purpose, e.g., for B3LYP[18] and EDF-1;[19] as the practical limitations of this approach became apparent, Handy[14,20] pioneered the use of high-level wave function ab initio data for the same purpose. This approach has perhaps been taken furthest by the Head-Gordon group in their combinatorially optimized ωB97X-V,[21] B97M-V,[22] ωB97M-V,[23] and ωB97M(2)[24] functionals. In Perdew’s “Jacob’s Ladder” metaphor,[25] Hartree theory represents the “Earthly vale of tears” and the introduction of each new type of information one more rung on the “Jacob’s Ladder” ascending to the Heaven of chemical accuracy. The first rung corresponds to the local density approximation, where the XC (exchange–correlation) functional only depends on the density ρ. The reduced density gradient is introduced on the second rung, leading to the various GGA (generalized gradient approximation) functionals. The introduction of the Laplacian (or the kinetic energy density, which contains similar information) creates the third rung, the meta-GGAs. The fourth rung introduces dependence on the occupied orbitals;[26] the most important special cases are the different types of hybrid functionals. Finally, the fifth rung corresponds to dependence on virtual orbitals, such as double-hybrid functionals. The term “doubly hybrid” was first coined to denote the linear combination of GGA correlation and MP2 correlation from HF orbitals,[27] but since the landmark paper of Grimme,[28] the term “double hybrid” has come to refer exclusively to the admixture of (meta)GGA DFT correlation with GLPT2 (second-order Görling–Levy[29] perturbation theory) correlation from hybrid (meta)GGA DFT orbitals. In the first step of a double-hybrid calculation, a Kohn–Sham calculation is carried out for a given semilocal exchange–correlation (XC) functional with a fraction c′ of Hartree–Fock exchange and (1 – c′) of XC exchange, plus XC correlation damped by a factor c′C,DFA. With the converged Kohn–Sham orbitals at the end, the total energy is then evaluated in the second step aswhere EN1e stands for the nuclear repulsion and one-electron energy term; Ex,HF is the Hartree–Fock exchange energy, and cx the fraction of Hartree–Fock-like exchange energy; Ex,XC and Ec,XC are the exchange and correlation energies, respectively, for the given semilocal XC functional with the converged density from the first step, and cc,XC is the fraction of DFT correlation energy; E2ab and E2ss are the opposite-spin and same-spin MP2-like energies obtained in the basis of Kohn–Sham-like orbitals from the first step, and c2ab and c2ss are the linear coefficients for the same; and, finally, Edisp is the dispersion energy obtained from a given empirical dispersion model (e.g., D2,[30] D3(0),[31] D3(BJ),[31,32] or very recently,[33,34] D4) or nonlocal dispersion functional (such as VV10[35]), which can itself be dependent on scaling or shaping parameters. In much of the present work, Edisp is proportional to a prefactor s6. In the original Grimme approach,[28]c′x = cx, c′c,XC = cc,XC, and c2ab= c2ss; the last constraint in practice restricts[36,37] the choice of correlation functionals to LYP. In the DSD (Dispersion-corrected, Spin-component-scaled Double hybrid[38]) functionals of Kozuch and Martin, the constraint c2ab = c2ss is relaxed; this was found[36,37] to enable a broader variety of exchange–correlation functionals, with DSD-PBEP86 the best performer at that point. In the GMTKN55 benchmark paper, the best two performers were DSD-PBEP86 and DSD-BLYP, followed by B2GP-PLYP.[39] In the XYG3 approach of Xu and co-workers,[40]c′c,XC = 1 ≠ cc,XC and c′x may differ from cx; the implication of this choice has been discussed at length elsewhere.[41−43] With two exceptions (namely, ωB97M(2)[24] and xrevDSD-PBEP86-D4, see below), functionals of this form are not discussed in this paper. The special case c′x = cx in a DSD context has been denoted xDSD.[42] Double-hybrid DFT has been reviewed by Goerigk and Grimme,[44] by Sancho-Garcia and Adamo,[45] and by Xu and co-workers.[40,46] An extensive comparative study between nonempirical[47] and semiempirical (e.g., refs (28), (36), (37), (39), and (48−50)) double hybrids was recently made by Goerigk and co-workers,[51] who found semiempirical functionals (at present) to be more accurate and more robust. For some recent perspectives on “the functional zoo” (Perdew’s term), see, e.g., refs (52−54). Perhaps the two most extensive and chemically diverse validation data sets around are GMTKN55 (General Main-group Thermochemistry, Kinetics, and Noncovalent interactions, 55 problem subsets) of Goerigk, Grimme, and co-workers,[55] which has about 1500 nonredundant reaction energies and barrier heights, and the even larger MGCDB84 (Main-Group Chemistry DataBase, 84 problem subsets) of Mardirossian and Head-Gordon,[53] which has close to 5000 such nonredundant energy differences. These databases themselves incorporate and extend upon earlier work by these authors themselves (e.g., refs (41) and (56)), by the Minnesota group (refs (54) and (57) and references therein), by the Hobza group (particularly for noncovalent interactions[58−61]), and by the present research team (e.g., refs (39) and (62−73)) Such large and unwieldy reference data sets have themselves inspired the statistical search for representative subsamples that would recover most of the variation in the underlying data set yet be much easier to handle, particularly as training sets where re-evaluation of the whole data set at every parameter combination would quickly become unwieldy. To the authors’ knowledge, the first such study was ref (74); the two most recent ones are MG8 by Chan,[75] a 60-reaction subset of MGCDB84 obtained through lasso regularization, and “Diet-GMTKN55” by Gould,[76] the latter of which proposes 30-, 100-, and 150-reaction “Diet” versions of GMTKN55. Aside from “rapid prototyping”, these could in principle serve as training sets for empirical functionals, with the full data set then used for validation purposes. Our explorations on the suitability of such reduced training sets for functional development will be discussed elsewhere. In the present paper, we focus instead on the full GMTKN55 benchmark as being sufficiently large and chemically varied that parametrization and validation against it is largely immune to sample bias. To the best of our knowledge, the present work is the first paper in which the full GMTKN55 data set is used as a training set for DFT functionals, although we are also validating some new functionals not covered in the original GMTKN55 paper (for technical reasons). We will show below that: the most accurate functional that does not entail the fifth rung of Perdew’s “Jacob’s Ladder”[25] is the combinatorially optimized, range-separated, hybrid meta-GGA ωB97M-V, again by the Berkeley group;[23] if the search is widened to fifth-rung options, the combinatorially optimized, range-separated, double-hybrid ωB97M(2) by Mardirossian and Head-Gordon[24] is at present the most accurate functional available for general main-group chemistry; this having been said, reparametrized versions of DSD-BLYP-D3(BJ)[38] and DOD-PBEP86-D3(BJ)[36,37] fitted to GMTKN55 come quite close in performance with just one-third the number of empirical parameters; replacing the D3(BJ) dispersion correction by the more modern, partial-charge dependent D4 model significantly enhances performance; the xrevDSD-PBEP86-D4 model affords a statistically equivalent WTMAD2 to ωB97M(2), as does its xrevDOD-PBEP86-D4 variant, which is amenable to reduced-scaling MP2 implementations; if one eschews empirical dispersion corrections, then the noDispSD-SCAN63 functional proposed in the present work offers the best performance; while performance over GMTKN55 is markedly improved from the original versus the refitted DSD functionals, performance for small-molecule atomization energies and barrier heights is barely affected (the improvements are seen in large-molecule isomerization and reaction energies where there is an important dispersion component); and this presents a cautionary tale about “overfitting” to small and insufficiently diverse reference samples.

Computational Methods

Reference Data

As our primary parametrization and validation set, we used the comprehensive GMTKN55 benchmark[55] of Goerigk, Grimme, and co-workers. This set, itself a further expansion and update of earlier GMTKN[56] and GMTKN30[41] data sets, is a composite of 55 chemical problem types, ranging from small-molecule thermochemistry and barrier heights to large-molecule isomerization energies, noncovalent interactions, conformational equilibria, self-interaction errors, heavy p-block chemistry, ion chemistry, ... intending to cover all aspects of main-group chemistry. Their reference data had been compiled from high-level ab initio benchmark studies in the literature, supplemented by some new benchmark calculations of their own. A detailed breakdown of the 55 subsets (and full source information for the original reference data) can be found in Table S1 in the Supporting Information; suffice to say a full evaluation entails 2459 electronic structure calculations for 1499 chemical energy differences. The reference geometries, charge and multiplicity information, and reference data were extracted from the ACCDB database of Morgante and Peverati.[77] While initial runs were made with the help of the Snakemake[78] workflows defined as part of ACCDB, once we had a full set of input files, we elected to use our own scripting. Data analysis was carried out using a Fortran program developed in-house and available on request from the authors. The primary metric and “objective function” employed is the WTMAD2 (weighted mean absolute deviation, type 2) as defined by Goerigk et al.[55] It seeks to compensate both for the different energy scales various properties are on and for the different sizes of the various subsets:in which |ΔE| is the mean absolute value of all the reference energies for subset i, N is the number of systems in the subset, and MAD represents the mean absolute difference between calculated and reference reaction energies for subset i. We note that MAD is a more “robust statistic”[79] than the root-mean-square deviation, in the statistical sense that MAD is less prone to hypersensitivity to one or a few “outlier” points than the RMSD (root-mean-square deviation), even as the latter is more useful for spotting “troublemakers” for the exact same reason.

Electronic Structure Details

Reference geometries were used “as is” and not optimized further. The Weigend–Ahlrichs def2-QZVPP basis set[80] was used for most systems, except for the subsets WATER27, RG18, IL16, G21EA, and AHB21 where we used the diffuse-function augmented def2-QZVPPD instead,[81] and the large-molecule isomerization subsets C60ISO and UPU23, where we compromised on the def2-TZVPP basis set for reasons of computational cost.[80] All calculations were carried out using Q-CHEM 5.1.1[82] running on the ChemFarm HPC cluster of the Weizmann Institute Faculty of Chemistry. For GGAs and double hybrids derived from them, initial calculations employed the SG-2 integration grid,[83] which is a pruned (75 302) grid roughly comparable to the (Grid = Fine) in Gaussian; the notation stands for the direct product of a 75-point Euler-Maclaurin radial quadrature[84,85] and a 302-point Lebedev angular grid (see ref (86) and references therein). For meta-GGAs and double hybrids derived from them, we employed the larger SG-3 grid, which is a pruned (99 590) grid roughly comparable to (Grid = UltraFine) in Gaussian; ultimately, we also recalculated the GGAs and double-hybrid GGAs from them. As can be seen in the Supporting Information (Table S12), the switch to an SG-3 grid makes a major difference for the RG18 rare gas complexes subset, and a minor but nontrivial one for the anionic subsets. For the SCAN (strongly constrained and appropriately normed[10]) meta-GGA, which exhibits a well-known[87] integration grid hypersensitivity, after some experimentation we decided on an unpruned (150 590) grid, which for a subset we checked for convergence against an even larger (200 974) grid. The combinatorially optimized range-separated hybrid (RSH) GGA ωB97X-V,[21] its RSH meta-GGA successor ωB97M-V[23] (and its meta-GGA ancestor B97M-V[22]), and finally, its very recent double-hybrid spinoff ωB97M(2) were evaluated using their respective implementations in Q-CHEM,[24] as was the older range-separated double-hybrid GGA ωB97X-2(TQ) of Chai and Head-Gordon.[88] In the GMTKN55 paper, all electrons were correlated in the MP2-like steps of the double hybrids. While the def2-QZVPP basis set used there and in the present work is not really suitable for core–valence correlation, we have calculated statistics both with and without frozen inner-shell orbitals. In both cases, however, we have elected to correlate the subvalence electrons of the metal and metalloid atoms in subsets MB16–43, HEAVY28, HEAVYSB11, ALK8, and ALKBDE10 sets, as the core–valence gaps with default frozen orbital settings are smaller than 1 hartree. Indeed, for alkali and alkali earth oxides and halides, subvalence (n – 1)p orbitals may otherwise intrude into the valence band and thus result in nonsensical dissociation energies with standard frozen-core settings, as discussed at length in, e.g., refs (89) and (90). Initially, we unfroze all subvalence electrons in HEAVY28 and HAL59. (The importance of subvalence correlation for halogen-bonded species has been shown previously[63] for the X40x10 data set.[60]) However, while the added binding energy from (1s) orbitals of CNOF elements, on the order of 0.3 kcal/mol, might be negligible compared to covalent bond dissociation energies, it represents a nontrivial percentage of the small reaction energies in these sets, which owing to the weighting of subsets by inverse mean absolute reaction energies in WTMAD2 causes an increase of up to 0.15 kcal/mol. (As shown in ref (38), double-hybrid parameters optimized with core correlation have slightly different coefficients to compensate for the extra correlation energy.) We have hence, in these systems, manually unfrozen the (n – 1)spd orbitals of the heavy elements but not any deeper core orbitals such as the (1s) of first-row elements. For the sake of comparison and completeness, we also evaluated WTMAD2 in the exact same manner for some additional functionals on rungs 2–4. They were the GGAs PBE,[8] revPBE,[91] and B97-D3(BJ);[31] the meta-GGAs TPSS,[9] revTPSS,[92] SCAN,[10] the global hybrid GGAs B3LYP,[18,93] PBE0,[94] and its analog revPBE0;[94] the global hybrid meta-GGAs PW6B95,[95] M06,[96] M06-2X,[96] SCAN0,[97] and MN15.[17] With the exception of M06 and M06-2X, all these functionals were used in conjunction with D3(BJ) corrections; parameters published in ref (55) were used throughout. For M06, D3(0) was used instead as recommended in ref (41), where it was found that D3(BJ) parameter optimizations for M06 and M06-2X diverge; for M06, we employed D3(0) instead as per their recommendation, while for M06-2X, we found that D3(0) in fact yields slightly worse WTMAD2 than the uncorrected functional. While the difference is statistically insignificant, we used the uncorrected M06-2X instead according to lex parsimoniae (a.k.a. Occam’s Razor).

Optimization Details

The BOBYQA (Bound Optimization BY Quadratic Approximation) derivative-free constrained optimizer[98] by Powell was used as the core of a computer program and collection of shell scripts developed in-house. A DSD double hybrid, if fully optimized, has six empirical parameters: the fraction of HF exchange cX,HF (the fraction of semilocal DFT exchange is always cX,DFT = 1 – cX,HF) the fraction of semilocal DFT correlation cC,DFT the fraction of opposite-spin second-order GLPT correlation energy c2ab the fraction of same-spin second-order GLPT correlation energy c2ss= c2aa+bb the prefactor s6 for the D3(BJ) empirical dispersion correction[31,32,99] the length scale parameter a2 for the D3(BJ) damping function (as in refs (36) and (37),we are setting a1 = 0; we are also setting s = 0 as in refs (36) and (37) and in the SCAN-D3(BJ)[87] paper) For a given pair of values for (a, b), it is possible to obtain the optimal group of (c–f) parameters without re-evaluating any electronic structure calculations, simply by extracting individual energy components from the electronic structure calculations, evaluating total energies and hence WTMAD2 for a given combination of {c2ab, c2ss, s6, a2}, and minimizing WTMAD2 with respect to these four parameters using BOBYQA. This could then constitute an inner “microiteration” loop, while the outer “macroiteration” loop consists of varying {cX,DFT, cC,DFT} and rerunning all 2459 calculations with the new parameters. We considered, however, placing one or both variables in the microiteration loop, with the optimum values from the microiterations to be used in the macroiterations, and so forth until “self-consistency” has been reached. While the coupling between (a) and (c, d) proved too strong for this to be viable for cX,DFT, we found that for a fixed value of cX,DFT, convergence of cC,DFT to two decimal places or better typically does not require more than two macroiterations. Hence, we have adopted the practice of microiterating {cC,DFT, c2ab, c2ss, s6, a2} at every macroiteration by means of BOBYQA. In fact, with full microiteration cycles, additional macroiterations beyond the first typically do not have significantly improve performance unless the starting guess is especially poor, and the output of the first cycle is reported. The D3(BJ) corrections were computed for a finely spaced grid in a2 using the standalone DFTD3 program by Grimme and co-workers (https://www.chemie.uni-bonn.de/pctc/mulliken-center/software/dft-d3/). Values for intermediate a2 were obtained by interpolation. We found, however (see below), that if s6 is part of the microiterations, then the WTMAD2 surface is sufficiently flat in a2 that fixing a2 at semiarbitrary values both stabilizes the optimization and has negligible effect on the final WTMAD2. For a DOD double hybrid, c2ss = 0, leaving just four parameters cC,DFT, c2ab, s6, a2} for the microiterations, while for a DSD-noDisp, s6 = 0 and a2 is irrelevant, leaving just three parameters {cC,DFT, c2ab, c2ss} in the inner loop.

Results and Discussion

WTMAD2 performance metrics over the GMTKN55 data set are given in Table . Results for a large number of GGA, meta-GGAs, and hybrid functionals have been given in the GMTKN55 paper[55] and will not be repeated here. We have repeated the calculations for B3LYP, PBE0, and ωB97X-V as a sanity check; in addition, we have evaluated the B97M-V and ωB97M-V functionals, which were not considered in the original GMTKN55 paper but were covered in a recent follow-up study by Najibi and Goerigk (NG),[100] and finally the ωB97M(2) double hybrid, for which no GMTKN55 statistics have yet been reported.

Table 1

WTMAD2 Values (kcal/mol) for Various Functionals Using the Full GMTKN55 Databasea

SCAN0-2 can also be written DSD79-SCAN, DSD69-SCAN as DSD-SCAN-QIDH, and DSD55-SCAN as DSD-SCAN-CIDH.

SCAN0-2.[97]

SCAN0-QIDH.[97]

SCAN0-DH.[97]

a2 = 7.9042. a1 = s8 = 0 (this work).

SCAN0-2 can also be written DSD79-SCAN, DSD69-SCAN as DSD-SCAN-QIDH, and DSD55-SCAN as DSD-SCAN-CIDH. SCAN0-2.[97] SCAN0-QIDH.[97] SCAN0-DH.[97] a2 = 7.9042. a1 = s8 = 0 (this work). As very recently found by NG, the switch from a combinatorially optimized range-separated hybrid GGA (coRSH GGA) in ωB97X-V to a coRSH meta-GGA in ωB97M-V represents a clear improvement over ωB97X-V; we find that WTMAD2 goes down from 3.96 to 3.29 kcal/mol. This latter figure is the lowest WTMAD2 reported thus far for a hybrid functional; ωB97X-V was the previous best contender in the original GMTKN55 paper. Breakdown by components (Table S11) reveals conspicuous accuracy gains for the pericyclic reaction barriers (BHPERI), for bond separation reactions of saturated hydrocarbons (BSR36), and large-system reaction energies more generally. By way of data reduction, we may in fact consider the sums of WTMAD2 contributions for each of the five major subcategories in GMTKN55: thermochemistry, intermolecular interactions, conformers, barrier heights, and reaction energies for large systems. Like NG earlier, we find B97M-V and ωB97M-V to be “best in class” on rungs 3 and 4, respectively, of the Jacob’s Ladder. It then becomes apparent (Table ) that the chief gain for ωB97M-V over ωB97X-V is in fact for thermochemistry and large-system reaction energies. There is a small improvement for barrier heights, but no change for intermolecular interactions and in fact a slight deterioration for conformers; upon further inspection, the latter can be attributed primarily to the triand tetrapeptide conformers (PCONF21). The already excellent statistic for ωB97M-V can be brought down even further to 2.19 kcal/mol with the coRSH mGGA double-hybrid ωB97M(2). This improvement is actually seen for all subcategories across the board. The WTMAD2 value for ωB97M(2) is, by some distance, the lowest reported for any functional thus far; the best three performers from the original GMTKN55 paper,[55] DSD-BLYP-D3(0), DSD-BLYP-D3(BJ), and DSD-PBEP86-D3(BJ), clocked in at 3.00, 3.08, and 3.14 kcal/mol, respectively, while in the followup paper, NG found DSD-PBEP86-NL to be even better at 2.84 kcal/mol. In that same paper, the runners-up were ωB97X-2(TQ)-D3(BJ) and DSD-BLYP-NL at 2.97 and 3.05 kcal/mol, respectively. In the present paper, we found 3.06 for ωB97X-2(TQ)-D3(BJ); detailed analysis revealed that essentially the whole difference can be attributed to the larger def2-QZVPPD basis set used in RG18, which has a disproportionately large weight factor owing to the small average reaction energy in the denominator. (Note that using grids smaller than SG-3 for RG18 causes errors in WTMAD as large as the first decimal place!) ωB97M(2) was parametrized for frozen subvalence orbitals; if correlation from such orbitals were to be included, c2ab = c2ss for this functional would have to be slightly reduced to compensate.[38]Indeed, when we evaluated WTMAD2 with all orbitals correlated and original parametrization, we saw an increase to 2.36 kcal/mol.

Table 2

WTMAD2 Contribution (kcal/mol) for Each of Five Major Subcategories in Cases of B97-D3(BJ), B3LYP-D3(BJ), B97M-rV, ωB97X-V, ωB97M-V, and ωB97M(2) Functionals

	ΔWTMAD2 (kcal/mol)
subcategories	B97-D3(BJ)	B3LYP-D3(BJ)	B97M-rV	ωB97X-V	ωB97M-V	ωΒ97Μ(2)
intermolecular interactions (intermol)	1.238	1.238	0.838	0.578	0.565	0.492
conformers/intramolecular (conformer)	1.542	1.147	1.810	0.729	0.897	0.578
barrier heights (barrier)	1.733	1.141	1.008	0.561	0.454	0.258
thermochemistry (thermo)	1.817	1.314	1.194	1.02	0.73	0.442
large-species reaction energies (REAClarge)	2.278	1.662	1.535	1.07	0.64	0.418
total WTMAD2	8.607	6.503	6.384	3.959	3.286	2.187

Sensitivity to the Percentage of Hartree–Fock Exchange: DSD-SCAN as a Case Study

The costliest parameter to vary in the refit of a DSD functional would be the percentage of Hartree–Fock exchange. As already shown in Figure 1 of ref (37), only minimal changes in performance statistics result from varying the fraction of HF-like exchange cX,HF of a DSD functional within a fairly broad range, but a relatively small training set was sampled there. Presently, we will consider the case of DSD-SCAN-D3(BJ) (where x stands for the percentage of HF exchange) in more detail; for the evaluation points, we have chosen cX,HF = n–1/3, where n = 2, 2.5, 3, 3.5, 4, 5, 6, and 8. (In the “nonempirical” double hybrids of Adamo and co-workers, these choices would correspond to c2ab = c2ss = 1/n, and cC,DFT = 1 – 1/n owing to the putative cubic dependence of the integrand.[101,102]) For the cases of n = 3, 3.5, and 4, this yields values close to percentage points 0.69, 0.66, and 0.63, respectively, and we elected to round off to these latter values. The WTMAD2 for these trial functionals, with orbitals where cC,DFT was fixed at 0.50 during the iterations (but not during the linear optimization), is depicted in Figure .

Figure 1

Dependence on the fraction of HF exchange x/100% of WTMAD2 over the GMTKN55 data set for the dispersion-corrected, spin-component-scaled, double-hybrid DSD-SCAN-D3(BJ), as well as the constrained versions DOD (i.e., c2ss = 0), noDispSD (i.e., s6 = 0), and noDispOD (i.e., s6 = c2ss = 0). As can be seen there, these can be fit excellently (R2 = 0.998 or better) by a simple parabola: for the DSD-SCAN-D3(BJ) curve (black), the minimum is at cX,HF = 0.683 and the quadratic coefficient is 11.26. The latter implies a remarkably weak sensitivity of WTMAD2 to the percentage of HF exchange: it will vary by just 0.01 kcal/mol from 0.65 to 0.71, by 0.02 kcal/mol over a range from 0.64 to 0.725, and by just 0.05 kcal/mol between 0.62 and 0.750. What this implies is that nonlinear optimization for a specific minimum cX,HF becomes a somewhat academic exercise; one can choose any sensible fixed value in those ranges, such as 0.69 or 0.66 (or, if one prefers, 3–1/3 or 3.5–1/3). The DOD-SCAN-D3(BJ) curve (blue), where c2ss = 0 throughout, is only somewhat less flat, with a minimum at x = 0.66. We note that where x < 0.63 (the crossing point between DSD and DOD WTMAD2 curves), an unconstrained DSD fit actually has a negativec2ss; if c2ss were constrained to be non-negative, then DSD would follow the blue line left of the crossing point. If the empirical dispersion correction is eliminated (which we denote here by noDispSD-SCAN), the curve does become a little steeper and the minimum shifts up to x = 0.75. If we in addition constrain c2ss = 0 (the yellow noDispOD-SCAN curve), we pay a relatively modest accuracy premium, but this is specific to the underlying SCAN semilocal functional. We will address this point further below. Of course, a single global performance metric such as WTMAD2 does not tell the whole story. A breakdown by components is given in Tables S2–S10. A number of these subsets are essentially indifferent to the fraction of HF exchange (particularly the noncovalent interaction sets), while others prefer small HF exchange (e.g., DC13 = 13 “difficult cases”), yet others (e.g., SIE = self-interaction error) will prefer large HF exchange by design, and a number of the thermochemistry subsets have clearly defined minima. Following the original GMTKN55 paper and the previous section, we can partition WTMAD2 into the same five primary components. A plot of these as a function of cX,HF is given in Figure .

Figure 2

Dependence on the fraction of HF exchange x/100% of WTMAD2 over the GMTKN55 data set for the dispersion-corrected, spin-component-scaled, double-hybrid DSD-SCAN-D3(BJ), as well as of the five major subdivisions thereof and of an objective function similar to that in ref (37) (“RMS4”). We see there that of these five primary subsets, the two noncovalent interaction sets (intramolecular and intermolecular) display some (opposite) variation, but their average NCI/2 is remarkably insensitive to the percentage of HF exchange. The greatest variation is seen for thermochemistry, where a clear “valley” exists, which is, however, nearly flat between 0.66 and 0.74. For barrier heights, error goes down slowly but monotonically as cX,HF goes up. Finally, for the large reaction energies, the profile stays fairly flat but error starts increasing for the largest values. The combination of these factors creates something of a “Goldilocks zone” between 0.65 and 0.70, or even between 0.62 and 0.74, as we noted earlier; for instance, while 0.69 yields slightly better thermochemistry and barrier heights than 0.66, this is offset by increased errors in the large-system isomerizations. The GMTKN55 minimum is even more shallow than what we previously found[37] in the DSD paper. We do note that, instead of weighted MADs, paper used unweighted RMSDs, which tends to amplify differences; moreover, it focused on a training set with just six subsets. Two of these were transition metal reaction prototypes; the remaining four are identical or similar to the W4–11, BH76, S22, and MB18 subsets from GMTKN55. RMSD4, an average of the latter four’s RMSDs, which should behave quite similarly to the training set used in ref (37), is displayed as the dot-dashed black line in Figure . One indeed sees a much more pronounced variation here, as differences are neither smoothed out by the more robust (in the statistical sense) MAD averaging nor diluted by many noncovalent interaction-driven subsets. We hence conclude that we may avoid the costly nonlinear optimization of cX,HF for the other DSD-XC functionals, and that we may instead either semiarbitrarily fix cX,HF at the same value as the original[37] or choose 0.69 ≈ 3–1/3 as a sensible compromise. The evolution of the parameters in DSD-SCAN as a function of x = cX,HF can be seen in Figure S1 in the Supporting Information. Suffice here to say that, for this particular semilocal functional (SCAN) and the DSD form, dependence of all four parameters cC,DFT, c2ab, c2ss, and s6 is remarkably linear over the 0.5 ≤ x ≤ 0.8 range. This empirical behavior is somewhat at odds with the cubic dependence argued on theoretical grounds;[101,103] we will explore this in more detail in a future paper.

DSD Double Hybrids and Refits

Performance statistics for DSD functionals with original parametrization were already given in the GMTKN55 paper; differences with the values reported there are principally due to slight differences in the basis set and choices of frozen cores. In the present paper, we will use the notation, e.g., DSD-PBEP86-D3(BJ) for the original functional and revDSD-PBEP86-D3(BJ) for the present refit. WTMAD2 statistics can be found in Table . With valence electrons correlated, revDSD-PBE86-D3(BJ) and revDSD-BLYP-D3(BJ) are essentially tied at WTMAD2 = 2.42 and 2.48 kcal/mol, respectively. While this is still higher than ωB97M(2), it should be kept in mind that DSD-BLYP-D3refit and DSD-PBEP86-D3refit only entail six adjustable parameters rather than 16, rendering them less “empirical”. Of these six, cX,HF can be fixed at a semiarbitrary value; furthermore, we found here that the same is true of the damping function turnover point a2 of the D3(BJ) dispersion term. In fact, the surface is so flat in a2 that its inclusion as an optimization parameter leads to poor convergence of WTMAD2; we hence fixed a2 at semiarbitrary value of 5.5 for most functionals, 5.2 for short-ranged correlation functionals like LYP and P86, and 5.75 for the longer-ranged SCAN (Table S16). This leaves us arguably with only four true empirical parameters. It was previously noted, in the original DSD papers, that “upgrading” the underlying semilocal functional from a GGA to a meta-GGA is not necessarily beneficial, with DSD-TPSS-D3refit for instance being among the poorer performers. At first sight, this observation holds true here as well; it is also notable that DSD-PBE-D3refit does noticeably more poorly than DSD-PBEP86-D3refit and DSD-BLYP-D3refit. What the two best performers have in common are fairly “short-ranged” semilocal correlation functionals, which at long-range at least “do no harm” and leave the treatment of dispersion up to the MP2-like correlation and the D3(BJ) correction. In contrast, as noted in ref (104), the PBEc correlation functional exhibits a spurious attraction at long-range, and so does TPSS. The very recent SCAN (strongly constrained and appropriately normed[10]) meta-GGA, in contrast, exhibits much better performance in a DSD context than TPSS, second only to DSD-BLYP and DSD-PBEP86. The great improvement from the original DSD-PBEP86-D3 (WTMAD2 = 3.10 kcal/mol) to revDSD-PBEP86-D3 serves as a cautionary tale against small and idiosyncratic training sets. All four of the main-group training sets for the DSD functional are part of GMTKN55. If we were to instead consider the sum of the four WTMAD2 contributions for W4–11, RG18, S22, and BH76, we would actually find essentially no improvement from DSD-PBEP86-D3 to revDSD-PBEP86-D3; the large-system subsets are what makes the difference. Is there any benefit to be gained from correlating the inner-shell orbitals? (We have noted previously[38] that DSD-BLYP parameters fitted with and without inner-shell correlation are slightly different.) We have only considered this for a subset of double hybrids. (See also Tables S13 and S14 for sample component breakdowns.) As seen in Table , the impact of including inner-shell correlation is quite minor for both DSD and DOD, but for noDisp (see next subsection), where the empirical dispersion correction has been eliminated, including inner-shell correlation systematically improves WTMAD2 by 0.1–0.2 kcal/mol. The popular simple hybrids B3LYP-D3(BJ) and PBE0-D3(BJ) are nearly tied at WTMAD2 = 6.5 kcal/mol. Replacing PBE exchange by Weitao Yang’s revision (revPBE[91]) leads to a significant improvement to WTMAD2 = 5.43 kcal/mol for revPBE0-D3(BJ). (A detailed comparison can be found in Table S15. This functional was not considered in the GMTKN55 paper.) Intriguingly, making a similar substitution in the double hybrid to create DSD-revPBEP86-D3 does not yield improved performance (WTMAD2 = 2.66 kcal/mol, compared to 2.46 for DSD-PBEP86-D3, both without frozen cores). At the end of this section, we can report that and revDSD-PBEP86-D3(BJ) is the best performer, closely followed by revDSD-BLYP-D3.

Eliminating the Semiempirical Dispersion Correction

The presence of the D3(BJ) dispersion correction[31,32] in DSD functionals exposes them to the criticism of “mixing DFT with molecular mechanics”. We have earlier considered the option of eliminating D3(BJ) entirely; in practice, this entails an increased percentage of same-spin correlation as compensation. As can be seen in Table , all the DSD-noD3 (or, if you like, noDispSD) functionals exhibit significantly degraded performance compared to their DSD-D3 siblings. In addition, however, the ordering is upended: noDispSD-SCAN now exhibits significantly better performance than the other options. With core–valence correlation included, noDispSD-SCAN-QIDH reaches WTMAD2 = 2.84 kcal/mol, still significantly better than the original DSD-PBEP86-D3. Why, in a noDispSD-XC functional, does XC = SCAN outperform all other options? At long-range, three scenarios are possible: (a) the functional tapers away quickly (like in BLYP and PBEP86), which at least does no harm but in the absence of a D3(BJ) correction leaves PT2 to handle all the long-range dispersion effects (for which it is inadequate); (b) the functional does not decay quickly but has the wrong behavior, leading to poor performance for noncovalent interactions; (c) the functional can at least partly recover the correct behavior, in which case PT2 may be sufficient to handle the remainder. It appears that (c) is the case for SCAN. As a proxy for behavior at intermediate distance, we may consider the s8 coefficient for the r—8 term in D3(BJ): for MP2, this was found[69] to be large and negative, while for DSD double hybrids and functionals like M06, s8 was found to be zero or statistically insignificantly different from zero. We note in particular that SCAN-D3(BJ), unlike other GGAs and meta-GGA, has a fitted s8 = 0 in the D3(BJ) correction;[87] in the present work, we also fitted a D3(BJ) correction to the SCAN0 hybrid, and obtained a1 = 0, a2 = 7.9042, s8 = 0. Somewhat at odds with this argument, however, is ωB97X-2. Upon reoptimizing the prefactor for DFT correlation as well as c2ab and c2ss, we found that WTMAD2 = 3.06 dropped to 2.80 kcal/mol whether or not the D3(BJ) was included. (The a1 and a2 damping function parameters taken from NG place the correction very far out in space.) This becomes our best performer in the absence of a dispersion correction. Another approach to eliminating the D3(BJ) correction would be to replace it with a nonlocal dispersion functional, such as the Vydrov–Van Voorhis VV10[35] (also known as NL[105]) employed in the B97M-V and other Berkeley functionals. Najibi and Goerigk found[100] that DSD-PBEP86-NL (using a damping parameter b = 14.2 as suggested in ref (106)) yields WTMAD2 = 2.84 kcal/mol. We obtained VV10 corrections for DSD-PBEP86-NL using b = 12.8 (as obtained in ref (71) for the revised[71] S66x8 noncovalent interactions benchmark,[107] near the basis set limit), and then used those to fit a revDSD-PBEP86-NL functional. Its parameters can be found in Table : at WTMAD2 = 2.60 kcal/mol, this is within statistical noise from revDSD-PBEP86-D3(BJ), illustrating the ability of the DSD refitting process to overcome certain deficiencies in the noncovalent interaction description.

Table 3

Final Parameters for Revised DSD-D3(BJ) Functionals and revDSD-PBEP86-NLa

functionals	c_X,HF	c_C,DFT	c_2ab	c_2ss	s₆	a₁	a₂	WTMAD (kcal/mol)
noDispSD-SCAN₆₉	0.69	0.4409	0.6223	0.26	[0]			2.98
DOD-SCAN₆₆-D3(BJ)	0.66	0.5048	0.6283	[0]	0.3152	[0]	5.75	2.67
revDSD-BLYP-D3(BJ)	0.71	0.5313	0.5477	0.1979	0.5451	[0]	5.2	2.48
revDSD-PBEP86-D3(BJ)	0.69	0.4296	0.5785	0.0799	0.4377	[0]	5.5	2.42
revDSD-PBEP86-NL	0.69	0.435	0.5762	0.0622	0.9921	b = 14.2		2.44
revDSD-PBEB95-D3(BJ)	0.66	0.4960	0.4935	0.1009	0.3686	[0]	5.5	2.85
revDSD-PBE-D3(BJ)	0.68	0.4528	0.5845	0.0711	0.5746	[0]	5.5	2.72
revDOD-BLYP-D3(BJ)	0.71	0.5911	0.6216	[0]	0.6145	[0]	5.2	2.67
revDOD-PBEP86-D3(BJ)	0.69	0.4449	0.6055	[0]	0.477	[0]	5.5	2.47
revDOD-PBEP86-NL	0.69	0.4445	0.5994	[0]	1.0629	b = 14.2		2.46
revDOD-PBEB95-D3(BJ)	0.66	0.5225	0.5278	[0]	0.4107	[0]	5.5	2.92
revDOD-PBE-D3(BJ)	0.68	0.4641	0.6134	[0]	0.6067	[0]	5.5	2.73
revωB97X-2(TQZ)	c	0.9518	0.5123	0.4294	[0]			2.80
orig[37]DSD-BLYP-D3(BJ)	0.71	0.54	0.47	0.4	0.57	[0]	5.4	3.34
orig[37]DSD-PBEP86-D3(BJ)	0.69	0.44	0.52	0.22	0.48	[0]	5.6	3.10
orig[37]DSD-PBEB95-D3(BJ)	0.66	0.55	0.46	0.09	0.61	[0]	6.2	3.32
orig[37]DSD-PBE-D3(BJ)	0.68	0.49	0.55	0.13	0.78	[0]	6.1	3.17
DSD-PBEP86-NLb	0.69	0.44	0.52	0.22	[1.0]	b = 14.2		2.60
B2GP-PLYP-D3(BJ)[39,55]	0.65	0.64	0.36	0.36	0.56	0.2597	6.333	3.19

Original parameters, if any, are given for comparison.

Calculated using ORCA 4.1,[115] this work.

Short-range 0.6362, long-range 1.0, range separation parameter ω = 0.3.

Original parameters, if any, are given for comparison. Calculated using ORCA 4.1,[115] this work. Short-range 0.6362, long-range 1.0, range separation parameter ω = 0.3.

DOD Double Hybrids and Refit

If only opposite-spin MP2 correlation is included, then the cost scaling of the post-KS step can be reduced to ∝N4 formally by means of a Laplace transform algorithm.[108−110] In fact, Song and Martínez[111] achieved further reduction to ∝N3 using tensor hypercontraction techniques. In our previous work,[37] we have denoted such functionals DOD, short for Dispersion-corrected, Opposite-spin, Double hybrids. (As was expected and can be seen in Table , the results of trying to eliminate both same-spin correlation and the dispersion correction were dismal for most functionals, with the exception of noDispOD-SCAN74 where WTMAD = 3.3 kcal/mol. Still, this is a performance comparable with ωB97M-V, which does not require evaluation of E2ab.) The tie between revDSD-BLYP and revDSD-PBEP86 is broken in favor of the latter: Inspection reveals that revDSD-PBEP86-D3 functionals have a c2ss coefficient close to 0, which is not the case for revDSD-BLYP. (The latter is plausible when one considers that BLYP does not treat opposite-spin and same-spin correlation on the same footing; in fact, it is easily seen from eq 2 in ref (112) that the BLYP correlation energy for a fully polarized uniform electron gas is 0, which is clearly an unphysical answer.) Hence, we see that revDOD-PBEP86-D3 “pulls ahead of” revDOD-BLYP-D3in the “WTMAD2 race”, at 2.45 vs 2.66 kcal/mol (Table ). revDOD-PBEP86-NL (Table ) is essentially indistinguishable in performance from revDOD-PBEP86-D3(BJ).

Final Recommended D3(BJ) Functionals

In light of the above, we only are retaining three exchange–correlation combinations for the semilocal part: BLYP, PBEP86, and SCAN. For PBEP86, both DSD and DOD combinations are given; while DSD represents only a very small improvement over DOD, it comes at zero additional computational expense when using a code that cannot exploit reduced-scaling algorithms for opposite-spin-only MP2. (The most commonly used such codes are Gaussian 09 and Gaussian 16.) Hence, we have elected to recommend both revDSD-PBEP86-D3 and revDOD-PBEP86-D3. In view of the weak dependence of performance on the percentage of HF exchange, we have elected to retain the original percentages of 71% for DSD-BLYP and 69% for DSD-PBEP86, in order to simplify nonstandard inputs for codes such as Gaussian,[113] ORCA,[114,115] and Q-CHEM. For DSD-SCAN and DOD-SCAN, we have no such incentive; following inspection of the minima of the parabolic fits to WTMAD2, we have chosen 66% for DOD-SCAN and 69% for noDispSD-SCAN. Finally, we have noted above that, as long as the dispersion prefactor s6 is self-consistently optimized with the other parameters, WTMAD2 is only weakly dependent on the chosen turnover parameter a2. With a given fixed a2, DSD-BLYP has the largest s6, followed by DSD-PBEP86; for DSD-SCAN and DOD-SCAN, s6 is much smaller. This reflects that SCAN is better able to handle long-range effects than BLYP or PBEP86. We have hence semiarbitrarily fixed a2 at a “short” value of 5.2 for DSD-BLYP, and at a “longer” values of 5.75 for DSD-SCAN, while for DSD-PBEP86 we chose an intermediate a2 = 5.5. Comparison with full optimizations including a2 revealed that WTMAD2 differences are on the order of a few “small” calories per mole; hence, fixing these parameters is considered justifiable in light of a more smoothly converging optimization for the remaining ones. Sample input files for most major codes are given in the Supporting Information. We wish to point out that, while noDispSD-SCAN is inferior to the revDSD offerings, its WTMAD2 is still superior to the original B2GP-PLYP and DSD double hybrids, and this without any empirical dispersion correction and with just three nonarbitrary parameters.

Considering the D4 Dispersion Correction: Final Recommended D4 Functionals

As the present manuscript was being prepared for publication, a preprint[34] by Grimme and co-workers was posted on ChemRXiv, in which they propose a next-generation D4 dispersion correction (see also ref (33)). The reader is referred to these references for details; for the purpose of our discussion, the most significant difference between D3(BJ) and D4 is that the latter introduces dependence on atomic partial charges, which (by default) are evaluated using the electronegativity equalization principle.[116] (For the general theory, see refs (117) and (118) and references therein.) As a first step, we substituted the D4 correction for D3(BJ) in the original DSD functionals from ref (37) as a “drop-in replacement” using parameters optimized for these functionals and published by Grimme et al.[34] The results can be found in the third numerical column of Table and, for individual GMTKN55 subsets, in Table S13. Across the board, the WTMAD2 values are significantly better than those with the original, in the case of DSD-PBE even superior to the refitted revDSD-PBE-D3(BJ)!

Table 4

Final Parameters for revDSD-D4 Functionals and Comparison of WTMAD2 (kcal/mol) with Original Double Hybrids (D3(BJ)) and the Same with Drop-in Replacement of D3(BJ) by D4 and revD3(BJ)

	WTMAD2 (kcal/mol)				parameters
functionals	D3(BJ)	revD3(BJ)	D4	revD4	c_X,HF	c_C,DFT	c_2ab	c_2ss	s₆	s₈	a₁	a₂
DSD-PBEP86	3.099	2.422	2.649	2.332	0.69	0.4210	0.5922	0.0636	0.5132	0	0.44	3.60
With core corr DSD-PBEP86				2.307	0.69	0.4038	0.5979	0.0571	0.4612	0	0.44	3.60
DSD-PBE	3.170	2.738	2.637	2.461	0.68	0.4403	0.6025	0.0417	0.6706	0	0.4	3.6
DSD-BLYP	3.336	2.484	2.829	2.592	0.71	0.5169	0.5586	0.1972	0.6141	0	0.38	3.52
DSD-SCAN		2.662		2.635	0.66	0.4855	0.6320	0.0131	0.3203	0	0.4	3.6
DSD-PBEB95	3.325	2.845	3.109	2.700	0.66	0.4549	0.5305	0.0547	0.4707	0	0.42	2.93
xDSD-PBEP86				2.318	0.69	0.4155a	0.6023	0.0514	0.4829	0	0.44	3.60
Ditto, s8≠0				2.302	0.69	0.4135	0.6044	0.0476	0.4158	0.1096	0.44	3.60
With core corr xDSD-PBEP86				2.229	0.69	0.3986a	0.6077	0.0502	0.4200	0	0.44	3.60
with core corr xDOD-PBEP86				2.247	0.69	0.4071a	0.6261	0	0.4561	0	0.44	3.60
DOD-PBEP86				2.363	0.69	0.4323	0.6122	0	0.5552	0	0.44	3.60
DOD-PBE				2.470	0.68	0.4470	0.6181	0	0.6992	0	0.4	3.6
DOD-SCAN				2.637	0.66	0.4914	0.6344	0	0.3270	0	0.4	3.6
DOD-PBEB95				2.714	0.66	0.4653	0.5532	0	0.4915	0	0.42	2.93
DOD-BLYP				2.792	0.71	0.5619	0.6346	0	0.7105	0	0.38	3.52

During iterations, cC,DFT = 1.00 as for all xDSD functionals.

During iterations, cC,DFT = 1.00 as for all xDSD functionals. We then proceeded to reoptimize the DSD functionals in the presence of D4 and adjusting the latter’s parameters. It quickly became clear that setting s8 to 0 had negligible impact on the WTMAD2; furthermore, the other parameters settled around a1 = 0.4 and a2 = 3.6, and one could actually choose these “semiarbitrary values” across the board, leaving the same four adjustable parameters cC,DFT, c2ab, c2ss, and s6 as in the revDSD-XC-D3(BJ) cases. revDSD-PBEP86-D4 in particular shines, with WTMAD2 = 2.33 kcal/mol, quite close to the ωB97M(2) functional with its 16 adjustable parameters. But revDSD-SCAN66-D4, revDSD-PBE-D4, and revDSD-PBEB95-D4 all likewise outperform their revDSD-XC-D3(BJ) counterparts, and revDSD-BLYP-D4 marginally bests revDSD-BLYP-D3(BJ). Three of the refitted functionals have c2ss values close to 0; hence, we also performed revDOD-PBE-D4, DOD-SCAN66-D4, and revDOD-PBEB95-D4 fits in case one wants to exploit the reduced-scaling algorithms for opposite-spin-only MP2. For revDOD-BLYP-D4, there is substantial loss in performance, but revDOD-PBEP86-D4, at WTMAD2 = 2.36 kcal/mol, only sacrifices 0.03 kcal/mol compared to its DSD counterpart. Next in performance is revDOD-PBE-D4 at WTMAD2 = 2.47 kcal/mol, then followed by DOD-SCAN-D4 at 2.64 kcal/mol. Inspection of the contributions to WTMAD2 (Table S17) reveals that the improvement from revDSD-PBEP86-D3(BJ) to revDSD-PBEP86-D4 is mostly due to the intermolecular interaction components, and somewhat due to conformers. In DOD-SCAN66-D4, however, intermolecular interactions and conformers are improved about equally. While it is easy to rationalize the improvement in WTMAD2 from revDSD-D3(BJ) to revDSD-D4, it is not so obvious why the WTMAD2 of revDSD-PBEP86-D4 would be lower still than that of revDSD-PBEP86-NL. Detailed comparison reveals (Table S13) that four of the 55 subsets account for nearly all the difference: HAL59 (halogen bonding), HEAVY28 (NCI energies involving heavy p-block elements other than halogens), AMINO20X4 (Relative energies in amino acid conformers), and PNICO23 (Interaction energies in pnicogen-containing dimers). All of these have small average reaction energies and thus large weights in the WTMAD2 formula, eq , which means that small changes in the description of these four subsets have a disproportionately large impact on WTMAD2. (We note that DSD-PBEP86-D4 likewise has a small edge over DSD-PBEP86-NL.) For the remaining 51 subsets, an accuracy gain for NL in the MB16 “mindless benchmark[55,119]” is offset by a comparable loss in conformer energies for melatonin, leaving the two WTMAD2 totals within 0.10 kcal/mol of each other. While it is clear that both D4 and NL represent substantial improvements over D3(BJ), we have insufficient data to decide whether D4 is superior over NL, or conversely. Iron and Janes[120] have very recently examined the performance of hybrid and double-hybrid functionals for their newly developed MOBH35 transition metal barrier heights benchmark as well as for the older MOR41 organometallic reaction energy benchmark.[121] There, the SCAN-based functionals were found to be superior to the others for these applications, even though overall the ωB97M-V functional outperformed all double hybrids except PWPB95.[122] Detailed inspection of the double-hybrid results revealed a number of outliers for the system that exhibits some degree of static correlation; apparently, the PT2 correlation is insufficiently resilient to that. The use of dRPA (direct random phase approximation) as an alternative, as proposed by Mezei et al.,[123,124] will be explored in future work. (The use of perturbation theory higher than second order was considered by Chan and Radom[125] and found to yield essentially no performance benefit.) Results for the ωB97M(2) functional were not given in that paper. For the sake of completeness, we carried out these calculations ourselves using the def2-TZVPP basis set used in ref (120) for the other double hybrids; for MOR41,[121] we obtain MAD = 2.8 and RMSD = 3.5 kcal/mol, while for MOBH35,[120] we obtain MAD = 2.4 and RMSD = 4.0 kcal/mol. Coming back to GMTKN55, can we improve further over revDSD-PBEP86-D4, at WTMAD2 = 2.33 kcal/mol? As above, we found a minor improvement over revDSD-PBEP86-D3(BJ) when the frozen-core approximation was not made, we attempted the same here and found that revDSD-PBEP86-D4(noFC) [where noFC stands for “no frozen cores”] has a slightly lower WTMAD2 = 2.31 kcal/mol. (See Table S17 for details.) Then we attempted one more thing that also is present in ωB97M(2): we evaluated the KS orbitals with full DFT correlation, akin to the XYG3 family of functionals.[40] Previously, we have found[42] for much smaller training sets that (a) typically error metrics go through a minimum when the percentage of HF exchange in the final energy evaluation is at or near that used for determining the orbitals (leading to what we have termed[42] xDSD functionals); (b) the improvement seen from DSD to xDSD was small and its statistical significance uncertain. Hait and Head-Gordon have discussed some downsides of the xDSD and XYG3 type functionals for nonequilibrium geometries.[43] It was argued in ref (122) that the putative success of that approach results not from the effect of the unthrottled DFT correlation on the orbitals but from the smaller orbital gaps appearing in the denominators of the MP2-like terms. In the present work, we have obtained an xrevDSD-PBEP86-D4(noFC) functional fitted to the GMTKN55 data set. Parameters are given in Table : the WTMAD2 obtained, 2.23 kcal/mol, is the lowest of any functional optimized here and is statistically equivalent to the 2.18 kcal/mol of the highly empirical ωB97M(2) functional despite the much smaller number of parameters. Detailed inspection of Table S17 reveals that the improvement over the revDSD-PBEP86-D4 functional mostly comes from just one subset, namely, the radical stabilization energies in RSE43. Turning to the five major categories, ωB97M(2) outperforms xrevDSD-PBEP86-D4(noFC) for thermochemistry and is in turn outperformed for conformer energies, while there is little to choose between them for intermolecular interaction energies, barrier heights, and large-system reactions. Can we eliminate the same-spin correlation and thus enable the reduced-scaling MP2 algorithms (as well as eliminate one more empirical parameter)? As seen in Table , the WTMAD2 for xrevDOD-PBEP86-D4(noFC) is just 0.02 kcal/mol higher at 2.25 kcal/mol, We have, hence, a functional comparable to ωB97M(2) in quality that is amenable to reduced O(N4) or O(N3) MP2 scaling, unlike ωB97M(2) which has c2ab = c2ss. This may be relevant in application to larger systems than considered presently. (The largest species in GMTKN55 has “only” 83 atoms.) However, the difference in WTMAD2 between 2.23 for xrev-DSD-PBE86-D4 and 2.33 for rev-DSD-PBEP86-D4 can be argued to be comparable to the uncertainty in the GMTKN55 reference data and hence should perhaps not be given excess weight.

Harmonic Frequencies as a Sanity Check

We have previously shown[126] that for the HFREQ harmonic frequencies benchmark,[126] the DSD-PBEP86 functional performed exceptionally well (RMSD = 9.8 cm–1 with the def-QZVP basis set and a scale factor of 0.9971), unlike earlier double hybrids. As HFREQ samples a different type of information than GMTKN55, it can act as a sanity check on revDSD-PBEP86: if RMSD[HFREQ] were substantially higher than for the original DSD-PBEP86, then perhaps the improved performance for GMTKN55 would reflect improved physics not as much as the famous quip attributed to John von Neumann that “with four parameters, I can fit an elephant, and with five, I can make him wiggle his trunk”. As one can see in Figure , revDSD-PBEP86-D3(BJ) and especially revDSD-PBEP86-D4 in fact show somewhat improved performance for harmonic frequencies. If we fit scaling factors for harmonic frequencies in the same manner as in ref (126), we obtain values that are only semantically different from 1.0.

Figure 3

Contour plot of RMSD (cm–1) for the HFREQ database for DSD-PBEP86-like forms, as a function of the opposite-spin and same-spin MP2 coefficients c2ab and c2ss, respectively. The square marker indicates the original DSD-PBEP86-D3(BJ) solution, the large round marker revDSD-PBEP86-D3(BJ), and the triangular ones revDSD-PBEP86-D4 on the left and revDOD-PBEP86-D4 on the right. The slanted line only serves to guide the eye. An additional sanity check is afforded by the work of Iron and Janes[120] on transition metal barrier heights and of Efremenko and one of us[127] on a specific organometallic catalysis problem.[128] In both cases, the revDSD-PBEP86-D3(BJ) and revDSD-PBEP86-D4 functionals came considerably closer to the coupled cluster benchmark data than the original DSD-PBEP86-D3(BJ) parametrization, even though (as noted above) they do not offer the clear added value over ωB97M-V for transition metal systems that they do for main-group systems. Still, our confidence in the reparametrization against GMTKN55 is bolstered by its yielding better performance than the original both when sampling a different type of property (energy derivatives) and when sampling energetic properties for a different sector of chemical space. A reviewer inquired about best performers for individual subsets. A sorted, color-shaded table with performance metrics for the five major subsets of WTMAD2, as well as for the total, has been given in the Supporting Information as Table S18, with the top three performers for each subset marked. Said showings are largely a back-and-forth between ωB97M(2) and revDSD-PBEP86-D4. We should point out that revDSD-PBEP86-D3(BJ) entails a small enough loss in performance (much smaller than between ωB97M-V and ωB97M-D3(BJ), for instance) that it remains an attractive option, as it can be carried out with existing codes (including all of Gaussian 16, ORCA, and Q-CHEM) without source code modification and permits analytical first and second derivatives.

Conclusions

Having made an extensive survey of DSD double-hybrid (and some other) functionals with the aid of the GMTKN55 data set, we are in a position to state the following observations: The combinatorially optimized ωB97M-V is “best in class” for fourth-rung exchange–correlation functionals and approaches the performance of double hybrids like B2GP-PLYP-D3(BJ). The combinatorially optimized ωB97M(2) yields the lowest WTMAD2 metric of any functional in existence, making it best in class for double hybrids. In a DSD double-hybrid context, the very recent D4 dispersion correction is clearly superior over D3(BJ), presumably owing to the newly introduced partial charge dependence. The available data suggest that revDSD-PBEP86-D4 is slightly superior to revDSD-PBEP86-NL. While ωB97M(2) has 16 empirical parameters, refitted revDSD-PBEP86-D3(BJ) comes close in performance, while xrevDSD-PBEP86-D4 and xrevDOD-PBEP86-D4 are essentially equivalent in quality to ωB97M(2). Out of their reduced number of parameters, the percentage of HF exchange cX,HF and the damping function parameters a1 anda2 can be fixed at semiarbitrary values (as the WTMAD2 surface is fairly flat in them), leaving just four true optimization parameters for revDSD and three for revDOD. The revDOD option permits the use of reduced-scaling MP2 algorithms, which might prove useful for large systems. For the underlying semilocal functional in double hybrids, any good exchange functional appears to work well, while simple correlation functionals that rapidly “get out of the way” at long distances appear to work best (e.g., P86[129] and LYP[130]). If one wishes to avoid the D3 or D4 corrections, however, DSD-SCAN overall appears to work best. Here, the number of empirical parameters is down to four, one of which (the fraction of HF-like exchange) is semiarbitrary. The best performance is without dispersion corrections. Refitting of the DSD functionals to the GMTKN55 database very substantially improves their accuracy particularly for noncovalent interactions and large-system reactions. This serves as a cautionary tale about the use of small, idiosyncratic training sets for empirical functionals.

86 in total

1. "Mindless" DFT Benchmarking.

Authors: Martin Korth; Stefan Grimme
Journal: J Chem Theory Comput Date: 2009-03-04 Impact factor: 6.006

2. Performance of the van der Waals Density Functional VV10 and (hybrid)GGA Variants for Thermochemistry and Noncovalent Interactions.

Authors: Waldemar Hujo; Stefan Grimme
Journal: J Chem Theory Comput Date: 2011-10-25 Impact factor: 6.006

3. Semiempirical double-hybrid density functional with improved description of long-range correlation.

Authors: Tobias Benighaus; Robert A DiStasio; Rohini C Lochan; Jeng-Da Chai; Martin Head-Gordon
Journal: J Phys Chem A Date: 2008-03-05 Impact factor: 2.781

4. Density functionals with broad applicability in chemistry.

Authors: Yan Zhao; Donald G Truhlar
Journal: Acc Chem Res Date: 2008-01-11 Impact factor: 22.384

5. Optimization and basis-set dependence of a restricted-open-shell form of B2-PLYP double-hybrid density functional theory.

Authors: David C Graham; Ambili S Menon; Lars Goerigk; Stefan Grimme; Leo Radom
Journal: J Phys Chem A Date: 2009-09-10 Impact factor: 2.781

6. Double-hybrid density-functional theory made rigorous.

Authors: Kamal Sharkas; Julien Toulouse; Andreas Savin
Journal: J Chem Phys Date: 2011-02-14 Impact factor: 3.488

7. Frequency and zero-point vibrational energy scale factors for double-hybrid density functionals (and other selected methods): can anharmonic force fields be avoided?

Authors: Manoj K Kesharwani; Brina Brauer; Jan M L Martin
Journal: J Phys Chem A Date: 2014-10-21 Impact factor: 2.781

8. Communication: xDH double hybrid functionals can be qualitatively incorrect for non-equilibrium geometries: Dipole moment inversion and barriers to radical-radical association using XYG3 and XYGJ-OS.

Authors: Diptarka Hait; Martin Head-Gordon
Journal: J Chem Phys Date: 2018-05-07 Impact factor: 3.488

9. Structures and thermochemistry of calcium-containing molecules.

Authors: Naomi L Haworth; Michael B Sullivan; Angela K Wilson; Jan M L Martin; Leo Radom
Journal: J Phys Chem A Date: 2005-10-13 Impact factor: 2.781

10. The correlation consistent composite approach (ccCA): an alternative to the Gaussian-n methods.

Authors: Nathan J DeYonker; Thomas R Cundari; Angela K Wilson
Journal: J Chem Phys Date: 2006-03-21 Impact factor: 3.488

39 in total

1. Halogen Atom Participation in Guiding the Stereochemical Outcomes of Acetal Substitution Reactions.

Authors: Krystyna M Demkiw; Wouter A Remmerswaal; Thomas Hansen; Gijsbert A van der Marel; Jeroen D C Codée; K A Woerpel
Journal: Angew Chem Int Ed Engl Date: 2022-09-16 Impact factor: 16.823

2. Supramolecular cis-"Bis(Chelation)" of [M(CN)₆]^3- (M = Cr^III, Fe^III, Co^III) by Phloroglucinol (H₃PG).

Authors: Katarzyna Jędrzejowska; Jedrzej Kobylarczyk; Dorota Glosz; Emilia Kuzniak-Glanowska; Dominika Tabor; Monika Srebro-Hooper; Jakub J Zakrzewski; Katarzyna Dziedzic-Kocurek; Tadeusz M Muzioł; Robert Podgajny
Journal: Molecules Date: 2022-06-26 Impact factor: 4.927

3. Do Double-Hybrid Functionals Benefit from Regularization in the PT2 Term? Observations from an Extensive Benchmark.

Authors: Golokesh Santra; Jan M L Martin
Journal: J Phys Chem Lett Date: 2022-04-13 Impact factor: 6.475

4. The Atomic Partial Charges Arboretum: Trying to See the Forest for the Trees.

Authors: Minsik Cho; Nitai Sylvetsky; Sarah Eshafi; Golokesh Santra; Irena Efremenko; Jan M L Martin
Journal: Chemphyschem Date: 2020-03-23 Impact factor: 3.102

5. A Quadratic Pair Atomic Resolution of the Identity Based SOS-AO-MP2 Algorithm Using Slater Type Orbitals.

Authors: Arno Förster; Mirko Franchini; Erik van Lenthe; Lucas Visscher
Journal: J Chem Theory Comput Date: 2020-01-24 Impact factor: 6.006

6. Double-Hybrid DFT Functionals for the Condensed Phase: Gaussian and Plane Waves Implementation and Evaluation.

Authors: Frederick Stein; Jürg Hutter; Vladimir V Rybkin
Journal: Molecules Date: 2020-11-06 Impact factor: 4.411

7. The formyloxyl radical: electrophilicity, C-H bond activation and anti-Markovnikov selectivity in the oxidation of aliphatic alkenes.

Authors: Miriam Somekh; Mark A Iron; Alexander M Khenkin; Ronny Neumann
Journal: Chem Sci Date: 2020-10-02 Impact factor: 9.825

8. Performance of Electronic Structure Methods for the Description of Hückel-Möbius Interconversions in Extended π-Systems.

Authors: Tatiana Woller; Ambar Banerjee; Nitai Sylvetsky; Golokesh Santra; Xavier Deraet; Frank De Proft; Jan M L Martin; Mercedes Alonso
Journal: J Phys Chem A Date: 2020-03-13 Impact factor: 2.781

9. An Overview of Self-Consistent Field Calculations Within Finite Basis Sets.

Authors: Susi Lehtola; Frank Blockhuys; Christian Van Alsenoy
Journal: Molecules Date: 2020-03-08 Impact factor: 4.411

10. Rotational Spectroscopy Meets Quantum Chemistry for Analyzing Substituent Effects on Non-Covalent Interactions: The Case of the Trifluoroacetophenone-Water Complex.

Authors: Juncheng Lei; Silvia Alessandrini; Junhua Chen; Yang Zheng; Lorenzo Spada; Qian Gou; Cristina Puzzarini; Vincenzo Barone
Journal: Molecules Date: 2020-10-23 Impact factor: 4.411