Jan Řezáč, Kevin E. Riley, Pavel Hobza.
Abstract
With numerous new quantum chemistry methods being developed in recent years and the promise of even more new methods to be developed in the near future, it is clearly critical that highly accurate, well-balanced, reference data for many different atomic and molecular properties be available for the parametrization and validation of these methods. One area of research that is of particular importance in many areas of chemistry, biology, and material science is the study of noncovalent interactions. Because these interactions are often strongly influenced by correlation effects, it is necessary to use computationally expensive high-order wave function methods to describe them accurately. Here, we present a large new database of interaction energies calculated using an accurate CCSD(T)/CBS scheme. Data are presented for 66 molecular complexes, at their reference equilibrium geometries and at 8 points systematically exploring their dissociation curves; in total, the database contains 594 points: 66 at equilibrium geometries, and 528 in dissociation curves. The data set is designed to cover the most common types of noncovalent interactions in biomolecules, while keeping a balanced representation of dispersion and electrostatic contributions. The data set is therefore well suited for testing and development of methods applicable to bioorganic systems. In addition to the benchmark CCSD(T) results, we also provide decompositions of the interaction energies by means of DFT-SAPT calculations. The data set was used to test several correlated QM methods, including those parametrized specifically for noncovalent interactions. Among these, the SCS-MI-CCSD method outperforms all other tested methods, with a root-mean-square error of 0.08 kcal/mol for the S66 data set.
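The point counts quoted in the abstract follow from simple bookkeeping. The short sketch below only reproduces that arithmetic; the variable names are illustrative and not part of the database.

```python
# Bookkeeping for the database size quoted in the abstract.
# All names here are illustrative, not identifiers from the S66 database.
N_COMPLEXES = 66        # molecular complexes in the data set
POINTS_PER_CURVE = 8    # points sampled along each dissociation curve

equilibrium_points = N_COMPLEXES                  # one equilibrium geometry per complex
curve_points = N_COMPLEXES * POINTS_PER_CURVE     # dissociation-curve points
total_points = equilibrium_points + curve_points  # size of the whole database

print(equilibrium_points, curve_points, total_points)
```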
Year: 2011 PMID: 21836824 PMCID: PMC3152974 DOI: 10.1021/ct2002946
Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.006
Monomers Used To Construct the Complexes in the S66 Data Set
| molecule | model for |
|---|---|
| acetic acid | cyclic hydrogen bonds with OH donor, electrostatic interactions |
| acetamide | cyclic hydrogen bonds with NH donor, electrostatic interactions |
| benzene | π–π and X−π interactions – aromatic |
| cyclopentane | aliphatic dispersion – cyclic hydrocarbons |
| ethene | π–π and X−π interactions – nonaromatic |
| ethyne | π–π and X−π interactions of triple bond |
| neopentane | aliphatic dispersion – branched hydrocarbons |
| pentane | aliphatic dispersion – linear hydrocarbons |
| methylamine | hydrogen bonding – NH group |
| methanol | hydrogen bonding – OH group |
| N-methylacetamide | peptide bond model, carbonyl hydrogen bonds |
| pyridine | π–π and X−π interactions in heterocycles |
| uracil | π–π and X−π interactions, base pairing |
| water | hydrogen bonds and other interactions with water |
Figure 1. Distribution of interaction energies in the 66 complexes of the S66 data set.
List of the Benchmark CCSD(T)/CBS Interaction Energies (in kcal/mol), the Dispersion/Electrostatics Ratio from the DFT-SAPT Decomposition, and the Interaction Type (E, Electrostatics-Dominated; D, Dispersion-Dominated; and M, Mixed) Based on It for the S66 Data Set
| no. | complex (hydrogen bonds) | ΔE, kcal/mol | disp/elec | category |
|---|---|---|---|---|
| 1 | water···water | –4.92 | 0.29 | E |
| 2 | water···MeOH | –5.59 | 0.35 | E |
| 3 | water···MeNH2 | –6.91 | 0.30 | E |
| 4 | water···peptide | –8.10 | 0.37 | E |
| 5 | MeOH···MeOH | –5.76 | 0.40 | E |
| 6 | MeOH···MeNH2 | –7.55 | 0.38 | E |
| 7 | MeOH···peptide | –8.23 | 0.42 | E |
| 8 | MeOH···water | –5.01 | 0.34 | E |
| 9 | MeNH2···MeOH | –3.06 | 0.71 | M |
| 10 | MeNH2···MeNH2 | –4.16 | 0.71 | M |
| 11 | MeNH2···peptide | –5.42 | 0.79 | M |
| 12 | MeNH2···water | –7.27 | 0.33 | E |
| 13 | peptide···MeOH | –6.19 | 0.56 | E |
| 14 | peptide···MeNH2 | –7.45 | 0.50 | E |
| 15 | peptide···peptide | –8.63 | 0.56 | E |
| 16 | peptide···water | –5.12 | 0.42 | E |
| 17 | uracil···uracil (BP) | –17.18 | 0.35 | E |
| 18 | water···pyridine | –6.86 | 0.34 | E |
| 19 | MeOH···pyridine | –7.41 | 0.40 | E |
| 20 | AcOH···AcOH | –19.09 | 0.30 | E |
| 21 | AcNH2···AcNH2 | –16.26 | 0.32 | E |
| 22 | AcOH···uracil | –19.49 | 0.31 | E |
| 23 | AcNH2···uracil | –19.19 | 0.31 | E |
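The category column is derived from the disp/elec ratio, but this excerpt does not state the cutoffs. The sketch below is a minimal illustration under assumed cutoffs: the listed rows are consistent with any E/M boundary between 0.56 and 0.71, so 0.6 is chosen arbitrarily within that window, and the M/D boundary is assumed to be its reciprocal. Neither value is taken from the text, and `classify` is a hypothetical helper.

```python
# A few rows copied from the hydrogen-bond table: (complex, disp/elec, listed category).
ROWS = [
    ("water···water", 0.29, "E"),
    ("MeNH2···MeOH", 0.71, "M"),
    ("MeNH2···peptide", 0.79, "M"),
    ("peptide···MeOH", 0.56, "E"),
    ("uracil···uracil (BP)", 0.35, "E"),
]

# ASSUMPTION: illustrative cutoffs, not given in the text above.
E_CUTOFF = 0.6             # below this ratio: electrostatics-dominated (E)
D_CUTOFF = 1.0 / E_CUTOFF  # above this ratio: dispersion-dominated (D)


def classify(ratio, e_cutoff=E_CUTOFF, d_cutoff=D_CUTOFF):
    """Map a dispersion/electrostatics ratio to E, M, or D (hypothetical helper)."""
    if ratio < e_cutoff:
        return "E"
    if ratio > d_cutoff:
        return "D"
    return "M"


for name, ratio, listed in ROWS:
    status = "matches" if classify(ratio) == listed else "differs from"
    print(f"{name}: {classify(ratio)} ({status} the table)")
```

Keeping the cutoffs as parameters makes it easy to substitute the published values once they are known.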
Errors of the Studied Methods with Respect to the Benchmark CCSD(T)/CBS Calculations on the S66 Data Set
| method | RMSE, kcal/mol | MUE, kcal/mol | AVG, kcal/mol | MAX % |
|---|---|---|---|---|
| MP2/TZ | 0.70 | 0.56 | 0.43 | 29 |
| MP2/aDZ | 0.79 | 0.58 | 0.31 | 32 |
| MP2/CBS | 0.69 | 0.45 | –0.44 | 40 |
| MP2C/CBS | 0.71 | 0.47 | –0.01 | 174 |
| SCS-MP2/CBS | 0.87 | 0.74 | 0.73 | 79 |
| SCS-MI-MP2/CBS | 0.38 | 0.28 | 0.21 | 54 |
| DW-MP2/CBS | 0.40 | 0.27 | 0.09 | 58 |
| MP3/CBS | 0.62 | 0.45 | 0.44 | 64 |
| MP2.5/CBS | 0.16 | 0.12 | 0.00 | 16 |
| CCSD/CBS | 0.70 | 0.62 | 0.62 | 73 |
| SCS-CCSD/CBS | 0.25 | 0.15 | 0.12 | 6 |
| SCS-MI-CCSD/CBS | 0.08 | 0.06 | –0.04 | 6 |
The errors are reported as RMSE, mean unsigned error (MUE), average signed error (AVG), and largest error in the set relative to the interaction energy (MAX).
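As a rough illustration of how these four statistics could be computed, the sketch below takes the error as the method energy minus the benchmark energy, so that a positive average corresponds to underbinding. It reads MAX as the largest per-complex relative error, which is one interpretation of the definition above. The reference energies are the first three rows of the benchmark table; the method energies are hypothetical values invented for the example, and `error_stats` is not a function from any published code.

```python
import math


def error_stats(method, reference):
    """Return RMSE, MUE, AVG (kcal/mol) and MAX (%) for paired energies."""
    # Signed error: positive means the method binds less strongly than the benchmark.
    errors = [m - r for m, r in zip(method, reference)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mue = sum(abs(e) for e in errors) / n
    avg = sum(errors) / n
    # ASSUMPTION: MAX is the largest |error| / |benchmark energy| over the set.
    max_pct = 100.0 * max(abs(e) / abs(r) for e, r in zip(errors, reference))
    return {"rmse": rmse, "mue": mue, "avg": avg, "max_pct": max_pct}


# Benchmark CCSD(T)/CBS energies for complexes 1-3 of the table (kcal/mol).
reference = [-4.92, -5.59, -6.91]
# HYPOTHETICAL method energies, made up purely for this example.
method = [-4.80, -5.65, -6.70]

stats = error_stats(method, reference)
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```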
Figure 2. The RMSE (kcal/mol) with respect to the CCSD(T)/CBS benchmark. The symbol next to the bar is the sign of the average error. Plus indicates that the method underestimates the strength of the binding over the whole data set; minus indicates systematic overbinding.
Figure 3. Relative errors (%) for the three groups of complexes: hydrogen bonds, dispersion-dominated, and others. The error is calculated as a RMSE relative to average interaction energy in the group so that the errors can be compared between the groups.
Figure 4. Relative errors (%) for the three types of dispersion-dominated complexes: π–π, aliphatic–aliphatic, and π–aliphatic interactions. The error is calculated as a RMSE relative to average interaction energy in the group so that the errors can be compared between the groups.
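The group-relative error used in Figures 3 and 4 can be sketched as the RMSE of a group divided by the magnitude of that group's average benchmark interaction energy. In the example below the benchmark energies are complexes 20 to 23 of the hydrogen-bond table, while the per-complex errors are hypothetical; `relative_rmse_percent` is an illustrative name, not a function from the paper.

```python
import math


def relative_rmse_percent(errors, reference_energies):
    """RMSE of a group, as a percentage of its mean benchmark interaction energy."""
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mean_energy = sum(reference_energies) / len(reference_energies)
    return 100.0 * rmse / abs(mean_energy)


# Benchmark energies (kcal/mol) for complexes 20-23 of the hydrogen-bond table.
group_reference = [-19.09, -16.26, -19.49, -19.19]
# HYPOTHETICAL signed errors of some method for the same complexes (kcal/mol).
group_errors = [0.2, -0.1, 0.3, 0.1]

rel = relative_rmse_percent(group_errors, group_reference)
print(f"relative RMSE: {rel:.2f} %")
```

Normalizing by the group average is what allows strongly bound and weakly bound groups to be compared on one scale.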
Figure 5. Errors of selected methods plotted against the ratio of dispersion to electrostatic term from the DFT-SAPT decomposition.
Figure 6. The RMSE (kcal/mol) in the S66 × 8 (dissociation curves of the 66 complexes).