Ting Gao, Hongzhi Li, Wenze Li, Lin Li, Chao Fang, Hui Li, LiHong Hu, Yinghua Lu, Zhong-Min Su.
Abstract
BACKGROUND: Non-covalent interactions (NCIs) play critical roles in supramolecular chemistry; however, they are difficult to measure. Reliable computational methods are being pursued to meet this challenge, but calculations at low levels of theory are not sufficiently accurate, and calculations at high levels of theory are often too costly. Accordingly, to increase the accuracy of low-level theoretical calculations of NCIs while keeping their cost low, an efficient approach is proposed to correct NCI calculations based on the benchmark databases S22, S66 and X40 (Hobza in Acc Chem Res 45:663-672, 2012; Řezáč et al. in J Chem Theory Comput 8:4285, 2012).
RESULTS: A novel type of NCI correction is presented for density functional theory (DFT) methods. In this approach, the general regression neural network (GRNN) machine learning method is used to correct DFT results on the basis of the DFT calculations themselves. Various DFT methods, including M06-2X, B3LYP, B3LYP-D3, PBE, PBE-D3 and ωB97XD, with two small basis sets (i.e., 6-31G* and 6-31+G*) were investigated. Moreover, the conductor-like polarizable continuum model with two types of solvents (i.e., water and pentylamine, whose dielectric constant of ε = 4.2 mimics a protein environment) was considered in the DFT calculations. With the correction, the root mean square errors of all DFT calculations were reduced by at least 70 %. Relative to CCSD(T)/CBS benchmark values (used in place of experimental NCI values because of their high accuracy), the mean absolute error of the best result was 0.33 kcal/mol, which is comparable to that of high-level ab initio methods or DFT methods with fairly large basis sets. Notably, this level of accuracy is achieved in a fraction of the time required by other methods. For all of the correction models based on the various DFT approaches, the validation parameters according to OECD principles (i.e., the correlation coefficient R², the predictive squared correlation coefficient q² and q²cv from cross-validation) were >0.92, which suggests that the correction model has good stability, robustness and predictive power.
CONCLUSIONS: The correction can be applied after the DFT calculations. With the molecular descriptors obtained from them, the NCIs produced by DFT methods can be improved to high-level accuracy. Moreover, only one parameter is introduced into the correction model, which makes it easy to apply. Overall, this work demonstrates that the correction model may be an alternative to the traditional means of correcting for NCIs.
Graphical abstract: A machine learning correction model efficiently improved the accuracy of non-covalent interactions (NCIs) calculated by DFT methods. The correction model is easy and flexible to apply, so it may serve as an alternative means of correcting NCIs obtained from first-principles calculations.
Keywords: Computational accuracy; Density functional theory; Feature selection; Machine learning correction; Non-covalent interactions
Year: 2016 PMID: 27148408 PMCID: PMC4855356 DOI: 10.1186/s13321-016-0133-7
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 The structure of GRNN
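A GRNN in its standard textbook form (input, pattern, summation and output layers, amounting to a Gaussian-weighted average of the training targets) can be sketched as below. This is a minimal illustration, not the authors' implementation: the descriptor vectors and energies are made-up placeholders, and the single smoothing parameter `sigma` is only assumed to correspond to the one parameter mentioned in the abstract.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """Predict a target for `query` as a Gaussian-weighted average of train_y."""
    # Pattern layer: one Gaussian unit per training sample.
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    # Summation layer: weighted sum of targets and plain sum of weights.
    numerator = sum(w * y for w, y in zip(weights, train_y))
    denominator = sum(weights)
    if denominator == 0.0:
        # Every kernel underflowed to zero; fall back to the unweighted mean.
        return sum(train_y) / len(train_y)
    # Output layer: ratio of the two sums.
    return numerator / denominator

# Hypothetical toy data (NOT from the paper): each row is
# [DFT interaction energy, one extra descriptor]; the targets stand in
# for benchmark interaction energies in kcal/mol.
train_x = [[-12.0, 0.8], [-5.0, 0.3], [-4.5, 0.1]]
train_y = [-10.0, -4.0, -3.5]

corrected = grnn_predict(train_x, train_y, [-5.2, 0.25], sigma=0.5)
print(round(corrected, 2))
```

A small `sigma` makes the prediction follow the nearest training sample, while a large `sigma` pulls it toward the mean of all targets. In practice the descriptors would usually be scaled to comparable ranges before distances are computed.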
The number of complexes and the mean CCSD(T)/CBS benchmark interaction energy (kcal/mol) for each of the four types of NCI-dominated molecular complexes
| Types | Number | Mean |
|---|---|---|
| H-bonded complexes | 29 | −10.33 |
| Dispersion complexes | 30 | −3.94 |
| Mixed complexes | 26 | −3.70 |
| Halogen complexes | 36 | −3.43 |
The validation parameters of DFT and GRNN correction models (RMSE & MAE units: kcal/mol)
| Method | RMSE (DFT) | RMSE (GRNN) | MAE (DFT) | MAE (GRNN) | q² (DFT) | R² (DFT) | q² (GRNN) | q²cv (GRNN) | R² (GRNN) |
|---|---|---|---|---|---|---|---|---|---|
| M062X/6-31G*(vac) (DFT1) | 1.66 | 0.50 | 1.34 | 0.35 | 0.87 | 0.98 | 0.98 | 0.95 | 0.99 |
| M062X/6-31G*(a) (DFT2) | 1.79 | 0.52 | 1.13 | 0.37 | 0.85 | 0.93 | 0.98 | 0.92 | 0.99 |
| M062X/6-31G*(b) (DFT3) | | | | 0.34 | 0.90 | 0.96 | 0.98 | 0.93 | 0.99 |
| M062X/6-31+G*(a) (DFT4) | 2.59 | 0.54 | 1.45 | | 0.68 | 0.89 | 0.98 | 0.96 | 0.99 |
| ωB97XD/6-31G*(vac) (DFT5) | 1.77 | 0.47 | 1.55 | 0.35 | 0.85 | 0.98 | 0.98 | 0.96 | 0.99 |
| ωB97XD/6-31G*(a) (DFT6) | 1.74 | 0.54 | 1.22 | 0.38 | 0.86 | 0.93 | 0.98 | 0.93 | 0.99 |
| ωB97XD/6-31G*(b) (DFT7) | 1.46 | | 1.17 | 0.34 | 0.90 | 0.95 | 0.98 | 0.94 | 0.99 |
| B3LYP/6-31G*(a) (DFT8) | 3.98 | 0.62 | 3.01 | 0.46 | 0.25 | 0.80 | 0.97 | 0.92 | 0.98 |
| B3LYP-D3/6-31G*(a) (DFT9) | 1.89 | 0.56 | 1.18 | 0.40 | 0.83 | 0.93 | 0.98 | 0.92 | 0.99 |
| PBE/6-31G*(a) (DFT10) | 3.15 | 0.62 | 2.33 | 0.46 | 0.53 | 0.83 | 0.97 | 0.92 | 0.98 |
| PBE-D3/6-31G*(a) (DFT11) | 1.96 | 0.59 | 1.33 | 0.45 | 0.82 | 0.91 | 0.97 | 0.92 | 0.98 |
The best results are shown in italics
(vac) The calculations are performed in vacuum
(a) The solvent is water (ε = 78.35)
(b) The solvent is pentylamine (ε = 4.20), which has a dielectric constant similar to that of the protein environment (ε ~4.0)
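The validation parameters reported above can be computed along the following lines. This is a sketch using commonly used definitions, which are assumptions here because the record does not spell out the exact formulas: R² is taken as the squared Pearson correlation, and q² as one minus the ratio of the prediction error sum of squares to the spread of the test values around the training-set mean. The numbers are toy values, not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Squared Pearson correlation between reference and predicted values."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

def q_squared(y_test, y_pred, train_mean):
    """Predictive squared correlation coefficient (assumed definition)."""
    press = sum((t - p) ** 2 for t, p in zip(y_test, y_pred))
    spread = sum((t - train_mean) ** 2 for t in y_test)
    return 1.0 - press / spread

# Toy reference and predicted energies in kcal/mol (hypothetical values).
benchmark = [-10.0, -4.0, -3.0]
predicted = [-12.0, -5.0, -4.0]

print(rmse(benchmark, predicted), mae(benchmark, predicted))
print(r_squared(benchmark, predicted), q_squared(benchmark, predicted, train_mean=-5.0))
```

Under the same assumption, a cross-validated q²cv would apply the `q_squared` formula to predictions made for samples that were held out during fitting.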
The mean values of the NCIs in four types of complexes and the RMSEs of DFT calculations and GRNN corrections relative to the CCSD(T)/CBS benchmark NCIs (Unit: kcal/mol). Each cell gives Mean (RMSE/RMSE1(c))
| Methods | H-bonded | Dispersion | Mixed | Halogen |
|---|---|---|---|---|
| CCSD(T)/CBS | −10.33 (−/−) | −3.94 (−/−) | −3.70 (−/−) | −3.43 (−/−) |
| M062X/6-31G*(vac) (DFT1) | −12.23 (2.12/0.59) | −4.84 (1.11/0.45) | −4.86 (1.31/0.43) | −4.39 (1.85/0.53) |
| M062X/6-31G*(a) (DFT2) | −9.23 (3.13/0.54) | −3.84 (1.01/0.47) | − | −3.81 (1.31/0.52) |
| M062X/6-31G*(b) (DFT3) | −9.96 (2.37/0.37) | − | −4.05 (0.68/0.49) | − |
| M062X/6-31+G*(a) (DFT4) | −7.24 (4.78/0.76) | −3.22 (1.25/0.45) | −2.80 (1.18/0.43) | −2.91 (1.33/ |
| ωB97XD/6-31G*(vac) (DFT5) | −12.46 (2.20/0.42) | −5.37 (1.52/ | −5.20 (1.58/0.51) | −3.98 (1.72/0.53) |
| ωB97XD/6-31G*(a) (DFT6) | −9.37 (2.88/0.50) | −4.30 (1.38/0.64) | −4.01 (0.74/0.41) | −3.43 (1.25/0.55) |
| ωB97XD/6-31G*(b) (DFT7) | − | −4.61 (1.28/0.46) | −4.34 (0.86/ | −3.53 (1.35/0.56) |
| B3LYP/6-31G*(a) (DFT8) | −6.76 (5.10/0.57) | 0.23 (5.06/0.64) | −0.85 (3.09/0.60) | −2.23 (2.04/0.65) |
| B3LYP-D3/6-31G*(a) (DFT9) | −8.94 (3.30/0.58) | −3.88 (1.32/ | −3.74 (0.67/0.66) | −4.16 (1.24/0.57) |
| PBE/6-31G*(a) (DFT10) | −8.27 (3.94/0.59) | −0.68 (4.08/0.55) | −1.72 (2.27/0.79) | −3.22 (1.80/0.56) |
| PBE-D3/6-31G*(a) (DFT11) | −9.75 (2.99/0.59) | −3.54 (1.76/0.57) | −3.74 (0.70/0.58) | −4.45 (1.66/0.60) |
The best results are shown in italics
(vac) The calculations are performed in vacuum
(a) The solvent is water (ε = 78.35)
(b) The solvent is pentylamine (ε = 4.20), which has a dielectric constant similar to that of the protein environment (ε ~4.0)
(c) RMSE1 is the RMSE after the GRNN correction
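As a consistency check, per-type RMSEs can be pooled into an overall RMSE by weighting the squared errors with the number of complexes in each group. The sketch below does this for the DFT1 row, assuming that the group sizes are the counts listed in the first table (29, 30, 26 and 36) and that the overall RMSE in the validation table is taken over all of these complexes; neither assumption is stated explicitly in this record.

```python
import math

# Group sizes from the table of complex types (assumed to apply here).
counts = {"H-bonded": 29, "Dispersion": 30, "Mixed": 26, "Halogen": 36}

# Per-type RMSEs (kcal/mol) for M062X/6-31G* in vacuum (DFT1),
# before and after the GRNN correction, as listed in the table above.
rmse_dft1 = {"H-bonded": 2.12, "Dispersion": 1.11, "Mixed": 1.31, "Halogen": 1.85}
rmse_grnn1 = {"H-bonded": 0.59, "Dispersion": 0.45, "Mixed": 0.43, "Halogen": 0.53}

def pooled_rmse(rmse_by_group, n_by_group):
    """Combine group RMSEs: sqrt of the size-weighted mean of squared RMSEs."""
    total_n = sum(n_by_group[g] for g in rmse_by_group)
    sum_sq_err = sum(n_by_group[g] * rmse_by_group[g] ** 2 for g in rmse_by_group)
    return math.sqrt(sum_sq_err / total_n)

pooled_dft = pooled_rmse(rmse_dft1, counts)
pooled_grnn = pooled_rmse(rmse_grnn1, counts)
print(round(pooled_dft, 2), round(pooled_grnn, 2))
```

Under these assumptions the pooled values come out at about 1.66 and 0.51 kcal/mol, in line with the overall RMSEs of 1.66 and 0.50 kcal/mol listed for DFT1 once the rounding of the per-type entries is taken into account.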
The RMSE (kcal/mol) of benchmark databases by DFT methods with respect to CCSD(T)/CBS benchmark interactions
| Methods | S22 (DFT) | S22 (GRNN) | S66 (DFT) | S66 (GRNN) | X40 (DFT) | X40 (GRNN) |
|---|---|---|---|---|---|---|
| M062X/6-31G*(vac) (DFT1) | 1.41 | 0.57 | 1.84 | 0.48 | 1.85 | 0.52 |
| M062X/6-31G*(a) (DFT2) | 2.55 | 0.58 | 1.67 | 0.48 | 1.31 | 0.52 |
| M062X/6-31G*(b) (DFT3) | 1.82 | 0.42 | | 0.45 | | 0.49 |
| M062X/6-31+G*(a) (DFT4) | 3.99 | 0.83 | 2.45 | 0.43 | 1.33 | |
| ωB97XD/6-31G*(vac) (DFT5) | 1.68 | 0.30 | 1.46 | 0.43 | 1.71 | 0.53 |
| ωB97XD/6-31G*(a) (DFT6) | 2.36 | 0.60 | 1.70 | 0.50 | 1.25 | 0.55 |
| ωB97XD/6-31G*(b) (DFT7) | | | 1.46 | | 1.35 | 0.56 |
| B3LYP/6-31G*(a) (DFT8) | 5.97 | 0.47 | 3.90 | 0.63 | 2.04 | 0.65 |
| B3LYP-D3/6-31G*(a) (DFT9) | 2.78 | 0.41 | 1.79 | 0.59 | 1.24 | 0.57 |
| PBE/6-31G*(a) (DFT10) | 4.66 | 0.57 | 3.07 | 0.67 | 1.80 | 0.56 |
| PBE-D3/6-31G*(a) (DFT11) | 2.68 | 0.45 | 1.79 | 0.61 | 1.66 | 0.60 |
The best results are shown in italics
(vac) The calculations are performed in vacuum
(a) The solvent is water (ε = 78.35)
(b) The solvent is pentylamine (ε = 4.20), which has a dielectric constant similar to that of the protein environment (ε ~4.0)
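The effect of the correction on each benchmark database can be expressed as a relative RMSE reduction, (RMSE_DFT − RMSE_GRNN) / RMSE_DFT. The sketch below evaluates it for two complete rows of the table above (DFT1 and DFT8); the helper name `reduction_percent` is introduced here purely for illustration.

```python
# (DFT RMSE, GRNN-corrected RMSE) in kcal/mol, copied from the table above.
rmse_by_database = {
    "M062X/6-31G* vacuum (DFT1)": {"S22": (1.41, 0.57), "S66": (1.84, 0.48), "X40": (1.85, 0.52)},
    "B3LYP/6-31G* water (DFT8)": {"S22": (5.97, 0.47), "S66": (3.90, 0.63), "X40": (2.04, 0.65)},
}

def reduction_percent(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before

for method, databases in rmse_by_database.items():
    for name, (dft, grnn) in databases.items():
        # One line per method and database, e.g. the S22 reduction for DFT8.
        print(f"{method} {name}: {reduction_percent(dft, grnn):.1f} % lower RMSE")
```

For these two rows the reduction varies by database: B3LYP on S22 improves by roughly 92 %, while M06-2X in vacuum on S22 improves by roughly 60 %.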
Fig. 2 PLS coefficients for all molecular descriptors. The red columns are the descriptors selected for the GRNN correction model from the calculations with (a) M06-2X/6-31G* (water), (b) M06-2X/6-31G* (pentylamine) and (c) M06-2X/6-31+G* (water)
Fig. 3 The NCI plots and electron density of the frontier molecular orbitals (Carbon: grey, Nitrogen: blue, Oxygen: red, Hydrogen: white, Chlorine: yellow)
Fig. 4 NCIs calculated by DFT M06-2X (left) and DFT-GRNN (right) versus benchmark values. The insets show the distribution of the deviations (calculated − benchmark) relative to the benchmark values in each calculation (training set: red; test set: blue)
Fig. 5 NCIs calculated by DFT ωB97XD and DFT-GRNN versus benchmark values. The insets show the distribution of the deviations relative to the benchmark values in each calculation (training set: red; test set: blue)
Fig. 6 NCIs calculated by DFT B3LYP, B3LYP-D3 and DFT-GRNN versus benchmark values. The insets show the distribution of the deviations relative to the benchmark values in each calculation (training set: red; test set: blue)
Fig. 7 NCIs calculated by DFT PBE, PBE-D3 and DFT-GRNN versus benchmark values. The insets show the distribution of the deviations relative to the benchmark values in each calculation (training set: red; test set: blue)