Pavlo O Dral, O Anatole von Lilienfeld, Walter Thiel.
Abstract
We investigate possible improvements in the accuracy of semiempirical quantum chemistry (SQC) methods through the use of machine learning (ML) models for the parameters. For a given class of compounds, ML techniques require sufficiently large training sets to develop ML models that can be used for adapting SQC parameters to reflect changes in molecular composition and geometry. The ML-SQC approach allows the automatic tuning of SQC parameters for individual molecules, thereby improving the accuracy without deteriorating transferability to molecules with molecular descriptors very different from those in the training set. The performance of this approach is demonstrated for the semiempirical OM2 method using a set of 6095 constitutional isomers C7H10O2, for which accurate ab initio atomization enthalpies are available. The ML-OM2 results show improved average accuracy and a much reduced error range compared with those of standard OM2 results, with mean absolute errors in atomization enthalpies dropping from 6.3 to 1.7 kcal/mol. They are also found to be superior to the results from specific OM2 reparameterizations (rOM2) for the same set of isomers. The ML-SQC approach thus holds promise for fast and reasonably accurate high-throughput screening of materials and molecules.
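The idea of adapting a parameter to each molecule can be illustrated with a minimal sketch. The record above does not say which regression technique is used, so kernel ridge regression with a Gaussian kernel is an assumption here, and the descriptors and target values are synthetic toy data, not OM2 parameters. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_krr(X_train, y_train, sigma=1.0, lam=1e-8):
    # Solve (K + lam*I) alpha = y; a tiny lam makes the model nearly interpolate.
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_train, alpha, X_new, sigma=1.0):
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-ins (assumption): 3-component "molecular descriptors" and a
# smooth per-molecule "parameter shift" that the model should learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 0.05 * np.sin(X[:, 0]) + 0.02 * X[:, 1]

X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

alpha = fit_krr(X_train, y_train)
train_mae = np.mean(np.abs(predict_krr(X_train, alpha, X_train) - y_train))
test_mae = np.mean(np.abs(predict_krr(X_train, alpha, X_test) - y_test))
print(f"training MAE: {train_mae:.2e}, test MAE: {test_mae:.2e}")
```

With a very small regularizer the training error is close to zero by construction, so only the out-of-sample error says anything about how well such a model generalizes.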
Year: 2015 PMID: 26146493 PMCID: PMC4479612 DOI: 10.1021/acs.jctc.5b00141
Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.006
Mean Absolute Deviations (MADs) of Parameter Values Optimized in APT Step 1 from the Standard OM2 Values and Mean Absolute Errors (MAEs) in Atomization Enthalpies from ML-OM2//OM2 Calculations at OM2 Geometries for 1000 C7H10O2 Molecules (Drawn at Random) in the Training Set and 5095 C7H10O2 Molecules in the Test Set (Remainder)
| parameter | hydrogen: MAD, % | hydrogen: MAE training, kcal/mol | hydrogen: MAE test, kcal/mol | carbon: MAD, % | carbon: MAE training, kcal/mol | carbon: MAE test, kcal/mol | oxygen: MAD, % | oxygen: MAE training, kcal/mol | oxygen: MAE test, kcal/mol |
|---|---|---|---|---|---|---|---|---|---|
| One-Center One-Electron Terms | | | | | | | | | |
| | 1.20 | 0.00 | 2.89 | 0.10 | 0.00 | 2.83 | 4.10 | 0.51 | 3.50 |
| | | | | 0.10 | 0.00 | 2.84 | 0.30 | 0.00 | 2.84 |
| Orbital Exponent | | | | | | | | | |
| ζ | 1.10 | 0.00 | 2.85 | 0.40 | 0.00 | 2.82 | 1.20 | 0.00 | 2.88 |
| Resonance Integrals | | | | | | | | | |
| βs | 1.20 | 0.00 | 2.82 | 1.50 | 0.00 | 2.87 | 13.40 | 0.00 | 3.09 |
| βp | | | | 0.90 | 0.00 | 2.84 | 2.50 | 0.00 | 3.04 |
| βπ | | | | 3.90 | 0.00 | 3.77 | 9.80 | 0.00 | 3.78 |
| β | | | | 2.30 | 0.00 | 2.86 | 117.80 | 0.44 | 6.27 |
| βp(X–H) | | | | 1.40 | 0.00 | 2.84 | 35.60 | 0.08 | 6.69 |
| αs | 2.50 | 0.00 | 2.82 | 1.30 | 0.00 | 2.84 | 9.40 | 0.00 | 2.99 |
| αp | | | | 0.90 | 0.00 | 2.84 | 2.90 | 0.00 | 3.27 |
| απ | | | | 2.50 | 0.00 | 3.49 | 6.60 | 0.00 | 3.33 |
| αs(X–H) | | | | 4.40 | 0.00 | 2.88 | 203.20 | 1.37 | 6.01 |
| αp(X–H) | | | | 4.70 | 0.00 | 2.99 | 47.40 | 0.24 | 6.28 |
| Orthogonalization Factors | | | | | | | | | |
| | 4.20 | 0.00 | 2.82 | 0.70 | 0.00 | 2.82 | 1.60 | 0.00 | 2.84 |
| | 5.40 | 0.00 | 2.86 | 8.60 | 0.00 | 2.84 | 4.70 | 0.00 | 2.86 |
| | 40.10 | 0.64 | 3.57 | 17.00 | 0.00 | 3.04 | 215.50 | 0.18 | 5.52 |
| | 26.30 | 0.00 | 2.80 | 11.90 | 0.00 | 2.84 | 223.30 | 0.11 | 4.22 |
| Effective Core Potentials | | | | | | | | | |
| ζα | | | | 0.40 | 0.00 | 2.88 | 4.80 | 0.00 | 3.12 |
| | | | | 1.60 | 0.00 | 2.88 | 13.90 | 0.00 | 2.86 |
| βα | | | | 6.50 | 0.00 | 2.86 | 116.00 | 0.00 | 3.08 |
| αα | | | | 4.10 | 0.00 | 2.87 | 250.50 | 1.40 | 25.40 |
| One-Center Two-Electron Integrals | | | | | | | | | |
| | 7.40 | 0.46 | 3.49 | 0.30 | 0.00 | 2.83 | 4.50 | 0.00 | 2.85 |
| | | | | 0.70 | 0.00 | 2.83 | 1.60 | 0.00 | 2.84 |
| | | | | 1.50 | 0.10 | 3.18 | 1.30 | 0.00 | 2.89 |
| | | | | 0.20 | 0.00 | 2.83 | 0.60 | 0.00 | 2.84 |
| | | | | 11.80 | 0.02 | 3.13 | 11.40 | 0.02 | 3.06 |
MADs are given in percent; MAEs are given in kcal/mol. Standard OM2 yields an MAE of 6.30 kcal/mol for these molecules.
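The two statistics reported in the table can be written down as a short sketch. The MAE is the usual mean of absolute differences between predicted and reference values. For the MAD in percent, normalizing by the standard parameter value is an assumption, since the record does not spell out the normalization. The function names and the input numbers are made up for illustration.

```python
import numpy as np

def mean_absolute_error(predicted, reference):
    # Mean of |predicted - reference|, in the units of the inputs (e.g. kcal/mol).
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference)))

def mean_absolute_deviation_percent(optimized, standard):
    # Mean of |optimized - standard| relative to the standard value, in percent.
    # Assumption: the deviation is normalized by the standard parameter value.
    optimized = np.asarray(optimized, dtype=float)
    return float(np.mean(np.abs(optimized - standard) / abs(standard)) * 100.0)

# Made-up numbers, not taken from the table.
mae = mean_absolute_error([10.0, 12.0, 9.0], [11.0, 12.5, 9.5])
mad = mean_absolute_deviation_percent([1.02, 0.97, 1.01], 1.0)
print(f"MAE = {mae:.3f}, MAD = {mad:.2f} %")
```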
Mean Absolute Errors (MAEs) in the Predicted Atomization Enthalpies of the Constitutional Isomers of C7H10O2 from OM2 (Ntrain = 0) and ML-OM2//OM2 Calculations at OM2 Geometries
| Ntrain | training set | test set |
|---|---|---|
| 0 | | 6.30 |
| 10 | 0.00 | 6.31 |
| 100 | 0.00 | 5.46 |
| 1000 | 0.00 | 2.88 |
| 2000 | 0.00 | 2.29 |
| 3000 | 0.00 | 1.96 |
| 4000 | 0.00 | 1.81 |
| 5000 | 0.00 | 1.72 |
See the text for details. MAEs are given in kcal/mol for Ntrain molecules in the training sets and for 6095 – Ntrain molecules in the test sets.
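The trend in the test-set MAEs can be summarized with a short sketch that uses only the values tabulated above. It fits a straight line to log(MAE) versus log(Ntrain) for Ntrain ≥ 1000 and reports the error reduction relative to standard OM2. Treating the learning curve as a power law is an assumption made for illustration, not a statement from the record.

```python
import numpy as np

# Test-set MAEs (kcal/mol) from the table above, for Ntrain >= 1000.
n_train = np.array([1000, 2000, 3000, 4000, 5000], dtype=float)
test_mae = np.array([2.88, 2.29, 1.96, 1.81, 1.72])
om2_mae = 6.30  # standard OM2 (Ntrain = 0)

# Assumed power-law form MAE ~ a * Ntrain**slope, fitted in log-log space.
slope, intercept = np.polyfit(np.log(n_train), np.log(test_mae), 1)

# Fractional reduction of the test MAE relative to standard OM2.
reduction = 1.0 - test_mae / om2_mae

print(f"log-log slope: {slope:.2f}")
for n, r in zip(n_train, reduction):
    print(f"Ntrain = {int(n):4d}: MAE reduced by {100.0 * r:.0f} %")
```

The fitted slope comes out at roughly -0.33, so in this range each doubling of the training set lowers the test MAE by about a fifth.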
Figure 1. Mean absolute errors (MAEs) in the predicted atomization enthalpies for the out-of-sample test set of molecules with C7H10O2 stoichiometry for ML-OM2//OM2 and rOM2 (see text). MAEs for the training set are shown only for rOM2 (vanishingly small for ML-OM2//OM2). The MAEs are plotted as a function of the training set size (Ntrain, logarithmic scale). The horizontal line at 1.0 kcal/mol indicates the onset of chemical accuracy.
Mean Absolute Errors (MAEs) in Atomization Enthalpies from OM2 (Ntrain = 0) and rOM2//OM2 Calculations at OM2 Geometries for Ntrain Molecules in the Training Sets and 6095 – Ntrain in the Test Sets
| Ntrain | training set | test set |
|---|---|---|
| 0 | | 6.30 |
| 2 | 0.00 | 19.57 |
| 4 | 0.00 | 8.47 |
| 8 | 0.00 | 8.62 |
| 16 | 0.00 | 5.75 |
| 32 | 0.49 | 4.32 |
| 64 | 1.44 | 2.94 |
| 128 | 2.06 | 2.52 |
MAEs are given in kcal/mol.
Figure 2. Error histogram for OM2, 5k ML-OM2//OM2, and rOM2//OM2 for a test set of 1095 molecules (see text).
Mean Absolute Errors (MAEs) of Atomization Enthalpies of 100 Molecules Drawn at Random from GDB-17[23]
| method | MAE, kcal/mol | range of errors, kcal/mol |
|---|---|---|
| OM2 | 10.94 | –39.57 to 9.6 |
| ML-OM2//OM2 | 21.58 | –52.02 to 0.87 |
| rOM2//OM2 | 145.39 | –414.15 to 484.33 |
Results are given for OM2 with default parameters, for rOM2//OM2 reparametrized using 128 C7H10O2 isomers, and for the ML-OM2//OM2 model trained on 5k C7H10O2 isomers. MAEs are given in kcal/mol.
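To compare the three methods on the GDB-17 sample, the tabulated numbers can be put side by side in a small sketch. It computes the width of each error range and the ratio of each MAE to the standard OM2 value. Only numbers from the table above are used; the variable names are illustrative.

```python
# (MAE, lowest error, highest error) in kcal/mol, copied from the table above.
results = {
    "OM2": (10.94, -39.57, 9.6),
    "ML-OM2//OM2": (21.58, -52.02, 0.87),
    "rOM2//OM2": (145.39, -414.15, 484.33),
}

baseline_mae = results["OM2"][0]

# Width of the error range and MAE relative to standard OM2 for each method.
range_width = {name: hi - lo for name, (_, lo, hi) in results.items()}
mae_ratio = {name: mae / baseline_mae for name, (mae, _, _) in results.items()}

for name in results:
    print(f"{name:12s} range width = {range_width[name]:7.2f} kcal/mol, "
          f"MAE / MAE(OM2) = {mae_ratio[name]:5.2f}")
```

For these molecules from outside the C7H10O2 training class, both fitted variants have a larger MAE than standard OM2, with the rOM2//OM2 error range far wider than the other two.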