Mojtaba Haghighatlari, Jie Li, Xingyi Guan, Oufan Zhang, Akshaya Das, Christopher J Stein, Farnaz Heidar-Zadeh, Meili Liu, Martin Head-Gordon, Luke Bertels, Hongxia Hao, Itai Leven, Teresa Head-Gordon.
Abstract
We report a new deep learning message passing network that takes inspiration from Newton's equations of motion to learn interatomic potentials and forces. With the advantage of directional information from trainable force vectors, and physics-infused operators that are inspired by Newtonian physics, the entire model remains rotationally equivariant, and many-body interactions are inferred by more interpretable physical features. We test NewtonNet on the prediction of several reactive and non-reactive high-quality ab initio data sets including single small molecules, a large set of chemically diverse molecules, and methane and hydrogen combustion reactions, achieving state-of-the-art test performance on energies and forces with far greater data and computational efficiency than other deep learning models. This journal is © The Royal Society of Chemistry.
Year: 2022 PMID: 35769203 PMCID: PMC9189860 DOI: 10.1039/d2dd00008c
Source DB: PubMed Journal: Digit Discov ISSN: 2635-098X
Fig. 1 (a) Newton's laws for the force and displacement calculations for atom i with respect to its neighbors. (b) Schematic view of the NewtonNet message passing layer. At each layer four separate components are updated: atomic feature arrays a_i, latent force vectors, and force and displacement feature vectors.
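The two physical properties that the abstract and this caption rely on, Newton's third law for pairwise forces and rotational equivariance of the resulting force vectors, can be illustrated with a toy example. The sketch below is not the NewtonNet layer: it uses a hypothetical inverse-square pairwise force, and the names `pairwise_forces`, `net_forces`, and `random_rotation` are assumptions introduced only for illustration.

```python
import numpy as np

def pairwise_forces(positions, strength=1.0):
    """Toy central forces (assumed form): F_ij acts on atom i due to atom j."""
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3), r_i - r_j
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N) pair distances
    np.fill_diagonal(dist, np.inf)                        # no self-interaction
    # Antisymmetric in (i, j), so F_ij = -F_ji (Newton's third law).
    return strength * diff / dist[..., None] ** 3

def net_forces(positions):
    """Net force on each atom: sum of the pairwise forces from all neighbors."""
    return pairwise_forces(positions).sum(axis=1)

def random_rotation(rng):
    """Random proper rotation matrix obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis so that det(q) = +1
    return q

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
rot = random_rotation(rng)

# Third law: pairwise forces cancel, so the total force on the system vanishes.
print(np.allclose(net_forces(pos).sum(axis=0), 0.0))
# Equivariance: rotating the inputs rotates the predicted forces the same way.
print(np.allclose(net_forces(pos @ rot.T), net_forces(pos) @ rot.T))
```

Any model whose force output is built from sums of vectors along interatomic directions, weighted by rotation-invariant scalars, passes the second check by construction.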
The performance of models in terms of mean absolute error (MAE) for the prediction of energies (kcal mol⁻¹) and forces (kcal mol⁻¹ Å⁻¹) of molecules in the MD17 data sets. We report results by averaging over four random splits of the data to define standard deviations. Best results in the standard deviation range are marked in bold.
| | | SchNet | PhysNet | DimeNet | FCHL19 | sGDML | NequIP | PaiNN | NewtonNet |
|---|---|---|---|---|---|---|---|---|---|
| | Energy | 0.370 | 0.230 | 0.204 | 0.182 | 0.19 | — | | |
| | Forces | 1.35 | 0.605 | 0.499 | 0.478 | 0.68 | | 0.371 | |
| Ethanol | Energy | 0.08 | 0.059 | 0.064 | | 0.07 | — | 0.063 | 0.078 ± 0.010 |
| | Forces | 0.39 | 0.160 | 0.230 | | 0.33 | 0.208 | 0.230 | 0.264 ± 0.032 |
| Malonaldehyde | Energy | 0.13 | 0.094 | 0.104 | | 0.10 | — | 0.091 | 0.096 ± 0.013 |
| | Forces | 0.66 | 0.319 | 0.383 | | 0.41 | 0.337 | 0.319 | 0.323 ± 0.019 |
| | Energy | 0.16 | 0.142 | 0.122 | | 0.12 | — | | |
| | Forces | 0.58 | 0.310 | 0.215 | 0.151 | 0.11 | 0.096 | | |
| | Energy | 0.20 | 0.126 | 0.134 | 0.114 | 0.12 | — | | |
| | Forces | 0.85 | 0.337 | 0.374 | 0.221 | 0.28 | 0.238 | 0.209 | |
| | Energy | 0.12 | 0.100 | 0.102 | | 0.10 | — | | |
| | Forces | 0.57 | 0.191 | 0.216 | 0.203 | 0.14 | 0.101 | 0.102 | |
| Uracil | Energy | 0.14 | 0.108 | 0.115 | | 0.11 | — | | |
| | Forces | 0.56 | 0.218 | 0.301 | | 0.24 | 0.172 | 0.140 | 0.149 ± 0.003 |
| | Energy | — | 0.197 | — | — | | — | — | 0.142 ± 0.003 |
| | Forces | — | 0.462 | — | — | 0.409 | — | — | |
| | Energy | — | 0.181 | — | — | 0.153 | — | — | |
| | Forces | — | 0.519 | — | — | 0.491 | — | — | |
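The reporting convention in this table, an MAE per quantity followed by a mean and standard deviation over four random splits, can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function names are hypothetical, the force MAE is assumed to average over all atoms and Cartesian components, and the population standard deviation (`ddof=0`) is an assumption because the source does not state which estimator was used.

```python
import numpy as np

def mae(pred, ref):
    """Mean absolute error over every element of the arrays.

    For forces of shape (n_samples, n_atoms, 3) this averages over samples,
    atoms, and Cartesian components (an assumed convention).
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

def summarize_splits(per_split_mae):
    """Mean and standard deviation of the MAE across independent splits."""
    values = np.asarray(per_split_mae, dtype=float)
    return float(values.mean()), float(values.std())  # ddof=0 is an assumption

def format_entry(per_split_mae, digits=3):
    """Render a table entry in the 'mean ± std' style used above."""
    mean, std = summarize_splits(per_split_mae)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"
```

Calling `format_entry` on the four per-split force MAEs of a model yields one cell of the table in the same "mean ± std" layout.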
The performance of models in terms of mean absolute error (MAE) for the prediction of energies (kcal mol⁻¹) and forces (kcal mol⁻¹ Å⁻¹) of molecules at CCSD or CCSD(T) accuracy. We randomly select 50 snapshots of the training data as the validation set and average the performance of NewtonNet over four random splits to find standard deviations. Best results in the standard deviation range are marked in bold.
| | | sGDML | NequIP | NewtonNet |
|---|---|---|---|---|
| Aspirin | Energy | 0.158 | — | |
| | Forces | 0.761 | | |
| Benzene | Energy | | — | |
| | Forces | 0.039 | 0.018 | |
| Ethanol | Energy | | — | |
| | Forces | 0.350 | | 0.282 ± 0.032 |
| Malonaldehyde | Energy | 0.248 | — | |
| | Forces | 0.369 | 0.369 | |
| Toluene | Energy | 0.030 | — | |
| | Forces | 0.210 | 0.101 | |
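The validation protocol described in the caption, 50 randomly chosen training snapshots held out for validation and repeated over four random splits, might look like the sketch below. The function name, the seeds, and the training-set size of 1000 used in the example are hypothetical; only the 50 validation snapshots and the four repetitions come from the caption.

```python
import numpy as np

def split_train_val(n_train_total, n_val=50, seed=0):
    """Randomly hold out n_val snapshots of the training data for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train_total)
    val_idx, train_idx = perm[:n_val], perm[n_val:]
    return train_idx, val_idx

# Four independent splits; the seeds 0..3 are an arbitrary illustrative choice.
splits = [split_train_val(1000, n_val=50, seed=s) for s in range(4)]
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Fixing the seed per split keeps each of the four runs reproducible while still varying which snapshots are held out.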
The performance of the NewtonNet model compared with DeepMD on 13 315 randomly sampled in-distribution (ID) hold-out test configurations and 13 315 out-of-distribution (OOD) test configurations provided by the authors on the methane combustion dataset. Errors are reported in terms of mean absolute error (MAE) for energies (kcal per mol per atom) and forces (kcal mol⁻¹ Å⁻¹). We systematically reduce the amount of training data by two orders of magnitude using NewtonNet and compare it to the 578 731 data points used in the original paper by Zeng and co-workers[28].
| | DeepMD | NewtonNet | NewtonNet | NewtonNet |
|---|---|---|---|---|
| Training set size | 578 731 | 578 731 | 57 873 | 5 787 |
| Energies (ID) | 0.945 | 0.353 | 0.391 | 0.484 |
| Forces (ID) | — | 1.12 | 1.88 | 2.78 |
| Energies (OOD) | 3.227 | 3.170 | 3.135 | 3.273 |
| Forces (OOD) | 2.77 | 2.75 | 2.93 | 3.76 |
The MAE on the training set reported in ref. 14 was taken as the in-distribution prediction error here.
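The reduced training-set sizes in the table are consistent with keeping one tenth and one hundredth of the 578 731 configurations. A minimal sketch of such a subsampling step is shown below; the uniform random selection, the function name, and the seed are assumptions, since the source only says that the training data were systematically reduced.

```python
import numpy as np

N_TOTAL = 578_731  # full methane combustion training set size from the table

def subsample_indices(n_total, reduction, seed=0):
    """Pick n_total // reduction distinct configuration indices at random."""
    rng = np.random.default_rng(seed)
    n_keep = n_total // reduction
    return rng.choice(n_total, size=n_keep, replace=False)

# Full set, 10x reduction, and 100x reduction (two orders of magnitude).
for reduction in (1, 10, 100):
    idx = subsample_indices(N_TOTAL, reduction)
    print(reduction, len(idx))
```

Integer division reproduces the 57 873 and 5 787 entries of the table without rounding ambiguity.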
Fig. 2 The learning curve of NewtonNet for the hydrogen combustion data, with MAEs of energy and forces averaged over the 16 independent reactions and with respect to the number of training samples used for each reaction. The dashed lines show the performance of SchNet when trained on all 5k data per sub-reaction.
Ablation study with a focus on the Newtonian components of our model. Numbers show the MAE of energy (kcal mol⁻¹) and force (kcal mol⁻¹ Å⁻¹) predictions for the aspirin molecule from MD17.
| | Energy | Forces |
|---|---|---|
| No ablation | 0.168 ± 0.019 | 0.348 ± 0.014 |
| Remove sym. message passing | 4.430 ± 2.020 | 4.290 ± 0.360 |
| Remove latent force loss | 0.167 ± 0.014 | 0.359 ± 0.013 |
| Remove both | 0.187 ± 0.022 | 0.427 ± 0.009 |
Hyperparameters for all the reported experiments in the results section
| | | | | Learning rate (lr) | lr decay | Cutoff radius [Å] |
|---|---|---|---|---|---|---|
| MD17 | 1 | 50 | 1 | 1 × 10⁻³ | 0.7 | 5 |
| MD17/CCSD(T) | 1 | 50 | 1 | 1 × 10⁻³ | 0.7 | 5 |
| ANI | 1 | 0 | 0 | 1 × 10⁻⁴ | 0.7 | 5 |
| Methane combustion | 1 | 5 | 1 | 1 × 10⁻³ | 0.7 | 5 |
| Methane combustion | 1 | 5 | 1 | 1 × 10⁻⁴ | 0.7 | 5 |
| Hydrogen combustion | 1 | 20 | 1 | 5 × 10⁻⁴ | 0.7 | 5 |
10% data & 1% data.
100% data.
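Two of the hyperparameters listed above, the 5 Å cutoff radius and the learning rate decay factor of 0.7, can be made concrete with a short sketch. Both functions are hypothetical illustrations: the source does not say how neighbors are stored or when a decay step is triggered, so the brute-force pair search and the "multiply by 0.7 per decay step" rule are assumptions.

```python
import numpy as np

def neighbor_pairs(positions, cutoff=5.0):
    """Index pairs (i, j) with i < j whose distance is below the cutoff in Å."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    within = np.triu(dist < cutoff, k=1)  # upper triangle excludes i == j
    return np.argwhere(within)

def decayed_lr(initial_lr, decay=0.7, n_decay_steps=0):
    """Learning rate after n_decay_steps multiplicative decays (assumed rule)."""
    return initial_lr * decay ** n_decay_steps

# Three atoms on a line at x = 0, 3, and 9 Å: only the first pair is within 5 Å.
positions = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [9.0, 0.0, 0.0]])
print(neighbor_pairs(positions))
print(decayed_lr(1e-3, decay=0.7, n_decay_steps=2))
```

The quadratic pair search is adequate for the small molecules in MD17; larger systems would typically call for a cell list or a k-d tree.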
| | ANI | NewtonNet | NewtonNet |
|---|---|---|---|
| Training set size | 20 000 000 | 2 000 000 | 1 000 000 |
| Energies | 1.30 | 0.65 | 0.85 |
The test set performance for the alkane pyrolysis reaction was not reported in ref. 22, so we compared our test set performance with the training set performance in ref. 22.
| | ANI-1X | NewtonNet |
|---|---|---|
| Training set size | 4 956 005 | 495 600 |
| Energies | 1.61 | 1.45 |
| Forces | 2.70 | 1.79 |
| | Alkane pyrolysis | NewtonNet | NewtonNet |
|---|---|---|---|
| Training set size | 35 496 | 28 396 | 10 000 |
| Forces (train) | 9.68 | 5.69 | 7.58 |
| Forces (test) | — | 6.50 | 8.71 |