Justin S Smith, Benjamin T Nebgen, Roman Zubatyuk, Nicholas Lubbers, Christian Devereux, Kipton Barros, Sergei Tretiak, Olexandr Isayev, Adrian E Roitberg.
Abstract
Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network on DFT data and then using transfer learning techniques to retrain on a dataset of gold-standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and is billions of times faster than CCSD(T)/CBS calculations.
Year: 2019 PMID: 31263102 PMCID: PMC6602931 DOI: 10.1038/s41467-019-10827-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Accuracy in predicting conformer energy differences on the GDB-10to13 benchmark
| Metric (kcal mol−1) | ANI-1ccx | ANI-1ccx-R | ANI-1x | ωB97X |
|---|---|---|---|---|
| MAD | 1.46 | 1.81 | 1.97 | 1.42 |
| RMSD | 2.07 | 2.54 | 2.79 | 2.04 |
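The MAD and RMSD entries above summarize errors in conformer energy differences rather than in absolute energies. One plausible way to compute such metrics, assumed here rather than taken from the paper, is to form all pairwise differences between conformers of the same molecule for both the model and the reference, then pool the errors across molecules. The function names `conformer_deltas` and `mad_rmsd` are hypothetical.

```python
import math
from itertools import combinations


def conformer_deltas(energies):
    """All pairwise energy differences E_i - E_j (i < j) within one molecule's conformer set."""
    return [ei - ej for ei, ej in combinations(energies, 2)]


def mad_rmsd(pred_sets, ref_sets):
    """MAD and RMSD of pairwise conformer energy differences, pooled over molecules.

    pred_sets / ref_sets: one list of conformer energies (kcal/mol) per molecule,
    with conformers listed in the same order in both (an assumption of this sketch).
    """
    errors = []
    for pred, ref in zip(pred_sets, ref_sets):
        for dp, dr in zip(conformer_deltas(pred), conformer_deltas(ref)):
            errors.append(dp - dr)
    mad = sum(abs(e) for e in errors) / len(errors)
    rmsd = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mad, rmsd
```

Because only within-molecule differences enter, a constant energy offset applied to every conformer of a molecule leaves both metrics unchanged.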
Fig. 1: Accuracy in predicting atomization energies. Error of the ANI-1ccx-predicted atomization energy E on the GDB-10to13 benchmark relative to CCSD(T)*/CBS, compared against ωB97X.
Accuracy for calculating atomic forces on the GDB-10to13 benchmark
| Reference | ANI-1ccx | ANI-1x | ωB97X | MP2/DZ |
|---|---|---|---|---|
| MP2/TZ | 3.4/5.3 | 4.7/7.1 | 3.7/5.9 | 4.6/5.9 |
Values are MAE/RMSE in kcal mol−1 Å−1.
Fig. 2: Accuracy in predicting reaction and isomerization energies. ANI-1ccx predictions of reaction and isomerization energy differences on the (a) HC7/11 and (b) ISOL6 benchmarks, relative to CCSD(T)/CBS. Methods compared are the ANI-1ccx transfer learning potential, ANI-1x (trained only on DFT data), the DFT reference (ωB97X), and our coupled-cluster extrapolation scheme, CCSD(T)*/CBS. The top panel shows HC7 reactions 1, 2, 6, and 7, and the bottom panel shows ISOL6 reactions 1–5.
Fig. 3: Accuracy in predicting torsional energies relevant to drug discovery. Performance of QM (red and green), molecular mechanics (blue), and ANI (orange) methods on 45 torsion profiles of molecules containing C, H, N, and O atoms. Each gray dot represents the MAD of one torsion scan relative to gold-standard CCSD(T)/CBS. The box extends from the lower to the upper quartile, and the black horizontal line in the box marks the median. The upper whisker extends to the last datum less than the third quartile plus 1.5 times the interquartile range, and the lower whisker extends to the first datum greater than the first quartile minus 1.5 times the interquartile range.
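The whisker rule in the Fig. 3 caption is the common Tukey 1.5 × IQR convention. A minimal sketch of how the box and whisker positions could be computed from a list of per-torsion MADs follows. The quartile method (linear interpolation via `statistics.quantiles(..., method="inclusive")`) and counting a value lying exactly on a fence as inside the whisker are assumptions, since the plotting settings are not stated; `box_stats` is a hypothetical name.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, median, and Tukey-style whisker ends for a list of per-torsion MADs."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    # Upper whisker: largest datum not beyond the upper fence
    upper = max(v for v in data if v <= upper_fence)
    # Lower whisker: smallest datum not beyond the lower fence
    lower = min(v for v in data if v >= lower_fence)
    # Anything outside the whiskers would be drawn as an individual point
    outliers = [v for v in data if v < lower or v > upper]
    return {"q1": q1, "median": med, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}
```

For example, `box_stats([1, 2, 3, 4, 100])` places the box between 2 and 4, ends the whiskers at 1 and 4, and flags 100 as an outlier.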
Computational cost and accuracy of our coupled-cluster approximation
| Method | CPU-core hoursᵃ: Alanine (13 atoms) | CPU-core hoursᵃ: Aspirin (21 atoms) | MAD from CCSD(T)-F12 (kcal mol−1): S66 | MAD from CCSD(T)-F12 (kcal mol−1): W4-11 |
|---|---|---|---|---|
| CCSD(T)/CBS | 9.13 | 427.00 | 0.03 | 1.31 |
| CCSD(T)*/CBS (this work) | 1.44 | 7.44 | 0.09 | 1.46 |
ᵃ All calculations were performed on an Intel Xeon E5-2630 v3 @ 2.40 GHz CPU.
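The CPU-core hours in the table translate into a speedup of CCSD(T)*/CBS over CCSD(T)/CBS by simple division. The snippet below does that arithmetic with the table's own numbers; the dictionary layout and the `speedup` helper are illustrative only.

```python
# CPU-core hours copied from the table (Intel Xeon E5-2630 v3 @ 2.40 GHz)
cost = {
    "Alanine (13 atoms)": {"CCSD(T)/CBS": 9.13, "CCSD(T)*/CBS": 1.44},
    "Aspirin (21 atoms)": {"CCSD(T)/CBS": 427.00, "CCSD(T)*/CBS": 7.44},
}


def speedup(row):
    """Ratio of the full CCSD(T)/CBS cost to the approximate CCSD(T)*/CBS cost."""
    return row["CCSD(T)/CBS"] / row["CCSD(T)*/CBS"]


for system, row in cost.items():
    print(f"{system}: {speedup(row):.1f}x faster")
```

The ratio grows from about 6× for alanine to about 57× for aspirin, while the mean absolute deviations from CCSD(T)-F12 rise by 0.06 kcal mol−1 on S66 and 0.15 kcal mol−1 on W4-11.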
Fig. 4: Diagram of the transfer learning technique evaluated in this work. Transfer learning starts from a pretrained ANI-1x DFT model, then retrains it on higher-accuracy CCSD(T)*/CBS data while keeping some parameters fixed.
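Fig. 4 describes pretraining on abundant lower-level data and then retraining on scarcer higher-level data with some parameters held fixed. The sketch below illustrates only that freeze-and-retrain idea on a hypothetical one-neuron network in plain Python. The network, the synthetic "low-level" and "high-level" data, the learning rate, and the choice to freeze the hidden layer are all illustrative assumptions, not the ANI architecture or the specific parameters fixed in this work.

```python
import math


def forward(params, x):
    """Tiny 1-input network: h = tanh(w1*x + b1), y = w2*h + b2."""
    h = math.tanh(params["w1"] * x + params["b1"])
    return params["w2"] * h + params["b2"], h


def mse(params, data):
    """Mean squared error of the network on (x, y) pairs."""
    return sum((forward(params, x)[0] - y) ** 2 for x, y in data) / len(data)


def train(params, data, trainable, lr=0.05, epochs=2000):
    """Full-batch gradient descent on MSE, updating only parameters named in `trainable`.

    Parameters left out of `trainable` stay fixed (the freeze step). Mutates and returns params.
    """
    for _ in range(epochs):
        grads = dict.fromkeys(params, 0.0)
        for x, y in data:
            pred, h = forward(params, x)
            d = 2.0 * (pred - y) / len(data)        # dL/dpred
            grads["w2"] += d * h
            grads["b2"] += d
            dz = d * params["w2"] * (1.0 - h * h)   # back-propagate through tanh
            grads["w1"] += dz * x
            grads["b1"] += dz
        for name in trainable:
            params[name] -= lr * grads[name]
    return params


# Hypothetical stand-ins: a larger "low-level" set and a small, shifted "high-level" set
low_level = [(i / 10, math.sin(i / 10)) for i in range(-15, 16)]
high_level = [(i / 5, 1.1 * math.sin(i / 5) + 0.05) for i in range(-7, 8)]

# Stage 1: pretrain every parameter on the low-level data
model = train({"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}, low_level,
              trainable=("w1", "b1", "w2", "b2"))
frozen = (model["w1"], model["b1"])
loss_before = mse(model, high_level)

# Stage 2: retrain only the output layer on the high-level data; hidden layer stays frozen
train(model, high_level, trainable=("w2", "b2"))
print(f"high-level MSE: {loss_before:.4f} -> {mse(model, high_level):.4f}")
```

Freezing part of the network limits how many parameters the small high-level dataset has to determine, which is the usual motivation for this kind of retraining.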