
Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error.

Felix A Faber, Luke Hutchison, Bing Huang, Justin Gilmer, Samuel S Schoenholz, George E Dahl, Oriol Vinyals, Steven Kearnes, Patrick F Riley, O Anatole von Lilienfeld.

Abstract

We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML, ECFP4, and molecular graphs (MG)), as well as newly developed distribution-based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). Out-of-sample errors are strongly dependent on the choice of representation, regressor, and molecular property. Electronic properties are typically best accounted for by MG and GC, while energetic properties are better described by HDAD and KRR. The specific combinations with the lowest out-of-sample errors in the ∼118k training set size limit are (free) energies and enthalpies of atomization (HDAD/KRR), HOMO/LUMO eigenvalue and gap (MG/GC), dipole moment (MG/GC), static polarizability (MG/GG), zero point vibrational energy (HDAD/KRR), heat capacity at room temperature (HDAD/KRR), and highest fundamental vibrational frequency (BAML/RF).
We present numerical evidence that ML model predictions deviate from DFT (B3LYP) less than DFT (B3LYP) deviates from experiment for all properties. Furthermore, out-of-sample prediction errors with respect to hybrid DFT reference are on par with, or close to, chemical accuracy. The results suggest that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data were available.
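
The learning-curve assessment described in the abstract (out-of-sample error as a function of training set size) can be illustrated with a minimal sketch. It pairs a small kernel ridge regression written in NumPy with a fixed held-out test set. Everything specific here is an assumption made for illustration: the synthetic one-dimensional data stand in for a molecular representation and property and are not QM9 data, and the Gaussian kernel, the kernel width `sigma`, the regularization `lam`, and all function names are hypothetical choices rather than the authors' settings.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def krr_fit(x_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_new, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


def learning_curve(x, y, train_sizes, n_test, sigma=0.5, lam=1e-6, seed=0):
    """Return (N, out-of-sample MAE) pairs for each training set size N."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    # The test set is fixed and never overlaps with any training subset.
    test_idx, pool_idx = order[:n_test], order[n_test:]
    curve = []
    for n in train_sizes:
        idx = pool_idx[:n]
        alpha = krr_fit(x[idx], y[idx], sigma, lam)
        pred = krr_predict(x[idx], alpha, x[test_idx], sigma)
        mae = float(np.mean(np.abs(pred - y[test_idx])))
        curve.append((n, mae))
    return curve


# Synthetic stand-in for (representation, property) pairs; NOT QM9 data.
data_rng = np.random.default_rng(42)
x = data_rng.uniform(-3.0, 3.0, size=(600, 1))
y = np.sin(2.0 * x[:, 0])

curve = learning_curve(x, y, train_sizes=[8, 32, 128], n_test=200)
for n, mae in curve:
    print(f"N = {n:4d}  out-of-sample MAE = {mae:.4f}")
```

Plotting the resulting pairs on log-log axes gives the usual learning-curve view. In a real study the hyperparameters would presumably be selected by cross-validation for each training set size rather than fixed as they are in this sketch.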

Year:  2017        PMID: 28926232     DOI: 10.1021/acs.jctc.7b00577

Source DB:  PubMed          Journal:  J Chem Theory Comput        ISSN: 1549-9618            Impact factor:   6.006


  57 in total

Review 1.  Vibrational Spectroscopic Map, Vibrational Spectroscopy, and Intermolecular Interaction.

Authors:  Carlos R Baiz; Bartosz Błasiak; Jens Bredenbeck; Minhaeng Cho; Jun-Ho Choi; Steven A Corcelli; Arend G Dijkstra; Chi-Jui Feng; Sean Garrett-Roe; Nien-Hui Ge; Magnus W D Hanson-Heine; Jonathan D Hirst; Thomas L C Jansen; Kijeong Kwac; Kevin J Kubarych; Casey H Londergan; Hiroaki Maekawa; Mike Reppert; Shinji Saito; Santanu Roy; James L Skinner; Gerhard Stock; John E Straub; Megan C Thielges; Keisuke Tominaga; Andrei Tokmakoff; Hajime Torii; Lu Wang; Lauren J Webb; Martin T Zanni
Journal:  Chem Rev       Date:  2020-06-29       Impact factor: 60.622

Review 2.  QSAR without borders.

Authors:  Eugene N Muratov; Jürgen Bajorath; Robert P Sheridan; Igor V Tetko; Dmitry Filimonov; Vladimir Poroikov; Tudor I Oprea; Igor I Baskin; Alexandre Varnek; Adrian Roitberg; Olexandr Isayev; Stefano Curtarolo; Denis Fourches; Yoram Cohen; Alan Aspuru-Guzik; David A Winkler; Dimitris Agrafiotis; Artem Cherkasov; Alexander Tropsha
Journal:  Chem Soc Rev       Date:  2020-05-01       Impact factor: 54.564

3.  Materials Science in the AI age: high-throughput library generation, machine learning and a pathway from correlations to the underpinning physics.

Authors:  Rama K Vasudevan; Kamal Choudhary; Apurva Mehta; Ryan Smith; Gilad Kusne; Francesca Tavazza; Lukas Vlcek; Maxim Ziatdinov; Sergei V Kalinin; Jason Hattrick-Simpers
Journal:  MRS Commun       Date:  2019       Impact factor: 2.566

4.  Accurate molecular polarizabilities with coupled cluster theory and machine learning.

Authors:  David M Wilkins; Andrea Grisafi; Yang Yang; Ka Un Lao; Robert A DiStasio; Michele Ceriotti
Journal:  Proc Natl Acad Sci U S A       Date:  2019-02-07       Impact factor: 11.205

5.  Predicting Molecular Energy Using Force-Field Optimized Geometries and Atomic Vector Representations Learned from an Improved Deep Tensor Neural Network.

Authors:  Jianing Lu; Cheng Wang; Yingkai Zhang
Journal:  J Chem Theory Comput       Date:  2019-06-12       Impact factor: 6.006

6.  What Does the Machine Learn? Knowledge Representations of Chemical Reactivity.

Authors:  Joshua A Kammeraad; Jack Goetz; Eric A Walker; Ambuj Tewari; Paul M Zimmerman
Journal:  J Chem Inf Model       Date:  2020-03-03       Impact factor: 4.956

7.  Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Methods.

Authors:  Mojtaba Haghighatlari; Jie Li; Farnaz Heidar-Zadeh; Yuchen Liu; Xingyi Guan; Teresa Head-Gordon
Journal:  Chem       Date:  2020-06-16       Impact factor: 22.804

Review 8.  Ab Initio Machine Learning in Chemical Compound Space.

Authors:  Bing Huang; O Anatole von Lilienfeld
Journal:  Chem Rev       Date:  2021-08-13       Impact factor: 60.622

Review 9.  Towards operando computational modeling in heterogeneous catalysis.

Authors:  Lukáš Grajciar; Christopher J Heard; Anton A Bondarenko; Mikhail V Polynski; Jittima Meeprasert; Evgeny A Pidko; Petr Nachtigall
Journal:  Chem Soc Rev       Date:  2018-11-12       Impact factor: 54.564

Review 10.  Machine learning for molecular and materials science.

Authors:  Keith T Butler; Daniel W Davies; Hugh Cartwright; Olexandr Isayev; Aron Walsh
Journal:  Nature       Date:  2018-07-25       Impact factor: 49.962
