Communication: Understanding molecular representations in machine learning: The role of uniqueness and target similarity.

Bing Huang, O Anatole von Lilienfeld.

Abstract

The predictive accuracy of Machine Learning (ML) models of molecular properties depends on the choice of the molecular representation. Inspired by the postulates of quantum mechanics, we introduce a hierarchy of representations that meet uniqueness and target similarity criteria. To systematically control target similarity, we simply rely on interatomic many-body expansions, as implemented in universal force fields, including Bonding, Angular (BA), and higher-order terms. Addition of higher-order contributions systematically increases both the similarity to the true potential energy and the predictive accuracy of the resulting ML models. We report numerical evidence for the performance of BAML models trained on molecular properties pre-calculated at electron-correlated and density functional levels of theory for thousands of small organic molecules. Properties studied include enthalpies and free energies of atomization, heat capacity, zero-point vibrational energies, dipole moment, polarizability, HOMO/LUMO energies and gap, ionization potential, electron affinity, and electronic excitations. After training, BAML predicts energies or electronic properties of out-of-sample molecules with unprecedented accuracy and speed.
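
The idea of building a representation from two-body (bonding) and three-body (angular) terms can be illustrated with a minimal sketch. This is not the BAML functional form, which the abstract says follows universal force-field terms that are not reproduced here. As assumptions for illustration only, the pair term is taken to be Z_i * Z_j / r_ij, the angular term is the cosine of each angle, and each bag is sorted globally rather than grouped by element type. The function names `bonding_terms`, `angular_terms`, and `ba_representation` are hypothetical.

```python
import itertools
import numpy as np


def bonding_terms(coords, charges):
    """Two-body terms, one per atom pair.

    Assumption: Z_i * Z_j / r_ij stands in for a force-field bond term.
    """
    terms = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        r_ij = np.linalg.norm(coords[i] - coords[j])
        terms.append(charges[i] * charges[j] / r_ij)
    # Sorting makes the bag independent of atom ordering.
    return np.sort(np.array(terms))[::-1]


def angular_terms(coords):
    """Three-body terms: cos(angle i-j-k) for every central atom j.

    Assumption: a bare cosine stands in for a force-field angle term.
    """
    terms = []
    n_atoms = len(coords)
    for j in range(n_atoms):
        neighbours = [a for a in range(n_atoms) if a != j]
        for i, k in itertools.combinations(neighbours, 2):
            u = coords[i] - coords[j]
            v = coords[k] - coords[j]
            terms.append(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return np.sort(np.array(terms))[::-1]


def ba_representation(coords, charges):
    """Concatenate the sorted bonding and angular bags into one vector."""
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    return np.concatenate([bonding_terms(coords, charges), angular_terms(coords)])


# Usage: an approximate water geometry in angstrom (O, H, H).
water = np.array([[0.0, 0.0, 0.0],
                  [0.9572, 0.0, 0.0],
                  [-0.2400, 0.9266, 0.0]])
nuclear_charges = np.array([8, 1, 1])
features = ba_representation(water, nuclear_charges)
print(features.shape)
```

Because both bags depend only on interatomic distances and angles and are sorted, the resulting vector does not change when atoms are relabelled or the molecule is rotated or translated. Such invariances are a basic requirement for a molecular representation, though on their own they do not establish the uniqueness criterion the abstract refers to. Adding higher-order bags (for example torsional terms) would extend the hierarchy in the direction the abstract describes.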

Year:  2016        PMID: 27802646     DOI: 10.1063/1.4964627

Source DB:  PubMed          Journal:  J Chem Phys        ISSN: 0021-9606            Impact factor:   3.488


  26 in total

Review 1.  Big-Data Science in Porous Materials: Materials Genomics and Machine Learning.

Authors:  Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit
Journal:  Chem Rev       Date:  2020-06-10       Impact factor: 60.622

Review 2.  QSAR without borders.

Authors:  Eugene N Muratov; Jürgen Bajorath; Robert P Sheridan; Igor V Tetko; Dmitry Filimonov; Vladimir Poroikov; Tudor I Oprea; Igor I Baskin; Alexandre Varnek; Adrian Roitberg; Olexandr Isayev; Stefano Curtarolo; Denis Fourches; Yoram Cohen; Alan Aspuru-Guzik; David A Winkler; Dimitris Agrafiotis; Artem Cherkasov; Alexander Tropsha
Journal:  Chem Soc Rev       Date:  2020-05-01       Impact factor: 54.564

3.  Machine Learning for Electronically Excited States of Molecules.

Authors:  Julia Westermayr; Philipp Marquetand
Journal:  Chem Rev       Date:  2020-11-19       Impact factor: 60.622

4.  Gaussian Process Regression for Materials and Molecules.

Authors:  Volker L Deringer; Albert P Bartók; Noam Bernstein; David M Wilkins; Michele Ceriotti; Gábor Csányi
Journal:  Chem Rev       Date:  2021-08-16       Impact factor: 60.622

Review 5.  A review of mathematical representations of biomolecular data.

Authors:  Duc Duy Nguyen; Zixuan Cang; Guo-Wei Wei
Journal:  Phys Chem Chem Phys       Date:  2020-02-26       Impact factor: 3.676

6.  What Does the Machine Learn? Knowledge Representations of Chemical Reactivity.

Authors:  Joshua A Kammeraad; Jack Goetz; Eric A Walker; Ambuj Tewari; Paul M Zimmerman
Journal:  J Chem Inf Model       Date:  2020-03-03       Impact factor: 4.956

Review 7.  Ab Initio Machine Learning in Chemical Compound Space.

Authors:  Bing Huang; O Anatole von Lilienfeld
Journal:  Chem Rev       Date:  2021-08-13       Impact factor: 60.622

8.  Machine Learning Force Fields.

Authors:  Oliver T Unke; Stefan Chmiela; Huziel E Sauceda; Michael Gastegger; Igor Poltavsky; Kristof T Schütt; Alexandre Tkatchenko; Klaus-Robert Müller
Journal:  Chem Rev       Date:  2021-03-11       Impact factor: 60.622

9.  ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules.

Authors:  Justin S Smith; Olexandr Isayev; Adrian E Roitberg
Journal:  Sci Data       Date:  2017-12-19       Impact factor: 6.444

10.  Learning More, with Less.

Authors:  Benjamín Sánchez-Lengeling; Alán Aspuru-Guzik
Journal:  ACS Cent Sci       Date:  2017-04-18       Impact factor: 14.553
