
Constant size descriptors for accurate machine learning models of molecular properties.

Christopher R Collins, Geoffrey J Gordon, O Anatole von Lilienfeld, David J Yaron

Abstract

Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied. The representations are evaluated by monitoring the performance of linear and kernel ridge regression models on well-studied data sets of small organic molecules. One class of representations studied here counts the occurrence of bonding patterns in the molecule. These require only the connectivity of atoms in the molecule, as may be obtained from a line diagram or a SMILES string. The second class utilizes the three-dimensional structure of the molecule. These include the Coulomb matrix and Bag of Bonds, which list the inter-atomic distances present in the molecule, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size. The Encoded Bonds features introduced here have the advantage of leading to models that may be trained on smaller molecules and then used successfully on larger molecules. A wide range of feature sets is constructed by selecting, at each rank, either a graph-based or a geometry-based feature. Here, rank refers to the number of atoms involved in the feature, e.g., atom counts are rank 1, while Encoded Bonds are rank 2. For atomization energies in the QM7 data set, the best graph-based feature set gives a mean absolute error of 3.4 kcal/mol. Inclusion of 3D geometry substantially enhances the performance, with Encoded Bonds giving 2.4 kcal/mol when used alone and 1.19 kcal/mol when combined with graph features.
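
As a rough illustration of the two geometry-based ideas named in the abstract, the sketch below builds a Coulomb matrix (using the commonly cited definition, 0.5·Z_i^2.4 on the diagonal and Z_i·Z_j/|R_i − R_j| off the diagonal) and a fixed-length pair descriptor. The second function is an assumption made purely for illustration: it smears each inter-atomic distance onto a fixed grid per element pair so that the output length does not depend on molecule size. It is not the paper's Encoded Bonds definition, and the names `fixed_length_pair_features`, `elements`, `grid`, and `width` are hypothetical.

```python
import itertools
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5*Z_i**2.4 on the diagonal, Z_i*Z_j/|R_i-R_j| elsewhere."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid 0/0; diagonal is overwritten below
    M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M

def fixed_length_pair_features(Z, R, elements=(1, 6, 7, 8), grid=None, width=0.2):
    """Hypothetical constant-size rank-2 descriptor (NOT the paper's encoding).

    For every element pair, each inter-atomic distance is smeared with a
    Gaussian onto a fixed distance grid, so the vector length depends only
    on len(elements) and len(grid), never on the number of atoms.
    """
    if grid is None:
        grid = np.linspace(0.5, 5.0, 10)  # assumed distance grid
    Z = np.asarray(Z)
    R = np.asarray(R, dtype=float)
    pairs = list(itertools.combinations_with_replacement(sorted(elements), 2))
    index = {pair: k for k, pair in enumerate(pairs)}
    feats = np.zeros((len(pairs), len(grid)))
    for i, j in itertools.combinations(range(len(Z)), 2):
        key = tuple(sorted((int(Z[i]), int(Z[j]))))
        if key not in index:
            continue  # skip elements outside the assumed element list
        d = np.linalg.norm(R[i] - R[j])
        feats[index[key]] += np.exp(-((grid - d) ** 2) / (2.0 * width ** 2))
    return feats.ravel()

# Illustrative geometries (approximate coordinates, for demonstration only).
water_Z = [8, 1, 1]
water_R = [[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]
methane_Z = [6, 1, 1, 1, 1]
methane_R = [[0.0, 0.0, 0.0], [0.629, 0.629, 0.629], [-0.629, -0.629, 0.629],
             [-0.629, 0.629, -0.629], [0.629, -0.629, -0.629]]

cm_water = coulomb_matrix(water_Z, water_R)            # size grows with atom count
f_water = fixed_length_pair_features(water_Z, water_R)
f_methane = fixed_length_pair_features(methane_Z, methane_R)
print(cm_water.shape, f_water.shape, f_methane.shape)
```

The Coulomb matrix grows with the number of atoms, whereas the pair descriptor has the same length for both molecules. That size independence is the property the abstract credits for allowing models trained on smaller molecules to be applied to larger ones.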

Year:  2018        PMID: 29960361     DOI: 10.1063/1.5020441

Source DB:  PubMed          Journal:  J Chem Phys        ISSN: 0021-9606            Impact factor:   3.488


  10 in total

1.  Materials Science in the AI age: high-throughput library generation, machine learning and a pathway from correlations to the underpinning physics.

Authors:  Rama K Vasudevan; Kamal Choudhary; Apurva Mehta; Ryan Smith; Gilad Kusne; Francesca Tavazza; Lukas Vlcek; Maxim Ziatdinov; Sergei V Kalinin; Jason Hattrick-Simpers
Journal:  MRS Commun       Date:  2019       Impact factor: 2.566

2.  The role of molecular modelling and simulation in the discovery and deployment of metal-organic frameworks for gas storage and separation.

Authors:  Arni Sturluson; Melanie T Huynh; Alec R Kaija; Caleb Laird; Sunghyun Yoon; Feier Hou; Zhenxing Feng; Christopher E Wilmer; Yamil J Colón; Yongchul G Chung; Daniel W Siderius; Cory M Simon
Journal:  Mol Simul       Date:  2019       Impact factor: 2.178

3.  Probabilistic metabolite annotation using retention time prediction and meta-learned projections.

Authors:  Constantino A García; Alberto Gil-de-la-Fuente; Coral Barbas; Abraham Otero
Journal:  J Cheminform       Date:  2022-06-07       Impact factor: 8.489

4.  Learning To Predict Reaction Conditions: Relationships between Solvent, Molecular Structure, and Catalyst.

Authors:  Eric Walker; Joshua Kammeraad; Jonathan Goetz; Michael T Robo; Ambuj Tewari; Paul M Zimmerman
Journal:  J Chem Inf Model       Date:  2019-08-19       Impact factor: 4.956

5.  Ab Initio Machine Learning in Chemical Compound Space. (Review)

Authors:  Bing Huang; O Anatole von Lilienfeld
Journal:  Chem Rev       Date:  2021-08-13       Impact factor: 60.622

6.  Representation of molecular structures with persistent homology for machine learning applications in chemistry.

Authors:  Jacob Townsend; Cassie Putman Micucci; John H Hymel; Vasileios Maroulas; Konstantinos D Vogiatzis
Journal:  Nat Commun       Date:  2020-06-26       Impact factor: 14.919

7.  The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules.

Authors:  Justin S Smith; Roman Zubatyuk; Benjamin Nebgen; Nicholas Lubbers; Kipton Barros; Adrian E Roitberg; Olexandr Isayev; Sergei Tretiak
Journal:  Sci Data       Date:  2020-05-01       Impact factor: 6.444

8.  Dataset's chemical diversity limits the generalizability of machine learning predictions.

Authors:  Marta Glavatskikh; Jules Leguy; Gilles Hunault; Thomas Cauchy; Benoit Da Mota
Journal:  J Cheminform       Date:  2019-11-12       Impact factor: 5.514

9.  Low-cost prediction of molecular and transition state partition functions via machine learning.

Authors:  Evan Komp; Stéphanie Valleau
Journal:  Chem Sci       Date:  2022-06-14       Impact factor: 9.969

10.  A quantitative uncertainty metric controls error in neural network-driven chemical discovery.

Authors:  Jon Paul Janet; Chenru Duan; Tzuhsiung Yang; Aditya Nandy; Heather J Kulik
Journal:  Chem Sci       Date:  2019-07-11       Impact factor: 9.825

