Selecting molecules with diverse structures and properties by maximizing submodular functions of descriptors learned with graph neural networks.

Tomohiro Nakamura, Shinsaku Sakaue, Kaito Fujii, Yu Harabuchi, Satoshi Maeda, Satoru Iwata.

Abstract

Selecting diverse molecules from unexplored areas of chemical space is one of the most important tasks for discovering novel molecules and reactions. This paper proposes a new approach for selecting a subset of diverse molecules from a given molecular list by combining two existing techniques from machine learning and mathematical optimization: graph neural networks (GNNs) for learning vector representations of molecules, and a diverse-selection framework called submodular function maximization. Our method, called SubMo-GNN, first trains a GNN on property prediction tasks; the trained GNN then transforms molecular graphs into molecular vectors, which capture both the properties and the structures of molecules. Finally, to obtain a subset of diverse molecules, we define a submodular function that quantifies the diversity of molecular vectors, and we find a subset of molecular vectors with a large submodular function value. This can be done efficiently with the greedy algorithm, and the diversity of the selected molecules, measured by the submodular function value, is mathematically guaranteed to be at least 63% (a factor of 1 − 1/e) of that of an optimal selection. We also introduce a new evaluation criterion that measures the diversity of selected molecules based on molecular properties. Computational experiments confirm that SubMo-GNN successfully selects diverse molecules from the QM9 dataset under the property-based criterion, while performing comparably to existing methods under standard structure-based criteria. We also demonstrate that SubMo-GNN with a GNN trained on the QM9 dataset can select diverse molecules even from other MoleculeNet datasets whose domains differ from that of QM9. The proposed method enables researchers to obtain diverse sets of molecules for discovering new molecules and novel chemical reactions, and the proposed diversity criterion is useful for discussing the diversity of molecular libraries from a new property-based perspective.
© 2022. The Author(s).
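
The greedy selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the submodular function, so the sketch assumes a log-determinant function f(S) = log det(I + K_S) over a linear kernel K; this is a common monotone submodular diversity measure, not necessarily the one the authors use. The GNN embedding step is omitted: the rows of `vectors` stand in for molecular vectors, and the function names `log_det_diversity` and `greedy_select` are hypothetical.

```python
import numpy as np


def log_det_diversity(kernel, subset):
    """Assumed diversity score f(S) = log det(I + K_S), with f(empty set) = 0."""
    if not subset:
        return 0.0
    idx = np.array(subset)
    sub = kernel[np.ix_(idx, idx)]
    # slogdet is numerically safer than log(det(...)); I + K_S is positive definite.
    _, logdet = np.linalg.slogdet(np.eye(len(idx)) + sub)
    return float(logdet)


def greedy_select(vectors, k):
    """Greedily pick up to k row indices, each time adding the largest marginal gain."""
    kernel = vectors @ vectors.T  # linear (Gram) kernel, positive semidefinite
    selected = []
    current = 0.0
    for _ in range(min(k, len(vectors))):
        best_gain, best_i = -np.inf, None
        for i in range(len(vectors)):
            if i in selected:
                continue
            gain = log_det_diversity(kernel, selected + [i]) - current
            if gain > best_gain:  # strict comparison: ties keep the lowest index
                best_gain, best_i = gain, i
        selected.append(best_i)
        current += best_gain
    return selected


# Toy stand-ins for molecular vectors: rows 0 and 1 are duplicates.
toy_vectors = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(greedy_select(toy_vectors, 2))  # selects [0, 2], skipping the duplicate row
```

For a monotone submodular function with f(∅) = 0, as assumed here, choosing k elements this way yields a value of at least (1 − 1/e) times the optimum, which is the 63% guarantee quoted in the abstract. This naive version recomputes a determinant for every candidate; it is meant to show the selection rule, not to scale to large molecular lists.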

Year: 2022; PMID: 35064170; PMCID: PMC8782878; DOI: 10.1038/s41598-022-04967-9

Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.996


