Youngchun Kwon, Jiho Yoo, Youn-Suk Choi, Won-Joon Son, Dongseon Lee, Seokho Kang
Abstract
With the advancements in deep learning, deep generative models combined with graph neural networks have been successfully employed for data-driven molecular graph generation. Early methods based on the non-autoregressive approach have been effective in generating molecular graphs quickly and efficiently but have suffered from low performance. In this paper, we present an improved learning method involving a graph variational autoencoder for efficient molecular graph generation in a non-autoregressive manner. We introduce three additional learning objectives and incorporate them into the training of the model: approximate graph matching, reinforcement learning, and auxiliary property prediction. We demonstrate the effectiveness of the proposed method by evaluating it for molecular graph generation tasks using QM9 and ZINC datasets. The model generates molecular graphs with high chemical validity and diversity compared with existing non-autoregressive methods. It can also conditionally generate molecular graphs satisfying various target conditions.
Keywords: Deep learning; Graph neural network; Molecular graph; Variational autoencoder
Year: 2019 PMID: 33430985 PMCID: PMC6873411 DOI: 10.1186/s13321-019-0396-x
Journal: J Cheminform ISSN: 1758-2946
Fig. 1 Model architecture used in case study
Performance comparison for unconditional molecular graph generation
| Dataset | Metric | GraphVAE | MolGAN | Proposed (A1) | Proposed (A2) | Proposed (Full) |
|---|---|---|---|---|---|---|
| QM9 | Validity | 0.610 | | 0.550 | 0.965 | 0.945 |
| | Uniqueness | | 0.104 | 0.293 | 0.275 | 0.343 |
| | Novelty | 0.850 | | 0.612 | 0.737 | 0.806 |
| | G-mean | 0.596 | 0.458 | 0.462 | 0.581 | |
| ZINC | Validity | 0.140 | 0.017 | 0.008 | | 0.919 |
| | Uniqueness | 0.316 | 0.201 | | 0.614 | 0.762 |
| | G-mean | 0.354 | 0.151 | 0.197 | 0.828 | |

The best score for each metric is shown in italics in the published table; those cells are blank here.
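The G-mean column appears to be the geometric mean of validity, uniqueness, and novelty: for the A1 variant on QM9, (0.550 × 0.293 × 0.612)^(1/3) ≈ 0.462, which is the value reported in the table. The following Python sketch computes the three rates and their geometric mean under one common set of definitions (validity over all samples, uniqueness over valid samples, novelty over unique valid samples absent from the training set). These normalizations are an assumption, since the summary above does not define the metrics, and `generation_metrics`, `g_mean`, and the toy SMILES are hypothetical. Each sample is assumed to have already been converted to a canonical SMILES string, or to `None` if decoding failed (for example with RDKit).

```python
from math import prod
from typing import Optional, Sequence, Tuple


def generation_metrics(
    generated: Sequence[Optional[str]], training: Sequence[str]
) -> Tuple[float, float, float]:
    """Hypothetical helper returning (validity, uniqueness, novelty).

    `generated` holds one canonical SMILES per sample, or None when the sample
    could not be decoded into a valid molecule (an assumed convention).
    """
    if not generated:
        return 0.0, 0.0, 0.0
    valid = [s for s in generated if s is not None]
    unique = set(valid)
    novel = unique - set(training)
    validity = len(valid) / len(generated)                   # valid / all samples
    uniqueness = len(unique) / len(valid) if valid else 0.0  # unique / valid
    novelty = len(novel) / len(unique) if unique else 0.0    # unseen in training / unique
    return validity, uniqueness, novelty


def g_mean(*scores: float) -> float:
    """Geometric mean of the given scores (assumed definition of G-mean)."""
    return prod(scores) ** (1 / len(scores))


if __name__ == "__main__":
    samples = ["CCO", "CCO", None, "c1ccccc1", "CC"]  # toy canonical SMILES
    metrics = generation_metrics(samples, training=["CC"])
    print([round(x, 3) for x in metrics])         # [0.8, 0.75, 0.667]
    print(round(g_mean(0.550, 0.293, 0.612), 3))  # 0.462
```

Counting uniqueness with a set only works because the strings are canonical; without canonicalization, two spellings of the same molecule would be counted as distinct.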
Fig. 2 Examples of newly generated molecular graphs from QM9
Fig. 3 Examples of newly generated molecular graphs from ZINC
Conditional molecular graph generation with the proposed model
| Dataset | Target condition | G-mean | Unique count | MolWt | LogP |
|---|---|---|---|---|---|
| QM9 | Training set | – | 100,000 | 122.97 ± 7.61 | 0.14 ± 1.16 |
| | Unconditional generation | 0.639 | 3243 | 123.01 ± 8.04 | −0.06 ± 1.36 |
| | MolWt = 120 | 0.583 | 2316 | 121.85 ± 5.11 | 0.02 ± 1.36 |
| | MolWt = 125 | 0.543 | 1947 | 125.11 ± 4.56 | −0.27 ± 1.22 |
| | MolWt = 130 | 0.482 | 1475 | 128.98 ± 4.27 | −0.41 ± 1.33 |
| | LogP = −0.4 | 0.576 | 2399 | 122.97 ± 8.26 | −0.40 ± 0.73 |
| | LogP = 0.2 | 0.543 | 2099 | 122.53 ± 8.17 | 0.19 ± 0.75 |
| | LogP = 0.8 | 0.537 | 1989 | 122.17 ± 8.09 | 0.83 ± 0.72 |
| ZINC | Training set | – | 100,000 | 357.94 ± 65.48 | 2.62 ± 1.36 |
| | Unconditional generation | 0.888 | 7000 | 366.44 ± 51.63 | 2.49 ± 1.43 |
| | MolWt = 300 | 0.742 | 4090 | 313.12 ± 13.72 | 1.91 ± 1.50 |
| | MolWt = 350 | 0.796 | 5045 | 356.22 ± 12.66 | 2.24 ± 1.36 |
| | MolWt = 400 | 0.805 | 5212 | 400.95 ± 13.66 | 2.78 ± 1.30 |
| | LogP = 1.5 | 0.865 | 6470 | 352.33 ± 46.78 | 1.66 ± 0.94 |
| | LogP = 2.5 | 0.860 | 6356 | 366.64 ± 48.30 | 2.46 ± 0.92 |
| | LogP = 3.5 | 0.827 | 5658 | 381.96 ± 48.46 | 3.22 ± 0.85 |
Results of GuacaMol distribution-learning benchmarks
| Metric | LSTM (SMILES-based) | VAE (SMILES-based) | AAE (SMILES-based) | ORGAN (SMILES-based) | GraphMCTS (graph-based) | Proposed (graph-based) |
|---|---|---|---|---|---|---|
| Validity | 0.959 | 0.870 | 0.822 | 0.379 | 1.000 | 0.830 |
| Uniqueness | 1.000 | 0.999 | 1.000 | 0.841 | 1.000 | 0.944 |
| Novelty | 0.912 | 0.974 | 0.998 | 0.687 | 0.994 | 1.000 |
| KL divergence | 0.991 | 0.982 | 0.886 | 0.267 | 0.522 | 0.554 |
| FCD | 0.913 | 0.863 | 0.529 | 0.000 | 0.015 | 0.016 |
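In the GuacaMol distribution-learning suite, the KL-divergence and Fréchet ChemNet Distance (FCD) benchmarks convert distances into scores in [0, 1], so a large FCD drives its score toward 0. As described in the GuacaMol benchmark paper, the KL score averages exp(−D_KL) over the per-descriptor divergences, and the FCD score is exp(−0.2 · FCD). The Python sketch below shows only these final transforms; computing the divergences and the FCD itself (descriptor histograms, ChemNet activations) is omitted, and `kl_score`, `fcd_score`, and `fcd_from_score` are hypothetical names rather than GuacaMol's API.

```python
import math
from typing import Sequence


def kl_score(kl_divergences: Sequence[float]) -> float:
    """Average of exp(-D_KL) over per-descriptor KL divergences (GuacaMol-style)."""
    if not kl_divergences:
        raise ValueError("need at least one divergence")
    return sum(math.exp(-d) for d in kl_divergences) / len(kl_divergences)


def fcd_score(fcd: float) -> float:
    """Map a Frechet ChemNet Distance (>= 0) to a score in (0, 1]."""
    return math.exp(-0.2 * fcd)


def fcd_from_score(score: float) -> float:
    """Invert fcd_score: recover the distance implied by a reported score > 0."""
    return -5.0 * math.log(score)


if __name__ == "__main__":
    print(kl_score([0.0, 0.0]))             # identical distributions: 1.0
    print(round(fcd_score(5.0), 4))         # exp(-1): 0.3679
    print(round(fcd_from_score(0.016), 1))  # a score of 0.016 implies an FCD of about 20.7
```

Because of the exponential mapping, differences between scores near 0 (such as 0.015 and 0.016) still correspond to large absolute FCD values.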