Banghua Wu, Linjie Li, Yue Cui, Kai Zheng.
Abstract
Molecular generation is an important but challenging task in drug design, as it requires optimizing chemical compound structures together with many complex properties. Most existing methods use deep learning models to generate molecular representations. However, these methods face problems with generation validity and with the semantic information of labels. To address these challenges, we propose a cross-adversarial learning method for molecular generation, CRAG for short, which integrates the facticity of VAE-based methods with the diversity of GAN-based methods to further exploit the complex properties of molecules. Specifically, an adversarially regularized encoder-decoder transforms molecules from the simplified molecular-input line-entry system (SMILES) into discrete variables. The discrete variables are then trained to predict properties and to generate adversarial samples with corresponding labels through projected gradient descent. CRAG is trained in an adversarial manner. Extensive experiments on two widely used benchmarks demonstrate the effectiveness of the proposed method across a wide spectrum of metrics. We also use a novel metric named Novel/Sample to measure the overall generation effectiveness of models. CRAG is therefore promising for AI-based molecular design in various chemical applications.
Keywords: adversarial learning; adversarially regularized autoencoder; generative adversarial network; molecular generation; projected gradient descent
Year: 2022 PMID: 35126153 PMCID: PMC8815768 DOI: 10.3389/fphar.2021.827606
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
FIGURE 1. Current work on molecular generation for drug representation.
Mainstream methods of molecular generation.
| Method | RNN | VAE | GAN | Flow | One-shot | Sequential |
| DeepGMG | ✓ | — | — | — | — | ✓ |
| MolecularRNN | ✓ | — | — | — | — | ✓ |
| GraphVAE | — | ✓ | — | — | ✓ | — |
| JT-VAE | — | ✓ | — | — | — | ✓ |
| MolGAN | — | — | ✓ | — | ✓ | — |
| GCPN | — | — | ✓ | — | — | ✓ |
| ARAE | — | ✓ | ✓ | ✓ | — | |
| GraphNVP | — | — | — | ✓ | ✓ | — |
| MoFlow | — | — | — | ✓ | ✓ | — |
| | — | ✓ | ✓ | — | ✓ | — |
FIGURE 2. Illustration of the overall architecture. CRAG consists of an adversarially regularized encoder-decoder block, a property predictor block, and a projected gradient descent block. In particular, p(z|x) and q(x|z) denote the probabilistic encoder and the probabilistic decoder. The property prediction network f is used to predict the properties of molecules. The projected gradient descent block generates adversarial samples and labels. The discriminator is used only during training to enforce regularization.
FIGURE 3. Details of the projected gradient descent block. Projected gradient descent keeps adding small perturbations δ to the real sample z until they change its label category, thereby generating an adversarial sample that carries property information.
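The loop described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression classifier (`w`, `b`), the L-infinity ball of radius `eps`, the signed-gradient step, and every function name below are assumptions chosen only to keep the example self-contained.

```python
import math
from typing import List, Sequence, Tuple


def predict_prob(z: Sequence[float], w: Sequence[float], b: float) -> float:
    """P(label = 1 | z) under a hypothetical logistic-regression classifier."""
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pgd_latent(
    z: Sequence[float],
    label: int,
    w: Sequence[float],
    b: float,
    eps: float = 0.5,
    step: float = 0.05,
    max_iter: int = 100,
) -> Tuple[List[float], int, bool]:
    """Perturb latent vector z until the classifier's label changes.

    Returns (z_adv, predicted_label, succeeded). The perturbation delta is
    kept inside an L-infinity ball of radius eps (an assumed constraint).
    """
    delta = [0.0] * len(z)
    for _ in range(max_iter):
        z_adv = [zi + di for zi, di in zip(z, delta)]
        p = predict_prob(z_adv, w, b)
        pred = int(p >= 0.5)
        if pred != label:
            return z_adv, pred, True
        # Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
        grad = [(p - label) * wi for wi in w]
        # Ascent step on the loss, then projection back onto the eps-ball.
        delta = [
            min(eps, max(-eps, di + step * sign(gi)))
            for di, gi in zip(delta, grad)
        ]
    z_adv = [zi + di for zi, di in zip(z, delta)]
    pred = int(predict_prob(z_adv, w, b) >= 0.5)
    return z_adv, pred, pred != label


# Toy usage: a 2-D latent point that starts in class 1.
z0 = [0.6, 0.1]
z_adv, new_label, ok = pgd_latent(z0, label=1, w=[1.0, -2.0], b=0.0)
```

In this toy setting the label changes after three updates, and the returned point stays within `eps` of `z0` in every coordinate. With a much smaller budget (for example `eps=0.05`) the loop exhausts `max_iter` and reports failure instead.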
FIGURE 4. Convergence of the four evaluation metrics with the ZINC dataset.
FIGURE 5. Visualization of the latent space for molecular generation by a given molecule. The red circled molecule is the given molecule.
Performance of benchmark models and our CRAG model on the QM9 and the ZINC datasets. Baseline results are taken from Cao and Kipf (2018) and are based on the QM9 dataset.
| Method | Validity (A) | Uniqueness (B) | Novelty (C) | Novel/Sample (A×B×C) |
| ChemicalVAE | 0.103 | 0.675 | 0.900 | 0.063 |
| GrammarVAE | 0.602 | 0.093 | 0.809 | 0.045 |
| GraphVAE | 0.557 | 0.670 | 0.616 | 0.261 |
| GraphVAE/imp | 0.562 | 0.52 | 0.758 | 0.179 |
| GraphVAE NoGM | 0.810 | 0.241 | 0.610 | 0.129 |
| MolGAN | | 0.104 | 0.942 | 0.096 |
| ARAE | 0.862 | 0.935 | 0.371 | 0.299 |
| | 0.872 | | | |
| | 0.976 | | | |
Bold values indicate the best performance w.r.t. the corresponding metric.
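As a worked illustration of the column headers, the sketch below computes validity (A), uniqueness (B), novelty (C), and Novel/Sample as the product A×B×C from a list of generated strings. The per-metric definitions used here (uniqueness and novelty measured over valid samples) are a common convention and an assumption, not a statement of the authors' exact protocol. The `balanced_parens` validator is a toy stand-in; in practice a chemistry toolkit check would be used instead, for example testing whether RDKit's `Chem.MolFromSmiles` returns a molecule rather than `None`.

```python
from typing import Callable, Dict, Iterable, Sequence


def generation_metrics(
    samples: Sequence[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Validity (A), uniqueness (B), novelty (C), and their product."""
    if not samples:
        raise ValueError("samples must not be empty")
    train = set(training_set)
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples)
    # Uniqueness: distinct valid samples over all valid samples.
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    # Novelty: valid samples that do not occur in the training set.
    novelty = sum(s not in train for s in valid) / len(valid) if valid else 0.0
    return {
        "validity": validity,
        "uniqueness": uniqueness,
        "novelty": novelty,
        "novel_per_sample": validity * uniqueness * novelty,
    }


def balanced_parens(smiles: str) -> bool:
    """Toy validity check (hypothetical): non-empty with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and len(smiles) > 0


samples = ["CCO", "CCO", "C(C", "c1ccccc1", "CC(=O)O"]
metrics = generation_metrics(samples, training_set={"CCO"}, is_valid=balanced_parens)

# The product can also be checked directly against a table row, e.g. ARAE:
arae_novel_per_sample = round(0.862 * 0.935 * 0.371, 3)
```

For the toy list, four of five strings pass the check (A = 0.8), three of those four are distinct (B = 0.75), and two of the four are absent from the training set (C = 0.5), so Novel/Sample is about 0.3. The ARAE product rounds to 0.299, matching the value reported in the table.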
Performance of CRAG on Conditional Molecular Generation on the ZINC dataset, where the three conditions of logP, SAS, and TPSA are simultaneously controlled.
| Condition | Validity (A) | Uniqueness (B) | Novelty (C) |
| (1.5, 2.0, 30) | 0.913 | 0.937 | 0.999 |
| (1.5, 2.0, 100) | 0.834 | 0.992 | 0.999 |
| (1.5, 5.0, 30) | 0.892 | 0.995 | 1.000 |
| (1.5, 5.0, 100) | 0.603 | 1.000 | 1.000 |
| (4.5, 2.0, 30) | 0.894 | 0.998 | 1.000 |
| (4.5, 2.0, 100) | 0.826 | 1.000 | 1.000 |
| (4.5, 5.0, 30) | 0.579 | 1.000 | 1.000 |
| (4.5, 5.0, 100) | 0.273 | 1.000 | 1.000 |
FIGURE 6. Chemical property optimization. Given the left-most molecule, we optimize the molecule in the direction of maximizing its QED property.