Lvwei Wang1, Rong Bai1, Xiaoxuan Shi1, Wei Zhang2, Yinuo Cui1, Xiaoman Wang1, Cheng Wang1, Haoyu Chang1, Yingsheng Zhang1, Jielong Zhou1, Wei Peng2, Wenbiao Zhou3, Bo Huang4.
Abstract
We report for the first time the use of experimental electron density (ED) as training data for the generation of drug-like three-dimensional molecules based on the structure of a target protein pocket. Much as a structural biologist builds molecules into their ED, our model has two main components: a generative adversarial network (GAN) that generates the ligand ED in the input pocket, and an ED interpretation module that generates molecules from it. The model was tested on three targets: a kinase (hematopoietic progenitor kinase 1), a protease (SARS-CoV-2 main protease), and a nuclear receptor (vitamin D receptor), and evaluated with a reference dataset of over 8000 compounds whose activities have been reported in the literature. The evaluation considered chemical validity, chemical space distribution-based diversity, and similarity to reference active compounds in terms of molecular structure and pocket-binding mode. Our model can generate molecules with structures similar to classical active compounds, as well as novel compounds that share similar binding modes with active compounds, making it a promising tool for library generation in support of high-throughput virtual screening. The generated ligand ED can also be used to support fragment-based drug design. Our model is available as an online service to academic users via https://edmg.stonewise.cn/#/create.
Year: 2022 PMID: 36068257 PMCID: PMC9448726 DOI: 10.1038/s41598-022-19363-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Model architecture. (a) The GAN for generating filler ED based on pocket ED. (b) ED interpretation module for molecule generation. The panel illustrates VQ-VAE2 and PixelCNN, used for latent space construction and autoregressive sampling, and the subsequent process of ED fragment substitution. (c) The evaluation framework for generated molecules.
Figure 2. ED-based 3D molecule generation for HPK1. (a) Binding pocket and reference ligand (PDB code 7KAC). Experimental ED (2Fo-Fc map at 1.2 σ contour level) for the pocket and ligand is shown as blue mesh. (b) Pocket ED with ligand removed. (c) Generated filler ED. In the rainbow color scheme, red indicates strong ED intensity and blue indicates weak ED intensity. Red arrows indicate where the generated ED extends into regions originally occupied by water molecules and into originally unoccupied cavities. (d) Reconstructed ED generated based on filler ED. (e) Hinge binding fragments fitted in the reconstructed ED; map skeletons shown as white lines. (f) Examples of generated molecules and their map skeletons aligned with reconstructed ED. (g) List of examples of generated molecules.
Evaluation of molecular validity.
For each target, 10,000 molecules were generated by our model and by the benchmark model. ^a BM refers to a model[27] reported at NeurIPS 2021 and used as the benchmark here. ^b Reference compounds (Supporting information Table S1). Color scheme: cells for reference compounds are yellow. For generated molecules, a cell is green if its value is better than or equal to that of the reference and red otherwise. For QED, higher values are better; for SAS, lower values are better.
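The cell-coloring rule in this table note can be written as a small comparison function. The sketch below is only an illustration: the function name `cell_color`, the lookup table, and the numeric values are hypothetical, and only the two metric directions stated in the note (QED higher is better, SAS lower is better) are encoded.

```python
# Direction of "better" for the two metrics named in the table note.
# Other metrics are omitted because the note does not state their direction.
HIGHER_IS_BETTER = {"QED": True, "SAS": False}


def cell_color(metric: str, value: float, reference: float) -> str:
    """Return 'green' if value is better than or equal to the reference, else 'red'."""
    if HIGHER_IS_BETTER[metric]:
        is_good = value >= reference
    else:
        is_good = value <= reference
    return "green" if is_good else "red"


# Illustrative numbers only; they are not taken from the paper.
qed_color = cell_color("QED", value=0.60, reference=0.50)
sas_color = cell_color("SAS", value=3.5, reference=3.0)
tie_color = cell_color("SAS", value=3.0, reference=3.0)
```

A value equal to the reference counts as green, matching the "better than or equal to" wording of the note.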
Figure 3. Chemical space distribution of generated molecules and references. A 120 × 120 SOM was created using SMU-RUL[28] compounds. The color indicates the number of molecules on a logarithmic scale. The distribution of molecules from PubChem is shown in panel (a). The distributions of reference active compounds and of molecules generated using different models for HPK1, 3CLpro, and VDR are shown in panels (b–d), respectively. The number of molecules generated by our model is adjusted to match the number that the benchmark model can generate within a reasonable time frame. The SMILES of the molecules and their positions in the chemical space are provided in Supplementary material 3.
Figure 4. Examples of generated molecules that are similar to reference compounds for HPK1.
Test for the generation of molecules similar to classical active compounds.
| Metrics | HPK1: Active | HPK1: Medium | HPK1: Not active | 3CLpro: Active | 3CLpro: Medium | 3CLpro: Not active | VDR: Active | VDR: Medium | VDR: Not active |
|---|---|---|---|---|---|---|---|---|---|
| # of reference compounds | 3847 | 2319 | 168 | 222 | 248 | 631 | 329 | 67 | 361 |
| # of molecules generated (per target) | 1 million | | | 1 million | | | 1 million | | |
| Max. Tanimoto similarity to ref. cpd | 0.76 | 0.72 | 0.51 | 0.58 | 0.73 | 0.61 | 0.57 | 0.44 | 0.55 |
| # of reference molecules with similar^a counterparts generated | 58 | 53 | 1 | 3 | 15 | 11 | 1 | 0 | 7 |
| # of generated molecules similar to ref. cpd | 30 | 20 | 2 | 14 | 158 | 42 | 1 | 0 | 10 |
^a Similar: a reference compound is considered to have similar counterparts generated if at least one generated molecule has a Tanimoto similarity above 0.5 with it, computed on ECFP4 fingerprints. The SMILES of the generated molecules and their similar reference compounds are provided in Supplementary material 2.
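The three similarity rows of this table follow from the footnote's definition and can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: fingerprints are represented as plain sets of on-bit indices (in practice these would come from an ECFP4 implementation in a cheminformatics toolkit), and the names `tanimoto`, `similarity_summary`, and all identifiers and bit values are hypothetical.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_summary(
    references: Dict[str, Set[int]],
    generated: Dict[str, Set[int]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Compute the three similarity metrics reported in the table for one activity class."""
    max_similarity = 0.0
    refs_with_counterpart = set()   # reference compounds with a similar generated molecule
    similar_generated = set()       # generated molecules similar to some reference compound
    for ref_id, ref_fp in references.items():
        for gen_id, gen_fp in generated.items():
            sim = tanimoto(ref_fp, gen_fp)
            max_similarity = max(max_similarity, sim)
            if sim > threshold:     # "over 0.5" in the footnote, so strictly greater
                refs_with_counterpart.add(ref_id)
                similar_generated.add(gen_id)
    return {
        "max_similarity": max_similarity,
        "n_refs_with_similar": len(refs_with_counterpart),
        "n_generated_similar": len(similar_generated),
    }


# Toy fingerprints (hypothetical bit indices, not real ECFP4 output).
references = {"ref_1": {1, 2, 3, 4}, "ref_2": {10, 11, 12}}
generated = {"gen_1": {1, 2, 3, 5}, "gen_2": {1, 2, 3, 4, 6}, "gen_3": {20, 21}}
summary = similarity_summary(references, generated)
```

The two counts differ because several generated molecules can resemble the same reference compound, and one generated molecule can resemble several references. The all-pairs loop is kept for clarity; at the scale of one million generated molecules, a vectorized or indexed search would be needed.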
Figure 5. Binding mode analysis of the generated molecules for HPK1. (a) Glide Score distribution for active reference compounds and generated molecules. (b) Results of t-SNE clustering using IGM-calculated NCIs as features. Red circles indicate molecules generated by our model that have novel cyclic skeletons[33] (CSK) with respect to the reference compounds. A generated molecule is defined as having a novel CSK if the highest Tanimoto similarity between its CSK and the CSKs of all reference compounds is less than 0.5. (c) Binding modes of selected molecules generated by our model. (d) Binding modes of selected molecules generated by the benchmark model. In panels (c) and (d), NCI regions are indicated with dots colored using the rainbow scheme, in which blue indicates weak interactions and red indicates strong interactions. More details are provided in Supplementary materials 4, 5, and 6.