De Novo Design of Bioactive Small Molecules by Artificial Intelligence.

Daniel Merk¹, Lukas Friedrich¹, Francesca Grisoni¹,², Gisbert Schneider¹.

Abstract

Generative artificial intelligence offers a fresh view on molecular design. We present the first-time prospective application of a deep learning model for designing new druglike compounds with desired activities. For this purpose, we trained a recurrent neural network to capture the constitution of a large set of known bioactive compounds represented as SMILES strings. By transfer learning, this general model was fine-tuned on recognizing retinoid X and peroxisome proliferator-activated receptor agonists. We synthesized five top-ranking compounds designed by the generative model. Four of the compounds revealed nanomolar to low-micromolar receptor modulatory activity in cell-based assays. Apparently, the computational model intrinsically captured relevant chemical and biological knowledge without the need for explicit rules. The results of this study advocate generative artificial intelligence for prospective de novo molecular design, and demonstrate the potential of these methods for future medicinal chemistry.

Keywords: Automation; drug discovery; machine learning; medicinal chemistry; nuclear receptor

Year: 2018 PMID: 29319225 PMCID: PMC5838524 DOI: 10.1002/minf.201700153

Source DB: PubMed Journal: Mol Inform ISSN: 1868-1743 Impact factor: 3.353

Computational de novo design aims to generate new chemical entities with desired properties [1]. Several such methodologies exist, differing mainly in how chemical structures are generated and which scoring methods are employed [2,3]. Recently, an innovative concept of de novo molecular design has been proposed that relies on generative artificial intelligence (AI). It promises to learn from known bioactive compounds and to autonomously design novel compounds with inherited bioactivity and synthesizability (Figure 1) [4,5]. Importantly, these generative methods are expected to produce chemically correct structures without explicitly including building block libraries or rules for their fusion and chemical transformation. Until now, however, generative AI has only been applied to retrospective de novo design, by reproducing known bioactive ligands or generating predicted actives. In this first prospective study, we test whether the approach lives up to its promise of delivering synthesizable, bioactive de novo designs.

Figure 1

Concept of generative artificial intelligence (AI). A model of the training data (e.g., molecular structures) is obtained that can be used to emit new instances (new chemical entities) within the training domain by sampling.

The computational approach consisted of two basic steps. First, we developed a generic model that learned the constitution of druglike molecules from a large unfocussed compound set. In a second step, we fine-tuned this generic model on more specific molecular features from a small target-focused library of actives.

For the generic model, we utilized a recently published deep recurrent neural network (RNN) with long short-term memory (LSTM) cells [6], which had been trained on SMILES representations of 541,555 bioactive compounds (KD, Ki, IC50/EC50 values <1 μM) extracted from the ChEMBL22 [7] compound database [5]. Then, we fine-tuned the model by transfer learning to enable the de novo generation of target-specific ligands. For this purpose, we used 25 fatty acid mimetics [8] with known agonistic activity on retinoid X receptors (RXR) [9] and/or peroxisome proliferator-activated receptors (PPAR) [10].

From the resulting fine-tuned AI model, we sampled 1000 SMILES strings, applying fragment growing from the minimalist start fragment "−COOH". The generated set included 93% valid and 90% unique SMILES entries, all of which contained a carboxylic acid function by default. None of the computer-generated chemical structures was identical to compounds from the training sets. Importantly, the newly generated molecules populate the chemical space of the training data, residing within the RXR/PPAR region of the fine-tuning set (Figure 2). These observations corroborate the ability of the generative AI model to produce novel chemical entities within the training data domain.
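The validity, uniqueness and novelty figures reported for the sampled SMILES strings can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the source does not state how the percentages were normalised (here every fraction is relative to the number of sampled strings), and `crude_smiles_check` is only a rough syntactic stand-in for a real chemical validity test, which would normally come from a cheminformatics toolkit. Novelty by plain string comparison is also only meaningful if both sets use the same canonical SMILES form.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def crude_smiles_check(smiles: str) -> bool:
    """Rough syntactic check (assumption, NOT a chemical validity test).

    Accepts a string if parentheses and square brackets are balanced and
    every ring-closure digit outside brackets occurs an even number of times.
    """
    if not smiles:
        return False
    depth = 0
    in_bracket = False
    ring_labels: Counter = Counter()
    for ch in smiles:
        if ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif in_bracket:
            continue  # digits inside [...] are isotopes, charges or H counts
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            ring_labels[ch] += 1
    rings_closed = all(n % 2 == 0 for n in ring_labels.values())
    return depth == 0 and not in_bracket and rings_closed


def sampling_metrics(
    sampled: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = crude_smiles_check,
) -> Dict[str, float]:
    """Fractions of valid, unique and novel strings among the sampled ones."""
    sampled = list(sampled)
    if not sampled:
        raise ValueError("no sampled strings given")
    valid = [s for s in sampled if is_valid(s)]
    unique = set(valid)               # duplicates collapse here
    novel = unique - set(training)    # not seen in the training data
    n = len(sampled)
    return {
        "valid": len(valid) / n,
        "unique": len(unique) / n,
        "novel": len(novel) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy data; not compounds from the study.
    toy_samples = ["OC(=O)c1ccccc1", "OC(=O)c1ccccc1", "OC(=O)CCC",
                   "OC(=O)c1ccc(", "OC(=O)C1CC"]
    toy_training = ["OC(=O)CCC"]
    print(sampling_metrics(toy_samples, toy_training))
```

Passing a different `is_valid` callable lets the same bookkeeping run on top of a proper SMILES parser without changing the rest of the code.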

Figure 2

Chemical space analysis by multi-dimensional scaling. Compounds were represented by Morgan substructure fingerprints (radius=0–4 bonds, length=1024 bit), and similarity was defined by the Jaccard-Tanimoto index. Colored dots represent the training data (light grey), fine-tuning set (green), known RXR (orange) and PPAR (blue) agonists, sampled molecules (dark grey), and the selected de novo designs 1–5 (red). Compounds 1, 2, 3 and 5 populate the same area as the known RXR and PPAR agonists, while 4 is similar to PPAR agonists but remote from known RXR actives.

Following this preliminary analysis, we computationally ranked the de novo designs according to their potential modulatory effects on RXRs and PPARs. For this purpose, we employed a target prediction method (SPiDER) [11], together with molecular shape and partial charge descriptors, to determine the similarity of the designed compounds to known bioactive ligands. The individual screening lists were merged, yielding a final set of 49 high-scoring designs (Supplementary Information). For proof-of-concept, we selected five compounds (1–5, Scheme 1) from this list for synthesis, taking into account their individual in silico ranks and building block availability. These five chemical entities were not present in the ChEMBL [7], PubChem [12], SureChEMBL [13], Reaxys [14] and SciFinder [15] databases, indicating their novelty.
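The Jaccard-Tanimoto index named in the caption of Figure 2 can be sketched in a few lines of Python. This is an illustrative sketch only: fingerprints are represented as sets of on-bit indices, the toy bit sets and compound names are hypothetical, and using 1 − similarity as the dissimilarity passed to multi-dimensional scaling is an assumption, since the source does not spell out that step.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Jaccard-Tanimoto index of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def distance_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Pairwise dissimilarities (1 - Tanimoto), a common input to MDS."""
    return [[1.0 - tanimoto(x, y) for y in fps] for x in fps]


def nearest_neighbour(
    query: Set[int], reference: Dict[str, Set[int]]
) -> Tuple[str, float]:
    """Name and similarity of the reference fingerprint closest to the query."""
    scored = ((name, tanimoto(query, fp)) for name, fp in reference.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical on-bit indices within a 1024-bit fingerprint.
    design = {3, 17, 42, 256}
    known = {"agonist_A": {3, 17, 42, 900}, "agonist_B": {5, 17, 700}}
    print(nearest_neighbour(design, known))
    print(distance_matrix([design, *known.values()]))
```

A design that lands close to known agonists in such a distance matrix would also appear near them in a two-dimensional projection, which is the relationship the figure visualises.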

Scheme 1

Synthesis of designs 1–5. Reagents & conditions: (a) H2N−C6H4−COOH (7), EDC, 4-DMAP, THF, reflux, 4 h; (b) C6H5−B(OH)2 (9), Pd(PPh3)4, Cs2CO3, dioxane, 100 °C, 16 h; (c) KOH, MeOH/THF/H2O, μw, 70 °C, 30 min; (d) HO-C6H3F−B(OH)2 (12), Pd(PPh3)4, Cs2CO3, toluene/EtOH, 100 °C, 20 h; (e) F-C6H4-CH2-Br (15), K2CO3, DMF, μw, 100 °C, 120 min; (f) MeOH, conc. H2SO4, reflux, 4 h; (g) C5H9Br (18), K2CO3, DMF, μw, 100 °C, 6 h; (h) HO-C6H4-B(OH)2 (20), Pd(PPh3)4, Cs2CO3, toluene/EtOH, 100 °C, 16 h; (i) C6H4Cl-C6H4-COOH (24), EDC, 4-DMAP, CHCl3, reflux, 12 h; (j) C6H3Br(OH)2 (27), Pd(PPh3)4, Cs2CO3, dioxane/DMF, reflux, 4 h; (k) malonic acid, pyridine/piperidine, μw, 100 °C, 30 min.

Compounds 1–5 were prepared over two to four steps (Scheme 1). Amide coupling of 5-bromothiophene-2-carboxylic acid (6) and methyl 3-aminobenzoate (7) using EDC/4-DMAP gave 8; Suzuki reaction with benzeneboronic acid (9) to 10 and alkaline ester hydrolysis then afforded compound 1. Compound 2 was available from 4-bromo-3-trifluoromethylbenzoic acid (11) and 2-hydroxy-5-fluorobenzeneboronic acid (12), forming 13 in a Suzuki reaction, followed by Williamson ether synthesis to 14 with excess 4-fluorobenzyl bromide (15) and subsequent hydrolysis of the resulting ester. For the preparation of compound 3, 4-bromosalicylic acid (16) was esterified with methanol (17) and reacted with bromocyclopentane (18) to form ether 19. Suzuki reaction of 19 with 3-hydroxybenzeneboronic acid (20) to 21 and alkaline ester hydrolysis yielded 3. Compound 4 was obtained from 3-(4-aminophenyl)propionic acid (22) by esterification (23), amide coupling with 2-(4-chlorophenyl)benzoic acid (24) to 25 using EDC/4-DMAP, and alkaline ester hydrolysis. Suzuki reaction of 4-formylphenylboronic acid (26) and 5-bromoresorcinol (27) to 28, followed by Knoevenagel condensation in Doebner modification with malonic acid, afforded compound 5.

We then characterized designs 1–5 in hybrid reporter gene assays for their agonistic effects on nuclear receptors RXRα/β/γ and PPARα/γ/δ in HEK293T cells [16]. These in vitro tests involved constitutively expressed hybrid receptors composed of the ligand binding domain of the respective human nuclear receptor and the DNA-binding domain of the nuclear receptor Gal4 from yeast. Gal4-responsive firefly luciferase served as reporter gene, and constitutively expressed Renilla luciferase was used for normalization of transfection efficiency and as a toxicity control.

The in vitro characterization of 1–5 revealed agonistic activity on PPAR and RXR subtypes (Table 1). Four of the compounds were active, and for each receptor studied, we identified at least one agonist. Designs 1 and 2 proved to be dual agonists of RXRs and PPARγ, whereas 3 and 4 each activated two PPAR subtypes but were inactive on RXRs. Only design 5 showed neither RXR nor PPAR transactivation activity. EC50 values of 1–4 ranged from double-digit nanomolar for RXR agonist 1, despite moderate transactivation efficacy, to double-digit micromolar for design 4 on PPARδ. Design 2 revealed micromolar potency on RXRs but markedly higher transactivation efficacy than 1. With regard to PPARγ, design 2 showed micromolar agonistic activity with efficacy equivalent to that of the reference agonist pioglitazone. Design 3 behaved as a micromolar superagonist on PPARγ, with about 2.5-fold greater transactivation efficacy than pioglitazone. Design 4 was the least potent compound and showed partial agonistic activity on both PPARγ and PPARδ.
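The activity profiles described above can be read directly off the EC50 values in Table 1. The following Python sketch encodes the table (mean EC50 in μM, with `None` standing for "inactive") and derives the per-design receptor profile; the data structure and function names are hypothetical and serve only as an illustration, and the ± SEM values are omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

RECEPTORS = ["RXRα", "RXRβ", "RXRγ", "PPARα", "PPARγ", "PPARδ"]

# Mean EC50 values in μM copied from Table 1; None encodes "inactive".
TABLE_1: Dict[int, List[Optional[float]]] = {
    1: [0.13, 1.1, 0.06, None, 2.3, None],
    2: [13.0, 9.0, 8.0, None, 2.8, None],
    3: [None, None, None, 4.0, 10.1, None],
    4: [None, None, None, None, 9.0, 14.0],
    5: [None, None, None, None, None, None],
}


def active_receptors(design: int) -> List[str]:
    """Receptors on which the given design showed agonistic activity."""
    return [r for r, ec50 in zip(RECEPTORS, TABLE_1[design]) if ec50 is not None]


def most_potent(design: int) -> Optional[Tuple[str, float]]:
    """Receptor with the lowest EC50 for a design, or None if inactive."""
    hits = [(ec50, r) for r, ec50 in zip(RECEPTORS, TABLE_1[design])
            if ec50 is not None]
    if not hits:
        return None
    ec50, receptor = min(hits)
    return receptor, ec50


def agonists_per_receptor() -> Dict[str, List[int]]:
    """For every receptor, the designs that activated it."""
    return {
        receptor: [d for d, row in TABLE_1.items() if row[i] is not None]
        for i, receptor in enumerate(RECEPTORS)
    }


if __name__ == "__main__":
    for design in TABLE_1:
        print(design, active_receptors(design), most_potent(design))
    print(agonists_per_receptor())
```

Run on the tabulated values, this reproduces the statements in the text: four of five designs are active, designs 1 and 2 hit all RXR subtypes plus PPARγ, designs 3 and 4 each hit two PPAR subtypes, and every receptor has at least one agonist.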

Table 1

Compound no.	RXRα	RXRβ	RXRγ	PPARα	PPARγ	PPARδ
1	0.13±0.01	1.1±0.3	0.06±0.02	inactive	2.3±0.2	inactive
2	13.0±0.1	9±2	8.0±0.7	inactive	2.8±0.3	inactive
3	inactive	inactive	inactive	4.0±1.0	10.1±0.3	inactive
4	inactive	inactive	inactive	inactive	9±3	14±2
5	inactive	inactive	inactive	inactive	inactive	inactive
reference agonists^a)	0.033±0.002	0.024±0.004	0.025±0.002	0.006±0.002	0.6±0.1	0.5±0.1

a) Reference agonists, literature data: bexarotene [17] for RXRs, GW7647 [18] for PPARα, pioglitazone [19] for PPARγ, L165,041 [19] for PPARδ.

In vitro activity of designs 1–5 on RXRs and PPARs (EC50 values ± SEM [μM]; n=2 (when inactive) or 4 (when active) independent experiments in duplicates; inactive, no statistically significant reporter transactivation at a compound concentration of 30 μM).

To exclude unspecific effects, we repeated the in vitro assays in the absence of a hybrid receptor for every active molecule, using a concentration at or above its EC80 value. This time, only the reporter gene and control luciferase, but no hybrid receptor, were transfected. Designs 1–4 caused no observable reporter transactivation without a hybrid receptor, confirming that their activity was actually mediated via RXRs and PPARs, respectively (Supplementary Information).

These results experimentally validate the applicability of generative AI to prospective de novo molecule design. The computational approach led to the discovery of new agonists of therapeutically relevant nuclear receptors. The bioactive designs 1–4 possess considerable potency, as well as diverse selectivity profiles on RXRs and PPARs, and may serve as starting points for hit-to-lead expansion. All of the selected compounds were easily prepared from commercially available building blocks, suggesting that their chemical synthesizability was intrinsically learned by the computer model. The results also suggest that a proper choice of compound libraries for model fine-tuning by transfer learning enables application-tailored AI support for de novo design. This particular concept might even be suitable for concerted multi-target drug design. By providing rapid knowledge-driven access to innovative small molecules, generative AI bears potential for medicinal chemistry and chemical biology.

Conflict of Interest

G. S. declares a potential financial conflict of interest in his role as life-science industry consultant and cofounder of inSili.com GmbH, Zurich.

16 in total

Review 1. The PPARs: from orphan receptors to drug discovery.

Authors: T M Willson; P J Brown; D D Sternbach; B R Henke
Journal: J Med Chem Date: 2000-02-24 Impact factor: 7.446

Review 2. De novo drug design.

Authors: Markus Hartenfeller; Gisbert Schneider
Journal: Methods Mol Biol Date: 2011

Review 3. Computer-based de novo design of drug-like molecules.

Authors: Gisbert Schneider; Uli Fechner
Journal: Nat Rev Drug Discov Date: 2005-08 Impact factor: 84.694

4. Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus.

Authors: Daniel Reker; Tiago Rodrigues; Petra Schneider; Gisbert Schneider
Journal: Proc Natl Acad Sci U S A Date: 2014-03-03 Impact factor: 11.205

Review 5. International Union of Pharmacology. LXI. Peroxisome proliferator-activated receptors.

Authors: Liliane Michalik; Johan Auwerx; Joel P Berger; V Krishna Chatterjee; Christopher K Glass; Frank J Gonzalez; Paul A Grimaldi; Takashi Kadowaki; Mitchell A Lazar; Stephen O'Rahilly; Colin N A Palmer; Jorge Plutzky; Janardan K Reddy; Bruce M Spiegelman; Bart Staels; Walter Wahli
Journal: Pharmacol Rev Date: 2006-12 Impact factor: 25.468

Review 6. Opportunities and Challenges for Fatty Acid Mimetics in Drug Discovery.

Authors: Ewgenij Proschak; Pascal Heitel; Lena Kalinowsky; Daniel Merk
Journal: J Med Chem Date: 2017-03-27 Impact factor: 7.446

7. Synthesis and structure-activity relationships of novel retinoid X receptor-selective retinoids.

Authors: M F Boehm; L Zhang; B A Badea; S K White; D E Mais; E Berger; C M Suto; M E Goldman; R A Heyman
Journal: J Med Chem Date: 1994-09-02 Impact factor: 7.446

Review 8. International Union of Pharmacology. LXIII. Retinoid X receptors.

Authors: Pierre Germain; Pierre Chambon; Gregor Eichele; Ronald M Evans; Mitchell A Lazar; Mark Leid; Angel R De Lera; Reuben Lotan; David J Mangelsdorf; Hinrich Gronemeyer
Journal: Pharmacol Rev Date: 2006-12 Impact factor: 25.468

9. Molecular de-novo design through deep reinforcement learning.

Authors: Marcus Olivecrona; Thomas Blaschke; Ola Engkvist; Hongming Chen
Journal: J Cheminform Date: 2017-09-04 Impact factor: 5.514

10. SureChEMBL: a large-scale, chemically annotated patent document database.

Authors: George Papadatos; Mark Davies; Nathan Dedman; Jon Chambers; Anna Gaulton; James Siddle; Richard Koks; Sean A Irvine; Joe Pettersson; Nicko Goncharoff; Anne Hersey; John P Overington
Journal: Nucleic Acids Res Date: 2015-11-17 Impact factor: 16.971
