Takehiro Fujita, Kei Terayama, Masato Sumita, Ryo Tamura, Yasuyuki Nakamura, Masanobu Naito, Koji Tsuda.
Abstract
Recently, artificial intelligence (AI)-enabled de novo molecular generators (DNMGs) have automated molecular design based on data-driven or simulation-based property estimates. In domains such as the game of Go, where AI has surpassed human players, humans are now trying to learn the best strategies from AI. To understand a DNMG's strategy for molecule optimization, we propose an algorithm called characteristic functional group monitoring (CFGM). Given a time series of generated molecules, CFGM monitors the functional groups that are statistically enriched relative to the training data. In the task of maximizing the absorption wavelength of pure organic molecules (consisting of H, C, N, and O), we identified a strategic change from diketone and aniline derivatives to quinone derivatives. In addition, CFGM led us to the hypothesis that 1,2-quinone is an unconventional chromophore, which was verified by chemical synthesis. This study shows that human experts may be able to learn from DNMGs and so expand their ability to discover functional molecules.
Keywords: De novo molecule generation; characteristic functional group monitoring; chromophore; deep learning
Year: 2022; PMID: 35693890; PMCID: PMC9176351; DOI: 10.1080/14686996.2022.2075240
Source DB: PubMed; Journal: Sci Technol Adv Mater; ISSN: 1468-6996; Impact factor: 7.821
Figure 1. Evolution of molecular properties in the series of generated molecules. (a) Absorption wavelength (nm) to the S1 excited state, (b) HOMO/LUMO gap (eV), (c) absorption intensity (oscillator strength; OS), (d) molecular weight (g mol⁻¹), (e) conjugate length, (f) number of aromatic rings. Average values for the training and generated molecules at each step are shown by a green dashed line and a blue solid line, respectively. The shaded areas show the distribution of generated molecules for each property: the lightly shaded area covers 5%–95% of the distribution, and the densely shaded area covers 15%–75% of the distribution, at each number of generated molecules.
Table 1. Functional group enrichment analysis: the percentage of generated molecules and of training data that contain each functional group. The odds ratio is given as PE.
| Functional group | PE | Generated mol. (%) | Training data (%) |
|---|---|---|---|
| | 0.649 | 49.7 | 76.6 |
| | 0.375 | 0.847 | 2.26 |
| | 0.537 | 16.2 | 30.1 |
| | 2.37 | 0.878 | 0.371 |
| | 15.5 | 1.05 | 0.668 |
| | 31.5 | 0.682 | 0.0217 |
| | 78.5 | 0.0486 | 0.000619 |
| | 0.938 | 0.0110 | 0.0118 |
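To make the PE column easier to read against the two percentage columns, here is a minimal Python sketch of two common enrichment measures computed from a pair of percentages: the plain ratio of percentages and the textbook odds ratio. The function names `enrichment_ratio` and `odds_ratio` are hypothetical, and this record does not state which exact definition the authors used, so treat both as assumptions for illustration rather than the authors' method.

```python
def enrichment_ratio(generated_pct: float, training_pct: float) -> float:
    """Plain ratio of the two percentages (hypothetical helper)."""
    return generated_pct / training_pct


def odds_ratio(generated_pct: float, training_pct: float) -> float:
    """Textbook odds ratio, odds(generated) / odds(training) (hypothetical helper)."""
    g = generated_pct / 100.0  # fraction of generated molecules with the group
    t = training_pct / 100.0   # fraction of training molecules with the group
    return (g / (1.0 - g)) / (t / (1.0 - t))


# (generated %, training %) pairs copied from the first two table rows
rows = [(49.7, 76.6), (0.847, 2.26)]
for gen, train in rows:
    print(round(enrichment_ratio(gen, train), 3), round(odds_ratio(gen, train), 3))
```

For the first row (49.7% versus 76.6%), the plain ratio comes out at about 0.649, which is the value tabulated as PE. The two measures are close when both percentages are small and diverge when the group is common.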
Figure 2. Evolution of the odds ratio of several functional groups shown in Table 1 as a function of the number of generated molecules.
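A curve of this kind can be sketched in Python as a running enrichment value recomputed as more molecules are generated. The function `running_enrichment` and its parameters are hypothetical. The sketch assumes cumulative counts, a plain ratio of fractions, and precomputed per-molecule flags saying whether the functional group is present (in practice these would come from substructure matching). The record does not say whether the authors used cumulative or windowed counts.

```python
from typing import Iterable, List, Tuple


def running_enrichment(
    flags: Iterable[bool], training_fraction: float, step: int
) -> List[Tuple[int, float]]:
    """Return (n_generated, ratio) pairs, sampled every `step` molecules.

    ratio = (cumulative fraction of generated molecules with the group)
            / (fraction of training molecules with the group)
    """
    curve = []
    hits = 0
    for n, has_group in enumerate(flags, start=1):
        hits += bool(has_group)
        if n % step == 0:
            curve.append((n, (hits / n) / training_fraction))
    return curve


# Toy series: the group becomes more frequent later in the run (made-up data).
flags = [False, False, True, False, True, True, True, True]
curve = running_enrichment(flags, training_fraction=0.25, step=4)
print(curve)
```

In this toy series the ratio rises between the two sample points, which is the kind of shift in functional group usage that such a plot is meant to reveal.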
Figure 3. Target molecule inspired by ChemTS. (a) Molecule 1 was generated by ChemTS; molecules 2–4 are synthesis models of 1. The absorption wavelength of each molecule is estimated at the APFD/6-31G* level. The HOMO and LUMO surfaces of 4 are drawn at an isodensity value of 0.02. (b) Retrosynthesis of 4.
Figure 4. Synthetic route and UV-vis spectrum of molecule 4. (a) Synthesis of 4. (b) The black and blue curves are the UV-vis absorption spectra of 4 and of 1,2-naphthoquinone, respectively, in acetonitrile (1 × 10⁻⁵ mol L⁻¹). The red curve shows the computed absorption spectrum of 4 obtained by a TD-DFT calculation at the APFD/6-31G* level. A photograph of a solution of 4 in acetonitrile (1 × 10⁻⁴ mol L⁻¹) under ambient light is also shown.