Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies.

Masato Sumita, Xiufeng Yang, Shinsuke Ishihara, Ryo Tamura, Koji Tsuda.

Abstract

This work presents a proof-of-concept study in artificial-intelligence-assisted (AI-assisted) chemistry where a machine-learning-based molecule generator is coupled with density functional theory (DFT) calculations, synthesis, and measurement. Although deep-learning-based molecule generators have shown promise, it is unclear to what extent they can be useful in real-world materials development. To assess the reliability of AI-assisted chemistry, we prepared a platform using a molecule generator and a DFT simulator, and attempted to generate novel photofunctional molecules whose lowest excited states lie at desired energy levels. A 10 day run on a 12-core server discovered 86 potential photofunctional molecules around the target lowest excitation levels, designated as 200, 300, 400, 500, and 600 nm. Among the molecules discovered, six were synthesized, and five were confirmed to reproduce DFT predictions in ultraviolet–visible (UV–vis) absorption measurements. This result shows the potential of AI-assisted chemistry to discover ready-to-synthesize novel molecules with modest computational resources.

Year:  2018        PMID: 30276245      PMCID: PMC6161049          DOI: 10.1021/acscentsci.8b00213

Source DB:  PubMed          Journal:  ACS Cent Sci        ISSN: 2374-7943            Impact factor:   14.553


Introduction

The idea of using artificial intelligence (AI) for molecule design has existed for a long time but has never been fully realized. Although earlier attempts employing relatively simple methods such as heuristic enumeration[1] and genetic algorithms[2] yielded some success, these methods rely on arbitrary chemical rules. Stimulated by recent breakthroughs in deep learning,[3] a new generation of de novo molecule design algorithms emerged and showed remarkable ability to generate functional molecules without chemical rules.[4−8] Similar machine-learning algorithms have been used in designing inorganic materials as well.[9−12] Figure 1 illustrates our AI-assisted chemistry platform to develop new molecules. It generates a large number of molecules using the loop of a machine-learning-based molecule generator and a quantum chemical package such as GAUSSIAN,[13] GAMESS,[14] or NWChem.[15] It has been shown repeatedly that these methods can generate simulator-qualified molecules, i.e., molecules that are predicted to have the desired properties by a simulator. To what extent this can be useful to real-world materials development remains, however, largely unknown.
Figure 1

Our AI-assisted chemistry platform for discovering new functional molecules. The Android robot is reproduced or modified from work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.

In this work, we conducted a proof-of-concept study to evaluate whether an AI-assisted chemistry platform can discover synthesizable, functional molecules in a reasonable computational time. As a testbed, we chose photofunctional organic molecules, which have received particular attention in green chemistry and molecular sensing. In photofunctional molecules, light induces transitions between electronic states. Controlling the energy of the excited states relative to the ground state is a common challenge in organic electronics (such as organic light-emitting diodes[16,17] and organic photovoltaic cells[18,19]), photofunctional sensors,[20] and UV filters.[21] Our platform, consisting of ChemTS (a molecule generator)[4] and a calculator (B3LYP/3-21G*) based on density functional theory (DFT),[22] was configured to generate molecules whose first excited state lies at one of five target wavelengths. A 10 day (240 h) run of our machine-learning algorithm on a 12-core (Intel Xeon E5-2689v3 CPU) server created a variety of molecules whose DFT-based wavelength was approximately at the desired value. Among them, six molecules were synthesized, and five of them were experimentally confirmed to have the desired wavelength using ultraviolet–visible (UV–vis) spectroscopy. This result shows that the molecules generated by an AI-assisted platform have a high chance of being synthesizable and functional.

As exemplified by AlphaGo,[23] an interesting aspect of AI is that it often finds unconventional ways to solve a problem. Our origin-of-excitation analysis of the synthesized molecules showed that our platform preferred n−π* excitation over π–π* excitation, which is conventionally used to control the wavelength.[24,25] This illustrates the ability of AI-assisted chemistry not only to accelerate discovery, but also to shed light on hidden paths of possible research.

Results and Discussion

Our platform was configured to find molecules whose first excited states lie at 200, 300, 400, 500, and 600 nm (6.2–2.1 eV). The recurrent neural network in ChemTS was trained a priori with 13 000 molecules. For each target wavelength, our platform ran for 2 days (48 h). The total numbers of molecules generated as SMILES strings[26] are summarized in Table 1 (the molecules included in ChemTS’s training set are not counted). Out of about 3200 molecules, 86 were found to be within ±20 nm of the desired wavelength through DFT calculation (Table 2). The six molecules marked with roman numerals (I–VI) were selected for synthesis according to the following criteria: (1) at least one synthetic route is reported in SciFinder;[27] (2) the oscillator strength obtained with time-dependent DFT (TD-DFT) is large enough to allow the transition from the ground state to the first excited state.
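The correspondence between the target wavelengths and the quoted energy range (6.2–2.1 eV) follows from E = hc/λ. The short sketch below only illustrates that conversion; the function and constant names are our own and are not taken from the platform's code.

```python
# Exact SI values of the physical constants (2019 redefinition).
PLANCK_J_S = 6.62607015e-34            # Planck constant, J s
LIGHT_SPEED_M_S = 299_792_458.0        # speed of light in vacuum, m/s
ELEMENTARY_CHARGE_C = 1.602176634e-19  # joules per electronvolt


def nm_to_ev(wavelength_nm: float) -> float:
    """Return the photon energy in eV for a vacuum wavelength given in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    energy_joule = PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return energy_joule / ELEMENTARY_CHARGE_C


# The five target wavelengths named in the text.
TARGETS_NM = (200, 300, 400, 500, 600)
energies = {target: nm_to_ev(target) for target in TARGETS_NM}

for target_nm, energy_ev in energies.items():
    print(f"{target_nm} nm -> {energy_ev:.2f} eV")
```

Rounded to one decimal place, the two end points give the 6.2 eV and 2.1 eV quoted above.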
Table 1

Number of Molecules at Different Qualification Levels for Each Target Wavelength

                           Target wavelength
                           200 nm   300 nm   400 nm   500 nm   600 nm
Generated (a)                 646      757      629      607      638
Simulator-qualified (b)        34       26       13       12        1
Synthesized (c)                 2        2        1        1        0
Functional (c)                  1        2        1        1        0

(a) The first row indicates the number of molecules generated by ChemTS.

(b) The second row shows the number of simulator-qualified molecules, i.e., those whose absorption wavelength is predicted by DFT to be within 20 nm of the target.

(c) The third and fourth rows denote the number of synthesized molecules and the number of those experimentally confirmed by UV–vis measurement, respectively.
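As a cross-check of the Table 1 counts against the totals quoted in the text (86 simulator-qualified, six synthesized, five functional), the sketch below tallies each row, reading every row as five per-target columns. The ±20 nm filter mirrors the qualification criterion in the text; the variable and function names are illustrative assumptions.

```python
TARGETS_NM = (200, 300, 400, 500, 600)

# Counts per target wavelength, read from Table 1.
TABLE_1 = {
    "generated": (646, 757, 629, 607, 638),
    "simulator_qualified": (34, 26, 13, 12, 1),
    "synthesized": (2, 2, 1, 1, 0),
    "functional": (1, 2, 1, 1, 0),
}


def is_simulator_qualified(wavelength_nm: float, target_nm: float,
                           tolerance_nm: float = 20.0) -> bool:
    """True if a computed wavelength lies within the tolerance of the target."""
    return abs(wavelength_nm - target_nm) <= tolerance_nm


# Row totals across the five targets.
totals = {row: sum(counts) for row, counts in TABLE_1.items()}

# Fraction of generated molecules that passed the DFT filter, per target.
hit_rates = {
    target: qualified / generated
    for target, qualified, generated in zip(
        TARGETS_NM, TABLE_1["simulator_qualified"], TABLE_1["generated"]
    )
}

print(totals)
for target, rate in hit_rates.items():
    print(f"{target} nm: {100 * rate:.2f}% simulator-qualified")
```

The per-target hit rate falls from roughly 5% at 200 nm to well under 1% at 600 nm, which makes the scarcity of long-wavelength candidates in Table 1 explicit.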

Table 2

Simulator-Qualified Molecules (a) as SMILES Strings Found by Our AI-Assisted Chemistry Platform

(a) The synthesized molecules are shown with their chemical structural formulas. For 24 randomly sampled molecules, the origin of excitation is shown in parentheses together with the excitation wavelength.

UV–Vis Spectra Measurement

Figure 2 shows the results of UV–vis spectra measurement of I–VI, together with computational spectra at the B3LYP/3-21G* level. Except for II, the first peak (be it a shoulder or an edge of the peak) in each experimental spectrum lies close to the target wavelength. Note that solvatochromic effects in I–VI were small (see the Supporting Information).
Figure 2

Experimental UV–vis absorption spectra and computational spectra at the B3LYP/3-21G* level of the compounds I–VI. The computational spectra are smoothed by a Gaussian function and arbitrarily scaled for comparison with the experimental spectra. The red dashed line in each spectrum indicates the target wavelength.
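The Gaussian smoothing and arbitrary scaling mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the broadening width, the choice to broaden on a wavelength axis rather than an energy axis, and the two example transitions are all assumptions made here.

```python
import math


def gaussian_broadened_spectrum(transitions, grid_nm, sigma_nm=15.0):
    """Sum one Gaussian per transition, weighted by oscillator strength.

    transitions: iterable of (wavelength_nm, oscillator_strength) pairs.
    grid_nm: wavelengths at which the smoothed spectrum is evaluated.
    sigma_nm: Gaussian standard deviation (an assumed value).
    """
    transitions = list(transitions)
    spectrum = []
    for x in grid_nm:
        intensity = sum(
            f * math.exp(-((x - lam) ** 2) / (2.0 * sigma_nm ** 2))
            for lam, f in transitions
        )
        spectrum.append(intensity)
    return spectrum


def scale_to_unit_maximum(values):
    """Rescale so the largest value is 1 (arbitrary scaling for overlay)."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


# Hypothetical stick spectrum: (wavelength in nm, oscillator strength).
example_transitions = [(410.0, 0.30), (320.0, 0.10)]
grid = list(range(200, 701))  # 200 to 700 nm in 1 nm steps
smoothed = scale_to_unit_maximum(
    gaussian_broadened_spectrum(example_transitions, grid)
)
print("peak position:", grid[smoothed.index(max(smoothed))], "nm")
```

Because the scaling is arbitrary, only peak positions and relative intensities of such a curve are meaningful when it is compared with an experimental spectrum.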

We investigated the reason why molecule II failed to reproduce the DFT prediction. The broad peak around 350 nm is most likely caused by decomposition, as we observed trace impurity signals in the ¹H NMR spectrum taken several weeks after synthesis (see the Supporting Information). Another possible cause is keto–enol tautomerization. According to the ¹H NMR measurement, the keto-form exists as a major peak (Figure S1 in the Supporting Information). The keto-form is more stable than II (enol-form) by 71.72 kJ mol⁻¹ at the B3LYP/3-21G* level (Table S1 in the Supporting Information). Although the spectrum is definitely affected by keto–enol tautomerization, it does not seem to cause the absorption around 350 nm, since the computational spectrum of the keto-form also failed to reproduce the peak (Figure S17 in the Supporting Information).

For molecule VI, we observed an unpredicted large peak spanning 300–500 nm. The ¹H NMR spectrum of VI indicates that a tautomer in enol-form exists (Figure S14 in the Supporting Information). Each tautomer can have syn/anti conformers. As shown in Table S2 of the Supporting Information, the four isomers syn-keto, syn-enol, anti-keto, and anti-enol have small energetic differences, and can hence coexist. Among these isomers, only molecule VI (i.e., anti-/syn-keto) has a peak around 500 nm in its computational spectrum (Figure S18 in the Supporting Information), indicating that the edge at 500 nm is indeed due to molecule VI. These observations strongly suggest that the coexistence of four isomers of VI results in the large peak.
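To see what these energy differences imply for coexistence, a simple Boltzmann estimate is useful. The sketch below is an illustration under stated assumptions: it treats the quoted electronic energy difference as if it were a free-energy difference, uses 20 °C (the measurement temperature given in the Methods), and uses hypothetical relative energies for the four isomers of VI, because the Table S2 values are not reproduced in this text.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ mol^-1 K^-1


def boltzmann_populations(relative_energies_kj_mol, temperature_k=293.15):
    """Return equilibrium fractions for states with the given relative energies."""
    lowest = min(relative_energies_kj_mol)
    weights = [
        math.exp(-(energy - lowest) / (GAS_CONSTANT_KJ * temperature_k))
        for energy in relative_energies_kj_mol
    ]
    total = sum(weights)
    return [weight / total for weight in weights]


# Molecule II: keto-form taken as reference, enol-form 71.72 kJ/mol higher.
keto_fraction, enol_fraction = boltzmann_populations([0.0, 71.72])
print(f"enol fraction of II at 20 C: {enol_fraction:.1e}")

# Molecule VI: HYPOTHETICAL relative energies (kJ/mol) for four isomers,
# chosen only to show that kJ/mol-scale gaps leave every isomer populated.
hypothetical_vi = boltzmann_populations([0.0, 1.0, 2.0, 3.0])
print([f"{fraction:.2f}" for fraction in hypothetical_vi])
```

A gap of about 70 kJ/mol leaves the higher-energy tautomer essentially unpopulated at equilibrium, whereas gaps of a few kJ/mol give every isomer a sizable share, which is the sense in which the four isomers of VI can coexist.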

Origin of Excitation

Kohn–Sham orbitals involved in the first excited state of I–VI are summarized in Figure 3. A conventional means to control absorption wavelength focuses on a π–π* transition: the length of a π-system is altered to change the energy difference between π and π* orbitals.[24,25] Our AI-assisted platform seems to have taken a different approach: 10 out of 24 molecules (randomly sampled from the 86 molecules in Table 2) show an n−π* transition (about 42%), and 5 out of 24 molecules show a π–π* transition (about 21%). For molecules I, III, V, and VI, the first excited state corresponds to an n−π* transition. Only molecule IV is associated with a π–π* transition. Interestingly, the failed molecule II is based on a π–σ* transition.
Figure 3

Main Kohn–Sham orbitals involved in the first excited states of I–VI at the B3LYP/3-21G* level. HOMO and LUMO denote the highest occupied molecular orbital and the lowest unoccupied molecular orbital, respectively. λ and f denote the computational absorption wavelength and oscillator strength, respectively.

The lowest excitation energy of molecule I is exceptionally high for an n−π* transition (207.84 nm). Typically, n−π* transitions have lower excitation energy than π–π* transitions, because an ordinary nonbonding orbital lies between the π and π* orbitals in energy. For example, the absorption bands of the n−π* transition of azobenzene derivatives appear around 400–600 nm in UV–vis spectra.[24,28] It is likely that σ orbital mixing stabilized the nonbonding orbital of nitrogen so that it lies lower in energy than a π orbital. From the shape of the orbitals in Figure 3, the transitions in molecules III, V, and VI indicate charge transfer. Under charge transfer, TD-DFT with conventional hybrid functionals often underestimates the excitation energy due to self-interaction error.[29] Fortunately, in the present instance, the error caused by charge transfer was limited, but it might become an issue in other types of molecule design problems. Molecule II is the only one with a π–σ* excitation. Since π–σ* excitations in aromatic molecules with XH (X = N, O, S) are reported as repulsive along the X–H coordinate,[30] we could predict that molecule II is extremely unstable to light, as was subsequently verified by the detection of decomposed products in the ¹H NMR spectrum (Figure S3 in the Supporting Information).

Conclusion

In this work, we conducted a proof-of-concept study of an AI-chemistry platform, which was able to find five synthesizable and stable organic molecules possessing target properties within 10 days: a remarkable and encouraging result. Additionally, our platform exhibited a counterintuitive and intriguing tendency to use n−π* excitations. Since our platform depends on DFT calculation, it inherits the drawbacks of DFT: our analysis of failed cases, including tautomerization, isomers, and instability, shows the type of issues that future AI-chemistry platforms will have to overcome. In the near future, such platforms may be used in various molecule discovery projects, with the potential to change the landscape of chemistry research.

Methods

Molecule Generator

We used the ChemTS library[4] to search for novel molecules with a desired absorption wavelength. It generates molecules by using Monte Carlo Tree Search (MCTS)[31] and a recurrent neural network (RNN).[32,33] Figure 4 describes the details of our workflow. Before the start, the RNN is trained with a set of SMILES strings. In our case, 13 000 molecules that contain only H, O, N, and C elements, obtained from the PubChemQC database,[34,35] were used.
Figure 4

Workflow of a molecule generator (ChemTS[4]) coupled with electronic structure theory (DFT). Molecules generated as SMILES strings by ChemTS (MCTS+RNN) are converted to three-dimensional structures with RDKit.[36] Then, the computation for each molecule is performed with electronic structure theory to obtain the value of λ (absorption wavelength). Reward (r) is calculated by eq 1. Rewards of molecules whose wavelengths are not available because of a failure in the DFT calculation are set to −1. The calculated reward (r) is fed back to MCTS through back-propagation.

ChemTS generates one SMILES string at a time. In a normal round, the SMILES string is converted to a three-dimensional chemical structure by RDKit, the absorption wavelength (λ) is computed by DFT, and the reward (r) is calculated by eq 1, where λ* indicates the target wavelength. Parameter α is set to 0.01 in this work. Using the reward information, ChemTS updates its parameters to generate better molecules in subsequent rounds. When the procedure fails because of an invalid SMILES string or DFT failure, r = −1 is fed back to ChemTS. Note that the following SMILES symbols are used: {C, [C@@H], (, N, ), O, =, 1, /, c, n, [nH], [C@H], 2, [NH], [C], [CH], [N], [C@@], [C@], o, [O], 3, #, [O-], [n+], [N+], [CH2], [n]}.
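One round of this loop can be sketched as below. The reward equation itself does not survive in this text, so the functional form used here, r = −α|λ − λ*| / (1 + α|λ − λ*|), is an assumption: it was chosen only because it equals 0 at the target and approaches the stated failure value of −1 for large deviations. The callables `embed_3d` and `run_tddft` are hypothetical placeholders for the RDKit conversion and the DFT calculation and are not part of any library.

```python
from typing import Callable, Optional

ALPHA = 0.01           # value of the parameter alpha stated in the text
FAILURE_REWARD = -1.0  # reward stated in the text for invalid SMILES or failed DFT


def reward(wavelength_nm: Optional[float], target_nm: float,
           alpha: float = ALPHA) -> float:
    """Reward for one molecule.

    ASSUMED form (not taken from the source): bounded in (-1, 0], equal to 0
    when the computed wavelength matches the target.
    """
    if wavelength_nm is None:
        return FAILURE_REWARD
    deviation = alpha * abs(wavelength_nm - target_nm)
    return -deviation / (1.0 + deviation)


def evaluate_smiles(smiles: str, target_nm: float,
                    embed_3d: Callable[[str], object],
                    run_tddft: Callable[[object], Optional[float]]) -> float:
    """Run one round: SMILES -> 3D structure -> wavelength -> reward.

    Any exception raised by the placeholder callables is treated as a failed
    round, mirroring the r = -1 feedback described in the text.
    """
    try:
        structure = embed_3d(smiles)
        wavelength_nm = run_tddft(structure)
    except Exception:
        return FAILURE_REWARD
    return reward(wavelength_nm, target_nm)


def _always_fails(smiles: str) -> object:
    """Stand-in for a conversion step that rejects an invalid SMILES string."""
    raise ValueError(f"cannot embed {smiles!r}")


# Stand-in callables so the sketch runs without RDKit or a DFT package.
near_target = evaluate_smiles("c1ccccc1", 400.0, lambda s: s, lambda m: 420.0)
failed = evaluate_smiles("not-a-smiles", 400.0, _always_fails, lambda m: 420.0)
print(near_target, failed)
```

Under this assumed form, a molecule 20 nm off target (the qualification tolerance) receives a reward of about −0.17, and a failed round always scores below any evaluated molecule.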

Electronic Structure Theory

Relative to machine-learning algorithms, computation with electronic structure theory is expensive. Therefore, we adopted density functional theory (DFT) with a well-known hybrid functional, B3LYP, taking into account the balance between reliability and computational cost. In addition, a 3-21G* basis set was used to explore molecules efficiently in the chemical space. In the present work, we evaluated valence excited states of molecules and avoided diffuse functions in order to exclude Rydberg states. To evaluate the excitation energy, we adopted time-dependent DFT (TD-DFT) for the molecule generator at the aforementioned level. The lowest 20 states of each molecule were calculated after geometry optimization. All DFT calculations were performed with the Gaussian 16 package.[13]
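The two-step protocol described above (geometry optimization, then TD-DFT for the lowest 20 states at the B3LYP/3-21G* level) could be expressed as a Gaussian input file along the following lines. The authors' actual input files are not given in this text, so the route lines, the checkpoint handling, and the helper function are assumptions for illustration; the formaldehyde coordinates are approximate and serve only as an example.

```python
def build_gaussian_input(name, atoms, charge=0, multiplicity=1, n_states=20):
    """Return a two-step Gaussian input: optimization, then TD-DFT.

    atoms: iterable of (element, x, y, z) with coordinates in angstrom.
    The second step reads the optimized geometry and orbitals back from the
    checkpoint file written by the first step.
    """
    geometry = "\n".join(
        f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}"
        for element, x, y, z in atoms
    )
    optimization_step = (
        f"%chk={name}.chk\n"
        "#p B3LYP/3-21G* Opt\n"
        "\n"
        f"{name} ground-state geometry optimization\n"
        "\n"
        f"{charge} {multiplicity}\n"
        f"{geometry}\n"
        "\n"
    )
    tddft_step = (
        "--Link1--\n"
        f"%chk={name}.chk\n"
        f"#p B3LYP/3-21G* TD(NStates={n_states}) Geom=Check Guess=Read\n"
        "\n"
        f"{name} TD-DFT excited states\n"
        "\n"
        f"{charge} {multiplicity}\n"
        "\n"
    )
    return optimization_step + tddft_step


# Illustrative, approximate starting geometry for formaldehyde.
formaldehyde = [
    ("C", 0.0, 0.0, 0.0),
    ("O", 0.0, 0.0, 1.21),
    ("H", 0.0, 0.94, -0.59),
    ("H", 0.0, -0.94, -0.59),
]
input_text = build_gaussian_input("formaldehyde", formaldehyde)
print(input_text)
```

Keeping the basis set free of diffuse functions, as in the 3-21G* choice above, is what the text relies on to exclude Rydberg states from the lowest excitations.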

UV–Vis Spectra Measurement

Electronic absorption spectra were measured using a Shimadzu UV-3600 UV–vis–NIR spectrophotometer at 20 °C. A quartz cell with a 1 cm optical path length was used. Spectroscopic-grade solvents were purchased from Tokyo Chemical Industry (TCI) and Wako Pure Chemical Industries, and were used as received.

References (17 in total; first 10 listed)

1.  Anion Recognition and Sensing: The State of the Art and Future Perspectives.

Authors:  Paul D. Beer; Philip A. Gale
Journal:  Angew Chem Int Ed Engl       Date:  2001-02-02       Impact factor: 15.336

2.  Long short-term memory.

Authors:  S Hochreiter; J Schmidhuber
Journal:  Neural Comput       Date:  1997-11-15       Impact factor: 2.026

3.  The future of organic photovoltaics.

Authors:  Katherine A Mazzio; Christine K Luscombe
Journal:  Chem Soc Rev       Date:  2014-09-08       Impact factor: 54.564

4.  Machine Learning Energies of 2 Million Elpasolite (ABC_{2}D_{6}) Crystals.

Authors:  Felix A Faber; Alexander Lindmaa; O Anatole von Lilienfeld; Rickard Armiento
Journal:  Phys Rev Lett       Date:  2016-09-20       Impact factor: 9.161

5.  Photoswitching azo compounds in vivo with red light.

Authors:  Subhas Samanta; Andrew A Beharry; Oleg Sadovski; Theresa M McCormick; Amirhossein Babalhavaeji; Vince Tropepe; G Andrew Woolley
Journal:  J Am Chem Soc       Date:  2013-06-21       Impact factor: 15.419

6.  Photodissociation dynamics of thiophenol-d1: the nature of excited electronic states along the S-D bond dissociation coordinate.

Authors:  Jeong Sik Lim; Heechol Choi; Ivan S Lim; Seong Byung Park; Yoon Sup Lee; Sang Kyu Kim
Journal:  J Phys Chem A       Date:  2009-10-01       Impact factor: 2.781

7.  Purely organic electroluminescent material realizing 100% conversion from electricity to light.

Authors:  Hironori Kaji; Hajime Suzuki; Tatsuya Fukushima; Katsuyuki Shizu; Katsuaki Suzuki; Shosei Kubo; Takeshi Komino; Hajime Oiwa; Furitsu Suzuki; Atsushi Wakamiya; Yasujiro Murata; Chihaya Adachi
Journal:  Nat Commun       Date:  2015-10-19       Impact factor: 14.919

8.  Bayesian molecular design with a chemical language model.

Authors:  Hisaki Ikebata; Kenta Hongo; Tetsu Isomura; Ryo Maezono; Ryo Yoshida
Journal:  J Comput Aided Mol Des       Date:  2017-03-09       Impact factor: 3.686

9.  ChemTS: an efficient python library for de novo molecular generation.

Authors:  Xiufeng Yang; Jinzhe Zhang; Kazuki Yoshizoe; Kei Terayama; Koji Tsuda
Journal:  Sci Technol Adv Mater       Date:  2017-11-24       Impact factor: 8.090

10.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks.

Authors:  Marwin H S Segler; Thierry Kogej; Christian Tyrchan; Mark P Waller
Journal:  ACS Cent Sci       Date:  2017-12-28       Impact factor: 14.553
