
Machine Learning for Organic Synthesis: Are Robots Replacing Chemists?

Boris Maryasin, Philipp Marquetand, Nuno Maulide

Abstract

Machines learn chemistry: An artificial intelligence algorithm has learned to predict the outcomes of C-N coupling reactions from a few thousand nanomole-scale experiments. This Highlight discusses this work in the context of other state-of-the-art approaches for predicting the yields of organic reactions and explains the significance of the results.
© 2018 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.

Keywords:  Buchwald-Hartwig reaction; high-throughput synthesis robot; machine learning; nanomole-scale reactions

Year:  2018        PMID: 29701305      PMCID: PMC6033144          DOI: 10.1002/anie.201803562

Source DB:  PubMed          Journal:  Angew Chem Int Ed Engl        ISSN: 1433-7851            Impact factor:   15.336


The ability to predict the outcome of complex chemical transformations has been a long‐standing challenge for chemists. The development of quantum‐chemical approaches has already opened some opportunities in this direction, and in many cases, the outcomes of experiments can be efficiently modeled in silico.[1–6] Artificial intelligence (AI) algorithms that automate, improve, and generalize such predictions are gaining importance in this field, and several studies have recently been published. For example, in 2016, Aspuru‐Guzik and co‐workers applied neural networks to basic reactions of alkenes and alkyl halides and were able to identify the correct reaction type for the majority of a set of textbook problems.[7] In 2017, Gambin and co‐workers tested AI algorithms on the prediction of a large and diverse set of organic reactions (450 000 cases), emphasizing that it might be essential to identify new chemoinformatic descriptors for future developments.[8] Among other important attempts to predict and optimize organic reactions on the basis of AI, recent studies by the group of Zare[9] as well as Jensen, Green, and co‐workers[10] are noteworthy examples. Although the predictions had some limitations, the AI algorithms in general showed an encouragingly good performance even for sophisticated organic systems. A recent study by the groups of Doyle and Dreher[11] now demonstrates how the yields of a Buchwald–Hartwig coupling (Scheme 1) with a large set of different substrates can be accurately predicted with an AI algorithm, in this case a so‐called random forest. What sets this study apart is that the data from which the algorithm learns are generated experimentally with a nanomole‐scale high‐throughput robot. The resulting predictions substantially outperformed those of many previous works.
Scheme 1

Buchwald–Hartwig coupling investigated in the study by the groups of Doyle and Dreher.[11]

The procedure is as follows: First, the random forest model is trained. Molecular properties of the reactants, for example, their vibrational frequencies or dipole moments, are calculated by quantum chemistry. These properties serve as “descriptors”, that is, as inputs for the random forest algorithm. The reaction yield for a given set of reactants is then determined experimentally with the high‐throughput robot and fed into the machine learning algorithm. The algorithm learns to reproduce these yields as outputs when provided with the corresponding inputs from the quantum‐chemistry calculations. After this training step, the random forest algorithm is able to predict the reaction yield of other, previously untested reactant combinations. In an oversimplified manner, the procedure could be summarized as: “If the reactants feature these vibrational frequencies and these dipole moments, then the reaction yield will be that number.” In this regard, it is interesting to consider that machine learning algorithms (which have been employed for decades) reason differently from an experimental organic chemist, who would probably not take properties such as the vibrational spectrum of a reactant or its dipole moment into detailed account when estimating whether a reaction involving that reactant will result in a high or a low yield. The work of Doyle and Dreher is a very promising breakthrough, as they managed to obtain an excellent prediction accuracy, and it opens a range of opportunities for both theoretical and experimental chemists. It holds promise to dramatically accelerate the reaction optimization process in modern organic synthesis.
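The training and prediction steps described above can be illustrated with a minimal sketch. It is not the code used by Doyle and Dreher: the three descriptors, the `toy_yield` relation that stands in for the robot's measurements, and all function names are assumptions made purely for illustration. The forest is written in plain Python so that each step (bootstrap sampling, random descriptor subsets, averaging over trees) stays visible.

```python
import random

def avg(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean (split-quality criterion)."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets, feature_ids):
    """Find the (descriptor, threshold) pair with the lowest summed squared error."""
    best, best_err = None, float("inf")
    for f in feature_ids:
        # Skip the largest value so that both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, t), err
    return best

def build_tree(rows, targets, depth, n_sub, rng):
    """Grow a regression tree; a leaf stores the mean yield of its training rows."""
    if depth == 0 or len(rows) < 4:
        return avg(targets)
    feature_ids = rng.sample(range(len(rows[0])), n_sub)  # random descriptor subset
    split = best_split(rows, targets, feature_ids)
    if split is None:
        return avg(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth - 1, n_sub, rng),
            build_tree([r for r, _ in right], [y for _, y in right], depth - 1, n_sub, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(rows, targets, n_trees=30, depth=5, n_sub=2, seed=0):
    """Train each tree on a bootstrap sample of the (descriptor, yield) pairs."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], depth, n_sub, rng))
    return forest

def predict_yield(forest, x):
    """The forest prediction is the average over all trees."""
    return avg([predict_tree(tree, x) for tree in forest])

def toy_yield(dipole, frequency, shift):
    """Hypothetical stand-in for a measured yield in percent (shift is ignored)."""
    return 10.0 + 15.0 * dipole + 0.05 * (frequency - 1500.0)

# Synthetic "reactions": (dipole moment, vibrational frequency, NMR shift).
data_rng = random.Random(1)
X = [(data_rng.uniform(0, 5), data_rng.uniform(1500, 1800), data_rng.uniform(100, 170))
     for _ in range(130)]
y = [toy_yield(*x) for x in X]
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

forest = train_forest(X_train, y_train)
mae_forest = avg([abs(predict_yield(forest, x) - t) for x, t in zip(X_test, y_test)])
mae_baseline = avg([abs(avg(y_train) - t) for t in y_test])  # always guess the mean yield
print(f"MAE forest: {mae_forest:.1f}  MAE mean-yield baseline: {mae_baseline:.1f}")
```

The last 30 synthetic reactions are held out, which mirrors the idea of predicting previously untested reactant combinations. In practice one would more likely rely on an established library implementation than on a hand-written forest.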
A particularly interesting outcome of the study relates to the conspicuous problems encountered when palladium‐catalyzed Buchwald–Hartwig coupling procedures are applied to the preparation of complex drug‐like products, namely the strong limitations observed for substrates containing heteroatom–heteroatom bonds, such as isoxazoles (Scheme 1). The authors sought to probe this in their model by concurrently screening several structurally diverse isoxazole additives, using the fragment additive approach proposed by Glorius and co‐workers.[12] The results of the study and the predictive model that it afforded (whereby certain properties of the isoxazole additives were found to strongly correlate with the yield of the Buchwald–Hartwig coupling) ultimately guided the mechanistic discovery that Pd(0) competitively inserts into the N−O bond of isoxazoles, as demonstrated in a series of guided experiments. Two isoxazole fragments with dramatically different C3 NMR shifts (¹³C NMR shifts being among the top 10 descriptors of the trained random forest model) were shown to behave rather differently when exposed to a prototypical Pd(0) precatalyst (Figure 1). As the authors themselves point out, such a mechanistic hypothesis would certainly not have been inconceivable without the machine learning process, which hints at a more “human” intuitive dimension that must still accompany the development of such AI‐generated algorithms.
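How a descriptor that correlates with the yield can point toward a mechanistic hypothesis may be sketched as follows. The example ranks descriptors by their Pearson correlation with the yield, which is a deliberately simpler stand-in for the descriptor importance of a random forest and not the analysis used in the study. The descriptor names, the synthetic values, and the direction of the trend are all assumptions chosen for illustration only.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)
n_additives = 20

# Hypothetical descriptors for 20 additives (synthetic, not measured values).
descriptors = {
    "c3_nmr_shift": [150.0 + i for i in range(n_additives)],
    "dipole_moment": [rng.uniform(1.0, 4.0) for _ in range(n_additives)],
}

# Synthetic yields built to depend on the NMR shift plus noise; the sign of the
# trend is arbitrary and carries no chemical meaning.
yields = [90.0 - 2.5 * i + rng.gauss(0.0, 3.0) for i in range(n_additives)]

# Rank descriptors by the absolute value of their correlation with the yield.
ranking = sorted(
    ((name, pearson(values, yields)) for name, values in descriptors.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A high-ranking descriptor only flags where to look; the step from correlation to mechanism still requires experiments, as in the isoxazole case.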
Figure 1

Simplified diagram depicting isoxazole additives to Buchwald–Hartwig coupling reactions, compared in terms of the C3 ¹³C NMR chemical shift descriptor and the experimentally confirmed propensity to undergo N−O oxidative addition upon exposure to a Pd(0) precatalyst.[11]

This milestone achievement immediately leads to several questions, such as: How generalizable is this approach, that is, can the method be used for other classes of organic reactions? Can the predictions be made even more efficiently? And, for all organic chemists reading this article, how far ahead is the (dystopian?) scenario of machine‐learning algorithms combined with synthesis robots effectively replacing them?
One of the next likely steps is the improvement of the computational approach employed to obtain the descriptors. Indeed, other classes of organic reactions are likely to require the consideration of more structurally flexible and branched molecular systems. For these systems, it might not be enough to calculate only one conformational minimum. This is perhaps best illustrated with an example: Consider two structurally similar reactants, each with two possible stable conformations A and B. A single quantum‐chemical minimization of each reactant might find conformation A for one reactant and conformation B for the other. The AI algorithm might thus regard the two reactants as very different and predict different reaction yields, although the yields may be similar in practice. Furthermore, the actual quantum‐chemistry method employed (mostly B3LYP/6‐31G* in this case) is open to discussion. Just hearing this acronym might trigger a flurry of suggestions for improvement from quantum chemists; nevertheless, one should bear in mind that the AI algorithm only needs to learn about the similarities of the reactants and their reactions (which can also be obtained from similarly wrong results for similar reactants).
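The two-conformer example given above can be made concrete with a small numerical sketch. One possible remedy, which is an assumption here and not a procedure attributed to the study, is to replace the descriptor of a single minimum by a Boltzmann‐weighted average over all low‐energy conformers. The energies and dipole moments below are hypothetical values chosen so that the two reactants have nearly degenerate conformers A and B.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def boltzmann_weights(energies_kcal, temperature=298.15):
    """Population of each conformer from its relative energy (kcal/mol)."""
    e_min = min(energies_kcal)
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]

def single_minimum_descriptor(values, energies_kcal):
    """Descriptor of the lowest-energy conformer only."""
    return values[energies_kcal.index(min(energies_kcal))]

def ensemble_descriptor(values, energies_kcal, temperature=298.15):
    """Boltzmann-weighted average of a descriptor over all conformers."""
    weights = boltzmann_weights(energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical dipole moments (debye) of conformers A and B, shared by both reactants.
dipoles = [1.0, 3.0]
# Hypothetical relative energies (kcal/mol): A is slightly favored for reactant 1,
# B is slightly favored for reactant 2.
energies_1 = [0.0, 0.1]
energies_2 = [0.1, 0.0]

gap_single = abs(single_minimum_descriptor(dipoles, energies_1)
                 - single_minimum_descriptor(dipoles, energies_2))
gap_ensemble = abs(ensemble_descriptor(dipoles, energies_1)
                   - ensemble_descriptor(dipoles, energies_2))
print(f"descriptor gap, single minimum: {gap_single:.2f} D")
print(f"descriptor gap, ensemble average: {gap_ensemble:.2f} D")
```

With single minima the two similar reactants appear to differ by the full gap between conformers A and B, whereas the ensemble averages lie close together. The price is a conformer search for every reactant, which adds computational cost.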
It is therefore imaginable that semi‐empirical methods might provide similar, satisfactory results at reduced computational cost.
The “age of automation”[13] thus appears to hold the potential to advance organic synthesis in a revolutionary way. We can finally ask provocatively, as in the title of this manuscript: Are robots replacing chemists? Looking at the possible pitfalls of the methods discussed above, we believe that we are not there yet. Overall, the main problem remains a lack of generality. However, the rapid development of AI approaches in combination with modern organic and quantum chemistry might change this situation in the near future. Additionally, the “human intuition” factor alluded to previously should provide some comfort, at least until AI algorithms are capable of mechanistic inferences.

Conflict of interest

The authors declare no conflict of interest.
  13 in total

1.  Predictive Model for Site-Selective Aryl and Heteroaryl C-H Functionalization via Organic Photoredox Catalysis.

Authors:  Kaila A Margrey; Joshua B McManus; Simone Bonazzi; Frederic Zecri; David A Nicewicz
Journal:  J Am Chem Soc       Date:  2017-08-07       Impact factor: 15.419

2.  Computing organic stereoselectivity - from concepts to quantitative calculations and predictions.

Authors:  Qian Peng; Fernanda Duarte; Robert S Paton
Journal:  Chem Soc Rev       Date:  2016-11-07       Impact factor: 54.564

3.  Erratum for the Report "Predicting reaction performance in C-N cross-coupling using machine learning" by D. T. Ahneman, J. G. Estrada, S. Lin, S. D. Dreher, A. G. Doyle.

Authors: 
Journal:  Science       Date:  2018-04-13       Impact factor: 47.728

4.  Computation and Experiment: A Powerful Combination to Understand and Predict Reactivities.

Authors:  Theresa Sperger; Italo A Sanhueza; Franziska Schoenebeck
Journal:  Acc Chem Res       Date:  2016-05-12       Impact factor: 22.384

5.  Neural Networks for the Prediction of Organic Chemistry Reactions.

Authors:  Jennifer N Wei; David Duvenaud; Alán Aspuru-Guzik
Journal:  ACS Cent Sci       Date:  2016-10-14       Impact factor: 14.553

6.  Using IR vibrations to quantitatively describe and predict site-selectivity in multivariate Rh-catalyzed C-H functionalization.

Authors:  Elizabeth N Bess; David M Guptill; Huw M L Davies; Matthew S Sigman
Journal:  Chem Sci       Date:  2015-03-18       Impact factor: 9.825

7.  Optimizing Chemical Reactions with Deep Reinforcement Learning.

Authors:  Zhenpeng Zhou; Xiaocheng Li; Richard N Zare
Journal:  ACS Cent Sci       Date:  2017-12-15       Impact factor: 14.553

8.  Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?

Authors:  G Skoraczyński; P Dittwald; B Miasojedow; S Szymkuć; E P Gajewska; B A Grzybowski; A Gambin
Journal:  Sci Rep       Date:  2017-06-15       Impact factor: 4.379

9.  Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions.

Authors:  Jimmy C Kromann; Jan H Jensen; Monika Kruszyk; Mikkel Jessing; Morten Jørgensen
Journal:  Chem Sci       Date:  2017-11-13       Impact factor: 9.825

10.  Machine Learning for Organic Synthesis: Are Robots Replacing Chemists?

Authors:  Boris Maryasin; Philipp Marquetand; Nuno Maulide
Journal:  Angew Chem Int Ed Engl       Date:  2018-04-27       Impact factor: 15.336

