Fan Feng, Luhua Lai, Jianfeng Pei.
Abstract
With the introduction of retrosynthetic analysis in the 1960s, chemical synthesis analysis and pathway design were transformed from a complex problem into a regular process of structural simplification. This review summarizes recent developments in computer-assisted synthetic analysis and design, and how machine-learning algorithms have contributed to them. The LHASA system pioneered the design of semi-empirical reaction modes in computers; the rule-based and network-searching work that followed not only expanded the databases but also built new approaches to representing reaction rules. Programs such as ARChem Route Designer replaced hand-coded reaction modes with automatically extracted rules, and programs such as Chematica turned traditional route design into network searching. Later, with the help of machine learning, two-step models that combine reaction rules with statistical methods became mainstream. Recently, fully data-driven methods based on deep neural networks, which do not require any prior knowledge, have been applied to this field. So far, however, these methods still cannot replace experienced organic chemists because of their relatively low accuracy. New algorithms, aided by increasingly powerful computational hardware, should make this a promising field with good prospects.
Keywords: chemical synthesis analysis; deep learning; pathway design; retrosynthesis; seq2seq
Year: 2018 PMID: 29915783 PMCID: PMC5994992 DOI: 10.3389/fchem.2018.00199
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
Figure 1. Schematic representation of a local part of the reaction network. The reactions included in this figure are: (1) A + B = C; (2) B + C = D; (3) C = E.
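The network in Figure 1 can be searched backward from a target to find a route to available starting materials. The sketch below is a minimal illustration, not the search used by any program named in this review. It assumes (hypothetically) that A and B are purchasable and that a route with fewer reactions is better. The names `REACTIONS`, `PURCHASABLE` and `best_route` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Figure 1's reactions, each stored as (reactants, product).
REACTIONS: Dict[str, Tuple[FrozenSet[str], str]] = {
    "r1": (frozenset({"A", "B"}), "C"),
    "r2": (frozenset({"B", "C"}), "D"),
    "r3": (frozenset({"C"}), "E"),
}

# Assumption: A and B are commercially available starting materials.
PURCHASABLE = {"A", "B"}


def best_route(target: str,
               visiting: Optional[FrozenSet[str]] = None) -> Optional[List[str]]:
    """Return the shortest list of reaction IDs (in execution order) that
    makes `target` from purchasable compounds, or None if no route exists."""
    if target in PURCHASABLE:
        return []
    # Track molecules on the current path so cycles are never followed.
    visiting = (visiting or frozenset()) | {target}
    best: Optional[List[str]] = None
    for rid, (reactants, product) in sorted(REACTIONS.items()):
        if product != target or reactants & visiting:
            continue
        route: List[str] = []
        for reactant in sorted(reactants):
            sub = best_route(reactant, visiting)
            if sub is None:  # one reactant cannot be made: reaction unusable
                break
            route += [step for step in sub if step not in route]
        else:
            route.append(rid)  # the final step comes after its precursors
            if best is None or len(route) < len(best):
                best = route
    return best
```

For example, `best_route("E")` disconnects E into C (reaction 3) and then C into A + B (reaction 1), giving the forward route `["r1", "r3"]`. Real network-search systems replace the "fewest reactions" criterion with much richer cost functions.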
Summary of some rule-based retrosynthesis models.
| Model | Authors | Approach | Limitations |
| --- | --- | --- | --- |
| LHASA & SECS | Corey et al. | Expressing several simple design strategies in a chemical language called CHMTRN (CHeMistry TRaNslator). | Few reaction rules; no stereochemistry; not active for years |
| SYNLMA | Johnson et al. | Using a knowledge base to perform logical operations. | Combinatorial explosion |
| IGOR & IGOR2 | Bauer et al. | Representing molecules as bond-electron (BE) matrices and reaction rules as the difference between reactant and product matrices. | High computational cost |
| CHIRON | Hanessian et al. | Trying to maximize the overlap between targets and starting materials. | Does not search the full synthetic tree; can only be used to assist humans |
| WODCA | Hollering et al. | Analyzing bond properties, using matrix notation, to suggest which bonds should be chosen as retrosynthetic disconnections. | Slow computation |
| Syntaurus | Szymkuć and Gajewska | Using 20,000 expert-coded and cross-checked chemical transforms, and ranking synthetic routes with a Chemical Scoring Function (CSF) and a Reaction Scoring Function (RSF). | The database took many years to build; some reactions are not applicable in real laboratory work |
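The bond-electron (BE) matrix representation listed for IGOR can be made concrete with a toy reaction. The minimal sketch below assumes the Dugundji–Ugi convention: off-diagonal entries hold formal bond orders, diagonal entries hold free (non-bonding) valence electrons, and the reaction matrix is R = E − B, where B describes the reactants and E the products. The example reaction (H₂ + Cl₂ → 2 HCl) and all function names are illustrative choices, not taken from IGOR itself.

```python
from typing import List

Matrix = List[List[int]]

ATOMS = ["H1", "H2", "Cl1", "Cl2"]

# B: reactants H-H + Cl-Cl (each Cl keeps 6 non-bonding electrons).
B: Matrix = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 6, 1],
    [0, 0, 1, 6],
]

# E: products H1-Cl1 + H2-Cl2.
E: Matrix = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]


def reaction_matrix(b: Matrix, e: Matrix) -> Matrix:
    """R = E - B: +1 marks a bond formed, -1 a bond broken."""
    return [[ej - bj for bj, ej in zip(b_row, e_row)]
            for b_row, e_row in zip(b, e)]


def valence_electrons(m: Matrix) -> int:
    """Sum of all entries; each bond is counted twice (once per atom)."""
    return sum(sum(row) for row in m)


def changed_atoms(r: Matrix, atoms: List[str]) -> List[str]:
    """Atoms with any non-zero row in R take part in the reaction."""
    return [atoms[i] for i, row in enumerate(r) if any(row)]


R = reaction_matrix(B, E)
```

Because the reaction only redistributes electrons, the entries of R sum to zero while B and E both sum to 16 valence electrons. Applying a rule then amounts to the matrix addition B + R, which suggests why enumerating many candidate R matrices is computationally expensive.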
Figure 2. The process of extracting reaction rules. (A–C) Identifying the reaction core (the set of atoms whose connections or bonds change from reactants to products) by comparing reactants and products, then extending the core to include neighboring atoms or functional groups. (D) Clustering the extracted reaction cores into common groups. (E) Producing a generalized rule template for each cluster and completing the generalized templates.
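Steps (A–C) of Figure 2 can be sketched for an atom-mapped reaction. This is a minimal illustration under stated assumptions, not the extraction code of any specific program. Molecules are reduced to heavy-atom bond tables keyed by atom-map numbers, and the core is extended by a fixed number of bond "shells". Clustering and template generation (steps D and E) are not shown. The esterification example and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Bond table: pair of atom-map numbers -> bond order (hydrogens implicit).
Bonds = Dict[FrozenSet[int], int]


def make_bonds(*triples: Tuple[int, int, int]) -> Bonds:
    return {frozenset((a, b)): order for a, b, order in triples}


def reaction_core(reactants: Bonds, products: Bonds) -> Set[int]:
    """Atoms whose bonds are formed, broken, or change order."""
    core: Set[int] = set()
    for pair in reactants.keys() | products.keys():
        if reactants.get(pair, 0) != products.get(pair, 0):
            core |= pair
    return core


def extend_core(core: Set[int], reactants: Bonds, products: Bonds,
                radius: int = 1) -> Set[int]:
    """Add neighbouring atoms, one bond shell per iteration."""
    atoms = set(core)
    all_pairs = reactants.keys() | products.keys()
    for _ in range(radius):
        shell: Set[int] = set()
        for pair in all_pairs:
            if pair & atoms:  # bond touches an atom already included
                shell |= pair
        atoms |= shell
    return atoms


# Hypothetical mapped esterification (heavy atoms only):
# acetic acid C1-C2(=O3)-O4 + methanol O5-C6
#   -> methyl acetate C1-C2(=O3)-O5-C6 + water (O4)
reactants = make_bonds((1, 2, 1), (2, 3, 2), (2, 4, 1), (5, 6, 1))
products = make_bonds((1, 2, 1), (2, 3, 2), (2, 5, 1), (5, 6, 1))
core = reaction_core(reactants, products)
extended = extend_core(core, reactants, products, radius=1)
```

Here the core is {2, 4, 5} (the C2–O4 bond breaks and C2–O5 forms). One shell of extension adds the methyl carbon, the carbonyl oxygen and the methanol carbon. The choice of radius trades rule specificity against generality.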
Figure 3. Reaction rules play an intermediate role in two-step models. The judging or ranking steps (diamond blocks) are implemented with machine-learning or deep-learning methods.
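The two-step idea in Figure 3 can be illustrated in a few lines. In the first step, reaction rules generate candidate disconnections. In the second, a statistical model ranks them. The sketch below rests on loud assumptions. Targets are reduced to sets of functional-group labels, the three templates are hypothetical, and the hand-set linear weights stand in for a trained machine-learning model, whose scores are normalized with a softmax over the applicable templates only.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical rule library: template -> (group it disconnects, precursor classes).
TEMPLATES: Dict[str, Tuple[str, List[str]]] = {
    "ester_disconnection": ("ester", ["carboxylic acid", "alcohol"]),
    "amide_disconnection": ("amide", ["carboxylic acid", "amine"]),
    "biaryl_disconnection": ("biaryl", ["aryl halide", "aryl boronic acid"]),
}

FEATURES = ["ester", "amide", "biaryl", "aromatic"]

# Hand-set weights standing in for a trained model (one row per template).
WEIGHTS: Dict[str, List[float]] = {
    "ester_disconnection": [2.0, 0.0, 0.0, 0.1],
    "amide_disconnection": [0.0, 2.5, 0.0, 0.3],
    "biaryl_disconnection": [0.0, 0.0, 1.5, 0.8],
}


def featurize(groups: FrozenSet[str]) -> List[float]:
    """Binary fingerprint of the target over FEATURES."""
    return [1.0 if f in groups else 0.0 for f in FEATURES]


def propose(groups: FrozenSet[str]) -> List[Tuple[str, List[str], float]]:
    # Step 1 (rules): keep only templates whose group occurs in the target.
    applicable = [name for name, (group, _) in TEMPLATES.items() if group in groups]
    if not applicable:
        return []
    # Step 2 (learned ranking): linear scores, softmax over applicable rules.
    x = featurize(groups)
    logits = {n: sum(w * xi for w, xi in zip(WEIGHTS[n], x)) for n in applicable}
    top = max(logits.values())
    exps = {n: math.exp(v - top) for n, v in logits.items()}
    total = sum(exps.values())
    ranked = sorted(applicable, key=lambda n: exps[n], reverse=True)
    return [(n, TEMPLATES[n][1], exps[n] / total) for n in ranked]


results = propose(frozenset({"ester", "amide", "aromatic"}))
```

For a target carrying both an ester and an amide, both rules apply. The model ranks the amide disconnection first (score 2.8 versus 2.1 before the softmax). Masking inapplicable rules before normalizing keeps the model from assigning probability to disconnections the rules cannot perform.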
Figure 4. A schematic diagram of seq2seq: an RNN with LSTM.
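The forward pass of the seq2seq model in Figure 4 can be sketched without any deep-learning library. An encoder LSTM reads the product SMILES one token at a time, and its final hidden and cell states initialize a separate decoder LSTM. The decoder then emits a probability distribution over the first reactant token. Several details are assumptions for illustration only: the character-level tokenization, the tiny vocabulary, the hidden size of 8, and the random, untrained weights. The resulting probabilities are therefore meaningless until the model is trained. Training, attention and beam search are not shown.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: W acts on x, U on the previous h.
        self.params = {
            gate: (random_matrix(rng, hidden_size, input_size),
                   random_matrix(rng, hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in "ifog"
        }

    def _preact(self, gate: str, x: Vector, h: Vector) -> Vector:
        w, u, b = self.params[gate]
        return [wx + uh + bias for wx, uh, bias in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [sigmoid(z) for z in self._preact("i", x, h)]
        f = [sigmoid(z) for z in self._preact("f", x, h)]
        o = [sigmoid(z) for z in self._preact("o", x, h)]
        g = [math.tanh(z) for z in self._preact("g", x, h)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def one_hot(token: str, vocab: Dict[str, int]) -> Vector:
    v = [0.0] * len(vocab)
    v[vocab[token]] = 1.0
    return v


def softmax(z: Vector) -> Vector:
    top = max(z)
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]


def encode(smiles: str, vocab: Dict[str, int], cell: LSTMCell) -> Tuple[Vector, Vector]:
    """Run the encoder over the source sequence; return the final (h, c)."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for ch in smiles:
        h, c = cell.step(one_hot(ch, vocab), h, c)
    return h, c


# Hypothetical setup: character-level SMILES vocabulary, untrained weights.
VOCAB = {t: k for k, t in enumerate(["<s>", "</s>", "C", "O", "N", "(", ")", "=", "1"])}
rng = random.Random(0)
encoder = LSTMCell(len(VOCAB), 8, rng)
decoder = LSTMCell(len(VOCAB), 8, rng)
W_out = random_matrix(rng, len(VOCAB), 8)  # projects decoder state to vocabulary

h, c = encode("CC(=O)OC", VOCAB, encoder)        # product SMILES -> context state
h, c = decoder.step(one_hot("<s>", VOCAB), h, c)  # first decoder step
next_token_probs = softmax(matvec(W_out, h))
```

Using separate encoder and decoder cells mirrors the two halves of the schematic. In a working retrosynthesis model, the decoder would be fed its own predictions step by step until it produces `</s>`.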