Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature.

Gian Marco Ghiandoni, Michael J Bodkin, Beining Chen, Dimitar Hristozov, James E A Wallace, James Webster, Valerie J Gillet.

Abstract

Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for "dynamic" reaction fingerprinting to maximize the effectiveness of reaction encoding, as well as developing a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights on the nature of reaction collections and hidden relationships between reaction classes.
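The confidence-thresholding idea in the abstract can be illustrated with a minimal sketch of inductive conformal prediction for a multi-class classifier. This is not the authors' implementation: the nonconformity measure (one minus the predicted probability of a class), the function names `conformal_p_values` and `predict_with_confidence`, and the toy probabilities are all assumptions made for illustration. The abstract does not state which underlying model or nonconformity measure was used.

```python
import numpy as np


def conformal_p_values(cal_scores, test_scores):
    """Return a p-value for every test nonconformity score.

    The p-value is the fraction of calibration scores that are at least as
    large as the test score, with the usual +1 correction of inductive CP.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    # Compare each test score against every calibration score.
    counts = (cal_scores[None, :] >= test_scores[..., None]).sum(axis=-1)
    return (counts + 1) / (len(cal_scores) + 1)


def predict_with_confidence(cal_probs, cal_labels, test_probs):
    """Predict labels together with CP confidence and credibility.

    Assumed nonconformity measure: 1 - probability of the candidate class.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    test_probs = np.asarray(test_probs, dtype=float)

    # Calibration scores use the probability assigned to the true class.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Test scores are computed for every candidate class.
    p_values = conformal_p_values(cal_scores, 1.0 - test_probs)

    ranked = np.sort(p_values, axis=1)
    labels = p_values.argmax(axis=1)   # class with the largest p-value
    confidence = 1.0 - ranked[:, -2]   # 1 - second-largest p-value
    credibility = ranked[:, -1]        # largest p-value
    return labels, confidence, credibility


# Toy data (hypothetical): 4 calibration reactions, 2 test reactions, 2 classes.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 1, 0, 1]
test_probs = [[0.95, 0.05], [0.5, 0.5]]

labels, confidence, credibility = predict_with_confidence(
    cal_probs, cal_labels, test_probs
)

# Keep only predictions that clear both (arbitrary, illustrative) thresholds.
keep = (confidence >= 0.75) & (credibility >= 0.5)
```

Predictions where `keep` is false would be set aside as uncertain. Raising the thresholds trades coverage for a higher fraction of correct predictions among those retained, which is the trade-off the abstract reports.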

Year:  2019        PMID: 31529948     DOI: 10.1021/acs.jcim.9b00537

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  5 in total

1.  RENATE: A Pseudo-retrosynthetic Tool for Synthetically Accessible de novo Design.

Authors:  Gian Marco Ghiandoni; Michael J Bodkin; Beining Chen; Dimitar Hristozov; James E A Wallace; James Webster; Valerie J Gillet
Journal:  Mol Inform       Date:  2021-11-08       Impact factor: 4.050

2.  Molecular representations in AI-driven drug discovery: a review and practical guide. [Review]

Authors:  Laurianne David; Amol Thakkar; Rocío Mercado; Ola Engkvist
Journal:  J Cheminform       Date:  2020-09-17       Impact factor: 5.514

3.  Reaction classification and yield prediction using the differential reaction fingerprint DRFP.

Authors:  Daniel Probst; Philippe Schwaller; Jean-Louis Reymond
Journal:  Digit Discov       Date:  2022-01-21

4.  Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining.

Authors:  Mingjian Wen; Samuel M Blau; Xiaowei Xie; Shyam Dwaraknath; Kristin A Persson
Journal:  Chem Sci       Date:  2022-01-11       Impact factor: 9.825

5.  Enhancing reaction-based de novo design using a multi-label reaction class recommender.

Authors:  Gian Marco Ghiandoni; Michael J Bodkin; Beining Chen; Dimitar Hristozov; James E A Wallace; James Webster; Valerie J Gillet
Journal:  J Comput Aided Mol Des       Date:  2020-02-28       Impact factor: 3.686
