
Transformer-CNN: Swiss knife for QSAR modeling and interpretation.

Pavel Karpov (1, 2), Guillaume Godin (3), Igor V Tetko (4, 5).

Abstract

We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture on top of these embeddings yields higher-quality, interpretable QSAR/QSPR models across diverse benchmark datasets, including both regression and classification tasks. The proposed Transformer-CNN method applies SMILES augmentation during both training and inference, so each prediction is an internal consensus over several SMILES representations of the same molecule. Because both the augmentation and the transfer learning operate on embeddings, the method gives good results even for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also contains a standalone program for QSAR prediction that calculates the contributions of individual atoms, thereby interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts an online implementation of the proposed method.
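The idea of an "internal consensus" over augmented SMILES can be illustrated with a minimal sketch. It assumes that alternative SMILES strings for one molecule have already been generated (for example with RDKit's `Chem.MolToSmiles(mol, doRandom=True)`) and that a trained model is available as a plain callable. `consensus_predict` and `toy_predict` are hypothetical names used only for illustration; `toy_predict` is a stand-in, not the Transformer-CNN model or its API.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence, Tuple


def consensus_predict(
    predict: Callable[[str], float],
    smiles_variants: Sequence[str],
) -> Tuple[float, float]:
    """Average one model's predictions over several SMILES of the same molecule.

    Returns (consensus value, spread). The spread (population standard
    deviation) is a rough indicator of how much the representations disagree.
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES variant")
    preds = [predict(s) for s in smiles_variants]
    return mean(preds), pstdev(preds)


def toy_predict(smiles: str) -> float:
    # Hypothetical stand-in for a trained model: counts 'C' and 'O' characters
    # and adds a small term that depends on where 'O' appears, so different
    # SMILES spellings of the same molecule give slightly different outputs.
    return smiles.count("C") + 0.5 * smiles.count("O") + 0.1 * smiles.index("O")


if __name__ == "__main__":
    # Three valid SMILES spellings of ethanol.
    value, spread = consensus_predict(toy_predict, ["CCO", "OCC", "C(O)C"])
    print(f"consensus={value:.3f} spread={spread:.3f}")
```

Averaging smooths out the dependence of a single-string prediction on atom ordering, which is the kind of effect the augmentation-at-inference strategy relies on.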

Keywords:  Augmentation; Character-based models; Cheminformatics; Classification; Convolutional neural networks; Embeddings; QSAR; Regression; SMILES; Transformer model

Year:  2020        PMID: 33431004      PMCID: PMC7079452          DOI: 10.1186/s13321-020-00423-w

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


References (32 in total; 10 listed here):

1.  Estimation of aqueous solubility of chemical compounds using E-state indices.

Authors:  I V Tetko; V Y Tanchuk; T N Kasheva; A E Villa
Journal:  J Chem Inf Comput Sci       Date:  2001 Nov-Dec

2.  Janus kinase 3 (Jak3) is essential for common cytokine receptor gamma chain (gamma(c))-dependent signaling: comparative analysis of gamma(c), Jak3, and gamma(c) and Jak3 double-deficient mice.

Authors:  K Suzuki; H Nakajima; Y Saito; T Saito; W J Leonard; I Iwamoto
Journal:  Int Immunol       Date:  2000-02       Impact factor: 4.823

3.  PLS-optimal: a stepwise D-optimal design based on latent variables.

Authors:  Stefan Brandmaier; Ullrika Sahlin; Igor V Tetko; Tomas Öberg
Journal:  J Chem Inf Model       Date:  2012-04-11       Impact factor: 4.956

4.  Neural network studies. 2. Variable selection.

Authors:  I V Tetko; A E Villa; D J Livingstone
Journal:  J Chem Inf Comput Sci       Date:  1996 Jul-Aug

5.  Mutagenic and carcinogenic structural alerts and their mechanisms of action. (Review)

Authors:  Alja Plošnik; Marjan Vračko; Marija Sollner Dolenc
Journal:  Arh Hig Rada Toksikol       Date:  2016-09-01       Impact factor: 1.948

6.  Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space.

Authors:  Sergey Sosnin; Dmitry Karlov; Igor V Tetko; Maxim V Fedorov
Journal:  J Chem Inf Model       Date:  2019-01-23       Impact factor: 4.956

7.  Open Babel: An open chemical toolbox.

Authors:  Noel M O'Boyle; Michael Banck; Craig A James; Chris Morley; Tim Vandermeersch; Geoffrey R Hutchison
Journal:  J Cheminform       Date:  2011-10-07       Impact factor: 5.514

8.  Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information.

Authors:  Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Matthias Rupp; Wolfram Teetz; Stefan Brandmaier; Ahmed Abdelaziz; Volodymyr V Prokopenko; Vsevolod Y Tanchuk; Roberto Todeschini; Alexandre Varnek; Gilles Marcou; Peter Ertl; Vladimir Potemkin; Maria Grishina; Johann Gasteiger; Christof Schwab; Igor I Baskin; Vladimir A Palyulin; Eugene V Radchenko; William J Welsh; Vladyslav Kholodovych; Dmitriy Chekmarev; Artem Cherkasov; Joao Aires-de-Sousa; Qing-You Zhang; Andreas Bender; Florian Nigsch; Luc Patiny; Antony Williams; Valery Tkachenko; Igor V Tetko
Journal:  J Comput Aided Mol Des       Date:  2011-06-10       Impact factor: 3.686

9.  Generative Recurrent Networks for De Novo Drug Design.

Authors:  Anvita Gupta; Alex T Müller; Berend J H Huisman; Jens A Fuchs; Petra Schneider; Gisbert Schneider
Journal:  Mol Inform       Date:  2017-11-02       Impact factor: 3.353

10.  MoleculeNet: a benchmark for molecular machine learning.

Authors:  Zhenqin Wu; Bharath Ramsundar; Evan N Feinberg; Joseph Gomes; Caleb Geniesse; Aneesh S Pappu; Karl Leswing; Vijay Pande
Journal:  Chem Sci       Date:  2017-10-31       Impact factor: 9.825

Cited by (17 in total; 10 listed here):

1.  Artificial intelligence to deep learning: machine intelligence approach for drug discovery. (Review)

Authors:  Rohan Gupta; Devesh Srivastava; Mehar Sahu; Swati Tiwari; Rashmi K Ambasta; Pravir Kumar
Journal:  Mol Divers       Date:  2021-04-12       Impact factor: 3.364

2.  Multi-PLI: interpretable multi-task deep learning model for unifying protein-ligand interaction datasets.

Authors:  Fan Hu; Jiaxin Jiang; Dongqi Wang; Muchun Zhu; Peng Yin
Journal:  J Cheminform       Date:  2021-04-15       Impact factor: 5.514

3.  Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias.

Authors:  Dávid Péter Kovács; William McCorkindale; Alpha A Lee
Journal:  Nat Commun       Date:  2021-03-16       Impact factor: 14.919

4.  Convolutional neural networks (CNNs): concepts and applications in pharmacogenomics.

Authors:  Joel Markus Vaz; S Balaji
Journal:  Mol Divers       Date:  2021-05-24       Impact factor: 3.364

5.  Topological Distance-Based Electron Interaction Tensor to Apply a Convolutional Neural Network on Drug-like Compounds.

Authors:  Hyun Kil Shin
Journal:  ACS Omega       Date:  2021-12-15

6.  In Silico Prediction and Insights Into the Structural Basis of Drug Induced Nephrotoxicity.

Authors:  Yinping Shi; Yuqing Hua; Baobao Wang; Ruiqiu Zhang; Xiao Li
Journal:  Front Pharmacol       Date:  2022-01-05       Impact factor: 5.810

7.  CRNNTL: Convolutional Recurrent Neural Network and Transfer Learning for QSAR Modeling in Organic Drug and Material Discovery.

Authors:  Yaqin Li; Yongjin Xu; Yi Yu
Journal:  Molecules       Date:  2021-11-30       Impact factor: 4.411

8.  In silico prediction of chemical-induced hematotoxicity with machine learning and deep learning methods.

Authors:  Yuqing Hua; Yinping Shi; Xueyan Cui; Xiao Li
Journal:  Mol Divers       Date:  2021-07-01       Impact factor: 2.943

9.  Transformer-based artificial neural networks for the conversion between chemical notations.

Authors:  Lev Krasnov; Ivan Khokhlov; Maxim V Fedorov; Sergey Sosnin
Journal:  Sci Rep       Date:  2021-07-20       Impact factor: 4.379

10.  MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning.

Authors:  Hyuntae Lim; YounJoon Jung
Journal:  J Cheminform       Date:  2021-07-31       Impact factor: 5.514
