
Shallow Representation Learning via Kernel PCA Improves QSAR Modelability.

Stefano E Rensi, Russ B Altman.

Abstract

Linear models offer a robust, flexible, and computationally efficient set of tools for modeling quantitative structure-activity relationships (QSARs) but have been eclipsed in performance by nonlinear methods. Support vector machines (SVMs) and neural networks are currently among the most popular and accurate QSAR methods because they learn new representations of the data that greatly improve modelability. In this work, we use shallow representation learning to improve the accuracy of L1-regularized logistic regression (LASSO) and match the performance of Tanimoto SVM. We embedded chemical fingerprints in Euclidean space using Tanimoto (a.k.a. Jaccard) similarity kernel principal component analysis (KPCA) and compared the effects on LASSO and SVM model performance for predicting the binding activities of chemical compounds against 102 virtual screening targets. We observed similar performance and patterns of improvement for LASSO and SVM. We also empirically measured model training and cross-validation times to show that KPCA used in concert with LASSO classification is significantly faster than linear SVM over a wide range of training set sizes. Our work shows that powerful linear QSAR methods can match nonlinear methods and demonstrates a modular approach to nonlinear classification that greatly improves the ease, flexibility, and transferability of QSAR model prototyping.
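The pipeline the abstract describes (embed binary fingerprints with Tanimoto kernel PCA, then fit an L1-regularized logistic regression on the embedding) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the convention that two all-zero fingerprints have similarity 1, the proximal-gradient (ISTA) solver, the penalty `lam`, the number of components, and the synthetic 32-bit fingerprints are all hypothetical choices, since the abstract does not specify fingerprints, component counts, or solvers.

```python
import numpy as np


def tanimoto_kernel(A, B):
    """Tanimoto (Jaccard) similarity between every row of A and every row of B.

    A and B are 0/1 fingerprint matrices with one molecule per row.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                                          # bits set in both
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter    # bits set in either
    # Assumed convention: two all-zero fingerprints have similarity 1.
    safe = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / safe, 1.0)


def kpca_fit(K, n_components):
    """Kernel PCA on a precomputed training kernel K (n x n)."""
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J                # center in feature space
    vals, vecs = np.linalg.eigh(Kc)                   # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]     # keep the largest ones
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10                               # drop null/negative directions
    # Scale so training embeddings equal eigenvector * sqrt(eigenvalue).
    alphas = vecs[:, keep] / np.sqrt(vals[keep])
    return {"K": K, "alphas": alphas}


def kpca_transform(model, K_new):
    """Embed molecules from their kernel rows K_new (m x n) against the training set."""
    K = model["K"]
    K_new = np.asarray(K_new, dtype=float)
    K_new_c = (K_new - K.mean(axis=0)[None, :]
               - K_new.mean(axis=1, keepdims=True) + K.mean())
    return K_new_c @ model["alphas"]


def l1_logistic_fit(X, y, lam=0.01, n_iter=500):
    """L1-regularized logistic regression fit by proximal gradient descent (ISTA)."""
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])              # last column is the intercept
    step = 1.0 / max(np.linalg.norm(Xa, 2) ** 2 / (4 * n), 1e-12)  # 1 / Lipschitz const.
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (Xa @ w)))     # numerically stable sigmoid
        w = w - step * (Xa.T @ (p - y) / n)           # gradient step on mean log-loss
        # Soft-threshold the weights (L1 proximal step); intercept is not penalized.
        w[:-1] = np.sign(w[:-1]) * np.maximum(np.abs(w[:-1]) - step * lam, 0.0)
    return w


def predict(w, X):
    return (X @ w[:-1] + w[-1] > 0).astype(int)


# Hypothetical synthetic data: class 0 tends to set bits 0-15, class 1 bits 16-31.
rng = np.random.default_rng(0)


def make_fingerprints(n, label):
    p = np.where(np.arange(32) < 16, 0.7, 0.05)
    if label == 1:
        p = p[::-1]
    return (rng.random((n, 32)) < p).astype(int)


X_train = np.vstack([make_fingerprints(20, 0), make_fingerprints(20, 1)])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([make_fingerprints(10, 0), make_fingerprints(10, 1)])
y_test = np.array([0] * 10 + [1] * 10)

model = kpca_fit(tanimoto_kernel(X_train, X_train), n_components=5)
Z_train = kpca_transform(model, tanimoto_kernel(X_train, X_train))
Z_test = kpca_transform(model, tanimoto_kernel(X_test, X_train))
w = l1_logistic_fit(Z_train, y_train, lam=0.01)
accuracy = np.mean(predict(w, Z_test) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the embedding is an explicit feature matrix, the linear learner after it can be swapped (different penalty, solver, or model) without recomputing the kernel, which is one reading of the modularity the abstract describes. For real work, library implementations such as scikit-learn's `KernelPCA(kernel="precomputed")` and `LogisticRegression(penalty="l1", solver="liblinear")` would likely be preferable to this hand-rolled solver.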


Year: 2017
PMID: 28727421
PMCID: PMC5942586
DOI: 10.1021/acs.jcim.6b00694
Source DB: PubMed
Journal: J Chem Inf Model
ISSN: 1549-9596
Impact factor: 4.956

Cited by (3 in total)

1. Machine learning in chemoinformatics and drug discovery. (Review)

Authors: Yu-Chen Lo; Stefano E Rensi; Wen Torng; Russ B Altman
Journal: Drug Discov Today; Date: 2018-05-08; Impact factor: 7.851

2. Implicit-descriptor ligand-based virtual screening by means of collaborative filtering.

Authors: Raghuram Srinivas; Pavel V Klimovich; Eric C Larson
Journal: J Cheminform; Date: 2018-11-22; Impact factor: 5.514

3. QSPR model for Caco-2 cell permeability prediction using a combination of HQPSO and dual-RBF neural network.

Authors: Yukun Wang; Xuebo Chen
Journal: RSC Adv; Date: 2020-11-26; Impact factor: 4.036
