
One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical, and biological properties.

Chloé-Agathe Azencott, Alexandre Ksikes, S Joshua Swamidass, Jonathan H Chen, Liva Ralaivola, Pierre Baldi.

Abstract

Many chemoinformatics applications, including high-throughput virtual screening, benefit from being able to rapidly predict the physical, chemical, and biological properties of small molecules to screen large repositories and identify suitable candidates. When training sets are available, machine learning methods provide an effective alternative to ab initio methods for these predictions. Here, we leverage rich molecular representations, including 1D SMILES strings, 2D graphs of bonds, and 3D coordinates, to derive efficient machine learning kernels for regression problems. We further expand the library of spectral kernels for small molecules, originally developed for classification problems, to include 2.5D surface and 3D kernels using Delaunay tetrahedrization and other techniques from computational geometry, 3D pharmacophore kernels, and 3.5D or 4D kernels capable of taking into account multiple molecular configurations, such as conformers. The kernels are comprehensively tested on regression problems using cross-validation and redundancy-reduction methods on several available data sets to predict boiling points, melting points, aqueous solubility, octanol/water partition coefficients, and biological activity, with state-of-the-art results. When sufficient training data are available, 2D spectral kernels generally yield the best and most robust results, better than the state of the art. On data sets containing thousands of molecules, the kernels achieve a squared correlation coefficient of 0.91 for aqueous solubility prediction and 0.94 for octanol/water partition coefficient prediction. Averaging over conformations improves the performance of kernels based on the three-dimensional structure of molecules, especially on challenging data sets. Kernel predictors for aqueous solubility (kSOL), LogP (kLOGP), and melting point (kMELT) are available on the Web at http://cdb.ics.uci.edu.
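The abstract names several kernel families but gives no formulas. As a rough illustration only, the Python sketch below assumes: a 1D "spectrum" feature map that counts contiguous SMILES substrings; a MinMax (count-based Tanimoto) similarity as the kernel; a toy 3D feature map (a histogram of pairwise atom distances for one conformer) averaged over all conformer pairs to mimic the conformer-averaged 3.5D/4D idea; kernel ridge regression as the regressor, which is a swapped-in choice and not necessarily the authors' method; and the squared correlation coefficient used to report accuracy. All function names (`smiles_spectrum`, `distance_histogram`, `minmax_kernel`, `averaged_kernel`, `fit_krr`, `squared_correlation`) are hypothetical, and the toy target is simply the number of "C" characters in each string, not measured data.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating ideas from the abstract; not the authors' code.

def smiles_spectrum(smiles, max_len=3):
    """1D feature map: counts of every contiguous SMILES substring of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(smiles) - n + 1):
            counts[smiles[i:i + n]] += 1
    return counts


def distance_histogram(coords, bin_width=1.0):
    """Toy 3D feature map: histogram of pairwise atom distances for one conformer."""
    counts = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            counts[int(math.dist(coords[i], coords[j]) // bin_width)] += 1
    return counts


def minmax_kernel(a, b):
    """MinMax (count-based Tanimoto) similarity between two Counter feature maps."""
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / den if den else 1.0


def averaged_kernel(confs_x, confs_y, base_kernel):
    """Conformer-averaged kernel: mean of the base kernel over all conformer pairs."""
    total = sum(base_kernel(cx, cy) for cx in confs_x for cy in confs_y)
    return total / (len(confs_x) * len(confs_y))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(train_x, train_y, kernel, lam=0.1):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, predict sum_i alpha_i k(x_i, x)."""
    n = len(train_x)
    K = [[kernel(train_x[i], train_x[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(train_y))
    return lambda x: sum(a * kernel(xi, x) for a, xi in zip(alpha, train_x))


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation coefficient r^2 between targets and predictions."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (vt * vp)


if __name__ == "__main__":
    # Toy target derived from the string itself ('C' count), not real property data.
    train = ["CCO", "CCCO", "CCCCO", "CC(C)O", "CCCCCCO"]
    y = [s.count("C") for s in train]
    predict = fit_krr([smiles_spectrum(s) for s in train], y, minmax_kernel, lam=0.1)
    test = ["CCCCCO", "CC(C)CO", "CCCCCCCO"]
    preds = [predict(smiles_spectrum(s)) for s in test]
    print(preds, squared_correlation([s.count("C") for s in test], preds))
```

Averaging the base kernel over all conformer pairs keeps the combined kernel positive semidefinite whenever the base kernel is, which makes it a convenient way to fold several molecular configurations into a single similarity score.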


Year: 2007; PMID: 17338509; DOI: 10.1021/ci600397p

Source DB: PubMed; Journal: J Chem Inf Model; ISSN: 1549-9596; Impact factor: 4.956
