Stacking Gaussian processes to improve pKa predictions in the SAMPL7 challenge.

Robert M Raddi, Vincent A Voelz.

Abstract

Accurate predictions of acid dissociation constants are essential to rational molecular design in the pharmaceutical industry and elsewhere. There has been much interest in developing new machine learning methods that can produce fast and accurate pKa predictions for arbitrary species, as well as estimates of prediction uncertainty. Previously, as part of the SAMPL6 community-wide blind challenge, Bannan et al. approached the problem of predicting pKa values by using Gaussian process regression to predict microscopic pKa values, from which macroscopic pKa values can be analytically computed (Bannan et al. in J Comput-Aided Mol Des 32:1165-1177). While this method can make reasonably quick and accurate predictions using a small training set, accuracy was limited because the training set did not cover a sufficiently broad range of chemical space (e.g., polyprotic acids were not included). Here, to address this issue, we construct a deep Gaussian process (GP) model that can include more features without invoking the curse of dimensionality. We trained both a standard GP and a deep GP model using a database of approximately 3500 small molecules curated from public sources, filtered by similarity to targets. We tested the model on both the SAMPL6 challenge and the more recent SAMPL7 challenge, which introduced a similar lack of overlap in ionizable sites and/or environments between the test set and the previous training set. The results show that while the deep GP model made only minor improvements over the standard GP model for SAMPL6 predictions, it made significant improvements over the standard GP model in SAMPL7 macroscopic predictions, achieving a mean absolute error (MAE) of 1.5 pKa units.
© 2021. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
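
The abstract notes that macroscopic pKa values can be computed analytically from microscopic ones and reports accuracy as an MAE. As a minimal sketch of what that can look like, the Python below uses the textbook relations for combining microscopic equilibria. It is not taken from the paper and may differ from the authors' procedure. The function names and the numeric values are hypothetical, chosen only for illustration.

```python
import math


def macro_pka_parallel_loss(micro_pkas):
    """Macroscopic pKa when ONE state can lose a proton at any of several sites.

    The microscopic acidity constants add: Ka_macro = sum(Ka_i).
    """
    return -math.log10(sum(10.0 ** (-pka) for pka in micro_pkas))


def macro_pka_convergent_loss(micro_pkas):
    """Macroscopic pKa when SEVERAL microstates with the same proton count
    each deprotonate into one common product state.

    Here the reciprocals add: 1 / Ka_macro = sum(1 / Ka_i).
    """
    return math.log10(sum(10.0 ** pka for pka in micro_pkas))


def mean_absolute_error(predicted, experimental):
    """Mean absolute error between two equal-length sequences of pKa values."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Hypothetical diprotic acid with two equivalent sites (illustrative numbers only).
first_macro = macro_pka_parallel_loss([4.0, 4.0])     # about 3.70
second_macro = macro_pka_convergent_loss([9.0, 9.0])  # about 9.30
```

With two equivalent sites, the first macroscopic pKa falls below the microscopic value by log10(2) and the second rises above it by the same amount. This is the usual statistical factor for symmetric diprotic acids.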

Keywords:  Acid dissociation constants; Computational drug design; Gaussian process models; Machine learning; Physicochemical properties; SAMPL7 physical property prediction

Year:  2021        PMID: 34363562      PMCID: PMC9478567          DOI: 10.1007/s10822-021-00411-8

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   4.179


References (19 in total; first 10 listed)

1.  Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation.

Authors:  Araz Jakalian; David B Jack; Christopher I Bayly
Journal:  J Comput Chem       Date:  2002-12       Impact factor: 3.376

2.  Predicting pK(a) by molecular tree structured fingerprints and PLS.

Authors:  Li Xing; Robert C Glen; Robert D Clark
Journal:  J Chem Inf Comput Sci       Date:  2003 May-Jun

3.  Extended-connectivity fingerprints.

Authors:  David Rogers; Mathew Hahn
Journal:  J Chem Inf Model       Date:  2010-05-24       Impact factor: 4.956

4.  Generation of a set of simple, interpretable ADMET rules of thumb.

Authors:  M Paul Gleeson
Journal:  J Med Chem       Date:  2008-01-31       Impact factor: 7.446

5.  Comparison of the accuracy of experimental and predicted pKa values of basic and acidic compounds.

Authors:  Luca Settimo; Krista Bellman; Ronald M A Knegtel
Journal:  Pharm Res       Date:  2013-11-19       Impact factor: 4.200

6.  Multiconformation, Density Functional Theory-Based pKa Prediction in Application to Large, Flexible Organic Molecules with Diverse Functional Groups.

Authors:  Art D Bochevarov; Mark A Watson; Jeremy R Greenwood; Dean M Philipp
Journal:  J Chem Theory Comput       Date:  2016-11-29       Impact factor: 6.006

7.  Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P Challenge.

Authors:  Mehtap Işık; Teresa Danielle Bergazin; Thomas Fox; Andrea Rizzi; John D Chodera; David L Mobley
Journal:  J Comput Aided Mol Des       Date:  2020-02-27       Impact factor: 3.686

8.  SAMPL6 challenge results from pKa predictions based on a general Gaussian process model.

Authors:  Caitlin C Bannan; David L Mobley; A Geoffrey Skillman
Journal:  J Comput Aided Mol Des       Date:  2018-10-15       Impact factor: 3.686

9.  Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information.

Authors:  Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Matthias Rupp; Wolfram Teetz; Stefan Brandmaier; Ahmed Abdelaziz; Volodymyr V Prokopenko; Vsevolod Y Tanchuk; Roberto Todeschini; Alexandre Varnek; Gilles Marcou; Peter Ertl; Vladimir Potemkin; Maria Grishina; Johann Gasteiger; Christof Schwab; Igor I Baskin; Vladimir A Palyulin; Eugene V Radchenko; William J Welsh; Vladyslav Kholodovych; Dmitriy Chekmarev; Artem Cherkasov; Joao Aires-de-Sousa; Qing-You Zhang; Andreas Bender; Florian Nigsch; Luc Patiny; Antony Williams; Valery Tkachenko; Igor V Tetko
Journal:  J Comput Aided Mol Des       Date:  2011-06-10       Impact factor: 3.686

10.  Structure property relationships of N-acylsulfonamides and related bioisosteres.

Authors:  Karol R Francisco; Carmine Varricchio; Thomas J Paniak; Marisa C Kozlowski; Andrea Brancale; Carlo Ballatore
Journal:  Eur J Med Chem       Date:  2021-03-28       Impact factor: 7.088

  1 in total

1.  Stacking Gaussian processes to improve [Formula: see text] predictions in the SAMPL7 challenge.

Authors:  Robert M Raddi; Vincent A Voelz
Journal:  J Comput Aided Mol Des       Date:  2021-08-07       Impact factor: 4.179
