The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity.

Robert P Sheridan

Abstract

In QSAR, a statistical model (an "activity model") is generated from a training set of molecules, represented by chemical descriptors, and their biological activities. The aim of the field of domain applicability (DA) is to estimate the uncertainty of the prediction for a specific molecule on a specific activity model. A number of DA metrics have been proposed in the literature for this purpose. A quantitative model of the prediction uncertainty (an "error model") can be built using one or more of these metrics. A previous publication from our laboratory (Sheridan, R. P. J. Chem. Inf. Model. 2013, 53, 2837-2850) suggested that QSAR methods such as random forest could be used to build error models by fitting unsigned prediction errors against DA metrics. The QSAR paradigm contains two useful techniques: descriptor importance can determine which DA metrics are most useful, and cross-validation can tell which subset of DA metrics is sufficient to estimate the unsigned errors. Previously we studied 10 large, diverse data sets and seven DA metrics. For those data sets for which it is possible to build a significant error model from those seven metrics, only two metrics were sufficient to account for almost all of the information in the error model. These were TREE_SD (the variation of prediction among random forest trees) and PREDICTED (the predicted activity itself). In this paper we show that when data sets are less diverse, as for example in QSAR models of molecules in a single chemical series, these two DA metrics become less important in explaining prediction error, and the DA metric SIMILARITYNEAREST1 (the similarity of the molecule being predicted to the closest training set compound) becomes more important. Our recommendation is that when the mean pairwise similarity (measured with the Carhart AP descriptor and the Dice similarity index) within a QSAR training set is less than 0.5, one can use only TREE_SD and PREDICTED to form the error model, but otherwise one should use TREE_SD, PREDICTED, and SIMILARITYNEAREST1.
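The recommendation in the abstract can be read as a simple decision rule, sketched below in Python. This is an illustration under stated assumptions, not the author's implementation: molecules are represented as hypothetical dictionaries of descriptor counts standing in for Carhart AP descriptors, the count-based form of the Dice index is assumed, and the function names (`dice_similarity`, `mean_pairwise_similarity`, `similarity_nearest1`, `recommended_da_metrics`) are made up for this example.

```python
from itertools import combinations


def dice_similarity(a, b):
    """Dice similarity between two descriptor-count dicts.

    Assumption: the count-based form 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)).
    """
    shared = sum(min(count, b[key]) for key, count in a.items() if key in b)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def mean_pairwise_similarity(descriptors):
    """Mean Dice similarity over all distinct pairs in a training set."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        raise ValueError("need at least two molecules")
    return sum(dice_similarity(a, b) for a, b in pairs) / len(pairs)


def similarity_nearest1(query, training):
    """Similarity of the query molecule to its closest training set compound."""
    return max(dice_similarity(query, t) for t in training)


def recommended_da_metrics(descriptors, threshold=0.5):
    """Apply the abstract's rule: below the threshold use two metrics, otherwise three."""
    metrics = ["TREE_SD", "PREDICTED"]
    if mean_pairwise_similarity(descriptors) >= threshold:
        metrics.append("SIMILARITYNEAREST1")
    return metrics


# Toy data (hypothetical descriptor counts, not real atom pairs).
diverse_set = [{"a": 1}, {"b": 1}, {"c": 1}]
series_set = [{"a": 2, "b": 1}, {"a": 2, "b": 1, "c": 1}, {"a": 2, "b": 2}]

print(recommended_da_metrics(diverse_set))
print(recommended_da_metrics(series_set))
```

With the toy data, the diverse set has no shared descriptors and falls below the 0.5 threshold, while the series-like set falls above it and therefore also receives SIMILARITYNEAREST1. In practice the descriptor counts would come from a cheminformatics toolkit rather than hand-written dictionaries.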

Year:  2015        PMID: 25998559     DOI: 10.1021/acs.jcim.5b00110

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  6 in total

1.  Efficiency of different measures for defining the applicability domain of classification models.

Authors:  Waldemar Klingspohn; Miriam Mathea; Antonius Ter Laak; Nikolaus Heinrich; Knut Baumann
Journal:  J Cheminform       Date:  2017-08-03       Impact factor: 5.514

2.  Implicit-descriptor ligand-based virtual screening by means of collaborative filtering.

Authors:  Raghuram Srinivas; Pavel V Klimovich; Eric C Larson
Journal:  J Cheminform       Date:  2018-11-22       Impact factor: 5.514

3.  Assessing the calibration in toxicological in vitro models with conformal prediction.

Authors:  Ola Spjuth; Andrea Volkamer; Andrea Morger; Fredrik Svensson; Staffan Arvidsson McShane; Niharika Gauraha; Ulf Norinder
Journal:  J Cheminform       Date:  2021-04-29       Impact factor: 5.514

4.  Uncertainty quantification: Can we trust artificial intelligence in drug discovery? (Review)

Authors:  Jie Yu; Dingyan Wang; Mingyue Zheng
Journal:  iScience       Date:  2022-07-21

5.  PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein-Peptide and Protein-Protein Binding Affinity.

Authors:  Sandra Romero-Molina; Yasser B Ruiz-Blanco; Joel Mieres-Perez; Mirja Harms; Jan Münch; Michael Ehrmann; Elsa Sanchez-Garcia
Journal:  J Proteome Res       Date:  2022-06-02       Impact factor: 5.370

6.  Prediction of blood:air and fat:air partition coefficients of volatile organic compounds for the interpretation of data in breath gas analysis.

Authors:  Christian Kramer; Paweł Mochalski; Karl Unterkofler; Agapios Agapiou; Veronika Ruzsanyi; Klaus R Liedl
Journal:  J Breath Res       Date:  2016-01-27       Impact factor: 3.262

