Molecular Similarity-Based Domain Applicability Metric Efficiently Identifies Out-of-Domain Compounds.

Abstract

Domain applicability (DA) is a concept introduced to gauge the reliability of quantitative structure-activity relationship (QSAR) predictions. A leading DA metric is ensemble variance, which is defined as the variance of predictions by an ensemble of QSAR models. However, this metric fails to identify large prediction errors in melting point (MP) data, despite the availability of large training data sets. In this study, we examined the performance of this metric on MP data and found that, for most molecules, ensemble variance increased as their structural similarity to the training molecules decreased. However, the metric decreased for "out-of-domain" molecules, i.e., molecules with little to no structural similarity to the training compounds. This explains why ensemble variance fails to identify large prediction errors. In contrast, a new molecular similarity-based DA metric that considers the contributions of all training molecules in gauging the reliability of a prediction successfully identified predictions of MP data for which the errors were large. To validate our results, we used four additional data sets of diverse molecular properties. We divided each data set into a training set and a test set at a ratio of approximately 2:1, ensuring a small fraction of the test compounds are out of the training domain. We then trained random forest (RF) models on the training data and made RF predictions for the test set molecules. Results from these data sets confirm that the new DA metric significantly outperformed ensemble variance in identifying predictions for out-of-domain compounds. For within-domain compounds, the two metrics performed similarly, with ensemble variance marginally but consistently outperforming the new DA metric. The new DA metric, which does not rely on an ensemble of QSAR models, can be deployed with any machine-learning method, including deep neural networks.
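The two kinds of metric compared in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract gives no formula for the similarity-based metric, so the exponential weighting, the parameter `a`, the function names, and the set-of-on-bits fingerprint representation below are all assumptions chosen only to show the idea of summing contributions from every training molecule.

```python
import math
from statistics import pvariance


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ensemble_variance(predictions):
    """Variance of one molecule's predictions across the ensemble members."""
    return pvariance(predictions)


def similarity_da(query, training, a=3.0):
    """Sum a similarity-weighted contribution from every training molecule.

    ASSUMPTION: the weight exp(-a * d / s), with s the Tanimoto similarity
    and d = 1 - s, is a hypothetical choice for illustration. Identical
    molecules contribute 1, and molecules sharing no bits contribute 0.
    """
    total = 0.0
    for fp in training:
        s = tanimoto(query, fp)
        if s > 0.0:
            total += math.exp(-a * (1.0 - s) / s)
    return total


# Toy fingerprints (hypothetical bit indices, not real molecules).
training = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4, 6}),
]
in_domain = frozenset({1, 2, 3, 4})   # identical to a training fingerprint
out_domain = frozenset({10, 11, 12})  # shares no bits with the training set

score_in = similarity_da(in_domain, training)
score_out = similarity_da(out_domain, training)
```

In this toy case the out-of-domain query scores exactly zero because it shares nothing with any training fingerprint, whereas an ensemble of models could still agree closely on it and report a low variance. The similarity score needs only the training fingerprints, which is why a metric of this kind can be paired with any learning method.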

Year: 2018 PMID: 30404432 DOI: 10.1021/acs.jcim.8b00597

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Cited by

2 in total

1. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? (Review)

Authors: Jie Yu; Dingyan Wang; Mingyue Zheng
Journal: iScience Date: 2022-07-21

2. A quantitative uncertainty metric controls error in neural network-driven chemical discovery.

Authors: Jon Paul Janet; Chenru Duan; Tzuhsiung Yang; Aditya Nandy; Heather J Kulik
Journal: Chem Sci Date: 2019-07-11 Impact factor: 9.825
