Using random forest to model the domain applicability of another random forest model.

Robert P Sheridan.

Abstract

In QSAR, a statistical model is generated from a training set of molecules (represented by chemical descriptors) and their biological activities. We will call this traditional type of QSAR model an "activity model". The activity model can be used to predict the activities of molecules not in the training set. A relatively new subfield for QSAR is domain applicability. The aim is to estimate the reliability of prediction of a specific molecule on a specific activity model. A number of different metrics have been proposed in the literature for this purpose. It is desirable to build a quantitative model of reliability against one or more of these metrics. We can call this an "error model". A previous publication from our laboratory (Sheridan J. Chem. Inf. Model., 2012, 52, 814-823.) suggested the simultaneous use of three metrics would be more discriminating than any one metric. An error model could be built in the form of a three-dimensional set of bins. When the number of metrics exceeds three, however, the bin paradigm is not practical. An obvious solution for constructing an error model using multiple metrics is to use a QSAR method, in our case random forest. In this paper we demonstrate the usefulness of this paradigm, specifically for determining whether a useful error model can be built and which metrics are most useful for a given problem. For the ten data sets and for the seven metrics we examine here, it appears that it is possible to construct a useful error model using only two metrics (TREE_SD and PREDICTED). These do not require calculating similarities/distances between the molecules being predicted and the molecules used to build the activity model, which can be rate-limiting.
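The two-metric error model described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: it assumes scikit-learn's RandomForestRegressor for both models, takes TREE_SD to be the standard deviation of the per-tree predictions and PREDICTED to be their mean, uses the absolute cross-validated error as the error model's target, and runs on synthetic data. The function names, the 5-fold scheme, and the tree counts are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def tree_sd_and_predicted(forest, X):
    """Return an (n_samples, 2) array of [TREE_SD, PREDICTED] for a fitted forest."""
    # One row of predictions per tree in the ensemble.
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return np.column_stack([per_tree.std(axis=0), per_tree.mean(axis=0)])


def build_models(X, y, n_splits=5, n_trees=50, seed=0):
    """Fit an activity model on (X, y) and an error model on cross-validated errors.

    Assumption: the error model's target is the absolute error of held-out
    predictions; the abstract does not spell out this detail.
    """
    metrics, abs_err = [], []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        m = tree_sd_and_predicted(rf, X[test_idx])
        metrics.append(m)
        abs_err.append(np.abs(m[:, 1] - y[test_idx]))

    # The error model is itself a random forest: metrics in, expected error out.
    error_model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    error_model.fit(np.vstack(metrics), np.concatenate(abs_err))

    activity_model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    activity_model.fit(X, y)
    return activity_model, error_model


# Synthetic stand-in for descriptors and activities (not real QSAR data).
rng = np.random.default_rng(0)
X = rng.normal(size=(220, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1 + 0.5 * np.abs(X[:, 2]))
X_train, y_train, X_new = X[:200], y[:200], X[200:]

activity_model, error_model = build_models(X_train, y_train)

# For new molecules: compute the two metrics, then ask the error model
# how large an error to expect for each prediction.
metrics_new = tree_sd_and_predicted(activity_model, X_new)
expected_error = error_model.predict(metrics_new)
```

Both metrics come straight from the activity model's own trees, so no similarity or distance calculation against the training set is needed, which matches the practical advantage the abstract points out.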

Year:  2013        PMID: 24152204     DOI: 10.1021/ci400482e

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  20 in total

Review 1.  QSAR without borders.

Authors:  Eugene N Muratov; Jürgen Bajorath; Robert P Sheridan; Igor V Tetko; Dmitry Filimonov; Vladimir Poroikov; Tudor I Oprea; Igor I Baskin; Alexandre Varnek; Adrian Roitberg; Olexandr Isayev; Stefano Curtarolo; Denis Fourches; Yoram Cohen; Alan Aspuru-Guzik; David A Winkler; Dimitris Agrafiotis; Artem Cherkasov; Alexander Tropsha
Journal:  Chem Soc Rev       Date:  2020-05-01       Impact factor: 54.564

2.  Opportunities and challenges using artificial intelligence in ADME/Tox.

Authors:  Barun Bhhatarai; W Patrick Walters; Cornelis E C A Hop; Guido Lanza; Sean Ekins
Journal:  Nat Mater       Date:  2019-05       Impact factor: 43.841

3.  ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling.

Authors:  Tailong Lei; Youyong Li; Yunlong Song; Dan Li; Huiyong Sun; Tingjun Hou
Journal:  J Cheminform       Date:  2016-02-01       Impact factor: 5.514

4.  Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel.

Authors:  Isidro Cortés-Ciriano; Gerard J P van Westen; Guillaume Bouvier; Michael Nilges; John P Overington; Andreas Bender; Thérèse E Malliavin
Journal:  Bioinformatics       Date:  2015-09-08       Impact factor: 6.937

5.  Using beta binomials to estimate classification uncertainty for ensemble models.

Authors:  Robert D Clark; Wenkel Liang; Adam C Lee; Michael S Lawless; Robert Fraczkiewicz; Marvin Waldman
Journal:  J Cheminform       Date:  2014-06-22       Impact factor: 5.514

6.  Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling.

Authors:  Isidro Cortes-Ciriano; Daniel S Murrell; Gerard Jp van Westen; Andreas Bender; Thérèse E Malliavin
Journal:  J Cheminform       Date:  2015-01-16       Impact factor: 5.514

7.  Efficiency of different measures for defining the applicability domain of classification models.

Authors:  Waldemar Klingspohn; Miriam Mathea; Antonius Ter Laak; Nikolaus Heinrich; Knut Baumann
Journal:  J Cheminform       Date:  2017-08-03       Impact factor: 5.514

8.  Prediction of blood:air and fat:air partition coefficients of volatile organic compounds for the interpretation of data in breath gas analysis.

Authors:  Christian Kramer; Paweł Mochalski; Karl Unterkofler; Agapios Agapiou; Veronika Ruzsanyi; Klaus R Liedl
Journal:  J Breath Res       Date:  2016-01-27       Impact factor: 3.262

Review 9.  Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

Authors:  Qurrat Ul Ain; Antoniya Aleksandrova; Florian D Roessler; Pedro J Ballester
Journal:  Wiley Interdiscip Rev Comput Mol Sci       Date:  2015-08-28

10.  How accurately can we predict the melting points of drug-like compounds?

Authors:  Igor V Tetko; Yurii Sushko; Sergii Novotarskyi; Luc Patiny; Ivan Kondratov; Alexander E Petrenko; Larisa Charochkina; Abdullah M Asiri
Journal:  J Chem Inf Model       Date:  2014-12-09       Impact factor: 4.956
