Three useful dimensions for domain applicability in QSAR models using random forest.

Robert P Sheridan

Abstract

One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of prediction among random forest trees (less variation among trees indicates more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model-building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912-1928]. We show that using these three parameters simultaneously adds much more discrimination in prediction accuracy than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of root-mean-square errors of compounds tested after the model was built.
© 2012 American Chemical Society
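
The three-dimensional array of bins described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (build_rmse_bins, lookup_rmse), the bin edges, and the synthetic cross-validation data are all assumptions made for illustration. Each cross-validated compound is assumed to come with its similarity to the training set, the standard deviation of its prediction across the random forest trees, the prediction itself, and the observed activity.

```python
import numpy as np


def build_rmse_bins(similarity, tree_sd, predicted, observed, edges):
    """Bin cross-validated compounds on three axes and return per-bin RMSE.

    edges is a list of three 1-D arrays of bin edges (hypothetical choices),
    one each for similarity, tree standard deviation, and predicted activity.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    coords = np.column_stack([similarity, tree_sd, predicted])
    sq_err = (predicted - observed) ** 2
    # Sum of squared errors and compound count in every 3-D bin.
    # Compounds falling outside the edges are ignored by histogramdd.
    sse, _ = np.histogramdd(coords, bins=edges, weights=sq_err)
    counts, _ = np.histogramdd(coords, bins=edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        rmse = np.sqrt(sse / counts)  # NaN marks an empty bin
    return rmse, counts


def lookup_rmse(rmse, edges, similarity, tree_sd, predicted):
    """Return the stored RMSE for the bin a new compound falls into."""
    idx = []
    for value, axis_edges in zip((similarity, tree_sd, predicted), edges):
        i = np.searchsorted(axis_edges, value, side="right") - 1
        # Assumption: out-of-range values are clipped to the nearest edge bin.
        idx.append(int(np.clip(i, 0, len(axis_edges) - 2)))
    return rmse[tuple(idx)]


if __name__ == "__main__":
    # Synthetic stand-in for cross-validation output; not real QSAR data.
    rng = np.random.default_rng(0)
    n = 5000
    sim = rng.uniform(0.0, 1.0, n)
    sd = rng.uniform(0.0, 2.0, n)
    pred = rng.uniform(4.0, 9.0, n)
    # Make the error grow with tree disagreement and shrink with similarity.
    obs = pred + rng.normal(0.0, 0.2 + 0.5 * sd * (1.0 - sim))

    edges = [np.linspace(0.0, 1.0, 6),
             np.linspace(0.0, 2.0, 5),
             np.linspace(4.0, 9.0, 6)]
    rmse, counts = build_rmse_bins(sim, sd, pred, obs, edges)
    print("estimated RMSE for a new compound:",
          lookup_rmse(rmse, edges, similarity=0.9, tree_sd=0.3, predicted=6.5))
```

At prediction time, the stored array serves as a lookup table: the new compound's three values select a bin, and the RMSE of that bin is taken as the error estimate. Empty bins are left as NaN in this sketch; how sparse bins should be handled in practice is not specified in the abstract.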

Year:  2012        PMID: 22385389     DOI: 10.1021/ci300004n

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956

