Does rational selection of training and test sets improve the outcome of QSAR modeling?

Todd M Martin, Paul Harten, Douglas M Young, Eugene N Muratov, Alexander Golbraikh, Hao Zhu, Alexander Tropsha.

Abstract

Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods instead divide data sets into training and test sets systematically, based on how the compounds are distributed in descriptor space. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using both rational division methods and random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models, as measured on the external evaluation set, is comparable.
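The nested splitting scheme described above (a random 80/20 modeling/external split, then an 80/20 training/test split of the modeling set) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses plain Euclidean distance on lists of descriptor values, and the function names `kennard_stone` and `nested_split` are hypothetical. It also assumes that the compounds picked by Kennard-Stone form the training set (the algorithm's original use for choosing calibration samples), with the leftover modeling-set compounds becoming the test set.

```python
import math
import random


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone algorithm."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")
    # Seed the selection with the two compounds that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(X[pair[0]], X[pair[1]]),
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    # For each unselected compound, track its distance to the nearest selected one.
    nearest = {k: min(math.dist(X[k], X[first]), math.dist(X[k], X[second])) for k in remaining}
    while len(selected) < n_select:
        # Add the compound that is farthest from everything chosen so far.
        pick = max(remaining, key=lambda k: nearest[k])
        selected.append(pick)
        remaining.remove(pick)
        for k in remaining:
            nearest[k] = min(nearest[k], math.dist(X[k], X[pick]))
    return selected


def nested_split(X, rational=True, seed=0):
    """Random 80/20 modeling/external split, then an 80/20 training/test split of the modeling set."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    n_external = round(0.2 * len(X))
    external = sorted(order[:n_external])  # external evaluation set: always random
    modeling = sorted(order[n_external:])
    n_train = round(0.8 * len(modeling))
    if rational:
        # Rational division: Kennard-Stone picks the training compounds.
        picked = kennard_stone([X[i] for i in modeling], n_train)
        train = sorted(modeling[p] for p in picked)
    else:
        # Random division of the same modeling set.
        train = sorted(rng.sample(modeling, n_train))
    test = sorted(set(modeling) - set(train))
    return train, test, external
```

Because the external set depends only on the initial shuffle, calling `nested_split(X, rational=False)` with the same seed keeps the external evaluation set fixed while swapping only the inner division method, which is the comparison the study sets up.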
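Sphere exclusion, another of the rational division methods named above, has several published variants, and the abstract does not specify which one was used. The sketch below is a simplified variant assumed for illustration only: it starts from a chosen compound (the hypothetical `start` parameter), puts each sphere center in the training set, assigns the other compounds inside the sphere alternately to the test and training sets in order of increasing distance, removes the whole sphere from consideration, and takes as the next center the remaining compound closest to any previous center.

```python
import math


def sphere_exclusion(X, radius, start=0):
    """Split row indices of X into (train, test) with a simplified sphere-exclusion scheme."""
    remaining = set(range(len(X)))
    centers, train, test = [], [], []
    center = start
    while remaining:
        remaining.discard(center)
        centers.append(center)
        train.append(center)  # every sphere center goes to the training set
        # Compounds inside the sphere, ordered from nearest to farthest.
        inside = sorted(
            (k for k in remaining if math.dist(X[k], X[center]) <= radius),
            key=lambda k: math.dist(X[k], X[center]),
        )
        for rank, k in enumerate(inside):
            # Alternate test/training assignment, starting with the test set.
            (test if rank % 2 == 0 else train).append(k)
        remaining -= set(inside)  # exclude the whole sphere from further consideration
        if remaining:
            # Next center: the remaining compound closest to any previous center.
            center = min(remaining, key=lambda k: min(math.dist(X[k], X[c]) for c in centers))
    return sorted(train), sorted(test)
```

The radius is the main control: a larger sphere captures more neighbors per center and so sends more compounds to the test set, whereas a very small radius puts nearly everything in the training set.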
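For the kNN models, a prediction is derived from the training compounds nearest to the query in descriptor space. The following is a bare-bones sketch, not the kNN QSAR procedure used in the study: any descriptor selection or neighbor weighting that procedure may use is omitted, and the helper names `knn_predict` and `r_squared` are hypothetical. It pairs an unweighted k-nearest-neighbor average with the coefficient of determination, which can be computed separately on the test set and on the external evaluation set.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict activity as the unweighted mean of the k nearest training compounds."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1 - ss_res / ss_tot
```

Computing `r_squared` once on the test set and once on the external evaluation set separates the two outcomes the abstract reports: statistics on the test set carved out of the modeling set, versus predictive power on compounds that never took part in model building or set selection.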

Year: 2012    PMID: 23030316    DOI: 10.1021/ci300338w

Source DB: PubMed    Journal: J Chem Inf Model    ISSN: 1549-9596    Impact factor: 4.956


37 in total (10 listed below)

1.  Big-Data Science in Porous Materials: Materials Genomics and Machine Learning. (Review)

Authors:  Kevin Maik Jablonka; Daniele Ongari; Seyed Mohamad Moosavi; Berend Smit
Journal:  Chem Rev       Date:  2020-06-10       Impact factor: 60.622

2.  QSAR modeling: where have you been? Where are you going to?

Authors:  Artem Cherkasov; Eugene N Muratov; Denis Fourches; Alexandre Varnek; Igor I Baskin; Mark Cronin; John Dearden; Paola Gramatica; Yvonne C Martin; Roberto Todeschini; Viviana Consonni; Victor E Kuz'min; Richard Cramer; Romualdo Benigni; Chihae Yang; James Rathman; Lothar Terfloth; Johann Gasteiger; Ann Richard; Alexander Tropsha
Journal:  J Med Chem       Date:  2014-01-06       Impact factor: 7.446

3.  Pred-hERG: A Novel web-Accessible Computational Tool for Predicting Cardiac Toxicity.

Authors:  Rodolpho C Braga; Vinicius M Alves; Meryck F B Silva; Eugene Muratov; Denis Fourches; Luciano M Lião; Alexander Tropsha; Carolina H Andrade
Journal:  Mol Inform       Date:  2015-07-20       Impact factor: 3.353

4.  Two new atom centered fragment descriptors and scoring function enhance classification of antibacterial activity.

Authors:  Durga Datta Kandel; Chandan Raychaudhury; Debnath Pal
Journal:  J Mol Model       Date:  2014-03-25       Impact factor: 1.810

5.  In silico prediction of pesticide aquatic toxicity with chemical category approaches.

Authors:  Fuxing Li; Defang Fan; Hao Wang; Hongbin Yang; Weihua Li; Yun Tang; Guixia Liu
Journal:  Toxicol Res (Camb)       Date:  2017-07-31       Impact factor: 3.524

6.  Ensemble QSAR Modeling to Predict Multispecies Fish Toxicity Lethal Concentrations and Points of Departure.

Authors:  Thomas Y Sheffield; Richard S Judson
Journal:  Environ Sci Technol       Date:  2019-10-10       Impact factor: 9.028

7.  The software tool to find greener solvent replacements, PARIS III.

Authors:  Paul Harten; Todd Martin; Michael Gonzalez; Douglas Young
Journal:  Environ Prog Sustain Energy       Date:  2020-01-01       Impact factor: 2.431

8.  Demonstration of a consensus approach for the calculation of physicochemical properties required for environmental fate assessments.

Authors:  Caroline Tebes-Stevens; Jay M Patel; Michaela Koopmans; John Olmstead; Said H Hilal; Nick Pope; Eric J Weber; Kurt Wolfe
Journal:  Chemosphere       Date:  2017-11-23       Impact factor: 7.086

9.  In silico design of anti-atherogenic biomaterials.

Authors:  Daniel R Lewis; Vladyslav Kholodovych; Michael D Tomasini; Dalia Abdelhamid; Latrisha K Petersen; William J Welsh; Kathryn E Uhrich; Prabhas V Moghe
Journal:  Biomaterials       Date:  2013-07-25       Impact factor: 12.479

10.  ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling.

Authors:  Tailong Lei; Youyong Li; Yunlong Song; Dan Li; Huiyong Sun; Tingjun Hou
Journal:  J Cheminform       Date:  2016-02-01       Impact factor: 5.514
