A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR.

Martin Gütlein, Christoph Helma, Andreas Karwath, Stefan Kramer.

Abstract

(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate how well a model will predict unseen compounds. Proper validation is also one of the requirements regulatory authorities impose before accepting a (Q)SAR model and approving its use in real-world scenarios as an alternative testing method. At the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end, we introduce a workflow that realistically simulates the common problem setting of building predictive models for relatively small datasets. The workflow applies the built and validated models to large amounts of unseen data and compares the performance of the different validation approaches. The experimental results indicate that cross-validation produces better-performing (Q)SAR models than external test set validation and reduces the variance of the results, while at the same time underestimating the performance on unseen compounds. The results suggest that, contrary to the current conception in the community, cross-validation may play a significant role in evaluating the predictivity of (Q)SAR models.
Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
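
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow. The synthetic one-feature data, the toy threshold classifier, the dataset sizes, the 2/3 to 1/3 split, and all function names are assumptions made for illustration only. It also assumes that the cross-validated model is finally retrained on the full small dataset, whereas the externally validated model is trained on the training part only.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic two-class data (assumption): one noisy feature, labels alternate 0/1."""
    return [(rng.gauss(i % 2, 1.0), i % 2) for i in range(n)]


def fit(train):
    """Toy stand-in for a (Q)SAR learner: threshold halfway between the class means."""
    m0 = statistics.fmean(x for x, y in train if y == 0)
    m1 = statistics.fmean(x for x, y in train if y == 1)
    return (m0 + m1) / 2, m1 >= m0  # (threshold, True if class 1 lies above it)


def accuracy(model, data):
    """Fraction of compounds whose predicted class matches the label."""
    threshold, positive_above = model
    hits = sum(((x > threshold) == positive_above) == bool(y) for x, y in data)
    return hits / len(data)


def cross_validate(data, k=5):
    """k-fold CV estimate; the returned model is trained on ALL of the data."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit(train), folds[i]))
    return fit(data), statistics.fmean(scores)


def external_validate(data):
    """Hold-out estimate; the returned model is trained on the first 2/3 only."""
    cut = (2 * len(data)) // 3
    model = fit(data[:cut])
    return model, accuracy(model, data[cut:])


def simulate(repeats=20, n_small=60, n_unseen=5000, seed=0):
    """Repeatedly build and validate models on small data, then score them on unseen data."""
    rng = random.Random(seed)
    unseen = make_data(n_unseen, rng)
    results = {"cv": [], "external": []}
    for _ in range(repeats):
        small = make_data(n_small, rng)
        for name, validate in (("cv", cross_validate), ("external", external_validate)):
            model, estimate = validate(small)
            # Pair the validation estimate with the accuracy on the large unseen set.
            results[name].append((estimate, accuracy(model, unseen)))
    return results


if __name__ == "__main__":
    for name, pairs in simulate().items():
        estimates = [e for e, _ in pairs]
        unseen_scores = [u for _, u in pairs]
        print(name,
              "mean estimate:", round(statistics.fmean(estimates), 3),
              "std of estimate:", round(statistics.stdev(estimates), 3),
              "mean unseen accuracy:", round(statistics.fmean(unseen_scores), 3))
```

Comparing the mean estimate with the mean unseen accuracy shows whether a validation scheme over- or underestimates performance, and the standard deviation of the estimates shows how stable it is. Results from this toy setup say nothing about real (Q)SAR datasets.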

Keywords:  Cheminformatics; Cross-validation; External validation; Structure-activity relationship; Validation

Year: 2013; PMID: 27481669; DOI: 10.1002/minf.201200134

Source DB: PubMed; Journal: Mol Inform; ISSN: 1868-1743; Impact factor: 3.353


