Robust cross-validation of linear regression QSAR models.

Dmitry A Konovalov, Lyndon E Llewellyn, Yvan Vander Heyden, Danny Coomans.

Abstract

A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. The "gold standard" of model validation is blindfold prediction, in which the model's predictive power is assessed from how well it predicts the activity values of compounds that were not considered in any way during model development or calibration. However, during the development of a QSAR model, it is necessary to obtain some indication of the model's predictive power. This is often done by some form of cross-validation (CV). In this study, the concepts of the predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context, allowing for the presence of outliers. Commonly used predictive power and fitting ability statistics were assessed via Monte Carlo cross-validation when applied to percent human intestinal absorption, blood-brain partition coefficient, and toxicity values of saxitoxin QSAR data sets, as well as three benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over ordinary-least-squares MLR, regardless of the degree of outlier contamination, and that (2) the model's predictive power should only be assessed via robust statistics. The MATLAB and Java source code used in this study is freely available from the QSAR-BENCH section of www.dmitrykonovalov.org for academic use. The Web site also contains the Java-based QSAR-BENCH program, which can be run online via Java Web Start (supporting Windows, Mac OS X, and Linux/Unix) to reproduce most of the reported results or to apply the reported procedures to other data sets.
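The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/Java code. It assumes a single descriptor instead of full MLR, uses Huber-weighted iteratively reweighted least squares as a stand-in robust estimator (the abstract does not name the one used), and runs on synthetic data with three artificial outliers. The function names (`fit_line`, `fit_huber`, `monte_carlo_cv`) and all parameter values are hypothetical. Each Monte Carlo split holds out a random subset, fits on the rest, and the pooled held-out errors are summarised by a non-robust statistic (RMSE) and a robust one (median absolute error).

```python
import random
import statistics


def fit_line(xs, ys, weights=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if weights is None:
        weights = [1.0] * len(xs)
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_huber(xs, ys, k=1.345, iters=50):
    """Robust fit (assumed stand-in): IRLS with Huber weights."""
    a, b = fit_line(xs, ys)  # start from the ordinary least-squares fit
    for _ in range(iters):
        res = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust residual scale from the median absolute residual
        scale = statistics.median(abs(r) for r in res) / 0.6745 or 1e-12
        # Full weight for small residuals, shrinking weight for large ones
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
                   for r in res]
        a, b = fit_line(xs, ys, weights)
    return a, b


def monte_carlo_cv(xs, ys, fit, n_splits=200, test_fraction=0.3, seed=0):
    """Repeated random train/test splits; returns (RMSE, median |error|)."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(test_fraction * n)))
    errors = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend(abs(ys[i] - (a + b * xs[i])) for i in test)
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return rmse, statistics.median(errors)


# Synthetic data (an assumption for illustration): y = 1 + 2x plus noise
data_rng = random.Random(42)
xs = [i / 2 for i in range(40)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0, 0.5) for x in xs]
for i in (35, 37, 39):  # contaminate three high-x points
    ys[i] -= 30.0

for name, fit in (("OLS", fit_line), ("Huber", fit_huber)):
    rmse, med = monte_carlo_cv(xs, ys, fit)
    print(f"{name}: RMSE={rmse:.2f}  median|error|={med:.2f}")
```

In this sketch the RMSE is dominated by the few contaminated held-out points whichever fit is used, while the median absolute error reflects how well the bulk of the data is predicted, which is the kind of distinction the abstract draws between non-robust and robust statistics.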

Year:  2008        PMID: 18826208     DOI: 10.1021/ci800209k

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956

