
Boosted leave-many-out cross-validation: the effect of training and test set diversity on PLS statistics.

Robert D Clark

Abstract

It is becoming increasingly common in quantitative structure/activity relationship (QSAR) analyses to use external test sets to evaluate the likely stability and predictivity of the models obtained. In some cases, such as those involving variable selection, an internal test set (i.e., a cross-validation set) is also used. Care is sometimes taken to ensure that the subsets used exhibit response and/or property distributions similar to those of the data set as a whole, but more often the individual observations are simply assigned 'at random.' In the special case of multiple linear regression (MLR) without variable selection, it can be analytically demonstrated that random assignment is inferior to other strategies. Most particularly, D-optimal design performs better if the form of the regression equation is known and the variables involved are well behaved. This report introduces an alternative, non-parametric approach termed 'boosted leave-many-out' (boosted LMO) cross-validation. In this method, relatively small training sets are chosen by applying optimizable k-dissimilarity selection (OptiSim) using a small subsample size (k = 4, in this case), with the unselected observations being reserved as a test set for the corresponding reduced model. Predictive errors for the full model are then estimated by aggregating results over several such analyses. The countervailing effects of training and test set size, diversity, and representativeness on partial least squares (PLS) model statistics are described for comparative molecular field analysis (CoMFA) of a large data set of COX2 inhibitors.
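The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the author's implementation and rests on several assumptions: the selection routine is a simplified k-dissimilarity picker (it omits the exclusion radius and candidate recycling of the published OptiSim algorithm), ordinary least squares stands in for PLS/CoMFA, the q2-style statistic is one plausible way to aggregate errors, and the names `optisim_like_select` and `boosted_lmo` as well as the synthetic data are hypothetical.

```python
import numpy as np


def optisim_like_select(X, n_select, k=4, rng=None):
    """Pick n_select row indices by a simplified k-dissimilarity selection.

    At each step, k candidates are drawn at random from the unselected pool
    and the one whose nearest already-selected row is farthest away is kept.
    Assumption: Euclidean distance; no exclusion radius, no recycling.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    selected = [int(rng.integers(n))]  # random starting observation
    pool = [i for i in range(n) if i != selected[0]]
    while len(selected) < n_select and pool:
        cand = rng.choice(pool, size=min(k, len(pool)), replace=False)
        # Distance from each candidate to every selected row.
        d = np.linalg.norm(X[cand][:, None, :] - X[selected][None, :, :], axis=2)
        best = int(cand[np.argmax(d.min(axis=1))])
        selected.append(best)
        pool.remove(best)
    return np.array(selected)


def boosted_lmo(X, y, n_train, n_rounds=10, k=4, seed=0):
    """Aggregate held-out squared errors over several small training sets."""
    rng = np.random.default_rng(seed)
    press, ss_total, n_pred = 0.0, 0.0, 0
    for _ in range(n_rounds):
        train = optisim_like_select(X, n_train, k=k, rng=rng)
        test = np.setdiff1d(np.arange(len(y)), train)  # unselected rows
        # Ordinary least squares with an intercept stands in for PLS here.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        resid = y[test] - A_test @ coef
        press += float(resid @ resid)
        ss_total += float(((y[test] - y[train].mean()) ** 2).sum())
        n_pred += len(test)
    # Assumed aggregate statistics: pooled RMSE and a q2-style ratio.
    return {"rmse": (press / n_pred) ** 0.5, "q2": 1.0 - press / ss_total}


# Synthetic data for illustration only (not the COX2 data set).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + data_rng.normal(scale=0.1, size=200)
stats = boosted_lmo(X, y, n_train=30)
print(stats)
```

Because each round trains on a small, diverse subset and predicts everything else, every reduced model is tested on far more observations than it was fitted to, which is the opposite of the usual leave-few-out split.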

Year: 2003    PMID: 13677492    DOI: 10.1023/a:1025366721142

Source DB: PubMed    Journal: J Comput Aided Mol Des    ISSN: 0920-654X    Impact factor: 3.686


