Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Alexander Golbraikh, Alexander Tropsha.

Abstract

One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power. The latter can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that the training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For a quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
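The abstract does not spell out the three sphere-exclusion variants, so the following is only a generic sketch of the idea, under stated assumptions: compounds are plain descriptor vectors, distance is Euclidean, each sphere center goes to the training set, and every other compound inside the sphere goes to the test set. The function names, the fixed `radius`, and the random choice of centers are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sphere_exclusion_split(points, radius, seed=0):
    """Generic sphere-exclusion split (illustrative, not the paper's variants).

    Repeatedly pick a remaining compound as a sphere center and put it in
    the training set; put every other remaining compound within `radius`
    of it in the test set; then drop the whole sphere from the pool.
    """
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    train, test = [], []
    while remaining:
        center = rng.choice(remaining)  # assumption: random center choice
        train.append(center)
        inside = [i for i in remaining
                  if i != center
                  and euclidean(points[i], points[center]) <= radius]
        test.extend(inside)
        excluded = set(inside) | {center}
        remaining = [i for i in remaining if i not in excluded]
    return sorted(train), sorted(test)


def max_nearest_distance(from_idx, to_idx, points):
    """Largest nearest-neighbor distance from one set to the other.

    A small value means every point in `from_idx` has a close
    counterpart in `to_idx` (criteria (i) and (ii) above).
    """
    return max(min(euclidean(points[i], points[j]) for j in to_idx)
               for i in from_idx)


# Toy "descriptor space": a 5 x 5 grid of two-dimensional points.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
radius = 1.5
train, test = sphere_exclusion_split(points, radius, seed=0)

print("training set size:", len(train))
print("test set size:", len(test))
print("test -> train max nearest distance:",
      max_nearest_distance(test, train, points))
print("train -> test max nearest distance:",
      max_nearest_distance(train, test, points))
```

In this sketch every test compound lies within `radius` of some training compound by construction, and no two training compounds are closer than `radius` to each other, which is one simple way to read criteria (i) and (iii). Criterion (ii) is not guaranteed by the construction (an isolated center can end up with no nearby test compound), so it is only measured here, not enforced.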


Year:  2002        PMID: 12549674     DOI: 10.1023/a:1021372108686

Source DB:  PubMed          Journal:  Mol Divers        ISSN: 1381-1991            Impact factor:   2.943


