Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Alexander Golbraikh, Alexander Tropsha.

Abstract

One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power. The latter can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
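
The abstract does not spell out the three sphere-exclusion variants, so the following is only a minimal sketch of one generic form of sphere exclusion, not the authors' implementation. It assumes Euclidean distance in descriptor space, a user-chosen `radius`, and that the next sphere center is simply the first remaining compound; the function names `sphere_exclusion_split` and `max_nearest_distance` are hypothetical. Each sphere center goes to the training set and the other compounds inside the sphere go to the test set, so every test point lies within `radius` of some training point, which is in the spirit of criterion (i).

```python
import math

def sphere_exclusion_split(points, radius):
    """Split point indices into (train, test) by a simple sphere exclusion.

    Assumption: the next center is the first remaining point in input order.
    """
    remaining = list(range(len(points)))
    train, test = [], []
    while remaining:
        center = remaining.pop(0)      # sphere center joins the training set
        train.append(center)
        still_remaining = []
        for i in remaining:
            if math.dist(points[center], points[i]) <= radius:
                test.append(i)         # inside the sphere: assign to test set
            else:
                still_remaining.append(i)
        remaining = still_remaining    # exclude everything inside the sphere
    return train, test

def max_nearest_distance(points, from_idx, to_idx):
    """Largest distance from any point in from_idx to its nearest point in to_idx."""
    return max(
        min(math.dist(points[i], points[j]) for j in to_idx)
        for i in from_idx
    )

# Toy descriptor vectors (made-up data, two descriptors per compound).
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (6.0, 0.0)]
train, test = sphere_exclusion_split(points, radius=1.0)

# Crude stand-ins for criteria (i) and (ii); these are not the paper's diversity indices.
test_to_train = max_nearest_distance(points, test, train)
train_to_test = max_nearest_distance(points, train, test)
print(train, test, test_to_train, train_to_test)
```

In this toy run the isolated compound at (6.0, 0.0) ends up in the training set with no nearby test point, so the train-to-test distance is large even though the test-to-train distance stays within the radius. That is the kind of asymmetry criteria (i) and (ii) are meant to catch when checked separately.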

Year:  2002        PMID: 12489684     DOI: 10.1023/a:1020869118689

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   3.686


