Spline-fitting with a genetic algorithm: a method for developing classification structure-activity relationships.

Jeffrey J Sutherland, Lee A O'Brien, Donald F Weaver.

Abstract

Classification methods allow for the development of structure-activity relationship models when the target property is categorical rather than continuous. We describe a classification method which fits descriptor splines to activities, with descriptors selected using a genetic algorithm. This method, which we identify as SFGA, is compared to the well-established techniques of recursive partitioning (RP) and soft independent modeling by class analogy (SIMCA) using five series of compounds: cyclooxygenase-2 (COX-2) inhibitors, benzodiazepine receptor (BZR) ligands, estrogen receptor (ER) ligands, dihydrofolate reductase (DHFR) inhibitors, and monoamine oxidase (MAO) inhibitors. Only 1-D and 2-D descriptors were used. Approximately 40% of compounds in each series were assigned to a test set, "cherry-picked" from the complete set such that they lie outside the training set as much as possible. SFGA produced models that were more predictive for all but the DHFR set, for which SIMCA was most predictive. RP gave the least predictive models for all but the MAO set. A similar trend was observed when using training and test sets to which compounds were randomly assigned and when gradually eliminating compounds from the (designed) training set. The stability of models was examined for the random and reduced sets, where stability means that classification statistics and the selected descriptors are similar for models derived from different sets. Here, SIMCA produced the most stable models, followed by SFGA and RP. We show that a consensus approach that combines all three methods outperforms the single best model for all data sets.
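The consensus approach mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how the three methods are combined, so the majority vote used here is an assumption, as are the binary active/inactive labels (1/0), the function name `consensus_predict`, and the example predictions, which are invented for illustration and are not results from the paper.

```python
from collections import Counter


def consensus_predict(*model_predictions):
    """Combine per-compound class labels from several models by majority vote.

    Each argument is a sequence of predicted labels, one per compound,
    in the same compound order for every model.
    """
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same number of compounds")

    combined = []
    for votes in zip(*model_predictions):
        # Take the label with the highest vote count. With three models and
        # two classes (the assumption here), a tie cannot occur.
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions for four test compounds (1 = active, 0 = inactive).
sfga = [1, 0, 1, 1]
rp = [0, 0, 1, 0]
simca = [1, 1, 1, 0]

print(consensus_predict(sfga, rp, simca))  # [1, 0, 1, 0]
```

With an odd number of binary classifiers a simple vote always yields a decision; other combination rules (for example, requiring unanimous agreement) would need a different implementation.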

Year: 2003; PMID: 14632439; DOI: 10.1021/ci034143r

Source DB: PubMed; Journal: J Chem Inf Comput Sci; ISSN: 0095-2338


