Random or rational design? Evaluation of diverse compound subsets from chemical structure databases.

T Pötter, H Matter.

Abstract

The performance of rational design in maximizing the structural diversity of databases for lead finding and lead refinement was investigated. Rational methods for designing compound subsets, such as maximum dissimilarity methods and hierarchical cluster analysis, were compared to a random approach to study how efficiently they enhance the diversity of three different databases. All investigations were based on 2D fingerprints as a validated molecular descriptor. To compare the performance of the rational selection methods to a random approach, we additionally used probability calculations. In maximum dissimilarity-based selections, a single compound can be a member of different neighborhoods as defined by the similarity threshold value, while in hierarchical clustering each compound is assigned to only a single cluster. Therefore, the relationship between the similarity threshold of the maximum diversity selection method and a 2D similarity search threshold was studied. In contrast to hierarchical cluster analysis, maximum dissimilarity selections allow a similarity threshold to be applied when adding a new compound to the list of already selected compounds. Reasonable values for this similarity threshold are presented here. Maximum dissimilarity selections produced more diverse subsets, which cover more biological classes than random selections. An optimally diverse subset without redundant structures, containing only 38% of one original dataset, was generated; in it no structure is more similar than 0.85 to its nearest neighbor, yet all biological classes are represented. When it is acceptable to cover only 90% of all biological targets, 3.5-3.7 times more compounds need to be selected using a random approach than using a rational design approach. At this coverage rate, design techniques show their highest efficiency compared to a random approach. In those subsets no compound is closer than 0.70 to its nearest neighbor.
Furthermore, comparative molecular field analysis (CoMFA) was used to evaluate designed and randomly chosen subsets of a database consisting of inhibitors of the angiotensin-converting enzyme. Subsets designed using maximum dissimilarity methods led to more stable quantitative structure-activity relationship (QSAR) models with higher predictive power than randomly chosen compounds. This predictive power is especially high when there is no compound in the test dataset with a similarity coefficient less than 0.7 to its nearest neighbor in the training set.
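
The threshold-based maximum dissimilarity selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Tanimoto coefficient as the similarity measure (the abstract only says "similarity coefficient"), represents each 2D fingerprint as a set of on-bit indices, and uses hypothetical names (`tanimoto`, `max_dissimilarity_select`, `toy_fps`) and made-up toy data.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # assumed representation: indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient of two bit sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def max_dissimilarity_select(
    fps: Dict[str, Fingerprint], threshold: float, seed: str
) -> List[str]:
    """Greedy selection: repeatedly add the compound whose nearest selected
    neighbor is least similar, and stop once every remaining compound is more
    similar than `threshold` to something already selected."""
    selected = [seed]
    # For each unselected compound, track its highest similarity to the selected set.
    nearest = {
        name: tanimoto(fp, fps[seed]) for name, fp in fps.items() if name != seed
    }
    while nearest:
        # Ties are broken alphabetically so the result is deterministic.
        candidate = min(nearest, key=lambda n: (nearest[n], n))
        if nearest[candidate] > threshold:
            break  # every remaining compound is redundant at this threshold
        selected.append(candidate)
        del nearest[candidate]
        for name in nearest:
            sim = tanimoto(fps[name], fps[candidate])
            if sim > nearest[name]:
                nearest[name] = sim
    return selected


# Hypothetical toy fingerprints, for illustration only.
toy_fps: Dict[str, Fingerprint] = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 5}),
    "c": frozenset({10, 11, 12}),
    "d": frozenset({10, 11, 13}),
    "e": frozenset({1, 2, 3, 4, 6}),
}

subset = max_dissimilarity_select(toy_fps, threshold=0.7, seed="a")
print(subset)
```

In this toy set, compound "e" has a similarity of 0.8 to "a", so it is left out at a threshold of 0.70 but would be included at 0.85. This mirrors how a looser threshold yields a larger, less strictly diverse subset.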

Year:  1998        PMID: 9484498     DOI: 10.1021/jm9700878

Source DB:  PubMed          Journal:  J Med Chem        ISSN: 0022-2623            Impact factor:   7.446


  8 in total

1.  Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Authors:  Alexander Golbraikh; Alexander Tropsha
Journal:  J Comput Aided Mol Des       Date:  2002 May-Jun       Impact factor: 3.686

2.  Use of alignment-free molecular descriptors in diversity analysis and optimal sampling of molecular libraries.

Authors:  Fabien Fontaine; Manuel Pastor; Hugo Gutiérrez-de-Terán; Juan J Lozano; Ferran Sanz
Journal:  Mol Divers       Date:  2003       Impact factor: 2.943

3.  Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Authors:  Alexander Golbraikh; Alexander Tropsha
Journal:  Mol Divers       Date:  2002       Impact factor: 2.943

4.  Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections.

Authors:  Paul A Clemons; J Anthony Wilson; Vlado Dančík; Sandrine Muller; Hyman A Carrinski; Bridget K Wagner; Angela N Koehler; Stuart L Schreiber
Journal:  Proc Natl Acad Sci U S A       Date:  2011-04-11       Impact factor: 11.205

5.  Toward the computer-aided discovery of FabH inhibitors. Do predictive QSAR models ensure high quality virtual screening performance?

Authors:  Yunierkis Pérez-Castillo; Maykel Cruz-Monteagudo; Cosmin Lazar; Jonatan Taminau; Mathy Froeyen; Miguel Angel Cabrera-Pérez; Ann Nowé
Journal:  Mol Divers       Date:  2014-03-27       Impact factor: 2.943

6.  Application of a sparse matrix design strategy to the synthesis of DOS libraries.

Authors:  Lakshmi B Akella; Lisa A Marcaurelle
Journal:  ACS Comb Sci       Date:  2011-04-28       Impact factor: 3.784

7.  A Multi-Objective Approach for Anti-Osteosarcoma Cancer Agents Discovery through Drug Repurposing.

Authors:  Alejandro Cabrera-Andrade; Andrés López-Cortés; Gabriela Jaramillo-Koupermann; Humberto González-Díaz; Alejandro Pazos; Cristian R Munteanu; Yunierkis Pérez-Castillo; Eduardo Tejera
Journal:  Pharmaceuticals (Basel)       Date:  2020-11-22

8.  Predicting relative efficiency of amide bond formation using multivariate linear regression.

Authors:  Brittany C Haas; Adam E Goetz; Ana Bahamonde; J Christopher McWilliams; Matthew S Sigman
Journal:  Proc Natl Acad Sci U S A       Date:  2022-04-11       Impact factor: 12.779
