
Reoptimization of MDL keys for use in drug discovery.

Joseph L Durant, Burton A Leland, Douglas R Henry, James G Nourse.

Abstract

For a number of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in molecular similarity. Classification performance for a test data set of 957 compounds was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset containing 208 bits and 0.71 for a genetic algorithm optimized keyset containing 548 bits. We present an overview of the underlying technology supporting the definition of descriptors and the encoding of these descriptors into keysets. This technology allows definition of descriptors as combinations of atom properties, bond properties, and atomic neighborhoods at various topological separations as well as supporting a number of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodology developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning experiment highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. 
Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
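The pruning experiments described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard information-theoretic surprisal, -log2(p), where p is the fraction of compounds that set a bit. The abstract does not define the surprisal S/N ratio or say whether high- or low-surprisal bits were retained, so the S/N criterion is omitted and the ranking direction (keep the highest scores) is an arbitrary choice. All function names and the toy data are hypothetical.

```python
import math
import random


def bit_frequencies(fingerprints, n_bits):
    """Fraction of fingerprints that set each bit (fingerprints are sets of on-bit indices)."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for bit in fp:
            counts[bit] += 1
    return [c / len(fingerprints) for c in counts]


def surprisal(p):
    """Shannon surprisal -log2(p) in bits; infinite for a bit that is never set."""
    return math.inf if p == 0 else -math.log2(p)


def prune_by_score(scores, keep):
    """Indices of the `keep` highest-scoring bits (assumed direction), in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def prune_randomly(n_bits, keep, seed=0):
    """Baseline: keep a uniformly random subset of `keep` bits."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_bits), keep))


def project(fp, kept):
    """Re-index a fingerprint onto the retained bits, dropping pruned ones."""
    position = {bit: i for i, bit in enumerate(kept)}
    return {position[b] for b in fp if b in position}


# Toy data set: four "compounds" over a 4-bit keyset (purely illustrative).
toy = [{0, 1}, {0, 2}, {0, 1, 3}, {0}]
freqs = bit_frequencies(toy, n_bits=4)
scores = [surprisal(p) for p in freqs]
kept_by_surprisal = prune_by_score(scores, keep=2)
kept_at_random = prune_randomly(n_bits=4, keep=2)
pruned_toy = [project(fp, kept_by_surprisal) for fp in toy]
```

In a comparison like the one the abstract reports, each pruned keyset would then be scored with the same classification protocol, so that a directed criterion can be judged against the random baseline at equal keyset length.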


Year:  2002        PMID: 12444722     DOI: 10.1021/ci010132r

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338

