
Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance.

Andreas Bender, Hamse Y Mussa, Robert C Glen, Stephan Reiling.

Abstract

A molecular similarity searching technique based on atom environments, information-gain-based feature selection, and the naive Bayesian classifier has been applied to a series of diverse datasets and its performance compared to that of alternative searching methods. Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecular structure. In this application, using a recently published dataset of more than 100,000 molecules from the MDL Drug Data Report database, the atom environment approach appears to outperform fusion of ranking scores as well as binary kernel discrimination, which are both used in combination with Unity fingerprints. Overall retrieval rates among the top 5% of the sorted library are nearly 10% better (more than 14% better in relative numbers) than those of the second best method, Unity fingerprints and binary kernel discrimination. In 10 out of 11 sets of active compounds the combination of atom environments and the naive Bayesian classifier appears to be the superior method, while in the remaining dataset, data fusion and binary kernel discrimination in combination with Unity fingerprints are the methods of choice. Binary kernel discrimination in combination with Unity fingerprints generally comes second in performance overall. The difference in performance can largely be attributed to the different molecular descriptors used. Atom environments outperform Unity fingerprints by a large margin if the combination of these descriptors with the Tanimoto coefficient is compared. The naive Bayesian classifier in combination with information-gain-based feature selection and selection of a sensible number of features performs about as well as binary kernel discrimination in experiments where these classification methods are compared.
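The atom environment descriptor and its use with the Tanimoto coefficient can be illustrated with a minimal sketch. The abstract does not state the atom typing scheme or the maximum topological distance, so the example below assumes plain element symbols as atom types and a cutoff of two bonds; the function names, the toy molecules, and the choice to treat each environment as one set member are hypothetical and are not taken from the MOLPRINT 2D implementation.

```python
from collections import Counter, deque

def atom_environments(atoms, bonds, max_dist=2):
    """Return the set of atom environments of a molecule.

    atoms: list of heavy-atom types (assumed here: element symbols).
    bonds: list of (i, j) index pairs between heavy atoms.
    Each environment is a count vector of atom types found at each
    topological distance 0..max_dist from one heavy atom.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    features = set()
    for start in range(len(atoms)):
        # Breadth-first search gives the topological distance of every atom.
        dist = {start: 0}
        queue = deque([start])
        counts = Counter()
        while queue:
            cur = queue.popleft()
            counts[(dist[cur], atoms[cur])] += 1
            if dist[cur] == max_dist:
                continue  # do not expand beyond the cutoff
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        # Freeze the count vector so it can be stored in a set.
        features.add(tuple(sorted(counts.items())))
    return features

def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy heavy-atom graphs (hydrogens omitted): ethanol C-C-O, 1-propanol C-C-C-O.
ethanol = atom_environments(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = atom_environments(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
similarity = tanimoto(ethanol, propanol)
```

In this toy case only the oxygen environment is shared by the two molecules, so the coefficient is low even though the structures look alike; a real application would use a richer atom typing and a much larger feature space.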
When used on a monoamine oxidase dataset, atom environments and the naive Bayesian classifier perform as well as binary kernel discrimination in the case of a 50/50 split of training and test compounds. In the case of sparse training data, binary kernel discrimination is found to be superior on this particular dataset. On a third dataset, the atom environment descriptor shows higher retrieval rates than other 2D fingerprints tested here when used in combination with the Tanimoto similarity coefficient. Feature selection is shown to be a crucial step in determining the performance of the algorithm. The representation of molecules by atom environments is found to be more effective than Unity fingerprints for the type of biological receptor similarity calculations examined here. Combining information prior to scoring and including information about inactive compounds, as in the Bayesian classifier and binary kernel discrimination, is found to be superior to posterior data fusion (in the datasets tested here).

Copyright 2004 American Chemical Society
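The classification step named in the abstract (information-gain-based feature selection followed by a naive Bayesian classifier) might be sketched as follows. This is an illustration under stated assumptions, not the authors' code: molecules are represented as sets of features, the probability estimates use add-one (Laplace) smoothing, only features present in the query molecule contribute to the score, and all function names and the toy data are hypothetical.

```python
import math

def entropy(pos, neg):
    """Shannon entropy (bits) of a two-class count."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(feature, actives, inactives):
    """Entropy reduction from splitting the training set on feature presence."""
    a_with = sum(feature in m for m in actives)
    i_with = sum(feature in m for m in inactives)
    a_without = len(actives) - a_with
    i_without = len(inactives) - i_with
    total = len(actives) + len(inactives)
    n_with = a_with + i_with
    n_without = total - n_with
    h_after = (n_with / total) * entropy(a_with, i_with) \
        + (n_without / total) * entropy(a_without, i_without)
    return entropy(len(actives), len(inactives)) - h_after

def select_features(actives, inactives, k):
    """Keep the k features with the highest information gain."""
    all_features = set().union(*actives, *inactives)
    ranked = sorted(
        all_features,
        key=lambda f: (-information_gain(f, actives, inactives), str(f)),
    )
    return ranked[:k]

def log_odds_score(molecule, selected, actives, inactives):
    """Naive Bayes log-odds of activity (assumption: Laplace smoothing,
    and only selected features present in the molecule are scored)."""
    score = math.log(len(actives) / len(inactives))  # class prior
    for f in selected:
        if f in molecule:
            p_active = (sum(f in m for m in actives) + 1) / (len(actives) + 2)
            p_inactive = (sum(f in m for m in inactives) + 1) / (len(inactives) + 2)
            score += math.log(p_active / p_inactive)
    return score

# Toy training data: each molecule is a set of feature labels.
actives = [{"a", "b"}, {"a", "c"}]
inactives = [{"c", "d"}, {"d"}]
selected = select_features(actives, inactives, k=2)
score_active_like = log_odds_score({"a", "b"}, selected, actives, inactives)
score_inactive_like = log_odds_score({"d"}, selected, actives, inactives)
```

Ranking a library by this score uses the inactive compounds during training, which is the property the abstract credits for the advantage over posterior data fusion. The number of selected features `k` is a free parameter here, in line with the abstract's remark that feature selection strongly affects performance.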

Year:  2004        PMID: 15446830     DOI: 10.1021/ci0498719

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338


  78 in total

1.  Profiling diverse compounds by flux- and electrophysiology-based primary screens for inhibition of human Ether-à-go-go related gene potassium channels.

Authors:  Beiyan Zou; Haibo Yu; Joseph J Babcock; Pritam Chanda; Joel S Bader; Owen B McManus; Min Li
Journal:  Assay Drug Dev Technol       Date:  2010-12       Impact factor: 1.738

2.  Reverse engineering chemical structures from molecular descriptors: how many solutions?

Authors:  Jean-Loup Faulon; W Michael Brown; Shawn Martin
Journal:  J Comput Aided Mol Des       Date:  2005-11-03       Impact factor: 3.686

Review 3.  Evaluation of machine-learning methods for ligand-based virtual screening.

Authors:  Beining Chen; Robert F Harrison; George Papadatos; Peter Willett; David J Wood; Xiao Qing Lewell; Paulette Greenidge; Nikolaus Stiefl
Journal:  J Comput Aided Mol Des       Date:  2007-01-05       Impact factor: 3.686

Review 4.  Pushing the boundaries of 3D-QSAR.

Authors:  Richard D Cramer; Bernd Wendt
Journal:  J Comput Aided Mol Des       Date:  2007-01-26       Impact factor: 3.686

5.  Lossless compression of chemical fingerprints using integer entropy codes improves storage and retrieval.

Authors:  Pierre Baldi; Ryan W Benz; Daniel S Hirschberg; S Joshua Swamidass
Journal:  J Chem Inf Model       Date:  2007-10-30       Impact factor: 4.956

6.  Exploring structure-selectivity relationships of biogenic amine GPCR antagonists using similarity searching and dynamic compound mapping.

Authors:  Ingo Vogt; Hany E A Ahmed; Jens Auer; Jürgen Bajorath
Journal:  Mol Divers       Date:  2008-03-04       Impact factor: 2.943

7.  QSAR modeling: where have you been? Where are you going to?

Authors:  Artem Cherkasov; Eugene N Muratov; Denis Fourches; Alexandre Varnek; Igor I Baskin; Mark Cronin; John Dearden; Paola Gramatica; Yvonne C Martin; Roberto Todeschini; Viviana Consonni; Victor E Kuz'min; Richard Cramer; Romualdo Benigni; Chihae Yang; James Rathman; Lothar Terfloth; Johann Gasteiger; Ann Richard; Alexander Tropsha
Journal:  J Med Chem       Date:  2014-01-06       Impact factor: 7.446

8.  Analysis and use of fragment-occurrence data in similarity-based virtual screening.

Authors:  Shereena M Arif; John D Holliday; Peter Willett
Journal:  J Comput Aided Mol Des       Date:  2009-06-18       Impact factor: 3.686

9.  On the interpretation and interpretability of quantitative structure-activity relationship models.

Authors:  Rajarshi Guha
Journal:  J Comput Aided Mol Des       Date:  2008-09-11       Impact factor: 3.686

10.  Distribution of randomly generated activity class characteristic substructures in diverse active and database compounds.

Authors:  José Batista; Jürgen Bajorath
Journal:  Mol Divers       Date:  2008-05-28       Impact factor: 2.943
