Toward an improved clustering of large data sets using maximum common substructures and topological fingerprints.

Alexander Böcker

Abstract

A new clustering algorithm was developed that groups large data sets of more than 100,000 molecules according to their chemotypes. The algorithm first preclusters a data set using a fingerprint version of the hierarchical k-means algorithm. Chemotypes are then extracted from the terminal clusters via a maximum common substructure (MCS) approach. Molecules forming a chemotype must share a predefined number of rings, atoms, and non-carbon heavy atoms. In an iterative procedure, similar chemotypes and singletons are fused into larger chemotypes. Singletons that cannot be assigned to any chemotype are then grouped based on the proportion of overlap between the molecules. Representatives from each chemotype, together with the singletons, are used in a second round of the hierarchical k-means algorithm to provide a final hierarchical grouping. Results are reported in an interactive graphical user interface that provides initial insights into the structure-activity relationships (SAR) of the molecules. Example applications are shown for two chemotypes of reverse transcriptase inhibitors in the MDDR database and for the evaluation of descriptor-based similarity searching routines, with special attention paid to the chemotype-hopping potential of each individual routine. The algorithm will allow high-throughput and virtual screening results to be analyzed with improved quality.
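The preclustering and chemotype criteria described above can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the author's implementation: fingerprints are modeled as sets of "on" bit indices; the fingerprint version of k-means is assumed to use Tanimoto similarity for assignment and a majority-vote bit set as the centroid; seeding is a deterministic farthest-first choice; and the MCS itself (which a cheminformatics toolkit would compute) is represented only by hypothetical summary counts. All names and thresholds (`binary_kmeans`, `hierarchical_kmeans`, `McsStats`, `is_chemotype`, `max_size`, `min_atoms`, and so on) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # positions of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two bit sets."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def majority_centroid(members: Sequence[Fingerprint]) -> Fingerprint:
    """Binary 'mean' (assumed): keep a bit if it is set in at least half the members."""
    counts = Counter(bit for fp in members for bit in fp)
    return frozenset(bit for bit, n in counts.items() if 2 * n >= len(members))


def binary_kmeans(fps: Sequence[Fingerprint], k: int, max_iter: int = 20) -> List[int]:
    """k-means on bit sets: Tanimoto for assignment, majority vote for centroids."""
    # Hypothetical deterministic seeding: repeatedly add the fingerprint that is
    # least similar to its nearest already-chosen centroid.
    centroids: List[Fingerprint] = [fps[0]]
    while len(centroids) < k:
        nxt = min(fps, key=lambda fp: max(tanimoto(fp, c) for c in centroids))
        if nxt in centroids:  # every fingerprint duplicates an existing centroid
            break
        centroids.append(nxt)
    labels: List[int] = []
    for _ in range(max_iter):
        new = [max(range(len(centroids)), key=lambda c: tanimoto(fp, centroids[c]))
               for fp in fps]
        if new == labels:  # assignments unchanged, so the clustering has converged
            break
        labels = new
        for c in range(len(centroids)):
            members = [fp for fp, lab in zip(fps, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = majority_centroid(members)
    return labels


def hierarchical_kmeans(idx: List[int], fps: Sequence[Fingerprint], k: int = 2,
                        max_size: int = 50, depth: int = 0,
                        max_depth: int = 10) -> List[List[int]]:
    """Split recursively until each terminal cluster has at most max_size members."""
    if len(idx) <= max_size or depth >= max_depth:
        return [idx]
    labels = binary_kmeans([fps[i] for i in idx], k)
    groups: Dict[int, List[int]] = {}
    for i, lab in zip(idx, labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:  # no further split possible (e.g. identical fingerprints)
        return [idx]
    terminal: List[List[int]] = []
    for lab in sorted(groups):
        terminal.extend(hierarchical_kmeans(groups[lab], fps, k, max_size,
                                            depth + 1, max_depth))
    return terminal


@dataclass(frozen=True)
class McsStats:
    """Hypothetical summary of an MCS shared by a candidate chemotype."""
    n_rings: int
    n_atoms: int
    n_hetero: int  # non-carbon heavy atoms


def is_chemotype(mcs: McsStats, min_rings: int = 1, min_atoms: int = 10,
                 min_hetero: int = 1) -> bool:
    """Accept the MCS as a chemotype core if it meets all three (placeholder) minima."""
    return (mcs.n_rings >= min_rings and mcs.n_atoms >= min_atoms
            and mcs.n_hetero >= min_hetero)


if __name__ == "__main__":
    fps = [frozenset(s) for s in ({1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5},
                                  {10, 11, 12, 13}, {10, 11, 12, 14}, {10, 11, 13, 14})]
    print(hierarchical_kmeans(list(range(6)), fps, k=2, max_size=3))
    # [[0, 1, 2], [3, 4, 5]]
    print(is_chemotype(McsStats(n_rings=2, n_atoms=14, n_hetero=3)))  # True
```

A majority-vote centroid is one common way to keep k-means centroids in the same binary space as the fingerprints so that Tanimoto similarity remains meaningful; whether the original "fingerprint version" works this way is not stated in the abstract. In a real pipeline, the terminal clusters returned by `hierarchical_kmeans` would be passed to an MCS routine, and `is_chemotype` would decide whether the shared substructure is large enough to define a chemotype.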

Year:  2008        PMID: 18956832     DOI: 10.1021/ci8000887

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  4 in total

1.  The development of a knowledge base for basic active structures: an example case of dopamine agonists.

Authors:  Takashi Okada; Masumi Yamakawa; Norihito Ohmori; Sachio Mori; Hiroshi Horikawa; Taketo Hayashi; Satoshi Fujishima
Journal:  Chem Cent J       Date:  2010-01-23       Impact factor: 4.215

2.  Extracting SAR Information from a Large Collection of Anti-Malarial Screening Hits by NSG-SPT Analysis.

Authors:  Mathias Wawer; Jürgen Bajorath
Journal:  ACS Med Chem Lett       Date:  2011-01-05       Impact factor: 4.345

3.  Discovery of novel polyamine analogs with anti-protozoal activity by computer guided drug repositioning.

Authors:  Lucas N Alberca; María L Sbaraglini; Darío Balcazar; Laura Fraccaroli; Carolina Carrillo; Andrea Medeiros; Diego Benitez; Marcelo Comini; Alan Talevi
Journal:  J Comput Aided Mol Des       Date:  2016-02-18       Impact factor: 3.686

4.  The CARLSBAD database: a confederated database of chemical bioactivities.

Authors:  Stephen L Mathias; Jarrett Hines-Kay; Jeremy J Yang; Gergely Zahoransky-Kohalmi; Cristian G Bologa; Oleg Ursu; Tudor I Oprea
Journal:  Database (Oxford)       Date:  2013-06-21       Impact factor: 3.451