NIPALSTREE: a new hierarchical clustering approach for large compound libraries and its application to virtual screening.

Alexander Böcker, Gisbert Schneider, Andreas Teckentrup.

Abstract

A hierarchical clustering algorithm, NIPALSTREE, was developed that is able to analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level the algorithm projects a data set onto one dimension via principal component analysis. The data set is sorted according to this one dimension and split at the median position. To avoid distortion of clusters at the median position, the algorithm identifies a potentially better-suited split point to the left or right of the median. The procedure is recursively applied to the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin-converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. Results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
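The splitting procedure described above can be sketched in a few dozen lines of pure Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. Assumed details not given in the abstract: the one-dimensional projection is the first principal component computed with the NIPALS iteration on mean-centered descriptors; the "better-suited split point" is taken to be the widest gap between neighboring projected scores within a window around the median (the `window_frac` parameter is hypothetical); and the stopping rule is read as "keep splitting a subset while its maximal pairwise Euclidean distance exceeds the threshold". All function names are hypothetical. The `enrichment_factor` helper uses the common definition (hit rate of a selection divided by the hit rate of the whole library) to illustrate how intersecting two result lists can raise the enrichment factor.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

# A dendrogram node is either a leaf (list of row indices) or a pair of subtrees.
Tree = Union[List[int], Tuple["Tree", "Tree"]]


def first_pc_scores(X: Sequence[Sequence[float]], max_iter: int = 200, tol: float = 1e-12) -> List[float]:
    """Scores of the rows of X on the first principal component, via NIPALS."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # mean-center each column
    # Start the iteration from the column with the largest variance.
    j0 = max(range(m), key=lambda j: sum(r[j] ** 2 for r in Xc))
    t = [r[j0] for r in Xc]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        if tt == 0.0:
            return [0.0] * n  # all rows identical: no direction of variance
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t_new = [sum(Xc[i][j] * p[j] for j in range(m)) for i in range(n)]  # scores
        if sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol * tt:
            return t_new
        t = t_new
    return t


def split_index(sorted_scores: List[float], window_frac: float = 0.1) -> int:
    """Hypothetical refinement: split at the widest gap in a window around the median.

    Returns k so that the left subset is sorted_scores[:k] and the right is [k:].
    Ties are broken in favor of the position closest to the median.
    """
    n = len(sorted_scores)
    mid = n // 2
    w = max(1, int(window_frac * n))
    lo, hi = max(1, mid - w), min(n - 1, mid + w)
    return max(range(lo, hi + 1),
               key=lambda k: (sorted_scores[k] - sorted_scores[k - 1], -abs(k - mid)))


def diameter(points: Sequence[Sequence[float]]) -> float:
    """Maximal pairwise Euclidean distance (O(n^2), acceptable for a sketch)."""
    return max((math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]),
               default=0.0)


def nipals_tree(X: Sequence[Sequence[float]], threshold: float,
                indices: Optional[List[int]] = None, window_frac: float = 0.1) -> Tree:
    """Recursively bisect X along its first principal component."""
    if indices is None:
        indices = list(range(len(X)))
    pts = [X[i] for i in indices]
    # Assumed stopping rule: a subset becomes a leaf once its diameter <= threshold.
    if len(indices) < 2 or diameter(pts) <= threshold:
        return indices
    scores = first_pc_scores(pts)
    order = sorted(range(len(indices)), key=lambda i: scores[i])  # sort along the 1-D projection
    k = split_index([scores[i] for i in order], window_frac)
    left = [indices[i] for i in order[:k]]
    right = [indices[i] for i in order[k:]]
    return (nipals_tree(X, threshold, left, window_frac),
            nipals_tree(X, threshold, right, window_frac))


def leaves(tree: Tree) -> List[List[int]]:
    """Flatten the dendrogram into its final clusters."""
    if isinstance(tree, list):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def enrichment_factor(selected, actives, n_library: int) -> float:
    """Hit rate in the selection divided by the hit rate in the whole library."""
    selected, actives = set(selected), set(actives)
    if not selected or not actives:
        return 0.0
    return (len(selected & actives) / len(selected)) / (len(actives) / n_library)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(sorted(sorted(c) for c in leaves(nipals_tree(X, threshold=1.0))))  # [[0, 1, 2], [3, 4, 5]]

    actives = range(10)  # toy library of 100 compounds with 10 actives
    list_a, list_b = set(range(20)), set(range(5)) | set(range(50, 55))
    print(enrichment_factor(list_a, actives, 100))           # 5.0
    print(enrichment_factor(list_a & list_b, actives, 100))  # 10.0
```

On the toy data the two tight groups end up in separate leaves, and with `threshold=0.0` the tree is refined down to singletons. The quadratic diameter computation is only practical for small subsets; a sketch aimed at large libraries would need a cheaper bound on cluster size.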

Year:  2006        PMID: 17125166     DOI: 10.1021/ci050541d

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956
