
An efficient projection protocol for chemical databases: singular value decomposition combined with truncated-newton minimization.

D Xie, A Tropsha, T Schlick.

Abstract

A rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) is presented as a first step in database analysis and design applications. The projection mapping of the compound database (described as vectors in the high-dimensional space of chemical descriptors) is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on four chemical datasets with real-valued descriptors (ranging from 58 to 27 255 compounds) show that the SVD/TNPACK projection duo achieves a reasonable accuracy in 2D, varying from 30% to about 100% of pairwise distance segments that lie within 10% of the original distances. The lowest percentages, corresponding to scaled datasets, can be made close to 100% with projections onto a 10-dimensional space. We also show that the SVD/TNPACK duo is efficient for minimizing the distance error objective function (especially for scaled datasets), and that TNPACK is much more efficient than a current popular approach of steepest descent minimization in this application context. Applications of our projection technique to similarity and diversity sampling in drug design can be envisioned.
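As a rough illustration of the first half of the protocol described in the abstract, the sketch below projects a descriptor matrix onto a low-dimensional space with an SVD and then reports the fraction of pairwise distances that fall within 10% of the original distances. It is not the authors' code: the function names (`svd_project`, `pairwise_distances`, `fraction_within`), the mean-centering step, and the use of NumPy are assumptions made for this example, and the TNPACK refinement step is not shown.

```python
import numpy as np

def svd_project(X, k=2):
    """Project the rows of X onto the top-k singular directions.

    Assumption: descriptors are mean-centered first; centering does not
    change pairwise distances between compounds.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]  # n x k coordinates in the projected space

def pairwise_distances(A):
    """Dense matrix of Euclidean distances between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fraction_within(X, Y, tol=0.10):
    """Fraction of pairs (i < j) whose projected distance is within
    tol * (original distance) of the original distance."""
    iu = np.triu_indices(len(X), k=1)
    d_orig = pairwise_distances(X)[iu]
    d_proj = pairwise_distances(Y)[iu]
    return (np.abs(d_proj - d_orig) <= tol * d_orig).mean()

# Hypothetical data: 30 "compounds" described by 8 real-valued descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
Y2 = svd_project(X, k=2)
print(f"2D projection: {fraction_within(X, Y2):.1%} of distances within 10%")
```

In the protocol the abstract describes, coordinates such as `Y2` would serve as the starting point for minimizing a distance error objective function; the dense distance matrix used here is only practical for small datasets.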

Year:  2000        PMID: 10661564     DOI: 10.1021/ci990333j

Source DB:  PubMed          Journal:  J Chem Inf Comput Sci        ISSN: 0095-2338


  5 in total

1.  Rational selection of training and test sets for the development of validated QSAR models.

Authors:  Alexander Golbraikh; Min Shen; Zhiyan Xiao; Yun-De Xiao; Kuo-Hsiung Lee; Alexander Tropsha
Journal:  J Comput Aided Mol Des       Date:  2003 Feb-Apr       Impact factor: 3.686

2.  Descriptor-based protein remote homology identification.

Authors:  Ziding Zhang; Sunil Kochhar; Martin G Grigorov
Journal:  Protein Sci       Date:  2005-01-04       Impact factor: 6.725

3.  A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

Authors:  Namhee Kim; Hin Hark Gan; Tamar Schlick
Journal:  RNA       Date:  2007-02-23       Impact factor: 4.942

4.  An incomplete Hessian Newton minimization method and its application in a chemical database problem.

Authors:  Dexuan Xie; Qin Ni
Journal:  Comput Optim Appl       Date:  2009-12       Impact factor: 2.167

5.  Amino acid patterns around disulfide bonds.

Authors:  José R F Marques; Rute R da Fonseca; Brett Drury; André Melo
Journal:  Int J Mol Sci       Date:  2010-11-18       Impact factor: 5.923

