An intersection inequality sharper than the Tanimoto triangle inequality for efficiently searching large databases.

Pierre Baldi, Daniel S Hirschberg.

Abstract

Bounds on distances or similarity measures can be useful to help search large databases efficiently. Here we consider the case of large databases of small molecules represented by molecular fingerprint vectors with the Tanimoto similarity measure. We derive a new intersection inequality which provides a bound on the Tanimoto similarity between two fingerprint vectors and show that this bound is considerably sharper than the bound associated with the triangle inequality of the Tanimoto distance. The inequality can be applied to other intersection-based similarity measures. We introduce a new integer representation which relies on partitioning the fingerprint components, for instance by taking components modulo some integer M and reporting the total number of 1-bits falling in each partition. We show how the intersection inequality can be generalized immediately to these integer representations and used to search large databases of binary fingerprint vectors efficiently.
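The modulo-M integer representation mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the inequality itself, so the following is not the authors' formulation. It assumes only the elementary fact that, within each partition, the number of shared 1-bits cannot exceed the smaller of the two partition counts. The function names (`partition_counts`, `tanimoto`, `tanimoto_upper_bound`) and the representation of a fingerprint as a set of 1-bit indices are hypothetical choices made for illustration.

```python
from typing import List, Set


def partition_counts(on_bits: Set[int], m: int) -> List[int]:
    """Count the 1-bits whose index falls in each residue class modulo m."""
    counts = [0] * m
    for i in on_bits:
        counts[i % m] += 1
    return counts


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Exact Tanimoto similarity |A ∩ B| / |A ∪ B| of two fingerprints."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 1.0


def tanimoto_upper_bound(ca: List[int], cb: List[int]) -> float:
    """Upper bound on the Tanimoto similarity from partition counts alone.

    Assumption (not taken from the paper): the intersection is capped by
    the sum over partitions of min(ca[i], cb[i]); since the Tanimoto
    similarity increases with the intersection size, plugging in the cap
    yields an upper bound.
    """
    cap = sum(min(x, y) for x, y in zip(ca, cb))
    union_floor = sum(ca) + sum(cb) - cap
    return cap / union_floor if union_floor else 1.0


# Toy fingerprints given as sets of 1-bit indices (illustrative data only).
query = {0, 3, 5, 8, 13}
candidate = {3, 4, 8, 21}

exact = tanimoto(query, candidate)
for m in (1, 4, 8):
    bound = tanimoto_upper_bound(
        partition_counts(query, m), partition_counts(candidate, m)
    )
    # A candidate can be skipped whenever bound < search threshold,
    # without ever computing the exact similarity.
    print(f"M={m}: bound={bound:.3f}, exact={exact:.3f}")
```

With M = 1 the bound reduces to the ratio of the smaller to the larger bit count. Larger M can only tighten it when M is a multiple of the smaller value, because the partitions are then refined; the price is more integers stored per fingerprint.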

Year: 2009   PMID: 19601605   PMCID: PMC2758932   DOI: 10.1021/ci900133j

Source DB: PubMed   Journal: J Chem Inf Model   ISSN: 1549-9596   Impact factor: 4.956


  5 in total

1.  Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings.

Authors:  J D Holliday; C-Y Hu; P Willett
Journal:  Comb Chem High Throughput Screen       Date:  2002-03       Impact factor: 1.339

2.  ChemDB: a public database of small molecules and related chemoinformatics resources.

Authors:  Jonathan Chen; S Joshua Swamidass; Yimeng Dou; Jocelyne Bruand; Pierre Baldi
Journal:  Bioinformatics       Date:  2005-09-20       Impact factor: 6.937

3.  Bounds and algorithms for fast exact searches of chemical fingerprints in linear and sublinear time.

Authors:  S Joshua Swamidass; Pierre Baldi
Journal:  J Chem Inf Model       Date:  2007-02-28       Impact factor: 4.956

4.  ChemDB update--full-text search and virtual chemical space.

Authors:  Jonathan H Chen; Erik Linstead; S Joshua Swamidass; Dennis Wang; Pierre Baldi
Journal:  Bioinformatics       Date:  2007-06-28       Impact factor: 6.937

5.  Speeding up chemical database searches using a proximity filter based on the logical exclusive or.

Authors:  Pierre Baldi; Daniel S Hirschberg; Ramzi J Nasr
Journal:  J Chem Inf Model       Date:  2008-07-02       Impact factor: 4.956

  3 in total

1.  Hashing algorithms and data structures for rapid searches of fingerprint vectors.

Authors:  Ramzi Nasr; Daniel S Hirschberg; Pierre Baldi
Journal:  J Chem Inf Model       Date:  2010-08-23       Impact factor: 4.956

2.  Speeding up chemical searches using the inverted index: the convergence of chemoinformatics and text search methods.

Authors:  Ramzi Nasr; Rares Vernica; Chen Li; Pierre Baldi
Journal:  J Chem Inf Model       Date:  2012-04-10       Impact factor: 4.956

3.  The chemfp project.

Authors:  Andrew Dalke
Journal:  J Cheminform       Date:  2019-12-05       Impact factor: 5.514

