| Literature DB >> 19601605 |
Pierre Baldi1, Daniel S Hirschberg.
Abstract
Bounds on distances or similarity measures can be useful to help search large databases efficiently. Here we consider the case of large databases of small molecules represented by molecular fingerprint vectors with the Tanimoto similarity measure. We derive a new intersection inequality which provides a bound on the Tanimoto similarity between two fingerprint vectors and show that this bound is considerably sharper than the bound associated with the triangle inequality of the Tanimoto distance. The inequality can be applied to other intersection-based similarity measures. We introduce a new integer representation which relies on partitioning the fingerprint components, for instance by taking components modulo some integer M and reporting the total number of 1-bits falling in each partition. We show how the intersection inequality can be generalized immediately to these integer representations and used to search large databases of binary fingerprint vectors efficiently.Entities:
Mesh:
Year: 2009 PMID: 19601605 PMCID: PMC2758932 DOI: 10.1021/ci900133j
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956