| Literature DB >> 18081268 |
Abstract
Differences in molecular complexity and size are known to bias the evaluation of fingerprint similarity. For example, complex molecules tend to produce fingerprints with higher bit density than simple ones, which often leads to artificially high similarity values in search calculations. We introduce here a variant of the Tversky coefficient that makes it possible to modulate or eliminate molecular complexity effects when evaluating fingerprint similarity. This has enabled us to study in detail the role of molecular complexity in similarity searching and the relationship between reference and active database compounds. Balancing complexity effects leads to constant distributions of similarity values for reference and database molecules, independent of how compound contributions are weighted. When searching for active compounds with varying complexity, hit rates can be optimized by modulating complexity effects, rather than eliminating them, and adjusting relative compound weights. For reference molecules and active database compounds having different complexity, preferred parameter settings are identified.Mesh:
Year: 2007 PMID: 18081268 DOI: 10.1021/ci700314x
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956