L Xue, J W Godden, J Bajorath.
Abstract
In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic "one-against-all" similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average approximately 40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class.
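The "one-against-all" search with a similarity cutoff can be illustrated with a minimal sketch. The abstract does not name the similarity metric or the exact scoring rules, so the Tanimoto coefficient, the function names `tanimoto` and `one_against_all`, the per-query averaging, and the 6-bit toy fingerprints below are all assumptions made for illustration, not the authors' method or data.

```python
from typing import Dict, Tuple


def tanimoto(a: int, b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either one.

    Assumption: Tanimoto is used here as a common choice for bit strings;
    the abstract does not say which metric the authors applied.
    """
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def one_against_all(
    fps: Dict[str, int], classes: Dict[str, str], cutoff: float
) -> Tuple[float, float]:
    """Use each compound once as the query against all other compounds.

    Returns (mean fraction of same-class compounds found,
             mean fraction of other-class compounds wrongly matched).
    """
    recalls, fp_rates = [], []
    for query, q_fp in fps.items():
        others = [name for name in fps if name != query]
        same = [n for n in others if classes[n] == classes[query]]
        diff = [n for n in others if classes[n] != classes[query]]
        if not same or not diff:
            continue  # this query cannot be scored on both measures
        hits = {n for n in others if tanimoto(q_fp, fps[n]) >= cutoff}
        recalls.append(sum(n in hits for n in same) / len(same))
        fp_rates.append(sum(n in hits for n in diff) / len(diff))
    if not recalls:
        return 0.0, 0.0
    return sum(recalls) / len(recalls), sum(fp_rates) / len(fp_rates)


# Hypothetical toy data: four compounds, two activity classes, 6-bit strings.
fps = {"a1": 0b111100, "a2": 0b111000, "b1": 0b000111, "b2": 0b000011}
classes = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(one_against_all(fps, classes, cutoff=0.7))  # (0.5, 0.0)
```

In this toy set, raising the cutoff to 0.7 removes every cross-class match but also loses the B-class pair (similarity 2/3), which mirrors the trade-off the abstract reports between eliminating false positives and recognizing compounds of the same activity class.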
Year: 1999 PMID: 10529986 DOI: 10.1021/ci990308d
Source DB: PubMed Journal: J Chem Inf Comput Sci ISSN: 0095-2338