
Uniqueness: skews bit occurrence frequencies in randomly generated fingerprint libraries.

Nelson G Chen

Abstract

Requiring that a randomly generated chemical fingerprint library contain only unique fingerprints, so that no two fingerprints are identical, causes a systematic skew in bit occurrence frequencies, that is, the proportions at which specified bits are set. The observed frequencies (O) at which each bit is set within the resulting libraries differ systematically from the frequencies at which bits are set at fingerprint generation (E). Observed frequencies skew toward 0.5, and the effect becomes more pronounced as the library size approaches the size of the compound space, which is the total number of unique possible fingerprints given the number of bit positions each fingerprint contains. The effect is quantified for varying library sizes, expressed as a fraction of the overall compound space, and for changes in the specified frequency E. The cause and implications of this systematic skew are subsequently discussed. When generating random libraries of chemical fingerprints, a uniqueness requirement should either be avoided or taken into account.
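The skew described in the abstract can be reproduced with a small simulation. The sketch below is an illustration only, not the author's code: it assumes each bit is set independently with probability E, enforces uniqueness by discarding duplicate fingerprints, and takes the compound space to be 2**n_bits. The function names `unique_library` and `observed_frequencies`, and the parameter values in the usage loop, are hypothetical choices made for this example.

```python
import random


def unique_library(n_bits, size, e, seed=0):
    """Build a library of `size` distinct fingerprints.

    Assumption: every bit is set independently with probability `e`,
    and duplicates are rejected until `size` unique fingerprints exist.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("e must lie strictly between 0 and 1")
    if size > 2 ** n_bits:
        raise ValueError("size exceeds the compound space 2**n_bits")
    rng = random.Random(seed)
    library = set()
    while len(library) < size:
        fp = tuple(int(rng.random() < e) for _ in range(n_bits))
        library.add(fp)  # adding to a set drops duplicates
    return library


def observed_frequencies(library, n_bits):
    """Return O for each bit: the fraction of fingerprints with that bit set."""
    return [sum(fp[i] for fp in library) / len(library) for i in range(n_bits)]


if __name__ == "__main__":
    n_bits, e = 8, 0.2  # illustrative values, not taken from the paper
    space = 2 ** n_bits
    for fraction in (0.1, 0.5, 0.9):
        lib = unique_library(n_bits, int(fraction * space), e, seed=1)
        mean_o = sum(observed_frequencies(lib, n_bits)) / n_bits
        print(f"fraction={fraction:.1f}  E={e:.2f}  mean O={mean_o:.3f}")
```

In the limiting case where the library covers the whole compound space, every possible fingerprint appears exactly once, so each bit is set in exactly half of the library and O equals 0.5 regardless of E. Smaller fractions should give values of O between E and 0.5 under these assumptions.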

Keywords:  Algorithm; Fingerprints; Molecular modeling; Simulation

Year:  2016        PMID: 27230477     DOI: 10.1007/s11030-016-9674-y

Source DB:  PubMed          Journal:  Mol Divers        ISSN: 1381-1991            Impact factor:   2.943


  7 in total

1.  Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients

Authors: 
Journal:  J Chem Inf Comput Sci       Date:  2000-01

2.  Similarity searching using 2D structural fingerprints. (Review)

Authors:  Peter Willett
Journal:  Methods Mol Biol       Date:  2011

3.  Novel Algorithms for the Identification of Biologically Informative Chemical Diversity Metrics.

Authors:  Bhargav Theertham; Jenna L Wang; Jianwen Fang; Gerald H Lushington
Journal:  Curr Comput Aided Drug Des       Date:  2008-03-01       Impact factor: 1.606

4.  Structural Key Bit Occurrence Frequencies and Dependencies in PubChem and Their Effect on Similarity Searches.

Authors:  Nelson G Chen; Val Golovlev
Journal:  Mol Inform       Date:  2013-04-11       Impact factor: 3.353

5.  SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints.

Authors:  Yasuo Tabei; Koji Tsuda
Journal:  Mol Inform       Date:  2011-07-12       Impact factor: 3.353

6.  Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?

Authors:  Dávid Bajusz; Anita Rácz; Károly Héberger
Journal:  J Cheminform       Date:  2015-05-20       Impact factor: 5.514

7.  BLASTing small molecules--statistics and extreme statistics of chemical similarity scores.

Authors:  Pierre Baldi; Ryan W Benz
Journal:  Bioinformatics       Date:  2008-07-01       Impact factor: 6.937
