
Profile scaling increases the similarity search performance of molecular fingerprints containing numerical descriptors and structural keys.

Ling Xue, Jeffrey W Godden, Florence L Stahura, Jürgen Bajorath.

Abstract

The concept of compound class-specific profiling and scaling of molecular fingerprints for similarity searching is discussed and applied to newly designed fingerprint representations. The approach is based on the analysis of characteristic patterns of bits in keyed fingerprints that are set on in compounds having equivalent biological activity. Once a fingerprint profile is generated for a particular activity class, scaling factors that are weighted according to observed bit frequencies are applied to signature bit positions when searching for similar compounds. In systematic similarity search calculations over 23 diverse activity classes, profile scaling consistently increased the performance of fingerprints containing property descriptors and/or structural keys. A significant improvement of approximately 15% was observed for a new fingerprint consisting of binary encoded molecular property descriptors and structural keys. Under scaling conditions, this fingerprint, termed MP-MFP, correctly recognized on average close to 60% of all active test compounds, with only a few false positives. MP-MFP outperformed MACCS keys and other reference fingerprints. In general, optimum performance in scaling calculations was achieved at higher threshold values of the Tanimoto coefficient than in nonscaled calculations, thereby increasing the search selectivity. Putting relatively high weight on signature bit positions that were always, or almost always, set on was found to be the most effective scaling procedure. Analysis of class-specific search performance revealed that profile scaling of MP-MFP improved the similarity search results for each of the 23 activity classes.
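The profile-scaling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scaling formula, so the function names, the `min_freq` cutoff, the `boost` parameter, and the factor `1 + boost * frequency` are all assumptions chosen for illustration. The sketch builds a bit-frequency profile from the fingerprints of known actives, assigns a higher weight to bits that are (almost) always set on, and compares an unscaled Tanimoto coefficient with a weighted one.

```python
from typing import List, Sequence

Fingerprint = Sequence[int]  # binary fingerprint as a sequence of 0/1 values


def bit_frequencies(actives: Sequence[Fingerprint]) -> List[float]:
    """Fraction of active compounds in which each bit position is set on."""
    n = len(actives)
    return [sum(column) / n for column in zip(*actives)]


def scaling_factors(freqs: Sequence[float],
                    min_freq: float = 0.9,
                    boost: float = 2.0) -> List[float]:
    """Hypothetical scheme: bits set on in at least `min_freq` of the actives
    get the weight 1 + boost * frequency; all other bits keep weight 1."""
    return [1.0 + boost * f if f >= min_freq else 1.0 for f in freqs]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Standard (unscaled) Tanimoto coefficient for binary fingerprints."""
    common = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return common / union if union else 0.0


def scaled_tanimoto(a: Fingerprint, b: Fingerprint,
                    weights: Sequence[float]) -> float:
    """Tanimoto coefficient in which each bit counts with its scaling factor."""
    common = sum(w for x, y, w in zip(a, b, weights) if x and y)
    union = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return common / union if union else 0.0


# Toy 4-bit fingerprints for one (made-up) activity class.
actives = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
weights = scaling_factors(bit_frequencies(actives))  # [3.0, 1.0, 1.0, 3.0]

query = [1, 1, 0, 1]
candidate = [1, 0, 1, 1]
print(tanimoto(query, candidate))                  # 0.5
print(scaled_tanimoto(query, candidate, weights))  # 0.75
```

In this toy case the candidate shares both consensus bits with the query, so its score rises from 0.5 to 0.75 under scaling. A compound lacking those bits would gain less or lose score, which is consistent with the abstract's observation that scaled searches performed best at higher Tanimoto thresholds.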

Year: 2003 | PMID: 12870914 | DOI: 10.1021/ci030287u

Source DB: PubMed | Journal: J Chem Inf Comput Sci | ISSN: 0095-2338


