
SketchSort: Fast All Pairs Similarity Search for Large Databases of Molecular Fingerprints.

Yasuo Tabei, Koji Tsuda.

Abstract

Similarity networks of ligands are often reported to be useful in predicting chemical activities and target proteins. However, the naive method of computing all pairwise similarities of chemical fingerprints takes quadratic time, which is prohibitive for large-scale databases with millions of ligands. We propose a fast all-pairs similarity search method, called SketchSort, that maps chemical fingerprints to symbol strings with random projections and finds similar strings by multiple masked sorting. Because the projection is random, SketchSort misses a certain fraction of neighbors (i.e., false negatives). Nevertheless, the expected fraction of false negatives is theoretically derived and can be kept very small. Experiments show that SketchSort is much faster than other similarity search methods and enables us to obtain a PubChem-scale similarity network quickly.
Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
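
The two steps named in the abstract (random projection to symbol strings, then multiple masked sorting) can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes sign random projections (one common choice, which approximates cosine similarity) to produce bit strings, and the function names `sketch` and `masked_sort_pairs` and the parameters `n_bits`, `n_blocks`, and `max_dist` are hypothetical. The masking step relies on a pigeonhole argument: if two bit strings differ in at most `max_dist` positions, then among `n_blocks` blocks at least `n_blocks - max_dist` blocks match exactly, so sorting on every combination of that many blocks places each such pair next to each other at least once.

```python
from itertools import combinations

import numpy as np


def sketch(fps, n_bits, seed=0):
    """Map each fingerprint (row of fps) to an n_bits bit string.

    Assumption: sign random projection, one bit per random hyperplane.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((fps.shape[1], n_bits))
    return (fps @ planes >= 0).astype(np.uint8)


def masked_sort_pairs(sketches, n_blocks, max_dist):
    """Return all index pairs (i, j), i < j, whose sketches differ in at
    most max_dist bits. Requires n_blocks > max_dist."""
    n, n_bits = sketches.shape
    blocks = np.array_split(np.arange(n_bits), n_blocks)
    found = set()
    # Keep n_blocks - max_dist blocks and mask out the rest, for every
    # possible choice of kept blocks.
    for kept in combinations(range(n_blocks), n_blocks - max_dist):
        cols = np.concatenate([blocks[b] for b in kept])
        keys = [sketches[i, cols].tobytes() for i in range(n)]
        # Sorting on the masked key makes equal keys adjacent.
        order = sorted(range(n), key=keys.__getitem__)
        start = 0
        for pos in range(1, n + 1):
            if pos == n or keys[order[pos]] != keys[order[start]]:
                # order[start:pos] is one run of identical masked keys;
                # verify each candidate pair on the full sketch.
                for i, j in combinations(order[start:pos], 2):
                    if np.count_nonzero(sketches[i] != sketches[j]) <= max_dist:
                        found.add((min(i, j), max(i, j)))
                start = pos
    return found
```

In this sketch the search over bit strings is exact; the false negatives mentioned in the abstract would come from the projection step, where two similar fingerprints can receive bit strings that differ in more than `max_dist` positions. A full pipeline would also re-check surviving pairs against the original fingerprints, which is omitted here.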

Keywords:  All pairs similarity search; Multiple sorting; SketchSort

Year:  2011        PMID: 27467412     DOI: 10.1002/minf.201100050

Source DB:  PubMed          Journal:  Mol Inform        ISSN: 1868-1743            Impact factor:   3.353


  2 in total

1.  Uniqueness: skews bit occurrence frequencies in randomly generated fingerprint libraries.

Authors:  Nelson G Chen
Journal:  Mol Divers       Date:  2016-05-26       Impact factor: 2.943

2.  The chemfp project.

Authors:  Andrew Dalke
Journal:  J Cheminform       Date:  2019-12-05       Impact factor: 5.514

