Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches.

Matthias Rupp, Petra Schneider, Gisbert Schneider.

Abstract

Measuring the (dis)similarity of molecules is important for many cheminformatics applications like compound ranking, clustering, and property prediction. In this work, we focus on real-valued vector representations of molecules (as opposed to the binary spaces of fingerprints). We demonstrate the influence that the choice of (dis)similarity measure can have on results, and provide recommendations for such choices. We review the mathematical concepts used to measure (dis)similarity in vector spaces, namely norms, metrics, inner products, and similarity coefficients, as well as the relationships between them, employing (dis)similarity measures commonly used in cheminformatics as examples. We present several phenomena (the empty space phenomenon, sphere-volume-related phenomena, distance concentration) in high-dimensional descriptor spaces that are not encountered in two and three dimensions. These phenomena are theoretically characterized and illustrated on both artificial and real (bioactivity) data. © 2009 Wiley Periodicals, Inc.
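To make the relationships between inner products, norms, metrics, and similarity coefficients concrete, here is a minimal Python sketch. It is not the authors' code, and all function names are hypothetical. It assumes plain real-valued descriptor vectors and the standard Euclidean inner product; the Minkowski distances, cosine similarity, and continuous Tanimoto coefficient are common examples chosen for illustration, and the paper's own selection of measures may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _check_same_length(x: Vector, y: Vector) -> None:
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")


def inner(x: Vector, y: Vector) -> float:
    """Standard (Euclidean) inner product <x, y> = sum_i x_i * y_i."""
    _check_same_length(x, y)
    return sum(a * b for a, b in zip(x, y))


def norm(x: Vector) -> float:
    """Euclidean norm induced by the inner product: ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))


def minkowski(x: Vector, y: Vector, p: float = 2.0) -> float:
    """Metric induced by the p-norm: p = 2 is Euclidean, p = 1 is Manhattan.
    Of these, only p = 2 is induced by an inner product."""
    _check_same_length(x, y)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_from_inner(x: Vector, y: Vector) -> float:
    """Euclidean distance written purely in terms of inner products:
    ||x - y||^2 = <x, x> + <y, y> - 2 <x, y>."""
    sq = inner(x, x) + inner(y, y) - 2.0 * inner(x, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def cosine(x: Vector, y: Vector) -> float:
    """Cosine similarity <x, y> / (||x|| ||y||); undefined for zero vectors."""
    return inner(x, y) / (norm(x) * norm(y))


def tanimoto(x: Vector, y: Vector) -> float:
    """Continuous Tanimoto coefficient <x, y> / (<x, x> + <y, y> - <x, y>)."""
    xy = inner(x, y)
    return xy / (inner(x, x) + inner(y, y) - xy)


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [2.0, 0.0, 1.0]
    print(euclidean_from_inner(x, y))  # 3.0
    print(minkowski(x, y, 1.0))        # 5.0
    print(cosine(x, y), tanimoto(x, y))
```

The Manhattan (p = 1) distance is a valid metric but does not satisfy the parallelogram law, so unlike the Euclidean distance it cannot be rewritten via inner products; similarity coefficients built from inner products, such as cosine and Tanimoto, are therefore tied to the Euclidean geometry of the descriptor space.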
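The high-dimensional effects named in the abstract can be reproduced numerically. The following sketch, again with hypothetical names and not taken from the paper, assumes points drawn uniformly at random from the unit hypercube. It computes the volume of the d-dimensional unit ball, the fraction of the enclosing cube [-1, 1]^d that this ball occupies, the fraction of a ball's volume lying in a thin outer shell, and the relative contrast (D_max - D_min) / D_min of distances from a query point, used here as one simple indicator of distance concentration. The paper may quantify these effects with different statistics.

```python
import math
import random


def log_unit_ball_volume(d: int) -> float:
    """log of V_d = pi^(d/2) / Gamma(d/2 + 1), the volume of the unit ball in R^d."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def unit_ball_volume(d: int) -> float:
    return math.exp(log_unit_ball_volume(d))


def ball_to_cube_ratio(d: int) -> float:
    """Fraction of the cube [-1, 1]^d (volume 2^d) occupied by the inscribed unit ball."""
    return math.exp(log_unit_ball_volume(d) - d * math.log(2))


def shell_fraction(d: int, eps: float) -> float:
    """Fraction of a ball's volume within relative distance eps of its surface:
    1 - (1 - eps)^d, because volume scales with radius^d."""
    return 1.0 - (1.0 - eps) ** d


def relative_contrast(d: int, n_points: int = 100, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^d."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = [
        math.dist(query, [rng.random() for _ in range(d)])
        for _ in range(n_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    for d in (2, 3, 10, 100, 1000):
        print(
            f"d={d:5d}  ball/cube={ball_to_cube_ratio(d):.3e}  "
            f"5% shell={shell_fraction(d, 0.05):.3f}  "
            f"contrast={relative_contrast(d):.3f}"
        )
```

Volumes are computed in log space with `math.lgamma` because `math.gamma` overflows for large arguments; for d = 1000 the ball-to-cube ratio underflows to 0.0, which reflects the phenomenon itself rather than a numerical bug. The unit-ball volume peaks at d = 5 and then shrinks toward zero, while the relative contrast falls as d grows, meaning nearest and farthest neighbors become harder to tell apart.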

Year:  2009        PMID: 19266481     DOI: 10.1002/jcc.21218

Source DB:  PubMed          Journal:  J Comput Chem        ISSN: 0192-8651            Impact factor:   3.376


  7 in total

1.  Virtual screening: an endless staircase? (Review)

Authors:  Gisbert Schneider
Journal:  Nat Rev Drug Discov       Date:  2010-04       Impact factor: 84.694

2.  Identifying the macromolecular targets of de novo-designed chemical entities through self-organizing map consensus.

Authors:  Daniel Reker; Tiago Rodrigues; Petra Schneider; Gisbert Schneider
Journal:  Proc Natl Acad Sci U S A       Date:  2014-03-03       Impact factor: 11.205

3.  Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors.

Authors:  Quan Wang; Kerstin Birod; Carlo Angioni; Sabine Grösch; Tim Geppert; Petra Schneider; Matthias Rupp; Gisbert Schneider
Journal:  PLoS One       Date:  2011-07-27       Impact factor: 3.240

4.  Machine learning estimates of natural product conformational energies.

Authors:  Matthias Rupp; Matthias R Bauer; Rainer Wilcken; Andreas Lange; Michael Reutlinger; Frank M Boeckler; Gisbert Schneider
Journal:  PLoS Comput Biol       Date:  2014-01-16       Impact factor: 4.475

5.  MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites.

Authors:  Steve O'Hagan; Douglas B Kell
Journal:  Front Pharmacol       Date:  2016-08-22       Impact factor: 5.810

6.  Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model.

Authors:  Zhenqiu Liu
Journal:  Genes (Basel)       Date:  2021-02-22       Impact factor: 4.096

7.  SimCAL: a flexible tool to compute biochemical reaction similarity.

Authors:  Tadi Venkata Sivakumar; Anirban Bhaduri; Rajasekhara Reddy Duvvuru Muni; Jin Hwan Park; Tae Yong Kim
Journal:  BMC Bioinformatics       Date:  2018-07-03       Impact factor: 3.169
