
Euclidean chemical spaces from molecular fingerprints: Hamming distance and Hempel's ravens.

Eric Martin, Eddie Cao.

Abstract

Molecules are often characterized by sparse binary fingerprints, where 1s represent the presence of substructures and 0s represent their absence. Fingerprints are especially useful for similarity calculations, such as database searching or clustering, where similarity is generally measured as the Tanimoto coefficient. In other cases, such as visualization, design of experiments, or latent variable regression, a low-dimensional Euclidean "chemical space" is more useful, where proximity between points reflects chemical similarity. It is tempting to apply principal components analysis (PCA) directly to these fingerprints to obtain a low-dimensional continuous chemical space. However, Gower has shown that distances from PCA on bit vectors are proportional to the square root of the Hamming distance. Unlike Tanimoto similarity, Hamming similarity (HS) gives equal weight to shared 0s and shared 1s; that is, HS gives as much weight to substructures that neither molecule contains as to substructures that both molecules contain. Illustrative examples show that proximity in the corresponding chemical space reflects mainly similar size and complexity rather than shared chemical substructures. These spaces are ill-suited for visualizing and optimizing coverage of chemical space, or as latent variables for regression. A more suitable alternative is shown to be multidimensional scaling (MDS) on the Tanimoto distance matrix, which produces a space where proximity does reflect structural similarity.
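
The contrast drawn in the abstract can be illustrated with a minimal sketch. It assumes NumPy and three hypothetical 16-bit fingerprints invented for this example (two small molecules with no bits in common, and one larger molecule that contains every bit of the first). All function names are illustrative and are not taken from the paper. The abstract does not say which MDS variant the authors used, so classical (Torgerson) MDS is assumed here as one possible choice.

```python
import numpy as np

def fingerprint(on_bits, n_bits=16):
    """Hypothetical toy fingerprint: True at each listed bit position."""
    fp = np.zeros(n_bits, dtype=bool)
    fp[list(on_bits)] = True
    return fp

def hamming_distance(a, b):
    """Number of bit positions where the two fingerprints differ."""
    return int(np.sum(a != b))

def tanimoto_similarity(a, b):
    """Shared 1s divided by positions set in either fingerprint; shared 0s are ignored."""
    either = int(np.sum(a | b))
    return 1.0 if either == 0 else int(np.sum(a & b)) / either

def pairwise_euclidean(Y):
    """Euclidean distance between every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pca_scores(X):
    """Unscaled PCA scores of the mean-centered rows, keeping all components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U * s

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix D in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:k]            # k largest eigenvalues
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

small_a = fingerprint([0, 1])
small_b = fingerprint([2, 3])                     # same size as A, nothing shared
large_c = fingerprint([0, 1, 4, 5, 6, 7, 8, 9])   # contains both bits of A
fps = np.array([small_a, small_b, large_c])

hamming = np.array([[hamming_distance(p, q) for q in fps] for p in fps])
tanimoto_dist = np.array(
    [[1.0 - tanimoto_similarity(p, q) for q in fps] for p in fps]
)

# Distances between full, unscaled PCA scores equal sqrt(Hamming distance).
pca_dist = pairwise_euclidean(pca_scores(fps.astype(float)))
# Distances in the MDS space follow the Tanimoto distances instead.
mds_dist = pairwise_euclidean(classical_mds(tanimoto_dist, k=2))

print("Hamming  A-B, A-C:", hamming[0, 1], hamming[0, 2])
print("Tanimoto distance A-B, A-C:", tanimoto_dist[0, 1], tanimoto_dist[0, 2])
print("PCA distances match sqrt(Hamming):", np.allclose(pca_dist, np.sqrt(hamming)))
```

In this toy set, the Hamming distance places A nearer to B, with which it shares no substructure bits, than to C, which contains all of A's bits. The Tanimoto distance ranks the two pairs the other way round, and the MDS coordinates inherit that ordering.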


Year:  2014        PMID: 25475496     DOI: 10.1007/s10822-014-9819-y

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   3.686


