Euclidean chemical spaces from molecular fingerprints: Hamming distance and Hempel's ravens.

Abstract

Molecules are often characterized by sparse binary fingerprints, where 1s represent the presence of substructures and 0s represent their absence. Fingerprints are especially useful for similarity calculations, such as database searching or clustering, where similarity is generally measured as the Tanimoto coefficient. In other cases, such as visualization, design of experiments, or latent variable regression, a low-dimensional Euclidean "chemical space" is more useful, where proximity between points reflects chemical similarity. It is tempting to apply principal components analysis (PCA) directly to these fingerprints to obtain a low-dimensional continuous chemical space. However, Gower has shown that distances from PCA on bit vectors are proportional to the square root of the Hamming distance. Unlike Tanimoto similarity, Hamming similarity (HS) gives shared 0s the same weight as shared 1s; that is, HS gives as much weight to substructures that neither molecule contains as to substructures that both molecules contain. Illustrative examples show that proximity in the corresponding chemical space mainly reflects similar size and complexity rather than shared chemical substructures. These spaces are therefore ill-suited for visualizing and optimizing coverage of chemical space, or as latent variables for regression. Multidimensional scaling (MDS) on the Tanimoto distance matrix is shown to be a more suitable alternative: it produces a space where proximity does reflect structural similarity.
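The quantities behind Gower's result and the shared-0s argument can be written out for two fingerprints x and y in {0,1}^n. The shorthand a, b, c, d below is ours, not necessarily the paper's notation. The last line assumes PCA on the centered, unscaled bit matrix with all components retained (V orthogonal).

```latex
\begin{aligned}
a &= \sum_i x_i, \qquad b = \sum_i y_i, \qquad c = \sum_i x_i y_i, \qquad d = \sum_i (1 - x_i)(1 - y_i), \qquad n = a + b - c + d \\
T(x, y) &= \frac{c}{a + b - c} \\
H(x, y) &= \sum_i \lvert x_i - y_i \rvert = a + b - 2c \\
\mathrm{HS}(x, y) &= 1 - \frac{H(x, y)}{n} = \frac{c + d}{n} \\
\lVert x - y \rVert_2^2 &= \sum_i (x_i - y_i)^2 = \sum_i \lvert x_i - y_i \rvert = H(x, y), \qquad \text{since } x_i - y_i \in \{-1, 0, 1\} \\
z^{(j)} &= V^\top \bigl(x^{(j)} - \bar{x}\bigr),\ V^\top V = I \;\Rightarrow\; \lVert z^{(j)} - z^{(k)} \rVert_2 = \lVert x^{(j)} - x^{(k)} \rVert_2 = \sqrt{H\bigl(x^{(j)}, x^{(k)}\bigr)}
\end{aligned}
```

The count of shared 0s, d, enters HS but not T, and for sparse fingerprints it is usually the largest of the four counts. Keeping only the leading principal components projects onto a subspace, which can only shrink these distances.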

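A minimal sketch contrasting the two similarity measures, assuming hypothetical 20-bit toy fingerprints rather than real molecular fingerprints. The helper names (`tanimoto`, `hamming_similarity`, `fingerprint`) are illustrative and not taken from any cheminformatics library. The two "small" fingerprints share no bits; the two "large" ones share 6 of their 10 set bits.

```python
import math


def tanimoto(x: list[int], y: list[int]) -> float:
    """Tanimoto similarity c / (a + b - c) for two equal-length 0/1 vectors."""
    c = sum(xi & yi for xi, yi in zip(x, y))  # bits set in both
    a, b = sum(x), sum(y)                      # bits set in each
    union = a + b - c
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return c / union if union else 1.0


def hamming_distance(x: list[int], y: list[int]) -> int:
    """Number of positions where the two bit vectors differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


def hamming_similarity(x: list[int], y: list[int]) -> float:
    """(shared 1s + shared 0s) / n: absent bits count as much as present ones."""
    return 1 - hamming_distance(x, y) / len(x)


def euclidean(x: list[int], y: list[int]) -> float:
    """Plain Euclidean distance between the raw bit vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def fingerprint(on_bits: set[int], n: int = 20) -> list[int]:
    """Hypothetical toy fingerprint of length n with the given bits set."""
    return [1 if i in on_bits else 0 for i in range(n)]


# Two small "molecules" with no substructure in common.
small_a, small_b = fingerprint({0, 1}), fingerprint({2, 3})
# Two larger "molecules" sharing 6 of their 10 substructures each.
large_a = fingerprint(set(range(10)))
large_b = fingerprint({0, 1, 2, 3, 4, 5, 10, 11, 12, 13})

pairs = {"small pair": (small_a, small_b), "large pair": (large_a, large_b)}
for name, (x, y) in pairs.items():
    print(f"{name}: Tanimoto={tanimoto(x, y):.3f} "
          f"Hamming sim={hamming_similarity(x, y):.3f} "
          f"Euclidean={euclidean(x, y):.3f} "
          f"sqrt(Hamming)={math.sqrt(hamming_distance(x, y)):.3f}")
```

Tanimoto ranks the large pair as more similar (0.429 vs 0.000), while Hamming similarity ranks the small, unrelated pair higher (0.800 vs 0.600) simply because it shares more 0s. The Euclidean distance between the raw bit vectors (2.000 and 2.828) equals the square root of the Hamming distance, which is the geometry PCA on bit vectors inherits.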

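The MDS alternative can be sketched as classical (Torgerson) metric MDS on the matrix of Tanimoto distances 1 - T. This variant is an assumption: the abstract does not say which form of MDS was used. To stay dependency-free, the sketch double-centers the squared distances and diagonalizes the result with a small cyclic Jacobi eigensolver; in practice a linear-algebra library would do this step. The fingerprints are hypothetical toy on-bit sets, not real data.

```python
import math


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| on sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # assumed: two empty fingerprints are identical


def jacobi_eigh(a: list[list[float]], tol: float = 1e-20, max_sweeps: int = 100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix, cyclic Jacobi."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def classical_mds(dist: list[list[float]], k: int = 2) -> list[list[float]]:
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    m = len(dist)
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / m for row in sq]  # row means equal column means (symmetric input)
    grand = sum(mean) / m
    # Double centering: B = -1/2 * J D^2 J, with J = I - 11^T / m.
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(m)] for i in range(m)]
    evals, evecs = jacobi_eigh(b)
    top = sorted(range(m), key=lambda j: evals[j], reverse=True)[:k]
    # Non-Euclidean distances can give negative eigenvalues; they are clamped to 0 here.
    return [[evecs[i][j] * math.sqrt(max(evals[j], 0.0)) for j in top] for i in range(m)]


fps = {
    "small_a": {0, 1},
    "small_b": {2, 3},
    "large_a": set(range(10)),
    "large_b": {0, 1, 2, 3, 4, 5, 10, 11, 12, 13},
}
names = list(fps)
tanimoto_dist = [[1 - tanimoto(fps[p], fps[q]) for q in names] for p in names]
for name, xy in zip(names, classical_mds(tanimoto_dist, k=2)):
    print(name, [round(coord, 3) for coord in xy])
```

Clamping negative eigenvalues is one common choice, since Tanimoto distance is not guaranteed to be exactly Euclidean. Truncation also matters: for these four toy fingerprints the distances embed exactly in three dimensions, and the dropped third axis is the one that separates the two large fingerprints, so with k = 2 they land on the same point. Inspecting the discarded eigenvalues before trusting a 2-D map is worthwhile.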
Year: 2014 PMID: 25475496 DOI: 10.1007/s10822-014-9819-y

Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686

Cited by (5 in total)

1. Binding site characterization - similarity, promiscuity, and druggability.

Authors: Christiane Ehrt; Tobias Brinkjost; Oliver Koch
Journal: Medchemcomm Date: 2019-06-06 Impact factor: 3.597

2. Statistics in molecular modeling: a summary.

Authors: Anthony Nicholls
Journal: J Comput Aided Mol Des Date: 2016-03-21 Impact factor: 3.686

3. The rcdk and cluster R packages applied to drug candidate selection.

Authors: Adrian Voicu; Narcis Duteanu; Mirela Voicu; Daliborca Vlad; Victor Dumitrascu
Journal: J Cheminform Date: 2020-01-20 Impact factor: 5.514

4. Analysis of Solar Irradiation Time Series Complexity and Predictability by Combining Kolmogorov Measures and Hamming Distance for La Reunion (France).

Authors: Dragutin T Mihailović; Miloud Bessafi; Sara Marković; Ilija Arsenić; Slavica Malinović-Milićević; Patrick Jeanty; Mathieu Delsaut; Jean-Pierre Chabriat; Nusret Drešković; Anja Mihailović
Journal: Entropy (Basel) Date: 2018-08-01 Impact factor: 2.524

5. MAIP: a web service for predicting blood-stage malaria inhibitors.

Authors: Nicolas Bosc; Eloy Felix; Ricardo Arcila; David Mendez; Martin R Saunders; Darren V S Green; Jason Ochoada; Anang A Shelat; Eric J Martin; Preeti Iyer; Ola Engkvist; Andreas Verras; James Duffy; Jeremy Burrows; J Mark F Gardner; Andrew R Leach
Journal: J Cheminform Date: 2021-02-22 Impact factor: 5.514
