Adrian Voicu, Narcis Duteanu, Mirela Voicu, Daliborca Vlad, Victor Dumitrascu.
Abstract
The aim of this article is to show how the power of statistics and cheminformatics can be combined, in R, using two packages: rcdk and cluster. We describe the role of clustering methods for identifying similar structures in a group of 23 molecules according to their fingerprints. The most commonly used method is to group the molecules using a "score" obtained by measuring the average distance between them. This score reflects the similarity or dissimilarity between compounds and helps us identify active or potentially toxic substances through predictive studies. Clustering is the process by which the common characteristics of a particular class of compounds are identified. For clustering applications, we generally measure molecular fingerprint similarity with the Tanimoto coefficient. Based on the molecular fingerprints, we calculated the molecular distances between the methotrexate molecule and the other 23 molecules in the group, and organized them into a matrix. According to the molecular distances and Ward's method, the molecules were grouped into 3 clusters. We can presume structural similarity between the compounds from their locations in the cluster map. Because only 5 molecules were included in the methotrexate cluster, we considered that they might have similar properties and might be further tested as potential drug candidates.
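The Tanimoto-based distance matrix mentioned in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than with the rcdk and cluster packages the article uses, and it is not the authors' code. It assumes each fingerprint is represented as a set of "on" bit positions; the names `tanimoto`, `distance_matrix`, and the toy fingerprints are hypothetical and carry no real molecular data.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    shared = len(a & b)
    # Shared bits divided by the size of the union of on-bits.
    return shared / (len(a) + len(b) - shared)


def distance_matrix(fps: dict) -> dict:
    """Symmetric matrix of Tanimoto distances (1 - similarity), keyed by name."""
    names = list(fps)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = 1.0 - tanimoto(fps[n], fps[m])
        dist[n][m] = dist[m][n] = d
    return dist


# Toy fingerprints: hypothetical bit positions, for illustration only.
toy_fps = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({3, 4, 5, 6}),
    "mol_c": frozenset({7, 8}),
}
toy_dist = distance_matrix(toy_fps)
```

A matrix of this kind is what a hierarchical method such as Ward's would take as input; molecules sharing no bits, like `mol_a` and `mol_c` here, sit at the maximum distance of 1.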
Keywords: Clusters; Cytostatic; Molecular fingerprint; Rcdk
Year: 2020 PMID: 33430987 PMCID: PMC6970292 DOI: 10.1186/s13321-019-0405-0
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Methotrexate molecule visualisation in R
Fig. 2 Molecule set visualization
Fig. 3 Estimation of the optimal number of clusters
Fig. 4 Dendrogram: hierarchical clustering using Ward’s method
Fig. 5 Polygonal clusters
Fig. 6 Molecules similar to methotrexate
Fig. 7 Silhouette plot for K-means clustering