Get Your Atoms in Order--An Open-Source Implementation of a Novel and Robust Molecular Canonicalization Algorithm.

Nadine Schneider, Roger A Sayle, Gregory A Landrum.

Abstract

Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with that algorithm that lead to noncanonical atom orderings, as well as problems when it is applied to large molecules such as proteins. Furthermore, each cheminformatics toolkit or software package provides its own version of a canonical ordering, most of them based on unpublished algorithms, which also complicates the generation of a universal unique identifier for molecules. We present an alternative canonicalization approach that uses a standard stable-sorting algorithm instead of a Morgan-like index. Two new invariants have been developed that allow canonical ordering of molecules with dependent chirality as well as of those with highly symmetrical cyclic graphs. The new approach proved to be robust and fast when tested on the 1.45 million compounds of the ChEMBL 20 data set in different scenarios, such as random renumbering of input atoms or SMILES round tripping. Our new algorithm is able to generate a canonical order of the atoms of protein molecules within a few milliseconds. The novel algorithm is implemented in the open-source cheminformatics toolkit RDKit. With this paper, we provide a reference Python implementation of the algorithm that could easily be integrated into any cheminformatics toolkit. This provides a first step toward a common standard for canonical atom ordering to generate a universal unique identifier for molecules other than InChI.
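To make the stable-sorting idea concrete, here is a minimal toy sketch of rank refinement by repeated stable sorting, together with the random-renumbering check mentioned above. It is not the published algorithm or RDKit's code. Everything in it is an assumption made for illustration: the hypothetical molecule format (element symbols plus an index-pair bond list, bond orders and stereo ignored), the initial invariant (element, degree), and the naive tie-breaking rule. It omits the paper's chirality and ring invariants, so it is only guaranteed to be canonical when the atoms left tied after refinement are truly symmetry-equivalent. On some highly symmetric graphs that assumption fails, which is exactly the case the paper's extra invariants address.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical toy molecule format used only in this sketch:
#   atoms[i] is an element symbol; bonds is a list of (i, j) index pairs.
#   Bond orders, charges, isotopes and stereochemistry are ignored.
Atoms = List[str]
Bonds = List[Tuple[int, int]]


def _dense_ranks(keys: Sequence[tuple]) -> List[int]:
    """Stable-sort atom indices by key; atoms with equal keys share one rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # sorted() is stable
    ranks = [0] * len(keys)
    rank = 0
    for pos, idx in enumerate(order):
        if pos > 0 and keys[idx] != keys[order[pos - 1]]:
            rank += 1
        ranks[idx] = rank
    return ranks


def _refine(ranks: List[int], nbrs: List[List[int]]) -> List[int]:
    """Split tied atoms by their sorted neighbour ranks until no class splits."""
    while True:
        keys = [(ranks[i], tuple(sorted(ranks[j] for j in nbrs[i])))
                for i in range(len(ranks))]
        new = _dense_ranks(keys)
        if len(set(new)) == len(set(ranks)):
            return new
        ranks = new


def canonical_ranks(atoms: Atoms, bonds: Bonds) -> List[int]:
    """Return one distinct rank per atom (toy version, see assumptions above)."""
    n = len(atoms)
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Initial invariant (an assumption of this sketch): element symbol and degree.
    ranks = _refine(_dense_ranks([(atoms[i], len(nbrs[i])) for i in range(n)]), nbrs)
    while len(set(ranks)) < n:
        tied = min(r for r, c in Counter(ranks).items() if c > 1)
        # Promote the first atom of the lowest tied class, then refine again.
        # Canonical only if the tied atoms are symmetry-equivalent.
        chosen = ranks.index(tied)
        ranks = _refine(_dense_ranks([(ranks[i], i != chosen) for i in range(n)]), nbrs)
    return ranks


def canonical_form(atoms: Atoms, bonds: Bonds):
    """Atom labels in rank order plus the bond list rewritten in ranks."""
    ranks = canonical_ranks(atoms, bonds)
    labels = tuple(atoms[i] for i in sorted(range(len(atoms)), key=ranks.__getitem__))
    edges = tuple(sorted(tuple(sorted((ranks[i], ranks[j]))) for i, j in bonds))
    return labels, edges


def renumber(atoms: Atoms, bonds: Bonds, perm: List[int]):
    """Return a copy of the molecule in which old atom k becomes atom perm[k]."""
    new_atoms = [""] * len(atoms)
    for old, new in enumerate(perm):
        new_atoms[new] = atoms[old]
    return new_atoms, [(perm[i], perm[j]) for i, j in bonds]


if __name__ == "__main__":
    # Toluene heavy-atom graph: ring atoms 0-5 plus a methyl carbon (6).
    toluene = (["C"] * 7, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)])
    reference = canonical_form(*toluene)
    rng = random.Random(0)
    for _ in range(100):
        perm = list(range(7))
        rng.shuffle(perm)
        assert canonical_form(*renumber(*toluene, perm)) == reference
    print("canonical form unchanged under 100 random renumberings")
```

Two design points are worth noting. Ranks are derived only from sort keys, never from input indices, so the input numbering can influence the result only through the tie-breaking step. And because each refinement pass is just a sort, no Morgan-style connectivity sums are involved. In RDKit itself, canonical atom ranks are exposed through `Chem.CanonicalRankAtoms`.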

Year:  2015        PMID: 26441310     DOI: 10.1021/acs.jcim.5b00543

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  Related articles: 17 in total (the first 10 are listed)

1.  GEN: highly efficient SMILES explorer using autodidactic generative examination networks.

Authors:  Ruud van Deursen; Peter Ertl; Igor V Tetko; Guillaume Godin
Journal:  J Cheminform       Date:  2020-04-10       Impact factor: 5.514

2.  One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome.

Authors:  Alice Capecchi; Daniel Probst; Jean-Louis Reymond
Journal:  J Cheminform       Date:  2020-06-12       Impact factor: 5.514

3.  Predicting novel drug candidates against Covid-19 using generative deep neural networks.

Authors:  Santhosh Amilpur; Raju Bhukya
Journal:  J Mol Graph Model       Date:  2021-10-13       Impact factor: 2.518

4.  Unique identifiers for small molecules enable rigorous labeling of their atoms.

Authors:  Hesam Dashti; William M Westler; John L Markley; Hamid R Eghbalnia
Journal:  Sci Data       Date:  2017-05-23       Impact factor: 6.444

5.  Atomic ring invariant and Modified CANON extended connectivity algorithm for symmetry perception in molecular graphs and rigorous canonicalization of SMILES.

Authors:  Dmytro G Krotko
Journal:  J Cheminform       Date:  2020-08-20       Impact factor: 5.514

6.  [Review] Molecular representations in AI-driven drug discovery: a review and practical guide.

Authors:  Laurianne David; Amol Thakkar; Rocío Mercado; Ola Engkvist
Journal:  J Cheminform       Date:  2020-09-17       Impact factor: 5.514

7.  MET: a Java package for fast molecule equivalence testing.

Authors:  Jördis-Ann Schüler; Steffen Rechner; Matthias Müller-Hannemann
Journal:  J Cheminform       Date:  2020-12-17       Impact factor: 5.514

8.  [Review] A review on compound-protein interaction prediction methods: Data, format, representation and model.

Authors:  Sangsoo Lim; Yijingxiu Lu; Chang Yun Cho; Inyoung Sung; Jungwoo Kim; Youngkuk Kim; Sungjoon Park; Sun Kim
Journal:  Comput Struct Biotechnol J       Date:  2021-03-10       Impact factor: 7.271

9.  Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning.

Authors:  Alice Capecchi; Jean-Louis Reymond
Journal:  Biomolecules       Date:  2020-09-28

10.  Atom Identifiers Generated by a Neighborhood-Specific Graph Coloring Method Enable Compound Harmonization across Metabolic Databases.

Authors:  Huan Jin; Joshua M Mitchell; Hunter N B Moseley
Journal:  Metabolites       Date:  2020-09-11