Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI.

Noel M O'Boyle

Abstract

BACKGROUND: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provide a unique, or canonical, identifier for chemical structures, while SMILES strings are widely used for storage and interchange of chemical structures, but no standard exists to generate a canonical SMILES string.
RESULTS: I describe how to use the InChI canonicalisation to derive a canonical SMILES string in a straightforward way, either incorporating the InChI normalisations (Inchified SMILES) or not (Universal SMILES). This is the first description of a method to generate canonical SMILES that takes stereochemistry into account. When tested on the 1.1 million compounds in the ChEMBL database and a 1 million compound subset of the PubChem Substance database, no canonicalisation failures were found with Inchified SMILES. Using Universal SMILES, 99.79% of the ChEMBL database and 99.77% of the PubChem subset were canonicalised successfully.
CONCLUSIONS: The InChI canonicalisation algorithm can successfully be used as the basis for a common standard for canonical SMILES. While challenges remain - such as the development of a standard aromatic model for SMILES - the ability to create the same SMILES using different toolkits will mean that for the first time it will be possible to easily compare the chemical models used by different toolkits.
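
The core idea in RESULTS, reusing an existing canonical atom labelling to fix the order in which a SMILES string is written, can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes an acyclic molecule with single bonds only, ignores hydrogens, stereochemistry, rings and disconnected fragments, and takes the canonical labels as a ready-made input rather than computing them with the InChI library. The function name `smiles_from_labels` and its parameters are hypothetical.

```python
def smiles_from_labels(symbols, bonds, labels):
    """Write a SMILES string for an acyclic, single-bonded heavy-atom graph.

    symbols: element symbol per atom, in input order
    bonds:   list of (i, j) index pairs
    labels:  canonical label per atom (assumed to come from an
             external canonicaliser such as the InChI; not computed here)
    """
    neighbours = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    def visit(atom, parent):
        # Visit neighbours in ascending order of canonical label,
        # so the traversal does not depend on the input atom order.
        children = sorted(
            (n for n in neighbours[atom] if n != parent),
            key=lambda n: labels[n],
        )
        out = symbols[atom]
        # Every child except the last is written as a parenthesised branch.
        for child in children[:-1]:
            out += "(" + visit(child, atom) + ")"
        if children:
            out += visit(children[-1], atom)
        return out

    # Start from the atom with the lowest canonical label.
    start = min(range(len(symbols)), key=lambda i: labels[i])
    return visit(start, None)


# 2-propanol entered with two different atom orderings but the same
# canonical labelling (methyl carbons 1 and 2, central carbon 3, oxygen 4).
first = smiles_from_labels(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)], [1, 3, 2, 4])
second = smiles_from_labels(["O", "C", "C", "C"], [(1, 3), (0, 1), (2, 1)], [4, 3, 1, 2])
print(first, second)
```

Because the traversal is driven only by the labels, both input orderings give the same string. A real implementation would also have to handle ring closures, bond orders, charges and stereo descriptors.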

Year:  2012        PMID: 22989151      PMCID: PMC3495655          DOI: 10.1186/1758-2946-4-22

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


