| Literature DB >> 21384929 |
Daniel M Lowe1, Peter T Corbett, Peter Murray-Rust, Robert C Glen.
Abstract
We have produced an open source, freely available, algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature in a fast and precise manner. This has been achieved using an approach based on a regular grammar. This grammar is used to guide tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed that is operated on in a stepwise manner until the structure has been reconstructed from the name. Results from OPSIN on various computer generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, when using general organic chemical nomenclature, high recall (98.7-99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.Mesh:
Year: 2011 PMID: 21384929 DOI: 10.1021/ci100384d
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956