
Detection of IUPAC and IUPAC-like chemical names.

Roman Klinger, Corinna Kolárik, Juliane Fluck, Martin Hofmann-Apitius, Christoph M Friedrich.

Abstract

MOTIVATION: Chemical compounds such as small signal molecules or other biologically active chemical substances are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, such as SMILES, InChI, IUPAC or trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and in this way mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate with available name-to-structure tools.
RESULTS: We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows good performance on these cases (F1 measure 81.5%). The remaining recognition problems are detecting the correct borders of the typically long terms, especially when they occur in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run.
AVAILABILITY: We plan to publish the corpora and annotation guideline as well as the conditional random field model as a UIMA component.
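
The abstract names the offset conjunction order as an important parameter but does not spell out what it means. As a rough illustration only, and assuming the term refers to enriching each token's feature set with the features of its neighbours at fixed relative offsets, the idea can be sketched as follows. The feature names, the suffix list, and the helper functions `token_features` and `add_offset_features` are hypothetical and are not taken from the paper.

```python
import re


def token_features(token):
    """Hypothetical surface features for one token (illustrative only)."""
    feats = {"lower=" + token.lower()}
    if any(ch.isdigit() for ch in token):
        feats.add("has_digit")
    if "-" in token:
        feats.add("has_hyphen")
    # Assumed, non-exhaustive list of suffixes that are common in chemical names.
    if re.search(r"(yl|ol|ine|ane|ate|ide|one)$", token.lower()):
        feats.add("chem_suffix")
    return feats


def add_offset_features(tokens, order=1):
    """Copy neighbour features into each token's set, tagged with the offset.

    order=0 keeps only the token's own features; order=k also adds the
    features of tokens at offsets -k..-1 and 1..k where they exist.
    """
    base = [token_features(t) for t in tokens]
    enriched = []
    for i in range(len(tokens)):
        feats = set(base[i])
        for off in range(-order, order + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens):
                feats.update(f"{f}@{off}" for f in base[j])
        enriched.append(feats)
    return enriched


tokens = ["of", "2-methylpropan-1-ol", "in"]
features = add_offset_features(tokens, order=1)
```

With `order=1`, the feature set for "of" also carries `has_digit@1`, `has_hyphen@1` and `chem_suffix@1` from the following token, which is the kind of context a sequence labeller could use to place the borders of long names. A larger order widens the window at the cost of a larger feature space.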

Year: 2008   PMID: 18586724   PMCID: PMC2718657   DOI: 10.1093/bioinformatics/btn181

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


