
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition.

Richard Tzong-Han Tsai, Yu-Cheng Hsiao, Po-Ting Lai.   

Abstract

Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce NERChem, a system that recognizes chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
© The Author(s) 2016. Published by Oxford University Press.
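
The abstract names class composition and full-token word features without giving their details, so the following Python sketch is only an illustration of how such components might look, not the authors' implementation. The grouping in `CLASS_COMPOSITION`, the sub-token regular expression, and the function names `compose_class` and `token_features` are all hypothetical; only the sub-class names come from the CHEMDNER annotation scheme.

```python
import re

# Hypothetical grouping of CHEMDNER sub-classes into composite classes.
# The composition actually used by NERChem is not stated in the abstract.
CLASS_COMPOSITION = {
    "SYSTEMATIC": "SYS_FAMILY",
    "FAMILY": "SYS_FAMILY",
    "TRIVIAL": "TRIVIAL_ABBR",
    "ABBREVIATION": "TRIVIAL_ABBR",
}


def compose_class(label):
    """Map a sub-class label to its composite class (unchanged if unmapped)."""
    return CLASS_COMPOSITION.get(label, label)


def token_features(sentence):
    """Split each whitespace token into fine-grained sub-tokens and attach
    the enclosing full token to every sub-token as an extra feature."""
    features = []
    for full_token in sentence.split():
        # Assumed fine-grained tokenization: letter runs, digit runs,
        # and single punctuation characters.
        for sub in re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", full_token):
            features.append({"word": sub, "full_token": full_token})
    return features


if __name__ == "__main__":
    for feats in token_features("2-methylpropane is"):
        print(feats)
    print(compose_class("FAMILY"), compose_class("FORMULA"))
```

Under these assumptions, each sub-token of a name such as "2-methylpropane" carries the whole surface token as a feature, which is one way a sequence labeler could keep fine-grained boundaries without losing the context of the full token.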

Year:  2016        PMID: 31414701      PMCID: PMC5091336          DOI: 10.1093/database/baw135

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


