Full-text chemical identification with improved generalizability and tagging consistency.

Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang.

Abstract

Chemical identification involves finding chemical entities in text (i.e. named entity recognition) and assigning unique identifiers to those entities (i.e. named entity normalization). While current models are developed and evaluated on article titles and abstracts, their effectiveness has not been thoroughly verified on full text. In this paper, we identify two limitations of models in tagging full-text articles: (1) low generalizability to unseen mentions and (2) tagging inconsistency. We address these limitations with simple training and post-processing methods, such as transfer learning and mention-wise majority voting. We also present a hybrid model for the normalization task that exploits the high recall of a neural model while maintaining the high precision of a dictionary model. In the BioCreative VII NLM-Chem track challenge, our best model achieves F1 scores of 86.72 and 78.31 in named entity recognition and normalization, respectively, significantly outperforming the median (83.73 and 77.49) and taking first place in named entity recognition. In a post-challenge evaluation, we re-implement our model and obtain an F1 score of 84.70 in the normalization task, outperforming the best score in the challenge by 3.34 points. Database URL: https://github.com/dmis-lab/bc7-chem-id.
© The Author(s) 2022. Published by Oxford University Press.
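
The abstract names two techniques without detail: mention-wise majority voting and a hybrid dictionary/neural normalizer. The sketch below illustrates the general idea of each and is not the authors' implementation. The function names, the label set ("CHEMICAL", "O"), the lowercased matching key, the placeholder identifiers and the dictionary-first fallback order are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(mentions):
    """Give every occurrence of a surface form its most frequent label.

    `mentions` is a list of (surface_text, label) pairs collected from one
    full-text article. Grouping by lowercased text is an assumption.
    """
    votes = defaultdict(Counter)
    for text, label in mentions:
        votes[text.lower()][label] += 1
    # most_common(1) returns the top label; ties go to the label seen first.
    winner = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return [(text, winner[text.lower()]) for text, _ in mentions]


def hybrid_normalize(mention, dictionary, neural_normalizer):
    """Prefer an exact dictionary hit; otherwise fall back to a neural model.

    `dictionary` maps lowercased names to identifiers (hypothetical here);
    `neural_normalizer` is any callable that returns an identifier.
    """
    key = mention.lower()
    if key in dictionary:
        return dictionary[key]
    return neural_normalizer(mention)


# Inconsistent tags for the same mention within one article.
tagged = [
    ("aspirin", "CHEMICAL"),
    ("Aspirin", "O"),
    ("aspirin", "CHEMICAL"),
    ("water", "O"),
]
consistent = majority_vote(tagged)

# Placeholder identifiers, not real database IDs.
toy_dictionary = {"aspirin": "ID:0001"}
toy_neural = lambda mention: "ID:NEURAL"
hit = hybrid_normalize("Aspirin", toy_dictionary, toy_neural)
miss = hybrid_normalize("acetylsalicylic acid", toy_dictionary, toy_neural)
```

In this toy run, the single "O" tag on "Aspirin" is overruled by the two "CHEMICAL" votes for the same mention. The normalizer answers from the dictionary when it can and defers to the stand-in neural model for the unseen name.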

Year:  2022        PMID: 36170114      PMCID: PMC9518746          DOI: 10.1093/database/baac074

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   4.462

