
How Do Your Biomedical Named Entity Recognition Models Generalize to Novel Entities?

Hyunjae Kim, Jaewoo Kang

Abstract

The amount of biomedical literature on new biomedical concepts is rapidly increasing, which calls for reliable biomedical named entity recognition (BioNER) models that can identify new and unseen entity mentions. However, it is questionable whether existing models can handle such mentions effectively. In this work, we systematically analyze three types of recognition abilities of BioNER models: memorization, synonym generalization, and concept generalization. We find that although current best models achieve state-of-the-art overall performance on benchmarks, they have limitations in identifying synonyms and new biomedical concepts, indicating that their generalization abilities are overestimated. We also investigate failure cases and identify several difficulties in recognizing unseen mentions in biomedical literature: (1) models tend to exploit dataset biases, which hinders their ability to generalize, and (2) several biomedical names have novel morphological patterns with weak name regularity, which models fail to recognize. As a simple remedy, we apply a statistics-based debiasing method to this problem and show that it improves generalization to unseen mentions. We hope that our analyses and findings will facilitate further research into the generalization capabilities of NER models in a domain where their reliability is of utmost importance.
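The abstract does not spell out how test mentions are assigned to the three ability types. The following is a minimal Python sketch, not the authors' code, assuming that every gold mention carries a normalized concept ID and assuming this rule: a test mention whose surface form appears in the training set counts as memorization; an unseen surface form of a concept seen in training counts as synonym generalization; a mention of a concept never seen in training counts as concept generalization. The `Mention` class, the `normalize` helper, both function names, and the toy concept IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Mention:
    doc_id: str
    start: int        # character offset, inclusive
    end: int          # character offset, exclusive
    text: str
    concept_id: str   # normalized concept ID (assumed available for every mention)


Span = Tuple[str, int, int]  # (doc_id, start, end) of a predicted mention


def normalize(text: str) -> str:
    # Hypothetical normalization: lower-case and collapse whitespace.
    return " ".join(text.lower().split())


def split_by_generalization(train: List[Mention], test: List[Mention]) -> Dict[str, List[Mention]]:
    """Assign each test mention to one of three groups by its overlap with training data."""
    seen_texts = {normalize(m.text) for m in train}
    seen_concepts = {m.concept_id for m in train}
    groups: Dict[str, List[Mention]] = {"memorization": [], "synonym": [], "concept": []}
    for m in test:
        if normalize(m.text) in seen_texts:
            groups["memorization"].append(m)  # surface form already seen in training
        elif m.concept_id in seen_concepts:
            groups["synonym"].append(m)       # new surface form of a known concept
        else:
            groups["concept"].append(m)       # concept never seen in training
    return groups


def recall_by_group(groups: Dict[str, List[Mention]], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-span recall per group; NaN for an empty group."""
    result: Dict[str, float] = {}
    for name, mentions in groups.items():
        if not mentions:
            result[name] = float("nan")
            continue
        hits = sum((m.doc_id, m.start, m.end) in predicted for m in mentions)
        result[name] = hits / len(mentions)
    return result


if __name__ == "__main__":
    # Toy data with made-up concept IDs.
    train = [
        Mention("d1", 0, 8, "diabetes", "C001"),
        Mention("d1", 20, 32, "hypertension", "C002"),
    ]
    test = [
        Mention("t1", 0, 8, "Diabetes", "C001"),
        Mention("t1", 15, 34, "high blood pressure", "C002"),
        Mention("t1", 40, 48, "COVID-19", "C003"),
    ]
    groups = split_by_generalization(train, test)
    predicted = {("t1", 0, 8), ("t1", 15, 34)}
    print(recall_by_group(groups, predicted))
    # prints {'memorization': 1.0, 'synonym': 1.0, 'concept': 0.0}
```

Reporting recall per group rather than one pooled score is what exposes the gap the abstract describes: a model can look strong overall while recognizing few mentions of unseen concepts.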


Keywords: Bioinformatics (in engineering in medicine and biology); natural language processing; text mining

Year: 2022; PMID: 35582496; PMCID: PMC9014470; DOI: 10.1109/ACCESS.2022.3157854

Source DB: PubMed; Journal: IEEE Access; ISSN: 2169-3536; Impact factor: 3.476


Cited by: 1 in total

1.  Full-text chemical identification with improved generalizability and tagging consistency.

Authors:  Hyunjae Kim; Mujeen Sung; Wonjin Yoon; Sungjoon Park; Jaewoo Kang
Journal:  Database (Oxford)       Date:  2022-09-28       Impact factor: 4.462
