
A pre-training and self-training approach for biomedical named entity recognition.

Shang Gao, Olivera Kotevska, Alexandre Sorokine, J Blair Christian.

Abstract

Named entity recognition (NER) is a key component of many scientific literature mining tasks, such as information retrieval, information extraction, and question answering; however, many modern approaches require large amounts of labeled training data to be effective. This severely limits the effectiveness of NER models in applications where expert annotations are difficult and expensive to obtain. In this work, we explore the effectiveness of transfer learning and semi-supervised self-training for improving the performance of NER models in biomedical settings with very limited labeled data (250-2000 labeled samples). We first pre-train a BiLSTM-CRF and a BERT model on a very large general biomedical NER corpus such as MedMentions or Semantic Medline; we then fine-tune the model on a more specific target NER task that has very limited training data; finally, we apply semi-supervised self-training using unlabeled data to further boost model performance. We show that in NER tasks that focus on common biomedical entity types such as those in the Unified Medical Language System (UMLS), combining transfer learning with self-training enables an NER model such as a BiLSTM-CRF or BERT to obtain performance similar to that of the same model trained on 3x-8x the amount of labeled data. We further show that our approach can also boost performance in a low-resource application where entity types are rarer and not specifically covered in UMLS.
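
The self-training step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how pseudo-labels are selected, so the confidence threshold, the number of rounds, and the names `ToyTagger` and `self_train` are all assumptions made for illustration. The toy tagger only memorizes per-token tag counts; in practice it would be replaced by a pre-trained and fine-tuned BiLSTM-CRF or BERT model.

```python
from collections import Counter, defaultdict


class ToyTagger:
    """Hypothetical stand-in for a fine-tuned NER model (assumption, not from the paper).

    It memorizes how often each token received each tag.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, sentences):
        # sentences: list of (tokens, tags) pairs; retrain from scratch each call.
        self.counts = defaultdict(Counter)
        for tokens, tags in sentences:
            for tok, tag in zip(tokens, tags):
                self.counts[tok.lower()][tag] += 1
        return self

    def predict(self, tokens):
        # Returns (tags, confidence), where confidence is the mean per-token
        # probability. Unseen tokens are tagged "O" with probability 0.5.
        tags, probs = [], []
        for tok in tokens:
            seen = self.counts.get(tok.lower())
            if seen:
                tag, n = seen.most_common(1)[0]
                tags.append(tag)
                probs.append(n / sum(seen.values()))
            else:
                tags.append("O")
                probs.append(0.5)
        confidence = sum(probs) / len(probs) if probs else 0.0
        return tags, confidence


def self_train(model, labeled, unlabeled, threshold=0.8, rounds=3):
    """Generic self-training loop; threshold and rounds are assumed values."""
    train = list(labeled)
    pool = list(unlabeled)
    model.fit(train)
    for _ in range(rounds):
        confident, remaining = [], []
        for tokens in pool:
            tags, confidence = model.predict(tokens)
            if confidence >= threshold:
                confident.append((tokens, tags))  # accept as pseudo-label
            else:
                remaining.append(tokens)
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)
        pool = remaining
        model.fit(train)  # retrain on gold labels plus pseudo-labels
    return model, train


# Made-up example data, for illustration only.
labeled = [
    (["aspirin", "reduces", "fever"], ["B-CHEM", "O", "B-DISEASE"]),
    (["ibuprofen", "treats", "pain"], ["B-CHEM", "O", "B-DISEASE"]),
]
unlabeled = [
    ["aspirin", "reduces", "severe", "pain"],
    ["unknown", "words", "here"],
]
model, train = self_train(ToyTagger(), labeled, unlabeled)
```

In this toy run, the first unlabeled sentence is confident enough to be pseudo-labeled and added to the training set, while the second, made up entirely of unseen tokens, stays in the unlabeled pool. A stricter threshold admits fewer but cleaner pseudo-labels; a looser one admits more data at the risk of reinforcing the model's own errors.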

Year:  2021        PMID: 33561139      PMCID: PMC7872256          DOI: 10.1371/journal.pone.0246310

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


