
HUNER: improving biomedical NER with pretraining.

Leon Weber, Jannes Münchmeyer, Tim Rocktäschel, Maryam Habibi, Ulf Leser.

Abstract

MOTIVATION: Several recent studies have shown that applying deep neural networks advances the state of the art in named entity recognition (NER), including biomedical NER. However, both the performance gains and the robustness of the improvements crucially depend on the availability of sufficiently large training corpora, which is a problem in the biomedical domain with its often rather small gold standard corpora.
RESULTS: We evaluate different methods for alleviating the data sparsity problem by pretraining a deep neural network (LSTM-CRF), followed by a rather short fine-tuning phase focused on a particular corpus. Experiments were performed on 34 different corpora covering five biomedical entity types, yielding an average increase in F1-score of ∼2 pp compared to learning without pretraining. We experimented with both supervised and semi-supervised pretraining, leading to interesting insights into the precision/recall trade-off. Based on our results, we created the stand-alone NER tool HUNER, incorporating fully trained models for five entity types. On the independent CRAFT corpus, which was not used for creating HUNER, it outperforms the state-of-the-art tools GNormPlus and tmChem by 5-13 pp on the entity types chemicals, species and genes.
AVAILABILITY AND IMPLEMENTATION: HUNER is freely available at https://hu-ner.github.io. HUNER comes in containers, making it easy to install and use, and it can be applied off-the-shelf to arbitrary texts. We also provide an integrated tool for obtaining and converting all 34 corpora used in our evaluation, including fixed training, development and test splits to enable fair comparisons in the future.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
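The F1-score reported in the abstract is the harmonic mean of precision and recall over predicted entity mentions; an increase of ∼2 pp means, e.g., moving from 80.0 to 82.0. As a minimal sketch (not the paper's evaluation code), span-level NER scoring under the standard exact-match convention can be written as:

```python
def span_f1(gold, pred):
    """Precision, recall and F1 over entity spans.

    gold, pred: sets of (start, end, type) tuples; a prediction counts
    as correct only on an exact boundary-and-type match, the usual
    convention in NER benchmarks.
    """
    tp = len(gold & pred)  # true positives: spans found with correct type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example with three gold mentions and three predictions,
# two of which are correct.
gold = {(0, 4, "Gene"), (10, 18, "Chemical"), (25, 31, "Species")}
pred = {(0, 4, "Gene"), (10, 18, "Chemical"), (40, 45, "Gene")}
p, r, f = span_f1(gold, pred)
# Here precision = recall = F1 = 2/3.
```

This also makes the precision/recall trade-off mentioned above concrete: a model that predicts more spans tends to raise recall at the cost of precision, and F1 balances the two.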

Entities:  

Year:  2020        PMID: 31243432     DOI: 10.1093/bioinformatics/btz528

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.

Authors:  Renzo M Rivera-Zavala; Paloma Martínez
Journal:  BMC Bioinformatics       Date:  2021-12-17       Impact factor: 3.169

2.  Assigning species information to corresponding genes by a sequence labeling framework.

Authors:  Ling Luo; Chih-Hsuan Wei; Po-Ting Lai; Qingyu Chen; Rezarta Islamaj; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2022-10-13       Impact factor: 4.462

3.  Parallel sequence tagging for concept recognition.

Authors:  Lenz Furrer; Joseph Cornelius; Fabio Rinaldi
Journal:  BMC Bioinformatics       Date:  2022-03-24       Impact factor: 3.169

4.  A pre-training and self-training approach for biomedical named entity recognition.

Authors:  Shang Gao; Olivera Kotevska; Alexandre Sorokine; J Blair Christian
Journal:  PLoS One       Date:  2021-02-09       Impact factor: 3.240

5.  The Construction Model of the TCM Clinical Knowledge Coding Database Based on Knowledge Organization.

Authors:  Pan Zhang; Shaowu Shen; Wenping Deng; Shusong Mao; Yan Wang
Journal:  Biomed Res Int       Date:  2022-01-17       Impact factor: 3.411

6.  PEDL: extracting protein-protein associations using deep language models and distant supervision.

Authors:  Leon Weber; Kirsten Thobe; Oscar Arturo Migueles Lozano; Jana Wolf; Ulf Leser
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

