HUNER: improving biomedical NER with pretraining.

Leon Weber¹, Jannes Münchmeyer¹,², Tim Rocktäschel³, Maryam Habibi¹, Ulf Leser¹.

Abstract

MOTIVATION: Several recent studies showed that deep neural networks advanced the state of the art in named entity recognition (NER), including biomedical NER. However, the size and robustness of these improvements depend crucially on the availability of sufficiently large training corpora, which is a problem in the biomedical domain, where gold standard corpora are often rather small.
RESULTS: We evaluate different methods for alleviating the data sparsity problem by pretraining a deep neural network (LSTM-CRF), followed by a rather short fine-tuning phase focusing on a particular corpus. Experiments were performed using 34 different corpora covering five different biomedical entity types, yielding an average increase in F1-score of ∼2 pp compared to learning without pretraining. We experimented both with supervised and semi-supervised pretraining, leading to interesting insights into the precision/recall trade-off. Based on our results, we created the stand-alone NER tool HUNER incorporating fully trained models for five entity types. On the independent CRAFT corpus, which was not used for creating HUNER, it outperforms the state-of-the-art tools GNormPlus and tmChem by 5-13 pp on the entity types chemicals, species and genes.
AVAILABILITY AND IMPLEMENTATION: HUNER is freely available at https://hu-ner.github.io. HUNER comes in containers, making it easy to install and use, and it can be applied off-the-shelf to arbitrary texts. We also provide an integrated tool for obtaining and converting all 34 corpora used in our evaluation, including fixed training, development and test splits to enable fair comparisons in the future.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
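
The abstract reports its gains as F1-score differences in percentage points (pp) and mentions a precision/recall trade-off. As a reading aid, the sketch below shows how these quantities are commonly computed for NER at the entity level. It is not HUNER's evaluation code: the exact-match criterion (same offsets and same entity type), the function names `prf1` and `f1_gain_pp`, and the toy spans are all assumptions made for illustration and do not reproduce any result from the paper.

```python
from typing import Set, Tuple

# A mention is (start offset, end offset, entity type). Exact matching on all
# three fields is an assumption; the abstract does not state the criterion.
Span = Tuple[int, int, str]


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return entity-level (precision, recall, F1) under exact span matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_gain_pp(f1_before: float, f1_after: float) -> float:
    """Difference between two F1 scores (given in [0, 1]) in percentage points."""
    return (f1_after - f1_before) * 100


# Hypothetical toy annotations, not taken from any corpus in the paper.
gold = {(0, 5, "Gene"), (10, 17, "Chemical"), (20, 28, "Species"), (30, 34, "Gene")}
without_pretraining = {(0, 5, "Gene"), (10, 17, "Chemical"), (40, 44, "Gene")}
with_pretraining = {(0, 5, "Gene"), (10, 17, "Chemical"), (20, 28, "Species"), (40, 44, "Gene")}

if __name__ == "__main__":
    _, _, f1_a = prf1(gold, without_pretraining)
    _, _, f1_b = prf1(gold, with_pretraining)
    print(f"F1 gain: {f1_gain_pp(f1_a, f1_b):.2f} pp")
```

In this toy case the second system recovers one more gold mention, which raises recall and precision together; a system that added mentions indiscriminately would instead trade precision for recall.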

Entities: Chemical

Year: 2020; PMID: 31243432; DOI: 10.1093/bioinformatics/btz528

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

Cited by (6 in total):

1. Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.

2. Assigning species information to corresponding genes by a sequence labeling framework.

3. Parallel sequence tagging for concept recognition.

4. A pre-training and self-training approach for biomedical named entity recognition.

5. The Construction Model of the TCM Clinical Knowledge Coding Database Based on Knowledge Organization.

6. PEDL: extracting protein-protein associations using deep language models and distant supervision.