Adversarial active learning for the identification of medical concepts and annotation inconsistency.

Gang Yu¹, Yiwen Yang², Xuying Wang³, Huachun Zhen⁴, Guoping He⁵, Zheming Li⁶, Yonggen Zhao⁷, Qiang Shu⁸, Liqi Shu⁹.

Abstract

OBJECTIVE: Named entity recognition (NER) is a principal task in the biomedical field, and deep learning-based algorithms have been widely applied to biomedical NER. However, existing approaches face three challenges. (1) Methods applied to biomedical corpora use only annotated samples to maximize performance, so large numbers of unannotated samples are discarded and their value is overlooked. (2) Compared with other types of active learning (AL) algorithms, generative adversarial network (GAN)-based AL methods have developed slowly; furthermore, current diversity-based AL methods only compute similarities between pairs of sentences and cannot evaluate distribution similarities between groups of sentences. (3) Annotation inconsistency is one of the significant challenges in biomedical annotation, and most existing methods for addressing it are statistics-based or rule-based, requiring substantial expert knowledge and complex designs. To address challenges (1), (2), and (3) simultaneously, we propose GAN-based algorithms.
METHODS: We introduce GAN and propose the GAN-bidirectional long short-term memory-conditional random field (GAN-BiLSTM-CRF) and the GAN-bidirectional encoder representations from transformers-conditional random field (GAN-BERT-CRF) models, each of which can serve as an NER model, an AL model, and a model for identifying erroneous labels. BiLSTM-CRF or BERT-CRF serves as the generator, and a convolutional neural network (CNN)-based network serves as the discriminator. (1) The generator employs unannotated samples in addition to annotated samples to maximize NER performance. (2) The outputs of the CRF layer and the discriminator are used to select unlabeled samples for the AL task. (3) The discriminator distinguishes the distribution of erroneous labels from that of correct labels, identifies erroneous labels, and thereby addresses the annotation inconsistency challenge.
RESULTS: The corpus from the 2010 i2b2/VA NLP challenge and the Chinese CCKS-2017 Task 2 dataset are used for the experiments. Compared with the baseline BiLSTM-CRF and BERT-CRF models, the GAN-BiLSTM-CRF and GAN-BERT-CRF models achieved significant improvements in precision, recall, and F1 score for NER. Learning curves from the AL experiments show the comparative results of the proposed models. Furthermore, the trained discriminator can identify samples with incorrect medical labels in both simulated and real-world experimental environments.
CONCLUSION: Introducing GAN yields significant results in NER, active learning, and the identification of incorrectly annotated samples. The benefits of GAN will be studied further.
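
ILLUSTRATION (not part of the published abstract): The abstract states that the outputs of the CRF layer and the discriminator are used to select unlabeled samples for AL, and that the discriminator identifies erroneous labels, but it does not give the scoring rule. The Python sketch below is one possible reading under stated assumptions, not the authors' method. The function names, the mixing weight `alpha`, the `threshold`, and the interpretation of the two inputs as probabilities in [0, 1] are all hypothetical.

```python
from typing import List, Sequence


def combined_scores(
    crf_confidence: Sequence[float],
    disc_score: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Blend model uncertainty with dissimilarity to the annotated pool.

    crf_confidence[i]: assumed probability of the best label sequence for
                       unlabeled sentence i (e.g. from a CRF layer).
    disc_score[i]:     assumed discriminator probability that sentence i
                       resembles correctly annotated data.
    alpha:             hypothetical mixing weight, not taken from the paper.
    """
    if len(crf_confidence) != len(disc_score):
        raise ValueError("inputs must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Higher score = less confident tagging and less like the annotated pool.
    return [
        alpha * (1.0 - c) + (1.0 - alpha) * (1.0 - d)
        for c, d in zip(crf_confidence, disc_score)
    ]


def select_for_annotation(
    crf_confidence: Sequence[float],
    disc_score: Sequence[float],
    k: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return indices of the k unlabeled sentences with the highest score."""
    scores = combined_scores(crf_confidence, disc_score, alpha)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def flag_suspect_labels(
    disc_score: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Return indices of annotated samples the discriminator scores below
    an assumed threshold, as candidates for manual re-checking."""
    return [i for i, s in enumerate(disc_score) if s < threshold]


if __name__ == "__main__":
    # Made-up numbers for three unlabeled sentences.
    crf = [0.9, 0.4, 0.7]
    disc = [0.8, 0.3, 0.9]
    print(select_for_annotation(crf, disc, k=2))
    print(flag_suspect_labels([0.9, 0.2, 0.6]))
```

In this sketch a sentence is prioritized for annotation when the tagger is unsure about it and the discriminator finds it unlike the annotated data; any real system would need to calibrate both signals and the weight between them.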

Entities: Chemical

Keywords: Active learning; Annotation inconsistency; Clinical natural language processing; Generative adversarial nets; Named entity recognition

Year: 2020 PMID: 32687985 DOI: 10.1016/j.jbi.2020.103481

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

Cited

2 in total

1. Data governance system of the National Clinical Research Center for Child Health in China.

Authors: Jing Li; Gang Yu; Wen Ding; Jian Huang; Zheming Li; Zhu Zhu; Dejian Wang; Jie Zhang; Jing Wang; Jianwei Yin
Journal: Transl Pediatr Date: 2021-07

2. Natural language processing in clinical neuroscience and psychiatry: A review.

Authors: Claudio Crema; Giuseppe Attardi; Daniele Sartiano; Alberto Redolfi
Journal: Front Psychiatry Date: 2022-09-14 Impact factor: 5.435