Ilseyar Alimova, Elena Tutubalina.
Abstract
Relation extraction aims to discover relational facts about entity mentions in plain text. In this work, we focus on clinical relation extraction: given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding features. We systematically investigate the impact of these features alongside standard distance- and word-based features, conducting experiments on two benchmark datasets of clinical texts from the MADE 2018 and n2c2 2018 shared tasks. For comparison with the feature-based model, we use state-of-the-art models and three BERT-based models, including BioBERT and Clinical BERT. Our results demonstrate that distance and word features provide significant benefits to the classifier. Knowledge-based features improve classification results only for particular types of relations. Among the explored features, the sentence embedding feature provides the largest improvement on the MADE corpus. The classifier obtains state-of-the-art performance in clinical relation extraction with an F-measure of 92.6%, improving F-measure by 3.5% on the MADE corpus.
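To make the notion of distance- and word-based features concrete, the following is a minimal illustrative sketch in Python. It is not the authors' feature set: the function name `pair_features`, the feature names, and the example sentence are all hypothetical, and the knowledge-based and BioSentVec features mentioned in the abstract are not shown.

```python
from typing import Dict, List, Tuple

# Token indices [start, end) of an entity mention (hypothetical representation).
Span = Tuple[int, int]


def pair_features(tokens: List[str], drug: Span, attr: Span) -> Dict[str, float]:
    """Build simple distance- and word-based features for one (drug, attribute) pair."""
    # Order the two mentions by position so the slice between them is well defined.
    first, second = sorted([drug, attr])
    between = tokens[first[1]:second[0]]  # tokens strictly between the mentions

    feats: Dict[str, float] = {
        # Distance feature: number of tokens separating the two mentions.
        "token_distance": float(len(between)),
        # Order feature: 1.0 if the drug mention comes first in the text.
        "drug_first": 1.0 if drug <= attr else 0.0,
    }
    # Word features: indicator for each token between and inside the mentions.
    for tok in between:
        feats["between=" + tok.lower()] = 1.0
    for tok in tokens[drug[0]:drug[1]]:
        feats["drug=" + tok.lower()] = 1.0
    for tok in tokens[attr[0]:attr[1]]:
        feats["attr=" + tok.lower()] = 1.0
    return feats


# Invented example sentence, not taken from the MADE or n2c2 corpora.
tokens = "Started warfarin at a dose of 5 mg daily".split()
feats = pair_features(tokens, drug=(1, 2), attr=(6, 8))
```

A sparse dictionary like `feats` could then be vectorized and passed to any standard classifier; the abstract does not specify which classifier the authors used.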
Keywords: Clinical data; Electronic health records; Features; MADE corpus; Machine learning; Natural language processing; Relation extraction; n2c2 corpus
Year: 2020; PMID: 32028051; DOI: 10.1016/j.jbi.2020.103382
Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317