Task definition, annotated dataset, and supervised natural language processing models for symptom extraction from unstructured clinical notes.

Jackson M Steinkamp, Wasif Bala, Abhinav Sharma, Jacob J Kantrowitz.

Abstract

INTRODUCTION: Machine learning (ML) and natural language processing have great potential to improve information extraction (IE) within electronic medical records (EMRs) for a wide variety of clinical search and summarization tools. Despite ML advancements, clinical adoption of real-time IE tools for patient care remains low. Clinically motivated IE task definitions, publicly available annotated clinical datasets, and inclusion of subtasks such as coreference resolution and named entity normalization are critical for the development of useful clinical tools.
MATERIALS AND METHODS: We provide a task definition and comprehensive annotation requirements for a clinically motivated symptom extraction task. Four annotators labeled symptom mentions within 1108 discharge summaries from two public clinical note datasets for the tasks of named entity recognition, coreference resolution, and named entity normalization; these annotations will be released to the public. Baseline human performance was assessed and two ML models were evaluated on the symptom extraction task.
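The three annotation layers produce three different counts: raw mentions, symptom instances after coreference resolution, and unique normalized forms. The sketch below illustrates how one record can carry all three layers. The `SymptomMention` schema, its field names, the sample note, and the choice of "dyspnea" as the normalized form are all hypothetical; they are not the released dataset's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymptomMention:
    """One annotated symptom mention (hypothetical schema, not the released format)."""
    doc_id: str
    start: int        # character offset where the mention begins
    end: int          # character offset one past the mention's last character
    text: str         # surface form exactly as written in the note (NER layer)
    cluster_id: int   # mentions of the same symptom instance share an id (coreference layer)
    normalized: str   # canonical answer form (normalization layer)


def summarize(mentions):
    """Count mentions, coreference-resolved instances, and unique normalized forms."""
    n_mentions = len(mentions)
    # A symptom instance is one coreference chain within one document.
    n_instances = len({(m.doc_id, m.cluster_id) for m in mentions})
    n_unique_forms = len({m.normalized for m in mentions})
    return n_mentions, n_instances, n_unique_forms


# Hypothetical note: "SOB" and "Shortness of breath" refer to the same symptom.
note = "Pt reports SOB. Shortness of breath worse at night. Also c/o nausea."
mentions = [
    SymptomMention("d1", 11, 14, "SOB", 0, "dyspnea"),
    SymptomMention("d1", 16, 35, "Shortness of breath", 0, "dyspnea"),
    SymptomMention("d1", 61, 67, "nausea", 1, "nausea"),
]

# Sanity check: every span's offsets must reproduce its surface text.
for m in mentions:
    assert note[m.start:m.end] == m.text

print(summarize(mentions))
```

Here three mentions collapse to two instances and two normalized forms, which mirrors (at toy scale) how the corpus-level counts shrink at each annotation layer.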
RESULTS: 16,922 symptom mentions were identified within the discharge summaries, with 11,944 symptom instances after coreference resolution and 1255 unique normalized answer forms. Human annotator performance averaged 92.2% F1. Recurrent network model performance was 85.6% F1 (recall 85.8%, precision 85.4%), and Transformer-based model performance was 86.3% F1 (recall 86.6%, precision 86.1%). Our models extracted symptoms that were described vaguely, written as acronyms, misspelled, or mentioned in grouping statements. The models generalized effectively to a separate clinical note corpus and can run in real time.
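F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below assumes exact span matching over `(doc_id, start, end)` tuples. The abstract does not state the paper's matching criterion, so this is an illustration rather than the authors' evaluation script. The sketch also recomputes F1 from the reported precision and recall values as a consistency check.

```python
def prf(gold, predicted):
    """Exact-match precision, recall, and F1 over sets of (doc_id, start, end) spans."""
    tp = len(gold & predicted)  # spans predicted with exactly matching boundaries
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy example: one gold span is missed and one spurious span is predicted.
gold = {("d1", 11, 14), ("d1", 16, 35), ("d1", 61, 67)}
predicted = {("d1", 11, 14), ("d1", 61, 67), ("d1", 52, 55)}
print(prf(gold, predicted))  # precision, recall, and F1 are each 2/3

# Recompute F1 (in %) from the precision and recall reported above.
print(round(100 * f1_from(0.854, 0.858), 1))  # recurrent model
print(round(100 * f1_from(0.861, 0.866), 1))  # Transformer-based model
```

Because the reported precision and recall are already rounded, recomputed F1 values can differ from the published ones in the last digit. Here they agree with the reported 85.6% and 86.3%.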
CONCLUSION: To our knowledge, this dataset will be the largest and most comprehensive publicly released, annotated dataset for clinically motivated symptom extraction, as it includes annotations for named entity recognition, coreference, and normalization for more than 1000 clinical documents. Our neural network models extracted symptoms from unstructured clinical free text at near-human performance in real time. In this paper, we present a clinically motivated task definition, dataset, and simple supervised natural language processing models to demonstrate the feasibility of building clinically applicable information extraction tools.
Copyright © 2019 Elsevier Inc. All rights reserved.

Keywords:  Electronic medical record; Information extraction; Machine learning; Natural language processing

Year:  2019        PMID: 31838210     DOI: 10.1016/j.jbi.2019.103354

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317
