A study of active learning methods for named entity recognition in clinical text.

Yukun Chen, Thomas A Lasko, Qiaozhu Mei, Joshua C Denny, Hua Xu.

Abstract

OBJECTIVES: Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks in building clinical natural language processing (NLP) systems. Machine learning (ML)-based approaches can achieve good performance, but they often require large numbers of annotated samples, which are expensive to build because annotation requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying medical problems, treatments, and lab tests in clinical notes.
METHODS: Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. These were compared with passive learning, which uses random sampling. To evaluate the active and passive learning methods, we generated learning curves that plot the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) and computed the area under the learning curve (ALC) score.
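The two ingredients named in the methods, an uncertainty-based selection step and the ALC score, can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not say which uncertainty measure or ALC normalization was used, so the least-confidence criterion, the division by the cost range, and the names `least_confidence_batch` and `area_under_learning_curve` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def least_confidence_batch(best_path_prob: Dict[str, float],
                           batch_size: int) -> List[str]:
    """Return the IDs of the sentences the model is least sure about.

    best_path_prob maps a sentence ID to the model's probability for its
    most likely label sequence (assumed input, e.g. from a sequence tagger).
    A low probability means high uncertainty, so we sort ascending.
    """
    ranked = sorted(best_path_prob, key=lambda sid: best_path_prob[sid])
    return ranked[:batch_size]


def area_under_learning_curve(cost: Sequence[float],
                              f_measure: Sequence[float]) -> float:
    """Trapezoidal area under an F-measure vs. annotation-cost curve.

    The area is divided by the cost range so the score stays on the
    F-measure scale (an assumed normalization, not taken from the paper).
    """
    if len(cost) != len(f_measure) or len(cost) < 2:
        raise ValueError("need two or more matching (cost, F-measure) points")
    area = 0.0
    for i in range(1, len(cost)):
        width = cost[i] - cost[i - 1]
        area += width * (f_measure[i] + f_measure[i - 1]) / 2.0
    return area / (cost[-1] - cost[0])


# Made-up values for illustration only.
pool = {"s1": 0.9, "s2": 0.4, "s3": 0.7}
next_to_annotate = least_confidence_batch(pool, batch_size=2)
alc = area_under_learning_curve([0, 100, 200], [0.5, 0.7, 0.8])
```

In a simulated experiment, the selected sentences would be moved from the unlabeled pool to the training set, the model retrained, and one more point added to the learning curve.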
RESULTS: On the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best uncertainty sampling method required 66% fewer annotated sentences than random sampling. On the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve an F-measure of 0.80, the best uncertainty-based method required 42% fewer annotated words than random sampling, whereas the best diversity-based method reduced annotation effort by only 7%.
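A saving such as "66% fewer annotated sentences at an F-measure of 0.80" can be read off two learning curves by finding the annotation cost at which each curve first reaches the target. The sketch below assumes linear interpolation between measured points, which the abstract does not specify; the function names and all curve values are hypothetical and are not the study's data.

```python
from typing import Optional, Sequence


def cost_to_reach(cost: Sequence[float], f_measure: Sequence[float],
                  target: float) -> Optional[float]:
    """Annotation cost at which the curve first reaches the target F-measure.

    Linearly interpolates between the last point below the target and the
    first point at or above it. Returns None if the target is never reached.
    """
    for i in range(len(cost)):
        if f_measure[i] >= target:
            if i == 0:
                return float(cost[0])
            c0, c1 = cost[i - 1], cost[i]
            f0, f1 = f_measure[i - 1], f_measure[i]
            # f1 > f0 here because f0 is below the target and f1 is not.
            return c0 + (target - f0) * (c1 - c0) / (f1 - f0)
    return None


def relative_saving(active_cost: float, passive_cost: float) -> float:
    """Fraction of annotation saved by active learning vs. random sampling."""
    return 1.0 - active_cost / passive_cost


# Hypothetical curves: (number of annotated sentences, F-measure).
random_cost = cost_to_reach([0, 1000, 2000, 3000], [0.50, 0.70, 0.78, 0.82], 0.80)
active_cost = cost_to_reach([0, 500, 1000], [0.50, 0.75, 0.85], 0.80)
saving = relative_saving(active_cost, random_cost)
```

The same calculation applies when cost is counted in words instead of sentences; only the x-axis values change.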
CONCLUSION: In the simulated setting, AL methods, particularly those based on uncertainty sampling, appeared to reduce annotation cost significantly for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting.
Copyright © 2015 Elsevier Inc. All rights reserved.

Keywords:  Active learning; Clinical named entity recognition; Clinical natural language processing; Machine learning

Year:  2015        PMID: 26385377      PMCID: PMC4934373          DOI: 10.1016/j.jbi.2015.09.010

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


