A study of active learning methods for named entity recognition in clinical text.

Yukun Chen¹, Thomas A Lasko¹, Qiaozhu Mei², Joshua C Denny³, Hua Xu⁴.

Abstract

OBJECTIVES: Named entity recognition (NER), a sequence labeling task, is fundamental to building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because annotation requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying medical problems, treatments, and lab tests in clinical notes.
METHODS: Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. To evaluate the active and passive learning methods, we generated learning curves that plot NER model performance against estimated annotation cost (the number of sentences or words in the training set) and computed the area under the learning curve (ALC) score.
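
The following is a minimal sketch of two ideas named in the methods, not the study's implementation. It assumes least-confidence scoring as the uncertainty measure (the abstract does not say which uncertainty algorithms were used) and computes ALC with the trapezoidal rule, normalized by the cost range (the normalization is also an assumption). The function names and the toy sentence IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def least_confidence(best_sequence_prob: float) -> float:
    """Uncertainty of a sentence: 1 minus the model's probability
    for its most likely label sequence (an assumed measure)."""
    return 1.0 - best_sequence_prob


def select_batch(pool_probs: Dict[str, float], batch_size: int) -> List[str]:
    """Return the IDs of the most uncertain sentences in the unlabeled pool.

    pool_probs maps a (hypothetical) sentence ID to the probability the
    current NER model assigns to its best label sequence.
    """
    ranked = sorted(
        pool_probs,
        key=lambda sid: (-least_confidence(pool_probs[sid]), sid),
    )
    return ranked[:batch_size]


def area_under_learning_curve(
    costs: Sequence[float], scores: Sequence[float]
) -> float:
    """Trapezoidal area under a learning curve, divided by the cost range.

    costs are annotation costs (sentences or words) in increasing order;
    scores are the matching F-measures.
    """
    if len(costs) != len(scores) or len(costs) < 2:
        raise ValueError("need at least two (cost, score) points of equal length")
    area = 0.0
    for i in range(1, len(costs)):
        width = costs[i] - costs[i - 1]
        area += width * (scores[i] + scores[i - 1]) / 2.0
    return area / (costs[-1] - costs[0])


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    pool = {"s1": 0.95, "s2": 0.40, "s3": 0.70}
    print(select_batch(pool, batch_size=2))
    print(area_under_learning_curve([0, 100, 200], [0.5, 0.7, 0.8]))
```

In a simulated AL experiment, the selected sentences would be moved from the pool into the training set with their existing gold labels, the model retrained, and a new point added to the learning curve.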
RESULTS: Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best uncertainty sampling method required 66% fewer annotated sentences than random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve an F-measure of 0.80, the best uncertainty-based method required 42% fewer annotated words than random sampling, whereas the best diversity-based method reduced annotation effort by only 7%.
CONCLUSION: In the simulated setting, AL methods, particularly uncertainty-sampling-based approaches, appeared to significantly reduce annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting.
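
The annotation savings reported in the results compare how much annotation each method needs to reach a target F-measure. One way such a figure could be derived from two learning curves is sketched below, assuming linear interpolation between measured points; the abstract does not state how the crossing point was determined. The function names are hypothetical and the curve values are invented for illustration, not taken from the study.

```python
from typing import Optional, Sequence


def cost_to_reach(
    costs: Sequence[float], scores: Sequence[float], target: float
) -> Optional[float]:
    """Annotation cost at which the curve first reaches `target`.

    Interpolates linearly between the two points that bracket the target.
    Returns None if the curve never reaches it.
    """
    for i, score in enumerate(scores):
        if score >= target:
            if i == 0:
                return float(costs[0])
            c0, c1 = costs[i - 1], costs[i]
            s0, s1 = scores[i - 1], score
            # s0 < target <= s1 here, so s1 - s0 is strictly positive.
            return c0 + (target - s0) * (c1 - c0) / (s1 - s0)
    return None


def annotation_savings(al_cost: float, random_cost: float) -> float:
    """Fraction of annotation saved by AL relative to random sampling."""
    return 1.0 - al_cost / random_cost


if __name__ == "__main__":
    # Invented learning curves: cost in sentences, score in F-measure.
    costs = [0, 1000, 2000]
    random_f = [0.60, 0.70, 0.80]
    active_f = [0.60, 0.80, 0.85]

    random_cost = cost_to_reach(costs, random_f, target=0.80)
    active_cost = cost_to_reach(costs, active_f, target=0.80)
    if random_cost is not None and active_cost is not None:
        print(f"savings: {annotation_savings(active_cost, random_cost):.0%}")
```

Because cost can be counted in sentences or in words, the same computation can give different savings for the same method, which is consistent with the differing sentence-based and word-based figures in the results.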

Keywords: Active learning; Clinical named entity recognition; Clinical natural language processing; Machine learning

Year: 2015 PMID: 26385377 PMCID: PMC4934373 DOI: 10.1016/j.jbi.2015.09.010

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
