Applying active learning to supervised word sense disambiguation in MEDLINE.

Yukun Chen, Hongxin Cao, Qiaozhu Mei, Kai Zheng, Hua Xu.

Abstract

OBJECTIVES: This study assessed whether active learning strategies can be integrated with supervised word sense disambiguation (WSD) methods to reduce the number of annotated samples required while maintaining or improving the quality of the disambiguation models.
METHODS: We developed support vector machine (SVM) classifiers to disambiguate 197 ambiguous terms and abbreviations in the MSH WSD collection. Three different uncertainty sampling-based active learning algorithms were implemented with the SVM classifiers and compared with a passive learner (PL) based on random sampling. For each ambiguous term and each learning algorithm, we generated a learning curve that plots test-set accuracy as a function of the number of annotated samples used to train the model. The area under the learning curve (ALC) was used as the primary evaluation metric.
RESULTS: Our experiments demonstrated that active learners (ALs) significantly outperformed the PL, showing better performance for 177 out of 197 (89.8%) WSD tasks. Further analysis showed that to achieve an average accuracy of 90%, the PL needed 38 annotated samples, while the ALs needed only 24, a 37% reduction in annotation effort. Moreover, we analyzed cases where active learning algorithms did not achieve superior performance and identified three causes: (1) poor models in the early learning stage; (2) easy WSD cases; and (3) difficult WSD cases, which provide useful insight for future improvements.
CONCLUSIONS: This study demonstrated that integrating active learning strategies with supervised WSD methods could effectively reduce annotation cost and improve the disambiguation models.
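
The selection step and the evaluation metric described in METHODS can be illustrated with a minimal sketch. The abstract does not name the three uncertainty sampling algorithms, so the three scores below (least confidence, margin, entropy) are common textbook choices used here as assumptions, not the authors' methods. All function names, the probability inputs, and the normalization of the ALC by the x-range are likewise hypothetical, and the example numbers are made up rather than taken from the study.

```python
import math
import random


def least_confidence(probs):
    """Uncertainty = 1 - probability of the most likely sense."""
    return 1.0 - max(probs)


def margin(probs):
    """Uncertainty grows as the top two sense probabilities get closer."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def entropy(probs):
    """Shannon entropy of the predicted sense distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_active(pool_probs, score=least_confidence):
    """Active learner: index of the unlabeled sample the model is least sure about."""
    return max(range(len(pool_probs)), key=lambda i: score(pool_probs[i]))


def select_passive(pool_probs, rng=random):
    """Passive learner: pick an unlabeled sample uniformly at random."""
    return rng.randrange(len(pool_probs))


def area_under_learning_curve(n_samples, accuracies):
    """Trapezoidal area under accuracy vs. number of annotated samples.

    Assumption: the area is divided by the x-range, so a curve with
    constant accuracy a has ALC = a.
    """
    area = 0.0
    for i in range(1, len(n_samples)):
        width = n_samples[i] - n_samples[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area / (n_samples[-1] - n_samples[0])


# Hypothetical predicted sense probabilities for three unlabeled instances
# (in practice these would come from the current classifier).
pool = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
print(select_active(pool))  # index of the most uncertain instance

# Hypothetical learning curves (illustrative numbers only).
sizes = [10, 20, 30, 40]
print(area_under_learning_curve(sizes, [0.70, 0.85, 0.90, 0.92]))
print(area_under_learning_curve(sizes, [0.70, 0.78, 0.85, 0.90]))
```

In a full loop, the selected instance would be annotated, added to the training set, and the classifier retrained before the next selection. Comparing the ALC of the active and passive curves then summarizes which strategy reaches a given accuracy with fewer annotations.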

Keywords:  Active Learning; Annotation; Machine Learning; Natural Language Processing; Uncertainty Sampling; Word Sense Disambiguation

Year: 2013; PMID: 23364851; PMCID: PMC3756255; DOI: 10.1136/amiajnl-2012-001244

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497


