Active learning for clinical text classification: is it better than random sampling?

Rosa L Figueroa, Qing Zeng-Treitler, Long H Ngo, Sergey Goryachev, Eduardo P Wiechmann.

Abstract

OBJECTIVE: This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks.
DESIGN: Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results.
MEASUREMENTS: Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences.
RESULTS: The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm.
CONCLUSION: For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.
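
The abstract does not spell out how DIST selects samples or how relative entropy was applied, so the following is only a minimal sketch under stated assumptions. It assumes that distance-based selection means choosing the unlabeled samples nearest to a linear decision boundary w·x + b = 0, and that relative entropy means the Kullback-Leibler divergence between two discrete distributions. The function names, the toy pool, and the weight vector are hypothetical and do not come from the paper.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Unsigned distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_by_distance(pool, w, b, k):
    """Indices of the k unlabeled samples closest to the boundary.

    Assumption: "distance-based" selection queries the samples the
    current linear classifier is least certain about.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: distance_to_hyperplane(pool[i], w, b),
    )
    return ranked[:k]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes p and q are discrete distributions of equal length and
    that q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical unlabeled pool of 2-D feature vectors and a toy boundary x1 = 0.
pool = [(2.0, 0.0), (0.1, 5.0), (-0.5, 1.0), (3.0, 3.0)]
w, b = (1.0, 0.0), 0.0

# The two samples nearest the boundary would be sent for annotation next.
to_label = select_by_distance(pool, w, b, k=2)
```

In a passive learning baseline, the same number of samples would instead be drawn at random from the pool; the comparison reported in the abstract is between these two ways of growing the training set.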

Year:  2012        PMID: 22707743      PMCID: PMC3422824          DOI: 10.1136/amiajnl-2011-000648

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


