Deep active learning for classifying cancer pathology reports.

Kevin De Angeli, Shang Gao, Mohammed Alawad, Hong-Jun Yoon, Noah Schaefferkoetter, Xiao-Cheng Wu, Eric B Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Lynne Penberthy, Georgia Tourassi.

Abstract

BACKGROUND: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model.
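The pool-based cycle that active learning relies on can be sketched in Python. The cycle is: train on the labelled set, score the unlabelled pool, send a batch to human annotators, then retrain. This is a minimal illustration under stated assumptions, not the authors' pipeline:

- `fit_predict` stands in for training the text classifier (a CNN in this study) and predicting class probabilities on the unlabelled pool.
- `query` stands in for a selection strategy.
- Both callables, like the function name `active_learning_loop`, are hypothetical.

Passing a `query` that picks positions at random gives the random-sampling (no active learning) baseline.

```python
from typing import Callable, Sequence

import numpy as np


def active_learning_loop(
    n_pool: int,
    initial_idx: Sequence[int],
    batch_size: int,
    n_iterations: int,
    fit_predict: Callable[[np.ndarray, np.ndarray], np.ndarray],
    query: Callable[[np.ndarray, int], np.ndarray],
) -> list[np.ndarray]:
    """Simulate pool-based batch active learning (hypothetical helper).

    fit_predict(labelled_idx, unlabelled_idx) is assumed to train a model on the
    labelled samples and return class probabilities for the unlabelled ones,
    shape (len(unlabelled_idx), n_classes).
    query(probs, k) is assumed to return k positions into unlabelled_idx.
    Returns the labelled index set after each iteration (including the start).
    """
    labelled = np.unique(np.asarray(initial_idx, dtype=int))
    history = [labelled.copy()]
    for _ in range(n_iterations):
        unlabelled = np.setdiff1d(np.arange(n_pool), labelled)
        if unlabelled.size == 0:
            break  # the whole pool has already been annotated
        k = min(batch_size, unlabelled.size)
        probs = fit_predict(labelled, unlabelled)
        picked = unlabelled[np.asarray(query(probs, k), dtype=int)]
        # In practice, human annotators would label `picked` at this point.
        labelled = np.union1d(labelled, picked)
        history.append(labelled.copy())
    return history


# Usage: random sampling as the query strategy, with a dummy model that
# returns uniform probabilities over 3 classes.
rng = np.random.default_rng(0)
history = active_learning_loop(
    n_pool=10,
    initial_idx=[0, 1],
    batch_size=3,
    n_iterations=5,
    fit_predict=lambda lab, unlab: np.full((unlab.size, 3), 1 / 3),
    query=lambda probs, k: rng.choice(len(probs), size=k, replace=False),
)
print([h.size for h in history])  # [2, 5, 8, 10]
```

The labelled set grows by `batch_size` per iteration, which mirrors the "initial labelled samples plus a fixed number added each iteration" schedule. The loop stops early once the pool is exhausted.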
RESULTS: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that, compared to random sampling, active learning substantially improves performance on rare classes because it selects more samples from underrepresented classes.
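Marginal and ratio uncertainty sampling both score an unlabelled report from the model's two highest predicted class probabilities. The sketch below uses common textbook definitions, which is an assumption: the paper's exact formulas may differ.

- Margin uncertainty is 1 − (p1 − p2).
- Ratio uncertainty is p2 / p1.

Here p1 and p2 are the largest and second-largest predicted probabilities. Higher scores mean the model is closer to a tie between two classes. The k highest-scoring reports form the next batch for annotation. The function names are hypothetical.

```python
import numpy as np


def top_two(probs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Largest (p1) and second-largest (p2) class probability per row."""
    s = np.sort(probs, axis=1)  # ascending, so the last two columns are the top two
    return s[:, -1], s[:, -2]


def margin_uncertainty(probs: np.ndarray) -> np.ndarray:
    """1 - (p1 - p2): close to 1 when the top two classes are nearly tied."""
    p1, p2 = top_two(probs)
    return 1.0 - (p1 - p2)


def ratio_uncertainty(probs: np.ndarray) -> np.ndarray:
    """p2 / p1: in [0, 1] for probability vectors, close to 1 when nearly tied."""
    p1, p2 = top_two(probs)
    return p2 / p1


def select_batch(probs: np.ndarray, k: int, score_fn) -> np.ndarray:
    """Indices of the k most uncertain samples, most uncertain first."""
    scores = score_fn(probs)
    return np.argsort(-scores, kind="stable")[:k]


# Usage: predicted probabilities for 4 unlabelled reports over 3 classes.
probs = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.50, 0.49, 0.01],  # nearly tied between classes 0 and 1
])
print(select_batch(probs, 2, margin_uncertainty))  # [3 1]
print(select_batch(probs, 2, ratio_uncertainty))   # [3 1]
```

The two criteria agree here but need not in general, particularly with many classes. Compare a report whose top two probabilities are (0.25, 0.10) with one whose top two are (0.60, 0.30):

- The first report has the higher margin score (0.85 versus 0.70).
- The first report has the lower ratio score (0.40 versus 0.50).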
CONCLUSIONS: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.

Keywords:  Active learning; Cancer pathology reports; Convolutional neural networks; Deep learning; Text classification

Year:  2021        PMID: 33750288      PMCID: PMC7941989          DOI: 10.1186/s12859-021-04047-1

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


