
Class imbalance in out-of-distribution datasets: Improving the robustness of the TextCNN for the classification of rare cancer types.

Kevin De Angeli, Shang Gao, Ioana Danciu, Eric B Durbin, Xiao-Cheng Wu, Antoinette Stroup, Jennifer Doherty, Stephen Schwartz, Charles Wiggins, Mark Damesyn, Linda Coyle, Lynne Penberthy, Georgia D Tourassi, Hong-Jun Yoon.

Abstract

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
Copyright © 2021. Published by Elsevier Inc.
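
The abstract contrasts macro F1 (where rare cancer types weigh as much as common ones) with micro F1 (dominated by the most frequent classes). The sketch below is an illustration of that distinction only, not the authors' evaluation code: the helper `f1_scores`, the labels "common" and "rare", and the toy counts are all assumptions made up for this example.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1, per_class_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p gains a false positive
            fn[t] += 1          # true class t gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    # Macro: unweighted mean of per-class F1, so every class counts equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro, per_class

# Hypothetical toy data: 18 reports of a common type, 2 of a rare type.
y_true = ["common"] * 18 + ["rare"] * 2
# A hypothetical classifier that never predicts the rare class.
y_pred = ["common"] * 20

macro, micro, per_class = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
# prints: macro F1 = 0.474, micro F1 = 0.900
```

In this toy case the classifier misses every rare report yet still reaches a micro F1 of 0.900, while macro F1 falls to 0.474. This is why a method can lead on one metric and trail on the other, as the abstract reports.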

Keywords:  CNN; Class Imbalance; Deep Learning; Ensemble; NLP; Text Classification

Year:  2021        PMID: 34823030      PMCID: PMC9274264          DOI: 10.1016/j.jbi.2021.103957

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   8.000


