Class imbalance in out-of-distribution datasets: Improving the robustness of the TextCNN for the classification of rare cancer types.

Kevin De Angeli¹, Shang Gao², Ioana Danciu³, Eric B Durbin⁴, Xiao-Cheng Wu⁵, Antoinette Stroup⁶, Jennifer Doherty⁷, Stephen Schwartz⁸, Charles Wiggins⁹, Mark Damesyn¹⁰, Linda Coyle¹¹, Lynne Penberthy¹², Georgia D Tourassi², Hong-Jun Yoon².

Abstract

In the last decade, the widespread adoption of electronic health record documentation has created substantial opportunities for information mining. Natural language processing (NLP) techniques based on machine and deep learning are increasingly used to extract information from unstructured clinical notes. Performance disparities that appear when machine learning models are deployed in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification (TextCNN) on out-of-distribution (OOD) datasets that result from the natural evolution of clinical text in pathology reports. We identified class imbalance, arising from the differing prevalence of cancer types, as one source of the performance drop, and we analyzed how existing methods for addressing class imbalance behave when models are deployed in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better on the most prevalent classes, leading to higher micro F1 scores. Based on these findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
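The abstract contrasts macro and micro F1 on an imbalanced label set. As a minimal sketch (the class names and counts below are hypothetical toy data, not results from the study), the Python below computes both metrics from scratch for single-label multi-class predictions. Micro F1 pools true positives, false positives, and false negatives across all classes before computing F1, so the frequent classes dominate it (in this single-label setting it equals accuracy). Macro F1 computes F1 per class and averages with equal weight, so a single missed rare class pulls it down sharply.

```python
from collections import Counter


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative
    # Include every class seen in either list, even if never predicted correctly.
    labels = sorted(set(y_true) | set(y_pred))
    # Micro: pool counts over all classes, then compute a single F1.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then average with equal weight per class.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical imbalanced test set: 90 "common", 8 "uncommon", 2 "rare" reports.
y_true = ["common"] * 90 + ["uncommon"] * 8 + ["rare"] * 2

# Model A: perfect on the frequent classes, but labels both rare reports "common".
pred_a = ["common"] * 90 + ["uncommon"] * 8 + ["common"] * 2

# Model B: catches the rare class, but mislabels 4 "common" reports as "uncommon".
pred_b = ["common"] * 86 + ["uncommon"] * 4 + ["uncommon"] * 8 + ["rare"] * 2

for name, pred in [("A", pred_a), ("B", pred_b)]:
    micro, macro = micro_macro_f1(y_true, pred)
    print(name, round(micro, 3), round(macro, 3))
# → A 0.98 0.663
# → B 0.96 0.926
```

With these toy counts, model A wins on micro F1 while model B wins on macro F1, which illustrates the kind of trade-off the abstract describes between models that favor the most prevalent classes and models that favor rare ones.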

Keywords: CNN; Class Imbalance; Deep Learning; Ensemble; NLP; Text Classification

Year: 2021 PMID: 34823030 PMCID: PMC9274264 DOI: 10.1016/j.jbi.2021.103957

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
