Hong-Jun Yoon (1), Hilda B Klasky (2), John P Gounley (3), Mohammed Alawad (4), Shang Gao (5), Eric B Durbin (6), Xiao-Cheng Wu (7), Antoinette Stroup (8), Jennifer Doherty (9), Linda Coyle (10), Lynne Penberthy (11), J Blair Christian (12), Georgia D Tourassi (13).
1. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: yoonh@ornl.gov.
2. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: klaskyhb@ornl.gov.
3. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: gounleyjp@ornl.gov.
4. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: alawadmm@ornl.gov.
5. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: gaos@ornl.gov.
6. College of Medicine, University of Kentucky, Lexington, KY 40536, United States of America. Electronic address: ericd@kcr.uky.edu.
7. Louisiana Tumor Registry, Louisiana State University Health Sciences Center, School of Public Health, New Orleans, LA 70112, United States of America. Electronic address: XWu@lsuhsc.edu.
8. New Jersey State Cancer Registry, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, United States of America. Electronic address: nan.stroup@rutgers.edu.
9. Utah Cancer Registry, University of Utah School of Medicine, Salt Lake City, UT 84132, United States of America. Electronic address: Jen.Doherty@hci.utah.edu.
10. Information Management Services Inc., Calverton, MD 20705, United States of America. Electronic address: coylel@imsweb.com.
11. Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20814, United States of America. Electronic address: lynnepenberthy.schumacher-penberthy@nih.gov.
12. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: christianjb@ornl.gov.
13. National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States of America. Electronic address: tourassig@ornl.gov.
Abstract
OBJECTIVE: In machine learning, it is well established that classification task performance improves when bootstrap aggregation (bagging) is applied. However, bagging deep neural networks requires tremendous computational resources and training time. The research question we aimed to answer was whether we could achieve higher task performance scores and accelerate training by dividing a problem into sub-problems.
MATERIALS AND METHODS: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned-data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a large problem into 20 sub-problems, resampled the training cases 2,000 times, and trained a deep learning model for each bootstrap sample and each sub-problem, generating up to 40,000 models. We trained many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL).
RESULTS: We demonstrated that aggregating the models improves task performance compared with the single-model approach, consistent with other studies, and that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology, a task with more than 500 class labels; these results show that data partitioning may alleviate the complexity of the task. In contrast, the methods did not achieve superior scores for the site and subsite classification tasks. Because data partitioning was based on the primary cancer site, accuracy depended on how the partitions were determined, which needs further investigation and improvement.
CONCLUSION: The results of this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores, and (2) we achieved faster training by leveraging the high-performance Summit supercomputer at ORNL.
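The partitioned bagging scheme described above can be sketched in miniature: split the training data into sub-problems by a partition key (here, the primary cancer site), draw bootstrap resamples within each partition, fit one base learner per resample, and aggregate each sub-problem's ensemble by majority vote. This is a minimal illustrative sketch, not the authors' implementation; the toy 1-nearest-neighbor base learner and all function names are assumptions standing in for the paper's deep neural network classifiers.

```python
import random
from collections import Counter

def train_1nn(sample):
    """Toy base learner: memorize the bootstrap sample (1-nearest neighbor)."""
    def model(x):
        nearest = min(sample, key=lambda rec: abs(rec[0] - x))
        return nearest[1]  # return the label of the closest training point
    return model

def partitioned_bagging(data, partition_key, n_bootstraps, seed=0):
    """Train a bagged ensemble per sub-problem (partition) of the data.

    data: iterable of records; rec[0] is the feature, rec[1] the label.
    partition_key: function mapping a record to its sub-problem id.
    """
    rng = random.Random(seed)
    # Split the big problem into sub-problems.
    partitions = {}
    for rec in data:
        partitions.setdefault(partition_key(rec), []).append(rec)
    models = {}
    for part, records in partitions.items():
        ensemble = []
        for _ in range(n_bootstraps):
            # Bootstrap: resample the partition's cases with replacement.
            sample = [rng.choice(records) for _ in records]
            ensemble.append(train_1nn(sample))
        models[part] = ensemble
    return models

def predict(models, part, x):
    """Aggregate the sub-problem's ensemble by majority vote."""
    votes = Counter(m(x) for m in models[part])
    return votes.most_common(1)[0][0]
```

In the paper's setting the partition key is the predicted primary site and each base learner is an MT-CNN or MT-HCAN model, so the 20 sub-problems x 2,000 resamples yield up to 40,000 independently trainable models, which is what makes the concurrent HPC training practical.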