

Privacy-Preserving Deep Learning NLP Models for Cancer Registries.

Mohammed Alawad¹, Hong-Jun Yoon¹, Shang Gao¹, Brent Mumphrey², Xiao-Cheng Wu², Eric B Durbin³, Jong Cheol Jeong⁴, Isaac Hands⁵, David Rust⁶, Linda Coyle⁷, Lynne Penberthy⁸, Georgia Tourassi¹.

Abstract

Population cancer registries can benefit from deep learning (DL) to automatically extract cancer characteristics from the high volume of unstructured pathology text reports they process annually. The success of DL in tackling this and other real-world problems depends on the availability of large labeled datasets for model training. Although collaboration among cancer registries is essential to fully exploit the promise of DL, privacy and confidentiality concerns are the main obstacles to data sharing across registries. Moreover, DL for natural language processing (NLP) requires sharing a vocabulary dictionary for the embedding layer, which may contain patient identifiers. Thus, even distributing the trained models across cancer registries can cause a privacy violation. In this paper, we propose distributing DL NLP models via privacy-preserving transfer learning approaches that do not share sensitive data. These approaches are used to distribute a multitask convolutional neural network (MT-CNN) NLP model among cancer registries. The model is trained to extract six key cancer characteristics - tumor site, subsite, laterality, behavior, histology, and grade - from cancer pathology reports. Using 410,064 pathology documents from two cancer registries, we compare our proposed approach to conventional transfer learning without privacy preservation, single-registry models, and a model trained on centrally hosted data. The results show that transfer learning approaches, including data sharing and model distribution, significantly outperform the single-registry model. In addition, the best-performing privacy-preserving model distribution approach achieves average micro- and macro-F1 scores across all extraction tasks (0.823, 0.580) that are statistically indistinguishable from those of the centralized model (0.827, 0.585).
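The abstract reports both micro- and macro-averaged F1, and the two differ considerably (about 0.82 versus 0.58). A gap of this kind typically indicates that rare classes are predicted less accurately than common ones, because macro-F1 weights every class equally while micro-F1 weights every document equally. The following is a minimal sketch of the two standard definitions for a single-label task; the function name `f1_scores` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data (hypothetical): one common class and one rare class.
y_true = ["breast"] * 8 + ["lung"] * 2
y_pred = ["breast"] * 8 + ["breast", "lung"]
micro, macro = f1_scores(y_true, y_pred)
```

In this toy case a single error on the rare class lowers macro-F1 more than micro-F1, which mirrors the direction of the gap reported above.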


Keywords: NLP; Privacy-preserving; cancer pathology reports; information extraction; multi-task CNN; transfer learning

Year: 2020 PMID: 36117774 PMCID: PMC9481201 DOI: 10.1109/tetc.2020.2983404

Source DB: PubMed Journal: IEEE Trans Emerg Top Comput ISSN: 2168-6750 Impact factor: 6.595

References

16 in total

Review 1. De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1.

Authors: Amber Stubbs; Michele Filannino; Özlem Uzuner
Journal: J Biomed Inform Date: 2017-06-11 Impact factor: 6.317

2. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

Authors: Min Jiang; Yukun Chen; Mei Liu; S Trent Rosenbloom; Subramani Mani; Joshua C Denny; Hua Xu
Journal: J Am Med Inform Assoc Date: 2011-04-20 Impact factor: 4.497

3. Transfer Learning with Convolutional Neural Networks for Classification of Abdominal Ultrasound Images.

Authors: Phillip M Cheng; Harshawn S Malhi
Journal: J Digit Imaging Date: 2017-04 Impact factor: 4.056

4. Classifying cancer pathology reports with hierarchical self-attention networks.

Authors: Shang Gao; John X Qiu; Mohammed Alawad; Jacob D Hinkle; Noah Schaefferkoetter; Hong-Jun Yoon; Blair Christian; Paul A Fearn; Lynne Penberthy; Xiao-Cheng Wu; Linda Coyle; Georgia Tourassi; Arvind Ramanathan
Journal: Artif Intell Med Date: 2019-10-15 Impact factor: 5.326

5. Deep Transfer Learning Across Cancer Registries for Information Extraction from Pathology Reports.

Authors: Mohammed Alawad; Shang Gao; John Qiu; Noah Schaefferkoetter; Jacob D Hinkle; Hong-Jun Yoon; J Blair Christian; Xiao-Cheng Wu; Eric B Durbin; Jong Cheol Jeong; Isaac Hands; David Rust; Georgia Tourassi
Journal: IEEE EMBS Int Conf Biomed Health Inform Date: 2019-09-12

Review 6. Clinical information extraction applications: A literature review.

Authors: Yanshan Wang; Liwei Wang; Majid Rastegar-Mojarad; Sungrim Moon; Feichen Shen; Naveed Afzal; Sijia Liu; Yuqun Zeng; Saeed Mehrabi; Sunghwan Sohn; Hongfang Liu
Journal: J Biomed Inform Date: 2017-11-21 Impact factor: 6.317

7. BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Authors: Yijia Zhang; Qingyu Chen; Zhihao Yang; Hongfei Lin; Zhiyong Lu
Journal: Sci Data Date: 2019-05-10 Impact factor: 6.444

8. A privacy-preserving distributed filtering framework for NLP artifacts.

Authors: Md Nazmus Sadat; Md Momin Al Aziz; Noman Mohammed; Serguei Pakhomov; Hongfang Liu; Xiaoqian Jiang
Journal: BMC Med Inform Decis Mak Date: 2019-09-07 Impact factor: 2.796

9. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data.

Authors: Andrew L Beam; Benjamin Kompa; Allen Schmaltz; Inbar Fried; Griffin Weber; Nathan Palmer; Xu Shi; Tianxi Cai; Isaac S Kohane
Journal: Pac Symp Biocomput Date: 2020

10. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.

Authors: Mohammed Alawad; Shang Gao; John X Qiu; Hong Jun Yoon; J Blair Christian; Lynne Penberthy; Brent Mumphrey; Xiao-Cheng Wu; Linda Coyle; Georgia Tourassi
Journal: J Am Med Inform Assoc Date: 2020-01-01 Impact factor: 4.497