
Privacy-Preserving Deep Learning NLP Models for Cancer Registries.

Mohammed Alawad, Hong-Jun Yoon, Shang Gao, Brent Mumphrey, Xiao-Cheng Wu, Eric B Durbin, Jong Cheol Jeong, Isaac Hands, David Rust, Linda Coyle, Lynne Penberthy, Georgia Tourassi

Abstract

Population cancer registries can benefit from deep learning (DL) to automatically extract cancer characteristics from the high volume of unstructured pathology text reports they process annually. The success of DL in tackling this and other real-world problems depends on the availability of large labeled datasets for model training. Although collaboration among cancer registries is essential to fully exploit the promise of DL, privacy and confidentiality concerns are the main obstacles to data sharing across registries. Moreover, DL for natural language processing (NLP) requires sharing a vocabulary dictionary for the embedding layer, which may contain patient identifiers. Thus, even distributing trained models across cancer registries can violate privacy. In this paper, we propose distributing DL NLP models via privacy-preserving transfer learning approaches that do not share sensitive data. These approaches are used to distribute a multitask convolutional neural network (MT-CNN) NLP model among cancer registries. The model is trained to extract six key cancer characteristics (tumor site, subsite, laterality, behavior, histology, and grade) from cancer pathology reports. Using 410,064 pathology documents from two cancer registries, we compare our proposed approach to conventional transfer learning without privacy preservation, single-registry models, and a model trained on centrally hosted data. The results show that transfer learning approaches, including data sharing and model distribution, significantly outperform the single-registry model. In addition, the best-performing privacy-preserving model distribution approach achieves average micro- and macro-F1 scores across all extraction tasks (0.823, 0.580) that are statistically indistinguishable from those of the centralized model (0.827, 0.585).
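
The abstract reports both micro- and macro-averaged F1, and the macro score is noticeably lower. The sketch below is not the authors' evaluation code; it is a minimal pure-Python illustration of how the two averages are commonly computed for a single-label, multi-class task. The function name `f1_scores` and the toy labels are hypothetical. Micro-F1 pools all decisions, so frequent classes dominate, while macro-F1 averages per-class F1, so errors on rare classes weigh more heavily.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: one frequent class and one rare class.
y_true = ["C50"] * 8 + ["C34"] * 2
y_pred = ["C50"] * 8 + ["C50", "C34"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1={micro:.3f}, macro-F1={macro:.3f}")
```

In this toy case a single error on the rare class lowers macro-F1 more than micro-F1, which is one plausible reason a task with many rare labels (such as histology) would show a gap between the two averages.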


Keywords:  NLP; Privacy-preserving; cancer pathology reports; information extraction; multi-task CNN; transfer learning

Year:  2020        PMID: 36117774      PMCID: PMC9481201          DOI: 10.1109/tetc.2020.2983404

Source DB:  PubMed          Journal:  IEEE Trans Emerg Top Comput        ISSN: 2168-6750            Impact factor:   6.595


