Optimal vocabulary selection approaches for privacy-preserving deep NLP model training for information extraction and cancer epidemiology.

Hong-Jun Yoon¹, Christopher Stanley¹, J Blair Christian¹, Hilda B Klasky¹, Andrew E Blanchard¹, Eric B Durbin², Xiao-Cheng Wu³, Antoinette Stroup⁴, Jennifer Doherty⁵, Stephen M Schwartz⁶, Charles Wiggins⁷, Mark Damesyn⁸, Linda Coyle⁹, Georgia D Tourassi¹⁰.

Abstract

BACKGROUND: As artificial intelligence and machine learning techniques spread through biomedical informatics, the security and privacy of the data and of subject identities have become an important concern and an essential research topic. Without intentional safeguards, machine learning models may improve task performance by exploiting patterns and features that are associated with private personal information.
OBJECTIVE: Deep learning models for information extraction from medical textual content are exposed to private health information and personally identifiable information, so their privacy vulnerability needs to be quantified. The objective of the study is to quantify the privacy vulnerability of deep learning models for natural language processing and to explore a proper way of securing patients' information to mitigate confidentiality breaches.
METHODS: The target model is a multitask convolutional neural network for information extraction from cancer pathology reports, trained on data from multiple state population-based cancer registries. This study proposes two schemes for collecting vocabularies from the cancer pathology reports: (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments.
RESULTS: The comparison suggests that the proposed vocabulary selection methods lowered privacy vulnerability while maintaining the same level of clinical task performance.
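
The two vocabulary selection schemes named in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the text was tokenized, how many registries a word had to appear in, or how mutual information was estimated. The whitespace tokenizer, the `min_registries` threshold, the choice of word presence versus task label as the two variables, the function names, and the toy reports are all assumptions made for illustration.

```python
import math
from collections import Counter


def words_in_multiple_registries(docs_by_registry, min_registries=2):
    """Scheme (a), as assumed here: keep words seen in >= min_registries registries."""
    registry_count = Counter()
    for docs in docs_by_registry.values():
        seen = set()
        for doc in docs:
            seen.update(doc.lower().split())  # naive whitespace tokenizer (assumption)
        registry_count.update(seen)  # each registry counts a word at most once
    return {w for w, n in registry_count.items() if n >= min_registries}


def mutual_information_scores(docs, labels):
    """Scheme (b), as assumed here: MI in bits between word presence and the label."""
    n = len(docs)
    label_count = Counter(labels)
    word_count = Counter()
    joint = Counter()
    for doc, y in zip(docs, labels):
        for w in set(doc.lower().split()):
            word_count[w] += 1
            joint[(w, y)] += 1
    scores = {}
    for w, n_w in word_count.items():
        mi = 0.0
        for y, n_y in label_count.items():
            n_wy = joint[(w, y)]
            # Two cells per label: word present, word absent.
            for n_cell, n_row in ((n_wy, n_w), (n_y - n_wy, n - n_w)):
                if n_cell > 0:
                    mi += (n_cell / n) * math.log2(n_cell * n / (n_row * n_y))
        scores[w] = mi
    return scores


def top_k_words(scores, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


# Hypothetical toy reports; "smith" and "jones" stand in for identifying tokens.
toy_registries = {
    "registry_a": ["carcinoma of breast smith", "benign lung tissue"],
    "registry_b": ["carcinoma of lung jones", "benign breast tissue"],
}
toy_docs = [d for docs in toy_registries.values() for d in docs]
toy_labels = ["malignant", "benign", "malignant", "benign"]

shared_vocab = words_in_multiple_registries(toy_registries)
mi_scores = mutual_information_scores(toy_docs, toy_labels)
```

In this toy data the tokens "smith" and "jones" each occur in a single registry, so scheme (a) drops them, while a task-relevant word such as "carcinoma" receives the highest mutual information score under scheme (b). Whether either filter lowers membership inference risk on real pathology reports is the empirical question the study addresses; the sketch does not demonstrate it.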

Keywords: Privacy; artificial intelligence; cancer epidemiology; deep learning; natural language processing; privacy-preserving training

Year: 2022; PMID: 35213361; PMCID: PMC9377550; DOI: 10.3233/CBM-210306

Source DB: PubMed; Journal: Cancer Biomark; ISSN: 1574-0153; Impact factor: 3.828
