Membership inference attacks against synthetic health data.

Ziqi Zhang, Chao Yan, Bradley A Malin.

Abstract

Synthetic data generation has emerged as a promising method to protect patient privacy while sharing individual-level health data. Intuitively, sharing synthetic data should reduce disclosure risks because no explicit linkage is retained between the synthetic records and the real data upon which it is based. However, the risks associated with synthetic data are still evolving, and what seems protected today may not be tomorrow. In this paper, we show that membership inference attacks, whereby an adversary infers if the data from certain target individuals (known to the adversary a priori) were relied upon by the synthetic data generation process, can be substantially enhanced through state-of-the-art machine learning frameworks, which calls into question the protective nature of existing synthetic data generators. Specifically, we formulate the membership inference problem from the perspective of the data holder, who aims to perform a disclosure risk assessment prior to sharing any health data. To support such an assessment, we introduce a framework for effective membership inference against synthetic health data without specific assumptions about the generative model or a well-defined data structure, leveraging the principles of contrastive representation learning. To illustrate the potential for such an attack, we conducted experiments against synthesis approaches using two datasets derived from several health data resources (Vanderbilt University Medical Center, the All of Us Research Program) to determine the upper bound of risk brought by an adversary who invokes an optimal strategy. The results indicate that partially synthetic data are vulnerable to membership inference at a very high rate. By contrast, fully synthetic data are only marginally susceptible and, in most cases, could be deemed sufficiently protected from membership inference.
Copyright © 2021 Elsevier Inc. All rights reserved.
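
To make the notion of membership inference in the abstract concrete, the following is a minimal sketch of a simple distance-to-closest-record baseline. It is not the contrastive representation learning framework the paper introduces; it only illustrates the attack setting, in which an adversary holding target records guesses whether each one was used to generate the shared synthetic data. The binary record encoding, the toy data, the threshold, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical encoding: one patient is a binary vector of code indicators.
Record = Sequence[int]


def hamming(a: Record, b: Record) -> int:
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(target: Record, synthetic: Sequence[Record]) -> int:
    """Distance from a target record to its closest synthetic record."""
    return min(hamming(target, s) for s in synthetic)


def infer_membership(
    targets: Sequence[Record], synthetic: Sequence[Record], threshold: int
) -> List[bool]:
    """Guess 'member' when a target lies within `threshold` of a synthetic record."""
    return [min_distance(t, synthetic) <= threshold for t in targets]


# Toy data (assumed): two records used for synthesis, one that was not.
members = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
non_members = [
    [1, 1, 1, 0, 0, 1, 0, 0],
]

# Stand-in for partially synthetic data: each member with one attribute changed.
partially_synthetic = [
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
]

guesses = infer_membership(members + non_members, partially_synthetic, threshold=1)
print(guesses)  # [True, True, False]
```

In this toy setting the synthetic records stay close to the real ones, so even a crude threshold separates members from non-members. That is consistent with the abstract's finding that partially synthetic data are highly vulnerable, although the paper's own results rest on a learned representation rather than a raw Hamming distance.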

Keywords:  Contrastive representation learning; Electronic health record; Membership inference; Synthetic data

Year: 2021; PMID: 34920126; PMCID: PMC8766950; DOI: 10.1016/j.jbi.2021.103977

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


