
SynTEG: a framework for temporal structured electronic health data simulation.

Ziqi Zhang1, Chao Yan1, Thomas A Lasko2, Jimeng Sun3, Bradley A Malin1,2,4.   

Abstract

OBJECTIVE: Simulating electronic health record data offers an opportunity to resolve the tension between data sharing and patient privacy. Recent techniques based on generative adversarial networks have shown promise but neglect the temporal aspect of healthcare. We introduce a generative framework for simulating the trajectory of patients' diagnoses, along with measures to evaluate its utility and privacy.
MATERIALS AND METHODS: The framework simulates date-stamped diagnosis sequences through a 2-stage process that 1) sequentially extracts temporal patterns from clinical visits and 2) generates synthetic data conditioned on the learned patterns. We designed 3 utility measures to characterize the extent to which the framework preserves feature correlations and temporal patterns in clinical events. We evaluated the framework with billing codes, represented as phenome-wide association study codes (phecodes), from over 500 000 Vanderbilt University Medical Center electronic health records. We further assessed privacy risks using membership inference and attribute disclosure attacks.
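The membership inference attack can be illustrated with a minimal, distance-based sketch. The abstract does not specify the attack, so everything here is an assumption: each record is reduced to the set of phecodes it contains (timestamps ignored), the evaluator holds target records whose membership is known, and a target is declared a training member when some synthetic record lies within a Hamming distance `threshold` of it. The names `hamming`, `infer_membership`, and `attack_accuracy` are hypothetical and not taken from SynTEG.

```python
from typing import List, Sequence, Set

# Hypothetical representation: a patient record is the set of phecodes
# observed in their history; timestamps are ignored in this simplified sketch.
Record = Set[str]


def hamming(a: Record, b: Record) -> int:
    """Hamming distance between two binary code vectors, computed as the
    size of the symmetric difference of their code sets."""
    return len(a ^ b)


def infer_membership(target: Record, synthetic: Sequence[Record], threshold: int) -> bool:
    """Guess that `target` was in the training data if any synthetic record
    lies within `threshold` of it (threshold is an assumed attacker parameter)."""
    return any(hamming(target, s) <= threshold for s in synthetic)


def attack_accuracy(members: List[Record], non_members: List[Record],
                    synthetic: Sequence[Record], threshold: int) -> float:
    """Fraction of targets whose membership status the attacker guesses correctly."""
    correct = sum(infer_membership(r, synthetic, threshold) for r in members)
    correct += sum(not infer_membership(r, synthetic, threshold) for r in non_members)
    return correct / (len(members) + len(non_members))
```

On a balanced set of members and non-members, an accuracy near 0.5 means the attacker does no better than guessing, while accuracy well above 0.5 signals leakage. A temporal variant would compare date-stamped sequences rather than code sets.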
RESULTS: The simulated temporal sequences exhibited characteristics similar to those of real sequences on the utility measures. Notably, diagnosis prediction models trained on real versus synthetic temporal data exhibited an average relative difference in area under the ROC curve of 1.6% (standard deviation 3.8%) across 1276 phecodes. Additionally, the relative differences in the mean occurrence age and the mean time between visits were 4.9% and 4.2%, respectively. The privacy risks in the synthetic data with respect to membership and attribute inference were negligible.
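The relative-difference summary for the prediction models can be sketched as follows. This assumes, without confirmation from the abstract, that the per-phecode relative difference is |AUC_real - AUC_synthetic| / AUC_real and that the reported spread is the sample standard deviation across phecodes; `relative_difference` and `summarize_auc_gap` are hypothetical names.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def relative_difference(real: float, synthetic: float) -> float:
    """Relative difference of a synthetic-data statistic with respect to its
    real-data value, assumed here to be |real - synthetic| / real."""
    return abs(real - synthetic) / real


def summarize_auc_gap(auc_real: Dict[str, float],
                      auc_syn: Dict[str, float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-phecode relative AUC
    differences, over phecodes present in both dictionaries."""
    codes = sorted(auc_real.keys() & auc_syn.keys())
    diffs = [relative_difference(auc_real[c], auc_syn[c]) for c in codes]
    return mean(diffs), stdev(diffs)
```

The same `relative_difference` helper applies to the other reported statistics, such as the mean occurrence age or the mean time between visits computed on the real and synthetic cohorts. Note that `statistics.stdev` requires at least two phecodes; a signed variant would simply drop `abs()`.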
CONCLUSION: This investigation indicates that temporal diagnosis code sequences can be simulated in a manner that provides utility and respects privacy.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.


Keywords:  billing codes; electronic health records (EHRs); generative adversarial networks (GANs); privacy; temporal simulation

Year:  2021        PMID: 33277896      PMCID: PMC7936402          DOI: 10.1093/jamia/ocaa262

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


