Ensuring electronic medical record simulation through better training, modeling, and evaluation.

Ziqi Zhang, Chao Yan, Diego A Mesa, Jimeng Sun, Bradley A Malin.

Abstract

OBJECTIVE: Electronic medical records (EMRs) can support medical research and discovery, but privacy risks limit the sharing of such data on a wide scale. Various approaches have been developed to mitigate risk, including record simulation via generative adversarial networks (GANs). While showing promise in certain application domains, GANs lack a principled approach for EMR data, which leads to subpar simulation. In this article, we improve EMR simulation through a novel pipeline that (1) enhances the learning model, (2) incorporates evaluation criteria for data utility that inform learning, and (3) refines the training process.
MATERIALS AND METHODS: We propose a new electronic health record generator using a GAN with a Wasserstein divergence and layer normalization techniques. We designed 2 utility measures to characterize similarity in the structural properties of real and simulated EMRs in the original and latent space, respectively. We applied a filtering strategy to enhance GAN training for low-prevalence clinical concepts. We evaluated the new and existing GANs with utility and privacy measures (membership and disclosure attacks) using billing codes from over 1 million EMRs at Vanderbilt University Medical Center.
RESULTS: The proposed model outperformed the state-of-the-art approaches with significant improvement in retaining the nature of real records, including prediction performance and structural properties, without sacrificing privacy. Additionally, the filtering strategy achieved higher utility when the EMR training dataset was small.
CONCLUSIONS: These findings illustrate that EMR simulation through GANs can be substantially improved through more appropriate training, modeling, and evaluation criteria.
© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
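
The abstract names membership attacks as one of the privacy measures but does not describe how they were run. As a minimal sketch of one common distance-based formulation (an assumption, not necessarily the authors' procedure): each record is a binary vector of billing-code indicators, and an attacker who holds a target record claims it was in the training set whenever some simulated record lies within a chosen Hamming distance. The function names, the threshold, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Sequence[int]  # binary vector: 1 if a billing code is present, else 0


def hamming(a: Record, b: Record) -> int:
    """Count the positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def claims_membership(target: Record, synthetic: List[Record], threshold: int) -> bool:
    """Attacker's guess: True if any simulated record is within `threshold` of the target."""
    return any(hamming(target, s) <= threshold for s in synthetic)


def attack_precision_recall(
    members: List[Record],
    non_members: List[Record],
    synthetic: List[Record],
    threshold: int,
) -> Tuple[float, float]:
    """Score the attack on records known to be in or out of the training set."""
    true_pos = sum(claims_membership(r, synthetic, threshold) for r in members)
    false_pos = sum(claims_membership(r, synthetic, threshold) for r in non_members)
    claimed = true_pos + false_pos
    precision = true_pos / claimed if claimed else 0.0
    recall = true_pos / len(members) if members else 0.0
    return precision, recall


# Toy data (hypothetical): four billing codes per record.
synthetic = [[1, 0, 1, 0], [0, 1, 1, 0]]
members = [[1, 0, 1, 0], [1, 1, 1, 1]]
non_members = [[0, 0, 0, 1], [0, 1, 1, 1]]
precision, recall = attack_precision_recall(members, non_members, synthetic, threshold=1)
```

Under this formulation, precision close to the base rate of members among the attacked records suggests that the simulated data reveals little about who was in the training set; the threshold controls how aggressive the attacker is.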

Keywords:  EMRs; GANs; Wasserstein divergence; electronic medical records; generative adversarial networks; privacy; simulation

Year: 2020; PMID: 31592533; PMCID: PMC6913223; DOI: 10.1093/jamia/ocz161

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497


