Keeping synthetic patients on track: feedback mechanisms to mitigate performance drift in longitudinal health data simulation.

Ziqi Zhang, Chao Yan, Bradley A Malin.

Abstract

OBJECTIVE: Synthetic data are increasingly relied upon to share electronic health record (EHR) data while maintaining patient privacy. Current simulation methods can generate longitudinal data, but the results are unreliable for two reasons. First, the synthetic data drift from the real data distribution over time. Second, the typical approach to quality assessment, which measures how well a critic model can distinguish real records from synthetic records, often fails to recognize poor simulation results. In this article, we introduce a longitudinal simulation framework, called LS-EHR, which addresses these issues.
MATERIALS AND METHODS: LS-EHR enhances simulation through conditional fuzzing and regularization, rejection sampling, and prior knowledge embedding. We compare LS-EHR to the state of the art using data from 60 000 EHRs from Vanderbilt University Medical Center (VUMC) and the All of Us Research Program. We assess discrimination between real and synthetic data over time. We evaluate both the generation process and the critic model using the area under the receiver operating characteristic curve (AUROC). For the critic, a higher value indicates a more robust model for quality assessment. For the generation process, a lower value indicates better synthetic data quality, because the synthetic records are harder to distinguish from real ones.
RESULTS: The LS-EHR critic improves discrimination AUROC from 0.655 to 0.909 and from 0.692 to 0.918 for VUMC and All of Us data, respectively. Evaluated with the new critic, the LS-EHR generation model reduces the AUROC from 0.909 to 0.758 and from 0.918 to 0.806 for VUMC and All of Us data, respectively.
CONCLUSION: LS-EHR can substantially improve the usability of simulated longitudinal EHR data.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
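
The abstract reads AUROC in two directions: higher is better for the critic, lower is better for the generator. The sketch below illustrates that reading and is not the authors' evaluation code. It computes AUROC as the probability that a randomly chosen real record receives a higher critic score than a randomly chosen synthetic record, with ties counted as one half. The function name `auroc`, the label convention (1 = real, 0 = synthetic), and all scores are hypothetical values chosen for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of real (1) and synthetic (0) scores.

    Equals the probability that a random real record outscores a random
    synthetic record, with ties counted as half a win.
    """
    real = [s for y, s in zip(labels, scores) if y == 1]
    synthetic = [s for y, s in zip(labels, scores) if y == 0]
    if not real or not synthetic:
        raise ValueError("need at least one real and one synthetic record")
    wins = 0.0
    for r in real:
        for s in synthetic:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(real) * len(synthetic))


# Hypothetical data: 4 real records followed by 4 synthetic records.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Made-up "realness" scores from a weaker and a stronger critic
# on the same batch of records.
weak_critic_scores = [0.6, 0.4, 0.7, 0.5, 0.5, 0.6, 0.3, 0.4]
strong_critic_scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.3, 0.2, 0.1]

weak_auroc = auroc(labels, weak_critic_scores)      # 0.71875
strong_auroc = auroc(labels, strong_critic_scores)  # 0.9375
print(f"weak critic AUROC:   {weak_auroc:.3f}")
print(f"strong critic AUROC: {strong_auroc:.3f}")
```

On this toy batch the stronger critic separates real from synthetic records more sharply, so it is the stricter judge of quality. With the critic held fixed, a generator whose records bring this number closer to 0.5 is producing data that are harder to tell apart from real records.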

Keywords:  electronic health records (EHRs); generative adversarial networks (GANs); longitudinal simulation; privacy; synthetic data

Year:  2022        PMID: 35927974      PMCID: PMC9552284          DOI: 10.1093/jamia/ocac131

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   7.942

