Chao Yan1, Ziqi Zhang1, Steve Nyemba2, Bradley A Malin1,2.
Abstract
Sharing electronic health records (EHRs) on a large scale may lead to privacy intrusions. Recent research has shown that risks may be mitigated by simulating EHRs through generative adversarial network (GAN) frameworks. Yet the methods developed to date are limited because they 1) focus on generating data of a single type (e.g., diagnosis codes), neglecting other data types (e.g., demographics, procedures or vital signs), and 2) do not represent constraints between features. In this paper, we introduce a method to simulate EHRs composed of multiple data types by 1) refining the GAN model, 2) accounting for feature constraints, and 3) incorporating key utility measures for such generation tasks. Our analysis with over 770,000 EHRs from Vanderbilt University Medical Center demonstrates that the new model achieves higher performance in terms of retaining basic statistics, cross-feature correlations, latent structural properties, feature constraints and associated patterns from real data, without sacrificing privacy. ©2020 AMIA - All rights reserved.
Year: 2021 PMID: 33936510 PMCID: PMC8075510
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076