
Application of Bayesian networks to generate synthetic health data.

Dhamanpreet Kaur¹, Matthew Sobiesk¹, Shubham Patil², Jin Liu³, Puran Bhagat³, Amar Gupta¹, Natasha Markuzon³.

Abstract

OBJECTIVE: This study seeks to develop a fully automated method for generating synthetic data from a real dataset that medical organizations could use to distribute health data to researchers, reducing the need for access to real data. We hypothesize that applying Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.
MATERIALS AND METHODS: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California, Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.
RESULTS: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.
DISCUSSION: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.
CONCLUSION: We conclude that applying Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
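The generation step summarized under MATERIALS AND METHODS (learn a Bayesian network from real records, then simulate synthetic patient records from it) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a hand-specified DAG over three hypothetical binary variables and a small invented toy dataset, and it skips structure learning entirely, which the study performs from data. Given the fixed structure, it estimates each conditional probability table by smoothed counting and then draws records by ancestral (forward) sampling, a standard way to sample from a discrete Bayesian network.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Hypothetical DAG over three binary variables: node -> list of parents.
# The study learns the structure from data; this sketch fixes it by hand.
DAG = {
    "age_over_60": [],
    "diabetes": ["age_over_60"],
    "heart_disease": ["age_over_60", "diabetes"],
}


def topological_order(dag):
    """Return nodes with every parent before its children (assumes no cycles)."""
    order, seen = [], set()

    def visit(node):
        if node not in seen:
            seen.add(node)
            for parent in dag[node]:
                visit(parent)
            order.append(node)

    for node in dag:
        visit(node)
    return order


def fit_cpts(records, dag, alpha=1.0):
    """Estimate P(node | parents) by counting, with additive (Laplace) smoothing.

    Returns {node: {parent_value_tuple: {value: probability}}}. Every parent
    combination gets an entry, so combinations unseen in the data fall back to
    a smoothed (near-uniform) distribution instead of raising KeyError.
    """
    domains = {v: sorted({r[v] for r in records}) for v in dag}
    cpts = {}
    for node, parents in dag.items():
        counts = defaultdict(Counter)
        for r in records:
            counts[tuple(r[p] for p in parents)][r[node]] += 1
        table = {}
        # product() with no arguments yields one empty tuple, covering root nodes.
        for key in product(*(domains[p] for p in parents)):
            ctr = counts[key]
            total = sum(ctr.values()) + alpha * len(domains[node])
            table[key] = {val: (ctr[val] + alpha) / total for val in domains[node]}
        cpts[node] = table
    return cpts


def sample_records(cpts, dag, n, seed=None):
    """Ancestral sampling: draw each node given its already-drawn parents."""
    rng = random.Random(seed)
    order = topological_order(dag)
    synthetic = []
    for _ in range(n):
        rec = {}
        for node in order:
            dist = cpts[node][tuple(rec[p] for p in dag[node])]
            values = list(dist)
            rec[node] = rng.choices(values, weights=[dist[v] for v in values])[0]
        synthetic.append(rec)
    return synthetic


# Toy "real" records, invented for illustration only (not from UCI or MIMIC-III).
toy_real = (
    [{"age_over_60": 1, "diabetes": 1, "heart_disease": 1}] * 3
    + [{"age_over_60": 1, "diabetes": 0, "heart_disease": 0}] * 5
    + [{"age_over_60": 0, "diabetes": 0, "heart_disease": 0}] * 10
    + [{"age_over_60": 0, "diabetes": 1, "heart_disease": 0}] * 2
)

cpts = fit_cpts(toy_real, DAG)
synthetic = sample_records(cpts, DAG, n=5000, seed=0)
```

Because each synthetic record is drawn from the fitted conditionals rather than copied from a real row, rare combinations appear at roughly the rate the model assigns them. The smoothing constant `alpha` is an assumption of this sketch: it gives unseen value combinations a small nonzero probability, trading a little fidelity for robustness on sparse data.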

Keywords: Bayesian networks; data dissemination; disclosure risk; health data; synthetic data

Year:  2021        PMID: 33367620      PMCID: PMC7973486          DOI: 10.1093/jamia/ocaa303

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


