Disclosure control using partially synthetic data for large-scale health surveys, with applications to CanCORS.

Bronwyn Loong, Alan M Zaslavsky, Yulei He, David P Harrington.

Abstract

Statistical agencies have begun to partially synthesize public-use data for major surveys to protect the confidentiality of respondents' identities and sensitive attributes, replacing the values of high-disclosure-risk and sensitive variables with multiple imputations. To date, there are few applications of synthetic data techniques to large-scale healthcare survey data. Here, we describe partial synthesis of survey data collected by the Cancer Care Outcomes Research and Surveillance (CanCORS) project, a comprehensive observational study of the experiences, treatments, and outcomes of patients with lung or colorectal cancer in the USA. We review inferential methods for partially synthetic data and discuss the selection of high-disclosure-risk variables for synthesis, the specification of imputation models, and the assessment of identification disclosure risk. We evaluate data utility by replicating published analyses and comparing the results obtained from the original and synthetic data, and we discuss practical issues in preserving inferential conclusions. We found that important subgroup relationships must be included in the imputation model used for synthesis in order to preserve the utility of the observed data for a given analysis procedure. We conclude that synthetic CanCORS data are best suited to preliminary data analyses. These methods address the requirement to share data in clinical research without compromising confidentiality.
Copyright © 2013 John Wiley & Sons, Ltd.
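
The abstract mentions inferential methods for partially synthetic data without stating them. As an illustration only, the sketch below applies the combining rules commonly used in the partially synthetic data literature: the point estimate is the mean across the m synthetic datasets, and the total variance is the average within-dataset variance plus the between-dataset variance divided by m. Whether the paper uses exactly these rules is an assumption, and the function name `combine_partially_synthetic` and the input numbers are hypothetical.

```python
import math
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic datasets.

    estimates: point estimate of the same quantity from each dataset
    variances: estimated variance of that point estimate in each dataset
    Returns (combined estimate, total variance, t degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from >= 2 datasets")

    q_bar = mean(estimates)   # combined point estimate
    b = variance(estimates)   # between-dataset variance (divisor m - 1)
    u_bar = mean(variances)   # average within-dataset variance

    # Total variance for partial synthesis: note b / m, not (1 + 1/m) * b
    # as in multiple imputation for missing data.
    total = u_bar + b / m

    # Degrees of freedom of the reference t distribution; infinite when
    # the synthetic datasets give identical point estimates.
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else math.inf
    return q_bar, total, df


# Hypothetical estimates of one regression coefficient from m = 3 datasets.
q_bar, total, df = combine_partially_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
half_width = 1.96 * math.sqrt(total)  # normal approximation, for illustration
print(f"estimate {q_bar:.3f}, approx. 95% CI half-width {half_width:.3f}, df {df:.1f}")
```

A utility check of the kind the abstract describes could then compare the interval obtained this way with the interval from the original data for the same published analysis.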

Keywords:  data confidentiality; data utility; disclosure risk; multiple imputation; synthetic data

Year:  2013        PMID: 23670983      PMCID: PMC3869901          DOI: 10.1002/sim.5841

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


  3 in total

1.  Selecting Optimal Subset to release under Differentially Private M-estimators from Hybrid Datasets.

Authors:  Meng Wang; Zhanglong Ji; Hyeon-Eui Kim; Shuang Wang; Li Xiong; Xiaoqian Jiang
Journal:  IEEE Trans Knowl Data Eng       Date:  2017-11-14       Impact factor: 6.977

2.  Confidence interval estimation in R-DAS.

Authors:  Olga A Vsevolozhskaya; James C Anthony
Journal:  Drug Alcohol Depend       Date:  2014-08-17       Impact factor: 4.492

3.  Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.

Authors:  Khaled El Emam; Lucy Mosquera; Jason Bass
Journal:  J Med Internet Res       Date:  2020-11-16       Impact factor: 5.428

