Proof of Concept Example for Use of Simulation to Allow Data Pooling Despite Privacy Restrictions.

Teresa J Filshtein, Xiang Li, Scott C Zimmerman, Sarah F Ackley, M Maria Glymour, Melinda C Power.

Abstract

BACKGROUND: Integrating results from multiple samples is often desirable, but privacy restrictions may preclude full data pooling, and most datasets do not include fully harmonized variable sets. We propose a simulation-based method that leverages partial information across datasets to create synthetic data, guided by explicit assumptions about the underlying causal structure. The synthetic data permit pooled analyses that adjust for all desired confounders despite privacy restrictions.
METHODS: This proof-of-concept project uses data from the Health and Retirement Study (HRS) and Atherosclerosis Risk in Communities (ARIC) study. We specified an estimand of interest and a directed acyclic graph (DAG) summarizing the presumed causal structure for the effect of glycated hemoglobin (HbA1c) on cognitive change. We derived publicly reportable statistics to describe the joint distribution of each variable in our DAG. These summary estimates were used as data-generating rules to create synthetic datasets. After pooling, we imputed missing covariates in the synthetic datasets and used the synthetic data to estimate the pooled effect of HbA1c on cognitive change, adjusting for all desired covariates.
RESULTS: Distributions of covariates, model coefficients, and associated standard errors for our model estimating the effect of HbA1c on cognitive change were similar across cohort-specific original and preimputation synthetic data. The estimate from the pooled synthetic data incorporates control for confounders measured in either original dataset.
DISCUSSION: Our approach has advantages over meta-analysis or individual-level pooling/data harmonization when privacy concerns preclude data sharing and key confounders are not uniformly measured across datasets.
Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.
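
The workflow outlined in METHODS (summary statistics used as data-generating rules, then pooling, imputation, and a pooled adjusted model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-variable DAG, the rule values, the cohort labels A and B, and the helper names simulate_cohort, impute_education, and ols are hypothetical and are not taken from HRS, ARIC, or the authors' code. The sketch also substitutes a single stochastic regression imputation for whatever imputation procedure the authors used.

```python
import numpy as np

# Hypothetical DAG in topological order: age and education are confounders
# of the HbA1c -> cognitive change relationship.
COLS = ["age", "education", "hba1c", "cog_change"]

# Hypothetical data-generating rules: (intercept, {parent: slope}, residual SD).
# These stand in for the publicly reportable statistics each cohort would share.
RULES_A = {
    "age": (70.0, {}, 8.0),
    "education": (12.0, {}, 3.0),
    "hba1c": (5.0, {"age": 0.01, "education": -0.03}, 0.8),
    "cog_change": (0.5, {"hba1c": -0.10, "age": -0.02, "education": 0.03}, 0.5),
}
# Cohort B is assumed never to have measured education (rule is None).
RULES_B = {
    "age": (70.0, {}, 8.0),
    "education": None,
    "hba1c": (4.7, {"age": 0.01}, 0.8),
    "cog_change": (0.9, {"hba1c": -0.10, "age": -0.02}, 0.5),
}

def simulate_cohort(n, rules, rng):
    """Generate n synthetic rows by walking the DAG in topological order."""
    data = {}
    for col in COLS:
        rule = rules.get(col)
        if rule is None:  # unmeasured variable stays missing
            data[col] = np.full(n, np.nan)
            continue
        intercept, slopes, sd = rule
        mean = intercept + sum(b * data[p] for p, b in slopes.items())
        data[col] = mean + rng.normal(0.0, sd, n)
    return np.column_stack([data[c] for c in COLS])

def ols(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def impute_education(pooled, rng):
    """Single stochastic regression imputation of education (column 1)."""
    out = pooled.copy()
    miss = np.isnan(out[:, 1])
    pred = out[:, [0, 2, 3]]  # age, hba1c, cog_change
    beta = ols(pred[~miss], out[~miss, 1])
    fitted = np.column_stack([np.ones(len(pred)), pred]) @ beta
    resid_sd = np.std(out[~miss, 1] - fitted[~miss], ddof=len(beta))
    out[miss, 1] = fitted[miss] + rng.normal(0.0, resid_sd, miss.sum())
    return out

rng = np.random.default_rng(2021)
synth_a = simulate_cohort(5000, RULES_A, rng)
synth_b = simulate_cohort(5000, RULES_B, rng)

# Pool the synthetic cohorts, fill in the confounder cohort B lacks,
# then fit cog_change ~ hba1c + age + education on the pooled data.
pooled = impute_education(np.vstack([synth_a, synth_b]), rng)
coef = ols(pooled[:, [2, 0, 1]], pooled[:, 3])
print("pooled HbA1c coefficient:", round(float(coef[1]), 3))
```

Only the rules cross the privacy boundary in this sketch, never individual records. A fuller analysis would repeat the imputation several times and combine the estimates so that the standard errors reflect imputation uncertainty.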

Year:  2021        PMID: 34183527      PMCID: PMC8338788          DOI: 10.1097/EDE.0000000000001373

Source DB:  PubMed          Journal:  Epidemiology        ISSN: 1044-3983            Impact factor:   4.860
