Recovery of original individual person data (IPD) inferences from empirical IPD summaries only: Applications to distributed computing under disclosure constraints.

Federico Bonofiglio, Martin Schumacher, Harald Binder.

Abstract

There are many settings where individual person data (IPD) are not available, due to privacy or technical reasons, and one must work with IPD proxies, such as summary statistics, to approximate original IPD inferences, that is, the results of statistical analyses that would ideally have been performed on individual-level data. For instance, in a distributed computing setting, as implemented in the DataSHIELD software framework, different centers can only share IPD proxies to obtain pooled IPD inferences. Such privacy requirements limit the scope of statistical investigation. For example, it can be challenging to perform between-center random-effect regression models. To increase modeling freedom we propose a method that only uses simple nondisclosive summaries of the original IPD as input, such as empirical marginal moments and correlation matrices, and generates artificial data compatible with those summary features. Specifically, data are generated from a Gaussian copula with marginal and joint components specified by the above summaries. The goal is to reproduce original IPD features in the artificial data, such that original IPD inferences are recovered from the artificial data. In an application example, and through simulations, we show that we can recover estimates of a multivariable IPD random-effect logistic regression, from artificial data generated via the Gaussian copula using the above IPD summaries, suggesting the proposed approach provides a generally applicable strategy for distributed computing settings with data protection constraints.
© 2020 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
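The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions. Continuous variables get a Normal marginal built from their summary mean and standard deviation. Binary variables get a Bernoulli marginal built from their summary proportion. The supplied correlation matrix is used directly as the correlation of the latent normals, whereas the published method may adjust it so that the generated data reproduce the observed correlations, and this sketch does not do that. The function names (`gaussian_copula_sample`, `normal_quantile`, `bernoulli_quantile`, `cholesky`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(corr):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def normal_quantile(mean, sd):
    """Hypothetical helper: quantile function of a Normal marginal from summary moments."""
    return NormalDist(mean, sd).inv_cdf


def bernoulli_quantile(p):
    """Hypothetical helper: quantile function of a Bernoulli(p) marginal."""
    return lambda u: 1 if u > 1.0 - p else 0


def gaussian_copula_sample(n, corr, marginals, seed=0):
    """Draw n artificial records whose dependence follows a Gaussian copula.

    corr: latent-normal correlation matrix (assumed positive definite).
    marginals: one quantile function per variable, mapping u in (0, 1) to a value.
    """
    rng = random.Random(seed)
    L = cholesky(corr)
    std = NormalDist()
    d = len(corr)
    rows = []
    for _ in range(n):
        # Independent standard normals.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Introduce the target correlation: x = L z.
        x = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        # Map to uniforms via the normal CDF (clamped away from 0 and 1),
        # then to each marginal via its quantile function.
        u = [min(max(std.cdf(xi), 1e-12), 1.0 - 1e-12) for xi in x]
        rows.append([q(ui) for q, ui in zip(marginals, u)])
    return rows


if __name__ == "__main__":
    # Illustrative summaries only: age ~ N(60, 10), binary covariate with p = 0.45.
    data = gaussian_copula_sample(
        1000,
        [[1.0, 0.3], [0.3, 1.0]],
        [normal_quantile(60.0, 10.0), bernoulli_quantile(0.45)],
        seed=42,
    )
    print(sum(r[0] for r in data) / len(data), sum(r[1] for r in data) / len(data))
```

Only the summaries (moments, proportions, correlations) enter the generator, which is what makes this kind of approach compatible with nondisclosive sharing. The artificial rows could then be passed to any standard regression routine in place of the original IPD.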

Keywords:  DataSHIELD; GLM; copula; multivariable; random-effect

Year:  2020        PMID: 31944335     DOI: 10.1002/sim.8470

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.373


  4 in total

1.  dsSynthetic: synthetic data generation for the DataSHIELD federated analysis system.

Authors:  Soumya Banerjee; Tom R P Bishop
Journal:  BMC Res Notes       Date:  2022-06-27

2.  A Privacy-Preserving Distributed Analytics Platform for Health Care Data.

Authors:  Sascha Welten; Yongli Mou; Laurenz Neumann; Mehrshad Jaberansary; Yeliz Yediel Ucer; Toralf Kirsten; Stefan Decker; Oya Beyan
Journal:  Methods Inf Med       Date:  2022-01-17       Impact factor: 1.800

3.  Meta-analysis of continuous outcomes: Using pseudo IPD created from aggregate data to adjust for baseline imbalance and assess treatment-by-baseline modification.

Authors:  Katerina Papadimitropoulou; Theo Stijnen; Richard D Riley; Olaf M Dekkers; Saskia le Cessie
Journal:  Res Synth Methods       Date:  2020-07-25       Impact factor: 5.273

4.  Deep generative models in DataSHIELD.

Authors:  Stefan Lenz; Moritz Hess; Harald Binder
Journal:  BMC Med Res Methodol       Date:  2021-04-03       Impact factor: 4.615