| Literature DB >> 35761417 |
Soumya Banerjee1, Tom R P Bishop2.
Abstract
OBJECTIVE: Platforms such as DataSHIELD allow users to analyse sensitive data remotely, without having full access to the detailed data items (federated analysis). While this feature helps to overcome difficulties with data sharing, it can make it challenging to write code without full visibility of the data. One solution is to generate realistic, non-disclosive synthetic data that can be transferred to the analyst so they can perfect their code without the access limitation. When this process is complete, they can run the code on the real data.Entities:
Keywords: Data harmonisation; Federated analysis; Synthetic data
Mesh:
Year: 2022 PMID: 35761417 PMCID: PMC9235208 DOI: 10.1186/s13104-022-06111-2
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1A schematic of generating synthetic data with dsSynthetic, and using it to write, debug and test code. The final code is then used on real data on the server 1. User requests synthetic copy of real data 2. Synthetic copy of data generated on server side and passed to client 3. User writes code against synthetic data on client side 4. Code run on server side with real data
Fig. 2Writing an analysis script via synthetic data without full access. The steps are the following: 1. User requests synthetic copy of real data 2. Synthetic data generated on server side and passed to client 3. Synthetic data held in DSLite environment that simulates server side DataSHIELD environment 4. Develop code against synthetic data in DSLite (with full access) 5. Run code against real data
Fig. 3Central harmonisation via synthetic data without full access. The steps are the following: 1. User requests synthetic copy of real data 2. Synthetic copy generated on server side and passed to client 3. Synthetic data loaded into JavaScript session. User writes harmonisation code in (MagmaScript) on client side 4. MagmaScript code implemented on server side to run on real data