Privacy-enhancing ETL-processes for biomedical data.

Fabian Prasser, Helmut Spengler, Raffael Bild, Johanna Eicher, Klaus A Kuhn.

Abstract

BACKGROUND: Modern data-driven approaches to medical research require patient-level information at comprehensive depth and breadth. To create the required big datasets, information from disparate sources can be integrated into clinical and translational warehouses. This is typically implemented with Extract, Transform, Load (ETL) processes, which access, harmonize and upload data into the analytics platform.
OBJECTIVE: Privacy protection needs careful consideration when data is pooled or re-used for secondary purposes, and data anonymization is an important protection mechanism. However, common ETL environments do not support anonymization, and common anonymization tools cannot easily be integrated into ETL workflows. The objective of the work described in this article was to bridge this gap.
METHODS: Our main design goals were (1) to base the anonymization process on expert-level risk assessment methodologies, (2) to use transformation methods which preserve both the truthfulness of data and its schematic properties (e.g. data types), (3) to implement a method which is easy to understand and intuitive to configure, and (4) to provide high scalability.
RESULTS: We designed a novel and efficient anonymization process and implemented a plugin for the Pentaho Data Integration (PDI) platform, which enables integrating data anonymization and re-identification risk analyses directly into ETL workflows. By combining multiple instances of the plugin in a single ETL process, data can be protected against multiple threats. The plugin supports very large datasets by leveraging the streaming-based processing model of the underlying platform. We present results of an extensive experimental evaluation and discuss successful applications.
CONCLUSIONS: Our work shows that expert-level anonymization methodologies can be integrated into ETL workflows. Our implementation is available under a non-restrictive open source license and it overcomes several limitations of other data anonymization tools.
Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.
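
To make the ideas of re-identification risk analysis and truthful, schema-preserving transformation more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm and does not use the PDI plugin's API. It assumes a simple risk model in which a record's risk is 1 divided by the number of records sharing its quasi-identifier values, and it assumes record suppression as the only transformation (remaining values are unmodified, so data types are kept). All function names, field names, and the threshold are hypothetical.

```python
from collections import Counter

Record = dict[str, str]


def class_sizes(records: list[Record], quasi_identifiers: list[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def highest_risk(records: list[Record], quasi_identifiers: list[str]) -> float:
    """Assumed risk model: risk of a record = 1 / size of its equivalence class."""
    sizes = class_sizes(records, quasi_identifiers)
    if not sizes:
        return 0.0
    return 1.0 / min(sizes.values())


def suppress_risky(
    records: list[Record], quasi_identifiers: list[str], threshold: float
) -> list[Record]:
    """Drop records whose risk exceeds the threshold; kept records are unchanged."""
    sizes = class_sizes(records, quasi_identifiers)
    kept = []
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        if 1.0 / sizes[key] <= threshold:
            kept.append(r)
    return kept


# Hypothetical example data: "age" and "zip" act as quasi-identifiers.
patients = [
    {"age": "34", "zip": "81675", "diagnosis": "A"},
    {"age": "34", "zip": "81675", "diagnosis": "B"},
    {"age": "51", "zip": "80331", "diagnosis": "C"},
]
protected = suppress_risky(patients, ["age", "zip"], threshold=0.5)
```

In this sketch the third record is unique on its quasi-identifiers and is therefore suppressed, while the two remaining records are passed through without modification. A production ETL step would additionally need to handle data that does not fit in memory, which this in-memory example does not attempt.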

Keywords:  Anonymization; Clinical data warehousing; Extract Transform Load; Privacy

Year:  2019        PMID: 31029266     DOI: 10.1016/j.ijmedinf.2019.03.006

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046

