A scalable software solution for anonymizing high-dimensional biomedical data.

Thierry Meurers¹, Raffael Bild², Kieu-Mi Do³, Fabian Prasser¹.

Abstract

BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming data in a way that protects the privacy of subjects while maintaining a high degree of data quality is challenging, and it is particularly difficult for complex datasets that contain a large number of attributes. In this article, we describe how we extended the open source software ARX to improve its support for high-dimensional biomedical datasets.
FINDINGS: To improve ARX's ability to find optimal transformations for high-dimensional data, we implemented two novel search algorithms. The first is a greedy top-down approach modeled on the bottom-up search that ARX previously implemented. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. In most cases, the novel algorithms outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets.
CONCLUSION: With these additions, we have significantly enhanced ARX's ability to handle high-dimensional data in terms of both processing performance and usability. ARX can thus further facilitate data sharing.
© The Author(s) 2021. Published by Oxford University Press GigaScience.

Keywords:  anonymization; biomedical data; data privacy; data protection; de-identification; genetic algorithm; heuristics; privacy preserving data publishing; software tool

Year: 2021; PMID: 34605868; PMCID: PMC8489190; DOI: 10.1093/gigascience/giab068

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524
