The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.

Florian Kohlmayer, Fabian Prasser, Klaus A Kuhn.

Abstract

OBJECTIVE: With the ARX data anonymization tool, structured biomedical data can be de-identified using syntactic privacy models, such as k-anonymity. Data is transformed with two methods: (a) generalization of attribute values, followed by (b) suppression of data records. The former method results in data that is well suited for analyses by epidemiologists, while the latter method significantly reduces loss of information. Our tool uses an optimal anonymization algorithm that maximizes output utility according to a given measure. To achieve scalability, existing optimal anonymization algorithms exclude parts of the search space by predicting the outcome of data transformations regarding privacy and utility without explicitly applying them to the input dataset. These optimizations cannot be used if data is transformed with generalization and suppression. As optimal data utility and scalability are important for anonymizing biomedical data, we had to develop a novel method.
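The two-step transformation described above can be sketched as follows. This is a minimal illustration, not ARX's implementation or API: the quasi-identifiers (age and ZIP code), the hierarchies `generalize_age` and `generalize_zip`, the helper `anonymize`, the suppression limit `max_suppressed`, and the sample records are all hypothetical.

```python
from collections import Counter


def generalize_age(age: int, level: int) -> str:
    """Hypothetical hierarchy: exact age (0), 10-year band (1), '*' (2)."""
    if level == 0:
        return str(age)
    if level == 1:
        lo = age // 10 * 10
        return f"{lo}-{lo + 9}"
    return "*"


def generalize_zip(zip_code: str, level: int) -> str:
    """Hypothetical hierarchy: mask the last `level` digits."""
    return zip_code if level == 0 else zip_code[:-level] + "*" * level


def anonymize(records, levels, k, max_suppressed):
    """(a) Generalize every quasi-identifier, then (b) suppress the records
    whose equivalence class has fewer than k members.

    Returns the transformed records (None marks a suppressed record), or
    None if more than `max_suppressed` records would have to be removed.
    """
    age_level, zip_level = levels
    generalized = [(generalize_age(a, age_level), generalize_zip(z, zip_level))
                   for a, z in records]
    class_sizes = Counter(generalized)
    output = [qi if class_sizes[qi] >= k else None for qi in generalized]
    if output.count(None) > max_suppressed:
        return None
    return output


records = [(34, "81667"), (37, "81675"), (31, "81669"), (52, "81925"), (58, "81929")]
print(anonymize(records, (1, 1), k=2, max_suppressed=1))
print(anonymize(records, (1, 2), k=2, max_suppressed=0))
```

With levels `(1, 1)`, one record falls into an equivalence class of size 1 and is suppressed; generalizing the ZIP code one step further, `(1, 2)`, yields a 2-anonymous table without any suppression.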
METHODS: In this article, we first confirm experimentally that combining generalization with suppression significantly increases data utility. Next, we prove that, within this coding model, the outcome of data transformations regarding privacy and utility cannot be predicted. As a consequence, existing algorithms fail to deliver optimal data utility. We confirm this finding experimentally. The limitation of previous work can be overcome at the cost of increased computational complexity. However, scalability is important for anonymizing data with user feedback. Consequently, we identify properties of datasets that may be predicted in our context and propose a novel and efficient algorithm. Finally, we evaluate our solution with multiple datasets and privacy models.
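Why predictions about utility break down can be seen in a toy example with a single quasi-identifier. The hierarchy, the loss measure (generalization level divided by the maximum level for each retained record, 1.0 for each suppressed record), the data, and the deliberately generous suppression limit are all assumptions made for illustration; they are not the utility measures evaluated in the article.

```python
from collections import Counter

MAX_LEVEL = 3


def generalize(age: int, level: int) -> str:
    """Hypothetical hierarchy: exact age, 5-year band, 10-year band, '*'."""
    if level == 0:
        return str(age)
    if level == MAX_LEVEL:
        return "*"
    width = 5 if level == 1 else 10
    lo = age // width * width
    return f"{lo}-{lo + width - 1}"


def evaluate(ages, level, k):
    """Return (suppressed records, toy loss) for one generalization level.

    Toy loss: level / MAX_LEVEL per retained record, 1.0 per suppressed record.
    """
    values = [generalize(a, level) for a in ages]
    sizes = Counter(values)
    suppressed = sum(1 for v in values if sizes[v] < k)
    loss = suppressed + (len(ages) - suppressed) * level / MAX_LEVEL
    return suppressed, loss


ages = [30, 31, 36, 41, 47, 52, 58]
k, max_suppressed = 2, 5  # generous limit, because the dataset is tiny
results = {lvl: evaluate(ages, lvl, k) for lvl in range(MAX_LEVEL + 1)}
feasible = [lvl for lvl, (s, _) in results.items() if s <= max_suppressed]

# A search that assumes monotone utility stops at the lowest feasible level...
naive = min(feasible)
# ...but with suppression, every feasible level has to be evaluated.
optimal = min(feasible, key=lambda lvl: results[lvl][1])
print(results)
print("naive:", naive, "optimal:", optimal)
```

Information loss drops from level 1 to level 2 and rises again at level 3, so it is not monotone along the hierarchy. Stopping at the lowest level that satisfies the privacy model, which is optimal when only generalization is used and loss grows with the level, returns level 1 here, while exhaustive evaluation finds level 2.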
RESULTS: This work presents the first thorough investigation of which properties of datasets can be predicted when data is anonymized with generalization and suppression. Our novel approach adapts existing optimization strategies to our context and combines different search methods. The experiments show that our method is able to efficiently solve a broad spectrum of anonymization problems.
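One way to combine prediction with explicit evaluation can be sketched for k-anonymity with a suppression limit. Every equivalence class of a more generalized transformation is a union of classes of a less generalized one, so the number of suppressed records can only shrink and the privacy outcome remains predictable upward in the lattice; information loss does not, so it is computed for every anonymous node. This is a hypothetical, simplified exhaustive search, not the algorithm proposed in the article; the hierarchies, the toy loss measure, `search`, and `MAX_LEVELS` are assumptions.

```python
from collections import Counter
from itertools import product

MAX_LEVELS = (2, 5)  # hypothetical maximum levels for (age, ZIP code)


def gen_age(age: int, level: int) -> str:
    """Hypothetical hierarchy: exact age, 10-year band, '*'."""
    if level == 0:
        return str(age)
    if level == 1:
        lo = age // 10 * 10
        return f"{lo}-{lo + 9}"
    return "*"


def gen_zip(zip_code: str, level: int) -> str:
    """Hypothetical hierarchy: mask the last `level` digits."""
    return zip_code if level == 0 else zip_code[:-level] + "*" * level


def search(records, k, max_suppressed):
    """Bottom-up exhaustive lattice search (a sketch).

    Privacy is predicted: a node with an anonymous direct predecessor is
    tagged anonymous without a check. Utility is always computed, because
    with suppression it is not monotone.
    """
    anonymous = set()
    best_node, best_loss, skipped_checks = None, float("inf"), 0
    # Sorting by total level guarantees predecessors are visited first.
    nodes = sorted(product(range(MAX_LEVELS[0] + 1), range(MAX_LEVELS[1] + 1)), key=sum)
    for a_lvl, z_lvl in nodes:
        values = [(gen_age(a, a_lvl), gen_zip(z, z_lvl)) for a, z in records]
        sizes = Counter(values)
        suppressed = sum(1 for v in values if sizes[v] < k)
        if (a_lvl - 1, z_lvl) in anonymous or (a_lvl, z_lvl - 1) in anonymous:
            anonymous.add((a_lvl, z_lvl))  # predicted; privacy check skipped
            skipped_checks += 1
        elif suppressed <= max_suppressed:
            anonymous.add((a_lvl, z_lvl))  # confirmed by an explicit check
        else:
            continue  # not anonymous, so its utility is irrelevant
        # Toy loss: mean normalized level per retained record, 1.0 per suppressed one.
        gen_loss = (a_lvl / MAX_LEVELS[0] + z_lvl / MAX_LEVELS[1]) / 2
        loss = suppressed + (len(records) - suppressed) * gen_loss
        if loss < best_loss:
            best_node, best_loss = (a_lvl, z_lvl), loss
    return best_node, best_loss, skipped_checks


records = [(34, "81667"), (37, "81675"), (31, "81669"), (52, "81925"), (58, "81929")]
print(search(records, k=2, max_suppressed=1))
```

On the sample data, ten nodes are anonymous, and only `(1, 1)` is confirmed by an explicit check; the other nine are tagged by prediction. The best node, `(1, 2)`, is a generalization of `(1, 1)` with lower loss, which a search that stops at the first anonymous node on each path would miss. In this toy, the privacy check costs little once class sizes are known, so the tagging only illustrates the logic.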
CONCLUSION: Our work shows that implementing syntactic privacy models is challenging and that existing algorithms are not well suited for anonymizing data with transformation models which are more complex than generalization alone. As such models have been recommended for use in the biomedical domain, our results are of general relevance for de-identifying structured biomedical data.
Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

Keywords:  Anonymization; De-identification; Optimization; Privacy; Security; Statistical disclosure control

Year:  2015        PMID: 26385376     DOI: 10.1016/j.jbi.2015.09.007

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  4 in total

1.  Utility-preserving anonymization for health data publishing.

Authors:  Hyukki Lee; Soohyung Kim; Jong Wook Kim; Yon Dohn Chung
Journal:  BMC Med Inform Decis Mak       Date:  2017-07-11       Impact factor: 2.796

2.  Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review.

Authors:  Raphaël Chevrier; Vasiliki Foufi; Christophe Gaudet-Blavignac; Arnaud Robert; Christian Lovis
Journal:  J Med Internet Res       Date:  2019-05-31       Impact factor: 5.428

3.  Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation.

Authors:  MinDong Sung; Dongchul Cha; Yu Rang Park
Journal:  JMIR Med Inform       Date:  2021-11-08

4.  Efficient and effective pruning strategies for health data de-identification.

Authors:  Fabian Prasser; Florian Kohlmayer; Klaus A Kuhn
Journal:  BMC Med Inform Decis Mak       Date:  2016-04-30       Impact factor: 2.796
