The Importance of Context: Risk-based De-identification of Biomedical Data.

Fabian Prasser, Florian Kohlmayer, Klaus A Kuhn.

Abstract

BACKGROUND: Data sharing is a central aspect of modern biomedical research. It raises significant privacy concerns, and data often need to be protected from re-identification. De-identification methods transform datasets in such a way that it becomes extremely difficult to link their records to identified individuals. The most important challenge in this process is to find an adequate balance between the gain in privacy and the loss of data quality.
OBJECTIVES: Accurately measuring the risk of re-identification in a specific data sharing scenario is an important aspect of data de-identification. Overestimating risks significantly degrades data quality, while underestimating them leaves data prone to attacks on privacy. Several models have been proposed for measuring risks, but generic methods for risk-based data de-identification are lacking. The aim of the work described in this article was to bridge this gap and to show how the quality of de-identified datasets can be improved by using risk models to tailor the process of de-identification to a concrete context.
METHODS: We implemented a generic de-identification process and several models for measuring re-identification risks in the ARX de-identification tool for biomedical data. By integrating the methods into an existing framework, we were able to automatically transform datasets in such a way that information loss is minimized while re-identification risks are guaranteed to meet a user-defined threshold. We performed an extensive experimental evaluation to analyze how different risk models, and different assumptions about the goals and background knowledge of an attacker, affect the quality of de-identified data.
RESULTS: The results of our experiments show that data quality can be improved significantly by using risk models for data de-identification. On a scale where 100% represents the original input dataset and 0% represents a dataset from which all information has been removed, the loss of information content could be reduced by up to 10% when protecting datasets against strong adversaries and by up to 24% when protecting datasets against weaker adversaries.
CONCLUSIONS: The methods studied in this article are well suited for protecting sensitive biomedical data and our implementation is available as open-source software. Our results can be used by data custodians to increase the information content of de-identified data by tailoring the process to a specific data sharing scenario. Improving data quality is important for fostering the adoption of de-identification methods in biomedical research.
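
The risk-threshold idea described under METHODS can be illustrated with a minimal Python sketch. It is not the ARX implementation or its API. It assumes two commonly used record-level measures over equivalence classes (groups of records that share the same quasi-identifier values): the highest risk, 1 divided by the size of the smallest class, and the average risk, the number of classes divided by the number of records. The toy records, the attribute names `age` and `zip`, the interval widths, and the helper functions are all hypothetical, and the search simply returns the least generalized candidate that meets the threshold.

```python
from collections import Counter


def risks(records, quasi_identifiers):
    """Return (highest_risk, average_risk) over equivalence classes.

    Assumed definitions (not taken from the abstract):
    highest = 1 / size of the smallest class,
    average = number of classes / number of records.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    highest = 1.0 / min(class_sizes.values())
    average = len(class_sizes) / len(records)
    return highest, average


def generalize_age(record, width):
    """Replace an exact age with an interval of the given width."""
    lo = (record["age"] // width) * width
    return {**record, "age": f"{lo}-{lo + width - 1}"}


def deidentify(records, quasi_identifiers, threshold, measure="highest",
               widths=(1, 5, 10, 20, 50, 100)):
    """Return (width, records) for the narrowest interval meeting the threshold.

    Widths are tried from least to most generalized, so the first hit is
    the candidate with the least information loss in this toy search space.
    """
    for width in widths:
        candidate = [generalize_age(r, width) for r in records]
        highest, average = risks(candidate, quasi_identifiers)
        risk = highest if measure == "highest" else average
        if risk <= threshold:
            return width, candidate
    return None, []  # no candidate meets the threshold


# Hypothetical toy data: two quasi-identifiers, six records.
records = [
    {"age": 34, "zip": "81675"},
    {"age": 36, "zip": "81675"},
    {"age": 31, "zip": "81675"},
    {"age": 52, "zip": "80331"},
    {"age": 58, "zip": "80331"},
    {"age": 47, "zip": "80331"},
]
qi = ["age", "zip"]

strict_width, _ = deidentify(records, qi, threshold=0.5, measure="highest")
relaxed_width, _ = deidentify(records, qi, threshold=0.5, measure="average")
print(strict_width, relaxed_width)
```

On this toy data, the same 0.5 threshold is met with 10-year age intervals under the average-risk measure but requires 20-year intervals under the highest-risk measure, which mirrors the abstract's point that weaker assumptions about the adversary allow more information to be retained.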

Keywords:  Information science; computer security; data anonymization; data protection; data quality; risk

Year:  2016        PMID: 27322502     DOI: 10.3414/ME16-01-0012

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  5 in total

1.  Ethics and Epistemology in Big Data Research.

Authors:  Wendy Lipworth; Paul H Mason; Ian Kerridge; John P A Ioannidis
Journal:  J Bioeth Inq       Date:  2017-03-20       Impact factor: 1.352

2.  Secondary Use of Patient Data: Review of the Literature Published in 2016. (Review)

Authors:  D R Schlegel; G Ficheur
Journal:  Yearb Med Inform       Date:  2017-09-11

3.  A scalable software solution for anonymizing high-dimensional biomedical data.

Authors:  Thierry Meurers; Raffael Bild; Kieu-Mi Do; Fabian Prasser
Journal:  Gigascience       Date:  2021-10-04       Impact factor: 6.524

4.  Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. (Review)

Authors:  Raphaël Chevrier; Vasiliki Foufi; Christophe Gaudet-Blavignac; Arnaud Robert; Christian Lovis
Journal:  J Med Internet Res       Date:  2019-05-31       Impact factor: 5.428

5.  Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example.

Authors:  Patrick J Thoral; Jan M Peppink; Ronald H Driessen; Eric J G Sijbrands; Erwin J O Kompanje; Lewis Kaplan; Heatherlee Bailey; Jozef Kesecioglu; Maurizio Cecconi; Matthew Churpek; Gilles Clermont; Mihaela van der Schaar; Ari Ercole; Armand R J Girbes; Paul W G Elbers
Journal:  Crit Care Med       Date:  2021-06-01       Impact factor: 9.296
