The Importance of Context: Risk-based De-identification of Biomedical Data.

Fabian Prasser, Florian Kohlmayer, Klaus A Kuhn.

Abstract

BACKGROUND: Data sharing is a central aspect of modern biomedical research. It raises significant privacy concerns, and data often need to be protected from re-identification. De-identification methods transform datasets in such a way that it becomes extremely difficult to link their records to identified individuals. The most important challenge in this process is to find an adequate balance between an increase in privacy and a decrease in data quality.
OBJECTIVES: Accurately measuring the risk of re-identification in a specific data sharing scenario is an important aspect of data de-identification. Overestimation of risks will significantly deteriorate data quality, while underestimation will leave data prone to attacks on privacy. Several models have been proposed for measuring risks, but there is a lack of generic methods for risk-based data de-identification. The aim of the work described in this article was to bridge this gap and to show how the quality of de-identified datasets can be improved by using risk models to tailor the process of de-identification to a concrete context.
METHODS: We implemented a generic de-identification process and several models for measuring re-identification risks in the ARX de-identification tool for biomedical data. By integrating the methods into an existing framework, we were able to automatically transform datasets in such a way that information loss is minimized while re-identification risks are guaranteed to meet a user-defined threshold. We performed an extensive experimental evaluation to analyze how different risk models, and different assumptions about the goals and background knowledge of an attacker, affect the quality of de-identified data.
RESULTS: The results of our experiments show that data quality can be improved significantly by using risk models for data de-identification. On a scale where 100 % represents the original input dataset and 0 % represents a dataset from which all information has been removed, the loss of information content could be reduced by up to 10 % when protecting datasets against strong adversaries and by up to 24 % when protecting datasets against weaker adversaries.
CONCLUSIONS: The methods studied in this article are well suited for protecting sensitive biomedical data and our implementation is available as open-source software. Our results can be used by data custodians to increase the information content of de-identified data by tailoring the process to a specific data sharing scenario. Improving data quality is important for fostering the adoption of de-identification methods in biomedical research.
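The idea of checking a dataset against a user-defined risk threshold can be illustrated with a minimal sketch. The abstract does not name the risk models that were implemented, so this example assumes the widely described prosecutor model, in which the risk of a record is 1 divided by the number of records that share its quasi-identifier values. The function names, field names, and sample records are hypothetical and are not taken from ARX or from the article.

```python
from collections import Counter


def prosecutor_risks(records, quasi_identifiers):
    """Return (highest_risk, average_risk) under the prosecutor model.

    Assumption: the attacker knows the target is in the dataset, so the
    risk of a record is 1 / (size of its equivalence class).
    """
    if not records:
        raise ValueError("records must not be empty")
    # Group records by their combination of quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    # The smallest class determines the highest per-record risk.
    highest = 1.0 / min(class_sizes.values())
    # Averaging 1/size over all records equals (#classes / #records).
    average = len(class_sizes) / len(records)
    return highest, average


def meets_threshold(records, quasi_identifiers, threshold):
    """Check whether the highest per-record risk is within the threshold."""
    highest, _ = prosecutor_risks(records, quasi_identifiers)
    return highest <= threshold


# Hypothetical, already generalized sample data.
records = [
    {"age": "30-39", "zip": "816**", "diagnosis": "A"},
    {"age": "30-39", "zip": "816**", "diagnosis": "B"},
    {"age": "40-49", "zip": "817**", "diagnosis": "C"},
    {"age": "40-49", "zip": "817**", "diagnosis": "A"},
    {"age": "40-49", "zip": "817**", "diagnosis": "B"},
]
quasi_identifiers = ["age", "zip"]

highest, average = prosecutor_risks(records, quasi_identifiers)
print(f"highest risk: {highest:.2f}, average risk: {average:.2f}")
print("meets 0.5 threshold:", meets_threshold(records, quasi_identifiers, 0.5))
```

A risk-based process in the sense of the abstract would search for the transformation with the least information loss among those for which such a check succeeds. That search is not shown here.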

Keywords: Information science; computer security; data anonymization; data protection; data quality; risk

Year: 2016 PMID: 27322502 DOI: 10.3414/ME16-01-0012

Source DB: PubMed Journal: Methods Inf Med ISSN: 0026-1270 Impact factor: 2.176

Cited

5 in total

1. Ethics and Epistemology in Big Data Research.

Authors: Wendy Lipworth; Paul H Mason; Ian Kerridge; John P A Ioannidis
Journal: J Bioeth Inq Date: 2017-03-20 Impact factor: 1.352

2. Secondary Use of Patient Data: Review of the Literature Published in 2016. (Review)

Authors: D R Schlegel; G Ficheur
Journal: Yearb Med Inform Date: 2017-09-11

3. A scalable software solution for anonymizing high-dimensional biomedical data.

Authors: Thierry Meurers; Raffael Bild; Kieu-Mi Do; Fabian Prasser
Journal: Gigascience Date: 2021-10-04 Impact factor: 6.524

4. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. (Review)

Authors: Raphaël Chevrier; Vasiliki Foufi; Christophe Gaudet-Blavignac; Arnaud Robert; Christian Lovis
Journal: J Med Internet Res Date: 2019-05-31 Impact factor: 5.428

5. Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example.

Authors: Patrick J Thoral; Jan M Peppink; Ronald H Driessen; Eric J G Sijbrands; Erwin J O Kompanje; Lewis Kaplan; Heatherlee Bailey; Jozef Kesecioglu; Maurizio Cecconi; Matthew Churpek; Gilles Clermont; Mihaela van der Schaar; Ari Ercole; Armand R J Girbes; Paul W G Elbers
Journal: Crit Care Med Date: 2021-06-01 Impact factor: 9.296
