Utility-driven assessment of anonymized data via clustering.

Maria Eugénia Ferrão, Paula Prata, Paulo Fazendeiro.

Abstract

In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset covering an entire Portuguese cohort of higher education Law students. Several clustering scenarios obtained from anonymized data were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess the utility of anonymized data by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved after data anonymization. The results suggest that, for datasets of low dimensionality/cardinality, the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
© 2022. The Author(s).

Year:  2022        PMID: 35907927      PMCID: PMC9339002          DOI: 10.1038/s41597-022-01561-6

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   8.501


