Hasan B Kartal, Xiao-Bai Li.
Abstract
This study concerns the risks of privacy disclosure when sharing and releasing a dataset in which each individual may be associated with multiple records. Existing data privacy approaches and policies typically assume that each individual in a shared dataset corresponds to a single record, leading to an underestimation of the disclosure risks in scenarios with multiple records per person. We propose two novel measures of privacy disclosure to arrive at a more appropriate assessment of disclosure risks. The first measure assesses individual-record disclosure risk based upon the frequency distribution of individuals' occurrences. The second measure assesses sensitive-attribute disclosure risk based upon the number of individuals affiliated with a sensitive value. We show that the two proposed disclosure measures generalize the well-known k-anonymity and l-diversity measures, respectively, and work for scenarios with either a single record or multiple records per person. We have developed an efficient computational procedure that integrates the two proposed measures and a data quality measure to anonymize the data with multiple records per person when sharing and releasing the data for research and analytics. The results of the experimental evaluation using real-world data demonstrate the advantage of the proposed approach over existing techniques for protecting privacy while preserving data quality.
Keywords: Data Privacy; Gini Index; k-Anonymity; kd-Trees; l-Diversity
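The abstract's central observation, that counting records instead of people understates disclosure risk, can be illustrated with a small sketch. This is not the paper's pair of measures, whose exact definitions are not given here. It only contrasts a record-level group size (what classic k-anonymity counts) with a person-level count, and a distinct-value count (the simplest form of l-diversity) with the number of individuals behind each sensitive value. The toy records, the field layout, and the names `summarize`, `record_k`, `person_k`, `distinct_values`, and `min_persons_per_value` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (person_id, quasi_identifiers, sensitive_value).
# Person "p1" contributes three records to the first group.
records = [
    ("p1", ("30-39", "537**"), "flu"),
    ("p1", ("30-39", "537**"), "flu"),
    ("p1", ("30-39", "537**"), "asthma"),
    ("p2", ("30-39", "537**"), "flu"),
    ("p3", ("40-49", "538**"), "diabetes"),
    ("p4", ("40-49", "538**"), "flu"),
    ("p5", ("40-49", "538**"), "asthma"),
]


def summarize(rows):
    """Per quasi-identifier group, compare record-level and person-level counts.

    Illustrative only: these are simple counts, not the disclosure
    measures proposed in the paper.
    """
    record_count = defaultdict(int)
    persons = defaultdict(set)
    persons_by_value = defaultdict(lambda: defaultdict(set))

    for person_id, quasi_ids, sensitive in rows:
        record_count[quasi_ids] += 1
        persons[quasi_ids].add(person_id)
        persons_by_value[quasi_ids][sensitive].add(person_id)

    summary = {}
    for quasi_ids in record_count:
        by_value = persons_by_value[quasi_ids]
        summary[quasi_ids] = {
            # Group size as classic k-anonymity would count it.
            "record_k": record_count[quasi_ids],
            # Number of distinct individuals actually hiding in the group.
            "person_k": len(persons[quasi_ids]),
            # Distinct sensitive values (simplest l-diversity count).
            "distinct_values": len(by_value),
            # Fewest individuals sharing any one sensitive value.
            "min_persons_per_value": min(len(p) for p in by_value.values()),
        }
    return summary


if __name__ == "__main__":
    for group, stats in summarize(records).items():
        print(group, stats)
```

In this toy data the first group looks 4-anonymous by record count, yet only two individuals are in it, and the value "asthma" belongs to a single person. That gap is the kind of underestimation the abstract attributes to the single-record-per-person assumption.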
Year: 2020 PMID: 34321961 PMCID: PMC8315096 DOI: 10.17705/1jais.00643
Source DB: PubMed Journal: J Assoc Inf Syst ISSN: 1536-9323 Impact factor: 5.149