
An Entropy Approach to Disclosure Risk Assessment: Lessons from Real Applications and Simulated Domains.

Edoardo M Airoldi, Xue Bai, Bradley A Malin.

Abstract

We live in an increasingly mobile world, which leads to the duplication of information across domains. Though organizations attempt to obscure the identities of their constituents when sharing information for worthwhile purposes, such as basic research, the uncoordinated nature of such environments can lead to privacy vulnerabilities. For instance, disparate healthcare providers can collect information on the same patient. Federal policy requires that such providers share "de-identified" sensitive data, such as biomedical (e.g., clinical and genomic) records. But at the same time, such providers can share identified information, devoid of sensitive biomedical data, for administrative functions. On a provider-by-provider basis, the biomedical and identified records appear unrelated; however, links can be established when multiple providers' databases are studied jointly. The problem, known as trail disclosure, is a generalized phenomenon and occurs because an individual's location access pattern can be matched across the shared databases. Due to technical and legal constraints, it is often difficult to coordinate between providers, and thus it is critical to assess the disclosure risk in distributed environments so that we can develop techniques to mitigate such risks. Research on privacy protection has so far focused on developing technologies to suppress or encrypt identifiers associated with sensitive information. There is a growing body of work on the formal assessment of the disclosure risk of database entries in publicly shared databases, but less attention has been paid to the distributed setting. In this research, we review the trail disclosure problem in several domains with known vulnerabilities and show that disclosure risk is influenced by the distribution of how people visit service providers. Based on empirical evidence, we propose an entropy metric for assessing such risk in shared databases prior to their release. This metric assesses risk by leveraging the statistical characteristics of a visit distribution, as opposed to person-level data. It is computationally efficient and superior to existing risk assessment methods, which rely on ad hoc assessments that are often computationally expensive and unreliable. We evaluate our approach on a range of location access patterns in simulated environments. Our results demonstrate that the approach is effective at estimating trail disclosure risks and that the amount of self-information contained in a distributed system is one of the main driving factors.
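
The abstract does not state the formula behind the entropy metric, so the following is only a minimal sketch of the general idea: computing the Shannon entropy of an aggregate visit distribution without touching person-level records. The function name `visit_entropy` and the example counts are hypothetical, and treating the visit distribution as "number of visits per provider" is an assumption made for illustration, not the authors' definition.

```python
import math


def visit_entropy(visit_counts):
    """Shannon entropy, in bits, of a distribution given as nonnegative counts.

    Assumption: visit_counts[i] is the number of visits recorded at
    provider i. This is an illustrative reading of "visit distribution",
    not the metric defined in the paper.
    """
    total = sum(visit_counts)
    if total <= 0:
        raise ValueError("at least one visit is required")
    entropy = 0.0
    for count in visit_counts:
        if count > 0:  # zero-count providers contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical aggregate counts for four providers.
evenly_spread = [25, 25, 25, 25]   # visits spread evenly across providers
concentrated = [97, 1, 1, 1]       # nearly all visits at one provider

print(visit_entropy(evenly_spread))
print(visit_entropy(concentrated))
```

Only aggregate counts enter the computation, which is why a metric of this kind can be evaluated before release without inspecting individual trails. How the entropy value maps to an actual disclosure risk level is specified in the paper, not in this sketch.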


Year:  2011        PMID: 21647242      PMCID: PMC3107517          DOI: 10.1016/j.dss.2010.11.014

Source DB:  PubMed          Journal:  Decis Support Syst        ISSN: 0167-9236            Impact factor:   5.795

