
A computational model to protect patient data from location-based re-identification.

Bradley Malin.

Abstract

OBJECTIVE: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. We present a solution to a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's location visit pattern, or "trail", can link seemingly anonymous DNA back to the patient's identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trail re-identification. We develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are impregnable to trail re-identification.
METHODS AND MATERIALS: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to at least k identified records. We present several algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms with the trails of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is evaluated using metrics based on real-world applications, including the number of suppressed records and the number of organizations that disclose records.
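The k-unlinkability condition described above can be illustrated with a minimal sketch. The abstract does not define when a DNA trail is "linkable" to an identified record, so the sketch assumes a simple rule: a DNA trail is linkable to an identified trail when every location in the DNA trail also appears in the identified trail. The function names, record identifiers, and hospital labels are hypothetical and are not taken from the paper; this is not the authors' protection algorithm, only a check of the stated condition.

```python
from typing import Dict, FrozenSet, List

# A trail is modeled as the set of locations (e.g. hospitals) at which a record appears.
Trail = FrozenSet[str]


def linkable_count(dna_trail: Trail, identified_trails: Dict[str, Trail]) -> int:
    """Count identified records whose trail contains every location in dna_trail.

    ASSUMPTION: linkability is a subset test; the abstract does not define it.
    """
    return sum(1 for trail in identified_trails.values() if dna_trail <= trail)


def violating_records(
    dna_trails: Dict[str, Trail],
    identified_trails: Dict[str, Trail],
    k: int,
) -> List[str]:
    """Return the ids of DNA records linkable to fewer than k identified records."""
    return sorted(
        record_id
        for record_id, trail in dna_trails.items()
        if linkable_count(trail, identified_trails) < k
    )


def is_k_unlinkable(
    dna_trails: Dict[str, Trail],
    identified_trails: Dict[str, Trail],
    k: int,
) -> bool:
    """True when every DNA trail is linkable to at least k identified records."""
    return not violating_records(dna_trails, identified_trails, k)


# Hypothetical toy data: four identified patients and four DNA records.
identified = {
    "alice": frozenset({"H1", "H2"}),
    "bob": frozenset({"H1", "H2"}),
    "carol": frozenset({"H1"}),
    "dave": frozenset({"H1", "H3"}),
}
dna = {
    "d1": frozenset({"H1", "H2"}),
    "d2": frozenset({"H1"}),
    "d3": frozenset({"H1", "H3"}),
    "d4": frozenset({"H1"}),
}

# Records that break 2-unlinkability under the subset assumption.
flagged = violating_records(dna, identified, k=2)
```

In this toy data only `d3` is flagged for k = 2, because its trail matches a single identified trail. Under the same subset assumption, withholding `d3` at `H3` alone would leave the trail `{"H1"}`, which matches all four identified records, so the record would not need to be suppressed everywhere.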
RESULTS: Our experiments indicate that it is unnecessary to suppress all patient records that initially violate k-unlinkability. Rather, only portions of the trails need to be suppressed. For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naïve solution would suppress the 52% of the DNA records that violate 5-unlinkability. However, by applying our protection algorithms, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied.
CONCLUSION: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient-specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and formulate policy accordingly.

Year:  2007        PMID: 17544262     DOI: 10.1016/j.artmed.2007.04.002

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  7 in total

1.  Implementing partnership-driven clinical federated electronic health record data sharing networks.

Authors:  Kari A Stephens; Nicholas Anderson; Ching-Ping Lin; Hossein Estiri
Journal:  Int J Med Inform       Date:  2016-06-01       Impact factor: 4.046

2.  Identifiability in biobanks: models, measures, and mitigation strategies. (Review)

Authors:  Bradley Malin; Grigorios Loukides; Kathleen Benitez; Ellen Wright Clayton
Journal:  Hum Genet       Date:  2011-07-08       Impact factor: 4.132

3.  The disclosure of diagnosis codes can breach research participants' privacy.

Authors:  Grigorios Loukides; Joshua C Denny; Bradley Malin
Journal:  J Am Med Inform Assoc       Date:  2010 May-Jun       Impact factor: 4.497

4.  An Entropy Approach to Disclosure Risk Assessment: Lessons from Real Applications and Simulated Domains.

Authors:  Edoardo M Airoldi; Xue Bai; Bradley A Malin
Journal:  Decis Support Syst       Date:  2011-04-01       Impact factor: 5.795

5.  Genetic data sharing and privacy.

Authors:  Marco D Sorani; John K Yue; Sourabh Sharma; Geoffrey T Manley; Adam R Ferguson
Journal:  Neuroinformatics       Date:  2015-01

6.  Implementation of a deidentified federated data network for population-based cohort discovery.

Authors:  Nicholas Anderson; Aaron Abend; Aaron Mandel; Estella Geraghty; Davera Gabriel; Rob Wynden; Michael Kamerick; Kent Anderson; Julie Rainwater; Peter Tarczy-Hornoch
Journal:  J Am Med Inform Assoc       Date:  2011-08-26       Impact factor: 4.497

7.  Health information security: a case study of three selected medical centers in Iran.

Authors:  Nafiseh Hajrahimi; Sayed Mehdi Hejazi Dehaghani; Abbas Sheikhtaheri
Journal:  Acta Inform Med       Date:  2013-03