Kathleen Benitez (1), Bradley Malin. (1) Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee 37203, USA.
Abstract
OBJECTIVE: Many healthcare organizations follow data protection policies that specify which patient identifiers must be suppressed before "de-identified" records can be shared. Such policies, however, are often applied without knowledge of the risk of "re-identification". The goals of this work are: (1) to estimate re-identification risk for the data sharing policies of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule; and (2) to evaluate the risk of a specific re-identification attack that uses voter registration lists.
MEASUREMENTS: We define several risk metrics: (1) the expected number of re-identifications; (2) the estimated proportion of a population in a group of size g or less; and (3) the monetary cost per re-identification. For each US state, we estimate the risk posed to hypothetical datasets protected by the HIPAA Safe Harbor and Limited Dataset policies, first by an attacker with full knowledge of patient identifiers and then by an attacker with limited knowledge in the form of voter registries.
RESULTS: The percentage of a state's population estimated to be vulnerable to unique re-identification (i.e., g=1) ranges from 0.01% to 0.25% when protected via Safe Harbor and from 10% to 60% when protected via Limited Datasets. In the voter attack, this number drops for many states, and for some states it is 0%, because the availability of voter registries varies in the real world. We also find that re-identification cost ranges from $0 to $17,000, further confirming risk variability.
CONCLUSIONS: This work illustrates that blanket protection policies, such as Safe Harbor, leave different organizations vulnerable to re-identification at different rates. It provides justification for performing re-identification risk estimates locally before sharing data.
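The three metrics named under MEASUREMENTS can be illustrated with a minimal Python sketch. It is not the authors' estimator: the function names, the toy records, and the registry price are hypothetical, and the expected-count formula assumes an attacker who guesses uniformly at random within each group of records sharing the same quasi-identifier values (so each person in a group of size n is re-identified with probability 1/n).

```python
from collections import Counter


def proportion_in_small_groups(records, g):
    """Fraction of records whose quasi-identifier group has size g or less."""
    if not records:
        raise ValueError("records must be non-empty")
    sizes = Counter(records)  # group size per distinct quasi-identifier tuple
    at_risk = sum(n for n in sizes.values() if n <= g)
    return at_risk / len(records)


def expected_reidentifications(records):
    """Expected correct matches under uniform guessing within each group.

    Assumption: a group of size n contributes n * (1/n) = 1 expected match,
    so the total equals the number of distinct groups.
    """
    return float(len(Counter(records)))


def cost_per_reidentification(registry_price, expected_matches):
    """Price paid for the identified resource divided by expected matches."""
    if expected_matches <= 0:
        return float("inf")  # nothing can be re-identified at any price
    return registry_price / expected_matches


# Hypothetical toy population: (birth year, sex, 3-digit ZIP prefix).
records = [
    ("1970", "M", "372"),
    ("1970", "M", "372"),
    ("1970", "M", "372"),
    ("1985", "F", "372"),
    ("1992", "F", "370"),
]

unique_share = proportion_in_small_groups(records, g=1)
expected = expected_reidentifications(records)
cost = cost_per_reidentification(300.0, expected)  # 300.0 is a made-up price
print(unique_share, expected, cost)
```

In this toy population two of five records are unique on the chosen fields, so the g=1 proportion is 0.4. Coarsening the fields (for example, dropping the ZIP prefix) would merge groups and lower every metric, which is the effect a stricter policy aims for.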