Weiyi Xia (1), Raymond Heatherly (2), Xiaofeng Ding (3), Jiuyong Li (4), Bradley A Malin (5).
(1) Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA; weiyi.xia@vanderbilt.edu.
(2) Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
(3) Huazhong University of Science and Technology, Wuhan, China.
(4) School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, South Australia, Australia.
(5) Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
Abstract
OBJECTIVE: The Health Insurance Portability and Accountability Act Privacy Rule enables healthcare organizations to share de-identified data via two routes. They can either 1) show that re-identification risk is small (e.g., via a formal model, such as k-anonymity) with respect to an anticipated recipient or 2) apply a rule-based policy (i.e., Safe Harbor) that enumerates attributes to be altered (e.g., dates to years). The latter is often invoked because it is interpretable, but it fails to tailor protections to the capabilities of the recipient. This paper shows that rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically trade off between these goals. METHODS: We extend an algorithm to efficiently compose an R-U frontier using a lattice of policy options. Risk is proportional to the number of patients to which a record corresponds, while utility is proportional to the similarity of the original and de-identified distributions. We allow our method to search 20 000 rule-based policies (out of 2^700) and compare the resulting frontier with k-anonymous solutions and Safe Harbor using the demographics of 10 U.S. states. RESULTS: The results demonstrate that the rule-based frontier 1) consists, on average, of 5000 policies, 2% of which enable better utility with less risk than Safe Harbor, and 2) covers a broader spectrum of utility and risk than k-anonymity frontiers. CONCLUSIONS: R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to the anticipated needs and trustworthiness of recipients.
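To make the notion of an R-U frontier concrete, the following is a minimal sketch of how a frontier could be extracted once each candidate policy has been scored. It is not the authors' lattice-based search algorithm: it is a brute-force filter that keeps every policy not dominated by another, that is, no other policy has both lower-or-equal risk and higher-or-equal utility with a strict improvement on at least one. The `Policy` type, the policy names, and the risk and utility values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Policy(NamedTuple):
    name: str       # hypothetical label for a rule-based policy
    risk: float     # re-identification risk score (lower is better)
    utility: float  # utility score (higher is better)


def dominates(a: Policy, b: Policy) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.risk <= b.risk and a.utility >= b.utility
    strictly_better = a.risk < b.risk or a.utility > b.utility
    return no_worse and strictly_better


def ru_frontier(policies: List[Policy]) -> List[Policy]:
    """Return the non-dominated policies, sorted by increasing risk."""
    frontier = [
        p for p in policies
        if not any(dominates(q, p) for q in policies)
    ]
    return sorted(frontier, key=lambda p: p.risk)


# Illustrative scores only; these are not results from the paper.
candidates = [
    Policy("A", 0.10, 0.40),
    Policy("B", 0.20, 0.70),
    Policy("C", 0.25, 0.60),  # dominated by B (more risk, less utility)
    Policy("D", 0.50, 0.90),
    Policy("E", 0.60, 0.85),  # dominated by D (more risk, less utility)
]

frontier = ru_frontier(candidates)
print([p.name for p in frontier])
```

This pairwise comparison is quadratic in the number of policies, so it only suits small candidate sets. Searching a space as large as the one described in the abstract would need a guided strategy such as the lattice search the authors mention.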