Weiyi Xia1,2, Yongtai Liu2,3, Zhiyu Wan2,3, Yevgeniy Vorobeychik2,4, Murat Kantarcioglu5, Steve Nyemba1,2, Ellen Wright Clayton2,6,7,8, Bradley A Malin1,2,3,9.
1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
2. Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA.
4. Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, Missouri, USA.
5. Department of Computer Science, University of Texas at Dallas, Dallas, Texas, USA.
6. Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
7. Law School, Vanderbilt University, Nashville, Tennessee, USA.
8. Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
9. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Abstract
OBJECTIVE: Re-identification risk methods for biomedical data often assume a worst case, in which attackers know all identifiable features (eg, age and race) about a subject. Yet, worst-case adversarial modeling can overestimate risk and induce heavy editing of shared data. The objective of this study is to introduce a framework for assessing the risk considering the attacker's resources and capabilities.

MATERIALS AND METHODS: We integrate 3 established risk measures (ie, prosecutor, journalist, and marketer risks) and compute re-identification probabilities for data subjects. This probability is dependent on an attacker's capabilities (eg, ability to obtain external identified resources) and the subject's decision on whether to reveal their participation in a dataset. We illustrate the framework through case studies using data from over 1 000 000 patients from Vanderbilt University Medical Center and show how re-identification risk changes when attackers are pragmatic and use 2 known resources for attack: (1) voter registration lists and (2) social media posts.

RESULTS: Our framework illustrates that the risk is substantially smaller in the pragmatic scenarios than in the worst case. Our experiments yield a median worst-case risk of 0.987 (where 0 is least risky and 1 is most risky); however, the median reduction in risk was 90.1% in the voter registration scenario and 100% in the social media posts scenario. Notably, these observations hold true for a wide range of adversarial capabilities.

CONCLUSIONS: This research illustrates that re-identification risk is situationally dependent and that appropriate adversarial modeling may permit biomedical data sharing on a wider scale than is currently the case.
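As a rough illustration of the 3 risk measures named above, the following Python sketch uses their commonly cited textbook definitions based on equivalence classes (groups of records sharing the same quasi-identifier values). It is not the framework introduced in this study, which additionally accounts for attacker capabilities and subject disclosure decisions. The function names, the toy records, and the choice of age and race as quasi-identifiers are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical quasi-identifiers; the abstract mentions age and race as examples.
QI = ("age", "race")


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def prosecutor_risk(record, sample_sizes, quasi_identifiers):
    """Attacker knows the target is in the shared dataset: 1 / class size in the dataset."""
    key = tuple(record[q] for q in quasi_identifiers)
    return 1.0 / sample_sizes[key]


def journalist_risk(record, population_sizes, quasi_identifiers):
    """Attacker does not know who is in the dataset: 1 / class size in the population."""
    key = tuple(record[q] for q in quasi_identifiers)
    return 1.0 / population_sizes[key]


def marketer_risk(sample_sizes, population_sizes):
    """Expected fraction of dataset records correctly matched against the population."""
    n = sum(sample_sizes.values())
    return sum(f / population_sizes[k] for k, f in sample_sizes.items()) / n


# Toy data (invented): a population of 6 people and a shared dataset of 3 of them.
population = [
    {"age": 34, "race": "A"}, {"age": 34, "race": "A"},
    {"age": 34, "race": "A"}, {"age": 34, "race": "A"},
    {"age": 51, "race": "B"}, {"age": 51, "race": "B"},
]
sample = [
    {"age": 34, "race": "A"},
    {"age": 34, "race": "A"},
    {"age": 51, "race": "B"},
]

sample_sizes = class_sizes(sample, QI)
population_sizes = class_sizes(population, QI)

print(prosecutor_risk(sample[2], sample_sizes, QI))
print(journalist_risk(sample[2], population_sizes, QI))
print(marketer_risk(sample_sizes, population_sizes))
```

In this toy example the third record is unique within the shared dataset, so its prosecutor risk is 1.0, while its journalist risk is 0.5 because 2 people in the population share its values. This mirrors, in miniature, the abstract's point that risk depends on what the attacker is assumed to know.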
Authors: Angela K Green; Katherine E Reeder-Hayes; Robert W Corty; Ethan Basch; Mathew I Milowsky; Stacie B Dusetzina; Antonia V Bennett; William A Wood Journal: Oncologist Date: 2015-04-15
Authors: D M Roden; J M Pulley; M A Basford; G R Bernard; E W Clayton; J R Balser; D R Masys Journal: Clin Pharmacol Ther Date: 2008-05-21 Impact factor: 6.875
Authors: Stephen P Wright; Tyish S Hall Brown; Scott R Collier; Kathryn Sandberg Journal: Am J Physiol Regul Integr Comp Physiol Date: 2017-01-04 Impact factor: 3.619
Authors: Edmund C Lau; Fionna S Mowat; Michael A Kelsh; Jason C Legg; Nicole M Engel-Nitz; Heather N Watson; Helen L Collins; Robert J Nordyke; Joanna L Whyte Journal: Clin Epidemiol Date: 2011-10-11 Impact factor: 4.790
Authors: Melissa A Haendel; Christopher G Chute; Tellen D Bennett; David A Eichmann; Justin Guinney; Warren A Kibbe; Philip R O Payne; Emily R Pfaff; Peter N Robinson; Joel H Saltz; Heidi Spratt; Christine Suver; John Wilbanks; Adam B Wilcox; Andrew E Williams; Chunlei Wu; Clair Blacketer; Robert L Bradford; James J Cimino; Marshall Clark; Evan W Colmenares; Patricia A Francis; Davera Gabriel; Alexis Graves; Raju Hemadri; Stephanie S Hong; George Hripscak; Dazhi Jiao; Jeffrey G Klann; Kristin Kostka; Adam M Lee; Harold P Lehmann; Lora Lingrey; Robert T Miller; Michele Morris; Shawn N Murphy; Karthik Natarajan; Matvey B Palchuk; Usman Sheikh; Harold Solbrig; Shyam Visweswaran; Anita Walden; Kellie M Walters; Griffin M Weber; Xiaohan Tanner Zhang; Richard L Zhu; Benjamin Amor; Andrew T Girvin; Amin Manna; Nabeel Qureshi; Michael G Kurilla; Sam G Michael; Lili M Portilla; Joni L Rutter; Christopher P Austin; Ken R Gersing Journal: J Am Med Inform Assoc Date: 2021-03-01 Impact factor: 7.942