Kerem Ayoz (1), Miray Aysen (1), Erman Ayday (1,2), A Ercument Cicek (1,3). Affiliations: (1) Computer Engineering Department, Bilkent University, Ankara 06800, Turkey. (2) Computer and Data Sciences Department, Case Western Reserve University, Cleveland, OH 44106, USA. (3) Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Abstract
MOTIVATION: The big data era in genomics promises a breakthrough in medicine, but the need to share data in a privacy-preserving manner limits the pace of the field. The widely accepted 'genomic data sharing beacon' protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies have shown that beacons are vulnerable to re-identification (or membership inference) attacks. As beacons are generally associated with sensitive phenotype information, re-identification creates a significant risk for the participants. Unfortunately, the countermeasures proposed against such attacks have failed to be effective, as they do not consider the utility of the beacon protocol. RESULTS: In this study, for the first time, we analyze the mitigating effect of kinship relationships among beacon participants against re-identification attacks. We argue that having multiple family members in a beacon can garble the information available to attacks, since a substantial number of variants are shared among kin-related people. Using family genomes from HapMap and synthetically generated datasets, we show that having one of the parents of a victim in the beacon causes (i) a significant decrease in the power of attacks and (ii) a substantial increase in the number of queries needed to confirm an individual's beacon membership. We also show how the protective effect attenuates when more distant relatives, such as grandparents, are included alongside the victim. Furthermore, we quantify the utility loss due to adding relatives and show that it is smaller than that of flipping-based techniques.
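The intuition behind the kinship argument can be illustrated with a toy sketch. It assumes a beacon is just a collection of per-participant variant sets that answers yes or no to a presence query. The variant labels, the helper names `beacon_query` and `yes_fraction`, and the toy genomes are all hypothetical. It is not the statistical attack evaluated in the study, only a picture of why a parent's shared variants blur the signal an attacker observes.

```python
from typing import Dict, Set

Variant = str  # hypothetical label for a variant, e.g. "v1"


def beacon_query(genomes: Dict[str, Set[Variant]], variant: Variant) -> bool:
    """Answer yes (True) if at least one participant carries the variant."""
    return any(variant in genome for genome in genomes.values())


def yes_fraction(genomes: Dict[str, Set[Variant]], victim: Set[Variant]) -> float:
    """Fraction of the victim's variants for which the beacon answers yes."""
    hits = sum(beacon_query(genomes, v) for v in victim)
    return hits / len(victim)


# Toy genomes (assumed data): the parent shares half of the victim's variants.
victim = {"v1", "v2", "v3", "v4"}
parent = {"v1", "v2", "p1"}
stranger = {"s1", "s2"}

beacon_without_family = {"stranger": stranger}
beacon_with_parent = {"stranger": stranger, "parent": parent}
beacon_with_victim = {"stranger": stranger, "victim": victim}

print(yes_fraction(beacon_without_family, victim))  # 0.0
print(yes_fraction(beacon_with_parent, victim))     # 0.5
print(yes_fraction(beacon_with_victim, victim))     # 1.0
```

In this toy setting, the parent alone already produces yes answers for half of the victim's variants, so a yes response is weaker evidence that the victim is a member of the beacon.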