Abstract
In response to the growing interest in genome-wide association study (GWAS) data privacy, the Integrating Data for Analysis, Anonymization and SHaring (iDASH) center organized the iDASH Healthcare Privacy Protection Challenge to investigate how effectively privacy-preserving methodologies can be applied to human genetic data. This paper is based on a submission to that challenge. We apply privacy-preserving methods adapted from Uhler et al. 2013 and Yu et al. 2014 to the challenge's data and analyze the data utility after the data are perturbed by these methods. Major contributions of this paper include a new interpretation of the χ² statistic in a GWAS setting and new results about the Hamming distance score, a key component of one of the privacy-preserving methods.
Year: 2014 PMID: 25521367 PMCID: PMC4290802 DOI: 10.1186/1472-6947-14-S1-S3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Genotype table
| | 0 minor alleles | 1 minor allele | 2 minor alleles | Total |
|---|---|---|---|---|
| Case | r0 | r1 | r2 | R |
| Control | s0 | s1 | s2 | S |
| Total | n0 | n1 | n2 | N |
Allelic table
| | Minor allele | Major allele | Total |
|---|---|---|---|
| Case | r1 + 2r2 | 2r0 + r1 | 2R |
| Control | s1 + 2s2 | 2s0 + s1 | 2S |
| Total | n1 + 2n2 | 2n0 + n1 | 2N |
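The relationship between the two tables can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. It assumes the case row of the genotype table is written (r0, r1, r2) and the control row (s0, s1, s2), counting individuals with 0, 1, or 2 minor alleles, and that the χ² statistic in question is the ordinary Pearson statistic of the 2×2 allelic table without continuity correction. The function names `allelic_table` and `allelic_chi2` are hypothetical.

```python
def allelic_table(case, control):
    """Collapse genotype counts (0, 1, 2 minor alleles) into allele counts.

    Returns ((case_minor, case_major), (control_minor, control_major)).
    """
    r0, r1, r2 = case
    s0, s1, s2 = control
    # An individual with k minor alleles carries 2 - k major alleles.
    case_row = (r1 + 2 * r2, 2 * r0 + r1)
    control_row = (s1 + 2 * s2, 2 * s0 + s1)
    return case_row, control_row


def allelic_chi2(case, control):
    """Pearson chi-square statistic of the 2x2 allelic table (assumed form)."""
    (a, b), (c, d) = allelic_table(case, control)
    total = a + b + c + d  # equals 2N, twice the number of individuals
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Assumption: treat a table with an empty margin as showing no signal.
        return 0.0
    return total * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    case, control = (10, 20, 30), (30, 20, 10)
    print(allelic_table(case, control))
    print(allelic_chi2(case, control))
```

Each row of the allelic table sums to twice the corresponding row total of the genotype table, because every individual contributes two alleles.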
Figure 1. Legal moves in the space of genotype tables with fixed .
Figure 2. An example of a genotype table, . Each dot represents a genotype table. Each dashed line has slope −2 and represents a line x = 2r0 + r1. The red line is x = (2s0 + s1)R/S = 2r0 + r1, and the two black lines correspond to the values of 2r0 + r1 for which Y(r0, r1; ) = c, where c is a pre-specified significance threshold.
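The geometry in the Figure 2 caption can be checked numerically. The sketch below assumes that Y is the Pearson χ² statistic of the allelic table and that the control row (s0, s1, s2) and the number of cases R are held fixed. Under those assumptions Y depends on (r0, r1) only through x = 2r0 + r1, so every dot on a dashed line of slope −2 shares the same value, and Y vanishes on the line x = (2s0 + s1)R/S. The names `chi2_from_x` and `level_sets` are hypothetical.

```python
def chi2_from_x(x, R, control):
    """Allelic chi-square as a function of x = 2*r0 + r1 (assumed form of Y).

    x is the major-allele count among the R cases; control is (s0, s1, s2).
    """
    s0, s1, s2 = control
    S = s0 + s1 + s2
    N = R + S
    y = 2 * s0 + s1                  # major-allele count among controls
    major = x + y                    # column total for the major allele
    minor = 2 * N - major            # column total for the minor allele
    if R == 0 or S == 0 or major == 0 or minor == 0:
        return 0.0                   # assumption: degenerate margins carry no signal
    return 2 * N * (x * S - y * R) ** 2 / (R * S * major * minor)


def level_sets(R):
    """Group the dots (r0, r1) with r0 + r1 <= R by the line 2*r0 + r1 = x."""
    lines = {}
    for r0 in range(R + 1):
        for r1 in range(R - r0 + 1):
            lines.setdefault(2 * r0 + r1, []).append((r0, r1))
    return lines


if __name__ == "__main__":
    R, control, c = 3, (1, 1, 1), 2.0   # c is an arbitrary illustrative threshold
    for x, dots in sorted(level_sets(R).items()):
        value = chi2_from_x(x, R, control)
        print(x, dots, round(value, 3), value >= c)
```

Because the statistic is a function of x alone, a significance region can be described by the two values of x where Y crosses c, which is what the two black lines in the figure mark.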
Figure 3. Risk-utility plots. Performance comparison of Algorithm 1 and Algorithm 2 with the χ² statistic or the Hamming distance score as the score function. Each row corresponds to a fixed K, the number of top SNPs to release. Each column corresponds to a fixed threshold p-value, which is relevant only to the mechanism based on the Hamming distance score. The threshold p-values are 0.1 and 0.01, each divided by the number of SNPs.