Diyue Bu, Xiaofeng Wang, Haixu Tang.
Abstract
Human genomic sequences are becoming increasingly convenient and inexpensive to acquire. Sharing these data is critical for biomedical researchers who study genomic loci or variants potentially associated with human diseases [1]. However, broad sharing of genomic data is impeded by privacy concerns. Statistical inference techniques for re-identifying genomic data donors have been extensively investigated in the literature [2-5]. The Beacon project was recently introduced to test the willingness of data holders to share genomic data in a simple technical context: a query asking whether a specified nucleotide is present at a given position within a chromosome [6]. Beacon services, however, have also been shown to be vulnerable to re-identification attacks [7,8]. In this paper, we introduce a real-time mitigation method that protects Beacon services from re-identification attacks [7], and show that it compares favorably with previous approaches in mitigation efficiency, i.e., it yields lower re-identification risk while preserving higher utility of the Beacon database.
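The Beacon query described above reduces to a presence test. The following is a minimal sketch, not the GA4GH Beacon API: it assumes variants are stored as hypothetical `(chromosome, position, allele)` tuples for a small in-memory cohort, and the class name `ToyBeacon` is illustrative only. The service answers "yes" or "no" and never reveals which donor carries the allele.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, alternate nucleotide).
Variant = Tuple[str, int, str]


class ToyBeacon:
    """Minimal in-memory Beacon: reports only whether any donor carries an allele."""

    def __init__(self, genotypes: Dict[str, Set[Variant]]) -> None:
        # genotypes maps a donor ID to the set of alternate alleles that donor carries.
        self._present: Set[Variant] = set()
        for variants in genotypes.values():
            self._present.update(variants)

    def query(self, chrom: str, pos: int, allele: str) -> str:
        # Only aggregate presence is revealed, never donor identities or counts.
        return "yes" if (chrom, pos, allele) in self._present else "no"


if __name__ == "__main__":
    beacon = ToyBeacon({
        "donor_A": {("1", 10177, "C"), ("2", 55555, "T")},
        "donor_B": {("1", 10177, "C")},
    })
    print(beacon.query("1", 10177, "C"))  # yes
    print(beacon.query("2", 55555, "G"))  # no
```

The re-identification attacks cited above exploit exactly this interface: an attacker holding a target's genome queries the target's rare alleles, and a run of "yes" answers to alleles that should almost never appear is evidence that the target is in the database.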
Year: 2018 PMID: 29888039 PMCID: PMC5961811
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1: (a) The RTF mitigation method. The method flips the answers to queries on some rare SNPs. The vulnerability of each individual in the database is monitored using likelihood-ratio test (LRT) scores computed on previously queried variants. The decision to flip the answer to a query on a rare variant is based on the vulnerability of the target individual who carries that variant. The target individual's vulnerability is updated and recorded internally after the answer is returned. (b) The secure-Beacon workflow. Three methods (RF, SF and RTF) were implemented to mitigate re-identification risks in an open-query Beacon system. For a newly received query, the true answer (“yes” or “no”) is obtained first. If the query has been answered previously, the same answer is returned; otherwise, the mitigation procedure is activated. Note that RF and RTF are applied to rare variants, while SF may be applied to both rare and common variants (see text for details).
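The workflow and flipping rule in Figure 1 can be sketched roughly as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: the class name, `rare_cutoff`, `risk_threshold`, and the simplified error-free LRT increment are all hypothetical; allele frequencies are estimated from the Beacon's own cohort with every carrier treated as heterozygous (an attacker would more typically rely on reference-population frequencies); and only "yes" answers are ever flipped.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate nucleotide).
Variant = Tuple[str, int, str]


def lrt_increment(freq: float, n_individuals: int) -> float:
    """Simplified LRT gain from one truthful "yes" on a carrier's variant (assumption).

    H0 (target not in the Beacon): P(yes) = 1 - (1 - f)^(2N).
    H1 (target in the Beacon and carries the allele): P(yes) = 1, no errors modeled.
    The gain log(P1 / P0) = -log(P0) grows as the variant gets rarer.
    """
    p_yes_null = 1.0 - (1.0 - freq) ** (2 * n_individuals)
    return -math.log(p_yes_null)


class RealTimeFlippingBeacon:
    """Sketch of the secure-Beacon workflow with an RTF-style flipping step.

    `rare_cutoff` and `risk_threshold` are hypothetical tuning parameters.
    """

    def __init__(self, genotypes: Dict[str, Set[Variant]],
                 rare_cutoff: float = 0.01, risk_threshold: float = 5.0) -> None:
        self.genotypes = genotypes
        self.n = len(genotypes)
        self.rare_cutoff = rare_cutoff
        self.risk_threshold = risk_threshold
        self.scores: Dict[str, float] = {donor: 0.0 for donor in genotypes}
        self.answered: Dict[Variant, str] = {}  # answers already returned

    def query(self, variant: Variant) -> str:
        # Step 1: a previously answered query receives the same answer again.
        if variant in self.answered:
            return self.answered[variant]
        # Step 2: obtain the true answer.
        carriers: List[str] = [d for d, vs in self.genotypes.items() if variant in vs]
        answer = "yes" if carriers else "no"
        # Step 3: mitigation, applied in this sketch only to "yes" answers on rare variants.
        if carriers:
            freq = len(carriers) / (2 * self.n)  # in-cohort estimate, all heterozygous
            if freq < self.rare_cutoff:
                gain = lrt_increment(freq, self.n)
                if any(self.scores[d] + gain > self.risk_threshold for d in carriers):
                    answer = "no"  # flip: revealing this allele would over-expose a carrier
                else:
                    for d in carriers:
                        self.scores[d] += gain  # record each carrier's updated vulnerability
        self.answered[variant] = answer
        return answer
```

Because a carrier's score is only increased when it stays at or below `risk_threshold`, each individual's accumulated score remains bounded, at the cost of some incorrect answers on rare variants. Storing returned answers enforces the workflow's rule that a repeated query receives the same answer.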
Proportions of queries in each allele frequency range
| Allele frequency range | Singleton | < 0.001 (not singleton) | 0.001-0.01 | 0.01-0.05 | 0.05-0.5 | 0.5-1 |
|---|---|---|---|---|---|---|
| Proportion of queries in ExAC | 0.434 | 0.418 | 0.0076 | 0.023 | 0.033 | 0.014 |
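The table groups queried variants by allele frequency. A minimal sketch of that grouping follows, assuming each variant is described by ExAC-style allele count (AC) and allele number (AN); the function names and bin labels are hypothetical paraphrases of the table's columns, and the boundary handling (lower bound inclusive, upper bound exclusive) is an assumption the table does not state.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Labels paraphrase the table columns. Boundary handling (lower bound inclusive,
# upper bound exclusive, frequency 1.0 in the last bin) is an assumption.
FREQUENCY_BINS: List[Tuple[str, float, float]] = [
    ("<0.001, not singleton", 0.0, 0.001),
    ("0.001-0.01", 0.001, 0.01),
    ("0.01-0.05", 0.01, 0.05),
    ("0.05-0.5", 0.05, 0.5),
    ("0.5-1", 0.5, 1.0),
]


def frequency_bin(allele_count: int, allele_number: int) -> str:
    """Map a variant's allele count (AC) and allele number (AN) to a table column."""
    if not 1 <= allele_count <= allele_number:
        raise ValueError("expected 1 <= allele_count <= allele_number")
    if allele_count == 1:
        return "Singleton"
    freq = allele_count / allele_number
    for label, low, high in FREQUENCY_BINS:
        if low <= freq < high:
            return label
    return "0.5-1"  # only reached when freq == 1.0


def bin_proportions(queries: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of queried variants that fall into each frequency bin."""
    counts = Counter(frequency_bin(ac, an) for ac, an in queries)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(bin_proportions([(1, 120000), (5, 120000), (600, 120000), (90000, 120000)]))
    # {'Singleton': 0.25, '<0.001, not singleton': 0.25, '0.001-0.01': 0.25, '0.5-1': 0.25}
```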
Figure 2: The percentage of flipped rare SNPs under different query patterns.
Figure 3: The total percentage of flipped SNPs under different query patterns.