
A Sequence Obfuscation Method for Protecting Personal Genomic Privacy.

Shibiao Wan, Jieqiong Wang

Abstract

With the technological advances of recent decades, whole-genome sequencing of an individual has become feasible and affordable. As a result, large volumes of individual genomic sequences are produced and collected for genetic medical diagnosis and cancer drug discovery, which, however, poses serious challenges to the protection of personal genomic privacy. There is an urgent need for methods that keep personal genomic data both usable and confidential. Existing genomic privacy-protection methods either require time-consuming encryption or achieve low accuracy in data recovery. To tackle these problems, this paper proposes a sequence similarity-based obfuscation method, namely IterMegaBLAST, for fast and reliable protection of personal genomic privacy. Specifically, given a randomly selected sequence from a dataset of genomic sequences, we first use MegaBLAST to find the most similar sequence in the dataset. These two aligned sequences form a cluster, for which an obfuscated sequence is generated via a DNA generalization lattice scheme. These steps are repeated until all sequences in the dataset are clustered and their obfuscated sequences are generated. Experimental results on benchmark datasets demonstrate that, under the same degree of anonymity, IterMegaBLAST significantly outperforms existing state-of-the-art approaches in terms of both utility accuracy and time complexity.
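The iterative cluster-and-generalize procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' IterMegaBLAST implementation: the sequences are assumed to be pre-aligned and of equal length, the MegaBLAST search is replaced by a simple count of identical positions (the hypothetical `similarity` helper), and the generalization lattice is a simplified one built from the IUPAC ambiguity codes R (purine), Y (pyrimidine) and N (any base). Folding an odd leftover sequence into the last cluster is also an assumption. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set

# Simplified DNA generalization lattice (assumption for illustration):
# identical bases stay as-is, purines {A, G} -> R, pyrimidines {C, T} -> Y,
# any other mix -> N. R, Y and N are standard IUPAC ambiguity codes.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def generalize_column(bases: Set[str]) -> str:
    """Return the most specific lattice symbol covering every base in a column."""
    if len(bases) == 1:
        return next(iter(bases))
    if bases <= PURINES:
        return "R"
    if bases <= PYRIMIDINES:
        return "Y"
    return "N"


def obfuscate_cluster(seqs: List[str]) -> str:
    """Generalize equal-length, pre-aligned sequences column by column."""
    return "".join(generalize_column(set(col)) for col in zip(*seqs))


def similarity(a: str, b: str) -> int:
    """Stand-in for a MegaBLAST search: count identical aligned positions."""
    return sum(x == y for x, y in zip(a, b))


def iter_pair_obfuscate(dataset: Dict[str, str], seed: int = 0) -> Dict[str, str]:
    """Pair a randomly chosen sequence with its most similar unclustered
    sequence, repeat until none remain, then give every member of a cluster
    the cluster's generalized sequence."""
    if len(dataset) < 2:
        raise ValueError("need at least two sequences to form a cluster")
    if len({len(s) for s in dataset.values()}) != 1:
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")
    rng = random.Random(seed)
    remaining = sorted(dataset)  # IDs of sequences not yet clustered
    clusters: List[List[str]] = []
    while len(remaining) >= 2:
        query = rng.choice(remaining)
        remaining.remove(query)
        best = max(remaining, key=lambda sid: similarity(dataset[query], dataset[sid]))
        remaining.remove(best)
        clusters.append([query, best])
    if remaining:  # odd count (assumption): fold the leftover into the last cluster
        clusters[-1].append(remaining.pop())
    obfuscated: Dict[str, str] = {}
    for cluster in clusters:
        generalized = obfuscate_cluster([dataset[sid] for sid in cluster])
        for sid in cluster:
            obfuscated[sid] = generalized
    return obfuscated


if __name__ == "__main__":
    demo = {"s1": "AAAAAA", "s2": "AAAAAG", "s3": "CCCCCC", "s4": "CCCCCT"}
    print(sorted(iter_pair_obfuscate(demo).items()))
    # [('s1', 'AAAAAR'), ('s2', 'AAAAAR'), ('s3', 'CCCCCY'), ('s4', 'CCCCCY')]
```

In this sketch every released sequence is shared by at least two records, so no record can be distinguished from its cluster partner. Pairing each sequence with its most similar partner keeps the number of generalized columns small, which is the utility side of the trade-off the abstract describes.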
Copyright © 2022 Wan and Wang.


Keywords:  DNA generalization lattice; IterMegaBLAST; MegaBLAST; clustering; genomic privacy; machine learning; obfuscation methods; sequence similarity

Year:  2022        PMID: 35495121      PMCID: PMC9043694          DOI: 10.3389/fgene.2022.876686

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.772


