
Protecting genomic sequence anonymity with generalization lattices.

B A Malin.

Abstract

OBJECTIVES: Current genomic privacy technologies assume that the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates that such protections are insufficient, because the sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences.
METHODS: The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines).
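The lattice idea in the METHODS paragraph can be illustrated with a small Python sketch. It assumes the lattice is built from the IUPAC nucleotide ambiguity codes (for example R = A or G, the purines; Y = C or T, the pyrimidines; N = any base), that the sequences are already aligned, equal in length and gap-free, and that the cost of a generalized residue is the number of extra bases it admits. The greedy grouping in the hypothetical `anonymize` function is an illustrative stand-in, not the published DNALA procedure; it only shows how joining residues in the lattice makes every released sequence identical to at least k-1 others.

```python
# Assumption: the generalization lattice is the IUPAC ambiguity-code lattice.
# Each code maps to the set of bases it stands for; the join of two codes is
# the code for the union of their sets.
IUPAC = {
    "A": frozenset("A"), "C": frozenset("C"), "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"), "S": frozenset("CG"),
    "W": frozenset("AT"), "K": frozenset("GT"), "M": frozenset("AC"),
    "B": frozenset("CGT"), "D": frozenset("AGT"), "H": frozenset("ACT"),
    "V": frozenset("ACG"), "N": frozenset("ACGT"),
}
CODE_FOR = {bases: code for code, bases in IUPAC.items()}


def generalize(x: str, y: str) -> str:
    """Most specific lattice node covering both residues (e.g. A, G -> R)."""
    return CODE_FOR[IUPAC[x] | IUPAC[y]]


def residue_cost(code: str) -> int:
    """Hypothetical information-loss measure: 0 for A/C/G/T, 3 for N."""
    return len(IUPAC[code]) - 1


def generalize_group(seqs: list[str]) -> str:
    """Column-wise lattice join of equal-length aligned sequences."""
    out = []
    for column in zip(*seqs):
        bases = frozenset().union(*(IUPAC[c] for c in column))
        out.append(CODE_FOR[bases])
    return "".join(out)


def group_cost(seqs: list[str]) -> int:
    """Total generalization cost of releasing seqs as one shared sequence."""
    return sum(residue_cost(c) for c in generalize_group(seqs))


def anonymize(seqs: list[str], k: int) -> list[str]:
    """Hypothetical greedy k-grouping (illustration only, not DNALA itself).

    Each group has between k and 2k-1 members and is released as its
    column-wise generalization, so every output matches >= k-1 others.
    """
    if k < 1 or len(seqs) < k:
        raise ValueError("need k >= 1 and at least k sequences")
    remaining = list(range(len(seqs)))
    groups = []
    while len(remaining) >= 2 * k:
        group = [remaining.pop(0)]  # seed with the next unassigned sequence
        while len(group) < k:
            # add the sequence that increases the group's cost the least
            best = min(
                remaining,
                key=lambda j: group_cost([seqs[i] for i in group] + [seqs[j]]),
            )
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    groups.append(remaining)  # leftover k..2k-1 sequences form the last group
    result = [""] * len(seqs)
    for group in groups:
        shared = generalize_group([seqs[i] for i in group])
        for i in group:
            result[i] = shared
    return result
```

For example, `anonymize(["ACGT", "GCGT", "ACTT", "ATTT"], 2)` releases the first two sequences as `RCGT` (A and G generalize to the purine code R) and the last two as `AYTT` (C and T generalize to the pyrimidine code Y). Choosing the least general common node at each position is what lets the lattice retain more information than suppressing differing residues outright.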
RESULTS: The method was tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply that the anonymization schema is feasible for protecting sequence privacy.
CONCLUSIONS: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomic anonymization schemas tailored to specific data-sharing scenarios.

Year: 2005    PMID: 16400377

Source DB: PubMed    Journal: Methods Inf Med    ISSN: 0026-1270    Impact factor: 2.176

Related articles: 11 in total (10 listed below)

Review 1.  Privacy challenges and research opportunities for genomic data sharing.

Authors:  Luca Bonomi; Yingxiang Huang; Lucila Ohno-Machado
Journal:  Nat Genet       Date:  2020-06-29       Impact factor: 38.330

Review 2.  Identifiability in biobanks: models, measures, and mitigation strategies.

Authors:  Bradley Malin; Grigorios Loukides; Kathleen Benitez; Ellen Wright Clayton
Journal:  Hum Genet       Date:  2011-07-08       Impact factor: 4.132

3.  The disclosure of diagnosis codes can breach research participants' privacy.

Authors:  Grigorios Loukides; Joshua C Denny; Bradley Malin
Journal:  J Am Med Inform Assoc       Date:  2010 May-Jun       Impact factor: 4.497

4.  Big Data Privacy in Biomedical Research.

Authors:  Shuang Wang; Luca Bonomi; Wenrui Dai; Feng Chen; Cynthia Cheung; Cinnamon S Bloss; Samuel Cheng; Xiaoqian Jiang
Journal:  IEEE Trans Big Data       Date:  2016-09-13

5.  A Sequence Obfuscation Method for Protecting Personal Genomic Privacy.

Authors:  Shibiao Wan; Jieqiong Wang
Journal:  Front Genet       Date:  2022-04-13       Impact factor: 4.772

Review 6.  Routes for breaching and protecting genetic privacy.

Authors:  Yaniv Erlich; Arvind Narayanan
Journal:  Nat Rev Genet       Date:  2014-05-08       Impact factor: 53.242

7.  Privacy in the Genomic Era.

Authors:  Muhammad Naveed; Erman Ayday; Ellen W Clayton; Jacques Fellay; Carl A Gunter; Jean-Pierre Hubaux; Bradley A Malin; Xiaofeng Wang
Journal:  ACM Comput Surv       Date:  2015-09       Impact factor: 10.282

8.  Methods for the de-identification of electronic health records for genomic research.

Authors:  Khaled El Emam
Journal:  Genome Med       Date:  2011-04-27       Impact factor: 11.117

9.  Using game theory to thwart multistage privacy intrusions when sharing data.

Authors:  Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Yongtai Liu; Myrna Wooders; Jia Guo; Zhijun Yin; Ellen Wright Clayton; Murat Kantarcioglu; Bradley A Malin
Journal:  Sci Adv       Date:  2021-12-10       Impact factor: 14.136

10.  Ethical and practical issues associated with aggregating databases.

Authors:  David R Karp; Shelley Carlin; Robert Cook-Deegan; Daniel E Ford; Gail Geller; David N Glass; Hank Greely; Joel Guthridge; Jeffrey Kahn; Richard Kaslow; Cheryl Kraft; Kathleen Macqueen; Bradley Malin; Richard H Scheuerman; Jeremy Sugarman
Journal:  PLoS Med       Date:  2008-09-23       Impact factor: 11.069
