Marie Oestreich1, Dingfan Chen2, Joachim L Schultze1,3,4, Mario Fritz2, Matthias Becker1.
Abstract
An increasing amount of attention has been directed towards understanding the privacy risks that arise from sharing genomic data of human origin. Most of these efforts have focused on genomic sequence data, but the popularity of techniques for collecting other types of genome-related data has prompted researchers to investigate privacy concerns in a broader genomic context. In this review, we give an overview of different types of genome-associated data, the individual ways in which each can reveal sensitive information, the motivations for sharing them, and established and upcoming methods to minimize information leakage. We further discuss the concrete threats that are posed, who is at risk, and how the level of risk compares to the potential benefits, all in the context of modern technology, methodology, and information-sharing culture. Additionally, we discuss the current legal situation regarding the sharing of genomic data in a selection of countries, evaluating the scope of applicability as well as the limitations of the respective laws. We conclude by evaluating the developments required in the field in the near future to improve and develop privacy-preserving data sharing techniques for the genomic context.
Keywords: data privacy; data sharing; epigenomic data; genomic data; transcriptomic data
Year: 2021; PMID: 34345236; PMCID: PMC8326502; DOI: 10.17179/excli2021-4002
Source DB: PubMed; Journal: EXCLI J; ISSN: 1611-2156; Impact factor: 4.068
Figure 1. Brief overview of the contents of this review. a: The three types of genomic data covered in this work. A: adenine; C: cytosine; G: guanine; T: thymine; me: methyl group. b: A selection of applications that encourage data sharing. From left to right: genomic data sharing is often required when building machine learning models, in order to increase the sample size available for training. Collecting and enriching data on minorities can reduce subpopulation bias in a trained model. Data often needs to be joined in multiparty studies when it is collected at different sites. Other motivators are sharing genomic data to allow the reproducibility of results or to reuse the data for new scientific questions. c: Subject re-identification is the core concern in genomic data privacy. The ability to produce uniquely identifying single-nucleotide polymorphism (SNP) barcodes from the data allows an adversary to cross-reference them with public databases, which often contain metadata that reveal sensitive medical information. d: A timeline of selected laws introduced in several countries to protect citizens from discrimination based on genome-related data. e: A selection of commonly used data sharing methods, colour-coded by the maximum level of security they can provide. f: A selection of upcoming sharing techniques that are the subject of ongoing research. Also shown, as a necessary future step, is the introduction of globally valid laws to protect subjects from discrimination in the case of a security breach. GAN: Generative Adversarial Network; RBM: Restricted Boltzmann Machine; VAE: Variational Autoencoder.
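Panel c describes re-identification as building a SNP barcode from shared data and cross-referencing it with a public database. The following is a minimal, hypothetical Python sketch of that linkage step, assuming genotypes are encoded as minor-allele counts (0, 1 or 2) at a fixed panel of loci. The rsIDs, records, thresholds and the helper names `barcode`, `mismatches`, `overlap` and `link` are invented for illustration; they are not taken from the review or from any real database.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical SNP panel; these rsIDs are placeholders, not real markers.
PANEL: List[str] = ["rs0001", "rs0002", "rs0003", "rs0004", "rs0005", "rs0006"]

Genotype = Dict[str, int]             # rsID -> minor-allele count (0, 1 or 2)
Barcode = Tuple[Optional[int], ...]   # one entry per panel locus, None if uncalled


def barcode(genotype: Genotype, panel: Sequence[str] = PANEL) -> Barcode:
    """Project a genotype onto the panel; loci missing from the sample become None."""
    return tuple(genotype.get(rsid) for rsid in panel)


def overlap(a: Barcode, b: Barcode) -> int:
    """Number of loci called in both barcodes."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None)


def mismatches(a: Barcode, b: Barcode) -> int:
    """Number of loci called in both barcodes whose genotypes disagree."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def link(query: Genotype, public_db: Dict[str, Genotype],
         min_overlap: int = 4, max_mismatch: int = 0) -> List[str]:
    """Return IDs of public records whose barcode is consistent with the query."""
    q = barcode(query)
    hits = []
    for record_id, genotype in public_db.items():
        r = barcode(genotype)
        # Require enough shared loci so a sparse query cannot match by chance.
        if overlap(q, r) >= min_overlap and mismatches(q, r) <= max_mismatch:
            hits.append(record_id)
    return hits


# Toy "public" database with attached metadata (all values invented).
public_db: Dict[str, Genotype] = {
    "donor_A": {"rs0001": 0, "rs0002": 1, "rs0003": 2, "rs0004": 1, "rs0005": 0, "rs0006": 1},
    "donor_B": {"rs0001": 1, "rs0002": 1, "rs0003": 0, "rs0004": 2, "rs0005": 1, "rs0006": 0},
}
metadata: Dict[str, str] = {"donor_A": "diagnosis: condition X", "donor_B": "diagnosis: none"}

# A "de-identified" sample in which rs0004 was not called.
leaked: Genotype = {"rs0001": 0, "rs0002": 1, "rs0003": 2, "rs0005": 0, "rs0006": 1}

if __name__ == "__main__":
    for rid in link(leaked, public_db):
        print(rid, "->", metadata[rid])  # prints: donor_A -> diagnosis: condition X
```

Allowing `None` at uncalled loci reflects that a leaked sample rarely covers every panel position, and the `min_overlap` guard keeps a query with only a few called loci from matching many records by chance. Real panels are far larger than six loci, which is what makes such a barcode effectively unique to one person.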