Gamze Gürsoy, Nancy Lu, Sarah Wagner, Mark Gerstein.
Abstract
With the recent increase in RNA sequencing efforts using large cohorts of individuals, surveying allele-specific gene expression (ASE) is becoming increasingly frequent. Here, we report that, despite not containing explicit variant information, a list of genes known to be allele-specific in an individual is enough to recover key variants and link the individuals back to their genotypes and phenotypes. This creates a privacy conundrum.
Year: 2021 PMID: 34493313 PMCID: PMC8425091 DOI: 10.1186/s13059-021-02477-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Schematic representation of using allele-specific genes to de-anonymize individuals. a Schematic of going from a list of genes to a list of SNPs. b De-anonymizing a list of anonymous ASE genes using publicly available genomes from known individuals and inferring private phenotypes. c Recovering the anonymized genome of a known individual by using their ASE gene list.
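The step in panel a rests on one observation. Detecting ASE in a gene typically requires at least one heterozygous variant in that gene's transcribed sequence, so every gene on an ASE list hints at a heterozygous site. The sketch below illustrates this under assumptions. It uses a toy gene-to-SNP map (`GENE_TO_SNPS`) and toy genotypes coded as alternate-allele counts (0, 1 or 2). It applies a naive linking score that counts the ASE genes in which a known genome is heterozygous. All names and data here are hypothetical, and this is not the authors' pipeline. A realistic linker would also weigh how often each gene appears on ASE lists and account for heterozygous genes missing from the list.

```python
from typing import Dict, List, Set

# Hypothetical annotation: gene -> SNP IDs located inside that gene.
GENE_TO_SNPS: Dict[str, List[str]] = {
    "GENE_A": ["rs1", "rs2"],
    "GENE_B": ["rs3"],
    "GENE_C": ["rs4", "rs5"],
    "GENE_D": ["rs6"],
}

# Hypothetical panel of known genomes: individual -> {SNP ID: alt-allele count}.
# A count of 1 means heterozygous; 0 and 2 are homozygous.
GENOMES: Dict[str, Dict[str, int]] = {
    "ind1": {"rs1": 1, "rs2": 0, "rs3": 1, "rs4": 0, "rs5": 0, "rs6": 2},
    "ind2": {"rs1": 0, "rs2": 0, "rs3": 2, "rs4": 1, "rs5": 1, "rs6": 1},
    "ind3": {"rs1": 2, "rs2": 1, "rs3": 1, "rs4": 1, "rs5": 0, "rs6": 0},
}


def candidate_snps(ase_genes: List[str]) -> Set[str]:
    """Expand an ASE gene list into the SNPs that may be heterozygous (cf. panel a)."""
    snps: Set[str] = set()
    for gene in ase_genes:
        snps.update(GENE_TO_SNPS.get(gene, []))
    return snps


def is_het_in_gene(genotypes: Dict[str, int], gene: str) -> bool:
    """True if the individual is heterozygous at any SNP annotated to the gene."""
    return any(genotypes.get(snp) == 1 for snp in GENE_TO_SNPS.get(gene, []))


def link(ase_genes: List[str]) -> Dict[str, int]:
    """Score each known genome by how many ASE genes it could explain (cf. panel b)."""
    return {
        ind: sum(is_het_in_gene(gt, gene) for gene in ase_genes)
        for ind, gt in GENOMES.items()
    }


if __name__ == "__main__":
    anonymous_list = ["GENE_A", "GENE_B", "GENE_C"]
    print(sorted(candidate_snps(anonymous_list)))  # ['rs1', 'rs2', 'rs3', 'rs4', 'rs5']
    scores = link(anonymous_list)
    print(scores)  # {'ind1': 2, 'ind2': 1, 'ind3': 3}
    print(max(scores, key=scores.get))  # ind3
```

Here only `ind3` is heterozygous in all three listed genes, so it is the best match for the anonymous list.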
Fig. 2 Linking attack accuracy and the impact of auxiliary information on linking ability. a The number of individuals that can be linked to their genomes with different statistical techniques. b The percentage of individuals that can be linked to their genomes when the criterion is relaxed from best match to top-k ranked matches. c The top 20 genes found on the ASE gene lists of correctly identified and misidentified individuals. d The self-information of ASE genes vs. the number of individuals in which they are observed as ASE. e The percentage of correctly linked individuals when different combinations of ASE genes are used. f The percentage of correctly linked individuals when biological sex and/or ancestry are used as auxiliary information.
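Panel d relates a gene's self-information to how often the gene appears on ASE lists, and panel b relaxes linking from best match to top-k. The sketch below ties these two ideas together under stated assumptions. Self-information is taken as I(g) = log2(N / n_g) bits, where n_g of N individuals show gene g as ASE. A candidate genome is scored by summing I(g) over the ASE genes in which it carries a heterozygous variant. This scoring rule, the helper names (`self_information`, `rank_candidates`, `top_k_accuracy`) and the toy data are all hypothetical. They illustrate the concept and are not necessarily the statistical techniques compared in panel a.

```python
import math
from typing import Dict, List, Set

# Hypothetical ASE gene lists, keyed by true owner only so accuracy can be scored.
ASE_LISTS: Dict[str, Set[str]] = {
    "ind1": {"G1", "G3"},
    "ind2": {"G1", "G2"},
    "ind3": {"G1", "G2", "G4"},
    "ind4": {"G1", "G5"},
}

# Hypothetical known genomes, reduced to the genes where each carries a heterozygous variant.
HET_GENES: Dict[str, Set[str]] = {
    "ind1": {"G1", "G2", "G3"},
    "ind2": {"G1", "G2"},
    "ind3": {"G1", "G2", "G4"},
    "ind4": {"G1", "G4", "G5"},
}


def self_information(ase_lists: Dict[str, Set[str]]) -> Dict[str, float]:
    """I(g) = log2(N / n_g) bits, where n_g of N individuals show gene g as ASE."""
    n_total = len(ase_lists)
    counts: Dict[str, int] = {}
    for genes in ase_lists.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: math.log2(n_total / n) for gene, n in counts.items()}


def rank_candidates(
    ase_genes: Set[str], het_genes: Dict[str, Set[str]], info: Dict[str, float]
) -> List[str]:
    """Rank genomes by summed self-information of shared ASE/heterozygous genes.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    scores = {
        ind: sum(info.get(gene, 0.0) for gene in ase_genes & hets)
        for ind, hets in het_genes.items()
    }
    return sorted(scores, key=lambda ind: (-scores[ind], ind))


def top_k_accuracy(
    ase_lists: Dict[str, Set[str]],
    het_genes: Dict[str, Set[str]],
    info: Dict[str, float],
    k: int,
) -> float:
    """Fraction of ASE lists whose true owner ranks within the top k candidates."""
    hits = sum(
        owner in rank_candidates(genes, het_genes, info)[:k]
        for owner, genes in ase_lists.items()
    )
    return hits / len(ase_lists)


if __name__ == "__main__":
    info = self_information(ASE_LISTS)
    print(top_k_accuracy(ASE_LISTS, HET_GENES, info, k=1))  # 0.75
    print(top_k_accuracy(ASE_LISTS, HET_GENES, info, k=2))  # 1.0
```

In this toy cohort, G1 appears on every list and contributes 0 bits. As a result, `ind2`'s list (G1, G2) scores the same against three genomes. The alphabetical tie-break places `ind1` first, so the true owner is recovered only once the criterion is relaxed to top-2.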