
Data Sanitization to Reduce Private Information Leakage from Functional Genomics.

Gamze Gürsoy, Prashant Emani, Charlotte M Brannon, Otto A Jolanki, Arif Harmanci, J Seth Strattan, J Michael Cherry, Andrew D Miranker, Mark Gerstein.

Abstract

The generation of functional genomics datasets is surging because they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intent behind functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns because the next-generation sequencing reads they produce carry the participants' genetic variants. Moreover, there is a strong incentive to broadly share raw reads for better statistical power and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Our protocol works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA sequencing. It involves quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
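The abstract describes two steps: quantifying leakage by linking reads to known individuals, and sanitizing reads so that linkage fails while the data stay useful. The toy Python sketch below illustrates both ideas under simplifying assumptions of our own: reads are ungapped and substitution-only, the reference is a plain string, and a known individual is scored by summing log(1/f) over the non-reference alleles they share with the reads (f is an assumed allele frequency). The functions `call_alleles`, `rank_individuals`, and `sanitize` are hypothetical illustrations, not the authors' pipeline, file format, or linking statistic.

```python
import math
from typing import Dict, List, Tuple

Read = Tuple[int, str]      # (0-based start on the reference, read sequence)
Alleles = Dict[int, str]    # position -> non-reference base observed there
Panel = Dict[str, Alleles]  # known individual -> their non-reference alleles


def call_alleles(reads: List[Read], reference: str) -> Alleles:
    """Naively record every base that differs from the reference.

    Assumes ungapped, substitution-only reads lying fully inside `reference`.
    """
    observed: Alleles = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if base != reference[pos]:
                observed[pos] = base
    return observed


def rank_individuals(observed: Alleles, panel: Panel,
                     allele_freq: Dict[Tuple[int, str], float]) -> List[Tuple[str, float]]:
    """Rank known individuals by the rarity of alleles they share with the reads.

    Each shared allele adds log(1/f), so rare matches weigh more than common ones.
    Assumes `allele_freq` has an entry for every allele that appears in `panel`.
    """
    scores = []
    for name, known in panel.items():
        score = sum(
            (math.log(1.0 / allele_freq[(pos, base)])
             for pos, base in observed.items() if known.get(pos) == base),
            0.0,
        )
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def sanitize(reads: List[Read], reference: str) -> List[Read]:
    """Mask every mismatching base with the reference base, keeping read positions.

    Under the substitution-only assumption this equals the reference slice, so
    read coordinates (and thus per-region read counts) are left unchanged.
    """
    return [(start, reference[start:start + len(seq)]) for start, seq in reads]


if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    reads = [(0, "ACGAAC"), (4, "ACGTAG")]  # mismatches at positions 3 and 9
    panel = {"alice": {3: "A", 9: "G"}, "bob": {3: "A"}}
    freqs = {(3, "A"): 0.5, (9, "G"): 0.01}
    print(rank_individuals(call_alleles(reads, ref), panel, freqs)[0][0])  # alice
    print(call_alleles(sanitize(reads, ref), ref))                          # {}
```

In this toy setting the raw reads single out "alice" because she shares the rare allele at position 9, while the sanitized reads expose no non-reference alleles at all yet keep every read at its original coordinate. Real reads also contain indels and split (spliced) alignments, which this sketch does not model.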
Copyright © 2020 Elsevier Inc. All rights reserved.

Keywords: RNA-seq; data sanitization; functional genomics; genome privacy; linkage attacks; surreptitious DNA sequencing

Year: 2020; PMID: 33186529; PMCID: PMC7672785; DOI: 10.1016/j.cell.2020.09.036

Source DB: PubMed; Journal: Cell; ISSN: 0092-8674; Impact factor: 41.582


