Christoph Lippert, Riccardo Sabatini, M Cyrus Maher, Eun Yong Kang, Seunghak Lee, Okan Arikan, Alena Harley, Axel Bernal, Peter Garst, Victor Lavrenko, Ken Yocum, Theodore Wong, Mingfu Zhu, Wen-Yun Yang, Chris Chang, Tim Lu, Charlie W H Lee, Barry Hicks, Smriti Ramakrishnan, Haibao Tang, Chao Xie, Jason Piper, Suzanne Brewerton, Yaron Turpaz, Amalio Telenti, Rhonda K Roby, Franz J Och, J Craig Venter.
Abstract
Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.
Keywords: DNA phenotyping; genome sequencing; genomic privacy; phenotype prediction; reidentification
Year: 2017 PMID: 28874526 PMCID: PMC5617305 DOI: 10.1073/pnas.1711125114
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205