Yuichi Sei (1), Akihiko Ohsuga (2). (1) The University of Electro-Communications, Tokyo, Japan. seiuny@uec.ac.jp. (2) The University of Electro-Communications, Tokyo, Japan.
Abstract
BACKGROUND: The importance of privacy protection in analyses of personal data, such as genome-wide association studies (GWAS), has grown in recent years. GWAS focuses on identifying single-nucleotide polymorphisms (SNPs) associated with certain diseases such as cancer and diabetes, and the chi-squared (χ²) hypothesis test of independence can be used for this identification. However, recent studies have shown that publishing the results of χ² tests of SNPs or personal data could lead to privacy violations. Several studies have proposed anonymization methods for χ² testing with ε-differential privacy, which is the cryptographic community's de facto privacy metric. However, existing methods either apply only to 2×2 or 2×3 contingency tables or have low accuracy when the number of samples is small. In many cases, such as COVID-19 analysis in its early propagation stage, it is difficult to collect a large number of highly sensitive samples. RESULTS: We propose a novel anonymization method (RandChiDist), which anonymizes χ² testing for small samples. We prove that RandChiDist satisfies differential privacy. We also evaluate it experimentally using synthetic datasets and two real genomic datasets. RandChiDist produced the fewest Type II errors among the existing and baseline methods that can control the Type I error rate. CONCLUSIONS: We propose a new differentially private method, named RandChiDist, for anonymizing χ² values for an I×J contingency table with a small number of samples. The experimental results show that RandChiDist outperforms existing methods for small numbers of samples.
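For readers unfamiliar with the test mentioned in the abstract, the following is a minimal sketch of the standard (non-private) Pearson χ² statistic for an I×J contingency table. It is not RandChiDist and adds no privacy protection; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for an I x J table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table):
    """Degrees of freedom of the independence test: (I - 1) * (J - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 3 table: rows are case/control, columns are three genotypes.
example_table = [[10, 20, 30],
                 [20, 20, 20]]
statistic = chi_squared_statistic(example_table)
dof = degrees_of_freedom(example_table)
```

The statistic is compared against the χ² distribution with `dof` degrees of freedom to decide whether to reject independence. It is this published statistic that the abstract says can leak information about individuals.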
Keywords:
Chi-squared testing; Differential privacy; Privacy-preserving data mining