
Differential privacy in the 2020 US census: what will it do? Quantifying the accuracy/privacy tradeoff.

Samantha Petti, Abraham Flaxman.

Abstract

Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents' data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test as well as accompanying exposition has recently been released publicly by the Census Bureau.
Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data. We also developed an empirical measure of privacy loss to compare the error and privacy of the new approach with those of a (non-differentially private) simple-random-sampling approach to protecting privacy.
Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample.
Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
Copyright: © 2020 Petti S and Flaxman A.
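
Illustrative sketch (not part of the original record): the abstract compares two sources of error, noise added under a privacy budget and the error of a simple random sample. The toy Python below shows both for a single count. It is neither the TopDown algorithm nor the authors' empirical privacy loss measure. The function names, the two-sided geometric noise, and the made-up population are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Integer noise with P(k) proportional to exp(-epsilon * |k|).

    Drawn as the difference of two geometric variables (assumed mechanism,
    chosen here only to illustrate noise scaled by a privacy budget).
    """
    def geometric():
        # Number of failures before the first success, with
        # failure probability exp(-epsilon), via inverse transform.
        u = rng.random()
        return int(math.floor(math.log(1.0 - u) / -epsilon))
    return geometric() - geometric()


def noisy_count(true_count, epsilon, rng):
    """Release a count with privacy-budget-scaled integer noise."""
    return true_count + two_sided_geometric(epsilon, rng)


def sampled_count(population, fraction, rng):
    """Estimate a count from a simple random sample of the population.

    `population` is a list of 0/1 indicators (hypothetical data).
    """
    k = max(1, round(fraction * len(population)))
    sample = rng.sample(population, k)
    return sum(sample) * len(population) / k


def mean_abs_error(estimates, truth):
    return sum(abs(e - truth) for e in estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical area of 1000 people, 300 of whom have some attribute.
    population = [1] * 300 + [0] * 700
    truth = sum(population)
    for eps in (0.25, 1.0, 4.0):
        est = [noisy_count(truth, eps, rng) for _ in range(2000)]
        print("epsilon", eps, "MAE", mean_abs_error(est, truth))
    for frac in (0.5, 0.9):
        est = [sampled_count(population, frac, rng) for _ in range(2000)]
        print("sample fraction", frac, "MAE", mean_abs_error(est, truth))
```

A larger budget gives less noise, and a larger sampling fraction gives less sampling error. That is the sense in which a given budget can be matched to a sample of comparable accuracy.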


Keywords:  Decennial census; TopDown algorithm; differential privacy; empirical privacy loss

Year:  2020        PMID: 32478311      PMCID: PMC7216402          DOI: 10.12688/gatesopenres.13089.2

Source DB:  PubMed          Journal:  Gates Open Res        ISSN: 2572-4754


  4 in total

1.  How differential privacy will affect our understanding of health disparities in the United States.

Authors:  Alexis R Santos-Lozada; Jeffrey T Howard; Ashton M Verdery
Journal:  Proc Natl Acad Sci U S A       Date:  2020-05-28       Impact factor: 11.205

2.  Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C).

Authors:  Jason A Thomas; Randi E Foraker; Noa Zamstein; Jon D Morrow; Philip R O Payne; Adam B Wilcox
Journal:  J Am Med Inform Assoc       Date:  2022-07-12       Impact factor: 7.942

Review 3.  Differential privacy for public health data: An innovative tool to optimize information sharing while protecting data confidentiality.

Authors:  Amalie Dyda; Michael Purcell; Stephanie Curtis; Emma Field; Priyanka Pillai; Kieran Ricardo; Haotian Weng; Jessica C Moore; Michael Hewett; Graham Williams; Colleen L Lau
Journal:  Patterns (N Y)       Date:  2021-12-10

4.  Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C).

Authors:  Jason A Thomas; Randi E Foraker; Noa Zamstein; Philip R O Payne; Adam B Wilcox
Journal:  medRxiv       Date:  2021-07-08
