Amalie Dyda, Michael Purcell, Stephanie Curtis, Emma Field, Priyanka Pillai, Kieran Ricardo, Haotian Weng, Jessica C Moore, Michael Hewett, Graham Williams, Colleen L Lau.
Abstract
Coronavirus disease 2019 (COVID-19) has highlighted the need for the timely collection and sharing of public health data. It is important that data sharing is balanced with protecting confidentiality. Here we discuss an innovative mechanism to protect health data, called differential privacy. Differential privacy is a mathematically rigorous definition of privacy that aims to protect against all possible adversaries. In layperson's terms, statistical noise is applied to the data so that overall patterns can be described, but data on individuals are unlikely to be extracted. One of the first use cases for health data in Australia is the development of the COVID-19 Real-Time Information System for Preparedness and Epidemic Response (CRISPER), which provides proof of concept for the use of this technology in the health sector. If successful, this will benefit future sharing of public health data.
Keywords: COVID-19; data privacy; surveillance
Year: 2021 PMID: 34909703 PMCID: PMC8662814 DOI: 10.1016/j.patter.2021.100366
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1. Comparative plot of the density functions of the Laplace (0; 1) and the Gaussian (0; 1) distributions.
Note that the Laplace distribution has a sharp “peak” at zero, while the Gaussian is more rounded. Also note that the tails of the Laplace distribution are much heavier than those for the Gaussian distribution. That is, samples drawn from the Laplace distribution are more likely to be farther away from the mean than are samples drawn from the Gaussian distribution.
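The comparison described for Figure 1 can be checked numerically. The following is a minimal sketch, assuming that "(0; 1)" means location 0 and scale 1 for the Laplace distribution and mean 0 and standard deviation 1 for the Gaussian; the function names are illustrative and do not come from the article.

```python
import math


def laplace_pdf(x, loc=0.0, scale=1.0):
    """Density of the Laplace(loc, scale) distribution at x."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def gaussian_pdf(x, mean=0.0, sd=1.0):
    """Density of the Gaussian(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def laplace_tail(t, scale=1.0):
    """P(|X| > t) for X ~ Laplace(0, scale), with t >= 0."""
    return math.exp(-t / scale)


def gaussian_tail(t, sd=1.0):
    """P(|X| > t) for X ~ Gaussian(0, sd), with t >= 0."""
    return math.erfc(t / (sd * math.sqrt(2.0)))


if __name__ == "__main__":
    # Height of each density at zero (the "peak").
    print("pdf at 0:", laplace_pdf(0.0), gaussian_pdf(0.0))
    # Probability of landing more than t units from the mean.
    for t in (2.0, 3.0, 4.0):
        print("t =", t, laplace_tail(t), gaussian_tail(t))
```

Under these assumptions the Laplace density at zero is 0.5 against roughly 0.40 for the Gaussian, and the probability of a sample landing more than 3 units from the mean is about 0.050 for the Laplace against about 0.0027 for the Gaussian.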
Figure 2. Distribution of the probabilities of query responses produced by the Laplace mechanism when Alice has been diagnosed with COVID-19 (blue) and when Alice has not been diagnosed with COVID-19 (orange).
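The situation in Figure 2 can be sketched in code. This is an illustrative example, not the CRISPER implementation: it assumes a counting query with sensitivity 1 (Alice changes the count by at most one), a hypothetical true count of 100 without Alice and 101 with her, and a hypothetical privacy parameter ϵ = 0.5. It samples a noisy response and checks that the two response densities never differ by more than a factor of e^ϵ, which is the guarantee the Laplace mechanism is designed to give.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_count, epsilon, rng, sensitivity=1.0):
    """Return the true count plus Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def response_density(y, true_count, epsilon, sensitivity=1.0):
    """Density of the mechanism's response at y when the true count is true_count."""
    scale = sensitivity / epsilon
    return math.exp(-abs(y - true_count) / scale) / (2.0 * scale)


def worst_case_ratio(count_without, count_with, epsilon, responses):
    """Largest ratio between the two response densities over the given responses."""
    worst = 1.0
    for y in responses:
        p_with = response_density(y, count_with, epsilon)
        p_without = response_density(y, count_without, epsilon)
        worst = max(worst, p_with / p_without, p_without / p_with)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the sketch is repeatable
    epsilon = 0.5                   # hypothetical privacy parameter
    count_without, count_with = 100, 101  # hypothetical counts

    print("noisy response:", laplace_mechanism(count_with, epsilon, rng))

    grid = [90 + 0.1 * i for i in range(201)]  # candidate responses 90..110
    print("worst ratio:", worst_case_ratio(count_without, count_with, epsilon, grid))
    print("bound e^eps:", math.exp(epsilon))
```

Because the two curves overlap this closely, an observer who sees a single noisy response cannot confidently tell which of the two datasets produced it.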
Figure 3. Histogram of real data (blue) compared with differentially private query responses of the same dataset (ϵ = 1/8; orange).
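A comparison like the one in Figure 3 could be produced along the following lines. This is a sketch under stated assumptions rather than the authors' method: the bin counts are hypothetical, each individual is assumed to fall into exactly one bin, and neighbouring datasets are assumed to differ by adding or removing one record, so every bin count has sensitivity 1 and receives Laplace noise of scale 1/ϵ = 8.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_histogram(counts, epsilon, rng):
    """Add independent Laplace(0, 1 / epsilon) noise to every bin count.

    Assumes each individual contributes to exactly one bin, so that adding or
    removing one record changes a single count by one.
    """
    scale = 1.0 / epsilon
    return [count + laplace_noise(scale, rng) for count in counts]


if __name__ == "__main__":
    real_counts = [12, 30, 45, 28, 9]   # hypothetical bin counts
    rng = random.Random(42)             # fixed seed so the sketch is repeatable
    noisy_counts = private_histogram(real_counts, epsilon=1 / 8, rng=rng)

    for real, noisy in zip(real_counts, noisy_counts):
        print(f"real={real:4d}  private={noisy:8.1f}")
```

With ϵ = 1/8 the noise is large relative to small bins, so individual noisy counts can be fractional or even negative; the overall shape of the histogram is easier to recover when the bins hold many records.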