Hai Liu, Changgen Peng, Youliang Tian, Shigong Long, Feng Tian, Zhenqiang Wu.
Abstract
Existing work has conducted in-depth research and analysis of global differential privacy (GDP) and local differential privacy (LDP) based on information theory. However, the data privacy preserving community has not systematically reviewed and analyzed GDP and LDP based on the information-theoretic channel model. To this end, this survey systematically reviews GDP and LDP from the perspective of the information-theoretic channel. First, we present the privacy threat model under the information-theoretic channel. Second, we describe and compare the information-theoretic channel models of GDP and LDP. Third, we summarize and analyze the definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP under their channel models. Finally, we discuss the open problems of GDP and LDP based on different types of information-theoretic channel models according to the above systematic review. Our main contribution is a systematic survey of channel models, definitions, privacy-utility metrics, properties, and mechanisms for GDP and LDP from the perspective of the information-theoretic channel, together with a survey of differential privacy synthetic data generation using generative adversarial networks and federated learning, respectively. Our work helps in systematically understanding the privacy threat model, definitions, privacy-utility metrics, properties, and mechanisms of GDP and LDP from the perspective of the information-theoretic channel, and promotes in-depth research and analysis of GDP and LDP based on different types of information-theoretic channel models.
Keywords: GDP vs. LDP; Rényi divergence; expected distortion; information-theoretic channel; mutual information
Year: 2022 PMID: 35327940 PMCID: PMC8953244 DOI: 10.3390/e24030430
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Advantages and disadvantages of GDP and LDP.
| Privacy Type | Advantage | Disadvantage |
|---|---|---|
| GDP | Better data utility; suitable for datasets of any scale | Requires a trusted data collector |
| LDP | Requires no trusted data collector | Poor data utility; not applicable to small-scale datasets |
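The trust-model contrast in the table above can be made concrete with a minimal sketch (illustrative only; binary user data and the standard Laplace and randomized-response mechanisms are assumed, and all function names are hypothetical):

```python
import math
import random

def laplace_mechanism(true_count, epsilon):
    """GDP setting: a trusted curator answers an aggregate query
    (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

def randomized_response(bit, epsilon):
    """LDP setting: each user flips their own bit before sending it,
    so the collector never needs to be trusted."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def debias_rr_mean(reported_mean, epsilon):
    """Invert the randomized-response perturbation to estimate the true
    mean of the users' bits from the mean of the reported bits."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (reported_mean - (1.0 - p)) / (2.0 * p - 1.0)
```

Because LDP randomizes every record individually, the variance of the debiased estimate grows as ε shrinks, which is exactly the "poor data utility" drawback noted in the table.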
Common mathematical symbols.
| Symbol | Description |
|---|---|
| | Dataset |
| | Randomized mechanism |
| | Privacy budget |
| | Probability of not satisfying differential privacy |
| | Input random variable of the information-theoretic channel |
| | Output random variable of the information-theoretic channel |
| | Channel transition probability matrix |
| | Probability distribution on the source |
| | Another probability distribution on the source |
| | Rényi divergence |
| | Rényi entropy |
| | Shannon entropy |
| | Min-entropy |
| | Conditional Rényi entropy |
| | Conditional Shannon entropy |
| | Conditional min-entropy |
| | Mutual information |
| | Max-information |
| | Kullback–Leibler divergence |
| | Total variation distance |
| | Max-divergence |
| | Expected distortion |
| | Single-symbol distortion |
| | Error probability |
| | A class of functions |
| | A divergence |
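Several of the divergence and entropy quantities listed above are directly computable for finite discrete distributions; a small illustrative sketch (natural-logarithm convention assumed, function names hypothetical):

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(p || q) for alpha > 0, alpha != 1."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)

def kl_divergence(p, q):
    """Kullback-Leibler divergence, the alpha -> 1 limit of D_alpha."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_entropy(p, alpha):
    """H_alpha(X) for alpha != 1; alpha -> infinity gives the min-entropy
    -log(max_i p_i)."""
    return math.log(sum(pi ** alpha for pi in p)) / (1 - alpha)
```

For example, `renyi_divergence(p, q, alpha)` approaches `kl_divergence(p, q)` as `alpha` approaches 1, and `renyi_entropy(p, alpha)` approaches the min-entropy as `alpha` grows.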
Information-theoretic channel models of GDP and LDP.
| Privacy Type | Data Type | Input | GDP and LDP Mapping | Real Output | Random Output | Adjacent Relationship |
|---|---|---|---|---|---|---|
| GDP [ | Numerical data | Dataset | | | | |
| LDP | Categorical data | Data item | | | | |
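For LDP, the channel view can be made concrete: a k-ary randomized-response mechanism is a row-stochastic transition matrix, and its privacy budget is the worst-case log-ratio between entries in any output column. A sketch (standard k-ary randomized response assumed; names hypothetical):

```python
import math

def rr_channel(epsilon, k):
    """k-ary randomized response: keep the true symbol with probability
    e^eps / (e^eps + k - 1), otherwise emit one of the other symbols."""
    e = math.exp(epsilon)
    keep = e / (e + k - 1.0)
    flip = 1.0 / (e + k - 1.0)
    return [[keep if x == y else flip for y in range(k)] for x in range(k)]

def ldp_budget(channel):
    """Smallest eps such that the channel satisfies eps-LDP: the worst-case
    log-ratio ln(P(y|x) / P(y|x')) over outputs y and input pairs x, x'."""
    worst = 0.0
    for y in range(len(channel[0])):
        column = [row[y] for row in channel]
        worst = max(worst, math.log(max(column) / min(column)))
    return worst
```

The adjacency relationship for LDP is any pair of data items, so the maximum runs over all input pairs, unlike GDP where only neighbouring datasets are compared.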
GDP definitions using different information-theoretic metrics.
| Existing Work | Privacy Type | Information-Theoretic Metric | Formula | Description |
|---|---|---|---|---|
| DP [ | | Channel transition probability | | The transition probability matrix is used as the GDP mapping. |
| DP [ | | | | |
| DP [ | | Max-divergence | | Since max-divergence is not symmetric and does not satisfy the triangle inequality, the inequality must also hold in the reverse direction. |
| | | Rényi divergence | | When |
| | | | | If |
| Capacity bounded DP [ | | | | An adversary cannot distinguish between |
| | | | | An adversary cannot distinguish between |
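The max-divergence view of GDP can be checked numerically for the classic Laplace mechanism on a sensitivity-1 query (a sketch under those assumptions; the grid evaluation is illustrative, not part of the surveyed formalism):

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution centred at mu with scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def max_divergence_on_grid(mu0, mu1, b, grid):
    """Numerical max-divergence D_inf(P0 || P1): the largest log-ratio of
    the two output densities over the evaluation grid."""
    return max(math.log(laplace_pdf(x, mu0, b) / laplace_pdf(x, mu1, b))
               for x in grid)
```

With scale b = 1/ε, the log-ratio never exceeds ε, and the bound holds with the two distributions swapped as well, matching the remark above that the inequality must also hold in the reverse direction.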
Comparative analysis of GDP and other information-theoretic privacy definitions.
| Existing Work | Information-Theoretic Privacy Definition | Formula | Description | Relationship to GDP | Stronger or Weaker than GDP |
|---|---|---|---|---|---|
| [ | | | When the output is given, the posterior and prior probabilities of the input | | |
| [ | | | The same as above. | The same as above. | |
| | Worst-case divergence privacy | | Some private data | | |
| [ | | | Two adjacent datasets cannot be distinguished from the posterior probabilities after observing the output dataset, which makes any individual’s data hard to identify. | | |
| | | | Mutual-information privacy measures the average amount of information about | | |
| [ | | | The same as | | |
Comparative analysis of LDP and other information-theoretic privacy definitions.
| Existing Work | Information-Theoretic Privacy Definition | Formula | Description | Relationship to LDP | Stronger or Weaker than LDP |
|---|---|---|---|---|---|
| [ | | | The same as | The same as | |
| [ | | | | | |
| [ | | | The same as | The same as | |
| [ | | | SRLIP satisfies | | |
Privacy metrics of GDP under information-theoretic channel model.
| Existing Work | Privacy Metric | Formula | Description | Bound |
|---|---|---|---|---|
| [ | Maximal leakage | | The maximal leakage of channel | |
| [ | Min-entropy leakage | | The min-entropy leakage corresponds to the ratio between the probabilities of attack success a priori and a posteriori. | |
| | Worst-case leakage | | The same as maximal leakage above. | |
| [ | Mutual information | | The mutual information denotes the amount of information leaked on | |
| [ | Min-entropy leakage | The same as above. | The same as above. | |
| [ | Mutual information | | The same as above. | – |
| [ | | | The notion of | |
| [ | Max-information | | Max-information is a correlation measure, similar to mutual information, which bounds the change of the conditional probability of an event relative to its prior probability. | |
| [ | Rényi divergence | | A natural relaxation of GDP based on the Rényi divergence. | – |
| [ | | | The privacy loss is measured in terms of a divergence | |
| [ | Privacy budget | | The privacy budget represents the level of privacy preservation. | – |
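Maximal leakage, the first metric in the table, has a simple closed form for finite channels; a sketch (uniform prior assumed for the min-entropy interpretation; the function name is hypothetical):

```python
import math

def maximal_leakage(channel):
    """L(X -> Y) = log sum_y max_x P(y|x): the worst-case multiplicative
    gain in an adversary's probability of guessing X after observing Y.
    Under a uniform prior this equals the min-entropy leakage."""
    n_out = len(channel[0])
    return math.log(sum(max(row[y] for row in channel) for y in range(n_out)))
```

For an ε-LDP channel the leakage is at most ε; e.g., binary randomized response with ε = 1 leaks ln(2e/(e+1)) ≈ 0.38 nats, while a noiseless binary channel leaks the full ln 2.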
Utility metrics of GDP under information-theoretic channel model.
| Existing Work | Utility Metric | Formula | Description | Bound |
|---|---|---|---|---|
| [ | Expected distortion | | How much information about the real answer can, on average, be obtained from the reported answer. | |
| [ | Expected distortion | | The same as above. | – |
| [ | Fidelity | | The fidelity of a pair of transition probability distributions is | – |
| [ | Mutual information | | Mutual information captures the amount of information shared by two variables; that is, it quantifies how much information can be preserved when releasing a private view of the data. | – |
Privacy metrics of LDP under information-theoretic channel model.
| Existing Work | Privacy Metric | Formula | Description | Bound |
|---|---|---|---|---|
| [ | KL-divergence | | The general result bounds the KL-divergence between distributions | |
| [ | Mutual information | | The same as | – |
| [ | Privacy budget | | The same as | – |
| Average privacy [ | Conditional entropy | | The privacy metric is the fraction of sensitive information retained from the aggregator with prior knowledge | – |
Utility metrics of LDP under information-theoretic channel model.
| Existing Work | Utility Metric | Formula | Description | Bound |
|---|---|---|---|---|
| [ | Expected Hamming distortion | | Hamming distortion measures the utility of a channel | – |
| [ | | | | |
| | Mutual information | | The same as | |
| [ | Expected distortion | | A channel | – |
| Average error probability [ | Expected Hamming distortion | | The average error probability is defined as the expected Hamming distortion between the input and output data based on maximum a posteriori estimation. | |
| [ | Mutual information | | The same as | |
| Distribution utility [ | Mutual information | | The utility metric is the fraction of relevant information after access to prior knowledge | – |
| Tally utility [ | Entropy; mutual information | | | |
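Expected Hamming distortion, used in several rows above, is just the average probability that the channel flips the input symbol; an illustrative sketch (function name hypothetical):

```python
def expected_hamming_distortion(px, channel):
    """E[d(X, Y)] with d(x, y) = 1 if x != y else 0: the average
    probability that the channel reports a wrong symbol."""
    return sum(px[x] * channel[x][y]
               for x in range(len(px))
               for y in range(len(channel[x]))
               if x != y)
```

For binary randomized response with ε = ln 3 (keep probability 3/4), the distortion under a uniform input is 1/4; it rises toward 1/2 as ε shrinks, quantifying the privacy-utility trade-off.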
Properties of GDP under information-theoretic channel model.
| Existing Work | Privacy Type | Privacy Property | Information-Theoretic Metric | Formal Description |
|---|---|---|---|---|
| [ | GDP | Sequential composition | Maximal leakage | |
| | | Parallel composition | | |
| [ | GDP | Sequential composition | | |
| [ | RDP | Post-processing | Rényi divergence | If there is a randomized mapping |
| | | Group privacy | | If |
| | | Sequential composition | | If |
| [ | Capacity bounded DP | Post-processing | | |
| | | Convexity | | |
| | | Sequential composition | | |
| | | Parallel composition | | |
| [ | GDP | Privacy-utility monotonicity | Mutual information | Mutual information decreases as the privacy budget decreases, and vice versa. |
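Sequential composition can be verified numerically for finite channels: releasing two independent outputs of the same input forms a product channel whose budget is the sum of the individual budgets. A sketch (binary randomized response assumed; names hypothetical):

```python
import math

def rr_channel(epsilon):
    """Binary randomized response channel with privacy budget epsilon."""
    e = math.exp(epsilon)
    keep, flip = e / (e + 1.0), 1.0 / (e + 1.0)
    return [[keep, flip], [flip, keep]]

def compose(ch1, ch2):
    """Sequential composition: release the pair (y1, y2), where y1 and y2
    are computed independently from the same input x."""
    return [[p1 * p2 for p1 in ch1[x] for p2 in ch2[x]]
            for x in range(len(ch1))]

def budget(channel):
    """Smallest eps for which the channel is eps-differentially private."""
    return max(math.log(max(col) / min(col)) for col in zip(*channel))
```

Composing an ε1-channel with an ε2-channel yields exactly the (ε1 + ε2)-budget stated by the sequential composition property.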
GDP mechanisms under information-theoretic channel model.
| Existing Work | Privacy Type | Model | Objective Function | Constraint Condition | Mechanism | Solution | Description |
|---|---|---|---|---|---|---|---|
| [ | GDP | Maximal utility | Expected distortion | Min-entropy leakage | | Graph symmetry induced by the adjacency relationship between adjacent datasets. | The optimal randomization mechanism provides better utility while guaranteeing |
| [ | GDP | Risk-distortion | Mutual information | Expected distortion | | Lagrangian multipliers. | The conditional probability distribution is the DP mapping, which minimizes the privacy risk under a given distortion constraint. |
| [ | GDP | Constrained maximization program | Mutual information | GDP | | Definition of GDP. | When |
LDP mechanisms under information-theoretic channel model.
| Existing Work | Privacy Type | Model | Objective Function | Constraint Condition | Mechanism | Solution | Description |
|---|---|---|---|---|---|---|---|
| [ | LDP | Rate-distortion function | Mutual information | Expected Hamming distortion | Binary channel; discrete alphabet | Memoryless symmetric channel. | LDP is just a function of the channel, and the worst-case Hamming distortion on source distribution |
| [ | LDP | Constrained maximization problem | KL-divergence; mutual information | LDP | Binary, multivariate, and quaternary randomized response | Solving the privacy-utility maximization problem is equivalent to solving a finite-dimensional linear program. | The binary and multivariate randomized response mechanisms are universally optimal in the low and high privacy regimes and well approximate the intermediate regime. The quaternary randomized mechanism satisfies |
| [ | LDP | Maximize utility | Mutual information | LDP | | This problem maximizes mutual information when | The mutual information bound is used as a universal statistical utility measurement, and the |
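The mutual-information utility of randomized response, and its monotonicity in the privacy budget, can be computed directly; a sketch (binary mechanism and uniform prior assumed; names hypothetical):

```python
import math

def mutual_information(px, channel):
    """I(X; Y) in nats for an input distribution px and channel matrix."""
    py = [sum(px[x] * channel[x][y] for x in range(len(px)))
          for y in range(len(channel[0]))]
    return sum(px[x] * channel[x][y] * math.log(channel[x][y] / py[y])
               for x in range(len(px))
               for y in range(len(channel[0]))
               if channel[x][y] > 0)

def rr_utility(epsilon):
    """Mutual-information utility of binary randomized response
    under a uniform prior."""
    e = math.exp(epsilon)
    keep, flip = e / (e + 1.0), 1.0 / (e + 1.0)
    return mutual_information([0.5, 0.5], [[keep, flip], [flip, keep]])
```

Utility grows monotonically with ε and is capped by the input entropy ln 2, mirroring the privacy-utility monotonicity property noted earlier.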
Membership inference attack and model extraction attack against GAN.
| Existing Work | Attack Target | Attack Type | Attack Method | Characteristic | Attack Effect |
|---|---|---|---|---|---|
| [ | Generative models | Membership inference | The discriminator can learn the statistical difference of distribution, detect overfitting and recognize the input as part of the training dataset. | The proposed attack has low running cost, does not need information about the attacked model, and has good generalization. | Defenses are either ineffective or lead to a significant decline in the performance of the generative models in terms of training stability or sample quality. |
| [ | Generative models | Co-membership inference | The membership inference of the target data | When the generative models are trained with large datasets, the co-membership inference attack is necessary to achieve success. | The performance of attacker’s network is better than that of previous membership attacks, and the power of co-membership attack is much greater than that of a single attack. |
| [ | Generative models | Membership inference | The membership inference attack based on Monte Carlo integration only considers the small distance samples in the model. | This attack allows membership inference without assuming the type of generative models. | The success rate of this attack is better than that of previous studies on most datasets, and there are only very mild assumptions. |
| [ | Generative models | Membership inference | This work proposed a general attack model based on reconstruction for which the model is suitable for all settings according to the attacker’s knowledge about the victim model. | This work provides a theoretically reliable attack calibration technology, which can continuously improve the attack performance in different attack settings, data modes, and training configurations in all cases. | This attack reveals the information of the training data used for the victim model. |
| [ | GAN | Model extraction | This work studied model extraction attacks based on target and background knowledge, from the perspectives of fidelity extraction and accuracy extraction. | Model extraction based on transfer learning can enable adversaries to improve the performance of their own GAN model. | A stolen copy of a state-of-the-art target model can be transferred to new domains, expanding the application scope of the extracted model. |
Differential privacy synthetic data generation with GAN.
| Existing Work | GAN Type | Clipping Strategy | Perturbation Strategy | Privacy Loss Accountant |
|---|---|---|---|---|
| [ | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | WGAN | Clipping weight | Gradient perturbation | Moment accountant |
| [ | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | CGAN | Clipping gradient | Gradient perturbation | RDP accountant |
| [ | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | WGAN | Clipping gradient | Gradient perturbation | RDP accountant |
| [ | WGAN-GP | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | AC-GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | GAN | Clipping gradient | Gradient perturbation | Moment accountant |
| [ | NetGAN | Clipping gradient | Gradient perturbation | Privacy budget composition [ |
| [ | GAN | – | Data perturbation | – |
| [ | GAN | – | Data perturbation | Advanced composition [ |
| [ | GAN | – | Data perturbation | – |
| [ | GAN | – | Data perturbation | – |
| [ | GAN | – | Label perturbation | Moment accountant |
| [ | GAN | – | Objective function perturbation | Advanced composition |
| [ | GAN | – | Differential privacy identifier | Privacy budget composition |
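The clipping and gradient-perturbation strategies listed in the table share one core step: bound each per-example gradient's L2 norm, then add Gaussian noise to the aggregate. A minimal sketch (not any specific paper's implementation; names and parameters are hypothetical):

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Clip a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]

def noisy_aggregate(per_example_grads, clip_norm, noise_multiplier):
    """Sum the clipped gradients and add Gaussian noise with standard
    deviation noise_multiplier * clip_norm to each coordinate, as in
    gradient-perturbation training of a DP GAN discriminator."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    total = [sum(coords) for coords in zip(*clipped)]
    sigma = noise_multiplier * clip_norm
    return [t + random.gauss(0.0, sigma) for t in total]
```

The cumulative privacy cost of repeating this step over many training iterations is what the moment accountant and RDP accountant columns track.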
Differential privacy synthetic data generation with federated learning.
| Existing Work | GAN Type | Clipping Strategy | Perturbation Strategy | Privacy Loss Accountant | Training Method |
|---|---|---|---|---|---|
| [ | GAN | Clipping weight | Weight perturbation | RDP accountant | FedAvg algorithm |
| [ | WGAN | Clipping gradient | Gradient perturbation | RDP accountant | FedAvg algorithm |
| [ | GAN | Clipping weight | Gradient perturbation | Moment accountant | FedAvg algorithm |
| [ | GAN | – | Gradient perturbation | – | FedAvg algorithm |
| [ | GAN | Clipping gradient | Gradient perturbation | RDP accountant | Serial training |
| [ | GAN | – | Differential average-case privacy | – | FedAvg algorithm |
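The FedAvg-style training in the table can be sketched as local updates followed by a server-side average that is optionally perturbed with Gaussian noise, as in the weight-perturbation rows above (illustrative only; names hypothetical):

```python
import random

def local_update(weights, grad, lr):
    """One local SGD step computed on a client's private data."""
    return [w - lr * g for w, g in zip(weights, grad)]

def fedavg_round(client_weights, noise_std=0.0):
    """Server-side FedAvg: average the clients' models coordinate-wise
    and optionally perturb the aggregate with Gaussian noise
    (weight perturbation)."""
    n = len(client_weights)
    average = [sum(coords) / n for coords in zip(*client_weights)]
    return [w + random.gauss(0.0, noise_std) for w in average]
```

Only model parameters, never raw data, leave the clients; the noise added at aggregation is what yields the differential privacy guarantee tracked by the accountant column.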
Open problems of GDP and LDP from the perspective of different types of information-theoretic channel.
| Scenario | Data Type | Privacy Type | Open Problem | Method | Information-Theoretic Foundation |
|---|---|---|---|---|---|
| Data collection | Categorical data | LDP | Personalized privacy demands; poor data utility; information-theoretic analysis of existing LDP mechanisms | Rate-distortion framework | Discrete single-symbol information-theoretic channel |
| High-dimensional (correlated) data collection | Categorical data | LDP | Poor data utility | Rate-distortion framework; joint probability; Markov chain | Discrete sequence information-theoretic channel |
| Continuous (correlated) data releasing | Numerical data | GDP | Information-theoretic analysis of existing GDP mechanisms; RDP mechanisms; personalized privacy demands; poor data utility | Rate-distortion framework; joint probability; Markov chain | Continuous information-theoretic channel |
| Multiuser (correlated) data collection; multi-party data releasing | Numerical data; categorical data | GDP; LDP | Privacy leakage risk | Rate-distortion framework | Multiple access channel; multiuser channel with correlated sources; broadcast channel |
| Synthetic data generation | Numerical data; categorical data | GDP; LDP | Poor data utility | GAN; GAN with federated learning | Information-theoretic metrics |