Weisan Wu1. 1. School of Mathematics and Statistics, Baicheng Normal University, Baicheng, China.
Abstract
In this paper, we give a modified gradient EM algorithm that protects the privacy of sensitive data by adding noise from the discrete Gaussian mechanism. Specifically, it makes high-dimensional data easier to process through scaling, truncation, multiplicative-noise, and smoothing steps on the data. Since the variance of the discrete Gaussian is smaller than that of the continuous Gaussian with the same scale parameter, adding discrete Gaussian mechanism noise guarantees the differential privacy of the data more effectively. Finally, we compare the standard gradient EM algorithm, the clipped algorithm, and our algorithm (DG-EM) on the GMM model. The experiments show that our algorithm can effectively protect high-dimensional sensitive data.
1. Introduction
Big data has now spread to every field and organization in our society; large amounts of personal data are generated every day, and people use and analyse these data to drive the rapid development of society and technology. However, it is expected that personal private data will be protected from being hacked or made public when collected. Therefore, how to protect the privacy of data effectively, so that it can resist attacks and still be used effectively, has gradually attracted attention. Dwork et al. [1] introduced the concept and basic theoretical framework of differential privacy, which can effectively protect users' data privacy and enjoys a rigorous and elegant mathematical foundation and guarantees.

The EM algorithm is one of the most important statistical models, and Wang et al. [2] recently applied it to the privacy protection of sensitive data. Before that, people used the original EM algorithm and the gradient EM algorithm without statistical guarantees, until Balakrishnan et al. [4] gave statistical guarantees for the EM algorithm; building on that work, Wang et al. [3] gave guarantees for the gradient EM algorithm and extended it to data privacy protection. However, like most scholars, they added continuously distributed Gaussian noise to the data, while in practice the outputs of data queries are often discrete, such as the number of records in a database that satisfy certain conditions. For this reason, Canonne et al. [5] proposed the discrete Gaussian mechanism, which adds discrete Gaussian noise to the data while retaining essentially the same accuracy as continuous Gaussian noise.

In this paper, based on [2], we design a discretized Gaussian algorithm built on the gradient EM algorithm for differentially private computation. Our algorithm performs well in practice and can be extended to general standard models. We also give the corresponding statistical guarantees for the algorithm.
The structure of this paper is as follows: in the second part, we introduce some theory of the gradient EM algorithm, the discrete Gaussian, and differential privacy, as well as some work related to this paper. In the third part, we introduce our model, namely, the differential privacy discrete Gaussian EM (gradient) algorithm (DG-EM), together with the relevant statistical guarantee theorems. In the fourth part, we give data simulations for the sensitivity, sample size, and dimension of the aggregated data; a discussion of the model and future work is given in the fifth part. Finally, we add the proofs of some lemmas in the appendix.
2. Preliminaries
2.1. Gradient EM Algorithm
Assume that (X, Z) is the complete data, where X is an observed sample and Z is a latent variable. Latent variables are generally unobservable because they are missing or reflect an underlying data structure. We denote 𝒳 and 𝒵 as the sample spaces of the variables X and Z, respectively. Suppose that (X, Z) has a joint density function p(x, z; θ) belonging to some parameterized distribution family {p(·, ·; θ) | θ ∈ Ω}. The variable X has the marginal density function π(x; θ)=∫p(x, z; θ)dz, and π(z|x; θ)=p(x, z; θ)/π(x; θ) is the conditional density function of Z given X=x. Suppose that the given observed samples are x1,…, xn from the population X. The EM algorithm needs to maximize the log-likelihood function ℓ(θ)=∑i=1n log π(xi; θ). Through Jensen's inequality, the log-likelihood function admits the following lower bound:

ℓ(θ) − ℓ(θ′) ≥ Q(θ; θ′) − Q(θ′; θ′),

where

Q(θ; θ′)=(1/n)∑i=1n ∫𝒵 π(z|xi; θ′)log p(xi, z; θ)dz.

The expectation of Q(θ; θ′) is denoted as

q(θ; θ′)=𝔼[Q(θ; θ′)].

By the inequality above, the log-likelihood on the left can be made sufficiently large by iteratively increasing the lower bound on the right. The standard EM algorithm [6-9] estimates the function Q(θ; θ(t)) in the E-step of each iteration; then, in the M-step, the parameters are updated to maximize this function, θ(t+1)=argmaxθ Q(θ; θ(t)). The gradient EM algorithm is usually used to achieve higher accuracy and reach the maximum faster when the function is differentiable at each iteration step. It is usually stated as follows: when the function Q(θ; θ(t)) is differentiable at the t-th iteration, we update the current parameter θ(t) to θ(t+1) by the following steps:

E-step: compute Q(θ; θ(t)),
M-step: update θ(t+1)=θ(t)+η∇Q(θ(t); θ(t)),

where η is a parameter called the step size.
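The gradient EM update above can be sketched for a symmetric two-component GMM of the kind used later in the experiments. The model form (0.5 N(θ, I) + 0.5 N(−θ, I) with unit variance), the step size, and the iteration count are illustrative assumptions of this sketch, not the paper's exact configuration:

```python
import numpy as np

def gradient_em_gmm(x, theta0, eta=0.5, T=50):
    """Gradient EM for the symmetric two-component mixture
    0.5*N(theta, I) + 0.5*N(-theta, I) with unit component variance.
    Each iteration runs the E-step (posterior weights w_i) and the
    gradient M-step theta <- theta + eta * grad Q(theta; theta)."""
    theta = theta0.copy()
    for _ in range(T):
        # E-step: posterior probability that x_i came from the +theta component
        w = 1.0 / (1.0 + np.exp(-2.0 * x @ theta))
        # Gradient M-step: grad Q(theta; theta) = mean((2w - 1) * x) - theta
        grad = ((2.0 * w - 1.0)[:, None] * x).mean(axis=0) - theta
        theta = theta + eta * grad
    return theta
```

With a reasonable signal-to-noise ratio and an initialization inside the contraction region, the iterates converge to a neighborhood of the true parameter (up to the mixture's sign symmetry).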
2.2. Discrete Gaussian
The study of discretely distributed forms of noise has received more attention in recent years. In the literature, people have studied the discrete Laplace distribution, the discrete binomial distribution, and the discrete Gaussian distribution and applied them to the field of cryptography.

In this paper, the differential privacy model is studied based on the Gaussian mechanism. Normally distributed noise gives the model many elegant mathematical properties. Although the discrete Laplace noise mechanism and the discrete Gaussian noise mechanism cannot be compared within the same model, since they are used in different privacy mechanisms, we still prefer discrete Gaussian noise in order to obtain clean mathematical conclusions [10-13].

In this paper, we need the noise added to specially treated samples to follow a discrete Gaussian distribution. First, we give the definition of the discrete Gaussian distribution and some useful related theory.
Definition 1 .
Let μ ∈ ℝ and σ > 0. If a random variable X has the probability mass function

P(X=x)=exp(−(x − μ)2/2σ2)/∑y∈ℤ exp(−(y − μ)2/2σ2),  x ∈ ℤ,

supported on the integers, then we call it a discrete Gaussian distribution with location parameter μ and scale parameter σ2, denoted N(μ, σ2).
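A direct, if inefficient, way to sample from this distribution is to normalize the mass function over a finite integer window; the ±12σ window is an assumption of this sketch that makes the truncation error negligible (exact rejection samplers are given by Canonne et al. [5]):

```python
import math
import random

def discrete_gaussian_mass(x, mu=0.0, sigma=1.0):
    """Unnormalized mass exp(-(x - mu)^2 / (2 sigma^2)) at integer x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def sample_discrete_gaussian(mu=0.0, sigma=1.0, rng=random):
    """Naive inverse-CDF sampler for the discrete Gaussian N(mu, sigma^2)
    on the integers, enumerating a +-12 sigma window around mu."""
    lo = int(math.floor(mu - 12 * sigma))
    hi = int(math.ceil(mu + 12 * sigma))
    weights = [discrete_gaussian_mass(k, mu, sigma) for k in range(lo, hi + 1)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for k, w in zip(range(lo, hi + 1), weights):
        acc += w
        if u <= acc:
            return k
    return hi
```

For σ ≥ 1 the mean and variance of the samples closely match the location and scale parameters, mirroring the continuous case.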
2.3. Some Basic Theories on Differential Privacy
In this part, we will give some basic theories on differential privacy [14, 15].
Definition 2 .
A randomized algorithm ℳ : 𝒳⟶𝒴 satisfies (ϵ, δ)-differential privacy (DP) if, for all neighboring datasets D, D′ ⊂ 𝒳 differing in a single entry and for all events S in the space 𝒴, we have Pr(ℳ(D) ∈ S) ≤ eϵPr(ℳ(D′) ∈ S)+δ. Moreover, we call it approximate differential privacy if δ > 0, and pure (or point-wise) ϵ-differential privacy in the case of (ϵ, 0)-differential privacy.

The concept of concentrated differential privacy was given by Bun et al. [14] as follows:
Definition 3 .
A randomized algorithm ℳ : 𝒳⟶𝒴 satisfies ρ-concentrated differential privacy (ρ-CDP) if, for all neighboring datasets D, D′ ⊂ 𝒳 and for any α ∈ (1, ∞), we have Dα(ℳ(D)‖ℳ(D′)) ≤ ρα, where Dα(P‖Q)=(1/(α − 1))log∑y(P(y)/Q(y))αQ(y) is the Rényi divergence of order α of the distribution P from the distribution Q.

From these definitions, we have the conclusion that pure ϵ-DP implies (ϵ2/2)-CDP, and ρ-CDP implies (ρ+2√(ρ log(1/δ)), δ)-DP, where δ is any positive constant.

In order to ensure the consistency of the parameters of our model, we need some basic definitions and assumptions based on [4].
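The conversion from CDP to approximate DP can be computed directly; the formula ϵ = ρ + 2√(ρ log(1/δ)) used here is the standard Bun–Steinke bound assumed in this sketch:

```python
import math

def cdp_to_dp(rho, delta):
    """Convert rho-CDP to (epsilon, delta)-DP via the standard bound
    epsilon = rho + 2 * sqrt(rho * log(1/delta))."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))
```

For example, 0.5-CDP yields roughly (5.3, 1e-5)-DP, and the implied ϵ shrinks monotonically with ρ.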
Definition 4 .
(self-consistent). We call the function Q(·; θ∗) self-consistent if θ∗=argmaxθ Q(θ; θ∗), where θ∗ is the true parameter.
Definition 5 .
(Lipschitz-gradient-2 (L, ℬ)). We call the function Q(·; ·) Lipschitz-gradient-2 (L, ℬ) if the following inequality holds for the true parameter θ∗ and any θ ∈ ℬ:

‖∇Q(θ∗; θ∗) − ∇Q(θ∗; θ)‖2 ≤ L‖θ∗ − θ‖2.
Definition 6 .
(μ-smooth). We call the function Q(·; ·) μ-smooth if, for any parameters θ, θ′ ∈ ℬ, we have the inequality

Q(θ′; θ∗) ≥ Q(θ; θ∗)+⟨∇Q(θ; θ∗), θ′ − θ⟩ − (μ/2)‖θ′ − θ‖22.
Definition 7 .
(λ-strongly concave). We call the function Q(·; θ∗) λ-strongly concave if, for any parameters θ, θ′ ∈ ℬ, we have the inequality

Q(θ′; θ∗) ≤ Q(θ; θ∗)+⟨∇Q(θ; θ∗), θ′ − θ⟩ − (λ/2)‖θ′ − θ‖22.
Assumption 1 .
We assume that the function Q(·; ·) is self-consistent, Lipschitz-gradient-2 (L, ℬ), μ-smooth, and λ-strongly concave on some parameter set ℬ.
3. Differential Privacy Discrete Gaussian EM (Gradient) Model
We now describe our EM algorithm, which is based on [2] and uses the discrete Gaussian noise mechanism together with a high-dimensional truncation algorithm; it satisfies concentrated differential privacy (CDP). Like Wang et al. [2], we first consider the one-coordinate case, that is, a 1-dimensional random variable x. Let x1,…, xn be i.i.d. samples from x. We build the clipped estimator as follows.

Step 1. For the sample x, we take a soft truncation function h(x) as defined by Catoni and Giulini [16]. Then, we take some mild constant ω and rescale the sample x by dividing by ω to get h(x/ω); through this approach, we obtain the truncated mean as follows:

x̂ = (ω/n)∑i=1n h(xi/ω).

From the expression of the function h(x), we know that h(x) is bounded, so the sensitivity of this estimator is bounded as well.

Step 2. Generate random noises o1,…, on from a common distribution o ~ χ with 𝔼(o)=0. For each data point xi, we get a new data point xi(1+oi) through multiplication by the noise factor 1+oi, and we get the term h(xi(1+oi)/ω) by the scaling and truncation step. Finally, we get

x̂ = (ω/n)∑i=1n h(xi(1+oi)/ω).

Multiplicative noise is an effective method to preserve the estimation quality at typical points while damping the influence of outliers as much as possible. It was first proposed by Srivastava et al. [17], and the motivation for using Gaussian multiplicative noise comes from [18].

Step 3. Finally, we take the expectation with respect to the multiplicative noise distribution. Like Catoni and Giulini [16], who take the continuous χ ~ N(0, 1/β), we take the distribution χ to be the discrete Gaussian χ ~ N(0, 1/β). Easily, for any given constants a, b > 0, we also have an expansion whose correction term is R(a, b)=T1+T2+T3+T4+T5. The terms T1–T5 are denoted, respectively, as follows. The bracket notation is defined accordingly.

We state without proof here the following estimation error bound, Lemma 1, which is similar to Lemma 5 in Holland [19]; we give its proof in Appendix A.
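Steps 1–3 can be sketched as follows. The specific truncation function (a Catoni–Giulini-type influence function saturating at ±2√2/3), the choice of ω, and the use of continuous rather than discrete multiplicative noise are all simplifying assumptions of this sketch:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def soft_truncate(u):
    """Soft truncation of Catoni-Giulini type (assumed form): identity-like
    near zero, saturating at +-2*sqrt(2)/3 for |u| > sqrt(2)."""
    out = u - u ** 3 / 6.0
    out = np.where(u > SQRT2, 2.0 * SQRT2 / 3.0, out)
    out = np.where(u < -SQRT2, -2.0 * SQRT2 / 3.0, out)
    return out

def dg_mean_estimate(x, tau, gamma=0.05, rng=None):
    """One-coordinate estimator (Steps 1-3): multiply by 1 + o_i,
    rescale by omega, soft-truncate, and average. The o_i are drawn from
    a continuous N(0, 1/beta) here for simplicity; the paper uses its
    discrete Gaussian counterpart. omega is a simple heuristic choice,
    not the paper's optimal numerical solution."""
    rng = rng or np.random.default_rng()
    n = len(x)
    beta = 2.0 * np.log(1.0 / gamma)          # beta = 2 log(1/gamma)
    omega = np.sqrt(n * tau / beta)           # heuristic scale, tau >= E[x^2]
    o = rng.standard_normal(n) / np.sqrt(beta)  # multiplicative noise factors
    return omega * np.mean(soft_truncate(x * (1.0 + o) / omega))
```

Because typical rescaled points fall in the linear region of h, the estimator tracks the sample mean while outliers are clipped at the saturation level.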
Lemma 1 .
Let x1,…, xn be i.i.d. samples from x ~ μ. Assume 𝔼x2 ≤ τ, where the upper bound τ is known. Given a number 0 < γ < 1, for β=2 log(γ−1) and a suitable choice of ω, the error bound stated below holds with probability at least 1 − γ.

From the soft truncation function and the multiplicative noise algorithm, we know that the sensitivity of the processed observation samples is bounded. Next, we need to add discrete Gaussian noise to the observations so that the resulting query is (ϵ, δ)-DP, which leads to the following Lemma 2; we give its proof in Appendix B.
Lemma 2 .
Let ϵ > 0, and let the function q : 𝒳⟶ℤ be the operator defined by Steps 1–3, satisfying |q(x) − q(x′)| ≤ Δ for any x, x′ ∈ 𝒳. The query can be written as a randomized algorithm ℳ : 𝒳⟶ℤ with ℳ(D)=q(x)+Y, where Y ~ N(0, σ2) is discrete Gaussian noise; then, ℳ satisfies (ϵ, δ)-DP.

Furthermore, these results imply the following lemma.
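The resulting mechanism can be sketched as below, with the noise scale calibrated by the classical continuous-Gaussian formula σ = Δ√(2 log(1.25/δ))/ϵ, which is an assumption of this sketch (the discrete case admits similar calibrations [5]):

```python
import math
import random

def discrete_gaussian_mechanism(q_value, sensitivity, epsilon, delta, rng=random):
    """Release round(q_value) + Y with Y a discrete Gaussian N(0, sigma^2).
    sigma follows the classical continuous-Gaussian calibration; the
    sampler enumerates a +-12 sigma integer window (naive but simple)."""
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    lo, hi = int(-12 * sigma) - 1, int(12 * sigma) + 1
    weights = [math.exp(-k * k / (2.0 * sigma ** 2)) for k in range(lo, hi + 1)]
    u = rng.random() * sum(weights)
    acc, noise = 0.0, hi
    for k, w in zip(range(lo, hi + 1), weights):
        acc += w
        if u <= acc:
            noise = k
            break
    return round(q_value) + noise
```

Note that the released value is always an integer, which is exactly the point of using the discrete mechanism for integer-valued queries.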
Lemma 3 .
Under the assumptions in Assumption 1, with probability at least 1 − γ, the bound stated below holds.

After the estimation of the univariate private data, in the t-th iteration of Algorithm 1, we use the univariate estimation method on each coordinate of the gradient ∇Q(θ; θ(t)) and thereby obtain an estimate of the gradient. Finally, the M-step is performed.
Lemma 4 .
For any 0 < ϵ < 1, let D(ℳ(x)‖ℳ(x′)) ≤ ϖ for any α ∈ (1, ∞), ϵ ≥ 0, and x, x′ ∈ 𝒳; then, Algorithm 1 satisfies (ϵ, δ)-DP, where Y ~ N(0, σ2).

For Algorithm 1, the next theorem shows that the parameter estimation is consistent if the initial parameter θ0 is close enough to the true parameter θ∗. After some simple calculations, we conclude that in Lemma 2, the upper bound is Δ=((nτ+ω2)/nω){1+[(1/4)log(3nτ/2ω2)+log(γ−1)]−1}, where ω is the optimal numerical solution of the corresponding equation.
Lemma 5 .
Let ℬ={θ : ‖θ − θ∗‖2 ≤ R} denote a parameter set with R=κ‖θ∗‖2, where κ ∈ (0,1) is a positive constant. Assume the parameters L, μ, λ, τ satisfy the condition 1 − 2((λ − L)/(λ+μ)) ∈ (0,1). If ‖θ0 − θ∗‖2 ≤ R/2 and n is a large number satisfying the sample-size condition below, then we have Pr(θ(t) ∈ ℬ) ≥ 1 − 2Tγ for all t ∈ [T]. Furthermore, if we take T=O(((λ+μ)/(λ − L))log(n)) and η=2/(λ+μ), we obtain the convergence bound stated below.
Lemma 6 .
Let ‖θ∗‖/σ ≥ r; then, there exists a constant C such that the self-consistent, Lipschitz-gradient-2 (L, ℬ), μ-smooth, and λ-strongly concave properties hold for the function Q(·; ·) with L=exp(−Cr), μ=λ=1, R=κ‖θ∗‖2, κ=1/4, and ℬ={θ : ‖θ − θ∗‖ ≤ R}, where r is a large enough constant representing the minimum signal-to-noise ratio (SNR).

Furthermore, we can get Theorems 1 and 2. The proofs of these theorems are very simple, so we do not list the detailed procedures here. In fact, we only need to replace the upper bound on the variance of the discrete noise in [2] for a single coordinate with 3 exp(−1/2σ2).
Theorem 1 .
With the same conditions as in Lemma 4, for any θ ∈ ℬ, the j-th coordinate of ∇q(θ; θ) satisfies the following results:
Theorem 2 .
With the same conditions as in Lemma 3, we assume that ‖θ0 − θ∗‖2 ≤ ‖θ∗‖2/8 in Algorithm 1 and that n is a large enough number. If we take T=O(log(n)) and the step size η=O(1), then for a failure probability γ, the error bound below holds with probability at least 1 − 2Tγ.

We note that Lemmas 3–6 and Theorems 1 and 2 follow easily from Lemmas 1 and 2. Due to limited space, we omit these proofs here; readers can prove them by themselves. It is only necessary to pay attention to the upper bound of the ℓ2-norm between the parameter iterates and the true values in the course of the proof.
4. Experiments and Results
In this section, we evaluate the performance of Algorithm 1 on the GMM model against the baseline methods. We study the statistical setting and empirical behavior of the algorithm on synthetic data.
4.1. Baseline Methods
In this part, we compare against two methods. For convenience, we refer to the gradient EM algorithm as EM, which serves as a nonprivate baseline method. The other is the clipped differentially private EM algorithm [20], which we refer to as clipped; it serves as our private baseline approach.
4.2. Experimental Settings
In this experiment, we generate synthetic data from a mixture distribution with two components. For each algorithm, we use random initialization to select the initial parameter values. In the results, we use the ℓ2-error ‖θ(t) − θ∗‖2 to measure the resulting estimation error. We set the signal-to-noise ratio ‖θ∗‖/σ=3. For the privacy parameter ϵ, we set ϵ ∈ {0.5, 0.8, 1}, and the parameter δ=Pr(Y > (ϵσ2/Δ)+(Δ/2)) then needs to be calculated, because it is a function of ϵ.
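The synthetic data generation and the (sign-invariant) error metric can be sketched as follows; placing θ∗ along the first coordinate axis is an arbitrary choice of this sketch, since only ‖θ∗‖/σ matters for the mixture:

```python
import numpy as np

def make_gmm_data(n, d, snr=3.0, sigma=1.0, seed=0):
    """Synthetic two-component mixture 0.5*N(theta*, sigma^2 I) +
    0.5*N(-theta*, sigma^2 I) with ||theta*|| / sigma = snr."""
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(d)
    theta_star[0] = snr * sigma           # any direction with the right norm
    z = rng.choice([-1.0, 1.0], size=n)   # latent component labels
    x = z[:, None] * theta_star + sigma * rng.standard_normal((n, d))
    return x, theta_star

def estimation_error(theta_hat, theta_star):
    """l2 error, invariant to the mixture's sign symmetry."""
    return min(np.linalg.norm(theta_hat - theta_star),
               np.linalg.norm(theta_hat + theta_star))
```

The sign-invariant error is the natural metric here because ±θ∗ parameterize the same mixture.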
4.3. Experimental Results
As can be seen from Figure 1, we fix n=1000 and d=20. When the privacy budget of our method is set to different values, the estimation error decreases significantly as the number of iterations grows. When the budget is 0.2, 0.5, and 1, the optimal clipping threshold C is 1, 2, and 2, respectively. In general, it is difficult to determine the optimal value of C in advance.
Figure 1
Estimation error of GMM clipped vs. iteration t under different clipping threshold C and budgets ϵ. (a) n = 1000; d = 20; ϵ = 0.2, (b) n = 1000; d = 20; ϵ = 0.5, and (c) n = 1000; d = 20; ϵ = 1.
In Figure 2, under the lower-dimensional case, we test how the data dimension d, privacy budget ϵ, and data size n affect the estimation error ‖θ(t) − θ∗‖2 of the algorithms on the Gaussian mixture model over iterations t. We can see that the estimation error of Algorithm 1 on GMM decreases when ϵ increases, n increases, or d decreases. However, when the budget ϵ is small, our algorithm performs badly, and the estimation error declines unstably as the number of iterations increases.
Figure 2
Estimation error of GMM w.r.t. privacy budget ϵ, data dimension (lower) d, data size n, and iteration t. (a) n = 2000; d = 10, (b) n = 2000; ϵ = 0.5, and (c) d = 10, ϵ = 0.5.
In Figure 3, we can see that, in the face of high-dimensional data, a relatively large sample is needed to keep the estimation error ‖θ(t) − θ∗‖2 under control. We conducted experiments with higher dimensions d=40, 80, 160 and different sample sizes of 2000, 5000, and 10,000, respectively. It can be seen that when the sample size n is large enough, the estimation error decreases significantly with the number of iterations t. As shown in Figure 3, as the sample size increases, our algorithm remains effective in high-dimensional settings, which the algorithm of Wang et al. [2] cannot match.
Figure 3
Estimation error of GMM w.r.t. privacy budget ϵ, data dimension (higher) d, data size n, and iteration t. (a) n = 2000; d = 100, (b) n = 2000; ϵ = 0.5, and (c) d = 100, ϵ = 0.5.
5. Conclusions
In this paper, we study a differential privacy model with discrete Gaussian mechanism noise. Through data scaling and truncation, the model effectively handles the influence of high-dimensional data. The experiments and theoretical analysis show that, in low dimensions, the estimation error of the model with discrete Gaussian noise decreases faster than that of the clipped model with continuous Gaussian noise, and in high dimensions, the effect is much better than that of [2]. At the same time, as the preceding lemmas show, our model has tighter bounds because of the smaller variance of the discrete Gaussian noise.