Kurt A Pflughoeft, Ehsan S Soofi, Refik Soyer.
Abstract
Preserving the confidentiality of individuals in data disclosure is a prime concern for public and private organizations. The main challenge in the data disclosure problem is to release data such that misuse by intruders is avoided while useful information is still provided to legitimate users for analysis. We propose an information-theoretic architecture for the data disclosure problem. The proposed framework consists of developing a maximum entropy (ME) model based on statistical information of the actual data, testing the adequacy of the ME model, producing disclosure data from the ME model, and quantifying the discrepancy between the actual and the disclosure data. The architecture can be used for both univariate and multivariate data disclosure. We illustrate the implementation of our approach using financial data.
Keywords: Kullback–Leibler information; data confidentiality; data utility; differential privacy; disclosure risk; maximum entropy
Year: 2022 PMID: 35626554 PMCID: PMC9140670 DOI: 10.3390/e24050670
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
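The steps named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the information moments are the mean and variance (so the ME model is normal), it uses hypothetical simulated data in place of confidential records, it omits the adequacy test, and the function names are invented for this example.

```python
import math
import random
import statistics


def fit_me_normal(data):
    """Step 1: with mean and variance as the information moments,
    the ME model is the normal distribution with those moments."""
    mu = statistics.fmean(data)
    var = statistics.pvariance(data, mu)
    return mu, var


def sample_disclosure(mu, var, n, rng):
    """Step 3: draw synthetic disclosure data from the fitted ME model."""
    sd = math.sqrt(var)
    return [rng.gauss(mu, sd) for _ in range(n)]


def kl_normal(mu0, var0, mu1, var1):
    """Step 4: KL( N(mu0, var0) || N(mu1, var1) ) in closed form."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)


rng = random.Random(0)
# Hypothetical stand-in for confidential log-scale data (not the paper's data).
actual = [rng.gauss(11.0, 0.4) for _ in range(2000)]

mu, var = fit_me_normal(actual)
disclosure = sample_disclosure(mu, var, len(actual), rng)
mu_d, var_d = fit_me_normal(disclosure)
divergence = kl_normal(mu, var, mu_d, var_d)

print(f"actual:     mean={mu:.3f} variance={var:.3f}")
print(f"disclosure: mean={mu_d:.3f} variance={var_d:.3f}")
print(f"KL divergence between the two fitted ME models: {divergence:.5f}")
```

Only the fitted moments, not the individual records, enter the sampling step, which is the property the architecture relies on.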
Figure 1. Plan of the data disclosure. Numbers indicate the sequence of tasks; d is the Euclidean distance; D is the information divergence. The figure also labels the energy statistic and the proportion of distances between all possible pairs of points in the actual and disclosure data.
Examples of univariate maximum entropy models and information moments.
| ME Model | Density | Information Moments |
|---|---|---|
| Generalized error | | |
| Student-t | | |
| Logistic | | |
| Asymmetric Laplace | | |
| Exponential | | |
| Pareto Type II | | |
| Gamma | | |
| Beta | | |
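The families listed above share one construction, which can be written out as a short worked derivation. The notation below ($T_j$, $\theta_j$, $\lambda_j$, $C(\lambda)$) is chosen for this sketch and is not taken from the paper's table; the exponential and gamma cases are shown as examples.

```latex
% ME problem: maximize Shannon entropy subject to J information-moment constraints
\max_{f}\; H(f) = -\int f(x)\,\log f(x)\,dx
\quad \text{subject to} \quad
E_f\!\left[T_j(X)\right] = \theta_j,\quad j = 1,\dots,J .

% Solution, when it exists: exponential-family form with Lagrange multipliers \lambda_j
f^{*}(x) = C(\lambda)\,\exp\!\Big(-\sum_{j=1}^{J} \lambda_j\, T_j(x)\Big) .

% Exponential: T(x) = x on x > 0
f^{*}(x) = \lambda\, e^{-\lambda x}, \qquad \lambda = 1/\theta .

% Gamma: T_1(x) = x,\; T_2(x) = \log x on x > 0, with \lambda_1 = \lambda,\; \lambda_2 = 1-\alpha
f^{*}(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\; x^{\alpha-1} e^{-\lambda x} .
```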
Examples of univariate maximum entropy models obtained by transformation and information moments.
| Family and Transformation | Density | Information Moments |
|---|---|---|
| Location-scale transformation | | |
| Log and exponential transformations | | |
| Logistic | | |
| Log-Gamma | | |
| Lognormal | | |
| Power transformations | | |
| Generalized Gamma | | |
| Pareto Type IV | | |
| Inverted beta | | |
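The lognormal entry illustrates how a transformation carries the ME property from one family to another. The following sketch uses notation assumed here ($Y = \log X$, $\mu$, $\sigma^2$) rather than the paper's own symbols.

```latex
% Y = \log X is normal, the ME model given E[Y] and E[Y^2]
f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,
         \exp\!\Big(-\frac{(y-\mu)^{2}}{2\sigma^{2}}\Big)

% Change of variables x = e^{y}, with dy/dx = 1/x and x > 0
f_X(x) = \frac{1}{x}\, f_Y(\log x)
       = \frac{1}{x\sqrt{2\pi\sigma^{2}}}\,
         \exp\!\Big(-\frac{(\log x-\mu)^{2}}{2\sigma^{2}}\Big)

% f_X is the ME model given E[\log X] and E[(\log X)^2]; the entropies are related by
H(X) = H(Y) + E[\log X]
     = \tfrac{1}{2}\log\!\big(2\pi e\,\sigma^{2}\big) + \mu
```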
Examples of bivariate maximum entropy models and information moments.
| ME Model | Density | Information Moments |
|---|---|---|
Figure 2. Plots of original and log-transformed mortgage data.
Information moments of the log-transformed mortgage data and kernel PDF, and information divergence between the kernel and ME PDFs.
| | Information Moment | | Entropy | | KL Divergence | K Index | Coin |
|---|---|---|---|---|---|---|---|
| | Actual | Kernel | | | | | |
| Loan | | | 0.563 | 0.564 | 0.009 | 0.017 | 0.565 |
| Mean | 11.117 | 11.111 | | | | | |
| Variance | 0.180 | 0.189 | | | | | |
| Income | | | 0.594 | 0.609 | 0.016 | 0.031 | 0.588 |
| Mean | 10.394 | 10.389 | | | | | |
| Variance | 0.192 | 0.203 | | | | | |
| Bivariate | | | 0.925 | 0.866 | 0.072 | 0.134 | 0.683 |
| Covariance | 0.123 | 0.118 | | | | | |
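The definitions behind the K Index and Coin columns are not reproduced in this record. The tabulated values are, however, consistent with the mappings K = 1 - exp(-2 KL) and Coin = (1 + sqrt(K)) / 2. Treating both formulas as assumptions inferred from the numbers, and with hypothetical function names, a minimal Python sketch is:

```python
import math


def k_index(kl):
    """Map a KL divergence to [0, 1); assumed form 1 - exp(-2 * KL)."""
    return 1.0 - math.exp(-2.0 * kl)


def coin(index):
    """Map an index in [0, 1] to [0.5, 1]; assumed form (1 + sqrt(index)) / 2."""
    return 0.5 * (1.0 + math.sqrt(index))


# KL divergences as tabulated for the log-transformed mortgage data.
for name, kl in [("Loan", 0.009), ("Income", 0.016), ("Bivariate", 0.072)]:
    k = k_index(kl)
    print(f"{name}: K index = {k:.3f}, Coin = {coin(k):.3f}")
```

Because the KL inputs are themselves rounded to three decimals, the recomputed values agree with the table only to about two decimal places. A Coin value near 0.5 indicates that the kernel and ME densities are hard to tell apart.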
Information moments and Euclidean measures for the log-transformed mortgage data and disclosure data.
| | Information Moment | | Energy Stat | Euclidean Dist |
|---|---|---|---|---|
| | Actual | Disclosure | | |
| Loan | | | 0.134 | 0.027 |
| Mean | 11.117 | 11.115 | | |
| Variance | 0.180 | 0.188 | | |
| Income | | | 0.065 | 0.026 |
| Mean | 10.394 | 10.397 | | |
| Variance | 0.192 | 0.191 | | |
| Bivariate | | | 0.201 | <0.001 |
| Covariance | 0.123 | 0.119 | | |
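The Energy Stat column compares the actual and disclosure samples through pairwise Euclidean distances. The exact definition and scaling used for the table are not recoverable from this record, so the sketch below assumes the standard two-sample energy distance with the usual n·m/(n+m) scaling, applied to hypothetical simulated data. The function names are invented for this example.

```python
import math
import random


def mean_distance(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a, q in b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def energy_distance(x, y):
    """2 E|X - Y| - E|X - X'| - E|Y - Y'|, estimated from all sample pairs."""
    return 2.0 * mean_distance(x, y) - mean_distance(x, x) - mean_distance(y, y)


def energy_statistic(x, y):
    """Energy distance scaled by n*m/(n+m) (scaling assumed for this sketch)."""
    n, m = len(x), len(y)
    return n * m / (n + m) * energy_distance(x, y)


rng = random.Random(1)
# Hypothetical bivariate samples standing in for actual and disclosure data.
actual = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(150)]
disclosure = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(150)]
shifted = [(a + 2.0, b + 2.0) for a, b in disclosure]

print(energy_statistic(actual, disclosure))  # small: same distribution
print(energy_statistic(actual, shifted))     # much larger: location shift
```

Points are passed as tuples so the same functions cover the univariate case (1-tuples) and the bivariate case.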
Figure 3. Scatter plots and regression lines of the actual data, the information architecture disclosure data with unadjusted and adjusted moments, and the disclosure data created by adding 100% noise and adjusted moments.
Figure 4. Empirical CDFs of the mortgage and disclosure data and the ME CDF of the actual data.
Figure 5. Bivariate kernel and ME densities of log-loan and log-income.
Information measures of the ME models for the mortgage data and disclosure data.
| | Entropy | | KL Divergence | K Index | Coin |
|---|---|---|---|---|---|
| Loan | 0.563 | 0.583 | <0.001 | 0.001 | 0.514 |
| Income | 0.595 | 0.593 | <0.001 | <0.001 | 0.504 |
| Bivariate | 0.868 | 0.923 | 0.004 | 0.007 | 0.542 |
| Mutual info | 0.290 | 0.253 | | | |
| M index | 0.440 | 0.397 | | | |
| Coin index | 0.832 | 0.815 | | | |
Figure 6. Plots of original and log-transformed bank data.
Information moments of the log-transformed bank data and kernel PDF, and information divergence between the kernel and ME PDFs.
| | Information Moment | | Entropy | | KL Divergence | K Index | Coin |
|---|---|---|---|---|---|---|---|
| | Actual | Kernel | | | | | |
| Asset | | | 2.044 | 2.085 | 0.016 | 0.031 | 0.589 |
| Mean | 6.473 | 6.461 | | | | | |
| Score | | | 1.774 | 1.787 | 0.014 | 0.027 | 0.582 |
| Mean | 5.470 | 5.457 | | | | | |
| Bivariate | | | 3.625 | 3.766 | 0.283 | 0.432 | 0.828 |
| Log-sum-expo | 1.161 | 1.518 | | | | | |
Information moments and Euclidean measures for the log-transformed bank data and disclosure data.
| | Information Moment | | Energy Stat | Euclidean Dist |
|---|---|---|---|---|
| | Actual | Disclosure | | |
| Asset | | | 0.460 | 0.006 |
| Mean | 6.473 | 6.376 | | |
| Score | | | 0.655 | 0.008 |
| Mean | 5.470 | 5.481 | | |
| Bivariate | | | 2.529 | <0.001 |
| Log-sum-expo | 1.161 | 1.495 | | |
Figure 7. Scatter plots of the actual and disclosure data.
Figure 8. Empirical CDFs of the bank and disclosure data and the ME CDF of the actual data.
Figure 9. Bivariate kernel and ME densities of log-asset and log-score.
Information measures of the ME models for the bank data and disclosure data.
| | Entropy | | KL Divergence | K Index | Coin |
|---|---|---|---|---|---|
| Asset | 2.044 | 2.119 | 0.005 | 0.010 | 0.550 |
| Score | 1.774 | 1.903 | 0.009 | 0.018 | 0.568 |
| Bivariate | 3.625 | 3.829 | 0.002 | 0.005 | 0.535 |
| Mutual info | 0.193 | 0.193 | | | |
| M index | 0.320 | 0.320 | | | |
| Coin index | 0.783 | 0.783 | | | |
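The last three rows of the table can be recomputed from its entropy entries. Mutual information follows from the standard identity M = H(X) + H(Y) - H(X, Y). The M index and Coin index formulas below, 1 - exp(-2M) and (1 + sqrt(M index)) / 2, are assumptions inferred from the tabulated numbers, since their definitions are not reproduced in this record. Function names are hypothetical.

```python
import math


def mutual_information(h_x, h_y, h_xy):
    """M = H(X) + H(Y) - H(X, Y)."""
    return h_x + h_y - h_xy


def m_index(mi):
    """Assumed form 1 - exp(-2 * M), mapping M to [0, 1)."""
    return 1.0 - math.exp(-2.0 * mi)


def coin_index(index):
    """Assumed form (1 + sqrt(index)) / 2, mapping [0, 1] to [0.5, 1]."""
    return 0.5 * (1.0 + math.sqrt(index))


# First entropy column of the bank table: Asset, Score, Bivariate.
mi = mutual_information(2.044, 1.774, 3.625)
# Second entropy column of the bank table.
mi_second = mutual_information(2.119, 1.903, 3.829)

print(f"Mutual info: {mi:.3f} and {mi_second:.3f}")
print(f"M index:     {m_index(mi):.3f}")
print(f"Coin index:  {coin_index(m_index(mi)):.3f}")
```

Both entropy columns give the same mutual information to three decimals, matching the equal entries in the Mutual info, M index, and Coin index rows.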