Abstract
BACKGROUND: Various methods based on k-anonymity have been proposed for publishing medical data while preserving privacy. However, the k-anonymity property assumes that adversaries possess fixed background knowledge. Differential privacy overcomes this limitation, but it is designed for aggregated results, which makes it difficult to obtain high-quality microdata. To address this issue, we propose a differentially private medical microdata release method featuring high utility.
Keywords: Data anonymization; Data release; Differential privacy; Medical privacy; Privacy-preserving data publishing
Year: 2020 PMID: 32641043 PMCID: PMC7346516 DOI: 10.1186/s12911-020-01171-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Original table
| Age | Gender | Disease |
|---|---|---|
| 10 | M | Anemia |
| 14 | F | Gastritis |
| 19 | F | Pneumonia |
| 12 | F | Anemia |
| 15 | M | Pneumonia |
Contingency table created using Table 1
| Age | Gender | Disease | Count |
|---|---|---|---|
| 10 | M | Anemia | 1 |
| 12 | F | Anemia | 1 |
| 14 | F | Gastritis | 1 |
| 15 | M | Pneumonia | 1 |
| 19 | F | Pneumonia | 1 |
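The step from the original table to its contingency table can be sketched as a simple count of identical attribute combinations. This is a minimal illustration, not the authors' code; the variable names are assumptions.

```python
from collections import Counter

# Rows of the original table: (age, gender, disease).
records = [
    (10, "M", "Anemia"),
    (14, "F", "Gastritis"),
    (19, "F", "Pneumonia"),
    (12, "F", "Anemia"),
    (15, "M", "Pneumonia"),
]

# Count how often each distinct (age, gender, disease) combination occurs.
contingency = Counter(records)

# Print the cells sorted by age, mirroring the layout of the contingency table.
for (age, gender, disease), count in sorted(contingency.items()):
    print(age, gender, disease, count)
```

Every combination is unique in this small example, so each cell has a count of 1.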
Noisy version of contingency table
| Age | Gender | Disease | Noisy count |
|---|---|---|---|
| 10 | M | Anemia | 2 |
| 10 | M | Gastritis | 0 |
| 10 | M | Pneumonia | 1 |
| 10 | F | Anemia | 0 |
| ... | ... | ... | ... |
| 19 | F | Gastritis | 1 |
| 19 | F | Pneumonia | 1 |
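A noisy table of this kind is typically produced with the Laplace mechanism applied to every cell of the attribute domain, including cells whose true count is zero. The sketch below is one hedged reading of that idea, not the authors' implementation. It assumes an age domain of 10 to 19, a sensitivity of 1, and rounding with clamping at zero (suggested by the non-negative integer counts shown); `laplace_noise` and `noisy_contingency` are hypothetical helper names.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_contingency(records, domain, epsilon, rng):
    """Add Laplace noise to the count of every cell in the full domain."""
    counts = Counter(records)
    scale = 1.0 / epsilon  # assumption: each record affects one cell by 1
    noisy = {}
    for cell in itertools.product(*domain):
        value = counts.get(cell, 0) + laplace_noise(scale, rng)
        # Assumption: round and clamp so the released counts are
        # non-negative integers.
        noisy[cell] = max(0, round(value))
    return noisy


records = [
    (10, "M", "Anemia"),
    (14, "F", "Gastritis"),
    (19, "F", "Pneumonia"),
    (12, "F", "Anemia"),
    (15, "M", "Pneumonia"),
]
# Assumed domain: ages 10..19, two genders, three diseases (60 cells).
domain = (range(10, 20), ("M", "F"), ("Anemia", "Gastritis", "Pneumonia"))

noisy = noisy_contingency(records, domain, epsilon=1.0, rng=random.Random(0))
print(len(noisy), "cells, total noisy count:", sum(noisy.values()))
```

Because noise is added to all 60 cells while only 5 hold real records, the accumulated noise can easily outweigh the signal, which is the utility problem the abstract refers to.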
Generalized noisy version of contingency table
| Age | Gender | Disease | Noisy count |
|---|---|---|---|
| [10−19] | * | Anemia | 3 |
| [10−19] | * | Gastritis | 1 |
| [10−19] | * | Pneumonia | 1 |
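Generalizing before counting shrinks the number of cells that need noise. The sketch below computes the true generalized counts for the five-record example, under the assumption that Age is generalized to a ten-year range and Gender to `*`. The noisy counts in the table above would then differ from these by the added noise. The helper name `generalize` is hypothetical, and the range labels use an ASCII hyphen.

```python
from collections import Counter

records = [
    (10, "M", "Anemia"),
    (14, "F", "Gastritis"),
    (19, "F", "Pneumonia"),
    (12, "F", "Anemia"),
    (15, "M", "Pneumonia"),
]


def generalize(record, width=10):
    """Map a record to its generalized cell: age range, '*', disease."""
    age, _gender, disease = record
    low = (age // width) * width
    return (f"[{low}-{low + width - 1}]", "*", disease)


# True counts per generalized cell; noise would be added to these
# three cells instead of to every fine-grained cell.
generalized_counts = Counter(generalize(r) for r in records)

for cell, count in sorted(generalized_counts.items()):
    print(*cell, count)
```

With only three cells, the same noise scale perturbs far fewer counts than in the fine-grained table.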
Fig. 1. Taxonomy tree of the Age attribute
Fig. 2. Taxonomy tree of the Sex attribute
Fig. 3. Taxonomy tree of the Zip attribute
Fig. 4. Generalization hierarchical lattice
Fig. 5. Process of IPA
Original table
| Age | Gender | Zipcode | Disease |
|---|---|---|---|
| 17 | M | 28912 | Gastritis |
| 16 | M | 23512 | Pneumonia |
| 13 | M | 24231 | Pneumonia |
| 24 | F | 31891 | Anemia |
| 29 | F | 34225 | Anemia |
| 25 | F | 37756 | Diabetes |
| 67 | M | 80061 | Stroke |
Generalized table
| Age | Gender | Zipcode | Disease |
|---|---|---|---|
| [10−19] | M | [20000−29999] | Gastritis |
| [10−19] | M | [20000−29999] | Pneumonia |
| [10−19] | M | [20000−29999] | Pneumonia |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Diabetes |
| [60−69] | M | [80000−89999] | Stroke |
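The mapping from the original table to the generalized table can be sketched as follows, assuming Age is generalized to ten-year ranges and Zipcode to ranges of width 10000, as the table values suggest. The function names are hypothetical, and the range labels use an ASCII hyphen.

```python
def age_range(age, width=10):
    """Generalize an age to the ten-year range that contains it."""
    low = (age // width) * width
    return f"[{low}-{low + width - 1}]"


def zip_range(zipcode, width=10000):
    """Generalize a five-digit zipcode to a range of the given width."""
    low = (int(zipcode) // width) * width
    return f"[{low:05d}-{low + width - 1:05d}]"


# Rows of the original table: (age, gender, zipcode, disease).
original = [
    (17, "M", "28912", "Gastritis"),
    (16, "M", "23512", "Pneumonia"),
    (13, "M", "24231", "Pneumonia"),
    (24, "F", "31891", "Anemia"),
    (29, "F", "34225", "Anemia"),
    (25, "F", "37756", "Diabetes"),
    (67, "M", "80061", "Stroke"),
]

# Gender and Disease are left unchanged at this generalization level.
generalized = [
    (age_range(age), gender, zip_range(zipcode), disease)
    for age, gender, zipcode, disease in original
]

for row in generalized:
    print(*row)
```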
Suppressed table
| Age | Gender | Zipcode | Disease |
|---|---|---|---|
| [10−19] | M | [20000−29999] | Gastritis |
| [10−19] | M | [20000−29999] | Pneumonia |
| [10−19] | M | [20000−29999] | Pneumonia |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Diabetes |
| * | * | * | Stroke |
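The suppression step can be illustrated by replacing the quasi-identifiers of any record whose equivalence class is too small. The threshold is not given in this excerpt, so `k=2` below is an assumption chosen only because it reproduces the table above; `suppress_small_classes` is a hypothetical name.

```python
from collections import Counter


def suppress_small_classes(rows, k):
    """Replace quasi-identifiers with '*' for classes smaller than k."""
    # An equivalence class is the set of rows sharing the first three
    # columns (age range, gender, zipcode range).
    sizes = Counter(row[:3] for row in rows)
    result = []
    for row in rows:
        if sizes[row[:3]] < k:
            result.append(("*", "*", "*", row[3]))
        else:
            result.append(row)
    return result


generalized = [
    ("[10-19]", "M", "[20000-29999]", "Gastritis"),
    ("[10-19]", "M", "[20000-29999]", "Pneumonia"),
    ("[10-19]", "M", "[20000-29999]", "Pneumonia"),
    ("[20-29]", "F", "[30000-39999]", "Anemia"),
    ("[20-29]", "F", "[30000-39999]", "Anemia"),
    ("[20-29]", "F", "[30000-39999]", "Diabetes"),
    ("[60-69]", "M", "[80000-89999]", "Stroke"),
]

suppressed = suppress_small_classes(generalized, k=2)  # k is an assumption
for row in suppressed:
    print(*row)
```

Only the single-record Stroke class falls below the threshold, so it is the only row whose quasi-identifiers are replaced.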
Inserted table
| Age | Gender | Zipcode | Disease |
|---|---|---|---|
| [10−19] | M | [20000−29999] | Gastritis |
| [10−19] | M | [20000−29999] | Pneumonia |
| [10−19] | M | [20000−29999] | Pneumonia |
| [10−19] | M | [20000−29999] | Gastritis |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Anemia |
| [20−29] | F | [30000−39999] | Diabetes |
| [20−29] | F | [30000−39999] | Anemia |
| * | * | * | Stroke |
Fig. 6. Comparison of the proposed and previous methods in terms of information loss
Fig. 7. Information loss with varying ε
Fig. 8. Information loss of candidate nodes
Fig. 9. Results of the analysis queries
Fig. 10. Results of the analysis queries
Result of query Q1
| Age group | Original | The proposed method | k-anonymization |
|---|---|---|---|
| 5 | 1.0 | 1 | 6.3 |
| 10 | 2.0 | 1 | 6.3 |
| 15 | 1.0 | 4 | 6.3 |
| 20 | 9.0 | 4 | 6.3 |
| 25 | 12.0 | 12 | 6.3 |
| 30 | 13.0 | 12 | 6.3 |
| 35 | 27.0 | 34.5 | 186.2 |
| 40 | 44.0 | 34.5 | 186.2 |
| 45 | 104.0 | 150.5 | 186.2 |
| 50 | 205.0 | 150.5 | 186.2 |
| 55 | 341.0 | 367.0 | 186.2 |
| 60 | 396.0 | 367.0 | 186.2 |
| 65 | 472.0 | 452.5 | 329.8 |
| 70 | 442.0 | 452.5 | 329.8 |
| 75 | 430.0 | 386.5 | 329.8 |
| 80 | 360.0 | 386.5 | 329.8 |
| 85 | 197.0 | 141.5 | 329.8 |
| 90 | 78.0 | 141.5 | 329.8 |
Result of query Q2
| Age group | Original | The proposed method | k-anonymization |
|---|---|---|---|
| 5 | 0.0 | 1.0 | 4.5 |
| 10 | 2.0 | 1.0 | 4.5 |
| 15 | 7.0 | 8.0 | 4.5 |
| 20 | 8.0 | 8.0 | 4.5 |
| 25 | 4.0 | 4.5 | 4.5 |
| 30 | 6.0 | 4.5 | 4.5 |
| 35 | 10.0 | 17.5 | 101.7 |
| 40 | 26.0 | 17.5 | 101.7 |
| 45 | 63.0 | 79.5 | 101.7 |
| 50 | 102.0 | 79.5 | 101.7 |
| 55 | 178.0 | 201.0 | 101.7 |
| 60 | 231.0 | 201.0 | 101.7 |
| 65 | 279.0 | 298.0 | 379.2 |
| 70 | 326.0 | 298.0 | 379.2 |
| 75 | 457.0 | 489.5 | 379.2 |
| 80 | 525.0 | 489.5 | 379.2 |
| 85 | 445.0 | 363.5 | 379.2 |
| 90 | 243.0 | 363.5 | 379.2 |
Result of query Q3
| Age group | Original | The proposed method | k-anonymization |
|---|---|---|---|
| 10 | 1.0 | 2.2 | 8.9 |
| 20 | 1.4 | 1.4 | 8.9 |
| 30 | 2.1 | 5.7 | 8.9 |
| 40 | 2.7 | 5.5 | 14.5 |
| 50 | 3.4 | 4.2 | 14.5 |
| 60 | 4.0 | 4.1 | 14.5 |
| 70 | 3.6 | 4.8 | 15.7 |
| 80 | 4.3 | 4.5 | 15.7 |
| 90 | 5.8 | 6.5 | 15.7 |
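One way to summarize a table like this is the mean absolute error of each released result against the original. The metric is chosen here for illustration and is not necessarily the one the authors report; the values are copied from the Q3 table above.

```python
def mean_absolute_error(reference, released):
    """Average absolute difference between paired values."""
    return sum(abs(r - x) for r, x in zip(reference, released)) / len(reference)


# Columns of the Q3 result table, in age-group order 10, 20, ..., 90.
original = [1.0, 1.4, 2.1, 2.7, 3.4, 4.0, 3.6, 4.3, 5.8]
proposed = [2.2, 1.4, 5.7, 5.5, 4.2, 4.1, 4.8, 4.5, 6.5]
k_anonymization = [8.9, 8.9, 8.9, 14.5, 14.5, 14.5, 15.7, 15.7, 15.7]

mae_proposed = mean_absolute_error(original, proposed)
mae_k_anon = mean_absolute_error(original, k_anonymization)

print(f"proposed method: {mae_proposed:.2f}")
print(f"k-anonymization: {mae_k_anon:.2f}")
```

For Q3 this gives roughly 1.18 for the proposed method and 9.89 for k-anonymization.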
Result of query Q4
| Age group | Original | The proposed method | k-anonymization |
|---|---|---|---|
| 10 | 3.0 | 1.0 | 15.0 |
| 20 | 3.0 | 1.4 | 15.0 |
| 30 | 7.7 | 2.8 | 15.0 |
| 40 | 1.7 | 2.4 | 11.7 |
| 50 | 2.7 | 3.6 | 11.7 |
| 60 | 3.5 | 3.2 | 11.7 |
| 70 | 3.5 | 4.7 | 21.1 |
| 80 | 5.6 | 6.1 | 21.1 |
| 90 | 8.0 | 8.5 | 21.1 |