Hyukki Lee, Soohyung Kim, Jong Wook Kim, Yon Dohn Chung.
Abstract
BACKGROUND: Publishing raw electronic health records (EHRs) may be considered a breach of individuals' privacy because the records usually contain sensitive information. A common practice in privacy-preserving data publishing is to anonymize the data before publishing so that it satisfies a privacy model such as k-anonymity. Among the various anonymization techniques, generalization is the most commonly used in medical/health data processing. Generalization inevitably causes information loss, and various methods have been proposed to reduce that loss. However, existing generalization-based data anonymization methods cannot avoid excessive information loss or preserve data utility.
Keywords: Data anonymization; K-anonymity; Medical privacy; Utility-preserving data publishing
Year: 2017 PMID: 28693480 PMCID: PMC5504813 DOI: 10.1186/s12911-017-0499-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Table 1. Original EHR data
| Name | Age | Sex | Zipcode | Disease |
|---|---|---|---|---|
| Mary | 37 | F | 22071 | Pneumonia |
| Alice | 35 | F | 22098 | Diabetes |
| Betsy | 36 | F | 23061 | Anemia |
| David | 61 | M | 55107 | Pneumonia |
| Tom | 63 | M | 55099 | Diabetes |
| James | 66 | M | 55324 | Diabetes |
| Eric | 62 | M | 55229 | Pneumonia |
Table 2. 4-anonymous version of Table 1
| Age | Sex | Zipcode | Disease |
|---|---|---|---|
| [35−66] | * | [22071−55324] | Pneumonia |
| [35−66] | * | [22071−55324] | Diabetes |
| [35−66] | * | [22071−55324] | Anemia |
| [35−66] | * | [22071−55324] | Pneumonia |
| [35−66] | * | [22071−55324] | Diabetes |
| [35−66] | * | [22071−55324] | Diabetes |
| [35−66] | * | [22071−55324] | Pneumonia |
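The property the table above is meant to satisfy can be checked mechanically. The following is a minimal sketch, assuming that Age, Sex, and Zipcode are the quasi-identifiers and that Disease is the sensitive attribute; the function name `is_k_anonymous` and its `qi_count` parameter are illustrative and do not come from the paper.

```python
from collections import Counter

# Rows of the generalized table above: (Age, Sex, Zipcode, Disease).
# Assumption: the first three columns are the quasi-identifiers.
rows = [
    ("[35-66]", "*", "[22071-55324]", "Pneumonia"),
    ("[35-66]", "*", "[22071-55324]", "Diabetes"),
    ("[35-66]", "*", "[22071-55324]", "Anemia"),
    ("[35-66]", "*", "[22071-55324]", "Pneumonia"),
    ("[35-66]", "*", "[22071-55324]", "Diabetes"),
    ("[35-66]", "*", "[22071-55324]", "Diabetes"),
    ("[35-66]", "*", "[22071-55324]", "Pneumonia"),
]


def is_k_anonymous(records, k, qi_count=3):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by their quasi-identifier values only; the sensitive
    # attribute is deliberately left out of the grouping key.
    class_sizes = Counter(record[:qi_count] for record in records)
    return all(size >= k for size in class_sizes.values())


print(is_k_anonymous(rows, 4))  # all seven rows share one equivalence class
```

Because all seven rows fall into a single equivalence class, the check passes for any k up to 7, which also shows how much detail was given up to reach k = 4.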
Fig. 1 Taxonomy tree of Age attribute
Table 3. 4-anonymous version of Table 1 with suppression
| Age | Sex | Zipcode | Disease |
|---|---|---|---|
| * | * | * | * |
| * | * | * | * |
| * | * | * | * |
| [61−66] | M | [55099−55324] | Pneumonia |
| [61−66] | M | [55099−55324] | Diabetes |
| [61−66] | M | [55099−55324] | Diabetes |
| [61−66] | M | [55099−55324] | Pneumonia |
Table 4. 4-anonymous version of Table 1 with relocation
| Age | Sex | Zipcode | Disease |
|---|---|---|---|
| [61−66] | M | [55099−55324] | Pneumonia |
| [61−66] | M | [55099−55324] | Diabetes |
| [61−66] | M | [55099−55324] | Anemia |
| [61−66] | M | [55099−55324] | Pneumonia |
| [61−66] | M | [55099−55324] | Diabetes |
| [61−66] | M | [55099−55324] | Diabetes |
| [61−66] | M | [55099−55324] | Pneumonia |
Table 5. 0.02-ceiled and 4-anonymous version of Table 1 with insertion
| ClassID | Age | Sex | Zipcode | Disease |
|---|---|---|---|---|
| 1 | [35-37] | F | [22071-23061] | Pneumonia |
| 1 | [35-37] | F | [22071-23061] | Diabetes |
| 1 | [35-37] | F | [22071-23061] | Anemia |
| 1 | [35-37] | F | [22071-23061] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
| 2 | [61-66] | M | [55099-55324] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
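In the table above, class 1 reaches four records because one record was added to the three real ones, rather than because its ranges were widened. The sketch below illustrates that padding step under stated assumptions: the function `pad_classes` and its `fake_value` parameter are hypothetical, and the rule for choosing the disease of a counterfeit record is not given in this section, so the caller supplies it.

```python
from collections import defaultdict

# Real records only: (class_id, quasi-identifier tuple, disease).
real_records = [
    (1, ("[35-37]", "F", "[22071-23061]"), "Pneumonia"),
    (1, ("[35-37]", "F", "[22071-23061]"), "Diabetes"),
    (1, ("[35-37]", "F", "[22071-23061]"), "Anemia"),
    (2, ("[61-66]", "M", "[55099-55324]"), "Pneumonia"),
    (2, ("[61-66]", "M", "[55099-55324]"), "Diabetes"),
    (2, ("[61-66]", "M", "[55099-55324]"), "Diabetes"),
    (2, ("[61-66]", "M", "[55099-55324]"), "Pneumonia"),
]


def pad_classes(records, k, fake_value):
    """Pad every equivalence class to at least k records with counterfeit rows.

    Returns the padded record list and a dict mapping class_id to the
    number of counterfeit rows added to that class.
    """
    by_class = defaultdict(list)
    for record in records:
        by_class[record[0]].append(record)

    padded = list(records)
    counterfeit = {}
    for class_id, members in by_class.items():
        missing = k - len(members)
        if missing > 0:
            # Counterfeit rows reuse the class's generalized quasi-identifiers.
            qi = members[0][1]
            padded.extend((class_id, qi, fake_value) for _ in range(missing))
            counterfeit[class_id] = missing
    return padded, counterfeit


# Assumption: "Diabetes" is chosen to match the counterfeit record in the example.
padded, counterfeit = pad_classes(real_records, k=4, fake_value="Diabetes")
print(len(padded), counterfeit)
```

The returned `counterfeit` dictionary is only bookkeeping for this sketch; how such counts are published safely is a separate question from how the rows are added.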
Table 6. 0.02-ceiled version of Table 1
| ClassID | Age | Sex | Zipcode | Disease |
|---|---|---|---|---|
| 1 | [35-37] | F | [22071-23061] | Pneumonia |
| 1 | [35-37] | F | [22071-23061] | Diabetes |
| 1 | [35-37] | F | [22071-23061] | Anemia |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
| 2 | [61-66] | M | [55099-55324] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
Table 7. Privacy-breached catalog of counterfeit records for Table 5
| IDList | Disease | Count |
|---|---|---|
| 1 | Diabetes | 1 |
Table 8. Catalog of counterfeit records for Table 5
| IDList | Disease | Count |
|---|---|---|
| 1, 2 | Diabetes | 1 |
Table 9. Truthful version of Table 5
| ClassID | Age | Sex | Zipcode | Disease |
|---|---|---|---|---|
| 1 | [35-37] | F | [22071-23061] | Pneumonia |
| 1 | [35-37] | F | [22071-23061] | Diabetes |
| 1 | [35-37] | F | [22071-23061] | Anemia |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
| 2 | [61-66] | M | [55099-55324] | Diabetes |
| 2 | [61-66] | M | [55099-55324] | Pneumonia |
| * | * | * | * | Diabetes |
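One way to read the truthful version above is as what a recipient can still be sure of after combining the 4-anonymous table with insertion and its catalog entry (classes 1 and 2, Diabetes, count 1). The sketch below is an interpretation inferred from the example tables, not the authors' stated procedure; the function `truthful_version` and its data layout are hypothetical. For each catalog entry, a class keeps only the records that must be real even if every counterfeit record sat in that class, and the remaining real records are emitted with a suppressed class.

```python
from collections import Counter

# (class_id, disease) pairs from the 4-anonymous table with insertion.
published = [
    (1, "Pneumonia"), (1, "Diabetes"), (1, "Anemia"), (1, "Diabetes"),
    (2, "Pneumonia"), (2, "Diabetes"), (2, "Diabetes"), (2, "Pneumonia"),
]
# Catalog entries: (list of class IDs, disease, number of counterfeit records).
catalog = [((1, 2), "Diabetes", 1)]


def truthful_version(records, catalog_entries):
    """Return a Counter of (class_id or '*', disease) with counterfeits removed."""
    counts = Counter(records)
    result = Counter()
    handled = set()
    for id_list, disease, fake in catalog_entries:
        total = sum(counts[(c, disease)] for c in id_list)
        certain_total = 0
        for c in id_list:
            # Lower bound on real records in class c: all fakes could be here.
            certain = max(0, counts[(c, disease)] - fake)
            if certain:
                result[(c, disease)] += certain
            certain_total += certain
            handled.add((c, disease))
        # Real records whose class cannot be determined get a suppressed class.
        uncertain = (total - fake) - certain_total
        if uncertain:
            result[("*", disease)] += uncertain
    # Records not mentioned in the catalog are known to be real.
    for key, n in counts.items():
        if key not in handled:
            result[key] += n
    return result


truthful = truthful_version(published, catalog)
print(sum(truthful.values()))  # number of records in the truthful version
```

On the example data this yields three records in class 1, three in class 2, and one Diabetes record with a suppressed class, the same multiset of rows as the table above.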
Fig. 2 Probability density function of original record for Age attribute
Fig. 3 Probability density function of generalized record for Age attribute
Fig. 4 Probability density function of generalized record for Age attribute with catalog
Fig. 5 LM variation with k (lower bars indicate better results)
Fig. 6 LM variation with h
Fig. 7 RCE variation with k (lower bars indicate better results)
Fig. 8 RCE variation with h
Fig. 9 Query error rate
Fig. 10 Query error rate variation with number of attributes
Fig. 11 Results of the analysis queries: (a) Query Q1, (b) Query Q2, (c) Query Q3, (d) Query Q4