Abdul Majeed, Farman Ullah, Sungchang Lee.
Abstract
Personally identifiable information (PII) affects individual privacy because combinations of PII may uniquely identify individuals in published data. User PII such as age, race, gender, and zip code contains private information that may help an adversary determine the user to whom that information relates. Each type of PII reveals identity to a different degree, and some types are highly identity-vulnerable: they enable unique identification more easily, so their presence in published data increases privacy risks. Existing privacy models treat all types of PII equally from an identity-revelation point of view and mainly focus on hiding each user's PII in a crowd of other users. Ignoring the identity vulnerability of each type of PII during anonymization does not protect user privacy in a fine-grained manner. This paper proposes a new anonymization scheme that takes the identity vulnerability of PII into account to protect user privacy effectively. Data are generalized adaptively, based on both the identity vulnerability of the PII and diversity. This adaptive generalization produces anonymized data that protect against both identity disclosure and private-information disclosure while maximizing the utility of the data for analyses and for building classification models. Additionally, the proposed scheme has low computational overhead. Simulation results demonstrate the effectiveness of the scheme and verify these claims.
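The abstract contrasts crowd-hiding models such as k-anonymity with a scheme that also accounts for diversity. As a point of reference, the sketch below shows the standard checks such schemes build on: k-anonymity (every quasi-identifier combination is shared by at least k records) and distinct l-diversity (every such group holds at least l distinct sensitive values). It is a minimal sketch, not the paper's adaptive algorithm; the function names and the toy records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, qi_names):
    """Group records by their (generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[name] for name in qi_names)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, qi_names, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, qi_names).values())


def is_distinct_l_diverse(records, qi_names, sensitive, l):
    """True if every equivalence class contains at least l distinct sensitive values."""
    return all(
        len({rec[sensitive] for rec in g}) >= l
        for g in equivalence_classes(records, qi_names).values()
    )


# Hypothetical, already-generalized toy records; salary is the sensitive attribute.
table = [
    {"age": "30-39", "gender": "*", "salary": "<=50K"},
    {"age": "30-39", "gender": "*", "salary": ">50K"},
    {"age": "40-49", "gender": "Male", "salary": "<=50K"},
    {"age": "40-49", "gender": "Male", "salary": "<=50K"},
]
qis = ["age", "gender"]
print(is_k_anonymous(table, qis, k=2))                   # True
print(is_distinct_l_diverse(table, qis, "salary", l=2))  # False
```

Here both groups contain two records, so the table is 2-anonymous, but the ("40-49", "Male") group has a single salary value, so an adversary who places a user in that group learns the sensitive value. This is the kind of disclosure that diversity-aware generalization is meant to prevent.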
Keywords: adaptive generalization; diversity; identity vulnerability; personally identifiable information; privacy; utility
Year: 2017 PMID: 28481298 PMCID: PMC5469664 DOI: 10.3390/s17051059
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. k-anonymity privacy model overview.
Figure 2. Conceptual overview of the proposed anonymization scheme.
Figure 3. Steps for determining the identity vulnerability of quasi-identifiers (QIs).
Figure 4. Example of domain and value generalization taxonomies of the race attribute.
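A value generalization taxonomy such as the one in Figure 4 maps each original value to progressively coarser labels. The sketch below applies a three-level taxonomy to the race attribute. The grouping (original value, then White / Non-White, then the root `*`) is a hypothetical hierarchy chosen for illustration, not necessarily the one in the figure, and `generalize_race` and `leaf_coverage` are hypothetical helper names.

```python
# Hypothetical value generalization hierarchy for race.
# Each path lists the label at level 0 (original), level 1, and level 2 (root).
RACE_PATHS = {
    "White": ["White", "White", "*"],
    "Black": ["Black", "Non-White", "*"],
    "Asian-Pac-Islander": ["Asian-Pac-Islander", "Non-White", "*"],
    "Amer-Indian-Eskimo": ["Amer-Indian-Eskimo", "Non-White", "*"],
    "Other": ["Other", "Non-White", "*"],
}
MAX_LEVEL = 2


def generalize_race(value, level):
    """Return the label of `value` after `level` generalization steps."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"level must be between 0 and {MAX_LEVEL}")
    return RACE_PATHS[value][level]


def leaf_coverage(label, level):
    """Count how many original race values map to `label` at `level`."""
    return sum(1 for path in RACE_PATHS.values() if path[level] == label)


print([generalize_race(v, 1) for v in ["White", "Black", "Other"]])
# ['White', 'Non-White', 'Non-White']
print(leaf_coverage("Non-White", 1))  # 4
print(leaf_coverage("*", 2))          # 5
```

Storing a full path per leaf, rather than a child-to-parent map, avoids ambiguity when a label such as "White" appears at more than one level. The Adult dataset table lists a taxonomy height of 3 for race; whether that counts levels or edges is not stated, so this toy version simply uses three levels.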
Figure 5. Flow chart of the adaptive generalization algorithm.
Description of the Adult dataset.
| Attribute | Type | Quasi-identifier | Description |
|---|---|---|---|
| Age | Numerical | Yes | Taxonomy tree of height 7 |
| Race | Categorical | Yes | Taxonomy tree of height 3 |
| Gender | Categorical | Yes | Taxonomy tree of height 2 |
| Country | Categorical | Yes | Taxonomy tree of height 4 |
| Salary | Categorical | No | Sensitive attribute |
Figure 6. Comparison of privacy between the information-based anonymization for classification given k (IACk) algorithm and the proposed scheme.
Adult dataset: identity vulnerability values of QIs.
| No. | Quasi-identifier | Actual value | Relative value |
|---|---|---|---|
| 1 | Age | 0.03 | 81.72 |
| 2 | Gender | 0.01 | 16.92 |
| 3 | Race | 0.00047 | 0.83 |
| 4 | Country | 0.00011 | 0.53 |
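The Relative Values column in this table sums to 100, so it reads as a percentage share of identity vulnerability across the QIs (Age > Gender > Race > Country). How the paper computes the Actual Values is not described in this excerpt. The sketch below therefore uses a stand-in proxy (the fraction of records whose value of a single QI is unique in the dataset), rescales the raw scores to shares that sum to 100, and ranks the QIs. The proxy, the helper names, and the toy records are all assumptions for illustration, not the paper's measure, and the toy ranking does not reflect the Adult results.

```python
from collections import Counter


def uniqueness_score(records, qi):
    """Hypothetical proxy: fraction of records whose value of `qi` occurs exactly once."""
    counts = Counter(rec[qi] for rec in records)
    return sum(1 for rec in records if counts[rec[qi]] == 1) / len(records)


def relative_shares(scores):
    """Rescale raw scores so they sum to 100 (percentage share per QI)."""
    total = sum(scores.values())
    if total == 0:
        return {qi: 0.0 for qi in scores}
    return {qi: 100 * s / total for qi, s in scores.items()}


# Hypothetical toy records, not drawn from the Adult dataset.
toy = [
    {"age": 23, "gender": "Male", "race": "White"},
    {"age": 35, "gender": "Female", "race": "White"},
    {"age": 35, "gender": "Male", "race": "Black"},
    {"age": 47, "gender": "Female", "race": "White"},
    {"age": 52, "gender": "Male", "race": "White"},
]
raw = {qi: uniqueness_score(toy, qi) for qi in ("age", "gender", "race")}
shares = relative_shares(raw)
ranking = sorted(shares, key=shares.get, reverse=True)
print(raw)      # {'age': 0.6, 'gender': 0.0, 'race': 0.2}
print(ranking)  # ['age', 'race', 'gender']
```

A QI with a higher share singles out more individuals on its own, which is the intuition behind generalizing such attributes differently from weakly identifying ones.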
Comparison of distortion measures.
| Values of k | Existing Schemes | Proposed Scheme |
|---|---|---|
| 5 | 0.024 | 0.01 |
| 10 | 0.025 | 0.011 |
| 50 | 0.033 | 0.019 |
| 100 | 0.033 | 0.020 |
| 150 | 0.035 | 0.022 |
| 200 | 0.037 | 0.025 |
| Mean | 0.03 | 0.01 |
| Standard Deviation | 0.005 | 0.006 |
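The distortion measure used in this comparison is not defined in this excerpt. A widely used height-based variant divides each QI's generalization level by the height of its taxonomy tree and averages over QIs and records, giving 0 for unmodified data and 1 for fully generalized data. The sketch below reuses the taxonomy heights from the Adult dataset table. Treating each height as the maximum number of generalization steps is an assumption, the helper names are hypothetical, and the paper's exact formula may differ.

```python
# Taxonomy heights from the Adult dataset table.
# Assumption: height = maximum number of generalization steps for that QI.
TREE_HEIGHT = {"age": 7, "race": 3, "gender": 2, "country": 4}


def record_distortion(levels):
    """Mean of level/height over one record's QIs (0 = original, 1 = fully generalized)."""
    total = 0.0
    for qi, height in TREE_HEIGHT.items():
        level = levels[qi]
        if not 0 <= level <= height:
            raise ValueError(f"{qi}: level {level} outside 0..{height}")
        total += level / height
    return total / len(TREE_HEIGHT)


def table_distortion(rows):
    """Average record distortion over all records of an anonymized table."""
    return sum(record_distortion(r) for r in rows) / len(rows)


original = {"age": 0, "race": 0, "gender": 0, "country": 0}
partly = {"age": 1, "race": 0, "gender": 2, "country": 2}
print(record_distortion(partly))
print(table_distortion([original, partly]))
```

Normalizing by tree height keeps attributes with deep taxonomies (such as age) from dominating the measure simply because they offer more generalization steps.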
Comparison of the coverage of generalized values.
| Values of k | Existing Schemes | Proposed Scheme |
|---|---|---|
| 5 | 8.3 | 2.5 |
| 10 | 12.3 | 3.7 |
| 50 | 16.3 | 4.9 |
| 100 | 33 | 9.9 |
| 150 | 68 | 52.5 |
| 200 | 160 | 60.5 |
| Mean | 49.6 | 22.3 |
| Standard Deviation | 58.32 | 26.70 |
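The summary rows of the coverage table can be rechecked from its six data rows. The short sketch below uses Python's standard `statistics` module; the recomputed means agree with the tabulated 49.6 and 22.3 to within rounding, and the standard deviations match the tabulated values when the sample (n − 1) form, `statistics.stdev`, is used rather than the population form, `statistics.pstdev`.

```python
from statistics import mean, stdev

# Data rows of the coverage table (k = 5, 10, 50, 100, 150, 200).
existing = [8.3, 12.3, 16.3, 33, 68, 160]
proposed = [2.5, 3.7, 4.9, 9.9, 52.5, 60.5]

for name, column in (("Existing", existing), ("Proposed", proposed)):
    # stdev() uses the sample (n - 1) denominator.
    print(f"{name}: mean = {mean(column):.2f}, sample SD = {stdev(column):.2f}")
```

The same two calls can be applied to the distortion table to confirm its summary rows.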
Figure 7. Accuracies: proposed scheme versus IACk and Mondrian algorithms.
Figure 8. Accuracies: proposed scheme versus IACk and Mondrian algorithms.
Figure 9. Accuracies: proposed scheme versus Mondrian algorithm.
Running time of the proposed scheme versus Mondrian.
| Values of k | Proposed Algorithm (s) | Mondrian Algorithm (s) |
|---|---|---|
| 10 | 15 | 20 |
| 20 | 15 | 20 |
| 100 | 16 | 19 |
| 150 | 18 | 18 |
| 200 | 18 | 19 |