Abstract
In the era of Big Data, it is almost impossible to completely restrict access to primary non-aggregated statistical data. However, the risk of violating the privacy of individual respondents and groups of respondents by analyzing primary data has not been reduced. Subtler methods of data protection need to be developed to meet these challenges. In some cases, individual and group privacy can be easily violated because the primary data contain attributes that uniquely identify individuals and groups thereof. Removing such attributes from the dataset is a crude solution and does not guarantee complete privacy. In the field of individual data anonymity, this problem has been widely recognized, and various methods have been proposed to solve it. In the current work, we demonstrate that it is possible to violate group anonymity as well, even if the attributes that uniquely identify the group are removed. As it turns out, it is possible to use third-party data to build a fuzzy model of a group. Typically, such a model takes the form of a set of fuzzy rules, which can be used to determine membership grades of respondents in the group with a level of certainty sufficient to violate group anonymity. In this work, we introduce a method based on evolutionary computing to build such a model. We also discuss a memetic approach to protecting the data from group anonymity violation in this case.
Keywords: Fuzzy inference; Group anonymity; Memetic algorithm; Microfile; Privacy-preserving data publishing; Subgroup discovery
Year: 2016 PMID: 26844025 PMCID: PMC4728171 DOI: 10.1186/s40064-016-1692-9
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
Guidelines for the interpretation of MB in terms of the strength of evidence in favor of against
| | 0–1 | 1–3 | 3–5 | |
|---|---|---|---|---|
| Negative | Bare mention | Positive | Strong | Decisive |
Values of the Occupation, SOC classification attribute that correspond to the value 1 of the harmonized attribute Military Personnel
| Attribute value | Interpretation |
|---|---|
| 551010 | Military Officer Special and Tactical Operations Leaders |
| 552010 | First-Line Enlisted Military Supervisors |
| 553010 | Military Enlisted Tactical Operations and Air/Weapons Specialists and Crew Members |
| 559830 | Military, Rank Not Specified |
Basic harmonized attributes used in the practical example
| Index | Name | Type | Values |
|---|---|---|---|
| | Age | | 000— |
| | Educational attainment [general version] | | 00— |
| | Sex | | 1— |
| | Race [general version] | | 1— |
| | Usual hours worked per week | | 00— |
| | Hispanic origin [general version] | | 0— |
| | Marital status | | 1— |
| | Means of transportation to work | | 00— |
| | Time of departure for work | | 0000— |
| | Travel time to work | | 000— |
| | Weeks worked last year, intervalled | | 0—N/A, 1— |
| | Total personal income | | A 7-digit numeric code reporting each respondent’s total pre-tax personal income or losses from all sources for the previous year |
| | Speaks English | | 0— |
Ranges of acceptable values for each linguistic variable in the practical example
| Name of linguistic variable | Minimum | Maximum |
|---|---|---|
| Age | 18 | 45 |
| Educational attainment [general version] | 1 | 11 |
| Sex | 1 | 2 |
| Race [general version] | 1 | 2 |
| Usual hours worked per week | 0 | 100 |
| Hispanic origin [general version] | 0 | 9 |
| Marital status | 1 | 6 |
| Means of transportation to work | 0 | 70 |
| Time of departure for work | 1 | 2359 |
| Travel time to work | 1 | 119 |
| Weeks worked last year, intervalled | 1 | 6 |
| Total personal income | 0 | 200,000 |
| Speaks English | 2 | 5 |
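The ranges above bound the universe of discourse of each linguistic variable, and the abstract states that fuzzy rules built on such variables yield membership grades of respondents in the group. The following is a minimal sketch of that idea, not the authors' model: the three evenly spaced triangular terms (`low`, `medium`, `high`), the use of `min` as the conjunction, and the example rule and respondent are all assumptions made for illustration. Only the numeric ranges are taken from the table.

```python
# Ranges copied from the table above (a subset of the linguistic variables).
RANGES = {
    "Age": (18, 45),
    "Usual hours worked per week": (0, 100),
    "Travel time to work": (1, 119),
}


def triangular(x, left, peak, right):
    """Triangular membership grade in [0, 1]; left == peak or peak == right gives a shoulder."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def term_grades(name, x):
    """Grades of x in three hypothetical terms spread evenly over the variable's range."""
    lo, hi = RANGES[name]
    mid = (lo + hi) / 2
    x = min(max(x, lo), hi)  # clip to the acceptable range
    return {
        "low": triangular(x, lo, lo, mid),
        "medium": triangular(x, lo, mid, hi),
        "high": triangular(x, mid, hi, hi),
    }


def rule_strength(respondent, antecedent):
    """Firing strength of one rule: minimum grade over its antecedent terms (assumed t-norm)."""
    return min(
        term_grades(name, respondent[name])[term]
        for name, term in antecedent.items()
    )


# Hypothetical rule and respondent, for illustration only.
rule = {"Age": "low", "Usual hours worked per week": "high"}
respondent = {"Age": 18, "Usual hours worked per week": 75}
print(rule_strength(respondent, rule))
```

In this sketch the firing strength of a rule serves as the respondent's membership grade for that rule. How grades from several rules are aggregated is left open here, since the record does not specify it.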
Fuzzy rules used in the example
| | | | Support |
|---|---|---|---|
| | 0.032 | 0.755 | 0.032 |
| | 0.031 | 0.787 | 0.031 |
| | 0.012 | 0.801 | 0.012 |
| | 0.010 | 0.781 | 0.010 |
| | 0.012 | 0.851 | 0.012 |
| | 0.034 | 0.840 | 0.034 |
| | 0.025 | 0.765 | 0.025 |
| | 0.018 | 0.931 | 0.018 |
| | 0.017 | 0.915 | 0.018 |
| | 0.025 | 0.754 | 0.026 |
| | 0.032 | 0.751 | 0.032 |
| | 0.018 | 0.951 | 0.018 |
| | 0.019 | 0.767 | 0.019 |
| | 0.009 | 1.876 | 0.009 |
| | 0.008 | 0.761 | 0.009 |
| | 0.010 | 1.325 | 0.010 |
| | 0.026 | 0.767 | 0.026 |
| | 0.002 | 0.914 | 0.002 |
Fig. 1 Quantity signal (solid line) and auxiliary quantity signal (dashed line) obtained for the state of New York by applying the fuzzy rules from the example to the 2000 U.S. census microfile
Results of applying the evolved fuzzy rules to the 2000 census microfile
| State | Number of outliers in the quantity signal | Number of undisclosed outliers | Number of outliers in the auxiliary quantity signal | Number of false outliers |
|---|---|---|---|---|
| Alabama | 4 | 3 | 1 | 0 |
| Alaska | 2 | 0 | 2 | 0 |
| Arizona | 4 | 1 | 3 | 0 |
| California | 4 | 0 | 4 | 0 |
| Colorado | 2 | 0 | 2 | 0 |
| Connecticut | 1 | 0 | 1 | 0 |
| Florida | 7 | 4 | 4 | 1 |
| Georgia | 5 | 1 | 5 | 1 |
| Hawaii | 1 | 0 | 1 | 0 |
| Illinois | 2 | 1 | 1 | 0 |
| Kansas | 3 | 2 | 1 | 0 |
| Kentucky | 2 | 0 | 2 | 0 |
| Louisiana | 4 | 2 | 2 | 0 |
| Maryland | 2 | 1 | 1 | 0 |
| Mississippi | 2 | 1 | 1 | 0 |
| Missouri | 2 | 1 | 1 | 0 |
| New Jersey | 3 | 1 | 2 | 0 |
| New York | 2 | 0 | 2 | 0 |
| North Carolina | 4 | 2 | 2 | 0 |
| Ohio | 4 | 3 | 1 | 0 |
| Oklahoma | 3 | 2 | 1 | 0 |
| Pennsylvania | 4 | 2 | 4 | 2 |
| Rhode Island | 1 | 0 | 1 | 0 |
| South Carolina | 6 | 1 | 5 | 0 |
| Tennessee | 3 | 3 | 0 | 0 |
| Texas | 7 | 2 | 5 | 0 |
| Virginia | 9 | 3 | 6 | 0 |
| Washington | 5 | 2 | 3 | 0 |
| Total | 98 | 38 | 64 | 4 |
Fig. 2 Quantity signal (solid line) and auxiliary quantity signal (dashed line) obtained for the state of New York by applying the fuzzy rules from the example to the 2013 U.S. ACS microfile
Results of applying the evolved fuzzy rules to the 2013 ACS microfile
| State | Number of outliers in the quantity signal | Number of undisclosed outliers | Number of outliers in the auxiliary quantity signal | Number of false outliers |
|---|---|---|---|---|
| Alabama | 2 | 2 | 1 | 1 |
| Alaska | 2 | 0 | 2 | 0 |
| Arizona | 4 | 1 | 4 | 1 |
| California | 3 | 1 | 2 | 0 |
| Colorado | 2 | 0 | 2 | 0 |
| Connecticut | 1 | 0 | 2 | 1 |
| Florida | 7 | 5 | 3 | 1 |
| Georgia | 7 | 3 | 4 | 0 |
| Hawaii | 1 | 0 | 1 | 0 |
| Illinois | 2 | 1 | 2 | 1 |
| Kansas | 2 | 2 | 0 | 0 |
| Kentucky | 2 | 1 | 1 | 0 |
| Louisiana | 4 | 4 | 0 | 0 |
| Maryland | 3 | 2 | 1 | 0 |
| Mississippi | 1 | 0 | 1 | 0 |
| Missouri | 2 | 2 | 0 | 0 |
| Nevada | 1 | 0 | 1 | 0 |
| New Jersey | 2 | 2 | 0 | 0 |
| New Mexico | 2 | 2 | 0 | 0 |
| New York | 2 | 0 | 2 | 0 |
| North Carolina | 3 | 1 | 2 | 0 |
| Ohio | 2 | 1 | 3 | 2 |
| Oklahoma | 3 | 2 | 1 | 0 |
| South Carolina | 4 | 1 | 3 | 0 |
| Texas | 6 | 1 | 5 | 0 |
| Virginia | 7 | 4 | 4 | 1 |
| Washington | 4 | 1 | 3 | 0 |
| Total | 81 | 39 | 50 | 8 |
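The totals of the two result tables can be compared with a little arithmetic. The sketch below assumes a reading inferred from the tabulated numbers rather than stated in this record: outliers of the quantity signal that are not undisclosed count as disclosed, and disclosed plus false outliers equals the number of outliers in the auxiliary quantity signal. The key names and the `summarize` helper are hypothetical.

```python
# "Total" rows copied from the 2000 census and 2013 ACS result tables.
TOTALS = {
    "2000 census": {"outliers": 98, "undisclosed": 38, "auxiliary": 64, "false": 4},
    "2013 ACS": {"outliers": 81, "undisclosed": 39, "auxiliary": 50, "false": 8},
}


def summarize(t):
    """Derive disclosure figures from one totals row (interpretation is an assumption)."""
    disclosed = t["outliers"] - t["undisclosed"]
    return {
        "disclosed": disclosed,
        # Share of true outliers that the fuzzy rules reveal.
        "disclosed_share": disclosed / t["outliers"],
        # Share of auxiliary-signal outliers that do not match a true outlier.
        "false_share": t["false"] / t["auxiliary"],
        # Check of the inferred identity: disclosed + false == auxiliary.
        "consistent": disclosed + t["false"] == t["auxiliary"],
    }


for name, row in TOTALS.items():
    print(name, summarize(row))
```

Under this reading, the rules reveal 60 of 98 outliers in the 2000 microfile and 42 of 81 in the 2013 microfile, and the inferred identity holds for both totals rows.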
Fig. 3 Initial (solid line) and modified (dashed line) auxiliary quantity signals for the state of New York (2013 U.S. ACS microfile)