John Cologne, Eric J Grant, Eiji Nakashima, Yun Chen, Sachiyo Funamoto, Hiroaki Katayama.
Abstract
OBJECTIVE: Ensuring privacy of research subjects when epidemiologic data are shared with outside collaborators involves masking (modifying) the data, but overmasking can compromise utility (analysis potential). Methods of statistical disclosure control for protecting privacy may be impractical for individual researchers involved in small-scale collaborations.
Year: 2012 PMID: 22505949 PMCID: PMC3307056 DOI: 10.1155/2012/421989
Source DB: PubMed Journal: J Environ Public Health ISSN: 1687-9805
Figure 1. Hypothetical security-utility plot. The plot shows the trade-off between degree of masking and analysis potential. The axes are defined to allow comparison with the ideal situation of no loss of security and no loss of utility (the upper right corner).
Cases in the distributed data matching to ≤3 doses in the source database.
| Comparison to | 3 matches in source database (identification risk 0.33) | 2 matches in source database (identification risk 0.5) | 1 match in source database (identification risk 1.0) | Overall identification risk ( |
|---|---|---|---|---|
| (a) Using original (nonmasked) doses | | | | |
| All organ doses | 2 | 9 | 18 | 0.39 |
| Stomach doses only | 3 | 5 | 29 | 0.52 |
| AHS stomach doses | 1 | 3 | 35 | 0.58 |
| (b) Using doses rounded to 3 significant digits | | | | |
| All organ doses | 0 (1)ᵃ | 0 (5) | 0 (6) | 0 (0.14) |
| Stomach doses only | 0 | 0 | 0 (2) | 0 (0.03) |
| AHS stomach doses | 0 | 0 | 0 | 0 (0) |
ᵃ Numbers in parentheses are the numbers of values that exactly matched some entry in the source database, but in no instance was the matching individual in the source database the subject with the three-digit rounded dose.
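The matching exercise summarized in the table above can be sketched as follows. This is only an illustration under stated assumptions: the dose values are made up, the helper names (`round_sig`, `match_counts`, `identification_risks`) are hypothetical, and the per-case risk of 1/k for k matches is taken from the column headings (0.33, 0.5, 1.0). How the overall identification risk is aggregated is not stated here, so the sketch does not attempt it.

```python
from collections import Counter


def round_sig(x: float, digits: int = 3) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return float(f"{x:.{digits}g}")


def match_counts(distributed, source):
    """For each distributed dose, count the exactly equal doses in the source."""
    freq = Counter(source)
    return [freq[d] for d in distributed]


def identification_risks(counts, max_matches=3):
    """Per-case risk 1/k when a case has 1..max_matches matches, else 0."""
    return [1.0 / k if 1 <= k <= max_matches else 0.0 for k in counts]


# Hypothetical source database and distributed subset (not real doses).
source = [0.1234, 0.1234, 0.5678, 0.9012, 1.2345]
distributed = [0.1234, 0.5678]

# Unmasked doses match the source exactly.
unmasked = match_counts(distributed, source)

# Doses rounded to 3 significant digits no longer match in this toy example.
masked = match_counts([round_sig(d) for d in distributed], source)

print(unmasked, identification_risks(unmasked))
print(masked, identification_risks(masked))
```

In this toy data the rounded values match nothing in the source. In real data a rounded value can still coincide with some other subject's dose, which is the situation the parenthesized counts in part (b) of the table describe.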
Figure 2. Identifiability risk with the illustration microdataset when compared to stratified source data. Data stratified on city, gender, dose, and (a) five-year intervals of age at risk and age at exposure, (b) ten-year intervals of age at risk and age at exposure, and (c) no ages included.
Results of fitting a linear dose response using binary regression.
| Dose masking scheme | ERRᵃ | Standard error | Deviance | LR statistic (P value) | Relative bias (%) | MSE |
|---|---|---|---|---|---|---|
| None | 0.5235 | 0.1548 | 826.36 | 9.27 (0.0023) | n/a | 0.0240 |
| Rounded to three decimal digits | 0.5237 | 0.1548 | 826.35 | 9.28 (0.0023) | 0.038 | 0.0240 |
| Rounded to two decimal digits | 0.5235 | 0.1547 | 826.35 | 9.28 (0.0023) | 0 | 0.0239 |
| Rounded to nearest centiGray | 0.5235 | 0.1548 | 826.36 | 9.27 (0.0023) | 0 | 0.0240 |
| Rounded to nearest deciGray | 0.5228 | 0.1547 | 826.36 | 9.27 (0.0023) | 0.13 | 0.0239 |
| Stratifiedᵇ | 0.5320 | 0.1553 | 826.11 | 9.52 (0.0020) | 1.6 | 0.0242 |
| Randomizedᶜ | | | | | | |
| ± 0.001 | 0.5235 | 0.1548 | 826.36 | 9.27 | 0.015 | 0.0240 |
| (min, max) | (0.5234, 0.5238) | (0.1548, 0.1549) | | (0.0023) | (0, 0.057) | (0.02397, 0.02398) |
| ± 0.01 | 0.5239 | 0.155 | 826.36 | 9.28 | 0.16 | 0.0240 |
| (min, max) | (0.5226, 0.5266) | (0.1548, 0.1551) | | (0.0023) | (0, 0.59) | (0.02396, 0.02407) |
| ± 0.1 | 0.5271 | 0.155 | 826.33 | 9.31 | 1.6 | 0.0243 |
| (min, max) | (0.5159, 0.5573) | (0.1537, 0.1584) | | (0.0023) | (1.4, 6.5) | (0.02415, 0.02557) |
ᵃ ERR: excess relative risk (relative risk − 1). Precision is overrepresented for comparison.
ᵇ Doses were stratified according to the categories used in Life Span Study Report 13 [27]. The dose value assigned to each individual was the mean of all database AHS stomach dose values in that group.
ᶜ A random uniform deviate within the specified range was added to the dose; if this operation resulted in a negative value, the masked dose was set to zero. Results are the averages from 500 simulations.
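The rounding and randomization schemes described in the footnotes can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the relative-bias formula (absolute difference in ERR divided by the unmasked ERR, times 100) is an assumption inferred from the single-fit rows of the table rather than something stated in the text.

```python
import random


def mask_round(dose: float, ndigits: int) -> float:
    """Mask a dose by rounding it to a fixed number of decimal digits."""
    return round(dose, ndigits)


def mask_randomize(dose: float, half_width: float, rng: random.Random) -> float:
    """Add a uniform deviate in [-half_width, +half_width]; floor at zero."""
    masked = dose + rng.uniform(-half_width, half_width)
    return max(masked, 0.0)


def relative_bias_percent(err_masked: float, err_original: float) -> float:
    """Assumed definition: |ERR_masked - ERR_original| / ERR_original * 100."""
    return abs(err_masked - err_original) / err_original * 100.0


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Small doses can go negative after perturbation and are then set to zero.
masked = [mask_randomize(0.005, 0.1, rng) for _ in range(5)]
print(masked)

# Check the assumed relative-bias formula against table rows (unmasked ERR 0.5235).
print(round(relative_bias_percent(0.5237, 0.5235), 3))  # three decimal digits row
print(round(relative_bias_percent(0.5320, 0.5235), 1))  # stratified row
```

For the randomized rows the tabulated relative bias is presumably an average over the 500 simulated fits, so it need not equal the value computed from the averaged ERR.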
Figure 3. Empirical security-utility plot. The plot shows the trade-off between degree of masking and analysis potential for the example data. The points do not fall directly on the curve because the masking methods are not nested (i.e., there is not a one-to-one correspondence between degree of anonymization and analysis potential). The curve is a nonlinear regression fit of the model $\text{anonymization score} = [1 - (\text{analysis-potential score})^{\theta}]^{1/\theta}$, where $\theta$ was estimated to be 178.
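To get a feel for the fitted curve, the sketch below evaluates it at a few points. It takes the model in the Figure 3 caption to have the form score = (1 − u^θ)^(1/θ), which is an interpretation of the caption, and it assumes both scores are scaled to the interval [0, 1]. The function name is hypothetical.

```python
def anonymization_score(analysis_potential: float, theta: float = 178.0) -> float:
    """Evaluate (1 - u**theta) ** (1/theta) for u in [0, 1] (assumed model form)."""
    if not 0.0 <= analysis_potential <= 1.0:
        raise ValueError("analysis_potential must lie in [0, 1]")
    return (1.0 - analysis_potential ** theta) ** (1.0 / theta)


# With theta = 178 the curve stays near 1 until analysis potential is almost 1.
for u in (0.0, 0.5, 0.9, 0.99, 0.999, 1.0):
    print(u, anonymization_score(u))
```

Under this reading, a large θ means the curve hugs the upper right corner of the plot, that is, the example data lose little anonymization for a given analysis potential.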