Ourania Kounadi, Bernd Resch.
Abstract
Participatory sensing applications collect personal data of monitored subjects along with their spatial or spatiotemporal stamps. The attributes of a monitored subject can be private, sensitive, or confidential information. Also, the spatial or spatiotemporal attributes are prone to inferential disclosure of private information. Although there is extensive problem-oriented literature on geoinformation disclosure, our work provides a clear guideline with practical relevance, containing the steps that a research campaign should follow to preserve the participants' privacy. We first examine the technical aspects of geoprivacy in the context of participatory sensing data. Then, we propose privacy-preserving steps in four categories, namely, ensuring secure and safe settings, actions prior to the start of a research survey, processing and analysis of collected data, and safe disclosure of datasets and research deliverables.
Keywords: anonymization methods; disclosure risk; geoprivacy by design; location privacy; mobile participatory sensors; research design; spatial analysis; spatiotemporal data
Year: 2018 PMID: 29683056 PMCID: PMC6011384 DOI: 10.1177/1556264618759877
Source DB: PubMed Journal: J Empir Res Hum Res Ethics ISSN: 1556-2646 Impact factor: 1.742
Privacy and Confidentiality Approaches for Statistical and Spatial Data.
| Dataset | Anonymization approaches | Description | Major effect | Benefits | Limitations |
|---|---|---|---|---|---|
| Microdata | Abbreviation | Reduces the volume or granularity of released information | Imprecision | Easy implementation; mathematical basis for location protection methods | Current applications are restricted to a-spatial data |
| | Aggregation | Combines adjacent categories or replaces with nearby values | | | |
| | Modification | Changes data values with rounding or perturbation | Inaccuracy | | |
| | Fabrication | Creates a fictional dataset that has distributional and inferential similarities with the original | | | |
| Confidential discrete spatial data (e.g., health care, crime, household surveys) | Adaptive geomasking | Actual locations are perturbed considering the spatial k-anonymity | Inaccuracy | Risk of identification information can be adaptively anonymized to meet data-specific regulations and restrictions; anonymized data retain the initial discrete structure that is crucial for many spatial-point pattern analyses | Current applications are restricted to static, nontemporal discrete location data |
| | Geomasking with quasi-identifiers | Geographical masks that extend spatial k-anonymity to basic k-anonymity to account for quasi-identifiers | Inaccuracy or imprecision | In addition to the location and sensitive theme, quasi-identifiers may be disclosed that allow further analysis of covariates | |
| | Synthetic geographies | Anonymized data are synthesized from the results of spatial estimation models that use covariates as estimators of confidential locations | Inaccuracy | Retains relationship between locations and covariates | |
| Spatiotemporal data of individuals (e.g., GPS trajectories, cellular data, LBS, radio-frequency identification devices [RFID]) | Point aggregation | A set of locations is replaced by a single representative location | Imprecision | Adequate for visualizing trajectories of individuals or movement flows in between areas | Point aggregation underperforms random perturbation techniques |
| | Cloaking | Lowers the space and/or time precision of individual-level data | | Option to decrease the temporal or the spatial resolution | Prohibits spatial-point pattern analysis; polygon clustering may hide significant point clusters |
| | Dummies | Adds noise that simulates human trajectories | Inaccuracy | Allows spatial-point pattern analysis and analysis by user | The spatial accuracy of the augmented anonymized dataset compared with the original one has not been addressed |
| | Pseudonyms | Identities are stored with pseudonyms | | | Inferential disclosure is not protected |
| | Mix zones | Locations are hidden in certain areas, and pseudonyms change when exiting them | | High positional accuracy is achieved in low sensitivity areas; it is harder, if not impossible, to perform inference attacks on individuals’ spatiotemporal behavior if pseudonyms are changed periodically | Analysis by user or group of users is not possible if pseudonyms change in time |
Note. GPS = global positioning system; LBS = location-based services.
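The adaptive geomasking row can be illustrated with a minimal sketch. It assumes planar coordinates in meters and a hypothetical reference population of addresses, and it displaces a confidential point at random inside the disc whose radius is the distance to its k-th nearest population location. The function names, the grid population, and the choice of a uniform disc are illustrative assumptions, not the method of any specific study listed in the table.

```python
import math
import random

def kth_nearest_distance(point, population, k):
    """Return the distance from `point` to its k-th nearest population location."""
    distances = sorted(math.dist(point, other) for other in population)
    return distances[k - 1]

def adaptive_mask(point, population, k, rng):
    """Move `point` to a random position inside the disc that reaches its k-th nearest location."""
    radius = kth_nearest_distance(point, population, k)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root spreads displaced points evenly over the disc area
    # instead of concentrating them near the original location.
    distance = radius * math.sqrt(rng.random())
    return (point[0] + distance * math.cos(angle),
            point[1] + distance * math.sin(angle))

# Hypothetical reference population: a 5 x 5 grid of addresses, 100 m apart.
population = [(100.0 * i, 100.0 * j) for i in range(5) for j in range(5)]
rng = random.Random(42)  # fixed seed so the sketch is reproducible
original = (200.0, 200.0)
masked = adaptive_mask(original, population, k=5, rng=rng)
print(masked)
```

Because the radius grows where the population is sparse, the displacement adapts to local density: dense areas receive small shifts and sparse areas larger ones.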
Privacy and Confidentiality Recommendations From Public and Independent Bodies.
| Category | Source | Recommendation |
|---|---|---|
| Organization and training | FCSM | 1. Standardize and centralize agency review of disclosure-limited data products |
| | CDC-ATSDR | 1. Designate a privacy manager |
| | NRC | 1. Methodological training in the acquisition and use of data |
| Data processing | FCSM | 3. Remove direct identifiers and limit other identifying information |
| | CDC-ATSDR | 5. Classify each dataset as a restricted-access or a PUDS |
| | ICO (POA) | 1. Increase a mapping area to cover more properties or occupants |
| Publication of data and deliverables | FCSM | 4. Share information on assessing disclosure risk |
| | CDC-ATSDR | 6. Include disclosure statement with PUDS |
| | ICO (POA) | 2. Reduce the frequency or timeliness of publication |
| | ICO (GCD) | 1. The use of heat maps, blocks, and zones reduces privacy risks |
| | NIJ–CMRC | 1. Decide which data to present: Point versus aggregate data |
| Release data to a third party | CDC-ATSDR | 8. Authenticate the identity of data requestors |
| | NRC | 4. Data stewards should develop licensing agreements to provide increased access to linked social-spatial datasets that include confidential information |
| | NIJ–CMRC | 5. Consider privacy and other implications if data provided will be merged with other data |
Note. Recommendations have been grouped into four categories according to the topic they address. FCSM = Federal Committee on Statistical Methodology; CDC-ATSDR = Centers for Disease Control and Prevention and the Agency for Toxic Substances and Disease Registry; PUDS = public-use dataset; ICO = Information Commissioner’s Office; POA = Practice on Anonymization; GCD = Geospatial crime data; NRC = National Research Council; NIJ = National Institute of Justice; CMRC = Crime Mapping Research Center; DSA = disclosure sharing agreement.
A List of Initial Activities Prior to the Start of the Survey.
| A. Presurvey activities |
|---|
| 1. Design study in the least privacy invasive manner |
| 2. Develop a privacy-preserving research plan |
| 3. Define criteria for access to restricted-access datasets |
| 4. Prepare a participation agreement |
| 5. Ensure informed consent on location privacy disclosure risks |
| 6. Obtain institutional approval, preferably reviewed by a DRB |
Note. DRB = disclosure review board.
A List of Recommendations to Ensure Secure and Safe Settings.
| B. Security and safety |
|---|
| 1. Assign a privacy manager |
| 2. Train collectors and/or processors in methods and ethical considerations |
| 3. Ensure a secure IT system |
| 4. Ensure secure sensing devices |
Note. IT = information technology.
A List of Recommendations to Store, Anonymize, and Assess Derived Datasets.
| C. Processing and analysis of collected data |
|---|
| 1. Delete data from sensor devices once stored in the IT system |
| 2. Remove identifiers from the dataset |
| 3. Standardize anonymization practices |
| 4. Ensure that the inclusion of pseudonyms does not lead to disclosure |
| 5. Ensure that the inclusion of quasi-identifiers does not lead to disclosure |
| 6. Ensure a sufficient l-diversity of the sensitive attributes |
| 7. Classify each dataset as a restricted-access or anonymized dataset |
| 8. Assess disclosure of anonymized datasets |
| 9. Assess anonymization effect on spatial analysis |
Note. IT = information technology.
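Recommendations 5 and 6 can be checked mechanically once the quasi-identifiers and the sensitive attribute are chosen. The sketch below assumes records stored as dictionaries and uses the common "distinct" reading of l-diversity (the number of different sensitive values within each group of records that share the same quasi-identifier values). The field names `age_band`, `cell`, and `condition` are hypothetical.

```python
from collections import defaultdict

def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[name] for name in quasi_identifiers)
        groups[key].append(record)
    return groups

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group: every record is indistinguishable from at least k-1 others."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return min(len(group) for group in groups.values())

def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return min(len({record[sensitive] for record in group})
               for group in groups.values())

# Hypothetical participant records; "cell" is a coarse grid cell identifier.
records = [
    {"age_band": "20-29", "cell": "A1", "condition": "asthma"},
    {"age_band": "20-29", "cell": "A1", "condition": "none"},
    {"age_band": "30-39", "cell": "B2", "condition": "asthma"},
    {"age_band": "30-39", "cell": "B2", "condition": "asthma"},
]
k = k_anonymity(records, ["age_band", "cell"])
l = l_diversity(records, ["age_band", "cell"], "condition")
print(k, l)
```

In this toy dataset every group has two records, yet the second group carries a single sensitive value, so group membership alone reveals the condition. This is the situation that a sufficient l-diversity is meant to rule out.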
A List of Recommendations to Prevent Disclosure When (a) Findings Are Published, (b) Anonymized Datasets Are Published, and (c) Data Are Shared With Third Parties.
| D. Disclosure prevention |
|---|
| Dissemination of findings |
| 1. Reduce spatial precision |
| 2. Reduce temporal precision |
| 3. Consider alternatives to point distribution maps |
| 4. Assess disclosure on a point distribution map |
| 5. Provide protection vs. disclosure information |
| 6. Provide contact information |
| 7. Use disclaimers |
| Anonymized datasets |
| 8. Avoid the release of multiple versions of anonymized datasets |
| 9. Avoid the disclosure of anonymization meta-data |
| 10. Inform about disclosure risk assessment |
| 11. Provide information on protection and effect |
| 12. Provide contact information |
| 13. Maintain log of anonymized disclosed datasets |
| Data sharing with third parties |
| 14. Plan a mandatory licensing agreement |
| 15. Plan a DSA for restricted-access data |
| 16. Authenticate the identity of data requestors |
| 17. Perform background checks on research personnel who will have access to data |
| 18. Ensure requestor’s safe settings |
| 19. Decide what data will be needed |
| 20. Consider implications if restricted-access data will be merged with other data |
| 21. Decide presentation of research outputs |
| 22. Decide length of period of retaining restricted-access data |
| 23. Review research outputs before publication |
| 24. Maintain log of restricted-access disclosed datasets |
Note. DSA = disclosure sharing agreement.
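Recommendations 1 and 2 (reduce spatial and temporal precision) can be sketched as follows, assuming planar coordinates in meters and timestamps held as `datetime` objects. Snapping to grid-cell centers and flooring to a fixed interval are only one possible way to coarsen data. The cell size and interval shown are arbitrary illustrative values.

```python
import math
from datetime import datetime

def coarsen_location(x, y, cell_size):
    """Replace a coordinate pair with the center of the grid cell that contains it."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)

def coarsen_time(timestamp, interval_minutes):
    """Round a timestamp down to the start of its interval within the same day."""
    minutes_since_midnight = timestamp.hour * 60 + timestamp.minute
    floored = (minutes_since_midnight // interval_minutes) * interval_minutes
    return timestamp.replace(hour=floored // 60, minute=floored % 60,
                             second=0, microsecond=0)

# Hypothetical sensor reading: position in meters and time of measurement.
location = coarsen_location(1234.0, 987.0, cell_size=500.0)
moment = coarsen_time(datetime(2018, 3, 5, 14, 47, 12), interval_minutes=60)
print(location, moment)
```

Larger cells and longer intervals lower the disclosure risk but also remove detail from the published findings, so the chosen resolution should be reported alongside the results.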
Measures to Evaluate the Anonymization Effect by Type of Spatial Analysis.
| Unit of analysis | Spatial analysis | Measures of spatial error and information loss |
|---|---|---|
| | Global descriptive statistics | Global divergence index (GDi) |
| | Pattern detection/analysis | Divergence to clustering distance in cross |
| | Univariate spatial prediction | Divergence to prediction accuracy index (PAI), prediction efficiency index (PEI) |
| | Local indicators of spatial association | Local divergence index (LDi), stability of hotspot (SoH) |
| | Spatial clustering | Detection rate, accuracy, sensitivity, and specificity |
| | Multivariate spatial relationship | Divergence to |
| | | |
| | Choropleth mapping, density surface estimation | Index of similarity (S), suppression, compactness, discernibility, nonuniform entropy |
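For the spatial clustering row, accuracy, sensitivity, and specificity can be computed by treating the clusters found in the original data as the reference and the clusters found in the anonymized data as the prediction. The sketch below assumes clusters are expressed as sets of area-unit identifiers. This framing, the function names, and the unit labels are assumptions for illustration. The detection rate is left out because its exact definition is not given here.

```python
def confusion_counts(all_units, original_hotspots, masked_hotspots):
    """Count agreements and disagreements between original and anonymized hotspots."""
    tp = len(original_hotspots & masked_hotspots)   # hotspot in both
    fn = len(original_hotspots - masked_hotspots)   # hotspot lost by anonymization
    fp = len(masked_hotspots - original_hotspots)   # hotspot created by anonymization
    tn = len(all_units) - tp - fn - fp              # non-hotspot in both
    return tp, fp, fn, tn

def agreement_measures(all_units, original_hotspots, masked_hotspots):
    """Return accuracy, sensitivity, and specificity of the anonymized result."""
    tp, fp, fn, tn = confusion_counts(all_units, original_hotspots, masked_hotspots)
    nan = float("nan")  # returned when a measure is undefined
    return {
        "accuracy": (tp + tn) / len(all_units) if all_units else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

# Hypothetical study area of six units with three hotspots before anonymization.
all_units = {"A", "B", "C", "D", "E", "F"}
original_hotspots = {"A", "B", "C"}
masked_hotspots = {"A", "B", "D"}
measures = agreement_measures(all_units, original_hotspots, masked_hotspots)
print(measures)
```

Values close to 1 suggest that the anonymization preserved the cluster structure, while low sensitivity indicates that real clusters disappeared after masking.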