Hye-Chung Kum, Stanley Ahalt.
Abstract
Today there is a constant flow of data into, out of, and between ever-larger and ever-more complex databases about people. Together, these digital traces capture our social genome, the footprints of our society. The burgeoning field of population informatics is the systematic study of populations via secondary analysis of such massive data collections (termed "big data") about people. In particular, health informatics analyzes electronic health records to improve health outcomes for a population. Privacy protection in such secondary data analysis research is complex and requires a holistic approach that combines technology, statistics, policy, and a cultural shift toward information accountability through transparency rather than secrecy. We review the state of the art in privacy protection technology and policy frameworks from widely different fields, and synthesize the findings to present a comprehensive system of privacy protection in population informatics research using the privacy-by-design approach. Based on common activities in the workflow, we describe the pros and cons of four different data access models (restricted access, controlled access, monitored access, and open access) that minimize risk and maximize usability of data. We then evaluate the system by analyzing the risk and usability of data through a realistic example. We conclude that, deployed together, the four data access models can provide a comprehensive system for privacy protection, balancing the risk and usability of secondary data in population informatics research.
Year: 2013 PMID: 24303251 PMCID: PMC3845756
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1.
Data access models for privacy protection in population informatics research workflow (data to decision)
Comparison of risk and usability (Usability 1.1 & 1.2 for Risk 1, Usability 2 for Risk 2)
| | | Restricted Access | Controlled Access | Monitored Access | Open Access |
|---|---|---|---|---|---|
| Example Systems | | US Census RDC | Secure Medical Workspace | Secure Unix Servers | Public Website |
| Type of Data | | De-coupled micro data including PII | De-identified micro data | Aggregate data | Sanitized data |
| Privacy | | Encryption for decoupling; locked-down computer with physical restriction | Locked-down VM to restrict software on the computer and data channels | Information accountability; exempt IRB | Disclosure limitation methods |
| Monitor Use | | On and off the computer | On the computer | On the computer | No monitoring |
| Usability | U1.1: Software (SW) | Only preinstalled data integration & tabulation SW. No query capacity | Requested and approved statistical software only | Any software | Any software |
| | U1.2: Other data | No outside data allowed | Only preapproved outside data allowed | Any data | Any data |
| | U2: Access | No remote access | Remote access | Remote access | Remote access |
| Risk | R1: Cryptographic Attack | Highly difficult | Fairly difficult. Would have to break into VM. | Easy to run sophisticated SW with outside data | NA |
| | R2: Data Leakage | Very difficult. | Physical data leakage (take a picture of monitor) | Electronically take data off the system. | NA |
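
As a reading aid, the usability rows of the comparison above can be encoded as a small lookup. The sketch below is only an illustration: the class `AccessModel`, its field names, and the helper `models_meeting` are hypothetical and not part of the paper. It reduces rows U1.1, U1.2, U2, and Monitor Use to booleans, which drops the finer distinctions (for example, "requested and approved statistical software only" is simply treated as "not any software").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessModel:
    """Hypothetical boolean summary of one column of the comparison table."""
    name: str
    remote_access: bool      # U2: Access
    any_software: bool       # U1.1: True only where the table says "Any software"
    any_outside_data: bool   # U1.2: True only where the table says "Any data"
    monitored: bool          # Monitor Use: False only for "No monitoring"


# Ordered from most to least restrictive, following the table columns.
MODELS = [
    AccessModel("Restricted Access", False, False, False, True),
    AccessModel("Controlled Access", True, False, False, True),
    AccessModel("Monitored Access", True, True, True, True),
    AccessModel("Open Access", True, True, True, False),
]


def models_meeting(need_remote=False, need_any_software=False, need_any_data=False):
    """Return the names of models whose usability covers the stated needs."""
    return [
        m.name
        for m in MODELS
        if (m.remote_access or not need_remote)
        and (m.any_software or not need_any_software)
        and (m.any_outside_data or not need_any_data)
    ]
```

Calling `models_meeting(need_remote=True)` excludes only Restricted Access, which mirrors the U2 row. The lookup says nothing about which type of data each model may hold, so it cannot by itself decide where a given dataset belongs.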
Figure 2.
Decoupled and chaffed data allows for data integration without sensitive attribute disclosure
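
The caption names two techniques without describing their mechanics, so the following is one possible reading rather than the authors' method. It assumes that decoupling means storing identifying fields and sensitive attributes in separate tables whose link is held apart from both, and that chaffing means mixing fake identities into the identifier table so that appearing in it reveals nothing about a real person. The function `decouple_and_chaff`, its parameters, and the plain dictionary standing in for an encrypted link are all hypothetical.

```python
import random
import secrets


def decouple_and_chaff(records, id_fields, n_chaff, make_fake_identity, rng=None):
    """Split records into an identifier table and a sensitive-attribute table.

    Assumption-laden sketch: `link` is a plain dict standing in for an
    encrypted key mapping, and `make_fake_identity` is a caller-supplied
    function returning a dict with the same identifying fields.
    """
    rng = rng or random.Random()
    id_table, sensitive_table, link = [], [], {}

    for rec in records:
        id_key = secrets.token_hex(8)     # random key for the identifier row
        attr_key = secrets.token_hex(8)   # unrelated key for the attribute row
        id_table.append({"id_key": id_key, **{f: rec[f] for f in id_fields}})
        sensitive_table.append(
            {"attr_key": attr_key,
             **{k: v for k, v in rec.items() if k not in id_fields}}
        )
        link[id_key] = attr_key           # held separately from both tables

    # Chaff: fake identities that have no entry in `link`.
    for _ in range(n_chaff):
        id_table.append({"id_key": secrets.token_hex(8), **make_fake_identity()})

    rng.shuffle(id_table)                 # hide which rows were added last
    return id_table, sensitive_table, link
```

Under these assumptions, a person doing record linkage would see only the identifier table, while an analyst would see only the sensitive-attribute table, and neither table alone ties a name to an attribute.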
Analysis of risk and usability of the conventional system and the proposed system
| Workflow | Data Preparation (Data Integration and Selection) | Data Preparation (Data Integration and Selection) | Analysis of Micro Person Level Data | Analysis of Micro Person Level Data |
|---|---|---|---|---|
| System | Conventional System | Proposed System | Conventional System | Proposed System |
| Model | Indirect Access via Health Dept. | Direct Restricted Access | Monitored Access | Controlled Access |
| Access | No direct access to data | Direct access to data | Remote direct access for authorized users | Remote direct access for authorized users |
| Type of Data | Multiple Identifiable Microdata Tables | Multiple Decoupled Microdata Tables | De-identified integrated microdata | De-identified integrated microdata with P(linkage) |
| Analysis of Risk and Usability | The proposed model | | The proposed model r | |