Fengjun Li, Xukai Zou, Peng Liu, Jake Y Chen.
Abstract
BACKGROUND: With the rapid digitalization of health data (e.g. Electronic Health Records), there is increasing concern about maintaining data privacy while still garnering the benefits, especially when the data must be published for secondary use. Most current research on protecting health data privacy centers on data de-identification and data anonymization, which remove identifiable information from the published health data to prevent an adversary from inferring private information about the patients. However, published health data is not the only source that adversaries can draw on: given the large amount of information that people voluntarily share on the Web, sophisticated attacks on health data privacy that join disparate pieces of information from multiple sources become practical. So far, limited effort has been devoted to studying these attacks.
Year: 2011 PMID: 22168526 PMCID: PMC3247088 DOI: 10.1186/1471-2105-12-S12-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A real-world example of cross-site information aggregation. The target patient “Jean” has profiles on two online medical social networking sites (1) and (2). By comparing the attributes from both profiles, the adversary can link the two with high confidence. Furthermore, the attacker can use the attribute values to find more profiles of the target by searching the Web (3) and other online public data sets (4 and 5). By aggregating and associating the five profiles, the attacker unintentionally exposes Jean’s full name, date of birth, husband’s name, home address, home phone number, cell phone number, two email addresses, occupation, and medical information, including lab test results.
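The attribute comparison described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' method: the names `match_score` and `aggregate`, the 0.75 threshold, the exact-match comparison, and all profile values are assumptions made up for illustration.

```python
from typing import Dict, Optional

Profile = Dict[str, str]


def match_score(a: Profile, b: Profile) -> float:
    """Fraction of attributes present in both profiles whose values agree.

    Values are compared after trimming whitespace and lowercasing
    (an assumed, deliberately simple normalization).
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    agree = sum(
        1 for key in shared
        if a[key].strip().lower() == b[key].strip().lower()
    )
    return agree / len(shared)


def aggregate(a: Profile, b: Profile, threshold: float = 0.75) -> Optional[Profile]:
    """Merge two profiles if they look like the same person, else return None.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    if match_score(a, b) < threshold:
        return None
    merged = dict(a)
    for key, value in b.items():
        merged.setdefault(key, value)  # keep a's value when both define the key
    return merged


# Entirely fictional profiles from two hypothetical sites.
site1 = {"first_name": "Jean", "city": "Springfield", "age": "40",
         "condition": "asthma"}
site2 = {"first_name": "jean", "city": "Springfield", "age": "40",
         "email": "jean@example.org"}

merged = aggregate(site1, site2)
```

Here the two profiles agree on every shared attribute, so `merged` combines the condition known only to the first site with the email address known only to the second, which is the kind of aggregation the caption describes.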
Figure 2. Population under k-l-anonymity. This example measures the distinguishability of the population under k-l-anonymity, using the first name as the identifier. The histogram plots the number of individuals whose first names differ from those of (k-1) other records by l letters. It shows that, of the 24,399 records, most (more than 70%, as shown in the first bar with k=1) are quite distinguishable.
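One possible reading of the measurement in the Figure 2 caption is sketched below in Python. The caption does not define k-l-anonymity precisely, so this is an assumption rather than the authors' definition: here k is taken to be the number of records (including the record itself) whose first name lies within l single-letter edits, measured by Levenshtein distance. The function names and the sample names are hypothetical.

```python
from collections import Counter
from typing import List


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def group_sizes(names: List[str], l: int) -> List[int]:
    """For each name, count the records within l edits (itself included)."""
    return [
        sum(1 for other in names if edit_distance(name, other) <= l)
        for name in names
    ]


# Fictional sample, far smaller than the 24,399 records in the caption.
names = ["jean", "joan", "jane", "maria", "mario", "xavier"]
sizes = group_sizes(names, l=1)
histogram = Counter(sizes)                      # k -> number of individuals
share_unique = histogram[1] / len(names)        # fraction with k = 1
```

Under this reading, a record with k = 1 has no other record within l letters of its first name, which corresponds to the "quite distinguishable" first bar of the histogram. The pairwise comparison is quadratic in the number of records, so it is meant only to illustrate the idea on small inputs.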