Kostas Pantazos, Soren Lauesen, Soren Lippert.
Abstract
A health record database contains structured data fields that identify the patient, such as patient ID, patient name, e-mail and phone number. These data are fairly easy to de-identify, that is, replace with other identifiers. However, these data also occur in fields with doctors' free-text notes written in an abbreviated style that cannot be analyzed grammatically. If we replace a word that looks like a name, but isn't, we degrade readability and medical correctness. If we fail to replace it when we should, we degrade confidentiality. We de-identified an existing Danish electronic health record database, ending up with 323,122 patient health records. We had to invent many methods for de-identifying potential identifiers in the free-text notes. The de-identified health records should be used with caution for statistical purposes because we removed health records that were so special that they couldn't be de-identified. Furthermore, we distorted geography by replacing zip codes with random zip codes.
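As a rough illustration of the trade-off described in the abstract, the following Python sketch replaces identifiers known from the structured fields wherever they occur as whole words in a free-text note, and maps zip codes to random ones. It is not the authors' method: the function names `deidentify_note` and `build_zip_map`, the whole-word matching rule, the sample note, and the choice to reuse one random code per original zip code are all assumptions made for this example.

```python
import random
import re


def build_zip_map(zip_codes, rng):
    """Map each distinct zip code to a random 4-digit code.

    Assumption: one fixed replacement per original code. A random code
    may happen to equal the original; a real system would need to decide
    whether that is acceptable.
    """
    return {z: str(rng.randint(1000, 9999)) for z in sorted(set(zip_codes))}


def deidentify_note(note, replacements):
    """Replace whole-word occurrences of known identifiers in a note.

    `replacements` maps an original identifier (name, phone number, ...)
    to its substitute. Matching is case-insensitive and done in a single
    pass, so substituted text is never matched again.
    """
    if not replacements:
        return note
    # Longest identifiers first so a full name wins over its parts.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], note)


# Hypothetical note and identifiers, invented for this example.
note = "Pt. Jens Hansen called; phone 12345678. Hansens wife present."
replacements = {
    "Jens Hansen": "Ole Berg",
    "Hansen": "Berg",
    "12345678": "87654321",
}
clean = deidentify_note(note, replacements)
zip_map = build_zip_map(["2300", "8000", "2300"], random.Random(0))
```

In this sketch the inflected form "Hansens" is left untouched because of the whole-word rule. That is the confidentiality side of the trade-off: a stricter rule misses variants of a name, while a looser rule risks replacing ordinary words that merely look like names.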
Keywords: anonymity; consistency; correctness; de-identification; electronic health records; readability
Year: 2016 PMID: 27199298 DOI: 10.1177/1460458216647760
Source DB: PubMed Journal: Health Informatics J ISSN: 1460-4582 Impact factor: 2.681