Compromising the Security of "Generating Unique Identifiers from Patient Identification Data Using Security Models".

Year: 2017 PMID: 28480120 PMCID: PMC5404349 DOI: 10.4103/jpi.jpi_1_17

Source DB: PubMed Journal: J Pathol Inform

Sir, I write with respect to the Technical Note “Generating unique identifiers (IDs) from patient identification data using security models,”[1] the authors of which propose a method to “create a unique one-way encrypted ID per patient that can be used for data sharing.” In summary, their method concatenates a patient's date of birth, sex, and surname and uses either the MD5 or SHA-1 cryptographic hash of this value as the record ID. The authors conclude that this “can be used to share patient electronic medical records between practitioners without revealing patients' identifiable data.” Here, I demonstrate that this is not the case, and I recommend that the method not be used in circumstances in which the privacy of the underlying patient data is required.

The authors state that “the difficulty of coming up with any message having a given MD is on the order of 2^128 operations;” however, even in the absence of known weaknesses in the MD5 algorithm,[2] this assumes an unbounded input space. The input space of the proposed methodology is strictly limited by the number of feasible birth dates, names, and sexes: excluding leap days and assuming only binary sexes, a 100-year period yields only 365 × 100 × 2 = 73,000 possible inputs per surname. It is thus possible to perform a brute-force, precomputed attack using common surnames.

I calculated the proposed IDs for two sexes, every birth date in the century 1917–2016 inclusive, and the ten most common surnames in the 2000 USA census,[3] producing a precomputed table of the kind known as a rainbow table. This approach reduces the search space to < 2^23. Performed on my personal laptop, the computation took a mere 8.8 s to compromise the IDs of over 13 million people (based on census counts) for both MD5 and SHA-1. The results of my calculations are available for download at https://goo.gl/xqwphs and constitute a reverse-lookup database that fully compromises the security of the proposed method.
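The precomputation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the published results: the concatenation format (`YYYYMMDD` + sex + surname), the sex codes, and the two placeholder surnames are all hypothetical, since the letter does not specify them. Unlike the 73,000 estimate, the sketch includes leap days.

```python
import hashlib
from datetime import date, timedelta

# Assumptions for illustration only: placeholder surnames and sex codes.
SURNAMES = ["SMITH", "JOHNSON"]
SEXES = ["M", "F"]


def make_id(dob, sex, surname, algorithm="md5"):
    """Hash the concatenated fields; the YYYYMMDD+sex+surname format is assumed."""
    message = f"{dob:%Y%m%d}{sex}{surname}".encode("utf-8")
    return hashlib.new(algorithm, message).hexdigest()


def birth_dates(first_year, last_year):
    """Yield every calendar day from 1 January first_year to 31 December last_year."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        yield day
        day += timedelta(days=1)


def build_lookup(surnames, first_year=1917, last_year=2016, algorithm="md5"):
    """Precompute ID -> (date of birth, sex, surname) for every candidate input."""
    table = {}
    for surname in surnames:
        for dob in birth_dates(first_year, last_year):
            for sex in SEXES:
                record_id = make_id(dob, sex, surname, algorithm)
                table[record_id] = (dob.isoformat(), sex, surname)
    return table


table = build_lookup(SURNAMES)

# Reversing a "one-way" ID is then a single dictionary lookup.
leaked_id = make_id(date(1980, 5, 17), "F", "SMITH")
recovered = table[leaked_id]
print(recovered)
```

Passing `algorithm="sha1"` to both functions builds the corresponding SHA-1 table; the choice of hash function does not change the size of the input space.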
It is trivial to modify the input format for the precomputed IDs and to extend the rainbow table to cover more surnames; keeping the input format secret would therefore not contribute to security, in keeping with Kerckhoffs' principle (French original;[4] English elucidation[5]). Because each ID is computed independently of every other, this brute-force process is embarrassingly parallel:[6] the computation can be shared across any number of devices, without modifying code, which reduces the time to compromise. A number of other weaknesses exist in the proposed methodology, but in the interest of brevity I limit myself to detailing the most severe one.
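The independence between IDs can be illustrated with a short sketch that splits the candidate inputs into shards, hashes each shard separately, and merges the partial tables. The candidate format, the surnames, and the helper names `hash_shard` and `split` are assumptions made for illustration. A thread pool is used only to keep the example portable; a real attack would more plausibly distribute shards across processes or machines.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_shard(candidates):
    """Hash one slice of candidate strings; no state is shared between slices."""
    return {hashlib.sha1(c.encode("utf-8")).hexdigest(): c for c in candidates}


def split(items, n_shards):
    """Deal the items round-robin into n_shards lists."""
    return [items[i::n_shards] for i in range(n_shards)]


# Assumed candidate format (YYYYMMDD + sex + surname), restricted to one year
# and to days 1-28 of each month to keep the example small.
candidates = [
    f"1980{month:02d}{day:02d}{sex}{surname}"
    for surname in ("SMITH", "JOHNSON")
    for month in range(1, 13)
    for day in range(1, 29)
    for sex in "MF"
]

# Each shard could equally be sent to a different device.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_tables = list(pool.map(hash_shard, split(candidates, 4)))

# Merging the partial tables gives the same result as a single sequential pass.
merged = {}
for part in partial_tables:
    merged.update(part)
```

Because no shard depends on another, the number of shards can be changed freely without altering `hash_shard` itself.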

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

2 in total

1. Generating unique IDs from patient identification data using security models.

Authors: Emad A Mohammed; Jonathan C Slack; Christopher T Naugler
Journal: J Pathol Inform Date: 2016-12-30

Review 2. Data security in genomics: A review of Australian privacy requirements and their relation to cryptography in data storage.

Authors: Arran Schlosberg
Journal: J Pathol Inform Date: 2016-02-05
