Xianlai Chen, Yang C Fann, Matthew McAuliffe, David Vismer, Rong Yang.
Abstract
BACKGROUND: As one of several effective approaches to personal privacy protection, a global unique identifier (GUID) is linked with hash codes that are generated from combinations of personally identifiable information (PII) by a one-way hash algorithm. The GUID server is not permitted to store any PII; it stores only GUIDs and hash codes. The quality of PII entry is therefore critical to the GUID system.
Keywords: computer security; confidentiality; data accuracy; medical record linkage; personally identifiable information; privacy; quality control; registries
Year: 2017 PMID: 28213343 PMCID: PMC5336604 DOI: 10.2196/medinform.5054
Source DB: PubMed Journal: JMIR Med Inform
Personally identifiable information (PII) fields used in global unique identifier (GUID) system.
| Type | Name | Meaning |
| Required | FN | Complete legal given (first) name at birth |
| Required | LN | Complete legal family (last) name at birth |
| Required | MN | Complete legal additional (middle) name |
| Required | SEX | Physical sex at birth (male or female) |
| Required | COB | Country of government issued or national ID |
| Required | DOB | Day of birth |
| Required | MOB | Month of birth |
| Required | YOB | Year of birth |
| Optional | GIID | Government issued or national ID |
| Optional | MFN | Mother’s complete legal given (first) name at her birth |
| Optional | MLN | Mother’s complete legal family (last) name at her birth |
| Optional | FFN | Father’s complete legal given (first) name at his birth |
| Optional | FLN | Father’s complete legal family (last) name at his birth |
| Optional | MDOB | Mother’s day of birth |
| Optional | MMOB | Mother’s month of birth |
| Optional | FDOB | Father’s day of birth |
| Optional | FMOB | Father’s month of birth |
Personally identifiable information (PII) combination patterns for hash codes.
| Hash code | Combination pattern |
| 1 | YOB + DOB + SEX + GIID(a) |
| 2 | FN + MN + LN + COB + DOB + MOB |
| 3 | FN + YOB + MFN(a) + MLN(a) + FFN(a) + FLN(a) |
| 4 | FN + LN + COB + SEX + MDOB(a) + MMOB(a) + FDOB(a) + FMOB(a) |
| 5 | FN + MN + MOB + MFN(a) + FFN(a) + MLN(a) |
(a) The field is optional.
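The combination patterns above can be illustrated with a minimal sketch of hash code generation. The source says only that a one-way hash algorithm is used, so SHA-256, the upper-casing and trimming of values, the separator character, and the use of an empty string for a missing optional field are all assumptions made for illustration, as are the names `PATTERNS` and `make_hash_code`.

```python
import hashlib

# Field combinations per hash code, copied from the pattern table.
PATTERNS = {
    1: ["YOB", "DOB", "SEX", "GIID"],
    2: ["FN", "MN", "LN", "COB", "DOB", "MOB"],
    3: ["FN", "YOB", "MFN", "MLN", "FFN", "FLN"],
    4: ["FN", "LN", "COB", "SEX", "MDOB", "MMOB", "FDOB", "FMOB"],
    5: ["FN", "MN", "MOB", "MFN", "FFN", "MLN"],
}


def make_hash_code(code_no, pii):
    """Return a hex digest for one hash code (hypothetical helper).

    Assumptions, not taken from the source: SHA-256 as the one-way hash,
    values trimmed and upper-cased, a unit-separator character between
    fields, and an empty string standing in for a missing optional field.
    """
    parts = [str(pii.get(name, "")).strip().upper() for name in PATTERNS[code_no]]
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Invented sample record; only the field names come from the source.
sample = {
    "FN": "Ana", "MN": "Maria", "LN": "Silva", "SEX": "F", "COB": "BR",
    "DOB": "07", "MOB": "03", "YOB": "1990", "GIID": "X123",
}
codes = {n: make_hash_code(n, sample) for n in PATTERNS}
```

Under these assumptions only the digests would need to leave the data-entry site, which is in keeping with the rule that the GUID server stores no PII.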
Figure 1. Components of hash code.
Thresholds of missing fields to determine type of hash code.
| Parameters | Hash code 1 | Hash code 2 | Hash code 3 | Hash code 4 | Hash code 5 |
| Lower threshold | 0 | 1 | 1 | 1 | 1 |
| Upper threshold | 1 | 2 | 3 | 3 | 3 |
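One plausible reading of these thresholds, consistent with the table of probable combinations in this paper, is that a hash code is perfect when the number of missing fields does not exceed the lower threshold and good when it exceeds the lower threshold but not the upper one. The sketch below encodes that reading. The function name and the `"unusable"` label for counts above the upper threshold are hypothetical.

```python
# (lower, upper) thresholds of missing fields, copied from the table above.
THRESHOLDS = {1: (0, 1), 2: (1, 2), 3: (1, 3), 4: (1, 3), 5: (1, 3)}


def hash_code_type(code_no, n_missing):
    """Classify a hash code by its count of missing fields.

    Assumed rule (inferred, not stated in the source):
      n_missing <= lower          -> "perfect"
      lower < n_missing <= upper  -> "good"
      n_missing > upper           -> "unusable" (hypothetical label)
    """
    if n_missing < 0:
        raise ValueError("n_missing must be non-negative")
    lower, upper = THRESHOLDS[code_no]
    if n_missing <= lower:
        return "perfect"
    if n_missing <= upper:
        return "good"
    return "unusable"
```

For example, under this reading hash code 3 with one missing parental name would still be perfect, while the same code with two or three missing parental names would be good.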
Figure 2. An example of matching among hash codes.
Figure 3. The count of probable perfect or good hash codes.
Probable personally identifiable information (PII) combinations for hash codes with different matching types.
| Index | Hash code | Combination of PII fields | Missed fields | Type of hash code |
| 1 | 1 | GIID + SEX + DOB + YOB | None | Perfect |
| 2 | 1 | (a) + SEX + DOB + YOB | GIID | Good |
| 3 | 2 | FN + LN + MN + DOB + MOB + COB | None | Perfect |
| 4 | 3 | MFN + MLN + FFN + FLN + FN + YOB | None | Perfect |
| 5 | 3 | (a) + MLN + FFN + FLN + FN + YOB | MFN | Perfect |
| 6 | 3 | MFN + (a) + FFN + FLN + FN + YOB | MLN | Perfect |
| 7 | 3 | MFN + MLN + (a) + FLN + FN + YOB | FFN | Perfect |
| 8 | 3 | MFN + MLN + FFN + (a) + FN + YOB | FLN | Perfect |
| 9 | 3 | (a) + (a) + FFN + FLN + FN + YOB | MFN, MLN | Good |
| 10 | 3 | (a) + MLN + (a) + FLN + FN + YOB | MFN, FFN | Good |
| 11 | 3 | (a) + MLN + FFN + (a) + FN + YOB | MFN, FLN | Good |
| 12 | 3 | MFN + (a) + (a) + FLN + FN + YOB | MLN, FFN | Good |
| 13 | 3 | MFN + (a) + FFN + (a) + FN + YOB | MLN, FLN | Good |
| 14 | 3 | MFN + MLN + (a) + (a) + FN + YOB | FFN, FLN | Good |
| 15 | 3 | (a) + (a) + (a) + FLN + FN + YOB | MFN, MLN, FFN | Good |
| 16 | 3 | MFN + (a) + (a) + (a) + FN + YOB | MLN, FFN, FLN | Good |
| 17 | 3 | (a) + MLN + (a) + (a) + FN + YOB | MFN, FFN, FLN | Good |
| 18 | 3 | (a) + (a) + FFN + (a) + FN + YOB | MFN, MLN, FLN | Good |
| 19 | 4 | MDOB + MMOB + FDOB + FMOB + FN + LN + SEX + COB | None | Perfect |
| 20 | 4 | (a) + MMOB + FDOB + FMOB + FN + LN + SEX + COB | MDOB | Perfect |
| 21 | 4 | MDOB + (a) + FDOB + FMOB + FN + LN + SEX + COB | MMOB | Perfect |
| 22 | 4 | MDOB + MMOB + (a) + FMOB + FN + LN + SEX + COB | FDOB | Perfect |
| 23 | 4 | MDOB + MMOB + FDOB + (a) + FN + LN + SEX + COB | FMOB | Perfect |
| 24 | 4 | (a) + (a) + FDOB + FMOB + FN + LN + SEX + COB | MDOB, MMOB | Good |
| 25 | 4 | (a) + MMOB + (a) + FMOB + FN + LN + SEX + COB | MDOB, FDOB | Good |
| 26 | 4 | (a) + MMOB + FDOB + (a) + FN + LN + SEX + COB | MDOB, FMOB | Good |
| 27 | 4 | MDOB + (a) + (a) + FMOB + FN + LN + SEX + COB | MMOB, FDOB | Good |
| 28 | 4 | MDOB + (a) + FDOB + (a) + FN + LN + SEX + COB | MMOB, FMOB | Good |
| 29 | 4 | MDOB + MMOB + (a) + (a) + FN + LN + SEX + COB | FDOB, FMOB | Good |
| 30 | 4 | (a) + (a) + (a) + FMOB + FN + LN + SEX + COB | MDOB, MMOB, FDOB | Good |
| 31 | 4 | (a) + (a) + FDOB + (a) + FN + LN + SEX + COB | MDOB, MMOB, FMOB | Good |
| 32 | 4 | (a) + MMOB + (a) + (a) + FN + LN + SEX + COB | MDOB, FDOB, FMOB | Good |
| 33 | 4 | MDOB + (a) + (a) + (a) + FN + LN + SEX + COB | MMOB, FDOB, FMOB | Good |
| 34 | 5 | FN + MN + MFN + FFN + MLN + MOB | None | Perfect |
| 35 | 5 | FN + MN + (a) + FFN + MLN + MOB | MFN | Perfect |
| 36 | 5 | FN + MN + MFN + (a) + MLN + MOB | FFN | Perfect |
| 37 | 5 | FN + MN + MFN + FFN + (a) + MOB | MLN | Perfect |
| 38 | 5 | FN + MN + MFN + (a) + (a) + MOB | FFN, MLN | Good |
| 39 | 5 | FN + MN + (a) + (a) + MLN + MOB | MFN, FFN | Good |
| 40 | 5 | FN + MN + (a) + FFN + (a) + MOB | MFN, MLN | Good |
| 41 | 5 | FN + MN + (a) + (a) + (a) + MOB | MFN, FFN, MLN | Good |
(a) An optional field that may be missing when the PII is collected.
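The 41 rows of this table can be reproduced by enumerating, for each hash code, every subset of its optional fields whose size stays within the upper threshold. The sketch below does this under the assumption that a subset is perfect when its size is at most the lower threshold and good otherwise; the names `OPTIONAL`, `THRESHOLDS`, and `enumerate_combinations` are illustrative.

```python
from collections import Counter
from itertools import combinations

# Optional fields per hash code (from the combination pattern table).
OPTIONAL = {
    1: ["GIID"],
    2: [],
    3: ["MFN", "MLN", "FFN", "FLN"],
    4: ["MDOB", "MMOB", "FDOB", "FMOB"],
    5: ["MFN", "FFN", "MLN"],
}
# (lower, upper) thresholds of missing fields (from the threshold table).
THRESHOLDS = {1: (0, 1), 2: (1, 2), 3: (1, 3), 4: (1, 3), 5: (1, 3)}


def enumerate_combinations():
    """List (hash code, missed optional fields, type) for every allowed case.

    Assumed rule: at most `lower` missing fields gives "Perfect",
    more than `lower` and at most `upper` gives "Good".
    """
    rows = []
    for code_no, optional in OPTIONAL.items():
        lower, upper = THRESHOLDS[code_no]
        for k in range(min(upper, len(optional)) + 1):
            for missed in combinations(optional, k):
                kind = "Perfect" if k <= lower else "Good"
                rows.append((code_no, missed, kind))
    return rows


rows = enumerate_combinations()
by_type = Counter(kind for _, _, kind in rows)
by_code = Counter(code_no for code_no, _, _ in rows)
```

With these inputs the enumeration yields 41 combinations in total (16 perfect and 25 good), which matches the number of rows in the table.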
Figure 4. An example of locating questionable personally identifiable information (PII) fields when hash codes are a perfect match.
Figure 5. An example of locating questionable personally identifiable information (PII) fields when hash codes are a good match.
Distribution of planted errors by personally identifiable information (PII) fields.
| PII(a) fields | N_Err | Percent (%) |
| FN | 12,937 | 6.47 |
| LN | 14,166 | 7.08 |
| MN | 10,234 | 5.12 |
| COB | 12,954 | 6.48 |
| DOB | 10,440 | 5.22 |
| MOB | 12,645 | 6.32 |
| YOB | 11,578 | 5.79 |
| SEX | 11,587 | 5.79 |
| GIID | 7980 | 3.99 |
| MFN | 12,984 | 6.49 |
| MLN | 10,504 | 5.25 |
| FFN | 10,823 | 5.41 |
| FLN | 11,656 | 5.83 |
| MDOB | 13,603 | 6.80 |
| MMOB | 11,301 | 5.65 |
| FDOB | 11,188 | 5.59 |
| FMOB | 13,420 | 6.71 |
| Total | 200,000 | 100 |
(a) PII: personally identifiable information.
Identification of error-planted subjects.
| Matching type | Recerf(a) | Recnerf(b) | Subtotal |
| Unidentified | 13,236 | 0 | 13,236 |
| Identified | 65,383 | 49,081 | 114,464 |
| Total | 78,619 | 49,081 | 127,700 |
(a) Recerf: the count of subjects with errors in required fields.
(b) Recnerf: the count of subjects with no error in required fields.
Identification of subjects by count of planted errors.
| nErr | nRec_Err | nRec_Err_Mtch | Ratio (%) |
| 1 | 74,883 | 71,796 | 95.88 |
| 2 | 37,327 | 32,104 | 86.01 |
| 3 | 12,143 | 8798 | 72.45 |
| 4 | 2792 | 1545 | 55.34 |
| 5 | 476 | 199 | 41.81 |
| 6 | 69 | 18 | 26.09 |
| 7 | 8 | 4 | 50.00 |
| 8 | 2 | 0 | 0.00 |
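The ratio column appears to be the matched count divided by the subject count, expressed as a percentage. The short check below recomputes it from the table's own numbers and also confirms that the rows add up to the 127,700 error-planted subjects, 114,464 identified subjects, and 200,000 planted errors reported in the other tables. The variable and function names are illustrative.

```python
# (nErr, nRec_Err, nRec_Err_Mtch) copied from the table above.
ROWS = [
    (1, 74883, 71796),
    (2, 37327, 32104),
    (3, 12143, 8798),
    (4, 2792, 1545),
    (5, 476, 199),
    (6, 69, 18),
    (7, 8, 4),
    (8, 2, 0),
]


def match_ratio(matched, total):
    """Percentage of subjects identified, rounded to two decimals."""
    return round(100.0 * matched / total, 2)


ratios = [match_ratio(matched, total) for _, total, matched in ROWS]
total_subjects = sum(total for _, total, _ in ROWS)
total_matched = sum(matched for _, _, matched in ROWS)
total_errors = sum(n_err * total for n_err, total, _ in ROWS)
```

The recomputed ratios fall steadily from 95.88% for one planted error to 26.09% for six; the last two rows rest on only 8 and 2 subjects.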
Identification of subjects by count of required fields with errors.
| nErr_ReqF | nRec_Err_ReqF | nRec_Err_ReqF_Mtch | Ratio (%) |
| 0 | 49,081 | 49,081 | 100.00 |
| 1 | 62,716 | 56,750 | 90.49 |
| 2 | 14,026 | 8038 | 57.31 |
| 3 | 1740 | 569 | 32.70 |
| 4 | 132 | 25 | 18.94 |
| 5 | 5 | 1 | 20.00 |
The count of analyzed questionable fields by count of errors.
| Count of planted errors in a subject | ncqf minimum | ncqf maximum | ncqf average |
| 1 | 1 | 13 | 4.27 |
| 2 | 2 | 13 | 7.39 |
| 3 | 3 | 13 | 9.42 |
| 4 | 4 | 13 | 10.86 |
| 5 | 6 | 13 | 11.67 |
| 6 | 11 | 13 | 11.83 |
| 7 | 13 | 13 | 13.00 |
The count of analyzed questionable fields by personally identifiable information (PII) fields.
| PII(a) fields with planted errors | ncqf_PII minimum | ncqf_PII maximum | ncqf_PII mean |
| FN | 13 | 13 | 13 |
| LN | 6 | 13 | 7.65 |
| MN | 2 | 13 | 5.56 |
| SEX | 6 | 12 | 7.30 |
| COB | 6 | 13 | 7.67 |
| DOB | 2 | 11 | 5.69 |
| MOB | 2 | 13 | 5.53 |
| YOB | 3 | 11 | 5.28 |
| GIID | 1 | 11 | 3.74 |
| MFN | 1 | 13 | 6.48 |
| MLN | 1 | 13 | 6.51 |
| FFN | 1 | 13 | 6.59 |
| FLN | 1 | 13 | 4.84 |
| MDOB | 1 | 13 | 6.12 |
| MMOB | 1 | 13 | 6.11 |
| FDOB | 1 | 13 | 6.09 |
| FMOB | 1 | 13 | 6.06 |
(a) PII: personally identifiable information.
The count of analyzed questionable personally identifiable information (PII) fields from subjects with only one error.
| PII(a) fields with planted errors | ncqf_1 |
| FN | 13 |
| LN | 6 |
| MN | 2 |
| SEX | 6 |
| COB | 6 |
| DOB | 2 |
| MOB | 2 |
| YOB | 3 |
| GIID | 1 |
| MFN | 1/4 |
| MLN | 1/4 |
| FFN | 1/4 |
| FLN | 1 |
| MDOB | 1/4 |
| MMOB | 1/4 |
| FDOB | 1/4 |
| FMOB | 1/4 |
(a) PII: personally identifiable information.
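The single-number entries in this table are consistent with a simple set-based reading: any field that belongs to a hash code that still matches is cleared, and whatever remains is questionable. The sketch below implements that reading. It is a simplification that assumes an error in a field changes every hash code containing it, and it does not model how the smaller value in the "1/4" entries is obtained. All function names are hypothetical.

```python
# Field combinations per hash code (from the combination pattern table).
PATTERNS = {
    1: {"YOB", "DOB", "SEX", "GIID"},
    2: {"FN", "MN", "LN", "COB", "DOB", "MOB"},
    3: {"FN", "YOB", "MFN", "MLN", "FFN", "FLN"},
    4: {"FN", "LN", "COB", "SEX", "MDOB", "MMOB", "FDOB", "FMOB"},
    5: {"FN", "MN", "MOB", "MFN", "FFN", "MLN"},
}
ALL_FIELDS = set().union(*PATTERNS.values())  # the 17 PII fields


def mismatched_codes(error_fields):
    """Hash codes that contain at least one erroneous field."""
    return {n for n, fields in PATTERNS.items() if fields & set(error_fields)}


def questionable_fields(mismatched):
    """Fields not cleared by any hash code that still matches."""
    cleared = set()
    for code_no, fields in PATTERNS.items():
        if code_no not in mismatched:
            cleared |= fields
    return ALL_FIELDS - cleared


def ncqf_single_error(field):
    """Count of questionable fields when only `field` is wrong."""
    return len(questionable_fields(mismatched_codes({field})))
```

Under this reading an error in FN leaves only hash code 1 matching, so the 13 fields outside hash code 1 are all questionable, whereas an error in GIID is isolated to that single field. For the fields listed as "1/4", the sketch returns the larger value.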
Figure 6. The application for checking questionable personally identifiable information (PII) fields.