Dongqiuye Pu, Stavros Garantziotis, Javed Mostafa.
Abstract
The Environmental Polymorphisms Registry (EPR) is a large-scale phenotype-by-genotype registry developed by the National Institute of Environmental Health Sciences to facilitate translational research. EPR preserves the link between personal identity and the collected genomic data, which creates opportunities to link EPR to phenotype-rich databases such as the Carolina Data Warehouse for Health (CDW-H), located at the University of North Carolina (UNC) hospital system. CDW-H contains clinically relevant data for patients who have been admitted to the UNC healthcare system. To validate the feasibility of linking EPR with CDW-H, the number of matching records between the two databases had to be established. To that end, combinations of subjects' demographic identifiers from both databases were converted to anonymized hash codes, which were then matched to determine the number of overlapping records. Preliminary results showed that the combination of last name, gender, date of birth, and zip code would generate over 2,700 matches between the two databases.
Year: 2013 PMID: 24303325 PMCID: PMC3814467
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Project workflow for anonymized record linkage
Data extraction format agreed by both parties
| Last Name | Date of Birth | Gender | Zip Code |
|---|---|---|---|
| No special requirement | •Formatted as mm/dd/yyyy, slashes should be retained | •Fully spelled words, i.e., male or female | •5-digit number |
Data normalization rules. All white spaces were removed before normalization.
| Last Name | Date of Birth | Gender | Zip Code |
|---|---|---|---|
| •To lower case | •Remove slashes | | |
| •Remove special characters | •Date format is mmddyyyy | •To lower case | •5-digit number |
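The normalization rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and "special characters" is assumed to mean anything other than the letters a-z, which would also drop accented letters.

```python
import re


def strip_whitespace(value):
    """Remove all white space, as the table caption specifies."""
    return re.sub(r"\s+", "", value)


def normalize_record(last_name, dob, gender, zip_code):
    """Normalize one record's identifiers (hypothetical helper).

    Inputs are expected in the agreed extraction format:
    dob as mm/dd/yyyy, gender fully spelled, zip_code as 5 digits.
    """
    last_name = strip_whitespace(last_name)
    dob = strip_whitespace(dob)
    gender = strip_whitespace(gender)
    zip_code = strip_whitespace(zip_code)

    # Last name: lower case, then remove special characters.
    # Assumption: a "special character" is anything outside a-z.
    last_name = re.sub(r"[^a-z]", "", last_name.lower())

    # Date of birth: remove slashes, so mm/dd/yyyy becomes mmddyyyy.
    dob = dob.replace("/", "")

    # Gender: lower case ("Male" -> "male").
    gender = gender.lower()

    # Zip code: already a 5-digit number, so it is passed through unchanged.
    return last_name, dob, gender, zip_code


print(normalize_record("O'Brien ", "01/02/1980", "Male", "27514"))
```

Applying the same function to both extracts before hashing matters because a hash comparison is exact: a single differing character in an identifier produces a completely different hash.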
Figure 2. Pseudocode for anonymized record matching.
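The pseudocode from Figure 2 is not reproduced in this text. As a rough sketch of the matching step described in the abstract, the Python below hashes each combination of identifiers and counts the hashes that appear in both sets. The source does not name the hash function or say exactly how matches were counted, so SHA-256, the `|` field separator, counting distinct hashes, and the names `hash_key` and `count_matches` are all assumptions made for illustration.

```python
import hashlib


def hash_key(fields):
    """Hash one combination of already-normalized identifier strings.

    Assumptions: SHA-256 as the hash function and "|" as a separator
    that keeps adjacent fields from running together.
    """
    joined = "|".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def count_matches(epr_records, cdwh_records):
    """Return (unique EPR hashes, hashes present in both sets).

    Each record is a tuple of identifier strings, for example
    (last_name, dob, zip_code, gender). Only hashes are compared,
    so neither side needs to see the other's raw identifiers.
    """
    epr_hashes = {hash_key(record) for record in epr_records}
    cdwh_hashes = {hash_key(record) for record in cdwh_records}
    return len(epr_hashes), len(epr_hashes & cdwh_hashes)


# Made-up records for illustration only.
epr = [
    ("smith", "01021980", "27514", "male"),
    ("smith", "01021980", "27514", "male"),   # duplicate, counted once
    ("jones", "03041975", "27599", "female"),
]
cdwh = [
    ("smith", "01021980", "27514", "male"),
    ("lee", "05061990", "27510", "female"),
]
print(count_matches(epr, cdwh))  # (2, 1)
```

Dropping a field from each tuple, such as gender or zip code, gives the less specific combinations. Fewer fields mean more records collapse to the same hash, which is one plausible reason the unique-record count falls and the match count rises as identifiers are removed.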
Anonymized record linking results using 4 combinations of identifiers
| Combinations | Unique Records in EPR | Matches |
|---|---|---|
| Last name + DOB + zip + gender | 15705 | 2746 |
| Last name + DOB + zip | 15705 | 2754 |
| Last name + DOB | 15691 | 4071 |
| Last name + zip | 14143 | 7081 |