PURPOSE: A set of deidentified patient data compliant with the Health Insurance Portability and Accountability Act (HIPAA) was compiled, the loss of data, measured in unique data elements (UDEs), was quantified, and the deidentified data were tested for their potential for reidentification.
METHODS: After approval by the institutional review board of an integrated health system, a limited-data set was created by querying the health system's pharmacy, administrative, and financial files for patients discharged between January 1 and December 31, 2000. Using the HIPAA "safe-harbor" method, this limited-data set was converted into a deidentified-data table for future statistical analysis, and UDEs in both data sets were identified and quantified. Unique combinations of commonly available data were also identified.
RESULTS: The limited-data set, representing 4,738 patient discharges, contained 810,456 UDEs in 322,657 records organized into four data tables (demographics, diagnoses, medication orders, and laboratory test results). The deidentified-data table, representing 4,722 discharges, contained 562,171 UDEs in 128 data-type columns in a single data table. About 31% of the data volume was lost. Much of the information lost was of the type that is of special interest to researchers (e.g., time between episodes of care, ages of >89 years).
CONCLUSION: The study suggested that deidentified patient data with a reasonable degree of protection against reidentification were less complete than may be necessary for good research.
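As a rough illustration of the figures above, the following Python sketch recomputes the reported loss from the two UDE counts and shows why safe-harbor rules remove the kinds of detail the abstract mentions. The function names and the sample dates are hypothetical and are not taken from the study; the two transformations reflect the general safe-harbor requirements (ages over 89 grouped into one category, dates reduced to the year), not the authors' actual procedure.

```python
from datetime import date

# UDE counts as reported in the abstract.
LIMITED_UDES = 810_456
DEIDENTIFIED_UDES = 562_171


def fraction_lost(before: int, after: int) -> float:
    """Share of unique data elements removed by deidentification."""
    return (before - after) / before


def safe_harbor_age(age: int) -> str:
    """Group ages above 89 into a single '90+' category (hypothetical helper)."""
    return "90+" if age > 89 else str(age)


def safe_harbor_date(d: date) -> int:
    """Keep only the year of a date; month and day are dropped (hypothetical helper)."""
    return d.year


loss = fraction_lost(LIMITED_UDES, DEIDENTIFIED_UDES)
print(f"{loss:.1%}")  # 30.6%, which the abstract rounds to about 31%

# Two made-up episodes of care 14 days apart in the limited-data set.
first_visit = date(2000, 3, 1)
second_visit = date(2000, 3, 15)
print((second_visit - first_visit).days)  # 14

# After reduction to the year, both dates are identical, so the
# interval between episodes can no longer be recovered.
print(safe_harbor_date(first_visit), safe_harbor_date(second_visit))  # 2000 2000
```

The last two lines show the mechanism behind the loss of "time between episodes of care": once only the year is retained, any interval shorter than a year collapses to zero.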
Authors: Khaled El Emam; David Buckeridge; Robyn Tamblyn; Angelica Neisa; Elizabeth Jonker; Aman Verma Journal: BMC Med Inform Decis Mak Date: 2011-06-22 Impact factor: 2.796
Authors: Louis Ehwerhemuepha; Gary Gasperino; Nathaniel Bischoff; Sharief Taraman; Anthony Chang; William Feaster Journal: BMC Med Inform Decis Mak Date: 2020-06-19 Impact factor: 2.796
Authors: Kirk D Midkiff; Elizabeth B Andrews; Alicia W Gilsenan; Dennis M Deapen; David H Harris; Maria J Schymura; Francis J Hornicek Journal: Pharmacoepidemiol Drug Saf Date: 2016-04-19 Impact factor: 2.890