James M Naessens, Sue L Visscher, Stephanie M Peterson, Kristi M Swanson, Matthew G Johnson, Parvez A Rahman, Joe Schindler, Mark Sonneborn, Donald E Fry, Michael Pine.
Abstract
OBJECTIVE: Assess algorithms for linking patients across de-identified databases without compromising confidentiality.
DATA SOURCES/STUDY SETTING: Hospital discharges from 11 Mayo Clinic hospitals during January 2008-September 2012 (assessment and validation data). Minnesota death certificates and hospital discharges from 2009 to 2012 for entire state (application data).
STUDY DESIGN: Cross-sectional assessment of sensitivity and positive predictive value (PPV) for four linking algorithms tested by identifying readmissions and posthospital mortality on the assessment data with application to statewide data.
DATA COLLECTION/EXTRACTION METHODS: De-identified claims included patient gender, birthdate, and zip code. Assessment records were matched with institutional sources containing unique identifiers and the last four digits of Social Security number (SSNL4).
PRINCIPAL FINDINGS: Gender, birthdate, and five-digit zip code identified readmissions with a sensitivity of 98.0 percent and a PPV of 97.7 percent and identified postdischarge mortality with 84.4 percent sensitivity and 98.9 percent PPV. Inclusion of SSNL4 produced nearly perfect identification of readmissions and deaths. When applied statewide, regions bordering states with unavailable hospital discharge data had lower rates.
CONCLUSION: Addition of SSNL4 to administrative data, accompanied by appropriate data use and data release policies, can enable trusted repositories to link data with nearly perfect accuracy without compromising patient confidentiality. States maintaining centralized de-identified databases should add SSNL4 to data specifications. © Health Research and Educational Trust.
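As a rough illustration of the kind of deterministic linkage and accuracy assessment described above, the sketch below links records that agree on gender, birthdate, and five-digit zip code, optionally adds SSNL4 to the key, and scores the predicted links against a gold-standard patient identifier. It is not the study's implementation: the field names (`id`, `patient_id`, `zip5`, `ssnl4`), the pairwise scoring, and the five toy records are all assumptions made for this example, and the abstract does not specify the four algorithms that were actually tested.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; "patient_id" plays the role of the institutional
# unique identifier used as the gold standard. All values are invented.
RECORDS = [
    {"id": "r1", "patient_id": "A", "gender": "F", "birthdate": "1950-01-01", "zip5": "55901", "ssnl4": "1234"},
    {"id": "r2", "patient_id": "A", "gender": "F", "birthdate": "1950-01-01", "zip5": "55901", "ssnl4": "1234"},
    {"id": "r3", "patient_id": "B", "gender": "F", "birthdate": "1950-01-01", "zip5": "55901", "ssnl4": "9876"},
    {"id": "r4", "patient_id": "C", "gender": "M", "birthdate": "1940-05-05", "zip5": "55902", "ssnl4": "1111"},
    {"id": "r5", "patient_id": "C", "gender": "M", "birthdate": "1940-05-05", "zip5": "55455", "ssnl4": "1111"},
]


def link_key(rec, use_ssnl4=False):
    """Build an exact-match linkage key from de-identified fields."""
    key = (rec["gender"], rec["birthdate"], rec["zip5"])
    if use_ssnl4:
        key += (rec["ssnl4"],)
    return key


def pairs_sharing(records, key_fn):
    """Return every unordered pair of record ids whose keys are equal."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec["id"])
    pairs = set()
    for ids in groups.values():
        pairs.update(frozenset(p) for p in combinations(ids, 2))
    return pairs


def sensitivity_ppv(predicted, truth):
    """Sensitivity = TP / true links; PPV = TP / predicted links."""
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    ppv = true_positives / len(predicted) if predicted else float("nan")
    return sensitivity, ppv


# Gold standard: pairs of records that belong to the same patient.
truth = pairs_sharing(RECORDS, lambda r: r["patient_id"])

for use_ssnl4 in (False, True):
    predicted = pairs_sharing(RECORDS, lambda r: link_key(r, use_ssnl4))
    sens, ppv = sensitivity_ppv(predicted, truth)
    print(f"use_ssnl4={use_ssnl4}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy data, two different patients share gender, birthdate, and zip code, so the three-field key produces false links that SSNL4 removes. The patient who changed zip code is still missed under both keys, because the zip code remains part of the key; a key that relaxes the zip code when SSNL4 is present would be one way to recover such links, though whether the study did so is not stated in the abstract.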
Keywords: Readmission; de-identified data; posthospital mortality; record linkage
Year: 2015
PMID: 26073819 PMCID: PMC4545335 DOI: 10.1111/1475-6773.12323
Source DB: PubMed Journal: Health Serv Res ISSN: 0017-9124 Impact factor: 3.402