Toan C Ong (1), Lindsey M Duca (2), Michael G Kahn (1), Tessa L Crume (2).
(1) Department of Pediatrics, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA.
(2) Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA.
Abstract
OBJECTIVE: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid RL method in a statewide surveillance system for congenital heart disease.
MATERIALS AND METHODS: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic RL methods and 1 probabilistic RL method, using first name, last name, social security number, date of birth, and house number, were first implemented independently and then sequentially in a hybrid approach to assess RL performance.
RESULTS: 16 480 nonunique individuals with congenital heart disease were ascertained. The deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each pair consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine yielded 6451 linked pairs and identified 18%-24% more correct linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
DISCUSSION: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any single linkage technique used independently.
CONCLUSION: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
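To make the sequential idea concrete, the following is a minimal Python sketch of a hybrid routine: an exact-match pass first, then a probabilistic pass restricted to blocks defined by the first 3 characters of last name and gender. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the deterministic rules, the scoring model, or the threshold, so the two rules in `deterministic_match`, the Fellegi-Sunter-style agreement weights in `M_U`, the `THRESHOLD` value, and the toy records are all hypothetical. Recall and F-measure are computed with their standard definitions.

```python
from collections import defaultdict
from itertools import combinations
from math import log2

FIELDS = ["first", "last", "ssn", "dob", "house"]

# Hypothetical (m, u) pairs: probability that a field agrees among true
# matches (m) and among non-matches (u). Not taken from the paper.
M_U = {
    "first": (0.90, 0.01),
    "last": (0.92, 0.01),
    "ssn": (0.95, 0.0001),
    "dob": (0.97, 0.003),
    "house": (0.80, 0.05),
}
THRESHOLD = 10.0  # hypothetical cutoff on the summed log2 weight


def deterministic_match(a, b):
    """Hypothetical exact rules: SSN + DOB, or full name + DOB + house number."""
    if a["ssn"] and a["ssn"] == b["ssn"] and a["dob"] == b["dob"]:
        return True
    return all(a[f] and a[f] == b[f] for f in ("first", "last", "dob", "house"))


def match_weight(a, b):
    """Sum agreement/disagreement weights; fields missing on either side add 0."""
    total = 0.0
    for f in FIELDS:
        if not a[f] or not b[f]:
            continue
        m, u = M_U[f]
        total += log2(m / u) if a[f] == b[f] else log2((1 - m) / (1 - u))
    return total


def block_key(rec):
    """Blocking key from the abstract: first 3 characters of last name, plus gender."""
    return (rec["last"][:3].upper(), rec["gender"])


def hybrid_link(records):
    """Return linked pairs (i, j) with i < j: deterministic pass, then probabilistic."""
    ids = sorted(records)
    linked = set()
    # Pass 1: deterministic rules over all pairs (quadratic; fine for a sketch).
    for i, j in combinations(ids, 2):
        if deterministic_match(records[i], records[j]):
            linked.add((i, j))
    # Pass 2: probabilistic scoring, only within blocks and only for unlinked pairs.
    blocks = defaultdict(list)
    for i in ids:
        blocks[block_key(records[i])].append(i)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if (i, j) not in linked and match_weight(records[i], records[j]) >= THRESHOLD:
                linked.add((i, j))
    return linked


def precision_recall_f(predicted, truth):
    """Standard precision, recall, and F-measure over sets of linked pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy records (entirely made up) keyed by record id.
records = {
    1: dict(first="ANNA", last="SMITH", ssn="123456789", dob="2001-02-03", house="12", gender="F"),
    2: dict(first="ANNA", last="SMITH-JONES", ssn="123456789", dob="2001-02-03", house="40", gender="F"),
    3: dict(first="ANNA", last="SMITH", ssn="", dob="2001-02-03", house="14", gender="F"),
    4: dict(first="BEN", last="LOPEZ", ssn="987654321", dob="1999-09-09", house="7", gender="M"),
}
truth = {(1, 2), (1, 3), (2, 3)}  # hypothetical gold-standard pairs
links = hybrid_link(records)
precision, recall, f_measure = precision_recall_f(links, truth)
```

In this toy data, records 1 and 2 are caught by the exact SSN and date-of-birth rule despite a changed last name, while records 1 and 3 (missing SSN, different house number) are only recovered by the probabilistic pass. Running the deterministic pass first means the scored pass only has to decide the pairs that exact rules could not settle, which is the intuition behind the sequential design described in the abstract.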