Tony Blakely¹, Clare Salmond
¹ Department of Public Health, Wellington School of Medicine, University of Otago, PO Box 7343, Wellington, New Zealand. tblakely@wnmeds.ac.nz
Abstract
BACKGROUND: Computerized record linkage is commonly used in cohort studies to ascertain the study outcome, so its accuracy in classifying the outcome can be described using the standard epidemiological measures of sensitivity and positive predictive value (PPV).
METHOD: We describe a 'duplicate method' for calculating the PPV of record linkage when each record can be involved in only one match (e.g. when linking population files to death files). The method does not require a validation subset of records from both files with detailed personal information (e.g. name and address), and is therefore ideal for linkage projects that use anonymous data. The duplicate method assumes that the numbers of records in one file with zero, one, two or more links to records in the other file follow the distribution predicted by combinatorial probabilities. Under this assumption, the number of false positive links, and hence the PPV, can be estimated. We demonstrate the duplicate method using output from anonymous, probabilistic record linkage of census and mortality records in New Zealand.
RESULTS: The PPV estimates conformed to the pattern expected from the underlying theory of probabilistic record linkage and were robust to sensitivity analyses. We encourage other researchers to assess the accuracy of this method further.
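The abstract does not spell out the combinatorial model, so the following Python sketch is a hedged illustration only. It assumes a simple random-allocation model that may differ from the authors' formulation: file A has N records, the linkage produced L links in total, each true link occupies a distinct file-A record, and false links land on file-A records uniformly at random. Under that assumption the expected number of file-A records carrying two or more links ("duplicates") depends on the unknown number of false links F. F can therefore be recovered by matching the observed duplicate count, and PPV = (L - F) / L. The function names `expected_duplicates` and `estimate_ppv` are hypothetical.

```python
"""Hypothetical sketch of a duplicate-based PPV estimate for record linkage.

Assumed model (not taken from the paper): file A has n_records records; the
linkage produced n_links links in total; each true link sits on a distinct
file-A record, while false links fall on file-A records uniformly at random.
"""


def expected_duplicates(n_records: int, n_links: int, n_false: float) -> float:
    """Expected number of file-A records carrying two or more links."""
    p = 1.0 / n_records                        # chance one false link hits a given record
    q0 = (1.0 - p) ** n_false                  # P(record gets no false link)
    q1 = n_false * p * (1.0 - p) ** (n_false - 1.0)  # P(exactly one false link)
    n_true = n_links - n_false
    # A record with a true link becomes a duplicate if it gets >= 1 false link;
    # a record without a true link needs >= 2 false links.
    return n_true * (1.0 - q0) + (n_records - n_true) * (1.0 - q0 - q1)


def estimate_ppv(n_records: int, n_links: int, n_duplicates: float) -> tuple[float, float]:
    """Return (ppv, estimated_false_links) by matching the observed duplicates."""
    if n_duplicates <= 0:
        return 1.0, 0.0
    lo, hi = 0.0, float(n_links)
    if expected_duplicates(n_records, n_links, hi) < n_duplicates:
        raise ValueError("more duplicates than the assumed model can explain")
    # Bisection: expected duplicates rise with the number of false links
    # for realistic inputs where n_links is much smaller than n_records.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_duplicates(n_records, n_links, mid) < n_duplicates:
            lo = mid
        else:
            hi = mid
    n_false = (lo + hi) / 2.0
    return (n_links - n_false) / n_links, n_false


if __name__ == "__main__":
    # Illustrative numbers only, not data from the New Zealand study.
    ppv, n_false = estimate_ppv(n_records=100_000, n_links=2_000, n_duplicates=3)
    print(f"PPV ~ {ppv:.3f}, false links ~ {n_false:.0f}")
```

Bisection is used here because, under the assumed model, the expected duplicate count increases with F whenever the total number of links is small relative to the file size, so a single crossing point can be located without external solvers.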