Jana Asher¹, Dean Resnick², Jennifer Brite³, Robert Brackbill³, James Cone³.
Abstract
Since its post-World War II inception, the science of record linkage has grown exponentially and is used across industrial, governmental, and academic agencies. The academic fields that rely on record linkage are diverse, ranging from history to public health to demography. In this paper, we introduce the different types of data linkage and give a historical context to their development. We then introduce the three types of underlying models for probabilistic record linkage: Fellegi-Sunter-based methods, machine learning methods, and Bayesian methods. Practical considerations, such as data standardization and privacy concerns, are then discussed. Finally, recommendations are given for organizations developing or maintaining record linkage programs, with an emphasis on organizations measuring long-term complications of disasters, such as 9/11.
Keywords: 9/11 health; data matching; disaster epidemiology; epidemiology; interagency cooperation; probabilistic record linkage; record linkage
Year: 2020 PMID: 32972036 PMCID: PMC7558187 DOI: 10.3390/ijerph17186937
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Publications related to record linkage on Google Scholar, by year of publication. Data obtained 9 August 2019 for years prior to 2019; data obtained 31 December 2019 for 2019.
Figure 2. Ranking of data combination techniques, from data-based (deterministic) to model-based (probabilistic).
A pair of records compared during a probabilistic record linkage process. In this case, the overall match weight, based on three matched fields and two non-matched fields, is 17.20.
| Field 1: First Name | Field 2: Last Name | Field 3: Date of Birth | Field 4: Address | Field 5: Gender |
|---|---|---|---|---|
| Jana | Asher | 10/17/1970 | 603 Brook Court | F |
| Jane | Asher | 10/17/1970 | 1111 Jackson Ave | F |
| $\log_2\frac{1-m_1}{1-u_1}$ | $\log_2\frac{m_2}{u_2}$ | $\log_2\frac{m_3}{u_3}$ | $\log_2\frac{1-m_4}{1-u_4}$ | $\log_2\frac{m_5}{u_5}$ |
$m_i$ is the probability that field $i$ agrees given that the two records represent the same entity; $u_i$ is the probability that field $i$ agrees given that they do not represent the same entity.
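The weights in the table follow the usual Fellegi-Sunter construction: an agreeing field contributes $\log_2(m_i/u_i)$, a disagreeing field contributes $\log_2\big((1-m_i)/(1-u_i)\big)$, and the overall match weight is their sum, $W = \sum_{i \in A} \log_2\frac{m_i}{u_i} + \sum_{i \notin A} \log_2\frac{1-m_i}{1-u_i}$, where $A$ is the set of agreeing fields (summing in this way assumes the fields are conditionally independent). The Python below is a minimal sketch of that computation for the two records shown above. The `m`/`u` values in `PARAMS`, the field keys, and the helper names are hypothetical illustrations, not the values or code behind the paper's 17.20, and agreement is tested by exact string equality only, with no standardization or partial-agreement levels.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | records refer to the same entity)
    u: float  # P(field agrees | records refer to different entities)


def field_weight(agrees: bool, p: FieldParams) -> float:
    """Fellegi-Sunter log2 weight for a single field comparison."""
    if agrees:
        return math.log2(p.m / p.u)  # positive when m > u
    return math.log2((1 - p.m) / (1 - p.u))  # negative when m > u


def agreement_pattern(rec_a: dict, rec_b: dict, fields: list) -> list:
    """Comparison vector: True where the field values are exactly equal."""
    return [rec_a[f] == rec_b[f] for f in fields]


def match_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Sum per-field weights (assumes conditional independence of fields)."""
    fields = list(params)
    pattern = agreement_pattern(rec_a, rec_b, fields)
    return sum(field_weight(a, params[f]) for f, a in zip(fields, pattern))


# The two records from the table above.
RECORD_A = {"first_name": "Jana", "last_name": "Asher", "dob": "10/17/1970",
            "address": "603 Brook Court", "gender": "F"}
RECORD_B = {"first_name": "Jane", "last_name": "Asher", "dob": "10/17/1970",
            "address": "1111 Jackson Ave", "gender": "F"}

# HYPOTHETICAL m/u probabilities, chosen only for illustration.
PARAMS = {
    "first_name": FieldParams(m=0.9, u=0.01),
    "last_name": FieldParams(m=0.9, u=0.01),
    "dob": FieldParams(m=0.95, u=0.001),
    "address": FieldParams(m=0.8, u=0.05),
    "gender": FieldParams(m=0.98, u=0.5),
}

if __name__ == "__main__":
    print(agreement_pattern(RECORD_A, RECORD_B, list(PARAMS)))
    print(round(match_weight(RECORD_A, RECORD_B, PARAMS), 2))
```

The printed total will not equal 17.20, because the paper's field-level `m` and `u` values are not recoverable here; with real estimates (for example, from an EM fit), only `PARAMS` would change.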