A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage.

Huiping Xu, Xiaochun Li, Shaun Grannis.

Abstract

The widely used Fellegi-Sunter model for probabilistic record linkage does not use the information contained in field values and therefore classifies match status identically whether records agree on rare or common values. Since agreement on rare values is less likely to occur by chance than agreement on common values, records agreeing on rare values are more likely to be matches. Existing frequency-based methods typically rely on knowledge of the error probabilities associated with field values and of the frequencies of agreed field values among matches, often derived from prior studies or training data. When such information is unavailable, these methods are difficult to apply. In this paper, we propose a simple two-step procedure for frequency-based matching within the Fellegi-Sunter framework to overcome these challenges. Matching weights are adjusted based on the frequency distributions of the agreed field values among matches and non-matches, which are estimated by the Fellegi-Sunter model without relying on prior studies or training data. In a real-world application and a simulation study, our method performs comparably to or better than the unadjusted method. Furthermore, frequency-based matching improves matching accuracy most when the matching fields discriminate poorly, with the benefit diminishing as their discriminating power increases.
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
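
To illustrate the idea of frequency-adjusted weights, here is a minimal sketch in Python. It is not the authors' two-step procedure: the abstract does not give their formulas, and in their method the probabilities and frequency distributions are estimated by the Fellegi-Sunter model itself. The sketch assumes the match and non-match agreement probabilities `m` and `u` are already known, that the values on which true matches agree follow the observed value frequencies, and that non-matches agree on a value only by chance (proportional to the squared frequency). The function names and the surname data are hypothetical.

```python
import math
from collections import Counter


def agreement_weight(m, u):
    """Standard Fellegi-Sunter agreement weight for one field: log2(m / u)."""
    return math.log2(m / u)


def frequency_adjusted_weights(values, m, u):
    """Value-specific agreement weights (illustrative assumptions only).

    The weight for agreeing on value v is
        log2(m / u) + log2(f_match(v) / f_nonmatch(v)),
    where f_match and f_nonmatch are the distributions of the agreed
    value among matches and among non-matches, respectively.
    """
    counts = Counter(values)
    n = len(values)
    p = {v: c / n for v, c in counts.items()}
    # Probability that two unrelated records agree on some value by chance.
    sum_sq = sum(q * q for q in p.values())
    base = agreement_weight(m, u)
    weights = {}
    for v, q in p.items():
        f_match = q                 # assumption: matches follow value frequency
        f_nonmatch = q * q / sum_sq  # assumption: chance agreement only
        weights[v] = base + math.log2(f_match / f_nonmatch)
    return weights


# Hypothetical surname field: one common, one moderate, one rare value.
surnames = ["Smith"] * 6 + ["Jones"] * 3 + ["Zabinski"]
base = agreement_weight(0.9, 0.1)
w = frequency_adjusted_weights(surnames, m=0.9, u=0.1)
for name, weight in sorted(w.items(), key=lambda kv: kv[1]):
    print(f"{name}: {weight:.2f} (unadjusted: {base:.2f})")
```

Under these assumptions, agreement on the rare surname receives a larger weight than the unadjusted value, and agreement on the most common surname receives a smaller one, which is the behavior the abstract describes.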

Keywords:  Fellegi–Sunter model; frequency-based matching; latent class analysis; probabilistic matching; record linkage

Year:  2021        PMID: 35909667      PMCID: PMC9336505          DOI: 10.1080/02664763.2021.1922615

Source DB:  PubMed          Journal:  J Appl Stat        ISSN: 0266-4763            Impact factor:   1.416


