When to conduct probabilistic linkage vs. deterministic linkage? A simulation study.

Ying Zhu¹, Yutaka Matsuyama², Yasuo Ohashi³, Soko Setoguchi⁴.

Abstract

INTRODUCTION: When unique identifiers are unavailable, successful record linkage depends greatly on data quality and the types of variables available. While probabilistic linkage theoretically captures more true matches than deterministic linkage by allowing for imperfection in identifiers, studies have shown inconclusive results, likely due to variations in data quality, implementation of the linkage methodology, and validation method. This simulation study aimed to identify the data characteristics that affect the performance of probabilistic vs. deterministic linkage.
METHODS: We created ninety-six scenarios that represent real-life situations using non-unique identifiers. We systematically varied discriminative power, rates of missing data and error, and file size to increase the range of linkage patterns and difficulties. We compared the performance of the linkage methods using standard validity measures and computation time.
RESULTS: Across scenarios, deterministic linkage showed an advantage in positive predictive value (PPV), while probabilistic linkage showed an advantage in sensitivity. Probabilistic linkage uniformly outperformed deterministic linkage, generating linkages with a better trade-off between sensitivity and PPV regardless of data quality. However, when the rate of missing data and error was low, deterministic linkage did not perform significantly worse. The implementation of deterministic linkage in SAS took less than 1 min, and probabilistic linkage took 2 min to 2 h depending on file size.
DISCUSSION: Our simulation study demonstrated that the intrinsic rate of missing data and error in the linkage variables was key to choosing between linkage methods. In general, probabilistic linkage was the better choice, but for exceptionally good quality data (<5% error), deterministic linkage was the more resource-efficient choice.
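
The contrast between the two linkage approaches, and the sensitivity/PPV measures used to compare them, can be illustrated with a minimal Python sketch. This is not the authors' SAS implementation. The field names, the m- and u-probabilities, the threshold of 4.0, and the toy records are all hypothetical values chosen for illustration. The deterministic rule assumed here requires exact agreement on every field, and the probabilistic rule uses Fellegi-Sunter style log2 agreement weights, with missing fields contributing nothing.

```python
from math import log2

# Hypothetical per-field probabilities (assumed, not taken from the study):
# m = P(field agrees | true match), u = P(field agrees | non-match).
M_U = {"surname": (0.95, 0.01), "birth_year": (0.98, 0.05), "zip": (0.90, 0.02)}


def deterministic_match(a, b):
    """Link only if every field is present in both records and agrees exactly."""
    return all(a[f] is not None and a[f] == b[f] for f in M_U)


def match_weight(a, b):
    """Sum of Fellegi-Sunter style log2 weights; missing fields add 0."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] is None or b[field] is None:
            continue  # treat a missing value as uninformative
        if a[field] == b[field]:
            total += log2(m / u)              # agreement weight (positive)
        else:
            total += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def probabilistic_match(a, b, threshold=4.0):
    """Link if the total weight reaches an (assumed) threshold."""
    return match_weight(a, b) >= threshold


def sensitivity_ppv(predicted, truth):
    """Sensitivity = TP / true matches; PPV = TP / predicted links."""
    tp = len(predicted & truth)
    sens = tp / len(truth) if truth else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sens, ppv


# Toy files: a2 has a missing zip, b3 has a surname typo,
# and b4 is a different person who resembles a3.
file_a = {
    "a1": {"surname": "sato", "birth_year": 1950, "zip": "10001"},
    "a2": {"surname": "tanaka", "birth_year": 1962, "zip": None},
    "a3": {"surname": "suzuki", "birth_year": 1971, "zip": "30303"},
}
file_b = {
    "b1": {"surname": "sato", "birth_year": 1950, "zip": "10001"},
    "b2": {"surname": "tanaka", "birth_year": 1962, "zip": "20002"},
    "b3": {"surname": "suzuky", "birth_year": 1971, "zip": "30303"},
    "b4": {"surname": "suzuki", "birth_year": 1971, "zip": "30304"},
}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

pairs = [(i, j) for i in file_a for j in file_b]
det_links = {(i, j) for i, j in pairs if deterministic_match(file_a[i], file_b[j])}
prob_links = {(i, j) for i, j in pairs if probabilistic_match(file_a[i], file_b[j])}

det_sens, det_ppv = sensitivity_ppv(det_links, truth)
prob_sens, prob_ppv = sensitivity_ppv(prob_links, truth)
print(f"deterministic: sensitivity={det_sens:.2f}, PPV={det_ppv:.2f}")
print(f"probabilistic: sensitivity={prob_sens:.2f}, PPV={prob_ppv:.2f}")
```

On this constructed toy data, the exact-match rule links only the one clean pair (high PPV, low sensitivity), while the weighted rule recovers the pairs with a missing value and a typo at the cost of one false link to the look-alike record. This mirrors the direction of the trade-off reported in the abstract, but the numbers themselves carry no evidential weight.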

Keywords: Comparative validity; Deterministic linkage; Probabilistic linkage; Record linkage; Simulation study

Year: 2015 PMID: 26004791 DOI: 10.1016/j.jbi.2015.05.012

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

Cited

21 in total

1. Creating a Real-World Linked Research Platform for Analyzing the Urgent and Emergency Care System.

Authors: Suzanne Mason; Tony Stone; Richard Jacques; Jennifer Lewis; Rebecca Simpson; Maxine Kuczawski; Matthew Franklin
Journal: Med Decis Making Date: 2022-05-14 Impact factor: 2.749

2. Using Security Questions to Link Participants in Longitudinal Data Collection.

Authors: Shu Xu; Anthea Chan; Michael F Lorber; Justin P Chase
Journal: Prev Sci Date: 2020-02

3. Pulmonary embolism and mortality following total ankle replacement: a data linkage study using the NJR data set.

Authors: Razi Zaidi; Alexander MacGregor; Suzie Cro; Andy Goldberg
Journal: BMJ Open Date: 2016-06-21 Impact factor: 2.692

4. Data quality and 30-day survival for out-of-hospital cardiac arrest in the UK out-of-hospital cardiac arrest registry: a data linkage study.

Authors: Sangeerthana Rajagopal; Scott J Booth; Terry P Brown; Chen Ji; Claire Hawkes; A Niroshan Siriwardena; Kim Kirby; Sarah Black; Robert Spaight; Imogen Gunson; Samantha J Brace-McDonnell; Gavin D Perkins
Journal: BMJ Open Date: 2017-11-20 Impact factor: 2.692

5. GUILD: GUidance for Information about Linking Data sets.

Authors: Ruth Gilbert; Rosemary Lafferty; Gareth Hagger-Johnson; Katie Harron; Li-Chun Zhang; Peter Smith; Chris Dibben; Harvey Goldstein
Journal: J Public Health (Oxf) Date: 2018-03-01 Impact factor: 2.341

6. Examining the quality of record linkage process using nationwide Brazilian administrative databases to build a large birth cohort.

Authors: Daniela Almeida; David Gorender; Maria Yury Ichihara; Samila Sena; Luan Menezes; George C G Barbosa; Rosimeire L Fiaccone; Enny S Paixão; Robespierre Pita; Mauricio L Barreto
Journal: BMC Med Inform Decis Mak Date: 2020-07-25 Impact factor: 2.796

7. Privacy-Preserving Record Linkage of Deidentified Records Within a Public Health Surveillance System: Evaluation Study.

Authors: Long Nguyen; Mark Stoové; Douglas Boyle; Denton Callander; Hamish McManus; Jason Asselin; Rebecca Guy; Basil Donovan; Margaret Hellard; Carol El-Hayek
Journal: J Med Internet Res Date: 2020-06-24 Impact factor: 5.428

8. Probabilistic linkage to enhance deterministic algorithms and reduce data linkage errors in hospital administrative data.

Authors: Gareth Hagger-Johnson; Katie Harron; Harvey Goldstein; Robert Aldridge; Ruth Gilbert
Journal: J Innov Health Inform Date: 2017-06-30

9. Comparing record linkage software programs and algorithms using real-world data.

Authors: Alan F Karr; Matthew T Taylor; Suzanne L West; Soko Setoguchi; Tzuyung D Kou; Tobias Gerhard; Daniel B Horton
Journal: PLoS One Date: 2019-09-24 Impact factor: 3.240

10. Quality measures for total ankle replacement, 30-day readmission and reoperation rates within 1 year of surgery: a data linkage study using the NJR data set.

Authors: Razi Zaidi; Alexander J Macgregor; Andy Goldberg
Journal: BMJ Open Date: 2016-05-23 Impact factor: 2.692