
Improving record linkage performance in the presence of missing linkage data.

Toan C Ong, Michael V Mannino, Lisa M Schilling, Michael G Kahn.

Abstract

INTRODUCTION: Existing record linkage methods do not handle missing values in linkage fields efficiently or effectively. The objective of this study is to investigate three novel methods for improving the accuracy and efficiency of record linkage when linkage fields have missing values.
METHODS: By extending the Fellegi-Sunter scoring implementations available in the open-source Fine-grained Record Linkage (FRIL) software system, we developed three novel methods to solve the missing data problem in record linkage: Weight Redistribution, Distance Imputation, and Linkage Expansion. Weight Redistribution removes fields with missing data from the set of quasi-identifiers and redistributes the weight of each missing field across the remaining available linkage fields in proportion to their relative weights. Distance Imputation imputes the distance between the missing data fields rather than imputing the missing data value itself. Linkage Expansion adds fields not previously used for linkage to the linkage field set to compensate for the missing information in a linkage field. We tested the linkage methods using simulated data sets with varying field value corruption rates.
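As an illustration only, the following Python sketch shows one way Weight Redistribution and Distance Imputation could be scored for a single record pair. It is not the FRIL implementation and does not reproduce the authors' code. The field names, the weights, the exact-match similarity function, and the imputed distance of 0.5 are all assumptions made for this example.

```python
from typing import Callable, Dict, Optional, Set

Record = Dict[str, Optional[str]]


def exact_similarity(a: str, b: str) -> float:
    """Hypothetical comparator: 1.0 on a case-insensitive match, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def redistribute_weights(weights: Dict[str, float], missing: Set[str]) -> Dict[str, float]:
    """Drop missing fields and rescale the remaining weights proportionally,
    so that the adjusted weights keep the original total."""
    available = {f: w for f, w in weights.items() if f not in missing}
    total_available = sum(available.values())
    if total_available == 0:
        return {}
    total = sum(weights.values())
    return {f: w * total / total_available for f, w in available.items()}


def score_weight_redistribution(
    rec_a: Record,
    rec_b: Record,
    weights: Dict[str, float],
    sim: Callable[[str, str], float] = exact_similarity,
) -> float:
    """Score a pair using only fields present in both records."""
    missing = {f for f in weights if not rec_a.get(f) or not rec_b.get(f)}
    adjusted = redistribute_weights(weights, missing)
    return sum(w * sim(rec_a[f], rec_b[f]) for f, w in adjusted.items())


def score_distance_imputation(
    rec_a: Record,
    rec_b: Record,
    weights: Dict[str, float],
    imputed_distance: float = 0.5,  # assumed value, not taken from the paper
    sim: Callable[[str, str], float] = exact_similarity,
) -> float:
    """Score a pair, substituting a fixed distance when a field is missing."""
    score = 0.0
    for field, weight in weights.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            # Similarity is taken as 1 - distance for the missing comparison.
            score += weight * (1.0 - imputed_distance)
        else:
            score += weight * sim(a, b)
    return score


# Hypothetical linkage fields and weights (summing to 100).
weights = {"first_name": 30.0, "last_name": 40.0, "dob": 30.0}
rec_a = {"first_name": "Ann", "last_name": "Lee", "dob": None}
rec_b = {"first_name": "ann", "last_name": "Lee", "dob": "1980-01-01"}

wr_score = score_weight_redistribution(rec_a, rec_b, weights)
di_score = score_distance_imputation(rec_a, rec_b, weights)
```

In this sketch the missing date of birth does not lower the Weight Redistribution score, because its weight moves to the two name fields. Under Distance Imputation the missing field contributes half of its weight. Either score would then be compared against an acceptance threshold.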
RESULTS: The methods developed had sensitivity ranging from 0.895 to 0.992 and positive predictive values (PPV) ranging from 0.865 to 1 in data sets with low corruption rates. Increased corruption rates led to decreased sensitivity for all methods.
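For readers unfamiliar with the two metrics, the short Python sketch below computes sensitivity and PPV from a set of predicted links and a set of true links, using the standard definitions TP / (TP + FN) and TP / (TP + FP). The record identifiers and the function name `linkage_metrics` are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def linkage_metrics(predicted: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for predicted links against true links."""
    tp = len(predicted & truth)   # true matches that were linked
    fn = len(truth - predicted)   # true matches that were missed
    fp = len(predicted - truth)   # links that are not true matches
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Hypothetical example: four true matches, three found, one false link.
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b9")}
sensitivity, ppv = linkage_metrics(predicted, truth)
```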
CONCLUSIONS: These new record linkage algorithms show promise in terms of accuracy and efficiency and may be valuable for combining large data sets at the patient level to support biomedical and clinical research.
Copyright © 2014 Elsevier Inc. All rights reserved.

Keywords:  Comparative effectiveness research; Data quality; Missing data; Quasi-identifiers; Record linkage

Year:  2014        PMID: 24524889     DOI: 10.1016/j.jbi.2014.01.016

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  6 in total

1.  Data quality assessment framework to assess electronic medical record data for use in research.

Authors:  Andrew P Reimer; Alex Milinovich; Elizabeth A Madigan
Journal:  Int J Med Inform       Date:  2016-03-24       Impact factor: 4.046

2.  A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage.

Authors:  Huiping Xu; Xiaochun Li; Shaun Grannis
Journal:  J Appl Stat       Date:  2021-05-04       Impact factor: 1.416

3.  A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology.

Authors:  Toan C Ong; Lindsey M Duca; Michael G Kahn; Tessa L Crume
Journal:  J Am Med Inform Assoc       Date:  2020-04-01       Impact factor: 4.497

4.  Estimating parameters for probabilistic linkage of privacy-preserved datasets.

Authors:  Adrian P Brown; Sean M Randall; Anna M Ferrante; James B Semmens; James H Boyd
Journal:  BMC Med Res Methodol       Date:  2017-07-10       Impact factor: 4.615

5.  CIDACS-RL: a novel indexing search and scoring-based record linkage system for huge datasets with high accuracy and scalability.

Authors:  George C G Barbosa; M Sanni Ali; Bruno Araujo; Sandra Reis; Samila Sena; Maria Y T Ichihara; Julia Pescarini; Rosemeire L Fiaccone; Leila D Amorim; Robespierre Pita; Marcos E Barreto; Liam Smeeth; Mauricio L Barreto
Journal:  BMC Med Inform Decis Mak       Date:  2020-11-09       Impact factor: 2.796

6.  An Introduction to Probabilistic Record Linkage with a Focus on Linkage Processing for WTC Registries. (Review)

Authors:  Jana Asher; Dean Resnick; Jennifer Brite; Robert Brackbill; James Cone
Journal:  Int J Environ Res Public Health       Date:  2020-09-22       Impact factor: 3.390

