Missing values in deduplication of electronic patient data.

M Sariyar, A Borg, K Pommerening.

Abstract

INTRODUCTION: Systematic approaches to dealing with missing values in record linkage are still lacking. This article compares the ad-hoc treatment of unknown comparison values as 'unequal' with other, more sophisticated approaches. The methods were evaluated empirically on real-world data and on simulated data derived from them.
MATERIAL AND METHODS: Cancer registry data and artificial data with an increased number of missing values in a relevant variable are used for the empirical comparisons. Classification and regression trees serve as the classification method. On the resulting binary comparison patterns, the following strategies for dealing with missingness are considered: imputation with unique values, sample-based imputation, reduced-model classification and complete-case induction. These approaches are evaluated according to the amount of training data needed for induction and the F-scores achieved.
RESULTS: The evaluations reveal that unique value imputation leads to the best results. Imputation with zero is preferred to imputation with 0.5, although the latter shows the highest median F-scores: imputation with zero needs considerably less training data, shows only slightly worse results, and simplifies the computation by maintaining the binary structure of the data.
CONCLUSIONS: The results support the ad-hoc solution for missing values 'replace NA by the value of inequality'. This conclusion is based on a limited amount of data and on a specific deduplication method. Nevertheless, the authors are confident that their results should be confirmed by other empirical analyses and applications.
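
As an illustration of the unique value imputation discussed above, the following minimal Python sketch replaces missing entries in a comparison pattern with a fixed value (0 for 'unequal', or 0.5) and computes an F-score for a set of match decisions. It is not the authors' implementation: the function names `impute_unique` and `f_score`, the use of `None` to represent NA, and the choice of the balanced F1 variant are all assumptions made for this example, and the classification tree step is omitted.

```python
from typing import List, Optional, Sequence

# A comparison pattern for one record pair:
# 1 = attribute values agree, 0 = they disagree, None = comparison unknown (NA).
Pattern = Sequence[Optional[int]]


def impute_unique(pattern: Pattern, fill: float = 0.0) -> List[float]:
    """Replace every missing comparison value with one fixed value.

    fill=0.0 treats NA as 'unequal' and keeps the pattern binary;
    fill=0.5 places NA halfway between 'unequal' and 'equal'.
    (Hypothetical helper, named for this sketch only.)
    """
    return [fill if value is None else float(value) for value in pattern]


def f_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Balanced F-score (F1) of predicted duplicate decisions.

    Assumption: the abstract does not state which F-score variant was used.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pattern = [1, None, 1, 0]
    print(impute_unique(pattern, fill=0.0))  # NA treated as 'unequal'
    print(impute_unique(pattern, fill=0.5))  # NA treated as intermediate
    print(f_score([True, True, False, False], [True, False, True, False]))
```

With `fill=0.0` the imputed pattern stays binary, which is the property the results credit with simplifying the computation; with `fill=0.5` a third value enters the data.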

Year:  2011        PMID: 22003173      PMCID: PMC3392851          DOI: 10.1136/amiajnl-2011-000461

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


