Missing values in deduplication of electronic patient data.

Abstract

INTRODUCTION: Systematic approaches to dealing with missing values in record linkage are still lacking. This article compares the ad-hoc treatment of unknown comparison values as 'unequal' with other, more sophisticated approaches. The methods were evaluated empirically on real-world data and on simulated data derived from them.
MATERIAL AND METHODS: Cancer registry data and artificial data with increased numbers of missing values in a relevant variable were used for the empirical comparisons. Classification and regression trees (CART) served as the classification method. On the resulting binary comparison patterns, the following strategies for dealing with missingness were considered: imputation with unique values, sample-based imputation, reduced-model classification and complete-case induction. These approaches were evaluated by the amount of training data needed for induction and by the F-scores achieved.
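The strategies listed above all operate on binary comparison patterns. The following is a minimal Python sketch, assuming a pattern stores 1 for agreement, 0 for disagreement and None where at least one of the two compared values is missing. The function names, the field names in the usage example and the column-wise sampling scheme are illustrative assumptions, not the authors' implementation; training the CART classifier itself is left out.

```python
import random
from typing import Optional, Sequence

# One entry per compared field: 1.0 = agree, 0.0 = disagree, None = missing.
Pattern = list[Optional[float]]


def compare_records(a: dict, b: dict, fields: Sequence[str]) -> Pattern:
    """Build a binary comparison pattern; a missing value on either side yields None."""
    pattern: Pattern = []
    for f in fields:
        va, vb = a.get(f), b.get(f)
        if va is None or vb is None:
            pattern.append(None)
        else:
            pattern.append(1.0 if va == vb else 0.0)
    return pattern


def impute_unique(patterns: list[Pattern], value: float = 0.0) -> list[Pattern]:
    """Unique value imputation: replace every None with one fixed value.

    value=0.0 treats unknown comparisons as 'unequal' and keeps the data binary;
    value=0.5 introduces a third level.
    """
    return [[value if x is None else x for x in p] for p in patterns]


def impute_sample(patterns: list[Pattern], rng: random.Random) -> list[Pattern]:
    """Sample-based imputation (assumed variant): draw each missing entry from the
    observed values of the same column. Raises IndexError if a column has no
    observed values at all."""
    n = len(patterns[0])
    observed = [[p[j] for p in patterns if p[j] is not None] for j in range(n)]
    return [[rng.choice(observed[j]) if x is None else x for j, x in enumerate(p)]
            for p in patterns]


def complete_cases(patterns: list[Pattern], labels: Sequence[int]):
    """Complete-case induction: keep only training pairs without any missing entry."""
    return [(p, y) for p, y in zip(patterns, labels) if None not in p]


def observed_columns(pattern: Pattern) -> tuple[int, ...]:
    """Reduced-model classification needs the observed columns of a pattern:
    a separate classifier would be trained on only these columns."""
    return tuple(j for j, x in enumerate(pattern) if x is not None)


if __name__ == "__main__":
    # Hypothetical field names and records, for illustration only.
    fields = ["surname", "first_name", "birth_date"]
    r1 = {"surname": "Meyer", "first_name": "Anna", "birth_date": None}
    r2 = {"surname": "Meyer", "first_name": "Anne", "birth_date": "1950-01-01"}
    p = compare_records(r1, r2, fields)
    print(p)                        # [1.0, 0.0, None]
    print(impute_unique([p]))       # [[1.0, 0.0, 0.0]]
    print(impute_unique([p], 0.5))  # [[1.0, 0.0, 0.5]]
```

Imputing 0 corresponds to the ad-hoc rule of reading an unknown comparison as 'unequal'; it is the only variant here that leaves every pattern strictly binary.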
RESULTS: The evaluations reveal that unique value imputation leads to the best results. Imputation with zero is preferred to imputation with 0.5, even though the latter shows the highest median F-scores: imputation with zero needs considerably less training data, yields only slightly worse results, and simplifies computation by preserving the binary structure of the data.
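The comparison above rests on F-scores and their medians. A minimal sketch of that evaluation, assuming the F-score is the balanced F1 measure computed for the 'match' class (label 1) and that each strategy yields one F-score per evaluation run; `f_score` and `median_f_score` are hypothetical helper names.

```python
from statistics import median
from typing import Iterable, Sequence


def f_score(true_labels: Sequence[int], predicted: Sequence[int]) -> float:
    """Balanced F1 for the 'match' class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(true_labels, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true_labels, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true_labels, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true matches found: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_f_score(runs: Iterable[tuple[Sequence[int], Sequence[int]]]) -> float:
    """Median F-score over repeated runs, each given as (true_labels, predicted)."""
    return median(f_score(t, p) for t, p in runs)


if __name__ == "__main__":
    # tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
    print(f_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Comparing medians rather than single runs makes the ranking of strategies less sensitive to one unusually good or bad training sample.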
CONCLUSIONS: The results support the ad-hoc rule for missing values, 'replace NA by the value of inequality'. This conclusion rests on a limited amount of data and on one specific deduplication method. Nevertheless, the authors are confident that their results should be confirmed by further empirical analyses and applications.

Year: 2011 PMID: 22003173 PMCID: PMC3392851 DOI: 10.1136/amiajnl-2011-000461

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
