The impact of a growing minority population on identification of duplicate records in an enterprise data warehouse.

Scott L Duvall, Alison M Fraser, Richard A Kerber, Geraldine P Mineau, Alun Thomas.

Abstract

Patient medical records are often fragmented across disparate healthcare databases, potentially resulting in duplicate records that may be detrimental to health care services. These duplicate records can be found through a process called record linkage. This paper describes a set of duplicate records in a medical data warehouse found by linking to an external resource containing family history and vital records. Our objective was to investigate the impact that database characteristics and linkage methods have on identifying duplicate records using an external resource. Frequency counts were made for demographic field values and compared between the set of duplicate records, the data warehouse, and the external resource. Considerations were identified for understanding how records labeled as duplicates relate to dataset characteristics and linkage methods. Several noticeable patterns were identified where frequency counts between sets deviated from what was expected, including how the growth of a minority population affected which records were identified as duplicates. Record linkage is a complex process in which results can be affected by subtleties in data characteristics, changes in data trends, and reliance on external data sources. These factors should be taken into account to ensure that any anomalies in results describe real effects and are not artifacts caused by datasets or linkage methods. This paper describes how frequency count analysis can be an effective way to detect and resolve anomalies in linkage results and how external resources that provide additional contextual information can prove useful in discovering duplicate records.
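
The frequency count comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `surname_group` field, the toy records, and the ratio threshold of 2.0 are all assumptions made for illustration. The sketch computes the share of each field value in two record sets and flags values whose share in the duplicate set differs markedly from their share in the reference set.

```python
from collections import Counter


def value_proportions(records, field):
    """Return {value: share of records carrying that value} for one field."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def flag_deviations(subset, reference, field, ratio_threshold=2.0):
    """Flag values over- or under-represented in `subset` relative to `reference`.

    A value is flagged when its share in `subset` is at least
    `ratio_threshold` times, or at most 1 / `ratio_threshold` times,
    its share in `reference`. The threshold is an arbitrary assumption.
    """
    subset_shares = value_proportions(subset, field)
    reference_shares = value_proportions(reference, field)
    flagged = {}
    for value, reference_share in reference_shares.items():
        ratio = subset_shares.get(value, 0.0) / reference_share
        if ratio >= ratio_threshold or ratio <= 1 / ratio_threshold:
            flagged[value] = round(ratio, 2)
    return flagged


# Hypothetical toy data, not taken from the paper.
warehouse = [{"surname_group": "A"}] * 6 + [{"surname_group": "B"}] * 4
duplicates = [{"surname_group": "A"}] * 1 + [{"surname_group": "B"}] * 4

flagged = flag_deviations(duplicates, warehouse, "surname_group")
print(flagged)
```

In this toy case both groups are flagged: group A is under-represented among the duplicates and group B is over-represented. A flag of this kind only marks a value for review; deciding whether it reflects a real effect or an artifact of the dataset or linkage method still requires the contextual analysis the abstract describes.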

Year:  2010        PMID: 20841858

Source DB:  PubMed          Journal:  Stud Health Technol Inform        ISSN: 0926-9630


  7 in total

1.  A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation.

Authors:  Erel Joffe; Michael J Byrne; Phillip Reeder; Jorge R Herskovic; Craig W Johnson; Allison B McCoy; Dean F Sittig; Elmer V Bernstam
Journal:  J Am Med Inform Assoc       Date:  2013-05-23       Impact factor: 4.497

2.  Evaluation of record linkage between a large healthcare provider and the Utah Population Database.

Authors:  Scott L DuVall; Alison M Fraser; Kerry Rowe; Alun Thomas; Geraldine P Mineau
Journal:  J Am Med Inform Assoc       Date:  2011-09-16       Impact factor: 4.497

3.  Mining electronic health records: an additional perspective.

Authors:  John F Hurdle; Ken R Smith; Geraldine P Mineau
Journal:  Nat Rev Genet       Date:  2013-01       Impact factor: 53.242

4.  Identifying clinical/translational research cohorts: ascertainment via querying an integrated multi-source database.

Authors:  John F Hurdle; Stephen C Haroldsen; Andrew Hammer; Cindy Spigle; Alison M Fraser; Geraldine P Mineau; Samir J Courdy
Journal:  J Am Med Inform Assoc       Date:  2012-10-11       Impact factor: 4.497

5.  Optimized dual threshold entity resolution for electronic health record databases--training set size and active learning.

Authors:  Erel Joffe; Michael J Byrne; Phillip Reeder; Jorge R Herskovic; Craig W Johnson; Allison B McCoy; Elmer V Bernstam
Journal:  AMIA Annu Symp Proc       Date:  2013-11-16

6.  Duplicate patient records--implication for missed laboratory results.

Authors:  Erel Joffe; Charles F Bearden; Michael J Byrne; Elmer V Bernstam
Journal:  AMIA Annu Symp Proc       Date:  2012-11-03

7.  Clinical use of an enterprise data warehouse.

Authors:  R Scott Evans; James F Lloyd; Lee A Pierce
Journal:  AMIA Annu Symp Proc       Date:  2012-11-03