Optimized dual threshold entity resolution for electronic health record databases--training set size and active learning.

Erel Joffe, Michael J Byrne, Phillip Reeder, Jorge R Herskovic, Craig W Johnson, Allison B McCoy, Elmer V Bernstam.

Abstract

Clinical databases may contain several records for a single patient. Multiple general entity-resolution algorithms have been developed to identify such duplicate records. To achieve optimal accuracy, algorithm parameters must be tuned to a particular dataset. The purpose of this study was to determine the required training set size for probabilistic, deterministic and Fuzzy Inference Engine (FIE) algorithms with parameters optimized using the particle swarm approach. Each algorithm classified potential duplicates into three categories: definite match, non-match, and indeterminate (i.e., requiring manual review). Training set sizes ranged from 2,000 to 10,000 randomly selected record-pairs. We also evaluated marginal uncertainty sampling for active learning. Optimization reduced the proportion of record-pairs requiring manual review (Deterministic 11.6% vs. 2.5%; FIE 49.6% vs. 1.9%; and Probabilistic 10.5% vs. 3.5%). FIE classified 98.1% of the records correctly (precision=1.0). Best performance required training on all 10,000 randomly selected record-pairs. Active learning achieved comparable results with 3,000 records. Automated optimization is effective, and targeted sampling can reduce the required training set size.
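The dual-threshold classification and the idea of targeted sampling can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the assumption that each record-pair already has a single similarity score, and the threshold values are all hypothetical. The abstract does not define "marginal uncertainty sampling"; the sketch assumes it means selecting the pairs whose scores lie closest to a decision threshold.

```python
from typing import List, Sequence


def classify_pair(score: float, lower: float, upper: float) -> str:
    """Dual-threshold rule: scores at or above `upper` are matches,
    scores at or below `lower` are non-matches, and everything in
    between is left for manual review."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "indeterminate"


def manual_review_fraction(scores: Sequence[float], lower: float, upper: float) -> float:
    """Fraction of record-pairs that fall between the two thresholds."""
    labels = [classify_pair(s, lower, upper) for s in scores]
    return labels.count("indeterminate") / len(labels)


def select_most_uncertain(scores: Sequence[float], lower: float, upper: float, k: int) -> List[int]:
    """Assumed form of uncertainty sampling: return the indices of the
    k pairs whose scores are nearest to either threshold, i.e. the
    pairs a reviewer's label would be most informative for."""
    def margin(s: float) -> float:
        return min(abs(s - lower), abs(s - upper))

    order = sorted(range(len(scores)), key=lambda i: margin(scores[i]))
    return order[:k]


# Hypothetical similarity scores for six candidate record-pairs.
scores = [0.05, 0.2, 0.45, 0.55, 0.8, 0.95]
labels = [classify_pair(s, lower=0.3, upper=0.7) for s in scores]
review_share = manual_review_fraction(scores, lower=0.3, upper=0.7)
to_label_next = select_most_uncertain(scores, lower=0.3, upper=0.7, k=2)
```

In this framing, tuning the two thresholds (for example with a particle swarm optimizer, as the abstract describes) trades off the size of the manual review set against classification errors, while the sampling function picks which pairs to label next rather than drawing them at random.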

Year:  2013        PMID: 24551372      PMCID: PMC3900213     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


