Probabilistic master lists: integration of patient records from different databases when unique patient identifier is missing.

Farrokh Alemi, Francisco Loaiza, Jee Vang.

Abstract

We show how Bayesian probability models can be used to integrate two databases, one of which lacks a key for uniquely identifying clients (e.g., social security number or medical record number). The analyst selects a set of imperfect identifiers (e.g., last-visit diagnosis, first name). For each identifier, the algorithm assesses the associated likelihood ratio from the database of known cases, and from these likelihood ratios it estimates the probability that two records belong to the same client. As it examines successive identifiers, it accounts for interdependencies among them, so overlapping and redundant identifiers can be used. We tested the procedure on the Medical Expenditure Panel Survey (MEPS) Population Characteristics data set, which is publicly available. We randomly selected 1,000 cases as the training data set; these constituted the known cases. The algorithm was then asked to classify 100 cases not in the training data set as either a case in the training set or a new case. With 12 fields as identifiers, all 100 cases were correctly classified as new cases. We also selected 100 known cases from the training set and asked the algorithm to classify them; again, all 100 were correctly classified. Results were less accurate when the training data set was too small (e.g., fewer than 100 records) or when too few fields were used as identifiers (e.g., fewer than seven). In a test of the algorithm's performance, when the ratio of testing to training data set size exceeded 4 to 1, accuracy exceeded 90% of cases, and accuracy improved further as the ratio increased. These data support the accuracy of our automated, mathematical procedure for merging data from two different data sets in the absence of a unique identifier.
The algorithm uses imperfect and overlapping clues to re-identify cases from information not typically considered to be a patient identifier.
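
The likelihood-ratio step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it treats the identifiers as conditionally independent, whereas the paper states that its procedure accounts for interdependencies among identifiers. The function names, the add-one smoothing, and the use of a prior match probability are all assumptions made for illustration.

```python
from math import prod


def agreement_likelihood_ratio(agree_same, total_same, agree_diff, total_diff):
    """Estimate a likelihood ratio for one identifier from known cases.

    The ratio is P(field agrees | same client) / P(field agrees | different
    clients). Add-one smoothing (an assumption of this sketch) keeps the
    ratio finite when a count is zero.
    """
    p_agree_same = (agree_same + 1) / (total_same + 2)
    p_agree_diff = (agree_diff + 1) / (total_diff + 2)
    return p_agree_same / p_agree_diff


def match_probability(likelihood_ratios, prior_probability):
    """Combine per-identifier likelihood ratios into a match probability.

    Uses the odds form of Bayes' rule: posterior odds = prior odds times the
    product of the likelihood ratios. Multiplying the ratios assumes the
    identifiers are conditionally independent, a simplification that the
    paper's method is reported to relax.
    """
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)
```

In this sketch, each likelihood ratio above 1 raises the estimated probability that two records belong to the same client, and each ratio below 1 lowers it. A pair of records would be declared a match when the resulting probability exceeds a threshold chosen by the analyst.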

Year:  2007        PMID: 17323657     DOI: 10.1007/s10729-006-9002-7

Source DB:  PubMed          Journal:  Health Care Manag Sci        ISSN: 1386-9620

