Shaun J Grannis, Jennifer L Williams, Suranga Kasthuri, Molly Murray, Huiping Xu.
Abstract
OBJECTIVE: This study sought to support evidence-based patient identity policy development by illustrating an approach for formally evaluating operational matching methods, and to characterize the performance of referential and probabilistic patient matching algorithms using real-world demographic data.
Keywords: health IT policy; identity management; patient identification; patient matching; record linkage
Year: 2022 PMID: 35568993 PMCID: PMC9277641 DOI: 10.1093/jamia/ocac068
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. Overview of match performance evaluation. Forty-seven million HIE registration records were used to create 324 million record pairs using 5 blocking combinations. The 5 blocking schemes were: SSN, FN+TEL, DB+MB+YB+ZIP, FN-LN-YB, and DB+MB+YB+ZIP. The blocking schemes produced 53, 41.7, 133.5, 193.9, and 191.2 M record pairs, respectively. (FN: first name, LN: last name, TEL: phone number, ZIP: zip code, MB, DB, and YB: birth month, day, and year, respectively.) The 47 million records were also evaluated by both the referential and the probabilistic algorithm to identify matches. In- and out-of-block record pair samples were manually reviewed to establish a combined reference dataset, from which match performance metrics were calculated. A sensitivity analysis was conducted to assess match performance under more conservative matching rules.
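The blocking step described in the caption can be illustrated with a minimal sketch. The toy records, the three schemes chosen, and the helper name `block_pairs` are all hypothetical and are not taken from the study; the sketch only shows how records that agree on every field of a blocking key become candidate pairs, and why per-scheme pair counts can sum to more than the number of distinct pairs (the same pair may be produced by several schemes).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; field names follow the caption's abbreviations.
records = [
    {"id": 1, "SSN": "111", "FN": "ANN", "LN": "LEE", "TEL": "5550001", "YB": 1980},
    {"id": 2, "SSN": "111", "FN": "ANNE", "LN": "LEE", "TEL": "5550001", "YB": 1980},
    {"id": 3, "SSN": "222", "FN": "ANN", "LN": "LEE", "TEL": "5550001", "YB": 1980},
    {"id": 4, "SSN": "333", "FN": "BOB", "LN": "RAY", "TEL": "5550002", "YB": 1975},
]

# Illustrative subset of blocking schemes (an assumption, not the study's configuration).
schemes = [("SSN",), ("FN", "TEL"), ("FN", "LN", "YB")]


def block_pairs(records, fields):
    """Return the set of record-id pairs that agree exactly on all blocking fields."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if all(v is not None for v in key):  # skip records missing a blocking field
            blocks[key].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


per_scheme = {"+".join(fields): block_pairs(records, fields) for fields in schemes}
candidate_pairs = set().union(*per_scheme.values())

for name, pairs in per_scheme.items():
    print(name, sorted(pairs))
print("distinct candidate pairs:", sorted(candidate_pairs))
```

In this toy data the pair (1, 3) is produced by two schemes, so the per-scheme counts add up to three while only two distinct candidate pairs exist.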
Figure 2. Comparison of referential and probabilistic matches relative to HIE candidate pairs formed by 5 blocking schemes. (A) Probabilistic matches. Comparison of probabilistic matches relative to HIE pairs. The probabilistic method identified 1.2 M matches outside of the HIE pairs formed by 5 blocking schemes. (B) Referential matches. Comparison of referential matches relative to HIE pairs. The referential method identified 6.5 M matches outside of the HIE pairs formed by 5 blocking schemes. Fifteen thousand pairs were randomly sampled from the 324 million in-block pairs for match analysis. An additional 15 000 pairs were sampled from the 6.5 million out-of-block matches. (C) Out-of-block probabilistic and referential matches. Comparison of out-of-block matches for probabilistic and referential methods. A total of 1280 matches were identified by the probabilistic method only, and the referential method identified 5.3 M more out-of-block matches than the probabilistic method.
Results from manual review of potentially matched records
| Probabilistic | Referential | Total frequency | Manual review: Nonmatch | Manual review: Match |
|---|---|---|---|---|
| Within the blocking schemes | | | | |
| Nonmatch | Nonmatch | 7855 | 7351 | 504 |
| Nonmatch | Match | 2134 | 0 | 2134 |
| Match | Nonmatch | 2 | 0 | 2 |
| Match | Match | 5009 | 2 | 5007 |
| Outside of the blocking schemes | | | | |
| Nonmatch | Nonmatch | 5972 | 3188 | 2784 |
| Nonmatch | Match | 6401 | 11 | 6390 |
| Match | Nonmatch | 1280 | 600 | 680 |
| Match | Match | 1347 | 14 | 1323 |
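To make the table easier to read, the sketch below tallies true positives, false positives, and false negatives for each method from the in-block rows only, treating the manual review result as the reference. This gives unweighted proportions within the reviewed in-block sample; it is not the study's estimation procedure, and the numbers it produces are not expected to reproduce the published estimates, which combine the in-block and out-of-block samples. The function name `sample_metrics` and the tuple layout are assumptions made for illustration.

```python
# In-block rows from the manual review table:
# (probabilistic says match, referential says match, reviewed nonmatch, reviewed match)
in_block = [
    (False, False, 7351, 504),
    (False, True, 0, 2134),
    (True, False, 0, 2),
    (True, True, 2, 5007),
]


def sample_metrics(rows, method_index):
    """Unweighted sample metrics for one method (0 = probabilistic, 1 = referential)."""
    tp = sum(r[3] for r in rows if r[method_index])      # called match, reviewed match
    fp = sum(r[2] for r in rows if r[method_index])      # called match, reviewed nonmatch
    fn = sum(r[3] for r in rows if not r[method_index])  # called nonmatch, reviewed match
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
    }


prob = sample_metrics(in_block, 0)
ref = sample_metrics(in_block, 1)
print("probabilistic:", prob)
print("referential:  ", ref)
```

The same function can be applied to the out-of-block rows, which were sampled separately.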
Estimated matching performance metrics based on manual review data within and outside the blocking schemes
| Metric | Probabilistic | Referential | Difference | P value |
|---|---|---|---|---|
| Sensitivity | 0.6366 (0.6264, 0.6468) | 0.9351 (0.9297, 0.9404) | 0.2985 (0.2888, 0.3082) | <0.001 |
| PPV | 0.9995 (0.9990, 1.0000) | 0.9996 (0.9993, 1.0000) | 0.0001 (−0.0001, 0.0003) | 0.34 |
| F-score | 0.7778 (0.7702, 0.7855) | 0.9663 (0.9634, 0.9691) | 0.1885 (0.1814, 0.1956) | <0.001 |
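As a quick consistency check on the point estimates above, the sketch below assumes the reported F-score is the usual F1 measure, that is, the harmonic mean of sensitivity and PPV. The source does not state the formula, so this is an assumption, and the helper name `f_score` is hypothetical. Under that assumption the recomputed values agree with the table to within rounding of the four-decimal inputs.

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision), i.e. F1."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity, PPV, reported F-score) point estimates copied from the table above.
reported = {
    "Probabilistic": (0.6366, 0.9995, 0.7778),
    "Referential": (0.9351, 0.9996, 0.9663),
}

for method, (sens, ppv, f_reported) in reported.items():
    f_computed = f_score(sens, ppv)
    print(f"{method}: computed {f_computed:.4f}, reported {f_reported:.4f}")
```

Because PPV is near 1 for both methods, the F-score difference is driven almost entirely by the difference in sensitivity.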
Estimated matching performance metrics in the sensitivity analysis
| Metric | Probabilistic | Referential | Difference | P value |
|---|---|---|---|---|
| Sensitivity | 0.6519 (0.6417, 0.6622) | 0.9578 (0.9533, 0.9623) | 0.3059 (0.2960, 0.3157) | <0.001 |
| PPV | 0.9993 (0.9987, 1.0000) | 0.9996 (0.9993, 1.0000) | 0.0003 (−0.0001, 0.0007) | 0.19 |
| F-score | 0.7891 (0.7816, 0.7966) | 0.9783 (0.9759, 0.9806) | 0.1892 (0.1820, 0.1963) | <0.001 |