Erik A Sauleau, Jean-Philippe Paumier, Antoine Buemi.
Abstract
BACKGROUND: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making, by computing a value reflecting identity proximity.
Year: 2005 PMID: 16219102 PMCID: PMC1274322 DOI: 10.1186/1472-6947-5-32
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Example of a complete graph (left panel) and of an incomplete graph (right panel).
Figure 3. Summary of the entire linkage procedure.
Number of exact concordances in global similarity (gS) and atomic similarities (aS) among the 38,083 identified pairs in a database of 300,859 patients
| | | Atomic similarities | | | | |
| | gS | BN* | MN* | BN/MN* | First name | Date of birth |
| Pairs | 38,083 | 38,083 | 38,083 | 76,166 ** | 38,083 | 38,083 |
| Missing | 0 | 0 | 31,747 | 57,088 | 0 | 0 |
| Values at 1 | 9,566 | 25,990 | 5,348 | 3,065 | 26,017 | 26,505 |
| in % | 25.1 | 68.2 | 14.0 | 4.0 | 68.3 | 69.6 |
* BN = birth name, MN = married name
** Theoretical number of comparisons between BN and MN
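The "in %" row and the pair counts reported for the pairs without exact concordance can be recomputed from the counts above. The following is a minimal sketch of that arithmetic; the function names `percent_exact` and `remaining` are illustrative and do not come from the article.

```python
# Counts copied from the table above, in column order:
# gS, BN, MN, BN/MN, first name, date of birth.
columns = ["gS", "BN", "MN", "BN/MN", "First name", "Date of birth"]
pairs = [38083, 38083, 38083, 76166, 38083, 38083]
missing = [0, 0, 31747, 57088, 0, 0]
exact = [9566, 25990, 5348, 3065, 26017, 26505]


def percent_exact(n_exact, n_pairs):
    """Share of comparisons with a similarity of exactly 1, in percent."""
    return round(100 * n_exact / n_pairs, 1)


def remaining(n_pairs, n_missing, n_exact):
    """Comparisons that are neither missing nor exactly concordant."""
    return n_pairs - n_missing - n_exact


percentages = [percent_exact(e, p) for e, p in zip(exact, pairs)]
non_exact = [remaining(p, m, e) for p, m, e in zip(pairs, missing, exact)]

for name, pct, rest in zip(columns, percentages, non_exact):
    print(f"{name}: {pct}% exact, {rest} non-exact comparisons")
```

The percentages are taken over all pairs, including those with a missing value, which is why MN and BN/MN show low rates of exact concordance.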
Characteristics of global similarity (gS) and atomic similarities (aS) in the identified pairs without exact concordance
| | | Atomic similarities | | | | |
| | gS | BN* | MN* | BN/MN* | First name | Date of birth |
| Pairs | 28,517 | 12,093 | 988 | 16,013 | 12,066 | 11,578 |
| Mean | 0.92 | 0.79 | 0.78 | 0.36 | 0.77 | 0.82 |
| Stand. error | 0.48 | 0.20 | 0.23 | 0.25 | 0.17 | 0.07 |
| Minimum | 0.85 | 0.00 | 0.00 | 0.00 | 0.00 | 0.65 |
| Percentiles | ||||||
| 25th | 0.87 | 0.76 | 0.62 | 0.22 | 0.63 | 0.77 |
| 50th | 0.92 | 0.88 | 0.90 | 0.41 | 0.82 | 0.85 |
| 75th | 0.97 | 0.92 | 0.94 | 0.52 | 0.92 | 0.88 |
| 90th | 0.99 | 0.95 | 0.96 | 0.63 | 0.95 | 0.88 |
| 95th | 0.99 | 0.96 | 0.96 | 0.75 | 0.96 | 0.88 |
* BN = birth name, MN = married name
Figure 2. Percentage of true positive couples by global similarity threshold value.
Weighting procedure of the atomic similarities (Appendix)
| Weights | ||||||
| Case 1 | ||||||
| If | 2/4 | - | - | - | 1/4 | 1/4 |
| Else | 2/6 | - | - | - | 1/6 | 3/6 |
| Case 2 | ||||||
| If | 2/4 | - | - | - | 1/4 | 1/4 |
| Else | 2/6 | - | - | - | 1/6 | 3/6 |
| | 1/6 | - | 1/6 | - | 1/6 | 3/6 |
| Case 3 | ||||||
| Case 4 | ||||||
| If | 2/7 | 1/7 | - | - | 1/7 | 3/7 |
| If | 1/7 | 1/7 | - | 1/7 | 1/7 | 3/7 |
| If | 1/7 | 1/7 | 1/7 | - | 1/7 | 3/7 |
| If | 1/8 | 1/8 | 1/8 | 1/8 | 1/8 | 3/8 |
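Each row of weights above sums to 1, which is consistent with the global similarity being a weighted average of the available atomic similarities. The sketch below illustrates that reading only. The column labels and the conditions attached to each row are not given in this extract, so the mapping of the weights 2/4, 1/4, 1/4 to birth name, first name and date of birth is an assumption, as are the function name `global_similarity` and the example similarity values.

```python
from fractions import Fraction


def global_similarity(atomic, weights):
    """Weighted average of atomic similarities (assumed reading of the table)."""
    if sum(weights.values()) != 1:
        raise ValueError("weights must sum to 1")
    return float(sum(weights[key] * atomic[key] for key in weights))


# Assumed mapping of one row (2/4, 1/4, 1/4) to three atomic similarities.
weights = {
    "birth_name": Fraction(2, 4),
    "first_name": Fraction(1, 4),
    "date_of_birth": Fraction(1, 4),
}

# Hypothetical atomic similarities for one pair of identity records.
atomic = {"birth_name": 1.0, "first_name": 0.8, "date_of_birth": 1.0}

gs = global_similarity(atomic, weights)
print(f"global similarity: {gs:.2f}")
```

Using `Fraction` keeps the weights exact, so the check that a row sums to 1 is not affected by floating-point rounding.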