Maxence Guesdon1,2, Eric Benzenine1, Kamel Gadouche3, Catherine Quantin4,5,6.
Abstract
Administrative records in France, especially medical and social records, have huge potential for statistical studies. The NIR (a national identifier) is widely used in medico-social administrations, and this would theoretically provide considerable scope for data matching, provided that the relevant legislation is respected. The law, however, forbids the processing of non-anonymized medical data, which makes it difficult to carry out studies that require several sources of social and medical data. We would like to use computer techniques introduced since the 1970s to provide safe linkage of anonymized files and so relax the current constraints on such procedures. We propose an organization and a data workflow, based on hashing and cryptographic techniques, that strongly compartmentalize identifying and non-identifying data. The proposed method offers strong control over who is in possession of which information by using a different hashing key for each linkage. This prevents unauthorized linkage of data and protects anonymity by preventing the accumulation of non-identifying data, which can become identifying when linked. Our proposal would make it possible to conduct such studies more easily, more regularly and more precisely while preserving a sufficiently high level of anonymity. The main obstacle to setting up such a system, in our opinion, is not technical but organizational, in that it relies on the existence of a Key-Management Authority.
Keywords: Data linkage; Patient data privacy; Population statistics
Year: 2016 PMID: 27716178 PMCID: PMC5053094 DOI: 10.1186/s12911-016-0366-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Duplicates and collisions
| | Same individual | Different individuals |
|---|---|---|
| Same name | True positive | False positive = collision |
| Different names | False negative = duplicates | True negative |
Fig. 1 Linkage decision according to compound weight
Results of the unit weight calculation for each of the 3 fields used for linkage
| | Family name | First name | Date of birth |
|---|---|---|---|
| Weight if equal (1) | 8.4 | 5.7 | 10.3 |
| Weight if different (0) | -2.8 | -3.5 | -3.1 |
Computation of the compound weight according to the configuration of equalities and differences
| | Family name | First name | DoB | Compound weight |
|---|---|---|---|---|
| Without disagreement (111) | +8.4 | +5.7 | +10.3 | +24.4 |
| Disagreement on the family name (011) | -2.8 | +5.7 | +10.3 | +13.2 |
| Disagreement on the DoB (110) | +8.4 | +5.7 | -3.1 | +11 |
| Disagreement in all fields (000) | -2.8 | -3.5 | -3.1 | -9.4 |
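The compound weight in the table above is the sum of the unit weights selected by each field's agreement (1) or disagreement (0). The following is a minimal sketch of that arithmetic; the names `UNIT_WEIGHTS` and `compound_weight` are illustrative and do not come from the paper.

```python
# Unit weights copied from the table of unit weights:
# (weight if equal, weight if different), in the order
# family name, first name, date of birth.
UNIT_WEIGHTS = (
    (8.4, -2.8),   # family name
    (5.7, -3.5),   # first name
    (10.3, -3.1),  # date of birth
)


def compound_weight(pattern: str) -> float:
    """Sum the unit weights for an agreement pattern such as "110".

    Each character is "1" (fields equal) or "0" (fields different),
    in the order family name, first name, date of birth.
    """
    if len(pattern) != len(UNIT_WEIGHTS) or set(pattern) - {"0", "1"}:
        raise ValueError(f"invalid agreement pattern: {pattern!r}")
    total = sum(
        w_equal if flag == "1" else w_different
        for flag, (w_equal, w_different) in zip(pattern, UNIT_WEIGHTS)
    )
    # Round to one decimal to hide floating-point noise.
    return round(total, 1)


if __name__ == "__main__":
    for pattern in ("111", "011", "110", "000"):
        print(pattern, compound_weight(pattern))
```

Run as a script, this prints the four compound weights listed in the table (24.4, 13.2, 11.0 and -9.4).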
Example of a computation of compound weights for two records
| | Family name | First name | Date of birth | Compound weight |
|---|---|---|---|---|
| Record 1 | Dupont | François | 29/01/1940 | |
| Record 2 | Dupont | François | 29/03/1940 | |
| Weight | +8.4 | +5.7 | -3.1 | = 11 |
Thresholds according to the compound weight
| Fam. name agreement | First name agreement | DoB agreement | Frequency | Thresholds | Weight | P(m) (%) | G(u) (%) |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 452 966 248 | | -9.4 | 6e-08 | 99.99 |
| 0 | 1 | 0 | 4 880 218 | | -0.2 | 5e-04 | 99.99 |
| 1 | 0 | 0 | 304 887 | | 1.8 | 4e-03 | 99.99 |
| 0 | 0 | 1 | 46 081 | | 1.4 | 0.04 | 99.96 |
| 1 | 1 | 0 | 1 438 | “unmatched” threshold | 11 | 28.79 | 71.21 |
| 0 | 1 | 1 | 725 | | 13.2 | 78.66 | 21.34 |
| 1 | 0 | 1 | 291 | “matched” threshold | 15.2 | 96.68 | 3.32 |
| 1 | 1 | 1 | 8 852 | | 24.4 | 99.99 | 4e-04 |
P(m): probability that the 2 records of the pair correspond to the same individual
G(u): probability that the 2 records correspond to 2 different individuals
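The two thresholds in the table split record pairs into three groups by compound weight. The sketch below illustrates one way to apply them. It assumes that both bounds are inclusive and that pairs falling between the thresholds are set aside as "uncertain"; the source labels only the two thresholds, so the inclusivity, the third label and the function name `linkage_decision` are assumptions.

```python
# Threshold values copied from the thresholds table.
UNMATCHED_THRESHOLD = 11.0  # "unmatched" threshold
MATCHED_THRESHOLD = 15.2    # "matched" threshold


def linkage_decision(weight: float) -> str:
    """Classify a record pair by its compound weight.

    Assumption: both thresholds are inclusive, and anything strictly
    between them is left undecided ("uncertain").
    """
    if weight >= MATCHED_THRESHOLD:
        return "matched"
    if weight <= UNMATCHED_THRESHOLD:
        return "unmatched"
    return "uncertain"


if __name__ == "__main__":
    for weight in (-9.4, 11.0, 13.2, 15.2, 24.4):
        print(weight, linkage_decision(weight))
```

Under these assumptions, only the pattern with weight 13.2 (agreement on first name and date of birth, disagreement on family name) falls between the thresholds.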
Example of calculation of compound weights for two anonymized records
| | Family name | First name | Date of birth | Compound weight |
|---|---|---|---|---|
| Record 1 | fe1fb20e56bd... | 5b7808252fec... | aeed71d1dc67... | |
| Record 2 | fe1fb20e56bd... | 5b7808252fec... | 9b1549d98eab... | |
| Weight | +8.4 | +5.7 | -3.1 | = 11 |
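Because equal inputs produce equal hashes, the agreement pattern (and therefore the compound weight) can be computed on anonymized fields exactly as on clear-text ones. The sketch below illustrates this with a keyed hash. HMAC-SHA-256, the normalization rule, the example keys and all function names are assumptions made for illustration; the source does not specify the hash function, so the digests produced here will not match those shown in the table.

```python
import hashlib
import hmac
import unicodedata

FIELDS = ("family_name", "first_name", "date_of_birth")


def normalize(value: str) -> str:
    """Trim, uppercase and strip accents (an assumed normalization rule)."""
    decomposed = unicodedata.normalize("NFKD", value.strip().upper())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def keyed_hash(value: str, key: bytes) -> str:
    """Hash one field with HMAC-SHA-256 under a linkage-specific key."""
    message = normalize(value).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def anonymize(record: dict, key: bytes) -> dict:
    """Replace every linkage field of a record with its keyed hash."""
    return {field: keyed_hash(record[field], key) for field in FIELDS}


def agreement_pattern(a: dict, b: dict) -> str:
    """Return e.g. "110": 1 where the hashed fields are equal, else 0."""
    return "".join("1" if a[f] == b[f] else "0" for f in FIELDS)


if __name__ == "__main__":
    key_study_a = b"hypothetical key for linkage A"
    key_study_b = b"hypothetical key for linkage B"

    r1 = {"family_name": "Dupont", "first_name": "François",
          "date_of_birth": "29/01/1940"}
    r2 = {"family_name": "Dupont", "first_name": "François",
          "date_of_birth": "29/03/1940"}

    h1 = anonymize(r1, key_study_a)
    h2 = anonymize(r2, key_study_a)
    print(agreement_pattern(h1, h2))  # same pattern as on the clear-text records

    # The same record hashed under another linkage's key shares no field hash,
    # so files prepared for different linkages cannot be joined to each other.
    print(agreement_pattern(h1, anonymize(r1, key_study_b)))
```

Using a separate key per linkage is what the abstract relies on to prevent unauthorized linkage: two files can be compared only if they were hashed under the same key.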
Fig. 2 Proposed organization for secure matching