Tim Churches, Peter Christen.
Abstract
BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects - a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors.
Year: 2004 PMID: 15222890 PMCID: PMC471556 DOI: 10.1186/1472-6947-4-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
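Table 1. Example encrypted tuples for the value "peter", created by Alice. For illustrative purposes, the record key and the bigram subsets are shown unencrypted. The tuple that matches Bob's value "pete" with the highest bigram score is shown in bold (see also Table 2).

| Record key | Bigram subset (unencrypted) | Keyed hash of bigram subset | Subset length | Total bigrams in value |
|---|---|---|---|---|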
| 10 | ('er') | 0a3be282870998fe7332ae0fecff68cc0d370152 | 1 | 4 |
| 10 | ('et') | 8898f53d6225f464bb2640779cb17b9378237149 | 1 | 4 |
| 10 | ('pe') | 6fc83a87ee04335a58aa576cb5157625b1b5c51b | 1 | 4 |
| 10 | ('te') | f2bcfb3d76d7fc010e3adc08663090f29c5e928a | 1 | 4 |
| 10 | ('er', 'et') | f86abb0c84889d004b817e86199b3837708d70e9 | 2 | 4 |
| 10 | ('er', 'pe') | df99d8658d8165af4552f60ade3662ba98006298 | 2 | 4 |
| 10 | ('er', 'te') | edfb618d37ecfafc9735e6ad4675245a4071aa9d | 2 | 4 |
| 10 | ('et', 'pe') | bd7ada000c2b9004b7519b989bfcfdff7ad36678 | 2 | 4 |
| 10 | ('et', 'te') | fdcb71db96d2da9b1d19b62944c5f36448cb2668 | 2 | 4 |
| 10 | ('pe', 'te') | 71322eeebabff9828aeed3281a86577163e16a78 | 2 | 4 |
| 10 | ('er', 'et', 'pe') | 8bf2788ef28443b7a0298f19defa5532db40f63a | 3 | 4 |
| 10 | ('er', 'et', 'te') | c7e9a32e54ba33d3769c4813616fdfcc6306459c | 3 | 4 |
| 10 | ('er', 'pe', 'te') | 33287ce86aa02af0f31d4857a79671c1f4645277 | 3 | 4 |
| 10 | ('er', 'et', 'pe', 'te') | 65e568493a08a3428595b8be35f6ae2a0f48d170 | 4 | 4 |
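
The rows above can be reproduced in outline by applying a keyed hash to every non-empty subset of a value's bigrams. The following is a minimal Python sketch, not the authors' implementation. The 40-hex-digit values are consistent with HMAC-SHA1, but the secret key, the serialisation of a subset before hashing (here, plain concatenation of the sorted bigrams), and the removal of repeated bigrams are all assumptions, as are the names `bigrams`, `hashed_subsets` and `SECRET_KEY`. The digests it produces will therefore differ from those in the table.

```python
import hashlib
import hmac
from itertools import combinations

# Assumption: a secret shared by the data custodians; the real key is unknown.
SECRET_KEY = b"hypothetical shared secret"


def bigrams(value):
    """Return the sorted list of distinct bigrams in a string (assumed rule)."""
    return sorted({value[i:i + 2] for i in range(len(value) - 1)})


def hashed_subsets(rec_key, value, secret_key=SECRET_KEY):
    """Build one tuple per non-empty bigram subset.

    Each tuple is (record key, subset, keyed hash, subset length, total bigrams).
    The subset is kept in clear text only for illustration.
    """
    grams = bigrams(value)
    rows = []
    for size in range(1, len(grams) + 1):
        for subset in combinations(grams, size):
            # Assumed serialisation: concatenate the sorted bigrams.
            message = "".join(subset).encode("utf-8")
            digest = hmac.new(secret_key, message, hashlib.sha1).hexdigest()
            rows.append((rec_key, subset, digest, size, len(grams)))
    return rows


alice_rows = hashed_subsets(10, "peter")
for row in alice_rows:
    print(row)
```

For "peter" this yields 15 tuples, one for each non-empty subset of its four bigrams.
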
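Table 2. Example encrypted tuples for the value "pete", created by Bob. For illustrative purposes, the record key and the bigram subsets are shown unencrypted. The tuple that matches Alice's value "peter" with the highest bigram score is shown in bold (see also Table 1).

| Record key | Bigram subset (unencrypted) | Keyed hash of bigram subset | Subset length | Total bigrams in value |
|---|---|---|---|---|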
| 42 | ('et') | 8898f53d6225f464bb2640779cb17b9378237149 | 1 | 3 |
| 42 | ('pe') | 6fc83a87ee04335a58aa576cb5157625b1b5c51b | 1 | 3 |
| 42 | ('te') | f2bcfb3d76d7fc010e3adc08663090f29c5e928a | 1 | 3 |
| 42 | ('et', 'pe') | bd7ada000c2b9004b7519b989bfcfdff7ad36678 | 2 | 3 |
| 42 | ('et', 'te') | fdcb71db96d2da9b1d19b62944c5f36448cb2668 | 2 | 3 |
| 42 | ('pe', 'te') | 71322eeebabff9828aeed3281a86577163e16a78 | 2 | 3 |
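
The comparison implied by Tables 1 and 2 can be sketched as a join on the hash column, carried out by a party that sees only hashes and lengths. The sketch below is illustrative only. It assumes the bigram score is the Dice coefficient, 2 × (shared subset length) / (Alice's total bigrams + Bob's total bigrams), and it regenerates the tuples with a hypothetical key, so `SECRET_KEY`, `hashed_subsets` and `best_scores` are invented names rather than anything taken from the source.

```python
import hashlib
import hmac
from collections import defaultdict
from itertools import combinations

# Assumption: a key known to Alice and Bob but not to the comparing party.
SECRET_KEY = b"hypothetical shared secret"


def hashed_subsets(rec_key, value):
    """Return (record key, hash, subset length, total bigrams) per bigram subset."""
    grams = sorted({value[i:i + 2] for i in range(len(value) - 1)})
    rows = []
    for size in range(1, len(grams) + 1):
        for subset in combinations(grams, size):
            message = "".join(subset).encode("utf-8")
            digest = hmac.new(SECRET_KEY, message, hashlib.sha1).hexdigest()
            rows.append((rec_key, digest, size, len(grams)))
    return rows


def best_scores(alice_rows, bob_rows):
    """Join both tuple sets on the hash and keep the best score per record pair."""
    bob_by_hash = defaultdict(list)
    for b_key, digest, _size, b_total in bob_rows:
        bob_by_hash[digest].append((b_key, b_total))

    best = {}
    for a_key, digest, size, a_total in alice_rows:
        for b_key, b_total in bob_by_hash.get(digest, []):
            # Assumed score: Dice coefficient on the shared bigram subset.
            score = 2 * size / (a_total + b_total)
            pair = (a_key, b_key)
            best[pair] = max(score, best.get(pair, 0.0))
    return best


scores = best_scores(hashed_subsets(10, "peter"), hashed_subsets(42, "pete"))
print(scores)
```

Under these assumptions the largest subset shared by "peter" and "pete" is ('et', 'pe', 'te'), which gives a score of 2 × 3 / (4 + 3), or about 0.86.
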
Table 3. Keyed hash bigram subset communication volume. Overhead for various minimum bigram score thresholds compared to the unencrypted communication of the original values. The average (unencrypted) surname and suburb name lengths were 6.4 and 9.3 characters, giving rise to an average of 166 and 2,521 bigram subsets respectively. A total of 2,323,355 records were processed.

| Minimum bigram score threshold | Surname: megabytes communicated | Surname: overhead (multiple of original) | Suburb name: megabytes communicated | Suburb name: overhead (multiple of original) |
|---|---|---|---|---|
| 0 | 7540 | 520 | 114399 | 5435 |
| 0.1 | 7540 | 520 | 114384 | 5435 |
| 0.2 | 7484 | 516 | 113787 | 5406 |
| 0.3 | 7132 | 492 | 109679 | 5211 |
| 0.4 | 6287 | 434 | 97300 | 4623 |
| 0.5 | 4238 | 292 | 63892 | 3036 |
| 0.6 | 2836 | 196 | 35091 | 1667 |
| 0.7 | 1242 | 86 | 12154 | 577 |
| 0.8 | 511 | 35 | 3056 | 145 |
| 0.9 | 185.0 | 12.7 | 442.4 | 21.0 |
| 1 | 45.4 | 3.13 | 45.4 | 2.16 |
| Original values | 14.5 | 1 | 21.1 | 1 |
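
The scale of these figures follows from the number of non-empty subsets of n bigrams, which is 2^n - 1 and so doubles with every extra character. The short Python sketch below illustrates that growth and recomputes two overhead ratios from the table. It assumes that a value of length L contains L - 1 distinct bigrams, and `subset_count` and `overhead` are illustrative names only.

```python
def subset_count(value_length):
    """Number of non-empty bigram subsets, assuming all bigrams are distinct."""
    n_bigrams = max(value_length - 1, 0)
    return 2 ** n_bigrams - 1


def overhead(mb_hashed, mb_original):
    """Communication overhead as a multiple of the unencrypted volume."""
    return mb_hashed / mb_original


# Subset counts for a few value lengths.
for length in (5, 7, 10):
    print(length, subset_count(length))

# Surname column of the table: threshold 0 and threshold 1.
print(round(overhead(7540, 14.5)))
print(round(overhead(45.4, 14.5), 2))
```

Because the count grows exponentially, longer values dominate the average, which is consistent with the reported mean subset counts (166 and 2,521) being far larger than the count for a value of average length.
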