S M Randall, A P Brown, A M Ferrante, J H Boyd.
Abstract
INTRODUCTION: Available and practical methods for privacy preserving linkage have shortcomings: methods utilising anonymous linkage codes provide limited accuracy while methods based on Bloom filters have proven vulnerable to frequency-based attacks.
Year: 2019 PMID: 32935028 PMCID: PMC7482515 DOI: 10.23889/ijpds.v4i1.1094
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
Figure 1: Privacy preserving linkage using an independent third party

| | First Name | Surname | Sex | Year of Birth | Summed Score |
|---|---|---|---|---|---|
| 1 | Agree | Agree | Agree | Agree | 17 |
| 2 | Agree | Agree | Disagree | Agree | 15.5 |
| 3 | Disagree | Agree | Disagree | Agree | 10 |
| … | … | … | … | … | |
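
The table above assigns a summed score to each agreement pattern. In probabilistic linkage such a score is typically obtained by adding, for each field, one weight when the field agrees and another when it disagrees. The following is a minimal sketch of that idea, not the authors' implementation. The `WEIGHTS` values and the `summed_score` helper are hypothetical: the record does not give the per-field weights, so these were chosen only so that the three listed patterns reproduce the scores 17, 15.5 and 10.

```python
# Hypothetical per-field weights. These numbers are NOT taken from the paper;
# they are illustrative values picked so the sums match the table above.
WEIGHTS = {
    "first_name":    {"agree": 5.0, "disagree": -0.5},
    "surname":       {"agree": 6.0, "disagree": -1.0},
    "sex":           {"agree": 1.0, "disagree": -0.5},
    "year_of_birth": {"agree": 5.0, "disagree": -1.0},
}


def summed_score(pattern):
    """Add the agreement or disagreement weight of every field in the pattern."""
    return sum(WEIGHTS[field][outcome] for field, outcome in pattern.items())


# The three agreement patterns listed in the table.
patterns = [
    {"first_name": "agree", "surname": "agree", "sex": "agree", "year_of_birth": "agree"},
    {"first_name": "agree", "surname": "agree", "sex": "disagree", "year_of_birth": "agree"},
    {"first_name": "disagree", "surname": "agree", "sex": "disagree", "year_of_birth": "agree"},
]

for number, pattern in enumerate(patterns, start=1):
    print(number, summed_score(pattern))
```

With real data the weights would be estimated from the files being linked, so the ranking of patterns could differ from this illustration.
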

Original Data

| Record ID | First Name | Surname | Sex | Year of Birth |
|---|---|---|---|---|
| Record1 | Sean | Randall | M | 1986 |
| Record2 | John | Doe | | 1957 |

| | FER12 | BRO17 | SA Emergency | NSW Emergency | SA Hospital | NSW Hospital |
|---|---|---|---|---|---|---|
| No. Records | 400,000 | 1,000,000 | 813,839 | 4,304,459 | 1,007,242 | 6,658,380 |
| First Name | 2.4% | 10.0% | 2.2% | 0.1% | 3.1% | 33.2% |
| Middle Name | - | 10.0% | 74.4% | 83.4% | 79.3% | 66.9% |
| Surname | 2.6% | 10.0% | 1.3% | 0.0% | 2.4% | 33.3% |
| Date of Birth | 11.8% | 10.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Sex | 5.2% | 10.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Address | - | 10.0% | 4.6% | 4.2% | 7.8% | 10.4% |
| Postcode | 1.1% | 10.0% | 7.5% | 1.2% | 9.4% | 0.6% |

¹ Privacy preserving record linkage

² Statistical linkage key

| Dataset 1: FER09 | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL¹ method | Multiple match-key PPRL | 0.928 | 0.788 | 0.856 |
| PPRL | SLK² | 0.871 | 0.570 | 0.689 |
| PPRL | Record-level Bloom filter | 0.937 | 0.778 | 0.850 |
| PPRL | Field-level Bloom filter | 0.941 | 0.793 | 0.860 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.986 | 0.805 | 0.886 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.940 | 0.777 | 0.851 |

| Dataset 2: BRO17 | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL method | Multiple match-key PPRL | 0.992 | 0.943 | 0.967 |
| PPRL | SLK | 0.960 | 0.239 | 0.383 |
| PPRL | Record-level Bloom filter | 0.934 | 0.691 | 0.794 |
| PPRL | Field-level Bloom filter | 0.997 | 0.813 | 0.896 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.996 | 0.815 | 0.897 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.993 | 0.810 | 0.892 |

| Dataset 3: SA Emergency | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL method | Multiple match-key PPRL | 0.967 | 0.990 | 0.978 |
| PPRL | SLK | 0.995 | 0.945 | 0.969 |
| PPRL | Record-level Bloom filter | 0.992 | 0.956 | 0.974 |
| PPRL | Field-level Bloom filter | 0.984 | 0.978 | 0.981 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.985 | 0.980 | 0.982 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.969 | 0.990 | 0.979 |

| Dataset 4: NSW Emergency | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL method | Multiple match-key PPRL | 0.997 | 0.983 | 0.990 |
| PPRL | SLK | 0.999 | 0.966 | 0.982 |
| PPRL | Record-level Bloom filter | 0.989 | 0.978 | 0.983 |
| PPRL | Field-level Bloom filter | 0.995 | 0.987 | 0.991 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.995 | 0.990 | 0.993 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.995 | 0.985 | 0.990 |

| Dataset 5: SA Hospital | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL method | Multiple match-key PPRL | 0.993 | 0.991 | 0.992 |
| PPRL | SLK | 0.975 | 0.988 | 0.981 |
| PPRL | Record-level Bloom filter | 0.991 | 0.992 | 0.992 |
| PPRL | Field-level Bloom filter | 0.995 | 0.989 | 0.992 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.996 | 0.987 | 0.992 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.995 | 0.988 | 0.991 |

| Dataset 6: NSW Hospital | Method | Precision | Recall | F-measure |
|---|---|---|---|---|
| New PPRL method | Multiple match-key PPRL | 0.983 | 0.991 | 0.987 |
| PPRL | SLK | 0.072 | 0.920 | 0.134 |
| PPRL | Record-level Bloom filter | 0.754 | 0.921 | 0.829 |
| PPRL | Field-level Bloom filter | 0.992 | 0.989 | 0.990 |
| Un-encoded | Probabilistic linkage using approximate string matching | 0.992 | 0.989 | 0.991 |
| Un-encoded | Probabilistic linkage using exact matching only | 0.988 | 0.991 | 0.990 |
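
The F-measure column in the results tables can be related to the precision and recall columns. The sketch below assumes the standard balanced F-measure, the harmonic mean of precision and recall; the record itself does not state the formula. The helper name `f_measure` is illustrative, and the inputs are the SLK rows for BRO17 and NSW Hospital copied from the tables.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the SLK rows of the results tables.
slk_rows = {
    "BRO17": (0.960, 0.239),
    "NSW Hospital": (0.072, 0.920),
}

for dataset, (precision, recall) in slk_rows.items():
    print(dataset, round(f_measure(precision, recall), 3))
```

Because the harmonic mean is pulled towards the smaller of its two inputs, the very low recall on BRO17 and the very low precision on NSW Hospital each produce a low F-measure despite the other metric being high. Values recomputed from the rounded table entries can differ from the published figure in the last digit.
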