Khaled El Emam, Fida K Dankar, Angelica Neisa, Elizabeth Jonker.
Abstract
BACKGROUND: Our objective was to develop a model for measuring re-identification risk that more closely mimics the behaviour of an adversary by accounting for repeated attempts at matching and verification of matches, and to apply it to evaluate the risk of re-identification for Canada's post-marketing adverse drug event database (ADE). Re-identification is only demonstrably plausible for deaths in ADE. A matching experiment between ADE records and virtual obituaries constructed from Statistics Canada vital statistics was simulated. A new re-identification risk measure is considered: it assumes that, after gathering all the potential matches for a patient record (all records in the obituaries that are potential matches for an ADE record), an adversary tries to verify these potential matches. Two adversary scenarios were considered: (a) a mildly motivated adversary who will stop after one verification attempt, and (b) a highly motivated adversary who will attempt to verify all the potential matches and is only limited by practical or financial considerations.
Year: 2013 PMID: 24094134 PMCID: PMC4137558 DOI: 10.1186/1472-6947-13-114
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Example data: this hypothetical example table is used to illustrate a number of concepts that we use throughout our analysis
| ID | Name | Telephone | Sex | Year of birth | Lab test | Lab result | |
|---|---|---|---|---|---|---|---|
| 1 | John Smith | (412) 688-5468 | Male | 1959 | Albumin, Serum | 4.8 | 37 |
| 2 | Alan Smith | (413) 822-5074 | Male | 1969 | Creatine kinase | 86 | 36 |
| 3 | Alice Brown | (416) 886-5314 | Female | 1955 | Alkaline Phosphatase | 66 | 52 |
| 4 | Hercules Green | (613) 763-5254 | Male | 1959 | Bilirubin | Negative | 36 |
| 5 | Alicia Freds | (613) 586-6222 | Female | 1942 | BUN/Creatinine Ratio | 17 | 82 |
| 6 | Gill Stringer | (954) 699-5423 | Female | 1975 | Calcium, Serum | 9.2 | 34 |
| 7 | Marie Kirkpatrick | (416) 786-6212 | Female | 1966 | Free Thyroxine Index | 2.7 | 23 |
| 8 | Leslie Hall | (905) 668-6581 | Female | 1987 | Globulin, Total | 3.5 | 9 |
| 9 | Douglas Henry | (416) 423-5965 | Male | 1959 | B-type natriuretic peptide | 134.1 | 38 |
| 10 | Fred Thompson | (416) 421-7719 | Male | 1967 | Creatine kinase | 80 | 21 |
| 11 | Joe Doe | (705) 727-7808 | Male | 1968 | Alanine aminotransferase | 24 | 33 |
| 12 | Lillian Barley | (416) 695-4669 | Female | 1955 | Cancer antigen 125 | 86 | 28 |
| 13 | Deitmar Plank | (416) 603-5526 | Male | 1967 | Creatine kinase | 327 | 37 |
| 14 | Anderson Hoyt | (905) 388-2851 | Male | 1967 | Creatine kinase | 82 | 16 |
| 15 | Alexandra Knight | (416) 539-4200 | Female | 1966 | Creatinine | 0.78 | 44 |
| 16 | Helene Arnold | (519) 631-0587 | Female | 1955 | Triglycerides | 147 | 59 |
| 17 | Almond Zipf | (519) 515-8500 | Male | 1967 | Creatine kinase | 73 | 20 |
| 18 | Britney Goldman | (613) 737-7870 | Female | 1956 | Monocytes | 12 | 34 |
| 19 | Lisa Marie | (902) 473-2383 | Female | 1956 | HDL Cholesterol | 68 | 141 |
| 20 | William Cooper | (905) 763-6852 | Male | 1978 | Neutrophils | 83 | 21 |
| 21 | Kathy Last | (705) 424-1266 | Female | 1966 | Prothrombin Time | 16.9 | 23 |
| 22 | Deitmar Plank | (519) 831-2330 | Male | 1967 | Creatine kinase | 68 | 16 |
| 23 | Anderson Hoyt | (705) 652-6215 | Male | 1971 | White Blood Cell Count | 13.0 | 151 |
| 24 | Alexandra Knight | (416) 813-5873 | Female | 1954 | Hemoglobin | 14.8 | 34 |
| 25 | Helene Arnold | (705) 663-1801 | Female | 1977 | Lipase, Serum | 37 | 27 |
| 26 | Anderson Heft | (416) 813-6498 | Male | 1944 | Cholesterol, Total | 147 | 18 |
| 27 | Almond Zipf | (617) 667-9540 | Male | 1965 | Hematocrit | 45.3 | 53 |
Example of an extract from an adverse drug reaction database where the reported outcome was death, and a potentially matching extract from an obituary table
| ID | Age | Sex | Province | Date of death | Drug | Adverse reaction |
|---|---|---|---|---|---|---|
| 1 | 42 | F | British Columbia | 5 May 1998 | TALWIN FOR INJECTION | Suicide |
| 2 | 71 | M | Alberta | 2 Jan 1998 | MAXERAN | Dehydration |
| 3 | 34 | M | Ontario | 21 Sept 1998 | Procainamide | Cardiac arrest |
| 4 | 55 | F | Quebec | 1 Apr 1998 | Rifampin | Congestive heart failure |
| 5 | 38 | F | Nova Scotia | 25 Nov 2004 | Tegretol | Non-accidental overdose |
| 6 | 44 | M | Ontario | 23 Oct 2006 | Penicillin | Respiratory arrest |
| 7 | 65 | M | Quebec | 24 Jun 2001 | Morphine | Haemorrhage intracranial |
Example of an extract from an obituary table
| Name | Age | Sex | Province | Date of death |
|---|---|---|---|---|
| John Smith | 44 | M | Ontario | 23 Oct 2006 |
| Alan Black | 44 | M | Ontario | 23 Oct 2006 |
| Hugh Tremblay | 44 | M | Ontario | 23 Oct 2006 |
| Joe White | 44 | M | Ontario | 23 Oct 2006 |
| Mary Lambert | 65 | F | Quebec | 25 Nov 2004 |
| Leslie Long | 77 | F | British Columbia | 24 Jun 2001 |
We assume that the obituary has all deaths. The names in this table are fabricated names and do not knowingly match real individuals.
Figure 1. Example to illustrate how the percentage of ADE records at risk is computed from the matched equivalence classes. In this example we assume there are only two quasi-identifying attributes: age at death and gender.
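The computation described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it reduces the ADE and obituary extracts shown above to the two quasi-identifiers (age at death, gender), treats the risk of a record as 1/k where k is the size of its matching obituary equivalence class, and uses a hypothetical `threshold` parameter to decide what counts as "at risk". The function name `percent_at_risk` and the threshold value are assumptions introduced here.

```python
from collections import Counter

# (age at death, gender) taken from the ADE extract shown above
ade_records = [(42, "F"), (71, "M"), (34, "M"), (55, "F"),
               (38, "F"), (44, "M"), (65, "M")]

# (age at death, gender) taken from the obituary extract shown above
obituary_records = [(44, "M"), (44, "M"), (44, "M"), (44, "M"),
                    (65, "F"), (77, "F")]


def percent_at_risk(ade, obituary, threshold):
    """Percentage of ADE records whose matching risk 1/k exceeds threshold.

    k is the size of the obituary equivalence class that shares the
    record's quasi-identifier values. Records with no match in the
    (partial) obituary extract are counted as not at risk here, which
    is an assumption of this sketch.
    """
    class_sizes = Counter(obituary)
    at_risk = 0
    for key in ade:
        k = class_sizes.get(key, 0)
        if k > 0 and 1.0 / k > threshold:
            at_risk += 1
    return 100.0 * at_risk / len(ade)


# Hypothetical threshold of 0.2: only the (44, "M") record matches a
# class (k = 4, risk 0.25), so one of the seven ADE records is flagged.
pct = percent_at_risk(ade_records, obituary_records, threshold=0.2)
print(round(pct, 2))
```

Because the obituary extract is only a fragment, most ADE rows find no match in this toy data; with a complete obituary file every death would fall into some equivalence class.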
Figure 2. Plot showing how the re-identification risk varies with the number of attempts and the probability of being able to verify a match.
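To give a feel for how risk might depend on the number of attempts and the verification probability, here is a deliberately simplified sketch. It is not the formula from the paper. It assumes the adversary picks candidates uniformly at random, without replacement, from an equivalence class of size `k`, makes at most `attempts` verification attempts, and is able to verify any given candidate with independent probability `p_verify`. Under those assumptions the true match is among the attempted candidates with probability min(attempts, k)/k, and it is then confirmed with probability `p_verify`. All names below are hypothetical.

```python
def reidentification_risk(k, attempts, p_verify):
    """Risk under the simplified sampling model described above.

    k         -- size of the matching equivalence class (k >= 1)
    attempts  -- maximum number of verification attempts (>= 0)
    p_verify  -- probability that a given candidate can be verified
    """
    if k < 1 or attempts < 0 or not 0.0 <= p_verify <= 1.0:
        raise ValueError("invalid arguments")
    # The true match is reached only if it falls within the first
    # min(attempts, k) randomly ordered candidates.
    return p_verify * min(attempts, k) / k


# Scenario (a): mildly motivated adversary, a single attempt.
risk_one_attempt = reidentification_risk(k=4, attempts=1, p_verify=0.5)

# Scenario (b): highly motivated adversary who can afford to try
# every candidate in the class.
risk_all_attempts = reidentification_risk(k=4, attempts=4, p_verify=0.5)

print(risk_one_attempt, risk_all_attempts)
```

In this toy model, extra attempts beyond `k` add nothing, and the risk for a fully motivated adversary is capped by `p_verify` rather than by the class size.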
Figure 3. An example illustrating how province is assigned to the ADE dataset from the virtual obituary file.
The percentage of ADE deaths that are at a high risk of re-identification by matching to an obituary for different combinations of quasi-identifiers
| | X | X | X | X | X | 1.95 | 3.46 | 5.05 |
| | X | X | | X | X | 0 | 0 | 0 |
| | X | X | | | X | 0 | 0 | 0 |
| | 2 yr | X | | | X | 0 | 0 | 0 |
| | 5 yr | X | | | X | 0 | 0 | 0 |
| | 10 yr | X | | | X | 0 | 0 | 0 |
| X | X | X | X | X | X | 18.44 | 25.17 | 30.78 |
| X | X | X | | X | X | 0.21 | 0.4 | 0.63 |
| X | X | X | | | X | 0.12 | 0.24 | 0.39 |
| X | 2 yr | X | | | X | 0.04 | 0.07 | 0.13 |
| X | 5 yr | X | | | X | 0.02 | 0.03 | 0.05 |
| X | 10 yr | X | | | X | 0 | 0 | 0.01 |
We considered age generalized into 2-year, 5-year, and 10-year intervals.
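Generalizing age into fixed-width intervals could look like the sketch below. The interval alignment (intervals starting at multiples of the width) and the helper name `generalize_age` are assumptions made for illustration; the source does not say how the intervals were anchored.

```python
from collections import Counter


def generalize_age(age, width):
    """Map an exact age to an inclusive (lower, upper) interval of the
    given width, with intervals aligned to multiples of the width."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    lower = (age // width) * width
    return (lower, lower + width - 1)


# Ages at death from the ADE extract shown earlier in this record.
ages = [42, 71, 34, 55, 38, 44, 65]

# Coarser intervals merge more records into the same group, which is
# why larger widths lead to larger equivalence classes.
groups_exact = Counter(ages)
groups_10yr = Counter(generalize_age(a, 10) for a in ages)

print(generalize_age(44, 2), generalize_age(44, 5), generalize_age(44, 10))
print(len(groups_exact), len(groups_10yr))
```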
The minimum size of an equivalence class in a simulated obituary
| | X | X | X | X | X | 263 |
| | X | X | | X | X | 6514 |
| | X | X | | | X | 8459 |
| | 2 yr | X | | | X | 16671 |
| | 5 yr | X | | | X | 39780 |
| | 10 yr | X | | | X | 79619 |
| X | X | X | X | X | X | 253 |
| X | X | X | | X | X | 6486 |
| X | X | X | | | X | 8459 |
| X | 2 yr | X | | | X | 16671 |
| X | 5 yr | X | | | X | 39780 |
| X | 10 yr | X | | | X | 79619 |
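The quantity tabulated here, the smallest equivalence class for a given choice of quasi-identifiers, can be computed along the following lines. This is a small sketch using the obituary extract shown earlier in this record rather than the simulated obituary file; the field names and the helper `min_class_size` are hypothetical.

```python
from collections import Counter

# Obituary extract from earlier in this record:
# (name, age, gender, province, date of death)
obituary = [
    ("John Smith", 44, "M", "Ontario", "23 Oct 2006"),
    ("Alan Black", 44, "M", "Ontario", "23 Oct 2006"),
    ("Hugh Tremblay", 44, "M", "Ontario", "23 Oct 2006"),
    ("Joe White", 44, "M", "Ontario", "23 Oct 2006"),
    ("Mary Lambert", 65, "F", "Quebec", "25 Nov 2004"),
    ("Leslie Long", 77, "F", "British Columbia", "24 Jun 2001"),
]

# Column positions of the candidate quasi-identifiers (assumed names).
FIELDS = {"age": 1, "gender": 2, "province": 3, "date_of_death": 4}


def min_class_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    on the selected quasi-identifiers."""
    idx = [FIELDS[name] for name in quasi_identifiers]
    sizes = Counter(tuple(rec[i] for i in idx) for rec in records)
    return min(sizes.values())


# Fewer quasi-identifiers give coarser groups and a larger minimum.
min_all = min_class_size(obituary, ["age", "gender", "province", "date_of_death"])
min_gender = min_class_size(obituary, ["gender"])
print(min_all, min_gender)
```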
Figure 4. The relationship between M, p, and k.