Gareth Hagger-Johnson, Katie Harron, Harvey Goldstein, Robert Aldridge, Ruth Gilbert.
Abstract
BACKGROUND: The pseudonymisation algorithm used to link together episodes of care belonging to the same patients in England (HESID) has never undergone any formal evaluation to determine the extent of data linkage error.
Keywords: Deterministic record linkage; Evaluation; Hospital discharge; Probabilistic record linkage
Year: 2017; PMID: 28749318; PMCID: PMC6217911; DOI: 10.14236/jhi.v24i2.891
Source DB: PubMed; Journal: J Innov Health Inform; ISSN: 2058-4555
Scenarios in which records left unlinked by deterministic linkage were subsequently linked by probabilistic linkage, ranked from most likely to least likely to be correct
| NHS number | Sex | Date of birth | Local ID within hospital | Postcode | Match weight | n |
|---|---|---|---|---|---|---|
| A | D | A | A | A | 62.02 | 5 |
| A | . | A | A | A | 62.02 | 55 |
| A | D | A | A | . | 45.96 | 10 |
| A | . | A | A | . | 45.96 | 5 |
| A | A | A | D | A | 44.49 | 9 |
| A | D | A | A | D | 44.32 | 5 |
| A | . | A | . | A | 42.75 | 22 |
| . | . | A | A | A | 42.24 | 25 |
| . | D | A | A | A | 42.24 | 14 |
| A | . | A | D | A | 41.32 | 41 |
| A | D | A | D | A | 41.32 | 9 |
| . | A | A | A | . | 29.35 | 1809 |
| A | A | A | D | . | 28.43 | 7 |
| . | A | A | A | D | 27.71 | 722 |
| A | A | A | D | D | 26.79 | 76 |
| . | . | A | A | . | 26.18 | 2 |
| . | D | A | A | . | 26.18 | 3 |
| . | A | A | . | A | 26.14 | 5 |
| D | A | A | A | . | 26.03 | 8 |
| A | . | A | D | . | 25.26 | 2 |
| A | D | A | D | . | 25.26 | 9 |
| . | A | A | D | A | 24.71 | 3642 |
| . | . | A | A | D | 24.54 | 2 |
| . | D | A | A | D | 24.54 | 10 |
| D | A | A | A | D | 24.39 | 5 |
| A | . | A | D | D | 23.62 | 5 |
| A | D | A | D | D | 23.62 | 11 |
| . | . | A | . | A | 22.97 | 1 |
Note. A = identifier agreed; D = identifier disagreed; . = identifier missing.
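The match weights in this table behave additively: every row can be reproduced as the sum of one weight per identifier, with a missing identifier contributing 0 and a sex disagreement scoring the same as a missing sex. The sketch below is a minimal Fellegi–Sunter-style scorer under those assumptions. The per-field values were back-calculated from the rows above rather than taken from the authors' methods, and the field names are illustrative. Date of birth agrees in every row, so only its agreement weight can be recovered.

```python
# Per-field weights back-calculated from the table above (rounded to 2 dp),
# ASSUMING the match weight is a sum of per-field contributions and that a
# missing identifier contributes 0. Field names are illustrative only.
WEIGHTS = {
    "nhs_number": {"A": 19.78, "D": -3.32},
    "sex": {"A": 3.17, "D": 0.00},     # sex-D rows score the same as sex-missing rows
    "date_of_birth": {"A": 6.91},      # never D or missing in the table: D unknown
    "local_id": {"A": 19.27, "D": -1.43},
    "postcode": {"A": 16.06, "D": -1.64},
}
FIELDS = list(WEIGHTS)  # same column order as the table


def match_weight(pattern: str) -> float:
    """Score a pattern such as 'A.AAD' (A = agree, D = disagree, . = missing)."""
    if len(pattern) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} states, got {pattern!r}")
    total = 0.0
    for field, state in zip(FIELDS, pattern):
        if state == ".":
            continue  # missing identifier: no evidence either way
        # Raises KeyError for states the table does not let us recover (e.g. DOB 'D').
        total += WEIGHTS[field][state]
    return round(total, 2)


if __name__ == "__main__":
    print(match_weight("A.AAA"))  # 62.02, as in the second row of the table
```

Ranking candidate pairs by this score reproduces the ordering of the table; a real implementation would estimate the weights from m- and u-probabilities rather than read them off published rows.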
Number (%) of records with missing NHS number or postcode by birth year (inpatient hospital episodes from 1998 to 2015; records with a day of birth on the 13th or 28th of each month in the birth years shown)
| Record characteristics | 1992 n | 1992 NHS number (%) | 1992 postcode (%) | 1998 n | 1998 NHS number (%) | 1998 postcode (%) | 2005 n | 2005 NHS number (%) | 2005 postcode (%) | 2012 n | 2012 NHS number (%) | 2012 postcode (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall | 100,443 | 43.8 | 3.8 | 120,470 | 4.6 | 12.3 | 106,450 | 5.3 | 30.7 | 89,896 | 0.7 | 47.3 |
| Sex | | | | | | | | | | | | |
| Missing | 138 | 15.9 | 1.4 | 166 | 66.3 | 16.9 | 27 | 55.6 | 33.3 | 19 | 10.5 | 57.9 |
| Male | 60,285 | 4.2 | 3.8 | 54,779 | 37.6 | 13.0 | 47,939 | 5.5 | 33.0 | 40,849 | 0.7 | 50.4 |
| Female | 40,020 | 7.7 | 3.9 | 65,525 | 35.6 | 11.7 | 58,484 | 4.9 | 28.7 | 49,028 | 0.7 | 44.7 |
| Ethnicity | | | | | | | | | | | | |
| Missing | 18,988 | 11.3 | 3.9 | 48,098 | 53.5 | 16.9 | 19,938 | 8.5 | 47.5 | 7,966 | 2.1 | 40.6 |
| White | 70,455 | 3.7 | 4.0 | 59,977 | 24.0 | 9.3 | 67,887 | 4.2 | 26.1 | 62,037 | 0.5 | 47.6 |
| Mixed | 1,264 | 4.4 | 4.0 | 1,023 | 4.7 | 2.1 | 2,755 | 4.3 | 26.7 | 3,876 | 0.6 | 51.6 |
| Asian | 4,643 | 7.8 | 2.6 | 5,529 | 25.8 | 7.2 | 8,473 | 4.1 | 28.6 | 9,027 | 0.6 | 49.4 |
| Black | 3,103 | 6.9 | 2.2 | 2,909 | 33.2 | 6.6 | 4,665 | 6.4 | 32.0 | 3,963 | 1.2 | 50.0 |
| Chinese/Other | 1,990 | 14.2 | 2.7 | 2,934 | 49.6 | 16.1 | 2,732 | 7.4 | 29.3 | 3,027 | 1.6 | 44.1 |
| Deprivation (from postcode) | | | | | | | | | | | | |
| Missing | 3,836 | 8.2 | | 14,784 | 61.7 | | 32,644 | 6.5 | | 42,523 | 0.4 | |
| Low deprivation | 66,956 | 5.1 | | 75,884 | 32.3 | | 51,755 | 4.1 | | 33,525 | 0.8 | |
| High deprivation | 28,921 | 5.3 | | 29,438 | 34.3 | | 21,613 | 4.6 | | 13,601 | 0.9 | |
| Foreign | 620 | 57.6 | | 341 | 80.6 | | 324 | 78.7 | | 214 | 43.5 | |
| No fixed abode | 110 | 21.8 | | 23 | 39.1 | | 114 | 14.0 | | 33 | 0.0 | |
| Episode type | | | | | | | | | | | | |
| Other episode | | | | 83,356 | 17.1 | 2.8 | 66,454 | 4.5 | 2.7 | 45,200 | 1.1 | 2.5 |
| Birth episode | | | | 37,114 | 80.1 | 32.1 | 39,996 | 6.3 | 77.0 | 44,696 | 0.3 | 92.6 |
Note. Missing postcode data refer to postcodes that were missing after excluding invalid or communal postcodes and postcodes denoting ‘no fixed abode’ or foreign patients. The ‘mixed’ ethnic group was not recorded until April 2001, but it appears in the 1992 and 1998 cohorts when ethnicity was taken from episodes from 2001 onwards. Proportions of missing data for local patient ID within provider are very small (<0.1%) and are not shown. High deprivation is defined as the most deprived quintile of the Index of Multiple Deprivation 2004 (IMD2004).
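Because the rows within each group (sex, ethnicity, deprivation, episode type) sum to the cohort size, each Overall percentage should equal the n-weighted average of that group's percentages. A minimal check, using the 2012 cohort split by episode type; the function name is illustrative, and because subgroup percentages are rounded to one decimal place, reconstructed totals can differ from the Overall row by about 0.1.

```python
def pooled_percentage(groups):
    """n-weighted average of subgroup percentages: sum(n_i * p_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    if total_n == 0:
        raise ValueError("groups must contain at least one record")
    return sum(n * pct for n, pct in groups) / total_n


# 2012 birth cohort by episode type: (n, % missing), from the table above.
postcode_2012 = [(45_200, 2.5), (44_696, 92.6)]  # other episode, birth episode
nhs_2012 = [(45_200, 1.1), (44_696, 0.3)]

print(round(pooled_percentage(postcode_2012), 1))  # 47.3, matching the Overall row
print(round(pooled_percentage(nhs_2012), 1))       # 0.7, matching the Overall row
```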
Variation in odds ratios (95% CIs) and percentage bias for demographic risk factors for hospital readmission within one year according to HESID, comparing data linkage algorithms
| Characteristic | Reference standard: n | Reference standard: odds ratio | Deterministic: n | Deterministic: odds ratio | Deterministic: bias | Det+Probabilistic: n | Det+Probabilistic: odds ratio | Det+Probabilistic: bias |
|---|---|---|---|---|---|---|---|---|
| Total n | 175,773 | | 181,395 | | | 176,990 | | |
| 1-year readmission rate | | 18.7% | | 18.4% | | | 18.7% | |
| Sex | | | | | | | | |
| Male | 85,887 | 1.08 | 88,243 | 1.07 | 6% | 90,303 | 1.07 | 2% |
| Female | 89,710 | (reference) | 92,818 | (reference) | | 86,484 | (reference) | |
| Age group (years) | | | | | | | | |
| 0 to 3 | 119,742 | 3.39 | 123,712 | 2.88 | 13% | 120,186 | 3.40 | 0% |
| 4 to 7 | 14,856 | (reference) | 15,510 | (reference) | | 15,137 | (reference) | |
| 8 to 11 | 10,787 | 1.08 | 11,306 | 0.99 | 119% | 10,945 | 1.07 | 15% |
| 12 to 15 | 8,920 | 0.94 | 9,186 | 0.83 | | 9,049 | 0.93 | |
| 16 to 19 | 10,080 | 1.19 | 10,239 | 1.04 | 77% | 10,243 | 1.18 | 6% |
| 20 to 23 | 11,388 | 1.31 | 11,442 | 1.15 | 50% | 11,430 | 1.31 | 1% |
| Ethnicity | | | | | | | | |
| Missing | 44,733 | 0.52 | 47,518 | 0.51 | −1% | 45,044 | 0.52 | 1% |
| White | 103,831 | (reference) | 106,070 | (reference) | | 104,603 | (reference) | |
| Mixed | 3,978 | 0.96 | 4,005 | 0.98 | | 3,989 | 0.97 | |
| Asian | 11,481 | 0.98 | 11,731 | 0.98 | | 11,543 | 0.99 | |
| Black | 6,457 | 0.78 | 6,557 | 0.81 | 13% | 6,491 | 0.80 | 9% |
| Chinese/Other | 5,293 | 0.73 | 5,514 | 0.72 | −3% | 5,320 | 0.73 | 3% |
| Deprivation (from postcode) | | | | | | | | |
| Missing | 50,322 | 0.04 | 51,746 | 0.03 | −1% | 50,546 | 0.04 | 0% |
| Not deprived | 89,966 | (reference) | 92,727 | (reference) | | 90,631 | (reference) | |
| Deprived | 34,563 | 1.17 | 35,749 | 1.16 | 9% | 34,884 | 1.17 | 2% |
| No fixed abode | 782 | 0.58 | 1,031 | 0.40 | −70% | 787 | 0.60 | 7% |
| Foreign | 140 | 0.42 | 142 | 0.42 | 2% | 142 | 0.47 | 14% |
| Data year | | | | | | | | |
| 1998 to 2003 | 49,456 | (reference) | 53,463 | (reference) | | 49,895 | (reference) | |
| 2004 to 2009 | 50,348 | 1.83 | 51,429 | 1.92 | −7% | 50,751 | 1.82 | 1% |
| 2010 to 2015 | 75,969 | 2.08 | 76,503 | 2.20 | −7% | 76,344 | 2.08 | 0% |
Bias: the percentage by which the log odds coefficient in each model is over- or under-estimated compared with the reference standard model, calculated as $100 \times (\text{logit}_{\text{reference}} - \text{logit}_{\text{comparison}}) / \text{logit}_{\text{reference}}$; shown where the subgroup has a significantly increased risk of readmission within one year.
Models exclude records where sex is missing (n = 334 after deterministic matching, 203 after probabilistic matching and 176 for the reference standard).
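The bias formula can be expressed as a small function. This sketch (function name illustrative) works from the odds ratios printed in the table, which are rounded to two decimals; the published bias values were presumably computed from unrounded coefficients, so recomputed values can differ (for example, the age 20 to 23 row gives about 48% rather than 50%).

```python
import math


def percentage_bias(or_reference: float, or_comparison: float) -> float:
    """100 * (b_ref - b_comp) / b_ref, where b is the log odds ratio (model coefficient).

    Positive: the comparison coefficient is smaller than the reference in the
    reference's direction (attenuated). Negative: it is larger (exaggerated).
    """
    b_ref = math.log(or_reference)
    if b_ref == 0:
        raise ValueError("bias is undefined when the reference odds ratio is 1")
    b_comp = math.log(or_comparison)
    return 100 * (b_ref - b_comp) / b_ref


# Deterministic linkage vs reference standard, using rounded odds ratios from the table
print(round(percentage_bias(3.39, 2.88)))  # age 0 to 3: 13
print(round(percentage_bias(1.19, 1.04)))  # age 16 to 19: 77
print(round(percentage_bias(0.58, 0.40)))  # no fixed abode: -68 (table: -70%)
```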
Percentage (95% CIs) of records classified as missed matches compared with the reference standard, following deterministic and probabilistic data linkage
| Characteristic | 1998–2003 deterministic | 1998–2003 probabilistic | 2004–2009 deterministic | 2004–2009 probabilistic | 2010–2015 deterministic | 2010–2015 probabilistic |
|---|---|---|---|---|---|---|
| Links | 58,768 | 62,941 | 100,282 | 101,086 | 153,226 | 153,547 |
| Specificity | 0.985 | 0.981 | 0.987 | 0.996 | 0.998 | 0.997 |
| Sensitivity | 0.914 | 0.976 | 0.993 | 0.994 | 0.997 | 0.999 |
| Missed match % | 8.6 (8.4, 8.8) | 2.4 (2.2, 2.5) | 1.3 (1.2, 1.3) | 0.4 (0.4, 0.5) | 0.4 (0.3, 0.4) | 0.1 (0.1, 0.1) |
| Age group (years) | | | | | | |
| 0–3 | 12.1 (11.8, 12.5) | 3.2 (3.1, 3.4) | 1.4 (1.3, 1.4) | 0.4 (0.3, 0.4) | 0.2 (0.2, 0.3) | 0.0 (0.0, 0.1) |
| 4–7 | 3.8 (3.4, 4.1) | 1.3 (1.1, 1.5) | 1.7 (1.5, 2.0) | 0.8 (0.6, 1.0) | 0.4 (0.3, 0.5) | 0.2 (0.1, 0.2) |
| 8–11 | 2.2 (1.9, 2.4) | 0.6 (0.4, 0.7) | 1.7 (1.5, 2.0) | 0.7 (0.5, 0.9) | 0.3 (0.2, 0.4) | 0.1 (0.0, 0.1) |
| 12–15 | | | 0.7 (0.5, 0.8) | 0.2 (0.1, 0.3) | 0.8 (0.7, 1.0) | 0.5 (0.4, 0.6) |
| 16–19 | | | 0.6 (0.5, 0.7) | 0.1 (0.0, 0.2) | 0.6 (0.5, 0.7) | 0.3 (0.2, 0.3) |
| 20–23 | | | | | 0.2 (0.2, 0.3) | 0.1 (0.0, 0.1) |
| Sex | | | | | | |
| Missing | 100.0 (all missed) | 21.0 (14.6, 26.5) | 100.0 (all missed) | 40.0 (9.6, 58.3) | 100.0 (all missed) | 16.7 (1.8, 21.9) |
| Male | 8.6 (8.3, 9.0) | 2.4 (2.3, 2.6) | 1.3 (1.2, 1.4) | 0.4 (0.3, 0.5) | 0.3 (0.3, 0.3) | 0.1 (0.1, 0.1) |
| Female | 8.2 (7.9, 8.5) | 2.2 (2.1, 2.4) | 1.3 (1.2, 1.4) | 0.4 (0.4, 0.5) | 0.4 (0.3, 0.4) | 0.1 (0.1, 0.2) |
| Ethnicity | | | | | | |
| Missing | 10.4 (10.1, 10.8) | 2.7 (2.6, 2.9) | 2.2 (2.0, 2.4) | 0.6 (0.5, 0.7) | 0.9 (0.7, 1.0) | 0.2 (0.1, 0.3) |
| White | 6.8 (6.5, 7.1) | 2.0 (1.8, 2.1) | 1.0 (0.9, 1.0) | 0.4 (0.3, 0.4) | 0.3 (0.2, 0.3) | 0.1 (0.1, 0.1) |
| Mixed | 3.7 (0.1, 4.4) | 0.9 (−0.9, 1.8) | 1.0 (0.6, 1.3) | 0.3 (0.1, 0.4) | 0.2 (0.1, 0.3) | 0.2 (0.1, 0.3) |
| Asian | 7.0 (6.0, 8.0) | 2.2 (1.6, 2.6) | 1.0 (0.8, 1.2) | 0.3 (0.2, 0.4) | 0.6 (0.4, 0.7) | 0.2 (0.1, 0.2) |
| Black | 7.0 (5.5, 8.3) | 2.3 (1.4, 2.9) | 1.4 (1.0, 1.7) | 0.4 (0.2, 0.6) | 0.4 (0.2, 0.5) | 0.1 (0.0, 0.2) |
| Chinese/Other | 11.7 (10.1, 13.2) | 2.7 (1.8, 3.3) | 4.4 (3.5, 5.3) | 0.6 (0.3, 0.9) | 0.9 (0.6, 1.1) | 0.1 (0.0, 0.2) |
| Deprivation (from postcode) | | | | | | |
| Missing | 26.4 (25.2, 27.6) | 4.3 (3.8, 4.9) | 1.7 (1.5, 1.8) | 0.5 (0.4, 0.6) | 0.2 (0.1, 0.2) | 0.1 (0.0, 0.1) |
| Low deprivation | 7.1 (6.8, 7.3) | 2.2 (2.1, 2.4) | 1.0 (0.9, 1.1) | 0.4 (0.3, 0.4) | 0.3 (0.2, 0.3) | 0.1 (0.1, 0.1) |
| High deprivation | 6.8 (6.4, 7.1) | 2.2 (2.0, 2.4) | 1.1 (1.0, 1.3) | 0.5 (0.4, 0.5) | 0.3 (0.2, 0.3) | 0.2 (0.1, 0.2) |
| Foreign | 69.4 (61.2, 78.1) | 66.3 (59.8, 73.1) | 1.5 (−0.2, 3.2) | 31.6 (27.7, 35.4) | 0.8 (0.0, 0.9) | |
| No fixed abode | 8.7 (2.0, 12.0) | 0.0 (none missed) | | | | |
Note. The number of records in each category is shown in Table 6.
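Missed and false matches can be counted by comparing the set of record pairs linked by an algorithm with the set linked in the reference standard. A minimal sketch, assuming missed match % = reference links the algorithm did not make ÷ all reference links, and false match % = algorithm links absent from the reference ÷ all algorithm links; this section does not spell out the exact denominators, and the function names and record IDs are hypothetical. The interval is a simple normal (Wald) approximation. The source does not say which interval method was used, although a Wald interval can produce negative lower bounds like those in the Mixed and Foreign rows.

```python
import math


def pair(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for a link between two records."""
    return (a, b) if a <= b else (b, a)


def missed_match_pct(reference: set, algorithm: set) -> float:
    """% of reference-standard links that the algorithm failed to make."""
    if not reference:
        raise ValueError("reference standard has no links")
    return 100 * len(reference - algorithm) / len(reference)


def false_match_pct(reference: set, algorithm: set) -> float:
    """% of algorithm links that do not appear in the reference standard."""
    if not algorithm:
        raise ValueError("algorithm made no links")
    return 100 * len(algorithm - reference) / len(algorithm)


def wald_ci(pct: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% CI for a percentage estimated from n links."""
    p = pct / 100
    half = z * math.sqrt(p * (1 - p) / n)
    return 100 * (p - half), 100 * (p + half)


# Hypothetical record IDs: r5-r6 and r7-r8 are missed; r5-r9 is a false match.
reference = {pair("r1", "r2"), pair("r3", "r4"), pair("r5", "r6"), pair("r7", "r8")}
algorithm = {pair("r2", "r1"), pair("r3", "r4"), pair("r5", "r9")}

missed = missed_match_pct(reference, algorithm)
print(missed, wald_ci(missed, len(reference)))
print(false_match_pct(reference, algorithm))
```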
False and missed matches under different thresholds for probabilistic matching
| Records | Measure | Relaxed | Middle (as in main results) | Strict |
|---|---|---|---|---|
| Overall | False matches | 1.8% | 1.8% | 1.4% |
| Overall | Missed matches | 0.6% | 0.7% | 1.6% |
| 1998–2003 | False matches | 3.7% | 3.6% | 2.5% |
| 1998–2003 | Missed matches | 2.3% | 2.4% | 6.8% |
| 2004–2009 | False matches | 1.2% | 1.1% | 1.2% |
| 2004–2009 | Missed matches | 0.4% | 0.4% | 0.8% |
| 2010–2015 | False matches | 0.4% | 0.4% | 0.4% |
| 2010–2015 | Missed matches | 0.1% | 0.1% | 0.2% |
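The threshold table reflects a trade-off: raising the match-weight cut-off removes false matches but leaves more true matches unlinked. The sketch below makes that concrete for a list of scored candidate pairs with known true-match status. The weights loosely echo values from the match-weight table, but the true/false labels and the three cut-offs (20, 25, 40) are hypothetical; the actual thresholds are not given in this section, and the false/missed definitions are assumptions.

```python
def evaluate_threshold(pairs, threshold):
    """Accept pairs with weight >= threshold; return (false match %, missed match %).

    pairs: (match_weight, is_true_match) tuples, with truth from a reference standard.
    false match % = false links / accepted links
    missed match % = true matches not accepted / all true matches
    """
    accepted = [is_true for weight, is_true in pairs if weight >= threshold]
    true_total = sum(1 for _, is_true in pairs if is_true)
    false_links = sum(1 for is_true in accepted if not is_true)
    true_links = len(accepted) - false_links
    false_pct = 100 * false_links / len(accepted) if accepted else 0.0
    missed_pct = 100 * (true_total - true_links) / true_total if true_total else 0.0
    return false_pct, missed_pct


# Hypothetical scored candidate pairs: (match weight, is a true match?)
pairs = [
    (62.02, True), (45.96, True), (29.35, True), (27.71, False),
    (26.18, True), (24.71, False), (22.97, True), (12.00, False),
]

for label, cutoff in [("relaxed", 20), ("middle", 25), ("strict", 40)]:
    false_pct, missed_pct = evaluate_threshold(pairs, cutoff)
    print(f"{label}: false {false_pct:.1f}%, missed {missed_pct:.1f}%")
```

With these toy pairs the strict cut-off eliminates false matches but misses three of the five true matches, mirroring the direction (though not the size) of the 1998–2003 rows.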
Missing data on NHS number or postcode by data year (1998–2003, n = 99,220; 2004–2009, n = 128,666; 2010–2015, n = 189,373)
| Record characteristics | 1998–2003 n | 1998–2003 NHS number missing (%) | 1998–2003 postcode missing (%) | 2004–2009 n | 2004–2009 NHS number missing (%) | 2004–2009 postcode missing (%) | 2010–2015 n | 2010–2015 NHS number missing (%) | 2010–2015 postcode missing (%) |
|---|---|---|---|---|---|---|---|---|---|
| Age group (years) | | | | | | | | | |
| 0–3 | 68,225 | 61.2 | 19.0 | 77,270 | 6.5 | 41.3 | 89,896 | 0.7 | 47.3 |
| 4–7 | 17,774 | 17.5 | 2.6 | 13,112 | 4.5 | 2.9 | 13,435 | 1.7 | 2.8 |
| 8–11 | 13,221 | 10.8 | 4.0 | 11,833 | 4.4 | 4.1 | 10,309 | 1.2 | 2.0 |
| 12–15 | | | | 15,370 | 4.7 | 4.0 | 14,144 | 1.5 | 3.7 |
| 16–19 | | | | 11,081 | 3.4 | 3.8 | 24,304 | 1.9 | 4.0 |
| 20–23 | | | | | | | 37,285 | 1.8 | 3.9 |
| Sex | | | | | | | | | |
| Missing | 278 | 46.8 | 10.4 | 27 | 59.3 | 37.0 | 45 | 6.7 | 24.4 |
| Male | 43,996 | 48.7 | 15.2 | 60,181 | 5.9 | 27.3 | 99,675 | 1.1 | 22.8 |
| Female | 54,946 | 45.0 | 13.1 | 68,458 | 5.3 | 25.4 | 89,653 | 1.4 | 26.0 |
| Ethnicity | | | | | | | | | |
| Missing | 51,188 | 52.9 | 16.2 | 26,669 | 7.9 | 36.4 | 17,133 | 3.1 | 21.0 |
| White | 40,263 | 37.8 | 12.0 | 83,078 | 4.6 | 22.4 | 137,015 | 0.8 | 23.5 |
| Mixed | 142 | 16.9 | 2.8 | 2,698 | 5.5 | 27.2 | 6,078 | 1.2 | 34.0 |
| Asian | 3,399 | 42.1 | 8.6 | 8,812 | 6.0 | 28.2 | 15,461 | 1.5 | 29.8 |
| Black | 1,807 | 55.0 | 6.9 | 4,804 | 7.9 | 31.1 | 8,029 | 1.9 | 26.3 |
| Chinese/Other | 2,421 | 62.4 | 17.1 | 2,605 | 10.1 | 30.2 | 5,657 | 3.8 | 25.8 |
| Deprivation (from postcode) | | | | | | | | | |
| Missing | 13,937 | 65.7 | | 33,808 | 6.7 | | 46,042 | 0.7 | |
| Low deprivation | 59,914 | 43.5 | | 67,485 | 4.8 | | 100,721 | 1.0 | |
| High deprivation | 25,122 | 43.2 | | 26,851 | 5.0 | | 41,600 | 1.3 | |
| Foreign | 236 | 80.9 | | 401 | 81.0 | | 862 | 53.8 | |
| No fixed abode | 11 | 45.5 | | 121 | 20.7 | | 148 | 12.8 | |
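The percentages in this table are per-group proportions of records with a missing identifier. A minimal sketch of that tabulation, assuming the age groups are four-year bands of age in whole years (as the labels suggest) and treating None or an empty string as missing; the function names and the record fields `age` and `nhs_number` are hypothetical.

```python
from collections import defaultdict


def age_band(age_years: int, width: int = 4) -> str:
    """Map an age in whole years to a band label such as '0–3' or '12–15'."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    lower = (age_years // width) * width
    return f"{lower}–{lower + width - 1}"


def percent_missing_by_group(records, group_key, field):
    """Return {group: (n, % of records whose `field` is None or empty)}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, n_missing]
    for rec in records:
        group = group_key(rec)
        counts[group][0] += 1
        if rec.get(field) in (None, ""):
            counts[group][1] += 1
    return {g: (n, 100 * miss / n) for g, (n, miss) in counts.items()}


# Hypothetical episode records
records = [
    {"age": 1, "nhs_number": None},
    {"age": 2, "nhs_number": "x"},
    {"age": 5, "nhs_number": "y"},
    {"age": 6, "nhs_number": ""},
    {"age": 7, "nhs_number": "z"},
]
result = percent_missing_by_group(records, lambda r: age_band(r["age"]), "nhs_number")
for band, (n, pct) in sorted(result.items()):
    print(band, n, round(pct, 1))
```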