Gareth Hagger-Johnson, Katie Harron, Tom Fleming, Ruth Gilbert, Harvey Goldstein, Rebecca Landy, Roger C Parslow.
Abstract
OBJECTIVES: Our aim was to estimate the rate of data linkage error in Hospital Episode Statistics (HES) by testing the HESID pseudoanonymisation algorithm against a reference standard, in a national registry of paediatric intensive care records.
Keywords: EPIDEMIOLOGY; STATISTICS & RESEARCH METHODS; data linkage
Year: 2015 PMID: 26297363 PMCID: PMC4550723 DOI: 10.1136/bmjopen-2015-008118
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Data linkage errors following application of the Hospital Episode Statistics ID (HESID) algorithm to Paediatric Intensive Care Audit Network (PICANet) records.
Scenarios involving different combinations of patient identifiers that resulted in false matches (n=176, 0.1%)
| Sex (100% valid) | Date of birth (100% valid) | NHS number (58.5% valid) | Hospital (100% valid) | Local ID (100% valid) | Postcode (98.3% valid) | False matches n (%) | Subtotal n (%) |
|---|---|---|---|---|---|---|---|
| Scenarios involving NHS numbers that differ | |||||||
| 1 | 1 | 0* | 0 | 0 | 0 | 20 (11.8) | |
| 1 | 1 | 0* | 0 | 0 | 1 | 70 (41.4) | |
| 1 | 1 | 0* | 1 | 0 | 0 | 1 (0.6) | |
| 1 | 1 | 0* | 1 | 0 | 1 | 39 (23.8) | |
| 1 | 1 | 0* | 1 | 1 | 1 | 1 (0.6) | =131 (74.4%) |
| Scenarios involving NHS numbers that match | |||||||
| 1 | 1 | 1 | 0 | 0 | 0 | 33 (19.5) | |
| 1 | 1 | 1 | 1 | 0 | 0 | 4 (2.4) | |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 (0.6) | =38 (21.6%) |
| Other scenarios† | | | | | | 7 (4.0) | =7 (5%) |
1=identifier is the same, 0=identifier is different or missing.
*Records with missing NHS numbers are linked by the HESID algorithm, unless a different NHS number is present.
†These scenarios occur when a record has an NHS number (and can therefore link to other records) but is compared to an initial record where the NHS number is missing and the postcode is different. The initial record itself can be linked to other records that have the same postcode.
HESID, Hospital Episode Statistics ID; NHS, National Health Service.
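The 1/0 coding used in the table above can be reproduced with a small helper. The following is a minimal sketch, not the HESID algorithm itself: the field names (`sex`, `dob`, `nhs_number`, `hospital`, `local_id`, `postcode`), the example records and the function `agreement_pattern` are hypothetical. It only applies the legend's rule that an identifier scores 1 when both values are present and equal, and 0 when they differ or either is missing.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical field names, listed in the column order of the table above.
IDENTIFIERS = ("sex", "dob", "nhs_number", "hospital", "local_id", "postcode")


def agreement_pattern(
    a: Mapping[str, Optional[str]], b: Mapping[str, Optional[str]]
) -> Tuple[int, ...]:
    """Return one 1/0 flag per identifier: 1 = same, 0 = different or missing."""
    pattern = []
    for field in IDENTIFIERS:
        left, right = a.get(field), b.get(field)
        same = left is not None and right is not None and left == right
        pattern.append(1 if same else 0)
    return tuple(pattern)


# Two made-up records: same sex, date of birth and postcode, but one NHS
# number is missing and the hospital and local ID differ.
rec_a = {"sex": "M", "dob": "2010-03-01", "nhs_number": None,
         "hospital": "H1", "local_id": "A17", "postcode": "LS1 1AA"}
rec_b = {"sex": "M", "dob": "2010-03-01", "nhs_number": "9999999999",
         "hospital": "H2", "local_id": "B42", "postcode": "LS1 1AA"}

print(agreement_pattern(rec_a, rec_b))  # (1, 1, 0, 0, 0, 1)
```

A pair with the pattern `(1, 1, 0, 0, 0, 1)` corresponds to the second row of the table, the most common false-match scenario (n=70).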
Scenarios involving different combinations of patient identifiers that resulted in missed matches (where sex and date of birth match)
| Sex | Date of birth | NHS number | Hospital | Local ID | Postcode | Missed matches (% of 3609) | Subtotal n (%) |
|---|---|---|---|---|---|---|---|
| Scenarios where NHS number differs and postcode differs | |||||||
| 1 | 1 | 0 | 1 | 1 | 0 | 840 (23.3) | |
| 1 | 1 | 0 | 1 | 0 | 0 | 43 (1.2) | |
| 1 | 1 | 0 | 0 | 1 | 0 | 33 (0.9) | |
| 1 | 1 | 0 | 0 | 0 | 0 | 316 (8.8) | =1232 (34.1%) |
| Scenarios where NHS number differs and postcode is the same | |||||||
| 1 | 1 | 0 | 1 | 1 | 1 | 15 (0.4) | |
| 1 | 1 | 0 | 1 | 0 | 1 | 17 (0.5) | |
| 1 | 1 | 0 | 0 | 0 | 1 | 5 (0.1) | =37 (1%) |
| Other scenarios (see online supplementary table S1) | | | | | | | =2340 (64.8%) |
NHS, National Health Service.
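The subtotals in the missed-match table can be checked with a few lines of arithmetic. The sketch below uses only the counts shown in the table and the stated denominator of 3609; the group labels and the helper `subtotal_and_percent` are illustrative names, not part of the source.

```python
TOTAL_MISSED = 3609  # denominator given in the table header

# Row counts copied from the table, grouped by scenario block.
groups = {
    "NHS number differs, postcode differs": [840, 43, 33, 316],
    "NHS number differs, postcode same": [15, 17, 5],
    "Other scenarios": [2340],
}


def subtotal_and_percent(counts, total):
    """Sum a block of row counts and express the sum as a % of `total`."""
    subtotal = sum(counts)
    return subtotal, round(100 * subtotal / total, 1)


summary = {
    name: subtotal_and_percent(counts, TOTAL_MISSED)
    for name, counts in groups.items()
}

for name, (n, pct) in summary.items():
    print(f"{name}: {n} ({pct}%)")

# The three blocks together account for every missed match.
print(sum(n for n, _ in summary.values()))
```

The three subtotals (1232, 37 and 2340) sum to 3609, so the blocks partition the missed matches without overlap.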
Characteristics of linkage error for different patient groups
| Characteristic | Matched patients: at least one false match (n=115), n (% of total) | Matched patients: all correct (n=25 671) | Matched patients: p Value | Matched patients: total (n=25 786) | Non-matched patients: at least one missed match (n=1554), n (% of total) | Non-matched patients: all correct (n=84 987) | Non-matched patients: p Value | Non-matched patients: total (n=86 541) |
|---|---|---|---|---|---|---|---|---|
| Age group | ||||||||
| <1 month | 14 (0.05) | 7749 | (ref) | 7763 | 422 (0.49) | 13 215 | (ref) | 13 637 |
| 1–12 months | 48 (0.19) | 8060 | | 8108 | 509 (0.59) | 23 268 | | 23 777 |
| 1–4 years | 39 (0.15) | 4487 | | 4526 | 315 (0.36) | 20 849 | | 21 164 |
| 5–10 years | 12 (0.05) | 2611 | | 2623 | 177 (0.20) | 12 149 | | 12 326 |
| 11+ years | 2 (0.01) | 2764 | 0.52* | 2766 | 126 (0.15) | 15 405 | <0.001* | 15 531 |
| Missing | 0 | 0 | – | 0 | 5 (0.01) | 101 | 0.03 | 106 |
| Sex | ||||||||
| Male | 81 (0.31) | 14 711 | (ref) | 14 792 | 862 (1.00) | 47 567 | (ref) | 48 429 |
| Female | 34 (0.13) | 10 960 | 0.01 | 10 994 | 674 (0.78) | 37 281 | 0.96 | 37 955 |
| Missing | 0 | 0 | | 0 | 18 (0.02) | 139 | <0.001 | 157 |
| Ethnic group | ||||||||
| White | 34 (0.13) | 15 150 | (ref) | 15 184 | 562 (0.65) | 52 590 | (ref) | 53 152 |
| Mixed | 2 (0.01) | 576 | 0.55 | 578 | 25 (0.03) | 1804 | 0.93 | 1829 |
| Asian | 10 (0.04) | 2340 | 0.07 | 2350 | 89 (0.10) | 6394 | 0.67 | 6483 |
| Black | 11 (0.04) | 1021 | <0.001 | 1032 | 69 (0.08) | 3219 | 0.001 | 3288 |
| Other | 8 (0.03) | 579 | <0.001 | 587 | 114 (0.13) | 1905 | <0.001 | 2019 |
| Missing | 50 (0.19) | 6005 | <0.001 | 6055 | 695 (0.08) | 19 075 | <0.001 | 19 770 |
| Admission | ||||||||
| Planned | 42 (0.16) | 10 012 | | 10 054 | 673 (0.78) | 32 439 | (ref) | 33 112 |
| Unplanned | 73 (0.28) | 15 659 | 0.59 | 15 732 | 881 (1.02) | 52 548 | <0.001 | 53 429 |
| Deprivation | ||||||||
| Low | 32 (0.12) | 7083 | (ref) | 7115 | 270 (0.31) | 20 314 | (ref) | 20 584 |
| Middle | 38 (0.15) | 6861 | | 6899 | 311 (0.36) | 21 194 | | 21 505 |
| High | 25 (0.10) | 6786 | 0.48* | 6811 | 254 (0.29) | 22 200 | 0.09* | 22 454 |
| Missing† | 20 (0.08) | 4941 | 0.62 | 4961 | 719 (0.83) | 21 279 | <0.001 | 21 998 |
| Provider size | ||||||||
| Small | 47 (0.18) | 8900 | (ref) | 8947 | 535 (0.62) | 31 102 | (ref) | 31 637 |
| Medium | 24 (0.09) | 7275 | | 7299 | 606 (0.70) | 27 026 | | 27 632 |
| Large | 44 (0.17) | 9496 | 0.53* | 9540 | 413 (0.48) | 26 859 | 0.18* | 27 272 |
*p Values for linear trend across groups (univariate).
†Typically due to missing postcode data.
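The bracketed percentages in the table above appear to use the column total as the denominator (for example, 48 of 25 786 matched patients, 0.19%), not the size of the row's own group. The sketch below uses the age-group counts for matched patients from the table to show both calculations. The within-group rate is an extra illustration and is not a figure reported in the source; the function names are hypothetical.

```python
MATCHED_TOTAL = 25_786  # total matched patients, from the table header

# (patients with at least one false match, patients with all matches correct)
age_groups = {
    "<1 month": (14, 7749),
    "1-12 months": (48, 8060),
    "1-4 years": (39, 4487),
    "5-10 years": (12, 2611),
    "11+ years": (2, 2764),
}


def percent_of_total(n_false, total=MATCHED_TOTAL):
    """Share of ALL matched patients, as printed in brackets in the table."""
    return round(100 * n_false / total, 2)


def within_group_rate(n_false, n_correct):
    """Share of the age group's own patients (not reported in the source)."""
    return round(100 * n_false / (n_false + n_correct), 2)


for name, (n_false, n_correct) in age_groups.items():
    print(name, percent_of_total(n_false), within_group_rate(n_false, n_correct))
```

Reading the brackets as percentages of the column total explains why they are small even in groups where the error count is relatively high for the group's size.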
OR (95% CIs) for patients having at least one false (n=82) or missed (n=2499) match
| Characteristic | At least one false match, OR (95% CI) | p Value | At least one missed match, OR (95% CI) | p Value |
|---|---|---|---|---|
| Age | 0.95 (0.90 to 0.99) | 0.03 | 0.93 (0.92 to 0.94) | <0.001 |
| Male | 1.77 (1.18 to 2.65) | 0.006 | 0.98 (0.89 to 1.09) | 0.75 |
| Unplanned admissions | 0.93 (0.63 to 1.38) | 0.71 | 1.03 (0.92 to 1.16) | 0.61 |
| Provider medium size (vs small) | 0.59 (0.25 to 1.40) | 0.23 | 0.80 (0.41 to 1.58) | 0.52 |
| Provider large size (vs small) | 0.78 (0.31 to 1.93) | 0.59 | 0.67 (0.30 to 1.49) | 0.32 |
| Deprivation=missing (vs low) | 1.25 (0.68 to 2.30) | 0.48 | 3.10 (2.65 to 3.62) | <0.001 |
| Deprivation=medium (vs low) | 1.26 (0.77 to 2.05) | 0.35 | 0.91 (0.77 to 1.08) | 0.30 |
| Deprivation=high (vs low) | 0.99 (0.57 to 1.73) | 0.98 | 0.79 (0.66 to 0.95) | 0.01 |
| Ethnic group=missing (vs White) | 1.25 (0.30 to 5.26) | 0.76 | 0.92 (0.61 to 1.39) | 0.69 |
| Ethnic group=Mixed (vs White) | 1.56 (0.75 to 3.25) | 0.24 | 1.17 (0.92 to 1.48) | 0.20 |
| Ethnic group=Asian (vs White) | 3.16 (1.51 to 6.62) | 0.02 | 1.35 (1.03 to 1.77) | 0.03 |
| Ethnic group=Black (vs White) | 4.12 (1.81 to 9.38) | 0.001 | 3.59 (2.84 to 4.53) | <0.001 |
| Ethnic group=Other (vs White) | 3.23 (1.93 to 5.39) | <0.001 | 2.38 (2.07 to 2.73) | <0.001 |
| Between-hospital variance*, B (95% CI) | 0.75 (0.47 to 1.22) | | 0.80 (0.61 to 1.04) | |
*SD of the hospital-level effect.
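As a rough consistency check on the table above, a p value can be approximated from an odds ratio and its 95% CI. This is a minimal sketch under a stated assumption: that the intervals are Wald intervals, symmetric on the log-odds scale, which the source does not confirm. The function name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    # Recover the standard error of log(OR) from the width of the interval.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Male sex and false matches: OR 1.77 (1.18 to 2.65), reported p = 0.006.
p_male = p_from_or_ci(1.77, 1.18, 2.65)
print(round(p_male, 3))
```

For the male-sex row the approximation comes out close to the reported 0.006. Rows where the approximation and the printed p value disagree noticeably may reflect rounding of the interval or a different test.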