Katie Harron, Gareth Hagger-Johnson, Ruth Gilbert, Harvey Goldstein.
Abstract
BACKGROUND: Linkage of administrative data sources often relies on probabilistic methods using a set of common identifiers (e.g. sex, date of birth, postcode). Variation in data quality on an individual or organisational level (e.g. by hospital) can result in clustering of identifier errors, violating the assumption of independence between identifiers required for traditional probabilistic match weight estimation. This potentially introduces selection bias to the resulting linked dataset. We aimed to measure variation in identifier error rates in a large English administrative data source (Hospital Episode Statistics; HES) and to incorporate this information into match weight calculation.
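To make the independence assumption concrete, the following is a minimal sketch of a traditional (Fellegi-Sunter style) match weight, in which per-identifier log2 likelihood ratios are simply added together. The m- and u-probabilities and the function name are hypothetical illustrations, not values or code from the study.

```python
import math

# Hypothetical m- and u-probabilities (illustrative only, not estimated from HES):
#   m = P(identifier agrees | the two records truly match)
#   u = P(identifier agrees | the two records do not match)
M_U = {
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.5),
    "postcode": (0.90, 0.0005),
}


def traditional_match_weight(pattern):
    """Sum per-identifier log2 likelihood ratios.

    Adding the terms is only valid if agreement on one identifier is
    independent of agreement on the others, which is the assumption
    that clustered identifier errors violate.
    """
    total = 0.0
    for name, agrees in pattern.items():
        m, u = M_U[name]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


full = traditional_match_weight(
    {"date_of_birth": True, "sex": True, "postcode": True}
)
dob_sex_only = traditional_match_weight(
    {"date_of_birth": True, "sex": True, "postcode": False}
)
print(round(full, 1), round(dob_sex_only, 1))
```

When errors cluster within a person or a hospital, the terms are no longer independent and the summed weight can misstate the evidence for a match.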
Keywords: Administrative data; Data linkage; Hospital admission; Linkage error; Linkage evaluation; Record linkage
Year: 2017 PMID: 28173759 PMCID: PMC5297137 DOI: 10.1186/s12874-017-0306-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1. Percentage of HES-PDS linked records with disagreeing or missing identifiers according to age, ethnicity and sex. The larger identifier error rates in postcode reflect that postcode was missing for 83% of records for infants aged 0–1 years
Fig. 2. Variation in identifier error rates by hospital provider (n = 167). Each dot represents one hospital (hospitals with <500 matches were excluded). Inner lines = 95% control limits; outer lines = 99.8% control limits
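The record does not state how the control limits in Fig. 2 were computed. As a rough sketch only, funnel-plot limits are commonly drawn with a normal approximation to the binomial around the overall error rate; the function name and the example inputs below are assumptions for illustration.

```python
import math
from statistics import NormalDist


def control_limits(p_overall, n, coverage):
    """Approximate funnel-plot limits for a proportion.

    Uses a normal approximation to the binomial (an assumption; the
    source does not say which method was used).
    p_overall: overall identifier error rate across all hospitals
    n:         number of matched records at one hospital
    coverage:  e.g. 0.95 or 0.998
    """
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    half_width = z * math.sqrt(p_overall * (1 - p_overall) / n)
    return max(0.0, p_overall - half_width), min(1.0, p_overall + half_width)


# Hypothetical hospital with 500 matches and an overall error rate of 5%
inner = control_limits(0.05, 500, 0.95)
outer = control_limits(0.05, 500, 0.998)
print(inner, outer)
```

A hospital whose observed error rate falls outside the outer limits would be flagged as varying more than chance alone would suggest.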
Traditional match weights, match weights incorporating dependence between identifiers, and attribute-specific match weights according to agreement pattern {date of birth, sex, postcode}. Record pairs with no agreement on any identifiers, or where only sex agreed (agreement patterns {000} and {010}), were assumed to be non-matches and excluded
| Agreement pattern {date of birth, sex, postcode} | | 001 | 100 | 011 | 101 | 110 | 111 |
|---|---|---|---|---|---|---|---|
| N Matches | | 1 | 21 | 18 | 12 | 15,924 | 14,009 |
| N Non-matches | | 259 | 414,307 | 248 | 4 | 415,888 | 10 |
| Match probabilityᵃ | | 0.0039 | 0.0001 | 0.0726 | 0.7500 | 0.0369 | 0.9993 |
| Traditional match weight | | 5.3 | −1.0 | 9.6 | 14.9 | 8.6 | 23.7 |
| Match weight assuming dependence | | −0.5 | −1.2 | 9.3 | 17.8 | 8.6 | 27.6 |
| Attribute-specific match weight: | | | | | | | |
| Sex | Female | −1.7 | −1.7 | 8.7 | 17.3 | 8.7 | 27.7 |
| | Male | 0.4 | −0.9 | 9.7 | 18.1 | 8.5 | 27.5 |
| Age | 0–1 years | −0.5 | 0.1 | 8.6 | 18.5 | 9.2 | 27.6 |
| | 5–6 years | −0.2 | −2.0 | 9.6 | 18.2 | 7.9 | 28.0 |
| | 18–19 years | −1.5 | −2.6 | 9.5 | 16.2 | 8.3 | 27.2 |
| Ethnicity | Missing | 2.7 | 0.3 | 10.8 | 19.5 | 8.5 | 27.6 |
| | White | −1.3 | −1.6 | 8.8 | 17.4 | 8.6 | 27.6 |
| | Mixed | 1.8 | −0.4 | 10.9 | 19.4 | 8.8 | 28.6 |
| | Asian | −0.4 | −2.3 | 10.4 | 16.9 | 8.6 | 27.8 |
| | Black | 1.6 | 0.9 | 9.6 | 19.1 | 8.9 | 27.1 |
| | Other | 1.4 | −0.2 | 10.5 | 19.0 | 8.8 | 28.1 |
| Organisational-specific match weight (mean) | | 5.7 | 1.3 | 12.3 | 20.7 | 8.1 | 25.4 |

ᵃ Match probability = N matches / total record pairs
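The footnote's formula can be checked against the counts in the table. This sketch assumes that "total record pairs" means matches plus non-matches within a single agreement pattern, and uses three of the patterns as examples; the variable and function names are illustrative.

```python
# (N matches, N non-matches) for three agreement patterns, copied from the table
counts = {
    "101": (12, 4),
    "110": (15924, 415888),
    "111": (14009, 10),
}


def match_probability(n_matches, n_non_matches):
    """N matches divided by all record pairs sharing the agreement pattern.

    Assumes total record pairs = matches + non-matches for that pattern.
    """
    return n_matches / (n_matches + n_non_matches)


probabilities = {
    pattern: round(match_probability(m, n), 4)
    for pattern, (m, n) in counts.items()
}
print(probabilities)
```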
Simulation study results: estimated readmission rates. The ‘true’ readmission rate was 8.8%
| NHS number error distribution in simulated data | | 30%, random | 0.5%, random | 30%, associated with other identifier errors | 30%, associated with ethnicity |
|---|---|---|---|---|---|
| Traditional match weight | % readmitted | 7.4 | 7.4 | 6.9 | 7.4 |
| | Standard error | 0.002 | 0.002 | 0.002 | 0.002 |
| | % bias | −15.9 | −15.9 | −21.3 | −15.7 |
| Match weight incorporating dependence | % readmitted | 7.4 | 7.4 | 6.9 | 7.4 |
| | Standard error | 0.002 | 0.002 | 0.002 | 0.002 |
| | % bias | −16.0 | −16.0 | −21.4 | −15.8 |
| Attribute-specific match weight | % readmitted | 8.7 | 8.7 | 8.7 | 8.8 |
| | Standard error | 0.002 | 0.002 | 0.002 | 0.002 |
| | % bias | −0.6 | −0.6 | −0.9 | −0.2 |
| Organisation-specific match weight | % readmitted | 8.8 | 8.8 | 8.8 | 8.8 |
| | Standard error | 0.002 | 0.002 | 0.002 | 0.002 |
| | % bias | 0.2 | 0.2 | −0.1 | 0.2 |
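The "% bias" rows appear consistent with relative bias against the true readmission rate of 8.8%. The sketch below assumes that definition (the record does not state it) and uses a hypothetical function name. Because the table reports rounded rates, recomputing from them will not reproduce every published bias figure exactly.

```python
TRUE_RATE = 8.8  # 'true' readmission rate (%), as stated in the table caption


def percent_bias(estimated, true_value=TRUE_RATE):
    """Relative bias in percent: 100 * (estimate - truth) / truth.

    Assumed definition; the source does not give the formula.
    """
    return 100 * (estimated - true_value) / true_value


# Traditional match weight, NHS number error 30% random: 7.4% readmitted
bias_traditional = round(percent_bias(7.4), 1)
print(bias_traditional)
```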
Fig. 3. Simulation study results: estimated readmission rates by ethnicity, according to NHS number error rate distribution
Fig. 4. Simulation study results: absolute bias in estimated readmission rates for NHS number error associated with ethnicity. Results for traditional match weights are hidden behind those for weights incorporating dependence between identifiers