J C Doidge, J K Morris, K L Harron, S Stevens, R Gilbert.
Abstract
INTRODUCTION: Disease registers and electronic health records are valuable resources for disease surveillance and research but can be limited by variation in data quality over time. Quality may be limited by the accuracy of clinical information, by the internal linkage that supports person-based analysis of most administrative datasets, or by errors in linkage between multiple datasets.
Keywords: Down’s syndrome; data linkage; disease surveillance; electronic health records; linkage error; prevalence
Year: 2020 PMID: 32864476 PMCID: PMC7115985 DOI: 10.23889/ijpds.v5i1.1157
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
Figure 1: Linkage overview
Figure 2: Subgroups for estimating prevalence and case ascertainment
DOB: Date of birth
HES: Hospital Episode Statistics for England
MW: match weight
NDSCR: National Down Syndrome Cytogenetic Register.
NDSCR records exclude those with missing birth outcome. All data are column proportions, ignoring missing data, so that associations between record characteristics and linkage quality are reflected by differences in proportion across columns within each row. Probabilistic links are grouped by match weight, a score reflecting the level of agreement over matching variables (see Methods).
¹ The number of candidate links may exceed the number of records in either file, indicating ambiguity where multiple links have equal agreement. For any two such candidate links, either at least one is false, or both are true and the records in the contributing files have not been completely deduplicated.
Source: Hospital Episode Statistics (HES), NHS Digital (Copyright © 2019. Re-used with the permission of NHS Digital. All rights reserved) and the National Down Syndrome Cytogenetic Register (NDSCR), Public Health England.
| | Deterministic links | Probabilistic (MW > 40.6) | Probabilistic (MW: 30.5–40.6) | Probabilistic (MW: 18.1–30.5) | Probabilistic (MW < 18.1) | Unlinked NDSCR records | Unlinked HES cases |
| --- | --- | --- | --- | --- | --- | --- | --- |
| n | | | | | | | |
| NDSCR records | 4939 | 3694 | 449 | 662 | 534 | 137 | - |
| HES records | 4941 | 3703 | 446 | 646 | 654 | - | 2280 |
| Candidate links¹ | 4941 | 3720 | 454 | 799 | 739 | - | - |
| Q90 code (in HES records) | 96.4% | 91.1% | 70.2% | 81.4% | 17.0% | - | - |
| Difference in DOB > 180 days (in candidate links) | 0.4% | < 0.3% | 1.5% | 3.6% | 6.0% | - | - |
| Sex = male | | | | | | | |
| in NDSCR records | 55.4% | 53.9% | 52.2% | 52.9% | 54.7% | 49.3% | - |
| in HES records | 55.1% | 53.8% | 52.7% | 52.9% | 56.3% | - | 53.6% |
| Premature (<37 weeks) | | | | | | | |
| in NDSCR records | 22.3% | 19.1% | 16.7% | 12.7% | 18.4% | 10.5% | - |
| in HES records | 23.3% | 22.3% | 22.5% | 20.7% | 10.8% | - | 23.3% |
| Age at diagnosis (in NDSCR records) | | | | | | | |
| Prenatal | 9.9% | 10.0% | 7.1% | 3.7% | 8.5% | 7.4% | - |
| < 12 months | 89.5% | 89.7% | 91.9% | 93.7% | 81.4% | 85.2% | - |
| ≥ 12 months | 0.6% | 0.3% | 1.0% | 2.6% | 10.1% | 7.4% | - |
| Age at first diagnosis code (in HES records) | | | | | | | |
| < 12 months | 90.9% | 89.8% | 90.4% | 88.2% | 88.9% | - | 77.7% |
| ≥ 12 months | 9.1% | 10.2% | 9.6% | 11.8% | 11.1% | - | 22.3% |
| Number of episodes in first year of life (in HES records) | | | | | | | |
| 1 | 22.5% | 38.4% | 48.6% | 42.4% | 78.2% | - | 36.1% |
| 2–4 | 42.5% | 37.1% | 30.4% | 31.5% | 15.4% | - | 34.5% |
| ≥ 5 | 35.0% | 24.4% | 20.9% | 26.1% | 6.5% | - | 29.4% |
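The table groups probabilistic links by match weight and notes that candidate links can outnumber records when several links tie on agreement. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The matching variables, the m/u probabilities, the base-2 logarithm, and the function names are all hypothetical, and the sketch's weights are not calibrated to the scale used in the table. Only the band cut-offs (18.1, 30.5, 40.6) come from the table, and the handling of values exactly on a cut-off is an assumption.

```python
from collections import Counter
from math import log2

# Hypothetical (m, u) probabilities per matching variable:
# m = P(agree | records belong to the same person),
# u = P(agree | records belong to different people).
M_U = {
    "dob": (0.98, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.90, 0.0001),
}


def match_weight(agreement):
    """Sum log2 likelihood ratios over the variables in `agreement`
    (a Fellegi-Sunter style score; the source paper's exact scoring may differ)."""
    total = 0.0
    for var, agrees in agreement.items():
        m, u = M_U[var]
        total += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return total


def weight_band(mw):
    """Assign a weight to one of the table's bands.
    Cut-offs are from the table; boundary handling is an assumption."""
    if mw > 40.6:
        return "MW > 40.6"
    if mw >= 30.5:
        return "MW: 30.5–40.6"
    if mw >= 18.1:
        return "MW: 18.1–30.5"
    return "MW < 18.1"


def ambiguous_links(links):
    """Return candidate links that tie on weight with another link
    sharing the same NDSCR or HES record.
    `links` is an iterable of (ndscr_id, hes_id, mw) tuples."""
    links = list(links)
    by_ndscr = Counter((n, mw) for n, _, mw in links)
    by_hes = Counter((h, mw) for _, h, mw in links)
    return [
        (n, h, mw)
        for n, h, mw in links
        if by_ndscr[(n, mw)] > 1 or by_hes[(h, mw)] > 1
    ]
```

In this sketch a tie means exactly equal weights, so real weights would need rounding to a fixed precision before comparison. Each flagged pair corresponds to the footnote's case: either at least one of the links is false, or both are true and one of the files contains duplicate records.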
Figure 3: Linkage overview
Figure 4: Linkage overview