Ronan A Lyons, Kerina H Jones, Gareth John, Caroline J Brooks, Jean-Philippe Verplancke, David V Ford, Ginevra Brown, Ken Leake.
Abstract
BACKGROUND: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress.
Year: 2009 PMID: 19149883 PMCID: PMC2648953 DOI: 10.1186/1472-6947-9-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Assessing the accuracy of NHS numbers in routine data.
Columns a to c, and the percentages derived from them, give the result of comparing the NHS number allocated by the record linkage process with the original submitted NHS number.

| Data Source | Type of Record Linkage | Same (a) | Different (b) | Not found (c) | % Agreement = a/(a+b) | % Disagreement = b/(a+b) | % Linked = (a+b)/(a+b+c) |
|---|---|---|---|---|---|---|---|
| Definition | | Allocated NHS number equals the submitted NHS number | Allocated NHS number differs from the submitted NHS number | No NHS number is allocated | Of the records that were allocated an NHS number, the percentage that were allocated an NHS number equal to the NHS number submitted | Of the records that were allocated an NHS number, the percentage that were allocated an NHS number different to the NHS number submitted | Of the records that were processed, the percentage that were allocated an NHS number |
| Primary Care Practice Clinical Systems (GP) (n = 229,117) | DRL | 223,344 | 40 | 5,733 | 99.982% | 0.018% | 97.498% |
| | PRL – 99% cut off | 227,778 | 51 | 1,288 | 99.978% | 0.022% | 99.438% |
| | PRL – 95% cut off | 228,288 | 55 | 774 | 99.976% | 0.024% | 99.662% |
| | PRL – 90% cut off | 228,479 | 56 | 582 | 99.976% | 0.025% | 99.746% |
| | PRL – 50% cut off | 228,699 | 61 | 357 | 99.973% | 0.027% | 99.844% |
| Secondary Care Hospital Admissions (PEDW) (n = 264,868) | DRL | 216,062 | 323 | 48,483 | 99.851% | 0.149% | 81.695% |
| | PRL – 99% cut off | 244,692 | 410 | 19,766 | 99.833% | 0.167% | 92.537% |
| | PRL – 95% cut off | 247,865 | 439 | 16,564 | 99.823% | 0.177% | 93.746% |
| | PRL – 90% cut off | 249,024 | 453 | 15,391 | 99.818% | 0.182% | 94.189% |
| | PRL – 50% cut off | 250,155 | 465 | 14,248 | 99.815% | 0.186% | 94.621% |
This shows the level of agreement between the NHS numbers supplied in the General Practice (GP) dataset (n = 229,117) and the Patient Episode Database for Wales (PEDW) dataset (n = 264,868) and those allocated by the matching process using deterministic record linkage (DRL) and probabilistic record linkage (PRL). The NHS Administrative Register (NHSAR) was used as the reference.
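The three percentage columns follow directly from the counts a, b and c. As a minimal sketch, the Python below recomputes them for one row; the names `LinkageCounts` and `linkage_metrics` are illustrative and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageCounts:
    """Counts for one dataset and linkage method (hypothetical container)."""
    same: int       # a: allocated NHS number equals the submitted one
    different: int  # b: allocated NHS number differs from the submitted one
    not_found: int  # c: no NHS number was allocated


def linkage_metrics(counts: LinkageCounts) -> dict[str, float]:
    """Apply the column formulas a/(a+b), b/(a+b) and (a+b)/(a+b+c)."""
    allocated = counts.same + counts.different   # a + b
    processed = allocated + counts.not_found     # a + b + c
    return {
        "agreement_pct": 100 * counts.same / allocated,
        "disagreement_pct": 100 * counts.different / allocated,
        "linked_pct": 100 * allocated / processed,
    }


# Counts taken from the DRL row of the GP dataset in the table above.
gp_drl = LinkageCounts(same=223_344, different=40, not_found=5_733)
for name, value in linkage_metrics(gp_drl).items():
    print(f"{name}: {value:.3f}%")
```

Rounded to three decimal places, the values for this row agree with the published 99.982%, 0.018% and 97.498%. Note that agreement and disagreement are computed over allocated records only, so they say nothing about the records left unlinked.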
Levels of matched records using a variety of techniques.
| Levels of matched records | Primary Care General Practice: Number | Primary Care General Practice: % | Secondary Care Hospital Admissions: Number | Secondary Care Hospital Admissions: % | Social Services: Number | Social Services: % |
|---|---|---|---|---|---|---|
| Sample size | 229,127 | | 290,650 | | 18,540 | |
| Valid NHS Number | 229,117 | 99.996% | 264,868 | 91.13% | - | 0.00% |
| Valid NHS Number plus DRL: | 229,123 | 99.998% | 280,729 | 96.59% | 14,158 | 76.36% |
| Valid NHS Number plus PRL (99% cut off): | 229,125 | 99.999% | 287,572 | 98.94% | 17,095 | 92.21% |
| Valid NHS Number plus PRL (95% cut off): | 229,125 | 99.999% | 288,186 | 99.15% | 17,431 | 94.02% |
| Valid NHS Number plus PRL (90% cut off): | 229,125 | 99.999% | 288,424 | 99.23% | 17,553 | 94.68% |
| Valid NHS Number plus PRL (50% cut off): | 229,125 | 99.999% | 288,670 | 99.32% | 17,639 | 95.14% |
The numbers (and percentages) of records that could be matched using deterministic record linkage (DRL) and various thresholds of probabilistic record linkage (PRL) were assessed for each of three test datasets: the GP dataset, the PEDW dataset and the PARIS database. Records with a valid NHS number were accepted. The matching rate achieved by applying DRL followed by PRL (to the 50% threshold) was also assessed, and the final row shows the result of operating the MACRAL algorithm as illustrated in Figure 1.
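The text does not say how an NHS number was judged valid. One widely used structural test, shown here only as an illustration and not as the authors' method, is the Modulus 11 check digit: the first nine digits are weighted 10 down to 2, and the tenth digit must equal the resulting check value. The function name `is_valid_nhs_number` is an assumption, and a number that passes this test is not necessarily one that has been issued.

```python
def is_valid_nhs_number(value: str) -> bool:
    """Return True if `value` passes the Modulus 11 check digit test."""
    digits = value.replace(" ", "").replace("-", "")
    # An NHS number is exactly ten ASCII digits once separators are removed.
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # A check value of 10 cannot be a single digit, so the number is invalid.
        return False
    return check == int(digits[9])


# 943 476 5919 is a commonly quoted example number that passes the check.
print(is_valid_nhs_number("943 476 5919"))
print(is_valid_nhs_number("943 476 5918"))
```

A check of this kind catches most single-digit and transposition errors, but it cannot confirm that the number belongs to the person on the record, which is why comparison against a reference register is still needed.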
Figure 1. The matching process conducted via the MACRAL algorithm. First, records found to have a valid NHS number are accepted. The Matching Algorithm for Consistent Results in Anonymised Linkage (MACRAL) then begins with DRL for exact matching on the set of five variables. The remaining unmatched records are then subjected to PRL methods down to the 50% threshold. Datasets from non-NHS organisations enter the process at DRL.
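The staged process in the caption can be sketched as a simple cascade. Everything below is a hedged illustration under stated assumptions: the function `macral_link`, the placeholder field names, the validity test and the toy scoring function are all hypothetical, since this text names neither the five matching variables nor the PRL model. Falling through to PRL when several register entries match exactly is also an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence

Record = dict  # one record as a plain mapping of field name to value


def macral_link(
    record: Record,
    register: Sequence[Record],
    match_fields: Sequence[str],
    prl_score: Callable[[Record, Record], float],
    has_valid_nhs_number: Callable[[Record], bool],
    threshold: float = 0.5,
) -> tuple[Optional[str], str]:
    """Return (nhs_number, stage) for a record, or (None, "unmatched")."""
    # Stage 1: accept records that already carry a valid NHS number.
    if has_valid_nhs_number(record):
        return record["nhs_number"], "valid_nhs_number"

    # Stage 2 (DRL): exact agreement on every matching variable.
    key = tuple(record.get(f) for f in match_fields)
    if None not in key:
        exact = [r for r in register
                 if tuple(r.get(f) for f in match_fields) == key]
        if len(exact) == 1:  # assumption: ambiguous exact matches go to PRL
            return exact[0]["nhs_number"], "DRL"

    # Stage 3 (PRL): take the best-scoring candidate at or above the threshold.
    best, best_p = None, 0.0
    for candidate in register:
        p = prl_score(record, candidate)
        if p > best_p:
            best, best_p = candidate, p
    if best is not None and best_p >= threshold:
        return best["nhs_number"], "PRL"
    return None, "unmatched"


# Placeholder field names; the source does not list the five variables here.
FIELDS = ("field_1", "field_2", "field_3", "field_4", "field_5")


def toy_score(record: Record, candidate: Record) -> float:
    """Stand-in score: share of agreeing fields, not a real PRL model."""
    return sum(record.get(f) == candidate.get(f) for f in FIELDS) / len(FIELDS)


register = [
    {"nhs_number": "N1", **dict(zip(FIELDS, "abcde"))},
    {"nhs_number": "N2", **dict(zip(FIELDS, "vwxyz"))},
]
# A record without an NHS number, as a non-NHS dataset would supply.
query = {"nhs_number": None, **dict(zip(FIELDS, "abcdq"))}
print(macral_link(query, register, FIELDS, toy_score,
                  lambda r: bool(r.get("nhs_number"))))
```

Passing the scoring and validity functions in as parameters keeps the cascade itself independent of any particular PRL model, so the cut off can be varied (99%, 95%, 90%, 50%) as in the tables without changing the control flow.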