Shivani Padmanabhan, Lucy Carty, Ellen Cameron, Rebecca E Ghosh, Rachael Williams, Helen Strongman.
Abstract
Record linkage is increasingly used to expand the information available for public health research. An understanding of record linkage methods and the relevant strengths and limitations is important for robust analysis and interpretation of linked data. Here, we describe the approach used by Clinical Practice Research Datalink (CPRD) to link primary care data to other patient level datasets, and the potential implications of this approach for CPRD data analysis. General practice electronic health record software providers separately submit de-identified data to CPRD and patient identifiers to NHS Digital, excluding patients who have opted-out from contributing data. Data custodians for external datasets also send patient identifiers to NHS Digital. NHS Digital uses identifiers to link the datasets using an 8-stage deterministic methodology. CPRD subsequently receives a de-identified linked cohort file and provides researchers with anonymised linked data and metadata detailing the linkage process. This methodology has been used to generate routine primary care linked datasets, including data from Hospital Episode Statistics, Office for National Statistics and National Cancer Registration and Analysis Service. 10.6 million (M) patients from 411 English general practices were included in record linkage in June 2018. 9.1M (86%) patients were of research quality, of which 8.0M (88%) had a valid NHS number and were eligible for linkage in the CPRD standard linked dataset release. Linking CPRD data to other sources improves the range and validity of research studies. This manuscript, together with metadata generated on match strength and linkage eligibility, can be used to inform study design and explore potential linkage-related selection and misclassification biases.
Keywords: Clinical Practice Research Datalink; Deterministic linkage; Electronic health records; Primary care data; Record linkage
Year: 2018 PMID: 30219957 PMCID: PMC6325980 DOI: 10.1007/s10654-018-0442-4
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 8.082
CPRD routine linkages
| Hospital Episode Statistics Admitted Patient Care (HES APC) |
| Hospital Episode Statistics Outpatient (HES OP) |
| Hospital Episode Statistics Accident and Emergency (HES A&E) |
| Hospital Episode Statistics Diagnostic Imaging Dataset (HES DID) |
| Office for National Statistics (ONS) Death Registration |
| National Cancer Registration and Analysis Service (NCRAS) data from Public Health England (PHE) including: |
| Cancer registration data |
| Cancer Patient Experience Survey (CPES) data |
| Systemic Anti-Cancer Treatment (SACT) data |
| National Radiotherapy Dataset (RTDS) |
| Mental Health Dataset (MHDS) data |
| Measures of relative deprivation and rural urban classification at Lower Layer Super Output Area (LSOA) level for practices and patients |
Fig. 1 Primary care and linked data flow. De-identified linked data can either flow from external data custodians to NHS Digital and subsequently to CPRD, or directly from external data custodians to CPRD.
Data submitted to NHS Digital. Italicized text indicates personal identifier used for linkage
| General practice system providers | External data custodians |
|---|---|
| System patient identifier | Patient linkage identifier |
| System practice identifier | |
Deterministic linkage steps
| Step (match rank) | Match required |
|---|---|
| 1 | Exact NHS number, gender, DOB and postcode |
| 2 | Exact NHS number, gender and DOB |
| 3 | Exact NHS number, gender, postcode and partial DOB |
| 4 | Exact NHS number, gender and partial DOB |
| 5 | Exact NHS number and postcode |
| 6 | Exact gender, DOB and postcode |
| 7 | Exact gender, DOB and postcode |
| 8 | Exact NHS number |
a Communal establishments include: hospitals, care homes, prisons, defence bases, boarding schools and student halls of residence.
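The stepwise logic in the table above can be sketched in Python as a cascade that returns the first (strongest) rule a pair of records satisfies. This is only an illustrative sketch, not NHS Digital's implementation: the `Person` class and the `partial_dob` and `match_rank` functions are hypothetical names, and the definition of "partial DOB" used here (at least two of year, month and day agree) is an assumption that the table does not state. Steps 6 and 7 carry the same criteria as listed, so step 7 can never be reached in this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record holding the four identifiers named in the table."""
    nhs_number: Optional[str]
    gender: str
    dob: str        # ISO format "YYYY-MM-DD"
    postcode: str


def partial_dob(a: str, b: str) -> bool:
    # ASSUMPTION: "partial DOB" means at least two of year/month/day agree.
    return sum(x == y for x, y in zip(a.split("-"), b.split("-"))) >= 2


def match_rank(a: Person, b: Person) -> Optional[int]:
    """Return the first step (1 = strongest) whose criteria hold, else None."""
    nhs = a.nhs_number is not None and a.nhs_number == b.nhs_number
    gender = a.gender == b.gender
    dob = a.dob == b.dob
    postcode = a.postcode == b.postcode
    part = partial_dob(a.dob, b.dob)

    rules = [
        (1, nhs and gender and dob and postcode),
        (2, nhs and gender and dob),
        (3, nhs and gender and postcode and part),
        (4, nhs and gender and part),
        (5, nhs and postcode),
        (6, gender and dob and postcode),
        # Step 7 is listed with the same criteria as step 6, so it is
        # unreachable here; any distinguishing condition is not in the table.
        (7, gender and dob and postcode),
        (8, nhs),
    ]
    for rank, satisfied in rules:
        if satisfied:
            return rank
    return None


# Made-up example records (the NHS number is a placeholder, not a real one).
gp = Person("0000000001", "F", "1980-05-17", "AB1 2CD")
hes = Person("0000000001", "F", "1980-05-17", "ZZ9 9ZZ")
print(match_rank(gp, hes))  # NHS number, gender and DOB agree; postcode differs
```

Ordering the rules from strictest to loosest means each pair is assigned the strongest match it qualifies for, which is what allows the match rank to be used as a measure of match strength.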
Proportion of patients matched in CPRD GOLD-HES linkage at each match rank for the three most recent linkage sets
| | Linkage set version 14 (June 2017) | Linkage set version 15 (December 2017) | Linkage set version 16 (June 2018) |
|---|---|---|---|
| Patients in CPRD GOLD cohort | 10,425,601 | 10,494,935 | 10,553,586 |
| Patients eligible to be linked to HES data in CPRD standard linked dataset | 8,328,954 | 8,391,529 | 8,444,946 |
| Patients matched to HES on match rank 1 | 5,098,291 (67.19%) | 5,186,589 (67.50%) | 5,241,901 (67.59%) |
| Patients matched to HES on match rank 2 | 2,204,352 (29.05%) | 2,211,157 (28.78%) | 2,227,150 (28.72%) |
| Patients matched to HES on match rank 3 | 13,316 (0.18%) | 13,318 (0.17%) | 13,344 (0.17%) |
| Patients matched to HES on match rank 4 | 17,241 (0.23%) | 17,385 (0.23%) | 17,528 (0.23%) |
| Patients matched to HES on match rank 5 | 3,678 (0.05%) | 3,600 (0.05%) | 3,567 (0.05%) |
| Patients matched to HES on match rank 6 | 232,331 (3.06%) | 232,287 (3.02%) | 232,007 (2.99%) |
| Patients matched to HES on match rank 7 | 13,730 (0.18%) | 13,948 (0.18%) | 13,992 (0.18%) |
| Patients matched to HES on match rank 8 | 5,483 (0.07%) | 5,431 (0.07%) | 5,396 (0.07%) |
As of June 2018, the latest set of linkage data, referred to as set 16, is available for both CPRD GOLD, based on the Vision software system, and CPRD Aurum, based on EMIS software. This table is based on the CPRD GOLD data.
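The percentages in the table can be checked with a few lines of arithmetic. The sketch below uses the set 16 counts and suggests that the percentages appear to be calculated over all matched patients (the sum of the eight match ranks) rather than over the patients eligible for linkage. The variable names are illustrative only.

```python
# Set 16 (June 2018) counts copied from the table above.
eligible = 8_444_946
matched_by_rank = {
    1: 5_241_901,
    2: 2_227_150,
    3: 13_344,
    4: 17_528,
    5: 3_567,
    6: 232_007,
    7: 13_992,
    8: 5_396,
}

# Denominator: every patient matched at any rank.
total_matched = sum(matched_by_rank.values())

# Share of matched patients at each rank, as a percentage to 2 decimal places.
share = {rank: round(100 * n / total_matched, 2)
         for rank, n in matched_by_rank.items()}

for rank, pct in share.items():
    print(f"match rank {rank}: {pct:.2f}%")

# Proportion of eligible patients who were matched at any rank.
print(f"matched / eligible: {100 * total_matched / eligible:.1f}%")
```

Using the eligible cohort as the denominator instead would give a noticeably lower figure for match rank 1, so it is worth confirming which denominator is meant before comparing match rates across studies.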