Gill Harper1, David Stables2, Paul Simon2, Zaheer Ahmed1, Kelvin Smith1, John Robson1, Carol Dezateux1.
Abstract
INTRODUCTION: Linking places to people is a core element of the UK government's geospatial strategy. Matching patient addresses in electronic health records to their Unique Property Reference Numbers (UPRNs) enables spatial linkage for research, innovation and public benefit. Available algorithms are not transparent or evaluated for use with addresses recorded by health care providers.
Keywords: address-matching; addresses; data linkage; electronic health record; place-based health; population health; quality assurance
Year: 2021 PMID: 34970633 PMCID: PMC8678979 DOI: 10.23889/ijpds.v6i1.1674
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
| Manipulation qualifier | Symbol | Description |
| mapped also to | & | Indicates a match using more than one candidate field |
| moved to | > | The candidate field was moved to another field to match, e.g. number moved to flat |
| moved from | < | The candidate field was moved from another field to match on this field |
| field merged | f | When a field is both moved from and moved to, the fields are then merged to match |
| ABP field ignored | i | ABP field was ignored in order to match i.e. the ABP address contained more precise detail than the candidate but was unnecessary in order to match. This usually means that the candidate field is null |
| Candidate field dropped | d | The candidate field was dropped in order to match, i.e. the candidate address has more precise detail than the ABP address. The ABP field would probably be null |
| Matched as parent | a | The candidate field matched as being at a higher level than the ABP field, for example flat 6 matching to flat 6a |
| Matched as child | c | The candidate field matched as being at a lower level than the ABP field, for example candidate flat 6a, ABP flat 6 |
| Partial match | p | The candidate field was partially matched to the ABP field (or vice versa) typically 2 out of 3 words |
| Possible spelling error | l | The candidate field and ABP field were matched using the Levenshtein distance algorithm taking account of misspellings |
| Level based match | v | The level of a flat in a building (vertical from the street) was used to create the match e.g. 2b for second floor b |
| Equivalent | e | The fields are equivalent, albeit not necessarily spelled the same, using various equivalence lists, word swaps, word drops etc |
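The "possible spelling error" qualifier (l) relies on the Levenshtein distance. The following is a minimal sketch of that distance, not the authors' implementation. The helper `is_possible_spelling_error` and its `max_distance` threshold are hypothetical and are not taken from the source.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def is_possible_spelling_error(candidate: str, abp: str, max_distance: int = 2) -> bool:
    """Hypothetical rule: fields differ, but only by a few character edits.

    max_distance is an assumed threshold chosen for illustration only.
    """
    distance = levenshtein(candidate.strip().lower(), abp.strip().lower())
    return 0 < distance <= max_distance


print(levenshtein("kitten", "sitting"))
print(is_possible_spelling_error("Whitechapel Raod", "Whitechapel Road"))
```

A transposed pair of letters, as in "Raod", counts as two substitutions under this distance, so any threshold used in practice would need to allow for that.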
| |
| Algorithm version | Version of algorithm used |
| Match date | Date match made |
| Qualifier | One of four match qualifiers: best match, child, parent, sibling |
| Match rule | Label identifying which section of code made the match |
| Match pattern | The combination of manipulation qualifiers used on each of the five address fields used to make the match |
|
| |
| UPRN | Unique Property Reference Number, from ABP |
| Epoch | ABP Epoch used |
| Property classification | The property classification type, from ABP |
| x-coordinate | UPRN geographical easting coordinate, from ABP |
| y-coordinate | UPRN geographical northing coordinate, from ABP |
| latitude | UPRN geographical latitude coordinate, from ABP |
| longitude | UPRN geographical longitude coordinate, from ABP |
| ABP address | UPRN associated address string, from ABP |
ABP = AddressBase Premium.
|
| |
| Age on census date (16/11/2020) | Years |
| Self-reported ethnic group | NHS 16 + 1 classification [ |
| Sex | Male, female, other |
| Deprivation | LSOA level IMD 2019 quintiles |
| Mobility | Number of different GP registrations in previous 12 months, number of address changes in previous 12 months |
|
| |
| Age at registration | Years (<1, 1–14, 15–29, 30–64, 65–84, 85 and over) |
| Duration of registration | Days (quartiles) |
|
| |
| Commissioner | GP practice Clinical Commissioning Group (CCG) |
| EHR supplier system | EMIS, SystmOne, or Vision |
LSOA = Lower Super Output Area, IMD = Index of Multiple Deprivation, CCG = Clinical Commissioning Group, EHR = Electronic Health Record.
Table 4: Data linkage accuracy metrics (modified from GUILD [23])
| Metric | Definition |
| Positive Predictive Value (PPV) | The proportion of record pairs classified by the algorithm as links that are true matches. Also known as precision. |
| Sensitivity | The proportion of true matches that are correctly classified as links. Also known as recall. |
| F-measure | The harmonic mean of positive predictive value and sensitivity. Often used to compare the overall efficiency of a method. F-measure = 2 × (PPV × sensitivity) / (PPV + sensitivity) |
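As a worked illustration of the three metrics defined above, the sketch below computes them from counts of true positives, false positives and false negatives. The function name `linkage_metrics` and the example counts are illustrative assumptions, not values from the study.

```python
def linkage_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict:
    """Compute PPV, sensitivity and F-measure from link classification counts."""
    if true_pos + false_pos == 0 or true_pos + false_neg == 0 or true_pos == 0:
        raise ValueError("metrics are undefined without at least one true positive")
    ppv = true_pos / (true_pos + false_pos)          # precision
    sensitivity = true_pos / (true_pos + false_neg)  # recall
    f_measure = 2 * (ppv * sensitivity) / (ppv + sensitivity)
    return {"ppv": ppv, "sensitivity": sensitivity, "f_measure": f_measure}


# Illustrative counts only: 90 correct links, 10 false links, 30 missed matches.
metrics = linkage_metrics(true_pos=90, false_pos=10, false_neg=30)
print(metrics)
```

Because the F-measure is a harmonic mean, it sits closer to the lower of the two inputs, so a method cannot score well by maximising only PPV or only sensitivity.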
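| Characteristic | N | Match rate difference from reference group (%) | Prevalence ratio | 99% CI lower | 99% CI upper |
| Age at registration (years) | | | | | |
| <1 | | Ref | | | |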
| 1–14 | 93,868 | -0.13 | 0.999 | 0.997 | 1 |
| 15–29 | 485,945 | - | 0.986 | | |
| 30–64 | 811,582 | -0.8 | 0.992 | 0.990 | 0.994 |
| 65–84 | 52,385 | -0.55 | 0.995 | 0.992 | 0.997 |
| 85 and over | 4,334 | - | 0.981 | | |
| Self-reported ethnic group | | | | | |
| African | 100,142 | 0 | 1 | 0.998 | 1.002 |
| Any other Asian background | 60,999 | -0.2 | 0.998 | 0.994 | 1.002 |
| Any other Black background | 43,764 | 0.28 | 1.003 | 1.000 | 1.005 |
| Any other White background | 336,182 | -0.31 | 0.997 | 0.995 | 0.999 |
| Any other ethnic group | 52,610 | -0.31 | 0.997 | 0.993 | 1.001 |
| Any other Mixed background | 14,944 | -0.8 | 0.992 | 0.988 | 0.996 |
| Bangladeshi | 145,379 | 0.55 | 1.006 | 1.003 | 1.009 |
| Caribbean | 47,653 | 0.43 | 1.004 | 1.002 | 1.006 |
| Chinese | 21,819 | - | 0.968 | | |
| Indian | 120,431 | -0.11 | 0.999 | 0.994 | 1.003 |
| Irish | 12,945 | -0.22 | 0.998 | 0.994 | 1.002 |
| Not stated | 25,826 | -0.79 | 0.993 | 0.987 | 0.999 |
| Pakistani | 93,146 | 0.26 | 1.003 | 1 | 1.005 |
| White and Asian | 4,905 | -0.51 | 0.995 | 0.990 | 1 |
| White and Black African | 9,918 | -0.62 | 0.994 | 0.986 | 1.002 |
| White and Black Caribbean | 12,075 | -0.42 | 0.996 | 0.991 | 1.001 |
| Sex | | | | | |
| Female | | Ref | | | |
| Male | 741,745 | -0.19 | 0.998 | 0.997 | 0.999 |
| Deprivation (IMD 2019 quintile) | | | | | |
| 1 (most deprived) | | Ref | | | |
| 2 | 666,104 | 0.07 | 1.001 | 0.998 | 1.003 |
| 3 | 268,097 | 0.12 | 1.001 | 0.997 | 1.005 |
| 4 | 116,122 | -0.25 | 0.997 | 0.989 | 1.005 |
| 5 (least deprived) | 60,391 | 0.07 | 1.001 | 0.995 | 1.006 |
| Duration of GP registration (quartile) | | | | | |
| 1 (shortest) | | Ref | | | |
| 2 | 388,014 | 0.66 | 1.007 | 1.004 | 1.009 |
| 3 | 381,225 | | 1.014 | | |
| 4 (longest) | 322,294 | | 1.018 | | |
| Number of different GP registrations in preceding 12 months | | | | | |
| 1 | 1,336,709 | Ref | | | |
| 2 | 126,645 | 0.1 | 1.001 | 0.999 | 1.003 |
| 3 or more | 14,789 | -0.29 | 0.997 | 0.992 | 1.002 |
| Number of address changes in preceding 12 months | | | | | |
| 2 | 305,838 | -0.69 | 0.993 | 0.99 | 0.996 |
| 3 or more | 88,422 | - | 0.969 | | |
| EHR supplier system | | | | | |
| EMIS | | Ref | | | |
| SystmOne | 77,354 | - | 0.973 | | |
| VISION | 30,419 | 0.52 | 1.005 | 0.999 | 1.012 |
| Commissioner (CCG) | | | | | |
| Barking & Dagenham | 122,432 | -0.12 | 0.999 | 0.993 | 1.004 |
| City & Hackney | 232,840 | -0.88 | 0.991 | 0.986 | 0.996 |
| Havering | 135,262 | 0.22 | 1.002 | 0.998 | 1.006 |
| Redbridge | 212,088 | -0.36 | 0.996 | 0.990 | 1.003 |
| Tower Hamlets | 244,643 | - | 0.986 | | |
| Waltham Forest | 224,440 | -0.65 | 0.993 | 0.987 | 1 |
Complete case analysis; N = 1,478,143.
IMD = Index of Multiple Deprivation.
Quartile definitions for GP registration duration: Quartile 1 (shortest): 0–32 months; Quartile 2: 33–77 months; Quartile 3: 78–183 months; Quartile 4 (longest): ≥184 months.
EMIS: Egton Medical Information Systems.
Reference groups and values with an absolute match rate difference to the reference group of >1% are in bold.
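The quartile definitions for GP registration duration given in the notes above can be expressed as a simple lookup. This is a sketch only: the function name is hypothetical, and it assumes that the longest quartile starts at 184 months, since quartile 3 ends at 183 months.

```python
def registration_duration_quartile(months: int) -> int:
    """Map GP registration duration in whole months to quartile 1 (shortest) to 4 (longest)."""
    if months < 0:
        raise ValueError("duration cannot be negative")
    if months <= 32:
        return 1  # 0-32 months
    if months <= 77:
        return 2  # 33-77 months
    if months <= 183:
        return 3  # 78-183 months
    return 4      # assumed to start at 184 months


for months in (12, 40, 100, 200):
    print(months, registration_duration_quartile(months))
```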
Figure 1: Adjusted prevalence ratios and 99% CIs for number of address changes in the preceding 12 months, GP EHR system, and GP registration duration
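To make the prevalence ratios and 99% CIs easier to interpret, the sketch below computes an unadjusted prevalence ratio for the match rate of one group against a reference group, with a confidence interval from the standard log-ratio approximation. It is not the adjusted model behind the reported values, whose specification is not given in this section. The function name and the example counts are illustrative assumptions, not study data.

```python
import math
from statistics import NormalDist


def prevalence_ratio(matched: int, total: int, ref_matched: int, ref_total: int,
                     confidence: float = 0.99) -> dict:
    """Unadjusted prevalence ratio of match rates with a log-scale confidence interval."""
    p = matched / total
    p_ref = ref_matched / ref_total
    ratio = p / p_ref
    # Standard error of log(ratio) for two independent proportions
    se_log = math.sqrt((1 - p) / matched + (1 - p_ref) / ref_matched)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    return {
        "difference_pct": (p - p_ref) * 100,  # match rate difference, percentage points
        "ratio": ratio,
        "ci_lower": ratio * math.exp(-z * se_log),
        "ci_upper": ratio * math.exp(z * se_log),
    }


# Illustrative counts only: 950 of 1,000 matched versus 980 of 1,000 in the reference group.
result = prevalence_ratio(950, 1000, 980, 1000)
print(result)
```

With these illustrative counts the interval lies wholly below 1, which is the pattern that indicates a lower match rate than the reference group.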