Rosemary Karmel, Phil Anderson, Diane Gibson, Ann Peut, Stephen Duckett, Yvonne Wells.
Abstract
BACKGROUND: In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services. Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases.
Year: 2010 PMID: 20167118 PMCID: PMC2842267 DOI: 10.1186/1472-6963-10-41
Source DB: PubMed Journal: BMC Health Serv Res ISSN: 1472-6963 Impact factor: 2.655
Table 1: Data in the PIAC project
| Program | Description | Data source and years (client numbers) |
|---|---|---|
| Aged Care Assessment Program (ACAP) | Multi-disciplinary Aged Care Assessment Teams (ACATs) determine people's care needs and eligibility for RAC and packaged care (EACH and CACP), and make recommendations concerning the preferred long-term living arrangement. | National datasets: 2003-04 (105077) 2004-05 (141911) |
| Residential aged care (RAC) | RAC provides accommodation and care services to people who are no longer able to support themselves or be supported by others in their own homes, either permanently or for the short term (respite care). Care level required may be either 'low' or 'high'. Access requires approval via an ACAT assessment. | Administrative data: 1 July 2002 - 30 June 2006 (373183, including EACH) |
| Extended Aged Care at Home and Extended Aged Care at Home for people with Dementia (EACH) | Programs provide care at home that is equivalent to high-level residential care. Access requires approval via an ACAT assessment. | Administrative data: 1 July 2002 - 30 June 2006 (integrated with RAC data) |
| Community Aged Care Package program (CACP) | Program provides support services for older people with complex needs living at home who would otherwise be eligible for admission to 'low-level' residential care. CACPs provide a range of home-based services (excluding home nursing assistance and allied health services), with care being coordinated by the package provider. Access requires approval via an ACAT assessment. | Administrative data: 1 July 2002 - 30 June 2006 (80028) |
| Home and Community Care (HACC) | Program provides a large range of services (including allied health and home nursing services) to support people at home and to prevent premature or inappropriate admission to residential care. No ACAT assessment is required. | National datasets: 2002-03 (615642) 2003-04 (675446) 2004-05 (710781) 2005-06 (705261) |
| Veterans' Home Care (VHC) | Program provides a limited range of services to help veterans, war widows and widowers with low-level care needs to remain living in their own homes longer. Eligible veterans who need higher amounts of personal care than provided under VHC may be referred to other community care programs. No ACAT assessment is required. | Administrative data: 1 January 2001 - 28 February 2008 (164192) |
| National Death Index (NDI) | National register of deaths in Australia | Administrative data: 1 July 2003 - 30 December 2006 (415057 records) |
Note: For details see [25-30]. See table 3.17 [30] for types of services provided by the aged care programs.
Figure 1: Linkage stages for PIAC cohort study.
Table 2: Linkage stages in the PIAC project
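| Stage | Dataset 1 | Dataset 2 | Minimum match rate, % of dataset 1 (from one-step deterministic linkage on SLK-581) | Minimum match rate, % of dataset 2 (from one-step deterministic linkage on SLK-581) | Final match rate, % of dataset 1 (from stepwise deterministic linkage) | Final match rate, % of dataset 2 (from stepwise deterministic linkage) |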
|---|---|---|---|---|---|---|
| 1 | Residential care 2002-06(a) | Community care packages 2002-06(a) | 9.4 | 44.0 | 10.2 | 47.7 |
| 2 | Deaths July 03-Dec 06(a) | RCCP 2002-06(a) (from stage 1) | 32.6 | 36.9 | 36.2 | 41.0 |
| 4 | Deaths July 03-Dec 06 not linked to RCCP (a) | ACAP 2003-04 not linked to RCCP(b) | 3.0 | 31.4 | 3.3 | 34.7 |
| 5 | PIAC cohort(b) | Aged care assessments 2004-05(b) | 29.4 | 21.8 | 30.9 | 22.9 |
| 6a | PIAC cohort(b) | Home and Community Care 2002-03(b) | 41.6 | 7.1 | 46.3 | 7.9 |
| 6b | PIAC cohort(b) | Home and Community Care 2003-04(b) | 53.4 | 8.3 | 58.5 | 9.1 |
| 6c | PIAC cohort(b) | Home and Community Care 2004-05(b) | 35.9 | 5.3 | 39.5 | 6.0 |
| 6d | PIAC cohort(b) | Home and Community Care 2005-06(b) | 22.8 | 3.4 | 26.2 | 3.9 |
| 7 | PIAC cohort(b) | Veterans' Home Care January 2001 - March 2008(a) | 11.4 | 7.3 | 12.2 | 7.8 |
(a) Clients identified through administrative processes, and de-duplication.
(b) Clients identified by SLK-581 within broad region (1st digit of postcode).
Note: For the datasets with administratively-derived person identifiers, data for all years in the study were linked at the same time. For datasets with clients identified by SLK-581, data were linked for each year separately to allow for variation over time in reported SLK-581.
Table 3: 128 keys based on components of SLK-581 and region
| Key no. | Key description(a) | Key no. | Key description(a) | Key no. | Key description(a) |
|---|---|---|---|---|---|
| 1 | s3g2|dmyob|s|pc | 44 | s3g2|__yob|s|st | 87 | _g2|__yob|_|pc2 |
| 2 | s3g2|dmyob|_|pc | 45 | _g2|dmyob|s|st | 88 | __|__yob|_|pc |
| 3 | s3g2|dm_ob|s|pc | 46 | s3_|dm_ob|_|pc2 | 89 | s3g2|___ob|_|_ |
| 4 | s3g2|dmyob|s|pc2 | 47 | s3g2|__yob|_|st | 90 | _g2|dm_ob|_|_ |
| 5 | s3_|dmyob|s|pc | 48 | _g2|dmyob|_|st | 91 | s3_|__yob|_|_ |
| 6 | s3g2|dm_ob|_|pc | 49 | _g2|__yob|s|pc | 92 | s3_|___ob|s|pc2 |
| 7 | s3g2|dmyob|_|pc2 | 50 | s3_|dm_ob|s|st | 93 | __|dmyob|_|_ |
| 8 | s3_|dmyob|_|pc | 51 | s3g2|__yob|s|_ | 94 | __|dm_ob|s|pc2 |
| 9 | s3g2|dmyob|s|st | 52 | _g2|dmyob|s|_ | 95 | _g2|__yob|s|st |
| 10 | s3g2|dmyob|_|st | 53 | _g2|__yob|_|pc | 96 | s3_|___ob|_|pc2 |
| 11 | s3g2|__yob|s|pc | 54 | s3_|dm_ob|_|st | 97 | __|dm_ob|_|pc2 |
| 12 | _g2|dmyob|s|pc | 55 | s3g2|__yob|_|_ | 98 | _g2|__yob|_|st |
| 13 | s3g2|dmyob|s|_ | 56 | s3g2|___ob|s|pc2 | 99 | s3_|___ob|s|st |
| 14 | s3g2|__yob|_|pc | 57 | s3_|___ob|s|pc | 100 | __|dm_ob|s|st |
| 15 | _g2|dmyob|_|pc | 58 | _g2|dmyob|_|_ | 101 | _g2|__yob|s|_ |
| 16 | s3g2|dmyob|_|_ | 59 | _g2|dm_ob|s|pc2 | 102 | s3_|___ob|_|st |
| 17 | s3g2|dm_ob|s|pc2 | 60 | __|dm_ob|s|pc | 103 | __|dm_ob|_|st |
| 18 | s3_|dm_ob|s|pc | 61 | s3_|__yob|s|pc2 | 104 | _g2|__yob|_|_ |
| 19 | s3_|dmyob|s|pc2 | 62 | __|dmyob|s|pc2 | 105 | _g2|___ob|s|pc2 |
| 20 | s3g2|dm_ob|_|pc2 | 63 | s3_|dm_ob|s|_ | 106 | __|___ob|s|pc |
| 21 | s3_|dm_ob|_|pc | 64 | s3g2|___ob|_|pc2 | 107 | __|__yob|s|pc2 |
| 22 | s3_|dmyob|_|pc2 | 65 | s3_|___ob|_|pc | 108 | s3_|___ob|s|_ |
| 23 | s3g2|dm_ob|s|st | 66 | _g2|dm_ob|_|pc2 | 109 | __|dm_ob|s|_ |
| 24 | s3_|dmyob|s|st | 67 | __|dm_ob|_|pc | 110 | _g2|___ob|_|pc2 |
| 25 | s3g2|dm_ob|_|st | 68 | s3_|__yob|_|pc2 | 111 | __|___ob|_|pc |
| 26 | s3g2|___ob|s|pc | 69 | __|dmyob|_|pc2 | 112 | __|__yob|_|pc2 |
| 27 | s3_|dmyob|_|st | 70 | s3_|dm_ob|_|_ | 113 | s3_|___ob|_|_ |
| 28 | _g2|dm_ob|s|pc | 71 | s3g2|___ob|s|st | 114 | __|dm_ob|_|_ |
| 29 | s3g2|__yob|s|pc2 | 72 | _g2|dm_ob|s|st | 115 | _g2|___ob|s|st |
| 30 | s3_|__yob|s|pc | 73 | s3_|__yob|s|st | 116 | __|__yob|s|st |
| 31 | _g2|dmyob|s|pc2 | 74 | __|dmyob|s|st | 117 | _g2|___ob|_|st |
| 32 | __|dmyob|s|pc | 75 | s3g2|___ob|_|st | 118 | __|__yob|_|st |
| 33 | s3g2|dm_ob|s|_ | 76 | _g2|dm_ob|_|st | 119 | _g2|___ob|s|_ |
| 34 | s3g2|___ob|_|pc | 77 | s3_|__yob|_|st | 120 | __|__yob|s|_ |
| 35 | s3_|dmyob|s|_ | 78 | _g2|___ob|s|pc | 121 | _g2|___ob|_|_ |
| 36 | _g2|dm_ob|_|pc | 79 | __|dmyob|_|st | 122 | __|__yob|_|_ |
| 37 | s3g2|__yob|_|pc2 | 80 | _g2|__yob|s|pc2 | 123 | __|___ob|s|pc2 |
| 38 | s3_|__yob|_|pc | 81 | __|__yob|s|pc | 124 | __|___ob|_|pc2 |
| 39 | _g2|dmyob|_|pc2 | 82 | s3g2|___ob|s|_ | 125 | __|___ob|s|st |
| 40 | __|dmyob|_|pc | 83 | _g2|dm_ob|s|_ | 126 | __|___ob|_|st |
| 41 | s3g2|dm_ob|_|_ | 84 | s3_|__yob|s|_ | 127 | __|___ob|s|_ |
| 42 | s3_|dmyob|_|_ | 85 | _g2|___ob|_|pc | 128 | __|___ob|_|_ |
| 43 | s3_|dm_ob|s|pc2 | 86 | __|dmyob|s|_ | ||
(a) Key is a concatenation of data elements, indicated as follows:
• s3: the 2nd, 3rd and 5th letters of the family name substituting '2' for short names
• g2: the 2nd and 3rd letters of the given name substituting '2' for short names
• yob: year of birth
• dmob: day and month of birth
• s: sex
• pc: 4 digit postcode
• pc2: first two digits of 4 digit postcode
• st: state
• _: indicates that the component is not included in the match key.
Note: Key order is based on the product, across the elements included in a key, of the number of largest categories that together account for roughly half of the clients within each key element (s3: 204, g2: 20, dmob: 182, yob: 16, s: 1, st: 2, pc2: 8, pc: 290). Key 13 is SLK-581.
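As a rough illustration of the legend above, the following Python sketch builds a match key from a pattern such as `s3g2|dmyob|s|_`. The record field names, the upper-casing and removal of non-letters from names, and the `|` separator in the output are assumptions made for this example; they are not taken from the paper or from the SLK-581 specification. The sketch also assumes that the 128 patterns correspond to every combination of four name options, four date-of-birth options, two sex options and four region options (4 × 4 × 2 × 4 = 128).

```python
from itertools import product

# Options for each part of a key pattern, as they appear in the key table.
NAME_OPTS = ["s3g2", "s3_", "_g2", "__"]
DOB_OPTS = ["dmyob", "dm_ob", "__yob", "___ob"]
SEX_OPTS = ["s", "_"]
REGION_OPTS = ["pc", "pc2", "st", "_"]


def all_key_patterns():
    """Enumerate every combination of the four pattern parts."""
    return ["|".join(p) for p in product(NAME_OPTS, DOB_OPTS, SEX_OPTS, REGION_OPTS)]


def letters(name, positions):
    """Pick 1-based letter positions from a name, using '2' where the name is too short."""
    cleaned = [c for c in name.upper() if c.isalpha()]  # assumption: letters only
    return "".join(cleaned[p - 1] if p <= len(cleaned) else "2" for p in positions)


def build_key(record, pattern):
    """Build a match key for one record (hypothetical field names) from a pattern."""
    name, dob, sex, region = pattern.split("|")
    parts = []
    if "s3" in name:
        parts.append(letters(record["family_name"], (2, 3, 5)))
    if "g2" in name:
        parts.append(letters(record["given_name"], (2, 3)))
    if dob.startswith("dm"):
        parts.append(f"{record['day']:02d}{record['month']:02d}")
    if "y" in dob:
        parts.append(f"{record['year']:04d}")
    if sex == "s":
        parts.append(record["sex"])
    if region == "pc":
        parts.append(record["postcode"])
    elif region == "pc2":
        parts.append(record["postcode"][:2])
    elif region == "st":
        parts.append(record["state"])
    return "|".join(parts)


example = {
    "family_name": "Smith", "given_name": "Jo",
    "day": 1, "month": 2, "year": 1930,
    "sex": "F", "postcode": "3052", "state": "VIC",
}
slk_like_key = build_key(example, "s3g2|dmyob|s|_")  # pattern of key 13
```

Dropping an element from the pattern simply omits it from the concatenation, so a less detailed key tolerates a discrepancy in that element between two databases.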
Table 4: Criteria for selecting match keys for PIAC stage 3 (examples for 60 keys)
| Key no. | Linkage key | Joint unique key rate (measure A) | Est. number of links(a) | Est. FMR (measure B) | Comparison key(b) | Marginal true:false (measure C) | Est. 'worst case' FMR(c) |
|---|---|---|---|---|---|---|---|
| 1 | s3g2|dmYOB|s|pc | 99.999 | 55631 | 0.00 | 701 | >1000 | 0.04 |
| 2 | s3g2|dmYOB|_|pc | 99.957 | 56120 | 0.00 | 702 | >1000 | 0.09 |
| 3 | s3g2|dm_ob|s|pc | 99.878 | 57047 | 0.01 | 703 | >1000 | 0.82 |
| 4 | s3g2|dmYOB|s|pc2 | 99.993 | 63788 | 0.01 | 704 | >1000 | 0.55 |
| 5 | s3_|dmYOB|s|pc | 99.896 | 56819 | 0.01 | 705 | 925.9 | 0.48 |
| 6 | s3g2|dm_ob|_|pc | 99.878 | 57547 | 0.02 | 706 | 578.7 | 1.63 |
| 7 | s3g2|dmYOB|_|pc2 | 99.934 | 64338 | 0.02 | 707 | 592.1 | 1.09 |
| 8 | s3_|dmYOB|_|pc | 99.896 | 57326 | 0.03 | 708 | 466.2 | 0.95 |
| 9 | s3g2|dmYOB|s|st | 99.981 | 67206 | 0.04 | 709 | 317.7 | 1.93 |
| 10 | s3g2|dmYOB|_|st | 99.897 | 67781 | 0.08 | 710 | 159.5 | 3.82 |
| 11 | s3g2|__YOB|s|pc | 99.715 | 58484 | 0.12 | 711 | 103.9 | 15.40 |
| 12 | _g2|dmYOB|s|pc | 99.797 | 56031 | 0.14 | 712 | 88.2 | 3.17 |
| 13 | s3g2|dmYOB|s|_ | 99.792 | 67743 | 0.17 | 713 | 80.7 | 5.74 |
| 14 | s3g2|__YOB|_|pc | 99.613 | 59012 | 0.23 | 714 | 51.9 | 30.52 |
| 15 | _g2|dmYOB|_|pc | 99.707 | 56541 | 0.27 | 715 | 44.0 | 6.28 |
| 16 | s3g2|dmYOB|_|_ | 99.650 | 68327 | 0.29 | 716 | 44.9 | 10.23 |
| 17 | s3g2|dm_ob|s|pc2 | 99.647 | 65447 | 0.34 | 717 | 36.9 | 10.16 |
| 18 | s3_|dm_ob|s|pc | 99.478 | 58319 | 0.41 | 718 | 28.9 | 8.84 |
| 19 | s3_|dmYOB|s|pc2 | 99.583 | 65185 | 0.43 | 719 | 29.5 | 5.90 |
| 601 | s3g2|dmYOB|s|pc | 100.000 | 44977 | 0.00 | . . | . . | 0.00 |
| 602 | s3g2|dmYOB|_|pc | 99.998 | 45392 | 0.00 | 601 | >1000 | 0.00 |
| 603 | s3g2|dm_ob|s|pc | 99.998 | 46105 | 0.00 | 601 | >1000 | 0.01 |
| 604 | s3g2|dmYOB|s|pc2 | 100.000 | 51170 | 0.00 | 601 | >1000 | 0.00 |
| 605 | s3_|dmYOB|s|pc | 99.992 | 45855 | 0.00 | 601 | >1000 | 0.00 |
| 606 | s3g2|dm_ob|_|pc | 99.998 | 46529 | 0.00 | 603 | >1000 | 0.01 |
| 607 | s3g2|dmYOB|_|pc2 | 99.998 | 51629 | 0.00 | 604 | >1000 | 0.01 |
| 608 | s3_|dmYOB|_|pc | 99.992 | 46276 | 0.00 | 602 | >1000 | 0.01 |
| 609 | s3g2|dmYOB|s|st | 100.000 | 53592 | 0.00 | 604 | >1000 | 0.02 |
| 610 | s3g2|dmYOB|_|st | 99.998 | 54071 | 0.00 | 609 | >1000 | 0.03 |
| 611 | s3g2|__YOB|s|pc | 99.976 | 47166 | 0.00 | 601 | >1000 | 0.12 |
| 612 | _g2|dmYOB|s|pc | 99.978 | 45258 | 0.00 | 601 | >1000 | 0.02 |
| 613 | s3g2|dmYOB|s|_ | 100.000 | 53901 | 0.00 | 609 | >1000 | 0.04 |
| 614 | s3g2|__YOB|_|pc | 99.962 | 47607 | 0.00 | 602 | >1000 | 0.23 |
| 615 | _g2|dmYOB|_|pc | 99.976 | 45678 | 0.00 | 612 | >1000 | 0.05 |
| 616 | s3g2|dmYOB|_|_ | 99.998 | 54382 | 0.00 | 613 | >1000 | 0.09 |
| 617 | s3g2|dm_ob|s|pc2 | 99.994 | 52466 | 0.00 | 604 | >1000 | 0.08 |
| 618 | s3_|dm_ob|s|pc | 99.986 | 47016 | 0.00 | 606 | 776.8 | 0.07 |
| 619 | s3_|dmYOB|s|pc2 | 99.968 | 52178 | 0.00 | 604 | >1000 | 0.05 |
| 620 | s3g2|dm_ob|_|pc2 | 99.992 | 52936 | 0.00 | 617 | 772.5 | 0.16 |
| 701 | s3g2|dmYOB|s|pc | 100.000 | 49060 | 0.00 | . . | . . | 0.00 |
| 702 | s3g2|dmYOB|_|pc | 99.984 | 49502 | 0.00 | 701 | >1000 | 0.00 |
| 703 | s3g2|dm_ob|s|pc | 99.957 | 50305 | 0.00 | 701 | >1000 | 0.04 |
| 704 | s3g2|dmYOB|s|pc2 | 99.996 | 55840 | 0.00 | 701 | >1000 | 0.03 |
| 705 | s3_|dmYOB|s|pc | 99.952 | 50034 | 0.00 | 701 | >1000 | 0.02 |
| 706 | s3g2|dm_ob|_|pc | 99.957 | 50757 | 0.00 | 703 | >1000 | 0.07 |
| 707 | s3g2|dmYOB|_|pc2 | 99.977 | 56333 | 0.00 | 704 | >1000 | 0.05 |
| 708 | s3_|dmYOB|_|pc | 99.952 | 50486 | 0.00 | 702 | >1000 | 0.04 |
| 709 | s3g2|dmYOB|s|st | 99.989 | 58515 | 0.00 | 704 | >1000 | 0.09 |
| 710 | s3g2|dmYOB|_|st | 99.965 | 59029 | 0.00 | 709 | >1000 | 0.18 |
| 711 | s3g2|__YOB|s|pc | 99.847 | 51479 | 0.00 | 701 | >1000 | 0.70 |
| 712 | _g2|dmYOB|s|pc | 99.869 | 49369 | 0.00 | 701 | 152.5 | 0.14 |
| 713 | s3g2|dmYOB|s|_ | 99.944 | 58836 | 0.01 | 709 | 144.2 | 0.26 |
| 714 | s3g2|__YOB|_|pc | 99.792 | 51951 | 0.01 | 702 | 679.1 | 1.39 |
| 715 | _g2|dmYOB|_|pc | 99.822 | 49816 | 0.01 | 712 | 220.5 | 0.29 |
| 716 | s3g2|dmYOB|_|_ | 99.892 | 59352 | 0.01 | 713 | 174.0 | 0.52 |
| 717 | s3g2|dm_ob|s|pc2 | 99.855 | 57266 | 0.01 | 704 | 251.2 | 0.46 |
| 718 | s3_|dm_ob|s|pc | 99.715 | 51314 | 0.01 | 706 | 91.6 | 0.40 |
| 719 | s3_|dmYOB|s|pc2 | 99.803 | 56960 | 0.01 | 704 | 156.5 | 0.27 |
| 720 | s3g2|dm_ob|_|pc2 | 99.796 | 57771 | 0.02 | 717 | 85.5 | 0.92 |
(a) Estimated number of links was derived from simple deterministic matching on the key (retaining only one occurrence of duplicates).
(b) The comparison key is one that is slightly more detailed and includes all the match key elements of the current key. There is no strict hierarchy among the linkage keys, so in some cases more than one key may be appropriate for the comparison.
(c) 'Worst case' FMR is estimated assuming that the number of categories within a key element is equal to that implied by the most common category (s3: 72, g2: 11, dmob: 182, yob: 19, s: 2, st: 3, pc2: 11, pc: 156, aged care assessment date: 161, assessment team identifier: 25).
Note: See note to Table 3 for definition of keys; '600' series linkage keys include assessment date; '700' series linkage keys include assessment team identifier. The table only includes keys that were expected to have fewer than four times as many people with non-unique match keys as SLK-581. This equates to key 20 if client region is the only additional match data, key 64 if aged care assessment date and region are included and key 46 if assessment team identifier and region are included. Some of the keys shown were identified as not selected for use. A table showing all tested keys is available from the authors on request.
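Footnote (a) can be sketched in a few lines of Python: duplicates of a key value are collapsed to one occurrence in each dataset, and the key values common to both datasets are counted. The second function is only one plausible reading of a "unique key rate" for a single dataset (the share of records whose key value occurs once); the exact definition of measure A is not given in this section, so treat it as an assumption rather than the authors' formula. Function and variable names are hypothetical.

```python
from collections import Counter


def estimated_links(keys_a, keys_b):
    """Count key values present in both datasets.

    Converting to sets keeps one occurrence of each duplicated key value,
    in the spirit of footnote (a).
    """
    return len(set(keys_a) & set(keys_b))


def unique_key_rate(keys):
    """Percentage of records whose key value occurs exactly once.

    Assumed definition for illustration only; not the paper's measure A.
    """
    if not keys:
        return 0.0
    counts = Counter(keys)
    unique_records = sum(1 for k in keys if counts[k] == 1)
    return 100.0 * unique_records / len(keys)


# Toy key values standing in for one match key computed on two datasets.
dataset_a = ["K1", "K2", "K2", "K3"]
dataset_b = ["K2", "K3", "K4"]
links = estimated_links(dataset_a, dataset_b)
rate_a = unique_key_rate(dataset_a)
```

A less detailed key generally raises the estimated number of links while lowering the share of records with a unique key value, which is the trade-off the selection criteria are weighing.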
Table 5: Match results from stepwise matching for first 50 steps for PIAC stage 3
| Step | Linkage key | Matches | Step | Linkage key | Matches |
|---|---|---|---|---|---|
| 1 | key_601 | 45031 | 26 | key_610 | 15 |
| 2 | key_601_2 | 904 | 27 | key_610_2 | . |
| 3 | key_601_3 | 1357 | 28 | key_616 | 1 |
| 4 | key_601_4 | 26 | 29 | key_616_2 | . |
| 5 | key_604 | 5151 | 30 | key_617 | 113 |
| 6 | key_604_2 | 99 | 31 | key_617_2 | 2 |
| 7 | key_604_3 | . | 32 | key_617_3 | 6 |
| 8 | key_604_4 | 146 | 33 | key_617_4 | . |
| 9 | key_609 | 2027 | 34 | key_605 | 725 |
| 10 | key_609_2 | 28 | 35 | key_605_2 | 22 |
| 11 | key_613 | 257 | 36 | key_605_3 | 30 |
| 12 | key_613_2 | 3 | 37 | key_605_4 | 1 |
| 13 | key_602 | 383 | 38 | key_608 | 6 |
| 14 | key_602_2 | 13 | 39 | key_620 | 2 |
| 15 | key_602_3 | 4 | 40 | key_618 | 30 |
| 16 | key_602_4 | . | 41 | key_612 | 227 |
| 17 | key_603 | 883 | 42 | key_612_2 | 1 |
| 18 | key_603_2 | 12 | 43 | key_612_3 | 7 |
| 19 | key_603_3 | 19 | 44 | key_612_4 | . |
| 20 | key_603_4 | 1 | 45 | key_611 | 1697 |
| 21 | key_606 | 8 | 46 | key_611_2 | 13 |
| 22 | key_607 | 36 | 47 | key_611_3 | 52 |
| 23 | key_607_2 | . | 48 | key_611_4 | 1 |
| 24 | key_607_3 | 3 | 49 | key_615 | 3 |
| 25 | key_607_4 | . | 50 | key_623 | 53 |
| Matches with aged care assessment date (steps 1-105) | 60780 | ||||
| Matches with assessment team identifier (steps 106-188) | 7255 | ||||
| Other matches (steps 189-215) | 8254 | ||||
| Total matches | 76289 | ||||
Legend for linkage key suffixes:
No suffix: linkage used preferred RCCP SLK-581 and preferred postcode
_2: linkage used alternative RCCP SLK-581 and preferred postcode
_3: linkage used preferred RCCP SLK-581 and alternative postcode
_4: linkage used alternative RCCP SLK-581 and alternative postcode
Note: See Table 3 for definition of keys. '600' series linkage keys use assessment date. Order of use is based on the joint unique key rate in Table 4. Table showing results for all selected keys is available from the authors on request.
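The stepwise process summarised in the table can be sketched as a loop over an ordered list of keys: at each step, records still unmatched in both datasets are compared on the current key, matches are recorded, and the matched records are removed before the next step. This is a minimal sketch under stated assumptions, not the authors' implementation. In particular, it links a pair only when the key value is unique among the remaining records on both sides, and all names (`stepwise_link`, the record fields, the step labels) are hypothetical.

```python
from collections import defaultdict


def index_by_key(records, key_fn):
    """Group record ids by key value, skipping records with no usable key."""
    index = defaultdict(list)
    for rec_id, rec in records.items():
        key = key_fn(rec)
        if key is not None:
            index[key].append(rec_id)
    return index


def stepwise_link(left, right, steps):
    """Apply match keys in order, removing linked records after each match.

    left, right: dict mapping record id -> record
    steps: ordered list of (step_name, key_fn) pairs
    Returns a list of (step_name, left_id, right_id) links.
    """
    left_pool, right_pool = dict(left), dict(right)
    links = []
    for step_name, key_fn in steps:
        left_index = index_by_key(left_pool, key_fn)
        right_index = index_by_key(right_pool, key_fn)
        for key, left_ids in left_index.items():
            right_ids = right_index.get(key, [])
            # Assumption: accept only one-to-one matches on this key.
            if len(left_ids) == 1 and len(right_ids) == 1:
                links.append((step_name, left_ids[0], right_ids[0]))
                del left_pool[left_ids[0]]
                del right_pool[right_ids[0]]
    return links


left = {
    "a1": {"slk": "MIHO2", "dob": "01021930", "pc": "3052"},
    "a2": {"slk": "ONEAR", "dob": "05051925", "pc": "2000"},
}
right = {
    "b1": {"slk": "MIHO2", "dob": "01021930", "pc": "3052"},
    "b2": {"slk": "ONEAR", "dob": "05051925", "pc": "2010"},
}
steps = [
    ("strict", lambda r: (r["slk"], r["dob"], r["pc"])),
    ("no_postcode", lambda r: (r["slk"], r["dob"])),
]
result = stepwise_link(left, right, steps)
```

The suffixed variants in the legend (`_2`, `_3`, `_4`) could be expressed in this scheme as extra steps whose key functions read an alternative SLK-581 or an alternative postcode from the record.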