
Internal deterministic record linkage using indirect identifiers for matching of same-patient hospital transfers and early readmissions after acute coronary syndrome in a nationwide hospital discharge database: a retrospective observational validation study.

Afonso Rocha1,2, Luís Filipe Azevedo3, J C Silva Cardoso4, Thomas G Allison5, Alberto Freitas3.

Abstract

OBJECTIVES: To assess the validity of record linkage using multiple indirect personal identifiers to identify same-patient hospitalisations and to define episodes of care (EC) due to acute coronary syndrome (ACS).
METHODS: Using national hospital discharge data to identify all admissions due to ACS, we tested six linkage rules based on indirect identifiers with increasing levels of detail and compared their validity against a pseudonymised unique identifier used as the gold standard (GS). Within each matched group of hospitalisations, contiguous hospitalisations occurring within 28 days of each other were considered one EC. We classified hospitalisations according to the time between the first pair of hospitalisations as hospital transfer (HT: ≤1 day), early readmission (ER: 2–28 days) or recurrent case (>28 days).
RESULTS: There were 146 671 hospitalisations (unlinked) and 121 987 ACS 28-day ECs (linked by GS), with 18 398 HTs (≤1 day) and 6286 ERs (≤28 days). Among linkage rules using demographic and residence code variables, validity was highest for the rule using sex, date of birth and four-digit residence code, with sensitivity of 98.4% (95% CI: 98.4 to 98.5), specificity of 97.8% (95% CI: 97.6 to 98.0) and Cohen's κ of 0.9 for detecting ACS-EC, compared with the GS linkage rule. Validity for HT and ER was similarly high, with sensitivity between 97.2% and 98.1% and specificity between 98.8% and 99.9%.
CONCLUSIONS: Our internal linkage validation study using indirect patient identifiers will allow calibration of incidence rates and performance indicators, accounting for the effect of HT and readmissions. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords:  acute coronary syndrome; deterministic linkage; hospital admissions; medical record linkage

Year:  2019        PMID: 31892664      PMCID: PMC6955528          DOI: 10.1136/bmjopen-2019-033486

Source DB:  PubMed          Journal:  BMJ Open        ISSN: 2044-6055            Impact factor:   2.692


This study demonstrates the validity of deterministic record linkage with indirect identifiers for linking patient-level hospitalisations, allowing hospitalisations within the same episode of care to be aggregated. It provides a valid method to overcome limitations of large anonymised administrative databases, enabling epidemiological research with retrospective analysis and calibration of past and present trends in ACS hospitalisation incidence, in-hospital mortality and performance indicators. The methodology is applicable in different countries and in settings with high rates of hospital transfers and readmissions, such as trauma, stroke and intensive care patients. The quality of the gold standard used for validation (a unique pseudonymised identifier) was not assessed. Validation was performed on the National Hospital Discharge Database, which has very low missing/invalid rates; the method may therefore not be applicable to databases with higher error rates or to external record linkage between different data sets, where stringent deterministic methods result in a high number of false negatives.

Background

Hospital administrative data provide a valuable source of information for research questions on healthcare management, resource utilisation and quality of care. Their strengths are very wide coverage and low-cost, systematic data collection.1 On the downside, hospital administrative data are not designed for research purposes, often lack a unique patient identifier and record each hospitalisation separately, preventing linkage of multiple hospitalisations within the same episode of care (EC); they are therefore susceptible to imprecision and overestimation when patients are transferred between hospitals or have multiple readmissions for a single EC.2 This is especially problematic for acute coronary syndromes (ACS), where clinical pathways and referral networks have been implemented to ensure timely access to coronary angiography and revascularisation procedures, with hospital transfer (HT) rates of up to 30%.3 4 Identifying whether a hospital admission is a transfer from another hospital, an early readmission (ER) within the same EC, or a late readmission due to a new ACS event remains challenging, and is of paramount importance for analysing and interpreting outcome data and for monitoring trends in ACS subtypes, therapeutic measures and healthcare services performance.5 Additionally, in the US, from 2012 onwards, hospitals whose 30-day readmission rates for certain conditions, including acute myocardial infarction, exceed the national average are financially penalised under the Patient Protection and Affordable Care Act.6 A standard approach to minimise multiple counting has been to exclude inter-hospital transfers and readmissions but, since these are not random events, this method introduces bias and leads to loss of relevant information.7 8 On the other hand, treating sequential hospitalisations as independent ECs overestimates standardised ACS trends, lowers estimates of the proportion of patients undergoing revascularisation and may artificially decrease in-hospital mortality rates.4 5 7 9 Therefore, sequential hospitalisations of the same patient occurring within a preset time frame should be combined into one EC, which should be the preferred unit of analysis. When only unlinked data are available and there is no unique patient identifier, an internal linkage method based on demographic and event-based variables is desirable to identify and account for HTs and readmissions within the same EC.10 We aimed to build and assess the validity of a matching algorithm that uses secondary non-unique patient identifiers and event-based variables in a stepwise deterministic linkage method to identify patient-level ACS hospitalisations and contiguous hospitalisations occurring within 28 days of each other, classified as one ACS-EC, using pseudonymised data (a unique direct identifier) as the gold standard (GS).

Methods

Study population and data sources

Data for the study were obtained retrospectively from the national hospital discharge administrative database provided by the Portuguese Ministry of Health's Central Administration for the Health System, which includes hospitalisations in all public acute care hospitals of the Portuguese National Health Service in mainland Portugal. Data submission is mandatory for every hospitalisation. The data are used for hospital reimbursement and also for estimating disease prevalence and assessing healthcare utilisation. Collected information includes demographics (age, sex, residence code), hospital admission and discharge dates, a principal diagnosis field and up to 30 secondary diagnosis fields coded with the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM), and discharge status (deceased or alive). Because of data privacy concerns, administrative health data have traditionally been released to researchers without unique direct identifiers. From 2011 onwards, a pseudonymised unique patient identifier was provided, which allowed same-patient hospitalisations to be tracked. We used this identifier to assess and validate our matching algorithm. Our analysis was therefore restricted to all hospitalisation episodes, both inpatient and outpatient, between 2011 and 2015. We followed the modified Standards for Reporting of Diagnostic Accuracy criteria to report our findings.11

Event identification and classification

Coding procedures for ACS-EC vary considerably between institutions and over time, especially for HTs for specialised care and treatment. Practices range from both institutions (referring and receiving) coding the ACS hospitalisation and the procedure (duplicating both counts), to only the receiving institution coding the hospitalisation episode and procedure, either as an inpatient or as an outpatient code. To capture all information pertaining to each ACS-EC, we included all hospitalisation episodes, both inpatient and outpatient, with a primary discharge diagnosis field showing ICD9-CM codes 410.x, 411.0–411.1 or 414.x, and the procedure codes for cardiac catheterisation (37.21, 37.22, 37.23), percutaneous coronary intervention (00.66, 36.03, 36.04, 36.06, 36.07, 36.09) and surgical coronary revascularisation (coronary artery bypass grafting (CABG): 36.10–36.17, 36.19). An exploratory analysis revealed heterogeneous coding practices for HTs and elective readmissions among hospitals. These ranged from both institutions coding the admission with an ACS code (410.x, 411.0–411.1), to the first institution coding ACS and the receiving institution coding the hospitalisation with a 414.x code. We wanted to capture and aggregate all information related to hospitalisations within each ACS-EC, including revascularisation procedures. We therefore used all codes (410.x, 411.0–411.1 and 414.x) and, for each matching rule, selected only episodes with at least one hospitalisation coded 410.x or 411.0–411.1. Since there is no specific coding that identifies ACS subtypes, we used codes 410.0–410.6 and 410.8 for ST-segment elevation myocardial infarction (STEMI), codes 410.7 and 410.9 for non-STEMI (NSTEMI) and codes 411.0 and 411.1 for unstable angina.12 We used the diagnostic hierarchy method proposed by Lopez et al, which reflects the severity of ACS subtypes, from STEMI (most severe) through NSTEMI to unstable angina (least severe). For an ACS-EC with multiple hospitalisations, the most severe category was used.2 The steps taken to select linkable inpatient and outpatient hospitalisation episodes with an ACS-related primary diagnosis are shown in online supplementary figure 1. First, we identified redundant episodes (n=21) that had the same values for all 60 variables and kept only one record among duplicates. Second, we excluded records with missing or invalid values in the linking variables of each linkage rule (n=1095). Lastly, we restricted hospitalisation episodes to patients aged ≥30 years (n=171) because of concerns about the reliability of ACS estimates in younger patients.
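The subtype mapping and severity hierarchy described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. The function names, the string format of codes (for example '410.71') and the numeric severity ranks are assumptions. Only the code-to-subtype mapping, the eligibility condition and the STEMI > NSTEMI > unstable angina ordering come from the text.

```python
from typing import Iterable

# Assumed ranks encoding the hierarchy in the text: STEMI > NSTEMI > unstable angina.
# "OTHER" covers 414.x hospitalisations that are kept only for aggregation.
SEVERITY_RANK = {"STEMI": 3, "NSTEMI": 2, "UA": 1, "OTHER": 0}


def acs_subtype(icd9: str) -> str:
    """Map a primary ICD9-CM diagnosis string (e.g. '410.71') to an ACS subtype."""
    if icd9.startswith("410.") and len(icd9) > 4:
        # 410.7x and 410.9x -> NSTEMI; 410.0x-410.6x and 410.8x -> STEMI
        return "NSTEMI" if icd9[4] in "79" else "STEMI"
    if icd9.startswith(("411.0", "411.1")):
        return "UA"
    return "OTHER"


def is_eligible(codes: Iterable[str]) -> bool:
    """Keep an episode only if some hospitalisation carries a 410.x or 411.0-411.1 code."""
    return any(acs_subtype(c) != "OTHER" for c in codes)


def episode_subtype(codes: Iterable[str]) -> str:
    """Assign the most severe subtype found across all hospitalisations of one ACS-EC."""
    return max((acs_subtype(c) for c in codes), key=SEVERITY_RANK.__getitem__, default="OTHER")


# A 414.x hospitalisation linked to an NSTEMI hospitalisation yields an NSTEMI episode.
print(episode_subtype(["414.01", "410.71"]))
```

Taking the maximum rank across the whole episode means the order of the linked hospitalisations does not affect the assigned subtype.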

Linkage method

We used internal deterministic data linkage, which required exact matches on different combinations of person-level identifiers. We then calculated the time interval (in days) between matched hospitalisations to define a 28-day ACS-EC comprising the first admission and all contiguous admissions occurring within 28 days of each other (HT: ≤1 day; ER: 2–7 days; late readmission: 8–28 days).2 13 Matched hospitalisations (cases with identical demographic identifiers) admitted to the same hospital or to two separate hospitals within 28 days of each other were considered to belong to the same 28-day ACS-EC. They were counted only once and had all their information aggregated. Matched hospitalisations occurring more than 28 days apart were considered a new ACS-EC. We defined six test linkage rules using various combinations and levels of granularity of the following linkage variables: sex, date of birth and residence code. To identify matched hospitalisations (hospitalisations pertaining to the same patient), deterministic linkage rules require an exact match on the values of all linkage variables specified in each matching rule (table 1). The residence code is a sequential combination of six digits reflecting the administrative level of detail. Two digits identify the district (18 in total), two the municipality within each district (278 in total) and two the parish within each district and municipality (4050 up to 2012 and 2882 after the administrative reform of 2013).14 A direct pseudonymised identifier (a unique nine-digit patient ID derived from the national identification number) was used as the GS.15 We sequentially tested rules with increasing levels of granularity to assess their validity and linkage error rate compared with the GS.
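The episode-of-care logic described above can be illustrated with a short Python sketch. It assumes that each hospitalisation already carries a linkage key produced by one of the matching rules; the `Hospitalisation` class and its `link_key` field are hypothetical. It also assumes that the gap is measured from the discharge date of one hospitalisation to the admission date of the next. The text reports intervals in days but does not state which dates were used. Hospitalisations more than 28 days apart start a new ACS-EC.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Hospitalisation:
    record_id: str
    link_key: tuple      # hypothetical: the values required by a matching rule
    admitted: date
    discharged: date


def classify_gap(days: int) -> str:
    """Interval classes from the text: HT <=1 day, ER 2-7 days, late 8-28 days."""
    if days <= 1:
        return "HT"
    if days <= 7:
        return "ER"
    if days <= 28:
        return "late"
    return "recurrence"  # >28 days: a new ACS-EC


def build_episodes(records):
    """Group same-key hospitalisations into 28-day ACS-EC and label each one."""
    labels, episodes = {}, []
    ordered = sorted(records, key=lambda r: (r.link_key, r.admitted))
    for _, group in groupby(ordered, key=lambda r: r.link_key):
        previous = None
        for rec in group:
            if previous is None:
                labels[rec.record_id] = "first"
                episodes.append([rec])
            else:
                # Assumption: gap = next admission minus previous discharge.
                label = classify_gap((rec.admitted - previous.discharged).days)
                labels[rec.record_id] = label
                if label == "recurrence":
                    episodes.append([rec])    # starts a new episode of care
                else:
                    episodes[-1].append(rec)  # same episode: counted once
            previous = rec
    return episodes, labels


records = [
    Hospitalisation("a", ("M", 1950, 3, 2), date(2012, 1, 1), date(2012, 1, 3)),
    Hospitalisation("b", ("M", 1950, 3, 2), date(2012, 1, 4), date(2012, 1, 10)),
    Hospitalisation("c", ("M", 1950, 3, 2), date(2012, 1, 20), date(2012, 1, 22)),
    Hospitalisation("d", ("M", 1950, 3, 2), date(2012, 3, 1), date(2012, 3, 5)),
    Hospitalisation("e", ("F", 1960, 7, 9), date(2012, 1, 5), date(2012, 1, 8)),
]
episodes, labels = build_episodes(records)
print(len(episodes), labels)
```

Sorting by key and admission date before `groupby` matters, because `itertools.groupby` only merges adjacent items with equal keys.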
Table 1

Stepwise deterministic matching algorithms according to detail of identifying variables

Matching rule | Matching variables
Rule 1 | Sex, YearBirth
Rule 2 | Sex, YearBirth, MonthBirth
Rule 3 | Sex, YearBirth, MonthBirth, DayBirth
Rule 4 | Sex, YearBirth, MonthBirth, DayBirth, ResidCode-2digits
Rule 5 | Sex, YearBirth, MonthBirth, DayBirth, ResidCode-4digits
Rule 6 | Sex, YearBirth, MonthBirth, DayBirth, ResidCode-6digits
Gold standard | Unique patient ID

ID, identification number; ResidCode, residence code.
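The six rules in table 1 differ only in which fields form the exact-match key. A minimal Python sketch follows. It assumes that a record holds sex as a string, date of birth as a `datetime.date` and the residence code as a six-character digit string (district, municipality, parish). The function name and the convention of returning `None` for a record excluded because of a missing or invalid linking value are illustrative assumptions, and the residence codes in the example are hypothetical.

```python
from datetime import date
from typing import Optional

# Number of leading residence-code digits used by rules 4-6 (table 1):
# 2 = district, 4 = district + municipality, 6 = district + municipality + parish.
RESID_DIGITS = {4: 2, 5: 4, 6: 6}


def link_key(sex: str, dob: date, resid_code: str, rule: int) -> Optional[tuple]:
    """Build the exact-match key for matching rule 1-6; None means 'exclude record'."""
    if rule not in range(1, 7):
        raise ValueError("rule must be between 1 and 6")
    key = (sex, dob.year)
    if rule >= 2:
        key += (dob.month,)
    if rule >= 3:
        key += (dob.day,)
    if rule >= 4:
        n = RESID_DIGITS[rule]
        prefix = resid_code[:n]
        if len(prefix) < n or not prefix.isdigit():
            return None  # missing/invalid residence digits for this rule
        key += (prefix,)
    return key


# Hypothetical residence codes: same district and municipality, different parish.
born = date(1948, 5, 17)
print(link_key("M", born, "110608", 5) == link_key("M", born, "110611", 5))  # same key under rule 5
print(link_key("M", born, "110608", 6) == link_key("M", born, "110611", 6))  # different under rule 6
```

The example shows the trade-off reported for rules 5 and 6. Adding the parish digits separates records that share a birth date and municipality, but any parish miscoding or recoding also splits genuine same-patient pairs.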

Although contiguous admissions within 28 days of each other were counted only once, as a single 28-day ACS-EC, we aggregated information from the different hospitalisations on revascularisation procedures, severity indicators, comorbidities and in-hospital mortality.

Statistical analysis

For each matching rule we calculated the total matching rate (proportion of all hospitalisations successfully linked) and the matching rate for ACS-EC. Using the unique ID as the GS, we calculated the number of matching errors (missed matches and false matches) for each matching rule. Comparative linkage quality was assessed by calculating sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), with 95% CIs calculated using the one-sample Clopper–Pearson method (sensitivity and specificity) and the standard logit method (PPV and NPV).16 17 Chance-corrected agreement between each matching rule and the GS was calculated using Cohen's κ. Agreement was classified as poor if κ≤0.20, fair if 0.20<κ≤0.40, moderate if 0.40<κ≤0.60, high if 0.60<κ≤0.80 and excellent if κ>0.80.18 We compared baseline characteristics between true matches and false matches, and between missed matches and true non-matches, using the independent samples t-test for continuous variables and the χ² test for categorical variables. We then compared the characteristics of patients with a single hospital admission with those of patients with multiple hospitalisations within a 28-day ACS-EC. Analyses were performed using IBM SPSS Statistics V.25 and Microsoft Excel V.16.30.
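The point estimates described above follow from a 2×2 table that cross-classifies each rule against the GS. The Python sketch below is illustrative only, and the class and function names are hypothetical. As input it uses the rule 5 counts of true matches (24 142) and true non-matches (120 087) from table 6, together with the missed matches (542) and false matches (1922) from table 4. This combination is an assumption, chosen because it reproduces the rule 5 estimates in table 4: sensitivity 97.80%, specificity 98.42%, PPV 92.63% and κ=0.941. The κ bands follow the text, with each band assumed to end where the next begins.

```python
from dataclasses import dataclass


@dataclass
class LinkageCounts:
    tp: int  # true matches: linked by the rule and by the GS
    fn: int  # missed matches: linked by the GS only
    fp: int  # false matches: linked by the rule only
    tn: int  # true non-matches: linked by neither


def validity(c: LinkageCounts) -> dict:
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa against the GS."""
    n = c.tp + c.fn + c.fp + c.tn
    p_observed = (c.tp + c.tn) / n
    # Expected chance agreement from the row and column margins of the 2x2 table.
    p_expected = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (p_observed - p_expected) / (1 - p_expected),
    }


def kappa_label(kappa: float) -> str:
    """Agreement bands used in the text (upper bound of each band inclusive)."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "high")):
        if kappa <= upper:
            return label
    return "excellent"


rule5 = validity(LinkageCounts(tp=24_142, fn=542, fp=1_922, tn=120_087))
print({k: round(v, 4) for k, v in rule5.items()}, kappa_label(rule5["kappa"]))
```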

Patient and public involvement

There was no patient or public involvement in any step of this study.

Results

During the study period there were 146 671 hospitalisations due to ACS. Mean (SD) age was 67.7 (12.3) years and 68.9% of patients were men. Median length of stay was 3 days (IQR 6). Unlinked data revealed 26 842 (18.3%) hospitalisations with STEMI, 36 597 (24.9%) with NSTEMI, 10 347 (7.5%) with unstable angina and 72 885 (49.6%) classified as other acute and subacute forms of ischaemic heart disease. Cardiac catheterisation was performed in 70% of hospitalisation episodes, percutaneous coronary intervention in 38.2% and CABG in 6.3%, while no cardiac procedure was performed in 23.8% of hospitalisations. Heart failure was present in 19 699 (13.4%) hospitalisations, atrial fibrillation in 15 019 (10.2%) and ventricular fibrillation in 2981 (2.0%). There were 1469 (1.0%) cardiac arrests and 6241 (4.3%) in-hospital deaths (online supplementary table 1). The linkage rule requiring an exact match on the unique patient ID (GS) identified 34 948 matched hospitalisations, corresponding to 23.8% of all hospitalisations, with 16.8% readmissions within 28 days of the initial hospitalisation. Among the test rules based on indirect identifiers, the matching rate decreased from 99.9% for rule 1 to 28.1% for rule 6. Across the same rules, the matching rate of ACS-EC with multiple hospitalisations decreased from 99.5% to 15.3% (table 2).
Table 2

Total number and proportion of matched hospitalisations and 28-day ACS episode of care using each matching rule

Matching rule | Number of matched hospitalisations | % of matched hospitalisations | Number of same-patient contiguous hospitalisations | % of total HE identified as same-patient contiguous hospitalisations*
Rule 1 | 146 518 | 99.9 | 145 909 | 99.5
Rule 2 | 144 978 | 98.8 | 126 472 | 86.2
Rule 3 | 113 789 | 77.6 | 36 113 | 24.6
Rule 4 | 62 399 | 42.5 | 27 391 | 18.7
Rule 5 | 47 923 | 32.7 | 26 064 | 17.8
Rule 6† | 40 909 | 28.1 | 22 290 | 15.3
Gold standard | 34 948 | 23.8 | 24 684 | 16.8

*Contiguous corresponds to sum of HE identified as hospital transfer, early readmission or late readmission.

†exclusion of 1151 invalid/missing fifth/sixth digit residence code (leaving a total of 139 863 hospitalisation episodes).

ACS, acute coronary syndrome; HE, hospitalisation episode.

The proportion of ACS-EC with multiple hospitalisations increased from 16.9% in 2011 to 17.9% in 2015. Compared with single-hospitalisation ACS-EC, those with multiple hospitalisations were less frequent in women and in older patients. The rate of multiple hospitalisations within the same EC was lower in patients with unstable angina and higher in those undergoing cardiac procedures, especially CABG. There was considerable geographical heterogeneity in the incidence of ACS hospitalisations and in the proportion of ACS-EC with multiple hospitalisations. The major coastal districts (Lisbon and Porto) accounted for 40.8% of all ACS-EC, but smaller inland districts showed the highest rates of ACS-EC with multiple hospitalisations, reaching up to 37.4% (online supplementary table 2). All test rules overestimated the number of recurrent ACS and underestimated first ACS hospitalisations compared with the GS. Rule 6 had the lowest detection of HTs (11.0% vs 12.5% for the GS) but the second highest proportion of first ACS hospitalisations identified (71.9% vs 76.2% for the GS) (table 3).
Table 3

Total number and proportion of ACS hospitalisations according to time between first and subsequent hospital admission for matched hospitalisations using each matching rule

Linkage rule | First ACS hospitalisation* | Hospital transfers (≤1 day) | Early readmissions (>1 day and ≤7 days) | Late readmissions (>7 days and ≤28 days) | Recurrence (>28 days)
Rule 1 | 153 (0.1) | 128 450 (87.6) | 15 178 (10.3) | 2281 (1.6) | 609 (0.4)
Rule 2 | 1693 (1.2) | 59 799 (40.8) | 30 151 (20.6) | 36 522 (24.9) | 18 506 (12.6)
Rule 3 | 32 882 (22.4) | 22 141 (15.1) | 4123 (2.8) | 9849 (6.7) | 77 676 (53.0)
Rule 4 | 84 272 (57.5) | 20 079 (13.7) | 2482 (1.7) | 4830 (3.3) | 35 008 (23.9)
Rule 5 | 98 748 (67.3) | 19 855 (13.5) | 2055 (1.4) | 4154 (2.8) | 21 859 (14.9)
Rule 6† | 104 442 (71.9) | 16 036 (11.0) | 2216 (1.5) | 4038 (2.8) | 18 619 (12.8)
Gold standard | 111 723 (76.2) | 18 398 (12.5) | 2243 (1.5) | 4043 (2.8) | 10 264 (7.0)

*Includes both non-matched ACS hospitalisation (single hospital admission) and first hospitalisation in ACS episodes of care with multiple hospitalisations.

†Exclusion of 1151 invalid/missing fifth/sixth digit residence code (leaving a total of 139 863 hospitalisations).

ACS, acute coronary syndrome.

As the level of detail of the variables included in the matching rules increased, the number of false matches decreased from 121 225 (82.7% of matches) for rule 1 to 1490 (3.6%) for rule 6. Conversely, missed matches increased from 0 (0.0%) for rule 1 to 3343 (8.1%) for rule 6. Validity measures also showed that adding the residence code to the demographic variables in the matching rules substantially increased validity against the GS. Sensitivity decreased only slightly, from 100% for rule 1 to 97.8% for rule 5, before dropping more steeply to 86.2% for rule 6, while both specificity and PPV increased with matching rule granularity. For the detection of 28-day ACS-EC, Cohen's κ showed excellent agreement between the GS and the rules using sex, date of birth and residence code (linkage rules 4 to 6). Rule 5 showed the highest agreement (κ=0.941), closely followed by rule 4 (κ=0.927) and then rule 6 (κ=0.876) (table 4). Table 5 shows the matching quality of rule 5 according to the time between first and subsequent hospital admission for matched hospitalisations, with somewhat lower PPV for HTs and late readmissions.
Table 4

Measures of matching quality for each matching rule in the detection of 28-day ACS episode of care

Linkage rule | Missed matches | False matches | Cohen's κ (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Positive predictive value, % (95% CI)
Rule 1 | 0 | 121 225 | 0.002 (0.000 to 0.004) | 100.00 (99.9 to 100.0) | 0.62 (0.58 to 0.67) | 16.92 (16.91 to 16.92)
Rule 2 | 17 | 101 805 | 0.062 (0.059 to 0.065) | 99.93 (99.90 to 99.96) | 16.54 (16.34 to 16.75) | 19.50 (19.46 to 19.54)
Rule 3 | 15 | 11 444 | 0.764 (0.760 to 0.768) | 99.94 (99.91 to 99.97) | 90.62 (90.45 to 90.78) | 68.31 (67.93 to 68.69)
Rule 4 | 207 | 2914 | 0.927 (0.092 to 0.930) | 99.16 (99.05 to 99.28) | 97.61 (97.53 to 97.70) | 89.36 (89.00 to 89.70)
Rule 5 | 542 | 1922 | 0.941 (0.939 to 0.944) | 97.80 (97.62 to 97.99) | 98.42 (98.35 to 98.49) | 92.63 (92.30 to 92.94)
Rule 6* | 3343 | 1490 | 0.876 (0.873 to 0.880) | 86.15 (85.72 to 86.59) | 98.77 (98.71 to 98.83) | 93.32 (92.99 to 93.62)
Gold standard | | | | | |

*Exclusion of 1151 invalid/missing fifth/sixth digit residence code (leaving a total of 139 863 hospitalisation episodes).

ACS, acute coronary syndrome.
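The Methods cite the one-sample Clopper–Pearson method for the CIs of sensitivity and specificity. Its bounds are the binomial success probabilities at which observing x or more successes (lower bound) or x or fewer successes (upper bound) has probability α/2. The standard-library Python sketch below finds these bounds by bisection on the binomial cumulative distribution function rather than through beta quantiles. This is an implementation choice, not necessarily what SPSS does, and the function names are hypothetical. Applied to the rule 5 sensitivity (24 142 true matches out of 24 684 GS matches), it gives an interval close to the 97.62 to 97.99 reported in table 4. Small differences in the second decimal may reflect rounding or software.

```python
import math


def _log_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), summing whichever tail has fewer terms."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    if x < n / 2:
        return math.fsum(math.exp(_log_pmf(k, n, p)) for k in range(x + 1))
    return 1.0 - math.fsum(math.exp(_log_pmf(k, n, p)) for k in range(x + 1, n + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 60) -> float:
    """Solve f(p) = target on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) CI for the binomial proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n and n > 0")
    # Lower bound: P(X >= x | p) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= x | p) = alpha/2, which decreases with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


low, high = clopper_pearson(24_142, 24_684)  # rule 5 sensitivity
print(f"{100 * low:.2f} to {100 * high:.2f}")
```

Summing the shorter tail in log space keeps each evaluation to a few hundred terms here and avoids overflow in the binomial coefficients.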

Table 5

Measures of matching quality according to time between first and subsequent hospital admission for matched hospitalisations identified using matching rule 5

Measure, % (95% CI) | Primary ACS* | Hospital transfers | Early readmissions | Late readmissions
Sensitivity | 98.44 (98.37 to 98.51) | 97.21 (96.96 to 97.44) | 98.13 (97.48 to 98.65) | 98.14 (97.68 to 98.54)
Specificity | 97.80 (97.61 to 97.98) | 98.75 (98.69 to 98.81) | 99.94 (99.92 to 99.95) | 99.79 (99.77 to 99.82)
PPV | 99.55 (99.51 to 99.59) | 91.78 (91.40 to 92.14) | 95.99 (95.12 to 96.70) | 93.10 (92.33 to 93.80)
NPV | 92.70 (92.40 to 93.00) | 99.60 (99.56 to 99.63) | 99.97 (99.96 to 99.98) | 99.95 (99.93 to 99.96)
Cohen's κ | 0.942 (0.940 to 0.944) | 0.934 (0.933 to 0.939) | 0.970 (0.965 to 0.975) | 0.954 (0.950 to 0.959)

*Refers to ACS episodes of care (primary non-matched ACS plus matched hospitalisation episodes classified as recurrent >28 days).

ACS, acute coronary syndrome; NPV, negative predictive value; PPV, positive predictive value.

When matching rule 5 was used to identify multiple hospitalisations within the same ACS-EC, 98.3% of episodes were correctly classified, with a false match rate of 7.3% and a false non-match rate of 0.4%. Table 6 compares the characteristics of ACS hospitalisations erroneously classified by rule 5 as a match (false match) or a non-match (missed match) with those of true matches and true non-matches, respectively. False matches were more common in patients presenting with unstable angina or coded as other acute and subacute forms of ischaemic heart disease (ICD9-CM 414), and in patients undergoing cardiac catheterisation or percutaneous coronary intervention. Missed matches were more common in younger patients, in hospitalisations coded as ICD9-CM 414 and in patients undergoing coronary artery bypass surgery (table 6). At district level, Lisbon had an exceedingly high proportion of false matches (21.8%) compared with the other districts (4.8%), with three municipalities alone responsible for 77.2% of all false matches. Excluding these three municipalities from the analysis reduced the false-match rate in Lisbon district from 21.8% to 6.6%, approaching the district and national averages.
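The rates quoted above can be recomputed from the rule 5 counts in table 6. The text does not define the denominators, so the Python sketch below makes two assumptions. The false match rate is taken as false matches divided by all hospitalisations linked by rule 5. The false non-match rate is taken as missed matches divided by all hospitalisations left unlinked. Under these assumptions the counts reproduce the reported 7.3%, 0.4% and 98.3%, and the four counts sum to the 146 671 hospitalisations in the study. The same ratio applied to the Lisboa row gives the 21.8% district-level false-match proportion.

```python
# Rule 5 classification counts taken from table 6.
false_match, true_match = 1_900, 24_142
missed_match, true_non_match = 542, 120_087
total = false_match + true_match + missed_match + true_non_match

# Assumed denominators: linked hospitalisations for false matches,
# unlinked hospitalisations for missed matches.
false_match_rate = false_match / (false_match + true_match)
false_non_match_rate = missed_match / (missed_match + true_non_match)
correctly_classified = (true_match + true_non_match) / total

# District-level example (Lisboa row of table 6): false vs true matches.
lisbon_false_match_rate = 892 / (892 + 3_195)

print(total, f"{false_match_rate:.1%}", f"{false_non_match_rate:.1%}",
      f"{correctly_classified:.1%}", f"{lisbon_false_match_rate:.1%}")
```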
Table 6

Characteristics of matching errors of 28-day ACS-EC identified, using matching rule 5

 | Matched as 28-day ACS-EC | | | Non-matched 28-day ACS-EC | |
Characteristic | False match (n=1900) | True match (n=24 142) | P value | Missed match (n=542) | True non-match (n=120 087) | P value
Sex | | | 0.21 | | | 0.52
 Male | 1386 (7.4) | 17 285 (92.6) | | 363 (0.4) | 81 995 (99.6) |
 Female | 514 (7.0) | 6857 (93.0) | | 179 (0.5) | 38 092 (99.5) |
Age groups | | | 0.42 | | | 0.02
 30–44 | 74 (7.4) | 930 (92.6) | | 33 (0.8) | 4228 (99.2) |
 45–54 | 251 (7.5) | 3100 (92.5) | | 67 (0.5) | 14 066 (99.5) |
 55–64 | 464 (7.3) | 5925 (92.7) | | 131 (0.5) | 27 516 (99.5) |
 65–74 | 593 (7.7) | 7111 (92.3) | | 141 (0.4) | 34 565 (99.6) |
 75–84 | 431 (6.7) | 5976 (93.3) | | 132 (0.4) | 29 474 (99.6) |
 85+ | 87 (7.3) | 1100 (92.7) | | 38 (0.4) | 10 238 (99.6) |
District level | | | <0.001 | | | <0.001
 Lisboa | 892 (21.8) | 3195 (78.2) | | 108 (0.9) | 29 362 (99.6) |
 Guarda | 139 (13.0) | 928 (87.0) | | 36 (0.3) | 2297 (98.5) |
 C. Branco | 186 (8.9) | 1911 (91.1) | | 15 (0.2) | 4696 (99.7) |
 Faro | 31 (6.4) | 452 (93.6) | | 13 (1.5) | 4384 (99.7) |
 Coimbra | 37 (5.8) | 603 (94.2) | | 11 (0.3) | 6442 (99.8) |
 Portalegre | 44 (5.3) | 792 (94.7) | | 17 (0.5) | 1884 (99.1) |
 Bragança | 35 (5.1) | 655 (94.9) | | 12 (0.3) | 1841 (99.4) |
 V Real | 13 (4.8) | 256 (95.2) | | 3 (0.1) | 2249 (99.9) |
 Viseu | 19 (4.6) | 394 (95.4) | | 4 (0.4) | 3620 (99.9) |
 Évora | 11 (4.5) | 235 (95.5) | | 9 (0.3) | 2833 (99.7) |
 Beja | 40 (4.4) | 876 (95.6) | | 16 (0.4) | 5753 (99.7) |
 Leiria | 32 (4.4) | 702 (95.6) | | 10 (1.0) | 2244 (99.6) |
 Braga | 74 (4.1) | 1733 (95.9) | | 74 (0.6) | 7621 (99.0) |
 Porto | 157 (3.8) | 3973 (96.2) | | 100 (0.5) | 19 418 (99.5) |
 Aveiro | 64 (3.0) | 2100 (97.0) | | 22 (0.4) | 6174 (99.6) |
 Setúbal | 58 (2.6) | 2181 (97.4) | | 53 (0.3) | 10 577 (99.5) |
 V Castelo | 16 (2.3) | 672 (97.7) | | 7 (0.1) | 2774 (99.7) |
 Santarém | 52 (2.1) | 2484 (97.9) | | 32 (0.5) | 5918 (99.5) |
ACS subtypes | | | 0.02 | | | 0.05
 STEMI | 165 (5.7) | 2709 (94.3) | | 102 (0.4) | 23 866 (99.6) |
 NSTEMI | 232 (4.8) | 4588 (95.2) | | 123 (0.4) | 31 654 (99.6) |
 UA | 108 (8.9) | 1100 (91.1) | | 35 (0.4) | 9104 (99.6) |
Cardiac procedures (aggregated) | | | <0.001 | | | <0.001
 No procedure | 195 (4.8) | 3872 (95.2) | | 102 (0.3) | 30 672 (99.7) |
 Catheterisation | 673 (8.6) | 7110 (91.4) | | 166 (0.4) | 38 937 (99.6) |
 PCI | 866 (8.3) | 9576 (91.7) | | 202 (0.4) | 45 130 (99.6) |
 CABG | 166 (4.4) | 3584 (95.6) | | 72 (1.3) | 5348 (98.7) |
Comorbidity burden | | | | | |
 Number of comorbidities | | | 0.05 | | | 0.65
  0–3 | 1796 (7.4) | 22 535 (92.6) | | 497 (0.5) | 109 452 (99.5) |
  ≥3 | 104 (6.1) | 1607 (93.9) | | 45 (0.4) | 10 635 (99.6) |
 Charlson index | | | 0.9 | | | 0.41
  0–3 | 1716 (7.3) | 21 825 (92.7) | | 478 (0.5) | 104 462 (99.5) |
  ≥3 | 184 (7.4) | 2317 (92.6) | | 64 (0.4) | 15 625 (99.6) |

Results report absolute number and percentage for each variable category, unless stated otherwise.

ACS, acute coronary syndrome; ACS-EC, acute coronary syndrome episode of care; CABG, coronary artery bypass grafting; ICD-9, International Classification of Diseases, Ninth Revision; NSTEMI, non-ST-segment elevation myocardial infarction; PCI, percutaneous coronary intervention; STEMI, ST-segment elevation myocardial infarction; UA, unstable angina.


Discussion

Using the National Hospital Discharge Database, we built, tested and compared the validity of deterministic internal record linkage with different combinations of indirect identifiers. The aim was to identify 28-day ECs consisting of patient-level sequential hospitalisations occurring within 28 days of each other. Linkage rules that included demographic and residence code variables showed comparable linkage rates and high validity compared with the GS. The false match rate was substantially reduced by increasing the level of detail of the residence code, from district to municipality and then to parish. However, including parish coding (rule 6) came at the expense of more missed matches and lower sensitivity and agreement with the GS. To our knowledge, this is the first study to validate a matching algorithm without direct identifiers for matching and identifying ACS patient-level hospitalisations. It is also the first to incorporate all subtypes of ACS and a wider range of hospitalisations (eg, code 414.x) in order to capture and aggregate information from all sequential hospitalisations within the same ACS-EC. Finally, it assesses the impact of this aggregation on ACS hospitalisation counts, on the characterisation of ACS patients and on performance indicators. Most record linkage studies using indirect identifiers have been designed to link different data sets externally, namely clinical registries with claims data. In these studies, two records are classified as a true match on the basis of agreement or disagreement on a set of partial identifiers.19 20 We took a different perspective. We aimed to link same-patient hospitalisation episodes due to ACS internally, building patient-level data on consecutive hospitalisations and using event-based variables to define the time frame of an EC.
We chose deterministic linkage for its simplicity. It is also appropriate in scenarios in which missing and invalid values in matching variables are rare and the matching variables are sufficiently discriminative, as is often the case in large administrative data sets.21 By analysing different sets of identifiers against the GS, we showed that the combination of demographic and residence code variables (at district and municipality levels) had the highest validity. Westfall and McGloin7 found similar results in a subanalysis of 120 206 myocardial infarction hospital admissions. They used a matching algorithm of indirect identifiers (age or month–year of birth, sex, zip code, ICD9 code) to detect HTs as same-patient hospitalisations occurring within 7 days of the first hospitalisation, and found a sensitivity of 96.7% and a specificity of 98.7%. The choice of matching rule depends strongly on the aim of the analysis and on the type, quality and completeness of the data for the chosen matching variables.15 22 Moreover, the use of a matching rule for record linkage should ideally be preceded by a pilot study that assesses its validity against a GS (usually a unique patient identifier). In our study, the stricter residence code matching rule (six digits) resulted in a higher proportion of missed matches and lower agreement with the GS than the more relaxed rule (four digits). This is possibly because it is more susceptible to coding errors, changes of residence and administrative reforms that changed the number and codes of parishes over time.23 The matching algorithm with the highest face validity (rule 5, using sex, date of birth and four-digit residence code) showed a low missed-match rate of 0.4% and a higher false-match rate of 7.4%. We found high regional heterogeneity, with false matches clustering in three municipalities within the same district.
This possibly reflects regional variations in data quality, reporting or coding procedures,24 and it reinforces the need for a detailed analysis of characteristics associated with linkage error when validating matching algorithms for record linkage studies. Missed matches were more common in younger patients and in those with planned procedures (ICD9-CM 414; surgical revascularisation). False matches were more frequent in patients with unstable angina and in those undergoing catheterisation and/or percutaneous revascularisation. Nonetheless, these linkage errors had limited impact on the overall performance of the matching algorithm, with specificity above 98% in the detection of all subtypes of contiguous hospitalisation. Linkage methods that maximise specificity lead to the most robust study results and should therefore be the main focus when building matching rules for record linkage studies.25 Our study has some limitations. Although we used a pseudonymised unique identifier as the GS, it consists of a long string of digits and is therefore susceptible to errors, the impact of which was not assessed. We performed internal record linkage to identify patient-level contiguous hospitalisations, classified according to the time elapsed between sequential hospitalisations, using indirect identifiers with low missing/invalid rates. Other studies have compared a linkage rule using indirect identifiers with one using direct identifiers to link records from registries to Medicare claims data. As in our study, they found highly valid linkages compared with the GS rule(s) that included direct identifiers.10 15 We used a deterministic linkage method and required exact matches on >3 variables in our rules. The expected error rates are low, and the rate of false-positive linkages is anticipated to be small. However, false-negative linkages are a concern in all rules, including the GS.
The degree of bias introduced by an imperfect GS depends on the number of false-negative matches in the GS and on the prevalence of true links. Our results are likely generalisable to other linkages of hospitalisation-level records, but both the expected error rates of the linkage variables and the prevalence of the condition should be considered. We used standard demographic linking variables (sex, date of birth and residence code), which are much less prone to errors and missing data because most of them are uploaded to the database automatically. Nonetheless, our results may not apply to databases with high error rates in these linkage variables. Such errors produce a large number of false-negative links and would warrant the addition of probabilistic linkage methods.
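The dependence of this bias on GS false negatives and on prevalence can be made concrete with a small calculation. The sketch below rests on illustrative assumptions that are not taken from the study. It assumes that the GS never creates false links (perfect specificity) but misses a fraction of true links, and that the rule's errors are independent of the GS's errors given the true link status. Under these assumptions the rule's apparent sensitivity is unbiased. True links that the GS misses but the rule finds are counted as false matches, however, and this deflates apparent specificity. The function name and all input values are hypothetical.

```python
def apparent_validity(
    sensitivity: float,
    specificity: float,
    prevalence: float,
    gs_sensitivity: float,
) -> tuple[float, float]:
    """Sensitivity and specificity of a rule as measured against an imperfect GS.

    Illustrative assumptions: the GS has perfect specificity, detects only a
    fraction `gs_sensitivity` of true links, and its errors are independent of
    the rule's errors given the true link status.
    """
    p, g = prevalence, gs_sensitivity
    gs_positive = p * g                  # true links the GS finds
    gs_negative = (1 - p) + p * (1 - g)  # true non-links plus true links the GS misses
    # Among GS-positive pairs the rule behaves as it does on true links,
    # so this ratio reduces to `sensitivity`.
    apparent_se = p * g * sensitivity / gs_positive
    # GS-negative pairs: true non-links the rule rejects, plus true links
    # missed by both the GS and the rule.
    apparent_sp = ((1 - p) * specificity + p * (1 - g) * (1 - sensitivity)) / gs_negative
    return apparent_se, apparent_sp


for prevalence in (0.2, 0.4):
    for gs_sens in (1.0, 0.95):
        se, sp = apparent_validity(0.98, 0.99, prevalence, gs_sens)
        print(f"prevalence={prevalence}, GS sensitivity={gs_sens}: "
              f"apparent Se={se:.3f}, apparent Sp={sp:.3f}")
```

The illustrative rule has a true specificity of 0.990. When the GS misses 5% of true links, its apparent specificity falls to about 0.978 at 20% prevalence and to about 0.959 at 40% prevalence.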

Conclusion

Deterministic linkage using multiple indirect identifiers allows accurate and valid internal linkage of patient-level contiguous hospitalisations within a preset time frame defining an EC, comparable to linkage with direct identifiers in hospital administrative data. Most data on nationwide or large-scale trends in ACS incidence, management and mortality have been abstracted from unlinked administrative health data released to researchers without a unique patient identifier. Even in jurisdictions that have recently introduced pseudonymised databases, analysis of longer-term trends still relies heavily on unlinked records.26 Our method of identifying, classifying and aggregating contiguous hospitalisations within the same EC will therefore allow incidence rates and performance indicators to be calibrated to the number of ECs rather than hospitalisations, and will be of value in other countries. It might also be useful for other clinical conditions and settings with high rates of transfers and readmissions, such as trauma,27 stroke28 and intensive care.23
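For a single linked patient, the aggregation of contiguous hospitalisations into ECs and the classification of the interval between the first pair of hospitalisations can be sketched as below. The `Stay` type and the function names are hypothetical. Measuring the interval from the discharge date of one stay to the admission date of the next is an assumption, because the text does not state which dates are used. The 28-day window and the HT (≤1 day), ER (2–28 days) and recurrent (>28 days) thresholds follow the study's definitions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stay:
    """Hypothetical hospitalisation belonging to one linked patient."""
    admission: date
    discharge: date


def build_episodes(stays: list[Stay], window_days: int = 28) -> list[list[Stay]]:
    """Group one patient's stays into episodes of care (ECs).

    A stay joins the current EC when it starts no more than `window_days`
    after the discharge of the previous stay (assumed gap definition).
    """
    episodes: list[list[Stay]] = []
    for stay in sorted(stays, key=lambda s: s.admission):
        if episodes and (stay.admission - episodes[-1][-1].discharge).days <= window_days:
            episodes[-1].append(stay)
        else:
            episodes.append([stay])
    return episodes


def classify_gap(gap_days: int) -> str:
    """Label the interval between two consecutive hospitalisations."""
    if gap_days <= 1:
        return "hospital transfer"  # HT: <=1 day
    if gap_days <= 28:
        return "early readmission"  # ER: 2-28 days
    return "recurrent case"         # >28 days


stays = [
    Stay(date(2015, 1, 3), date(2015, 1, 5)),   # index ACS admission
    Stay(date(2015, 1, 5), date(2015, 1, 12)),  # admitted elsewhere on the day of discharge
    Stay(date(2015, 1, 30), date(2015, 2, 2)),  # readmitted 18 days after discharge
    Stay(date(2015, 6, 1), date(2015, 6, 4)),   # more than 28 days later: new EC
]

episodes = build_episodes(stays)
print(len(stays), "hospitalisations ->", len(episodes), "episodes of care")
first, second = episodes[0][0], episodes[0][1]
print(classify_gap((second.admission - first.discharge).days))  # hospital transfer
```

Here four hospitalisations collapse into two ECs. Counting ECs rather than hospitalisations is the calibration of incidence rates and performance indicators that the conclusion describes.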

1. Improved confidence intervals for the difference between binomial proportions based on paired data.

Authors: R G Newcombe
Journal: Stat Med; Date: 1998-11-30

2. Validity of deterministic record linkage using multiple indirect personal identifiers: linking a large registry to claims data.

Authors: Soko Setoguchi; Ying Zhu; Jessica J Jalbert; Lauren A Williams; Chih-Ying Chen
Journal: Circ Cardiovasc Qual Outcomes; Date: 2014-04-22

3. Interfacility transfers for US ischemic stroke and TIA, 2006-2014.

Authors: Benjamin P George; Sara J Doyle; George P Albert; Ania Busza; Robert G Holloway; Kevin N Sheth; Adam G Kelly
Journal: Neurology; Date: 2018-04-04

4. The art and science of record linkage: methods that work with few identifiers.

Authors: L L Roos; A Wajda; J P Nicol
Journal: Comput Biol Med; Date: 1986

5. Outcomes of patients admitted to tertiary intensive care units after interhospital transfer: comparison with patients admitted from emergency departments.

Authors: Arthas Flabouris; Graeme K Hart; Carol George
Journal: Crit Care Resusc; Date: 2008-06

6. Methodological issues on the use of administrative data in healthcare research: the case of heart failure hospitalizations in Lombardy region, 2000 to 2012.

Authors: Cristina Mazzali; Anna Maria Paganoni; Francesca Ieva; Cristina Masella; Mauro Maistrello; Ornella Agostoni; Simonetta Scalvini; Maria Frigerio
Journal: BMC Health Serv Res; Date: 2016-07-08

7. Constructing episodes of inpatient care: data infrastructure for population-based research.

Authors: Randy Fransoo; Marina Yogendran; Kendiss Olafson; Clare Ramsey; Kari-Lynne McGowan; Allan Garland
Journal: BMC Med Res Methodol; Date: 2012-09-03

8. Trends in hospital discharges, management and in-hospital mortality from acute myocardial infarction in Switzerland between 1998 and 2008.

Authors: Charlène Insam; Fred Paccaud; Pedro Marques-Vidal
Journal: BMC Public Health; Date: 2013-03-25

9. Linked versus unlinked hospital discharge data on hip fractures for estimating incidence and comorbidity profiles.

Authors: Trang Vu; Lesley Day; Caroline F Finch
Journal: BMC Med Res Methodol; Date: 2012-08-01

10. Exploring the effects of transfers and readmissions on trends in population counts of hospital admissions for coronary heart disease: a Western Australian data linkage study.

Authors: Derrick Lopez; Lee Nedkoff; Matthew Knuiman; Michael S T Hobbs; Thomas G Briffa; David B Preen; Joseph Hung; John Beilby; Sushma Mathur; Anna Reynolds; Frank M Sanfilippo
Journal: BMJ Open; Date: 2017-11-17
