Katie Harron, Angie Wade, Ruth Gilbert, Berit Muller-Pebody, Harvey Goldstein.
Abstract
BACKGROUND: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities.
Year: 2014 PMID: 24597489 PMCID: PMC4015706 DOI: 10.1186/1471-2288-14-36
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Figure 1. Creation of simulated data.
Description of original and simulated datasets
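| Dataset | Identifier error | Match rate | Identifier error and missing values |
| --- | --- | --- | --- |
| Original | Error varied by hospital | Matches: 1496/20924 (7%); non-matches: 19431/20924 (93%) | 0-5% error, <1% missing values |
| 1 | Random identifier error | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 2 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 3 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 5% error, 5% missing values |
| 4 | Random identifier error | Matches: 5000/10000 (50%); non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 5 | Non-random error (associated with hospital) | Matches: 5000/10000 (50%); non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 6 | Non-random error (associated with outcome) | Matches: 5000/10000 (50%); non-matches: 5000/10000 (50%) | 5% error, 5% missing values |
| 7 | Random identifier error | Matches: 7000/10000 (70%); non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 8 | Non-random error (associated with hospital) | Matches: 7000/10000 (70%); non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 9 | Non-random error (associated with outcome) | Matches: 7000/10000 (70%); non-matches: 3000/10000 (30%) | 5% error, 5% missing values |
| 10 | Random identifier error | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| 11 | Non-random error (associated with hospital) | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 10% error, 10% missing values |
| 12 | Non-random error (associated with outcome) | Matches: 1000/10000 (10%); non-matches: 9000/10000 (90%) | 10% error, 10% missing values |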
All data were linked using both highest-weight classification and PII.
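As a rough illustration of the highest-weight idea named above, the following sketch keeps, for each primary record, the candidate with the largest match weight and accepts it as a link only if that weight reaches a threshold. The record identifiers, weights, threshold value, and the function name `highest_weight_links` are all hypothetical; this is not the authors' implementation, and PII is not sketched here.

```python
from typing import Dict, List, Optional, Tuple

# (primary_id, candidate_id, match_weight); values are invented for illustration.
CandidatePair = Tuple[str, str, float]


def highest_weight_links(
    pairs: List[CandidatePair], threshold: float
) -> Dict[str, Optional[str]]:
    """Pick the highest-weight candidate per primary record.

    A primary record is linked to its best candidate only when the best
    weight is at least `threshold`; otherwise it is left unlinked (None).
    Ties are resolved in favour of the candidate seen first.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for primary_id, candidate_id, weight in pairs:
        current = best.get(primary_id)
        if current is None or weight > current[1]:
            best[primary_id] = (candidate_id, weight)
    return {
        primary_id: (candidate_id if weight >= threshold else None)
        for primary_id, (candidate_id, weight) in best.items()
    }


pairs = [("A", "x", 12.5), ("A", "y", 3.1), ("B", "z", 1.2)]
links = highest_weight_links(pairs, threshold=5.0)
print(links)
```

Raising or lowering the hypothetical `threshold` trades missed matches against false matches, which is the kind of linkage error the comparison in this paper is concerned with.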
Figure 2. Prior-informed imputation for linkage of PICU and infection records.
Figure 3. Linkage between PICANet and gold-standard microbiology data.
Figure 4. Comparison of crude PICU-acquired BSI rate obtained through highest-weight classification and prior-informed imputation: original data.
Figure 5. Comparison of HW classification and PII for estimating BSI rate. Data from simulated datasets 1-9; symbols = point estimates; lines = 95% confidence intervals. One extreme value for HW relaxed (49.08) was excluded.
Figure 6. Comparison of HW classification and PII for estimating the difference in adjusted rates between PICUs. Data from simulated datasets 1-9; symbols = point estimates; lines = 95% confidence intervals.
Comparison of classification methods for estimating BSI rate and difference in adjusted rates with 10% identifier error (simulated datasets 10-12)
| Dataset | Method | n | BSI rate (95% CI) | SE | Bias (%) | Difference in adjusted rates (95% CI) | SE | Bias (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | HW | 869 | 15.84 (14.79, 16.90) | 0.646 | -13.1 | 4.55 (2.33, 6.76) | 1.129 | -17.5 |
| 10 | PII MPD = 0.1 | 1038 | 18.94 (17.67, 20.21) | 0.646 | 3.8 | 5.32 (2.75, 7.89) | 1.313 | -3.5 |
| 10 | PII MPD = 0.9 | 860 | 15.69 (14.61, 16.77) | 0.551 | -14.0 | 4.45 (2.18, 6.72) | 1.160 | -19.3 |
| 11 | HW | 886 | 16.15 (15.09, 17.21) | 0.543 | -11.4 | 10.93 (8.61, 13.24) | 1.183 | 98.2 |
| 11 | PII MPD = 0.1 | 1010 | 18.41 (17.21, 19.62) | 0.614 | 1.0 | 11.69 (9.12, 14.26) | 1.311 | 111.9 |
| 11 | PII MPD = 0.9 | 858 | 15.65 (14.57, 16.72) | 0.548 | -14.2 | 11.454 (9.09, 13.82) | 1.208 | 107.7 |
| 12 | HW | 364 | 6.65 (5.98, 7.32) | 0.343 | -63.6 | 1.94 (0.53, 3.35) | 0.720 | -64.9 |
| 12 | PII MPD = 0.1 | 684 | 12.48 (10.87, 14.09) | 0.822 | -31.6 | 3.36 (0.51, 6.20) | 1.453 | -39.1 |
| 12 | PII MPD = 0.9 | 217 | 3.95 (3.39, 4.51) | 0.287 | -78.3 | 1.20 (0.00, 2.39) | 0.610 | -78.3 |
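The figures in the first two rows of this table can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated in the extracted text) that the column after each confidence interval is a standard error from a normal approximation, and that the following column is percentage bias relative to a reference value. The helper names `se_from_ci` and `implied_reference` are hypothetical.

```python
def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric normal-approximation CI."""
    return (upper - lower) / (2 * z)


def implied_reference(estimate: float, percent_bias: float) -> float:
    """Reference value implied by an estimate and its percentage bias,
    assuming percent_bias = 100 * (estimate - reference) / reference."""
    return estimate / (1 + percent_bias / 100)


# First row (HW): difference in adjusted rates 4.55 (2.33, 6.76), reported 1.129.
se_hw_diff = se_from_ci(2.33, 6.76)

# First two rows: BSI rates 15.84 and 18.94 with reported -13.1 and 3.8.
ref_from_hw = implied_reference(15.84, -13.1)
ref_from_pii = implied_reference(18.94, 3.8)

print(f"SE implied by CI: {se_hw_diff:.3f}")
print(f"Implied reference rates: {ref_from_hw:.2f}, {ref_from_pii:.2f}")
```

Under these assumptions the two rows should imply nearly the same reference rate, which makes this a quick consistency check on the column interpretation rather than a reproduction of the authors' calculations.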