Andrei S Morgan, Neil Marlow, Kate Costeloe, Elizabeth S Draper.
Abstract
BACKGROUND: A 44 % increase in admissions to neonatal intensive care of babies born at ≤26 completed weeks of gestational age was observed in England between 1995 and 2006. Hospital Episode Statistics (HES) may provide supplementary information to investigate this. The methods and results of a probabilistic data linkage exercise are reported.
Keywords: England; Extreme prematurity; Hospital Episode Statistics; Record linkage
Year: 2016 PMID: 27206571 PMCID: PMC4875750 DOI: 10.1186/s12874-016-0152-0
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Probability estimates for linkage analyses
| Matching variable | Baseline best guesses | | | | Dattani et al. | | | |
|---|---|---|---|---|---|---|---|---|
| | m | u | w_m a | w_nm b | m | u | w_m a | w_nm b |
| Date of birth | 0.90 | 0.00274 | 5.794 | -2.3 | 0.7405 | 0.0015 | 6.202 | -1.347 |
| GA at birth | 0.80 | 0.02 | 3.689 | -1.589 | 0.4941 | 0.0494 | 2.3028 | -0.6308 |
| Sex | 0.999 | 0.49 | 0.7123 | -6.2344 | 0.7208 | 0.0062 | 4.756 | -1.270 |
| Discharge date | 0.20 | 0.002 | 4.6052 | -0.2211 | — | — | — | — |
| Date of death c | 0.20 | 0.00274 | 4.2904 | -0.2204 | 0.30 | 0.002 | 5.0106 | -0.3547 |
| Birth weight | 0.60 | 0.001 | 6.3969 | -0.9153 | 0.7405 | 0.0074 | 4.606 | -1.342 |
| Birth order | 0.87 | 0.95 | -0.08797 | 0.95551 | 0.8153 | 0.0033 | 5.510 | -1.686 |
| Delivery method c | 0.80 | 0.80 | 0 | 0 | 0.67 | 0.1 | 1.902 | -1.003 |
| Ethnic category | 0.20 | 0.10 | 0.6931 | -0.1178 | 0.7308 | 0.095 | 2.040 | -1.212 |
| Mother’s age at delivery | 0.95 | 0.05 | 2.944 | -2.944 | — | — | — | — |
| Mother’s date of birth | 0.90 | 0.0001 | 9.105 | -2.302 | — | — | — | — |
| Postcode | 0.90 | 0.001 | 6.802 | -2.302 | 0.9291 | 0.065 | 2.660 | -2.579 |
| Number of previous pregnancies | 0.60 | 0.90 | -0.4055 | 1.3863 | — | — | — | — |
| Number of babies | 0.95 | 0.95 | 0 | 0 | 0.8153 | 0.0033 | 5.510 | -1.686 |
Probability estimates for linkage analyses between Hospital Episode Statistics (HES) and EPICure data, based on best guesses and prior knowledge (adapted from the data linkage performed by Dattani et al. between HES and NHS Numbers 4 Babies data sets) [15]
a w_m = weight if pairs match
b w_nm = weight if pairs do not match
c Date of death and delivery method were both modified using an adjusted best guess for the second linkage analysis performed using estimates from Dattani et al.
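The tabulated weights appear to follow the usual Fellegi-Sunter form, with the agreement weight computed as ln(m/u) and the disagreement weight as ln((1-m)/(1-u)). This is inferred from the numbers in the table rather than stated in the text; the labels `m` and `u` and the function name `fs_weights` are assumptions used for illustration. A minimal sketch that reproduces the baseline rows for date of birth and gestational age:

```python
import math


def fs_weights(m, u):
    """Return (w_m, w_nm) for one matching variable.

    m: assumed probability that the variable agrees for a true match
    u: assumed probability that the variable agrees for a non-match
    Natural logarithms are used because they reproduce the table values.
    """
    w_m = math.log(m / u)               # weight if the pair agrees
    w_nm = math.log((1 - m) / (1 - u))  # weight if the pair disagrees
    return w_m, w_nm


# Baseline best guesses taken from the table above
dob_w_m, dob_w_nm = fs_weights(0.90, 0.00274)  # date of birth
ga_w_m, ga_w_nm = fs_weights(0.80, 0.02)       # GA at birth

print(round(dob_w_m, 3), round(dob_w_nm, 3))
print(round(ga_w_m, 3), round(ga_w_nm, 3))
```

Under the same assumption, the total weight for a candidate pair would be the sum of the per-variable weights, which is the quantity compared against a cutoff.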
Fig. 1 Known and calculated values for matching algorithms, used in assessment of linkage error. Data linkage is performed by pairing data from two data sets, followed by manual verification of linked pairs to identify true matches. Values for cells were identified in the following manner: (1) The total number of row pairs, maximum number of matches, total number of linked pairs and number of true matches within those linked pairs were identified. (2) The numbers of false links, false non-links, total non-links and number of non-matches were then derived. (3) Finally, the true number of non-matches among the non-linked pairs was calculated
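The three steps in the Fig. 1 caption amount to simple subtraction once the four known values are fixed. The sketch below is one reading of that procedure, not the authors' code. The function name and dictionary keys are hypothetical, and `total_pairs` is a made-up figure, since the number of candidate pairs is not reported here. The other three inputs are the 1995 baseline Fellegi-Sunter values at a cutoff of 15 (2093 linked pairs, 402 true matches), with the 668 EPICure records assumed to be the maximum number of matches.

```python
def linkage_cells(total_pairs, max_matches, linked_pairs, true_links):
    """Derive the remaining cells of the link/match table from the four known values."""
    # Step 2: quantities derived directly from the known values
    false_links = linked_pairs - true_links      # linked, but not true matches
    false_non_links = max_matches - true_links   # true matches that were not linked
    total_non_links = total_pairs - linked_pairs
    non_matches = total_pairs - max_matches
    # Step 3: non-matches among the non-linked pairs
    true_non_links = total_non_links - false_non_links
    return {
        "false_links": false_links,
        "false_non_links": false_non_links,
        "total_non_links": total_non_links,
        "non_matches": non_matches,
        "true_non_links": true_non_links,
    }


cells = linkage_cells(
    total_pairs=1_000_000,  # hypothetical, for illustration only
    max_matches=668,
    linked_pairs=2093,
    true_links=402,
)
print(cells)
```

As a consistency check, false links plus true non-links should equal the total number of non-matches.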
Missingness among the matching variables
| Variable | HES 1995 | EPICure (1995) | HES 2006 | EPICure-2 (2006) |
|---|---|---|---|---|
| | Missing (%) | Missing (%) | Missing (%) | Missing (%) |
| Date of birth | 8807 (1.53) | 0 (0.00) | 4265 (0.68) | 0 (0.00) |
| GA at birth | 164006 (28.50) | 0 (0.00) | 336178 (53.23) | 7 (0.25) |
| Sex | 2616 (0.45) | 0 (0.00) | 3202 (0.51) | 9 (0.33) |
| Discharge date | 16912 (2.94) | 373 (55.84) | — | — |
| Date of death | 571417 (99.29) | 268 (40.12) | — | — |
| Birth weight | 152641 (26.52) | 0 (0.00) | 288014 (45.61) | 26 (0.95) |
| Birth order | 250718 (43.56) | 0 (0.00) | 224632 (35.57) | 0 (0.00) |
| Delivery method | 168018 (29.19) | 1 (0.15) | — | — |
| Number of babies | 152378 (26.48) | 0 (0.00) | 209455 (33.17) | 0 (0.00) |
| Number of previous pregnancies | — | — | 618692 (97.97) | 101 (3.67) |
| Ethnic category | 462999 (80.45) | 0 (0.00) | — | — |
| Postcode | — | — | 290462 (46.00) | 1 (0.04) |
| Mother’s date of birth | — | — | 273426 (43.30) | 2750 (100.00) |
| Mother’s age at delivery | 214999 (37.36) | 4 (0.60) | 273430 (43.30) | 8 (0.29) |
Variables in each of the Hospital Episode Statistics and EPICure data sets that were used for matching in 1995 and 2006 and their levels of missingness. (HES (1995) n=575,509 (for the entire year); EPICure n=668 (March – December); EPICure 2 n=2,750; HES (2006) n=631,401)
Birth weight v gestational age in HES 1995 data
| Birth weight category (g) | Gestational age (weeks) | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 20 | 21 | 22 | 23 | 24 | 25 | 26-29 | 30-34 | 35-39 | 40+ | Missing | Total |
| <500 | 19 (0.003) | 23 (0.004) | 39 (0.007) | 36 (0.006) | 53 (0.009) | 40 (0.007) | 117 (0.020) | 91 (0.016) | 960 (0.167) | 1224 (0.213) | 32 (0.006) | 2634 (0.458) |
| 500-999 | 3 (0.001) | 7 (0.001) | 29 (0.005) | 103 (0.018) | 273 (0.047) | 292 (0.051) | 864 (0.150) | 181 (0.031) | 36 (0.006) | 19 (0.003) | 89 (0.015) | 1896 (0.329) |
| 1000-1499 | 0 (0.000) | 1 (0.000) | 0 (0.000) | 1 (0.000) | 6 (0.001) | 20 (0.003) | 1179 (0.205) | 1590 (0.276) | 326 (0.057) | 180 (0.031) | 152 (0.026) | 3455 (0.600) |
| 1500-1999 | 0 (0.000) | 2 (0.000) | 1 (0.000) | 0 (0.000) | 4 (0.001) | 2 (0.000) | 141 (0.025) | 3785 (0.658) | 2010 (0.349) | 94 (0.016) | 224 (0.039) | 6263 (1.088) |
| 2000-2499 | 0 (0.000) | 0 (0.000) | 0 (0.000) | 1 (0.000) | 0 (0.000) | 2 (0.000) | 26 (0.005) | 3367 (0.585) | 13684 (2.378) | 1831 (0.318) | 617 (0.107) | 19528 (3.393) |
| 2500-2999 | 0 (0.000) | 0 (0.000) | 1 (0.000) | 0 (0.000) | 0 (0.000) | 5 (0.001) | 24 (0.004) | 1053 (0.183) | 47425 (8.241) | 21521 (3.739) | 2069 (0.360) | 72098 (12.528) |
| 3000-3499 | 0 (0.000) | 5 (0.001) | 2 (0.000) | 3 (0.001) | 6 (0.001) | 2 (0.000) | 30 (0.005) | 334 (0.058) | 67091 (11.658) | 80740 (14.029) | 4400 (0.765) | 152613 (26.518) |
| 3500-3999 | 2 (0.000) | 1 (0.000) | 4 (0.001) | 1 (0.000) | 3 (0.001) | 5 (0.001) | 13 (0.002) | 117 (0.020) | 33709 (5.857) | 82270 (14.295) | 3480 (0.605) | 119605 (20.782) |
| 4000-4499 | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 5 (0.001) | 32 (0.006) | 7202 (1.251) | 29943 (5.203) | 1040 (0.181) | 38222 (6.641) |
| 4500-4999 | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 0 (0.000) | 2 (0.000) | 4 (0.001) | 916 (0.159) | 4763 (0.828) | 175 (0.030) | 5860 (1.018) |
| 5000+ | 0 (0.000) | 0 (0.000) | 0 (0.000) | 1 (0.000) | 1 (0.000) | 0 (0.000) | 8 (0.001) | 4 (0.001) | 132 (0.023) | 520 (0.090) | 28 (0.005) | 694 (0.121) |
| Missing | 4 (0.001) | 5 (0.001) | 6 (0.001) | 10 (0.002) | 23 (0.004) | 30 (0.005) | 110 (0.019) | 169 (0.029) | 315 (0.055) | 269 (0.047) | 151700 (26.359) | 152641 (26.523) |
| Total | 28 (0.005) | 44 (0.008) | 82 (0.014) | 156 (0.027) | 369 (0.064) | 398 (0.069) | 2519 (0.438) | 10727 (1.864) | 173806 (30.200) | 223374 (38.813) | 164006 (28.498) | 575509 (100.000) |
Numbers of subjects (percentages of overall data set) according to birth weight (g) by gestational age (weeks), as recorded in the 1995 Hospital Episode Statistics data set
Number of pairs matched using guesstimate probabilities (1995)
| Cut off weight | N pairs | N EPICure | % EPICure | N HES | % HES |
|---|---|---|---|---|---|
| 15 | 2093 | 537 | 80.39 | 1846 | 0.38 |
| 16 | 1939 | 528 | 79.04 | 1726 | 0.35 |
| 17 | 792 | 365 | 54.64 | 692 | 0.14 |
| 18 | 467 | 302 | 45.21 | 401 | 0.08 |
| 19 | 435 | 285 | 42.66 | 380 | 0.08 |
| 20 | 335 | 256 | 38.32 | 294 | 0.06 |
| 21 | 270 | 216 | 32.34 | 237 | 0.05 |
| 22 | 229 | 200 | 29.94 | 208 | 0.04 |
| 23 | 202 | 182 | 27.25 | 193 | 0.04 |
| 24 | 175 | 166 | 24.85 | 167 | 0.03 |
| 25 | 158 | 150 | 22.46 | 152 | 0.03 |
| 26 | 145 | 138 | 20.66 | 142 | 0.03 |
| 27 | 140 | 133 | 19.91 | 137 | 0.03 |
| 28 | 112 | 110 | 16.47 | 109 | 0.02 |
| 29 | 97 | 96 | 14.37 | 96 | 0.02 |
| 30 | 86 | 86 | 12.87 | 86 | 0.02 |
| 31 | 67 | 67 | 10.03 | 67 | 0.01 |
| 32 | 50 | 50 | 7.49 | 50 | 0.01 |
| 34 | 47 | 47 | 7.04 | 47 | 0.01 |
| 35 | 41 | 41 | 6.14 | 41 | 0.01 |
| 37 | 31 | 31 | 4.64 | 31 | 0.01 |
| 38 | 26 | 26 | 3.89 | 26 | 0.01 |
| 39 | 9 | 9 | 1.35 | 9 | 0.00 |
| 40 | 4 | 4 | 0.60 | 4 | 0.00 |
| 42 | 2 | 2 | 0.30 | 2 | 0.00 |
| 43 | 0 | 0 | 0.00 | 0 | 0.00 |
Table of the number of pairs in 1995 matched from each data set for differing cutoffs in the value of the weight calculated by the Fellegi-Sunter (guesstimate) method of data linkage
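The columns of a cutoff table like this one can be read as, for each cutoff, the number of candidate pairs retained and the number of distinct records from each data set appearing in those pairs. The sketch below illustrates that counting on made-up data. The record identifiers, weights, and the function name `summarise_cutoff` are all hypothetical, and treating the cutoff as inclusive (weight ≥ cutoff) is an assumption.

```python
def summarise_cutoff(pairs, cutoff):
    """Count retained pairs and distinct records at a given weight cutoff.

    pairs: iterable of (epicure_id, hes_id, weight) tuples
    Returns (n_pairs, n_epicure, n_hes).
    """
    kept = [(e, h) for e, h, w in pairs if w >= cutoff]  # inclusive cutoff (assumed)
    n_epicure = len({e for e, _ in kept})  # distinct EPICure records linked
    n_hes = len({h for _, h in kept})      # distinct HES records linked
    return len(kept), n_epicure, n_hes


# Hypothetical candidate pairs, for illustration only
toy_pairs = [
    ("E1", "H1", 31.2),
    ("E1", "H2", 16.4),  # the same EPICure record paired with a second HES record
    ("E2", "H3", 22.0),
    ("E3", "H3", 15.1),  # the same HES record paired with a second EPICure record
    ("E4", "H9", 9.8),
]

summary_15 = summarise_cutoff(toy_pairs, 15)
summary_20 = summarise_cutoff(toy_pairs, 20)
print(summary_15, summary_20)
```

In the table above, the number of pairs exceeds the number of distinct records at low cutoffs and the three counts converge at high cutoffs, which is the pattern this counting produces when each record has at most one high-weight partner.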
Fig. 2 Density distribution of weights from the stochastic linkage analyses using best guess probabilities. Axes are not to the same scale
Fig. 3 Numbers of individual matches according to weight from each of the Hospital Episode Statistics (HES) (blue line) and EPICure (red line) data sets in the stochastic linkage analysis using best guess probabilities. “Weight” is on the x-axis, number of matches on the y-axis; axes are not to the same scale
Number of pairs matched using the EpiLink algorithm (1995)
| Cut off weight | N pairs | N EPICure | % EPICure | N HES | % HES |
|---|---|---|---|---|---|
| 0.35 | 45349 | 662 | 99.10 | 38163 | 7.84 |
| 0.40 | 9329 | 612 | 91.62 | 8533 | 1.75 |
| 0.45 | 1670 | 421 | 63.02 | 1541 | 0.32 |
| 0.50 | 492 | 279 | 41.77 | 461 | 0.09 |
| 0.55 | 213 | 193 | 28.89 | 209 | 0.04 |
| 0.60 | 157 | 147 | 22.01 | 153 | 0.03 |
| 0.65 | 117 | 111 | 16.62 | 114 | 0.02 |
| 0.70 | 78 | 74 | 11.08 | 78 | 0.02 |
| 0.75 | 51 | 51 | 7.63 | 51 | 0.01 |
| 0.80 | 20 | 20 | 2.99 | 20 | 0.00 |
| 0.85 | 8 | 8 | 1.20 | 8 | 0.00 |
| 0.90 | 0 | 0 | 0.00 | 0 | 0.00 |
| 0.95 | 0 | 0 | 0.00 | 0 | 0.00 |
| 1.00 | 0 | 0 | 0.00 | 0 | 0.00 |
Table of the number of pairs matched in 1995 from each data set for differing cutoffs in the value of the weight calculated by the EpiLink (Contiero) method of data linkage
Linkage error measures (1995)
| Linkage algorithm | Cutoff | True matches | PPV | Sensitivity |
|---|---|---|---|---|
| EM | 10.00 | 238 | 0.005 | 0.356 |
| EpiLink (Contiero) | 0.35 | 387 | 0.009 | 0.579 |
| FS (baseline model) | 15.00 | 402 | 0.192 | 0.602 |
| FS (Dattani estimates) | 35.00 | 244 | 0.008 | 0.365 |
Positive predictive value (PPV) and sensitivity of results obtained using different methods for linkage between the HES and EPICure data sets in 1995. EM: expectation-maximisation, FS: Fellegi-Sunter
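Two rows of the 1995 table can be reproduced from figures given elsewhere on this page, taking PPV as true matches divided by linked pairs and sensitivity as true matches divided by the maximum number of matches. These definitions are standard, but their use here is inferred from the numbers rather than stated; the function names are illustrative, and the 668 EPICure records are assumed to be the maximum number of matches.

```python
def ppv(true_matches, linked_pairs):
    """Proportion of linked pairs that are true matches."""
    return true_matches / linked_pairs


def sensitivity(true_matches, max_matches):
    """Proportion of all possible matches that were found."""
    return true_matches / max_matches


MAX_MATCHES_1995 = 668  # EPICure records in 1995 (assumed maximum number of matches)

# (true matches, linked pairs at the stated cutoff) from the 1995 tables
fs_baseline = (402, 2093)  # FS baseline model, cutoff 15
epilink = (387, 45349)     # EpiLink, cutoff 0.35

fs_ppv = round(ppv(*fs_baseline), 3)
fs_sens = round(sensitivity(fs_baseline[0], MAX_MATCHES_1995), 3)
epi_ppv = round(ppv(*epilink), 3)
epi_sens = round(sensitivity(epilink[0], MAX_MATCHES_1995), 3)
print(fs_ppv, fs_sens, epi_ppv, epi_sens)
```

The EM and Dattani rows cannot be checked the same way because the number of linked pairs at their cutoffs is not tabulated here.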
Linkage error measures (2006)
| Linkage algorithm | Cutoff | True matches | PPV | Sensitivity |
|---|---|---|---|---|
| EM | 10 | 1408 | 0.025 | 0.512 |
| EpiLink (Contiero) | 0.35 | 1501 | 0.237 | 0.546 |
| Fellegi-Sunter (baseline model) | 10 | 1740 | 0.039 | 0.633 |
| Fellegi-Sunter (Dattani estimates) | 15 | 1665 | 0.031 | 0.606 |
Positive predictive value (PPV) and sensitivity of results obtained using different methods for linkage between the HES and EPICure data sets in 2006. EM: expectation-maximisation, FS: Fellegi-Sunter
Changes in the number of births in HES data over time
| HES data set a | 1995 b | | | | 2006 (<26 weeks) c | | | | Percentage change d | 2006 (<27 weeks) e | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Live births | Still births | Not known | Total | Live births | Still births | Not known | Total | | Live births | Still births | Not known | Total |
| Reported | 621 | 213 | 33 | 867 | 887 | 121 | 180 | 1188 | 37 % | 1856 | 201 | 278 | 2535 |
| “True” | 396 | 16 | 10 | 422 | 699 | 127 | 187 | 1013 | 140 % | 1158 | 213 | 291 | 1662 |
| “Confirmed” | 282 | 13 | 5 | 300 | 412 | 81 | 75 | 568 | 89 % | 684 | 134 | 114 | 932 |
| “Misclassified” | 339 | 200 | 28 | 567 | 475 | 40 | 105 | 620 | 9 % | 1172 | 67 | 364 | 1603 |
Changes in the number of births in Hospital Episode Statistics (HES) data between 1995 and 2006: reported, “true”, “confirmed” and “misclassified” data
a For each year, data sets were created based upon: a) gestational age as reported in the original HES data; b) only the “true” data identified by the data linkage exercise (i.e. contained in both HES and EPICure); c) HES data “confirmed” by the “true” data; and d) “misclassified” data, which are those reported by HES but that were not identified as “true” during data linkage
b In 1995, data were available from March 1st – December 31st for babies of <26 completed weeks gestational age
c Comparison data sets from 2006 were created to include babies born between 1st March and 31st December at less than 26 weeks gestational age
d The total percentage increase in all births is presented
e The complete data sets from 2006 include births of <27 completed weeks gestational age from the entire year
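The percentage change column can be checked from the two "Total" columns for 1995 and 2006 (<26 weeks). The sketch below assumes the change is calculated as (2006 total − 1995 total) / 1995 total and rounded to a whole percent, which reproduces the tabulated figures; the variable and function names are illustrative.

```python
# (1995 total, 2006 <26 weeks total) for each HES data set, from the table above
totals = {
    "Reported": (867, 1188),
    "True": (422, 1013),
    "Confirmed": (300, 568),
    "Misclassified": (567, 620),
}


def percentage_change(before, after):
    """Relative change from `before` to `after`, as a whole-number percentage."""
    return round(100 * (after - before) / before)


changes = {name: percentage_change(a, b) for name, (a, b) in totals.items()}
print(changes)
```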