Enny S Paixão, Katie Harron, Kleydson Andrade, Maria Glória Teixeira, Rosemeire L Fiaccone, Maria da Conceição N Costa, Laura C Rodrigues.
Abstract
BACKGROUND: Due to the increasing availability of individual-level information across different electronic datasets, record linkage has become an efficient and important research tool. High-quality linkage is essential for producing robust results. The objective of this study was to describe the process of preparing and linking national Brazilian datasets, and to compare the accuracy of different linkage methods for assessing the risk of stillbirth due to dengue in pregnancy.
Keywords: Data linkage; Dengue; Electronic health records; Linkage accuracy; Linkage quality; Routine data; Stillbirth
Year: 2017 PMID: 28716074 PMCID: PMC5513351 DOI: 10.1186/s12911-017-0506-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Number of records from Brazilian Information System of Notifiable Disease and Brazilian Information System of Mortality
Deterministic rules used to create the gold-standard database
| Linkage rule | Number of links (%) |
|---|---|
| Full name and ageᵃ | 46 (24.1%) |
| Ageᵃ and combination of first and last name | 13 (6.8%) |
| Ageᵃ and combination of first and second name | 9 (4.8%) |
| Ageᵃ and combination of second and last name | 19 (9.9%) |
| Full name | 65 (34.0%) |
| First and last name | 19 (9.9%) |
| First and second name | 8 (4.2%) |
| Second and last name | 12 (6.3%) |
| Ageᵃ and Jaro-Winkler string comparator >0.95 | 0 (0%) |

ᵃ Age in years as recorded in the data or as derived from date of birth
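
The deterministic rules listed above can be read as an ordered cascade in which each candidate pair is attributed to the first rule it satisfies. The following is a minimal sketch of that idea, not the authors' code. The record layout (`name`, `age`), the way a name is split into first, second, and last components, and the example records are all assumptions made for illustration. The Jaro-Winkler rule is left out to keep the sketch dependency-free.

```python
# Illustrative only: record layout, name-splitting convention, and example
# records are assumptions, not taken from the study.

def name_parts(full_name):
    """Split a name into full / first / second / last components."""
    tokens = full_name.upper().split()
    return {
        "full": " ".join(tokens),
        "first": tokens[0] if tokens else "",
        # Assumption: the "second name" is the second token of a name
        # that has at least three tokens.
        "second": tokens[1] if len(tokens) > 2 else "",
        "last": tokens[-1] if len(tokens) > 1 else "",
    }


# (label, name components that must agree, whether age must also agree),
# in the order the rules appear in the table.
RULES = [
    ("Full name and age", ("full",), True),
    ("Age and first + last name", ("first", "last"), True),
    ("Age and first + second name", ("first", "second"), True),
    ("Age and second + last name", ("second", "last"), True),
    ("Full name", ("full",), False),
    ("First and last name", ("first", "last"), False),
    ("First and second name", ("first", "second"), False),
    ("Second and last name", ("second", "last"), False),
]


def first_matching_rule(rec_a, rec_b):
    """Return the label of the first rule the pair satisfies, or None."""
    a, b = name_parts(rec_a["name"]), name_parts(rec_b["name"])
    same_age = rec_a["age"] is not None and rec_a["age"] == rec_b["age"]
    for label, keys, needs_age in RULES:
        if needs_age and not same_age:
            continue
        # Empty components never count as agreement.
        if all(a[k] and a[k] == b[k] for k in keys):
            return label
    return None


notification = {"name": "Maria Silva Santos", "age": 25}
death_record = {"name": "Maria Souza Santos", "age": 25}
print(first_matching_rule(notification, death_record))
```

With these two hypothetical records, the full names differ but age, first name, and last name agree, so the pair falls under the second rule.
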
Comparison of linkage strategies for the bespoke algorithm and ReclinkIII
| | Bespoke algorithm | ReclinkIII |
|---|---|---|
| Manipulation of names | • Multiple variables created for first name, second name, and last name | • Variables created for first and last name |
| Blocking | • Municipality | • Soundex for name + municipality |
| Calculation of | • | • |
| Match weight calculation | • Separate weights calculated for the five most common names | • Did not account for common names |
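
The two blocking strategies in the table differ in how many candidate pairs they let through. The sketch below contrasts blocking on municipality alone with blocking on a Soundex code plus municipality. It uses the standard American Soundex and applies it to the first name token; both choices are assumptions, since the table does not say which Soundex variant or which part of the name was used. Field names and records are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Letter-to-digit table of the standard American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    # Decompose accented letters so "Conceição" is read as "CONCEICAO".
    decomposed = unicodedata.normalize("NFKD", name).upper()
    letters = [c for c in decomposed if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (code + "000")[:4]


def municipality_key(rec):
    return rec["municipality"]


def soundex_municipality_key(rec):
    # Assumption: Soundex is applied to the first name token only.
    tokens = rec["name"].split()
    return (soundex(tokens[0]) if tokens else "", rec["municipality"])


def candidate_pairs(records_a, records_b, key):
    """Pair only records that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[key(rec)].append(rec)
    return [(a, b) for a in records_a for b in blocks.get(key(a), [])]


# Hypothetical records for illustration.
notifications = [{"name": "Roberta Lima", "municipality": "Salvador"}]
deaths = [
    {"name": "Ruberta Lima", "municipality": "Salvador"},
    {"name": "Ana Lima", "municipality": "Salvador"},
    {"name": "Roberta Lima", "municipality": "Recife"},
]
print(len(candidate_pairs(notifications, deaths, municipality_key)))
print(len(candidate_pairs(notifications, deaths, soundex_municipality_key)))
```

The narrower key reduces the number of comparisons, at the cost of missing true pairs whose names are recorded with different Soundex codes.
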
Performance of linkage algorithms and thresholds
| | Bespoke algorithm: conservative threshold = 21 | Bespoke algorithm: relaxed threshold = 20 | ReclinkIII: conservative threshold = 12 | ReclinkIII: relaxed threshold = 10 |
|---|---|---|---|---|
| N linked | 131 | 193 | 125 | 788 |
| N true links | 123 | 132 | 102 | 114 |
| N false-matches | 8 | 61 | 23 | 674 |
| N missed-matches | 68 | 59 | 89 | 77 |
| Sensitivity % (95% CI) | 64.4 (57.2–71.2) | 69.1 (62.0–75.6) | 53.4 (46.1–60.6) | 59.7 (52.4–66.7) |
| Positive predictive value % (95% CI) | 93.9 (86.6–96.3) | 68.4 (61.3–74.8) | 81.6 (73.7–87.9) | 14.5 (12.1–17.1) |
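
The sensitivity and positive predictive value rows follow directly from the counts in the table: sensitivity is true links divided by true links plus missed-matches, and positive predictive value is true links divided by all links made. The short sketch below recomputes the point estimates from the tabulated counts, keyed by threshold. Function and variable names are illustrative, and the confidence intervals are not reproduced because the interval method is not stated here.

```python
def linkage_accuracy(true_links, false_matches, missed_matches):
    """Return (sensitivity, positive predictive value) as proportions."""
    sensitivity = true_links / (true_links + missed_matches)
    ppv = true_links / (true_links + false_matches)
    return sensitivity, ppv


# Counts copied from the table, keyed by match-weight threshold:
# (N true links, N false-matches, N missed-matches)
COUNTS_BY_THRESHOLD = {
    21: (123, 8, 68),
    20: (132, 61, 59),
    12: (102, 23, 89),
    10: (114, 674, 77),
}

for threshold, counts in COUNTS_BY_THRESHOLD.items():
    sens, ppv = linkage_accuracy(*counts)
    print(f"threshold {threshold}: "
          f"sensitivity {100 * sens:.1f}%, PPV {100 * ppv:.1f}%")
```

In every column, true links plus missed-matches equals 191, the number of links in the gold-standard database, so a gain in sensitivity from a relaxed threshold can be weighed directly against the extra false-matches it admits.
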
Associations between linkage accuracy (using the bespoke algorithm) and characteristics of the cohort
| | True matches, n (%) | Missed-matches, n (%) | OR (95% CI) | False-matches, n (%) | OR (95% CI) |
|---|---|---|---|---|---|
| Age of the mother in years | | | | | |
| < 20 | 25 (20.3) | 19 (27.9) | 1 | - | |
| 20–35 | 67 (54.5) | 30 (44.1) | 1.7 (0.8–3.5) | 8 (100) | |
| > 35 | 22 (17.9) | 7 (10.3) | 2.3 (0.8–6.7) | - | |
| Missing | 9 (7.3) | 12 (17.5) | | - | |
| Maternal literacy | | | | | |
| Illiterate | 7 (5.7) | 5 (7.3) | 1 | - | 1 |
| 1–3 years | 8 (6.5) | 4 (5.9) | 1.4 (0.3–7.5) | 1 (12.5) | 0.9 (0.4–1.9)ᵃ |
| 4–7 years | 32 (26.0) | 14 (20.6) | 1.6 (0.4–6.0) | 2 (25.0) | |
| > 8 years | 34 (27.6) | 20 (29.4) | 1.2 (0.3–4.3) | 4 (50.0) | |
| > 11 years | 19 (15.4) | 3 (4.4) | 4.5 (0.8–24.1) | - | |
| Missing | 23 (18.7) | 22 (32.3) | | 1 (12.5) | |
| Previous fetal death or abortion | | | | | |
| No | 36 (29.3) | 17 (25.0) | 1 | 2 (25.0) | 1 |
| Yes | 59 (48.0) | 31 (45.6) | 0.9 (0.4–1.8) | 4 (50.0) | 1.2 (0.2–7)ᵃ |
| Missing | 28 (22.7) | 20 (29.4) | | 2 (25.0) | |
| Gestational age | | | | | |
| < 22 weeks | 9 (7.3) | 2 (2.9) | 1 | - | |
| 22–27 weeks | 29 (23.6) | 18 (26.5) | 0.3 (0.1–1.8) | 3 (37.5) | 1 |
| 28–31 weeks | 21 (17.8) | 11 (16.2) | 0.4 (0.1–2.3) | 3 (37.5) | 0.7 (0.4–1.3)ᵃ |
| 32–36 weeks | 26 (21.1) | 19 (27.9) | 0.3 (0.1–1.5) | 2 (25.0) | |
| 37–41 weeks | 28 (22.8) | 9 (13.2) | 0.7 (0.1–3.8) | - | |
| ≥ 42 weeks | 2 (1.6) | - | | - | |
| Missing | 8 (6.5) | 9 (13.2) | | - | |
| Birth or death weight | | | | | |
| ≥ 2500 | 23 (18.7) | 16 (23.5) | 1 | - | |
| 1500–2500 | 26 (21.2) | 16 (23.5) | 1.1 (0.4–2.7) | - | |
| < 1500 | 64 (52.0) | 31 (45.6) | 1.4 (0.6–3.0) | 7 (87.5) | |
| Missing | 10 (8.1) | 7 (7.3) | | 1 (12.5) | |

ᵃ Due to the small number of observations, we used only two categories, the first category without missing value as the reference one
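
As a reading aid for the odds ratios in this table, the sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval from a pair of rows. This is an assumption about the method: how the published odds ratios and intervals were estimated is not stated here, so the function is a cross-check rather than a reproduction of the authors' analysis. The function and argument names are illustrative.

```python
import math


def odds_ratio_woolf(group_true, group_missed, ref_true, ref_missed, z=1.96):
    """Crude odds ratio of being a true match rather than a missed-match,
    comparing one category with the reference category.

    Returns (odds ratio, lower limit, upper limit) using the Woolf
    log-based interval. All four counts must be greater than zero.
    """
    estimate = (group_true * ref_missed) / (group_missed * ref_true)
    se_log = math.sqrt(1 / group_true + 1 / group_missed
                       + 1 / ref_true + 1 / ref_missed)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Mothers aged 20-35 (67 true matches, 30 missed-matches) compared with
# the reference category < 20 (25 true matches, 19 missed-matches).
estimate, lower, upper = odds_ratio_woolf(67, 30, 25, 19)
print(f"OR {estimate:.1f} (95% CI {lower:.1f}-{upper:.1f})")
```

For the 20–35 age category this gives approximately 1.7 (0.8–3.5), in line with the tabulated value. Agreement on one row does not confirm that the same method was used throughout.
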