Christopher T Rentsch, Katie Harron, Mark Urassa, Jim Todd, Georges Reniers, Basia Zaba.
Abstract
BACKGROUND: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can affect the bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania.
Keywords: Bias; Data accuracy; HIV; Linkage error; Record linkage; Sub-Saharan Africa
Year: 2018 PMID: 30526518 PMCID: PMC6288858 DOI: 10.1186/s12874-018-0632-5
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Characteristics of patients in the analytic sample
| Characteristic | Sero-survey participants | Clinic patients | p-value |
|---|---|---|---|
| Clinic | |||
| ANC | – | 16 (11.3) | – |
| HTC | – | 126 (88.7) | |
| Sex | |||
| Female | 173 (65.8) | 98 (69.0) | 0.5092 |
| Male | 90 (34.2) | 44 (31.0) | |
| Age, years | |||
| 15–29 | 62 (23.6) | 51 (35.9) | 0.0222 |
| 30–39 | 96 (36.5) | 53 (37.3) | |
| 40–49 | 59 (22.4) | 22 (15.5) | |
| 50+ | 46 (17.5) | 16 (11.3) | |
| Village | |||
| Igekemaja | 27 (10.3) | 14 (9.9) | 0.0167 |
| Ihayabuyaga | 30 (11.4) | 6 (4.2) | |
| Isangijo | 27 (10.3) | 14 (9.9) | |
| Kanyama | 38 (14.5) | 23 (16.2) | |
| Kisesa | 73 (27.8) | 51 (35.9) | |
| Kitumba | 32 (12.2) | 26 (18.3) | |
| Welamasonga | 36 (13.7) | 8 (5.6) | |
| Rurality of sub-village | |||
| Rural | 140 (53.2) | 55 (38.7) | 0.0204 |
| Peri-urban | 54 (20.5) | 39 (27.5) | |
| Urban | 69 (26.2) | 48 (33.8) | |
| Sub-village had paved road | |||
| Yes | 109 (41.4) | 70 (49.3) | 0.1290 |
| No | 154 (58.6) | 72 (50.7) | |
| Distance from household to CTC, km | |||
| < 1 | 53 (20.2) | 37 (26.1) | 0.0162 |
| 1–1.9 | 58 (22.1) | 45 (31.7) | |
| 2–4.9 | 60 (22.8) | 29 (20.4) | |
| 5–11 | 92 (35.0) | 31 (21.8) | |
| Registered at CTC | 42 (16.0) | 75 (52.8) | < 0.0001 |
Abbreviations: CTC - HIV care and treatment centre; ANC - antenatal clinic; HTC - HIV testing and counselling clinic
Note: all statistics are given in n (%); differences tested using chi-square
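The p-values in the table above come from chi-square tests. As a rough check, the sketch below recomputes the test for the Sex row using only the Python standard library. It assumes a Pearson chi-square without continuity correction (the source does not say which variant was used), and the function name `chi_square_2x2` is illustrative, not taken from the paper.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Sex row: female (173 sero-survey, 98 clinic) and male (90 sero-survey, 44 clinic).
chi2, p_value = chi_square_2x2(173, 98, 90, 44)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Under that assumption the result lands close to the tabulated p-value of 0.5092 for Sex. Rows with more than two categories (age, village, distance) need the general chi-square distribution, which this one-degree-of-freedom shortcut does not cover.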
Completeness of matching identifiers in clinic data and demographic surveillance data (% of records with complete information)
| Matching identifier | Sero-surveys | Clinic data | HDSS data |
|---|---|---|---|
| First name | 100.0% | 100.0% | 100.0% |
| Second name | 100.0% | 100.0% | 100.0% |
| Third name | 13.3% | 88.7% | – |
| Year of birth | 100.0% | 100.0% | 99.4% |
| Sex | 100.0% | 100.0% | 100.0% |
| Village | 100.0% | 99.3% | 100.0% |
| Sub-village | 100.0% | 99.3% | 100.0% |
| TCL first name | 48.3% | 91.5% | 99.4% |
| TCL second name | 48.3% | 74.6% | 99.4% |
| Household member first name | 71.5% | 11.3% | 99.9% |
| Household member second name | 71.5% | 11.3% | 99.9% |
Abbreviations: HDSS - health and demographic surveillance system; TCL - ten-cell leader
Fig. 1 Sensitivity (Se), positive predictive value (PPV), and false match rate (False), by match score threshold. Notes: Se = linked records over N true matches; PPV = true matches over N linked records; False = false matches over N linked records, or simply the complement of PPV (1 − PPV)
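The three quantities in the Fig. 1 caption can be computed from simple counts. The sketch below is a minimal illustration, not the authors' code: the function name and the example counts are hypothetical, and it assumes that the numerator of sensitivity is the number of correctly linked records.

```python
def linkage_metrics(true_links, false_links, n_true_matches):
    """Return sensitivity, PPV, and false match rate for one match score threshold.

    true_links:     linked record pairs that are true matches
    false_links:    linked record pairs that are not true matches
    n_true_matches: all true matches, whether or not they were linked
    """
    linked = true_links + false_links
    if linked == 0 or n_true_matches == 0:
        raise ValueError("need at least one linked record and one true match")
    return {
        "sensitivity": true_links / n_true_matches,
        "ppv": true_links / linked,
        "false_match_rate": false_links / linked,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = linkage_metrics(true_links=80, false_links=20, n_true_matches=100)
print(metrics)
```

Because every linked record is either a true or a false match, the false match rate and PPV always sum to one, which is why the caption describes one in terms of the other.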
Comparison of regression model diagnostics by match score threshold
| Sample | n | β | SE | χ2 | p | HR (95% CI) | PPV |
|---|---|---|---|---|---|---|---|
| Gold standard | 405 | 1.61 | 0.2033 | 62.4 | <.0001 | 4.98 (3.34, 7.42) | – |
| Probabilistic linkage, by match score threshold | |||||||
| minimum | 405 | 1.02 | 0.2383 | 18.2 | <.0001 | 2.76 (1.73, 4.41) | 0.612 |
| low | 359 | 1.20 | 0.2579 | 21.7 | <.0001 | 3.32 (2.00, 5.51) | 0.649 |
| medium | 235 | 0.86 | 0.4621 | 3.5 | 0.0615 | 2.37 (0.96, 5.87) | 0.745 |
| high | 106 | 0.53 | 1.1707 | 0.2 | 0.6501 | 1.70 (0.17, 16.87) | 0.896 |
Abbreviations: n - sample size; β - primary exposure coefficient; SE - standard error; χ2 - chi-square; p - p-value; HR - hazard ratio; CI - confidence interval; PPV - automated linkage algorithm’s positive predictive value
Note: All models adjusted for age, sex, sub-village, and distance from household to CTC
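The columns of the regression table are linked by standard relationships: the hazard ratio is the exponentiated coefficient, and a Wald-type interval and test statistic follow from β and SE. The sketch below applies these to the gold standard row. It assumes Wald-type intervals with z = 1.96, which the source does not state explicitly, and the function name is illustrative.

```python
import math


def hazard_ratio_summary(beta, se, z=1.96):
    """Return (HR, lower CI, upper CI, Wald chi-square) from a coefficient and its SE."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    wald_chi2 = (beta / se) ** 2
    return hr, lower, upper, wald_chi2


# Gold standard row: beta = 1.61, SE = 0.2033 (beta is rounded in the table).
hr, lower, upper, wald_chi2 = hazard_ratio_summary(1.61, 0.2033)
print(f"HR = {hr:.2f} ({lower:.2f}, {upper:.2f}), Wald chi2 = {wald_chi2:.1f}")
```

Since β is reported to only two decimals, the recomputed values differ slightly from the tabulated HR of 4.98 (3.34, 7.42) and χ2 of 62.4. The same relationships show why the high threshold row, with SE larger than β, yields such a wide interval.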
Fig. 2 Associations between primary exposure and outcome variables by match score threshold