Antonio G Pacheco, Valeria Saraceni, Suely H Tuboi, Lawrence H Moulton, Richard E Chaisson, Solange C Cavalcante, Betina Durovni, José C Faulhaber, Jonathan E Golub, Bonnie King, Mauro Schechter, Lee H Harrison.
Abstract
Loss to follow-up is a major source of bias in cohorts of patients with human immunodeficiency virus (HIV) and could lead to underestimation of mortality. The authors developed a hierarchical deterministic linkage algorithm to be used primarily with cohorts of HIV-infected persons to recover vital status information for patients lost to follow-up. Data from patients known to be deceased in 2 cohorts in Rio de Janeiro, Brazil, and data from the Rio de Janeiro State mortality database for 1999–2006 were used to validate the algorithm. A fully automated procedure yielded a sensitivity of 92.9% and specificity of 100% when no information was missing. When the automated procedure was combined with clerical review, in a scenario of 5% death prevalence and 20% missing mothers' names, sensitivity reached 96.5% and specificity 100%. In a practical application, the algorithm significantly increased death rates and decreased the rate of loss to follow-up in the cohorts. The finding that 23.9% of matched records did not give HIV or acquired immunodeficiency syndrome as the cause of death reinforces the need to search all-cause mortality databases and alerts for possible underestimation of death rates. These results indicate that the algorithm is accurate enough to recover vital status information on patients lost to follow-up in cohort studies.
Year: 2008 PMID: 18849301 PMCID: PMC2638543 DOI: 10.1093/aje/kwn249
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
Table 1. Classification of Matched Records Used to Validate a Record-Linkage Algorithm, Brazil, 1999–2006
| Automatic Inclusion Codes | Patient's Name | Date of Birth | Mother's Name |
| 0 | Exact | Exact | Exact |
| 1 | Exact | Exact | Same PC |
| 2 | Exact | 1 error or swap | Exact |
| 3 | Exact | 1 error or swap | Same PC |
| 4 | Score > 0.75 | Exact | Exact |
| 5 | Score > 0.75 | 1 error or swap | Exact |
| 6 | Score > 0.75 | Exact | Same PC + score > 0.75 |
| 7 | Score > 0.9 | 1 error or swap | Score > 0.8 |
| 8 | Exact | Exact | Missing |
| 9 | Exact | 1 error or swap | Missing |
| 10 | Score > 0.9 | Exact | Missing |
| Exclusion | Not missing | >1 error | Different PC |
| | Score ≤ 0.9 | >1 error | Score ≤ 0.8 |
| | Not missing | >1 error | Score ≤ 0.7 |
| | Score < 0.8 | Not missing | Not missing |
| | Not missing | Day, month, and year are different | Missing |
| | Score < 0.8 | Missing | |
Abbreviation: PC, phonetic code.
After passing the first blocking phase: same PC of patient's first and last name OR same PC of mother's first and last name OR same PC of patient's and mother's first names.
PC is for mother's name only in this case.
Records that are not included or excluded are left over for clerical review. Score values were chosen empirically (see text).
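The blocking condition and the inclusion codes in the table above can be read as an ordered rule list. The Python sketch below is illustrative only and is not the authors' implementation. It assumes that phonetic codes, similarity scores, and the date-of-birth comparison have already been computed elsewhere (this record does not specify those functions), and the names `passes_blocking`, `Comparison`, and `inclusion_code` are hypothetical. It further assumes that the lowest-numbered matching code wins, and it omits the exclusion rules: a pair that receives no code would go on to the exclusion check or to clerical review.

```python
from dataclasses import dataclass
from typing import Optional


def passes_blocking(a: dict, b: dict) -> bool:
    """Blocking phase. `a` and `b` hold precomputed phonetic codes (PC) under
    the hypothetical keys 'first', 'last', 'mother_first', 'mother_last';
    a value of None means the field is missing."""
    def same(*keys: str) -> bool:
        return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)

    return (same("first", "last")                    # patient's first and last name
            or same("mother_first", "mother_last")   # mother's first and last name
            or same("first", "mother_first"))        # patient's and mother's first names


@dataclass
class Comparison:
    """Precomputed field comparisons for one candidate pair (assumed inputs)."""
    name_exact: bool
    name_score: float            # similarity score; scoring function not specified here
    dob: str                     # "exact", "one_error_or_swap", or "other"
    mother_missing: bool = False
    mother_exact: bool = False
    mother_same_pc: bool = False
    mother_score: float = 0.0


def inclusion_code(c: Comparison) -> Optional[int]:
    """Return the first automatic inclusion code (0-10) that applies, else None."""
    dob_exact = c.dob == "exact"
    dob_near = c.dob == "one_error_or_swap"
    if c.mother_missing:
        rules = [
            (8, c.name_exact and dob_exact),
            (9, c.name_exact and dob_near),
            (10, c.name_score > 0.9 and dob_exact),
        ]
    else:
        rules = [
            (0, c.name_exact and dob_exact and c.mother_exact),
            (1, c.name_exact and dob_exact and c.mother_same_pc),
            (2, c.name_exact and dob_near and c.mother_exact),
            (3, c.name_exact and dob_near and c.mother_same_pc),
            (4, c.name_score > 0.75 and dob_exact and c.mother_exact),
            (5, c.name_score > 0.75 and dob_near and c.mother_exact),
            (6, c.name_score > 0.75 and dob_exact
                and c.mother_same_pc and c.mother_score > 0.75),
            (7, c.name_score > 0.9 and dob_near and c.mother_score > 0.8),
        ]
    for code, applies in rules:
        if applies:
            return code
    return None  # not auto-included: check exclusion rules or send to clerical review


# Usage: exact name, one date-of-birth error, mother's name missing.
pair = Comparison(name_exact=True, name_score=1.0,
                  dob="one_error_or_swap", mother_missing=True)
print(inclusion_code(pair))
```

Taking precomputed comparisons as input keeps the rule table separate from the choice of phonetic coding and string-similarity method, which can then be swapped without touching the rules.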
Table 2. Accuracy of Exact Matches and Automatic Codes When Records in the Test Database Have Full or Partial Information (50% and 5% Prevalence Scenarios), Brazil, 1999–2006
| Accuracy | Full Information | | | | No Mother's Name | | | | Name Only | |
| | Exact Match | | Automatic Codes | | Exact Match | | Automatic Codes | | | |
| | % | 95% CI | % | 95% CI | % | 95% CI | % | 95% CI | % | 95% CI |
| Sensitivity | 50.8 | 45.6, 56.0 | 92.9 | 88.3, 94.2 | 71.2 | 66.3, 75.8 | 91.8 | 88.6, 94.4 | 77.4 | 72.8, 81.6 |
| Specificity | 100.0 | 99.0, 100.0 | 100.0 | 99.0, 100.0 | 100.0 | 99.0, 100.0 | 100.0 | 99.0, 100.0 | 82.1 | 77.8, 85.8 |
| 50% prevalence | ||||||||||
| PPV | 100.0 | 98.0, 100.0 | 100.0 | 98.9, 100.0 | 100.0 | 98.6, 100.0 | 100.0 | 98.9, 100.0 | 81.2 | 76.7, 85.1 |
| NPV | 67.0 | 62.9, 70.9 | 93.4 | 90.5, 95.6 | 77.6 | 73.6, 81.3 | 92.5 | 89.4, 94.9 | 78.4 | 74.0, 82.4 |
| 5% prevalence | ||||||||||
| PPV | 100.0 | 98.0, 100.0 | 99.4 | 97.9, 99.9 | 98.9 | 96.7, 99.8 | 92.6 | 89.4, 95.0 | 19.5 | 17.5, 21.6 |
| NPV | 97.5 | 97.1, 97.8 | 99.6 | 99.5, 99.8 | 98.5 | 98.2, 98.8 | 99.6 | 99.4, 99.7 | 98.6 | 98.3, 98.9 |
Abbreviations: CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value.
For the Name Only column, only exact matches were considered, since only the name was available.
Exact match means a perfect match between the available variables in both databases.
Automatic codes means the automatic inclusion codes listed in Table 1.
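The predictive values in the accuracy table depend on the assumed death prevalence as well as on sensitivity and specificity. As a worked illustration, the sketch below applies the standard Bayes relationship between these quantities; the function name `predictive_values` is hypothetical, and this is not presented as the authors' own calculation. Because it starts from the rounded percentages in the table, it may not reproduce every published cell exactly.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) as proportions for a given prevalence.

    All arguments are proportions in [0, 1]. Assumes at least one
    predicted positive and one predicted negative (no zero denominators).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Name Only column: sensitivity 77.4%, specificity 82.1%, 50% prevalence.
ppv, npv = predictive_values(0.774, 0.821, 0.50)
print(round(100 * ppv, 1), round(100 * npv, 1))  # prints: 81.2 78.4
```

With specificity held fixed, lowering the prevalence lowers the PPV, which is why a name-only match becomes far less reliable in the 5% scenario than in the 50% scenario.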