
Validation of a hierarchical deterministic record-linkage algorithm using data from 2 different cohorts of human immunodeficiency virus-infected persons and mortality databases in Brazil.

Antonio G Pacheco, Valeria Saraceni, Suely H Tuboi, Lawrence H Moulton, Richard E Chaisson, Solange C Cavalcante, Betina Durovni, José C Faulhaber, Jonathan E Golub, Bonnie King, Mauro Schechter, Lee H Harrison.

Abstract

Loss to follow-up is a major source of bias in cohorts of patients with human immunodeficiency virus (HIV) and could lead to underestimation of mortality. The authors developed a hierarchical deterministic linkage algorithm to be used primarily with cohorts of HIV-infected persons to recover vital status information for patients lost to follow-up. Data from patients known to be deceased in 2 cohorts in Rio de Janeiro, Brazil, and data from the Rio de Janeiro State mortality database for 1999-2006 were used to validate the algorithm. A fully automated procedure yielded a sensitivity of 92.9% and specificity of 100% when no information was missing. When the automated procedure was combined with clerical review, in a scenario of 5% death prevalence and 20% missing mothers' names, sensitivity reached 96.5% and specificity 100%. In a practical application, the algorithm significantly increased death rates and decreased the rate of loss to follow-up in the cohorts. The finding that 23.9% of matched records did not give HIV or acquired immunodeficiency syndrome as the cause of death reinforces the need to search all-cause mortality databases and alerts for possible underestimation of death rates. These results indicate that the algorithm is accurate enough to recover vital status information on patients lost to follow-up in cohort studies.


Year: 2008 PMID: 18849301 PMCID: PMC2638543 DOI: 10.1093/aje/kwn249

Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897

Database linkage is the process of comparing records from different databases that contain enough information to determine whether those records refer to the same person or, more generally, to the same entity (1). There are 3 main types of record linkage: manual (or clerical), deterministic, and probabilistic. These methods can be combined, depending on the strategy used. The first type consists of manually comparing records in 2 databases and deciding whether they are true matches or not. This was the standard method used before the availability of computers. It is often highly labor-intensive and is sometimes not feasible, particularly when the amount of data is too large. Deterministic methods are classically based on exact-match comparisons of either 1 unique identifier common to both databases (e.g., Social Security number) or a combination of variables that allow unique discrimination (e.g., name, surname, date of birth, gender) (2–4). Probabilistic methods are also based on several variables, but comparisons are made on the basis of the prior probability of 2 records’ belonging to the same entity and then calculating a maximum likelihood estimator to reach a score for similarity between records (1, 5, 6). The method (or combination of methods) to be chosen depends on the type of analysis to be carried out with the linked data and the types of databases available (7). Record linkage is widely used in population-based studies to make inferences about specific outcomes and in cohort studies to make inferences at the level of the individual (3, 7–10). Morbidity and mortality databases are often employed for this purpose, given their wide availability and the fact that their records generally contain sufficient information for linkage with other databases. Investigators in cohort studies usually use linkage techniques to gather additional information about patients being followed over time. 
Even when studying cohorts with active follow-up, investigators tend to complement their information with external databases in order to minimize underreporting of specific conditions (e.g., vital status), including in their protocol a passive follow-up component. In the case of cohorts of human immunodeficiency virus (HIV)-infected patients, morbidity databases (e.g., tuberculosis, cancer) are important sources of additional information (11–13). In Brazil, official surveillance and mortality databases contain variables, such as full name, date of birth, and mother's name (either maiden or married surname), that are suitable for linkage procedures because of their potentially high discriminatory power, particularly when used in combination. In the present study, we describe the validation of a new deterministic linkage algorithm that we developed to be used for passive data collection with cohorts of HIV-infected patients. The algorithm has a hierarchical structure and allows for specific errors in names and dates of birth. It can be used in combination with clerical review of records that are not classified as true matches or are not excluded as nonmatches. Our main objectives when developing the algorithm were to maximize accuracy and to minimize the need for clerical review.

MATERIALS AND METHODS

Data sources

Three data sources were used in this study. The Rio de Janeiro cohort database was originally designed to validate the World Health Organization HIV staging system in a developing country (14). It currently comprises information from 2,666 HIV-infected patients being followed at the Clementino Fraga Filho University Hospital in Rio de Janeiro, Brazil. All patients are aged 16 years or older and are included only if they have made at least 1 follow-up visit. The rate of loss to follow-up between 2000 and 2005 in the Rio de Janeiro cohort was 2.9 per 100 person-years. The TB-HIV in Rio (THRio) Study is an ongoing cohort study designed to assess the impact of implementing isoniazid prophylactic therapy among HIV-positive patients with indications for prophylaxis in Rio de Janeiro. It has enrolled more than 15,000 patients from 29 clinics, where care is provided both for HIV and for tuberculosis (15, 16). There has not yet been enough follow-up time to calculate accurately the rates of loss to follow-up in this study. The third database is the Rio de Janeiro State mortality database for 2000–2006, with a total of 835,066 records. The Rio de Janeiro State mortality database is part of the “Sistema de Informação sobre Mortalidade” database, which is the official mortality system in Brazil. The death certificate is a standardized form that is filled out by a physician. It includes demographic information and primary, secondary, and contributing causes of death coded according to the International Classification of Diseases, Tenth Revision (ICD-10), among other variables. An electronic version of these forms was introduced countrywide in 1979. Information that can identify patients, such as name, mother's name, date of birth, and address, is also recorded and was made available through a special request to the state health department. 
According to the Brazilian Ministry of Health, the mortality system in Rio de Janeiro State has 100% coverage of deaths (17), even though the percentage of undefined causes of death remained somewhat high (9.3%) in 2005. Data linkage between both cohort databases and the mortality database is part of routine procedures for assessment of vital status among patients who are lost to follow-up and was approved by the institutional review boards of all involved institutions. Data from patients known to be deceased through an independent source (generally medical charts) with identifying variables were used for validation purposes. To validate the algorithm, we assembled test data sets and then linked them with the mortality database. We then studied the outcome “finding a record in the mortality database.” To determine the sensitivity of the linkage to the mortality database for identifying deceased patients, all patients known to be deceased and who had complete information on full name, date of birth, and mother's name (either maiden or married surname) in the Rio de Janeiro cohort between 2000 and 2005 (53 patients) and in the THRio cohort between 2003 and 2006 (315 patients) were included in the analysis. To assess specificity, we incorporated into the test database records that were not supposed to be in the state mortality database between 2000 and 2006 and that would be subject to similar typing mistakes as those for the patients known to be deceased. We chose to use a random sample of control records of patients who died in 1999, a year that was not included in the linkage. The overall completeness of information for the THRio cohort was 98.3% for full information and 99.7% for name and date of birth. In the Rio de Janeiro cohort, 60% had full information and 100% had at least name and date of birth.

Data preprocessing

The first step in data linkage was preprocessing of data to guarantee that all variables conformed to the same format. For names, all letters were capitalized, and accents and characters other than letters were removed. Suffixes referring to a person of the same name, such as the individual's father (Junior, Filho, etc.), were also removed. A specific software function (see supplementary data posted on the Journal’s website (http://aje.oxfordjournals.org/)) was developed for this purpose and has the ability to preprocess a string field as a whole, either with Windows-based Latin alphabet encoding (cp1252) or with the DOS-based alphabet, which is still used in older “.dbf” files (cp850).
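The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' supplementary function: the names `normalize_name` and `SUFFIXES` are hypothetical, and the suffix list is an assumed example rather than the list actually used in the study.

```python
import unicodedata

# Assumed, illustrative list of suffixes that repeat a relative's name.
SUFFIXES = {"JUNIOR", "JR", "FILHO", "NETO", "SOBRINHO"}


def normalize_name(raw: str) -> str:
    """Capitalize, strip accents and non-letters, and drop trailing suffixes."""
    # Decompose accented characters ("é" -> "e" + combining accent),
    # then discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only the letters A-Z; whitespace is kept so tokens stay separable.
    kept = "".join(ch for ch in no_accents.upper() if "A" <= ch <= "Z" or ch.isspace())
    tokens = kept.split()
    # Remove suffixes such as "Junior" or "Filho" from the end of the name.
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Raw bytes from an older ".dbf" file would first be decoded with the right
# code page, e.g. raw_bytes.decode("cp850") or raw_bytes.decode("cp1252").
print(normalize_name("José da Conceição Júnior"))
```

Decoding with the correct code page before normalization matters because the same byte represents different characters in cp850 and cp1252.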

Linkage algorithm

To avoid exponential growth of processing time, we first blocked records (i.e., grouped them) by means of a phonetic code adapted from the original Soundex algorithm (18) to account for Brazilian Portuguese names (see supplementary data for details (http://aje.oxfordjournals.org/)). Blocks were composed by combining either the phonetic codes from the first and last names, the phonetic codes from the mother's first and last names, or the phonetic codes from the first name and the mother's first name. We used the third category to account for last names that are difficult to spell and that are recorded equally for the individual and his/her mother in both databases but are misspelled in one of them. Records within each block were then compared, using exact comparisons and also allowing for some errors, in a hierarchical fashion, as described below. Errors in name fields were evaluated by means of the phonetic codes and also by a string similarity score, based on a recursive longest-common-substring algorithm, implemented in the “difflib” library from the programming language Python (19). Dates of birth were allowed to have, at most, a 1-digit mistake in any position or the common swap between day and month (only if they were exactly the same, but swapped). Score values used in the algorithm as described in Table 1 were chosen empirically at the beginning of the algorithm development, using different data sources (municipal databases for surveillance of acquired immunodeficiency syndrome (AIDS) and tuberculosis; data not shown). The combinations of these measurements and the values for the scores determine several levels of inclusion, which in the present paper are referred to as “automatic codes” and depend on how much information is available, as shown in Table 1.
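The two field-level comparisons described above can be illustrated as follows. This is a sketch under stated assumptions: `difflib.SequenceMatcher.ratio()` is a real standard-library call, but whether the authors used exactly this call is not stated in the text; the function names, the status labels, and the `DDMMYYYY` string format for dates are hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two preprocessed names."""
    # ratio() is 2*M/T, where M is the number of matched characters found by
    # recursively taking the longest matching block and T is the total length.
    return SequenceMatcher(None, a, b).ratio()


def dob_status(a: str, b: str) -> str:
    """Compare two dates of birth given as 8-digit 'DDMMYYYY' strings (assumed format).

    Returns 'exact', 'one_error_or_swap', or 'different'.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("dates must be 8-character DDMMYYYY strings")
    if a == b:
        return "exact"
    # Day and month exchanged, with everything else identical.
    if a[0:2] == b[2:4] and a[2:4] == b[0:2] and a[4:] == b[4:]:
        return "one_error_or_swap"
    # Exactly one differing digit, in any position.
    mismatches = sum(x != y for x, y in zip(a, b))
    return "one_error_or_swap" if mismatches == 1 else "different"


print(round(name_score("MARIA DA SILVA", "MARIA DA SILVO"), 3))
print(dob_status("05031970", "03051970"))
```

Working on digit strings rather than date objects lets a single mistyped digit be detected even when it produces an impossible calendar date.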

Table 1.

Classification of Matched Records Used to Validate a Record-Linkage Algorithm, Brazil, 1999–2006

Automatic Inclusion Code (a)	Patient's Name	Date of Birth	Mother's Name (b)
0	Exact	Exact	Exact
1	Exact	Exact	Same PC
2	Exact	1 error or swap	Exact
3	Exact	1 error or swap	Same PC
4	Score > 0.75	Exact	Exact
5	Score > 0.75	1 error or swap	Exact
6	Score > 0.75	Exact	Same PC + score > 0.75
7	Score > 0.9	1 error or swap	Score > 0.8
8	Exact	Exact	Missing
9	Exact	1 error or swap	Missing
10	Score > 0.9	Exact	Missing
Exclusion (c)	Not missing	>1 error	Different PC
	Score ≤ 0.9	>1 error	Score ≤ 0.8
	Not missing	>1 error	Score ≤ 0.7
	Score < 0.8	Not missing	Not missing
	Not missing	Day, month, and year are different	Missing
	Score < 0.8		Missing

Abbreviation: PC, phonetic code.

(a) After passing the first blocking phase: same PC of patient's first and last name OR same PC of mother's first and last name OR same PC of patient's and mother's first names.

(b) PC is for mother's name only in this case.

(c) Records that are not included or excluded are left over for clerical review. Score values were chosen empirically (see text).

Records with complete information (automatic codes 0–7; Table 1) are treated independently from records with missing data (automatic codes 8–10 when mother's name is missing; Table 1). Whenever a pair of records is neither automatically included with one of the inclusion codes described nor automatically excluded by the criteria in Table 1, this pair is kept in the final merged database, marked as an unresolved pair for possible further clerical review. The algorithm is hierarchical in the sense that lower codes mean more similar records: 0 and 8 are perfect matches, but codes 0–7 are used for records with full information and thus are more robust than codes 8–10 for records that are missing mother's name. The algorithm is not “greedy” in that the same record in the test database linked with a 0 code to one record could also be linked to another one with a code 7, for example. This feature is important, because the algorithm can also be used for databases with 1-to-many relations, as in the case of tuberculosis surveillance databases. For mortality, which is supposed to have a 1-to-1 relation with the cohort databases, multiple matches for the same patient can easily be resolved by automatically picking the match with the lowest code. This was done in the present study. If a pair was neither included nor excluded, it was eligible for clerical review. For records with name only, only perfect matches were considered. The algorithm was written in Python for Windows (19).
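The hierarchy of inclusion codes in Table 1, and the lowest-code rule for 1-to-1 linkage, could be expressed roughly as below. This sketch is not the authors' code: the `Comparison` container and both function names are hypothetical, the field-level comparisons are assumed to have been computed beforehand, and only the inclusion rows of Table 1 are implemented (a `None` result means the pair is either excluded or left for clerical review).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Comparison:
    """Precomputed field comparisons for one candidate pair (hypothetical container)."""
    name_exact: bool
    name_score: float            # should be 1.0 when name_exact is True
    dob: str                     # "exact", "one_error_or_swap", or "different"
    mother_missing: bool = False
    mother_exact: bool = False
    mother_same_pc: bool = False
    mother_score: float = 0.0


def inclusion_code(c: Comparison) -> Optional[int]:
    """Return the Table 1 automatic inclusion code, or None if no row applies."""
    dob_exact = c.dob == "exact"
    dob_close = c.dob == "one_error_or_swap"
    if c.mother_missing:                      # codes 8-10
        if c.name_exact and dob_exact:
            return 8
        if c.name_exact and dob_close:
            return 9
        if c.name_score > 0.9 and dob_exact:
            return 10
        return None
    if c.name_exact:                          # codes 0-3
        if dob_exact and c.mother_exact:
            return 0
        if dob_exact and c.mother_same_pc:
            return 1
        if dob_close and c.mother_exact:
            return 2
        if dob_close and c.mother_same_pc:
            return 3
    if c.name_score > 0.75:                   # codes 4-6
        if dob_exact and c.mother_exact:
            return 4
        if dob_close and c.mother_exact:
            return 5
        if dob_exact and c.mother_same_pc and c.mother_score > 0.75:
            return 6
    if c.name_score > 0.9 and dob_close and c.mother_score > 0.8:
        return 7
    return None


def best_matches(pairs: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[str, int]]:
    """For 1-to-1 linkage, keep the lowest-code match per patient.

    `pairs` holds (patient_id, death_record_id, code) for included pairs only.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for patient_id, death_id, code in pairs:
        if patient_id not in best or code < best[patient_id][1]:
            best[patient_id] = (death_id, code)
    return best
```

Keeping every included pair and resolving afterwards, rather than stopping at the first match, is what allows the same routine to serve 1-to-many linkages by simply skipping `best_matches`.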

Algorithm validation

We used 3 different scenarios to validate the algorithm. First we considered a hypothetical situation in which patients lost to follow-up in a cohort of HIV-infected persons would be searched for in the mortality database, and we assumed that 50% of these lost patients had actually died. Thus, we constructed a database by combining the 368 records of patients known to be deceased in the cohorts with a random sample of 368 records from the 1999 mortality database. In this scenario, we compared accuracy for exact matches between the records in all fields with the automatic inclusion codes, when 1) full information was available for all individuals, 2) only name and date of birth were available, and 3) only name was available. In the second scenario, we tested the impact on accuracy if all patients in a cohort of HIV-infected persons were linked to the mortality database, considering that only 5% of the patients were truly deceased. This is a reasonable percentage for an open cohort of HIV-infected patients in developing countries, where death rates are generally around 5 per 100 person-years (20). In this case, we made up a data set by combining the 368 records of patients known to be deceased in the cohorts with 6,992 randomly selected records (368/0.05 − 368 = 6,992) from the 1999 mortality database. In these 2 scenarios, ties were resolved automatically by choosing the pair with the lowest automatic code, and no manual search was performed, the aim being merely to assess the potential impact on accuracy of missing information in the test database. In the third scenario, we mimicked a situation similar to what one may encounter in practical research with a cohort of HIV-infected patients: assuming a 50% prevalence of deaths among patients lost to follow-up and a 20% prevalence of records missing the mother's name. The database was set up for this scenario in the same way as it was for the first scenario, but 20% of mothers’ names were randomly deleted from the test database. 
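The construction of the test data sets follows directly from the target prevalence: with n known deaths and prevalence p, the number of control records is n/p − n. A minimal sketch, assuming records are plain dictionaries with a hypothetical "mother" key (the function names and the fixed random seed are also illustrative choices):

```python
import random
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def controls_needed(n_cases: int, prevalence: float) -> int:
    """Controls required so that cases make up `prevalence` of the test set."""
    return round(n_cases / prevalence - n_cases)


def build_test_set(cases: List[Record], control_pool: List[Record],
                   prevalence: float, seed: int = 0) -> List[Record]:
    """Combine all cases with a random sample of controls from the pool."""
    rng = random.Random(seed)
    controls = rng.sample(control_pool, controls_needed(len(cases), prevalence))
    return cases + controls


def blank_mothers(records: List[Record], fraction: float, seed: int = 0) -> List[Record]:
    """Return copies of the records with mother's name deleted in a random fraction."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), round(len(records) * fraction)))
    return [dict(r, mother=None) if i in chosen else dict(r)
            for i, r in enumerate(records)]


print(controls_needed(368, 0.50))  # first scenario
print(controls_needed(368, 0.05))  # second scenario
```

For the third scenario, `blank_mothers(test_set, 0.20)` would be applied to a test set built at 50% prevalence.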
In this run, we did not consider records that were missing date of birth. Unresolved pairs were submitted to clerical review by 2 independent researchers, and disagreements were resolved by a third reviewer. In addition, records with automatic inclusion codes were manually reviewed for quality control purposes. To minimize selection bias, reviewers had no access to the group membership status of records being reviewed. Sensitivity, specificity, positive predictive values (PPVs), and negative predictive values (NPVs) were calculated for the experiments, along with 95% confidence intervals, using appropriate methods (21). The total numbers of records in the test databases were used as denominators for calculations. For records of patients found in the mortality system, we assessed the proportion of cases for which AIDS-related ICD-10 codes (codes B20–B24) were not mentioned on the death certificate. The coding system used for death certificates follows the World Health Organization guidelines (22). For a preliminary practical application of the algorithm, rates of loss to follow-up were compared before and after the algorithm was used for the Rio de Janeiro cohort and death rates were compared before and after the algorithm was used for both cohorts. Exact Poisson 95% confidence intervals are presented for the differences. Calculations were done in the R software environment (23).
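The four accuracy measures can be computed from a 2 × 2 table of linkage result against true vital status, as sketched below. The text does not name the confidence interval method beyond citing reference 21, so the Wilson score interval here is an assumption made for illustration. The example counts (342 true positives, 26 false negatives, 2 false positives, 6,990 true negatives) are not reported in the article; they are inferred from the percentages reported for automatic codes with full information in the 5% prevalence scenario.

```python
from math import sqrt
from typing import Dict, Tuple


def accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table.

    Raises ZeroDivisionError if a row or column of the table is empty.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_ci(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Inferred counts for the 5% prevalence scenario (368 deaths, 6,992 controls).
m = accuracy(tp=342, fp=2, fn=26, tn=6990)
print({k: round(100 * v, 1) for k, v in m.items()})
print(wilson_ci(342, 368))
```

Because PPV depends on prevalence, the same 2 false-positive pairs that are negligible at 50% prevalence weigh more heavily as the share of true deaths in the searched population falls.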

RESULTS

Table 2 shows results from the first 2 scenarios for assessing the impact of missing information on the accuracy of the algorithm. As expected, sensitivity for exact matches increased when less information was available for linking the records, while the addition of the automatic codes without manual review represented a significant increase both when full information was available (from 50.8% to 92.9%) and when the mother's name was missing (from 71.2% to 91.8%). Specificity for both cases was very high, and no misclassification was made by the algorithm when the death prevalence was 50% (PPV = 100%). To evaluate the impact of a death prevalence of 5% versus 50%, we compared PPV and NPV in these 2 scenarios. While PPV was 100% at 50% prevalence and no misclassifications occurred, it was reduced, as expected, at 5% prevalence. Even though the PPVs for automatic codes with full information and with missing mother's name were still high (99.4% and 92.6%, respectively), these represented 2 false-positive cases in the first instance and 27 in the second. Accuracy for records with patients’ names only was not as good as with the other variables, even when we considered exact matches only (Table 2, last column), since sensitivities were lower than the ones for automatic codes for the other scenarios. Specificity in this case was low (82.1%), with a PPV of only 81.2% in the 50% scenario, yielding 66 false-positive cases, and as low as 19.5% in the 5% scenario, reaching 1,175 false-positive cases.

Table 2.

Accuracy of Exact Matches and Automatic Codes When Records in the Test Database Have Full or Partial Information (50% and 5% Prevalence Scenarios), Brazil, 1999–2006

Accuracy	Full Information				No Mother's Name				Name Only (a)	
	Exact Match (b)		Automatic Codes (c)		Exact Match (b)		Automatic Codes (c)		Exact Match (b)	
	%	95% CI	%	95% CI	%	95% CI	%	95% CI	%	95% CI
Sensitivity	50.8	45.6, 56.0	92.9	88.3, 94.2	71.2	66.3, 75.8	91.8	88.6, 94.4	77.4	72.8, 81.6
Specificity	100.0	99.0, 100.0	100.0	99.0, 100.0	100.0	99.0, 100.0	100.0	99.0, 100.0	82.1	77.8, 85.8
50% prevalence
PPV	100.0	98.0, 100.0	100.0	98.9, 100.0	100.0	98.6, 100.0	100.0	98.9, 100.0	81.2	76.7, 85.1
NPV	67.0	62.9, 70.9	93.4	90.5, 95.6	77.6	73.6, 81.3	92.5	89.4, 94.9	78.4	74.0, 82.4
5% prevalence
PPV	100.0	98.0, 100.0	99.4	97.9, 99.9	98.9	96.7, 99.8	92.6	89.4, 95.0	19.5	17.5, 21.6
NPV	97.5	97.1, 97.8	99.6	99.5, 99.8	98.5	98.2, 98.8	99.6	99.4, 99.7	98.6	98.3, 98.9

Abbreviations: CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value.

(a) Since only name was available in this case, only exact matches were considered.

(b) Exact match means a perfect match between the available variables in both databases.

(c) The automatic inclusion codes listed in Table 1.

With reference to records to be manually checked, 1,189 of those with full information and 4,146 of those with missing mother's name would have to be searched for a prevalence of 50%, and 9,351 and 48,333, respectively, would have to be searched for a prevalence of 5%. In the third scenario, with 50% prevalence and 20% of the records missing mother's name, the results obtained were: sensitivity = 96.5% (95% confidence interval (CI): 94.0, 98.1); specificity = 100% (95% CI: 99.0, 100); PPV = 100% (95% CI: 99.0, 100); and NPV = 96.6% (95% CI: 94.2, 98.2). Manual review was performed on the 1,929 pairs that the algorithm was not able to include or exclude as a true match. Of those, 9 pairs were considered true matches by reviewer 1 and 11 were considered true matches by reviewer 2. The 2 disagreements were submitted to a third reviewer, who considered 1 of them a true match. In a manual review of the automatic codes, all of them were considered true matches. The combination of automatic codes and clerical review yielded high sensitivity and specificity. For this test database, the PPV was 100% and the NPV was 96.6%. Among the 355 patients who were found by the algorithm, 85 (23.9%) did not have HIV- or AIDS-related ICD-10 codes (codes B20–B24) given as the underlying cause of death. 
Before the algorithm was used, the rate of loss to follow-up in the Rio de Janeiro cohort between 2000 and 2005 was 2.9 per 100 person-years; it dropped to 2.1 per 100 person-years after recovery from the mortality system (difference = −0.8, 95% CI: −1.1, −0.6). In the same period, the mortality rate increased with inclusion of deaths from the mortality system, from 2.2 per 100 person-years to 3.2 per 100 person-years (difference = 1.0, 95% CI: 0.7, 1.3). For the THRio Study cohort, the death rate in 2006 before the use of the algorithm was 1.2 per 100 person-years; it increased to 4.2 per 100 person-years after deaths were recovered for all patients in the cohort, using automatic codes only, without manual review (difference = 3.0, 95% CI: 2.7, 3.3).

DISCUSSION

The deterministic algorithm validated in the present study was developed primarily to assist investigators actively following cohorts of HIV-infected persons to improve their performance by searching for patients lost to follow-up in mortality databases. The performance characteristics of the algorithm were excellent, with a sensitivity of over 90% for automatic codes, either in the 5% prevalence scenario or in the 50% prevalence scenario, which was minimally affected when mother's name was not available. These figures were well over the sensitivity for exact matches of 50% and 71% when full information was available and the mother's name was missing, respectively. Specificity was close to 100% for all cases, meaning that almost no pairs were misclassified as false-positives, but when we considered records with patients’ names only, even for exact matches specificity was unacceptably low (approximately 82%). These results are in agreement with those of the study by Quantin et al. (24), who found that date of birth and first and last patient's name would have sufficient discriminatory power, even though their study was carried out using a probabilistic approach and they did not test mother's name as one of the variables. In the 5% scenario, the PPV for automatic codes remained above 90%; there were 2 false-positives in the full information data set and 27 when mother's name was missing. This indicates that even though false-positives are unlikely (PPV = 99.4% and PPV = 92.6%, respectively; Table 2), caution must be taken when designating a patient deceased. In the third scenario, with 50% prevalence and with 20% of the records missing mother's name, clerical review increased sensitivity to over 96%, while preserving 100% specificity. Although sensitivity was high, it was still impossible to find 13 patients reported as deceased in medical charts. 
There are 2 possible explanations for this finding: 1) these patients indeed were not included in the mortality database or 2) these patients were in the database but the algorithm was not able to find their records. In the former case, if the patient was in fact still alive, he or she was truly lost to follow-up. On the other hand, if the patient was indeed deceased, either the event was not detected by the system or the patient had moved out of the state and died elsewhere. In the case of patients who were in the mortality database, the main reasons for not finding a record were major spelling errors (especially in the first letter, to which Soundex-like phonetic algorithms are very sensitive, but also deletion of the last name in the case of persons with 4 or more names) and incorrectly entered dates of birth. The number of records left for manual review suggests that the best option is to preselect records to be linked to the databases in order to increase the number of patients who could be found, decreasing clerical review. Even though probabilistic algorithms have been extensively studied and there is at least 1 algorithm validated for Brazilian databases (25, 26), we chose to employ a deterministic approach, allowing for some uncertainty with regard to the variables used. This decision was based on the fact that even though probabilistic algorithms tend to yield higher sensitivities, they do so by sacrificing specificity, which is not a major problem when studying population-based characteristics, given that false-positives and false-negatives would tend to cancel out (7, 27). Conversely, for inferring the vital status of individual patients being followed in a cohort, this approach is not advisable and deterministic algorithms are preferable (7), given that ethical problems may emerge when a patient is declared deceased and he or she shows up for a subsequent visit. 
In either case, caution should always be exercised, since bias due to false-positives and false-negatives would lead to over- or underestimation of the parameters being studied, although the impact of false-positives on overestimation tends to be more severe than the impact of false-negatives on underestimation (27). One of the reasons cohorts with active tracing of patients might suffer from loss to follow-up is that deaths may occur in other health-care facilities, outside the area of the clinic where routine care was provided, so that information on them does not reach the cohort, especially if the cause of death was not related to AIDS. In cohorts of patients who are intrinsically highly prone to morbidity and/or mortality, as with HIV-infected patients, this can be particularly problematic. For example, in a study involving 6,498 patients being followed in 18 treatment programs in lower-income countries, the estimated death rate 1 year after initiation of antiretroviral therapy would increase from 6.4 per 100 person-years to 15 per 100 person-years if mortality among those lost to follow-up was similar to that observed in patients without antiretroviral therapy (20). In another report on cohorts in sub-Saharan countries, 41% of patients had an unknown vital status on medical charts, and 65% of those, initially considered lost to follow-up, were found to be deceased after appropriate vital status investigation procedures were applied (28). A practical application of the algorithm showed very good results for both cohorts. In the THRio Study cohort, applying the algorithm for all patients with the 2006 mortality database, there was a significant increase in the death rate for that year, even using automatic codes only. For the Rio de Janeiro cohort, the impact of the algorithm on patients lost to follow-up was significant both in reducing losses to follow-up and in increasing the death rate for the period 2000–2005. 
Finally, the fact that almost 24% of the death certificates of cases that were found through our algorithm did not have AIDS-related ICD-10 codes (codes B20–B24) mentioned on them underscores the need to search in all-cause mortality databases and not to restrict searches to HIV/AIDS deaths. On the other hand, this finding suggests that official HIV/AIDS mortality figures that are based solely on the mortality system might significantly underestimate the true figures, a possibility that should be formally evaluated. This might lead to adjustments in mortality statistics in Brazil.

21 in total

1. Association of cancer with AIDS-related immunosuppression in adults.

Authors: M Frisch; R J Biggar; E A Engels; J J Goedert
Journal: JAMA Date: 2001-04-04 Impact factor: 56.272

2. An empirical comparison of record linkage procedures.

Authors: Shanti Gomatam; Randy Carter; Mario Ariet; Glenn Mitchell
Journal: Stat Med Date: 2002-05-30 Impact factor: 2.373

3. Automatic linkage of vital records.

Authors: H B NEWCOMBE; J M KENNEDY; S J AXFORD; A P JAMES
Journal: Science Date: 1959-10-16 Impact factor: 47.728

4. Practical introduction to record linkage for injury research.

Authors: D E Clark
Journal: Inj Prev Date: 2004-06 Impact factor: 2.399

Review 5. Use of computerized record linkage in cohort studies.

Authors: G R Howe
Journal: Epidemiol Rev Date: 1998 Impact factor: 6.222

6. Record linkage strategies, outpatient procedures, and administrative data.

Authors: L L Roos; R Walld; A Wajda; R Bond; K Hartford
Journal: Med Care Date: 1996-06 Impact factor: 2.983

7. Effects of record linkage errors on registry-based follow-up studies.

Authors: H Brenner; I Schmidtmann; C Stegmaier
Journal: Stat Med Date: 1997-12-15 Impact factor: 2.373

8. Comparison of probabilistic and deterministic record linkage in the development of a statewide trauma registry.

Authors: D E Clark; D R Hahn
Journal: Proc Annu Symp Comput Appl Med Care Date: 1995

9. Computerised record linkage: compared with traditional patient follow-up methods in clinical trials and illustrated in a prospective epidemiological study. The West of Scotland Coronary Prevention Study Group.

Authors:
Journal: J Clin Epidemiol Date: 1995-12 Impact factor: 6.437

10. Predicting CD4 counts in HIV-infected Brazilian individuals: a model based on the World Health Organization staging system.

Authors: M Schechter; R Zajdenverg; L L Machado; M E Pinto; L A Lima; M A Perez
Journal: J Acquir Immune Defic Syndr (1988) Date: 1994-02
