
Incidence rate estimation, periodic testing and the limitations of the mid-point imputation approach.

Alain Vandormael^1,2, Adrian Dobra^3, Till Bärnighausen^1,4,5,6, Tulio de Oliveira^2,7, Frank Tanser^1,6,7,8.

Abstract

Background: It is common to use the mid-point between the latest-negative and earliest-positive test dates as the date of the infection event. However, the accuracy of the mid-point method has yet to be systematically quantified for incidence studies once participants start to miss their scheduled test dates.
Methods: We used a simulation-based approach to generate an infectious disease epidemic for an incidence cohort with a high (80-100%), moderate (60-79.9%), low (40-59.9%) and poor (30-39.9%) testing rate. Next, we imputed a mid-point and random-point value between the participant's latest-negative and earliest-positive test dates. We then compared the incidence rate derived from these imputed values with the true incidence rate generated from the simulation model.
Results: The mid-point incidence rate estimates erroneously declined towards the end of the observation period once the testing rate dropped below 80%. At the end of the observation period, the mid-point estimates were approximately 9%, 27% and 41% below the true incidence rate for a moderate, low and poor testing rate, respectively. The random-point method did not introduce any systematic bias in the incidence rate estimate, even for testing rates as low as 30%.
Conclusions: The mid-point assumption of the infection date is unjustified and should not be used to calculate the incidence rate once participants start to miss their scheduled test dates. Under these conditions, the mid-point method produces an artefactual decline in the incidence rate towards the end of the observation period. By contrast, the single random-point method is straightforward to implement and produces estimates very close to the true incidence rate.


Year: 2018 PMID: 29024978 PMCID: PMC5837439 DOI: 10.1093/ije/dyx134

Journal: Int J Epidemiol ISSN: 0300-5771

Key Messages

Recent evidence suggests that the mid-point of the censored interval (the interval between the latest-negative and earliest-positive test dates) can be used to infer the timing of the infection event. Using a simulation-based approach, we show that the infection date does not occur at the mid-point of the censored interval once participants start to miss their scheduled test dates. Under these circumstances, the mid-point method may lead epidemiologists to falsely conclude that the incidence rate is declining toward the end of the observation period. Imputation of a random infection date within the censored interval, based on a Monte Carlo approach, is straightforward to implement and produces estimates very close to the true incidence rate.

Background

The incidence rate is a fundamental concept in infectious disease epidemiology. It measures the frequency at which new infection events occur per unit of person-time. An important task for any incidence study is to identify precisely the timing of each new infection event. This is difficult because, in most situations, participants cannot be tested on a daily basis. Instead, the current ‘gold-standard’ approach is to schedule test dates at fixed time intervals, for example on a weekly, monthly or yearly basis. In this case, we can only infer that the infection event occurred at some time-point between the latest-negative and earliest-positive test dates. The use of periodic testing to identify the infection event gives rise to the standard interval censoring problem. Even if participants present at all of their scheduled test dates, we would still not know the exact amount of person-time contributed before the infection event. The standard interval censoring problem therefore reflects an enumeration uncertainty in the denominator of the incidence rate measure. Intuitively, our uncertainty about the infection date is proportional to the length of the testing interval, and this uncertainty will increase once participants start to miss their scheduled test dates. In sub-Saharan Africa, for example, missed HIV test dates have been associated with work commitments, illness, transportation costs, frequent migration and the fear of stigma or discrimination, among many other reasons. Irregular testing means that participants have some probability of missing a test date that is contiguous to the interval containing the true (but unobserved) infection date. In other words, missed test dates are likely to extend the censored interval across one or more fixed testing intervals.
This scenario, which we describe as extended interval censoring, means that we cannot definitively identify the testing interval in which the infection event truly occurs. Extended interval censoring therefore reflects an enumeration uncertainty in both the denominator and numerator of the incidence rate measure, which we illustrate with a straightforward example in Figure 1.
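The difference between standard and extended interval censoring can be made concrete with a small Python sketch of the Figure 1 example. The encoding is a hypothetical convention chosen for illustration: each scheduled test date carries 0 (tested negative), 1 (tested positive) or None (missed), and date 0 is the enrolment test at which every participant is HIV-negative. Assuming the two missed visits in Panel B are the 3rd and 4th test dates, three intervals become consistent with the observed L and R, whereas in Panel A only one is.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of each scheduled test date:
#   0 = tested HIV-negative, 1 = tested HIV-positive, None = test date missed.
# Index 0 is the enrolment test, where every participant is HIV-negative.
Outcome = Optional[int]


def censoring_interval(outcomes: List[Outcome]) -> Tuple[int, Optional[int]]:
    """Return (L, R) as indices of the latest-negative and earliest-positive test dates.

    R is None when no positive test has been observed.
    """
    latest_negative = 0
    for date, result in enumerate(outcomes):
        if result is None:
            continue  # missed visit: carries no information about infection status
        if result == 0:
            latest_negative = date
        else:
            return latest_negative, date  # first observed positive test
    return latest_negative, None


def candidate_intervals(outcomes: List[Outcome]) -> List[int]:
    """Testing intervals (interval j runs from test date j-1 to j) that could hold the infection."""
    left, right = censoring_interval(outcomes)
    if right is None:
        return []  # no observed seroconversion
    return list(range(left + 1, right + 1))


panel_a = [0, 0, 0, 0, 0, 1]        # every test date attended
panel_b = [0, 0, 0, None, None, 1]  # 3rd and 4th test dates missed
print(candidate_intervals(panel_a))  # [5]
print(candidate_intervals(panel_b))  # [3, 4, 5]
```

Under standard censoring only the person-time in interval 5 is uncertain; under extended censoring the event itself could be counted in any of intervals 3, 4 or 5.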

Figure 1

An example of Standard (Panel A) and Extended (Panel B) interval censoring. In Panel A, the participant is successfully tested at each scheduled test date, represented by the solid circles. We know that the infection event occurs somewhere between the latest-negative (L) and earliest-positive (R) test date, but we do not know the exact amount of person-time that should be contributed to the denominator of the incidence rate measure for the last time interval. In Panel B, the participant misses two scheduled test dates, as represented by the hollow circles. This makes it difficult to determine whether the true infection event occurs in the 3rd, 4th or 5th time interval. In this case, there is an enumeration uncertainty in both the denominator and numerator of the incidence rate measure for each of these time intervals.

In recent years, a number of sophisticated methods have been designed to address the interval censoring problem. However, there is no clear guidance on how these methods can be used to estimate the incidence rate, or in which situations they should be applied. In practice, epidemiologists are likely to treat the infection date as a missing data point for which more familiar imputation methods are available. One popular ad hoc approach, which is the focus of this study, is to impute the infection date at the mid-point of the participant’s censored interval. There is some evidence that the mid-point method can give a reasonable approximation of the incidence rate if the standard interval censoring assumption is satisfied. However, to the best of our knowledge, the performance of the mid-point method has not been systematically evaluated for incidence studies once participants start to miss their scheduled test dates.
To evaluate the mid-point method, we used a simulation-based approach to generate an infectious disease epidemic for an incidence cohort with a high (80–100%), moderate (60–79.9%), low (40–59.9%) and poor (30–39.9%) testing rate. Our work has implications for infectious disease studies that use the mid-point method to address the interval censoring problem.

Methods

Study design

This study is motivated by the low and irregular testing rate that we have observed in one of sub-Saharan Africa’s largest HIV seroconverter cohorts. Despite annual household visits by trained field-workers, an average censored interval length of 3.2 years has made it difficult to infer the timing of the HIV infection event. For this reason, we use the case of missed HIV test dates to systematically investigate the limitations of the mid-point method for incidence rate estimation. To do this, we used an epidemic model to generate HIV infection events for an incidence cohort (in either an open or closed system) with a fixed number of scheduled test dates. We then varied the rate at which participants missed their scheduled test dates and imputed a mid-point and a random-point value within each participant’s censored interval. With this approach, we could compare the incidence rate derived from these imputed values with the true incidence rate generated from the epidemic model.

Incidence cohort

Consider a cohort of i = 1, …, N study participants who are enrolled into a longitudinal survey or a randomized controlled trial. In the former study design, a single cohort of participants is followed over time; in the latter, participants are randomized to either a treatment or a control cohort and followed over time. Let j = 1, …, J index the intervals between the scheduled test dates of the observation period. For both study designs, participants must be HIV-uninfected when they enter the study, so that their survival time starts at the beginning of the first interval for a closed cohort or at the beginning of their entry interval for an open cohort. Survival time stops at the earliest HIV-positive date, or at the end of the observation period if the participant remains HIV-negative. The test date could occur on any day within the testing interval. For this analysis, we scaled each interval j to the unit interval [0, 1] so that the length of the testing interval was invariant to the unit of calendar time (e.g. month, half-year or year) between the scheduled test dates.
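On the scaled time axis, a participant’s person-time can be split across testing intervals by simple overlap arithmetic. A minimal sketch, assuming interval j covers (j − 1, j], follow-up runs from an entry time (0 for a closed cohort, the start of the entry interval for an open cohort) to an exit time (the infection date, or J for participants who remain negative); the function name is hypothetical.

```python
from typing import Dict


def person_time_by_interval(entry_time: float, exit_time: float, n_intervals: int) -> Dict[int, float]:
    """Split follow-up (entry_time, exit_time] across unit testing intervals.

    Interval j (j = 1..n_intervals) covers (j - 1, j] on the scaled time axis.
    Only intervals with positive overlap appear in the result.
    """
    if not 0 <= entry_time <= exit_time <= n_intervals:
        raise ValueError("require 0 <= entry_time <= exit_time <= n_intervals")
    person_time = {}
    for j in range(1, n_intervals + 1):
        # length of the overlap between follow-up and interval j
        overlap = min(exit_time, j) - max(entry_time, j - 1)
        if overlap > 0:
            person_time[j] = overlap
    return person_time


# Closed cohort, infected half-way through interval 3:
print(person_time_by_interval(0, 2.5, 5))  # {1: 1, 2: 1, 3: 0.5}
# Open cohort, enters at the start of interval 3 and stays negative:
print(person_time_by_interval(2, 5, 5))    # {3: 1, 4: 1, 5: 1}
```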

Epidemic model

We used a Susceptible-Infected-Recovered (SIR) model to generate the exact infection dates, denoted by T, over the J intervals of the observation period. The system of differential equations for the SIR model is given as:

$$\frac{dS}{dt} = bN - \lambda S, \qquad \frac{dI}{dt} = \lambda S - vI, \qquad \frac{dR}{dt} = vI, \qquad (1)$$

which represents the rate at which participants transition from a susceptible (S) to an infected (I) to a recovered (R) compartment. Known as the force of infection, λ is given by $\lambda = \beta c I / N$, where β is the probability of HIV transmission per contact, c is the rate of contact and N is the population size for the jth interval. The SIR model also includes a parameter b, which is the entry rate for participants into the study, where b = 0 for a closed cohort, and the parameter v, which is the recovery rate for infected participants. We used realistic parameter values for the SIR model, based on earlier HIV studies undertaken in sub-Saharan Africa. Specifically, we varied c within the range of 50 to 120 sexual acts per year, based on data collected from serodiscordant couples across eastern and southern African sites. Previous research has shown considerable heterogeneity in the probability of HIV transmission per sexual contact, largely due to factors associated with the viral load level, genital ulcer disease, stage of HIV progression, condom use, circumcision and use of antiretroviral therapy (ART). Following a systematic review of this topic by Boily et al., we selected values for β within the range of 0.003–0.008. Further, we based the recovery rate (v) on the potential for ART to achieve virologic suppression in the infected population.
The concentration of HIV RNA in the blood or genital tract is highly correlated with the onward sexual transmission of the virus. Here, we chose values for v within the range of 0.15–0.35, which are slightly conservative but supported by population-based estimates from sub-Saharan Africa. For the longitudinal survey, we selected parameter values to generate a truly stable, increasing and decreasing incidence rate across 5, 10 and 15 testing intervals. For the randomized controlled trial, we selected an intervention efficacy E to reduce the HIV transmission rate for the treatment cohort when compared with the control cohort. We used the EpiModel package of Jenness et al. to implement the SIR model and performed all remaining calculations with R software (version 3.3.3). Further details of the SIR model and the parameter values are provided in Section 1.1 of the Supplementary Data, available at IJE online.
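For readers who want to experiment with the SIR system described above outside R, the following is a rough forward-Euler sketch in Python. It is not the EpiModel implementation used by the authors: the per-capita entry term b·N, the step size and the treatment of one unit interval as one time unit (so that c, v and b are rates per interval) are assumptions. The force of infection is taken as λ = βcI/N, and the sketch also reports, for each unit interval, new infections per 100 susceptible person-units.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SIRParams:
    beta: float     # per-contact transmission probability (text: 0.003-0.008)
    c: float        # contacts per unit interval (text: 50-120 sexual acts per year)
    v: float        # recovery rate (text: 0.15-0.35)
    b: float = 0.0  # assumed per-capita entry rate; b = 0 gives a closed cohort


def simulate_sir(s0: float, i0: float, r0: float, p: SIRParams, n_intervals: int,
                 steps_per_interval: int = 1000) -> Tuple[List[float], List[Tuple[float, float, float]]]:
    """Forward-Euler integration of the SIR system over unit intervals.

    Returns (rates, states): rates[j - 1] is new infections per 100 susceptible
    person-units in interval j; states holds (S, I, R) at each interval boundary.
    """
    dt = 1.0 / steps_per_interval
    s, i, r = s0, i0, r0
    rates, states = [], [(s, i, r)]
    for _ in range(n_intervals):
        new_infections, person_time = 0.0, 0.0
        for _ in range(steps_per_interval):
            n = s + i + r
            lam = p.beta * p.c * i / n   # force of infection
            infections = lam * s * dt    # S -> I
            recoveries = p.v * i * dt    # I -> R
            entries = p.b * n * dt       # new susceptible entrants (open cohort only)
            person_time += s * dt        # time at risk accrued by susceptibles
            new_infections += infections
            s += entries - infections
            i += infections - recoveries
            r += recoveries
        rates.append(100 * new_infections / person_time)
        states.append((s, i, r))
    return rates, states


rates, states = simulate_sir(990, 10, 0, SIRParams(beta=0.005, c=80, v=0.25), n_intervals=5)
```

With b = 0 the total S + I + R stays constant, which is a quick sanity check on the closed-cohort case; a positive b lets the cohort grow.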

Standard and extended interval censoring

Our next task was to simulate a testing rate over the observation period. For this analysis, we considered a successful HIV test date to be an independent random variable with a Bernoulli distribution. We denoted this random variable by H and the probability of a successful test date by Pr(H = 1) = p, where 0 ≤ p ≤ 1. Using this definition, we could then vary the testing rate for the incidence cohort by selecting a value for p. For standard interval censoring, we set p = 1.0 to ensure that all participants would be successfully tested at each of their scheduled dates. For extended interval censoring, we set p < 1.0 so that some participants would miss one or more of their scheduled test dates. For example, p = 0.6 means that participants would be successfully tested at their scheduled dates 60% of the time. We considered a high testing rate to range from 80% to 100%, a moderate testing rate from 60% to 79.9%, a low testing rate from 40% to 59.9% and a poor testing rate from 30% to 39.9%. Because of periodic testing, the infection event is known only to occur within the censored interval. For both standard and extended interval censoring, the censored interval has non-zero length and bounds the infection date so that $L_i < T_i \le R_i$, where $L_i$ and $R_i$ are observable random variables that denote the latest-negative and earliest-positive test dates of the ith participant. Writing $t_{ij}$ for the jth scheduled test date and $H_{ij}$ for whether it was attended, we obtained the censoring dates for each participant as $L_i = \max\{t_{ij} : H_{ij} = 1,\ t_{ij} < T_i\}$ and $R_i = \min\{t_{ij} : H_{ij} = 1,\ t_{ij} \ge T_i\}$. Apart from the observed L and R test dates, the censored interval does not provide any extra information on the timing of the participant’s infection event.
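The testing process can be simulated with one Bernoulli draw per scheduled test date, after which L and R follow from the attended dates. A minimal sketch, assuming integer test dates 1, …, J on the scaled time axis, an always-attended enrolment test at date 0 and an infection date T > 0; the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def simulate_attended_dates(n_intervals: int, p: float, rng: random.Random) -> List[int]:
    """Enrolment (date 0) plus each scheduled date 1..J attended with probability p."""
    # rng.random() returns a value in [0, 1), so `< p` is a Bernoulli(p) draw for H
    return [0] + [j for j in range(1, n_intervals + 1) if rng.random() < p]


def censoring_dates(t_infection: float, attended: List[int]) -> Tuple[int, Optional[int]]:
    """L = latest attended date before T; R = earliest attended date at or after T.

    R is None when no test is attended after infection (the infection is not observed).
    """
    left = max(d for d in attended if d < t_infection)
    later = [d for d in attended if d >= t_infection]
    return left, (min(later) if later else None)


rng = random.Random(2018)
attended = simulate_attended_dates(5, 0.6, rng)  # e.g. a participant with a moderate testing rate
print(censoring_dates(3.4, [0, 1, 2, 3, 4, 5]))  # (3, 4): standard censoring
print(censoring_dates(3.4, [0, 2, 5]))           # (2, 5): extended censoring
```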

Imputation of the infection dates

For the mid-point approach, we imputed an infection date for the ith participant using $\tilde{T}_i = (L_i + R_i)/2$. Alternatively, the mid-point can be obtained by sampling K dates with replacement from the set $\{t : L_i \le t \le R_i\}$ and then taking the average of these dates, denoted by $\bar{T}_i = \frac{1}{K}\sum_{k=1}^{K} T_{ik}$. To show this, let the probability density function of a uniform distribution be $f(t) = 1/(R_i - L_i)$ for $L_i \le t \le R_i$, with mean $(L_i + R_i)/2$. According to the Law of Large Numbers, the sample mean $\bar{T}_i$ of K independent random variables converges to $(L_i + R_i)/2$ in probability as K increases in size. For the single random-point approach, we set K = 1 and sampled a value from a uniform distribution bounded by $[L_i, R_i]$.
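The two imputation rules, and the Law of Large Numbers argument linking them, can be sketched in a few lines. As an assumption, this sketch draws from a continuous uniform distribution on [L, R] rather than from a discrete set of calendar dates; the function names are hypothetical.

```python
import random
import statistics


def impute_midpoint(left: float, right: float) -> float:
    """Mid-point rule: the infection date is placed at (L + R) / 2."""
    return (left + right) / 2


def impute_random_point(left: float, right: float, rng: random.Random) -> float:
    """Single random-point rule (K = 1): one draw from U[L, R]."""
    return rng.uniform(left, right)


def mean_of_k_draws(left: float, right: float, k: int, rng: random.Random) -> float:
    """Average of K uniform draws; converges in probability to the mid-point as K grows."""
    return statistics.fmean(rng.uniform(left, right) for _ in range(k))


rng = random.Random(0)
print(impute_midpoint(2, 5))                   # 3.5
print(impute_random_point(2, 5, rng))          # a single value somewhere in [2, 5]
print(mean_of_k_draws(2, 5, 100_000, rng))     # close to 3.5
```

The sketch makes the contrast explicit: the mid-point is what K → ∞ averaging collapses to, whereas K = 1 keeps the full spread of plausible infection dates within the censored interval.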

Calculating the incidence rate

We used the infection dates (T) generated by the SIR model to calculate the true incidence rate, denoted by $IR_j$. Using the standard formula, $IR_j$ is the number of new infection events (E) divided by the person-time (PT) contributed for the jth interval. Thus,

$$IR_j = \frac{\sum_{i=1}^{N} E_{ij}}{\sum_{i=1}^{N} PT_{ij}} \times 100, \qquad (2)$$

where $E_{ij} = 1$ if $T_i$ occurs within the jth interval (otherwise $E_{ij} = 0$). We express $IR_j$ as a rate per 100 person-units, since j is scaled on the unit interval [0, 1]. Equation (2) can also be described as an instantaneous incidence rate because it is calculated at fixed time points over the observation period. The numerator of Equation (2) makes it clear that the infection events are being counted over the length of the jth testing interval. In some instances, the length of j will be less than the length of the aggregating interval: e.g. when test dates are scheduled on a monthly basis but the infection events are counted over 1-year intervals. When calculating this instantaneous measure, we assumed that the length of the testing interval j was always equal to the length of the aggregating interval. We also calculated the cumulative incidence rate from the start of the observation period to the end of the jth interval, changing the notation slightly so that

$$IR_{[1,j]} = \frac{\sum_{k=1}^{j}\sum_{i=1}^{N} E_{ik}}{\sum_{k=1}^{j}\sum_{i=1}^{N} PT_{ik}} \times 100.$$

Boily et al. have shown that the cumulative incidence rate ratio (CIRR) is a more appropriate measure for evaluating the intervention efficacy of a randomized controlled trial. We calculated the CIRR by dividing the cumulative incidence rate of the treatment cohort by that of the control cohort, so that $CIRR_j = IR^{\mathrm{treat}}_{[1,j]} / IR^{\mathrm{control}}_{[1,j]}$. For the cumulative incidence rate, we note that the length of the aggregating interval [1, j] will always be greater than the length of the testing interval for j > 1.

We estimated the incidence rate after imputing an infection date within each participant’s censored interval. Because the testing rate is a function of a stochastic process (i.e. H has a Bernoulli distribution), it was necessary to obtain more than one incidence rate estimate in order to quantify the uncertainty introduced by our simulation-based approach. Let $\widehat{IR}_j$ denote the estimated incidence rate for the jth time interval. To calculate $\widehat{IR}_j$, we right censored the data at the imputed values and indexed the resulting dataset with (d). We then obtained $\widehat{IR}_j^{(d)}$ for d = 1, …, D datasets using the standard formula, so that $\widehat{IR}_j = \frac{1}{D}\sum_{d=1}^{D} \widehat{IR}_j^{(d)}$. For this analysis, we set D = 1000.
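The estimation pipeline described above (impute, right-censor at the imputed date, count events and person-time per interval, then average over D imputed datasets) might be sketched as follows. This is an illustrative Python stand-in rather than the authors’ R code; all names are hypothetical, and participants never observed HIV-positive are assumed to remain negative until the end of the observation period, J.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# One participant: (entry time, L, R); R is None if never observed HIV-positive.
Record = Tuple[float, float, Optional[float]]


def events_and_person_time(entries: Sequence[float], exits: Sequence[float],
                           events: Sequence[bool], n_intervals: int) -> Tuple[List[int], List[float]]:
    """Count events E_j and person-time PT_j in each unit interval (j - 1, j]."""
    e = [0] * n_intervals
    pt = [0.0] * n_intervals
    for entry, exit_time, event in zip(entries, exits, events):
        for j in range(1, n_intervals + 1):
            overlap = min(exit_time, j) - max(entry, j - 1)
            if overlap > 0:
                pt[j - 1] += overlap
        if event:
            # an event at time t belongs to interval j with j - 1 < t <= j
            e[max(1, math.ceil(exit_time)) - 1] += 1
    return e, pt


def interval_rates(e: List[int], pt: List[float]) -> List[float]:
    """Instantaneous rate per 100 person-units for each interval."""
    return [100 * ev / t if t > 0 else math.nan for ev, t in zip(e, pt)]


def cumulative_rates(e: List[int], pt: List[float]) -> List[float]:
    """Cumulative rate over intervals 1..j per 100 person-units."""
    out, total_e, total_pt = [], 0, 0.0
    for ev, t in zip(e, pt):
        total_e += ev
        total_pt += t
        out.append(100 * total_e / total_pt if total_pt > 0 else math.nan)
    return out


def midpoint_imputer(left: float, right: float) -> float:
    return (left + right) / 2


def random_point_imputer(seed: int) -> Callable[[float, float], float]:
    """Single random-point rule: one draw from U[L, R] per participant."""
    rng = random.Random(seed)
    return lambda left, right: rng.uniform(left, right)


def impute_dataset(records: Sequence[Record], impute: Callable[[float, float], float],
                   n_intervals: int) -> Tuple[List[float], List[float], List[bool]]:
    """Right-censor each participant at an imputed infection date."""
    entries, exits, events = [], [], []
    for entry, left, right in records:
        entries.append(entry)
        if right is None:
            exits.append(float(n_intervals))  # assumed negative until the end of follow-up
            events.append(False)
        else:
            exits.append(impute(left, right))
            events.append(True)
    return entries, exits, events


def estimate(records: Sequence[Record], impute: Callable[[float, float], float],
             n_intervals: int, n_datasets: int,
             rate_fn: Callable[[List[int], List[float]], List[float]] = interval_rates) -> List[float]:
    """Average a rate over D imputed datasets: (1 / D) * sum over d of rate^(d)."""
    totals = [0.0] * n_intervals
    for _ in range(n_datasets):
        entries, exits, events = impute_dataset(records, impute, n_intervals)
        e, pt = events_and_person_time(entries, exits, events, n_intervals)
        for j, rate in enumerate(rate_fn(e, pt)):
            totals[j] += rate
    return [t / n_datasets for t in totals]
```

For a trial, one way to approximate the CIRR in this sketch is to call `estimate(..., rate_fn=cumulative_rates)` separately on the treatment and control records and divide the two lists interval by interval; averaging per-dataset ratios instead is an equally plausible design choice.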

Measures of accuracy

To evaluate the accuracy of the mid-point and single random-point methods, we calculated the deviation between the estimated ($\widehat{IR}_j$) and true ($IR_j$) incidence rate for the jth interval. We used two principal measures for this purpose: the bias or error, given by $\mathrm{Bias}_j = \widehat{IR}_j - IR_j$, and the mean-square error, $\mathrm{MSE}_j = \frac{1}{D}\sum_{d=1}^{D}\left(\widehat{IR}_j^{(d)} - IR_j\right)^2$. Using these two measures, we also calculated the mean absolute percentage error as $\mathrm{MPE} = \frac{100}{J}\sum_{j=1}^{J} \left|\mathrm{Bias}_j\right| / IR_j$ and the root mean-square deviation as $\mathrm{RMSD} = \sqrt{\frac{1}{J}\sum_{j=1}^{J}\mathrm{MSE}_j}$. The MPE and RMSD give a single measure of accuracy for each imputation method over the entire observation period.
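The accuracy measures above reduce to a few one-line functions. This sketch assumes that the MPE averages |Bias_j| / IR_j over the J intervals (expressed as a percentage), that the per-interval percentage bias reported in Table 1 is 100 × Bias_j / IR_j, and that the RMSD is the square root of the interval-averaged MSE; the function names are hypothetical.

```python
import math
from typing import Sequence


def bias(estimate_j: float, true_j: float) -> float:
    """Bias_j: estimated minus true incidence rate for interval j."""
    return estimate_j - true_j


def percentage_bias(estimate_j: float, true_j: float) -> float:
    """Bias_j expressed as a percentage of the true incidence rate."""
    return 100 * (estimate_j - true_j) / true_j


def mse(estimates_by_dataset: Sequence[float], true_j: float) -> float:
    """MSE_j over the D imputed datasets for interval j."""
    return sum((x - true_j) ** 2 for x in estimates_by_dataset) / len(estimates_by_dataset)


def mpe(estimates: Sequence[float], truths: Sequence[float]) -> float:
    """Mean absolute percentage error over the J intervals."""
    return 100 * sum(abs(e - t) / t for e, t in zip(estimates, truths)) / len(truths)


def rmsd(mse_by_interval: Sequence[float]) -> float:
    """Root mean-square deviation: square root of the interval-averaged MSE."""
    return math.sqrt(sum(mse_by_interval) / len(mse_by_interval))


# One estimate 10% too high and one 10% too low give an MPE of 10%.
print(mpe([11.0, 9.0], [10.0, 10.0]))
```

Taking absolute values in the MPE prevents over- and under-estimation in different intervals from cancelling out, which matters here because the mid-point bias changes sign over the observation period.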

Real-world example

To empirically demonstrate the performance of the mid-point and random-point methods, we used data from a population-based HIV surveillance programme based in the northern KwaZulu-Natal province of South Africa. Since 2004, trained field-workers have visited over 10 000 households annually and repeatedly tested 17 400 adults (>15 years of age) for HIV antibodies. We calculated the annual HIV incidence rate for this cohort using the methodology described above.

Results

We observed that the mid-point method did not give accurate incidence rate estimates once the testing rate dropped below 80%. The poor performance of the mid-point method can be clearly seen in Figure 2, which shows the results for a longitudinal survey with an open cohort of size N = 1000. Here, the mid-point imputed incidence rate artefactually increases in the early stages, and then artefactually decreases in the later stages, of the observation period once participants start to miss their scheduled test dates. We report similar mid-point incidence rate results for sample sizes >500 participants, for both open and closed cohorts, and for 10 and 15 scheduled test dates (shown in Supplementary Figures 1 and 2, available as Supplementary Data at IJE online).

Figure 2

Compares the performance of the mid-point method (left column) against the single random-point method (right column) for a longitudinal survey with 5 testing intervals. The solid line is the true incidence rate and the non-solid lines represent the estimated incidence rates for a high (80–100%), moderate (60–79.9%), low (40–59.9%) and poor (30–39.9%) testing rate. We show that the mid-point incidence rate artefactually increases in the early stages, and then decreases in the later stages, of the observation period once the testing rate drops below 80%. Details of the epidemic models are discussed in Section 1.1 of the Supplement.

Table 1 shows the percentage errors for both imputation methods when compared with the truly stable incidence rate presented in Row 1 of Figure 2. For example, in the fifth testing interval, the mid-point estimate is 9.05%, 27.07% and 40.63% below the true incidence rate for a moderate, low and poor testing rate, respectively (see Row 5 of the upper panel in Table 1). Table 2 shows the MPE results for the incidence rate estimates presented in Figure 2.
For example, the mid-point MPE is in the range of 23.28–38.11% for a low and poor testing rate, when compared with a range of 1.60–4.42% for the single random-point method (see Rows 3 and 4 of Table 2). The MPE results for 10 and 15 scheduled test dates are presented in Supplementary Table 1, available as Supplementary Data at IJE online; see also Supplementary Table 2, available as Supplementary Data at IJE online, for the RMSD results.

Table 1

Shows the percentage bias results for the mid-point (MP) and single random-point (SRP) methods

	Testing rate (percentage bias, %)
	High (80–100%)		Moderate (60–79.9%)		Low (40–59.9%)		Poor (30–39.9%)
Interval	MP	SRP	MP	SRP	MP	SRP	MP	SRP
Longitudinal survey
1	–2.95	–0.21	–20.51	–1.02	–36.77	–2.46	–45.07	–2.52
2	0.81	0.38	11.81	1.19	21.32	1.35	26.5	0.95
3	0.29	0.00	9.68	0.98	30.88	2.31	47.93	2.64
4	0.70	0.11	4.53	–0.84	0.37	–1.58	–7.88	–1.42
5	0.84	–0.14	–9.05	–0.49	–27.07	–0.29	–40.63	–1.75
Randomized controlled trial
1	–3.60	–0.96	–21.97	–6.65	–38.99	–10.81	–45.97	–12.55
2	–0.82	–0.23	–3.46	–1.89	–6.80	–3.00	–9.11	–4.31
3	–0.29	0.00	1.46	–0.18	6.09	0.03	8.77	–0.75
4	–0.02	0.14	2.01	0.56	4.77	1.29	5.21	0.90
5	0.33	0.33	1.73	1.73	3.20	3.20	3.11	3.11

The upper panel results correspond with the incidence rates presented in Row 1 of Figure 2. We do not include the remaining results from Figure 2 due to limitations of space. The lower panel results correspond with the CIRRs presented in Figure 3. Overall, the MP method gives a higher percentage bias for lower testing rates when compared with the SRP method.

Table 2

Mean percentage bias results for the mid-point (MP) and single random-point (SRP) methods

	Longitudinal survey						RCT
	Stable incidence rate		Increasing incidence rate		Decreasing incidence rate		Cumulative incidence rate ratio
Testing rate	MP	SRP	MP	SRP	MP	SRP	MP	SRP
High	1.12	0.17	1.21	0.40	1.54	0.32	1.01	0.33
Moderate	11.12	0.90	11.42	0.81	12.31	1.65	6.13	2.20
Low	23.28	1.60	24.13	2.2	26.56	3.57	11.97	3.67
Poor	33.6	1.86	33.12	1.93	38.11	4.42	14.43	4.33

Shows the mean percentage bias results for the mid-point (MP) and single random-point (SRP) methods. Results correspond with the estimates presented in Figures 2 and 3 (for five scheduled test dates). We show that the MP method introduces a greater degree of bias into the incidence rate estimates once participants start to miss their scheduled test dates.


Figure 3

Compares the performance of the mid-point method (left column) against the single random-point method (right column) for a randomized controlled trial with 5 scheduled test dates. The solid line is the true cumulative incidence rate ratio (CIRR) and the non-solid lines are the estimated CIRRs for a high (80–100%), moderate (60–79.9%), low (40–59.9%) and poor (30–39.9%) testing rate. No treatment effect is represented by a CIRR = 1. We show that the mid-point method substantially overestimates the treatment effect at the beginning of the observation period, although deviations from the true CIRR are attenuated at the last scheduled test date. Details of the epidemic models are discussed in Section 1.1 of the Supplement.

Figure 3 shows the CIRRs for a randomized controlled trial in which N = 2000 participants were assigned to either a control or treatment cohort. The mid-point method substantially overestimates the efficacy of the treatment intervention in the early stages of the observation period. For example, at the first of five scheduled test dates, the mid-point CIRR is 45.97% and 38.99% below the true CIRR under a poor and low testing rate, respectively (see Row 1 of the lower panel in Table 1). However, the mid-point estimates converged to the true CIRR at the end of the observation period. Overall, the MPE for the mid-point method is in the range of 1.01–14.43%, compared with a range of 0.33–4.33% for the single random-point method (see Columns 7 and 8 of Table 2).

We show, in Figure 4, the results for the two imputation methods using data from our population-based HIV surveillance programme. The estimates from the mid-point method are consistent with our simulation results: we see an increase and then a decrease in the HIV incidence rate at the beginning and end of the observation period, respectively. By contrast, the random-point method suggests that the HIV incidence rate has been relatively stable over the 2004–15 period.

Figure 4

Compares the HIV incidence rates for the mid-point method (left) and single random-point method (right) using data from a population-based HIV surveillance program (N ∼ 17 400) in the KwaZulu-Natal province of South Africa. The dramatic difference in the estimates is due to a wide censoring interval (on average 3.2 years), which exposes the limitations of the mid-point method: once participants start to miss their scheduled test dates, the mid-point method concentrates the imputed infection events in the middle of the observation period. In this case, we would falsely conclude that the incidence rate rapidly increased at the beginning and then sharply decreased toward the end of the observation period. As our simulation results demonstrate, the single random-point method is far more accurate for incidence rate estimation; it shows that the HIV incidence rate in our study population has been relatively stable over the last 10 years.

Discussion

Our results show that the infection event does not occur at the mid-point of the censored interval once participants start to miss their scheduled test dates. Under these conditions, the mid-point method gives systematically biased incidence rate estimates. Importantly, we found that the instantaneous incidence rate artefactually increased in the early stages, and then artefactually decreased in the later stages, of the observation period. This pattern became more extreme as we increased the probability of missing a scheduled test date: in the later stages of the observation period, the mid-point estimates were approximately 9%, 27% and 41% below the true incidence rate for a moderate (60–79.9%), low (40–59.9%) and poor (30–39.9%) testing rate, respectively. We observed this trend irrespective of whether the true incidence rate was stable, increasing or decreasing, for closed and open cohorts, for a range of sample sizes and for different numbers of scheduled test dates. An important limitation of the mid-point method is that it clusters the imputed infection events in the middle of the observation period. This is because more combinations of left (latest-negative) and right (earliest-positive) test dates give a mid-point in the middle interval of the observation period than in any other testing interval. We provide a simple and intuitive example of this mid-point behaviour in Section 2.1 of the Supplementary Data, available at IJE online. A better approach, based on the Monte Carlo methodology, would be to impute a single random infection date within each participant’s censored interval, obtain an estimate from the resulting dataset, repeat this procedure several times and then take the average of the estimates for each interval.
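The counting argument behind the mid-point clustering can be checked directly by enumeration. A minimal sketch, assuming integer test dates 0, …, J, every (L, R) pair with L < R counted once, and a mid-point that falls exactly on a test date being assigned to the interval that ends there (intervals are (j − 1, j]). It illustrates only the combinatorial claim and is not a reproduction of the example in the Supplementary Data.

```python
import math
from collections import Counter


def midpoint_interval_counts(n_intervals: int) -> Counter:
    """Count, over all test-date pairs 0 <= L < R <= J, the interval j containing (L + R) / 2.

    Interval j covers (j - 1, j], so a mid-point equal to test date j is assigned to interval j.
    """
    counts: Counter = Counter()
    for left in range(n_intervals):
        for right in range(left + 1, n_intervals + 1):
            counts[math.ceil((left + right) / 2)] += 1
    return counts


print(dict(sorted(midpoint_interval_counts(5).items())))  # {1: 2, 2: 4, 3: 5, 4: 3, 5: 1}
```

With five testing intervals, 5 of the 15 possible (L, R) pairs place their mid-point in the middle (third) interval, more than in any other interval; the slight asymmetry between intervals 2 and 4 comes from the boundary convention.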
We show in this paper that the single random-point method makes less restrictive assumptions about the infection date than the mid-point method (even for testing rates as low as 30%). A number of advanced interval censoring methods have been developed within a survival analysis and Cox proportional hazards framework. Some of these methods are available in statistical software programs such as SAS, Stata and R. As far as we can tell, however, these programs do not directly or intuitively estimate an incidence rate for continuous or discrete time periods. We do acknowledge an approach by Hsu et al., who used auxiliary information on participants to identify a set of nearest neighbours and then imputed multiple HIV infection times from a non-parametric distribution based on this neighbourhood. Importantly, their method produced more accurate survival rates and hazard ratios when compared with the single random-point method. However, the authors did not directly extend their approach to estimate the incidence rate over time. Their method could be adapted for such a purpose, although any gain in accuracy would come at the cost of the convenience of the single random-point method. We also comment on the findings of Skar et al., who concluded that mid-point dating is a valid approach for population-based HIV incidence studies with regular testing intervals. Here, they are describing the performance of the mid-point method under the standard interval censoring assumption. But missed test dates are an unavoidable consequence of periodic testing for an infectious disease. The surprising finding of our analysis is that participants need to be tested at least 80% of the time for the mid-point method to produce accurate incidence rate estimates. If a high testing rate cannot be achieved, then we discourage use of the mid-point method for incidence rate estimation.
Indeed, this method would lead us to falsely conclude that the HIV incidence rate in our study area has been declining dramatically over the last 3 years (as shown in Figure 4). In contrast, the random-point method suggests a stable incidence rate over time, a finding confirmed by an external phylodynamic analysis using HIV sequence data from the same incidence cohort. In conclusion, if an ad hoc imputation method is to be considered, then the single random-point method, as described in this paper, is straightforward to implement and produces estimates close to the true incidence rate.

Supplementary Data

Supplementary data are available at IJE online.

Funding

A.V., T.D. and F.T. were supported by the South African Medical Research Council (SA MRC) Flagship grant (MRC-RFA-UFSP-01–2013/UKZN HIVEPI). F.T. was supported by two National Institutes of Health (NIH) grants (R01HD084233 and R01AI124389) as well as a UK Academy of Medical Sciences Newton Advanced Fellowship (NA150161). T.B. was supported by the Alexander von Humboldt Foundation through the endowed Alexander von Humboldt Professorship funded by the German Federal Ministry of Education and Research, as well as by the Wellcome Trust, the European Commission, the Clinton Health Access Initiative and the National Institutes of Health’s Fogarty International Center (D43-TW009775). Funding for the Africa Health Research Institute’s Demographic Surveillance Information System and Population-based HIV Survey was received from the Wellcome Trust. The funders had no role in the design of the study, data analysis, interpretation of results or writing of the manuscript.

Conflict of interest: The authors have no conflicts of interest to declare.

References (53 in total)

1. Probability of HIV-1 transmission per coital act in monogamous, heterosexual, HIV-1-discordant couples in Rakai, Uganda.

Authors: R H Gray; M J Wawer; R Brookmeyer; N K Sewankambo; D Serwadda; F Wabwire-Mangen; T Lutalo; X Li; T vanCott; T C Quinn
Journal: Lancet; Date: 2001-04-14

2. A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies.

Authors: W Pan
Journal: Biometrics; Date: 2001-12

3. Analysis of failure time data with dependent interval censoring.

Authors: Dianne M Finkelstein; William B Goggins; David A Schoenfeld
Journal: Biometrics; Date: 2002-06

4. Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model.

Authors: P M Odell; K M Anderson; R B D'Agostino
Journal: Biometrics; Date: 1992-09

5. A Markov chain Monte Carlo EM algorithm for analyzing interval-censored data under the Cox proportional hazards model.

Authors: W B Goggins; D M Finkelstein; D A Schoenfeld; A M Zaslavsky
Journal: Biometrics; Date: 1998-12

6. Prevention of HIV-1 infection with early antiretroviral therapy.

Authors: Myron S Cohen; Ying Q Chen; Marybeth McCauley; Theresa Gamble; Mina C Hosseinipour; Nagalingeswaran Kumarasamy; James G Hakim; Johnstone Kumwenda; Beatriz Grinsztejn; Jose H S Pilotto; Sheela V Godbole; Sanjay Mehendale; Suwat Chariyalertsak; Breno R Santos; Kenneth H Mayer; Irving F Hoffman; Susan H Eshleman; Estelle Piwowar-Manning; Lei Wang; Joseph Makhema; Lisa A Mills; Guy de Bruyn; Ian Sanne; Joseph Eron; Joel Gallant; Diane Havlir; Susan Swindells; Heather Ribaudo; Vanessa Elharrar; David Burns; Taha E Taha; Karin Nielsen-Saines; David Celentano; Max Essex; Thomas R Fleming
Journal: N Engl J Med; Date: 2011-07-18

7. Interval censoring (review).

Authors: Zhigang Zhang; Jianguo Sun
Journal: Stat Methods Med Res; Date: 2009-08-04

8. Effectiveness of an integrated intimate partner violence and HIV prevention intervention in Rakai, Uganda: analysis of an intervention in an existing cluster randomised cohort.

Authors: Jennifer A Wagman; Ronald H Gray; Jacquelyn C Campbell; Marie Thoma; Anthony Ndyanabo; Joseph Ssekasanvu; Fred Nalugoda; Joseph Kagaayi; Gertrude Nakigozi; David Serwadda; Heena Brahmbhatt
Journal: Lancet Glob Health; Date: 2014-11-28

9. Testing bias in calculating HIV incidence from the Serologic Testing Algorithm for Recent HIV Seroconversion.

Authors: Robert S Remis; Robert W H Palmer
Journal: AIDS; Date: 2009-02-20

10. Use of antiretroviral therapy in households and risk of HIV acquisition in rural KwaZulu-Natal, South Africa, 2004–12: a prospective cohort study.

Authors: Alain Vandormael; Marie-Louise Newell; Till Bärnighausen; Frank Tanser
Journal: Lancet Glob Health; Date: 2014-04
