Graham R Northrup1, Lei Qian2, Katia Bruxvoort2, Florian M Marx3,4, Lilith K Whittles5,6,7, Joseph A Lewnard1,8,9. 1. From the Center for Computational Biology, College of Engineering, University of California, Berkeley, Berkeley, CA. 2. Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA. 3. Department of Paediatrics and Child Health, Faculty of Medicine and Health Sciences, Desmond Tutu Tuberculosis Centre, Stellenbosch University, Cape Town, South Africa. 4. DST-NRF South African Centre of Excellence and Epidemiological Modelling and Analysis, Stellenbosch University, Stellenbosch, South Africa. 5. Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom. 6. MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, United Kingdom. 7. NIHR Health Protection Research Unit in Modelling Methodology, School of Public Health, Imperial College London, London, United Kingdom. 8. Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, CA. 9. Division of Infectious Diseases and Vaccinology, School of Public Health, University of California, Berkeley, Berkeley, CA.
Abstract
Host adaptive immune responses may protect against infection or disease when a pathogen is repeatedly encountered. The hazard ratio of infection or disease, given previous infection, is typically sought to estimate the strength of protective immunity. However, variation in individual exposure or susceptibility to infection may introduce frailty bias, whereby a tendency for infections to recur among individuals with greater risk confounds the causal association between previous infection and susceptibility. We introduce a self-matched "case-only" inference method to control for unmeasured individual heterogeneity, making use of negative-control endpoints not attributable to the pathogen of interest. To control for confounding, this method compares event times for endpoints due to the pathogen of interest and negative-control endpoints during counterfactual risk periods, defined according to individuals' infection history. We derive a standard Mantel-Haenszel (matched) odds ratio conveying the effect of prior infection on time to recurrence. We compare performance of this approach to several proportional hazards modeling frameworks and estimate statistical power of the proposed strategy under various conditions. In an example application, we use the proposed method to reestimate naturally acquired protection against rotavirus gastroenteritis using data from previously published cohort studies. This self-matched negative-control design may present a flexible alternative to existing approaches for analyzing naturally acquired immunity, as well as other exposures affecting the distribution of recurrent event times.
Host adaptive immune responses may protect against infection or disease when a pathogen is repeatedly encountered. The hazard ratio of infection or disease, given previous infection, is typically sought to estimate the strength of protective immunity. However, variation in individual exposure or susceptibility to infection may introduce frailty bias, whereby a tendency for infections to recur among individuals with greater risk confounds the causal association between previous infection and susceptibility. We introduce a self-matched "case-only" inference method to control for unmeasured individual heterogeneity, making use of negative-control endpoints not attributable to the pathogen of interest. To control for confounding, this method compares event times for endpoints due to the pathogen of interest and negative-control endpoints during counterfactual risk periods, defined according to individuals' infection history. We derive a standard Mantel-Haenszel (matched) odds ratio conveying the effect of prior infection on time to recurrence. We compare performance of this approach to several proportional hazards modeling frameworks and estimate statistical power of the proposed strategy under various conditions. In an example application, we use the proposed method to reestimate naturally acquired protection against rotavirus gastroenteritis using data from previously published cohort studies. This self-matched negative-control design may present a flexible alternative to existing approaches for analyzing naturally acquired immunity, as well as other exposures affecting the distribution of recurrent event times.
Host adaptive immune responses often protect against infection or disease when a pathogen is repeatedly encountered. Vaccines aim to exploit this protection by exposing hosts to an attenuated infection or to immunizing subunits of a pathogen. Evidence of protective naturally acquired immunity thus provides a strong rationale for vaccine development.[1] Quantitative estimates of the strength of naturally acquired protection also inform the interpretation of epidemiologic data, for instance providing a baseline against which vaccine performance can be evaluated.[2] These estimates are further sought to parameterize mathematical models of pathogen transmission.[3]Naturally acquired immunity is often estimated via the hazard ratio of infection or disease, comparing counterfactual periods in the presence and absence of prior infection.[4-10] Thus, inference centers on the distribution of recurrent event times. Unmeasured heterogeneity in individuals’ hazards of infection or disease presents a challenge in such analyses, originally termed a problem of “varying liabilities” by Greenwood and Yule[11] and subsequently addressed as “accident-proneness”[12] or “frailty.”[13] The tendency for events to recur among certain individuals must be accounted for in statistical analyses[14]; recurrence of infection or disease among individuals with the greatest susceptibility or exposure to a pathogen, irrespective of previous infection, may bias estimates of naturally acquired protection.[15]This consideration may have relevance to several diseases against which immune responses generate imperfect protection. Tuberculosis presents a notable example, where despite evidence of protective cell-mediated and humoral immunity,[16] several epidemiologic studies have reported higher rates of new-onset infection or disease among persons previously treated successfully for active tuberculosis, as compared to those without history of tuberculosis.[17-20] Similar conflict about the consequences of prior infection has arisen in epidemiologic studies of gonorrhea.[21,22] In recent analyses of a multi-site pediatric cohort study addressing enteric disease, previous infection predicted higher rates of recurrent infection or disease associated with several pathogens, including Shigella spp., Campylobacter spp., and various diarrheagenic Escherichia coli strains.[23] Evidence supporting the feasibility of protective vaccines against many of these pathogens suggests a need to revisit the impacts of naturally acquired immunity.[24-26] Similar causal inference challenges arise in the relationship between chronic inflammation and repeated infection in conditions such as cystic fibrosis,[27,28] otitis media,[29,30] and environmental enteric dysfunction.[31]Formalizing unmeasured heterogeneity as confounding suggests potential strategies to identify naturally acquired protection. Terming Y1 and Y2 as primary and recurrent infection or disease outcomes, respectively, and U as the constellation of unmeasured individual factors influencing exposure or susceptibility to a pathogen of interest, a directed acyclic graph (Figure 1) reveals that Y1←U→Y2 may introduce bias into the estimation of Y1→Y2, the effect of primary infection on recurrence. Conditioning on unmeasured individual factors by comparing observations during counterfactual risk periods from the same individual (Y1←U→Y2) permits unbiased inference of the effect of Y1. This intuition provides the basis for numerous self-matched designs (e.g., case-crossover, case-time control, and self-controlled case series), which have garnered increasing interest in epidemiology.[32]
FIGURE 1.
Directed acyclic graph addressing unmeasured confounding. We illustrate a causal framework wherein the effect of previous infection on time to subsequent infection (Y1→Y2) is of interest for analysis. One or more unmeasured confounding factors (U) creates a backdoor path (A) which can be blocked by conditioning on U (B).
Directed acyclic graph addressing unmeasured confounding. We illustrate a causal framework wherein the effect of previous infection on time to subsequent infection (Y1→Y2) is of interest for analysis. One or more unmeasured confounding factors (U) creates a backdoor path (A) which can be blocked by conditioning on U (B).In this article, we present an adaptation of these methods harnessing data from “negative control” events to permit causal inference in the presence of heterogeneous individual frailty. We derive a matched (Mantel-Haenszel) odds ratio (OR)[33,34] estimator for the hazard ratio of infection or disease, given previous infection. We conduct simulations to compare this approach against alternative methods based on proportional hazards models common in the analysis of longitudinal data and to assess statistical power under varying conditions. Last, we use the proposed method to reassess protective effects of rotavirus infection in data from previously published birth-cohort studies.[4,5]
APPROACH
Self-matched Negative-Control Design
Consider an outcome such as acquisition of a pathogen of interest, or onset of disease due to this pathogen (Table 1). The proposed design only includes individuals who experience recurrent episodes of this outcome of interest (case-only). Define Y and X as variables indicating outcome and exposure status for individual i at each observation, with indicating infection or disease with the pathogen of interest and indicating a negative-control outcome. Consideration of negative-control observations is of interest for studies involving event-based data capture (e.g., episodes of acute illness) and provides a basis for competing risks estimation frameworks as we detail below. Last, let indicate an individual has previously experienced infection with the pathogen of interest, and let indicate the individual has no history of infection with the pathogen of interest.
TABLE 1.
Parameters and Definitions
Parameter
Definition
Rate at which individual i experiences a prespecified clinical endpoint due to the pathogen of interest (“outcome of interest”), in the absence of naturally acquired immunity
Rate at which individual i experiences a negative-control outcome
Hazard ratio for the outcome of interest, owing to naturally acquired protection
Hazard ratio for the outcome of interest during the period after primary infection, relative to the period before primary infection, for individual i, due to all (confounding) factors other than naturally acquired protection
Hazard ratio for the negative control outcome during the period after primary infection, relative to the period before primary infection, for individual i
Parameters and DefinitionsDefine A to D as random variables indicating event times for observations of and , conditioned on X (Table 2). A and B are the time to first occurrence of the outcome of interest and the negative-control outcome, respectively, for an individual with no history of infection (X = 0). C and D are the time to the first occurrence of the outcome of interest and the negative-control outcome, respectively, following infection with the pathogen of interest (such that ; Figure 2). Here, we note that B and D are censored if and , respectively.
TABLE 2.
Contingency Table for Event Time Distributions, Given Prior Infection
Exposure Status
Outcome Status
Outcome of Interest
Negative Control Outcome
Previously uninfected
Previously infected
FIGURE 2.
Schematic presentation of potential outcomes. We illustrate potential outcomes in terms of the sequence of events A-D for a given individual. In cases 1 and 2, we observe (truncating observation of a negative-control event although ). In cases 3 and 4, we observe , with the negative-control outcome preceding infection with the pathogen of interest. We illustrate the corresponding potential outcomes for C and D, when , in the right-hand side of the figure.
Contingency Table for Event Time Distributions, Given Prior InfectionSchematic presentation of potential outcomes. We illustrate potential outcomes in terms of the sequence of events A-D for a given individual. In cases 1 and 2, we observe (truncating observation of a negative-control event although ). In cases 3 and 4, we observe , with the negative-control outcome preceding infection with the pathogen of interest. We illustrate the corresponding potential outcomes for C and D, when , in the right-hand side of the figure.
Event Time Distributions
Define the hazard for the “positive” outcomes owing to infection with the pathogen of interest (P) for individual i as , and define as the hazard ratio of this outcome given previous infection (Table 1). Assuming infectious exposures occur independently and continuously during the follow-up period, conditioning on each individual’s unique hazard, event times are exponentially distributed at the individual level. We note this circumstance gives rise to a nonexponential event time distribution at the population level, with variance augmented (relative to the pure exponential case) by heterogeneity in individual frailty. The probability of the outcome by time t, for a previously uninfected individual, is , while the probability of the outcome by time t, had the same individual counterfactually been previously infected, is . We discuss alternative event time distributions in a later section.Consider that data are collected from each individual for endpoints besides the primary outcome of interest. Among these, suppose a negative-control outcome (N) occurs at a rate for individual i. This rate should be unaffected by individuals’ prior exposure to the pathogen of interest, according to the definition of a negative control in this context.[35] Under the same assumptions, the probability of experiencing the negative-control outcome by time t, for individual i, is .
Estimating the Effect of Naturally Acquired Immunity
For an individual with no history of previous infection, consider the outcome of interest and the negative-control outcome to be competing risks. The observations E to H may be defined to indicate the relative ordering of event times A to D (Table 3), for each individual i. Specifically, take and to indicate the outcome of interest precedes the negative-control outcome during the periods with and , respectively. Define and as complements to E and G. By the memoryless property of the exponential distribution, we may start the clock for event times A and B at cohort entry or at any alternative milestone of interest (e.g., birth or exposure onset); for C and D, we consider time from the most recent infection, although any subsequent milestone is likewise appropriate.
TABLE 3.
Contingency Table for Competing Risks, Given Prior Infection
Exposure Status
Outcome Status
Outcome of Interest Precedes Negative Control Outcome
Negative Control Outcome Precedes Outcome of Interest
Previously uninfected
Previously infected
Contingency Table for Competing Risks, Given Prior InfectionAs formulated in Appendices A and B, we haveConsider the Mantel-Haenszel odds ratio[33,34] constructed from the competing risks of and , given , matching observations from each individual i:such thatUsing the above derivations of through ,Thus, the ratio of the matched odds for the outcome of interest to precede a negative-control outcome, given individuals’ unique hazards and history of prior infection, provides an unbiased estimate of the effect of previous infection on time to recurrence of the outcome of interest.
Further Considerations
Time-varying Confounding
At a design level, self-matched inference reduces or eliminates the potential for bias due to time-invariant factors that individually influence risk.[36] However, complications arise when individuals’ risk of experiencing these endpoints differs over time.Consider continuously varying hazards and . When variation over time is identical for both disease of interest and negative-control events (e.g., due to shared seasonal patterns, or exposure to interventions affecting both conditions equally), is unaffected. Formally, we express this scenario asDividing the observation period into small windows of length , where the hazard is approximately constant (i.e., the probability of a negative-control event happening is ), we may consider the windows to be multinomial trials, whereWe illustrate bias that may occur when hazards do not vary synchronously in eFigure 1 (http://links.lww.com/EDE/B749), identifying the greatest bias under conditions where seasonal patterns for the two conditions are inverted in relation to one another.Broadly, these findings support the selection of negative-control outcomes that resemble the outcome of interest in their seasonal pattern or association with other time-varying confounders, such as individuals’ age, health status, and sociodemographic exposures. Where time-varying confounders differ in their association with the outcome of interest and the negative-control outcome, covariate adjustment via. the use of conditional logistic regression models may present a strategy to mitigate bias.
Event Time Distributions
We may also relax the assumption that event times are exponentially distributed conditional on individual hazards and exposures, as long as this equivalence is key to canceling out the individual effects in the exponential case. For gamma-distributed event times, is the regularized incomplete beta function while for Weibull-distributed event times it is , where k is the shared shape parameter of the distributions. We illustrate potential bias under these alternative parameterizations in eFigures 2 and 3 (http://links.lww.com/EDE/B749).
COMPARISON TO COHORT DESIGN USING PROPORTIONAL HAZARDS ANALYSIS
Simulation Study
We conducted a simulation study across various underlying distributions of and to test for bias of point estimates under the proposed approach and under alternative methods often used in the analysis of cohort study data (without consideration of negative controls). As comparisons, we considered several proportional hazards models, which could be applied to time-to-event data for recurrent observations of the outcome of interest. We considered four approaches to control for differences in hazards among individuals:“Naive” proportional hazards model without inclusion of additional terms to account for differences in event times among individuals. We define the resulting hazard ratioProportional hazards model accounting for variation in individual frailty via. “random effects.” Fitting this model yields for the effect of previous infection, as well as representing the estimated variance in (log) individual-specific event rates, assumed to represent independent draws from a Normal distribution with mean 0.[37,38]Proportional hazards model including Gamma-distributed frailty terms.[13] Fitting this model yields for the effect of previous infection, along with the parameters of the underlying Gamma distribution describing individual-specific frailties.Proportional hazards model with “fixed effects” for individual subjects. Fitting this model yields for the effect of previous infection and estimates subject-specific rates of infection (via. individual-specific intercepts), which have no preassumed distribution.We defined for the proposed analysis strategy of a self-matched, negative-control design and considered various distributions for :Truncated Normal distribution (with a prespecified lower bound at a = 0);Truncated Cauchy distribution (with a prespecified lower bound at a = 0);Uniform distribution;Gamma distribution;Mixtures of Gamma distributions.We considered multiple parameterizations of each of these distributions (Table 4), holding the mean rate (or location parameter of the Cauchy distribution) constant at one infection per year across all simulations to determine effects of inter-individual heterogeneity on estimates of . We illustrate the distributions in Figure 3. Considering cohorts of 500 individuals, we drew values at random and sampled exponentially distributed event times of first and second infections for each individual, truncating observations at five years. We repeated simulations 500 times for each , drawing values independently for each simulation. We used the simulated datasets to estimate and taking the average of estimates obtained across all 500 iterations to obtain a single point estimate for each parameterization.
TABLE 4.
Event Rate Distributions Applied to Simulation Study
Distribution
Parameters
Parameterizationsa
I
II
III
IV
V
Truncated Normal
Mean
Variance
Lower bound a
Upper bound b
Truncated Cauchy
Location
Scale
Lower bound a
Upper bound b
Uniform
Lower bound a
Upper bound b
Gamma
Shape k
Scale
Gamma mixture (i)
Shapes ,
Scale ,
Weightb
Gamma mixture (ii)
Shapes ,
Scale ,
Weightb
aParameterizations are listed in order of increasing variance from I to V.
bThe weight parameter of the Gamma mixture distribution indicates the proportion of individuals whose rates are parameterized according to , ; the proportion with rates parameterized according to , is .
FIGURE 3.
Simulated distributions and hazard ratio estimates under “naive” inference approaches and under the proposed approach of self-matched inference with negative controls. Panels are organized to present, in each row, the assumed distribution (column 1), the estimate based on a Cox proportional hazards model without any correction for inter-individual heterogeneity (column 2), proportional hazards models employing various frailty frameworks (columns 3–5), and the estimate based on the proposed approach (column 6). One-to-one lines plotted in gray in columns 2–6 indicate where estimates would recover the true value, that is, . Horizontal gray lines plotted at indicate where estimates exceed 1, indicating directionally misspecified estimates of the causal effect of interest. Values are plotted on a red-to-blue color ramp corresponding to the parameterizations I-V, respectively, in order of least (I; red) to greatest (V; blue) variance as detailed in Table 4. (A) Truncated normal distribution; (B) truncated Cauchy distribution; (C) uniform distribution; (D) Gamma distribution; (E) mixture of Gamma distributions (i) with means at 0.125 and 1.875; and (F) mixture of gamma distributions (ii) with means at 0.5 and 1.5.
Event Rate Distributions Applied to Simulation StudyaParameterizations are listed in order of increasing variance from I to V.bThe weight parameter of the Gamma mixture distribution indicates the proportion of individuals whose rates are parameterized according to , ; the proportion with rates parameterized according to , is .Simulated distributions and hazard ratio estimates under “naive” inference approaches and under the proposed approach of self-matched inference with negative controls. Panels are organized to present, in each row, the assumed distribution (column 1), the estimate based on a Cox proportional hazards model without any correction for inter-individual heterogeneity (column 2), proportional hazards models employing various frailty frameworks (columns 3–5), and the estimate based on the proposed approach (column 6). One-to-one lines plotted in gray in columns 2–6 indicate where estimates would recover the true value, that is, . Horizontal gray lines plotted at indicate where estimates exceed 1, indicating directionally misspecified estimates of the causal effect of interest. Values are plotted on a red-to-blue color ramp corresponding to the parameterizations I-V, respectively, in order of least (I; red) to greatest (V; blue) variance as detailed in Table 4. (A) Truncated normal distribution; (B) truncated Cauchy distribution; (C) uniform distribution; (D) Gamma distribution; (E) mixture of Gamma distributions (i) with means at 0.125 and 1.875; and (F) mixture of gamma distributions (ii) with means at 0.5 and 1.5.To compute we drew hazards () and event times for negative-control observations for each subject, assuming event times were exponentially distributed with respect to the sampled, individual-specific rate parameters. To standardize comparisons of under differing distributions of , we defined
under each simulation.To investigate how the different modeling frameworks performed in capturing the distribution of individual-specific hazards, we saved estimates of individual-specific fixed effects, random effects, and frailties alongside estimates of . We fitted a single density kernel to the distribution of individual-specific estimates across 10 simulated cohorts for each true value of and underlying distribution of .
RESULTS
We plot distributions and estimates under each approach in Figure 3. The naive hazards ratio tended to overestimate of the causal effect , leading to under-estimation of the degree of protection (). Bias was minimized as approached zero, consistent with a scenario of strong protective immunity. Values of often exceeded 1 in scenarios where ; in practice, such an estimate would lead to inference that prior infection increases susceptibility to infection or disease attributable the pathogen of interest, when in fact prior infection is protective. For all distributions considered, bias in was greatest under parameterizations yielding the highest between-individual variance in .Alternative methods performed variably under the differing conditions (Figure 3). Lower degrees of bias were evident in as compared to estimates generated under the other methods assessed. Gamma frailty models and random effects models tended to yield less-biased estimates of than . However, the same direction of bias (resulting in under-estimation of the reduction in susceptibility, or ) was evident with all three of these approaches. Bias was worst when values were drawn from Gamma or Gamma mixture distributions and tended to increase under distributions with greater variance in , or greater irregularity in the case of Gamma mixture distributions. In contrast, fixed-effects models estimating multipliers on hazards for each individual tended to under-estimate under most distributions of , although both and were apparent in simulations using the truncated Cauchy distribution for . For the truncated Normal distribution, bias in decreased with greater variance in , whereas for the Uniform, Gamma, and Gamma mixture distributions, bias increased with greater variance in .Biased estimation of occurred in connection with a failure to accurately recover the underlying individual-specific frailty distributions. For each modeling approach, the extent of this misspecification in individual frailties varied over values of and distributions of (eFigures 4 to 6; http://links.lww.com/EDE/B749).
SAMPLE SIZE CONSIDERATIONS
To inform applications of the proposed method, we next assessed statistical power under differing conditions. A test statistic () has previously been identified for under the null hypothesis of no difference in risk given exposure.[39] For the contingency structure (Table 3) formulated from the terms E to H, this statistic can be written generally aswhich can then be simplified according to , , and . Thus,which is expected to follow a distribution with one degree of freedom under the null hypothesis. We calculated values of obtained for cohorts of varying sizes under differing parameterizations of , , and . For values of , we sampled individual event times A to D for a population of 100,000 individuals whom we subsequently partitioned (without replacement) into 2,000 hypothetical study cohorts each of sizes from N=25 to N=1,000. For these analyses, we considered values drawn from truncated Normal, Gamma, and Gamma mixture distributions, under the parameterizations of each of these distributions with greatest and least variance listed in Table 4. We determined statistical power via. the proportion of simulated cohorts for which the upper bound of a 95% confidence interval around would be expected to correctly exclude the null value, that is,To assess how correlation between and could affect the statistical power of estimates, we conducted simulations under two sets of assumptions. Under the first, we considered (equal to the expected value of under all parameterizations), so that..; under the second, we defined , under the assumption that individuals with greater risk of the outcome of interest would also experience higher incidence of the negative-control condition. These conditions provided upper and lower bounds on power, corresponding to assumptions of no correlation and perfect correlation between and , respectively.We present results of the power analyses in Figure 4. Analyses with as few as 50 subjects had roughly 80% power or greater to estimate (corresponding to 90% protection) under all conditions explored; analyses with 500 subjects had 80% power or greater to estimate (corresponding to 50% protection or greater) under all conditions. No scenarios revealed 80% or greater power for estimation of (corresponding to less than 20% protection), even with 1,000 subjects; statistical power for estimation of was 10% or lower under nearly all conditions explored. This limitation makes our proposed method potentially inefficient for scenarios where immunity is very weak; generally, the use of stochastic negative-control observations for inference diminishes statistical power in comparison to approaches that do not require such data.
FIGURE 4.
Statistical power for simulated analyses using the proposed approach of self-matched inference via. negative controls. Each panel presents the statistical power for rejecting the null hypothesis with two-sided P < 0.05 under varying conditions. Lines plotted in red to blue correspond to decreasing values of : 0.9 (red), 0.8, 0.7, …, 0.1 (blue), corresponding to increasing protection from 10% to 90%. Plots are presented in groups of four panels, each corresponding to analyses with values drawn from the following distributions: (A) Truncated Normal distribution; (B) Gamma distribution; (C) Mixture of Gamma distributions with means at 0.125 and 1.875 (as detailed in Table 4). Panels in the top row (A.1, A.2, B.1, B.2, C.1, C.2) represent analyses in which no correlation is assumed between rates of the outcome of interest and negative control outcome (). Panels in the bottom row (A.3, A.4, B.3, B.4, C.3, C.4) represent analyses in which the correlation between rates of the outcome of interest and the negative control outcome are is maximized (. Within each grouping, panels on the left-hand side (A.1, A.3, B.1, B.3, C.1, C.3) correspond to distributions with the least variance in individual rates of the outcome of interest (; i.e., parameterization I in Table 4). Panels on the right-hand side within each grouping (A.2, A.4, B.2, B.4, C.2, C.4) correspond to distributions with the greatest variance in individual rates of the outcome of interest (; i.e., parameterization V in Table 4).
Statistical power for simulated analyses using the proposed approach of self-matched inference via. negative controls. Each panel presents the statistical power for rejecting the null hypothesis with two-sided P < 0.05 under varying conditions. Lines plotted in red to blue correspond to decreasing values of : 0.9 (red), 0.8, 0.7, …, 0.1 (blue), corresponding to increasing protection from 10% to 90%. Plots are presented in groups of four panels, each corresponding to analyses with values drawn from the following distributions: (A) Truncated Normal distribution; (B) Gamma distribution; (C) Mixture of Gamma distributions with means at 0.125 and 1.875 (as detailed in Table 4). Panels in the top row (A.1, A.2, B.1, B.2, C.1, C.2) represent analyses in which no correlation is assumed between rates of the outcome of interest and negative control outcome (). Panels in the bottom row (A.3, A.4, B.3, B.4, C.3, C.4) represent analyses in which the correlation between rates of the outcome of interest and the negative control outcome are is maximized (. Within each grouping, panels on the left-hand side (A.1, A.3, B.1, B.3, C.1, C.3) correspond to distributions with the least variance in individual rates of the outcome of interest (; i.e., parameterization I in Table 4). Panels on the right-hand side within each grouping (A.2, A.4, B.2, B.4, C.2, C.4) correspond to distributions with the greatest variance in individual rates of the outcome of interest (; i.e., parameterization V in Table 4).For simulations with , statistical power was weaker under parameterizations resulting in greater variance in . In contrast, for simulations with differences in statistical power were less strongly apparent with increasing variance in . Taken together, these findings suggest statistical power is maximized when negative-control endpoints are chosen which tend to occur more commonly among individuals who are at greatest risk of the outcome of interest.
APPLICATION TO ROTAVIRUS BIRTH-COHORT DATA
Last, we applied the proposed method to real-world data collected in two birth-cohort studies of rotavirus infection and disease among 200 children in Mexico City, Mexico, and 373 children in Vellore, India. These datasets have been described extensively in primary study publications[4,5] and subsequent reanalyses[15,40]; we present details in the Supplemental Digital Content (http://links.lww.com/EDE/B749).Initial analyses of the datasets gave differing conclusions about the strength of protection against rotavirus gastroenteritis (RVGE). Based on Cox proportional hazards models that did not account for variation in individual frailty, children in Mexico City were estimated to have experienced 77% (95% confidence interval = 60, 88), 83% (64–92), and 92% (44–99) lower rates of RVGE following one, two, and three previous infections, respectively, as compared to zero infections.[4] In contrast, children in Vellore, where the rate of rotavirus acquisition was higher, were estimated to have experienced 43% (24–56%), 71% (59–80%), and 81% (69–88%) lower rates of RVGE after one, two, and three previous infections, as compared to zero infections, based on a parametric (exponential) survival model allowing Gamma frailty terms.[5] Subsequent analyses of the datasets revealed substantial variation in rates of rotavirus infection and risk of RVGE among individual children, as well as a potential for confounding due to declining risk of RVGE when infections were acquired at older ages, irrespective of previous infection.[40] In contrast, model-based analyses accounting for the independent effects of age and previous infection on children’s susceptibility to RVGE estimated that children experienced 33% (23–41%), 50% (42–57%), and 64% (55–70%) lower rates of RVGE after one, two, and three previous infections, respectively, as compared to zero infections.[15]We used the proposed self-matched negative-control design to reestimate naturally acquired protection against RVGE in the cohort datasets. Here, RVGE episodes (acute, new-onset diarrhea with rotavirus detected in the stool) were the outcome of interest and acute, new-onset diarrhea episodes without rotavirus detection were the negative controls. Our analyses did not adjust for age (owing to the limited number of children with 2 or 3 infections observed) or seasonality (due to year-round rotavirus transmission in these high-incidence settings). We compared the times of RVGE and rotavirus-negative diarrhea episodes from each child beginning from birth and thereafter following detection of the first, second, and third rotavirus infection, generating confidence intervals via. resampling of individual children. This yielded estimates of 27% (–1–48%), 50% (13–73%), and 48% (0–77%) lower rates of RVGE following one, two, and three previous infections, as compared to zero infections (Figure 5). Notwithstanding lower statistical power for the proposed method, these estimates are in agreement with previous findings[15] suggesting lower strength of naturally acquired protection than what was estimated in initial analyses of the studies.[4,5]
FIGURE 5.
Estimated protection against rotavirus gastroenteritis associated with previous infection. We plot point estimates and 95% confidence intervals (lines) for estimates of the hazard ratio of rotavirus gastroenteritis associated with having previously experienced one, two, and three previous infections, versus zero previous infections, estimated via. reanalysis of the Mexico City and Vellore rotavirus birth-cohort studies.[4,5] Analyses include rotavirus-negative diarrhea occurrences as a negative control endpoint.
Estimated protection against rotavirus gastroenteritis associated with previous infection. We plot point estimates and 95% confidence intervals (lines) for estimates of the hazard ratio of rotavirus gastroenteritis associated with having previously experienced one, two, and three previous infections, versus zero previous infections, estimated via. reanalysis of the Mexico City and Vellore rotavirus birth-cohort studies.[4,5] Analyses include rotavirus-negative diarrhea occurrences as a negative control endpoint.
DISCUSSION
We propose a novel self-matched negative-control method for estimating naturally acquired protection against recurrent infection or disease endpoints associated with a pathogen of interest. Analytically and via. simulation, we show this method recovers unbiased estimates under a range of conditions, including when individual hazards are drawn from highly irregular or skewed distributions. We find such scenarios may lead to bias under proportional hazards models with commonly used frailty estimation frameworks. Desirably, the proposed approach requires no parametric assumptions about individual-specific frailty. While our approach assumes exponentially distributed event times, this can be expected to hold provided infectious exposures occur continuously and independently, according to the individual-specific hazards. Beyond infectious disease natural history studies, this approach may have value for assessing the effects of other exposures on recurrent event times.Our findings provide several practical insights for real-world longitudinal cohort studies. Collecting data on multiple endpoints affords the opportunity to leverage negative-control observations to support causal inference. For studies applying the proposed approach, negative-control endpoints affected by the same risk factors or exposures as the outcome of interest are desirable both to reduce potential risks of confounding due to time-varying factors and to maximize statistical power based on the correlation between event rates for the outcome of interest and the negative-control outcome. “Test-negative” control conditions which resemble the outcome of interest, but are not attributable to the same pathogen,[41,42] may provide a compelling choice, particularly if their occurrence is predicted by similar factors. For instance, shared risk factors are well-documented for rotavirus-positive and rotavirus-negative diarrhea.[15,31] Considering respiratory illness, multiple etiologic viruses may share similar seasonal transmission patterns,[43] routes of transmission via. high-risk contact,[44] and predictors of severe illness.[45] For sexually transmitted infections, particular risk behaviors[46] differing among individuals or over time could alter risk of any infection, rather than infection with the pathogen of interest alone.[25] In the context of real-world cohort studies, test-negative control conditions that are clinically similar to the outcome of interest would likely result in a study visit or other recorded interaction with similar probability. This further supports consideration of inference methods making use of test-positive and test-negative occurrences of a particular clinical syndrome.In summary, self-matched inference via. negative controls may provide a flexible strategy to circumvent bias introduced by variation in individual frailty for analyses of naturally acquired immunity. Applications to other exposures affecting the distribution of recurrent event times merit consideration, given the possible limitations we identify in existing analysis frameworks.
Authors: Helen Petousis-Harris; Janine Paynter; Jane Morgan; Peter Saxton; Barbara McArdle; Felicity Goodyear-Smith; Steven Black Journal: Lancet Date: 2017-07-10 Impact factor: 79.321
Authors: F A Plummer; J N Simonsen; H Chubb; L Slaney; J Kimata; M Bosire; J O Ndinya-Achola; E N Ngugi Journal: J Clin Invest Date: 1989-05 Impact factor: 14.808
Authors: Leah C Katzelnick; Lionel Gresh; M Elizabeth Halloran; Juan Carlos Mercado; Guillermina Kuan; Aubree Gordon; Angel Balmaseda; Eva Harris Journal: Science Date: 2017-11-02 Impact factor: 47.728
Authors: Joseph A Lewnard; Benjamin A Lopman; Umesh D Parashar; Aisleen Bennett; Naor Bar-Zeev; Nigel A Cunliffe; Prasanna Samuel; M Lourdes Guerrero; Guillermo Ruiz-Palacios; Gagandeep Kang; Virginia E Pitzer Journal: PLoS Comput Biol Date: 2019-07-26 Impact factor: 4.475
Authors: Joël Mossong; Niel Hens; Mark Jit; Philippe Beutels; Kari Auranen; Rafael Mikolajczyk; Marco Massari; Stefania Salmaso; Gianpaolo Scalia Tomba; Jacco Wallinga; Janneke Heijne; Malgorzata Sadkowska-Todys; Magdalena Rosinska; W John Edmunds Journal: PLoS Med Date: 2008-03-25 Impact factor: 11.069