
A frequentist approach to estimating the force of infection for a respiratory disease using repeated measurement data from a birth cohort.

H Mwambi, S Ramroop, LJ White, EA Okiro, DJ Nokes, Z Shkedy, G Molenberghs.

Abstract

This article develops a probability-based model, using a direct likelihood formulation and generalised linear modelling (GLM) approaches, for estimating important disease parameters from longitudinal or repeated measurement data. The current application is to infection with respiratory syncytial virus. The parameters of interest are the force of infection and the recovery rate, or per capita loss of infection. However, because of limitations arising from the study design, and hence from the data it generated, only the force of infection is estimable. The article also addresses time-varying disease parameters by fitting piecewise constant parameters over time via the GLM approach. The model formulation is based on that published in White LJ, Buttery J, Cooper B, Nokes DJ and Medley GF. Rotavirus within day care centres in Oxfordshire, UK: characterization of partial immunity. Journal of the Royal Society Interface 2008; 5: 1481–1490, with an application to rotavirus transmission and immunity.


Year: 2011; PMID: 22028340; PMCID: PMC3704207; DOI: 10.1177/0962280210385749

Source DB: PubMed; Journal: Stat Methods Med Res; ISSN: 0962-2802; Impact factor: 3.021

1 Introduction

Respiratory syncytial virus (RSV) infection, which manifests primarily as bronchiolitis and/or pneumonia, is the leading cause of viral lower respiratory tract (LRT) infection in infants and young children. The clinical entity of bronchiolitis was described at least 100 years ago. In 1956, RSV, the causative agent of most epidemic bronchiolitis, was first isolated by Morris et al.[1] from chimpanzees with upper respiratory tract (URT) infections. Subsequently, Collins et al.[2] associated this agent with bronchiolitis and LRT infection in infants. Since then, multiple epidemiological studies have confirmed the role of this virus as a leading cause of LRT infection in infants and young children. Cane and Pringle[3] state that human RSV causes LRT disease in about 40% of primary cases and is responsible for the hospitalisation of 0.1–2% of infants under 1 year of age annually. Peak incidence is observed at age 2–8 months. Overall, 3.5–4 million children younger than 4 years acquire an RSV infection, and in the United States alone more than 100 000 children are hospitalised annually because of this infection. This translates to 9–14 per 1000 children younger than 1 year being hospitalised annually because of this condition. The virus does not induce solid immunity, re-infection is the norm (though progressively less severe), and as yet no vaccine appears to be on the horizon. Virtually all children have had at least one RSV infection by their third birthday.[4] Given the prevalence and potential severity of this condition, it is not surprising that the World Health Organization has targeted RSV for vaccine development. The frequency of RSV disease can be characterised as follows:
Internationally: RSV infection is prevalent worldwide, with similar clinical manifestations and a similarly young age at RSV LRT infection.
Race: all races appear susceptible to RSV, with similar disease patterns.
Sex: although boys and girls are affected equally by milder RSV disease, the frequency of hospitalisation for RSV disease is higher in males, with a male:female ratio of approximately 2:1.
Age: severe RSV disease is primarily a disease of young infants and children, with peak occurrence at age 2–8 months. Reinfection with RSV occurs throughout life, with disease becoming increasingly limited to the URT.

In the field of infectious disease modelling, one area that is now attracting renewed attention is the statistical estimation of key parameters associated with disease processes. These estimates are based on observed data generated by the underlying disease process. In this article, we consider the estimation of the force of infection; it was not possible to estimate the per capita loss of infection, or recovery rate, of the disease process. The disease of interest is a respiratory infection, caused by RSV, in children mainly under 1 year of age. Mathematical models of the disease are not new: Greenhalgh et al.[5] used both theoretical and deterministic models to study RSV dynamics, and other relevant previous modelling work on RSV includes Weber et al.[6] and White et al.[7,8] In this article, we address the problem of combining the dynamics of the disease with the estimation of model parameters from observed data. The data are repeated measurements Y_ij recording whether child i is infected (1) or not (0) at observation occasion j, that is, at time point t_ij. We therefore face repeated non-normal data, which call for statistical methods that account for the correlation of responses within the same subject or cluster. In the current study, we use direct likelihood estimation, and we also discuss and implement the generalised linear modelling (GLM) approach[9] for estimating a time-varying, stepwise force of infection and the per capita loss of infection (recovery rate). White et al.[10] solved a similar problem using a hierarchical Bayesian formulation to study rotavirus transmission and immunity; in that article, the analysis of repeated measurement data was implemented via Markov chain Monte Carlo using the WinBUGS software. The method of White et al.[10] could therefore also be applied to the current data set as an alternative.

The Kilifi RSV study and the available data are described in Section 2. In Section 3, the basic dynamics of RSV are discussed in relation to the Susceptible-Infected-Susceptible (SIS), Susceptible-Infected-Recovered (SIR) and SIRS models. In Section 4, we present how the model parameters were estimated. One complicating factor is that the disease parameters of the underlying process vary over time, so the estimation must allow for time dependence; a piecewise modelling approach was used to address this. Section 5 presents conclusions and suggestions for possible future extensions.

2 The Kilifi RSV study

The Kilifi RSV study yields a repeated measurement (longitudinal) data set recording the presence or absence of RSV in children in coastal Kenya. A longitudinal study is one in which a response is measured repeatedly on the same observational or experimental unit(s). The Kilifi data set is part of a study carried out by the Kenya Medical Research Institute in collaboration with the Wellcome Trust in Kilifi, Kenya. The data used in this analysis comprise a single birth cohort with observations primarily over the first year of life[11] and form part of a larger cohort study.[12] The data set simultaneously exhibits several forms of so-called coarsening,[13,14] in the sense that the assumed data structure is richer than the data actually observed. First, the underlying disease process is not directly observable, but only through its observed outcomes. Second, the observations are not equally spaced within and between individuals and, importantly, the number of observations differs between individuals. This is less refined, or coarser, than the hypothetical observation of a continuous-time process. Third, what happens between two observed events is unknown: additional events could have occurred between any such pair of time points. Fourth, children may drop out before the scheduled end of the study. This last form of coarsening is the more conventional missingness or dropout. A priori, these coarsening processes, in particular dropout, may depend on (1) observed outcomes, (2) covariates and (3) unobserved (and unobservable) outcomes.
If option (3) holds, a so-called missing not at random (MNAR) mechanism is operating[15,16] and, arguably, a wholly satisfactory analysis is beyond reach; the most sensible route forward is then a sensitivity analysis, in which a variety of complex models accommodating MNAR are considered. In this article, we assume missingness at random (MAR): missingness, or more generally coarsening, may depend on covariates and observed outcomes but, conditional on these, not further on unobserved outcomes. Many consider this a plausible assumption (for a review, see Molenberghs and Kenward[16]) and, importantly, in a likelihood-based inferential framework MAR is sufficient (under mild regularity conditions) for the analyst to ignore the missing data mechanism, that is, there is no need to model it explicitly. In other words, one can fit a model, such as a generalised linear model, by maximum likelihood, provided all available data are analysed, from both completely and incompletely observed subjects.[17,18] Our analyses are therefore valid under the assumption of MAR. This way of treating dropout is both broadly valid and, from a practical standpoint, requires no additional programming or other technical work. It might be of interest to conduct sensitivity analyses relative to the MAR assumption, but this goes beyond the current research. The model developed here to represent these data will aid in understanding the process and in designing more complex models that capture the kinds of incompleteness mentioned above. Such models allow proper inference about the disease process and can eventually aid in the design of intervention strategies.
In the Kilifi study, 368 children were recruited; however, data were measured and recorded for only 334 of them. In total, 9374 responses were measured, and the number of times each child was measured varies from child to child. For example, child number 344 was measured on 12 occasions at unequally spaced time intervals, while child number 368 was measured on 20 occasions at equally spaced time intervals. Let Y_ij denote the outcome at observation time t_ij for individual i. Then, assuming a first-order Markov model,[19] the observed numbers of transitions between the two states ‘infected’ and ‘uninfected’ can be arranged as in Table 1. The table pools all children over the entire study period and gives the number of transitions into each state given the immediately preceding state; that is, the transitions are conditional. Two remarks about these state transitions are important for the current data set. First, transitions from the uninfected to the infected state refer to symptomatic infections only, since samples were not collected from children without at least mild symptoms or a cold; some infections will therefore have been missed. Second, following a confirmed RSV infection, the child was assumed to be resistant to re-infection and no further sampling was scheduled for 2 weeks. This will clearly lead to an underestimate of ‘infected to infected’ transitions.

Table 1.

Matrix of the number of transitions into the infected and uninfected states conditional on the immediate past state

Previous state (Y_ij−1)	Current state (Y_ij): Uninfected	Current state (Y_ij): Infected
Uninfected	8598	132
Infected	131	13

Table 1 gives the number of visits to the uninfected and infected states conditional on the previous state indicated by the row label. The matrix shows that the rate of sampling far exceeded the rate of infection, because most transitions were from the uninfected to the uninfected state. In total, there are 131 transitions from the infected to the uninfected state and, similarly, 132 transitions from the uninfected to the infected state. This corresponds to infection of about 40% of the cohort (132 infections among 334 children) in the first year of life. Furthermore, given the high sampling frequency, the number of transitions from infected to infected is small (only 13), suggesting that the duration of infection is short (equivalently, that the recovery rate is high). Later, we assume that the rate of recovery far exceeds the rate of infection. Note that the time interval between transitions was not constant: intervals differed both within and between children which, as stated previously, makes the data set highly unbalanced. Standard methods of analysis may therefore not be directly applicable.
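The pooled counts in Table 1 can be built from per-child 0/1 sequences and turned into naive row-wise transition proportions. The sketch below is illustrative only: it assumes a hypothetical input format (one time-ordered list of outcomes per child), and it ignores the unequal spacing of visits, which is why these raw proportions are not used directly as rate estimates.

```python
from collections import Counter


def count_transitions(sequences):
    """Pool first-order transitions k -> l (k, l in {0, 1}) over all children.

    `sequences` is a hypothetical input format: one time-ordered list of
    0/1 outcomes per child.
    """
    counts = Counter()
    for seq in sequences:
        # consecutive pairs (Y_{i,j-1}, Y_ij)
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1
    return counts


def transition_proportions(n00, n01, n10, n11):
    """Row-wise conditional proportions P(Y_ij = l | Y_{i,j-1} = k)."""
    p01 = n01 / (n00 + n01)
    p10 = n10 / (n10 + n11)
    return {"p00": 1 - p01, "p01": p01, "p10": p10, "p11": 1 - p10}


# Counts from Table 1
props = transition_proportions(n00=8598, n01=132, n10=131, n11=13)
print(round(props["p01"], 4), round(props["p10"], 3))
```

With the Table 1 counts, this gives π01 ≈ 132/8730 ≈ 0.0151 and π10 ≈ 131/144 ≈ 0.910.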

3 The model

In this section, we discuss, for the purpose of analysis, the transmission of RSV in this cohort of children in relation to the SIS, SIR and SIRS disease models. In an SIS disease model, each individual in the population is either infected (I) or susceptible to infection (S). When a susceptible individual becomes infected, he/she is immediately infectious, and when an infected individual is cured, he/she is immediately susceptible again. Under a homogeneity assumption, every susceptible individual has the same probability of being infected, and every infected individual has the same probability of recovery. Ross[20] introduced the deterministic SIS model, while Weiss and Dishon[21] introduced the stochastic SIS model, formulated as a Markov birth-and-death process; such processes are used to model phenomena ranging from epidemics to the transmission of rumours and chemical reactions. The long-term behaviour of the deterministic and stochastic versions of the SIS model is quite different, but we do not go into the details of this difference here. According to the biology of RSV, the disease process need not follow an SIS model; a more appropriate model would be closer to the SIRS process,[6,8] possibly with gradual acquisition of immunity. RSV tends to occur in seasonal outbreaks and, while reinfections during one epidemic do occur, repeat infections tend to occur in successive epidemics. However, in the first year of life there is little opportunity for reinfection, since the vast majority of this infant cohort experienced only one epidemic; strictly speaking, therefore, there is no basis for choosing between the SIS, SIR and SIRS models. Further, if the SIRS framework is in fact the more reasonable structure for RSV, then, given the short period of follow-up (i.e. little opportunity for loss of immunity and reinfection), the more appropriate assumption would be to model the infection with an SIR structure. We therefore restrict ourselves to modelling primary infection and recovery, and we do so in the simplest form: using an SIS-type model, we model the transition rates from the disease-free to the infected state (λ) and from the infected to the disease-free state (ν). Given the study design and data, λ is correctly specified but ν is not; later we denote the latter by ν̃ to distinguish it from the true value. The problem is to estimate the parameters of interest from repeated (longitudinal) measures, in which each child presents a sequence of 1's (infected) and 0's (disease-free). The time in days between successive observations was also recorded, so the parameter estimates have units of day⁻¹. We emphasise that such estimates, and their interpretation, should always be carefully linked to the study design and not derived from the data alone.

3.1 Model governing differential equations

The basic governing differential equation of the SIS model is

$$\frac{\partial p(a,t)}{\partial a} + \frac{\partial p(a,t)}{\partial t} = \lambda(a,t)\,q(a,t) - \nu(a,t)\,p(a,t), \tag{1}$$

where q(a, t) and p(a, t) are, respectively, the proportions of susceptible and infected individuals in the population at time t and age a, such that

$$q(a,t) + p(a,t) = 1.$$

Thus, for a purely SIS model, it is enough to study the solution of Equation (1). However, as already mentioned, RSV is a viral disease, and the most appropriate model is the SIRS model, where R is the class of recovered individuals, who may lose immunity and revert to the S class. In this case, the equation for p(a, t) becomes

$$\frac{\partial p(a,t)}{\partial a} + \frac{\partial p(a,t)}{\partial t} = \lambda(a,t)\,q(a,t) - r(a,t)\,p(a,t),$$

where r(a, t) is the rate at which individuals move from the infected state to the recovered class, from which immunity may be lost at a rate ν*(a, t) different from ν(a, t) in Equation (1). Because the data in use are from children under 1 year of age, immunity against the disease is not yet developed in these individuals; we therefore assume ν*(a, t) = 0. It therefore suffices to work with Equation (1), ignoring the R-to-S transition, as explained in the opening paragraph of Section 3. For simplicity, we also ignore the complication of short-term immunity in the first months of life due to maternally derived RSV-specific antibodies; hence, all children are assumed to be born susceptible. In addition, losses due to natural mortality can be assumed to be balanced by new births, so in effect we assume a constant population. In the Kilifi data set, all the children were within 1 year of age, so we can drop age from Equation (1) and write

$$\frac{dp(t)}{dt} = \lambda(t)\,q(t) - \nu(t)\,p(t).$$

If λ(t) = λ and ν(t) = ν are time-independent, then, because p(t) + q(t) = 1,

$$\frac{dq(t)}{dt} = -\lambda\,q(t) + \nu\,p(t) = \nu - (\lambda+\nu)\,q(t). \tag{3}$$

This equation can easily be solved using the ‘variation of coefficients’ (variation of parameters) technique (Appendix 1). Applying the technique to Equation (3) with the initial conditions q(0) = 1 and p(0) = 0 gives

$$q(t) = \frac{\nu}{\lambda+\nu} + \frac{\lambda}{\lambda+\nu}\,e^{-(\lambda+\nu)t}, \tag{4}$$

and, since p(t) + q(t) = 1, the general solution for p(t) is

$$p(t) = \frac{\lambda}{\lambda+\nu}\left(1 - e^{-(\lambda+\nu)t}\right). \tag{5}$$

If we relax the restrictive initial condition q(0) = 1, p(0) = 0 and use instead the more general condition p(0) + q(0) = 1, the solutions for p(t) and q(t) are, respectively,

$$p(t) = \frac{\lambda}{\lambda+\nu} + \left(p(0) - \frac{\lambda}{\lambda+\nu}\right)e^{-(\lambda+\nu)t}
\quad\text{and}\quad
q(t) = \frac{\nu}{\lambda+\nu} + \left(q(0) - \frac{\nu}{\lambda+\nu}\right)e^{-(\lambda+\nu)t},$$

but for simplicity we keep to Equations (4) and (5).
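As a quick sanity check, not part of the original analysis, the closed-form solution in Equations (4) and (5) can be compared with a direct numerical integration of dq/dt = ν − (λ + ν)q. The rates below are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def closed_form(t, lam, nu, q0=1.0):
    """Constant-rate SIS solution; q0 = 1 reproduces Equations (4) and (5)."""
    s = lam + nu
    q = nu / s + (q0 - nu / s) * math.exp(-s * t)
    return q, 1.0 - q


def rk4_q(t_end, lam, nu, q0=1.0, steps=10_000):
    """Integrate dq/dt = nu - (lam + nu) q with the classical Runge-Kutta scheme."""
    def f(q):
        return nu - (lam + nu) * q

    h = t_end / steps
    q = q0
    for _ in range(steps):
        k1 = f(q)
        k2 = f(q + 0.5 * h * k1)
        k3 = f(q + 0.5 * h * k2)
        k4 = f(q + h * k3)
        q += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return q


# Illustrative rates in day^-1 (assumed values, not estimates from the study)
lam, nu = 0.002, 0.5
q_exact, p_exact = closed_form(30.0, lam, nu)
q_numeric = rk4_q(30.0, lam, nu)
print(abs(q_exact - q_numeric))
```

The two values of q agree to within floating-point error, and p(t) tends to λ/(λ + ν) as t grows.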

3.2 Linking the model to data

The model solution for q(t) implies that

$$\lim_{t\to\infty} q(t) = \frac{\nu}{\lambda+\nu} \quad\text{and hence}\quad \lim_{t\to\infty} p(t) = \frac{\lambda}{\lambda+\nu},$$

which give the equilibrium proportions of susceptible and infected individuals, respectively. Hence, for a rare disease (low equilibrium prevalence), we expect ν ≫ λ. Now let the indicators 1 and 0 denote, respectively, the infected and uninfected states of an individual, and let Y_i(t) be a binary response variable taking one of these values, where i = 1, …, n indexes the subjects in a sample of n subjects and t denotes time. Over a time interval (0, t), we can define four conditional state transition probabilities:

$$\pi_{kl}(t) = P\{Y_i(t) = l \mid Y_i(0) = k\}, \qquad k, l \in \{0, 1\}.$$

Suppose that at t = 0 the proportion infected is 0, that is, q(0) = 1 and p(0) = 0. Since the disease process is reversible, individuals cannot remain infected forever. The solution for q(t) in (4) implies that the probability that an individual who was initially uninfected is still uninfected after a time t is

$$\pi_{00}(t) = \frac{\nu}{\lambda+\nu} + \frac{\lambda}{\lambda+\nu}\,e^{-(\lambda+\nu)t}, \tag{6}$$

and, since π01(t) + π00(t) = 1,

$$\pi_{01}(t) = \frac{\lambda}{\lambda+\nu}\left(1 - e^{-(\lambda+\nu)t}\right). \tag{7}$$

Following similar arguments, the expressions for π11(t) and π10(t) are

$$\pi_{11}(t) = \frac{\lambda}{\lambda+\nu} + \frac{\nu}{\lambda+\nu}\,e^{-(\lambda+\nu)t} \tag{8}$$

and

$$\pi_{10}(t) = \frac{\nu}{\lambda+\nu}\left(1 - e^{-(\lambda+\nu)t}\right). \tag{9}$$

The process is ergodic, namely

$$\lim_{t\to\infty}\pi_{00}(t) = \lim_{t\to\infty}\pi_{10}(t) = \frac{\nu}{\lambda+\nu} \quad\text{and}\quad \lim_{t\to\infty}\pi_{01}(t) = \lim_{t\to\infty}\pi_{11}(t) = \frac{\lambda}{\lambda+\nu},$$

the equilibrium proportions of susceptible and infected individuals, respectively. Estimates of λ and ν can be obtained from these equations by maximum likelihood, since the transitions, conditional on the previous state, follow two separate Bernoulli distributions with probabilities π01 and π10 (or their complements, where appropriate). The general form of the likelihood is

$$L = \prod_{i=1}^{n} P(Y_{i1}) \prod_{j=2}^{n_i} P(Y_{ij} \mid Y_{i,j-1}),$$

where Y_ij denotes the binary observation from child i at occasion j, out of n_i occasions. The second part of the likelihood, obtained by conditioning on the previous measurement Y_i,j−1, is the product of two independent Bernoulli likelihoods:

$$L_c = \pi_{00}^{\,n_{00}}\,\pi_{01}^{\,n_{01}}\,\pi_{10}^{\,n_{10}}\,\pi_{11}^{\,n_{11}} = (1-\pi_{01})^{n_{00}}\,\pi_{01}^{\,n_{01}}\,\pi_{10}^{\,n_{10}}\,(1-\pi_{10})^{n_{11}},$$

where n_kl is the total number of transitions from state k ∈ {0, 1} to state l ∈ {0, 1}; explicit maximisation is therefore possible.
There is an inherent assumption here that the time intervals are of equal length, which in practice is not the case. The transition probabilities can be estimated by maximising this conditional likelihood instead of the full likelihood, since the initial measurement Y_i1 contributes only a limited amount of information, and only if some steady-state assumptions are made. Conditional on the initial states {Y_i1}, the free parameters π01 and π10 are orthogonal. This allows separate analyses of the two independent Bernoulli distributions, giving the maximum likelihood estimates of the two transition probabilities

$$\hat\pi_{01} = \frac{n_{01}}{n_{00}+n_{01}} \quad\text{and}\quad \hat\pi_{10} = \frac{n_{10}}{n_{10}+n_{11}},$$

and subsequently $\hat\pi_{00} = 1 - \hat\pi_{01}$ and $\hat\pi_{11} = 1 - \hat\pi_{10}$. By equating these estimates of the transition probabilities to Equations (7) and (9) (or, equivalently, their complements to Equations (6) and (8)), one can in principle obtain estimates of the transition rates λ and ν. The problem with this approach is that the estimating equations are highly non-linear and the method works well only for equally spaced observation times, as in Nagelkerke et al.[22] Our situation is more complex: the observations are not equally spaced within and between subjects, and the number of observations varies across individuals. This scenario is thus more complex than that described by Nagelkerke et al.[22] and requires some simplifying assumptions. The alternative formulation adopted in Section 4 allows the use of a GLM approach, which has the advantage of easily accommodating time-varying (in our case, month-specific) parameters, as shown in Section 4.2. We do not argue against the above approach; we merely present an alternative approach to a similar problem.
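In the equal-interval case, Equations (7) and (9) can in fact be inverted in closed form: adding them gives π01 + π10 = 1 − e^{−(λ+ν)Δ}, and dividing them gives π01/π10 = λ/ν. The sketch below assumes a single common sampling interval Δ (a hypothetical value; this assumption does not hold for the Kilifi data) and checks the inversion with a round trip. The function names are illustrative.

```python
import math


def transition_probs(lam, nu, delta):
    """pi_01 and pi_10 over an interval of length delta (Equations (7) and (9))."""
    s = lam + nu
    decay = 1.0 - math.exp(-s * delta)
    return lam / s * decay, nu / s * decay


def rates_from_probs(p01, p10, delta):
    """Invert Equations (7) and (9) for a single common sampling interval delta.

    Uses p01 + p10 = 1 - exp(-(lam + nu) * delta) and p01 / p10 = lam / nu.
    """
    total = p01 + p10
    if not 0.0 < total < 1.0:
        raise ValueError("need 0 < p01 + p10 < 1 for a finite solution")
    s = -math.log(1.0 - total) / delta   # lam + nu
    return s * p01 / total, s * p10 / total


delta = 7.0                               # hypothetical common interval (days)
p01, p10 = transition_probs(0.002, 0.5, delta)
lam_hat, nu_hat = rates_from_probs(p01, p10, delta)
```

The round trip recovers λ = 0.002 and ν = 0.5 up to rounding; with unequal intervals no single Δ exists, which is why Section 4 takes a different route.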

4 Estimation of the model parameters

An alternative estimation procedure is developed by assuming that the residence, or sojourn, times in each disease state are exponentially distributed. As already explained, the reason for changing to an alternative estimation procedure is that the data are highly unbalanced, with unequal time intervals between sampling visits, and not all individuals have the same number of observations. We therefore need some simplifying assumptions to work with the data via the GLM approach (Section 4.1). In the current model, we assume that the duration in the susceptible, or disease-free, state is exponentially distributed with parameter λ. Assuming recovery is possible, the duration in the infected state would be exponentially distributed with parameter r. We could then correctly interpret λ and r as the force of infection and the recovery rate, respectively; the two parameters can also be viewed as the hazards of infection and recovery. In effect, we assume that the time spent in the infected class is exponentially distributed with mean r⁻¹ days and, likewise, that the time spent in the susceptible class is exponentially distributed with mean λ⁻¹ days. One can thus, in principle, consider two Poisson processes with exponential inter-arrival times. If we observe the processes over an interval of time (0, d), then, given that an individual is in the susceptible class, the probability of an infection at or before time d is $1 - e^{-\lambda d}$ and the probability of no infection is $e^{-\lambda d}$. Similarly, given that an individual is in the infected class, the probability of recovery at or before time d is $1 - e^{-r d}$ and the probability of no recovery is $e^{-r d}$. Conditional on the previous state, we therefore have two independent stochastic processes to study.
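A small simulation illustrates the statement above: with an exponentially distributed sojourn time of rate λ, the probability of at least one event within an interval of length d is 1 − e^{−λd}. The rate and interval below are arbitrary illustrative values, and the helper names are hypothetical.

```python
import math
import random


def prob_event_by(rate, d):
    """P(T <= d) for an exponential waiting time T with the given rate."""
    return 1.0 - math.exp(-rate * d)


def simulated_prob_event_by(rate, d, n=100_000, seed=1):
    """Monte Carlo estimate of P(T <= d) from n exponential draws."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(rate) <= d for _ in range(n))
    return hits / n


# Illustrative values: rate 0.05 per day over a 14-day interval
print(prob_event_by(0.05, 14.0), simulated_prob_event_by(0.05, 14.0))
```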
This argument is the basis of the current formulation, which was previously published by White et al.[10] However, careful inspection of the full study design and of the data generated made it clear that the true recovery rate, r, of RSV cannot be estimated. To emphasise this, we change notation and use ν̃ instead of r, and $\hat{\tilde\nu}$ to denote the estimate of this parameter, which is apparently estimable from the current data but should not be interpreted as the recovery rate. We therefore define the four observable transition probabilities for the current data as follows:

$$P(Y_{ij}=1 \mid Y_{i,j-1}=0) = 1 - e^{-\lambda d_{ij}}, \qquad P(Y_{ij}=0 \mid Y_{i,j-1}=0) = e^{-\lambda d_{ij}},$$

$$P(Y_{ij}=0 \mid Y_{i,j-1}=1) = 1 - e^{-\tilde\nu d_{ij}}, \qquad P(Y_{ij}=1 \mid Y_{i,j-1}=1) = e^{-\tilde\nu d_{ij}}. \tag{10}$$

The quantity d_ij = t_ij − t_i,j−1 is the time interval between the samples at times t_i,j−1 and t_ij. The full likelihood can therefore be written as

$$L = \prod_{i=1}^{N} P_0(1)^{\delta_i}\,P_0(0)^{1-\delta_i} \prod_{j=2}^{n_i} \left[\left(1-e^{-\lambda d_{ij}}\right)^{Y_{ij}} \left(e^{-\lambda d_{ij}}\right)^{1-Y_{ij}}\right]^{1-Y_{i,j-1}} \left[\left(1-e^{-\tilde\nu d_{ij}}\right)^{1-Y_{ij}} \left(e^{-\tilde\nu d_{ij}}\right)^{Y_{ij}}\right]^{Y_{i,j-1}}.$$

Here δ_i is an indicator of the initial state of child i, with δ_i = 1 if the child is initially infected and 0 otherwise, and P0(1) is the unconditional probability that a child is initially in the infected state. Likewise, the unconditional probability that an individual is initially uninfected is P0(0) = 1 − P0(1). If N is the total number of individuals in the study, then $\sum_{i=1}^{N}\delta_i$ is the number of individuals initially in the infected state. Since P0(1) and P0(0) are unknown, it is simpler to work with the likelihood conditional on the initial states Y_i1 to find the MLEs of λ and ν̃. The Fisher scoring method (Appendix 2) was used to solve iteratively for λ and ν̃, giving the estimates $\hat\lambda$ and $\hat{\tilde\nu}$ together with approximate 95% confidence intervals (CIs). Note that the estimate of the rate parameter ν̃ is high compared with the estimate of the force of infection. We further emphasise that the corresponding time duration, $1/\hat{\tilde\nu}$, of 2 days cannot be interpreted as an estimate of the shedding duration of RSV based on the current data, because samples were not taken during infection. The current data therefore cannot support estimation of the true recovery rate, and hence of the shedding duration.
Based on recent observational studies in the same population, this duration is estimated to be between 4 and 11 days.[23] The current analysis illustrates a basic requirement of experimental design theory: the analysis, and therefore the results, of a designed study or experiment should be directly linked to the design. The estimated force of infection is justified, and it applies to infants in the primary phase of the disease, for whom, for simplicity, we have assumed a negligible duration of maternal protection.
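The following sketch maximises the conditional likelihood for one rate at a time by Fisher scoring, with unequal intervals d and transition probabilities 1 − e^{−λd} (change) and e^{−λd} (no change). It is not the code of Appendix 2; the function names, starting value and step-halving safeguard are illustrative choices. Because the conditional likelihood factorises, the same function can serve for λ (0 → 1 versus 0 → 0 intervals) and for ν̃ (1 → 0 versus 1 → 1 intervals). The score is U(λ) = Σ_changes d e^{−λd}/(1 − e^{−λd}) − Σ_no-changes d, and the expected information is I(λ) = Σ_all d² e^{−λd}/(1 − e^{−λd}).

```python
import math


def _info_term(rate, d):
    """Expected information from one interval: d^2 e^{-rate d} / (1 - e^{-rate d})."""
    e = math.exp(-rate * d)
    return d * d * e / (1.0 - e)


def fisher_scoring_rate(event_d, nonevent_d, tol=1e-12, max_iter=200):
    """MLE and SE of a constant rate when P(change within d) = 1 - exp(-rate * d).

    event_d:    interval lengths d_ij over which the state changed
                (0 -> 1 for lambda, or 1 -> 0 for nu-tilde)
    nonevent_d: interval lengths over which the state did not change
    """
    if not event_d or not nonevent_d:
        raise ValueError("need both changes and non-changes for a finite, positive MLE")
    all_d = list(event_d) + list(nonevent_d)
    rate = len(event_d) / sum(all_d)          # crude starting value: events per unit time
    for _ in range(max_iter):
        score = sum(d * math.exp(-rate * d) / (1.0 - math.exp(-rate * d)) for d in event_d)
        score -= sum(nonevent_d)
        step = score / sum(_info_term(rate, d) for d in all_d)
        while rate + step <= 0:               # step-halving keeps the rate positive
            step /= 2
        rate += step
        if abs(step) < tol * rate:
            break
    se = 1.0 / math.sqrt(sum(_info_term(rate, d) for d in all_d))
    return rate, se
```

A Wald 95% CI is then estimate ± 1.96 × SE. When all intervals share one length d, the estimate reduces to −log(1 − p̂)/d, where p̂ is the observed proportion of intervals with a change.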

4.1 Application of GLM estimation to the RSV data

As defined earlier, λ denotes the force of infection, but ν̃ does not denote the per capita loss of infection or the recovery rate of the disease process. To apply a generalised linear model to estimate the force of infection for RSV, we consider the transitions from the uninfected to the infected state (0 → 1) and from the uninfected to the uninfected state (0 → 0). These transitions form a binary response: once they are coded as 1 for 0 → 0 and 2 for 0 → 1, the response conditionally follows a Bernoulli distribution. Another pair of binary responses can be defined similarly from the transitions 1 → 1 and 1 → 0. The residence times in the disease-free and infected states are assumed to follow exponential distributions with parameters λ and ν̃, respectively. In survival analysis terminology, λ can also be interpreted as the hazard of infection, or per capita risk of infection. In the simplest model, the only explanatory variable is the time between successive samples, d_ij. Using a GLM with a log link function, we obtain

$$\log P(Y_{ij}=0 \mid Y_{i,j-1}=0) = -\lambda\,d_{ij} \quad\text{and}\quad \log P(Y_{ij}=1 \mid Y_{i,j-1}=1) = -\tilde\nu\,d_{ij}.$$

Since the data involve the four transition probabilities defined in Equation (10), to formulate an appropriate GLM we define the indicator

$$Z_{ij} = \begin{cases} 1 & \text{if } Y_{ij} = Y_{i,j-1} \ \text{(no change of state)},\\ 0 & \text{otherwise}. \end{cases}$$

Let θ_ij = P(Z_ij = 1) and consider the linear predictor

$$\log\theta_{ij} = -\lambda\,d_{ij}\,(1 - Y_{i,j-1}) - \tilde\nu\,d_{ij}\,Y_{i,j-1},$$

from which it follows that $\theta_{ij} = e^{-\lambda d_{ij}}$ when Y_i,j−1 = 0 and $\theta_{ij} = e^{-\tilde\nu d_{ij}}$ when Y_i,j−1 = 1. Using this approach, we obtained the estimates $\hat\lambda$ (95% CI 0.0018–0.0024) and $\hat{\tilde\nu}$ (95% CI 0.386–0.657) for the force of infection and the parameter ν̃, respectively. As with the estimates from direct likelihood maximisation, the estimated force of infection implies a disease-free duration of about 1.5 years, and the estimate of ν̃ implies a duration of 2 days, which, as stated earlier, cannot be interpreted as the shedding duration of RSV in this population of infants.[23]
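The sketch below shows one way to turn a child's visit times t_ij and outcomes Y_ij into the consecutive-visit rows a GLM of this kind works with, and to evaluate the Bernoulli log-likelihood of the no-change indicator under log θ = −λ d (1 − Y_i,j−1) − ν̃ d Y_i,j−1. Coding Z as 1 for "no change of state", and the names TransitionRow, transition_rows and log_likelihood, are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TransitionRow:
    child: int
    prev_state: int   # Y_{i,j-1}
    z: int            # 1 if the state is unchanged (Y_ij == Y_{i,j-1}), else 0
    d: float          # d_ij = t_ij - t_{i,j-1}, in days


def transition_rows(child, times, states):
    """Turn one child's visit times t_ij and outcomes Y_ij into consecutive-visit rows."""
    if len(times) != len(states):
        raise ValueError("times and states must have the same length")
    rows = []
    for j in range(1, len(times)):
        d = times[j] - times[j - 1]
        if d <= 0:
            raise ValueError("visit times must be strictly increasing")
        rows.append(TransitionRow(child, states[j - 1], int(states[j] == states[j - 1]), d))
    return rows


def log_likelihood(rows, lam, nu_tilde):
    """Bernoulli log-likelihood of Z with log(theta) = -(lam (1 - y_prev) + nu_tilde y_prev) d."""
    total = 0.0
    for r in rows:
        rate = nu_tilde if r.prev_state == 1 else lam
        log_theta = -rate * r.d
        # log(theta) if no change, log(1 - theta) otherwise
        total += log_theta if r.z == 1 else math.log1p(-math.exp(log_theta))
    return total


rows = transition_rows(7, [0, 10, 24, 30], [0, 0, 1, 0])
```

Maximising log_likelihood over λ and ν̃ reproduces the direct likelihood fit, since both rest on the same transition probabilities.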

4.2 Time-dependent force of infection

The above estimation procedures give only a single constant force of infection and a single constant value of ν̃ over the study period. However, there is ample evidence that a disease such as RSV exhibits clear temporal variation in incidence, which is a function of the force of infection. We therefore extended the above approach to obtain monthly piecewise estimates of the force of infection. For months 14 and 15 there are no data, because none of the children remained in the study up to these months; hence, no estimates are available for these 2 months. A piecewise constant force of infection with a log link function was assumed, so that the linear predictor is

$$\log\theta_{ij} = -\sum_{m}\left[\lambda_m\,(1 - Y_{i,j-1}) + \tilde\nu_m\,Y_{i,j-1}\right] d_{ij}\,I_{ijm}, \tag{13}$$

where I_ijm = 1 if observation time t_ij falls in month m of follow-up and 0 otherwise. Here, λ_m is the force of infection in month m, but we re-emphasise that ν̃_m is not the per capita loss of infection or recovery rate. The model in (13) can be re-expressed in terms of a complementary log-log link, with linear predictor

$$g(1-\theta_{ij}) = \sum_{m}\left[\beta_m\,(1 - Y_{i,j-1}) + \gamma_m\,Y_{i,j-1}\right] I_{ijm} + \log d_{ij},$$

where g(x) = log{−log(1 − x)} is the complementary log-log link function and log d_ij enters as an offset. In such a model, the monthly regression parameters for the force of infection and for ν̃ are β_m = log(λ_m) and γ_m = log(ν̃_m), respectively. As a result, the estimates of the monthly force of infection and of the additional parameter ν̃ are constrained to be non-negative, as required. In this article, the complementary log-log link function was used to estimate the model parameters. 95% CIs were obtained either by exponentiating the model parameters and their CIs or by applying the delta method to the log of the parameters. Table 2 presents the monthly estimates of the force of infection and, for completeness, Table 3 presents the monthly estimates of ν̃. The force of infection peaks in month 3 ($\hat\lambda_3 = 0.0070$ day⁻¹), then decreases to essentially zero in month 9 and rises to secondary peaks in months 11 and 12 ($\hat\lambda_{11} = \hat\lambda_{12} = 0.0022$ day⁻¹).
Month 1 had too few recorded transitions, and months 14 and 15 had no data because no child remained in the study that long; these months were therefore omitted from the analysis. Figure 1 plots the force of infection against time together with 95% CIs from direct exponentiation and from the delta method. There is virtually no difference between the two sets of CIs.
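The two kinds of CI in Table 2 both start from the log-scale estimate β̂_m = log λ̂_m and its standard error. The sketch below assumes such a pair (β̂, se(β̂)) is available from the complementary log-log fit. Exponentiating the interval for β keeps both limits positive, whereas the delta method, using Var(λ̂) ≈ λ̂² Var(β̂), gives a symmetric interval whose lower limit can fall below zero, as in months 8 and 10. The function name rate_cis and the standard-error value used are illustrative.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def rate_cis(beta_hat, se_beta, z=Z95):
    """Point estimate and two 95% CIs for a rate lambda = exp(beta).

    Exponentiation: exp(beta_hat -/+ z * se_beta); positive and asymmetric.
    Delta method:   lambda_hat -/+ z * lambda_hat * se_beta, because
                    Var(lambda_hat) ~ lambda_hat**2 * Var(beta_hat);
                    symmetric, so the lower limit can fall below zero.
    """
    lam = math.exp(beta_hat)
    exp_ci = (math.exp(beta_hat - z * se_beta), math.exp(beta_hat + z * se_beta))
    half_width = z * lam * se_beta
    return lam, exp_ci, (lam - half_width, lam + half_width)


# Illustrative inputs: se_beta = 0.247 is roughly back-calculated from the month-2 row of Table 2
lam, exp_ci, delta_ci = rate_cis(math.log(0.0053), 0.247)
```

With these inputs, the function returns approximately (0.0033, 0.0086) by exponentiation and (0.0027, 0.0079) by the delta method, close to the month-2 entries of Table 2.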

Table 2.

Monthly estimates of the force of infection and CIs

Month	Parameter	Estimate (day⁻¹)	95% CI, exponentiation (lower)	95% CI, exponentiation (upper)	95% CI, delta method (lower)	95% CI, delta method (upper)
2	λ2	0.0053	0.0032	0.0086	0.0027	0.0079
3	λ3	0.0070	0.0053	0.0092	0.0051	0.0089
4	λ4	0.0051	0.0038	0.0070	0.0036	0.0067
5	λ5	0.0024	0.0016	0.0037	0.0014	0.0034
6	λ6	0.0019	0.0011	0.0033	0.0009	0.0029
7	λ7	0.0010	0.0005	0.0020	0.0003	0.0017
8	λ8	0.0001	0.0000	0.0008	−0.0001	0.0003
9	λ9	0.0000	0.0000	0.0000	0.0000	0.0000
10	λ10	0.0001	0.0000	0.0009	−0.0001	0.0004
11	λ11	0.0022	0.0014	0.0033	0.0013	0.0031
12	λ12	0.0022	0.0015	0.0032	0.0013	0.0030
13	λ13	0.0014	0.0007	0.0029	0.0004	0.0024

Table 3.

Monthly estimates of the per capita loss of infection

Month	Parameter	Estimate (day⁻¹)	Standard error (day⁻¹)
2	ν̃2	0.4990	0.067
3	ν̃3	0.5000	0.06
4	ν̃4	0.5036	0.064
5	ν̃5	0.5021	0.062
6	ν̃6	0.4990	0.066
7	ν̃7	0.500	0.076
8	ν̃8	0.5002	0.072
9	ν̃9	0.5022	0.065
10	ν̃10	0.5009	0.06
11	ν̃11	0.5006	0.071
12	ν̃12	0.4996	0.061
13	ν̃13	0.5004	0.069

Figure 1.

Monthly estimates of the force of infection with 95% CIs from the exponentiation and delta methods.

For completeness, the monthly estimates of the parameter ν̃ were obtained in the same way and are given in Table 3 for comparison. We reiterate, however, that they are not estimates of the monthly recovery rate of RSV. The estimates of ν̃ are fairly constant across months, with no unusual peaks; Figure 2 plots them over the study period. Because the monthly estimates in Table 3 lie within a very narrow range, we opted for a common estimate of ν̃. A formal likelihood ratio test comparing the two models (constant versus month-specific ν̃) showed no statistically significant difference (LR statistic = 11.36 on 11 d.f., p = 0.4743). A combined estimate was therefore calculated as a weighted average of the 12 monthly estimates, with its variance obtained by weighting the within- and between-component variances (Appendix 3). Because the 12 monthly values are so similar, the contribution of the between-component variance was very small and negligible. The overall estimate of ν̃ obtained in this way had an SE of 0.0189 and an approximate 95% CI of (0.4635, 0.5377). The two horizontal lines in Figure 2 represent the combined estimate obtained by this method (lower line) and the common ν̃ estimated from a GLM (upper line). A Wald test shows that these two estimates are not significantly different.
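The pooling step can be sketched as an inverse-variance weighted mean of the Table 3 values. The weighting scheme, and the DerSimonian–Laird estimate used here for the between-month variance τ², are assumptions chosen for illustration; Appendix 3 may combine the within- and between-component variances differently. When τ² = 0 the SE reduces to 1/√(Σ 1/SE_m²). The name `pooled_estimate` is hypothetical.

```python
import math

# Monthly estimates of nu-tilde and their standard errors, months 2-13 (Table 3)
NU = [0.4990, 0.5000, 0.5036, 0.5021, 0.4990, 0.500,
      0.5002, 0.5022, 0.5009, 0.5006, 0.4996, 0.5004]
SE = [0.067, 0.06, 0.064, 0.062, 0.066, 0.076,
      0.072, 0.065, 0.06, 0.071, 0.061, 0.069]


def pooled_estimate(estimates, std_errors, z=1.959964):
    """Inverse-variance weighted mean with a DerSimonian-Laird between-month variance.

    Returns (pooled estimate, standard error, tau^2, (lower, upper)).
    """
    k = len(estimates)
    w = [1.0 / s ** 2 for s in std_errors]                 # within-month precision weights
    sw = sum(w)
    fixed = sum(wi * x for wi, x in zip(w, estimates)) / sw
    # Cochran's Q: spread of the monthly estimates beyond their sampling error
    q = sum(wi * (x - fixed) ** 2 for wi, x in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                      # between-month variance component
    # Re-weight with within + between variance
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    sws = sum(w_star)
    est = sum(wi * x for wi, x in zip(w_star, estimates)) / sws
    se = math.sqrt(1.0 / sws)
    return est, se, tau2, (est - z * se, est + z * se)


est, se, tau2, ci = pooled_estimate(NU, SE)
print(f"estimate={est:.4f} SE={se:.4f} tau2={tau2:.3g} CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

Run on the rounded Table 3 values, Cochran's Q is far below its degrees of freedom, so τ² is truncated to zero and the SE comes out at about 0.0189, in line with the value reported above; the pooled interval differs from the reported one only in the fourth decimal, as expected from rounded inputs.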

Figure 2.

Monthly estimates ν̄ of the parameter ν̃ over the study period.

5 Conclusion

In this article, GLMs combined with likelihood estimation were used to estimate the force of infection for a childhood respiratory disease (RSV). In the process, an additional parameter ν̃ associated with the data was also estimated. Estimation using the full likelihood was not possible; therefore, a form of conditional likelihood was used to model the data. The GLM approach was modified to estimate a month-specific force of infection, allowing the model to capture temporal trends in disease incidence through piecewise parameter estimation. The force of infection and ν̃ were also estimated by direct maximum likelihood; the corresponding GLM estimates are 0.0021 and 0.5032 per day. The two approaches gave quite similar estimates of ν̃, but the GLM approach yielded a force of infection around twofold higher. We nevertheless prefer the GLM approach because its flexibility allows monthly piecewise parameter estimates. The monthly estimates also show that the RSV force of infection peaks at months 3, 11 and 12, which correspond to May, January and February of the original study period. This is consistent with Cane,[24] Chew et al.[25] and Simoes,[26] who all report that RSV has a seasonal signal attributed to meteorological or sociological factors. The force of infection is therefore not constant but varies with time. It is important at this point to discuss the validity of the estimates in relation to the limitations of the data collected in the study. First, we argue that both methods underestimate the force of infection, not because the methods are wrong but because the study design failed to collect data on asymptomatic infections.
Second, the parameter ν̃ from either method cannot be used to derive the shedding duration, because samples were not taken over the 2-week period following infection. The issue of average shedding duration has recently been reviewed by Okiro et al.[23] Other shedding studies show that the viral load starts declining only after about 4 days.[27] In summary, the data-generating process reflects the study design, and this link should be carried through to the analysis and the interpretation of results. The parameter estimates also imply that the equilibrium proportions of susceptible and infected children stabilise at around 99.74% and 0.26%, respectively, largely as a result of the very short duration of infection. Note that in this analysis the susceptible prevalence covers both naive individuals and those who have recovered from a previous infection and re-entered the S class. Nonetheless, because the force of infection is actually quite high, a significant proportion of infants are infected in the first year of life, when disease risk and severity are highest. Statistical and mathematical models are therefore important tools for understanding RSV dynamics and for designing control and intervention strategies. Further analyses to investigate child-to-child heterogeneous effects and to account for the different forms of incompleteness mentioned in Section 2 are in progress. A sensitivity analysis to assess the impact of different types of missing data on the stability of the parameter estimates is also proposed.
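The arithmetic linking the parameter estimates to an equilibrium split can be sketched under an assumed two-state S⇄I structure (children return to S after infection, as the text implies), with ν̃ standing in for the exit rate from I: dS/dt = −λS + ν̃I and dI/dt = λS − ν̃I with S + I = 1, whose equilibrium is I* = λ/(λ + ν̃). The sketch does not attempt to reproduce the specific estimates behind the 99.74%/0.26% figures, and the function names are hypothetical.

```python
def sis_equilibrium(lam, nu):
    """Equilibrium (S*, I*) of dS/dt = -lam*S + nu*I, dI/dt = lam*S - nu*I, S + I = 1."""
    i_star = lam / (lam + nu)   # setting dI/dt = 0 with S = 1 - I
    return 1.0 - i_star, i_star


def implied_force_of_infection(i_star, nu):
    """Invert I* = lam / (lam + nu) for lam, given nu (assumed model)."""
    return i_star * nu / (1.0 - i_star)


lam = implied_force_of_infection(0.0026, 0.50)   # lam implied by 0.26% infected
s, i = sis_equilibrium(lam, 0.50)
print(round(lam, 5), round(100 * s, 2), round(100 * i, 2))
```

With ν̃ = 0.50 day⁻¹, an infected proportion of 0.26% corresponds to λ ≈ 0.0013 day⁻¹ under this assumed model, and feeding that value back gives 99.74% susceptible and 0.26% infected. The formula also makes the text's point explicit: with ν̃ two orders of magnitude larger than λ, I* stays small.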

References

1. Ross R. Some a priori pathometric equations. Br Med J. 1915 Mar 27.

2. White LJ, Waris M, Cane PA, Nokes DJ, Medley GF. The transmission dynamics of groups A and B human respiratory syncytial virus (hRSV) in England & Wales and Finland: seasonality and cross-protection. Epidemiol Infect. 2005 Apr.

3. Subcritical endemic steady states in mathematical models for animal infections with incomplete immunity.

4. Molecular epidemiology of respiratory syncytial virus.

5. Respiratory syncytial virus infection.

6. Molecular epidemiology of respiratory syncytial virus: rapid identification of subgroup A lineages.

7. Seasonal trends of viral respiratory tract infections in the tropics.

8. Respiratory syncytial virus infections in infants: quantitation and duration of shedding.

9. Duration of shedding of respiratory syncytial virus in a community study of Kenyan children.

10. White LJ, Buttery J, Cooper B, Nokes DJ, Medley GF. Rotavirus within day care centres in Oxfordshire, UK: characterization of partial immunity. J R Soc Interface. 2008;5:1481–1490.
