
Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data.

Michael J Lydeamore^1,2, Patricia T Campbell^3,4, David J Price^3,5, Yue Wu⁶, Adrian J Marcato³, Will Cuningham⁷, Jonathan R Carapetis^6,8, Ross M Andrews^7,9, Malcolm I McDonald¹⁰, Jodie McVernon^3,4,5, Steven Y C Tong^3,7, James M McCaw^1,3,5.

Abstract

Prevalence of impetigo (skin sores) remains high in remote Australian Aboriginal communities, Fiji, and other areas of socio-economic disadvantage. Skin sore infections, driven primarily in these settings by Group A Streptococcus (GAS), contribute substantially to the disease burden in these areas. Despite this, estimates for the force of infection, infectious period and basic reproductive ratio (all necessary for the construction of dynamic transmission models) have not been obtained. By utilising three datasets, each containing longitudinal infection information on individuals, we estimate each of these epidemiologically important parameters. With an eye to future study design, we also quantify the optimal sampling intervals for obtaining information about these parameters. We verify the estimation method through a simulation estimation study, and test each dataset to ensure suitability to the estimation method. We find that the force of infection differs by population prevalence, and the infectious period is estimated to be between 12 and 20 days. We also find that the optimal sampling interval depends on setting, with an optimal sampling interval between 9 and 11 days in a high prevalence setting, and 21 and 27 days for a lower prevalence setting. These estimates unlock future model-based investigations on the transmission dynamics of skin sores.


Year: 2020 PMID: 33017395 PMCID: PMC7561265 DOI: 10.1371/journal.pcbi.1007838

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

Infections with impetigo (commonly known as skin sores) remain highly prevalent in remote Australian Aboriginal communities, as well as Fiji and areas of socio-economic disadvantage [1, 2]. Skin sore infections in these settings are primarily caused by Staphylococcus aureus, and Group A Streptococcus (GAS). GAS is associated with post-infectious sequelae such as acute rheumatic fever and rheumatic heart disease, of which Australia has one of the highest recorded prevalences globally [1]. Despite a relatively high level of understanding about the specifics of the GAS bacterium [3-7], comparatively little is known about the natural history of skin sore infection. Furthermore, what is known is often based on historical studies from a prior generation and from a different, non-endemic, geographical region [8-11]. We aim to utilise a dynamic transmission model for skin sores to estimate two key quantities: the force of infection, and the duration of infectiousness. In the absence of information relating to immunity post-infection, we assume skin sore transmission follows the dynamics of the Susceptible-Infectious-Susceptible (SIS) model. Calculation of these two key quantities will contribute to the development and parameterisation of models which will in turn inform the design of intervention strategies aimed at reducing prevalence. We analyse three separate datasets, all from remote Australian communities, documenting the infection dynamics of skin sores in individuals. The first dataset consists of public health network presentation data for 404 children under five years of age [12-14], collected as part of the East Arnhem Healthy Skin Project; the second contains longitudinal data for 844 individuals from three rural Australian communities, collected during household visits [15], and the third is comprised of survey visits for 163 individuals who participated in a mass treatment program [16], of which the primary endpoint was control of scabies infection. 
To analyse these data, we linearise the SIS model about the endemic equilibrium, and derive an expression for the likelihood of the two model parameters. By utilising Markov chain Monte Carlo (MCMC) methods, we obtain estimates of each of the force of infection, the duration of infectiousness, the basic reproductive ratio, R0, and the prevalence of infection. Finally, by utilising optimal experimental design, the optimal sampling strategy to inform estimation of these parameters for use in future studies is obtained.

Materials and methods

Ethics statement

Ethics approval for reuse of existing data was obtained from The Human Research Ethics Committee of the Northern Territory Department of Health and Community Services and Menzies School of Health Research (Ethics approval number 2015-2516). Permission was also obtained from the custodians of each dataset. This project has been conducted in association with an Indigenous Reference Group, as well as an ongoing stakeholder group which contains Aboriginal Australian community members.

The Susceptible-Infected-Susceptible model

We consider a stochastic representation of the Susceptible-Infectious-Susceptible (SIS) model [17]. In this model, individuals are either susceptible (S) or infectious (I). The transition rate from susceptible to infectious, known as the force of infection and denoted λ = βI/N, where β is the transmissibility parameter, is non-linear. This non-linearity is one of the key features of dynamic infectious disease models. However, it means that to model a population of individuals, the state of each individual is required (to know the prevalence, i = I/N). For the SIS model with individuals explicitly stated, the size of the required state space is $2^N$ [18]. When constructing the Markov chain representation of the SIS model, the generator matrix, Q, is therefore $2^N \times 2^N$, meaning that for large numbers of individuals, computing the matrix exponential, exp(Qt), is computationally intractable. As a result, performing inference with infectious disease models is challenging [19-24]. When the dynamics of the SIS model are at (or close to) equilibrium, the force of infection, λ, is approximately constant. As such, we approximate the SIS model by a two-state process with a constant force of infection. By making this approximation, and assuming individuals are otherwise identical, it follows that a Markov chain consisting of only two states is required, independent of the underlying population size. This approximation has a straightforward likelihood calculation, which allows estimation of parameters in a Bayesian framework and also calculation of the optimal sampling interval for future study designs through the use of optimal experimental design.

Linearisation of the SIS model

The standard SIS model can be described using two transitions, infection and recovery, and two parameters, the transmissibility, β, and the rate of recovery, γ (Table 1). Ignoring demographic processes, the total number of individuals in the population is fixed.

Table 1

Transitions of the SIS model.

The force of infection is given by λ, the transmissibility parameter by β and the rate of recovery by γ.

Transition	Rate
(S, I) → (S − 1, I + 1)	λ ≔ βI/N
(S, I) → (S + 1, I − 1)	γ

One of the most important quantities in infectious disease modelling is the basic reproductive ratio, R0, defined as the mean number of secondary infection events caused by a single infectious host in an otherwise susceptible population. The basic reproductive ratio functions as a threshold parameter: if R0 ≤ 1, an outbreak of disease will not occur, while if R0 > 1, there is a non-zero probability of a disease outbreak occurring. For the SIS model, the basic reproductive ratio is

$$R_0 = \frac{\beta}{\gamma}.$$

The quasi-equilibrium solution of the SIS model is well known [18], and the endemic prevalence of disease, expressed as a proportion of the population, is

$$I^* = 1 - \frac{1}{R_0}.$$

Let t be the time of interest of the process. The force of infection, λ(t), is defined as

$$\lambda(t) = \frac{\beta I(t)}{N}.$$

At equilibrium, the prevalence is approximately constant, and so the force of infection can be approximated by

$$\lambda \approx \beta I^*.$$

By performing this linearisation, it is assumed that the dynamics of disease are, and remain, at equilibrium. It follows that we may consider a single individual. With the states ordered (S, I), the generator matrix for the Markov chain for the life-course of that individual is

$$Q = \begin{pmatrix} -\lambda & \lambda \\ \gamma & -\gamma \end{pmatrix},$$

and the matrix exponential of Q is

$$P(t) = \exp(Qt) = \frac{1}{\lambda + \gamma} \begin{pmatrix} \gamma + \lambda e^{-(\lambda + \gamma)t} & \lambda - \lambda e^{-(\lambda + \gamma)t} \\ \gamma - \gamma e^{-(\lambda + \gamma)t} & \lambda + \gamma e^{-(\lambda + \gamma)t} \end{pmatrix}. \tag{1}$$

The matrix in Eq (1), combined with an initial state and time t, gives the probability distribution for the Markov chain. It is possible to calculate expressions for the equilibrium prevalence, I*, and the basic reproductive ratio, R0, in terms of λ and γ. Solving for the equilibrium distribution of the linearised SIS model gives the equilibrium prevalence

$$I^* = \frac{\lambda}{\lambda + \gamma}. \tag{2}$$

From the standard SIS model, it is also known that the basic reproductive ratio is R0 = β/γ, and that λ = βI*. It follows that the basic reproductive ratio, R0, is given by

$$R_0 = \frac{\lambda}{\gamma I^*},$$

and substituting Eq (2) gives

$$R_0 = \frac{\lambda + \gamma}{\gamma} = 1 + \frac{\lambda}{\gamma}. \tag{3}$$

Given these simple closed form expressions for the key quantities of interest, it is possible to perform estimation in a Bayesian setting, using interval-censored data.
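The closed-form quantities of the linearised model can be sketched in a few lines of Python. This is a minimal illustration of the standard two-state Markov chain results, not the authors' code; the function names are hypothetical, and the parameter values are the ones used later in the simulation study (λ = 1/60, γ = 1/20).

```python
import math


def transition_matrix(lam, gamma, t):
    """Closed-form exp(Qt) for the two-state chain; state 0 = S, state 1 = I."""
    total = lam + gamma
    decay = math.exp(-total * t)
    return [
        [(gamma + lam * decay) / total, (lam - lam * decay) / total],
        [(gamma - gamma * decay) / total, (lam + gamma * decay) / total],
    ]


def equilibrium_prevalence(lam, gamma):
    """Long-run proportion of time an individual spends infected."""
    return lam / (lam + gamma)


def basic_reproductive_ratio(lam, gamma):
    """R0 expressed through the linearised parameters: 1 + lam / gamma."""
    return 1.0 + lam / gamma


# Illustrative values only (the ones used in the simulation study).
lam, gamma = 1 / 60, 1 / 20
print(transition_matrix(lam, gamma, 9.0))
print(equilibrium_prevalence(lam, gamma))
print(basic_reproductive_ratio(lam, gamma))
```

Each row of the transition matrix sums to one, and for large t every row approaches the equilibrium distribution, which is a quick sanity check on any implementation.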

Data

Three separate datasets collected in Australian Aboriginal communities are considered in this study: data from public health network presentation (PHN) data on 404 children from birth to five years of age, collected as part of the East Arnhem Healthy Skin Project; data for 844 individuals from three communities, collected during household visits (referred to as the HH dataset); and data from 163 individuals who were observed for over 25 months as part of a mass treatment program in a single rural community (RC). Each dataset consists of longitudinal observations of each individual, where their infection status is recorded at each observation. The times between presentations are heavily right skewed in each dataset, with a median time to next presentation of 9 days for the PHN data, 61 days for the HH data and 119 days for the RC data. The number of observations in total is also highly variable with 13,439 observations in the PHN data, 4,507 in the HH data and 626 in the RC data. Kernel density estimates of the distribution of time until the next presentation, with the observed data overlayed, are shown in Fig 1. The suitability of each of these datasets for inferring the force of infection, λ, rate of recovery, γ, and the basic reproductive ratio, R0 is investigated in Section Verification of presentation distributions. It is worth noting specifically that the PHN data contains information only on children from birth to five years of age, while the other two datasets contain information on individuals of all ages. Prevalence of skin sores is known to be age-dependent [25] and so by not modelling any age-structure, we are ignoring these differences.

Fig 1

Distribution of (A) time between presentations and (B) age of patients at presentation for each of the three datasets—PHN, HH and RC—with the empirical data overlayed.

Data structure

Recall that the datasets which are considered consist of longitudinal observations for each individual, with an individual’s infection status being noted as either susceptible or infected at each point. The observation is not continuous in nature, with the individual’s infection status only being known at each sampling point. Data of this form are known as interval-censored, or panel data. Interval-censored data are common in epidemiology, and inference in a frequentist setting is well established [26]. Let the state of individual i at observation j be $X_{i,j}$, and the time at which the jth observation is made be $t_{i,j}$. The likelihood for a single individual, i, with $n_i$ observations can be evaluated as

$$L_i(\lambda, \gamma) = \prod_{j=2}^{n_i} P_{X_{i,j-1},\, X_{i,j}}\left(t_{i,j} - t_{i,j-1}\right),$$

in which each factor is the relevant entry of matrix P in Eq (1), evaluated at the time difference between observations, $t_{i,j} - t_{i,j-1}$. It follows that the likelihood for the entire population is

$$L(\lambda, \gamma) = \prod_{i} L_i(\lambda, \gamma). \tag{4}$$

It is important to note that the likelihood in Eq (4) assumes what is known as ignorable sampling times. That is, the sampling times are chosen independently of the outcome of the process. When sampling times are chosen in advance, as they were in the HH and RC datasets, then the sampling times have been proven to be ignorable [26]. For the PHN data, observations were made under what is termed a doctor’s care scheme, whereby the next observation time is chosen at the current observation time, and based on an individual’s disease state at that time. The sampling times are proven to be ignorable if the following two conditions are true [27]: (i) the probability of individual i being in a given disease state u at time t, given all infection history until this point, $H_{i,j}$, is independent of whether an examination is carried out at this time and of past examination times; and (ii) the conditional distribution of the jth observation time, $P(T_{i,j} = t \mid H_{i,j})$, where $T_{i,j}$ is the random variable representing the time of the jth observation for individual i, is functionally independent of the transmission parameters.
The first of these conditions effectively means that the infection status at time $t_{i,j}$ only depends on the status at time $t_{i,j-1}$ and on $t_{i,j}$ and $t_{i,j-1}$, but not on whether the individual’s infection status is sampled at time $t_{i,j}$ or on previous sampling times. As treatment is prescribed by a doctor’s visit, it is possible that this condition is violated. However, the dataset does not contain information on the form of treatment administered, meaning that it cannot be assumed that the administered treatment is for skin sores, and almost 60% of presentations to the clinic contain no information on skin sores (and so one could assume that the primary reason for the visit is not skin sores). Further, it is noted that the estimate of infectious period in any modern setting will be reduced by the presence of treatment. As such, it is assumed that the first condition is true. The second condition means that the next observation time is conditionally independent of the transmission process. This condition is assumed to be true here due to the high frequency of presentation in this dataset, even when an individual does not have skin sores. A large number of doctor’s visits do not contain information about an individual’s status with skin sores, including instances where an individual had been marked as infected one day prior. If we ignore these ‘missing’ entries, the mean time to next positive presentation following a negative presentation is 27.2 days, while the mean time to next negative presentation following a negative presentation is 23.2 days. If skin sores caused more frequent presentations to a doctor, then we would expect these numbers to be reversed. However, the empirical mean time to next presentation is sensitive to the frequency of missing data, and so it is unclear whether this difference can be attributed to a change in patient behaviour based on infection status, or limitations in data collection.
The analysis proceeds on the basis that all sampling schemes in the given data are ignorable, but it is noted that this may not be the case. It is important to note that in the settings studied here, treatment is routinely offered and applied to skin sores. As a result, the estimated recovery rate, γ, reflects the average duration of infection, 1/γ, in the presence of treatment. However, as treatment is routinely applied in many settings where skin sores are endemic, this estimate is still relevant when considering control schemes and survey designs. Both the force of infection, λ, and the rate of recovery, γ, are estimated in a Bayesian context using Markov chain Monte Carlo (MCMC) estimation. The MCMC is performed using the No-U-Turn sampler implemented in Stan [28], using 10,000 iterations for each of 4 chains, for a total of 40,000 iterations. The code used to perform this estimation is available at http://github.com/MikeLydeamore/TMI/. We use normal distributions, truncated at 0, as priors for both the force of infection, λ, and the rate of recovery, γ. To calculate the basic reproductive ratio, R0, we apply the formula in Eq (3) to each sample from the posterior distribution.
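The panel-data likelihood described in this section can be sketched as follows. This is an illustrative Python version under stated assumptions, not the Stan implementation in the repository above: the function names and the example `histories` are hypothetical, each individual's first observation is conditioned on, and consecutive observation times are assumed to be strictly increasing.

```python
import math


def transition_prob(lam, gamma, a, b, dt):
    """P(state b at time t + dt | state a at time t); 0 = susceptible, 1 = infected."""
    total = lam + gamma
    stationary_b = lam / total if b == 1 else gamma / total
    same_state = 1.0 if a == b else 0.0
    # Entry (a, b) of exp(Q dt): relaxes from the indicator to the stationary value.
    return stationary_b + (same_state - stationary_b) * math.exp(-total * dt)


def log_likelihood(lam, gamma, histories):
    """Sum of log transition probabilities over consecutive observations.

    histories: list of (times, states) pairs, one per individual.
    Observation times must be strictly increasing (dt > 0).
    """
    total = 0.0
    for times, states in histories:
        for j in range(1, len(times)):
            dt = times[j] - times[j - 1]
            total += math.log(
                transition_prob(lam, gamma, states[j - 1], states[j], dt)
            )
    return total


# Hypothetical observation histories (days, infection status), for illustration only.
histories = [
    ([0, 9, 20, 41], [0, 1, 1, 0]),
    ([0, 30, 61], [0, 0, 1]),
]
print(log_likelihood(1 / 60, 1 / 20, histories))
```

Because the likelihood depends on the data only through pairs of consecutive states and the gap between them, a function like this can be passed to any general-purpose sampler or optimiser.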

Results

We start by verifying the methodology through the use of a simulation estimation study, whereby individuals are simulated from the linearised SIS model, and we attempt to recover parameters through the estimation routine detailed in the previous section. To explore the suitability of the methods and available datasets, we choose reasonable values for the force of infection, λ, and the rate of recovery, γ. In the main text, we use λ = 1/60 and γ = 1/20, while S4 Text presents results for different chosen values. After these verifications, the estimation method is applied to the observational data.

Verification of methodology

There are multiple sources of stochastic variability in this setting, including the underlying population which is observed, the realisation from the observation distribution and the MCMC method itself. The first two of these potential causes for variation are investigated in detail here. To investigate the variability in the underlying population, the estimation procedure is performed on 64 randomly generated populations from the linearised SIS model, and each of the 400 members of each simulated population are observed once daily for one year. This high frequency of observation means that the only source of meaningful variability is that which comes from the linearised SIS model. The top row of Fig 2 shows the marginal posterior estimates for the force of infection, λ, the rate of recovery, γ, and the basic reproductive ratio, R0, for populations simulated using λ = 1/60 and γ = 1/20. Each violin plot shows an individual (marginal) posterior distribution for the parameter of interest from a randomly selected population, while the boxplot shows the variability of the posterior mean for each parameter over all 64 realised populations. The within-simulation variability is relatively high, even in this case with daily observation. However, the method estimates each parameter well and in an unbiased manner.
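A simulation of this kind can be sketched as below. It is a minimal illustration rather than the authors' simulation code: the function names are hypothetical, and drawing each individual's initial state from the equilibrium distribution is an assumption that the text does not state.

```python
import random


def simulate_individual(lam, gamma, obs_times, rng):
    """Simulate one individual's status (0 = S, 1 = I) at increasing obs_times."""
    # Assumption: the initial state is drawn from the equilibrium distribution.
    state = 1 if rng.random() < lam / (lam + gamma) else 0
    next_event = rng.expovariate(lam if state == 0 else gamma)
    observed = []
    for obs in obs_times:
        # Apply every infection/recovery event that occurs before this observation.
        while next_event <= obs:
            state = 1 - state
            next_event += rng.expovariate(lam if state == 0 else gamma)
        observed.append(state)
    return observed


def simulate_population(lam, gamma, n_individuals, obs_times, seed=0):
    """Simulate independent individuals from the linearised (two-state) model."""
    rng = random.Random(seed)
    return [
        simulate_individual(lam, gamma, obs_times, rng)
        for _ in range(n_individuals)
    ]


# 400 individuals observed once daily for one year, as in the verification study.
obs_times = list(range(366))
population = simulate_population(1 / 60, 1 / 20, 400, obs_times, seed=1)
observed_prevalence = sum(map(sum, population)) / (400 * len(obs_times))
print(observed_prevalence)
```

Replacing `obs_times` with times drawn from an empirical presentation distribution gives the second observation scheme considered in the text.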

Fig 2

Marginal posterior distributions for the force of infection, λ, the rate of recovery γ, and the basic reproductive ratio, R0, from 8 randomly generated populations from the linearised SIS model under two different observation distributions.

The mean of each distribution is given by the white circle. The boxplot at the bottom of each panel represents the means of 64 marginal posteriors. The true value which was used to generate each population is represented by the blue line (λ = 1/60, γ = 1/20). The two different observation distributions are (A): Observed daily over 1 year and (B) observed according to the empirical presentation distribution from the PHN data over 1 year. Both observation distributions yield good estimates of the simulated parameters. The observation distribution from the RC dataset was tested, but has not been visualised as the estimates were far from the true values (See Fig 3).


Fig 3

Variance in the estimates of the force of infection, λ, and the rate of recovery, γ, for a range of sampling intervals.

Estimates were performed on 64 realisations of the simulated populations, each with parameters λ = 1/60 and γ = 1/20. Each realisation contains 20 observations from the simulated population, leading to the time horizon for each realisation being 20 × sampling interval.

Next, potential variability in the observation distribution is considered. Again, a population of 400 individuals is simulated, and each simulated individual observed at times drawn from the observation distribution obtained from the PHN dataset (shown in Fig 1) over a time horizon of 1 year (Fig 2(B)). It is satisfying that although the sampling interval in the PHN dataset is notably longer than the daily case shown in Fig 2(A), the estimation method is still able to recover the simulated parameters. This suggests that oversampling the population (Fig 2(A)) gives little benefit to estimates of the parameters. Comparatively, it makes sense that if the sampling interval is too large, then no information will be gained. An example of this phenomenon is shown in Fig 3, where 20 samples are made of the population, separated by some sampling interval. The figure shows that a short sampling interval and a relatively short time horizon means that information about the parameters is difficult to recover. Similarly, a long sampling interval increases the variance in the parameter estimates. This suggests that there exists some optimal sampling interval. This concept will be returned to in Section Prospective sampling strategies.


Verification of linearisation procedure

Having established that parameters can be re-estimated from the linearised model, we now look to verify whether the linearisation of the SIS model is valid. To do this, an individual-based implementation of the full (non-linearised) SIS model is used. The chosen parameters are β = 0.067 and γ = 1/20 (giving λ ≈ 1/60 and an endemic prevalence of 25%), and 300 individuals. The Markov chain is seeded with 125 infected individuals, which is close to the equilibrium of this system. The system is run for 10 years before observation begins. The population is simulated from the full SIS model, and the estimation is performed using the linearised model. No stochastic extinction occurred in any of the simulations throughout this work. Fig 4 shows results from 64 realised populations, under the observation distribution from the PHN dataset. The recovery rate, γ, is estimated accurately and with relatively small variance. The force of infection, λ, is somewhat underestimated on average, with a relative error in the mean of 15%, although the variability is large. This underestimate carries over to the estimate of the basic reproductive ratio, R0. However, the true parameters are within the 95% confidence interval when averaged over the 64 simulations, similar to that seen in Fig 2 under the same observation distribution. Thus, it is concluded that linearisation of the SIS model is valid when the dynamics are near equilibrium.
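As a quick check, the stated force of infection and endemic prevalence follow from the chosen β and γ through the standard SIS relations (R0 = β/γ, endemic prevalence 1 − 1/R0, and λ = βI*), with values rounded:

```latex
\begin{align*}
R_0 &= \frac{\beta}{\gamma} = 0.067 \times 20 = 1.34, \\
I^* &= 1 - \frac{1}{R_0} = 1 - \frac{1}{1.34} \approx 0.254, \\
\lambda &= \beta I^* \approx 0.067 \times 0.254 \approx 0.017 \approx \frac{1}{60}.
\end{align*}
```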

Fig 4

Marginal posterior distributions for the force of infection, λ, the rate of recovery γ, and the basic reproductive ratio, R0, from 8 randomly generated populations from the full (non-linearised) SIS model under the empirical observation distribution from the PHN data, over 1 year.


Verification of presentation distributions

Before estimating the force of infection, λ, and the rate of recovery, γ, for the three datasets discussed, the frequency of presentations must be checked to determine if they are sufficient for use with the method. Fig 5 shows a simulation estimation study using the presentation distributions from the PHN and HH datasets. Both datasets give good estimates. When considering the RC dataset, recall the presentation distributions shown in Fig 1. The RC dataset has a much wider sampling interval compared to the PHN and HH datasets. We suspect that this presentation distribution may not hold sufficient information to recover the parameters of interest. However, as the prevalence is observed at each survey visit, estimating the basic reproductive ratio, R0, may still be possible. Fig 6 shows the prior distributions, with samples from the posterior distribution overlayed, under the observation distribution from the PHN dataset (panel A) and the RC dataset (panel C). Under the observation distribution from the PHN data, the posterior distribution samples are tightly clustered, with variance much smaller than in the prior distributions. Indeed, estimates are so localised relative to the prior that the samples appear to be overlayed in the figure. Comparatively, when the observation distribution is that seen in the RC dataset, the posterior samples are strongly correlated with a wide variance, indicating that this dataset does not have sufficient sampling frequency to separately estimate both the force of infection, λ, and the rate of recovery, γ. However, the posterior distribution samples align with the simulated prevalence (and thus the basic reproductive ratio, R0), so the RC dataset can still be used to estimate these quantities.

Fig 5

Marginal posteriors for the force of infection, λ, the rate of recovery γ, and the basic reproductive ratio, R0, from 8 randomly generated populations from the linearised SIS model under two different observation distributions.

Fig 6

Prior distribution (concentric rings) with 20,000 samples from the posterior distribution (black points) overlayed from a randomly generated population under the observation distribution from (A) the PHN dataset, (B) the HH dataset, and (C) the RC dataset.


The red line is the set of parameter values which give the true prevalence in the simulated population. In panels (A) and (B), the samples are tightly clustered with variance far smaller than the prior distribution. In panel (C), the samples are highly correlated, and with high variance, indicating the two parameters of interest cannot be uniquely determined, but their ratio (and so R0) can.

Having verified the suitability of each of the datasets to this estimation method, the next step is to estimate each of the force of infection, λ, the rate of recovery, γ, the prevalence of disease and the basic reproductive ratio, R0.

Estimation from data

For the PHN and HH datasets, relatively similar estimates for the infectious period, 1/γ (12 days for the PHN dataset, and 20 days for the HH dataset) are obtained. However, notably different estimates for the force of infection, λ, were obtained. In the PHN dataset, the mean force of infection is estimated at 1/20.21, while in the HH dataset, the estimate is 1/202.07, an order of magnitude lower. This difference follows through to estimates of the basic reproductive ratio, R0 (1.60 vs 1.10), and the prevalence, estimated to be 37.5% in the PHN dataset and only 9.1% in the HH dataset. For the RC dataset, R0 is estimated to be 1.42, and the prevalence to be 29.6%. Point estimates of prevalence in all three study locations have been reported previously (Table 2) [15, 16, 29], at 35.6% in the region in which the PHN dataset was collected, 13.1% in the region where the HH dataset was collected and 35% in the region where the RC dataset was collected. These prevalence estimates align well with the estimates obtained using our method.
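The reported R0 and prevalence can be roughly cross-checked against the reported means of λ and 1/γ in Table 2 using R0 = 1 + λ/γ and prevalence = λ/(λ + γ). The sketch below is a plug-in calculation only: the paper computes these quantities per posterior sample, so the plug-in values need not match the posterior means exactly. The function and dictionary names are hypothetical.

```python
# Posterior means reported in Table 2 (force of infection per day, infectious period in days).
estimates = {
    "PHN": {"lam": 0.049, "infectious_period": 12.19},
    "HH": {"lam": 0.0049, "infectious_period": 19.97},
}


def derived_quantities(lam, infectious_period):
    """Plug-in R0 and equilibrium prevalence from lam and 1/gamma."""
    gamma = 1.0 / infectious_period
    r0 = 1.0 + lam / gamma
    prevalence = lam / (lam + gamma)
    return r0, prevalence


for name, est in estimates.items():
    r0, prevalence = derived_quantities(**est)
    print(f"{name}: R0 ~ {r0:.2f}, prevalence ~ {100 * prevalence:.1f}%")
```

The plug-in values land close to the tabulated R0 and prevalence for both datasets, which is what the closed-form relations of the linearised model would lead one to expect.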

Table 2

Parameter estimates for the force of infection, λ, and the infectious period, 1/γ from the three different datasets.

Note this method estimates the rate of recovery, γ, but the infectious period is reported here for clarity.

Dataset	Parameter [units]	Mean	95% CI
PHN	Force of infection (λ) [1/days]	0.049	(0.042, 0.059)
	Infectious period (γ⁻¹) [days]	12.19	(10.23, 14.55)
	R₀	1.60	(1.56, 1.65)
	Prevalence	37.5%	(31.0, 39.4)
Literature [29]	Prevalence	35.6%	(32.9, 38.3)
HH	Force of infection (λ) [1/days]	0.0049	(0.0040, 0.0062)
	Infectious period (γ⁻¹) [days]	19.97	(16.19, 24.56)
	R₀	1.10	(1.09, 1.11)
	Prevalence	9.1%	(8.3, 10.0)
Literature [15]	Prevalence	13.1%	Not provided
RC	R₀	1.42	(1.34, 1.51)
	Force of infection (λ)	—	Not identifiable
	Infectious period (γ⁻¹)	—	Not identifiable
	Prevalence	29.6%	(25.4, 33.8)
Literature [16]	Prevalence	35%	Not provided


Prospective sampling strategies

Thus far, the focus has been on previously collected datasets from which to estimate parameters. If the sole aim of a study was to collect data to best estimate these parameters, then the natural question to ask is when should individuals be sampled? Aided by the simple structure of the linearised SIS model, this question may be answered through optimal experimental design [30]. We take the approach of robust optimal experimental design, under the ED-optimality criterion [31, 32]. Let $d = (\delta_1, \ldots, \delta_{n-1})$ define an n-sampling design with spacing $\delta_i$, i = 1, …, n − 1, between subsequent observations. Then, the optimal sampling design, $d^*$, is given by

$$d^* = \arg\max_{d} \int \det\left(\mathcal{I}(\theta, d)\right) p(\theta)\, \mathrm{d}\theta, \tag{5}$$

where $\theta = \{\lambda, \gamma\}$, $\mathcal{I}(\theta, d)$ is the Fisher Information matrix, det is the determinant operator, and $p(\theta)$ is the prior distribution. Note that the optimal sampling design, $d^*$, is dependent on the prior distribution, $p(\theta)$. Two designs are considered for each dataset. The first is termed the variable sampling interval, where the ith sampling interval, $\delta_i$, is unrestricted, and n = 11 design spacings are chosen. Although this design strategy is optimal over a 12 visit design, adhering to the varying intervals may be difficult from an implementation perspective. A more practical strategy, and the second considered here, is termed the fixed sampling interval, where $\delta_i = \delta$ for all i. This is equivalent to considering n = 1 design spacing, as the population dynamics are assumed to be at equilibrium throughout the study. The integral in Eq (5) is approximated using a Monte Carlo estimate with 5,000 samples. Each individual is observed 12 times. We use the induced natural selection heuristic for finding optimal strategies [33]. For detail on the algorithm inputs and evidence of convergence, see S3 Text.
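For the fixed-interval design, the ED-optimality criterion can be approximated with a short Monte Carlo sketch. The code below is illustrative only and makes several assumptions that differ from the paper: it uses a plain grid search rather than the induced natural selection heuristic, it ignores the information carried by the first observation, the prior draws are hypothetical log-normal samples centred on λ = 1/60 and γ = 1/20 rather than the fitted posteriors, and derivatives are taken by finite differences. All function names are hypothetical.

```python
import numpy as np


def transition_matrix(lam, gamma, delta):
    """exp(Q * delta) for the two-state chain; rows and columns ordered (S, I)."""
    total = lam + gamma
    pi = np.array([gamma, lam]) / total  # stationary distribution
    return pi[None, :] + (np.eye(2) - pi[None, :]) * np.exp(-total * delta)


def fisher_information(lam, gamma, delta, n_obs=12, rel_step=1e-6):
    """Expected Fisher information for (lam, gamma) from n_obs equally spaced
    observations of one individual at equilibrium (first observation ignored)."""
    theta = np.array([lam, gamma])
    P = transition_matrix(lam, gamma, delta)
    pi = np.array([gamma, lam]) / (lam + gamma)
    grads = []
    for k in range(2):
        # Central finite difference of the transition matrix in parameter k.
        step = np.zeros(2)
        step[k] = rel_step * theta[k]
        upper = transition_matrix(*(theta + step), delta)
        lower = transition_matrix(*(theta - step), delta)
        grads.append((upper - lower) / (2.0 * step[k]))
    info = np.empty((2, 2))
    for k in range(2):
        for m in range(2):
            # Weight each starting state by its stationary probability.
            info[k, m] = np.sum(pi[:, None] * grads[k] * grads[m] / P)
    return (n_obs - 1) * info


def ed_criterion(delta, prior_samples, n_obs=12):
    """Monte Carlo estimate of the prior-expected determinant of the information."""
    dets = [
        np.linalg.det(fisher_information(lam, gamma, delta, n_obs))
        for lam, gamma in prior_samples
    ]
    return float(np.mean(dets))


rng = np.random.default_rng(0)
# Hypothetical prior draws, NOT the posteriors from the paper.
prior_samples = np.column_stack([
    rng.lognormal(np.log(1 / 60), 0.1, size=100),
    rng.lognormal(np.log(1 / 20), 0.1, size=100),
])
grid = np.arange(1.0, 121.0)  # candidate fixed intervals, in whole days
scores = np.array([ed_criterion(d, prior_samples) for d in grid])
best_delta = grid[np.argmax(scores)]
print(best_delta)
```

The criterion vanishes as the interval shrinks (consecutive observations are nearly identical) and as it grows (only the equilibrium prevalence is informed), so the maximum lies at an intermediate interval. Dividing `scores` by `scores.max()` gives the relative efficiency of each candidate interval.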

Recommended sampling strategies

We calculate the optimal strategy using the posterior distributions obtained from the PHN dataset, the HH dataset and the union of these two posterior distributions as the prior distribution in Eq (5). The results for both the variable interval strategy and the fixed interval strategy are shown in Table 3.

Table 3

Optimal sampling strategies (in days) using the posterior distributions obtained from the PHN and HH datasets, as well as the union of these two posterior distributions.

Two sampling strategies are considered: variable, where the time between each observation is allowed to vary, and fixed.

Data Source	Interval	Optimal Design (days)
PHN	Variable	(12.9, 12.3, 10.4, 12.0, 10.3, 12.5, 12.5, 9.1, 11.3, 10.6, 11.6, 13.2)
PHN	Fixed	9.9
HH	Variable	(18.1, 23.0, 30.5, 31.1, 31.6, 27.8, 30.0, 31.7, 30.0, 30.6, 32.6, 29.4)
HH	Fixed	24.2
Combined	Variable	(16.2, 23.5, 24.3, 32.0, 28.2, 26.6, 31.6, 29.2, 28.2, 28.1, 30.8, 33.9)
Combined	Fixed	23.4

Under the constraint of equal observation intervals, and restricted to whole days, any sampling interval between 9 days and 11 days gives a Fisher Information within 97% of the maximum for the PHN dataset. Comparatively, for the HH dataset, any sampling interval between 21 days and 28 days gives a Fisher Information within 97% of the maximum. Combining the two posterior distributions, any sampling interval between 21 and 28 days is within 97% of the maximum. However, it should be noted that a sampling interval of 23.4 days achieves only 30% of the maximum Fisher information possible in the PHN dataset, but 99% of the maximum in the HH dataset. This highlights the importance of specifying the optimal sampling strategy according to the specific scenario. Interestingly, the optimal design spacing for the fixed strategy is not the minimum of the optimal design spacing for the variable strategy. We propose the following hypothesis for this phenomenon: when the observation interval is allowed to vary, we can effectively ‘spend’ a single observation close to the previous one in order to potentially gain a lot of information. However, in the fixed interval strategy, this option is not available, and so to avoid ‘wasting’ observations, a more conservative strategy becomes optimal. To understand the difference in the optimal sampling times, recall that the expression in Eq (5) maximises the Fisher Information, which, through the Cramer–Rao lower bound, can be thought of as minimising the variance of the parameter estimates [34]. This estimate inherently depends on the underlying parameters of the system: when events (i.e., infection and recovery) are happening slowly (i.e., low prevalence) then sampling should happen less often, while when events are happening frequently (i.e., high prevalence), then sampling should happen more often.
In the case where little prior information about the system is available, then it may be more appropriate to adopt a ‘conservative’ sampling strategy, which here is the faster of the two presented strategies. Doing this yields a Fisher Information of 53% of the maximum for the HH dataset. The conservative strategy is presented in S4 Text. Overall, the conservative strategy generally gives good estimation accuracy (up to 10% error in a simulation-estimation study), and so is a viable ‘catch-all’ strategy in the absence of prior information such as the prevalence.
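
The dependence of the optimal fixed interval on the underlying rates can be illustrated with a minimal sketch. It assumes a two-state chain (sores absent or present) with force of infection `lam` and recovery rate `gam`, a first observation drawn from the stationary distribution, and the determinant of the Fisher Information matrix as the design criterion. The criterion, the function names and the numerical rates are all illustrative assumptions; Eq (5) and the TMI package are not reproduced here.

```python
import math

def transition_probs(lam, gam, t):
    """Transition probabilities of a two-state chain (0 = no sores, 1 = sores)
    observed t days apart, with infection rate lam and recovery rate gam."""
    rate = lam + gam
    moved = 1.0 - math.exp(-rate * t)
    p01 = lam / rate * moved   # susceptible -> infectious
    p10 = gam / rate * moved   # infectious -> susceptible
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def fisher_information(lam, gam, t, rel_step=1e-6):
    """Expected Fisher Information for (lam, gam) from one pair of consecutive
    observations, the first drawn from the stationary distribution."""
    theta = [lam, gam]
    probs = transition_probs(lam, gam, t)
    stationary = [gam / (lam + gam), lam / (lam + gam)]
    # Central finite differences of every transition probability.
    grads = []
    for k in range(2):
        h = rel_step * theta[k]
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        p_up = transition_probs(up[0], up[1], t)
        p_down = transition_probs(down[0], down[1], t)
        grads.append([[(p_up[i][j] - p_down[i][j]) / (2 * h) for j in range(2)]
                      for i in range(2)])
    info = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            info[a][b] = sum(
                stationary[i] * grads[a][i][j] * grads[b][i][j] / probs[i][j]
                for i in range(2) for j in range(2))
    return info

def d_criterion(lam, gam, t):
    """Determinant of the 2x2 information matrix (assumed design criterion)."""
    m = fisher_information(lam, gam, t)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def best_fixed_interval(lam, gam, candidates):
    """Return the candidate interval with the largest criterion value and
    each candidate's value relative to that maximum."""
    values = [d_criterion(lam, gam, t) for t in candidates]
    top = max(values)
    return candidates[values.index(top)], [v / top for v in values]

# Hypothetical rates per day: mean infectious period of 15 days and a
# force of infection of 1/30; these are not the posterior estimates.
lam, gam = 1 / 30, 1 / 15
days = list(range(1, 61))
t_opt, relative = best_fixed_interval(lam, gam, days)
near_optimal = [t for t, r in zip(days, relative) if r >= 0.97]
print(t_opt, near_optimal)
```

Re-running the last lines with smaller rates moves the best interval later, which mirrors the contrast between the high- and lower-prevalence settings described above. The real analysis averages over posterior distributions rather than using a single pair of rates.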

Discussion

We have provided the first model-based estimates for the duration of a skin sore infection (between 12 and 20 days), the force of infection and basic reproductive ratio (1.1 to 1.6) in three different settings. Furthermore, the optimal sampling interval for future strategies has been determined, assuming that a study’s primary goal is to estimate the force of infection and duration of infectiousness. By performing the estimation in a modelling framework, the interval-censored nature of the data has been incorporated. Although the frequentist version of this estimation technique has been utilised in other disease settings [35, 36], to our knowledge this is the first time these quantities have been computed for skin sores.

Previous work on the duration of skin sore infection has estimated that under treatment, skin sore infections clear in approximately 50% of individuals in 2 days, and 85% of individuals in 7 days [37]. In this study, we do not have information on whether an individual was prescribed treatment, but it is expected that a proportion of the population were prescribed and used antibiotics. For the remainder of the population who were not treated with antibiotics, it is expected that their clearance time would be longer. As our data are a combination of treated and untreated individuals, we propose that an exponential distribution for the duration of infectiousness is reasonable. Further, without more frequent observations, we are unlikely to be able to distinguish between other proposed distributions with confidence (S5 Text). Should accurately characterising the distribution of the duration of infectiousness be of interest, a similar approach to that in Section Prospective sampling strategies can be used to design a study to best discriminate between different models [38].

These results have been calculated using a linearised SIS model, in which the transmission rate has been assumed to be constant, and disease dynamics are at equilibrium.
This assumption yields simple analytic results that often cannot be obtained for traditional infectious disease dynamic models. However, it is important to note that the assumption of equilibrium dynamics is likely to be violated in real-world settings, particularly in the event of mass drug administration. Mass drug administration has been implemented in these communities in the past [29, 39], and was ongoing during the period of data collection in the RC dataset, although skin sores were not the primary outcome of the program in the RC setting [16].

It is also important to note that the SIS model structure, by construction, does not incorporate any period of immunity, or other potential disease states. Carriage (i.e. infected but not showing symptoms), in particular, has been demonstrated for skin sore infections in the past [11, 15], and inclusion of carriage in models has been shown to substantially change predicted intervention outcomes [40]. Given the hyperendemic prevalence of skin sores in this setting, the observed high infectiousness of skin-to-skin transmission, and the absence of longitudinal data related to carriage in this study, we have ignored the carrier state in this model. Extension in this area represents an important path for future work.

It must also be noted that the condition of skin sores can be caused by a number of pathogens. The microbiology of skin sores in the Northern Territory, Australia, has been studied previously [41]. Streptococcus pyogenes remained the dominant pathogen, but co-infection with Staphylococcus aureus was present. Without microbiologic information present in our data sets, we are unable to determine which pathogen is causing infection. Accordingly, we see these results as a quantification of skin sores as a general condition.

In the populations in which these data were collected, treatment is routinely administered for skin sores.
Thus, these estimates of the infectious period are inclusive of the effect of treatment, and so are likely to be lower than the natural infectious period (that is, in the absence of treatment being available). Although this interpretation of the infectious period is different to the natural infectious period, it is arguably more useful in an epidemiological context, as treatment will be given in any modern setting for a skin sore infection.

There are a number of key differences between the three datasets considered. The PHN dataset only has observations of children under five years of age. Extrapolation from this dataset to the entire population should be performed with caution as the prevalence of skin sores appears to be age-specific [25], and the average age of participants in the PHN data was younger than in the other two datasets. Despite this demographic difference, the relative similarity of the estimates of the infectious period from the HH data (in which the general population was studied) does provide some reassurance of the estimated numbers. Further, sampling times in the PHN dataset were not fixed in advance, but were rather driven by patients or health professionals. It has been assumed these sampling times are ignorable, but further investigation into this assumption may be warranted.

As well as estimation of key parameters for models of skin sores transmission, information about future experimental designs has also been provided. Although the optimal sampling interval is a function of both the force of infection and the infectious period, being able to calculate this interval provides helpful information to improve the efficiency of future study designs, or evaluation of disease control programs.

These parameter estimates unlock future model-based investigations for skin sores.
By providing estimates for both the force of infection and the duration of infectiousness, more complex models which include covariates such as scabies, non-homogeneous contact patterns, and population mobility can be considered, and the impact of treatment strategies in these settings can be evaluated. It is our hope that these models will lead to the development of innovative disease control measures, the application of which will reduce the burden of skin disease and health inequalities.
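
For orientation, the link between the estimated quantities under the equilibrium assumption can be written out. The following is a sketch for the textbook SIS model with $I$ taken as the infectious fraction of the population, transmission rate $\beta$ and recovery rate $\gamma$. It is an assumption about notation, not a reproduction of the equations in the Methods.

```latex
\begin{align}
\frac{dI}{dt} &= \beta I (1 - I) - \gamma I, &
I^* &= 1 - \frac{\gamma}{\beta} = 1 - \frac{1}{R_0}, \\
\lambda &= \beta I^* = \beta - \gamma, &
R_0 &= \frac{\beta}{\gamma} = 1 + \frac{\lambda}{\gamma}.
\end{align}
```

Under these assumptions the equilibrium prevalence is $\lambda/(\lambda+\gamma) = 1 - 1/R_0$, so a basic reproductive ratio of 1.1 corresponds to a prevalence of about 0.09 and a ratio of 1.6 to a prevalence of 0.375.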

MCMC diagnostics.

MCMC diagnostics relating to convergence of the posterior distributions. (PDF)

Derivation of the Fisher Information matrix.

Derivation of the Fisher Information matrix for both variable and constant time between observations. (PDF)

Optimal sampling strategy diagnostics.

Diagnostics of the optimisation of the sampling strategies. (PDF)

Conservative sampling strategy.

Utility of the ‘conservative’ sampling strategy compared to the optimal sampling strategy. (PDF)

Model sensitivity.

Simulation study where the simulated data comes from a model with two infectious phases but is estimated using a single infectious phase. (PDF)

Patient ages at presentation.

Patient ages at presentation, separated by dataset in which they appeared. (CSV)

Posterior distributions for the three datasets.

Posterior distributions for the force of infection and infectious period from the three datasets. (ZIP)

R code for performing simulation/estimation and verifying Fisher Information.

R code for performing simulation/estimation experiments, and numerically verifying the Fisher Information expressions included in the TMI package. (ZIP)

19 Sep 2019

Dear Dr Lydeamore,

Thank you very much for submitting your manuscript 'Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).
- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.
- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.
For instructions see here.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Roger Dimitri Kouyos
Associate Editor
PLOS Computational Biology

Rob De Boer
Deputy Editor
PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: This article uses a very basic model to estimate important parameters for the dynamics of skin sores in remote Australian populations. I am very much in favor of this type of analysis, as these parameters are essential to predict future dynamics and the effect of interventions. However, I did not fully understand all details of the analysis and there is hardly any sensitivity analysis to the model structure.

Major comments:

The analysis assumes an exponential distribution for the duration of sores. Is it possible to check this assumption? For instance, repeat the analysis with a more flexible distribution (Gamma/Weibull) and test whether the exponential distribution is supported by the data.

The authors claim that it is impossible to check whether a sampling scheme is ignorable. I do not see a reference, but I also wonder what they mean exactly with this. If having skin sores increases the rate at which individuals present to the doctor, one could statistically test whether the time between two subsequent visits is shorter in case of a negative visit followed by a positive visit compared to two negative visits. If there is a difference, this suggests that the sampling scheme is not ignorable.

In the description of the data structure, I had the impression that not all information was used.
At equilibrium, P(X_{i,1}=0)=\gamma/(\lambda+\gamma) and P(X_{i,1}=1)=\lambda/(\lambda+\gamma). Later on, the authors discuss that the recovery rate for participants may be higher than for the general population due to treatment, but I think an earlier reference is useful.

Related to the previous point, I do not really understand how the results in Table 2 relate to the formulas on page 3: the model estimates \lambda and \gamma. Is R0 determined based on the prevalence at first presentation (as the recovery rate may be higher for individuals in the study due to treatment) or based on the estimates of \lambda and \gamma alone? If so, why are the data of the first presentation not used to obtain a better estimate of \lambda and \gamma?

When I read that the authors linearise the SIS-model about the endemic equilibrium, I expected a different analysis than the very basic two-state Markov model they are using, in which there is no dependence between individuals.

Page 6: I think the authors should mention the priors they use when they mention the MCMC procedure.

Page 7, verification of linearization procedure: Did the observation start immediately after the seeding? Was there a conditioning on non-extinction? More importantly, I do not really see the need for the linearization. The matrix exponential of a sparse 301x301 matrix takes 0.2 seconds on my laptop. This would mean that 40,000 iterations take a bit more than 2 hours, which is not really prohibitive.

I also do not understand the logic behind the need to verify the presentation distributions. Would a direct analysis, without any checking, not already tell you whether there is sufficient information in the data (based on credibility intervals, for instance)?

Regarding the recommended sampling strategies, is there a rule of thumb, for instance, that the time between samples should be in the order of 1/\gamma, such that the simulations do not have to be performed for each value of \lambda and \gamma?

Reviewer #2:

# Major Comments

This is an interesting and helpful analysis of a difficult problem in infectious disease epidemiology, namely how to estimate transmission rates for endemic infections from interval-censored panel data. My primary concern with this analysis relates to the fact that asymptomatic colonization is typically assumed to be a precursor to symptomatic disease, and the SIS framework employed does not allow for colonization to impact transmission. All available evidence suggests that asymptomatic individuals transmit at a rate similar to those with invasive infection. Given that this is the case, how should the estimates of R0 provided by the authors be interpreted? In particular, given that asymptomatically colonized individuals may be infectious for long periods of time, this may result in a downward bias in the estimate of the infectious period and an upward bias in the estimated force of infection at each time point. The authors should address whether this omission of colonization impacts their R0 estimates in order to ensure that their results are reliable and clinically useful. With that said, I believe this issue is addressable via some additional assumptions about the relationship between the equilibrium prevalence of invasive disease and the equilibrium distribution of prevalence as well as the duration of colonization in the absence of invasive disease.

Other than these concerns, I do not have any other major questions about their analysis and found the section on optimal sampling to be a useful complement to the transmission modeling.

# Minor comments

1. P(t) is not defined as a probability anywhere; including this information would be helpful just from a clarity perspective.

Reviewer #3: The study entitled “Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data” by Lydeamore et al.
is a first effort in quantifying key infectious disease epidemiology parameters for impetigo, a bacterial skin infection. The authors linearized the equations of a compartmental model to obtain simple equations for these key quantities, which were subsequently estimated in three different settings using previously collected longitudinal datasets with a Markov chain Monte Carlo procedure in a Bayesian setting. My area of expertise being bacterial epidemiology, I am reviewing this paper from a conceptual point of view, and with the ambition to assess the real-world/biological plausibility of the authors' conclusions rather than the technical aspects (statistical/computational) of their method. Due to the multiple assumptions made by the authors (described in the following points 1, 2 and 3), I am doubtful that this analysis brings a clearer general understanding of the skin sores transmission dynamics, but reporting the method, the limitations faced and results in these specific settings is a first step towards further investigation in this area. The authors' conclusions are sound and the limitations encountered generally well presented, in my opinion. The paper is neatly structured and written in a clear and accessible language.

Major concerns:

1. The observed outcome is skin sores (present/absent at the time of examination). Yet, as stated by the authors at the beginning of their introduction, both S. aureus and S. pyogenes (GAS) can cause these lesions. Within the GAS species, multiple (unrelated) serotypes are likely circulating, and there is no molecular information available to assume that one single strain is being transmitted in the communities under study. Similarly, data relative to asymptomatic colonization are absent, so the authors should be extremely careful when discussing insights into GAS transmission.
I think reformulating the last sentence of the abstract as the last sentence of the Conclusion (leaving out GAS) and clarifying this point further with a paragraph in the Conclusion (around line 343, where the immunity and carriage are discussed) is necessary. Typically, the whole picture is likely much more complex, with multiple infections by the same serotype conferring immunity to that serotype (Pandey et al. Streptococcal Immunity Is Constrained by a Lack of Immunological Memory following a Single Episode of Pyoderma, 2016, PLoS Patog.), but not at all to other strains circulating. I do not advise advancing such speculations and do not see how they could improve their model in that direction with the data available, but they should clarify the limitations of having a symptom of infection (skin sores) as outcome, rather than the actual pathogen identification.

2. Unfortunately, as stated by the authors in their conclusion, the condition of disease dynamics equilibrium on which the analysis is based is likely to be violated in real-world settings. This weakens the plausibility of the estimations obtained. Is there any evidence in one of the three settings under observation that this condition was likely fulfilled during the period of the data collection?

3. Are the patient ages at presentation known? If yes, why was an age structure not considered, given that impetigo is age-dependent? In the Supporting Information section, the file “Patient ages at presentation” does not correspond to their actual age but empirical observation times. This should be corrected to avoid confusion. In the conclusion, the higher prevalence/force of infection in the young children dataset could be further discussed.

4. The choice of the parameter values used to simulate the population requires justification. A table summarizing these parameters and justifying the chosen values (with citations if necessary) would be helpful for the reader.

Minor concerns:

Concerning the formulation

- Line 22: I suggest using “prior generation” instead of “generation prior” for clarity's sake.
- Line 157: “It is noted that the estimate of infectious period in any modern setting will be augmented by treatment”. This sentence is unclear, and no reference is given. Intuitively I would agree with the opposite: “Thus, this estimate of the infectious period is influenced by treatment, and so is likely to be lower than the natural infectious period” (Line 351). The authors should clarify.
- Line 161: “It is important to note that it has been proven impossible to test whether or not a sampling scheme is ignorable.” This is a very strong statement, again lacking a reference to support it. The authors should reformulate this sentence and provide a reference.
- Line 219: “The results are visually similar”. This statement could be more quantitative.
- Line 230: “However, as the prevalence is observed at each survey visit,”. This sentence is unclear to me.

Concerning the data presented (figures, legends)

- Figure 1. Visualizing the population age distributions for each dataset on the side would be helpful.
- Figure 2. Adding a y-axis title, such as Daily observations, 1 year/PHN empirical observation distribution, 1 year, to differentiate plot A from plot B at first glance would be helpful. Visualizing the observation distribution from the RC dataset would already highlight the point made in the following figure (even if they are far from the true values, as expected). As such, its inclusion in the figure would be welcomed.
- Figure 3. The unit [days] on the x-axis is missing. Furthermore, visualizing the corresponding time to horizon (although the calculation is as simple as Sampling interval x 20) would be helpful for the reader.
- Figure 4. Using the same x-axis values as in Figure 2 would ease comparison of the two figures.
- The legend of Figure 5 is identical to that of Figure 2, except that in legend 2, Marginal posterior distributions are mentioned while marginal posteriors are mentioned in legend 5. From my understanding, Fig 2B and Figure 5A are redundant (just posterior distributions obtained from different randomly obtained populations). Is that correct? Or is there any conceptual difference between these? If yes, it is unclear. Again, labeling the y-axis would enhance the clarity of the figure at first glance.
- Figure 6. Visualizing the same plot for the HH dataset (even though it might look like the PHN one) would add some support for the reader.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Martin Bootsma
Reviewer #2: No
Reviewer #3: No

10 Dec 2019

Submitted filename: parameterestimation-reviewed-responsetoreviewers.pdf

21 Jan 2020

Dear Dr Lydeamore,

Thank you very much for submitting your manuscript "Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data" for consideration at PLOS Computational Biology.
As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roger Dimitri Kouyos
Associate Editor
PLOS Computational Biology

Rob De Boer
Deputy Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1:

Answer 1.1. Obtaining a reliable non-parametric estimate of the distribution of the duration of the infectious period may require more data than is available. However, one could use another family of distributions and test which distribution gives the best fit (using e.g., AIC/DIC). This at least gives some hints whether the assumption of the exponentially distributed infectious period makes sense.

Answer 1.2. In [27], it is shown that based on data it can never be proven that a scheme is ignorable, i.e., there always exists a stochastic process for which the sampling scheme is ignorable. However, in this setting, the stochastic process is given (up to the values of the parameters, i.e., it is an SIS-model). Given the stochastic process, one can test whether the scheme is ignorable, so I think stressing the fact that one cannot determine whether a sampling scheme is ignorable is misleading in this case. I do like the addition on the between-presentation times.

Answer 1.3/1.4. I meant that if the status at the first presentation is known, this also contains information. If the system is in equilibrium (as is assumed), the probability that a patient is positive is \lambda/(\lambda+\gamma). This information is not used in the likelihood. Later on, the authors argue that the recovery rate for participants may be higher than for the general population due to treatment, and hence, \gamma may change once the participant enters the cohort, but this is not discussed when the likelihood is created.

Answer 1.8. In line 52, the authors mention that the dimension of the state space is N; I think it should be N+1 (there can be 0, 1, …, N infectious individuals).
When I commented on the fact that taking the matrix exponential of an NxN-matrix is very doable when N=300, the authors replied that the dimension is actually 301^2x301^2. Why is this not corrected in the text? I also do not understand why the dimension should be 301^2x301^2. If each individual is explicitly present, I would expect 2^{N} different states and not 301^2.

New comments:

1) I noticed that the notation used in section 2 is confusing. In Table 1, S and I represent the number of individuals who are susceptible and infectious, respectively. However, the quasi-equilibrium I^* assumes that I is the fraction of the population; also, the formula for R_0 assumes that the force of infection is \beta I with I the fraction of the population who is infectious (if I is the number of infected individuals, R_0=\beta N/\gamma). The authors should stick to a single interpretation of I and S (either numbers or fractions) and use a different symbol (e.g., i and s) for the other.

2) The numbering of the sections is strange; there are two sections 2.1, for instance.

3) “The first of these conditions effectively means that the probability that an individual is either susceptible or infectious at time t_j, given all past information, is independent of t_j, and all past examinations.” To me, this is not what the first condition means. It says that the infection status at time t_{i,j} only depends on the status at time t_{i,j-1} and on t_{i,j-1} and t_{i,j}, but not on whether there is a sampling at time t_{i,j} or on previous sampling times. It is, however, dependent on the time since the last known status, so it does depend on t_j.

Reviewer #2: I am comfortable with the responses to my comments and the changes the authors have made in response to them.

Reviewer #3: The authors have addressed my comments, I am satisfied with the current version of the manuscript.

**********

Have all data underlying the figures and results presented in the manuscript been provided?
Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: None
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Martin Bootsma
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

3 Mar 2020

Submitted filename: ParameterEstimation-ResponseToReviewers2-200221.docx

1 Apr 2020

Dear Dr Lydeamore,

We are pleased to inform you that your manuscript 'Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,

Roger Dimitri Kouyos
Associate Editor
PLOS Computational Biology

Rob De Boer
Deputy Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: I am happy with the proposed changes. I feel the model and its limitations are discussed in suitable detail for a reader to properly judge the results.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Reviewer #1: Yes

**********

Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Martin Bootsma

28 Sep 2020

PCOMPBIOL-D-19-01219R2

Estimation of the force of infection and infectious period of skin sores in remote Australian communities using interval-censored data

Dear Dr Lydeamore,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production.
Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

30 in total

1. Characterising pandemic severity and transmissibility from data collected during first few hundred studies.

Authors: Andrew J Black; Nicholas Geard; James M McCaw; Jodie McVernon; Joshua V Ross
Journal: Epidemics Date: 2017-01-19 Impact factor: 4.396

2. Low rates of streptococcal pharyngitis and high rates of pyoderma in Australian aboriginal communities where acute rheumatic fever is hyperendemic.

Authors: Malcolm I McDonald; Rebecca J Towers; Ross M Andrews; Norma Benger; Bart J Currie; Jonathan R Carapetis
Journal: Clin Infect Dis Date: 2006-08-09 Impact factor: 9.079

3. Natural history of impetigo. II. Etiologic agents and bacterial interactions.

Authors: A S Dajani; P Ferrieri; L W Wannamaker
Journal: J Clin Invest Date: 1972-11 Impact factor: 14.808

4. Statistical inference for infectious diseases. Risk-specific household and community transmission parameters.

Authors: I M Longini; J S Koopman; M Haber; G A Cotsonis
Journal: Am J Epidemiol Date: 1988-10 Impact factor: 4.897

5. The natural history of streptococcal skin infection: prevention with topical antibiotics.

Authors: J S Maddox; J C Ware; H C Dillon
Journal: J Am Acad Dermatol Date: 1985-08 Impact factor: 11.527

6. Salmonella fecal shedding and immune responses are dose- and serotype- dependent in pigs.

Authors: Renata Ivanek; Julia Österberg; Raju Gautam; Susanna Sternberg Lewerin
Journal: PLoS One Date: 2012-04-16 Impact factor: 3.240

7. The Global Epidemiology of Impetigo: A Systematic Review of the Population Prevalence of Impetigo and Pyoderma. (Review)

Authors: Asha C Bowen; Antoine Mahé; Roderick J Hay; Ross M Andrews; Andrew C Steer; Steven Y C Tong; Jonathan R Carapetis
Journal: PLoS One Date: 2015-08-28 Impact factor: 3.240

8. Impact of an Ivermectin Mass Drug Administration on Scabies Prevalence in a Remote Australian Aboriginal Community.

Authors: Thérèse M Kearns; Richard Speare; Allen C Cheng; James McCarthy; Jonathan R Carapetis; Deborah C Holt; Bart J Currie; Wendy Page; Jennifer Shield; Roslyn Gundjirryirr; Leanne Bundhala; Eddie Mulholland; Mark Chatfield; Ross M Andrews
Journal: PLoS Negl Trop Dis Date: 2015-10-30

9. Implications of asymptomatic carriers for infectious disease transmission and control.

Authors: Rebecca H Chisholm; Patricia T Campbell; Yue Wu; Steven Y C Tong; Jodie McVernon; Nicholas Geard
Journal: R Soc Open Sci Date: 2018-02-14 Impact factor: 2.963

10. Model selection for seasonal influenza forecasting.

Authors: Alexander E Zarebski; Peter Dawson; James M McCaw; Robert Moss
Journal: Infect Dis Model Date: 2017-01-10