Heterogeneity in the onwards transmission risk between local and imported cases affects practical estimates of the time-dependent reproduction number.

R Creswell¹, D Augustin¹, I Bouros¹, H J Farm¹, S Miao², A Ahern², M Robinson¹, A Lemenuel-Diot³, D J Gavaghan¹, B C Lambert¹, R N Thompson^4,5.

Abstract

During infectious disease outbreaks, inference of summary statistics characterizing transmission is essential for planning interventions. An important metric is the time-dependent reproduction number (Rt), which represents the expected number of secondary cases generated by each infected individual over the course of their infectious period. The value of Rt varies during an outbreak due to factors such as varying population immunity and changes to interventions, including those that affect individuals' contact networks. While it is possible to estimate a single population-wide Rt, this may belie differences in transmission between subgroups within the population. Here, we explore the effects of this heterogeneity on Rt estimates. Specifically, we consider two groups of infected hosts: those infected outside the local population (imported cases), and those infected locally (local cases). We use a Bayesian approach to estimate Rt, made available for others to use via an online tool, that accounts for differences in the onwards transmission risk from individuals in these groups. Using COVID-19 data from different regions worldwide, we show that different assumptions about the relative transmission risk between imported and local cases affect Rt estimates significantly, with implications for interventions. This highlights the need to collect data during outbreaks describing heterogeneities in transmission between different infected hosts, and to account for these heterogeneities in methods used to estimate Rt. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.

Keywords: COVID-19; SARS-CoV-2; branching processes; imported cases; mathematical modelling; reproduction number

Year: 2022; PMID: 35965464; PMCID: PMC9376709; DOI: 10.1098/rsta.2021.0308

Source DB: PubMed; Journal: Philos Trans A Math Phys Eng Sci; ISSN: 1364-503X; Impact factor: 4.019

Introduction

Mathematical and computational models have been used during the COVID-19 pandemic to infer changes in transmissibility and to plan public health measures [1-7]. An important metric for assessing the effectiveness of current interventions during outbreaks is the time-dependent reproduction number (R—sometimes referred to informally as the ‘R number’), which represents the expected number of infections generated by someone infected at time t over the course of their infectious period [8-17]. This quantity varies during an outbreak in response to factors affecting transmission such as changes in public health measures, varying population immunity and pathogen evolution. If R remains below one, the number of cases each day will decrease; if instead R is persistently above one, the outbreak will grow. In the UK, the government has published estimates of R throughout the COVID-19 pandemic [18] alongside other values such as estimates of the epidemic growth rate and daily numbers of new reported cases, hospitalizations and deaths. Different formal definitions of R have been proposed, most notably the instantaneous reproduction number and the case reproduction number [19]. The instantaneous reproduction number represents the expected number of infections generated (over the course of their infectious period) by someone who is infected at time t if transmission conditions do not change in the future (i.e. this quantity is a measure of instantaneous transmissibility). The case reproduction number, on the other hand, reflects the expected number of infections generated by someone who is infected at time t but accounts for changes in transmissibility that occur after time t (e.g. the subsequent introduction of public health measures). 
The instantaneous reproduction number has been proposed as the most appropriate definition to use for real-time inference, as this quantity reflects current transmissibility and does not require future changes in transmission conditions to be known [11]. For that reason, we use this definition of R for our analyses in this manuscript. A range of methods has been developed for estimating R from outbreak data [11,12,20,21]. Two common approaches are the Cori method [8,9] and the Wallinga–Teunis method [22], which involve inferring the value of R from disease incidence time-series (i.e. time-series describing the number of new cases every day) and an estimate of the serial interval distribution (representing the time period between successive cases; specifically, the difference between the symptom onset times of infectors and infectees). Irrespective of the precise approach used to infer R, estimates can be updated and tracked as additional data become available during an outbreak. Recent developments in the theory of R estimation include accounting for reporting delays [7] and considering the impacts of temporal changes in the serial interval [23]. Another consideration is the potential for heterogeneity in R between different subgroups in the population. The COVID-19 pandemic has highlighted that individuals in different settings (e.g. care homes as opposed to the wider population [24]) or with different characteristics (e.g. different ages [10,25-28] or vaccination statuses [29,30]) face different risks of both becoming infected and transmitting the virus. Shortly before the COVID-19 pandemic, the Cori method was extended to account for differences in the source locations of local and imported cases [9], but with an assumption that the expected numbers of onwards transmissions from each local case and each imported case are identical. 
With that assumption, that work illustrated that failing to differentiate between local and imported cases can lead to overestimation of the number of local infections and therefore overestimation of R [9]. Apart from their different origins, local and imported cases can differ in other ways. The risk of onwards transmission from an imported case may be different to the risk from a local case [31]. Imported cases may have visited regions with high case numbers and therefore respond more quickly to early signs of disease, isolating as soon as symptoms develop. This effect might be especially pronounced when a pathogen has first arrived in the local host population, when the infection risk may be higher outside the local population than within it. Imported cases may also be subject to increased testing for infection or pre-emptive home quarantine following travel, thereby lowering the risk of onwards transmission [32]. On the other hand, individuals who travel frequently may be likely to have more contacts with others than those who do not, potentially leading to a higher risk of onwards transmission for imported cases. For example, business travellers may participate in large numbers of meetings, thereby coming into contact with many other people. In either situation, an assumption that R is identical for both local cases and imported ones, as made previously [9], is not always appropriate. In principle, the same disease incidence time-series can occur with different divisions of the transmission risk between local and imported infectors (figure 1). This has implications for pathogen control, since a scenario with substantial local transmission requires localized public health measures to disrupt chains of transmission and prevent spread. By contrast, a scenario with high transmission from imported cases may motivate travel restrictions to prevent importations. 
Here, we modify the Cori method to allow local and imported cases to have unequal risks of generating new infections. We analyse disease incidence time-series recorded during the COVID-19 pandemic in different locations. Our main goal is not to provide a novel methodological approach for estimating R, but rather to explore as simply as possible the potential consequences for estimates of R of failing to account for differences in the onwards transmission risk from local and imported cases. To allow other researchers to repeat our analyses for similar data, we provide an open-source Python software library including a user-friendly web interface (https://sabs-r3-epidemiology.github.io/branchpro). Our research demonstrates the importance of accounting for differences in the transmission risk between imported and local cases. More widely, it indicates that careful consideration of heterogeneity in the transmission risk between population subgroups may be necessary to make robust public health policy decisions.

Figure 1

A disease incidence time-series dataset can be generated by different combinations of transmission risks from imported and local cases. In the first scenario (bottom left), observed cases are mostly due to infections by imported cases, whereas in the second scenario (bottom right), observed cases are mostly due to infections by local cases. In the bottom panels, red arrows represent infections generated by imported cases and black arrows represent infections generated by local cases. An individual who is infected by an imported case is classified as a local case, since they have themselves been infected locally. Despite the same overall incidence, the two scenarios shown correspond to different risks of sustained local transmission (the risk of sustained local transmission is higher in the second scenario—bottom right), with implications for public health measures. (Online version in colour.)

Methods

Inference of the time-dependent reproduction number

We modify the Cori method for estimating $R_t$ [8,9] to account for differences in the onwards transmission risk between cases that arise locally compared with those originating elsewhere. In the underlying transmission model, new cases occur according to a time-varying branching process in which each local case is assumed to generate $R_t$ new infections on average, and each imported case is expected to generate $\varepsilon R_t$ new infections on average, where ε ≥ 0 indicates the relative transmission risk from an imported case compared with a local case. Here, we assume that $R_t$ is the instantaneous reproduction number [8,9], representing the expected number of cases that an individual infected at time t is likely to generate over the course of their infection assuming that future pathogen transmissibility is fixed at the current level. Our focus is on estimating the extent of local transmission (i.e. the local time-dependent reproduction number [21]) characterized by $R_t$. As has been proposed previously [21], the value of $R_t$ therefore reflects the potential for local transmission of the pathogen (rather than being an averaged quantity across both local and imported cases). Here, ε < 1 means that an imported case is responsible for fewer infections (on average) than a local case, whereas ε > 1 indicates that an imported case generates more infections.

The total number of new cases arising at time step t can be split according to the sources of infection,

$$I_t = I_t^{\text{local}} + I_t^{\text{imported}},$$

where $I_t^{\text{local}}$ represents the number of new cases who were infected within the local population and $I_t^{\text{imported}}$ represents the number of new cases who were infected elsewhere. The expected number of local cases at time step t (measured in days) is then given by

$$E\left(I_t^{\text{local}} \mid I_{\leq t-1}, w, \varepsilon, R_t\right) = R_t \sum_{s=1}^{t} w_s \left(I_{t-s}^{\text{local}} + \varepsilon I_{t-s}^{\text{imported}}\right).$$

In this expression, the vector w is the (discrete) serial interval distribution with entries $w_s$ (which characterizes the times between successive cases in a chain of transmission; $w_1$ is the probability that the serial interval is one day, $w_2$ is the probability that the serial interval is two days, and so on). We define the transmission potential at time step t to represent the expected number of local cases arising at time step t if $R_t = 1$. Thus, the transmission potential at time step t is given by

$$\Lambda_t(w, \varepsilon) = \sum_{s=1}^{t} w_s \left(I_{t-s}^{\text{local}} + \varepsilon I_{t-s}^{\text{imported}}\right).$$

We assume that the number of local cases in time step t is drawn from a Poisson distribution with mean $R_t \Lambda_t(w, \varepsilon)$. Hence, the probability of observing the local incidence over a time window including τ + 1 days (assuming that $R_t$ is constant during that time window), conditional each day on all previous incidence data, is given by

$$\prod_{k=t-\tau}^{t} P\left(I_k^{\text{local}} \mid I_{\leq k-1}, w, \varepsilon, R_t\right) = \prod_{k=t-\tau}^{t} \frac{\left(R_t \Lambda_k(w, \varepsilon)\right)^{I_k^{\text{local}}} \exp\left(-R_t \Lambda_k(w, \varepsilon)\right)}{I_k^{\text{local}}!}.$$

Data describing daily numbers of imported cases enter this expression through $\Lambda_k(w, \varepsilon)$. The model therefore reflects how local cases arise using information about historical numbers of local and imported cases.

Assuming a gamma distributed prior for $R_t$, the posterior distribution for $R_t$ over the time window [t − τ, t], conditional on w, ε and the observed incidence data (denoted $p(R_t \mid w, \varepsilon, I_{\leq t})$; we represent this by p(·) rather than P(·) since the posterior is a continuous probability density function), is also a gamma distribution due to prior-likelihood conjugacy (see Cori et al. [8] and Thompson et al. [9] for further details). Specifically,

$$p\left(R_t \mid w, \varepsilon, I_{\leq t}\right) = \text{gamma}\left(R_t,\ \alpha + \sum_{k=t-\tau}^{t} I_k^{\text{local}},\ \beta + \sum_{k=t-\tau}^{t} \Lambda_k(w, \varepsilon)\right),$$

where for notational convenience, here and above we have combined the disease incidence data into the variable $I_{\leq t} = \{(I_k^{\text{local}}, I_k^{\text{imported}})\}_{k=0}^{t}$. In this expression, the parameters α > 0 and β > 0 are the shape and rate parameters of the gamma prior distribution for $R_t$. The function gamma(x, a, b) corresponds to the probability density function of a gamma distribution with shape parameter a and rate parameter b, so that

$$\text{gamma}(x, a, b) = \frac{b^{a}}{\Gamma(a)} x^{a-1} e^{-bx}.$$

The inferred posterior, $p(R_t \mid w, \varepsilon, I_{\leq t})$, is based on local infectees appearing in the incidence data in the estimation window [t − τ, t], infected by local or imported infectors appearing in the incidence data at any time in [0, t − 1]. Estimates of $R_t$ at successive time steps are generated by shifting the estimation window by one time step and repeating the inference procedure. The purpose of this estimation window (rather than estimating $R_t$ based on infectees appearing in the incidence time-series on day t alone) is to increase the smoothness of successive $R_t$ estimates, instead of inferring variations in $R_t$ due to the inherent randomness in the epidemiological system (or any other factor affecting the numbers of cases observed each day; for example, daily fluctuations in the proportion of cases that are reported). This comes at the cost of missing changes in transmission occurring at a fine temporal resolution [8].
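The conjugate gamma update described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the branchpro library: the function names (`transmission_potential`, `posterior_params`), the zero-based day indexing and the toy incidence values are all hypothetical. The default prior parameters (shape 1, rate 0.2) are the ones stated later in this paper.

```python
from typing import Sequence, Tuple


def transmission_potential(local: Sequence[float], imported: Sequence[float],
                           w: Sequence[float], epsilon: float, t: int) -> float:
    """Lambda_t = sum over s of w_s * (local[t - s] + epsilon * imported[t - s]).

    w[0] holds w_1 (serial interval of one day); entries beyond len(w)
    are treated as zero (an assumption of this sketch).
    """
    total = 0.0
    for s in range(1, min(t, len(w)) + 1):
        total += w[s - 1] * (local[t - s] + epsilon * imported[t - s])
    return total


def posterior_params(local: Sequence[float], imported: Sequence[float],
                     w: Sequence[float], epsilon: float, t: int, tau: int,
                     alpha: float = 1.0, beta: float = 0.2) -> Tuple[float, float]:
    """Shape and rate of the gamma posterior for R over the window [t - tau, t]."""
    days = range(t - tau, t + 1)
    shape = alpha + sum(local[k] for k in days)
    rate = beta + sum(transmission_potential(local, imported, w, epsilon, k)
                      for k in days)
    return shape, rate


# Toy data (hypothetical, not taken from any dataset in the paper).
local = [0, 2, 3, 4]
imported = [4, 1, 0, 0]
w = [0.5, 0.5]

shape, rate = posterior_params(local, imported, w, epsilon=0.5, t=3, tau=1)
posterior_mean = shape / rate  # mean of a gamma distribution with this shape and rate
```

In this sketch, increasing `epsilon` raises the rate parameter while leaving the shape unchanged, so the posterior mean (shape divided by rate) falls. That is the mechanism by which a larger assumed relative transmission risk for imported cases lowers the estimate of local transmission.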

Accounting for uncertainty in the serial interval distribution

The approach described above involves estimating $R_t$ using disease incidence time-series and an estimate of the serial interval distribution, accounting for differences in both the source location of infection and onwards transmission risk between local and imported cases. However, there is often significant uncertainty in the serial interval distribution. To account for this, we consider a scenario in which there is a set of n equally plausible serial interval distributions, $\{w^{(i)}\}_{i=1}^{n}$. For a single value of i, the entries $w_s^{(i)}$ of the vector $w^{(i)}$ correspond to the probability that the serial interval takes the value s days, conditional on $w^{(i)}$ being the true serial interval distribution. In our analyses of COVID-19 data, we use a set of equally plausible serial intervals, $\{w^{(i)}\}_{i=1}^{n}$ with n = 1000, obtained from a previous study (see below). To account for this uncertainty in the serial interval distribution when estimating $R_t$, we first estimate $R_t$ separately for each plausible serial interval distribution, $w^{(i)}$, giving the conditional posterior distribution $p(R_t \mid w^{(i)}, \varepsilon, I_{\leq t})$. We then combine these estimates to give a posterior distribution for $R_t$ accounting for this uncertainty by calculating

$$p\left(R_t \mid \varepsilon, I_{\leq t}\right) = \frac{1}{n} \sum_{i=1}^{n} p\left(R_t \mid w^{(i)}, \varepsilon, I_{\leq t}\right).$$
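Averaging over equally plausible serial interval distributions yields an equally weighted mixture of gamma posteriors. The sketch below shows one way to summarize such a mixture using only the Python standard library. It assumes the per-distribution shape and rate parameters have already been computed; the helper names and the two example parameter pairs are hypothetical. The mixture mean is exact, whereas the credible interval is approximated by Monte Carlo sampling, which is a choice made for this example and not necessarily the approach used by the authors.

```python
import random
from typing import List, Sequence, Tuple


def mixture_mean(params: Sequence[Tuple[float, float]]) -> float:
    """Exact mean of an equally weighted mixture of gamma(shape, rate) densities."""
    return sum(shape / rate for shape, rate in params) / len(params)


def sample_mixture(params: Sequence[Tuple[float, float]],
                   n_samples: int, seed: int = 0) -> List[float]:
    """Draw from the mixture: pick a component uniformly, then sample from it."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        shape, rate = rng.choice(params)
        # random.gammavariate takes (shape, scale), so the rate is inverted.
        draws.append(rng.gammavariate(shape, 1.0 / rate))
    return sorted(draws)


def central_interval(sorted_draws: Sequence[float],
                     level: float = 0.95) -> Tuple[float, float]:
    """Approximate central credible interval from sorted Monte Carlo draws."""
    n = len(sorted_draws)
    lower = sorted_draws[int((1.0 - level) / 2.0 * n)]
    upper = sorted_draws[int((1.0 + level) / 2.0 * n) - 1]
    return lower, upper


# Hypothetical posteriors obtained under two candidate serial intervals.
params = [(8.0, 5.2), (8.0, 6.7)]
mean_r = mixture_mean(params)
lower, upper = central_interval(sample_mixture(params, n_samples=5000, seed=1))
```

Because every component has weight 1/n, choosing a component uniformly at random before sampling reproduces the averaged posterior without evaluating any density explicitly.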

Data and parameterization

In our main analyses, we consider five disease incidence time-series datasets collected in different locations during the COVID-19 pandemic. The key feature of these datasets is that information is available which allows locally originating cases to be differentiated from those infected elsewhere. The datasets are as follows:

Ontario, Canada (figure 2a(i)). Incidence data were obtained for the time period from 1 March to 20 April 2020 [33]. Cases were classified as imported if they reported travelling outside Ontario within 14 days prior to symptom onset. Cases with unknown recent travel status were assumed to have been infected locally.

Figure 2

Inference of the local reproduction number (R) under different assumptions about the relative transmission risk from imported and local cases. (a) The COVID-19 incidence time-series datasets used in our main analyses, for Ontario (i), New South Wales (ii) and Victoria (iii). Black bars represent the daily numbers of local cases, and pink bars represent the daily numbers of imported cases. (b) Inferred R values for different assumed values of the relative transmission risk from an imported case compared with a local case (ε). The grey horizontal line represents the threshold R = 1, and shaded regions represent the 95% central credible interval of the R estimates. (Online version in colour.)

New South Wales, Australia (figure 2a(ii)). Incidence data were obtained for the time period from 1 March to 13 April 2020. Cases were classified as imported if they were reported as ‘overseas acquired’ in the Australian national COVID-19 database (see [32] for further details). Cases with unknown origin were assumed to have been infected locally.

Victoria, Australia (figure 2a(iii)). Details as above for New South Wales.

Hong Kong (figure 4a(i)). Incidence data were obtained for the time period from 23 January to 24 March 2020 [34]. Cases were classified as imported if they were listed as ‘imported case, confirmed’ in the Hong Kong Department of Health COVID-19 database (see [35] for further details). All other cases were classified as local cases.

Figure 4

Inference of the local reproduction number (R) for estimated values of the relative transmission risk from imported and local cases. (a) The COVID-19 incidence time-series datasets used in our main analyses, for Hong Kong (i) and Hainan Province, China (ii). Black bars represent the daily numbers of local cases, and pink bars represent the daily numbers of imported cases. (b) Inferred R values for different assumed values of the relative transmission risk from an imported case compared with a local case (ε), for Hong Kong (i) and Hainan Province (ii). The grey horizontal line represents the threshold R = 1, and shaded regions represent the 95% central credible interval of the R estimates. The values ε = 0.2 for Hong Kong and ε = 0.785 for Hainan were estimated from alternative data sources, as described in the text. (Online version in colour.)

Hainan Province, China (figure 4a(ii)). Incidence data were obtained for the time period from 22 January to 20 February 2020 [36]. Cases were classified as imported if they either reported travel outside Hainan Province in the 14 days prior to symptom onset or reported any recent travel to a known COVID-19 outbreak area. All other cases were classified as local cases.

We chose to analyse the first three datasets in the main text due to their differing outbreak trajectories in the time periods considered. Specifically, the Ontario dataset represents a growing outbreak, the New South Wales dataset represents a full outbreak wave with a large number of imported cases compared with local cases and the Victoria dataset represents a full outbreak wave with more local cases than imported ones. We chose to analyse the fourth and fifth datasets because further information was available from those locations with which it was possible to approximate the value of ε. This allowed us to demonstrate inference of R in scenarios in which the relative transmission risk from imported and local cases is known. 
In addition to our analyses in the main text, we considered datasets from other locations and display similar analyses in the electronic supplementary material, figures S1–S6; specifically, we considered COVID-19 disease incidence time-series datasets from five other Australian states, New Zealand and Hawaii, and we considered a disease incidence time-series dataset for MERS in Saudi Arabia in 2014–2015. The key feature of all these datasets is that information was available with which to classify cases as either local or imported. In the analysis of MERS in Saudi Arabia, imported cases were not those who had arrived from a geographically distinct location. Instead, in that analysis, imported cases were those who were likely to have been infected directly from the animal reservoir. For the serial interval in all our analyses of COVID-19 incidence datasets, we considered an estimate for SARS-CoV-2 obtained by Nishiura et al. [37]. Specifically, those authors fitted a log-normal distribution to data from known infector–infectee transmission pairs using Markov chain Monte Carlo (MCMC), thereby obtaining a set of equally plausible possible serial interval distributions. We considered the set of serial interval distributions obtained by Nishiura et al. [37] using both certain and probable infector–infectee pairs while accounting for right-truncation (i.e. the possibility that a dataset detailing infector–infectee pairs observed when the outbreak is ongoing excludes some transmissions with longer serial intervals that have not yet occurred). For our inference procedure, we used n = 1000 randomly selected MCMC iterations from their analysis, where each iteration characterizes a continuous distribution. Since our approach considers the number of new cases each day, we require a discrete serial interval distribution. We therefore ‘discretized’ the continuous distributions into daily time steps using the method described by Cori et al. 
[8] (see web appendix 11 of that article). The set of n = 1000 serial interval distributions used in our analysis (i.e. $\{w^{(i)}\}_{i=1}^{n}$) is shown in the electronic supplementary material, figure S7. We fixed the parameters of the gamma distributed prior for R so that both the mean and standard deviation were equal to five (to do this, we chose α = 1 and β = 0.2). The rationale for this choice is that a large standard deviation ensures that the prior is relatively uninformative, while a high mean ensures that the outbreak is unlikely to be determined as under control (R < 1) unless there is substantial evidence from the data supporting this conclusion. In all of our analyses of COVID-19 incidence data, R was estimated using a weekly sliding window, so that τ = 6 days. In the figures, the posterior distribution for R shown on day t is based on a sliding window that ends on day t (i.e. the sliding window [t − τ, t]).
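The stated prior can be checked directly. For a gamma distribution with shape α and rate β, the mean is α/β and the standard deviation is √α/β, so α = 1 and β = 0.2 give a mean and standard deviation of five. With α = 1 the prior is an exponential distribution, so the prior probability that R is below one has the closed form 1 − exp(−β). The helper names below are hypothetical and the snippet is only a sanity check of these standard formulas.

```python
import math
from typing import Tuple


def gamma_mean_sd(shape: float, rate: float) -> Tuple[float, float]:
    """Mean and standard deviation of a gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


def exponential_prob_below(threshold: float, rate: float) -> float:
    """P(R < threshold) when the gamma prior has shape 1 (exponential case)."""
    return 1.0 - math.exp(-rate * threshold)


prior_mean, prior_sd = gamma_mean_sd(shape=1.0, rate=0.2)
prior_prob_controlled = exponential_prob_below(threshold=1.0, rate=0.2)
```

Under this prior, less than one fifth of the prior mass lies below R = 1, which is consistent with the stated aim that the outbreak should not be assessed as under control without support from the data.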

Correctness and reproducibility of results

We followed a range of software development practices to guard against coding errors and to ensure code reusability: these included collaborative coding using GitHub to manage merging of code via pull requests, unit testing of functions and classes (with 100% test coverage) and continuous integration testing. To ensure reproducibility of results, all analyses for this paper can be rerun by cloning our GitHub repository (https://github.com/SABS-R3-Epidemiology/transmission-heterogeneity-results) and executing a single command from the terminal.

Results

Effect of the relative transmission risk on estimates of R

To explore how different assumptions about the relative transmission risk from imported and local cases affect R estimates, we initially applied our method to data from the first three locations described in Methods (figure 2). We considered three different assumptions about the relative transmission risk. First, we assumed that imported cases were each expected to generate fewer infections than local cases (ε = 0.25; figure 2b—blue). Second, we assumed instead that imported cases were each expected to generate more infections than local cases (ε = 2; figure 2b—red). Third, we made the standard assumption [9] that the transmission risk from each local case was identical to the transmission risk from each imported case (ε = 1; figure 2b—black). These analyses highlight that different assumed values of ε lead to different inferred R values. As might be expected, assuming larger values of ε leads to smaller estimated values of R, since more transmission is then attributed to imported cases rather than local cases. We then went on to consider the implications for public health policy of differences in the relative transmissibility of imported and local cases. For the dataset from Ontario (figure 2a(i)), the numbers of local cases broadly increased throughout the time period considered. A key question in that setting is ‘Is R > 1?’, since this determines whether sustained local transmission is likely to occur. If so, fast detection that R > 1 is crucial to allow interventions to be introduced quickly to prevent further exponential growth of the outbreak. In figure 3a(i), posterior mean estimates of R each day from 8 March to 20 April 2020 are shown for a range of values of ε. The first date on which the mean estimate of R is above one and remains above one thereafter is shown for different values of ε in figure 3b(i) (grey). This indicates that a smaller assumed value of ε leads to an earlier conclusion that R is greater than one for this dataset. 
The proportion of the period considered for which the mean R estimate is above one also depends on the assumed value of ε (figure 3c(i)).
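The two summaries used here, the first date from which an estimate stays on one side of the threshold and the proportion of days with an estimate above the threshold, can be computed from a daily series of estimates as sketched below. The function names and the example values are hypothetical; the numbers are illustrative and are not taken from the Ontario analysis.

```python
from datetime import date, timedelta
from typing import Optional, Sequence


def first_persistent_crossing(values: Sequence[float], start: date,
                              above: bool = True,
                              threshold: float = 1.0) -> Optional[date]:
    """First date from which every remaining value stays above (or below) threshold.

    values[0] is the estimate for `start`; returns None if the final value
    is on the wrong side of the threshold.
    """
    first = None
    # Walk backwards from the last day while the condition keeps holding.
    for offset in range(len(values) - 1, -1, -1):
        holds = values[offset] > threshold if above else values[offset] < threshold
        if not holds:
            break
        first = offset
    return None if first is None else start + timedelta(days=first)


def proportion_above(values: Sequence[float], threshold: float = 1.0) -> float:
    """Fraction of days on which the estimate exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Illustrative daily posterior means (hypothetical values).
estimates = [0.8, 1.2, 0.9, 1.1, 1.3]
crossing = first_persistent_crossing(estimates, start=date(2020, 3, 8))
share_above = proportion_above(estimates)
```

Passing `above=False` gives the analogous date from which the estimate stays below the threshold. The same functions can be applied to a series of percentile estimates instead of posterior means.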

Figure 3

Implications of differences in the assumed relative transmission risk from imported and local cases on policymaking. (a) Inferred mean R values for different values of the relative transmissibility of imported cases compared with local cases (ε). (b) Dates on which the estimated values of R cross policy-relevant thresholds (in scenarios where the thresholds are crossed at some stage in the outbreak; otherwise dates are not plotted). For Ontario (i), the date shown represents the first date when the estimated R value is above one and remains above one for the remainder of the time period considered (until 20 April 2020). This represents the first date when the outbreak is not inferred to be under control for the remainder of the time period. For New South Wales (ii) and Victoria (iii), the date shown represents the first date on which the estimated R value is below one and remains so for the remainder of the time period considered (until 13 April 2020). This represents the first date on which the outbreak could be concluded as being under control for the remainder of the time period. (c) The proportion of the time periods considered for which the inferred R values are above one (so the outbreak is not inferred to be under control). In (b,c), results are shown for the mean values of the posterior for R (grey), as well as for the 2.5th (yellow dotted) and 97.5th (green dotted) percentile values of the posterior for R (which span the 95% central credible interval). (Online version in colour.)

While a policymaker may choose to strengthen control measures when the mean estimate of R increases above one, a more risk-averse choice could be to conclude that the outbreak is not under control if an upper percentile of the posterior distribution of R exceeds one. 
For example, for the Ontario dataset, when ε = 1.2, the mean estimate of R is (and remains) above one from 11 April 2020 onwards (figure 3b(i), grey), whereas the 97.5th percentile estimate of R remains above one from the earlier date of 23 March 2020 onwards (figure 3b(i), green dashed). By using an approach like the one described here, policymakers can adjust their decision making according to their chosen level of risk aversion. This simply involves specifying the percentile value of R to track to guide decision making regarding strengthening and relaxing public health measures. During the COVID-19 pandemic, public health measures have been relaxed in many regions and countries when the outbreak has been assessed as being under control. We therefore considered the incidence dataset from New South Wales and estimated when policymakers could conclude that R had fallen below one (figure 3b(ii)). In this scenario, a larger assumed value of ε led to an earlier date on which R was assessed to be below one (and remained below one thereafter). In order for policymakers to be more certain that R is below one when relaxing restrictions, one possibility is to conclude that R is below one when a high percentile value of the posterior for R has fallen below one. For example, for this dataset, if the mean estimate of R is considered and the value ε = 1.2 is assumed, then R is inferred to fall and remain below one on 15 March 2020 (figure 3b(ii), grey), whereas if instead the 97.5th percentile estimate of R is considered, then R is inferred to fall below one on the later date of 19 March 2020 (figure 3b(ii), green dashed). As the final component of these analyses, we considered the disease incidence time-series dataset from Victoria and repeated the analysis that we conducted for the dataset from New South Wales. 
We found that, if a high value of ε is assumed, then the outbreak is inferred to be under control (R < 1) for the majority of the time period under consideration (figure 3c(iii)). However, if instead the value of ε is lower, then R may be estimated to be greater than one early in the outbreak. For small values of ε, so that initial estimated values of R are high, the most policy-relevant question may again be to determine when R has fallen below one (figure 3b(iii)).
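Tracking an upper percentile of the posterior against the threshold is equivalent to tracking a tail probability: the 97.5th percentile of R exceeds one exactly when the posterior probability that R is above one is greater than 0.025. When the gamma posterior has an integer shape parameter (as happens with a prior shape of 1 and integer case counts), this tail probability has a closed form, sketched below. The function names are hypothetical, the example posterior parameters are illustrative, and this shortcut is offered as one possible implementation, not as the authors' code.

```python
import math


def prob_r_above(threshold: float, shape: int, rate: float) -> float:
    """P(R > threshold) for a gamma(shape, rate) posterior with integer shape.

    Uses the Erlang survival function:
    sum_{j=0}^{shape-1} exp(-x) * x**j / j!  with  x = rate * threshold.
    """
    if shape < 1 or int(shape) != shape:
        raise ValueError("shape must be a positive integer for this closed form")
    x = rate * threshold
    term = math.exp(-x)  # j = 0 term
    total = 0.0
    for j in range(int(shape)):
        total += term
        term *= x / (j + 1)  # turn the j-th term into the (j + 1)-th term
    return total


def upper_percentile_exceeds_one(shape: int, rate: float,
                                 percentile: float = 97.5) -> bool:
    """True if the given upper percentile of the posterior for R is above one."""
    return prob_r_above(1.0, shape, rate) > 1.0 - percentile / 100.0


# Hypothetical posterior parameters for a single day.
not_under_control = upper_percentile_exceeds_one(shape=8, rate=5.2)
```

Lowering `percentile` towards 50 makes the rule less risk averse when deciding whether to strengthen measures. For relaxing measures, the analogous rule waits until the chosen upper percentile has fallen below one.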

Realistic values of the relative transmission risk

In the analyses presented in §3a, we demonstrated clearly that the assumed relative transmission risk between imported and local cases affects R estimates, impacting policy-relevant conclusions drawn from disease incidence time-series data. The relative transmission risk may differ between settings. In some scenarios, it may be possible to inform estimates of ε with real-world data. Here we provide two examples, in the context of SARS-CoV-2 transmission in Hong Kong and Hainan Province (the fourth and fifth disease incidence time-series datasets described in Methods). Additional possible approaches for estimating the value of ε are described in the Discussion. First, we considered the dataset from Hong Kong (figure 4a(i)). A previous study [35] reconstructed the transmission network of cases in that region (between 23 January 2020 and 8 January 2021; although in principle a similar analysis could be conducted at a smaller spatial scale for shorter time periods, as would likely be most useful for early real-time estimation of R), inferring the ‘outdegree’ of imported and local cases. Based on the aggregated data shown in table 1 of that study, the mean outdegree was 0.74 for imported cases and 3.68 for local cases, which corresponds to a value of ε = 0.2. We therefore compared estimated values of R for ε = 0.2 (figure 4b(i), green) with analogous estimates under the standard assumption that ε = 1 (figure 4b(i), black). Since a value of ε = 0.2 leads to less transmission being attributed to imported infections than when ε = 1, estimated values of R are higher when ε = 0.2. In terms of decision making during an ongoing outbreak, time periods when the mean estimated value of R is greater than one for ε = 0.2 and less than one for ε = 1 may be particularly concerning. In these periods, the outbreak might erroneously be inferred as being under control if the incorrect assumption that ε = 1 is made. 
In the analysis shown in figure 4b(i), this is the case for 20.8% of the time period considered. Of course, similarly to the analyses presented in §3a, analogous analyses could be performed based on different percentile estimates of R rather than the mean estimated value.

Figure 4. Inference of the local reproduction number (R) for estimated values of the relative transmission risk from imported and local cases. (a) The COVID-19 incidence time-series datasets used in our main analyses, for Hong Kong (i) and Hainan Province, China (ii). Black bars represent the daily numbers of local cases, and pink bars represent the daily numbers of imported cases. (b) Inferred R values for different assumed values of the relative transmission risk from an imported case compared with a local case (ε), for Hong Kong (i) and Hainan Province (ii). The grey horizontal line represents the threshold R = 1, and shaded regions represent the 95% central credible interval of the R estimates. The values ε = 0.2 for Hong Kong and ε = 0.785 for Hainan were estimated from alternative data sources, as described in the text. (Online version in colour.)

Second, we considered the dataset from Hainan Province, China. A previous study [36] compared the epidemiological features of imported and local cases in that province and found that imported cases tended to belong to older age groups than local cases. We applied a contact matrix for China [38] to the age distributions of imported and local cases, and thus estimated the expected number of contacts per day for imported cases (10.5) and local cases (13.4). To approximate the value of ε, we divided the expected number of contacts per day for imported cases by the analogous value for local cases, giving ε = 0.785. We then compared estimates of R for that more realistic value of ε = 0.785 (figure 4b(ii), green) with estimates of R under the standard assumption that ε = 1 (figure 4b(ii), black). 
Since a value of ε = 1 is only slightly larger than ε = 0.785, and the data from Hainan Province suggest only limited local transmission, we found that incorrectly assuming that ε = 1 did not have a substantial effect on inferred R values for this dataset.
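The two approximations of ε described above are simple ratios, and the "potentially misleading" share of the time period is a count over two series of mean R estimates. The following is a minimal sketch of that arithmetic, not the authors' code. The function names are hypothetical, and the two short R series are invented toy values used only to show the calculation. Note that the rounded contact numbers quoted in the text give a ratio of about 0.784; the reported value of 0.785 presumably reflects unrounded inputs, which is an assumption on our part.

```python
def relative_risk(imported_value, local_value):
    """Ratio of a per-case transmission proxy for imported versus local cases."""
    if local_value <= 0:
        raise ValueError("local_value must be positive")
    return imported_value / local_value


def fraction_misleading(mean_r_informed, mean_r_default):
    """Share of time points where mean R > 1 under the informed epsilon
    but mean R < 1 under the default assumption epsilon = 1."""
    if len(mean_r_informed) != len(mean_r_default) or not mean_r_informed:
        raise ValueError("series must be non-empty and of equal length")
    flagged = sum(
        1 for informed, default in zip(mean_r_informed, mean_r_default)
        if informed > 1.0 and default < 1.0
    )
    return flagged / len(mean_r_informed)


# Hong Kong: mean outdegree of imported (0.74) and local (3.68) cases.
eps_hong_kong = relative_risk(0.74, 3.68)
# Hainan: expected daily contacts of imported (10.5) and local (13.4) cases.
eps_hainan = relative_risk(10.5, 13.4)

print(round(eps_hong_kong, 3))  # 0.201
print(round(eps_hainan, 3))     # 0.784

# Toy (invented) mean R series, purely to illustrate the counting step.
toy_informed = [1.4, 1.2, 1.1, 0.9, 0.8]
toy_default = [1.1, 0.9, 0.95, 0.7, 0.6]
print(fraction_misleading(toy_informed, toy_default))  # 0.4
```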

Discussion

Summary statistics for tracking pathogen transmissibility are increasingly used during infectious disease outbreaks to guide decision making. Throughout the COVID-19 pandemic, R has been estimated in regions and countries worldwide (see e.g. [7]). This metric is useful and straightforward to interpret, corresponding to the number of individuals that one infected host is expected (on average) to go on to infect. As well as providing information about whether an outbreak is growing or declining, the value of R can be used to determine the proportion of transmissions that must be prevented for a growing outbreak to decline. In this article, we have presented a modified version of the commonly used Cori method for inferring R [8,9]. We have accounted for different transmission risks from local and imported cases, rather than assuming that the transmission risk is identical for individuals in these groups. We provide an accompanying online software tool for estimating R (https://sabs-r3-epidemiology.github.io/branchpro) where users can upload their own data (disease incidence time-series and an estimate of the serial interval distribution—or multiple equally plausible serial interval distributions as described in Methods). We have conducted a systematic analysis of the dependence of inferred R values on the assumed relative transmission risk from an imported case compared with a local case (ε; see figures 2 and 3). We also considered examples in which it was possible to approximate the value of ε from other data sources (figure 4). In general, larger assumed values of ε lead to smaller R estimates. This is important, since assuming an unrealistically high value of ε may lead to the outbreak being falsely determined as under control. When an outbreak is ongoing, we have shown that the speed at which local transmission can be inferred as being either under control or not depends on the assumed value of ε. 
This dependence on ε demonstrates clearly that whether or not an outbreak is under control cannot always be inferred accurately from summary statistics that do not account for differences in the transmission risk between imported and local cases (e.g. the growth rate of overall cases). We have also shown how different percentile estimates of R can be used to guide decision making, according to the policymaker's level of acceptable risk. A previous approach for estimating R allows infectees to have been infected either within or outside the local population [9]. However, in that framework, an assumption is made that the transmission risk from a local case is identical to the analogous risk from an imported case. The potential for different transmission risks from imported and local cases has implications for optimizing interventions, since if the risk of transmission is predominantly from imported cases, then travel restrictions and interventions that prevent transmissions from imported cases (e.g. quarantine of incoming travellers) may be the optimal measures. If instead the transmission risk is highest from local cases, then interventions such as social distancing and face coverings that reduce transmission from all infected individuals in the population may be necessary. In scenarios in which a novel pathogen variant is being imported into a new location from somewhere it is already widespread, the composition of variants causing local and imported cases might affect the relative transmission risk [39]. However, we note that in our modelling framework it is only the imported cases themselves that are assumed to represent a different transmission risk (rather than all infected individuals in a chain of transmission starting with an imported case). A recent, closely related study by Tsang et al. [31] involved estimating independent R values for local and imported cases throughout an outbreak. 
A benefit of that approach is that it does not require an assumption to be made about the relative transmission risk from each imported case compared with each local case. However, there are substantial logistical challenges to estimating independent R values for local and imported cases: this requires local cases who were infected by other local cases to be distinguished from those who were infected by imported cases. This may be possible either on a small scale or in locations with extensive contact tracing [31,40], but, in many situations, it is infeasible. In the absence of data with which to estimate R for local and imported cases independently, and without known changes in the relative transmission risk from imported compared with local cases, assuming a constant relative transmission risk between the two types of case, as we have done, seems reasonable. To obtain an idea about whether the relative transmission risk (i.e. the parameter ε in our model) is likely to be less than or greater than one, we considered examples in which we approximated ε using either a reconstructed transmission network or the age characteristics of local and imported cases (figure 4). In both examples that we considered, the estimated value of ε was less than one, suggesting a lower transmission risk from each imported case than from each local case in those settings. Other approaches for inferring ε are also possible. One way to estimate ε is to analyse data containing both local and imported cases in small-scale settings in which infector–infectee transmission pairs can be identified or estimated, such as household or contact tracing studies. Another option might be to perform forwards contact tracing on imported cases at a single stage of the outbreak. If the value of εR can be estimated from the contact tracing data at that stage, ε could then be estimated from the population-level incidence data. 
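To make the role of ε concrete, the sketch below shows one way a Cori-type estimator can be modified so that imported cases are down-weighted (or up-weighted) by ε. It assumes, as our reading of the approach rather than a transcription of the Methods, that new local cases on day t are Poisson distributed with mean R multiplied by the sum over lags s of w_s (I_local[t-s] + ε I_imported[t-s]), with a gamma prior on R. The function names, the window length, the prior parameters and the toy incidence data are all illustrative assumptions. The last helper encodes the standard threshold result that a fraction 1 - 1/R of transmissions must be prevented for a growing outbreak to decline.

```python
def infection_pressure(local, imported, w, epsilon, t):
    """Sum over lags s >= 1 of w[s-1] * (local[t-s] + epsilon * imported[t-s])."""
    return sum(
        w[s - 1] * (local[t - s] + epsilon * imported[t - s])
        for s in range(1, min(t, len(w)) + 1)
    )


def posterior_r(local, imported, w, epsilon, t, window=7,
                prior_shape=1.0, prior_scale=5.0):
    """Gamma posterior (shape, rate) for R over the window ending on day t.

    Conjugate update under the assumed Poisson renewal model; the prior
    values here are illustrative, not taken from the paper.
    """
    days = range(max(1, t - window + 1), t + 1)
    shape = prior_shape + sum(local[d] for d in days)
    rate = 1.0 / prior_scale + sum(
        infection_pressure(local, imported, w, epsilon, d) for d in days
    )
    return shape, rate


def posterior_mean_r(local, imported, w, epsilon, t, **kwargs):
    shape, rate = posterior_r(local, imported, w, epsilon, t, **kwargs)
    return shape / rate


def fraction_to_prevent(r):
    """Proportion of transmissions to prevent so that R falls to one."""
    if r <= 0:
        raise ValueError("r must be positive")
    return max(0.0, 1.0 - 1.0 / r)


# Toy (invented) daily incidence and serial interval weights.
local_cases = [0, 1, 1, 2, 3, 3, 4, 5, 6, 6]
imported_cases = [4, 3, 5, 2, 4, 3, 2, 3, 1, 2]
serial_interval = [0.2, 0.5, 0.3]

for eps in (0.2, 1.0):
    mean_r = posterior_mean_r(local_cases, imported_cases, serial_interval, eps, t=9)
    print(eps, round(mean_r, 2), round(fraction_to_prevent(mean_r), 2))
```

Because ε only enters the rate parameter of the posterior, a larger assumed ε always gives a smaller posterior mean whenever imported cases are present in the window, which is the qualitative behaviour described in the text.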
The contribution of imported cases to transmission is likely to vary by the time in the outbreak and by location [32,41]. In principle, estimates of ε could be updated based on the latest available contact tracing data. In our main analyses, we have considered scenarios in which imported cases are individuals who have been infected in other geographical locations. However, an imported case may be defined as any case with an infection source outside the local host population. In the electronic supplementary material, we consider an analysis of MERS cases in Saudi Arabia in 2014–2015 (electronic supplementary material, figures S5 and S6), where cases are likely to have arisen both via human-to-human transmission and from an animal reservoir (specifically, from dromedary camels [42]). In that analysis, imported cases are assumed to be those reporting regular contacts with camels. It is possible that those individuals typically live in less densely populated areas than individuals who do not have regular contacts with camels, meaning that the relative risk of an imported case transmitting the virus is lower on average than the analogous risk from a local case. Like our analyses of COVID-19 datasets, our analysis of the MERS incidence data illustrates that assumptions about the relative transmission risk between local and imported cases can affect estimates of R and conclusions about whether or not local human-to-human transmission is under control. We also conducted an additional supplementary analysis in which we generated synthetic epidemic datasets and investigated further the conditions under which mischaracterizing the relative transmissibility of imported and local cases affects estimates of R substantially. Specifically, we generated synthetic data for different values of ε and different strengths of local transmission. We calculated the error in estimates of R if the standard assumption that ε = 1 is made (electronic supplementary material, figure S8). 
This suggests that the largest errors occur when the relative transmissibility of imported (compared with local) cases differs substantially from one, and when imported cases represent a high proportion of the overall cases observed in the population. In the research that we have presented, we sought to explore the relationship between heterogeneities in the onwards transmission risk between different groups of infectious individuals and inferred values of R. Practical applications of this approach should consider incorporating additional features into the modelling framework. An important consideration when assessing pathogen transmissibility during outbreaks is that R represents the average number of onwards infections over multiple infected individuals and transmission events. However, different infected individuals may generate very different numbers of infections [10,43-45]. The potential for super-spreading events at which large numbers of infections occur could be built into the underlying transmission model and into the resulting R estimates, although it may then be impossible to generate an analytic expression for the posterior for R. We sought to demonstrate the general principle that population heterogeneity can affect estimates of R. To do this as simply as possible, we used a model with only two groups of infected hosts (i.e. local and imported cases) and assumed that individuals are classified accurately as either local or imported. However, the classification of hosts into distinct groups may be imperfect (as considered elsewhere in this theme issue [46]), and many different sources of heterogeneity exist within host populations. There may be substantial differences in the transmission risk between other subgroups of the population: for example, risk may vary by age [10,25,26] and vaccination status [29]. Geographically distinct populations could be linked in a transmission model, so that spatial heterogeneity in R can be explored. 
In principle, compartmental models can be developed in which a range of different sources of heterogeneity are included, and R may be estimated using those compartmental models. It might also be possible to include further sources of heterogeneity in a renewal equation framework as studied here. These possibilities represent interesting avenues for future research. Here, we assumed that the data represent disease incidence time-series, and that the serial interval (the time between successive symptomatic cases in a transmission chain) is always positive. In reality, pre-symptomatic infections occur, and serial intervals may take negative values [47-49] with infectors developing symptoms after some of the individuals who they infect. While the assumption of a positive valued serial interval distribution has been made in many previous studies in which R has been estimated for different pathogens, this issue can be avoided by using the incidence of infections and the generation time distribution [47,49,50] rather than the incidence of cases and the serial interval distribution [11]. The subtle difference here is that incidence time-series of cases do not reflect the times at which individuals were first infected, but instead reflect the times at which individuals were recorded as infected (which occurs after infection, for example when individuals display symptoms). Use of the incidence of infections and the generation time distribution may require the incidence of infections to be inferred from the incidence of cases, for example using an assumed incubation period distribution and the Richardson–Lucy deconvolution technique [51]. We note that the serial interval distribution may be different to the generation time distribution (specifically, pre-symptomatic transmission can lead to shorter serial intervals than generation times [49]). 
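The back-calculation step mentioned above, in which the incidence of infections is inferred from the incidence of cases using an incubation period distribution, can be sketched with a basic Richardson–Lucy iteration. This is a generic textbook-style version of the technique under simplifying assumptions (a discrete delay distribution indexed from zero days, a flat initial guess, a fixed number of iterations and no smoothing or stopping rule); it is not claimed to match the implementation in [51]. The function names and the toy inputs are hypothetical.

```python
def expected_cases(infections, delay_pmf):
    """Convolve infections forward in time with the delay distribution."""
    n = len(infections)
    expected = [0.0] * n
    for t, value in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < n:
                expected[t + d] += p * value
    return expected


def richardson_lucy(cases, delay_pmf, iterations=50):
    """Estimate infection incidence from case incidence.

    delay_pmf[d] is the assumed probability that a case is recorded d days
    after infection (for example, a discretized incubation period).
    """
    n = len(cases)
    if n == 0:
        return []
    # Probability that an infection on day t is recorded before the series ends.
    observed_mass = [
        sum(p for d, p in enumerate(delay_pmf) if t + d < n) for t in range(n)
    ]
    estimate = [sum(cases) / n] * n  # flat initial guess
    for _ in range(iterations):
        expected = expected_cases(estimate, delay_pmf)
        updated = []
        for t in range(n):
            correction = sum(
                p * cases[t + d] / expected[t + d]
                for d, p in enumerate(delay_pmf)
                if t + d < n and expected[t + d] > 0
            )
            scale = correction / observed_mass[t] if observed_mass[t] > 0 else 0.0
            updated.append(estimate[t] * scale)
        estimate = updated
    return estimate


# Toy (invented) case counts and delay distribution.
toy_cases = [0, 1, 3, 6, 4, 2]
toy_delay = [0.2, 0.5, 0.3]
print([round(x, 2) for x in richardson_lucy(toy_cases, toy_delay)])
```

Estimates for the most recent days rest on very few observed cases because of right truncation, so in practice they would need to be treated with caution.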
Another potential extension to our research is incorporation of different serial interval (or generation time) distributions for local and imported cases [40], particularly given that part of an imported case's infectious period may occur before they enter the local population. Reconstructed transmission networks might provide insights into these distributions. More broadly, we note that R is only one summary statistic for tracking changes in transmission during an infectious disease outbreak. This metric does not provide information about the speed of the outbreak, which is better measured by the growth rate of cases [52-54]. Furthermore, the current incidence of reported cases, hospitalizations and deaths is also a key input to policy decisions. For example, an outbreak with R close to one is likely to have more detrimental impacts if case numbers are high than if they are low. Nonetheless, R has been useful for guiding interventions during the COVID-19 pandemic, in combination with these other statistics. We therefore contend that studies that improve understanding of the impacts of factors affecting R estimates, such as heterogeneity in the onwards transmission risk between different infectious hosts, are valuable and an important component of preparedness for future outbreaks.
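The synthetic-data experiment summarized in the Discussion (how much R is mis-estimated when ε = 1 is assumed but the true ε differs) can be illustrated with a deliberately simplified, noise-free sketch. Here local cases are set equal to their expected value under an assumed renewal model, rather than drawn at random as in a full simulation study, so the only source of error is the wrong value of ε. All names, the constant import rate, the serial interval weights and the value of R are hypothetical choices for illustration, and the estimator is a simple ratio rather than the Bayesian posterior.

```python
def infection_pressure(local, imported, w, epsilon, t):
    """Sum over lags s >= 1 of w[s-1] * (local[t-s] + epsilon * imported[t-s])."""
    return sum(
        w[s - 1] * (local[t - s] + epsilon * imported[t - s])
        for s in range(1, min(t, len(w)) + 1)
    )


def simulate_local_cases(imported, w, r_true, eps_true):
    """Noise-free renewal process: local[t] equals its expected value."""
    local = [0.0] * len(imported)
    for t in range(1, len(imported)):
        local[t] = r_true * infection_pressure(local, imported, w, eps_true, t)
    return local


def estimate_r(local, imported, w, eps_assumed):
    """Ratio estimate of a constant R: total local cases / total pressure."""
    days = range(1, len(local))
    pressure = sum(
        infection_pressure(local, imported, w, eps_assumed, t) for t in days
    )
    return sum(local[t] for t in days) / pressure


def bias_when_assuming_equal_risk(imported, w, r_true, eps_true):
    """Error in the R estimate if epsilon = 1 is assumed incorrectly."""
    local = simulate_local_cases(imported, w, r_true, eps_true)
    return estimate_r(local, imported, w, 1.0) - r_true


# Toy (invented) setting: three imported cases per day for 30 days.
toy_imported = [3] * 30
toy_w = [0.2, 0.5, 0.3]
for eps_true in (0.2, 0.5, 0.8, 1.0):
    bias = bias_when_assuming_equal_risk(toy_imported, toy_w, 0.8, eps_true)
    print(eps_true, round(bias, 3))
```

In this toy setting the error shrinks towards zero as the true ε approaches one, in line with the qualitative conclusion reported for the supplementary analysis.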
