Estimation of age-stratified contact rates during the COVID-19 pandemic using a novel inference algorithm.

Christopher M Pooley¹, Andrea B Doeschl-Wilson², Glenn Marion¹.

Abstract

Well-parameterized epidemiological models including accurate representation of contacts are fundamental to controlling epidemics. However, age-stratified contacts are typically estimated from pre-pandemic/peace-time surveys, even though interventions and public response likely alter contacts. Here, we fit age-stratified models, including re-estimation of relative contact rates between age classes, to public data describing the 2020–2021 COVID-19 outbreak in England. These data include age-stratified population size, cases, deaths, hospital admissions and results from the Coronavirus Infection Survey (almost 9000 observations in all). Fitting stochastic compartmental models to such detailed data is extremely challenging, especially considering the large number of model parameters being estimated (over 150). An efficient new inference algorithm, ABC-MBP, combining existing approximate Bayesian computation (ABC) methodology with model-based proposals (MBPs), is applied. Modified contact rates are inferred alongside time-varying reproduction numbers that quantify changes in overall transmission due to pandemic response, and age-stratified proportions of asymptomatic cases, hospitalization rates and deaths. These inferences are robust to a range of assumptions including the values of parameters that cannot be estimated from available data. ABC-MBP is shown to enable reliable joint analysis of complex epidemiological data yielding consistent parameterization of dynamic transmission models that can inform data-driven public health policy and interventions. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.

Keywords: Bayesian inference; COVID-19; approximate Bayesian computation; contact matrix; model-based proposals; reproduction number

Year: 2022 PMID: 35965466 PMCID: PMC9376725 DOI: 10.1098/rsta.2021.0298

Source DB: PubMed Journal: Philos Trans A Math Phys Eng Sci ISSN: 1364-503X Impact factor: 4.019

Introduction

Dynamic mathematical models, such as the susceptible–infectious–recovered transmission model and its numerous variants, have been widely applied over many decades to enhance understanding of infectious disease spread and control in populations (e.g. [1]). They have been particularly prominent in guiding public health responses to the COVID-19 pandemic [2], and are increasingly seen as important tools for causal inference in epidemiology [3]. Model results used to inform public health decisions must be defensible, requiring models to be informed by, and be consistent with, the widest range of suitable data sources available [4]. However, as discussed below, the COVID-19 pandemic has shown that tools to enable reliable and routine inference for sufficiently realistic models of disease dynamics from multiple sources of data are currently lacking. Here, we address this gap by introducing a novel algorithm that enables efficient inference for complex compartmental models, illustrating its potential by fitting an age-structured national-scale model to multiple data sources (demographic, operational and survey data) arising from the COVID-19 epidemic in England during 2020–2021. This enables simultaneous inference of many key parameters describing the outbreak, and the benefits of such an integrated approach are exemplified by updating pre-pandemic age-stratified contact rates to account for relative differences in contact between age groups. Fitting complex compartmental models to multiple data sources, with required uncertainty estimates for the key model parameters, comes with computational challenges. Bayesian inference for stochastic models of disease dynamics has been implemented using a range of methodologies that provide exact or approximate approaches to drawing samples from the posterior distribution. 
Data-augmentation approaches, typically combined with Markov chain Monte Carlo (MCMC), offer exact, albeit computationally expensive, inference, but require explicit expressions for the complete-data likelihood (see [5] for an application to an individual-level model of COVID-19). By contrast, approximate Bayesian computation (ABC) [6], including frequently used sequential Monte Carlo (SMC) versions [7], is likelihood-free, requiring only the ability to simulate from the model in order to draw samples from an approximation to the posterior. Other SMC algorithms, such as particle MCMC (pMCMC) [8], are also likelihood-free, and in principle offer exact samples from the posterior by developing unbiased estimators for the likelihood via simulation. In many cases these perform well; however, their computational cost typically scales poorly with the dimensionality of the data [9], a serious drawback when applying them to large datasets such as those used for COVID-19 here. A relevant example is [10], which applies pMCMC to a stochastic age-structured non-spatial model used to look at variation in the COVID-19 reproduction number separately for different regions of England (for comparison, this contained 26 parameters and fitted around 1300 data points with an over-dispersed observation model to aid computational speed[1] [11]). As with other studies, a contact matrix based on the pan-European POLYMOD study [12] was assumed, thus limiting the number of parameters to be estimated, and the model was fitted using non-age-structured (i.e. aggregated) data. Adding in age structure greatly increases the number of individual observations (by a factor of approx. 17 here), and we found this makes applying a pMCMC algorithm computationally impractical (results not shown). This motivated development of the new ABC-MBP algorithm described below (see Methods) that combines model-based proposals (MBPs), derived from data-augmentation MCMC, with ABC methods. 
Quantifying the effective contact structure of a population (i.e. how different groups intermix and transmit infection during an epidemic) is central to epidemiological modelling. For diseases with age as a risk factor, such as COVID-19, understanding how such contacts vary with age is vital because it helps quantify the likely impact of age-targeted restrictions, e.g. school closures, care home restrictions or closing night-time entertainment. Typically simulation studies are used to assess this impact [13,14], but accuracy is limited by the degree to which model parameters can correctly be identified given the available data. Approximations of social mixing patterns have been constructed based on wearable proximity sensors [15,16] or self-recording of contacts [17]. In terms of scale, two studies in the latter category stand out. Firstly, the 2005–2006 POLYMOD study [12,18] recruited 7290 participants across eight European countries and asked each participant to fill in a diary that recorded the contacts they made on one particular day, along with estimated ages of those contacts. Secondly, the 2017–2018 BBC Pandemic study [19] recruited 36 000 volunteers from the UK who used a self-reporting mobile phone app to inform about contacts made. Based on these surveys each study published (pre-pandemic) estimates for the so-called ‘contact matrix’ C, which captures the social mixing patterns between age groups [20] that have been widely used to model transmission dynamics of COVID-19. In this paper we infer the time-varying reproduction number R [21,22] to quantify the net impact of societal response (lockdowns, social distancing, etc.) during the pandemic, and, importantly, estimate modifications to C that quantify relative changes in mixing between different age groups. 
Age stratification in observed COVID-19 cases may be driven by age-related variations in susceptibility, contacts, infectivity, probability of being asymptomatic and time spent in latent, infectious and other disease states. This complicates estimation of age-related changes in contacts, but [23] show that viral load does not vary greatly with age or between hospitalized and other cases, and thus here we assume infectivity is constant across all age classes and across hospitalized and non-hospitalized cases. Furthermore, this study found similar viral load trajectories for children and adults, and so we also assume residency times in disease states are not age dependent. By contrast, previous studies have shown age-related differences in contacts [12,18,19], susceptibility and proportion of clinical (symptomatic) cases [24], as modelled here (see Methods). We show that use of PCR and seroprevalence survey results, in addition to case reports, allows estimation of age-related fractions of asymptomatic infections alongside R and age-stratified hospitalization rates and deaths from hospitals. However, estimates of age-related susceptibility and (relative) contact rates are confounded, and therefore we estimate them in two separate analyses, in each case leading to models that perform similarly and are better predictors than models that do not account for such heterogeneity. Results (see below) suggest relative overall contact rates are reduced and age-related contact patterns are different to pre-pandemic patterns, with greater relative contact rates by younger age classes. We provide an estimate for these changes assuming susceptibility does not vary with age. Consistent with increasing case fatality rate with age, others have found increasing susceptibility with age [24], suggesting our estimate of increased relative contacts by younger groups is conservative. 
By contrast, our estimation of age-dependent susceptibilities (assuming pre-pandemic contact patterns) is not consistent with increasing age-related susceptibility, further indicating that pre-pandemic relative mixing matrices require re-estimation during the pandemic. In the main text we, therefore, focus on estimation of age-related contact rates, and defer a comparison to the age-dependent susceptibility model until the discussion.

Methods

We first describe the model, give a brief description of the data, and then outline the new ABC-MBP Bayesian inference algorithm used to infer model parameters.

The compartmental model

Figure 1 shows the non-spatial age-stratified compartmental model, representing the number of individuals in a given age class and disease category. The population is assumed to be stratified into 18 age groups indexed by a: the first 16 represent 5-year age bands (‘0–4’, ‘5–9’, …, ‘75–79’), then ‘80+’ for all individuals 80 and over, and, finally, a group for care home residents denoted by ‘CH’.[2] Once an individual in the susceptible compartment S becomes infected by COVID-19 they enter the exposed E compartment, where they spend an exponentially distributed residency time with mean mE. A branching probability determines whether they then enter an infectious I compartment[3] or an asymptomatic A compartment. Individuals go on to trace a path in figure 1, undergoing exponentially distributed residency times in the various compartments and choosing transitions based on branching probabilities when alternative routes exist, until they eventually reach the terminal recovered R or dead D compartments. Note that the branching probabilities in this diagram all depend on age group a. These probabilities will later be estimated from data.

Figure 1

The compartmental model. The compartments are defined as follows: S: susceptible, E: exposed, A: asymptomatic, I: infectious, T: PCR test-sensitive but non-infectious, C: infected but non-infectious (due to self-isolation), H: hospitalized (or potentially care home in the case of care home residents), R: recovered and D: dead. The variable m, subscripted by compartment c, gives the mean residency time in that compartment (with individual residency times exponentially distributed about this mean), and each branching probability gives the probability that an individual in age group a goes from one compartment to another (note, these probabilities are constrained to add up to one when leaving a given compartment). The force of infection term is time dependent and acts on individuals in age group a. The annotations also indicate how data from the COVID-19 epidemic in England inform inference. Survey data are taken to provide an unbiased estimate of proportions of the total population in the sum of classes A, I, T, C (PCR survey) and R (seroprevalence). Operational data are used to inform transition rates E → I (test positive cases for second pandemic wave), C → H (hospitalizations) and H → D (deaths in hospitals and care homes). (Online version in colour.)

Reproduction number R

This time-varying quantity is defined as the expected number of secondary cases directly generated by a case in a population in which nearly all individuals are susceptible. This quantity is estimated directly by the inference procedure described below by means of a piecewise linear spline. The breakpoints for this spline are placed at roughly 2-week intervals (with additional points near to lockdowns where it may be expected that R changes rapidly). The quantity r, which links R to the more conventionally used transmission rate, is exactly derived from other model parameters using the approach taken by Diekmann et al. [25], in which R is calculated from the highest eigenvalue of the next generation matrix (see appendix A in the electronic supplementary material for further details).
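
As an illustration of the two ingredients described above, the following minimal Python sketch interpolates R linearly between breakpoints and then finds a scaling factor such that the highest eigenvalue of a next-generation matrix equals a target value of R. The breakpoint positions, the R values, the toy 2 × 2 matrix and the function names `R_of_t` and `scale_to_R` are all illustrative assumptions; the model's actual next-generation matrix is defined in appendix A of the electronic supplementary material and is not reproduced here.

```python
import numpy as np

# Hypothetical spline breakpoints (days since start) and R values at each one.
break_days = np.array([0.0, 14.0, 28.0, 35.0, 49.0])
R_at_breaks = np.array([3.0, 2.8, 1.2, 0.8, 0.9])

def R_of_t(t):
    """Piecewise-linear interpolation of R between the breakpoints."""
    return np.interp(t, break_days, R_at_breaks)

def scale_to_R(K_unscaled, R_target):
    """Return a factor r such that r * K_unscaled has dominant eigenvalue R_target."""
    dominant = np.max(np.abs(np.linalg.eigvals(K_unscaled)))
    return R_target / dominant

# Toy unscaled next-generation matrix for two groups (illustrative numbers only).
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])

R_now = R_of_t(21.0)        # halfway between the breakpoints at days 14 and 28
r = scale_to_R(K, R_now)    # scaling that makes the dominant eigenvalue equal R_now
```

In this sketch, placing extra breakpoints near a lockdown date simply means adding closely spaced entries to `break_days`, which lets the interpolated R change quickly over that interval.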

Age group population P

The total population size in age group a.

Pre-pandemic contact matrix C0

This square matrix captures disease-relevant contacts between individuals in the 18 age groups. In particular, the element for the pair (a, a′) gives the average number of daily contacts for an individual in age group a with individuals in age group a′. Specification of this fixed matrix is based on data from the BBC Pandemic study [19] (details are provided in electronic supplementary material, appendix C). A pictorial representation of C0 is given in figure 2a. Note the tri-diagonal structure results from frequent inter-generational contacts.[4]

Figure 2

The contact matrix. (a) The pre-pandemic matrix based on the BBC Pandemic study [19] (see electronic supplementary material, appendix C). For an individual represented by an age group on the x-axis this shows the estimated numbers of daily contacts they make with individuals in the same and different age groups on the y-axis (darker red colours indicate more frequent contacts). (b) The inferred age-adjusted contact matrix C based on COVID-19 data. (c) The factor difference in C over C0 (red colours indicate an increase and blue colours a decrease). (Online version in colour.)

Age contact factors

Each element of this vector corresponds to a specific age group and multiplies the corresponding columns and rows of the pre-pandemic contact matrix C0. It effectively allows the model to enhance or reduce the relative rate of contacts for different age groups in order to generate an ‘age-adjusted’ estimate C for the contact matrix. These contact factors are constrained to have a population-weighted average of one, i.e. the sum over all age groups a of each factor multiplied by its age group population, divided by the total population size P, equals one.[5]
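
The row-and-column scaling and the weighted-average constraint can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `adjust_contact_matrix` is hypothetical, the two-group numbers are made up, and enforcing the constraint by rescaling the factors is an assumption (the text states only that the constraint holds).

```python
import numpy as np

def adjust_contact_matrix(C0, factors, pop):
    """Scale rows and columns of C0 by age contact factors.

    The factors are first rescaled (an assumption of this sketch) so that
    their population-weighted average equals one.
    """
    factors = np.asarray(factors, dtype=float)
    pop = np.asarray(pop, dtype=float)
    weighted_mean = np.sum(pop * factors) / np.sum(pop)
    factors = factors / weighted_mean
    # Element (a, a') is multiplied by the factor for a and the factor for a'.
    C = factors[:, None] * np.asarray(C0, dtype=float) * factors[None, :]
    return C, factors

# Toy example with two age groups (illustrative numbers only).
C0 = np.array([[4.0, 2.0],
               [1.0, 3.0]])
pop = np.array([100.0, 300.0])
C, factors = adjust_contact_matrix(C0, [2.0, 1.0], pop)
```

Because each factor multiplies both a row and a column, a factor above one raises the contacts that group makes and the contacts other groups make with it.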

Infectivity φ

This sets the relative compartmental infectivity. For simplicity we assume only two infectious compartments: I (for which φI = 1) and A (for which φA = 0.55, as estimated in electronic supplementary material, appendix K). All other compartments have φ = 0.

Subpopulation size N

The total number of individuals in compartment c and age group a at time t.

External force of infection η

The time-varying probability per unit time that an individual becomes infected from a source outside of England (see electronic supplementary material, appendix D, for how this is derived from flight [26] and global COVID-19 death data [27]). For convenience this term is divided by a factor f = 10^5 such that it is expressed in units of infections per 100 k individuals.

Three important points can be made about equation (2.1). Firstly, it corresponds to a non-spatial model, equivalent to assuming small spatial variation across England for demographic make-up, the contact matrix and external force of infection.[6] Secondly, individuals in all age groups are taken to be equally susceptible to disease (this is revisited later). Lastly, while age contact factors adjust the relative rates of contact for different age groups compared with the pre-pandemic matrix C0, it is important to remember that the overall effective contact matrix is modulated by the reproduction number (equation (2.3)): it is proportional to R, expressed relative to the basic reproduction number R0. This accounts for changes in the overall rate of contacts (through government interventions and modification in social behaviour) as well as measures aimed at blocking transmission on contact (e.g. wearing face masks, washing hands, social distancing).[7] Hence, time variation in R is effectively a proxy for time variation in the overall rate of effective contacts.

The stochastic dynamics of the model presented above are approximated by a τ-leaping algorithm [28] with a fixed time step of τ = 0.5 days (see electronic supplementary material, appendix E for details).
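
To make the fixed-step stochastic update concrete, the following Python sketch advances a heavily simplified age-structured S, E, I, R model by steps of τ = 0.5 days. It is a sketch under stated assumptions, not the model of figure 1: only four compartments are kept, a binomial variant of τ-leaping is used so that counts cannot become negative, and the form of the force of infection, the transmission scale `beta`, the function name `tau_leap_step` and all numerical values other than τ and the 4-day mean residency time are hypothetical.

```python
import numpy as np

TAU = 0.5   # time step in days (value quoted in the text)
F = 1e5     # eta is expressed per 100k individuals, so divide by 10^5

def tau_leap_step(S, E, I, R, beta, C, pop, eta, m_E, m_I, rng):
    """Advance an age-structured SEIR state by one step of length TAU."""
    # Assumed force of infection on each age group: contacts with the
    # infectious fraction of every group, plus a small external term.
    lam = beta * (C @ (I / pop)) + eta / F
    # Convert rates into per-step probabilities and draw binomial jumps,
    # which keeps every compartment count non-negative.
    new_E = rng.binomial(S, 1.0 - np.exp(-lam * TAU))
    new_I = rng.binomial(E, 1.0 - np.exp(-TAU / m_E))
    new_R = rng.binomial(I, 1.0 - np.exp(-TAU / m_I))
    return S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R

# Toy two-group population (illustrative numbers only).
rng = np.random.default_rng(1)
pop = np.array([6000, 4000])
I = np.array([5, 5])
E = np.zeros(2, dtype=np.int64)
R = np.zeros(2, dtype=np.int64)
S = pop - I
C = np.array([[8.0, 3.0],
              [4.5, 5.0]])

for _ in range(int(60 / TAU)):   # simulate 60 days
    S, E, I, R = tau_leap_step(S, E, I, R, beta=0.05, C=C, pop=pop,
                               eta=0.1, m_E=4.0, m_I=4.0, rng=rng)
```

A fixed step trades some accuracy for speed: every transition type is updated once per step rather than once per event, which matters when each inference run requires many simulations.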

Data sources

The time period used for the analysis in this study is between 1 January 2020 (before the start of the COVID-19 pandemic in England) and 9 June 2021 (just after the second wave of infections). Figure 1 provides a synoptic illustration of how demographic, operational and survey information are interpreted in the context of the compartmental model to enable inference. Further details and information on how these datasets have been prepared for analysis are given in the electronic supplementary material, appendix F (plots of the raw data are shown by the black lines in figure R1 in the electronic supplementary results, with age-aggregated results shown in figure 6). In total, these data consist of almost 9000 individual observations.

Figure 6

Simulation results and data. These plots show age-aggregated data (black) against 500 simulations (blue 'cloud' of curves) performed using model parameters taken from the analysis posterior means (see Table R1 of the electronic supplementary results). The red lines indicate an average across simulations; these are smoother, and typically have peaks greater than the data. (a) The total population in the infected I, C, A and T compartments (data from CIS), (b) daily cases and (c) daily hospital admissions (not including care home patients), (d) daily deaths and (e) total recovered (excludes 0–14 age groups and care home residents, single data point from antibody results in CIS). Visualizing complex data that vary in scale is revealing but challenging [37]. Here we cut off the first peak to better reveal structure in later stages of the outbreak. (Online version in colour.)

Bayesian inference

Collectively the model parameters are referred to as θ (these determine the movement of individuals through the compartmental model in figure 1). When simulating from the model, individuals start in the susceptible S compartment[8] and transitions between the compartmental states result in the subpopulation sizes N within each age group a and compartment c changing as a function of time t. These dynamics are collectively referred to as the system ‘state’ (actually a history, or set of states for each point in time), and denoted ξ. In reality ξ is unknown, and from a Bayesian perspective is considered a set of latent variables to be estimated. Such estimates are often of applied interest, for example here, where levels of infection are not observed directly. The data are collectively denoted y. Application of Bayes' theorem implies that the posterior probability distribution π(θ, ξ|y) is given by

π(θ, ξ|y) ∝ π(y|ξ) L(ξ|θ) π(θ),

where the various terms in this expression are:

The observation model π(y|ξ)

In a standard Bayesian approach this would give the probability of the data given a system state based on statistical characterization of the relevant observation processes. Writing down a true observation model is often complicated by the fact that we may not know the error associated with a set of measurements, and the observation probabilities for a set of measurements along a time series may be highly correlated. Instead, here we follow an approximate Bayesian computation (ABC) approach that relies on introducing a measure of fit, or distance metric [6], between the data y and the state ξ, here called the ‘error function’ (EF). A small EF implies a close correspondence between ξ and y. For ABC the observation model is defined to be non-zero[9] only when the EF is less than some specified cut-off EFcutoff:

π(y|ξ) ∝ 1 if EF(ξ, y) < EFcutoff, and 0 otherwise.

The challenge for inference algorithms is to reduce this cut-off as much as possible to ensure a close correspondence between the system state ξ and the observed data y. How this is implemented in practice is discussed in the next section. Several possibilities exist for the choice of EF,[10] but the form given in equation (2.6) proved effective for the problem in this paper. In that expression the sum over i goes over all individual measurements made on the system (i.e. each of the data points on the age-stratified time series for cases, deaths, hospital admissions and the Coronavirus (COVID-19) Infection Survey (CIS) results), yi is the value of the ith measurement and Yi is the equivalent value derived from the state ξ. Note that since ξ is a complete representation of the putative history of system dynamics, the result Yi of any potential measurement can be derived. To take an example, suppose a particular measurement yi gives the total number of infections in a 1-week period. Here all the infection transitions in ξ (which contains complete information about the dynamics of the system) would be added together for that week to give Yi. 
The small constant (dependent on the time series to which i belongs) is introduced to ensure validity even when yi or Yi is zero.[11] When yi and Yi are equal the contribution to the EF in equation (2.6) is zero. When they are unequal the contribution is always positive (and for small deviations, close to the square of the fractional change in the quantity). The weights w in equation (2.6) set the relative importance of different measurements. In particular, they place greater weight on larger observed values[12] and overall give an approximately equal weighting to each of the time series in the data[13] (see electronic supplementary material, appendix I for details).
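
The properties listed above (zero contribution when a measurement and its simulated counterpart agree, a positive contribution otherwise, approximately the squared fractional change for small deviations, and validity at zero) can be illustrated with a short Python sketch. The functional form below is an assumption chosen only because it has those properties; it should not be read as equation (2.6), which is not reproduced in this text, and the names `error_function` and `eps` are hypothetical.

```python
import numpy as np

def error_function(y, Y, w, eps):
    """Weighted misfit between observations y and simulated values Y.

    Assumed form: w * (y - Y)^2 / ((y + eps) * (Y + eps)), summed over
    all measurements. eps > 0 keeps each term finite when y or Y is zero.
    """
    y, Y, w, eps = (np.asarray(v, dtype=float) for v in (y, Y, w, eps))
    terms = w * (y - Y) ** 2 / ((y + eps) * (Y + eps))
    return float(np.sum(terms))

def abc_observation_weight(ef_value, ef_cutoff):
    """ABC observation model: non-zero only when the EF is below the cut-off."""
    return 1.0 if ef_value < ef_cutoff else 0.0

# A 1% deviation contributes roughly (0.01)^2 = 1e-4 when eps is negligible.
small_dev = error_function([100.0], [101.0], [1.0], 0.0)
# A perfect match contributes nothing, even when counts are zero.
perfect = error_function([10.0, 0.0], [10.0, 0.0], [1.0, 1.0], 0.5)
```

In this sketch `eps` may be a scalar or an array with one entry per measurement, which mirrors the statement that the small constant depends on the time series to which each measurement belongs.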

The latent process likelihood L(ξ|θ)

This gives the probability of simulating a state ξ given a set of model parameters θ (see electronic supplementary material, appendix E, for an expression for this quantity). It should be noted, however, that the MBPs used below are ‘likelihood-free’, so L(ξ|θ) is not explicitly calculated.

The prior π(θ)

This captures the state of knowledge regarding parameter values before data y is considered. Prior specifications are given for all model parameters in Table R1 of the electronic supplementary results (see electronic supplementary material, appendix J, for further explanation). Parameters for which the data are uninformative are fixed to plausible values taken from the literature (see electronic supplementary material, appendix K).[14]

Approximate Bayesian computation with model-based proposals

In this section, we explain the ABC-MBP approach used to perform inference. Note, a ‘particle’ refers to the combination of a set of model parameters θ and a system state ξ. The algorithm below iterates over a series of generations G, and with each generation it successively improves the posterior accuracy:

1. Initialization: A generation index g is set to 1. P particles,[15] indexed by p, are sampled from the model. This is achieved by first sampling from the prior π(θ) and then simulating from the model (using the τ-leaping algorithm described in the electronic supplementary material, appendix E). The initial EF cut-off is set to infinity.

2. Particle culling and resampling: An EF cut-off for generation g is set such that exactly half the particles have an EF above this value and half below.[16] We found this procedure worked well in practice but note that it may be optimal to vary the proportion of particles resampled [29]. Particles with EFs below the cut-off are directly copied to create new particles in the next generation. On the other hand, particles with EFs above the cut-off are discarded, and each is replaced by a copy of a randomly selected particle whose EF is below this limit. The generation index g is incremented.

3. Particle mixing: To avoid degeneracy each particle undergoes a series of MBPs collectively referred to as an ‘update’ (see electronic supplementary material, appendix L, for details) [9,30]. These allow particles to explore parameter and state space subject to the condition that their EF must not exceed the current cut-off. MBPs are the key novelty in this algorithm because they generate efficient MCMC mixing[17] by making joint proposals in parameter and state space possible while at the same time restricting large changes in the EF. They do so by perturbing, thus preserving some aspects of, the state space trajectory already known to be consistent with the data (this is in contrast to standard ABC schemes that rely on independent simulation[18]).

4. Iteration: Jump to step 2 if g is less than G. Otherwise, particles in the final generation are used to approximate the posterior distribution.

This algorithm guarantees the EF cut-off monotonically decreases towards zero as the generation number increases. As it does so, particle states are forced to lie closer and closer to the data and posterior estimates similarly approach the true posterior. As the generation number g gets higher, however, more and more MBPs are required in step 3 to avoid particle degeneracy. Therefore, for reasons of computational practicality, the total generation number G is usually considered sufficiently large when posterior distribution estimates have apparently converged to steady state values (see electronic supplementary results figure R2).
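
The overall loop (initialize from the prior, cull at the median EF, resample, then mix subject to the cut-off) can be sketched on a toy problem in Python. One substitution should be noted plainly: the mixing step below uses a random-walk parameter proposal followed by an independent re-simulation, i.e. the standard ABC scheme that the text contrasts with MBPs, because the MBP construction itself is given in appendix L and is not reproduced here. The Poisson toy model, the uniform prior, the form of `ef` and all names and settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = np.array([12.0, 9.0, 11.0, 10.0, 8.0])   # toy data (made up)

def simulate(theta):
    """Toy stochastic model: independent Poisson counts with mean theta."""
    return rng.poisson(theta, size=y_obs.size).astype(float)

def ef(Y):
    """Toy error function between the data and a simulated state Y."""
    return float(np.sum((y_obs - Y) ** 2 / ((y_obs + 1.0) * (Y + 1.0))))

def abc_cull_and_mix(P=64, G=10, n_updates=5, step=1.0):
    # Step 1: sample parameters from a Uniform(0, 50) prior and simulate.
    thetas = rng.uniform(0.0, 50.0, size=P)
    states = [simulate(t) for t in thetas]
    efs = np.array([ef(s) for s in states])
    cutoffs = []
    for g in range(G):
        # Step 2: set the cut-off at the median EF, keep particles at or
        # below it, and overwrite the rest with copies of survivors.
        # (Ties in the EF can leave slightly more than half surviving.)
        cutoff = np.median(efs)
        cutoffs.append(cutoff)
        keep = np.flatnonzero(efs <= cutoff)
        for p in np.flatnonzero(efs > cutoff):
            q = rng.choice(keep)
            thetas[p], states[p], efs[p] = thetas[q], states[q], efs[q]
        # Step 3: mixing. Stand-in for MBPs: perturb theta, re-simulate
        # independently, and accept only if the EF stays within the cut-off.
        for p in range(P):
            for _ in range(n_updates):
                prop = thetas[p] + step * rng.normal()
                if not (0.0 < prop < 50.0):
                    continue          # proposal lies outside the prior support
                Y = simulate(prop)
                e = ef(Y)
                if e <= cutoff:
                    thetas[p], states[p], efs[p] = prop, Y, e
    return thetas, np.array(cutoffs)

thetas, cutoffs = abc_cull_and_mix()
```

With independent re-simulation the acceptance rate in step 3 falls as the cut-off tightens, which is the inefficiency that proposals preserving part of an already well-fitting trajectory are intended to reduce.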

Results

Inference protocol and validation

Inference was performed using the data and ABC-MBP approach outlined (see Methods). This was implemented using an open source software package called BEEPmbp (see electronic supplementary material, appendix M, for details). To ensure robustness against lack of convergence, or multimodality in the posterior, each analysis was completed by running BEEPmbp K = 16 times (chosen to check if randomly initialized starting states converge on the same posterior solution), with each run using P = 16 particles (sufficiently large to adequately represent the posterior and avoid becoming stuck in metastable states) iterated over G = 350 generations (large enough to get good estimates for parameters, see Figure R2 in the electronic supplementary results). The results presented below are based on the K × P = 256 samples generated over those runs.[19] The CPU time in total (over all 16 runs) was 4.5 hours when running on 256 cores. While computationally expensive for challenging problems like this one, ABC-MBP is highly parallelizable and can be run relatively quickly given sufficiently powerful computing resources. Inference on representative simulated data correctly identified true model parameters demonstrating the validity of the algorithm for the problem at hand (see electronic supplementary material, appendix N, for details).

Analysis of England 2020–2021 COVID-19 outbreak data

The posterior: the electronic supplementary results provide details of fitting the model (figure 1) with equation (2.1) to the COVID-19 outbreak data as described above, including: inferred model parameter distributions (electronic supplementary results, Table R1), estimates for the contact matrix C (electronic supplementary results, Table R2), graphs showing fits between raw data and inferred states (electronic supplementary results, Figure R1)[20] and convergence down generations (electronic supplementary results, Figure R2). Here, we summarize the key results. Posterior mean estimates for the contact matrix C are shown in figure 2b, with factor differences between C and C0 in figure 2c. Based on these, notable differences in the contact patterns occur in the majority of age groups. The substantially elevated mixing among younger age groups can be understood by examining figure 3a, which shows posterior distributions for the age contact factors (the vector of factors in equation (2.1)). For younger age groups (aged 24 and below) the values exceed one, indicating increased contacts relative to what would be expected from C0. We note this is broadly consistent with [5], who find increased force of infection for those under 60 versus older individuals. Given the reduced severity of symptoms in these age groups it seems plausible that this results from higher contact rates, rather than increased rates of shedding. The pattern in older age groups is somewhat more complex; the 40–44 group was found to reduce contacts relative to C0 the most, but interestingly relative rates go up again until the 55–59 age group before reducing for older individuals. It should be noted that, in contrast to other age groups, absence of data on pre-pandemic contact rates for care home residents means the estimate of the contact factor for this group (figure 3a) should not be interpreted as an increase in contact rate. 
Rather, it is simply a reflection of the fact that contact rates in care homes are much higher than the rest of the population[21] (due to being communal environments with frequent contacts between residents and staff), which has led to a disproportionate number of cases, and sadly deaths, compared with the elderly in the general population.

Figure 3

Age contact factors and variation in susceptibility. This shows two alternative models used to fit the data: (a) age contact factors modify the pre-pandemic contact matrix C0 for different age groups in the population (see equation (2.1)). (b) Age groups are given a relative susceptibility to acquiring infection (see equation (4.1)). The error bars show 95% credible intervals. (Online version in colour.)

Time variation in the reproduction number: figure 4a shows the posterior distribution for R, with important events on the timeline shown by vertical lines (see electronic supplementary material, appendix O, and [31]). As shown in equation (2.3), this is a proxy for the overall effective contact rate in the population. On the other hand, the effective reproduction number (see electronic supplementary material, appendix A), as shown in figure 4b, tracks disease progression (above one implies COVID-19 infections are increasing).[22] These estimates capture the net impact of the many factors that determined the epidemic trajectory. The first lockdown clearly had a huge impact, reducing the reproduction number below one and ending the first wave. A gradual increase over the summer and autumn of 2020 (coinciding with restrictions being lifted) led to a second pandemic wave. Despite an already falling reproduction number in the autumn to winter of 2020, a second lockdown was instigated, leading it to fall below one once again. Emergence of the more transmissible Alpha variant, along with increased mixing during Christmas festivities, led to a substantial rise in cases towards the end of 2020, ahead of a third lockdown. The vaccination programme appears to have helped to curb epidemic spread in early 2021, while emergence of the Delta variant, along with easing restrictions, led to the reproduction number increasing again in the summer of 2021. Detailed analysis of the impact of these events is likely to prove difficult due to the confounding effects of multiple factors.

Figure 4

Time variation in reproduction number. (a) The inferred value of the reproduction number R as a function of time (solid red line gives posterior mean and dashed lines indicate the 95% credible interval). This is proportional to the overall effective contact rate (see equation (2.3)). (b) The effective reproduction number (which accounts for the fact that a fraction of the population is not susceptible). Above/below the horizontal black line shows where COVID-19 is increasing/decreasing. The vertical blue lines denote important milestones during the epidemic [31]. Arrows showing changes in the dominant COVID-19 variant are estimated from COG UK [32] (see electronic supplementary material, appendix O) and for vaccination come from CIS [33] (these indicate the time period in which between 5% and 95% of individuals are vaccinated with the first dose for selected age groups). (Online version in colour.)

Age-dependence in branching probabilities: figure 5a shows the posterior estimated asymptomatic branching probability for different age groups a. Overall, around 55% of exposed individuals are asymptomatic. This value is consistent with the large range of figures suggested in the literature (e.g. from 18% to 75% [34–36]). Asymptomatic branching probabilities are highest for younger age groups, which has important implications, e.g. for interpreting school case rates (which may significantly under-represent true infection levels). The proportion of asymptomatics declines through adulthood and rises again in older age groups (figure 5a), consistent with other studies based on data prior to vaccine availability (see e.g. fig. 2c in [24]). The notable exception is in the 80+ and care home groups (which, by definition, are set equal[23]), which have a significantly lower asymptomatic branching probability. 
This result stems from the fact that although many cases are observed in the 80+ group, measurements taken by the CIS near the end of 2020 show only a small proportion of randomly sampled 80+ individuals are found to be antibody positive. One possibility is that waning immunity is much faster in this group, or that a weaker immune response results in more false negative antibody test results.
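The posterior means and 95% credible intervals reported in these figures are summaries of posterior samples. A minimal sketch of that summary step follows. It assumes equal-tailed intervals, since the text does not say which kind of interval is used, and it draws synthetic stand-in samples from a beta distribution rather than using the paper's posterior.

```python
import random
import statistics


def equal_tailed_interval(samples, level=0.95):
    """Return the (lower, upper) equal-tailed interval of a list of samples."""
    xs = sorted(samples)
    if not xs:
        raise ValueError("samples must not be empty")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Synthetic stand-in for posterior samples of a branching probability.
rng = random.Random(0)
samples = [rng.betavariate(55, 45) for _ in range(5000)]

posterior_mean = statistics.fmean(samples)
lower, upper = equal_tailed_interval(samples)
print(f"posterior mean {posterior_mean:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```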

Figure 5

Age-dependent branching probabilities and residency time in T. The probability of: (a) becoming asymptomatic, (b) becoming hospitalized given a case (note, for care home patients ‘hospitalized’ may mean going to hospital or becoming critically ill within a care home) and (c) death given hospitalized. (d) Shows the residency time in the PCR test-sensitive T compartment. The error bars indicate 95% credible intervals. (Online version in colour.)

Figure 6

Simulation results and data. These plots show age-aggregated data (black) against 500 simulations (blue 'cloud' of curves) performed using model parameters taken from the analysis posterior means (see Table R1 of the electronic supplementary results). The red lines indicate an average across simulations; these are smoother, and typically have peaks greater than the data. (a) The total population in the infected I, C, A and T compartments (data from CIS), (b) daily cases and (c) daily hospital admissions (not including care home patients), (d) daily deaths and (e) total recovered (excludes 0–14 age groups and care home residents, single data point from antibody results in CIS). Visualizing complex data that vary in scale is revealing but challenging [37]. Here we cut off the first peak to better reveal structure in later stages of the outbreak. (Online version in colour.)

The probability of hospitalization (given a case) and dying in hospital (given hospitalized) are shown in figure 5b and c, respectively. These detailed estimates are entirely consistent with the broad understanding that COVID-19 affects the elderly much more severely than the young.

Duration of PCR test sensitivity: estimated residency times in the PCR test-sensitive T compartment are shown in figure 5d. As both I and A compartments are also detectable by PCR test, adding a mean value from this plot of 11 days to mE = mA = 4 days leads to an estimated overall test-sensitive period of around 15 days.
This is in good agreement with [38], who estimated the time from symptoms to two negative PCR tests at 13 days, which when added to an estimated 2 days pre-symptomatic period [38] gives exactly our figure. It might be expected the test-sensitive period would increase with age (figure 5d), but importantly T represents non-hospitalized individuals with significantly milder symptoms than hospitalized cases.
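The comparison of test-sensitive periods can be written out as simple arithmetic. The figures are those quoted in the text; the row labels are added here only for readability.

```latex
\begin{align*}
\text{this analysis:}\quad & 11\ \text{days (mean residency in } T\text{)} + 4\ \text{days} \approx 15\ \text{days},\\
\text{reference [38]:}\quad & 13\ \text{days (symptoms to two negative PCR tests)} + 2\ \text{days (pre-symptomatic)} = 15\ \text{days}.
\end{align*}
```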

Discussion

ABC-MBP has been shown to be reliable and efficient at producing informative posterior distributions of many key parameters of a complex epidemiological COVID-19 model that integrates age-stratified transitions and contact rates. ABC-MBP does not treat the target model as a black box, and the version of the algorithm presented here is applied only to Markovian models that can be simulated using a τ-leaping algorithm. While this represents a very broad class of processes, including many epidemiological models of disease transmission, it should be noted that ABC-MBP can in principle be applied to a wider range of stochastic models and is not limited to exponential waiting times [30]. However, application of ABC-MBP methodology to such processes, including appropriately formulated agent-based models [39,40], has yet to be assessed and will likely hit computational limits as model and data complexity are increased. Computational efficiency will need to be assessed for such future applications, and compared with alternative methodologies, for example, MCMC, particle filters and neural networks trained on simulated data (e.g. [41]).

We now discuss the impact of key assumptions on the robustness of the estimates obtained here. Compartmental residency times in the model are based on evidence in the literature (see electronic supplementary material, appendix N), but a variety of alternative estimates exist. To understand the impact of such uncertainty, consider the three key interrelated quantities that determine epidemiological dynamics: the reproduction number, the generation time[24] and the infection rate. As infection rate is fully determined by observed data, a shift in the generation time (brought about by making alterations to any of the assumed residency times) must be accompanied by a corresponding rescaling of the reproduction number around one to maintain the observed infection rate.
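The rescaling argument can be illustrated with two standard textbook relations between the exponential growth rate r, the mean generation time and the reproduction number: R = 1 + r·Tg for an exponentially distributed generation time, and R = exp(r·Tg) for a fixed one. Reading the "infection rate" in the text as the growth rate r is an assumption made for this sketch. The function name and numerical values are hypothetical, and the paper's compartmental model implies its own generation-time distribution.

```python
import math


def reproduction_number(growth_rate, mean_generation_time, kind="exponential"):
    """Reproduction number implied by a growth rate and a mean generation time.

    kind="exponential": exponentially distributed generation time, R = 1 + r*Tg.
    kind="fixed": every generation takes exactly Tg, R = exp(r*Tg).
    """
    x = growth_rate * mean_generation_time
    if kind == "exponential":
        if x <= -1.0:
            raise ValueError("requires growth_rate * mean_generation_time > -1")
        return 1.0 + x
    if kind == "fixed":
        return math.exp(x)
    raise ValueError(f"unknown kind: {kind}")


# Hold the observed growth rate fixed and vary the assumed generation time.
for growth_rate in (0.05, -0.03):          # per day, hypothetical
    for generation_time in (4.0, 6.0):     # days, hypothetical
        r_value = reproduction_number(growth_rate, generation_time)
        print(f"r={growth_rate:+.2f}/day, Tg={generation_time:.0f} d -> R={r_value:.2f}")
```

With the growth rate held fixed, a longer assumed generation time pushes the implied reproduction number further from one in whichever direction it already lies, which is the sense in which the profile is rescaled around one.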
Electronic supplementary material, appendix N, investigates this effect and concludes that the results are robust to inaccuracies in fixed residency times in the compartmental model, excepting this rescaling in the profile for R. Results are also insensitive (electronic supplementary material, appendix N) to rescaling of the estimated external force of infection (electronic supplementary material, appendix D). Further support for our inferences comes from detailed analysis of the raw age-stratified data, which exhibits estimates and trends with age similar to those obtained for the various quantities reported here (electronic supplementary material, appendix P).

Our results suggest contact patterns during the COVID-19 outbreak in England are significantly different from pre-pandemic patterns. Although this is plausible and consistent with other results (e.g. [5]), alternative explanations may also be consistent with the available data. As discussed above, there is the possibility of age-dependent residency times and/or infectiousness, but support for these hypotheses in the literature seems limited for COVID-19 [23]. We, therefore, now consider age-dependent susceptibility σ (while fixing the contact matrix to C), for which the force of infection from equation (2.1) is replaced by equation (4.1). Estimates under this model are shown in figure 3b, and, perhaps unsurprisingly, they exhibit a similar profile to the age contact factors in figure 3a (note, electronic supplementary material, appendix Q, shows that other inferred parameter values undergo almost no change, which is further evidence of the robustness of these estimates). So which model is correct, one which implies age variation in susceptibility, or age variation in expected contact rate, or perhaps both?
One clue is provided by [24], which suggests that susceptibility in individuals under 20 years of age is actually lower than in older individuals (although this study is based on case data collected relatively early on in the pandemic, and is perhaps open to bias). This is contrary to what we find. Indeed, if contact patterns were to be estimated under this assumption, even larger increases in relative contact rates for younger age groups would be found, suggesting our estimates with equation (2.1) are actually conservative. Thus, we suggest observed increases in force of infection from younger age groups are more plausibly explained by changes in contact pattern. Nonetheless, there may be age-dependent effects in contact rate, susceptibility, infectivity and duration of disease states, but the public data considered here are not sufficient to estimate these simultaneously. Importantly, however, the model based on equation (2.1) is more accurate than one not containing such age contact or susceptibility factors, and so provides an important improvement for more accurate prediction, and highlights avenues for further investigations regarding these confounding effects.

Figure 6 compares national-level age-aggregated data with simulations based on inferred parameters. We find relatively good agreement with simulation averages, but with large stochastic variability between runs. It is important to appreciate, however, that these results are contingent on historic lockdowns and restriction measures encoded in the inferred time-varying reproduction number. In reality, extremes of the distribution of modelled outcomes are unlikely to be realized, as government interventions and human behaviour are dynamically linked to epidemic severity and would likely reduce this reproduction number in the case of rapidly increasing case numbers.

Clearly the compartmental model in figure 1 does not capture all the features of the data, e.g. when examining posterior distributions for the state (in the electronic supplementary results) we see clear deviations between the model and the profiles for some of the data curves.[25] These discrepancies may be due to a number of factors not considered here, including observed spatial heterogeneity in the outbreak, time variation in the contact matrix (e.g. caused by opening and closing schools), the introduction of vaccination and emergence of different COVID-19 variants. In principle these various effects could be studied using the methodology presented in this paper (no doubt requiring significantly more computation and in some cases additional data), and these remain the subject of future work.
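The stochastic variability between simulation runs noted in the discussion can be reproduced in miniature with a τ-leaping-style simulation. The sketch below is an assumption-laden toy: a single-population SIR model, not the age-structured model of figure 1, advanced in fixed steps of length τ with event counts drawn from binomial distributions (a common variant that keeps compartment counts non-negative). All names and parameter values are hypothetical.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) using only the standard library (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def tau_leap_sir(beta, gamma, s0, i0, r0, tau, steps, seed):
    """Simulate a stochastic SIR model with fixed time step tau.

    Returns the new infections per step and the final (S, I, R) state.
    """
    rng = random.Random(seed)
    s, i, r = s0, i0, r0
    n = s + i + r
    new_cases = []
    for _ in range(steps):
        # Per-individual event probabilities over one step of length tau.
        p_infect = 1.0 - math.exp(-beta * i / n * tau)
        p_recover = 1.0 - math.exp(-gamma * tau)
        infections = binomial(rng, s, p_infect)
        recoveries = binomial(rng, i, p_recover)
        s -= infections
        i += infections - recoveries
        r += recoveries
        new_cases.append(infections)
    return new_cases, (s, i, r)


# Same parameters, different seeds: the spread in peak size shows run-to-run variability.
runs = [
    tau_leap_sir(beta=0.3, gamma=0.1, s0=490, i0=10, r0=0, tau=0.5, steps=200, seed=k)
    for k in range(10)
]
peaks = [max(cases) for cases, _ in runs]
print("peak new infections per step across runs:", min(peaks), "to", max(peaks))
```

Averaging many such runs gives a smoother curve than any single run, which matches the description of the red simulation averages in the figure 6 caption.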

Conclusion

This paper has demonstrated the feasibility of fitting complex compartmental stochastic models containing a large number of parameters (over 150) to a large dataset (almost 9000 individual observations). This was achieved using a new ABC-MBP inference algorithm that is highly parallelizable and generic. In addition to the results presented here, this algorithm provided parameter estimates for a non-age-structured COVID-19 model used by Mitchell et al. [42] to demonstrate the ability of the FAIR Data Pipeline to create traceable scientific workflows, transparently linking model outputs to inputs, that are critical for policy-focused analysis. Application of ABC-MBP to the age-structured models and publicly available COVID-19 data studied here demonstrated a good model fit based on the inferred parameter values, and yielded several new insights. Firstly, there is an indication that younger age groups come into contact more frequently than is suggested by a commonly used contact matrix taken from the pre-pandemic literature. Secondly, estimates for age variation in the probability that individuals become asymptomatic revealed a higher proportion of asymptomatic infections among the young. Thirdly, the time-varying reproduction number was estimated, which could inform assessment of intervention effectiveness and of the impacts of COVID-19 variants and vaccination. Finally, a consistently parameterized age-structured model is presented and available to support data-driven public health policy via simulation of near-term projections and of future scenarios.
