Lucy M Li, Nicholas C Grassly, Christophe Fraser.
Abstract
Heterogeneity in individual-level transmissibility can be quantified by the dispersion parameter k of the offspring distribution. Quantifying heterogeneity is important as it affects other parameter estimates, it modulates the degree of unpredictability of an epidemic, and it needs to be accounted for in models of infection control. Aggregated data such as incidence time series are often not sufficiently informative to estimate k. Incorporating phylogenetic analysis can help to estimate k concurrently with other epidemiological parameters. We have developed an inference framework that uses particle Markov chain Monte Carlo to estimate k and other epidemiological parameters using both incidence time series and the pathogen phylogeny. Using the framework to fit a modified compartmental transmission model that includes the parameter k to simulated data, we found that more accurate and less biased estimates of the reproductive number were obtained by combining epidemiological and phylogenetic analyses. However, k was most accurately estimated using pathogen phylogeny alone. Accurately estimating k was necessary for unbiased estimates of the reproductive number, but it did not affect the accuracy of reporting probability and epidemic start date estimates. We further demonstrated that inference was possible in the presence of phylogenetic uncertainty by sampling from the posterior distribution of phylogenies. Finally, we used the inference framework to estimate transmission parameters from epidemiological and genetic data collected during a poliovirus outbreak. Despite the large degree of phylogenetic uncertainty, we demonstrated that incorporating phylogenetic data in parameter inference improved the accuracy and precision of estimates.
Keywords: infectious disease; parameter inference; phylodynamics; polio
Year: 2017 PMID: 28981709 PMCID: PMC5850343 DOI: 10.1093/molbev/msx195
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
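For readers unfamiliar with the dispersion parameter k mentioned in the abstract, the following is a sketch of one common parameterization, not necessarily the exact one used by the authors. It assumes the offspring distribution is negative binomial with mean R (the reproductive number) and dispersion k; the symbols Z and R are introduced here for illustration only.

```latex
\begin{align}
P(Z = z) &= \frac{\Gamma(k+z)}{z!\,\Gamma(k)}
            \left(\frac{k}{k+R}\right)^{k}
            \left(\frac{R}{k+R}\right)^{z},
            \qquad z = 0, 1, 2, \dots \\
\mathbb{E}[Z] &= R, \qquad
\operatorname{Var}[Z] = R\left(1 + \frac{R}{k}\right)
\end{align}
```

Under this assumption, smaller k means a larger variance in the number of secondary infections per case. Setting k = 1 gives a geometric offspring distribution, and the distribution approaches a Poisson as k grows large.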
Fig. 1. Estimates of k from simulated data. The horizontal lines denote the true value of k for that set of parameters, that is, the value used to generate the simulated data. The boxes with a horizontal line in the middle indicate the median and 95% HPD interval of parameter estimates pooled from all simulations for that parameter set. The vertical lines with a single dot denote the median and 95% HPD interval of each individual simulation. Blue lines are from simulations in which 10% of individuals were sampled. Red lines are from simulations in which 1% of individuals were sampled.
Precision (measured by the root mean squared deviation), Bias, and Coverage (% of simulations in which the true value is found in the 95% highest posterior density intervals) of Parameter Estimates When Fitting Models to Either Epidemiological, Phylogenetic, or Both Types of Data.
| Data | RMSD | Bias | Coverage | RMSD | Bias | Coverage |
|---|---|---|---|---|---|---|
| Both | 0.1507 | −0.0756 | 100% | 15.0737 | 0.0103 | 75% |
| Epi | 0.1695 | −0.0387 | 100% | 673.9374 | 93.4255 | 100% |
| Phy | 0.1863 | −0.0855 | 100% | 31.7680 | 0.5461 | 100% |
| Both | 11.0638 | 0.1813 | 100% | 0.1183 | −0.0766 | 75.00% |
| Epi | 11.4962 | 0.3933 | 100% | 0.1267 | −0.1002 | 83.33% |
| Phy | 11.9352 | −1.2768 | 100% | NA | NA | NA |
Note.—These statistics were evaluated across all simulations presented in figure 1. The estimates of epidemic start dates were converted to the number of days after an arbitrary date. For bias and precision, we normalized the statistics by the true parameter value.
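The three summary statistics described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each simulation is summarized by a point estimate (for example the posterior median) and a 95% HPD interval, and the function name `summarize_estimates` is hypothetical.

```python
import math


def summarize_estimates(estimates, intervals, true_value):
    """Return (RMSD, bias, coverage) across replicate simulations.

    estimates  : one point estimate per simulation (assumed: posterior median)
    intervals  : one (lower, upper) 95% HPD interval per simulation
    true_value : the parameter value used to simulate the data
    """
    n = len(estimates)
    # Errors are normalized by the true parameter value, as in the table note.
    rel_err = [(est - true_value) / true_value for est in estimates]
    rmsd = math.sqrt(sum(e * e for e in rel_err) / n)
    bias = sum(rel_err) / n
    # Coverage: fraction of simulations whose interval contains the truth.
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / n
    return rmsd, bias, coverage


if __name__ == "__main__":
    rmsd, bias, coverage = summarize_estimates(
        [1.8, 2.2], [(1.5, 2.5), (2.1, 2.6)], true_value=2.0
    )
    print(rmsd, bias, coverage)
```

A low RMSD with a bias near zero and coverage near 95% would indicate estimates that are both precise and well calibrated.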
Fig. 2. Parameter estimates when the reporting rate was 1 in 100 and k was fixed to 1. The horizontal dashed lines denote the true parameter value for that set of parameters, that is, the parameter value used to simulate the data. The boxes indicate the median and 95% HPD interval of parameter estimates pooled from replicate simulations. The vertical lines with a single dot denote the median and 95% HPD interval of each individual simulation. Simulations where the MCMC chain did not converge were left out of the plot. Estimates of the reporting rate did not include inference from phylogenetic data, as the reporting rate refers to the probability that an infection appears in the incidence time series.
The Kolmogorov–Smirnov (K–S) Distance between the Posterior Distributions Estimated Assuming a Geometric Offspring Distribution (i.e., fixing k = 1) and Those Estimated While Estimating k (see results in table 1 and figure 1).
| 0.1 | 0.725 (0.078, 0.956) | 0 (0, 0.125) | 0.201 (0, 0.887) |
| 1.0 | 0.325 (0.111, 0.979) | 0.151 (0.078, 0.247) | 0.422 (0.142, 0.777) |
| 10.0 | 0.646 (0.135, 0.977) | 0.066 (0, 0.101) | 0.395 (0.067, 0.954) |
Note.—K–S values closer to 1 reflect larger discrepancies between the posterior distributions, whereas those close to 0 suggest no difference in posterior distributions. The numbers in brackets denote the range (minimum, maximum) of K–S distances across the different sets of simulated data, and the number preceding the brackets denotes the median K–S distance.
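As an illustration of how a K–S distance between two sets of posterior samples can be computed, here is a minimal sketch using only the Python standard library. The function name `ks_distance` is hypothetical, and the authors' own implementation is not shown in the source.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample K-S distance: the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


if __name__ == "__main__":
    print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))
```

Identical samples give a distance of 0, and samples with no overlap give a distance of 1, matching the interpretation in the note.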
Fig. 3. Inference using 100 trees sampled from simulated sequences, compared with the estimates obtained using the true phylogeny. Horizontal lines with a black dot in the middle represent the median and 95% HPD intervals of parameter values estimated using each of the 100 trees. The red vertical lines are the true parameter values. The red distributions are the posterior distributions integrated over the 100 phylogenies, and the blue distributions are the posterior distributions obtained using the true phylogeny.
Fig. 4. Posterior densities of epidemiological parameters for the 2010 wild type 1 poliovirus outbreak in Tajikistan. The solid and dashed vertical lines are the maximum likelihood estimates and 95% confidence intervals estimated in Blake et al. (2014) using epidemiological data only. The solid vertical lines not accompanied by dashed lines correspond to parameter values that were fixed and not estimated.
Fig. 5. Illustration of likelihood estimation using particle filtering (PF). (A) The median and range of simulated epidemic trajectories during PF. (B–D) show the steps that occur during one iteration of PF. (B) J epidemics (particles) are simulated. The frequency distribution of the simulated epidemics is proportional to their probability density under the model. (C) The weight of each simulated epidemic (particle) is calculated according to the likelihood of the data given that epidemic. (D) Particles are resampled with replacement according to a multinomial distribution whose probabilities are the normalized particle weights. Further details of the PF implementation are given as pseudocode and discussed in more detail in the “Materials and Methods” section.
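The simulate, weight, and resample steps in panels B–D can be sketched as a bootstrap particle filter. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a chain-binomial SIR model, weights particles by incidence data only through a binomial reporting model, and omits both the phylogeny and the dispersion parameter k. All names (`step`, `particle_filter_loglik`, `beta`, `gamma`, `rho`) are hypothetical.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))


def binom_pmf(x, n, p):
    """Probability of x reported cases out of n infections, each reported w.p. p."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def step(state, beta, gamma, pop, rng):
    """Advance one particle by one time step of a chain-binomial SIR model."""
    s, i = state
    new_inf = binomial(s, 1 - math.exp(-beta * i / pop), rng)
    new_rec = binomial(i, 1 - math.exp(-gamma), rng)
    return (s - new_inf, i + new_inf - new_rec), new_inf


def particle_filter_loglik(cases, beta, gamma, rho, pop=200, n_particles=200, seed=1):
    """Estimate the log-likelihood of an incidence time series."""
    rng = random.Random(seed)
    particles = [(pop - 1, 1)] * n_particles  # (susceptible, infected)
    loglik = 0.0
    for y in cases:
        # (B) Simulate each particle forward one time step.
        proposals = [step(p, beta, gamma, pop, rng) for p in particles]
        # (C) Weight each particle by the likelihood of the observed count.
        weights = [binom_pmf(y, new_inf, rho) for _, new_inf in proposals]
        mean_weight = sum(weights) / n_particles
        if mean_weight == 0.0:
            return -math.inf  # no particle is compatible with the data
        loglik += math.log(mean_weight)
        # (D) Resample with replacement, proportional to the weights.
        resampled = rng.choices(proposals, weights=weights, k=n_particles)
        particles = [state for state, _ in resampled]
    return loglik


if __name__ == "__main__":
    print(particle_filter_loglik([0, 1, 1, 2], beta=1.5, gamma=0.5, rho=0.5))
```

In a particle MCMC scheme, an estimate of this kind stands in for the exact likelihood in the Metropolis–Hastings acceptance ratio.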
Parameters of the SIR Model Fit to Simulated Outbreak Data.
| Parameter | Value | Prior |
|---|---|---|
| Population size | 20,000 | — |
| Initial number of infected | 1 | — |
| Duration of infection | — | Uniform (3, 7) |
| Basic reproductive number | — | Uniform (1, 20) |
| Offspring distribution dispersion | — | |
| Reporting rate | — | Uniform (0.0, 1.0) |
| Time of first infection | — | Uniform (01 Jan 16, |
Note.—The upper bound of the prior distribution of the epidemic start date is the time of the first reported case or the time of root node in the phylogeny, whichever comes first.
Model Parameters of the Transmission Model for Polio.
| Parameter | Value | Estimated | Prior |
|---|---|---|---|
| Population sizes in thousands | 656, 1,249, 3,721 | | |
| Susceptible individuals at start in thousands | 109.6, 176.1, 104.2 | | |
| Initial numbers of infected | 1, 0, 0 | | |
| Mean duration of latency | 4 | | |
| Mean duration of infectiousness | | Yes | Gamma ( |
| Reproduction numbers of children aged 0–5 years | | Yes* (proposal and prior on | |
| Reproduction numbers of people aged 6+ years | | Yes* (proposal and prior on | Uniform (1 |
| Offspring distribution dispersion parameter | | Yes | Uniform (1 |
| Infections:Case ratio (inverse of reporting fraction) | | Yes | Uniform (1, 1 |
| Time of first infection | | Yes | Uniform (08 Sep 09, 01 Feb 10) |
| Vaccine efficacy | | Yes | Uniform (0.0, 1.0) |
| Mean and shape parameters of the Erlang distributed incubation period | 16.5 days, 16 | | |
Note.—Values of fixed parameters are given in the column “Value.” For parameters that are estimated, the prior distribution on the parameter is given in the “Prior” column. The population was divided into three age groups: 0–5, 6–14, and 15+ years. The initial numbers of susceptibles were fixed to the maximum likelihood estimates used in Blake et al. (2014). Vaccinations took place on the following dates: 06 May, 20 May, 03 Jun, 17 Jun and 17 Jun 2010. On these dates, individuals were moved from the susceptible to the recovered compartment with probability . Gamma distributions are parameterized by the shape and scale parameters. *The reproductive numbers were calculated from the estimated transmission rate amongst young children, the relative transmission rate between all other groups, the duration of infectiousness, and numbers of susceptibles.