Gabriel E Leventhal, Huldrych F Günthard, Sebastian Bonhoeffer, Tanja Stadler.
Abstract
The control, prediction, and understanding of epidemiological processes require insight into how infectious pathogens transmit in a population. The chain of transmission can in principle be reconstructed with phylogenetic methods that analyze the evolutionary history using pathogen sequence data. The quality of the reconstruction, however, crucially depends on the underlying epidemiological model used in phylogenetic inference. Until now, only simple epidemiological models have been used, which make limiting assumptions such as constant rate parameters, infinite total population size, or deterministically changing population size of infected individuals. Here, we present a novel phylogenetic method to infer parameters based on a classical stochastic epidemiological model. Specifically, we use the susceptible-infected-susceptible model, which accounts for density-dependent transmission rates and finite total population size, leading to a stochastically changing infected population size. We first validate our method by estimating epidemic parameters for simulated data and then apply it to transmission clusters from the Swiss HIV epidemic. Our estimates of the basic reproductive number R0 for the considered Swiss HIV transmission clusters are significantly higher than previous estimates, which were derived assuming infinite population size. This difference in key parameter estimates highlights the importance of careful model choice when doing phylogenetic inference. In summary, this article presents the first fully stochastic implementation of a classical epidemiological model for phylogenetic inference and thereby addresses a key aspect in ongoing efforts to merge phylogenetics and epidemiology.
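To make "density-dependent transmission, finite total population size, stochastically changing number of infected" concrete, here is a minimal Gillespie-style simulation sketch of an SIS epidemic with sampling. The rate forms are assumptions for illustration only and are not taken from the article:

- Transmission occurs at rate beta * I * (N - I) / N.
- Recovery occurs at rate gamma * I.
- Sampling occurs at rate psi * I.
- Both recovered and sampled individuals return to the susceptible class.

The function name `simulate_sis` and its parameters are hypothetical. The article uses its model for likelihood-based phylogenetic inference, not just forward simulation.

```python
import random


def simulate_sis(beta, gamma, psi, n_total, t_max, seed=0):
    """Gillespie simulation of a density-dependent SIS epidemic with sampling.

    Assumed (hypothetical) event rates with I infected out of N = n_total:
      transmission: beta * I * (N - I) / N  -> I increases by 1
      recovery:     gamma * I               -> I decreases by 1
      sampling:     psi * I                 -> I decreases by 1, sample count + 1
    Starts from a single infected individual at time 0 and stops when the
    epidemic dies out or the next event would occur at or after t_max.
    Returns a list of (time, infected, cumulative_samples) tuples.
    """
    rng = random.Random(seed)
    t, infected, sampled = 0.0, 1, 0
    trajectory = [(t, infected, sampled)]
    while infected > 0:
        rate_inf = beta * infected * (n_total - infected) / n_total
        rate_rec = gamma * infected
        rate_samp = psi * infected
        total = rate_inf + rate_rec + rate_samp
        if total == 0.0:
            break  # no event can happen (e.g. I = N and gamma = psi = 0)
        t += rng.expovariate(total)  # waiting time to the next event
        if t >= t_max:
            break
        u = rng.random() * total  # pick which event fires, proportional to rate
        if u < rate_inf:
            infected += 1
        elif u < rate_inf + rate_rec:
            infected -= 1
        else:
            infected -= 1
            sampled += 1
        trajectory.append((t, infected, sampled))
    return trajectory
```

Because transmission slows as I approaches N, the infected count saturates rather than growing without bound. Repeated runs with different seeds give different trajectories, which is the stochasticity the abstract refers to.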
Keywords: birth–death; coalescent; density dependence; epidemic inference; phylodynamics
Year: 2013 PMID: 24085839 PMCID: PMC3879443 DOI: 10.1093/molbev/mst172
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Example of an epidemic with sampled (red) and unsampled (gray) individuals. The top panel shows the infective periods of all individuals in the epidemic. The middle panel shows the infective periods of only the sampled individuals as well as the reconstructed transmission tree. The red dots are the sampling times of the individuals and the black triangles the branching times on the sampled phylogeny. Note that while we do not know the exact infectious periods of the sampled individuals, the transmission chain between two events can pass through multiple individuals. The bottom panel shows the corresponding phylogenetic tree with branching times x and sampling times y. In this example, the joint event time vector is . The axis at the bottom of the figure shows how the matrices in equation (5) are applied to the probability vector from the present time t0 to the root of the tree (here n = 4 and ). The numbers along the axis are the numbers of extant lineages within each time interval. Note that the matrices are applied as matrix exponentials, i.e., . The likelihood of the tree, given that a single infected individual started the epidemic, is then the entry of the probability vector at the root that corresponds to I = 1 infected individual.
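The caption describes the likelihood computation as pushing a probability vector from the present back to the root. At each step the vector is multiplied by a matrix exponential of an interval-specific matrix. The sketch below shows only that mechanic, under stated assumptions:

- The interval matrices `Q_i` are placeholders, since the article's equation (5) is not reproduced here.
- The exponential is approximated with a substepped truncated Taylor series rather than a library routine.
- Index i of the vector is assumed to mean I = i infected.

All function names are hypothetical.

```python
import math


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def expm_action(q, v, t, substeps=64, terms=20):
    """Approximate exp(q * t) @ v.

    The interval t is split into `substeps` pieces of length h, and
    exp(q * h) @ v is evaluated on each piece with a Taylor series
    truncated after `terms` terms.
    """
    h = t / substeps
    for _ in range(substeps):
        term = list(v)
        total = list(v)
        for k in range(1, terms):
            # term_k = (h * q)^k v / k!, built recursively from term_{k-1}
            term = [h * x / k for x in mat_vec(q, term)]
            total = [s + x for s, x in zip(total, term)]
        v = total
    return v


def propagate(p_present, intervals):
    """Apply exp(Q_i * dt_i) interval by interval, from the present to the root.

    `intervals` lists (Q_i, dt_i) pairs ordered backwards in time; the Q_i
    stand in for the article's equation (5) matrices (not reproduced here).
    """
    p = list(p_present)
    for q, dt in intervals:
        p = expm_action(q, p, dt)
    return p


def single_founder_likelihood(p_root):
    """Entry for I = 1, assuming index i of the vector means I = i infected."""
    return p_root[1]


if __name__ == "__main__":
    # Toy check: for Q = [[-a, a], [0, 0]], exp(Q t) = [[e^-at, 1 - e^-at], [0, 1]].
    a = 0.5
    q = [[-a, a], [0.0, 0.0]]
    p_root = propagate([0.0, 1.0], [(q, 1.0), (q, 1.0)])
    print(p_root, [1 - math.exp(-a * 2.0), 1.0])
```

A production implementation would more likely call a dedicated matrix-exponential routine. The hand-written version keeps the sketch dependency-free.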
The Relative Bias of the Estimated Parameters and DIC Values of the Fit.
Note.—The SIM entries are the input parameters to the simulation. , density-dependent model with fixed sampling probability; DD, density-dependent model with inferred sampling probability; BD, density-independent model with fixed sampling probability. n is the number of tips in the tree. The entries at , and R0 show the relative bias of the estimates. Smaller DIC values indicate a better fit of the model to the data. The model with the smallest DIC value is indicated by an asterisk for each of the parameter sets.
a. The HPD interval does not contain the true parameter value (shaded cells).
b. Numerical maximization of the likelihood failed.
c. The MCMC did not converge according to the Gelman–Rubin diagnostic. We therefore report ML point estimates and the deviance at the ML estimator. This deviance is not equivalent to a DIC; to be comparable, it must be corrected by adding 2p, where p is the effective number of parameters.
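The table note relies on three quantities: relative bias, DIC-based model choice, and a DIC-like value built from the ML deviance. The helpers below assume standard definitions that the note itself does not spell out:

- Relative bias is (estimate - truth) / truth.
- Deviance is D = -2 log L.
- DIC is the mean posterior deviance plus p_D, where p_D is the mean deviance minus the deviance at the posterior mean.
- The ML fallback adds 2p to the deviance at the ML estimate.

All names are hypothetical, and the numbers in the usage note are toy inputs, not results from the article.

```python
from statistics import mean


def relative_bias(estimate, true_value):
    """Relative bias (estimate - truth) / truth (assumed definition)."""
    return (estimate - true_value) / true_value


def hpd_contains(lower, upper, true_value):
    """True if the HPD interval [lower, upper] covers the true value."""
    return lower <= true_value <= upper


def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean deviance + p_D, with p_D = mean deviance - D(posterior mean).

    Deviances are -2 * log-likelihood values, one per MCMC sample.
    """
    d_bar = mean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


def dic_from_ml(deviance_at_ml, p):
    """DIC-like value from the deviance at the ML estimate, corrected by +2p."""
    return deviance_at_ml + 2 * p


def best_model(dic_by_model):
    """Model with the smallest DIC (the one marked with an asterisk)."""
    return min(dic_by_model, key=dic_by_model.get)
```

For example, `best_model({"DD": 105.2, "BD": 110.0})` returns `"DD"`, because a smaller DIC indicates a better fit.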
Epidemiological Parameter Estimates for the 10 Swiss HIV Transmission Clusters Under the Model with .
| Cluster | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| n | 34 | 29 | 27 | 26 | 25 |
| N | 188 [160,218] | 36.1 [32.2,40.6] | — | 39.4 [35.5,43.9] | 65.7 [58.7,73] |
| | 0.269 [0.258,0.282] | 0.5 [0.458,0.543] | — | 1.02 [0.952,1.08] | 0.622 [0.594,0.658] |
| | 0.00973 [0.009,0.011] | 0.0341 [0.032,0.036] | — | 0.0808 [0.076,0.086] | 0.0113 [0.01,0.012] |
| | 0.0292 [0.027,0.032] | 0.102 [0.096,0.108] | — | 0.243 [0.227,0.258] | 0.034 [0.031,0.037] |
| R0 | 6.93 [6.29,7.62] | 3.67 [3.39,3.94] | — | 3.15 [2.88,3.4] | 13.7 [12.4,15.3] |
Note.—n is the number of sampled individuals in each subepidemic. R0 is the basic reproductive ratio. The reported values are the posterior modes, with the 95% credible intervals from the posterior distribution in brackets. With the uniform prior used, the posterior mode corresponds to the ML estimator; the two differed, and only negligibly, for the estimate of N in cluster 6. In cluster 6, the posterior distribution of N was heavy-tailed and bounded by the upper limit of the uniform prior. Therefore, the upper limit of the credible interval is likely an underestimate.
a. The fit of the density-dependent model is significantly better than that of the density-independent model. Model comparison is based on DIC values.
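The labels of the three rows between N and R0 were lost in extraction. A quick arithmetic check on the posterior modes suggests a reading of them. If the first is a per-capita transmission rate and the other two are removal rates, then transmission / (sum of removal rates) reproduces the reported R0 to within rounding (under 0.5%) for every cluster with estimates. This row interpretation is an assumption, not a statement from the article, and `r0_from_rates` is a hypothetical helper.

```python
# Posterior modes transcribed from the table (cluster 3 has no estimates).
# Tuple order: (unlabeled row 1, unlabeled row 2, unlabeled row 3, reported R0).
clusters = {
    1: (0.269, 0.00973, 0.0292, 6.93),
    2: (0.5, 0.0341, 0.102, 3.67),
    4: (1.02, 0.0808, 0.243, 3.15),
    5: (0.622, 0.0113, 0.034, 13.7),
}


def r0_from_rates(transmission, removal_a, removal_b):
    """Assumed relation: R0 = transmission rate / total removal rate."""
    return transmission / (removal_a + removal_b)


for cid, (rate_t, rate_a, rate_b, r0_reported) in clusters.items():
    r0 = r0_from_rates(rate_t, rate_a, rate_b)
    print(f"cluster {cid}: computed R0 {r0:.2f}, reported {r0_reported}")
```

The small discrepancy for cluster 1 is consistent with the table values being rounded to three significant figures.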
Lineage-through-time (LTT) plots and prevalence curves of two example HIV subepidemics in Switzerland. Left panels: The gray lines are the LTT plots of 90 trees sampled from the posterior distribution of trees estimated using BEAST. The solid and dashed lines are the expected numbers of lineages for the density-dependent and density-independent SIS models, respectively, using the parameter estimates from table 2. Predicted LTT plots are almost identical when using parameter estimates for sampling probabilities and . Right panels: Dashed lines correspond to the density-independent (BD) model and solid lines to the density-dependent (DD) model. The vertical dotted line indicates the time of the last sample in the tree. The gray steps are the observed cumulative number of sampled individuals over time. For the estimated parameter values, the red curves show the fitted cumulative number of sampled individuals and the black lines the predicted prevalence (number of infected individuals). Although both models produce acceptable fits to the cumulative number of samples over time, the BD model predicts both the prevalence and the cumulative number of samples to increase exponentially in the future, whereas the DD model can identify subepidemics that are already in the saturated phase.
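An LTT curve such as the gray lines above can be read directly off a tree's branching times x and sampling times y. The sketch assumes the following conventions:

- Time runs backwards from the most recent sample.
- Every sample is a tip of a binary tree, with no sampled ancestors.
- Going back in time, each sampling time adds one lineage and each branching time merges two lineages into one.

The function name is hypothetical.

```python
from bisect import bisect_right


def lineages_through_time(branching_times, sampling_times, query_times):
    """Number of lineages at each query time (measured backwards from the
    most recent sample) for a binary tree whose tips are the samples.

    Lineages at time t = (samples taken at or after t, looking backwards)
                       - (branching events at or before t).
    """
    x = sorted(branching_times)
    y = sorted(sampling_times)
    return [bisect_right(y, t) - bisect_right(x, t) for t in query_times]
```

For a toy tree with four tips sampled at y = [0, 0.5, 1.0, 2.0] and branchings at x = [1.5, 2.5, 3.0], querying t = [0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.5] gives [1, 2, 3, 2, 3, 2, 1]. The count drops to a single lineage above the root.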