| Literature DB >> 26717515 |
Matthew Hall1,2, Mark Woolhouse1,2, Andrew Rambaut1,2,3.
Abstract
The use of genetic data to reconstruct the transmission tree of infectious disease epidemics and outbreaks has been the subject of an increasing number of studies, but previous approaches have usually either made assumptions that are not fully compatible with phylogenetic inference, or, where they have based inference on a phylogeny, have employed a procedure that requires this tree to be fixed. At the same time, the coalescent-based models of the pathogen population that are employed in the methods usually used for time-resolved phylogeny reconstruction are a considerable simplification of epidemic process, as they assume that pathogen lineages mix freely. Here, we contribute a new method that is simultaneously a phylogeny reconstruction method for isolates taken from an epidemic, and a procedure for transmission tree reconstruction. We observe that, if one or more samples is taken from each host in an epidemic or outbreak and these are used to build a phylogeny, a transmission tree is equivalent to a partition of the set of nodes of this phylogeny, such that each partition element is a set of nodes that is connected in the full tree and contains all the tips corresponding to samples taken from one and only one host. We then implement a Monte Carlo Markov Chain (MCMC) procedure for simultaneous sampling from the spaces of both trees, utilising a newly-designed set of phylogenetic tree proposals that also respect node partitions. We calculate the posterior probability of these partitioned trees based on a model that acknowledges the population structure of an epidemic by employing an individual-based disease transmission model and a coalescent process taking place within each host. We demonstrate our method, first using simulated data, and then with sequences taken from the H7N7 avian influenza outbreak that occurred in the Netherlands in 2003. We show that it is superior to established coalescent methods for reconstructing the topology and node heights of the phylogeny and performs well for transmission tree reconstruction when the phylogeny is well-resolved by the genetic data, but caution that this will often not be the case in practice and that existing genetic and epidemiological data should be used to configure such analyses whenever possible. This method is available for use by the research community as part of BEAST, one of the most widely-used packages for reconstruction of dated phylogenies.Entities:
Mesh:
Year: 2015 PMID: 26717515 PMCID: PMC4701012 DOI: 10.1371/journal.pcbi.1004613
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1The five possible transmission tree structures of a phylogenetic tree with three tips, depicted as partitions of the nodes of a phylogeny (above) and as directed graphs amongst the hosts A, B and C (below).
Fig 2Illustrations of partitioned phylogenies and MCMC proposals modifying them.
Nodes in all cases are coloured by the partition element containing them. (A) An example partitioned phylogeny. Tips are labelled by the hosts that the isolates corresponding to them were taken from. Where more than one isolate is taken from a host a , c(a ) is labelled; in all other cases c(a ) is the single tip corresponding to an isolate taken from that host. Black diamonds designate nodes that are not ancestral under the partition. The hosts a 7 and a 8 are root-blocked by a 6 due to the position of c(a 6) (black cross). (B) The downward infection branch move. The move attempts to move the node u from the green partition element to the red (which already contains its parent uP). In i), the move is impossible because u is the MRCA node of the tips in the green element. In ii), it can be done with no further modifications required to obey the rules. In iii), the node uC 2, which is not ancestral under the initial partition, must also be moved to the red element so the result obeys the rules. (C) The upward infection branch move. The move attempts to move the node vP from the red partition element to the green (which already contains its child v). In i), the move is impossible because vP is ancestral under the partition and the host represented by the green element is root-blocked by the host represented by the red. In ii), it can be done with no further modifications required to obey the rules. In iii), the node vS, which is not ancestral under the initial partition, must also be moved to the green element, and in iv) the node vG must be because vS is ancestral. (D) The type A phylogeny moves. The exchange move exchanges the nodes u and v; the subtree slide and Wilson-Balding moves change the position of the node u and its parent uP. (E) The type B phylogeny moves. The exchange move exchanges the nodes u and v; the subtree slide move the node w and its parent wP, and the Wilson-Balding the node v and its parent vP. After the latter two moves the transplanted parent node is randomly assigned to one of two new partition elements with equal probability.
Description of symbols used in the probability decomposition.
| Symbol | Type | Meaning |
|---|---|---|
|
| Background information | Examination times of each host |
|
| Background information | Information defining the relationship between hosts used to define |
|
| Data | Results of examinations (sequence data and notes of negative observations) |
|
| Model parameter | Unmodified transmission rate |
|
| Model parameters | Parameters of |
|
| Model parameters | Parameters of the population dynamics of the agents within each host |
|
| Model parameters | Parameters of distribution of infectious periods |
|
| Model parameters | Parameters of nucleotide substitution and molecular clock models |
|
| Latent variable | Phylogenetic tree |
|
| Latent variable | Transmission tree |
|
| Latent variables | Times of infection of each host |
|
| Latent variables | Times of infectiousness of each host (if different to |
|
| Latent variables | Times of becoming noninfectious of each host |
|
| Function | Function modifying |
Explanation of the mathematical symbols used in the simulation model, and prior distributions for their values used in analysis of the simulated datasets.
Mathematical symbols are given where they appear in the text.
| Symbol | Meaning | Actual Value | Prior distribution |
|---|---|---|---|
|
| Identity of index host | variable |
|
|
| Infection time of index host | 0 |
|
|
| Transmission kernel dispersion parameter | 10 | exp(1) |
|
| Unmodified infection rate | 0.1/day | exp(0.5) |
|
| Within-host logistic growth rate | 1.5/day | None |
|
|
| 0.1 | None |
|
| Time before time of infection at which | -4 days | Gamma(10, 2) |
|
| Ratio of lim | 55.6 |
|
|
| Latent period | 2 days | Gamma(200, 100) |
|
| Mean and precision of normal distribution of infectious periods | 10 days, 1 days−2 | NormalGamma(10, 0.01, 1, 1) |
| Molecular clock rate, fast clock datasets | 5 ⋅ 10−4/site/day | exp(0.1) | |
| Molecular clock rate, slow clock datasets and prior analysis | 1 ⋅ 10−5/site/day | None | |
|
| HKY model transition/transversion ratio | 2.718 |
|
1 The prior probability of r is implicitly specified by the priors on T 50 and S
2 N 0 was fixed to its correct value of 0.1 in the analysis
3 The slow clock analysis was also repeated with NormalGamma(10, 100, 1, 1) instead
4 In the analyses of the slow clock datasets and the analyses sampling from the prior distribution only, R was fixed to its correct value of 1 ⋅ 10−5/site/day
Parameters used in the H7N7 analysis, and prior distributions for their values.
| Parameter | Symbol | Prior distribution |
|---|---|---|
| Identity of index host |
|
|
| Infection time of index host |
|
|
| Transmission kernel dispersion parameters |
|
|
| Unmodified transmission rate |
|
|
| Within-farm logistic growth rate |
| None |
| Product of effective population size and pathogen generation time at point of infection |
| Gamma(20, 4) |
| Time before infection time at which |
| None |
| Ratio of lim |
|
|
| Latent period |
| Gamma(200, 100) |
| Mean and precision of normal distribution of infectious periods, high-risk period | NormalGamma(7.3, 169.0, 1, 3.8) | |
| Mean and precision of normal distribution of infectious periods, low-risk period | NormalGamma(13.8, 2.64, 1, 3.8) | |
| Mean molecular clock rate (real space) |
| |
| Standard deviation parameter of relaxed molecular clock (log space) | Exp(0.33) | |
| Transition/transversion ratio |
| |
| Shape parameter of gamma distribution for between-site rate variation | Exp(0.5) | |
| Nucleotide frequencies |
| |
| Relative clock rates for nucleotide positions 1+2 and 3 |
|
1 E corresponds to 17 February 2003, an estimate for the time of the index infection taken from previous literature [36].
2 These parameters is not the true values that that would be estimated in the presence of data on uninfected susceptibles; see the text for details.
3 The prior probability of T 50 is implicitly specified by the priors on r and S
4 N 0 was fixed to 3.37 in the analysis
Fig 3Accuracy of the reconstruction of the transmission tree.
Each violin plot represents the density of a statistic calculated from the results of separate analyses of 25 simulated datasets; the clock model used to generate the dataset and the analysis method are indicated on the y-axis. (A) posterior median of mean bias in estimation of infection dates. (B) posterior median of mean error in estimation of infection dates. (C) Posterior median proportion of hosts whose infector is correctly identified. (D) Proportion of hosts whose infector is correctly identified in the maximum parent credibility (MPC) transmission tree.
Percentage of hosts with parents correctly identified by picking the infector host with the highest posterior probability for different thresholds, and percentage of hosts whose parents are inferred in this way for each threshold.
Numbers are median and range across the 25 simulations.
| Analysis | Statistic | Threshold | |||
|---|---|---|---|---|---|
| None | 0.5 | 0.8 | 0.9 | ||
| Prior | % Parents correctly identified | 22 (10, 28.5) | 40 (25, 75) | 66.7 (0, 100) | 75 (0, 100) |
| % Parents inferred | 100 | 21.3 (10, 36) | 8.0 (2.0, 16.7) | 6.0 (2.0, 14.6) | |
| Fast clock | % Parents correctly identified | 95.9 (82.0, 100) | 95.9 (82, 100) | 97.8 (85.7, 100) | 100 (88.9, 100) |
| % Parents inferred | 100 | 100 (97.8, 100) | 91.7 (79.6, 100) | 84.0 (68.0, 100) | |
| Slow clock | % Parents correctly identified | 72.0 (54.2, 88) | 84.2 (69, 100) | 94.1 (85.7, 100) | 100 (86.4, 100) |
| % Parents inferred | 100 | 76 (42.9, 94) | 42.9 (16.3, 64) | 28 (4.1, 50) | |
| Slow clock, | % Parents correctly identified | 70.2 (52.1, 86) | 85.3 (67.6, 95.7) | 96 (85.2, 100) | 100 (85.7, 100) |
| % Parents inferred | 100 | 75.5 (36.7, 92) | 41.7 (4.1, 64) | 28.3 (2, 54) | |
Fig 4Accuracy of the reconstruction of the phylogeny. Each violin plot represents the density of a statistic calculated from the results of separate analyses of 25 simulated datasets; the clock model used to generate the dataset and the analysis method are indicated on the y-axis.
(A) posterior median of mean bias in estimation of all pairwise TMRCAs. (B) posterior median of mean error in estimation of all pairwise TMRCAs. (C) Posterior median SPR distance from the true phylogeny.
Estimates of simulation parameters from the various analyses.
The median values, across the 25 simulations, of the posterior median, relative error in the posterior median, relative bias in the posterior median and relative width of the 95% HPD interval of each parameter are given, along with the number out of 25 simulations that the correct value was contained within the 95% HPD interval. Where estimates are not given for a particular analysis, this parameter was either fixed to its correct value or, in the case of WHC-related parameters in skyride analyses, not part of the analysis. Mathematical symbols are given where they are referred to in the text.
| Symbol | Meaning | Dataset | Model | True value | Median | Error | Bias | 95% HPD width | HPD accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Molecular clock rate | Fast | Skyride | 5 ⋅ 10−4 | 5.11 ⋅ 10−4 | 2.68 ⋅ 10−2 | 2.13 ⋅ 10−2 | 0.125 | 24 | |
| Fast | WHC | 5 ⋅ 10−4 | 5.1 ⋅ 10−4 | 2.32 ⋅ 10−2 | 1.91 ⋅ 10−2 | 0.112 | 23 | ||
|
| Transition/transversion ratio | Prior | WHC | 2.72 | 2.59 | 4.67 ⋅ 10−2 | −4.67 ⋅ 10−2 | 7.41 | 25 |
| Fast | Skyride | 2.72 | 2.72 | 2.12 ⋅ 10−2 | −1.09 ⋅ 10−3 | 0.113 | 23 | ||
| Fast | WHC | 2.72 | 2.72 | 2.16 ⋅ 10−2 | 6.58 ⋅ 10−4 | 0.114 | 23 | ||
| Slow | Skyride | 2.72 | 2.52 | 0.132 | −7.13 ⋅ 10−2 | 0.745 | 24 | ||
| Slow | WHC | 2.72 | 2.51 | 0.13 | −7.53 ⋅ 10−2 | 0.77 | 24 | ||
| Slow | WHC ( | 2.72 | 2.53 | 0.129 | −7.04 ⋅ 10−2 | 0.763 | 24 | ||
|
| Mean infectious period | Prior | WHC | 10 | 1.99 | 0.802 | −0.802 | 0.141 | 0 |
| Fast | WHC | 10 | 9.85 | 2.18 ⋅ 10−2 | −7.88 ⋅ 10−3 | 0.161 | 25 | ||
| Slow | WHC | 10 | 8.26 | 0.177 | −0.177 | 0.309 | 13 | ||
| Slow | WHC ( | 10 | 9.87 | 1.9 ⋅ 10−2 | −1.8 ⋅ 10−2 | 0.134 | 24 | ||
|
| Standard deviation of infectious periods | Prior | WHC | 1 | 0.944 | 0.216 | −0.107 | 0.593 | 18 |
| Fast | WHC | 1 | 1.06 | 0.12 | 3.13 ⋅ 10−2 | 0.774 | 25 | ||
| Slow | WHC | 1 | 1.63 | 0.724 | 0.724 | 1.99 | 22 | ||
| Slow | WHC ( | 1 | 1.39 | 0.373 | 0.373 | 2.29 | 20 | ||
|
| Latent period | Prior | WHC | 2 | 1.78 | 0.112 | −0.112 | 0.257 | 17 |
| Fast | WHC | 2 | 1.98 | 1.12 ⋅ 10−2 | −9.13 ⋅ 10−3 | 0.264 | 25 | ||
| Slow | WHC | 2 | 1.93 | 3.75 ⋅ 10−2 | −3.75 ⋅ 10−2 | 0.268 | 25 | ||
| Slow | WHC ( | 2 | 1.86 | 6.88 ⋅ 10−2 | −6.88 ⋅ 10−2 | 0.254 | 25 | ||
|
| Transmission kernel dispersion parameter | Prior | WHC | 7 | 4.03 | 0.423 | −0.425 | 0.682 | 9 |
| Fast | WHC | 7 | 6.72 | 6.43 ⋅ 10−2 | −4.04 ⋅ 10−2 | 0.469 | 23 | ||
| Slow | WHC | 7 | 6.81 | 6.87 ⋅ 10−2 | −2.69 ⋅ 10−2 | 0.53 | 24 | ||
| Slow | WHC ( | 7 | 6.90 | 5.67 ⋅ 10−2 | −1.41 ⋅ 10−2 | 0.538 | 24 | ||
|
| Unmodified transmission rate | Prior | WHC | 0.1 | 0.143 | 0.462 | 0.43 | 2.49 | 25 |
| Fast | WHC | 0.1 | 9.88 ⋅ 10−2 | 0.186 | −1.24 ⋅ 10−2 | 1.03 | 24 | ||
| Slow | WHC | 0.1 | 0.111 | 0.182 | 0.105 | 1.43 | 25 | ||
| Slow | WHC ( | 0.1 | 0.103 | 0.186 | 3.1 ⋅ 10−2 | 1.31 | 25 | ||
|
| Within-host logistic growth rate | Prior | WHC | 1 | 0.75 | 0.25 | −0.25 | 2.84 | 25 |
| Fast | WHC | 1 | 1.09 | 0.141 | 9.32 ⋅ 10−2 | 0.986 | 25 | ||
| Slow | WHC | 1 | 2.63 | 1.63 | 1.63 | 5.87 | 24 | ||
| Slow | WHC ( | 1 | 2.17 | 1.17 | 1.17 | 5.54 | 25 | ||
|
| Time at which within-host population size is half its final value | Prior | WHC | −4 | −4.25 | 0.27 | −0.254 | 7.05 | 25 |
| Fast | WHC | −4 | −3.45 | 0.698 | 0.552 | 3.27 | 24 | ||
| Slow | WHC | −4 | −1.42 | 2.58 | 2.58 | 3.86 | 16 | ||
| Slow | WHC ( | −4 | −1.71 | 2.29 | 2.29 | 4.68 | 23 | ||
|
| Ratio of final within-host population size to size at infection | Prior | WHC | 55.6 | 25.1 | 0.547 | −0.549 | 1.53 | 25 |
| Fast | WHC | 55.6 | 41.5 | 0.3 | −0.253 | 1.08 | 23 | ||
| Slow | WHC | 55.6 | 41.4 | 0.255 | −0.255 | 1.33 | 23 | ||
| Slow | WHC ( | 55.6 | 41.5 | 0.253 | −0.253 | 1.09 | 25 |
1 Molecular clock rates were not estimated for runs on the slow clock dataset
2 Infectious periods were drawn from a normal distribution with the “actual values” given here as mean and standard deviation. Error and bias were, however, calculated using the mean and standard deviation of the actual set of estimated periods from each simulated epidemic.
Fig 5Maximum parent credibility transmission tree for the H7N7 outbreak.
Nodes represent farms and are coloured by geographical region. Arrows represent direct transmissions and are coloured by the posterior probability of this particular direct infection. The cyan-bordered nodes, which are also labelled with farm ID numbers from previous literature [18], are were detected during the “high-risk” period before the implementation of control measures. Orange-bordered nodes are farms for which no sequence was available.
Estimates of parameters from the H7N7 outbreak, posterior median and 95% HPD interval.
| Parameter | Symbol | Median value (95% HPD) |
|---|---|---|
| Transmission kernel parameters |
| 1.72 (1.53, 1.95) |
|
| 0.652 (0.341, 1.03) | |
| Unmodified transmission rate |
| 0.124 /day (5.39 ⋅ 10−2, 0.218) |
| Within-farm population growth rate |
| 6.99 /day (4.61, 9.88) |
| Time before infection time at which |
| -0.321 days (-0.441, -0.22) |
| Latent period |
| 1.56 days (1.34, 1.8) |
| Mean infectious period (high-risk period) | 6.19 days (4.57, 7.65) | |
| Standard deviation of infectious periods (high-risk period) | 2.45 days (0.43, 4.05) | |
| Mean infectious period (low-risk period) | 7.52 days (7.02, 8.05) | |
| Standard deviation of infectious periods (low-risk period) | 3.53 days (3.09, 3.99) | |
| Mean molecular clock rate | 2.78 ⋅ 10−5 subs/site/day (2.34 ⋅ 10−5, 3.25 ⋅ 10−5) | |
| Standard deviation of molecular clock rates | 1.34 ⋅ 10−5 subs/site/day (2.24 ⋅ 10−6, 2.40 ⋅ 10−5) | |
| Transition/transversion ratio, positions 1+2 | 7.22 (4.78, 9.99) | |
| Transition/transversion ratio, position 3 | 9.03 (5.56, 13.5) | |
| Gamma shape parameter for between-site rate variation, positions 1+2 | 4.27 ⋅ 10−2 (1.02 ⋅ 10−3, 9.32 ⋅ 10−2) | |
| Gamma shape parameter for between-site rate variation, position 3 | 0.231 (1.08 ⋅ 10−3, 0.677) | |
| Nucleotide frequency, adenine (A) | 0.333 (0.321, 0.346) | |
| Nucleotide frequency, cytosine (C) | 0.188 (0.178, 0.198) | |
| Nucleotide frequency, guanine (G) | 0.249 (0.238, 0.26) | |
| Nucleotide frequency, uracil (U) | 0.230 (0.219, 0.24) | |
| Relative clock rate parameter, positions 1+2 | 0.853 (0.763, 0.939) | |
| Relative clock rate parameter, position 3 | 1.29 (1.12, 1.47) |