Nicola F Müller, Gytis Dudas, Tanja Stadler.
Abstract
Population dynamics can be inferred from genetic sequence data by using phylodynamic methods. These methods typically quantify the dynamics in unstructured populations or assume migration rates and effective population sizes to be constant through time in structured populations. When rates are allowed to vary through time in structured populations, the number of parameters to infer increases rapidly and the available data might not be sufficient to inform them. Additionally, it is often of interest to know what predicts these parameters rather than knowing the parameters themselves. Here, we introduce a method to infer the predictors for time-varying migration rates and effective population sizes by using a generalized linear model (GLM) approach under the marginal approximation of the structured coalescent. Using simulations, we show that our approach is able to reliably infer the model parameters and their predictors from phylogenetic trees. Furthermore, when simulating trees under the structured coalescent, we show that our new approach outperforms the discrete trait GLM. We then apply our framework to a previously described Ebola virus dataset, where we infer the parameters and their predictors from genome sequences while accounting for phylogenetic uncertainty. We infer weekly cases to be the strongest predictor for effective population size and geographic distance the strongest predictor for migration. This approach is implemented as part of the BEAST2 package MASCOT, which allows us to jointly infer population dynamics, i.e. the parameters and predictors, within structured populations, the phylogenetic tree, and evolutionary parameters.
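The GLM idea in the abstract can be illustrated with a minimal sketch. It assumes the log-linear form with binary inclusion indicators that is commonly used in phylogeographic GLMs; the abstract itself does not spell out the parameterization, and the function name `glm_rates`, the `scale` parameter, and the example numbers are hypothetical, not taken from MASCOT.

```python
import math

def glm_rates(predictors, coefficients, indicators, scale=1.0):
    """Combine predictors into rates under an assumed log-linear GLM.

    predictors:   one list per predictor, each holding one (already
                  log-transformed and standardized) value per rate
    coefficients: one effect size per predictor
    indicators:   0/1 per predictor; 0 switches the predictor off
    scale:        hypothetical overall scaling factor
    """
    n_rates = len(predictors[0])
    rates = []
    for j in range(n_rates):
        # Only predictors whose indicator is 1 contribute to the log rate.
        log_rate = sum(d * b * x[j]
                       for x, b, d in zip(predictors, coefficients, indicators))
        rates.append(scale * math.exp(log_rate))
    return rates

# Two predictors, two rates; the second predictor is switched off.
example = glm_rates([[0.0, 1.0], [1.0, 1.0]], [2.0, 5.0], [1, 0])
```

In this sketch, inferring "what predicts the parameters" amounts to estimating the indicators and coefficients instead of one free parameter per rate and time interval.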
Keywords: BEAST; GLM; infectious disease; phylogenetics; phylogeography
Year: 2019 PMID: 31428459 PMCID: PMC6693038 DOI: 10.1093/ve/vez030
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. Comparison of coefficients and indicators inferred from fixed phylogenetic trees and when jointly inferring them with the tree. (A) Inferred active coefficients when inferring the phylogenetic tree (y-axis) versus the inferred active coefficients when fixing the phylogenetic tree (x-axis) for the effective population size predictors. The coverage denotes how often the 95% highest posterior density interval includes the true value of the coefficient, which describes the effect size of an active predictor. (B) Probability of the indicator of active effective population size predictors being 1 when inferring the phylogenetic tree (y-axis) versus it being 1 when fixing the phylogenetic tree (x-axis). The effect size of a predictor is given by the grey scale. Predictors with a smaller effect size (brighter) are included less often as active predictors. (C) Inferred active coefficients when inferring the phylogenetic tree (y-axis) versus the inferred active coefficients when fixing the phylogenetic tree (x-axis) for the migration rate predictors. (D) Probability of the indicator of active migration rate predictors being 1 when inferring the phylogenetic tree (y-axis) versus it being 1 when fixing the phylogenetic tree (x-axis).
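The coverage statistic mentioned in the Figure 1 caption can be sketched as follows. This is a generic illustration, not the authors' analysis code: the interval is taken as the shortest window containing the requested fraction of sorted posterior samples, and the names `hpd_interval` and `coverage` are hypothetical.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing roughly `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must hold
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def coverage(true_values, sample_sets, mass=0.95):
    """Fraction of simulations whose HPD interval contains the true value."""
    hits = 0
    for truth, samples in zip(true_values, sample_sets):
        lo, hi = hpd_interval(samples, mass)
        hits += lo <= truth <= hi
    return hits / len(true_values)
```

For a well-calibrated method, the coverage computed this way over many simulated datasets should be close to the nominal 95%.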
Figure 2. Inference of coefficients and indicators using the generalized linear model versions of MASCOT and DTA based on fixed phylogenetic trees. (A) Inferred active coefficients (y-axis) versus the true coefficients for migration rate predictors using MASCOT (left) and DTA (right). (B) Probability of the indicator of active migration rate predictors to be 1 (y-axis) for the effective population size. The dashed line corresponds to a Bayes Factor of 10 for the predictor being included. (C) Histogram of inclusion probabilities of predictors that do not predict migration rates. The true value of these inclusion probabilities is therefore 0, and predictors with large inclusion probabilities might be falsely considered as predicting migration rates. (D) Histogram of inclusion probabilities of indicators for sample number predictors, that is, predictors under which migration into a state is proportional to the number of samples from that state. These predictors were not used to predict migration rates.
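The relation between an inclusion probability and a Bayes Factor threshold such as the dashed line can be sketched as below. The Bayes Factor is the posterior odds of inclusion divided by the prior odds. The prior inclusion probability used in the study is not stated in this text, so the value 0.5 in the example is purely illustrative, and the function names are hypothetical.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Bayes Factor for inclusion: posterior odds divided by prior odds."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

def inclusion_threshold(bf, prior_prob):
    """Posterior inclusion probability that corresponds to a given Bayes Factor."""
    posterior_odds = bf * prior_prob / (1.0 - prior_prob)
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative only: with an assumed prior inclusion probability of 0.5,
# a Bayes Factor of 10 corresponds to a posterior probability of 10/11.
threshold = inclusion_threshold(10.0, 0.5)
```

A smaller prior inclusion probability lowers the posterior probability needed to reach the same Bayes Factor, which is why the threshold line depends on the prior.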
Figure 3. Analysis of data from the 2014 Ebola epidemic in Sierra Leone. (A) Inferred maximum clade credibility tree from the 2014 Sierra Leone EBOV sequences. Colours denote the most likely inferred district for each node, and branches are coloured according to their descendant node. The district colour scheme is shown on the map. (B) Weekly incidence by district. The x-axis denotes time in months and acts as a scale for both the incidence data and the phylogenetic tree.
Figure 4. Inferred predictors of the effective population sizes and migration rates for the Ebola analysis. (A) Inferred effective population size predictors. The x-axis shows the probability of the predictors being included in predicting the effective population sizes. Red bars are predictors for which the median value of the coefficient is negative and blue bars are predictors for which the median value of the coefficient is positive. The magnitude of the coefficient from a standardized predictor does not have a direct meaning or dimension. We therefore only plotted whether a coefficient was inferred to be positive or negative, i.e. whether the relationship between a predictor and the effective population size or migration rates is inverse or not. The case data predictor includes the number of cases per week in a location. We additionally added eight predictors where the cases are assumed to have happened 1, 3, 6, and 9 weeks earlier or later. These are not inferred to strongly predict effective population sizes. (B) Inferred migration rate predictors. The x-axis shows the probability of the predictors being included in the migration model. ‘Origin’ and ‘from’ predictors predict the migration rate from a location. ‘Destination’ and ‘to’ predictors predict the migration rate into a location. The dashed line corresponds to a Bayes Factor of 10 for the predictor being included.