Nicola F Müller, David Rasmussen, Tanja Stadler.
Abstract
Motivation: The structured coalescent is widely applied to study demography within and migration between sub-populations from genetic sequence data. Current methods are either exact but too computationally inefficient to analyse large datasets with many sub-populations, or make strong approximations that lead to severe biases in inference. We recently introduced an approximation to the structured coalescent that is based on weaker assumptions and enables the analysis of larger datasets with many different states. We showed that our approximation provides unbiased migration rate and population size estimates across a wide parameter range.
Year: 2018 PMID: 29790921 PMCID: PMC6223361 DOI: 10.1093/bioinformatics/bty406
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flow of information using the backwards/forwards algorithm. Going backwards in the tree, we calculate the probability of each node being in any state, using the information up to time t. The vector has the entries in position a. At the root, the backwards probabilities are equal to the forwards probabilities. To calculate the downwards probabilities, we use the information from all the other parts of the tree, the transition matrix M and the backwards probabilities.
Fig. 2. Inference of effective population sizes, migration rates and node states. (A) Inferred effective population sizes on the y-axis versus true effective population sizes on the x-axis. The effective population sizes for the tree simulations were sampled from a lognormal (-0.125, 0.5) distribution. The coverage of migration rate estimates was 95.5% and for effective population size estimates 94.9%. (B) Inferred migration rates on the y-axis versus true migration rates on the x-axis. The migration rates between states were sampled from an exponential distribution with mean = 0.5. (C) Inferred node states using MASCOT with and without the backwards/forwards algorithm. (D) Median CPU time per mega sample depending on the number of lineages and the number of different states. The CPU time was taken from 100 replicates of the simulation scenario used. (E) Median posterior ESS per hour from 100 replicates for MASCOT and MultiTypeTree. The different colours indicate the different numbers of states. Dashed lines show the median ESS per hour values for MultiTypeTree.
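The parameter sampling described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration and not the authors' simulation code: the function name `sample_simulation_parameters` and the choice of three states are hypothetical. It assumes that lognormal (-0.125, 0.5) denotes the mean and standard deviation in log space. Under that reading the expected effective population size is exp(-0.125 + 0.5²/2) = 1.

```python
import random


def sample_simulation_parameters(n_states, rng):
    """Draw one set of simulation parameters (hypothetical helper)."""
    # Effective population sizes: lognormal, assuming the caption's
    # (-0.125, 0.5) are (mu, sigma) on the log scale.
    ne = [rng.lognormvariate(-0.125, 0.5) for _ in range(n_states)]
    # Migration rates: exponential with mean 0.5, i.e. rate 1 / 0.5 = 2,
    # one rate per ordered pair of distinct states.
    migration = {
        (src, dst): rng.expovariate(1 / 0.5)
        for src in range(n_states)
        for dst in range(n_states)
        if src != dst
    }
    return ne, migration


rng = random.Random(1)  # fixed seed so the draw is reproducible
ne, migration = sample_simulation_parameters(3, rng)
print(len(ne), len(migration))  # 3 6
```

With three states this gives three effective population sizes and six directed migration rates, one for each ordered pair of distinct states.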
Fig. 3. MASCOT analysis of globally sampled Influenza A/H3N2 viruses. (A) The maximum clade credibility tree inferred from H3N2 sequences from Australia, Hong Kong, New York, New Zealand and Japan. The colour of each branch indicates the most likely state of its daughter node. The pie charts indicate the probability of chosen nodes being in any of the possible states. The left pie chart is the probability inferred using the backwards/forwards algorithm and the right pie chart is the probability inferred without it. Since these probabilities are the same at the root, only one chart is shown there. The node heights are the median node heights. (B) The median inferred immigration rates, indicated by the width of the arrows. The wider an arrow into a location, the more likely it is that a lineage in the destination originated from the source location of that arrow. The different source locations are denoted by the colours of the arrows. A wider arrow from light blue to green than from red to green shows that lineages in New Zealand are more likely to have originated from New York than from Hong Kong. The dot sizes are proportional to the median inferred effective population size of each state.