Marc Manceau, Ankit Gupta, Timothy Vaughan, Tanja Stadler.
Abstract
We consider a homogeneous birth-death process with three sampling schemes. First, individuals can be sampled through time and included in a reconstructed phylogenetic tree. Second, they can be sampled through time and recorded only as a point 'occurrence' along a timeline. Third, extant individuals can be sampled and included in the reconstructed phylogenetic tree with a fixed probability. We further allow each sampled individual to be removed from the process upon sampling with a fixed probability. We derive the probability distribution of the population size at any time in the past, conditional on the joint observation of a reconstructed phylogenetic tree and a record of occurrences not included in the tree. We also provide an algorithm to simulate ancestral population size trajectories given the observation of a reconstructed phylogenetic tree and occurrences. This distribution can be readily used to draw inferences about ancestral population size in the fields of epidemiology and macroevolution. In epidemiology, these results will allow data from epidemiological case count studies to be used together with molecular sequencing data (yielding reconstructed phylogenetic trees) to coherently estimate prevalence through time. In macroevolution, they will foster the joint examination of the fossil record and extant taxa to reconstruct past biodiversity.
Keywords: Birth-death process; Epidemiology; Fossilized birth-death model; Macroevolution; Phylogenetics
Year: 2020 PMID: 32739241 PMCID: PMC7733867 DOI: 10.1016/j.jtbi.2020.110400
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
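The sampling model described in the abstract can be illustrated with a small forward simulation. This is a minimal sketch, not the authors' algorithm: it tracks only the population size and the times of the two kinds of through-time samples, not the tree itself. All function and parameter names (`simulate_population`, `seq_rate`, `occ_rate`, `removal_prob`, `extant_prob`) are assumptions chosen for illustration, and the process is assumed to start from a single individual at time 0.

```python
import random


def simulate_population(birth, death, seq_rate, occ_rate, removal_prob,
                        extant_prob, t_end, seed=None, max_size=100_000):
    """Gillespie-style simulation of a birth-death process with sampling.

    Returns (trajectory, seq_times, occ_times, n_extant_sampled), where
    trajectory lists (time, population_size) at each change in size.
    """
    rng = random.Random(seed)
    total = birth + death + seq_rate + occ_rate  # per-capita rate, assumed > 0
    t, n = 0.0, 1
    trajectory = [(t, n)]
    seq_times, occ_times = [], []
    while n > 0:
        if n >= max_size:
            raise RuntimeError("population exceeded max_size")
        t += rng.expovariate(total * n)  # waiting time to the next event
        if t >= t_end:
            break
        u = rng.random() * total  # pick the event type
        if u < birth:
            n += 1
        elif u < birth + death:
            n -= 1
        else:
            # Through-time sampling: sequenced (enters the tree) or occurrence.
            if u < birth + death + seq_rate:
                seq_times.append(t)
            else:
                occ_times.append(t)
            # The sampled individual is removed with a fixed probability.
            if rng.random() < removal_prob:
                n -= 1
        if n != trajectory[-1][1]:
            trajectory.append((t, n))
    # Each individual alive at the present is sampled independently.
    n_extant_sampled = sum(rng.random() < extant_prob for _ in range(n))
    return trajectory, seq_times, occ_times, n_extant_sampled
```

For example, `simulate_population(1.0, 0.5, 0.2, 0.3, 0.5, 0.4, t_end=3.0, seed=3)` returns one simulated trajectory together with the sampling record that an observer would see.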
Fig. 1. General setting of the method. a) The full process with sampling. Pink dots appear as dots in c) and correspond to sampling through time without sequencing. Blue dots appear as dots in d) and correspond to sampling through time with sequencing. Yellow dots correspond to all present-day sampling events. Filled and unfilled dots correspond to sampling with and without removal, respectively. b) Population size through time. c) Observed occurrences through time. d) Reconstructed phylogenetic tree. e) Number of individuals in the reconstructed phylogenetic tree through time. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Four unobservable scenarios taken into account to derive the ODEs (3.2), (4.1).
Fig. 3. Six observable punctual events in the data.
Fig. 4. The most efficient results depending on the parameter space considered. In red, results already described in Stadler (2010) and Gupta et al. (2019). In blue, the new contribution of this manuscript. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 5. Assessment of the accuracy of the methods presented in this paper, on toy datasets. First row: probability density of the data, A) against the known analytical formula in a first special case of the parameters; B) against the known analytical formula in a second special case; C) obtained using Algorithms 1' or 2' otherwise. Second row: quantiles of the population size distribution, against the particle filter in Vaughan et al. (2019). D) Quantile of level 0.2; E) median; F) quantile of level 0.8.
Fig. 6. The inferred population size distribution matches the simulated population size trajectory under three different processes: A) a homogeneous birth-death process with sampling at present; B) a homogeneous birth-death process with sampling at present and one type of sampling through time; C) a homogeneous birth-death process with all three types of sampling. Note that we plot on the same graph the number of observed lineages in the tree, as this is an obvious lower bound in our population size inference.
Algorithm listing (symbols and formulas lost in extraction). Input: observed tree and occurrence data, parameters, a set of time points, and the truncation. Recoverable steps: numerically solve the ODE, which is defined through a matrix; record the result; then apply a series of "Set" assignments.
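The phrase "numerically solve the ODE" together with a truncation can be illustrated on a much simpler system than the one in the paper. The sketch below is an assumption-laden stand-in, not the authors' ODE: it integrates the standard master equation of a linear birth-death process without sampling, truncated at a maximum population size, using a hand-written fourth-order Runge-Kutta scheme. The names `master_rhs`, `solve_truncated`, and `extinction_probability` are hypothetical, and births out of the largest state are suppressed so that total probability is conserved.

```python
import math


def master_rhs(p, birth, death):
    """Right-hand side dP_n/dt for states n = 0..n_max (truncated)."""
    n_max = len(p) - 1
    dp = [0.0] * len(p)
    for n, pn in enumerate(p):
        up = birth * n if n < n_max else 0.0  # no births out of the last state
        down = death * n
        dp[n] -= (up + down) * pn
        if up:
            dp[n + 1] += up * pn
        if n > 0:
            dp[n - 1] += down * pn
    return dp


def solve_truncated(p0, t_end, n_steps, birth, death):
    """Integrate the truncated master equation with fixed-step RK4."""
    p = list(p0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        k1 = master_rhs(p, birth, death)
        k2 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k1)], birth, death)
        k3 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k2)], birth, death)
        k4 = master_rhs([x + dt * k for x, k in zip(p, k3)], birth, death)
        p = [x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def extinction_probability(birth, death, t):
    """Closed-form extinction probability for one founder (birth != death)."""
    growth = math.exp((birth - death) * t)
    return death * (growth - 1.0) / (birth * growth - death)
```

A simple sanity check is to start from one individual, for instance `p0 = [0.0, 1.0] + [0.0] * 49`, and compare `solve_truncated(p0, 1.0, 1000, 1.0, 0.5)[0]` with `extinction_probability(1.0, 0.5, 1.0)`. The two should agree closely as long as the truncation is large relative to the population sizes that are reachable in the time span.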
Algorithm listing (symbols and formulas lost in extraction). Input: parameters, a set of time points, and the truncation. Recoverable steps: pool all […]; initialise with a series of "Set" assignments; compute the values right before each punctual event; record the result; then apply a series of "Update" and "Set" steps.