Shiwei Lan¹, Julia A Palacios², Michael Karcher³, Vladimir N Minin⁴, Babak Shahbaba⁵
¹ Department of Statistics, University of Warwick, Coventry CV4 7AL, UK.
² Department of Organismic and Evolutionary Biology, Harvard University, MA 02138, US; Department of Ecology and Evolutionary Biology, Brown University, RI 02912, US; Center for Computational Molecular Biology, Brown University.
³ Department of Statistics, University of Washington, WA 98195, US.
⁴ Department of Statistics, University of Washington, WA 98195, US; Department of Biology, University of Washington.
⁵ Department of Statistics, University of California, Irvine, CA 92697, US.
Abstract
MOTIVATION: The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been used extensively in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of individuals randomly sampled from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on the population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge state-of-the-art Bayesian phylodynamics and demand inferential methods with relatively low computational cost.
RESULTS: To satisfy this demand, we provide a computationally efficient Bayesian inference framework for coalescent process models based on Hamiltonian Monte Carlo. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on the elliptical slice sampler and the Metropolis-adjusted Langevin algorithm.
AVAILABILITY AND IMPLEMENTATION: The R code for all simulation studies and real data analyses conducted in this article is publicly available at http://www.ics.uci.edu/∼slan/lanzi/CODES.html and in the R package phylodyn, available at https://github.com/mdkarcher/phylodyn.
CONTACT: S.Lan@warwick.ac.uk or babaks@uci.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
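To make the ingredients of the abstract concrete, here is a minimal, hypothetical Python sketch (not the authors' R code and not the phylodyn API). It assumes that the latent function f = log N_e is piecewise constant on a grid. It also assumes that a fixed genealogy is summarized by per-cell coalescent event counts y_i and "exposures" A_i, where A_i is the sum of C_k times the time spent in cell i with k lineages and C_k = k(k-1)/2. Under these assumptions, the coalescent log-likelihood is, up to a constant, Σ_i (-y_i f_i - A_i exp(-f_i)). The Gaussian process prior is assumed to be a zero-mean Gaussian with a given precision matrix. The split-HMC step shown is one common Strang-type split: half-step momentum kicks from the likelihood surround the flow of the Gaussian prior plus kinetic energy, which is solved exactly as a harmonic oscillator in the prior's eigenbasis. The function names, the toy summaries, and the Brownian-motion prior are all illustrative assumptions.

```python
import numpy as np


def coalescent_loglik(f, counts, exposure):
    """Coalescent log-likelihood (up to a constant) with N_e = exp(f) piecewise constant.

    counts[i]   : number of coalescent events falling in grid cell i (assumed summary)
    exposure[i] : sum over sub-intervals of cell i of C_k * duration, C_k = k(k-1)/2
    """
    return float(np.sum(-counts * f - exposure * np.exp(-f)))


def coalescent_grad(f, counts, exposure):
    """Gradient of coalescent_loglik with respect to f."""
    return -counts + exposure * np.exp(-f)


def split_hmc_step(f, counts, exposure, prec, eps, n_leap, rng):
    """One split-HMC transition targeting exp(loglik(f) - 0.5 f' prec f).

    The Hamiltonian is split into the likelihood term (half-step momentum kicks)
    and the Gaussian prior + kinetic term, whose flow is solved exactly.
    """
    lam, V = np.linalg.eigh(prec)          # prec = V diag(lam) V^T, lam > 0 assumed
    omega = np.sqrt(lam)
    c, s = np.cos(omega * eps), np.sin(omega * eps)

    def energy(x, p):
        return (-coalescent_loglik(x, counts, exposure)
                + 0.5 * x @ prec @ x + 0.5 * p @ p)

    p0 = rng.standard_normal(f.shape)
    x, p = f.copy(), p0.copy()
    for _ in range(n_leap):
        p = p + 0.5 * eps * coalescent_grad(x, counts, exposure)
        # Exact harmonic-oscillator flow of the Gaussian part in eigen-coordinates.
        xr, pr = V.T @ x, V.T @ p
        xr, pr = xr * c + pr * s / omega, -xr * omega * s + pr * c
        x, p = V @ xr, V @ pr
        p = p + 0.5 * eps * coalescent_grad(x, counts, exposure)

    # Metropolis correction for the discretization error of the likelihood kicks.
    if np.log(rng.uniform()) < energy(f, p0) - energy(x, p):
        return x, True
    return f, False


# Toy usage with hypothetical summaries: 10 grid cells, one event per cell,
# unit exposure, and a Brownian-motion GP prior (covariance min(t_i, t_j)).
rng = np.random.default_rng(0)
grid = np.linspace(0.1, 1.0, 10)
prec = np.linalg.inv(np.minimum.outer(grid, grid))
prec = 0.5 * (prec + prec.T)               # symmetrize round-off from the inverse
counts, exposure = np.ones(10), np.ones(10)

f, n_acc, samples = np.zeros(10), 0, []
for _ in range(500):
    f, acc = split_hmc_step(f, counts, exposure, prec, eps=0.1, n_leap=10, rng=rng)
    n_acc += acc
    samples.append(f)
accept_rate = n_acc / 500
print("acceptance rate:", accept_rate)
```

In this split, the Gaussian part conserves its own energy exactly, so only the likelihood kicks contribute discretization error to the Metropolis test. This is the kind of saving the abstract attributes to splitting the Hamiltonian, though the authors' exact scheme may differ from this sketch.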