
Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference.

Zvi Rosen, Anand Bhaskar, Sebastien Roch, Yun S Song.

Abstract

The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to zero or diverge to infinity, and show undesirable sensitivity to perturbations in the data. The goal of this article is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographies and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model, and generalize our intuition to arbitrary sample sizes using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only [Formula: see text] epochs, where [Formula: see text] is between [Formula: see text] and [Formula: see text]. The set of expected SFS for piecewise-constant demographies with fewer than [Formula: see text] epochs is open and nonconvex, which causes the above phenomena for inference from data.
Copyright © 2018 by the Genetics Society of America.
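
As a rough illustration of the summary statistic the abstract describes, the sketch below tabulates an SFS from a small 0/1 haplotype matrix and computes the normalized expected SFS for the simplest case, a constant population size under the standard neutral coalescent, where the expected count of sites with i mutant copies is proportional to 1/i. It is not the authors' code and does not handle piecewise-constant demographies; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def sample_frequency_spectrum(haplotypes):
    """Tabulate the SFS of a 0/1 haplotype matrix (hypothetical helper).

    haplotypes: list of n equal-length lists; rows are sampled sequences,
    columns are sites, and 1 marks the mutant allele.
    Returns a list of length n - 1 whose entry i - 1 counts the sites
    at which exactly i of the n sequences carry the mutant allele.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        mutant_count = sum(row[site] for row in haplotypes)
        # Only segregating sites (0 < count < n) contribute to the SFS.
        if 0 < mutant_count < n:
            sfs[mutant_count - 1] += 1
    return sfs


def expected_sfs_constant_size(n):
    """Normalized expected SFS for a constant-size population.

    Assumes the standard neutral coalescent, under which the expected
    number of sites with i mutant copies is proportional to 1 / i.
    """
    weights = [Fraction(1, i) for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data (made up): 4 sequences, 4 sites.
toy_haplotypes = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
observed = sample_frequency_spectrum(toy_haplotypes)
expected = expected_sfs_constant_size(len(toy_haplotypes))
```

SFS-based inference methods of the kind the abstract discusses compare an observed spectrum such as `observed` against the expected spectrum implied by a candidate demography; only the constant-size expectation is shown here.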

Keywords:  algebraic methods; coalescent theory; expected sample frequency spectrum; population size

Year:  2018        PMID: 30064984      PMCID: PMC6216588          DOI: 10.1534/genetics.118.300733

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


  7 in total

1.  Inference of population history using coalescent HMMs: review and outlook. (Review)

Authors:  Jeffrey P Spence; Matthias Steinrücken; Jonathan Terhorst; Yun S Song
Journal:  Curr Opin Genet Dev       Date:  2018-07-26       Impact factor: 5.578

2.  GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data.

Authors:  Ekaterina Noskova; Vladimir Ulyantsev; Klaus-Peter Koepfli; Stephen J O'Brien; Pavel Dobrynin
Journal:  Gigascience       Date:  2020-03-01       Impact factor: 6.524

3.  Nonparametric coalescent inference of mutation spectrum history and demography.

Authors:  William S DeWitt; Kameron Decker Harris; Aaron P Ragsdale; Kelley Harris
Journal:  Proc Natl Acad Sci U S A       Date:  2021-05-25       Impact factor: 11.205

4.  Contemporary Demographic Reconstruction Methods Are Robust to Genome Assembly Quality: A Case Study in Tasmanian Devils.

Authors:  Austin H Patton; Mark J Margres; Amanda R Stahlke; Sarah Hendricks; Kevin Lewallen; Rodrigo K Hamede; Manuel Ruiz-Aravena; Oliver Ryder; Hamish I McCallum; Menna E Jones; Paul A Hohenlohe; Andrew Storfer
Journal:  Mol Biol Evol       Date:  2019-12-01       Impact factor: 16.240

5.  Vicariance followed by secondary gene flow in a young gazelle species complex.

Authors:  Genís Garcia-Erill; Michael Munkholm Kjaer; Anders Albrechtsen; Hans Redlef Siegismund; Rasmus Heller
Journal:  Mol Ecol       Date:  2020-12-22       Impact factor: 6.185

6.  Drosophila Evolution over Space and Time (DEST): A New Population Genomics Resource.

Authors:  Martin Kapun; Joaquin C B Nunez; María Bogaerts-Márquez; Jesús Murga-Moreno; Margot Paris; Joseph Outten; Marta Coronado-Zamora; Courtney Tern; Omar Rota-Stabelli; Maria P García Guerreiro; Sònia Casillas; Dorcas J Orengo; Eva Puerma; Maaria Kankare; Lino Ometto; Volker Loeschcke; Banu S Onder; Jessica K Abbott; Stephen W Schaeffer; Subhash Rajpurohit; Emily L Behrman; Mads F Schou; Thomas J S Merritt; Brian P Lazzaro; Amanda Glaser-Schmitt; Eliza Argyridou; Fabian Staubach; Yun Wang; Eran Tauber; Svitlana V Serga; Daniel K Fabian; Kelly A Dyer; Christopher W Wheat; John Parsch; Sonja Grath; Marija Savic Veselinovic; Marina Stamenkovic-Radak; Mihailo Jelic; Antonio J Buendía-Ruíz; Maria Josefa Gómez-Julián; Maria Luisa Espinosa-Jimenez; Francisco D Gallardo-Jiménez; Aleksandra Patenkovic; Katarina Eric; Marija Tanaskovic; Anna Ullastres; Lain Guio; Miriam Merenciano; Sara Guirao-Rico; Vivien Horváth; Darren J Obbard; Elena Pasyukova; Vladimir E Alatortsev; Cristina P Vieira; Jorge Vieira; Jorge Roberto Torres; Iryna Kozeretska; Oleksandr M Maistrenko; Catherine Montchamp-Moreau; Dmitry V Mukha; Heather E Machado; Keric Lamb; Tânia Paulo; Leeban Yusuf; Antonio Barbadilla; Dmitri Petrov; Paul Schmidt; Josefa Gonzalez; Thomas Flatt; Alan O Bergland
Journal:  Mol Biol Evol       Date:  2021-12-09       Impact factor: 16.240

7.  Biases in Demographic Modeling Affect Our Understanding of Recent Divergence.

Authors:  Paolo Momigliano; Ann-Britt Florin; Juha Merilä
Journal:  Mol Biol Evol       Date:  2021-06-25       Impact factor: 16.240

