
Distortion of genealogical properties when the sample is very large.

Anand Bhaskar, Andrew G Clark, Yun S Song.

Abstract

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands, if not millions, of individuals. In addition to posing computational challenges, such large sample sizes call for carefully reexamining the theoretical foundation underlying commonly used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models, and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For recently inferred demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the trade-off between accuracy and computational efficiency, we propose a hybrid algorithm that uses the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
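To make the multiple- and simultaneous-merger claim concrete, here is a minimal sketch of a single backward generation of the DTWF model. It is a textbook calculation, not the exact-computation method developed in the paper, and it assumes a haploid population of constant size N (an assumption not stated in the abstract). All function names are illustrative. If each of n lineages picks a parent uniformly at random, the probability that they have exactly m distinct parents is S(n, m) · N(N−1)···(N−m+1) / N^n, where S(n, m) is a Stirling number of the second kind. The coalescent keeps only the "no merger" and "one pairwise merger" outcomes, so the remaining probability mass measures what it ignores.

```python
from fractions import Fraction


def stirling2(n, m):
    """Stirling number of the second kind, via the standard recurrence."""
    # table[i][j] = number of ways to partition i items into j non-empty blocks
    table = [[0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][m]


def falling_factorial(N, m):
    """N * (N - 1) * ... * (N - m + 1)."""
    result = 1
    for k in range(m):
        result *= N - k
    return result


def dtwf_transition(n, m, N):
    """P(n lineages have exactly m distinct parents) in one generation.

    Assumes a haploid Wright-Fisher population of constant size N
    (an illustrative assumption, not taken from the abstract).
    """
    return Fraction(stirling2(n, m) * falling_factorial(N, m), N ** n)


def prob_complex_merger(n, N):
    """Probability of anything other than 'no merger' or 'one pairwise merger'.

    These are the multiple- and simultaneous-merger outcomes that the
    coalescent excludes by construction.
    """
    return 1 - dtwf_transition(n, n, N) - dtwf_transition(n, n - 1, N)


if __name__ == "__main__":
    N = 10_000  # hypothetical population size
    for n in (10, 100, 1_000):
        print(n, float(prob_complex_merger(n, N)))
```

The printed probability grows as the sample size n approaches the population size N, which is the regime in which the abstract reports that the coalescent approximation degrades. Exact rational arithmetic is used for clarity; it becomes slow for very large n, where a log-space or floating-point recurrence would be more practical.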

Year: 2014; PMID: 24469801; PMCID: PMC3926037; DOI: 10.1073/pnas.1322709111

Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 11.205


