A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination.

Joshua S Paul, Yun S Song

Abstract

The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation and the resulting mathematical expressions admit intuitive genealogical interpretations, which we utilize to introduce further approximations and make our method scalable in the number of loci. The general algorithm presented here applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model. Empirical results are provided to demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations.
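For orientation, the sketch below illustrates the kind of HMM-based approximation that the abstract describes as previously proposed. It is not the method developed in this article. It follows the spirit of the Li and Stephens (2003) copying model: the additional haplotype is treated as an imperfect mosaic of the observed haplotypes, and its probability is computed with the forward algorithm. The function name `copying_hmm_csd`, the biallelic 0/1 encoding, the constant per-interval recombination parameter `rho`, and the mutation parameter `theta` are all assumptions made for illustration.

```python
import math


def copying_hmm_csd(new_hap, panel, rho=0.1, theta=0.1):
    """Probability of `new_hap` given `panel` under a simple copying HMM.

    Assumptions (illustrative only): alleles are coded 0/1, `rho` is the
    same for every interval between adjacent loci, and `panel` is a
    non-empty list of equal-length haplotypes.
    """
    n = len(panel)
    num_loci = len(new_hap)

    # Emission: the copied allele is kept unless a mutation event occurs
    # (probability theta / (n + theta)), which then picks one of the two
    # alleles uniformly at random.
    p_mut = theta / (n + theta)
    emit_match = (1.0 - p_mut) + 0.5 * p_mut
    emit_mismatch = 0.5 * p_mut

    # Transition: between adjacent loci, with probability p_jump the copied
    # haplotype is redrawn uniformly from the panel (possibly the same one).
    p_jump = 1.0 - math.exp(-rho / n)

    def emit(k, locus):
        same = panel[k][locus] == new_hap[locus]
        return emit_match if same else emit_mismatch

    # Forward recursion over loci; the initial copying state is uniform.
    forward = [emit(k, 0) / n for k in range(n)]
    for locus in range(1, num_loci):
        total = sum(forward)
        forward = [
            emit(k, locus) * ((1.0 - p_jump) * forward[k] + p_jump * total / n)
            for k in range(n)
        ]
    return sum(forward)


# Hypothetical toy data: three observed haplotypes at three loci.
panel = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
print(copying_hmm_csd((0, 0, 1), panel))
```

Because the emission and transition probabilities each sum to one, the values returned for all possible haplotypes of a given length sum to one, so the function defines a proper conditional distribution. The forward recursion costs time proportional to the number of observed haplotypes times the number of loci.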

Year:  2010        PMID: 20592264      PMCID: PMC2940296          DOI: 10.1534/genetics.110.117986

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


