Blockwise HMM computation for large-scale population genomic inference.

Joshua S Paul, Yun S Song.

Abstract

MOTIVATION: A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. The most widely used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets.
RESULTS: To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, by exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including the ones mentioned above, and make previously impracticable analyses possible.
AVAILABILITY: Software available upon request.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: yss@eecs.berkeley.edu.
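
To make the idea of exploiting HMM structure concrete, here is a minimal sketch in Python. It is not the algorithm from the paper; it only illustrates the general principle on a simple haplotype-copying HMM. The model is an assumption for illustration: the hidden state is which of n panel haplotypes is being copied, the state stays put with probability 1 - r and otherwise jumps to a uniformly chosen haplotype, and each site is copied with error probability eps. The names `forward_naive`, `forward_fast`, `panel`, `query`, `r` and `eps` are all hypothetical.

```python
import numpy as np


def forward_naive(panel, query, r, eps):
    """Forward algorithm with an explicit n x n transition matrix: O(n^2) per site."""
    n, num_sites = panel.shape
    # Assumed transition structure: stay with prob. 1 - r, else jump uniformly.
    trans = (1.0 - r) * np.eye(n) + (r / n) * np.ones((n, n))
    # Emission: 1 - eps where the panel allele matches the query, eps otherwise.
    emit = np.where(panel == query, 1.0 - eps, eps)
    f = emit[:, 0] / n  # uniform initial distribution over panel haplotypes
    for site in range(1, num_sites):
        f = emit[:, site] * (trans.T @ f)
    return f.sum()


def forward_fast(panel, query, r, eps):
    """Same likelihood in O(n) per site, using the 'stay or jump uniformly' structure."""
    n, num_sites = panel.shape
    emit = np.where(panel == query, 1.0 - eps, eps)
    f = emit[:, 0] / n
    for site in range(1, num_sites):
        # The jump term is identical for every target state, so compute it once.
        f = emit[:, site] * ((1.0 - r) * f + (r / n) * f.sum())
    return f.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    panel = rng.integers(0, 2, size=(6, 12))  # 6 haplotypes, 12 biallelic sites
    query = rng.integers(0, 2, size=12)
    print(np.isclose(forward_naive(panel, query, 0.05, 0.01),
                     forward_fast(panel, query, 0.05, 0.01)))
```

Both functions compute the same exact likelihood; only the cost per site differs. The sketch works with raw probabilities, so a realistic implementation for long sequences would need rescaling or log-space arithmetic to avoid underflow.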

Year: 2012; PMID: 22641715; PMCID: PMC3400961; DOI: 10.1093/bioinformatics/bts314

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

