Blockwise HMM computation for large-scale population genomic inference.

Abstract

MOTIVATION: A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most widely used CSDs are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods resulting from the common implementation of the relevant HMM techniques remain intractable for large genomic datasets.
RESULTS: To address this issue, this work introduces a set of algorithmic improvements for performing the exact HMM computation that exploit the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements result in a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including those mentioned above, and make previously impracticable analyses possible.
AVAILABILITY: Software available upon request.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: yss@eecs.berkeley.edu.
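
For orientation, the kind of baseline HMM computation the abstract refers to can be sketched as a forward pass over a copying model in the style of Li and Stephens. The sketch below is an illustration only and is not the blockwise algorithm of the paper: the function name `csd_forward`, the parameters `switch_prob` and `mut_prob`, and the 0/1 allele encoding are all assumptions made for the example. It relies on the uniform-switch structure of the transition matrix to update all n hidden states in O(n) time per site instead of O(n^2).

```python
import numpy as np


def csd_forward(new_hap, panel, switch_prob, mut_prob):
    """Approximate P(new_hap | panel) under a simplified copying HMM.

    Hypothetical, simplified model (not the paper's method):
      - hidden state at each site = index of the panel haplotype being copied
      - with probability switch_prob the copied haplotype is re-drawn
        uniformly from the panel between adjacent sites
      - the emitted allele differs from the copied allele with
        probability mut_prob

    new_hap : sequence of 0/1 alleles, length L
    panel   : n x L array-like of 0/1 alleles
    """
    new_hap = np.asarray(new_hap)
    panel = np.asarray(panel)
    n, num_sites = panel.shape

    # emit[k, l] = P(observed allele at site l | copying haplotype k)
    emit = np.where(panel == new_hap[None, :], 1.0 - mut_prob, mut_prob)

    # Uniform prior over which haplotype is copied at the first site.
    f = emit[:, 0] / n

    for site in range(1, num_sites):
        # The transition matrix is (1 - s) * I + (s / n) * ones, so the
        # matrix-vector product reduces to a scaled copy of f plus a
        # shared term that depends only on f.sum().
        stay = (1.0 - switch_prob) * f
        jump = switch_prob * f.sum() / n
        f = emit[:, site] * (stay + jump)

    return f.sum()
```

A call such as `csd_forward([0, 1, 0], [[0, 1, 0], [1, 1, 0]], 0.1, 0.01)` returns the model probability of the new haplotype given a two-haplotype panel. For long sequences the forward values underflow, so a practical version would rescale `f` at each site and accumulate log-probabilities.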

Year: 2012 PMID: 22641715 PMCID: PMC3400961 DOI: 10.1093/bioinformatics/bts314

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

29 in total

1. Two-locus sampling distributions and their application.

Authors: R R Hudson
Journal: Genetics Date: 2001-12 Impact factor: 4.562

2. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data.

Authors: Na Li; Matthew Stephens
Journal: Genetics Date: 2003-12 Impact factor: 4.562

3. A linear complexity phasing method for thousands of genomes.

Authors: Olivier Delaneau; Jonathan Marchini; Jean-François Zagury
Journal: Nat Methods Date: 2011-12-04 Impact factor: 28.547

4. Estimating meiotic gene conversion rates from population genetic data.

Authors: J Gay; S Myers; G McVean
Journal: Genetics Date: 2007-07-29 Impact factor: 4.562

5. Effect of genetic divergence in identifying ancestral origin using HAPAA.

Authors: Andreas Sundquist; Eugene Fratkin; Chuong B Do; Serafim Batzoglou
Journal: Genome Res Date: 2008-03-18 Impact factor: 9.043

6. Importance sampling and the two-locus model with subdivided population structure.

Authors: Robert C Griffiths; Paul A Jenkins; Yun S Song
Journal: Adv Appl Probab Date: 2008-06-01 Impact factor: 0.690

7. The fine-scale structure of recombination rate variation in the human genome.

Authors: Gilean A T McVean; Simon R Myers; Sarah Hunt; Panos Deloukas; David R Bentley; Peter Donnelly
Journal: Science Date: 2004-04-23 Impact factor: 47.728

8. Inference of human population history from individual whole-genome sequences.

Authors: Heng Li; Richard Durbin
Journal: Nature Date: 2011-07-13 Impact factor: 49.962

9. Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data.

Authors: Junming Yin; Michael I Jordan; Yun S Song
Journal: Bioinformatics Date: 2009-06-15 Impact factor: 6.937

10. Fast "coalescent" simulation.

Authors: Paul Marjoram; Jeff D Wall
Journal: BMC Genet Date: 2006-03-15 Impact factor: 2.797

7 in total

1. Inference of complex population histories using whole-genome sequences from multiple populations.

Authors: Matthias Steinrücken; Jack Kamm; Jeffrey P Spence; Yun S Song
Journal: Proc Natl Acad Sci U S A Date: 2019-08-06 Impact factor: 11.205

2. Inference of population history using coalescent HMMs: review and outlook. (Review)

Authors: Jeffrey P Spence; Matthias Steinrücken; Jonathan Terhorst; Yun S Song
Journal: Curr Opin Genet Dev Date: 2018-07-26 Impact factor: 5.578

3. A sequentially Markov conditional sampling distribution for structured populations with migration and recombination.

Authors: Matthias Steinrücken; Joshua S Paul; Yun S Song
Journal: Theor Popul Biol Date: 2012-09-07 Impact factor: 1.570

4. Robust and scalable inference of population history from hundreds of unphased whole genomes.

Authors: Jonathan Terhorst; John A Kamm; Yun S Song
Journal: Nat Genet Date: 2016-12-26 Impact factor: 38.330

5. Privacy-preserving genotype imputation in a trusted execution environment.

Authors: Natnatee Dokmai; Can Kockan; Kaiyuan Zhu; XiaoFeng Wang; S Cenk Sahinalp; Hyunghoon Cho
Journal: Cell Syst Date: 2021-08-26 Impact factor: 11.091

6. Next-generation genotype imputation service and methods.

Authors: Sayantan Das; Lukas Forer; Sebastian Schönherr; Carlo Sidore; Adam E Locke; Alan Kwong; Scott I Vrieze; Emily Y Chew; Shawn Levy; Matt McGue; David Schlessinger; Dwight Stambolian; Po-Ru Loh; William G Iacono; Anand Swaroop; Laura J Scott; Francesco Cucca; Florian Kronenberg; Michael Boehnke; Gonçalo R Abecasis; Christian Fuchsberger
Journal: Nat Genet Date: 2016-08-29 Impact factor: 38.330

7. Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach.

Authors: Sara Sheehan; Kelley Harris; Yun S Song
Journal: Genetics Date: 2013-04-22 Impact factor: 4.562
