
An MCMC algorithm for haplotype assembly from whole-genome sequence data.

Vikas Bansal, Aaron L Halpern, Nelson Axelrod, Vineet Bafna.

Abstract

In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational methods. Whole-genome sequence data represent a promising resource for constructing haplotypes spanning hundreds of kilobases for an individual. In this article, we propose a Markov chain Monte Carlo (MCMC) algorithm, HASH (haplotype assembly for single human), for assembling haplotypes from sequenced DNA fragments that have been mapped to a reference genome assembly. The transitions of the Markov chain are generated using min-cut computations on graphs derived from the sequenced fragments. We have applied our method to infer haplotypes using whole-genome shotgun sequence data from a recently sequenced human individual. The high sequence coverage and presence of mate pairs result in fairly long haplotypes (N50 length ~350 kb). Based on comparison of the sequenced fragments against the individual haplotypes, we demonstrate that the haplotypes for this individual inferred using HASH are significantly more accurate than the haplotypes estimated using a previously proposed greedy heuristic and a simple MCMC method. Using haplotypes from the HapMap project, we estimate the switch error rate of the haplotypes inferred using HASH to be quite low, ~1.1%. Our Markov chain Monte Carlo algorithm represents a general framework for haplotype assembly that can be applied to sequence data generated by other sequencing technologies. The code implementing the methods and the phased individual haplotypes can be downloaded from http://www.cse.ucsd.edu/users/vibansal/HASH/.
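The abstract reports two summary metrics, an N50 block length and a switch error rate. The following minimal Python sketch shows how such metrics are commonly computed; it is an illustration under stated assumptions, not the authors' code. The function names `n50` and `switch_error_rate` are hypothetical, and the encoding is an assumption: a haplotype is written as a string of `0`/`1` alleles at heterozygous sites, so that a haplotype and its complement describe the same phasing.

```python
from typing import Sequence


def n50(block_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of haplotype block lengths.

    N50 is the largest length L such that blocks of length >= L
    together cover at least half of the total length.
    """
    if not block_lengths:
        raise ValueError("need at least one block length")
    total = sum(block_lengths)
    running = 0
    # Walk blocks from longest to shortest until half the total is covered.
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def switch_error_rate(inferred: str, truth: str) -> float:
    """Fraction of adjacent site pairs whose relative phase differs from truth.

    Assumption: both strings list alleles ('0'/'1') of one haplotype at the
    same heterozygous sites, in genomic order.
    """
    if len(inferred) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if len(inferred) < 2:
        raise ValueError("need at least two sites to define a switch")
    # mismatch[i] is True where the inferred allele disagrees with the truth.
    mismatch = [a != b for a, b in zip(inferred, truth)]
    # A switch occurs wherever agreement flips between neighbouring sites,
    # so a fully complemented haplotype counts as zero switches.
    switches = sum(
        mismatch[i] != mismatch[i + 1] for i in range(len(mismatch) - 1)
    )
    return switches / (len(mismatch) - 1)
```

For instance, under these assumptions `n50([100, 200, 300, 400])` evaluates to `300`, and `switch_error_rate("00111", "00000")` evaluates to `0.25` (one switch among four adjacent pairs).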

Year:  2008        PMID: 18676820      PMCID: PMC2493424          DOI: 10.1101/gr.077065.108

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


  37 in total

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  A new statistical method for haplotype reconstruction from population data.

Authors:  M Stephens; N J Smith; P Donnelly
Journal:  Am J Hum Genet       Date:  2001-03-09       Impact factor: 11.025

3.  Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms.

Authors:  Tianhua Niu; Zhaohui S Qin; Xiping Xu; Jun S Liu
Journal:  Am J Hum Genet       Date:  2001-11-26       Impact factor: 11.025

4.  Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem.

Authors:  Ross Lippert; Russell Schwartz; Giuseppe Lancia; Sorin Istrail
Journal:  Brief Bioinform       Date:  2002-03       Impact factor: 11.622

5.  Haplotype inference in random population samples.

Authors:  Shin Lin; David J Cutler; Michael E Zwick; Aravinda Chakravarti
Journal:  Am J Hum Genet       Date:  2002-10-17       Impact factor: 11.025

6.  A comparison of bayesian methods for haplotype reconstruction from population genotype data.

Authors:  Matthew Stephens; Peter Donnelly
Journal:  Am J Hum Genet       Date:  2003-10-20       Impact factor: 11.025

7.  Haplotype reconstruction from SNP alignment.

Authors:  Lei M Li; Jong Hyun Kim; Michael S Waterman
Journal:  J Comput Biol       Date:  2004       Impact factor: 1.479

8.  Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures.

Authors:  Jonas Korlach; Patrick J Marks; Ronald L Cicero; Jeremy J Gray; Devon L Murphy; Daniel B Roitman; Thang T Pham; Geoff A Otto; Mathieu Foquet; Stephen W Turner
Journal:  Proc Natl Acad Sci U S A       Date:  2008-01-23       Impact factor: 11.205

Review 9.  The impact of next-generation sequencing technology on genetics.

Authors:  Elaine R Mardis
Journal:  Trends Genet       Date:  2008-02-11       Impact factor: 11.639

10.  The complete genome of an individual by massively parallel DNA sequencing.

Authors:  David A Wheeler; Maithreyan Srinivasan; Michael Egholm; Yufeng Shen; Lei Chen; Amy McGuire; Wen He; Yi-Ju Chen; Vinod Makhijani; G Thomas Roth; Xavier Gomes; Karrie Tartaro; Faheem Niazi; Cynthia L Turcotte; Gerard P Irzyk; James R Lupski; Craig Chinault; Xing-zhi Song; Yue Liu; Ye Yuan; Lynne Nazareth; Xiang Qin; Donna M Muzny; Marcel Margulies; George M Weinstock; Richard A Gibbs; Jonathan M Rothberg
Journal:  Nature       Date:  2008-04-17       Impact factor: 49.962

  49 in total

1.  HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data.

Authors:  Derek Aguiar; Sorin Istrail
Journal:  J Comput Biol       Date:  2012-06       Impact factor: 1.479

2.  Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Authors:  Wen-Yun Yang; Farhad Hormozdiari; Zhanyong Wang; Dan He; Bogdan Pasaniuc; Eleazar Eskin
Journal:  Bioinformatics       Date:  2013-07-03       Impact factor: 6.937

3.  Joint haplotype phasing and genotype calling of multiple individuals using haplotype informative reads.

Authors:  Kui Zhang; Degui Zhi
Journal:  Bioinformatics       Date:  2013-08-13       Impact factor: 6.937

Review 4.  Population genetic inference from genomic sequence variation.

Authors:  John E Pool; Ines Hellmann; Jeffrey D Jensen; Rasmus Nielsen
Journal:  Genome Res       Date:  2010-01-12       Impact factor: 9.043

5.  Personal genome sequencing: current approaches and challenges.

Authors:  Michael Snyder; Jiang Du; Mark Gerstein
Journal:  Genes Dev       Date:  2010-03-01       Impact factor: 11.361

6.  The next phase in human genetics.

Authors:  Vikas Bansal; Ryan Tewhey; Eric J Topol; Nicholas J Schork
Journal:  Nat Biotechnol       Date:  2011-01       Impact factor: 54.908

Review 7.  Haplotype-resolved genome sequencing: experimental methods and applications.

Authors:  Matthew W Snyder; Andrew Adey; Jacob O Kitzman; Jay Shendure
Journal:  Nat Rev Genet       Date:  2015-05-07       Impact factor: 53.242

8.  Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

Authors:  Jiang Du; Robert D Bjornson; Zhengdong D Zhang; Yong Kong; Michael Snyder; Mark B Gerstein
Journal:  PLoS Comput Biol       Date:  2009-07-10       Impact factor: 4.475

9.  Optimal algorithms for haplotype assembly from whole-genome sequence data.

Authors:  Dan He; Arthur Choi; Knot Pipatsrisawat; Adnan Darwiche; Eleazar Eskin
Journal:  Bioinformatics       Date:  2010-06-15       Impact factor: 6.937

Review 10.  A comparison of several algorithms for the single individual SNP haplotyping reconstruction problem.

Authors:  Filippo Geraci
Journal:  Bioinformatics       Date:  2010-07-11       Impact factor: 6.937

