Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Wen-Yun Yang, Farhad Hormozdiari, Zhanyong Wang, Dan He, Bogdan Pasaniuc, Eleazar Eskin.

Abstract

MOTIVATION: Haplotypes, defined as the sequence of alleles on one chromosome, are crucial for many genetic analyses. Because experimental determination of haplotypes is extremely expensive, haplotypes are traditionally inferred computationally from genotype data, i.e. the mixture of the genetic information from both haplotypes. The best-performing approaches for haplotype inference rely on Hidden Markov Models, with the underlying assumption that the haplotypes of a given individual can be represented as a mosaic of segments from other haplotypes in the same population. Such algorithms use this model to predict the most likely haplotypes that explain the observed genotype data, conditional on a reference panel of haplotypes. With rapid advances in short read sequencing technologies, sequencing is quickly establishing itself as a powerful approach for collecting genetic variation information. As opposed to traditional genotyping-array technologies, which independently call genotypes at polymorphic sites, short read sequencing often collects haplotypic information: a read spanning more than one polymorphic locus (a multi-single nucleotide polymorphic read) contains information on the haplotype from which the read originates. However, this information is generally ignored in existing approaches for haplotype phasing and genotype calling from short read data.
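As a toy illustration of why a read spanning two polymorphic sites carries phase information that genotypes alone do not, consider the following minimal sketch. It assumes biallelic SNPs coded 0/1 and error-free reads; the function names `phasings` and `consistent_with_read` are hypothetical and are not taken from the authors' software.

```python
from itertools import product

def phasings(genotype):
    """Enumerate unordered haplotype pairs whose allele sums equal the genotype."""
    pairs = set()
    for h1 in product((0, 1), repeat=len(genotype)):
        # The second haplotype is forced by the genotype once the first is chosen.
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        if all(b in (0, 1) for b in h2):
            pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def consistent_with_read(pair, read):
    """A read maps SNP index -> observed allele (assumed error-free here).
    It is consistent with a pair if it matches at least one haplotype."""
    return any(all(hap[i] == a for i, a in read.items()) for hap in pair)

# Two heterozygous SNPs: the genotype alone leaves two possible phasings.
genotype = (1, 1)
candidates = phasings(genotype)

# A read covering only one SNP cannot tell the phasings apart ...
single_snp_read = {0: 0}
still_ambiguous = [p for p in candidates if consistent_with_read(p, single_snp_read)]

# ... but a read covering both SNPs picks out a single phasing.
multi_snp_read = {0: 0, 1: 1}
resolved = [p for p in candidates if consistent_with_read(p, multi_snp_read)]
```

In this toy case the genotype is compatible with two haplotype pairs, the single-SNP read rules out neither, and the two-SNP read leaves exactly one. Real reads contain sequencing errors, so a practical method would weigh reads probabilistically rather than filter on exact matches.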
RESULTS: In this article, we propose a novel framework for haplotype inference from short read sequencing that leverages multi-single nucleotide polymorphic reads together with a reference panel of haplotypes. The basis of our approach is a new probabilistic model that finds the most likely haplotype segments from the reference panel to explain the short read sequencing data for a given individual. We devised an efficient sampling method within this probabilistic model to achieve better performance than existing methods. Using simulated sequencing reads from real individual genotypes from the HapMap and 1000 Genomes projects, we show that our method is highly accurate and computationally efficient. Our haplotype predictions improve accuracy over the basic haplotype copying model by ∼20% with comparable computational time, and over another recently proposed approach, Hap-SeqX, by ∼10% with significantly reduced computational time and memory usage.
AVAILABILITY: Software is publicly available at http://genetics.cs.ucla.edu/harsh
CONTACT: bpasaniuc@mednet.ucla.edu or eeskin@cs.ucla.edu.
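The idea of choosing reference haplotypes that best explain an individual's reads can be sketched as follows. This is a deliberately simplified illustration and not the authors' model: it assumes each read originates from either haplotype with probability 1/2, a fixed per-allele error rate, and whole reference haplotypes with no mosaic switching, and it uses exhaustive search in place of the sampling method described in the abstract. The names `read_log_likelihood` and `best_reference_pair`, the panel, and the reads are all hypothetical.

```python
import math
from itertools import combinations_with_replacement

def read_log_likelihood(read, hap_pair, error_rate=0.01):
    """log P(read | haplotype pair), with the read drawn from either
    haplotype with probability 1/2 and independent allele errors."""
    total = 0.0
    for hap in hap_pair:
        p = 0.5
        for pos, allele in read.items():
            p *= (1 - error_rate) if hap[pos] == allele else error_rate
        total += p
    return math.log(total)

def best_reference_pair(reads, panel, error_rate=0.01):
    """Score every unordered pair of panel haplotypes against all reads
    and return the pair with the highest total log-likelihood."""
    def score(pair):
        return sum(read_log_likelihood(r, pair, error_rate) for r in reads)
    return max(combinations_with_replacement(panel, 2), key=score)

# Toy reference panel over four SNPs (alleles coded 0/1).
panel = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 0, 0)]

# Toy reads, each covering two SNPs: SNP index -> observed allele.
reads = [{0: 0, 1: 1}, {1: 1, 2: 0}, {2: 1, 3: 0}, {0: 1, 1: 0}, {2: 0, 3: 1}]

best = best_reference_pair(reads, panel)
```

Exhaustive search grows quadratically with panel size even before recombination between reference haplotypes is allowed, which is one reason a practical method would rely on a Hidden Markov Model formulation and sampling instead.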

Year:  2013        PMID: 23825370      PMCID: PMC3753566          DOI: 10.1093/bioinformatics/btt386

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


