Exact algorithms for haplotype assembly from whole-genome sequence data.

Zhi-Zhong Chen, Fei Deng, Lusheng Wang.

Abstract

MOTIVATION: Haplotypes play a crucial role in genetic analysis and have many applications, such as genetic disease diagnosis, association studies and ancestry inference. Advances in DNA sequencing technologies make it possible to obtain haplotypes from a set of aligned reads that originate from both copies of a chromosome of a single individual. This approach is known as haplotype assembly. Exact algorithms that give optimal solutions to the haplotype assembly problem are in high demand. Unfortunately, previous algorithms for this problem either fail to output optimal solutions or take too long, even when executed on a PC cluster.
RESULTS: We develop an approach to finding optimal solutions for the haplotype assembly problem under the minimum-error-correction (MEC) model. Most previous approaches assume that the columns in the input matrix correspond to (putative) heterozygous sites. This all-heterozygous assumption is correct for most columns, but it may be incorrect for a small number of columns. In this article, we consider the MEC model both with and without the all-heterozygous assumption. In our approach, we first use new methods to decompose the input read matrix into small independent blocks and then model the problem for each block as an integer linear programming problem, which is solved by an integer linear programming solver. We have tested our program on a single PC [a Linux (x64) desktop PC with an i7-3960X CPU], using the filtered HuRef and NA12878 datasets (after applying some variant calling methods). With the all-heterozygous assumption, our approach can optimally solve the whole HuRef dataset within a total time of 31 h (26 h for the most difficult block of chromosome 15 and only 5 h for the other blocks). To our knowledge, this is the first time that MEC optimal solutions have been completely obtained for the filtered HuRef dataset. Moreover, in the general case (without the all-heterozygous assumption), for the HuRef dataset our approach can optimally solve all the chromosomes except the most difficult block in chromosome 15 within a total time of 12 days. For both the HuRef and NA12878 datasets, the optimal costs in the general case are sometimes much smaller than those in the all-heterozygous case. This implies that some columns in the input matrix (after applying certain variant calling methods) still correspond to false-heterozygous sites.
AVAILABILITY: Our program and the optimal solutions found for the HuRef dataset are available at http://rnc.r.dendai.ac.jp/hapAssembly.html.
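
To make the MEC objective concrete, the following is a minimal illustrative sketch in Python. It is not the authors' program and does not use integer linear programming: it enumerates every haplotype pair by brute force, which is only feasible for a toy block with a handful of columns. The function names (`mec_cost`, `solve_mec`), the `'0'`/`'1'`/`'-'` read encoding and the example reads are all assumptions made for illustration.

```python
from itertools import product


def mec_cost(reads, h1, h2):
    """MEC cost of a haplotype pair: each read is charged the number of
    mismatches to its closer haplotype; '-' marks an uncovered site."""
    def dist(read, hap):
        return sum(1 for r, h in zip(read, hap) if r != '-' and r != h)
    return sum(min(dist(read, h1), dist(read, h2)) for read in reads)


def solve_mec(reads, all_heterozygous=True):
    """Brute-force optimum for one small block (hypothetical helper).

    With all_heterozygous=True the second haplotype is forced to be the
    complement of the first; otherwise both haplotypes are free.
    Returns (cost, h1, h2) for one optimal pair.
    """
    n = len(reads[0])
    best = None
    for h1 in product('01', repeat=n):
        if all_heterozygous:
            candidates = [tuple('1' if c == '0' else '0' for c in h1)]
        else:
            candidates = product('01', repeat=n)
        for h2 in candidates:
            cost = mec_cost(reads, h1, h2)
            if best is None or cost < best[0]:
                best = (cost, ''.join(h1), ''.join(h2))
    return best


# Toy block: the third column is 0 in every read, i.e. it looks homozygous.
reads = ["000", "000", "110", "110"]
cost_het = solve_mec(reads, all_heterozygous=True)[0]
cost_general = solve_mec(reads, all_heterozygous=False)[0]
```

On this toy block the all-heterozygous optimum costs 2, while the general optimum costs 0 (haplotypes 000 and 110), which mirrors the abstract's observation that a false-heterozygous column inflates the cost under the all-heterozygous assumption.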

Year:  2013        PMID: 23782612     DOI: 10.1093/bioinformatics/btt349

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  13 in total

1.  [Reconstruction of tumor clonal haplotypes based on an improved spanning algorithm].

Authors:  Yu Geng; Zhongmeng Zhao; Jianye Liu
Journal:  Nan Fang Yi Ke Da Xue Xue Bao       Date:  2019-11-30

2.  Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.

Authors:  Soyeon Ahn; Haris Vikalo
Journal:  BMC Bioinformatics       Date:  2015-07-16       Impact factor: 3.169

3.  SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming.

Authors:  Shreepriya Das; Haris Vikalo
Journal:  BMC Genomics       Date:  2015-04-03       Impact factor: 3.969

4.  Read-based phasing of related individuals.

Authors:  Shilpa Garg; Marcel Martin; Tobias Marschall
Journal:  Bioinformatics       Date:  2016-06-15       Impact factor: 6.937

5.  PWHATSHAP: efficient haplotyping for future generation sequencing.

Authors:  Andrea Bracciali; Marco Aldinucci; Murray Patterson; Tobias Marschall; Nadia Pisanti; Ivan Merelli; Massimo Torquati
Journal:  BMC Bioinformatics       Date:  2016-09-22       Impact factor: 3.169

6.  A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.

Authors:  Jingli Wu; Qian Zhang
Journal:  Algorithms Mol Biol       Date:  2018-06-01       Impact factor: 1.405

7.  HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

Authors:  Stefano Beretta; Murray D Patterson; Simone Zaccaria; Gianluca Della Vedova; Paola Bonizzoni
Journal:  BMC Bioinformatics       Date:  2018-07-03       Impact factor: 3.169

8.  ComHapDet: a spatial community detection algorithm for haplotype assembly.

Authors:  Abishek Sankararaman; Haris Vikalo; François Baccelli
Journal:  BMC Genomics       Date:  2020-09-09       Impact factor: 3.969

9.  Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.

Authors:  Abolfazl Hashemi; Banghua Zhu; Haris Vikalo
Journal:  BMC Genomics       Date:  2018-03-21       Impact factor: 3.969

10.  Efficient algorithms for polyploid haplotype phasing.

Authors:  Dan He; Subrata Saha; Richard Finkers; Laxmi Parida
Journal:  BMC Genomics       Date:  2018-05-09       Impact factor: 3.969
