Exact algorithms for haplotype assembly from whole-genome sequence data.

Zhi-Zhong Chen, Fei Deng, Lusheng Wang.

Abstract

MOTIVATION: Haplotypes play a crucial role in genetic analysis and have many applications, such as gene disease diagnosis, association studies and ancestry inference. Advances in DNA sequencing technologies make it possible to obtain haplotypes from a set of aligned reads originating from both copies of a chromosome of a single individual. This approach is often known as haplotype assembly. Exact algorithms that give optimal solutions to the haplotype assembly problem are in high demand. Unfortunately, previous algorithms for this problem either fail to output optimal solutions or take too long, even when executed on a PC cluster.
RESULTS: We develop an approach to finding optimal solutions for the haplotype assembly problem under the minimum-error-correction (MEC) model. Most previous approaches assume that the columns in the input matrix correspond to (putative) heterozygous sites. This all-heterozygous assumption is correct for most columns, but it may be incorrect for a small number of columns. In this article, we consider the MEC model both with and without the all-heterozygous assumption. In our approach, we first use new methods to decompose the input read matrix into small independent blocks and then model the problem for each block as an integer linear programming problem, which is solved by an integer linear programming solver. We have tested our program on a single PC [a Linux (x64) desktop PC with i7-3960X CPU], using the filtered HuRef and NA12878 datasets (after applying some variant calling methods). With the all-heterozygous assumption, our approach can optimally solve the whole HuRef dataset within a total time of 31 h (26 h for the most difficult block of chromosome 15 and only 5 h for the other blocks). To our knowledge, this is the first time that MEC optimal solutions have been obtained for the complete filtered HuRef dataset. Moreover, in the general case (without the all-heterozygous assumption), for the HuRef dataset our approach can optimally solve all the chromosomes, except for the most difficult block in chromosome 15, within a total time of 12 days. For both the HuRef and NA12878 datasets, the optimal costs in the general case are sometimes much smaller than those in the all-heterozygous case. This implies that some columns in the input matrix (after applying certain variant calling methods) still correspond to false-heterozygous sites.
AVAILABILITY: Our program and the optimal solutions found for the HuRef dataset are available at http://rnc.r.dendai.ac.jp/hapAssembly.html.
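To make the MEC objective concrete, the following is a minimal illustrative sketch in Python, not the authors' program. It does not use integer linear programming: it simply enumerates every haplotype pair for one tiny block, which is exact but exponential in the number of columns. The read encoding ('0'/'1' for alleles, '-' for uncovered sites), the function names and the toy read matrix are all assumptions made for this example.

```python
from itertools import product

def read_cost(read, hap):
    """Count covered sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != '-' and r != h)

def mec_cost(reads, h1, h2):
    """MEC cost: each read is charged against its closer haplotype."""
    return sum(min(read_cost(r, h1), read_cost(r, h2)) for r in reads)

def complement(hap):
    """Flip every allele; used under the all-heterozygous assumption."""
    return ''.join('1' if c == '0' else '0' for c in hap)

def solve_mec(reads, all_heterozygous=True):
    """Exhaustively find a minimum-cost haplotype pair for one small block.

    Returns (cost, h1, h2). Only practical for a handful of columns.
    """
    n = len(reads[0])
    haps = [''.join(bits) for bits in product('01', repeat=n)]
    best = None
    for h1 in haps:
        # All-heterozygous: h2 is forced to be the complement of h1.
        # General case: h2 may be any haplotype.
        candidates = [complement(h1)] if all_heterozygous else haps
        for h2 in candidates:
            cost = mec_cost(reads, h1, h2)
            if best is None or cost < best[0]:
                best = (cost, h1, h2)
    return best

# Hypothetical toy block: the third column is homozygous ('0' in every
# read that covers it), i.e. a false-heterozygous site.
reads = ["000", "00-", "110", "-10"]

het_cost, _, _ = solve_mec(reads, all_heterozygous=True)    # cost 1
gen_cost, g1, g2 = solve_mec(reads, all_heterozygous=False)  # cost 0
```

On this toy block the general case reaches cost 0 with haplotypes 000 and 110, while the all-heterozygous case cannot do better than cost 1, because forcing the third column to be heterozygous makes at least one read disagree there. This mirrors, at a very small scale, the abstract's observation that optimal costs in the general case can be smaller than in the all-heterozygous case.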

Year: 2013 PMID: 23782612 DOI: 10.1093/bioinformatics/btt349

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

13 in total

1. [Reconstruction of tumor clonal haplotypes based on an improved spanning algorithm].

2. Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm.

3. SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming.

4. Read-based phasing of related individuals.

5. PWHATSHAP: efficient haplotyping for future generation sequencing.

6. A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.

7. HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

8. ComHapDet: a spatial community detection algorithm for haplotype assembly.

9. Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.

10. Efficient algorithms for polyploid haplotype phasing.