
On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes.

Paola Bonizzoni (1), Riccardo Dondi (2), Gunnar W Klau (3,4), Yuri Pirola (1), Nadia Pisanti (4,5), Simone Zaccaria (1).

Abstract

In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have been shown to obtain accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability point of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the context of data clustering. Finally, since polyploid genomes, such as those of plants and fishes, contain more than two copies of each chromosome, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.
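The MEC objective described above can be made concrete with a small brute-force sketch in Python. It assumes a hypothetical encoding that is not taken from the paper: each fragment is a string over '0', '1' and '-' (a SNP position the fragment does not cover), and all fragments are padded to the same number of SNP positions. For a fixed assignment of fragments to k haplotypes, the cheapest haplotype for each group is the column-wise majority, so the sketch simply enumerates every assignment. With k=2 it solves diploid MEC; with k>2 it solves what we assume is the natural k-ploid generalization. The function names (`column_counts`, `group_cost`, `consensus`, `brute_force_mec`) are illustrative only. The search is exponential in the number of fragments and only illustrates the objective; it is not the approximation or fixed-parameter algorithms studied in the paper.

```python
from itertools import product

GAP = "-"  # assumed marker for a SNP position not covered by a fragment


def column_counts(fragments, j):
    """Count '0' and '1' alleles at column j among the fragments covering it."""
    zeros = sum(1 for f in fragments if f[j] == "0")
    ones = sum(1 for f in fragments if f[j] == "1")
    return zeros, ones


def group_cost(fragments, n_sites):
    """Corrections needed to make one group agree with a single haplotype:
    at every column, flip the minority allele among the covered entries."""
    return sum(min(column_counts(fragments, j)) for j in range(n_sites))


def consensus(fragments, n_sites):
    """Column-wise majority haplotype of a group. Ties are broken toward '0';
    columns that no fragment in the group covers are left as GAP."""
    hap = []
    for j in range(n_sites):
        zeros, ones = column_counts(fragments, j)
        if zeros == 0 and ones == 0:
            hap.append(GAP)
        else:
            hap.append("1" if ones > zeros else "0")
    return "".join(hap)


def brute_force_mec(fragments, k=2):
    """Exact MEC with k haplotypes by exhaustive search over all k**m
    assignments of the m fragments. Only usable on tiny inputs."""
    if not fragments:
        raise ValueError("need at least one fragment")
    n_sites = len(fragments[0])
    if any(len(f) != n_sites for f in fragments):
        raise ValueError("all fragments must be padded to the same length")
    best_cost, best_haps = None, None
    for assignment in product(range(k), repeat=len(fragments)):
        # Split the fragments into k groups according to this assignment.
        groups = [[f for f, a in zip(fragments, assignment) if a == h]
                  for h in range(k)]
        cost = sum(group_cost(g, n_sites) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_haps = [consensus(g, n_sites) for g in groups]
    return best_cost, best_haps


if __name__ == "__main__":
    reads = ["0011", "0011", "1100", "1100", "0111"]
    print(brute_force_mec(reads))  # (1, ['0011', '1100'])
```

In this toy input the fragment "0111" cannot be placed without conflict, so one base correction is the minimum; its optimal correction turns it into "0011".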

Keywords:  combinatorial optimization; graph theory; haplotypes; next-generation sequencing

Year:  2016        PMID: 27280382     DOI: 10.1089/cmb.2015.0220

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  9 in total

1.  Resolving multicopy duplications de novo using polyploid phasing.

Authors:  Mark J Chaisson; Sudipto Mukherjee; Sreeram Kannan; Evan E Eichler
Journal:  Res Comput Mol Biol       Date:  2017-04-12

2.  HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

Authors:  Stefano Beretta; Murray D Patterson; Simone Zaccaria; Gianluca Della Vedova; Paola Bonizzoni
Journal:  BMC Bioinformatics       Date:  2018-07-03       Impact factor: 3.169

3.  ComHapDet: a spatial community detection algorithm for haplotype assembly.

Authors:  Abishek Sankararaman; Haris Vikalo; François Baccelli
Journal:  BMC Genomics       Date:  2020-09-09       Impact factor: 3.969

4.  Karyon: a computational framework for the diagnosis of hybrids, aneuploids, and other nonstandard architectures in genome assemblies.

Authors:  Miguel A Naranjo-Ortiz; Manu Molina; Diego Fuentes; Verónica Mixão; Toni Gabaldón
Journal:  Gigascience       Date:  2022-10-07       Impact factor: 7.658

5.  Sparse Tensor Decomposition for Haplotype Assembly of Diploids and Polyploids.

Authors:  Abolfazl Hashemi; Banghua Zhu; Haris Vikalo
Journal:  BMC Genomics       Date:  2018-03-21       Impact factor: 3.969

6.  Variable-order reference-free variant discovery with the Burrows-Wheeler Transform.

Authors:  Nicola Prezza; Nadia Pisanti; Marinella Sciortino; Giovanna Rosone
Journal:  BMC Bioinformatics       Date:  2020-09-16       Impact factor: 3.169

7.  Minimum error correction-based haplotype assembly: Considerations for long read data.

Authors:  Sina Majidian; Mohammad Hossein Kahaei; Dick de Ridder
Journal:  PLoS One       Date:  2020-06-12       Impact factor: 3.240

8.  Unzipping haplotypes in diploid and polyploid genomes. (Review)

Authors:  Xingtan Zhang; Ruoxi Wu; Yibin Wang; Jiaxin Yu; Haibao Tang
Journal:  Comput Struct Biotechnol J       Date:  2019-12-09       Impact factor: 7.271

9.  flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning.

Authors:  Jim Shaw; Yun William Yu
Journal:  J Comput Biol       Date:  2022-01-17       Impact factor: 1.479
