
Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads.

Anton Bankevich, Andrey V Bzikadze, Mikhail Kolmogorov, Dmitry Antipov, Pavel A Pevzner.

Abstract

Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.
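The abstract names two standard building blocks, a Bloom filter and a de Bruijn graph. As a rough illustration only, the Python sketch below uses a Bloom filter to discard k-mers seen just once in the reads. It then links the remaining "solid" k-mers into a de Bruijn graph whose nodes are (k-1)-mers. This is a textbook toy, not LJA's implementation. The class and function names (`BloomFilter`, `solid_kmers`, `de_bruijn_graph`), the SHA-256-based hashing, the "seen at least twice" solidity rule and the single-stranded treatment (no reverse complements) are all hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only, not LJA's)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def solid_kmers(reads, k, bloom):
    """Keep k-mers seen at least twice (hypothetical solidity rule).

    The first occurrence of a k-mer goes only into the Bloom filter. A later
    occurrence finds it there and promotes it to the exact 'solid' set.
    A Bloom false positive can occasionally admit a singleton k-mer.
    """
    solid = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in bloom:
                solid.add(kmer)
            else:
                bloom.add(kmer)
    return solid


def de_bruijn_graph(kmers):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "GGGCCC"]
    kmers = solid_kmers(reads, k=4, bloom=BloomFilter())
    print(sorted(kmers))
    print(de_bruijn_graph(kmers))
```

In this toy run, the k-mers of the duplicated read `ACGTAC` become solid and form a simple path `ACG -> CGT -> GTA -> TAC`. The k-mers of `GGGCCC`, seen only once, are filtered out. The Bloom filter keeps memory for first-seen k-mers bounded at the cost of rare false positives. This trade-off is why such filters are common in k-mer counting for large genomes.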

Year:  2022        PMID: 35228706     DOI: 10.1038/s41587-022-01220-6

Source DB:  PubMed          Journal:  Nat Biotechnol        ISSN: 1087-0156            Impact factor:   68.164


  2 in total

1.  The fragment assembly string graph.

Authors:  Eugene W Myers
Journal:  Bioinformatics       Date:  2005-09-01       Impact factor: 6.937

2.  TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes.

Authors:  Ilia Minkin; Son Pham; Paul Medvedev
Journal:  Bioinformatics       Date:  2017-12-15       Impact factor: 6.937

  3 in total

1.  Assembler artifacts include misassembly because of unsafe unitigs and underassembly because of bidirected graphs.

Authors:  Amatur Rahman; Paul Medvedev
Journal:  Genome Res       Date:  2022-07-27       Impact factor: 9.438

2.  MAECI: A pipeline for generating consensus sequence with nanopore sequencing long-read assembly and error correction.

Authors:  Jidong Lang
Journal:  PLoS One       Date:  2022-05-20       Impact factor: 3.240

3.  Robust data storage in DNA by de Bruijn graph-based de novo strand assembly.

Authors:  Lifu Song; Feng Geng; Zi-Yi Gong; Xin Chen; Jijun Tang; Chunye Gong; Libang Zhou; Rui Xia; Ming-Zhe Han; Jing-Yi Xu; Bing-Zhi Li; Ying-Jin Yuan
Journal:  Nat Commun       Date:  2022-09-12       Impact factor: 17.694
