Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2.

Jamshed Khan, Marek Kokot, Sebastian Deorowicz, Rob Patro.

Abstract

The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant sits upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this construction often becomes a computational bottleneck. We present Cuttlefish 2, which significantly advances the state of the art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, totaling 2.58 Tbp, from 4.5 days to 17-23 h; and it constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, whereas the closest competitor requires 54-58 h and uses considerably more memory.
© 2022. The Author(s).

Keywords:  Compacted de Bruijn graph; Data structures; High-throughput sequencing; Path cover; Unitig; de Bruijn graph

Year: 2022    PMID: 36076275    PMCID: PMC9454175    DOI: 10.1186/s13059-022-02743-6

Source DB: PubMed    Journal: Genome Biol    ISSN: 1474-7596    Impact factor: 17.906

