Fast de Bruijn Graph Compaction in Distributed Memory Environments.

Tony Pan, Rahul Nihalani, Srinivas Aluru.   

Abstract

De Bruijn graph-based genome assembly has gained popularity as short-read sequencers have become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs are used as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain compaction, by which unitigs are generated, remains a critical computational task. In this paper, we present a distributed-memory parallel algorithm for simultaneous compaction of all chains in bi-directed de Bruijn graphs. The key advantages of our algorithm are that it bounds the chain compaction run-time to a number of iterations logarithmic in the length of the longest chain, and that it can differentiate cycles from chains within a number of iterations logarithmic in the length of the longest cycle. Our algorithm scales to thousands of computational cores, and can compact a whole-genome de Bruijn graph from a human sequence read set in 7.3 seconds using 7680 distributed-memory cores, and in 12.9 minutes using 64 shared-memory cores. It is 3.7× and 2.0× faster than equivalent steps in the state-of-the-art tools for distributed- and shared-memory environments, respectively. An implementation of the algorithm is available at https://github.com/ParBLiSS/bruno.
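The logarithmic iteration bound mentioned in the abstract is characteristic of pointer doubling (also called pointer jumping), a standard parallel list-ranking technique. The sketch below is a generic, single-process illustration of that idea and is not taken from the paper or from the bruno code base: it assumes a simplified directed graph in which every node has exactly one successor and a chain end points to itself, and it ignores k-mers, bi-directed edges, and distribution across processes. The function name `compact_chains` and the `succ` array encoding are hypothetical.

```python
def compact_chains(succ):
    """Pointer doubling over successor links (illustrative sketch only).

    succ[v] is the successor of node v; a chain end has succ[v] == v.
    Returns (end, dist, rounds): end[v] is the terminal node of v's chain,
    dist[v] is the number of edges from v to that terminal, and rounds is
    the number of doubling iterations that were needed.
    """
    n = len(succ)
    end = list(succ)
    dist = [0 if succ[v] == v else 1 for v in range(n)]
    rounds = 0
    # Continue until every pointer has landed on a chain end.
    while any(succ[e] != e for e in end):
        # A chain of n nodes resolves within about log2(n) rounds, so a
        # node that is still unresolved after that must lie on a cycle.
        if rounds > n.bit_length():
            raise ValueError("cycle detected: some node never reaches a chain end")
        # Both lists are built from the old values before assignment,
        # which mimics one synchronous parallel step over all nodes.
        end, dist = (
            [end[end[v]] for v in range(n)],
            [dist[v] + dist[end[v]] for v in range(n)],
        )
        rounds += 1
    return end, dist, rounds
```

For a single chain of eight nodes, `compact_chains([1, 2, 3, 4, 5, 6, 7, 7])` resolves every node to terminal 7 in three rounds, since each round doubles the distance a pointer spans. The cycle check here is bounded by the total node count rather than by the longest cycle, so it is weaker than the bound the abstract claims.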

Year: 2018    PMID: 30072337    DOI: 10.1109/TCBB.2018.2858797

Source DB: PubMed    Journal: IEEE/ACM Trans Comput Biol Bioinform    ISSN: 1545-5963    Impact factor: 3.710


  3 in total

1.  Representation of k-Mer Sets Using Spectrum-Preserving String Sets.

Authors:  Amatur Rahman; Paul Medvedev
Journal:  J Comput Biol       Date:  2020-12-07       Impact factor: 1.479

2.  Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Authors:  Michael Baym; Gregory Kucherov; Karel Břinda
Journal:  Genome Biol       Date:  2021-04-06       Impact factor: 13.583

3.  Cuttlefish: fast, parallel and low-memory compaction of de Bruijn graphs from large-scale genome collections.

Authors:  Jamshed Khan; Rob Patro
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937
