Jamshed Khan, Marek Kokot, Sebastian Deorowicz, Rob Patro.
Abstract
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this construction step often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state of the art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, totaling 2.58 Tbp, from 4.5 days to 17-23 h. It also constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, while the closest competitor requires 54-58 h and uses considerably more memory.
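In the compacted de Bruijn graph, each maximal non-branching path of k-mers (a maximal unitig) is merged into a single string. The sketch below is a naive, in-memory illustration of that compaction on a toy k-mer set. It is not Cuttlefish 2's algorithm. It assumes single-stranded k-mers with no reverse-complement or canonical-k-mer handling, and it stores the k-mers in a plain Python set. The function names `kmers_of`, `maximal_unitigs`, `out_nbrs` and `in_nbrs` are hypothetical.

```python
from typing import List, Set


def kmers_of(seq: str, k: int) -> Set[str]:
    """Hypothetical helper: all k-mers of a sequence (no reverse complements)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def out_nbrs(kmer: str, kmers: Set[str]) -> List[str]:
    """Successors: k-mers overlapping `kmer` by k-1 characters on the right."""
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in kmers]


def in_nbrs(kmer: str, kmers: Set[str]) -> List[str]:
    """Predecessors: k-mers overlapping `kmer` by k-1 characters on the left."""
    return [c + kmer[:-1] for c in "ACGT" if c + kmer[:-1] in kmers]


def maximal_unitigs(kmers: Set[str]) -> List[str]:
    """Naive sketch: spell each maximal non-branching path of k-mers as one string."""
    visited: Set[str] = set()
    unitigs: List[str] = []

    def extend(start: str) -> str:
        # Walk forward while the edge cur -> nxt is the only one leaving cur
        # and the only one entering nxt.
        path = [start]
        visited.add(start)
        cur = start
        while True:
            succ = out_nbrs(cur, kmers)
            if len(succ) != 1:
                break
            nxt = succ[0]
            if len(in_nbrs(nxt, kmers)) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Consecutive k-mers overlap by k-1, so each adds one character.
        return path[0] + "".join(km[-1] for km in path[1:])

    # A k-mer starts a unitig unless it has a unique predecessor whose
    # unique successor is this k-mer.
    for km in sorted(kmers):
        preds = in_nbrs(km, kmers)
        is_start = not (len(preds) == 1 and len(out_nbrs(preds[0], kmers)) == 1)
        if is_start and km not in visited:
            unitigs.append(extend(km))
    # Any k-mers still unvisited lie on isolated cycles.
    for km in sorted(kmers):
        if km not in visited:
            unitigs.append(extend(km))
    return unitigs


# Two sequences sharing the prefix "ACGT": the k-mer CGT branches,
# so the graph compacts into three unitigs.
print(maximal_unitigs(kmers_of("ACGTA", 3) | kmers_of("ACGTC", 3)))  # ['ACGT', 'GTA', 'GTC']
```

Production tools avoid this explicit adjacency probing over a hash set. Instead, they work with canonical k-mers in a bidirected graph so that both DNA strands are handled, which this sketch omits.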
Keywords: Compacted de Bruijn graph; Data structures; High-throughput sequencing; Path cover; Unitig; de Bruijn graph
Year: 2022 PMID: 36076275 PMCID: PMC9454175 DOI: 10.1186/s13059-022-02743-6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906