
deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph.

Hongzhe Guo, Yilei Fu, Yan Gao, Junyi Li, Yadong Wang, Bo Liu.   

Abstract

The de Bruijn graph, a fundamental data structure for representing and organizing genome sequences, plays an important role in many kinds of sequence analysis tasks. With the rapid growth of high-throughput sequencing (HTS) data and the ever-increasing number of assembled genomes, there is a strong demand to construct very large de Bruijn graphs from sequences totaling up to the terabase-pair scale. Current approaches may have prohibitive memory footprints for graphs of this size. We propose a lightweight parallel de Bruijn graph construction approach: de Bruijn Graph Constructor in Scalable Memory (deGSM). The main idea of deGSM is to efficiently construct the Burrows-Wheeler Transform (BWT) of the unipaths of the de Bruijn graph in constant RAM space and then transform the BWT back into the original unitigs. The experimental results demonstrate that, using only a commonly available machine, deGSM can handle very large genome sequence sets, e.g., the contigs (305 Gbp) and scaffolds (1.1 Tbp) recorded in the GenBank database and a Picea abies HTS dataset (9.7 Tbp). Moreover, deGSM achieves construction speed faster than or comparable to state-of-the-art approaches. With its high scalability and efficiency, deGSM has great potential for many large-scale genomics studies. deGSM is publicly available at: https://github.com/hitbc/deGSM.
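To make the abstract's central idea (unitigs of a de Bruijn graph, their BWT, and recovery of the unitigs from that BWT) concrete, here is a minimal Python sketch on toy data. It assumes a node-centric de Bruijn graph over forward-strand k-mers only (reverse complements are ignored), compacts it naively in memory, then builds a single-text BWT of the unitigs joined by a hypothetical `#` separator and inverts it by LF-mapping. All function names are hypothetical, and the naive in-memory suffix sorting is the opposite of deGSM's constant-memory construction: this only illustrates the BWT-to-unitig round trip, not the authors' algorithm.

```python
from collections import Counter, defaultdict


def kmers(seqs, k):
    """Collect the distinct k-mers of the input sequences (forward strand only)."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def unitigs(kmer_set):
    """Naively compact a node-centric de Bruijn graph into unitigs (maximal non-branching paths)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for km in kmer_set:
        for c in "ACGT":
            nxt = km[1:] + c  # node-centric: any (k-1)-overlap is an edge
            if nxt in kmer_set:
                succ[km].append(nxt)
                pred[nxt].append(km)

    def extends(a):
        # a's path continues into its successor only if both sides are non-branching
        return len(succ[a]) == 1 and len(pred[succ[a][0]]) == 1

    visited, result = set(), []

    def walk(start):
        visited.add(start)
        path, cur = start, start
        while extends(cur) and succ[cur][0] not in visited:
            cur = succ[cur][0]
            visited.add(cur)
            path += cur[-1]  # consecutive k-mers overlap by k-1 characters
        result.append(path)

    for km in sorted(kmer_set):
        # km starts a unitig unless its unique predecessor extends into it
        if not (len(pred[km]) == 1 and extends(pred[km][0])):
            walk(km)
    for km in sorted(kmer_set):
        if km not in visited:  # leftovers lie on isolated cycles
            walk(km)
    return sorted(result)


def bwt_of_collection(strings):
    """BWT of the strings joined by '#' and terminated by '$' (naive suffix sorting)."""
    text = "#".join(strings) + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # text[-1] == '$' for the suffix at 0


def invert_bwt(bwt):
    """Invert the BWT by LF-mapping, then split the text back into the original strings."""
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):  # C[c] = number of characters smaller than c
        C[c] = total
        total += counts[c]
    seen, rank = defaultdict(int), []
    for c in bwt:  # rank[i] = occurrences of bwt[i] in bwt[:i]
        rank.append(seen[c])
        seen[c] += 1
    row = C["$"]  # row whose suffix is "$", i.e. the end of the text
    chars = []
    for _ in range(len(bwt) - 1):
        c = bwt[row]  # character preceding the current suffix
        chars.append(c)
        row = C[c] + rank[row]  # LF-mapping: step one position back in the text
    return "".join(reversed(chars)).split("#")
```

For example, `unitigs(kmers(["ACGTA", "ACGTC"], 3))` gives `["ACGT", "GTA", "GTC"]`, because the k-mer `CGT` branches into `GTA` and `GTC`; passing that list through `bwt_of_collection` and then `invert_bwt` returns the same three unitigs.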

Year:  2021        PMID: 31056509     DOI: 10.1109/TCBB.2019.2913932

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  Cited by (8 in total):

1.  Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2.

Authors:  Jamshed Khan; Marek Kokot; Sebastian Deorowicz; Rob Patro
Journal:  Genome Biol       Date:  2022-09-08       Impact factor: 17.906

2.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

3.  Representation of k-Mer Sets Using Spectrum-Preserving String Sets.

Authors:  Amatur Rahman; Paul Medvedev
Journal:  J Comput Biol       Date:  2020-12-07       Impact factor: 1.479

4.  Athena: Automated Tuning of k-mer based Genomic Error Correction Algorithms using Language Models.

Authors:  Mustafa Abdallah; Ashraf Mahgoub; Hany Ahmed; Somali Chaterji
Journal:  Sci Rep       Date:  2019-11-06       Impact factor: 4.379

5.  Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs.

Authors:  Guillaume Holley; Páll Melsted
Journal:  Genome Biol       Date:  2020-09-17       Impact factor: 13.583

6.  Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Authors:  Michael Baym; Gregory Kucherov; Karel Břinda
Journal:  Genome Biol       Date:  2021-04-06       Impact factor: 13.583

7.  Cuttlefish: fast, parallel and low-memory compaction of de Bruijn graphs from large-scale genome collections.

Authors:  Jamshed Khan; Rob Patro
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937

8.  Review: Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement.

Authors:  Aamir W Khan; Vanika Garg; Manish Roorkiwal; Agnieszka A Golicz; David Edwards; Rajeev K Varshney
Journal:  Trends Plant Sci       Date:  2019-11-29       Impact factor: 18.313