Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer.

Barış Ekim, Bonnie Berger, Rayan Chikhi.

Abstract

DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. Here, we define an algorithmic approach, mdBG, that makes use of minimizer-space de Bruijn graphs to enable long-read genome assembly. mdBG achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without compromising accuracy. A human genome is assembled in under 10 min using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 min using 1 GB RAM. In addition, we constructed a minimizer-space de Bruijn graph-based representation of 661,405 bacterial genomes, comprising 16 million nodes and 45 million edges, and successfully searched it for anti-microbial resistance (AMR) genes in 12 min. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics, and pangenomics. Code for constructing mdBGs is freely available for download at https://github.com/ekimb/rust-mdbg/.
Copyright © 2021 The Authors. Published by Elsevier Inc. All rights reserved.
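
The abstract names the technique but does not spell out its mechanics, so the following is only a minimal illustrative sketch of what a de Bruijn graph over minimizers might look like. It is not the rust-mdbg implementation. The function names (`select_minimizers`, `build_mdbg`), the parameters `k`, `l`, and `density`, and the choice of BLAKE2b as the hash are all assumptions made for illustration: each read is reduced to the ordered list of its l-mers whose hash falls below a density threshold, and the graph's nodes are runs of k consecutive selected minimizers, with an edge wherever two such runs follow each other in a read.

```python
import hashlib


def select_minimizers(seq, l=4, density=0.5):
    """Return, in order, the l-mers of seq whose 64-bit hash is below density * 2**64.

    Assumption: a hash-threshold selection scheme; the real tool may differ.
    """
    threshold = int(density * 2**64)
    picked = []
    for i in range(len(seq) - l + 1):
        lmer = seq[i:i + l]
        digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
        if int.from_bytes(digest, "big") < threshold:
            picked.append(lmer)
    return picked


def build_mdbg(reads, k=3, l=4, density=0.5):
    """Build a toy minimizer-space de Bruijn graph (hypothetical helper).

    Nodes are tuples of k consecutive minimizers; an edge links two nodes
    that are adjacent in a read, so they overlap by k-1 minimizers.
    """
    nodes, edges = set(), set()
    for read in reads:
        mins = select_minimizers(read, l, density)
        # Slide a window of k minimizers over the read's minimizer list.
        kmms = [tuple(mins[i:i + k]) for i in range(len(mins) - k + 1)]
        nodes.update(kmms)
        edges.update(zip(kmms, kmms[1:]))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_mdbg(["ACGTACGGTCA", "GTACGGTCATT"], k=2, l=4, density=1.0)
    print(len(nodes), "nodes,", len(edges), "edges")
```

Because the graph's alphabet is minimizers rather than single bases, each node stands for a much longer stretch of sequence than a base-space k-mer would, which is the intuition behind the reduced node count; a real assembler would also need to handle reverse complements and sequencing errors, both of which this sketch ignores.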

Keywords:  bacterial genomes; data structures; de Bruijn graphs; genome assembly; genome graphs; long-read sequencing; metagenomics; minimizers; pangenomics; partial order alignment

Year: 2021; PMID: 34525345; PMCID: PMC8562525; DOI: 10.1016/j.cels.2021.08.009

Source DB: PubMed; Journal: Cell Syst; ISSN: 2405-4712; Impact factor: 10.304

