
Succinct data structures for assembling large genomes.

Thomas C Conway, Andrew J Bromage.

Abstract

MOTIVATION: Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine, it also offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility of de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem and the in-practice memory requirements of tools.
RESULTS: In this article, we use entropy-compressed, or succinct, data structures to create a practical representation of the de Bruijn assembly graph, which requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better asymptotic scaling behaviour than conventional approaches. We present results of a proof-of-concept assembly of a human genome performed on a modest commodity server.
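
The idea of storing a de Bruijn graph as a set of edges queried by rank can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`encode`, `build_edge_set`, `rank`, `successors`) are hypothetical, and a sorted Python list with binary search stands in for the entropy-compressed bitmap, so it shows the query pattern but none of the space savings. It also assumes error-free reads over the alphabet ACGT and ignores reverse complements.

```python
from bisect import bisect_left

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}

def encode(kmer):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def decode(value, length):
    """Unpack an integer produced by encode() back into a DNA string."""
    chars = []
    for _ in range(length):
        chars.append(BASES[value & 3])
        value >>= 2
    return "".join(reversed(chars))

def build_edge_set(reads, k):
    """Return the sorted list of distinct encoded (k+1)-mers (graph edges).

    Assumption: a sorted list stands in for a sparse bitmap over all
    4**(k+1) possible edges; a set bit corresponds to a stored value.
    """
    edges = set()
    for read in reads:
        for i in range(len(read) - k):
            edges.add(encode(read[i:i + k + 1]))
    return sorted(edges)

def rank(edges, value):
    """Number of stored edges strictly smaller than value."""
    return bisect_left(edges, value)

def successors(edges, node, k):
    """List the k-mer nodes reachable from `node` by one edge.

    The four possible outgoing edges of a node occupy four consecutive
    codes, so two rank queries delimit them in the sorted edge list.
    """
    first = encode(node) << 2
    lo = rank(edges, first)
    hi = rank(edges, first + 4)
    mask = (1 << (2 * k)) - 1  # keeps the last k bases of an edge
    return [decode(e & mask, k) for e in edges[lo:hi]]

reads = ["ACGTAC", "CGTT"]
edges = build_edge_set(reads, 3)
print(successors(edges, "CGT", 3))
```

In this toy input the node `CGT` has two outgoing edges (`CGTA` and `CGTT`), which is the kind of branch that sequencing errors or genuine variation would introduce in a real graph.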

Year: 2011    PMID: 21245053    DOI: 10.1093/bioinformatics/btq697

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

