Improved representation of sequence bloom trees.

Robert S Harris, Paul Medvedev.

Abstract

MOTIVATION: Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next-generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence.
RESULTS: We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. Compared to previous SBT methods, on real RNA-seq data, HowDe-SBT can construct the index in less than 36% of the time and with 39% less space and can answer small-batch queries at least five times faster. We also develop a theoretical framework in which we can analyze and bound the space and query performance of HowDe-SBT compared to other SBT methods.
AVAILABILITY AND IMPLEMENTATION: HowDe-SBT is available as a free open source program on https://github.com/medvedevgroup/HowDeSBT.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2020        PMID: 31504157      PMCID: PMC8215923          DOI: 10.1093/bioinformatics/btz662

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

2.  An Incrementally Updatable and Scalable System for Large-Scale Sequence Search using the Bentley-Saxe Transformation.

Authors:  Fatemeh Almodaresi; Jamshed Khan; Sergey Madaminov; Michael Ferdman; Rob Johnson; Prashant Pandey; Rob Patro
Journal:  Bioinformatics       Date:  2022-03-23       Impact factor: 6.931

3.  Data structures based on k-mers for querying large collections of sequencing data sets. (Review)

Authors:  Camille Marchet; Christina Boucher; Simon J Puglisi; Paul Medvedev; Mikaël Salson; Rayan Chikhi
Journal:  Genome Res       Date:  2020-12-16       Impact factor: 9.043

4.  Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Authors:  Michael Baym; Gregory Kucherov; Karel Břinda
Journal:  Genome Biol       Date:  2021-04-06       Impact factor: 13.583

5.  Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer.

Authors:  Grégoire Siekaniec; Emeline Roux; Téo Lemane; Eric Guédon; Jacques Nicolas
Journal:  Microb Genom       Date:  2021-11

6.  Effective sequence similarity detection with strobemers.

Authors:  Kristoffer Sahlin
Journal:  Genome Res       Date:  2021-10-19       Impact factor: 9.043

7.  Needle: A fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Authors:  Mitra Darvish; Enrico Seiler; Svenja Mehringer; René Rahn; Knut Reinert
Journal:  Bioinformatics       Date:  2022-07-08       Impact factor: 6.931

8.  Disk compression of k-mer sets.

Authors:  Amatur Rahman; Rayan Chikhi; Paul Medvedev
Journal:  Algorithms Mol Biol       Date:  2021-06-21       Impact factor: 1.405
