AllSome Sequence Bloom Trees.

Chen Sun, Robert S Harris, Rayan Chikhi, Paul Medvedev.

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at the cost of up to 3× memory consumption during queries. Notably, it can process a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.
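
The abstract does not spell out how the tree answers a query, so the following is only a minimal sketch of the general "all/some" idea, under stated assumptions: each node records which k-mers occur in all samples below it and which occur in only some of them, and a query asks for samples containing at least a fraction `theta` of the query k-mers. Plain Python sets stand in for Bloom filters (so there are no false positives), and the names `Node`, `leaf`, `merge`, `query`, and `theta` are hypothetical, not taken from the authors' software.

```python
from dataclasses import dataclass, field


def kmers(seq, k):
    """Return the set of k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Node:
    name: str
    all_kmers: set    # k-mers present in every sample below this node
    some_kmers: set   # k-mers present in some, but not all, samples below
    children: list = field(default_factory=list)

    def leaves(self):
        """Names of all samples (leaves) in this subtree."""
        if not self.children:
            return [self.name]
        return [name for c in self.children for name in c.leaves()]


def leaf(name, seq, k):
    # A single sample: every k-mer it has is trivially present in "all".
    return Node(name, kmers(seq, k), set())


def merge(name, left, right):
    # Intersection and union over all samples in both subtrees.
    inter = left.all_kmers & right.all_kmers
    union = (left.all_kmers | left.some_kmers
             | right.all_kmers | right.some_kmers)
    return Node(name, inter, union - inter, [left, right])


def query(node, q, theta):
    """Return samples containing at least theta * len(q) of the k-mers in q."""
    need = theta * len(q)
    hits_all = len(q & node.all_kmers)
    hits_some = len(q & node.some_kmers)
    if hits_all >= need:
        # Enough query k-mers are shared by every sample below: accept all.
        return node.leaves()
    if hits_all + hits_some < need:
        # Even the union of the subtree cannot reach the threshold: prune.
        return []
    return [name for c in node.children for name in query(c, q, theta)]
```

For example, with three toy samples `"ACGTAC"`, `"ACGTTT"`, and `"GGGCCC"` and `k = 3`, merging the first two and then adding the third gives a tree in which a query for the k-mers of `"ACGT"` with `theta = 1.0` is resolved at the inner node, without visiting either of the two matching leaves individually. A real index would use fixed-size bit vectors with hashing, which introduces false positives that this sketch ignores.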

Keywords:  Bloom filters; RNA-seq; Sequence Bloom Trees; algorithms; bioinformatics; data structures

Year:  2018        PMID: 29620920     DOI: 10.1089/cmb.2017.0258

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  7 in total

1.  Improved representation of sequence bloom trees.

Authors:  Robert S Harris; Paul Medvedev
Journal:  Bioinformatics       Date:  2020-02-01       Impact factor: 6.937

2.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

3.  Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Authors:  Brad Solomon; Carl Kingsford
Journal:  J Comput Biol       Date:  2018-03-12       Impact factor: 1.479

4.  Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs.

Authors:  Guillaume Holley; Páll Melsted
Journal:  Genome Biol       Date:  2020-09-17       Impact factor: 13.583

5.  Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Authors:  Michael Baym; Gregory Kucherov; Karel Břinda
Journal:  Genome Biol       Date:  2021-04-06       Impact factor: 13.583

6.  Needle: A fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Authors:  Mitra Darvish; Enrico Seiler; Svenja Mehringer; René Rahn; Knut Reinert
Journal:  Bioinformatics       Date:  2022-07-08       Impact factor: 6.931

7.  Shark: fishing relevant reads in an RNA-Seq sample.

Authors:  Luca Denti; Yuri Pirola; Marco Previtali; Tamara Ceccato; Gianluca Della Vedova; Raffaella Rizzi; Paola Bonizzoni
Journal:  Bioinformatics       Date:  2021-05-01       Impact factor: 6.937
