AllSome Sequence Bloom Trees.

Chen Sun¹, Robert S Harris², Rayan Chikhi³, Paul Medvedev¹,⁴,⁵.

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%-85%, at the cost of up to 3× the memory consumption during queries. Notably, it can process a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.
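As a rough illustration of the kind of search the abstract describes, the following Python sketch builds a tiny tree of Bloom filters over a few made-up sequences and queries it with a threshold on the fraction of matching k-mers. It is not the authors' implementation. The names `Node`, `query`, `M`, the single hash function, the k-mer length of 5, and the toy sequences are all assumptions chosen for brevity. The "all" shortcut (stop descending once enough query k-mers are present in every leaf below a node) is only a simplified take on the all/some split that gives the structure its name.

```python
import hashlib

M = 1 << 12  # Bloom filter size in bits (illustrative assumption)

def kmers(seq, k=5):
    """Return the set of k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def bit(kmer):
    """Map a k-mer to one bit position (a single hash function, for brevity)."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return 1 << (int.from_bytes(digest[:8], "big") % M)

def bloom(kmer_set):
    """Build a Bloom filter stored as an integer bitmask."""
    bits = 0
    for km in kmer_set:
        bits |= bit(km)
    return bits

class Node:
    """Tree node holding the union and the intersection of the leaf filters below it."""
    def __init__(self, name=None, seq=None, children=()):
        self.name, self.children = name, list(children)
        if seq is not None:                      # leaf: one experiment
            self.union = self.inter = bloom(kmers(seq))
        else:                                    # internal node
            self.union, self.inter = 0, (1 << M) - 1
            for c in self.children:
                self.union |= c.union            # bits set in SOME leaf below
                self.inter &= c.inter            # bits set in ALL leaves below

def leaves(node):
    """Names of all leaves in the subtree rooted at node."""
    if not node.children:
        return [node.name]
    return [n for c in node.children for n in leaves(c)]

def query(node, q_bits, theta):
    """Report leaves whose filter contains at least theta of the query k-mers."""
    need = theta * len(q_bits)
    if sum(1 for b in q_bits if node.inter & b) >= need:
        return leaves(node)                      # every leaf below already qualifies
    if sum(1 for b in q_bits if node.union & b) < need:
        return []                                # no leaf below can qualify: prune
    return [n for c in node.children for n in query(c, q_bits, theta)]

# Toy database of three "experiments" (made-up sequences).
s1 = Node("s1", seq="ACGTACGTTGCA")
s2 = Node("s2", seq="ACGTACGTGGGG")
s3 = Node("s3", seq="TTTTTTTTCCCC")
root = Node(children=[Node(children=[s1, s2]), s3])

q = [bit(km) for km in kmers("ACGTACGT")]
hits = query(root, q, theta=1.0)
print(hits)
```

Because Bloom filters have no false negatives, `s1` and `s2` (which both contain the query) are always reported; `s3` could only appear through a false positive, which is very unlikely at this filter size.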

Keywords: Bloom filters; RNA-seq; Sequence Bloom Trees; algorithms; bioinformatics; data structures

Year: 2018 PMID: 29620920 DOI: 10.1089/cmb.2017.0258

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

Cited

7 in total

1. Improved representation of sequence bloom trees.

Authors: Robert S Harris; Paul Medvedev
Journal: Bioinformatics Date: 2020-02-01 Impact factor: 6.937

2. SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors: Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal: Bioinformatics Date: 2022-05-18 Impact factor: 6.931

3. Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Authors: Brad Solomon; Carl Kingsford
Journal: J Comput Biol Date: 2018-03-12 Impact factor: 1.479

4. Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs.

Authors: Guillaume Holley; Páll Melsted
Journal: Genome Biol Date: 2020-09-17 Impact factor: 13.583

5. Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Authors: Michael Baym; Gregory Kucherov; Karel Břinda
Journal: Genome Biol Date: 2021-04-06 Impact factor: 13.583

6. Needle: A fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments.

Authors: Mitra Darvish; Enrico Seiler; Svenja Mehringer; René Rahn; Knut Reinert
Journal: Bioinformatics Date: 2022-07-08 Impact factor: 6.931

7. Shark: fishing relevant reads in an RNA-Seq sample.

Authors: Luca Denti; Yuri Pirola; Marco Previtali; Tamara Ceccato; Gianluca Della Vedova; Raffaella Rizzi; Paola Bonizzoni
Journal: Bioinformatics Date: 2021-05-01 Impact factor: 6.937
