Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

Prashant Pandey¹, Fatemeh Almodaresi¹, Michael A Bender¹, Michael Ferdman¹, Rob Johnson², Rob Patro³.

Abstract

Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false-positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6-108× faster than SSBT and has no false-positives or -negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.
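The abstract names the ingredients (k-mers, color equivalence classes, an exact index) without showing how they fit together, so here is a minimal sketch of the general idea. It is not the Mantis implementation: a plain Python dict stands in for the counting quotient filter, reverse complements are ignored, and all function names and the `theta` threshold parameter are assumptions made for illustration. Each k-mer maps to the ID of a color class, meaning the exact set of experiments that contain it, so a lookup can never return a false positive.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(experiments, k):
    """Map each k-mer to the ID of its color class (hypothetical helper).

    A color class is the exact set of experiments containing a k-mer;
    k-mers that share the same set share one ID, which saves space.
    """
    kmer_to_experiments = defaultdict(set)
    for name, reads in experiments.items():
        for read in reads:
            for km in kmers(read, k):
                kmer_to_experiments[km].add(name)

    class_ids = {}       # frozenset of experiment names -> integer ID
    kmer_to_class = {}   # k-mer -> integer ID (a dict, not a real CQF)
    for km, names in kmer_to_experiments.items():
        key = frozenset(names)
        kmer_to_class[km] = class_ids.setdefault(key, len(class_ids))

    classes = [None] * len(class_ids)
    for key, cid in class_ids.items():
        classes[cid] = key
    return kmer_to_class, classes

def query(index, seq, k, theta=1.0):
    """Return experiments containing at least a fraction theta of seq's k-mers."""
    kmer_to_class, classes = index
    query_kmers = set(kmers(seq, k))
    hits = defaultdict(int)
    for km in query_kmers:
        cid = kmer_to_class.get(km)
        if cid is not None:
            for name in classes[cid]:
                hits[name] += 1
    needed = theta * len(query_kmers)
    return sorted(name for name, count in hits.items() if count >= needed)

# Toy data, invented for this example only.
experiments = {
    "exp1": ["ACGTAC"],
    "exp2": ["CGTACG"],
    "exp3": ["TTTTTT"],
}
index = build_index(experiments, k=4)
print(query(index, "ACGTAC", k=4))             # all 3 query k-mers required
print(query(index, "ACGTAC", k=4, theta=0.6))  # at least 60% of query k-mers
```

Because the lookup table stores exact k-mers rather than Bloom-filter bits, an experiment is reported only if it truly contains the required k-mers; the trade-off in this toy version is that a dict uses far more memory than a compact filter would.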

Entities: Chemical Species

Keywords: Bloom filter; Mantis; RNA sequencing; color equivalence classes; counting quotient filter; de Bruijn graph; experiment discovery; sequence Bloom tree; sequence search

Substances: RNA

Year: 2018 PMID: 29936185 DOI: 10.1016/j.cels.2018.05.021

Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304

Cited (28 in total)

1. Portable nanopore analytics: are we there yet?

Authors: Marco Oliva; Franco Milicchio; Kaden King; Grace Benson; Christina Boucher; Mattia Prosperi
Journal: Bioinformatics Date: 2020-08-15 Impact factor: 6.937

2. Improved representation of sequence bloom trees.

Authors: Robert S Harris; Paul Medvedev
Journal: Bioinformatics Date: 2020-02-01 Impact factor: 6.937

3. Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2.

Authors: Jamshed Khan; Marek Kokot; Sebastian Deorowicz; Rob Patro
Journal: Genome Biol Date: 2022-09-08 Impact factor: 17.906

4. Lossless indexing with counting de Bruijn graphs.

Authors: Mikhail Karasikov; Harun Mustafa; Gunnar Rätsch; André Kahles
Journal: Genome Res Date: 2022-05-24 Impact factor: 9.438

5. SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors: Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal: Bioinformatics Date: 2022-05-18 Impact factor: 6.931

6. An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.

Authors: Fatemeh Almodaresi; Prashant Pandey; Michael Ferdman; Rob Johnson; Rob Patro
Journal: J Comput Biol Date: 2020-03-16 Impact factor: 1.479

7. An Incrementally Updatable and Scalable System for Large-Scale Sequence Search using the Bentley-Saxe Transformation.

8. Succinct dynamic de Bruijn graphs.

9. Disk compression of k-mer sets.

10. Cuttlefish: fast, parallel and low-memory compaction of de Bruijn graphs from large-scale genome collections.