Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.

Prashant Pandey, Fatemeh Almodaresi, Michael A Bender, Michael Ferdman, Rob Johnson, Rob Patro.

Abstract

Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, suboptimal space usage, and potentially large numbers of false positives. This paper introduces Mantis, a space-efficient system that uses new data structures to index thousands of raw-read experiments and facilitates large-scale sequence searches. In our evaluation, index construction with Mantis is 6× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6× to 108× faster than SSBT and has no false positives or false negatives. For example, Mantis was able to search for all 200,400 known human transcripts in an index of 2,652 RNA sequencing experiments in 82 min; SSBT took close to 4 days.
Copyright © 2018 Elsevier Inc. All rights reserved.
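
The abstract does not spell out the index layout, but the keywords (counting quotient filter, color equivalence classes) suggest an exact map from each k-mer to the set of experiments that contain it. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a plain Python dict in place of the counting quotient filter, ignores canonical (reverse-complement) k-mers, and uses hypothetical names throughout (`build_index`, `query`, the `theta` threshold, and the toy experiment data).

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(experiments, k):
    """Build an exact k-mer index (illustrative only).

    experiments: dict mapping an experiment name to a list of reads.
    Returns (kmer_to_class, classes): kmer_to_class maps a k-mer to a
    color-class ID, and classes[ID] is the frozenset of experiments
    sharing that k-mer. K-mers seen in the same set of experiments
    share one class, so each distinct set is stored only once.
    """
    kmer_to_exps = defaultdict(set)
    for name, reads in experiments.items():
        for read in reads:
            for km in kmers(read, k):
                kmer_to_exps[km].add(name)

    class_ids = {}      # frozenset of experiments -> class ID
    classes = []        # class ID -> frozenset of experiments
    kmer_to_class = {}  # a dict stands in for the counting quotient filter
    for km, exps in kmer_to_exps.items():
        key = frozenset(exps)
        if key not in class_ids:
            class_ids[key] = len(classes)
            classes.append(key)
        kmer_to_class[km] = class_ids[key]
    return kmer_to_class, classes


def query(seq, kmer_to_class, classes, k, theta=1.0):
    """Return experiments holding at least a fraction theta of the
    query's distinct k-mers. Lookups are exact, so this toy index
    produces no false positives or false negatives at the k-mer level."""
    query_kmers = set(kmers(seq, k))
    if not query_kmers:
        return set()
    hits = defaultdict(int)
    for km in query_kmers:
        cid = kmer_to_class.get(km)
        if cid is not None:
            for exp in classes[cid]:
                hits[exp] += 1
    return {e for e, n in hits.items() if n >= theta * len(query_kmers)}


# Toy data (hypothetical): three "experiments" with one short read each.
toy = {"exp1": ["ACGTAC"], "exp2": ["ACGTTT"], "exp3": ["GGGGGG"]}
kmer_to_class, classes = build_index(toy, k=3)
matches = query("ACGTA", kmer_to_class, classes, k=3, theta=1.0)
```

Storing each distinct experiment set once and pointing k-mers at its ID is what keeps such an index compact when many k-mers co-occur in the same experiments; an exact map, unlike a Bloom filter, cannot report a k-mer that was never inserted.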

Keywords:  Bloom filter; Mantis; RNA sequencing; color equivalence classes; counting quotient filter; de Bruijn graph; experiment discovery; sequence Bloom tree; sequence search

Year: 2018
PMID: 29936185
DOI: 10.1016/j.cels.2018.05.021
Source DB: PubMed
Journal: Cell Syst
ISSN: 2405-4712
Impact factor: 10.304

