
DREAM-Yara: an exact read mapper for very large databases with short update time.

Temesgen Hailemariam Dadi, Enrico Siragusa, Vitor C Piro, Andreas Andrusch, Enrico Seiler, Bernhard Y Renard, Knut Reinert.

Abstract

Motivation: Mapping-based approaches have become limited in their application to very large sets of references since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of NGS reads to reference databases. For instance, in typical metagenomics analyses, the reference sequences have become too large to compute a single full-text index on standard machines. Even on large-memory machines, computing such an index takes about 1 day of computing time. As a result, indices are rarely updated. Hence, it is desirable to create an alternative way of indexing while preserving fast search times.
Results: To solve the index construction and update problem, we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is an approximate search distributor based on a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices, which can be easily rebuilt if parts are updated, while maintaining a fast search time. The second main contribution is DREAM-Yara, a distributed version of a fully sensitive read mapper implemented under the DREAM framework.
Availability and implementation: https://gitlab.com/pirovc/dream_yara/
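
The interleaved Bloom filter idea described in the Results can be illustrated with a minimal Python sketch. This is not the authors' implementation; the class name `InterleavedBloomFilter`, the helper `candidate_bins`, the salted BLAKE2b hashing, and all parameter values are assumptions chosen for illustration. The sketch stores one Bloom filter per database part ("bin") and interleaves them so that a single lookup returns one bit per bin. A read is forwarded only to bins that hold enough of its k-mers; the threshold used here is the standard k-mer lemma (a read of length m with at most e errors shares at least m - k + 1 - k*e k-mers with its match), which the abstract itself does not spell out.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (hypothetical names, illustration only)."""

    def __init__(self, num_bins, bits_per_bin, num_hashes=3):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # rows[i] is an int used as a bitmask: bit j is position i of bin j's filter.
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # One row index per hash function, derived from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bits_per_bin

    def add(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        # AND the rows of all hash positions: bit j stays set only if
        # bin j may contain the k-mer (false positives are possible).
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_bins(ibf, read, k, max_errors):
    """Return the bins a read could map to with at most max_errors errors."""
    threshold = len(read) - k + 1 - k * max_errors
    if threshold <= 0:
        # The k-mer lemma gives no guarantee here, so no bin can be excluded.
        return list(range(ibf.num_bins))
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        mask = ibf.query(kmer)
        for b in range(ibf.num_bins):
            counts[b] += (mask >> b) & 1
    return [b for b, c in enumerate(counts) if c >= threshold]


# Two made-up reference bins; the read is an exact substring of bin 0.
references = ["ACGTACGTGGCATTAGC", "TTTTGGGGCCCCAAAATTGG"]
ibf = InterleavedBloomFilter(num_bins=len(references), bits_per_bin=1024)
for bin_id, ref in enumerate(references):
    for kmer in kmers(ref, 4):
        ibf.add(kmer, bin_id)

print(candidate_bins(ibf, "ACGTACGTGG", k=4, max_errors=0))
```

Because Bloom filters have no false negatives, bin 0 is always reported for this read; other bins are reported only if enough false positives accumulate. Updating one part of the database then means rebuilding only that bin's index and its column of the filter, which is the property the framework relies on.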

Year:  2018        PMID: 30423080     DOI: 10.1093/bioinformatics/bty567

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  11 in total

1.  ReadBouncer: precise and scalable adaptive sampling for nanopore sequencing.

Authors:  Jens-Uwe Ulrich; Ahmad Lutfi; Kilian Rutzen; Bernhard Y Renard
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931

2.  SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors:  Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal:  Bioinformatics       Date:  2022-05-18       Impact factor: 6.931

3.  To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.

Authors:  R A Leo Elworth; Qi Wang; Pavan K Kota; C J Barberan; Benjamin Coleman; Advait Balaji; Gaurav Gupta; Richard G Baraniuk; Anshumali Shrivastava; Todd J Treangen
Journal:  Nucleic Acids Res       Date:  2020-06-04       Impact factor: 16.971

4.  Featherweight long read alignment using partitioned reference indexes.

Authors:  Hasindu Gamaarachchi; Sri Parameswaran; Martin A Smith
Journal:  Sci Rep       Date:  2019-03-13       Impact factor: 4.379

5.  Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation.

Authors:  Enrico Seiler; Kathrin Trappe; Bernhard Y Renard
Journal:  PLoS Comput Biol       Date:  2019-07-23       Impact factor: 4.475

6.  Data structures based on k-mers for querying large collections of sequencing data sets. (Review)

Authors:  Camille Marchet; Christina Boucher; Simon J Puglisi; Paul Medvedev; Mikaël Salson; Rayan Chikhi
Journal:  Genome Res       Date:  2020-12-16       Impact factor: 9.043

7.  Technology dictates algorithms: recent developments in read alignment. (Review)

Authors:  Mohammed Alser; Jeremy Rotman; Onur Mutlu; Serghei Mangul; Dhrithi Deshpande; Kodi Taraszka; Huwenbo Shi; Pelin Icer Baykal; Harry Taegyun Yang; Victor Xue; Sergey Knyazev; Benjamin D Singer; Brunilda Balliu; David Koslicki; Pavel Skums; Alex Zelikovsky; Can Alkan
Journal:  Genome Biol       Date:  2021-08-26       Impact factor: 13.583

8.  Disk compression of k-mer sets.

Authors:  Amatur Rahman; Rayan Chikhi; Paul Medvedev
Journal:  Algorithms Mol Biol       Date:  2021-06-21       Impact factor: 1.405

9.  ganon: precise metagenomics classification against large and up-to-date sets of reference sequences.

Authors:  Vitor C Piro; Temesgen H Dadi; Enrico Seiler; Knut Reinert; Bernhard Y Renard
Journal:  Bioinformatics       Date:  2020-07-01       Impact factor: 6.937

10.  Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences.

Authors:  Enrico Seiler; Svenja Mehringer; Mitra Darvish; Etienne Turc; Knut Reinert
Journal:  iScience       Date:  2021-06-24