
Hybrid indexes for repetitive datasets.

H Ferrada, T Gagie, T Hirvola, S J Puglisi.

Abstract

Advances in DNA sequencing mean that databases of thousands of human genomes will soon be commonplace. In this paper, we introduce a simple technique for reducing the size of conventional indexes on such highly repetitive texts. Given upper bounds on pattern lengths and edit distances, we pre-process the text with the lossless data compression algorithm LZ77 to obtain a filtered text, for which we store a conventional index. Later, given a query, we find all matches in the filtered text, then use their positions and the structure of the LZ77 parse to find all matches in the original text. Our experiments show that this also significantly reduces query times.
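
The filtering idea described in the abstract can be sketched in Python. This is a minimal illustration under several assumptions that do not come from the paper: it handles exact matching only (edit distance K = 0), it uses a naive greedy LZ77 parser, it treats every character within max_len - 1 positions of a phrase start as part of the filtered text, and it scans the filtered text with `str.find` where the paper stores a conventional index. The names `lz77_parse`, `build_filtered`, `locate` and `SEP` are hypothetical.

```python
import bisect

SEP = "\x00"  # assumed separator: must not occur in the text or in any pattern


def lz77_parse(text):
    """Greedy LZ77 parse (naive and slow; for illustration only).

    Returns (start, length, source) phrases. source is None for a literal
    character, otherwise the earlier start position being copied; the source
    may overlap the phrase itself, as in standard LZ77.
    """
    phrases, p, n = [], 0, len(text)
    while p < n:
        best_len, best_src = 0, None
        for q in range(p):
            run = 0
            while p + run < n and text[q + run] == text[p + run]:
                run += 1
            if run > best_len:
                best_len, best_src = run, q
        if best_len == 0:
            phrases.append((p, 1, None))
            p += 1
        else:
            phrases.append((p, best_len, best_src))
            p += best_len
    return phrases


def build_filtered(text, phrases, max_len):
    """Keep every character within max_len - 1 positions of a phrase start.

    Returns the filtered text (kept segments joined by SEP) and a list of
    (filtered_start, original_start, length) triples for mapping back.
    """
    merged = []
    for start, _, _ in phrases:  # phrase starts are increasing
        lo, hi = max(0, start - (max_len - 1)), min(len(text), start + max_len)
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    pieces, offsets, pos = [], [], 0
    for lo, hi in merged:
        offsets.append((pos, lo, hi - lo))
        pieces.append(text[lo:hi])
        pos += hi - lo + 1  # +1 for the separator that follows
    return SEP.join(pieces), offsets


def locate(phrases, filtered, offsets, pattern):
    """All start positions of pattern in the original text (exact matching).

    pattern must be non-empty, at most max_len long, and free of SEP.
    """
    m = len(pattern)
    fstarts = [f for f, _, _ in offsets]
    found = set()
    # Step 1: scan the filtered text (stand-in for a conventional index).
    i = filtered.find(pattern)
    while i != -1:
        k = bisect.bisect_right(fstarts, i) - 1
        fs, orig, _ = offsets[k]
        found.add(orig + (i - fs))
        i = filtered.find(pattern, i + 1)
    # Step 2: propagate matches through the LZ77 copies until nothing changes.
    copies = [(s, ln, src) for s, ln, src in phrases if src is not None]
    stack = list(found)
    while stack:
        occ = stack.pop()
        for start, length, src in copies:
            if src <= occ and occ + m <= src + length:
                new = start + (occ - src)
                if new not in found:
                    found.add(new)
                    stack.append(new)
    return sorted(found)
```

The design rests on a simple observation: a match that is not contained in a single copied phrase must contain some phrase start, so it survives filtering, while any match inside a copied phrase repeats an earlier match and is recovered in step 2. In practice the filtered text would be indexed (for example with a compressed full-text index) rather than scanned.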

Keywords:  LZ77; approximate pattern matching; indexing

Year:  2014        PMID: 24751871     DOI: 10.1098/rsta.2013.0137

Source DB:  PubMed          Journal:  Philos Trans A Math Phys Eng Sci        ISSN: 1364-503X            Impact factor:   4.226


  7 in total

1.  Searching and Indexing Genomic Databases via Kernelization. (Review)

Authors:  Travis Gagie; Simon J Puglisi
Journal:  Front Bioeng Biotechnol       Date:  2015-02-09

2.  Linear time minimum segmentation enables scalable founder reconstruction.

Authors:  Tuukka Norri; Bastien Cazaux; Dmitry Kosolobov; Veli Mäkinen
Journal:  Algorithms Mol Biol       Date:  2019-05-17       Impact factor: 1.405

3.  Indexes of large genome collections on a PC.

Authors:  Agnieszka Danek; Sebastian Deorowicz; Szymon Grabowski
Journal:  PLoS One       Date:  2014-10-07       Impact factor: 3.240

4.  Towards pan-genome read alignment to improve variation calling.

Authors:  Daniel Valenzuela; Tuukka Norri; Niko Välimäki; Esa Pitkänen; Veli Mäkinen
Journal:  BMC Genomics       Date:  2018-05-09       Impact factor: 3.969

5.  Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.

Authors:  Alan Kuhnle; Taher Mun; Christina Boucher; Travis Gagie; Ben Langmead; Giovanni Manzini
Journal:  J Comput Biol       Date:  2020-03-16       Impact factor: 1.479

6.  Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment.

Authors:  Altti Ilari Maarala; Ossi Arasalo; Daniel Valenzuela; Veli Mäkinen; Keijo Heljanko
Journal:  PLoS One       Date:  2021-08-03       Impact factor: 3.240

7.  Founder Reconstruction Enables Scalable and Seamless Pangenomic Analysis.

Authors:  Tuukka Norri; Bastien Cazaux; Saska Dönges; Daniel Valenzuela; Veli Mäkinen
Journal:  Bioinformatics       Date:  2021-07-14       Impact factor: 6.937
