Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.

David Pellow, Darya Filippova, Carl Kingsford.

Abstract

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low but nonzero false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3 to 1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.
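
The overlap idea in the abstract can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the class `BloomFilter` and the helpers `kmers`, `neighbors`, and `overlap_query` are hypothetical names, the hashing scheme (SHA-256 with double hashing) is chosen only for convenience, and the query rule shown (accept a k-mer only if at least one k-mer overlapping it by k-1 bases is also reported present) is one plausible reading of how overlap information could filter out isolated false positives. It assumes every stored sequence is longer than k, so that each true k-mer has at least one stored neighbor.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (hypothetical helper, double hashing)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(sequence, k):
    """All length-k substrings of sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def neighbors(kmer, alphabet="ACGT"):
    """The k-mers that overlap kmer by k-1 bases on the left or right."""
    left = [c + kmer[:-1] for c in alphabet]
    right = [kmer[1:] + c for c in alphabet]
    return left + right


def overlap_query(bf, kmer):
    """Report kmer as present only if it and at least one neighbor hit the filter."""
    if kmer not in bf:
        return False
    return any(n in bf for n in neighbors(kmer))


if __name__ == "__main__":
    read = "ACGTTGCAAGGCTA"
    bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
    for km in kmers(read, 5):
        bf.add(km)
    # Every k-mer of the read has a stored neighbor, so none is rejected.
    print(all(overlap_query(bf, km) for km in kmers(read, 5)))
```

In this sketch a positive answer costs up to eight extra filter lookups, so queries are slower than a plain membership test; the sketch makes no attempt to reproduce the FPR or timing figures reported in the abstract.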

Keywords:  Bloom filters; efficient data structures; genomics; k-mers; string algorithms

Year:  2016        PMID: 27828710      PMCID: PMC5467106          DOI: 10.1089/cmb.2016.0155

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  4 in total

1.  Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.

Authors:  David Pellow; Darya Filippova; Carl Kingsford
Journal:  J Comput Biol       Date:  2016-11-09       Impact factor: 1.479

2.  To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.

Authors:  R A Leo Elworth; Qi Wang; Pavan K Kota; C J Barberan; Benjamin Coleman; Advait Balaji; Gaurav Gupta; Richard G Baraniuk; Anshumali Shrivastava; Todd J Treangen
Journal:  Nucleic Acids Res       Date:  2020-06-04       Impact factor: 16.971

3.  An efficient strategy using k-mers to analyse 16S rRNA sequences.

Authors:  Marcel Martínez-Porchas; Francisco Vargas-Albores
Journal:  Heliyon       Date:  2017-07-27

4.  Sequence-specific minimizers via polar sets.

Authors:  Hongyu Zheng; Carl Kingsford; Guillaume Marçais
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937
