Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.


David Pellow, Darya Filippova, Carl Kingsford.

Abstract

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. Because k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing fast set containment queries, at the cost of a small false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3 to 1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.
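The overlap idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `BloomFilter`, the helpers `kmers`, `neighbors`, and `overlap_query`, the double-hashing scheme, and the "at least one neighbor present" rule are all assumptions chosen for illustration. The sketch stores the k-mers of each read in an ordinary Bloom filter, then accepts a query k-mer only if it is in the filter and at least one k-mer overlapping it by k-1 bases is in the filter too.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one 128-bit digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def kmers(seq, k):
    """All length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_filter(reads, k, num_bits, num_hashes):
    """Insert every k-mer of every read into a fresh Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for km in kmers(read, k):
            bf.add(km)
    return bf


def neighbors(kmer, alphabet="ACGT"):
    """The 2 * len(alphabet) k-mers that overlap kmer by k-1 bases."""
    left = [c + kmer[:-1] for c in alphabet]
    right = [kmer[1:] + c for c in alphabet]
    return left + right


def overlap_query(bf, kmer):
    """Assumed rule: accept only if kmer and at least one neighbor are present."""
    if kmer not in bf:
        return False
    return any(n in bf for n in neighbors(kmer))
```

A random false positive of the plain filter is accepted here only if one of its eight neighbors also tests positive, which is why a rule of this kind can lower the FPR at the cost of extra lookups per query. One limitation of the sketch: a stored k-mer with no stored neighbor (for example, from a read of length exactly k) would be rejected, so real variants need to treat such edge k-mers separately.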


Keywords: Bloom filters; efficient data structures; genomics; k-mers; string algorithms


Year: 2016 PMID: 27828710 PMCID: PMC5467106 DOI: 10.1089/cmb.2016.0155

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479

