
Fast search of thousands of short-read sequencing experiments.

Brad Solomon, Carl Kingsford.

Abstract

The amount of sequence information in public repositories is growing at a rapid rate. Although these data are likely to contain clinically important information that has not yet been uncovered, our ability to effectively mine these repositories is limited. Here we introduce Sequence Bloom Trees (SBTs), a method for querying thousands of short-read sequencing experiments by sequence, 162 times faster than existing approaches. The approach searches large data archives for all experiments that involve a given sequence. We use SBTs to search 2,652 human blood, breast and brain RNA-seq experiments for all 214,293 known transcripts in under 4 days using less than 239 MB of RAM and a single CPU. Searching sequence archives at this scale and in this time frame is currently not possible using existing tools.
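The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea, under stated assumptions: each experiment is summarized by a Bloom filter of its k-mers, each internal tree node holds the bitwise union of its children, and a query descends only into subtrees whose filter contains at least a fraction `theta` of the query's k-mers. The names `Bloom`, `Node`, `build`, `query`, the filter size, the hash choice, and the simple pairwise tree construction are all illustrative assumptions, not the authors' implementation.

```python
import hashlib


class Bloom:
    """Toy Bloom filter stored as a Python integer bit set (illustrative only)."""

    def __init__(self, m=1 << 16, h=3, bits=0):
        self.m, self.h, self.bits = m, h, bits  # m bits, h hash functions

    def _positions(self, kmer):
        # Derive h bit positions by salting one hash function (an assumption).
        for seed in range(self.h):
            d = hashlib.blake2b(kmer.encode(), digest_size=8,
                                salt=bytes([seed])).digest()
            yield int.from_bytes(d, "big") % self.m

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits |= 1 << p

    def __contains__(self, kmer):
        return all((self.bits >> p) & 1 for p in self._positions(kmer))

    def union(self, other):
        return Bloom(self.m, self.h, self.bits | other.bits)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class Node:
    def __init__(self, bloom, name=None, left=None, right=None):
        self.bloom, self.name, self.left, self.right = bloom, name, left, right


def build(experiments, k):
    """Build a tree from a non-empty dict {experiment name: list of reads}.

    Leaves are per-experiment filters; parents are unions of their children.
    Nodes are paired in input order, which is a simplification.
    """
    nodes = []
    for name, reads in experiments.items():
        b = Bloom()
        for read in reads:
            for km in kmers(read, k):
                b.add(km)
        nodes.append(Node(b, name=name))
    while len(nodes) > 1:
        paired = [Node(nodes[i].bloom.union(nodes[i + 1].bloom),
                       left=nodes[i], right=nodes[i + 1])
                  for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:          # carry an unpaired node up one level
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]


def query(node, query_kmers, theta):
    """Return names of leaves whose filter holds >= theta of the query k-mers."""
    hits = sum(km in node.bloom for km in query_kmers)
    if hits < theta * len(query_kmers):
        return []                   # prune this whole subtree
    if node.name is not None:
        return [node.name]
    return (query(node.left, query_kmers, theta)
            + query(node.right, query_kmers, theta))


# Hypothetical toy data: three "experiments" with one read each.
experiments = {"a": ["ACGTACGTAC"], "b": ["TTTTGGGGCC"], "c": ["ACGTACGTTT"]}
tree = build(experiments, k=4)
matches = query(tree, kmers("ACGTACGT", 4), theta=1.0)
```

Because a Bloom filter never yields false negatives, experiments "a" and "c", which contain every k-mer of the query, are always reported; an unrelated experiment such as "b" can appear only through false positives, which become rarer as the filter grows.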


Year:  2016        PMID: 26854477      PMCID: PMC4804353          DOI: 10.1038/nbt.3442

Source DB:  PubMed          Journal:  Nat Biotechnol        ISSN: 1087-0156            Impact factor:   54.908


