Fast search of thousands of short-read sequencing experiments.

Abstract

The amount of sequence information in public repositories is growing at a rapid rate. Although these data are likely to contain clinically important information that has not yet been uncovered, our ability to effectively mine these repositories is limited. Here we introduce Sequence Bloom Trees (SBTs), a method for querying thousands of short-read sequencing experiments by sequence, 162 times faster than existing approaches. The approach searches large data archives for all experiments that involve a given sequence. We use SBTs to search 2,652 human blood, breast and brain RNA-seq experiments for all 214,293 known transcripts in under 4 days using less than 239 MB of RAM and a single CPU. Searching sequence archives at this scale and in this time frame is currently not possible using existing tools.
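The abstract does not spell out how a Sequence Bloom Tree answers a query, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each leaf holds a Bloom filter over the k-mers of one experiment, that each internal node holds the bitwise union of its children, and that a query descends into a subtree only when at least a fraction `theta` of the query's k-mers appear in that node's filter. The names `Bloom`, `Node`, `build`, `query`, the toy parameters `K`, `M`, `NUM_HASHES`, and the experiment labels are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Set

K = 5            # toy k-mer length (assumption; real analyses use a larger k)
M = 1024         # bits per Bloom filter (assumption)
NUM_HASHES = 2   # hash functions per k-mer (assumption)


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer: str):
    """Yield NUM_HASHES bit positions for a k-mer, derived from SHA-256."""
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % M


class Bloom:
    """Bloom filter stored as a Python int used as a bit vector."""

    def __init__(self, bits: int = 0):
        self.bits = bits

    def add(self, kmer: str) -> None:
        for p in bit_positions(kmer):
            self.bits |= 1 << p

    def __contains__(self, kmer: str) -> bool:
        return all((self.bits >> p) & 1 for p in bit_positions(kmer))

    def union(self, other: "Bloom") -> "Bloom":
        return Bloom(self.bits | other.bits)


@dataclass
class Node:
    bloom: Bloom
    name: Optional[str] = None                      # set only on leaves
    children: List["Node"] = field(default_factory=list)


def leaf(name: str, reads: List[str]) -> Node:
    """Build a leaf whose filter contains every k-mer of the given reads."""
    bloom = Bloom()
    for read in reads:
        for km in kmers(read):
            bloom.add(km)
    return Node(bloom, name)


def build(nodes: List[Node]) -> Node:
    """Pair nodes bottom-up; each parent stores the union of its children."""
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            pair = nodes[i:i + 2]
            if len(pair) == 1:
                merged.append(pair[0])
            else:
                merged.append(Node(pair[0].bloom.union(pair[1].bloom), children=pair))
        nodes = merged
    return nodes[0]


def query(node: Node, query_kmers: Set[str], theta: float = 0.8) -> List[str]:
    """Return names of leaves holding at least theta of the query k-mers."""
    if not query_kmers:
        return []
    found = sum(km in node.bloom for km in query_kmers)
    if found < theta * len(query_kmers):
        return []                                   # prune this whole subtree
    if not node.children:
        return [node.name]
    return [n for child in node.children for n in query(child, query_kmers, theta)]


# Toy data: three hypothetical experiments, one read each.
tree = build([
    leaf("exp_blood", ["ACGTACGTTGCA"]),
    leaf("exp_breast", ["TTTTGGGGCCCC"]),
    leaf("exp_brain", ["ACGTACGTTGCAAGG"]),
])
hits = query(tree, kmers("ACGTACGTTGCA"))
print(hits)
```

In this toy setup the query sequence occurs in the reads of `exp_blood` and `exp_brain`, so those two leaves should be reported. Because Bloom filters can yield false positives but never false negatives, a search of this kind may over-report experiments but will not miss one that truly contains the required fraction of k-mers.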

Substances: RNA

Year: 2016 PMID: 26854477 PMCID: PMC4804353 DOI: 10.1038/nbt.3442

Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908

Cited by (33 in total)

1. Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters.

Authors: David Pellow; Darya Filippova; Carl Kingsford
Journal: J Comput Biol Date: 2016-11-09 Impact factor: 1.479

2. Improved representation of sequence bloom trees.

Authors: Robert S Harris; Paul Medvedev
Journal: Bioinformatics Date: 2020-02-01 Impact factor: 6.937

3. Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers.

Authors: Roozbeh Dehghannasiri; Donald E Freeman; Milos Jordanski; Gillian L Hsieh; Ana Damljanovic; Erik Lehnert; Julia Salzman
Journal: Proc Natl Acad Sci U S A Date: 2019-07-15 Impact factor: 11.205

4. SPRISS: Approximating Frequent K-mers by Sampling Reads, and Applications.

Authors: Diego Santoro; Leonardo Pellegrina; Matteo Comin; Fabio Vandin
Journal: Bioinformatics Date: 2022-05-18 Impact factor: 6.931

5. An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.

Authors: Fatemeh Almodaresi; Prashant Pandey; Michael Ferdman; Rob Johnson; Rob Patro
Journal: J Comput Biol Date: 2020-03-16 Impact factor: 1.479

6. Large-scale sequence comparisons with sourmash.

Authors: N Tessa Pierce; Luiz Irber; Taylor Reiter; Phillip Brooks; C Titus Brown
Journal: F1000Res Date: 2019-07-04

7. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics.

Authors: R A Leo Elworth; Qi Wang; Pavan K Kota; C J Barberan; Benjamin Coleman; Advait Balaji; Gaurav Gupta; Richard G Baraniuk; Anshumali Shrivastava; Todd J Treangen
Journal: Nucleic Acids Res Date: 2020-06-04 Impact factor: 16.971