Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.

Abstract

Enormous databases of short-read RNA-seq experiments, such as the NIH Sequence Read Archive, are now available. These databases could answer many questions about condition-specific expression or population variation, and they will only grow over time. However, these collections remain difficult to use because there is no practical way to search them for a particular expressed sequence. Although some progress has been made on this problem, it is still not feasible to search collections of hundreds of terabytes of short-read sequencing experiments. We introduce an indexing scheme called split sequence Bloom trees (SSBTs) to support sequence-based querying of terabyte-scale collections of thousands of short-read sequencing experiments. SSBT improves on the sequence Bloom tree (SBT) data structure for the same task. We apply SSBTs to the problem of finding conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2652 publicly available RNA-seq experiments for breast, blood, and brain tissues. We demonstrate that this SSBT index can be queried for a 1000 nt sequence in <4 minutes using a single thread and can be stored in just 39 GB, a fivefold improvement in search and storage costs compared with SBT.
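
The abstract does not spell out how a Bloom-tree query proceeds, so the following is a minimal Python sketch of the general sequence Bloom tree idea, not the SSBT implementation. It assumes that each leaf holds a Bloom filter of one experiment's k-mers, each internal node holds the union of its children's filters, and a subtree is pruned when fewer than a fraction theta of the query's k-mers are found at its root. The class and function names, k = 5, theta = 0.9, the filter size, and the SHA-256 hashing are all hypothetical choices for illustration. The "split" filters that distinguish SSBT from SBT are not modelled here.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def union(self, other):
        # Bitwise OR; both filters must share size and hash count.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_from_reads(reads, k):
    bf = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


class Node:
    """Tree node: a leaf names one experiment, an internal node has children."""

    def __init__(self, name, bf, children=()):
        self.name = name
        self.bf = bf
        self.children = list(children)


def passes(bf, query, k, theta):
    """True if at least a fraction theta of the query's k-mers hit the filter."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return False
    hits = sum(1 for kmer in query_kmers if kmer in bf)
    return hits / len(query_kmers) >= theta


def search(node, query, k, theta):
    """Return names of leaves that pass, pruning subtrees that fail."""
    if not passes(node.bf, query, k, theta):
        return []
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(search(child, query, k, theta))
    return found


# Toy example with two made-up "experiments" (hypothetical reads).
K, THETA = 5, 0.9
leaf_a = Node("expA", filter_from_reads(["ACGTACGTTAGC"], K))
leaf_b = Node("expB", filter_from_reads(["TTTTGGGGCCCCAAAA"], K))
root = Node("root", leaf_a.bf.union(leaf_b.bf), [leaf_a, leaf_b])

result = search(root, "ACGTACGTTAGC", K, THETA)
print(result)
```

Because Bloom filters have no false negatives, an experiment that truly contains enough of the query's k-mers is never pruned. False positives are possible, so a reported hit is a candidate rather than a confirmed match.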

Keywords: RNA-seq; data indexing; sequence bloom trees; sequence search.

Year: 2018 PMID: 29641248 PMCID: PMC6067102 DOI: 10.1089/cmb.2017.0265

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
