Robert S Harris1, Paul Medvedev2,3,4. 1. Department of Biology, PA 16801, USA. 2. Department of Computer Science and Engineering, PA 16801, USA. 3. Department of Biochemistry and Molecular Biology, PA 16801, USA. 4. Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16801, USA.
Abstract
MOTIVATION: Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence. RESULTS: We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. Compared to previous SBT methods, on real RNA-seq data, HowDe-SBT can construct the index in less than 36% of the time and with 39% less space and can answer small-batch queries at least five times faster. We also develop a theoretical framework in which we can analyze and bound the space and query performance of HowDe-SBT compared to other SBT methods. AVAILABILITY AND IMPLEMENTATION: HowDe-SBT is available as a free open source program on https://github.com/medvedevgroup/HowDeSBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Algorithmic solutions to index and search biological databases are a fundamental part of bioinformatics, providing underlying components to many end-user tools. Inexpensive next generation sequencing has filled publicly available databases such as the Sequence Read Archive beyond the capacity of traditional indexing methods. Recently, the Sequence Bloom Tree (SBT) and its derivatives were proposed as a way to efficiently index such data for queries about transcript presence. RESULTS: We build on the SBT framework to construct the HowDe-SBT data structure, which uses a novel partitioning of information to reduce the construction and query time as well as the size of the index. Compared to previous SBT methods, on real RNA-seq data, HowDe-SBT can construct the index in less than 36% of the time and with 39% less space and can answer small-batch queries at least five times faster. We also develop a theoretical framework in which we can analyze and bound the space and query performance of HowDe-SBT compared to other SBT methods. AVAILABILITY AND IMPLEMENTATION: HowDe-SBT is available as a free open source program on https://github.com/medvedevgroup/HowDeSBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Prashant Pandey; Fatemeh Almodaresi; Michael A Bender; Michael Ferdman; Rob Johnson; Rob Patro Journal: Cell Syst Date: 2018-06-20 Impact factor: 10.304
Authors: Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden Journal: BMC Bioinformatics Date: 2009-12-15 Impact factor: 3.169
Authors: Martin D Muggli; Alexander Bowe; Noelle R Noyes; Paul S Morley; Keith E Belk; Robert Raymond; Travis Gagie; Simon J Puglisi; Christina Boucher Journal: Bioinformatics Date: 2017-10-15 Impact factor: 6.937