FMtree: a fast locating algorithm of FM-indexes for genomic data.

Haoyu Cheng, Ming Wu, Yun Xu.

Abstract

Motivation: Searching for large numbers of short patterns in a long text is a fundamental task in bioinformatics, and it has been accelerated by various compressed full-text indexes. These indexes provide search functionality similar to that of classical indexes, e.g. suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, offers unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are inefficient, especially for patterns with many occurrences.
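To make the cost of the locating operation concrete, here is a minimal sketch of a conventional FM-index locate in Python. It is an illustration under stated assumptions, not code from the paper: the function names (`build_fm_index`, `backward_search`, `locate`), the naive suffix-array construction, the uncompressed occurrence table and the sampling rate are all hypothetical choices made for readability. The point it shows is that every occurrence is resolved by its own chain of LF-mapping steps until a sampled suffix-array entry is reached.

```python
def build_fm_index(text, sample_rate=4):
    """Toy FM-index; suffix-array entries are kept only for text positions
    that are multiples of sample_rate (an assumption of this sketch)."""
    text += "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = [text[i - 1] for i in sa]  # character preceding each sorted suffix
    # C[c]: number of characters in the text that are smaller than c
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] for c in C}
    for ch in bwt:
        for c in C:
            occ[c].append(occ[c][-1] + (ch == c))
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, C, occ, sampled


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array range [lo, hi) of the pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern, index):
    """Resolve each occurrence separately by walking LF-mapping steps."""
    bwt, C, occ, sampled = index
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    positions = []
    for row in range(lo, hi):
        steps = 0
        while row not in sampled:        # one LF step moves one position left
            c = bwt[row]
            row = C[c] + occ[c][row]
            steps += 1
        positions.append(sampled[row] + steps)
    return sorted(positions)


index = build_fm_index("ACGTACGTACGA", sample_rate=4)
print(locate("ACG", index))  # [0, 4, 8]
```

In this sketch a pattern with k occurrences needs up to k × (sample_rate − 1) separate LF-mapping steps, each touching an unrelated part of the index, which is the inefficiency the abstract refers to.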
Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual multiway tree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the multiway tree. Compared with existing locating algorithms, our tree-based algorithm eliminates large numbers of redundant operations and has better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms while remaining memory-efficient.
Availability and implementation: FMtree is freely available at https://github.com/chhylp123/FMtree.
Contact: xuyun@ustc.edu.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
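The multiway-tree idea in the Results can be sketched roughly as follows, again in Python. This is one reading of the abstract and not the authors' implementation: the names `build_index` and `locate_tree`, the sampling of suffix-array entries at text positions that are multiples of `sample_rate`, and the linear scan for sampled rows are assumptions of this sketch. Each tree node is the suffix-array range of the pattern extended to the left by some string x; a sampled row found in a node at depth d yields an occurrence at the sampled position plus d.

```python
def build_index(text, sample_rate):
    """Toy FM-index with suffix-array samples at multiples of sample_rate."""
    text += "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = [text[i - 1] for i in sa]
    C, total = {}, 0                    # C[c]: characters smaller than c
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    occ = {c: [0] for c in C}           # occ[c][i]: count of c in bwt[:i]
    for ch in bwt:
        for c in C:
            occ[c].append(occ[c][-1] + (ch == c))
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return C, occ, sampled, len(bwt)


def extend_left(c, lo, hi, C, occ):
    """Range of c + S, given the range [lo, hi) of a string S."""
    return C[c] + occ[c][lo], C[c] + occ[c][hi]


def locate_tree(pattern, index, sample_rate):
    C, occ, sampled, n = index
    lo, hi = 0, n
    for c in reversed(pattern):         # ordinary backward search
        if c not in C:
            return []
        lo, hi = extend_left(c, lo, hi, C, occ)
        if lo >= hi:
            return []
    positions = []
    level = [(lo, hi)]                  # root node: range of the pattern itself
    for depth in range(sample_rate):
        next_level = []
        for lo, hi in level:
            # A sampled row here marks x + pattern with len(x) == depth,
            # so the pattern itself starts depth characters to the right.
            for row in range(lo, hi):   # a real index would use rank queries
                if row in sampled:
                    positions.append(sampled[row] + depth)
            # Children: one range per character that can precede the node.
            for c in C:
                if c == "$":
                    continue
                clo, chi = extend_left(c, lo, hi, C, occ)
                if clo < chi:
                    next_level.append((clo, chi))
        level = next_level
    return sorted(positions)


index = build_index("ACGTACGTACGA", sample_rate=4)
print(locate_tree("ACG", index, 4))  # [0, 4, 8]
```

Under these assumptions, one range extension moves a whole group of occurrences one position to the left at once, instead of one LF-mapping step per occurrence, and all rows in a node are adjacent in the index. Every occurrence at position p is reported exactly once, at depth p mod `sample_rate`.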
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Year:  2018        PMID: 28968761     DOI: 10.1093/bioinformatics/btx596

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  2 in total

Review 1.  Analysis and Performance Assessment of the Whole Genome Bisulfite Sequencing Data Workflow: Currently Available Tools and a Practical Guide to Advance DNA Methylation Studies.

Authors:  Ting Gong; Heather Borgard; Zao Zhang; Shaoqiu Chen; Zitong Gao; Youping Deng
Journal:  Small Methods       Date:  2022-01-22

2.  An optimized FM-index library for nucleotide and amino acid search.

Authors:  Tim Anderson; Travis J Wheeler
Journal:  Algorithms Mol Biol       Date:  2021-12-31       Impact factor: 1.405
