
Using sequence compression to speedup probabilistic profile matching.

Valerio Freschi, Alessandro Bogliolo.

Abstract

MOTIVATION: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is better suited than a deterministic pattern to capturing the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) time to match a sequence of N characters against a profile of length P << N.
RESULTS: In this work, we exploit string compression techniques to speed up brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce the computational complexity by the compression factor of the encoding.
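
The abstract does not spell out how matching on the compressed sequence works, so the following is only a minimal Python sketch of the ingredients it names: the brute-force O(NP) baseline and the two encodings (run-length and LZ78) whose compression factor bounds the claimed speedup. The function names, the dictionary-per-column profile representation, and the toy data are assumptions made for illustration; this is not the authors' implementation of the compressed-domain algorithms.

```python
from typing import Dict, List, Tuple

# Assumed representation: one dict of per-character scores per profile position.
Profile = List[Dict[str, float]]


def brute_force_scores(seq: str, profile: Profile) -> List[float]:
    """Score every window of len(profile) characters: N-P+1 windows, P lookups each."""
    p = len(profile)
    return [
        sum(profile[j][seq[i + j]] for j in range(p))
        for i in range(len(seq) - p + 1)
    ]


def run_length_encode(seq: str) -> List[Tuple[str, int]]:
    """Collapse each maximal run of identical characters into (char, run length)."""
    runs: List[Tuple[str, int]] = []
    for ch in seq:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lz78_phrases(seq: str) -> List[Tuple[int, str]]:
    """LZ78 parse: each phrase is (index of a previous phrase, one extra character)."""
    dictionary = {"": 0}
    phrases: List[Tuple[int, str]] = []
    current = ""
    for ch in seq:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Input ended inside a known phrase: emit it as (its prefix, its last char).
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


# Toy data (hypothetical): a two-column profile over the alphabet {A, C}.
toy_profile: Profile = [{"A": 1.0, "C": 0.0}, {"A": 0.0, "C": 2.0}]
toy_scores = brute_force_scores("AACC", toy_profile)
toy_runs = run_length_encode("AACC")
toy_phrases = lz78_phrases("AAB")
```

In this sketch the compression factor is simply `len(seq) / len(run_length_encode(seq))` or `len(seq) / len(lz78_phrases(seq))`; the abstract's claim is that the matching cost drops by roughly that ratio when the algorithm works on runs or phrases instead of single characters.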

Year:  2005        PMID: 15713733     DOI: 10.1093/bioinformatics/bti323

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  2 in total

1.  Fast index based algorithms and software for matching position specific scoring matrices.

Authors:  Michael Beckstette; Robert Homann; Robert Giegerich; Stefan Kurtz
Journal:  BMC Bioinformatics       Date:  2006-08-24       Impact factor: 3.169

2.  Fast sequence analysis based on diamond sampling.

Authors:  Liangxin Gao; Wenzhen Bao; Hongbo Zhang; Chang-An Yuan; De-Shuang Huang
Journal:  PLoS One       Date:  2018-06-28       Impact factor: 3.240

