Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching.

Gonzalo Navarro, Mathieu Raffinot.

Abstract

The problem of fast exact and approximate searching for a pattern that contains classes of characters and bounded size gaps (CBG) in a text has a wide range of applications, among which a very important one is protein pattern matching (for instance, one PROSITE protein site is associated with the CBG [RK] - x(2,3) - [DE] - x(2,3) - Y, where the brackets match any of the letters inside, and x(2,3) matches a gap of length between 2 and 3). Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more sophisticated than a CBG, and searching for it with an RE pattern matching algorithm complicates the search and makes it slow. This is why we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks exactly once at each text character. The second one does not need to consider all the text characters, and hence it is usually faster than the first one, but in bad cases it may have to read the same text character more than once. We then propose a criterion based on the form of the CBG to choose a priori the faster of the two. We also show how to search permitting a few mistakes in the occurrences. We performed many practical experiments using the PROSITE database, and all of them show that our algorithms are the fastest in virtually all cases.
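To make the CBG notation concrete, the following is a minimal Python sketch of the baseline the abstract describes: converting a CBG into a regular expression and searching with a general RE engine. It is not the authors' algorithms, which the abstract does not detail. The function names `cbg_to_regex` and `find_cbg` are hypothetical, and the translation assumes only the three element kinds shown in the example (a bracketed class, a single letter, and an `x` gap); other PROSITE syntax is not handled.

```python
import re


def cbg_to_regex(pattern: str) -> str:
    """Translate a CBG such as '[RK] - x(2,3) - [DE] - x(2,3) - Y' into a regex.

    Assumption: elements are separated by '-' and are either a bracketed
    class, a single letter, or a gap written x, x(n), or x(n,m).
    """
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if gap:
            low = gap.group(1) or "1"    # a bare 'x' is a gap of length 1
            high = gap.group(2) or low   # x(n) is a gap of exactly n
            parts.append(".{%s,%s}" % (low, high))
        else:
            # Bracketed classes and single letters are already valid regex.
            parts.append(element)
    return "".join(parts)


def find_cbg(pattern: str, text: str):
    """Return (start, matched_text) for each start position with a match.

    The lookahead lets overlapping occurrences be reported; only one
    match (the one the regex engine finds first) is kept per start.
    """
    regex = re.compile("(?=(%s))" % cbg_to_regex(pattern))
    return [(m.start(), m.group(1)) for m in regex.finditer(text)]


if __name__ == "__main__":
    cbg = "[RK] - x(2,3) - [DE] - x(2,3) - Y"
    print(cbg_to_regex(cbg))
    print(find_cbg(cbg, "AAKGGDPPYAA"))
```

A general RE engine handles this pattern, but it treats the CBG as an arbitrary regular expression; the abstract's point is that algorithms specialized to classes and bounded gaps can be simpler and faster.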

Year:  2003        PMID: 14980017     DOI: 10.1089/106652703322756140

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  3 in total

1.  SMOTIF: efficient structured pattern and profile motif search.

Authors:  Yongqiang Zhang; Mohammed J Zaki
Journal:  Algorithms Mol Biol       Date:  2006-11-21       Impact factor: 1.405

2.  Functionally specified protein signatures distinctive for each of the different blue copper proteins.

Authors:  Anuradha Vivekanandan Giri; Sharmila Anishetty; Pennathur Gautam
Journal:  BMC Bioinformatics       Date:  2004-09-09       Impact factor: 3.169

3.  RNA motif search with data-driven element ordering.

Authors:  Ladislav Rampášek; Randi M Jimenez; Andrej Lupták; Tomáš Vinař; Broňa Brejová
Journal:  BMC Bioinformatics       Date:  2016-05-18       Impact factor: 3.169

