Yongqiang Zhang (1), Mohammed J Zaki.
(1) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, USA. zhangy0@cs.rpi.edu
Abstract
BACKGROUND: A structured motif consists of several components separated by variable-length gaps. Each component is a simple motif, which contains either no gaps or only fixed-length gaps. A motif can be represented either as a pattern or as a profile (also called a positional weight matrix). We propose an efficient algorithm, called SMOTIF, to solve the structured motif search problem: given one or more sequences and a structured motif, SMOTIF searches the sequences for all occurrences of the motif. Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences.
RESULTS: SMOTIF can search for both pattern and profile motifs, and it is efficient in both time and space; it outperforms SMARTFINDER, a state-of-the-art algorithm for structured motif search. Experimental results show that SMOTIF is about 7 times faster and consumes 100 times less memory than SMARTFINDER. It can effectively search for LTR retrotransposons and is well suited to searching for motifs with long-range gaps. It is also successful in finding potential composite transcription factor binding sites.
CONCLUSION: SMOTIF is a useful and efficient tool for searching for structured pattern and profile motifs. The algorithm is available as open source at: http://www.cs.rpi.edu/~zaki/software/sMotif/.
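To make the problem statement concrete, the following is a minimal brute-force sketch of structured motif search for pattern motifs. It is not the SMOTIF algorithm, whose details the abstract does not give; it only illustrates the input and output of the problem. The function names `find_component` and `search_structured`, the use of `N` as a wildcard standing in for a fixed-length gap inside a simple motif, and the `(min, max)` encoding of variable-length gaps are all assumptions made for this illustration.

```python
from typing import List, Tuple


def find_component(seq: str, comp: str) -> List[int]:
    """Return every start index where the simple motif `comp` matches `seq`.

    Assumption: 'N' in `comp` matches any symbol, which models a
    fixed-length gap inside a simple motif.
    """
    hits = []
    for i in range(len(seq) - len(comp) + 1):
        window = seq[i:i + len(comp)]
        if all(c == "N" or c == s for c, s in zip(comp, window)):
            hits.append(i)
    return hits


def search_structured(
    seq: str,
    components: List[str],
    gaps: List[Tuple[int, int]],
) -> List[Tuple[int, ...]]:
    """Naively enumerate all occurrences of a structured motif in `seq`.

    `gaps[i]` is the (min, max) number of symbols allowed between the end
    of component i and the start of component i + 1. Each occurrence is
    reported as a tuple holding the start position of every component.
    """
    assert len(gaps) == len(components) - 1
    # Pre-compute the match positions of each simple motif.
    positions = [set(find_component(seq, c)) for c in components]
    results: List[Tuple[int, ...]] = []

    def extend(k: int, starts: List[int]) -> None:
        # All components placed: record one full occurrence.
        if k == len(components):
            results.append(tuple(starts))
            return
        prev_end = starts[-1] + len(components[k - 1])
        lo, hi = gaps[k - 1]
        # Try every start position permitted by the gap range.
        for start in range(prev_end + lo, prev_end + hi + 1):
            if start in positions[k]:
                extend(k + 1, starts + [start])

    for first in sorted(positions[0]):
        extend(1, [first])
    return results


if __name__ == "__main__":
    # Two components, ACG and GCA, separated by a gap of 1 to 3 symbols.
    print(search_structured("ACGTTTGCAACGTAGCA", ["ACG", "GCA"], [(1, 3)]))
```

This enumeration is only practical for short gap ranges; with long-range gaps, such as those in LTR retrotransposons, the inner loop over candidate start positions grows with the gap width, which is the kind of cost a dedicated algorithm needs to avoid.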