I Rigoutsos, A Floratos. Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA.
Abstract
MOTIVATION: The discovery of motifs in biological sequences is an important problem.
RESULTS: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and produces all patterns that appear in at least a user-defined minimum number of sequences, yet it remains very efficient by avoiding enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: no reported pattern can be made more specific while still appearing at exactly the same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to (i) validate the approach through the discovery of previously reported patterns and (ii) demonstrate the capability to automatically identify highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasi-linear in the size of the generated output.
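The two notions the abstract relies on, support (the number of sequences containing a pattern) and maximality, can be illustrated with a small Python sketch. This is a naive brute-force illustration, not the algorithm presented in the paper. The function names, the use of `.` as a single-residue wildcard, and the toy DNA sequences are assumptions made for this example, and the maximality check is deliberately restricted: it only tests whether a wildcard can be replaced by a residue, not whether the pattern can be extended at its ends.

```python
def occurrences(pattern, sequences):
    """Return sorted (sequence index, offset) pairs where a rigid pattern matches.

    Assumption for this sketch: '.' is a wildcard matching exactly one residue.
    """
    hits = []
    for i, seq in enumerate(sequences):
        for off in range(len(seq) - len(pattern) + 1):
            window = seq[off:off + len(pattern)]
            if all(p == "." or p == c for p, c in zip(pattern, window)):
                hits.append((i, off))
    return hits


def support(pattern, sequences):
    """Number of distinct sequences that contain at least one occurrence."""
    return len({i for i, _ in occurrences(pattern, sequences)})


def can_specialize(pattern, sequences, alphabet):
    """True if some wildcard can be replaced by a residue without changing
    the occurrence list, i.e. the pattern is not maximal.

    Restricted check: extensions at either end of the pattern are ignored.
    """
    base = occurrences(pattern, sequences)
    for pos, p in enumerate(pattern):
        if p != ".":
            continue
        for residue in alphabet:
            candidate = pattern[:pos] + residue + pattern[pos + 1:]
            if occurrences(candidate, sequences) == base:
                return True
    return False


# Toy data (hypothetical, for illustration only).
seqs_a = ["ACGTA", "TAGGA", "AAGCA"]
print(support("A.G", seqs_a))                 # 3
print(can_specialize("A.G", seqs_a, "ACGT"))  # False: wildcard position varies

seqs_b = ["ACGT", "TACG"]
print(can_specialize("A.G", seqs_b, "ACGT"))  # True: "ACG" matches at the same positions
```

In the second data set, `A.G` is not maximal because the more specific pattern `ACG` occurs at exactly the same positions, which is the situation the abstract says the reported patterns exclude.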