PIDA: A new algorithm for pattern identification.

C Putonti, BM Pettitt, JG Reid, Y Fofanov.

Abstract

Algorithms for motif identification in sequence space have predominantly focused on recognizing patterns of a fixed length that contain regions of perfect conservation and, possibly, regions of unconstrained sequence. Such motifs can be found in everything from proteins with distinct active sites to non-coding RNAs with specific structural elements that are necessary to maintain functionality. If an insertion or deletion occurs within an unconstrained portion of a pattern, the pattern may still retain its functionality. In that case the length of the pattern is variable, and its occurrences may be overlooked by existing motif detection methods. The Pattern Island Detection Algorithm (PIDA) presented here has been developed to recognize patterns whose occurrences vary in length, within sequences over an alphabet of any size. PIDA first identifies all regions of perfect conservation (of lengths longer than a user-specified threshold) and then builds those conservation "islands" into fixed-length patterns. Next, the algorithm modifies these fixed-length patterns by identifying additional (and different) islands that can be incorporated into each pattern through insertions/deletions within the "water" separating the islands. To provide benchmarks for this analysis, PIDA was used to search for patterns within randomly generated sequences as well as within sequences known to contain conserved patterns. For each pattern found, the statistical significance is calculated from the pattern's likelihood of appearing by chance, which provides a means of determining which patterns are likely to have a functional role. The PIDA approach to motif finding is designed to perform best when searching for patterns of variable length, although it can also identify patterns of a fixed length. PIDA has been designed to be as generally applicable as possible, since a variety of sequence problems are of this type.
The algorithm was implemented in C++ and is freely available upon request from the authors.
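
The island-and-water idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation, and it rests on several simplifying assumptions: islands have exactly `min_len` characters rather than every length above the threshold, an island must occur in every input sequence, and a pattern is only a pair of islands. The names `find_islands`, `water_lengths`, `find_patterns`, and `max_gap` are hypothetical and chosen for illustration. A pair is labeled "fixed" when the same water length occurs in every sequence, and "variable" otherwise.

```python
from collections import defaultdict
from itertools import permutations


def find_islands(sequences, min_len):
    """Return {island: {seq_index: [start, ...]}} for every substring of
    length min_len that occurs in all sequences (a simplification: one
    fixed island length instead of all lengths above the threshold)."""
    positions = defaultdict(lambda: defaultdict(list))
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - min_len + 1):
            positions[seq[start:start + min_len]][idx].append(start)
    return {
        island: dict(per_seq)
        for island, per_seq in positions.items()
        if len(per_seq) == len(sequences)
    }


def water_lengths(islands, first, second, max_gap):
    """For the ordered island pair (first, second), return one set per
    sequence of the gap ("water") lengths between them, or None if some
    sequence has no occurrence of the pair within max_gap."""
    per_sequence = []
    for idx in sorted(islands[first]):
        gaps = {
            b - (a + len(first))
            for a in islands[first][idx]
            for b in islands[second][idx]
            if 0 <= b - (a + len(first)) <= max_gap
        }
        if not gaps:
            return None
        per_sequence.append(gaps)
    return per_sequence


def find_patterns(sequences, min_len, max_gap):
    """Pair up islands and label each pair as fixed or variable length."""
    islands = find_islands(sequences, min_len)
    patterns = {}
    for first, second in permutations(islands, 2):
        gaps = water_lengths(islands, first, second, max_gap)
        if gaps is not None:
            # A gap length shared by every sequence means a fixed-length pattern.
            kind = "fixed" if set.intersection(*gaps) else "variable"
            patterns[(first, second)] = (kind, gaps)
    return patterns


if __name__ == "__main__":
    # Two toy sequences sharing the islands ACGT and GCAT with different water.
    seqs = ["GGACGTTTGCATCC", "CAACGTCCAAGCATAG"]
    for pair, (kind, gaps) in find_patterns(seqs, min_len=4, max_gap=6).items():
        print(pair, kind, gaps)
```

The sketch omits the significance calculation mentioned in the abstract, since the abstract does not specify how the chance likelihood is computed.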

Year: 2007 PMID: 19834570 PMCID: PMC2761635

Source DB: PubMed Journal: Online J Bioinform ISSN: 1443-2250
