PIDA: A new algorithm for pattern identification.

C Putonti, BM Pettitt, JG Reid, Y Fofanov.

Abstract

Algorithms for motif identification in sequence space have predominantly focused on recognizing patterns of a fixed length that contain regions of perfect conservation, possibly separated by regions of unconstrained sequence. Such motifs can be found in everything from proteins with distinct active sites to non-coding RNAs with specific structural elements that are necessary to maintain functionality. If an insertion or deletion occurs within an unconstrained portion of the pattern, the pattern may still retain its functionality. In such a case the pattern's length becomes variable, and the pattern may be overlooked by existing motif detection methods. The Pattern Island Detection Algorithm (PIDA) presented here has been developed to recognize patterns whose occurrences vary in length, within sequences over an alphabet of any size. PIDA first identifies all regions of perfect conservation (longer than a user-specified threshold) and then builds these conservation "islands" into fixed-length patterns. Next, the algorithm modifies these fixed-length patterns by identifying additional (and different) islands that can be incorporated into each pattern through insertions or deletions within the "water" separating the islands. To provide benchmarks for this analysis, PIDA was used to search for patterns within randomly generated sequences as well as within sequences known to contain conserved patterns. For each pattern found, the statistical significance is calculated from the pattern's likelihood of appearing by chance, which provides a means of determining which patterns are likely to have a functional role. The PIDA approach to motif finding is designed to perform best when searching for patterns of variable length, although it can also identify patterns of a fixed length. PIDA has been created to be as generally applicable as possible, since there is a variety of sequence problems of this type.
The algorithm was implemented in C++ and is freely available upon request from the authors.
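
The island-and-water idea described above can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation. It rests on several simplifying assumptions: an "island" is taken to be a k-mer of exactly `min_len` characters that occurs in every input sequence, patterns are built from pairs of islands only, and the function names `find_islands` and `pair_islands` and the parameter `max_gap` are hypothetical. No significance calculation is attempted.

```python
from collections import defaultdict
from itertools import permutations


def find_islands(sequences, min_len):
    """Return {kmer: [start positions in each sequence]} for every k-mer of
    length min_len that occurs in all sequences (assumed stand-in for a
    perfectly conserved 'island')."""
    occurrences = defaultdict(lambda: [[] for _ in sequences])
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - min_len + 1):
            occurrences[seq[start:start + min_len]][idx].append(start)
    # Keep only k-mers seen at least once in every sequence.
    return {kmer: pos for kmer, pos in occurrences.items() if all(pos)}


def pair_islands(islands, min_len, max_gap):
    """Pair islands A...B and record the length of the 'water' between them
    in each sequence. Equal gaps give a fixed-length pattern; differing gaps
    give a variable-length pattern (an insertion/deletion in the water)."""
    patterns = []
    for a, b in permutations(islands, 2):
        gaps = []
        for pos_a, pos_b in zip(islands[a], islands[b]):
            # Gap = start of B minus end of A; B must follow A without overlap.
            candidates = [
                pb - (pa + min_len)
                for pa in pos_a
                for pb in pos_b
                if 0 <= pb - (pa + min_len) <= max_gap
            ]
            if not candidates:
                break  # the pair does not co-occur in this sequence
            gaps.append(min(candidates))
        else:
            kind = "fixed" if len(set(gaps)) == 1 else "variable"
            patterns.append((a, b, tuple(gaps), kind))
    return patterns


if __name__ == "__main__":
    # Toy sequences: ACGT and TTGA are conserved; the water between them
    # is 2 characters long in the first sequence and 3 in the second.
    seqs = ["CACGTCCTTGAG", "GACGTGGGTTGAC"]
    islands = find_islands(seqs, min_len=4)
    for pattern in pair_islands(islands, min_len=4, max_gap=10):
        print(pattern)
```

A fixed-length-only method would miss the pairing in the toy input because the two occurrences differ in total length. Recording the per-sequence gap is what lets the sketch report it as a single variable-length pattern.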

Year:  2007        PMID: 19834570      PMCID: PMC2761635     

Source DB:  PubMed          Journal:  Online J Bioinform        ISSN: 1443-2250

