Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.

G Z Hertz, G D Stormo.

Abstract

MOTIVATION: Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring, or scoring, the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme.
RESULTS: We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein.
AVAILABILITY: Programs were developed under the UNIX operating system and are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
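
As a rough illustration of the scoring idea in RESULTS, the sketch below computes a log-likelihood "information content" for an ungapped alignment as the sum, over columns, of f * ln(f / p), where f is the observed letter frequency in the column and p is the background frequency. It then multiplies a P value by a count of possible alignments to get an expected frequency, as the abstract describes. This is a minimal sketch and not the authors' program: the function names, the per-sequence normalization, the absence of small-sample corrections, and the example numbers are all assumptions made for illustration.

```python
import math
from collections import Counter


def information_content(alignment, background):
    """Relative-entropy score (in nats) of an ungapped alignment.

    alignment:  list of equal-length strings, one aligned site per sequence.
    background: dict mapping each letter to its a priori probability.
    Hypothetical helper; no pseudocounts or small-sample correction applied.
    """
    n_seqs = len(alignment)
    width = len(alignment[0])
    if any(len(site) != width for site in alignment):
        raise ValueError("all aligned sites must have the same width")

    total = 0.0
    for col in range(width):
        counts = Counter(site[col] for site in alignment)
        for letter, count in counts.items():
            freq = count / n_seqs
            # Letters absent from the column contribute 0 and are skipped.
            total += freq * math.log(freq / background[letter])
    return total


def expected_frequency(p_value, num_alignments):
    """Expected number of alignments scoring at least this well by chance."""
    return p_value * num_alignments


uniform_dna = {base: 0.25 for base in "ACGT"}

# Four identical sites: every column is perfectly conserved.
conserved = ["ACGT", "ACGT", "ACGT", "ACGT"]
ic_conserved = information_content(conserved, uniform_dna)

# One column in which each base appears once: matches the background exactly.
random_like = ["A", "C", "G", "T"]
ic_random = information_content(random_like, uniform_dna)

# Illustrative numbers only: a P value of 1e-9 over 1e6 candidate alignments.
e_value = expected_frequency(1e-9, 1e6)
```

Under a uniform background, each perfectly conserved DNA column contributes ln 4 nats, and a column matching the background contributes zero. Because the expected frequency folds in the number of possible alignments, it can be compared across alignments of different widths and sequence counts, which a raw information content score cannot.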

Year:  1999        PMID: 10487864     DOI: 10.1093/bioinformatics/15.7.563

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  460 in total

1.  Assessing clusters and motifs from gene expression data.

Authors:  L M Jakt; L Cao; K S Cheah; D K Smith
Journal:  Genome Res       Date:  2001-01       Impact factor: 9.043

2.  Finding nuclear localization signals.

Authors:  M Cokol; R Nair; B Rost
Journal:  EMBO Rep       Date:  2000-11       Impact factor: 8.807

3.  Discovering regulatory elements in non-coding sequences by analysis of spaced dyads.

Authors:  J van Helden; A F Rios; J Collado-Vides
Journal:  Nucleic Acids Res       Date:  2000-04-15       Impact factor: 16.971

4.  Splicing enhancement in the yeast rp51b intron.

Authors:  D Libri; A Lescure; M Rosbash
Journal:  RNA       Date:  2000-03       Impact factor: 4.942

5.  Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.

Authors:  Benjamin P Berman; Yutaka Nibu; Barret D Pfeiffer; Pavel Tomancak; Susan E Celniker; Michael Levine; Gerald M Rubin; Michael B Eisen
Journal:  Proc Natl Acad Sci U S A       Date:  2002-01-22       Impact factor: 11.205

6.  GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Authors:  J Besemer; A Lomsadze; M Borodovsky
Journal:  Nucleic Acids Res       Date:  2001-06-15       Impact factor: 16.971

7.  Discovering common stem-loop motifs in unaligned RNA sequences.

Authors:  J Gorodkin; S L Stricklin; G D Stormo
Journal:  Nucleic Acids Res       Date:  2001-05-15       Impact factor: 16.971

8.  Discovery of regulatory elements by a computational method for phylogenetic footprinting.

Authors:  Mathieu Blanchette; Martin Tompa
Journal:  Genome Res       Date:  2002-05       Impact factor: 9.043

9.  Additivity in protein-DNA interactions: how good an approximation is it?

Authors:  Panayiotis V Benos; Martha L Bulyk; Gary D Stormo
Journal:  Nucleic Acids Res       Date:  2002-10-15       Impact factor: 16.971

10.  Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECB-dependent transcription to the M/G1 phase of the cell cycle.

Authors:  Tata Pramila; Shawna Miles; Debraj GuhaThakurta; Dave Jemiolo; Linda L Breeden
Journal:  Genes Dev       Date:  2002-12-01       Impact factor: 11.361
