An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences.

C E Lawrence, A A Reilly.

Abstract

Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented. Each sequence must contain at least one common site. No alignment of the sites is required. Instead, the uncertainty in the location of the sites is handled by employing the missing information principle to develop an "expectation maximization" (EM) algorithm. This approach allows for the simultaneous identification of the sites and characterization of the binding motifs. The reliability of the algorithm increases with the number of fragments, but the computations increase only linearly. The method is illustrated with an example, using known cyclic adenosine monophosphate receptor protein (CRP) binding sites. The final motif is utilized in a search for undiscovered CRP binding sites.
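The EM idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several simplifying assumptions that the abstract does not state: exactly one site per sequence, a fixed site width, a uniform prior over start positions, a background model fixed at the overall base composition, and pseudocounts to avoid zero probabilities. The function name `em_motif` and all of its parameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def em_motif(seqs, width, n_iter=50, pseudo=0.5, seed=0):
    """Toy EM for one site of fixed width per sequence (illustrative only)."""
    if any(len(s) < width for s in seqs):
        raise ValueError("every sequence must be at least `width` long")
    rng = random.Random(seed)
    idx = {c: i for i, c in enumerate(ALPHABET)}

    # Background: overall base composition, held fixed (an assumption).
    counts = [pseudo] * len(ALPHABET)
    for s in seqs:
        for c in s:
            counts[idx[c]] += 1
    total = sum(counts)
    bg = [c / total for c in counts]

    # Random starting motif: one probability column per site position.
    motif = []
    for _ in range(width):
        col = [rng.random() + 0.1 for _ in ALPHABET]
        t = sum(col)
        motif.append([v / t for v in col])

    post = []
    for _ in range(n_iter):
        # E-step: posterior over the unknown site start in each sequence.
        post = []
        for s in seqs:
            logw = []
            for start in range(len(s) - width + 1):
                lw = 0.0
                for j in range(width):
                    b = idx[s[start + j]]
                    lw += math.log(motif[j][b] / bg[b])
                logw.append(lw)
            m = max(logw)  # subtract the max for numerical stability
            w = [math.exp(x - m) for x in logw]
            z = sum(w)
            post.append([x / z for x in w])

        # M-step: re-estimate motif columns from expected base counts.
        new = [[pseudo] * len(ALPHABET) for _ in range(width)]
        for s, p in zip(seqs, post):
            for start, weight in enumerate(p):
                for j in range(width):
                    new[j][idx[s[start + j]]] += weight
        motif = [[v / sum(col) for v in col] for col in new]

    return motif, post
```

Calling `em_motif(["ACGTTGACAT", "TTGACAGGCA", "CCATTGACAG"], width=6)` returns the estimated position-specific base probabilities together with, for each sequence, a probability distribution over candidate site start positions. Each iteration visits every candidate window once, so the cost per iteration grows linearly with the total sequence length. As with any EM procedure, the result depends on the starting point, so several random restarts would normally be compared.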

Year:  1990        PMID: 2184437     DOI: 10.1002/prot.340070105

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585

