Finding motifs using random projections.

Jeremy Buhler, Martin Tompa.

Abstract

The DNA motif discovery problem abstracts the task of discovering short, conserved sites in genomic DNA. Pevzner and Sze recently described a precise combinatorial formulation of motif discovery that motivates the following algorithmic challenge: find twenty planted occurrences of a motif of length fifteen in roughly twelve kilobases of genomic sequence, where each occurrence of the motif differs from its consensus in four randomly chosen positions. Such "subtle" motifs, though statistically highly significant, expose a weakness in existing motif-finding algorithms, which typically fail to discover them. Pevzner and Sze introduced new algorithms to solve their (15,4)-motif challenge, but these methods do not scale efficiently to more difficult problems in the same family, such as the (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif-discovery algorithm, PROJECTION, designed to enhance the performance of existing motif finders using random projections of the input's substrings. Experiments on synthetic data demonstrate that PROJECTION remedies the weakness observed in existing algorithms, typically solving the difficult (14,4)-, (16,5)-, and (18,6)-motif problems. Our algorithm is robust to nonuniform background sequence distributions and scales to larger amounts of sequence than that specified in the original challenge. A probabilistic estimate suggests that related motif-finding problems that PROJECTION fails to solve are in all likelihood inherently intractable. We also test the performance of our algorithm on realistic biological examples, including transcription factor binding sites in eukaryotes and ribosome binding sites in prokaryotes.
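The abstract says only that PROJECTION works on random projections of the input's substrings. The following Python sketch illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a planted (15,4) instance made of t = 20 uniform random sequences of length n = 600 (20 × 600 = 12,000 bases, matching "roughly twelve kilobases"). It also assumes a projection that keeps k = 7 of the 15 positions, chosen uniformly at random. All function names and the value of k are hypothetical. Each l-mer is hashed into a bucket keyed by its projected bases. A planted occurrence shares the consensus's bucket whenever none of its d mutated positions is among the k kept ones, which happens with probability C(l - d, k) / C(l, k).

```python
import random
from collections import defaultdict
from math import comb

ALPHABET = "ACGT"


def plant_instance(t=20, n=600, l=15, d=4, seed=0):
    """Hypothetical planted (l, d)-motif generator: t uniform random sequences
    of length n, each holding one copy of a random consensus that has been
    mutated at exactly d randomly chosen positions."""
    rng = random.Random(seed)
    consensus = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, starts = [], []
    for _ in range(t):
        seq = [rng.choice(ALPHABET) for _ in range(n)]
        occ = list(consensus)
        for pos in rng.sample(range(l), d):
            # Substitute a base that differs from the consensus base.
            occ[pos] = rng.choice([c for c in ALPHABET if c != consensus[pos]])
        start = rng.randrange(n - l + 1)
        seq[start:start + l] = occ
        seqs.append("".join(seq))
        starts.append(start)
    return consensus, seqs, starts


def project(lmer, positions):
    """Random projection (assumed form): keep only the bases at the chosen positions."""
    return "".join(lmer[p] for p in positions)


def bucket_lmers(seqs, l, positions):
    """Hash every l-mer of every sequence into a bucket keyed by its projection.
    Bucket entries are (sequence index, start offset) pairs."""
    buckets = defaultdict(list)
    for i, seq in enumerate(seqs):
        for j in range(len(seq) - l + 1):
            buckets[project(seq[j:j + l], positions)].append((i, j))
    return buckets


def p_unmutated_projection(l, d, k):
    """Probability that k positions drawn without replacement from l
    avoid all d mutated positions of one planted occurrence."""
    return comb(l - d, k) / comb(l, k)


if __name__ == "__main__":
    l, d, k = 15, 4, 7  # k = 7 is an illustrative assumption
    consensus, seqs, starts = plant_instance(l=l, d=d)
    positions = sorted(random.Random(1).sample(range(l), k))
    buckets = bucket_lmers(seqs, l, positions)
    target = buckets[project(consensus, positions)]
    hits = sum((i, starts[i]) in target for i in range(len(seqs)))
    total = sum(len(v) for v in buckets.values())
    print(f"planted occurrences in consensus bucket: {hits} of {len(seqs)}")
    print(f"mean l-mers per bucket: {total / 4 ** k:.2f}")  # prints 0.72
    print(f"P(occurrence survives projection) = {p_unmutated_projection(l, d, k):.4f}")  # prints 0.0513
```

With these parameters each bucket receives about 0.72 background l-mers on average (11,720 / 4^7). Each planted occurrence, however, reaches the consensus bucket with probability 330/6435 ≈ 0.0513. A single projection therefore rarely collects many occurrences, so a practical search would presumably repeat the projection with fresh random positions until some bucket is clearly enriched. The sketch stops at bucketing. Turning an enriched bucket into a motif would be the job of the existing motif finder that, according to the abstract, PROJECTION is designed to enhance.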

Year:  2002        PMID: 12015879     DOI: 10.1089/10665270252935430

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  65 in total

1.  Additivity in protein-DNA interactions: how good an approximation is it?

Authors:  Panayiotis V Benos; Martha L Bulyk; Gary D Stormo
Journal:  Nucleic Acids Res       Date:  2002-10-15       Impact factor: 16.971

2.  Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification.

Authors:  Peter M Haverty; Ulla Hansen; Zhiping Weng
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

3.  Finding functional sequence elements by multiple local alignment.

Authors:  Martin C Frith; Ulla Hansen; John L Spouge; Zhiping Weng
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

Review 4.  Charting gene regulatory networks: strategies, challenges and perspectives.

Authors:  Gong-Hong Wei; De-Pei Liu; Chih-Chuan Liang
Journal:  Biochem J       Date:  2004-07-01       Impact factor: 3.857

5.  Novel sequence-based method for identifying transcription factor binding sites in prokaryotic genomes.

Authors:  Gurmukh Sahota; Gary D Stormo
Journal:  Bioinformatics       Date:  2010-08-31       Impact factor: 6.937

6.  Identifying tissue-selective transcription factor binding sites in vertebrate promoters.

Authors:  Andrew D Smith; Pavel Sumazin; Michael Q Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2005-01-24       Impact factor: 11.205

7.  Identification of muscle-specific regulatory modules in Caenorhabditis elegans.

Authors:  Guoyan Zhao; Lawrence A Schriefer; Gary D Stormo
Journal:  Genome Res       Date:  2007-02-06       Impact factor: 9.043

Review 8.  Computational methods to dissect cis-regulatory transcriptional networks.

Authors:  Vibha Rani
Journal:  J Biosci       Date:  2007-12       Impact factor: 1.826

9.  ZOOM! Zillions of oligos mapped.

Authors:  Hao Lin; Zefeng Zhang; Michael Q Zhang; Bin Ma; Ming Li
Journal:  Bioinformatics       Date:  2008-08-06       Impact factor: 6.937

10.  Discovery of phosphorylation motif mixtures in phosphoproteomics data.

Authors:  Anna Ritz; Gregory Shakhnarovich; Arthur R Salomon; Benjamin J Raphael
Journal:  Bioinformatics       Date:  2008-11-07       Impact factor: 6.937
