
A cost-aggregating integer linear program for motif finding.

Carl Kingsford, Elena Zaslavsky, Mona Singh.

Abstract

In the motif finding problem, one seeks a set of mutually similar substrings within a collection of biological sequences. This is an important and widely studied problem, because such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find substrings of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem. It exploits the fact that distances between substrings come from a limited set of possible values, which allows sequence position pairs with the same distance to be considered in aggregate. We show how to tighten its linear programming relaxation by adding an exponentially large set of constraints, and we give an efficient separation algorithm that finds violated constraints, thereby showing that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions for the motif finding problem and show that, in practice, it is effective at uncovering known transcription factor binding sites.
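The combinatorial objective above can be made concrete with a small brute-force sketch. Two assumptions are made for illustration only and are not stated in the abstract: distances are Hamming distances, and exactly one substring is chosen from each sequence. This is not the paper's integer linear program. It simply enumerates every choice, which is feasible only for toy inputs. The hypothetical helper `distance_classes` shows why distances come from a limited set. For substrings of length L, Hamming distance takes at most L + 1 values, so many position pairs share the same cost.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def substrings(seq: str, length: int) -> list[str]:
    """All substrings of the given length, indexed by start position."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def sum_of_pairs_cost(choice: list[str]) -> int:
    """Sum of pairwise Hamming distances among the chosen substrings."""
    return sum(hamming(a, b) for a, b in combinations(choice, 2))


def brute_force_motif(seqs: list[str], length: int) -> tuple[int, tuple[int, ...]]:
    """Pick one start position per sequence minimizing the sum-of-pairs cost.

    Assumption (illustrative): one substring per sequence, Hamming distance.
    Exhaustive enumeration, exponential in the number of sequences.
    """
    if any(len(s) < length for s in seqs):
        raise ValueError("every sequence must be at least `length` long")
    candidates = [substrings(s, length) for s in seqs]
    best_cost, best_starts = None, ()
    # product() walks start-position tuples in lexicographic order, so ties
    # keep the first optimum found.
    for starts in product(*(range(len(c)) for c in candidates)):
        choice = [candidates[k][i] for k, i in enumerate(starts)]
        cost = sum_of_pairs_cost(choice)
        if best_cost is None or cost < best_cost:
            best_cost, best_starts = cost, starts
    return best_cost, best_starts


def distance_classes(s: str, t: str, length: int, i: int) -> dict[int, list[int]]:
    """Group start positions j in t by distance to the substring of s at i.

    Hypothetical helper: with Hamming distance there are at most length + 1
    distinct keys, the kind of redundancy a cost-aggregating model can exploit.
    """
    target = s[i:i + length]
    groups: dict[int, list[int]] = {}
    for j, sub in enumerate(substrings(t, length)):
        groups.setdefault(hamming(target, sub), []).append(j)
    return groups
```

For example, `brute_force_motif(["ACGTACGT", "TTACGTTT", "GGGACGTA"], 4)` returns `(0, (0, 2, 3))`, because `ACGT` occurs at those start positions. `distance_classes("ACGTACGT", "TTACGTTT", 4, 0)` groups the five positions of the second sequence into only three distance classes (0, 3 and 4).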


Year:  2011        PMID: 22081765      PMCID: PMC3212737          DOI: 10.1016/j.jda.2011.04.001

Source DB:  PubMed          Journal:  J Discrete Algorithms (Amst)        ISSN: 1570-8667


