MOTIVATION: To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in genome-wide discovery of similarly acting enhancers, but requires prior knowledge of the set of TFs acting at the CRM and of those TFs' binding motifs.

RESULTS: We propose a method for CRM discovery that combines aspects of both approaches in an effort to overcome their individual limitations. By treating phylogenetically footprinted non-coding regions (PFRs) as proxies for CRMs, we endeavor to find PFRs near co-regulated genes that are composed of similar short, conserved sequences. Using Markov chains as a convenient formulation to assess similarity, we develop a sampling algorithm to search a large group of PFRs for the most similar subset. Starting with a set of genes involved in Drosophila early blastoderm development and using phylogenetic comparisons of the Drosophila melanogaster and D. pseudoobscura genomes, we show that our algorithm successfully detects known CRMs. Further, we use our similarity metric, based on Markov chain discrimination, in a genome-wide search and uncover additional known CRMs and many candidate early blastoderm CRMs.

AVAILABILITY: Software is available via http://arep.med.harvard.edu/enhancer
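The idea of scoring similarity by Markov chain discrimination can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the chain order, the smoothing, the choice of background, or how the sampling step selects the PFR subset. The function names, the default order of 2, and the pseudocount below are all assumptions made for illustration. The sketch fits one chain to a foreground set of sequences and one to a background set, then scores a candidate by the log-likelihood ratio between the two.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov_chain(seqs, order=2, pseudocount=1.0):
    """Fit a fixed-order Markov chain; returns context -> base -> log-probability.

    `order` and `pseudocount` are illustrative assumptions, not values from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous symbols such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context][b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((counts[context][b] + pseudocount) / total) for b in ALPHABET
        }
    return model

def log_likelihood(seq, model, order=2):
    """Sum of transition log-probabilities; windows with unknown symbols are skipped."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in model and base in model[context]:
            total += model[context][base]
    return total

def discrimination_score(seq, fg_model, bg_model, order=2):
    """Log-likelihood ratio: positive when `seq` resembles the foreground more."""
    return log_likelihood(seq, fg_model, order) - log_likelihood(seq, bg_model, order)
```

In use, the foreground model would be trained on the conserved sequences of a candidate PFR subset and the background model on other non-coding sequence. A positive score for a new region then suggests that its short-sequence composition is closer to the foreground set.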