
Probabilistic clustering of sequences: inferring new bacterial regulons by comparative genomics.

Erik van Nimwegen, Mihaela Zavolan, Nikolaus Rajewsky, Eric D Siggia.

Abstract

Genome-wide comparisons between enteric bacteria yield large sets of conserved putative regulatory sites on a gene-by-gene basis that need to be clustered into regulons. Using the assumption that regulatory sites can be represented as samples from weight matrices (WMs), we derive a unique probability distribution for assignments of sites into clusters. Our algorithm, "PROCSE" (probabilistic clustering of sequences), uses Monte Carlo sampling of this distribution to partition and align thousands of short DNA sequences into clusters. The algorithm internally determines the number of clusters from the data and assigns significance to the resulting clusters. We place theoretical limits on the ability of any algorithm to correctly cluster sequences drawn from WMs when these WMs are unknown. Our analysis suggests that the set of all putative sites for a single genome (e.g., Escherichia coli) is largely inadequate for clustering. When sites from different genomes are combined and all the homologous sites from the various species are used as a block, clustering becomes feasible. We predict 50-100 new regulons as well as many new members of existing regulons, potentially doubling the number of known regulatory sites in E. coli.
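The abstract does not spell out the probability distribution or the sampling moves, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the source: each WM column gets a uniform prior and is integrated out (giving a Dirichlet-multinomial score per cluster), all partitions are a priori equally likely, and a simple Gibbs sampler stands in for the Monte Carlo scheme. The function names `log_cluster_likelihood` and `gibbs_sweep` are hypothetical, and the sketch ignores alignment shifts and strand orientation.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def log_cluster_likelihood(seqs):
    """Log probability that equal-length sequences were all drawn from one
    unknown WM, with a uniform prior on each column integrated out
    (assumed scoring, not taken from the abstract)."""
    if not seqs:
        return 0.0
    n = len(seqs)
    total = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        # Per column: 3! * prod_a(n_a!) / (n + 3)!
        total += math.lgamma(4) - math.lgamma(n + 4)
        total += sum(math.lgamma(counts[a] + 1) for a in ALPHABET)
    return total

def gibbs_sweep(seqs, labels, rng):
    """Resample the cluster label of every sequence once, conditioned on
    the current assignment of all the others."""
    for i, s in enumerate(seqs):
        labels[i] = None  # take sequence i out of its cluster
        groups = {}
        for j, lab in enumerate(labels):
            if lab is not None:
                groups.setdefault(lab, []).append(seqs[j])
        # An unused label stands for "start a new cluster".
        new_label = next(k for k in range(len(seqs)) if k not in groups)
        candidates = list(groups) + [new_label]
        # Only the cluster that receives s changes its score.
        logw = []
        for lab in candidates:
            members = groups.get(lab, [])
            logw.append(log_cluster_likelihood(members + [s])
                        - log_cluster_likelihood(members))
        top = max(logw)
        weights = [math.exp(w - top) for w in logw]
        labels[i] = rng.choices(candidates, weights=weights)[0]
    return labels

# Toy usage: start with every site in its own cluster and run a few sweeps.
sites = ["ACGTAC", "ACGTAC", "ACGTAA", "TTGGCC", "TTGGCA", "TTGGCC"]
labels = list(range(len(sites)))
rng = random.Random(0)
for _ in range(50):
    labels = gibbs_sweep(sites, labels, rng)
```

Because a sequence may always open a new cluster, the number of clusters is not fixed in advance; it emerges from the sampled partitions, which mirrors the abstract's statement that the number of clusters is determined from the data.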

Year:  2002        PMID: 12032281      PMCID: PMC124229          DOI: 10.1073/pnas.112690399

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  32 in total

1.  Identification of the binding sites of regulatory proteins in bacterial genomes.

Authors:  Hao Li; Virgil Rhodius; Carol Gross; Eric D Siggia
Journal:  Proc Natl Acad Sci U S A       Date:  2002-08-14       Impact factor: 11.205

2.  Quantifying modularity in the evolution of biomolecular systems.

Authors:  Berend Snel; Martijn A Huynen
Journal:  Genome Res       Date:  2004-03       Impact factor: 9.043

Review 3.  Phylogenetic footprinting: a boost for microbial regulatory genomics.

Authors:  Pramod Katara; Atul Grover; Vinay Sharma
Journal:  Protoplasma       Date:  2011-11-24       Impact factor: 3.356

4.  Genome-wide expression profiling, in vivo DNA binding analysis, and probabilistic motif prediction reveal novel Abf1 target genes during fermentation, respiration, and sporulation in yeast.

Authors:  Ulrich Schlecht; Ionas Erb; Philippe Demougin; Nicolas Robine; Valérie Borde; Erik van Nimwegen; Alain Nicolas; Michael Primig
Journal:  Mol Biol Cell       Date:  2008-02-27       Impact factor: 4.138

5.  Making connections between novel transcription factors and their DNA motifs.

Authors:  Kai Tan; Lee Ann McCue; Gary D Stormo
Journal:  Genome Res       Date:  2005-01-14       Impact factor: 9.043

6.  Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix.

Authors:  Rahul Siddharthan
Journal:  PLoS One       Date:  2010-03-22       Impact factor: 3.240

7.  Simultaneous prediction of transcription factor binding sites in a group of prokaryotic genomes.

Authors:  Shaoqiang Zhang; Shan Li; Phuc T Pham; Zhengchang Su
Journal:  BMC Bioinformatics       Date:  2010-07-23       Impact factor: 3.169

8.  Genomic analysis identifies a transcription-factor binding motif regulating expression of the alpha C protein in Group B Streptococcus.

Authors:  David C Klinzing; Lawrence C Madoff; Karen M Puopolo
Journal:  Microb Pathog       Date:  2009-03-27       Impact factor: 3.738

Review 9.  Finding regulatory elements and regulatory motifs: a general probabilistic framework.

Authors:  Erik van Nimwegen
Journal:  BMC Bioinformatics       Date:  2007-09-27       Impact factor: 3.169

10.  PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.

Authors:  Rahul Siddharthan; Eric D Siggia; Erik van Nimwegen
Journal:  PLoS Comput Biol       Date:  2005-12-09       Impact factor: 4.475
