PRODORIC (release 2009): a database and tool platform for the analysis of gene regulation in prokaryotes.

Andreas Grote, Johannes Klein, Ida Retter, Isam Haddad, Susanne Behling, Boyke Bunk, Ilona Biegler, Svitlana Yarmolinetz, Dieter Jahn, Richard Münch.

Abstract

PRODORIC is a database that provides annotated information on the regulation of gene expression in prokaryotes. It integrates a large compilation of gene regulatory data including transcription factor binding sites, promoter structures and gene expression patterns. The whole dataset is manually curated and relies on published results extracted from the scientific literature. The current extended version of PRODORIC contains gene regulatory data for several new microorganisms. Major improvements were realized in the design of the web interface and the accessibility of the stored information. The database was further improved by the implementation of various new tools for the elucidation of gene regulatory interactions. Thus, the PRODORIC platform represents a framework for the interactive exploration, prediction and evaluation of gene regulatory networks in prokaryotes. PRODORIC is accessible at http://www.prodoric.de.

Year:  2008        PMID: 18974177      PMCID: PMC2686542          DOI: 10.1093/nar/gkn837

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In the last decade, the analysis and modeling of prokaryotic gene regulatory networks as the basis of a systems biology approach to infection and biotechnological processes has become of central interest (1,2). In this context, network reconstruction requires reliable datasets of gene regulatory interactions, which are usually available only in the scientific literature. The fast accumulation of published gene regulatory data, enhanced by the availability of numerous finished genomes and by high-throughput technologies, fostered the development of structured repositories in the form of public databases. Several specialized gene regulation databases focusing on one model organism or on several organism groups were established (3–8). The PRODORIC database was released in 2003 as a universal data source covering gene regulation in prokaryotes with a focus on pathogenic bacteria (9). In a manual curation process, relevant data are extracted by constant screening of the scientific literature. The main part of PRODORIC contains a unique collection of transcription factor binding sites (TFBSs) and their interacting transcription factors. Besides these regulatory interactions, promoter structures with transcriptional initiation sites and sigma factor binding sites were included. Moreover, gene expression data derived from published microarray experiments were integrated. An integral part of PRODORIC is the set of aligned TFBS profiles for individual regulators, represented as position weight matrices (PWMs) and sequence logos (10,11). The provided PWMs are useful tools for pattern matching, and thus for the prediction of previously unknown putative TFBSs in DNA sequences of interest. For this purpose, PRODORIC is associated with the prediction tool Virtual Footprint, which allows PWM-based scanning of sequences or even whole genomes for new regulator targets (12). Here, we summarize the modifications and improvements made to PRODORIC in recent years. These comprise a significant increase in data content and updates of our tools. Moreover, PRODORIC was further developed towards a database and bioinformatics tool platform combining data and software for the interactive browsing, prediction and evaluation of gene regulatory networks in prokaryotes.

DATABASE CURATION AND CONTENT

PRODORIC relies completely on published results with experimental validation and is not complemented with computationally predicted data. The transformation of free-text data from the primary literature into structured information is done manually and continuously by a team of curators. During literature screening we observed that even refined PubMed searches with keywords like ‘gene regulation’ and ‘prokaryotes’ are not sensitive enough, since important classification terms like ‘DNaseI footprint’ or ‘electromobility shift assay’ are generally not part of PubMed abstracts. Since these terms often appear in figure captions, we optimized the literature preselection and data mining tasks by using the PDF search engine CaptionSearch (13). The main content of PRODORIC was significantly increased to an overall number of nearly 3000 TFBSs. The number of promoter and operon structures as well as expression profiles increased concurrently (Table 1). As expected, the largest share of regulatory interactions is covered by the two model organisms Escherichia coli and Bacillus subtilis. Interestingly, these are followed by Pseudomonas aeruginosa and Staphylococcus aureus, which underlines the relevance of data from pathogenic bacteria. Another notable group of recently annotated bacteria are phototrophs like Rhodobacter sphaeroides and Synechococcus sp. DNA sequence elements like TFBSs or transcriptional start sites are usually mapped to fixed genomic positions. Consequently, PRODORIC is limited to organisms with a completely sequenced genome. Therefore, finished genomes are regularly imported from flat files into PRODORIC, so the number of available organisms has increased to 696 different bacterial genomes with a total of 1304 replicons.
Table 1. Statistics of the PRODORIC content (September 2008)

| Organism | TFBSs | Genes | Regulons | PWMs (a) | Promoters | Profiles (b) |
| --- | --- | --- | --- | --- | --- | --- |
| Escherichia coli | 1670 | 1045 | 90 | 88 (76) | 740 | 64 (4666) |
| Bacillus subtilis | 785 | 738 | 88 | 71 (53) | 493 | 34 (2488) |
| Pseudomonas aeruginosa | 197 | 264 | 33 | 22 (19) | 164 | 16 (1292) |
| Staphylococcus aureus | 106 | 39 | 9 | | 29 | |
| Rhodobacter sphaeroides | 38 | 33 | 4 | 3 (3) | 28 | |
| Streptococcus pyogenes | 18 | 13 | 5 | 3 (2) | 11 | 19 (416) |
| Bradyrhizobium japonicum | 14 | 22 | 3 | 3 (3) | 11 | |
| Synechococcus sp. | 13 | 6 | 2 | | 13 | |
| Rhizobium meliloti | 12 | 4 | 8 | | 11 | 2 (198) |
| Others | 68 | 67 | 25 | 7 (7) | 86 | 13 (402) |
| Sum | 2921 | 2231 | 267 | 197 (163) | 1586 | 148 (9462) |

(a) The non-redundant number of position weight matrices (number in parentheses).

(b) The sum of genes that are linked to the profiles (number in parentheses).

For the purpose of pattern matching and prediction of potentially new transcription factor targets, a significant number of new PWMs were generated from aligned profiles of TFBSs (Figure 1). This PWM library provides the data basis for the PRODORIC-associated prediction tool Virtual Footprint.
Figure 1. Position weight matrix view of PRODORIC for the binding site of the Anr transcription factor from Pseudomonas aeruginosa.

DATABASE ACCESS

There are four principal ways to access PRODORIC: (i) submitting a database query via the supplied web forms; (ii) browsing through the content using the genome browser GBpro; (iii) exploring the regulatory network as a visualized graph with the ProdoNet tool; and (iv) accessing the database via webservices [Simple Object Access Protocol (SOAP) interface]. The previously developed PRODORIC web interface was significantly improved with regard to its design, handling and web browser support. Database queries with genes, proteins, TFBSs and PWMs can be submitted via web forms. We added new sections for searching promoters, expression profiles and whole regulons. Besides the regular web forms, various improved possibilities for interactive browsing of the database contents were implemented. The new version of the genome browser GBpro offers an improved presentation of gene regulatory features both as a genome map and as formatted sequence. In this context, the use of inline frames enables more convenient browsing through the database contents. We recently developed ProdoNet, a new visualization tool for the exploration of PRODORIC contents in an interactive graph view (14). This tool enables the detection and visualization of underlying gene regulatory networks to uncover the multiple levels of gene regulation, such as regulatory circuits and various network motifs. Moreover, ProdoNet allows the mapping and visualization of sets of co-expressed genes on gene regulatory network graphs. A different method to query the database without using the web pages was implemented recently by establishing webservices using SOAP. These webservices enable platform-independent access to PRODORIC and offer an interactive way for data integration, which was first realized for the SYSTOMONAS and ROSY platforms (15,16). A more detailed description of the SOAP interface and application examples are available on the PRODORIC website.
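
As an illustration of what detecting a network motif in a regulatory graph involves, the sketch below searches a directed regulator-to-target edge list for feed-forward loops, one of the motif types described for Escherichia coli (1). It is a generic example and does not reproduce ProdoNet's algorithm; the function name and the placeholder node labels X, Y, Z and W are assumptions, and the edges are invented.

```python
def feed_forward_loops(edges):
    """Return sorted (x, y, z) triples where x->y, y->z and x->z are all edges."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore autoregulation when choosing the middle node
            for z in targets.get(y, ()):
                # z must be a third, distinct node that x also regulates directly
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical toy network (placeholder names, not PRODORIC content).
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W"), ("X", "X")]
loops = feed_forward_loops(toy_edges)
print(loops)
```

Storing the targets of each regulator in a set keeps the membership test for the direct x-to-z edge cheap, which matters once the graph covers a whole regulon collection.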

PREDICTION OF GENE REGULATORY NETWORKS

Although the PRODORIC core database excludes computationally predicted data, we follow the approach of a database-assisted interactive prediction and validation of gene regulatory networks. Results produced this way are based on the most recent set of data and are therefore usually the most accurate. For this purpose we developed Virtual Footprint, a tool for the prediction of potentially new transcription factor targets (12). Various search patterns can be defined by PWMs, IUPAC consensus strings or regular expressions. Complex bipartite patterns consisting of two subpatterns separated by a spacer are also possible. The integrated PWM library derived from the PRODORIC dataset was extended to 197 patterns corresponding to 163 different transcription factors (Table 1). The Virtual Footprint program allows the analysis of complete genomes with one PWM, which is called ‘regulon analysis’. In the other program mode, ‘promoter analysis’, all available patterns are applied to one sequence. The new PRODORIC release 2009 was supplemented with a new tool called SMILE (similar intergenic location analyzer). Using this novel tool, the evolutionary conservation of Virtual Footprint matches can be further investigated by a comparative analysis of orthologous promoter sequences, similar to a regulog analysis (17). In SMILE, both sequence and positional conservation within an orthologous group of matches can be analyzed. This approach enables the evaluation of putative transcription factor targets and helps to rule out false-positive predictions (Figure 2).
Figure 2. SMILE analysis using the Anr binding site in the promoter of the hemN gene. The results show both high evolutionary and positional conservation between the orthologous promoters (the list of matches was shortened).
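
The IUPAC consensus and bipartite search patterns mentioned for Virtual Footprint can be sketched as a translation into regular expressions. This is only an illustration of the idea under stated assumptions, not the tool's actual code: the function names, the example half-sites, the spacer range and the test sequence are hypothetical, while the IUPAC nucleotide codes themselves are standard.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[code] for code in consensus.upper())

def bipartite_regex(left, right, min_spacer, max_spacer):
    """Combine two IUPAC subpatterns separated by a variable-length spacer."""
    spacer = "[ACGT]{%d,%d}" % (min_spacer, max_spacer)
    return iupac_to_regex(left) + spacer + iupac_to_regex(right)

def find_matches(sequence, pattern):
    """Return (start, matched_text) pairs; at most one match per start position."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(%s))" % pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical bipartite pattern and sequence, chosen only for illustration.
pattern = bipartite_regex("TTGAY", "RTCAA", 4, 6)
matches = find_matches("CCTTGATACGTATCAAGG", pattern)
print(matches)
```

A regular expression treats every allowed base as equally good, so unlike a PWM it cannot rank matches; it is best suited to well-defined consensus sequences.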

CONCLUSIONS

PRODORIC is a manually curated data resource and bioinformatics tool platform for gene regulation and gene expression covering all sequenced prokaryotes. The whole system is supplemented with various browsing, prediction and validation tools representing a framework for the interactive analysis and visualization of gene regulatory networks. The manual curation process of PRODORIC will be continued. Mapping gene regulatory interactions onto sequenced genomes will be one of the most challenging tasks. The availability of reliable gene regulatory networks will be essential for modeling approaches in systems biology.

FUNDING

The Deutsche Forschungsgemeinschaft (Sonderforschungsbereich 578); German Bundesministerium für Bildung und Forschung (ERA-NET grant 0313936C). Funding for open access publication charge: Technical University of Braunschweig.

Conflict of interest statement. None declared.
  16 in total

1.  Network motifs in the transcriptional regulation network of Escherichia coli.

Authors:  Shai S Shen-Orr; Ron Milo; Shmoolik Mangan; Uri Alon
Journal:  Nat Genet       Date:  2002-04-22       Impact factor: 38.330

2.  Regulog analysis: detection of conserved regulatory networks across bacteria: application to Staphylococcus aureus.

Authors:  Wynand B L Alkema; Boris Lenhard; Wyeth W Wasserman
Journal:  Genome Res       Date:  2004-07       Impact factor: 9.043

3.  Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes.

Authors:  Richard Münch; Karsten Hiller; Andreas Grote; Maurice Scheer; Johannes Klein; Max Schobert; Dieter Jahn
Journal:  Bioinformatics       Date:  2005-08-18       Impact factor: 6.937

4.  What are DNA sequence motifs?

Authors:  Patrik D'haeseleer
Journal:  Nat Biotechnol       Date:  2006-04       Impact factor: 54.908

5.  ROSY--a flexible and universal database and bioinformatics tool platform for Roseobacter related species.

Authors:  Claudia Pommerenke; Inga Gabriel; Boyke Bunk; Richard Münch; Isam Haddad; Petra Tielen; Irene Wagner-Döbler; Dieter Jahn
Journal:  In Silico Biol       Date:  2008

6.  A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome.

Authors:  K Robison; A M McGuire; G M Church
Journal:  J Mol Biol       Date:  1998-11-27       Impact factor: 5.469

7.  SwissRegulon: a database of genome-wide annotations of regulatory sites.

Authors:  Mikhail Pachkov; Ionas Erb; Nacho Molina; Erik van Nimwegen
Journal:  Nucleic Acids Res       Date:  2006-11-27       Impact factor: 16.971

8.  RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes.

Authors:  Alexei E Kazakov; Michael J Cipriano; Pavel S Novichkov; Simon Minovitsky; Dmitry V Vinogradov; Adam Arkin; Andrey A Mironov; Mikhail S Gelfand; Inna Dubchak
Journal:  Nucleic Acids Res       Date:  2006-11-16       Impact factor: 16.971

9.  SYSTOMONAS--an integrated database for systems biology analysis of Pseudomonas.

Authors:  Claudia Choi; Richard Münch; Stefan Leupold; Johannes Klein; Inga Siegel; Bernhard Thielen; Beatrice Benkert; Martin Kucklick; Max Schobert; Jens Barthelmes; Christian Ebeling; Isam Haddad; Maurice Scheer; Andreas Grote; Karsten Hiller; Boyke Bunk; Kerstin Schreiber; Ida Retter; Dietmar Schomburg; Dieter Jahn
Journal:  Nucleic Acids Res       Date:  2007-01       Impact factor: 16.971

10.  CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks.

Authors:  Jan Baumbach
Journal:  BMC Bioinformatics       Date:  2007-11-06       Impact factor: 3.169
