DOOR 2.0: presenting operons and their functions through dynamic and integrated views.

Xizeng Mao, Qin Ma, Chuan Zhou, Xin Chen, Hanyuan Zhang, Jincai Yang, Fenglou Mao, Wei Lai, Ying Xu.

Abstract

We have recently developed a new version of the DOOR operon database, DOOR 2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/ and will be updated on a regular basis. DOOR 2.0 contains genome-scale operons for 2072 prokaryotes with complete genomes, three times the number of genomes covered in the previous version published in 2009. DOOR 2.0 has a number of new features, compared with its previous version, including (i) more than 250,000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ii) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information such as their cis-regulatory binding sites for transcription initiation and termination, gene expression levels estimated based on RNA-seq data and conservation information across multiple genomes; (iii) a high-performance web service for online operon prediction on user-provided genomic sequences; (iv) an intuitive genome browser to support visualization of user-selected data; and (v) a keyword-based Google-like search engine for finding the needed information intuitively and rapidly in this database.

Year:  2013        PMID: 24214966      PMCID: PMC3965076          DOI: 10.1093/nar/gkt1048

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Operons have been widely used as the basic transcriptional and functional units when studying higher-level functional systems in prokaryotes, such as biochemical pathways, networks and regulation systems, since the concept was proposed by the French scientists Jacob and Monod in 1960 (1). Although the two scientists never suggested it in their original paper, computational prediction of operons often treats them as units that do not overlap with each other (2,3), as this greatly simplifies operon prediction on the genomic scale. Over the past decade, the term ‘transcriptional units’ has become increasingly popular; these are experimentally identified ‘operons’ as defined by Jacob and Monod in 1960, and they may overlap. The emergence of large-scale RNA-seq data for an increasing number of prokaryotic organisms has made it possible to elucidate ‘operons’ in their full complexity, and a few genome-scale transcriptomic datasets collected under multiple conditions have already been used to reveal the dynamic structures of the statically predicted operons under different experimental conditions (4). We envision that the need for elucidation of the condition-dependent transcriptional units (TUs) (4,5) will continue to increase as more RNA-seq data become available. Throughout this article, we use operons to refer to static non-overlapping ‘transcriptional units’ while using TUs to refer to operons according to the original definition of Jacob and Monod, i.e. sequences of consecutive genes that each encode a single RNA molecule along with their own promoters and terminators. The typical relationship between operons and TUs is that TUs tend to be sub-units of operons, while in some cases, a TU may span more than one operon. As of now, a number of operon databases have been publicly deployed by different research groups, including RegulonDB (5), ODB (6), DBTBS (7), OperonDB (8), ProOpDB (9) and DOOR (10), which was developed by our laboratory.
These databases differ in their coverage of the operon information, and only a few have TU data. For example, the current version of RegulonDB contains >800 unique TUs for Escherichia coli (5) and ODB has 10 000 TUs (11), both collected from the public domain. Most of these databases do not contain regulatory information for their operons such as transcription factor binding sites and transcription terminators. In addition, none of these database servers provide services for online operon prediction on user-provided genomic sequences; only ODB provides 4812 reference operons that can potentially be used to assist operon prediction. The new version of the DOOR database, DOOR 2.0, covers all the 2072 completely sequenced prokaryotic genomes in the NCBI genome database (as of April 2012), which is three times the number of genomes covered in its previous version published in 2009. In addition, DOOR 2.0 has several new features, namely, (i) 254 685 TUs collected from public databases such as RegulonDB (5) and Palsson’s dataset (4) or computationally predicted based on RNA-seq data; (ii) an integrated operon-centric data resource offering operons, their regulatory binding sites for transcription initiation (TFBSs), transcription terminators, gene-expression levels estimated based on RNA-seq data and their conservation information across multiple genomes; (iii) a high-performance web service for operon prediction on user-provided genomic sequences, powered by a backend computer cluster with >150 computing nodes; (iv) an intuitive genome browser to support visualization of user-specified data in the database; and (v) a keyword-based Google-like search engine for finding the needed information in the database intuitively and rapidly. To the best of our knowledge, DOOR 2.0 is the first web-based operon database that integrates all such capabilities. 
Together, these features provide an easy-to-use environment for discovering new information and synthesizing new knowledge about operons, their function, regulation and evolution across all sequenced prokaryotes. The database can be accessed at http://csbl.bmb.uga.edu/DOOR/ and will be updated regularly as new prokaryotic genomes are released.

DATABASE UPDATE

DOOR 2.0 contains operons for 2072 complete prokaryotic genomes downloaded from the NCBI Genome FTP server (April 2012), comprising 1939 bacteria and 133 archaea, with 2205 chromosomes and 1645 plasmids. We predicted 1 323 902 multi-gene operons using our prediction program (12), on average ∼583 such operons per chromosome and ∼24 operons per plasmid, and 2 578 949 single-gene operons, as detailed in Table 1. All the operons are stored in a MySQL relational database on our server and can be accessed in several ways. A user can browse operons by organism or by chromosome/plasmid; these are organized into a searchable HTML table under the ‘Organisms’ navigation menu. The operons for an organism can be downloaded through the ‘Download’ link on the ‘listing operons’ page. A user can search for individual operons by entering keyword(s) in the search box, which is located in the upper right corner of the web page (Figure 1A). The user can also specify more complex queries by using multiple keywords connected through Boolean operators, just as in Google; details can be found in the online manual on the DOOR 2.0 web server (Figure 1A).

Table 1. The key statistics of DOOR 2.0

| Category | Number of operons | With TUs | With TFBSs | With terminators | Number of conserved operons |
|---|---|---|---|---|---|
| Species | 2072 | 24 | 203 | 2072 | N/A |
| Chromosome | 2205 | 24 | 224 | 2205 | N/A |
| Plasmid | 1645 | 0 | 13 | 1645 | N/A |
| Operon (M) | 1 323 902 | 254 685 | 4229 | 1 493 272 | 6 975 454 |
| Operon (S) | 2 578 949 | N/A | 2260 | 1 963 446 | N/A |

Operon (M), multi-gene operons; Operon (S), single-gene operons; N/A, not applicable.

Figure 1.

(A) A screenshot of a display window. (B) A display of TUs, with the red bars representing genes, the first row of the blue bars representing multi-gene operons and the following rows of blue bars being TUs under different conditions. (C) A display of validated or predicted transcription factor binding sites (the left bottom) and Rho-independent terminators (on the right); and (D) conserved operons.

NEW FEATURES IN DOOR 2.0

DOOR 2.0 consists of 254 685 TUs (1385 experimentally verified and 253 300 predicted) for 24 prokaryotic genomes, 6489 verified TFBSs for 203 prokaryotic genomes, 3 456 718 Rho-independent terminators for 2072 genomes and 6 975 454 conserved operon pairs. Only 24 organisms have TU information because only these organisms have enough RNA-seq data for reliable TU prediction. We expect that this number will increase rapidly as more genome-scale RNA-seq data become available. The previous version of DOOR supported the following features: (i) an online operon database for 675 prokaryotic genomes, (ii) a menu-based interface for finding user-specified attributes of operons, (iii) a motif prediction service for user-specified operons and (iv) a Wiki page to facilitate communications between the users and the developer. DOOR 2.0 has kept all these features except for ‘operon search based on its number of genes’ and the Wiki page, as usage statistics from the past 4 years show that they were not actively used. In addition, DOOR 2.0 has a number of new features, selected based on users’ inputs as well as on our expectation of what users of an operon database might need, drawn from our own research experience in comparative genome analysis.

INTEGRATION OF TUs

An operon may be transcribed into different TUs under different experimental conditions. These TUs tend to be sub-operonic, with their own promoters and/or terminators (13), although in some cases they are super-operonic, spanning at least two operons (4). TUs can be derived through RNA-seq analysis. In addition, numerous TUs have been experimentally validated in various prokaryotes and stored in public databases (5). We have collected 1385 experimentally validated TUs in E. coli from the RegulonDB database (5) and Palsson’s dataset (4), with 941 and 842 from the first and the second dataset, respectively. In addition, we have predicted 253 300 TUs for all 24 bacterial genomes with genome-scale RNA-seq data in the NCBI SRA database (release of March 2013) (14) using our in-house program SeqTU (manuscript in preparation), with 119 RNA-seq datasets used for the prediction. SeqTU is a machine learning-based classifier for detecting boundaries between consecutive TUs on the same strand of a genome. All the TUs are stored in a relational database and can be retrieved and displayed through the genome browser (Figure 1B). A user can examine TUs within an operon using the ‘operon’ page. Like an operon, each TU has its own gene list with genomic coordinates, the underlying RNA-seq data and, if the TU was predicted by SeqTU, an accuracy score. These items are individually clickable for more detailed information. TUs are not displayed in the genome browser by default; a user can display individual TUs by double-clicking the relevant RNA-seq ID in the left panel of the browser. To help users examine the expression values of a gene of interest, DOOR 2.0 provides a BigWig XY plot for each underlying RNA-seq dataset (15), where a user can double-click on the relevant BigWig item for more detailed information.
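
The sub-operonic versus super-operonic distinction can be illustrated with a small sketch. It assumes that operons and TUs are each represented as an inclusive range of gene indices along one strand; the function name `classify_tu` and the labels it returns are hypothetical and are not part of DOOR 2.0 or SeqTU.

```python
from typing import List, Tuple

# Inclusive (first_gene_index, last_gene_index) along one strand.
# This representation is an assumption made for illustration only.
Interval = Tuple[int, int]


def classify_tu(tu: Interval, operons: List[Interval]) -> str:
    """Label a TU relative to a set of non-overlapping operons."""
    tu_start, tu_end = tu
    # Operons that share at least one gene with the TU.
    hits = [(s, e) for s, e in operons if s <= tu_end and tu_start <= e]
    if not hits:
        return "unassigned"
    if len(hits) >= 2:
        # The TU touches two or more operons.
        return "super-operonic"
    s, e = hits[0]
    if (tu_start, tu_end) == (s, e):
        return "identical"
    if s <= tu_start and tu_end <= e:
        # The TU covers only part of a single operon.
        return "sub-operonic"
    return "partial-overlap"


if __name__ == "__main__":
    operons = [(0, 3), (4, 6)]
    for tu in [(1, 2), (2, 5), (0, 3)]:
        print(tu, classify_tu(tu, operons))
```

Gene indices are used instead of base-pair coordinates so that the comparison stays independent of intergenic distances.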

INTEGRATION OF TRANSCRIPTION REGULATORY ELEMENTS

DOOR 2.0 provides experimentally verified TFBSs for 203 organisms and predicted intrinsic transcriptional terminators for all 2072 organisms, which can be used to study transcriptional regulation of operons. We have collected 6489 verified TFBSs for 203 organisms from RegulonDB (for E. coli only) (5) and RegTransBase (for 202 organisms) (16). All the TFBSs for each operon, if available, are displayed in an HTML table on the operon page, and can be examined along the underlying chromosome through the genome browser. TFBSs are not shown by default when an operon is displayed, but a user can double-click on the relevant menu in the left panel of the genome browser to turn on this feature. Each displayed TFBS is clickable, allowing a user to find more detailed information such as its name, genomic coordinates and DNA sequence (see Figure 1C). DOOR 2.0 also provides a de novo TFBS prediction capability for user-selected operons using two programs: BoBro (17,18) and MEME (19,20). The 300-bp upstream sequences of the selected operons are automatically retrieved from the selected genomes, and the predicted TFBSs are displayed in an HTML table along with their coordinates, the P-value measuring the statistical significance of the prediction, the consensus sequence and a WebLogo (21) (see Figure 2).
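
Retrieval of the 300-bp upstream sequence can be sketched as follows. This is a minimal illustration rather than the DOOR 2.0 implementation: it assumes 1-based inclusive operon coordinates on a linear replicon, and the function names are hypothetical.

```python
# Translation table for complementing DNA while preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int,
                    strand: str, length: int = 300) -> str:
    """Return up to `length` bases upstream of an operon.

    `start` and `end` are 1-based inclusive coordinates on the forward
    strand (an assumption of this sketch). The region is truncated at
    the replicon ends; circular wrap-around is not handled.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return genome[lo:start - 1]
    if strand == "-":
        hi = min(len(genome), end + length)
        # Upstream of a reverse-strand operon lies after `end` on the
        # forward strand and is reported in the operon's orientation.
        return reverse_complement(genome[end:hi])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAGGC"  # made-up sequence
    print(upstream_region(toy_genome, 5, 8, "+", length=3))
    print(upstream_region(toy_genome, 5, 8, "-", length=3))
```

The returned string is what would be handed to a motif finder; how BoBro or MEME is invoked on it is outside the scope of this sketch.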

Figure 2. A screenshot of motif search results for a user-selected operon.

It is known that prokaryotes use two different mechanisms of transcription termination: Rho-independent (intrinsic) and Rho-dependent (22). Rho-dependent termination involves the binding of a Rho factor to the mRNA to destabilize the RNA–DNA interaction and stop transcription, whereas Rho-independent termination functions by creating an RNA hairpin loop that stops the RNA polymerase (23). Rho-independent terminators can be reliably predicted by identifying the conserved RNA hairpin loop, whereas Rho-dependent terminators cannot yet be predicted because no known signals are associated with them. Using the TranstermHP program (23), the best terminator predictor in the public domain, with default parameters, we have predicted 3 456 718 Rho-independent terminators for all 2072 organisms, on average ∼2.6 terminators per operon, which suggests that operons have alternative terminators. All the terminators for each operon can be displayed both in an HTML table on the operon page and through the genome browser (see Figure 1C).
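
To make the hairpin-based idea concrete, the toy sketch below scans a DNA string for a perfect inverted repeat (a stem), a short loop and a T-rich tail, which is the general shape of an intrinsic terminator. It does not reproduce the scoring used by TranstermHP; the function name and all parameter defaults are assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminator_shape(dna: str, stem: int = 5, min_loop: int = 3,
                         max_loop: int = 8, tail: int = 6,
                         min_t: int = 4) -> bool:
    """Toy test for stem + loop + T-rich tail on the coding strand.

    All thresholds are illustrative assumptions, not published values.
    """
    dna = dna.upper()
    for i in range(len(dna) - stem + 1):
        left = dna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the right stem arm
            right = dna[j:j + stem]
            tail_seq = dna[j + stem:j + stem + tail]
            if len(right) < stem or len(tail_seq) < tail:
                continue                 # ran off the end of the sequence
            # The right arm must pair perfectly with the left arm.
            if right == revcomp(left) and tail_seq.count("T") >= min_t:
                return True
    return False


if __name__ == "__main__":
    candidate = "AA" + "GCCGC" + "TTAA" + "GCGGC" + "TTTTTT"  # made-up example
    print(has_terminator_shape(candidate))
```

A real predictor would allow mismatches and bulges in the stem and would score hairpin stability, so this check is best read as a picture of the signal rather than a usable detector.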

INTEGRATION OF CONSERVED OPERONS ACROSS BACTERIA

We have included the orthologous relationships among multi-gene operons across different bacterial genomes. Such information can be used for studies of operon evolution, such as elucidation of the life cycle of an operon (24). For two operons a and b in genomes A and B, respectively, we define a ‘similarity score’ S(a,b) between them as follows: where G(a) and G(b) denote the component genes in a and b, respectively; orth(a,b) represents the orthologous gene pairs between a and b identified by our prediction program GOST (25) (see Figure 1D); and |X| denotes the number of elements in X. Intuitively, the score = 1 if and only if all genes in a and b are one-to-one mapped to orthologous gene pairs; and the score = 0 if and only if no orthologous genes between a and b are detected. Generally, the higher the score, the higher the percentage of genes in a and b that form orthologous pairs. We consider a pair of operons a and b as conserved if S(a,b) is at least 0.7. Using this cut-off, 6 975 454 conserved operon pairs have been identified among the 2072 genomes. For any specific operon, a user can retrieve its conserved operons across all the other 2071 genomes in DOOR 2.0 by selecting the relevant menu item on the browser.
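
The sketch below shows one way such a score could be computed. The displayed formula is not reproduced in this text, so the code assumes a Dice-style ratio, 2|orth(a,b)| / (|G(a)| + |G(b)|), which satisfies both boundary conditions stated above when the orthologous pairs are one-to-one; the actual definition may differ. The function names are hypothetical, and only the 0.7 cut-off is taken from the text.

```python
from typing import Iterable, Set, Tuple


def similarity_score(genes_a: Set[str], genes_b: Set[str],
                     ortholog_pairs: Iterable[Tuple[str, str]]) -> float:
    """Dice-style operon similarity (an assumed form, see lead-in).

    `ortholog_pairs` is expected to hold one-to-one (gene_in_a, gene_in_b)
    pairs; pairs that reference genes outside the two operons are ignored.
    """
    pairs = {(x, y) for x, y in ortholog_pairs
             if x in genes_a and y in genes_b}
    total = len(genes_a) + len(genes_b)
    if total == 0:
        return 0.0
    return 2 * len(pairs) / total


def is_conserved(genes_a: Set[str], genes_b: Set[str],
                 ortholog_pairs: Iterable[Tuple[str, str]],
                 cutoff: float = 0.7) -> bool:
    """Apply the 0.7 cut-off stated in the text."""
    return similarity_score(genes_a, genes_b, ortholog_pairs) >= cutoff


if __name__ == "__main__":
    a = {"a1", "a2", "a3"}             # made-up gene identifiers
    b = {"b1", "b2", "b3", "b4"}
    pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    print(similarity_score(a, b, pairs), is_conserved(a, b, pairs))
```

Under this assumed form, two three-gene operons sharing two orthologous pairs score 4/6 and fall just below the cut-off, whereas a three-gene and a four-gene operon sharing three pairs score 6/7 and count as conserved.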

A NEW WEB INTERFACE

The web interface of DOOR 2.0 has been completely redesigned. The new features of the interface include (i) an intuitive genome browser based on JBrowse Genome Browser (http://jbrowse.org) (26) that supports visualization of all the aforementioned operon-related data types along a scrollable and zoomable chromosome for each organism; (ii) a new keyword-based Google-like search engine implemented using the Sphinx Open Source Search Server (http://sphinxsearch.com), through which a user can enter one or a few keywords to search for operons that have the specified attributes, e.g. coli, lactose, NC_00913, and can also formulate the search key as a complex query with Boolean operators (see online document on the DOOR 2.0 web server for examples); and (iii) an intuitive Web 2.0 HTML table (DataTables, https://datatables.net) that supports on-the-fly filtering, multi-column sorting, variable length pagination and asynchronous loading for large datasets.

ONLINE OPERON PREDICTION

DOOR 2.0 offers an intuitive high-performance web service for online operon prediction. A user can have operons predicted for a newly sequenced genome, or for any prokaryotic genome sequence, by uploading three types of data to the server: the chromosomal DNA sequence (fna format), the protein sequences (faa format) and the gene locations (ptt format), all in the formats used by the NCBI Genome FTP Server. All submitted jobs are automatically placed in a job queue and executed in a ‘first-in, first-served’ manner on our computing cluster. Once a job is done, the user is notified via email with links to the web pages containing the computational results. All the predicted operons are displayed in an intuitive HTML table and stored on the DOOR 2.0 server for half a year.
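
As an illustration of the gene-location input, the sketch below reads a ptt protein table into gene records. It assumes the usual NCBI layout: a few descriptive lines, then a tab-separated header beginning with `Location`, then one gene per row with `start..end` coordinates. The `Gene` record, the function name and the sample rows are made up for this example and are not part of the DOOR 2.0 pipeline.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    start: int      # 1-based start on the forward strand
    end: int        # 1-based inclusive end
    strand: str     # '+' or '-'
    synonym: str    # locus tag


def parse_ptt(text: str) -> List[Gene]:
    """Parse the body of a ptt file into a list of Gene records."""
    lines = text.splitlines()
    # Skip the descriptive lines until the column header is found.
    for idx, line in enumerate(lines):
        if line.startswith("Location"):
            break
    else:
        raise ValueError("no 'Location' header line found")
    header = lines[idx].split("\t")
    genes = []
    for line in lines[idx + 1:]:
        if not line.strip():
            continue
        row = dict(zip(header, line.split("\t")))
        start, end = row["Location"].split("..")
        genes.append(Gene(int(start), int(end), row["Strand"], row["Synonym"]))
    return genes


if __name__ == "__main__":
    sample = (  # made-up rows in ptt layout
        "Example organism chromosome, complete genome - 1..5000\n"
        "2 proteins\n"
        "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
        "100..400\t+\t100\t111\tgenA\tX0001\t-\t-\thypothetical protein\n"
        "450..900\t-\t149\t112\tgenB\tX0002\t-\t-\thypothetical protein\n"
    )
    for gene in parse_ptt(sample):
        print(gene)
```

Looking up columns by header name rather than by position keeps the parser tolerant of files that carry extra columns.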

IMPLEMENTATION

DOOR 2.0 is implemented as a web portal server with a multi-layer architecture. The presentation and logic layers are implemented using Web 2.0 technologies (HTML5, CSS3 and JavaScript with the jQuery library) and the PHP server-side scripting language. All data are stored in an optimized MySQL relational database. The keyword-based search engine is built on the Sphinx Open Source Search Server (http://sphinxsearch.com), and the genome browser is built on JBrowse Genome Browser (http://jbrowse.org) (26) and integrated into DOOR 2.0 using the iframe (inline frame) HTML tag. The web server runs on a Red Hat Enterprise Linux 6 machine (16 Intel Xeon CPUs at 2.4 GHz and 16 GB memory), and the automated operon prediction pipeline runs on a computing cluster with >150 computing nodes (2 Intel Xeon CPUs at 3.06 GHz and 2.5 GB memory per node).

CONCLUDING REMARKS

Here we have presented a new version of the DOOR operon database, DOOR 2.0. Although the previous version has been widely used (∼120 citations since its publication in 2009), we felt that it was time to develop and deploy a new version of the database that includes all the prokaryotic genomes sequenced in the past few years, the available TU information, whether experimentally validated or computationally derived from RNA-seq data, and regulatory signals for each operon, which can be predicted based on comparative genome analysis. To best facilitate data retrieval, analysis and integrated applications of these data, we have developed a highly intuitive genome browser to support the visualization of these data types. With the high quality of our predicted operons, along with their regulatory signals and evolutionary conservation information, we believe that the new version of DOOR will continue to serve as a main source of operon data for the microbial research community.

FUNDING

National Science Foundation [DEB-0830024]; DOE BioEnergy Science Center [contract no. DE-PS02-717 06ER64304] [DOE 4000063512], which is supported by the Office of Biological and Environmental Research in the Department of Energy Office of Science. Funding for open access charge: National Science Foundation [DEB-0830024] and the DOE BioEnergy Science Center [contract no. DE-PS02-717 06ER64304] [DOE 4000063512].

Conflict of interest statement. None declared.

  25 in total

1.  JBrowse: a next-generation genome browser.

Authors:  Mitchell E Skinner; Andrew V Uzilov; Lincoln D Stein; Christopher J Mungall; Ian H Holmes
Journal:  Genome Res       Date:  2009-07-01       Impact factor: 9.043

2.  BigWig and BigBed: enabling browsing of large distributed datasets.

Authors:  W J Kent; A S Zweig; G Barber; A S Hinrichs; D Karolchik
Journal:  Bioinformatics       Date:  2010-07-17       Impact factor: 6.937

3.  The transcription unit architecture of the Escherichia coli genome.

Authors:  Byung-Kwan Cho; Karsten Zengler; Yu Qiu; Young Seoub Park; Eric M Knight; Christian L Barrett; Yuan Gao; Bernhard Ø Palsson
Journal:  Nat Biotechnol       Date:  2009-11-01       Impact factor: 54.908

4.  The life-cycle of operons.

Authors:  Morgan N Price; Adam P Arkin; Eric J Alm
Journal:  PLoS Genet       Date:  2006-06-23       Impact factor: 5.917

5.  RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes.

Authors:  Alexei E Kazakov; Michael J Cipriano; Pavel S Novichkov; Simon Minovitsky; Dmitry V Vinogradov; Adam Arkin; Andrey A Mironov; Mikhail S Gelfand; Inna Dubchak
Journal:  Nucleic Acids Res       Date:  2006-11-16       Impact factor: 16.971

6.  Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake.

Authors:  Carleton L Kingsford; Kunmi Ayanbule; Steven L Salzberg
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

7.  DOOR: a database for prokaryotic operons.

Authors:  Fenglou Mao; Phuongan Dam; Jacky Chou; Victor Olman; Ying Xu
Journal:  Nucleic Acids Res       Date:  2008-11-06       Impact factor: 16.971

8.  OperonDB: a comprehensive database of predicted operons in microbial genomes.

Authors:  Mihaela Pertea; Kunmi Ayanbule; Megan Smedinghoff; Steven L Salzberg
Journal:  Nucleic Acids Res       Date:  2008-10-23       Impact factor: 16.971

9.  MEME SUITE: tools for motif discovery and searching.

Authors:  Timothy L Bailey; Mikael Boden; Fabian A Buske; Martin Frith; Charles E Grant; Luca Clementi; Jingyuan Ren; Wilfred W Li; William S Noble
Journal:  Nucleic Acids Res       Date:  2009-05-20       Impact factor: 16.971

10.  DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information.

Authors:  Nicolas Sierro; Yuko Makita; Michiel de Hoon; Kenta Nakai
Journal:  Nucleic Acids Res       Date:  2007-10-25       Impact factor: 16.971
