
The current state of SubtiWiki, the database for the model organism Bacillus subtilis.

Tiago Pedreira1, Christoph Elfmann1, Jörg Stülke1.   

Abstract

Bacillus subtilis is a Gram-positive model bacterium with extensive documented annotation. However, with the rise of high-throughput techniques, the amount of complex data generated every year has been increasing at a fast pace. Thus, having platforms ready to integrate and represent these data becomes a priority. To address this need, SubtiWiki (http://subtiwiki.uni-goettingen.de/) was created in 2008 and has been growing in data and viewership ever since. With millions of requests every year, it is the most visited B. subtilis database, providing scientists all over the world with curated information about its genes and proteins, as well as intricate protein-protein interactions, regulatory elements, expression data and metabolic pathways. However, there is still a large portion of annotation to be unveiled for some biological elements. Thus, to facilitate the development of new hypotheses for research, we have added a Homology section covering potential protein homologs in other organisms. Here, we present the recent developments of SubtiWiki and give a guided tour of our database and the current state of the data for this organism.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2022        PMID: 34664671      PMCID: PMC8728116          DOI: 10.1093/nar/gkab943

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

With the rise of high-throughput techniques, the complexity and amount of information have been increasing significantly over the past years. Importantly, the new techniques not only allow the acquisition of one-dimensional datasets such as new sequences or protein catalogs, but can also generate bi- or even multidimensional datasets that include two or multiple partners such as in global protein–protein, protein–RNA or protein–metabolite interactomes (1–3). A possible approach to catalog and organize the data is by resorting to specialized databases. One type of these databases is dedicated to model organisms, which play a fundamental role in the advance of knowledge in science as they are widely used across the world (4,5). The databases can differ widely in their intended audience and purpose, and accordingly, the presentation of the data varies substantially. Clearly, databases with a general audience in mind need a more intuitive structure and way of presentation than databases designed for informaticians who are mainly interested in the technical possibilities to extract information. The Gram-positive model bacterium Bacillus subtilis is one of the most intensively studied organisms. This bacterium has attracted much interest because of its simple developmental program of sporulation, its genetic competence and ease of genetic manipulation, its wide applications in biotechnology and agriculture, and its close relationship to many important pathogens such as Staphylococcus aureus, Clostridioides difficile or Listeria monocytogenes (6,7). We have previously developed SubtiWiki, a database focused on the functional annotation of B. subtilis (8–10). In SubtiWiki, the user has access to individual gene and protein pages, complemented by SubtiApps, which give access to integrative metabolic pathways, regulatory networks and manually curated protein–protein interaction maps (10).
As of today (September 2021), SubtiWiki contains data on 6121 protein- and RNA-coding genes, 2271 operons, 8355 literature references and >2660 documented protein interactions. Additionally, 49 metabolic pathways are displayed in an interactive map, an interactive genomic map is provided for the full extent of the B. subtilis genome, and gene expression profiles are presented for 121 experimental conditions at the transcript or protein level. Despite the current advances in the research on B. subtilis, a large portion of genes still awaits further annotation. This is the case for ∼30% of all B. subtilis genes/proteins. We try to address this issue by providing the user with a broad list of protein homologs in different organisms. To complement this, we also integrated the Clusters of Orthologous Genes (COG) database for every available protein (11). Thus, beyond providing the community with an up-to-date and free-to-use platform, we also contribute to the annotation of this model organism. Here, we report the current state of SubtiWiki and the integration of new data, and describe the most recent addition, the homology analysis.

SubtiWiki GENE PAGES

Like previous versions, version 4 of SubtiWiki keeps the general design and page layout. Upon searching for a gene or protein, the respective biological element page is presented in a simple and intuitive way (Figure 1). Specific and factual information is presented up front in the form of a table, such as the locus name and, in the case of protein-encoding loci, isoelectric point, molecular weight and function. Additionally, it is possible to directly access NCBI BLAST tools with either DNA or protein sequences, and the user gets links to other relevant databases that also have information on the queried protein, such as KEGG, UniProt or BsubCyc. Moreover, curated interactive information on specific data such as protein structure, protein–protein interactions, regulatory elements and functional annotation is also available. Each gene/protein is assigned to one or more functional categories organized in a tree-like structure and to regulons. The pages then provide detailed information on the gene, the protein and the regulation of its expression. To complement the data, biological materials available in the community, laboratories that study the gene or protein, and publications are presented in a categorized way for easy reading and organization.
Figure 1.

The citB gene page display. All gene pages share the same layout and follow the same structure, with changes that depend on the type of available data.

SubtiApps

For a better representation of information, different types of data are presented under specific browsers in SubtiApps. The Expression Browser displays expression patterns under 121 experimental conditions, while the Genome Browser is an interactive map in which one can scroll through the B. subtilis genome in an intuitive and easy manner. In addition to the genes’ coordinates, their respective DNA and protein sequences are loaded to provide a complete view of the genome (Figure 2).
Figure 2.

Genome Browser focusing on the citB gene. Tabs at the top allow searching for specific genes, including flanking regions, or simply querying by region. The main frame includes an interactive scrollable genome, where all elements are clickable for a redirect to the specific gene page. At the bottom, both protein and DNA sequences are shown (on the left) with the possibility to show the reverse complement or to search for specific patterns within a sequence (on the right).
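The two sequence operations mentioned in the caption, reverse complement and pattern search, can be sketched in a few lines of Python. This is only an illustration of the operations themselves; the function names are hypothetical and nothing here reflects how the Genome Browser is actually implemented.

```python
import re
from typing import List

# Complement table for the four unambiguous DNA bases (upper and lower case).
# Ambiguity codes such as N are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_pattern(sequence: str, pattern: str) -> List[int]:
    """Return 0-based start positions of all matches of `pattern`.

    `pattern` is interpreted as a regular expression. Wrapping it in a
    lookahead makes overlapping occurrences show up as separate matches.
    """
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence)]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(find_pattern("AAAA", "AA"))
```

Searching the reverse strand amounts to calling `find_pattern` on the output of `reverse_complement`; the reported positions then refer to the reversed sequence.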

One of the highlights of the previous version was the biological network visualizer for interactions and regulation (10). Currently, we have expanded the data to >5950 and >2660 experimentally validated regulatory and protein–protein interactions, respectively. Here, we offer a clear representation of the documented physical interactions with the possibility to expand the neighborhood (Figure 3A). Focusing on the regulatory data, regulatory networks can be interrogated for 2271 documented operons. There are three different types of annotation depending on the mechanism of regulation: green for positive regulation (or activation), red for negative regulation (or inhibition) and gray for sigma factor interactions (Figure 3B). Similar to the Interaction Browser, the Regulation Browser also allows the user to expand the view to the neighborhood of the target gene, showing the full known regulatory network not only of the gene of interest but also of its neighbors, and even allowing the visualization of regulatory subnetworks (Figure 3C). The gravity model (12), in which nodes (genes/proteins) are considered as mass points and edges (physical or regulatory interactions) as springs, was preserved for both browsers, and the user can find a clear and intuitive integration of the expression data within networks (Figure 3A). Notably, all references for such data can be found at the bottom of each respective gene page in the References section.
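The idea behind such a layout, nodes as mass points that repel each other and edges as springs that pull connected nodes together, can be illustrated with a generic force-directed sketch in Python. This is not the model of reference (12) nor the code used by the browsers; the function name, the inverse-square repulsion, the Hooke-type springs and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, rest_length=1.0,
                  spring_k=0.1, repulsion_k=0.5, step=0.1,
                  max_move=0.5, seed=0):
    """Toy 2D force-directed layout; `nodes` is a list, `edges` a list of pairs.

    All constants are illustrative assumptions, not values from SubtiWiki.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with an inverse-square force.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = repulsion_k / dist ** 2
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Every edge acts as a spring with the given rest length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = spring_k * (dist - rest_length)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist
        # Move each node along its net force, capping the displacement
        # so that two nodes starting very close together do not fly apart.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            scale = step if mag * step <= max_move else max_move / mag
            pos[n][0] += scale * fx
            pos[n][1] += scale * fy
    return {n: (p[0], p[1]) for n, p in pos.items()}


if __name__ == "__main__":
    # Placeholder node names, not real interaction data.
    print(spring_layout(["A", "B", "C"], [("A", "B"), ("B", "C")]))
```

Because the random start positions are seeded, repeated calls with the same arguments give the same coordinates, which keeps a rendered network stable between page loads.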
Figure 3.

The Interaction and Regulation Browsers. (A) Interaction Browser for the citB gene. Nodes represent proteins and edges represent physical interactions, commonly referred to as protein–protein interactions. Using the settings, transcriptomic and proteomic data can be integrated using a color code (this feature is also available for the Regulation Browser). Upon selection of an experimental condition to be visualized, each node will be masked with a color depending on the expression value of the gene in the selected dataset. Green represents lower values of expression and red represents higher values of expression. (B) Regulation Browser for the citB gene. Nodes represent genes and edges represent regulatory interactions. The interaction type is represented with a color scheme: red for repression, green for activation and gray for sigma factor. (C) Regulation Browser for the citB gene with the radius increased to a distance of 2. This setting expands the initial representation (B) to all regulations within a distance of 2 elements [thus including the regulatory interactions of the primary partners that are shown in panel (B)], creating a more complex view of the regulatory network of citB.
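Expanding a network view to a given radius corresponds to collecting every node that can be reached from the gene of interest in at most that many steps. A minimal breadth-first sketch in Python follows; the function name is hypothetical, edges are treated as undirected for the purpose of the neighborhood (an assumption), and the node names in the example are placeholders rather than real regulatory data.

```python
from collections import deque


def neighborhood(edges, start, radius):
    """Return {node: distance} for all nodes within `radius` steps of `start`."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
    print(neighborhood(toy_edges, "A", 1))
    print(neighborhood(toy_edges, "A", 2))
```

With a radius of 1 only the direct partners are returned, as in panel (B); a radius of 2 additionally returns the partners of those partners, as in panel (C).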

PROTEIN HOMOLOGY INTEGRATION

According to the SubtiWiki data collection, there are 1819 biological elements with unknown function or lacking annotation, which represents ∼30% of all elements. To mitigate this lack of annotation, we decided to integrate information on potential protein homologies for all B. subtilis proteins. For this, we created a proteome library of 16 relevant bacteria, some close relatives of B. subtilis as well as important representatives of other bacterial groups such as the proteobacteria or cyanobacteria (13) (Figure 4A) by extracting the corresponding sequences from UniProt (14). To assess potential homologs, we developed a pipeline using the well-known sequence similarity search tool FASTA (15). The pipeline is composed of two rounds of alignments. The first one compares every protein of B. subtilis with the library, and the second one compares the best hits found in the previous step with the B. subtilis proteome. If the resulting protein from the second alignment is the same as the initial input of the first step, then the proteins are considered bidirectional best hits (Figure 4B). Although performing this second round of alignments gives more confidence in the prediction of homologs, it was still necessary to consider the difference of protein sizes when running alignments. Disregarding this step may cause false positives, as small proteins can have a high identity score for multiple regions of larger proteins. To avoid this, we paid special attention to the alignment of smaller proteins with larger ones, and manually excluded cases where strong inconsistencies were found. All resulting pairs showing an E-value ≤0.01 with a minimum identity of 40% were considered. However, lower values of identity were also considered as long as high values of similarity were retained.
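The logic of the two alignment rounds can be sketched in Python, assuming the FASTA search results have already been parsed into simple records. This is an illustrative sketch and not the pipeline used for SubtiWiki: the `Hit` record, the function names and the choice of the lowest E-value as the "best" hit are assumptions. The E-value and identity cutoffs are the ones stated above; the text gives no numeric similarity threshold, so `min_similarity` is an optional, hypothetical parameter that is disabled by default. The manual review of size-mismatched alignments is not modeled.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    identity: float    # percent identity, 0-100
    similarity: float  # percent similarity, 0-100


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for every query, the hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return best


def passes_cutoff(hit: Hit, max_evalue: float = 0.01,
                  min_identity: float = 40.0,
                  min_similarity: Optional[float] = None) -> bool:
    """Apply the E-value/identity cutoff, with an optional similarity rescue."""
    if hit.evalue > max_evalue:
        return False
    if hit.identity >= min_identity:
        return True
    # Hypothetical rescue for low-identity, high-similarity pairs.
    return min_similarity is not None and hit.similarity >= min_similarity


def bidirectional_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit],
                            **cutoffs) -> Dict[str, Tuple[Hit, bool]]:
    """Map each query to (best hit, is_bidirectional_best_hit).

    `forward`: B. subtilis proteins searched against one organism's proteome.
    `reverse`: that organism's proteins searched against B. subtilis.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    result = {}
    for query, hit in fwd.items():
        if not passes_cutoff(hit, **cutoffs):
            continue
        back = rev.get(hit.subject)
        result[query] = (hit, back is not None and back.subject == query)
    return result
```

Returning the best hit together with a yes/no flag, rather than discarding one-directional hits, mirrors the way the results are reported per organism, where a best hit can be listed even if it is not a bidirectional best hit.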
Figure 4.

Homology analysis. (A) Phylogenetic tree of all organisms chosen for the protein homology analysis. (B) Bidirectional FASTA alignment pipeline representation.

Currently, SubtiWiki provides protein homology results for every protein, with the exception of 254 proteins that did not meet the identity/similarity and/or E-value cutoff criteria for any of the representative relatives. The results can be accessed on the gene page and are presented in an easy-to-read table (Figure 5). For each queried B. subtilis protein, there is a list of organisms and their respective best hit protein homologs. Each of these proteins is linked to its respective UniProt page and the metrics for identity, similarity and bidirectional best hit are shown. If no protein homolog for a certain organism was found, then this is stated (Figure 5).
Figure 5.

Protein homology page for the CitB protein. At the top is a brief description of the analysis, containing details on the FASTA alignment tool and on the table. The table is composed of five columns: Organism, lists all 16 organisms used; Protein name, best hit protein found in the respective organism with a hyperlink to UniProt; Identity and Similarity, scores from FASTA alignment tool; and Bidirectional best hit (yes/no), result of the bidirectional alignment.

Finally, to complement the new protein homology section, the COG database is now fully integrated in SubtiWiki. COG is a tool for comparative and functional genomics of prokaryotes and identifies orthologs in a representative set of different bacteria and archaea (11). By taking advantage of the optimized COG database and fully integrating its IDs, SubtiWiki now provides a strong complementary section of homologs and orthologs (Figure 6).
Figure 6.

Homology and COG database implementation in SubtiWiki on the citB gene page. The section highlighted in blue is shown when a corresponding COG entry is available for the gene and redirects to the respective COG database page.

NEW DATA INTEGRATION

A hallmark of this SubtiWiki update is the integration of new datasets. Recently, we have added a compendium of genome-minimized strains derived from B. subtilis 168. Some of these strains have already been proven superior for biotechnological applications such as the production and secretion of difficult proteins and lantibiotics (16,17). The MiniBacillus Compendium section (http://www.subtiwiki.uni-goettingen.de/v4/minibacillus) (18) contains a table listing the name of each strain, the respective download link of Geneious and GenBank files, genomic details (genome size, deletion steps and percentage of genome reduction) and a list of publications associated with the respective strain.
Furthermore, we have added a large set of protein expression information. This information is based on a large-scale study that compared proteomic responses to 91 different antibiotics and comparator compounds in an attempt to elucidate antibacterial modes of action (19). To supplement this representation, we also provide the possibility to download the entire dataset under the new Downloads section (http://www.subtiwiki.uni-goettingen.de/v4/download). Like the previously implemented expression data, the new dataset can be accessed in the Expression Browser. Here, the user has full access to the available data for each biological element in a specific representation and it is possible to overlay the expression data for each protein in the Interaction Browser.
The latest updates reflect intensive research to identify novel transporters in B. subtilis. These include transporters for bicarbonate (NdhF-YbcC) (20,21) and several amino acids, including alanine (AlaP), glutamate and serine (both AimA), and the toxic analog glyphosate (GltT) (22–25). In addition, two new categories of genes/proteins have been included. These are the quasi-essential genes and genes that encode proteins that are required for the detoxification of toxic metabolites.
Quasi-essential genes are those genes whose inactivation results in the immediate acquisition of suppressor mutations. Interestingly, essentiality of such quasi-essential genes has so far been controversial (26–28). The new concept of quasi-essentiality supports the idea that often there is no clear binary answer to this question but rather a continuum from clearly dispensable genes, through genes that prove essential only after long-term cultivation and genes that require genetic suppression upon deletion, to those that cannot be deleted under any circumstances. As an example, the topA gene encodes DNA topoisomerase I, which is responsible for the relaxation of negatively supercoiled DNA behind the RNA polymerase. Upon deletion of topA, the bacteria immediately acquire mutations resulting in an increased expression of DNA topoisomerase IV that can then take over the role of TopA (29). Similarly, deletion of the rny gene encoding RNase Y results in the acquisition of suppressor mutations that result in a globally reduced level of transcription activity (30), and deletion of yqhY, a gene of unknown function, triggers mutations that often affect conserved residues of acetyl-CoA carboxylase, indicating that YqhY prevents a potentially toxic accumulation of malonyl-CoA or fatty acids (31).
Proteins involved in detoxification reactions have recently been recognized as very important for cellular metabolism as they remove harmful by-products or intermediates. Examples of such toxic by-products are 4-phosphoerythronate, a by-product of erythrose-4-phosphate oxidation in the pentose phosphate pathway that inhibits the phosphogluconate dehydrogenase GndA, and 5-oxoproline, an unavoidable damage product formed spontaneously from glutamine. These toxic metabolites are disposed of by the CpgA GTPase that moonlights in dephosphorylation of 4-phosphoerythronate (32) and by the PxpABC 5-oxoprolinase (33), respectively.
It has recently become obvious that these detoxification mechanisms are very important for the viability of any living cell. The limited knowledge on these mechanisms is an important bottleneck in all genome reduction projects (34,35). Thus, both these new categories are very important to get a more comprehensive knowledge about the minimal requirements to sustain bacterial life.

FUTURE PERSPECTIVES

With millions of accesses on a yearly basis, SubtiWiki has become one of the most popular databases providing up-to-date and curated data for the model organism B. subtilis. Although this bacterium is one of the best-studied organisms, there is still a vast portion of data to be annotated and a lot to be discovered. By providing daily data and feature updates to the community, we expect SubtiWiki to continue to grow. We hope for SubtiWiki to expand in viewership and further establish itself as a prominent tool helping researchers develop new hypotheses. Future aims include the addition of new basic entities such as metabolites and corresponding new types of datasets such as protein–metabolite, protein–RNA and RNA–metabolite interactions that will consolidate the role of SubtiWiki as a trendsetter for model organism databases.
  34 in total

1.  Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis.

Authors:  Byoung-Mo Koo; George Kritikos; Jeremiah D Farelli; Horia Todor; Kenneth Tong; Harvey Kimsey; Ilan Wapinski; Marco Galardini; Angelo Cabal; Jason M Peters; Anna-Barbara Hachmann; David Z Rudner; Karen N Allen; Athanasios Typas; Carol A Gross
Journal:  Cell Syst       Date:  2017-02-08       Impact factor: 10.304

2.  Bacillus subtilis mutants with knockouts of the genes encoding ribonucleases RNase Y and RNase J1 are viable, with major defects in cell morphology, sporulation, and competence.

Authors:  Sabine Figaro; Sylvain Durand; Laetitia Gilet; Nadège Cayet; Martin Sachse; Ciarán Condon
Journal:  J Bacteriol       Date:  2013-03-15       Impact factor: 3.490

3.  Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation.

Authors:  Ivica Letunic; Peer Bork
Journal:  Nucleic Acids Res       Date:  2021-07-02       Impact factor: 16.971

4.  SubtiWiki-a database for the model organism Bacillus subtilis that links pathway, interaction and expression information.

Authors:  Raphael H Michna; Fabian M Commichau; Dominik Tödter; Christopher P Zschiedrich; Jörg Stülke
Journal:  Nucleic Acids Res       Date:  2013-10-30       Impact factor: 16.971

5.  The Highly Conserved Asp23 Family Protein YqhY Plays a Role in Lipid Biosynthesis in Bacillus subtilis.

Authors:  Dominik Tödter; Katrin Gunka; Jörg Stülke
Journal:  Front Microbiol       Date:  2017-05-19       Impact factor: 5.640

6.  Topoisomerase IV can functionally replace all type 1A topoisomerases in Bacillus subtilis.

Authors:  Daniel R Reuß; Patrick Faßhauer; Philipp Joel Mroch; Inam Ul-Haq; Byoung-Mo Koo; Anja Pöhlein; Carol A Gross; Rolf Daniel; Sabine Brantl; Jörg Stülke
Journal:  Nucleic Acids Res       Date:  2019-06-04       Impact factor: 16.971

7.  UniProt: the universal protein knowledgebase in 2021.

Authors: 
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

8.  Essentiality of c-di-AMP in Bacillus subtilis: Bypassing mutations converge in potassium and glutamate homeostasis.

Authors:  Larissa Krüger; Christina Herzberg; Hermann Rath; Tiago Pedreira; Till Ischebeck; Anja Poehlein; Jan Gundlach; Rolf Daniel; Uwe Völker; Ulrike Mäder; Jörg Stülke
Journal:  PLoS Genet       Date:  2021-01-22       Impact factor: 5.917

9.  Quasi-essentiality of RNase Y in Bacillus subtilis is caused by its critical role in the control of mRNA homeostasis.

Authors:  Martin Benda; Simon Woelfel; Patrick Faßhauer; Katrin Gunka; Stefan Klumpp; Anja Poehlein; Debora Kálalová; Hana Šanderová; Rolf Daniel; Libor Krásný; Jörg Stülke
Journal:  Nucleic Acids Res       Date:  2021-07-09       Impact factor: 16.971

10.  Large-scale reduction of the Bacillus subtilis genome: consequences for the transcriptional network, resource allocation, and metabolism.

Authors:  Daniel R Reuß; Josef Altenbuchner; Ulrike Mäder; Hermann Rath; Till Ischebeck; Praveen Kumar Sappa; Andrea Thürmer; Cyprien Guérin; Pierre Nicolas; Leif Steil; Bingyao Zhu; Ivo Feussner; Stefan Klumpp; Rolf Daniel; Fabian M Commichau; Uwe Völker; Jörg Stülke
Journal:  Genome Res       Date:  2016-12-13       Impact factor: 9.043
