From Corynebacterium glutamicum to Mycobacterium tuberculosis--towards transfers of gene regulatory networks and integrated data analyses with MycoRegNet.

Justina Krawczyk, Thomas A Kohl, Alexander Goesmann, Jörn Kalinowski, Jan Baumbach.

Abstract

Every year, approximately two million people die from tuberculosis, a disease caused by the bacterium Mycobacterium tuberculosis. There is a tremendous need for new anti-tuberculosis therapies (antituberculotica) and drugs to cope with the spread of tuberculosis. Despite many efforts to obtain a better understanding of the pathogenicity of M. tuberculosis and its survival strategy in humans, many questions are still unresolved. Among other cellular processes in bacteria, pathogenicity is controlled by transcriptional regulation. Thus, various studies on M. tuberculosis concentrate on the analysis of transcriptional regulation in order to gain new insights into pathogenicity and other essential processes ensuring mycobacterial survival. We designed a bioinformatics pipeline for the reliable transfer of gene regulations between taxonomically closely related organisms that incorporates (i) a prediction of orthologous genes and (ii) the prediction of transcription factor binding sites. In total, 460 regulatory interactions were identified for M. tuberculosis using our comparative approach. Based on that, we designed a publicly available platform that aims at data integration, analysis, visualization and finally the reconstruction of mycobacterial transcriptional gene regulatory networks: MycoRegNet. It is a comprehensive database system and analysis platform that offers several methods for data exploration and the generation of novel hypotheses. MycoRegNet is publicly available at http://mycoregnet.cebitec.uni-bielefeld.de.

Year: 2009 PMID: 19494184 PMCID: PMC2724278 DOI: 10.1093/nar/gkp453

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Every year, approximately two million people die worldwide from tuberculosis (1) and one-third of the world's total population suffer from this communicable disease (http://www.who.int) caused by the bacterium Mycobacterium tuberculosis. Tuberculosis is the leading cause of death among people living with HIV and claims on average 200 000 lives every year, most of them in Africa. Persons infected with tuberculosis do not directly develop the characteristic full-blown clinical picture, but in most cases the latent form, which can progress to an active condition after years. A person with active tuberculosis can infect about 10–15 people a year if she or he is left untreated (http://www.who.int). Although there is effective treatment to cure patients with tuberculosis, and new strategies have been developed to stop its further dissemination, its containment is still a serious problem (2). The number of multi-resistant strains not responding to standard drug treatments is increasing constantly worldwide (3,4). Consequently, there is a tremendous need for new anti-tuberculosis therapies (antituberculotica) and drugs to cope with the spread of tuberculosis. Despite many efforts to obtain a better understanding of the pathogenicity of M. tuberculosis and its survival strategy in humans, many questions are still unresolved. The molecular mechanisms responsible for resisting the human immune system, and their activation, are not yet sufficiently understood; most notably, this applies to the ability of M. tuberculosis to remain within the human host for years in a clinically latent state (5). Among other cellular processes in bacteria, pathogenicity is controlled by transcriptional regulation. Thus, various studies on M. tuberculosis concentrate on the analysis of transcriptional regulation in order to gain new insights into pathogenicity and other essential processes ensuring mycobacterial survival.
The identification and characterization of transcriptional regulation on a genome-wide level will enable a better understanding of drug metabolism in M. tuberculosis and facilitate the development of new antibiotics, which are urgently needed. At present, studies focus mainly on the analysis of single regulons or distinct subunits of the complex transcriptional regulatory network of M. tuberculosis [see e.g. (3,5,6)]. Bioinformatics platforms for the storage of and public access to data on transcriptional regulation exist for M. tuberculosis, similar to other organisms such as Escherichia coli [RegulonDB (7)] or Corynebacterium glutamicum [CoryneRegNet (8)]. MtbRegList (9) and MTBreg (http://www.doe-mbi.ucla.edu/services/mtbreg) offer information relevant to regulatory interactions in M. tuberculosis H37Rv (MT) accumulated from literature or obtained from computational predictions. While MtbRegList contains predicted and characterized regulatory DNA motifs cross-referenced with transcription factors (TFs), MTBreg combines a collection of conditionally regulated proteins with information about selected TFs. However, both systems are designed as data repositories and provide only unsatisfactory bioinformatics support for transcriptional gene regulatory network visualization, analysis and reconstruction. Recently, the TB database has become available. This integrated online platform for tuberculosis research combines the annotated genome and expression data with a suite of bioinformatic tools for data analysis (10). The TB database focuses on investigating and providing expression data, while little support is given for the reconstruction of regulatory networks based on these findings. Hence, there is currently no online platform or database system available that aims at appropriate data handling and analysis of transcriptional regulation in M. tuberculosis on a genome-wide level.
Here, we introduce MycoRegNet, a user-friendly online platform dedicated to the biomedical researcher who is interested in the regulation of gene expression in the human pathogen M. tuberculosis. MycoRegNet is available online at http://mycoregnet.cebitec.uni-bielefeld.de. Our approach starts from the assumption that orthologous TFs tend to regulate the expression of orthologous target genes in taxonomically closely related species (11–13). Corynebacterium glutamicum and M. tuberculosis are both classified into the suborder Corynebacterineae of the Actinobacteria phylum and are thus taxonomically closely related (14). Hence, the industrially important amino acid producer C. glutamicum has been successfully applied as a model organism, e.g. for investigating cell envelope synthesis of M. tuberculosis (15–17). We therefore started with the well-examined regulatory network of C. glutamicum ATCC 13032 (CG) (18), which is stored in the corynebacterial reference database CoryneRegNet (8). Our comparative genomics approach aims at a reliable transfer of known regulatory interactions from CG to MT. Instead of relying exclusively on the detection of orthologous genes, we consider further evidence by means of an integrated TF binding site (TFBS) prediction. The resulting data were subsequently stored in an online platform designed for the visualization and analysis of the deduced transcriptional regulatory network, which enables the execution of bioinformatics tools for further hypothesis generation: MycoRegNet. The remainder of this article is structured as follows: we first describe in detail the workflow used for the transfer of C. glutamicum data to M. tuberculosis. The design of MycoRegNet is briefly introduced afterwards. It aims to overcome typical data integration problems and to supply online visualization and hypothesis generation tools. In the last section, we illustrate and discuss these functionalities.
We finally conclude that MycoRegNet is an appropriate reference database and platform for gene regulatory network analysis of M. tuberculosis.

MATERIALS AND METHODS

The network reconstruction pipeline mainly consists of the detection of (i) conserved genes between CG and MT and (ii) binding sites upstream of the conserved genes in MT. Based on the corresponding results, a list of putative gene regulatory interactions in MT is generated and imported into the MycoRegNet database back-end (see Figure 1 for a graphical overview of the workflow).

Figure 1.

Diagram of the prediction pipeline. The diagram shows the main steps performed during the transfer of gene regulations from C. glutamicum to M. tuberculosis. Starting with an orthology detection, the next step was a prediction of conserved regulations. Based on that, a TFBS prediction provided further evidence. Finally, the results can be exported as TAB-delimited files and imported into the MycoRegNet data repository.

Detection of orthologous genes

Generally, the detection of orthologous genes is not straightforward, since the analysis can be perturbed by factors like paralogs or sequence divergence in the genomes of interest. To reduce such effects, we searched for orthologous genes by performing bidirectional BLASTP (19) searches on the corresponding protein sequences. To this end, we scanned the CG genome for sequence similarities with the MT genome and vice versa, performing BLASTP with an E-value cut-off of 10^-4 in both directions. As a result, we obtained amino acid sequence pairs, so-called bidirectional best hits (BBHs), representing the reciprocal best alignments of the respective protein sequences. The identified BBHs were considered to be putative orthologous proteins in CG and MT, which in turn suggests that the respective genes are regulated in both bacteria by orthologous TFs.
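
The reciprocal filtering described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the BLASTP results of both directions have already been parsed into (query, subject, E-value, bit score) tuples, and the function names, the tuple layout and the use of the bit score to rank hits are assumptions made for illustration only.

```python
def best_hits(hits, evalue_cutoff=1e-4):
    """Return the best subject per query among hits that pass the E-value cut-off.

    `hits` is an iterable of (query, subject, evalue, bitscore) tuples
    (a hypothetical, pre-parsed representation of BLASTP output).
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > evalue_cutoff:
            continue  # discard hits weaker than the cut-off
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(cg_vs_mt, mt_vs_cg, evalue_cutoff=1e-4):
    """Keep only pairs that are each other's best hit in both search directions."""
    forward = best_hits(cg_vs_mt, evalue_cutoff)
    backward = best_hits(mt_vs_cg, evalue_cutoff)
    return {cg: mt for cg, mt in forward.items() if backward.get(mt) == cg}
```

In this sketch a CG protein whose best MT hit prefers a different CG protein in the reverse search is dropped, which is how the reciprocal criterion reduces the influence of paralogs.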

Transfer of regulatory interactions

Based on the previously identified BBHs, regulatory interactions characterized in CG were transferred to MT. We utilized the comprehensive data on transcriptional regulation in CG collected in the corynebacterial reference database CoryneRegNet (8), which contains 806 regulatory interactions of 72 TFs and 544 regulated target genes in CG (status: January 2009). For each regulatory interaction taken from CG, both the gene encoding the TF and the target gene were compared to the list of predicted orthologs in the MT genome. Only if both the TF and its target gene were identified as BBHs was the regulatory interaction transferred from CG to the orthologous counterparts in MT and considered a candidate transcriptional regulation in MT. Furthermore, we assume the regulatory role of the TF (activation or repression) to be conserved as well, including known autoregulations.
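
The transfer rule can be sketched in a few lines of Python. This is an illustrative sketch only: the triple layout (TF, target gene, role) and the function name are hypothetical, and `bbh` stands for a mapping from CG genes to their BBH partners in MT as produced by an orthology detection step.

```python
def transfer_regulations(cg_regulations, bbh):
    """Transfer CG regulations to MT when both TF and target have a BBH partner.

    `cg_regulations`: iterable of (tf, target, role) triples with CG gene IDs,
    where role is e.g. "activator" or "repressor" (hypothetical encoding).
    `bbh`: dict mapping CG gene IDs to their MT orthologs.
    """
    transferred = []
    for tf, target, role in cg_regulations:
        if tf in bbh and target in bbh:
            # The regulatory role is assumed to be conserved and is copied as-is.
            transferred.append((bbh[tf], bbh[target], role))
    return transferred
```

An autoregulation (TF and target being the same gene) is carried over by the same rule, since both sides then map to the same MT ortholog.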

Further evidence through conserved TFBSs

In the last step of our regulatory network prediction pipeline, we add further evidence to the orthology-based approach introduced above by combining the preliminary results with the prediction of TFBSs. To this end, all known binding sites of characterized TFs of CG with potential orthologs in MT were used to create appropriate motif profiles. TF binding motifs were modeled as so-called position weight matrices (PWMs), the most widely used model for that purpose. However, we applied only PWMs of corynebacterial TFs deduced from more than 20 binding sites, i.e. those of the TFs GlxR, RamB, AmtR, DtxR and LexA. To detect instances of the respective motifs in MT, we employed the TFBS matching tool PoSSuMsearch (20) and scanned 580-bp long, noncoding DNA sequences upstream of all genes and operons that had been detected as potential orthologs of target genes of the respective TF. The upstream sequences extended to +20 bp relative to the transcription start. In our initial approach, we performed a restrictive search by setting the P-value threshold to 10^-5. Because only few binding sites were detected in the first PoSSuMsearch runs, we decided to relax the P-value threshold, since the initial value might have been too restrictive for our TFBS predictions. To determine a new threshold, we used as a reference the P-values of GlxR PWM matches upstream of 26 target genes for which binding of the GlxR ortholog Rv3676 in MT had been experimentally verified (21–24). Among these matches, we chose the one with the worst (highest) P-value to define the threshold. Thus, we finally set the P-value threshold to <10^-2 and defined, for each target gene/operon in MT, the TFBS match with the lowest P-value as the prediction for the respective binding site.
Taken together, the outcome of the workflow introduced above is a list of transcriptional regulations for MT where (i) the TF is conserved, (ii) the target gene is conserved between CG and MT as well, and additionally (iii) a binding site is predicted if the target gene/operon is controlled by one of the five TFs for which a TFBS search was performed. Hence, the resulting predictions represent the most likely regulatory interactions in MT, given the close taxonomic relation between CG and MT. These are the data we aim to integrate into the MycoRegNet platform, together with validated knowledge from (5,25–34).
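
To make the PWM-based matching step more concrete, the following Python sketch builds a log-odds PWM from aligned binding sites and reports the best-scoring window in an upstream sequence. It is a simplified stand-in and not PoSSuMsearch: it ranks windows by raw log-odds score rather than by exact P-values, and the pseudocount, the uniform background model and all function names are assumptions chosen for illustration. Sequences are assumed to contain only the uppercase letters A, C, G and T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds PWM (one dict per motif column) from aligned sites."""
    background = background or {base: 0.25 for base in BASES}
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background[base]) for base in BASES}
        )
    return pwm


def best_match(pwm, sequence):
    """Return (score, start, window) of the highest-scoring window, or None."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best
```

Keeping only the best window per upstream region mirrors the rule of reporting, for each target gene/operon, the single match with the lowest P-value; converting scores into P-values would require the score distribution under the background model, which this sketch omits.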

Data integration with the MycoRegNet platform

Based on our experiences with CoryneRegNet, we designed MycoRegNet in a very similar way: as an ontology-based data warehouse for mycobacterial TFs and regulatory networks. We set it up as a sister project of CoryneRegNet to store, analyze and visualize the regulatory interactions in M. tuberculosis that are derived from the prediction pipeline introduced above. MycoRegNet is composed of two main parts: (i) a web front-end running on an Apache HTTP web server that manages user-database interactions as well as the execution of further online bioinformatics computations, and (ii) a back-end consisting of data preprocessing tools and a MySQL database that stores all data corresponding to the deduced and ontologically restructured mycobacterial gene regulatory interactions. The data import comprises the integration of transcriptional regulations, the complete genome sequence of MT along with the genome annotation as stored in the GenBank database (NCBI) (35), operon predictions available from the Virtual Institute of Microbial Stress and Survival (VIMSS) (36), precalculated PWMs and other preprocessed data necessary for subsequent online TFBS detections, and stimulons derived from literature (25–31). The import and conversion software is implemented in Java, while the web pages generated at front-end level are developed in PHP. An embedded Java applet realizes the visualization of gene regulatory networks from the included data. A SOAP-based Web Service (37) client/server system implemented by means of NuSOAP enables a bidirectional interconnection with GenDB (38) and EMMA (39). The server is open access and provides well-structured data access via the SOAP interface to any other bioinformatics client. GenDB is an open source system for the annotation of prokaryotic genomes, while EMMA is a web-based application for the storage and analysis of transcriptomics data from microarrays.
By means of the clients for GenDB and EMMA, data integrated in MycoRegNet is supplemented with up-to-date information on the genome annotation of MT (GenDB) and gene expression data preanalyzed with EMMA. To give one example, the Web Service client for GenDB facilitates the mapping of all genes controlled by a certain regulator to KEGG pathways (40) in order to provide an overview on the general nature of a TF of interest. Furthermore, the automatic annotation pipeline of GenDB can be used to regularly update gene function assignments.

RESULTS AND DISCUSSION

Here, we first summarize the database content. Subsequently, we present and discuss the benefits of MycoRegNet from the end-user perspective. We first describe the web interface with special attention to the TFBS prediction feature and the network visualization and analysis capability. We briefly describe the Web Service access afterwards and finally demonstrate the platform's visualization functionality by means of an application example.

The database content

By using the transfer pipeline for regulatory interactions described above, we identified 1012 of 3991 proteins from MT as putative orthologs of proteins from CG. Based on the respective set of genes coding for the orthologous proteins, we detected 226 of 806 regulatory interactions from CG as likely conserved in MT (Table 1). Our initial findings reveal 24 partially conserved regulons affecting processes of the carbohydrate metabolism, cellular program, macroelement and metal homeostasis, SOS and stress response, specific biosynthesis as well as processes governed by sigma factors. By setting the P-value threshold to 10^-2, we could add further evidence to 129 target genes 40 by predicting binding sites upstream of the respective target genes/operons regulated by the TF orthologs of GlxR, RamB, AmtR, DtxR and LexA in MT (Table 2). All in all, we obtained a set of regulatory interactions that is based on good evidence. The database content comprises 618 regulatory interactions for 515 target genes regulated by 26 TFs. Several gene expression experiments are also directly stored within MycoRegNet's database back-end (data not shown). We also integrated genome annotation data of M. tuberculosis CDC1551 for future investigations concerning transcriptional regulation in another ecotype of M. tuberculosis.

Table 1.

Putative gene regulations of CG in MT

TF	Target genes
Carbohydrate metabolism
Rv0465c	Carbohydrate metabolism
	Rv0211 (pckA), Rv0247c (-), Rv0363c (fba), Rv0408 (pta)
	Rv0409 (ackA), Rv0465c (-), Rv0467 (icl), Rv0896 (gltA2)
	Rv0904c (accD3), Rv0951 (sucC), Rv0952 (sucD), Rv1475c (acn)
	Rv1837c (glcB), Rv1862 (adhA), Rv2193 (ctaE), Rv2241 (aceE)
	Rv2332 (mez), Rv2967c (pca), Rv3318 (sdhA)
	Cell division and septation
	Rv1009 (rpfB)
	Specific biosynthesis pathways
	Rv0884c (serC), Rv1010 (ksgA), Rv1011 (ispE), Rv1379 (pyrR)
	Rv1380 (pyrB), Rv1381 (pyrC)
Rv0792c	Carbohydrate metabolism
	Rv0753c (mmsA)
Rv1719	Carbohydrate metabolism
	Rv0554 (bpoC), Rv1074c (fadA3), Rv1719 (-), Rv2503c (scoB), Rv2504c (scoA)
Rv3676	Carbohydrate metabolism
	Rv0211 (pckA), Rv0247c (-), Rv0400c (fadE7), Rv0465c (-)
	Rv0467 (icl), Rv0896 (gltA2), Rv0904c (accD3), Rv0951 (sucC)
	Rv0952 (sucD), Rv1098c (fumC), Rv1130 (-), Rv1161 (narG)
	Rv1162 (narH), Rv1163 (narJ), Rv1436 (gap), Rv1437 (pgk)
	Rv1438 (tpi), Rv1475c (acn), Rv1837c (glcB), Rv1854c (ndh)
	Rv1862 (adhA), Rv1872c (lldD2), Rv2029c (pfkB), Rv2193 (ctaE)
	Rv2194 (qcrC), Rv2195 (qcrA), Rv2196 (qcrB), Rv2200c (ctaC)
	Rv2524c (fas), Rv2967c (pca), Rv3010c (pfkA), Rv3043c (ctaD)
	Rv3279c (birA), Rv3280 (accD5), Rv3318 (sdhA), Rv3548c (-), Rv3676 (-)
	Cell division and septation
	Rv1009 (rpfB), Rv2145c (wag31), Rv2201 (asnB)
	Macroelement and metal homeostasis
	Rv0820 (phoT), Rv0928 (pstS3), Rv0929 (pstC2), Rv0930 (pstA1), Rv2220 (glnA1)
	Rv2832c (ugpC), Rv2833c (ugpB), Rv2834c (ugpE), Rv2835c (ugpA), Rv2918c (glnD)
	Rv2919c (glnB), Rv2920c (amt), Rv3859c (gltB)
	SOS and stress response
	Rv0867c (rpfA), Rv3048c (nrdF2), Rv3217c (-), Rv3219 (whiB1), Rv3681c (whiB4)
	Specific biosynthesis pathways
	Rv0884c (serC), Rv1010 (ksgA), Rv1011 (ispE), Rv1092c (coaA)
	Rv3001c (ilvC), Rv3002c (ilvN), Rv3003c (ilvB1)
Cellular Program
RelA	Sigma factor module
	Rv1221 (sigE), Rv2710 (sigB), Rv3221A (-), Rv3911 (sigM)
	SOS and stress response
	Rv2720 (lexA)
Macroelement and metal homeostasis
Rv0485	Macroelement and metal homeostasis
	Rv0132c (fgd2)
PhoP	Macroelement and metal homeostasis
	Rv0545c (pitA), Rv0757 (phoP), Rv0758 (phoR), Rv0820 (phoT)
	Rv0928 (pstS3), Rv0929 (pstC2), Rv0930 (pstA1), Rv1095 (phoH2)
	Rv2832c (ugpC), Rv2833c (ugpB), Rv2834c (ugpE), Rv2835c (ugpA)
Rv0827c	Macroelement and metal homeostasis
	Rv0827c (-)
Rv1994c	Macroelement and metal homeostasis
	Rv1994c (-)
IdeR	Carbohydrate metabolism
	Rv0247c (-), Rv3318 (sdhA)
	Macroelement and metal homeostasis
	Rv0827c (-), Rv0844c (narL), Rv1285 (cysD), Rv1286 (cysN)
	Rv2391 (nirA), Rv2392 (cysH), Rv2393 (-), Rv2895c (viuB)
	Rv3044 (fecB), Rv3841 (bfrB)
Rv3160c	Macroelement and metal homeostasis
	Rv1848 (ureA), Rv1849 (ureB), Rv1850 (ureC), Rv1852 (ureG)
	Rv2220 (glnA1), Rv2918c (glnD), Rv2919c (glnB), Rv2920c (amt)
	Rv3664c (dppC), Rv3665c (dppB), Rv3666c (dppA), Rv3859c (gltB)
Rv3173c	Macroelement and metal homeostasis
	Rv0132c (fgd2), Rv0485 (-), Rv1079 (metB), Rv1133c (metE)
	Rv1175c (fadH), Rv1285 (cysD), Rv1286 (cysN), Rv1294 (thrA)
	Rv1296 (thrB), Rv1392 (metK), Rv2124c (metH), Rv2334 (cysK1)
	Rv2391 (nirA), Rv2392 (cysH), Rv2393 (-), Rv3025c (iscS)
	Rv3028c (fixB), Rv3029c (fixA), Rv3173c (-), Rv3340 (metC)
	Rv3341 (metA)
Sigma factor module
SigB	Carbohydrate metabolism
	Rv0363c (fba), Rv1023 (eno), Rv1098c (fumC), Rv1436 (gap)
	Rv1437 (pgk), Rv1438 (tpi), Rv3010c (pfkA)
	SOS and stress response
	Rv3132c (devS)
	Specific biosynthesis pathways
	Rv2210c (ilvE)
SigM	SOS and stress response
	Rv0384c (clpB), Rv1464 (csd), Rv1465, Rv1471 (trxB1)
	Rv3418c (groES), Rv3913 (trxB2), Rv3914 (trxC)
SOS and stress response
HspR	SOS and stress response
	Rv0350 (dnaK), Rv0351 (grpE), Rv0352 (dnaJ1), Rv0353 (hspR)
	Rv0384c (clpB), Rv0440 (groEL), Rv2745c
HrcA	SOS and stress response
	Rv0440 (groEL), Rv3418c (groES)
LexA	Cell division and septation
	Rv2748c (ftsK)
	SOS and stress response
	Rv1235 (lpqY), Rv1638 (uvrA), Rv1696 (recN), Rv2592c (ruvB)
	Rv2593c (ruvA), Rv2594c (ruvC), Rv2720 (lexA), Rv2736c (recX)
	Rv2737c (recA), Rv3370c (dnaE2), Rv3395c, Rv3585 (radA)
Rv2745c	SOS and stress response
	Rv0782 (ptrBb), Rv2460c (clpP2), Rv2461c (clpP1), Rv2725c (hflX)
	Rv3596c (clpC1), Rv3715c (recR), Rv3716c
WhiB1	SOS and stress response
	Rv3913 (trxB2), Rv3914 (trxC)
MtrA	SOS and stress response
	Rv0917 (betP), Rv3476c (kgtP)
CspA	Carbohydrate metabolism
	Rv1837c (glcB)
Specific biosynthesis pathways
PyrR	Specific biosynthesis pathways
	Rv1379 (pyrR), Rv1380 (pyrB), Rv1381 (pyrC), Rv2883c (pyrH)
ArgR	Specific biosynthesis pathways
	Rv1383 (carA), Rv1384 (carB), Rv1652 (argC), Rv1653 (argJ)
	Rv1654 (argB), Rv1655 (argD), Rv1656 (argF), Rv1657 (argR)
	Rv0488

Putative gene regulations of CG in MT, predicted in silico by using the introduced MycoRegNet pipeline

Table 2.

Detected binding sites upstream of transferred target genes of CG in MT

TF	Gene ID	Gene name	Operon	Binding motif
Rv0465c	Rv0211^a	pckA		ATAACTACGCAGG
	Rv0249c	–	Rv0249c-Rv0248c-Rv0247c^a	AGTAGTTCGCGAT
	Rv0363c^a	fba	–	CGTACTTCTCAAA
	Rv0407	pta	Rv0407-Rv0408^a-Rv0409^a	CGTGCTGTGCTCA
	Rv0465c	–	Rv0465c^a-Rv0464c	CTAACTCTGCGAA
	Rv0467^a	icl	–	CAAAATTTGCAAA
	Rv0884c^a	serC	–	ATGGCATGGCCGA
	Rv0896^a	gltA2	–	TGAGCAGATCACT
	Rv0904c^a	accD3	–	ATTGCATGGCAAG
	Rv0951	sucC	Rv0951^a-Rv0952^a	AGTGCTAAGCCGT
	Rv1009	rpfB	Rv1009^a-Rv1010^a-Rv1011^a	TCTACTTACCAAA
	Rv1379	pyrR	Rv1379^a-Rv1380^a-Rv1381^a-Rv1382-Rv1383-Rv1384-Rv1385	AGTGCTACGCTGC
	Rv1475c	acn	Rv1475c^a-Rv1474c	ACTGCTAGGCTGA
	Rv1837c^a	glcB	–	TAGGCTGAGCAAT
	Rv1862^a	adhA	–	TGTGCTGGGCTAA
	Rv2193	ctaE	Rv2193^a-Rv2194-Rv2195-Rv2196	ACTACAAAGCGTC
	Rv2241	aceE	Rv2241^a-Rv2242	CAAACAGCGCAAG
	Rv2332^a	mez	–	TGCGCTCTGCGAA
	Rv2967c^a	pca	–	CATGCAATGTCAA
	Rv3316	sdhA	Rv3316-Rv3317-Rv3318^a-Rv3319	GTTGCATTGCCCC
IdeR	Rv0249c	–	Rv0249c-Rv0248c-Rv0247c^a	TTAGATGAGCGCACCCACG
	Rv0827c^a	–	–	CTATGGATCGCTGTACTAC
	Rv0844c^a	narL	–	CGACGAGCAGCTAAACTCA
	Rv1285	cysD	Rv1285^a-Rv1286^a	GAGGGCGAGGCACACGTCA
	Rv2391	nirA	Rv2391^a-Rv2392^a-Rv2393^a	TCAGGTGCGCGTCTCCCAG
	Rv2895c^a	viuB	–	TAAGCGAAGCCGAACGCCA
	Rv3044^a	fecB	–	GTAGACCAGGCTCCCCTTG
	Rv3316	sdhA	Rv3316-Rv3317-Rv3318^a-Rv3319	CTAAGAAAAGCCAGCCTAA
	Rv3841^a	bfrB	–	CTAGGAAAGCCTTTCCTGA
LexA	Rv1235	lpqY	Rv1235^a-Rv1236-Rv1237-Rv1238	TCGACTATCTATCCGA
	Rv1638^a	uvrA	–	TCGAATGTCAGCTCGC
	Rv1696^a	recN	–
	Rv2594c	ruvC	Rv2594c^a-Rv2593c^a-Rv2592c^a	TCGAACGATTGTTCGG
	Rv2720^a	lexA	–	TCGAACACATGTTTGA
	Rv2737c	recA	Rv2737c^a-Rv2736c^a	TCGAACAGGTGTTCGG
	Rv2748c^a	ftsK	–	CCGACCAGGTGCTCGC
	Rv3370c^a	dnaE2	–	TCGAACAATTGTTCGA
	Rv3395c	–	Rv3395c^a-Rv3394c	TCGAACATATTTTCGA
Rv3160c	Rv1848	ureA	Rv1848^a-Rv1849-Rv1850^a-Rv1851-Rv1852^a-Rv1853	GTGTCTACTGCGCGATGATCGAGAGCAT
	Rv2220^a	glnA1	–	CAACACGGGGTTGACTGACGGGCAATAT
	Rv2920c	amt	Rv2920c^a-Rv2919c^a-Rv2918c^a	AAGTTTTACGTTAATCCTGATGAAACAT
	Rv3666c	dppA	Rv3666c^a-Rv3665c^a-Rv3664c^a-Rv3663c-Rv3662c	GTGGTAGCTAACGGTCACCGGCGAGTGT
	Rv3859c	gltB	Rv3859c^a-Rv3858c	CGCTTGACGGACAGCCTATCGACAAGAC
Rv3676	Rv0211^a	pckA		TGTGAGCAGGCTTATA
	Rv0249c	–	Rv0249c-Rv0248c-Rv0247c^a	TGTGATCTGTAACACC
	Rv0400c^a	fadE7	–	AGTGATGAGCACCCCG
	Rv0465c	–	Rv0465c^a-Rv0464c	TTTGTCGAGGCTCACG
	Rv0467^a	icl	–	TGTTACAACGCTCACA
	Rv0820^a	phoT		GGTGGTGATCCGCACC
	Rv0867c^a	rpfA		TGTGACATTACCCACA
	Rv0884c^a	serC		TGTGAGCTTGTTCACA
	Rv0896^a	gltA2	–	GGCGTTGAACATCACC
	Rv0904c^a	accD3	–	CGTGAGTCGTATCACG
	Rv0928	pstS3	Rv0928^a-Rv0929^a-Rv0930^a	ACTGAATTGAAACTCA
	Rv0951	sucC	Rv0951^a-Rv0952^a	TGTGAGTTGGATCACG
	Rv1009	rpfB	Rv1009^a-Rv1010^a-Rv1011^a	GGTGGCGCTCATCACC
	Rv1092c^a	coaA		TGCCACGTAGGTCACG
	Rv1099c	fumC	Rv1099c-Rv1098c^a-Rv1097c
	Rv1130	–	Rv1130^a-Rv1131	TGTGGATAAGTCCAGG
	Rv1161	narG	Rv1161^a-Rv1162^a-Rv1163^a-Rv1164-Rv1165-Rv1166	TGCGTTGAACGGCACG
	Rv1436	gap	Rv1436^a-Rv1437^a-Rv1438^a	GGTTGTTTAGCCAACA
	Rv1475c	acn	Rv1475c^a-Rv1474c	TGTAACTGCCGACATA
	Rv1837c^a	glcB	–	AGGGATGCACTACACA
	Rv1854c^a	ndh	–	TGTGGCTGATGACACA
	Rv1862^a	adhA	–	CGTGGGGCGCCACACA
	Rv1872c^a	lldD2	–	GATGCCGTAGCGCACT
	Rv2029c	pfkB	Rv2029c^a-Rv2028c-Rv2027c-Rv2026c	GGTGACGAGTCGCGCA
	Rv2145c	wag31		CGTGACTGGCGTCCCA
	Rv2193	ctaE	Rv2193^a-Rv2194^a-Rv2195^a-Rv2196^a	GGTGGATAGGTTCACC
	Rv2200c	ctaC	Rv2200c^a-Rv2199c	TGTGATACAGGAGGCG
	Rv2201	asnB		GCTGTCGAAGACCACG
	Rv2220^a	glnA1		TGTGACGGAAAAGACG
	Rv2524c^a	fas	–	CGTTACCCACGACACG
	Rv2835c	ugpA	Rv2835c^a-Rv2834c^a-Rv2833c^a-Rv2832c^a	GGTGATGCCGGGCACG
	Rv2920c	amt	Rv2920c^a-Rv2919c^a-Rv2918c^a	AGTGGACCAATTCCCC
	Rv2967c^a	pca	–	CGTGGTGGTGGTCACC
	Rv3003c^a	ilvB1	Rv3003c^a-Rv3002c^a-Rv3001c^a	TGTGGTGGCCACCCCA
	Rv3010c^a	pfkA	–	GGTGATGGCGATGACC
	Rv3043c	ctaD	Rv3043c^a-Rv3042c	AGTGGATCGCATCCCG
	Rv3048c^a	nrdF2		GGTGACTGGAAACGCA
	Rv3217c^a	–		TGTGGTGGCGGTCGCA
	Rv3219^a	whiB1		AGTGAGATAGCCCACG
	Rv3279c	birA	Rv3279c^a-Rv3278c	TATCGGCTGCCGCACA
	Rv3280	accD5	Rv3280^a-Rv3281-Rv3282	CGGGACGTCGACCACA
	Rv3316	sdhA	Rv3316-Rv3317-Rv3318^a-Rv3319	CGAGACGTTTTCCACG
	Rv3549c	–	Rv3549c-Rv3548c^a	GGTGATCGGCATTGCA
	Rv3676^a	–		TGTCACCTACGACAGA
	Rv3681c^a	whiB4		TGAGATACAGGTAACA
	Rv3859c	gltB	Rv3859c^a-Rv3858c	TGCTCCGGATTTCACA

Detected binding sites in MT for the orthologs of the CG TFs GlxR (ortholog in MT: Rv3676/Crp), RamB (ortholog in MT: Rv0465c), AmtR (ortholog in MT: Rv3160c), DtxR (ortholog in MT: IdeR/Rv3173c) and LexA (ortholog in MT: Rv2720/LexA).

^a Transferred target gene of CG in MT.

The user interface

As with other online databases, MycoRegNet's web interface provides three major capabilities: browsing the database content, searching by specifying filter criteria, and basic visualization. Furthermore, the front-end offers the execution of computational features. At the main page (Figure 2), one has the option to search or to browse the database content. The user may browse the data repository by clicking on an ecotype name of interest and is provided with an overview of the selected organism. Alternatively, using one of the provided options within the search form, the database can be searched for specific gene/protein identifiers, gene/protein names, regulator types or functional modules. The search results are presented in tabular form, listing all relevant information for subsequent investigation. Furthermore, the following built-in features can be accessed directly from the main page: TFBScan (for TFBS predictions; see below) and COMA [to check for contradictions within microarray gene expression studies, given the regulatory network stored in the database; refer to (8) for more details]. Detailed information on the results can be obtained via the respective links at the result page. By selecting a particular gene, the corresponding gene details page is invoked. It presents a detailed overview of all available data attached to the gene of interest. Besides general information about the gene/protein (position in the genome, nucleotide sequence, etc.), it comprises a graphical representation of the genomic context, regulated target genes (if the gene encodes a TF) including the TFBSs, etc., and stimulons that induce differential expression of the gene. The integrated Web Service client for GenDB keeps the displayed gene annotation data up to date. General information (description, comments, an assigned function, etc.) is listed as well as the EC numbers for enzymes, and links to COG (41) and GO (42).
Additionally, all target genes of a TF of interest are linked to KEGG pathways and a list of regulated pathways is displayed.

Figure 2.

MycoRegNet main page. The main page includes a typical search mask, a statistical overview of the database content, an entry point to browse the integrated organisms, and links to more specific statistics, the system documentation and a tutorial on how to use the MycoRegNet Web Service.

TFBS prediction

With the integrated PoSSuMsearch software, MycoRegNet provides a statistically sound tool for the prediction of TFBSs based on PWMs, which have been precalculated during data import. To our knowledge, PoSSuMsearch is the only TFBS profiling tool that offers exact P-value calculations and at the same time provides reasonable response times on genome-wide runs. There are three ways to access this feature through the MycoRegNet web site: (1) The TFBScan button at the main page offers the possibility to upload user-defined sequences in FASTA format. (2) At any gene details page, the user can predict TFBSs in the upstream sequence of the selected gene. (3) If the gene of interest encodes a TF, the PWM learned from the known TFBSs of the TF may be used to scan for further TFBSs in the upstream sequences of all other mycobacterial genes. The predicted results may further be visualized as graphs. The interface is easy to use: one just has to choose a background model (nucleotide distribution) and a P-value threshold. For further details regarding the prediction of prokaryotic TFBSs with PoSSuMsearch, the reader is referred to (20,43).

Gene regulatory network visualization

As mentioned earlier, MycoRegNet also provides a network visualization toolkit: GraphVis. It is a Java applet that graphically reconstructs regulatory networks as graphs based on selected genes and a user-defined graph depth cutoff. It traverses all regulatory interactions from the starting point until the graph depth cutoff has been reached. Finally, a Java applet window appears showing the regulatory network as a graph, where nodes represent genes and edges represent regulatory interactions. GraphVis allows the user to zoom into the graph, apply different layout styles, remove selected elements or retrieve detailed information on selected genes. Furthermore, it is possible to extend the graph dynamically with more genes/regulations from the database and to display the operon grouping of the presented genes. Visualized networks may also be graphically compared between two species or between a predicted and an evidenced network by utilizing special comparative graph layout algorithms. In addition, GraphVis features the projection of gene expression data onto the genes of a visualized network. The user can choose to apply gene expression data from the stimulon repository of MycoRegNet or from their own tab-delimited text or MS-Excel files, which can be uploaded to GraphVis directly. It is also possible to use expression data extracted from EMMA by means of the integrated Web Service interconnection. According to the differential expression level of the genes, the corresponding nodes are resized within the graph. Thus, the user can obtain a comprehensive overview of the transcriptional response of M. tuberculosis to a certain stimulus.
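
The depth-limited network reconstruction can be illustrated with a breadth-first traversal. The Python sketch below is not the GraphVis code (which is a Java applet); it assumes that regulations are given as (TF, target) pairs and that edges are followed only from regulator to target, and the function name is hypothetical.

```python
from collections import deque


def regulatory_subgraph(regulations, start_genes, max_depth):
    """Collect genes and regulations reachable from start_genes within max_depth steps."""
    targets_of = {}
    for tf, target in regulations:
        targets_of.setdefault(tf, []).append(target)

    depth = {gene: 0 for gene in start_genes}  # traversal depth of each visited gene
    edges = set()
    queue = deque(start_genes)
    while queue:
        gene = queue.popleft()
        if depth[gene] >= max_depth:
            continue  # depth cutoff reached, do not expand further
        for target in targets_of.get(gene, []):
            edges.add((gene, target))
            if target not in depth:
                depth[target] = depth[gene] + 1
                queue.append(target)
    return set(depth), edges
```

Using interactions listed in Table 1 as an example, starting from Rv3676 with a depth cutoff of 1 yields Rv3676 and its direct targets, while a cutoff of 2 also includes the targets of regulated TFs such as WhiB1 (Rv3219).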

Well-structured data access by using Web Services

Although there is no real standard in bioinformatics, a growing number of platforms offer SOAP-based Web Service access to their data repositories [refer to some EBI resources (44) or to the BRENDA database (45), to name just two of them]. Many databases still provide flat files for exchange with other data processing systems. Thus, the developers of novel tools and platforms have to perform updates at regular intervals and adjust the downloaded data to their particular purpose. For this reason, gene regulatory data stored in MycoRegNet can also be accessed via the integrated Web Service server. The data can be integrated directly into corresponding projects without further time-consuming data processing. Detailed information on how to use the MycoRegNet Web Services is available from the main page via the Web Service button.

Application example—the regulatory network of the GlxR ortholog Rv3676 (Crp) in MT

Both GlxR (Cg0350) of C. glutamicum and CRP (Rv3676) of M. tuberculosis belong to the Crp-Fnr family of TFs (46) and have been characterized as cAMP-sensing homologs of E. coli Crp (23,47,48). Crp-cAMP-dependent gene regulation is commonly involved in carbon catabolite repression and forms one of the possible connections between carbon metabolism and virulence (49,50). In mycobacteria, cAMP signalling is the subject of intensive research, as it may be related to the virulence of these strains (51,52). It is noteworthy that M. tuberculosis contains 16 putative adenylate cyclases as well as 10 putative cyclic nucleotide binding proteins (53,54), hinting at a crucial and diverse role for cAMP signalling in mycobacteria. GlxR of C. glutamicum has been a focus of interest in recent years (48,55–59), and the available data indicate that GlxR is a global regulator with about 150 target genes in functionally diverse network modules, such as carbohydrate metabolism, aerobic and anaerobic respiration, fatty acid metabolism, aromatic compound degradation, glutamate uptake and nitrogen assimilation, the cellular stress response and resuscitation. Previous studies suggested a similarly vital role for Crp in M. tuberculosis. Published data implicate Crp in virulence, hypoxia and nutrient starvation (21,23,24,60). Deletion of Crp altered the expression of 16 genes and caused an impaired growth phenotype in bone marrow-derived macrophages as well as in tuberculosis mouse models (24). Several suggestions for a putative Crp regulon have been made, although these studies relied solely on the detection of putative binding sites (23,24,60). As part of our pipeline, the known regulatory interactions of GlxR collected in CoryneRegNet have been used to reconstruct the regulon of the orthologous TF Crp.
Due to the apparently vital role of these regulators in their respective organisms, and the available data on putative target genes and characterized binding sites, we chose them as an application case for our analysis. Employing our pipeline, regulatory interactions with 64 target genes could be transferred from C. glutamicum GlxR to M. tuberculosis Crp. Furthermore, we considered 26 genes with experimental evidence of regulation by Crp as potential target genes (21–24). Based on experimentally verified binding sites of Crp (21–23) together with binding sites predicted by the TFBS search of our pipeline, we complemented the suggested regulon with the prediction of Crp binding sites in the upstream regions of putative target genes. In contrast to the TFBS search within our pipeline, we created an adopted and optimized PWM for Crp from experimentally verified and predicted binding sites, and applied it for the TFBS search. To detect novel binding sites, we set the P-value threshold to 10^-5 and performed a restrictive search on the sequences upstream of genes/operons across the whole genome of MT. Again, we used PoSSuMsearch for binding site prediction, scanning 580-bp-long upstream sequences ranging from +20 bp relative to the transcription start site. Using Weblogo (61), we generated a sequence logo from the detected binding sites of Crp and from the appropriate binding sites of GlxR that were used for PWM creation (see Methods section). The resulting sequence logos are shown in Figure 3.
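The general idea of deriving a PWM from a set of binding sites and scanning an upstream sequence with it can be sketched as follows. This is a simplified log-odds scan with a score cutoff, not the PoSSuMsearch procedure with its P-value threshold. The five sites are taken from Table 3, while the function names, the pseudocount scheme, the uniform background, the score cutoff and the test sequence are assumptions for illustration only.

```python
import math

# Five Crp sites from Table 3 (Rv0458, Rv1552, Rv0103c, Rv2699c, Rv2591).
SITES = [
    "TGTGAGCTGTATTACA",
    "TGTGATCTAGGTCACG",
    "TGTGACGGGCGTCACA",
    "TGTGATGTAAATCACA",
    "CGTGACATGTGTCACA",
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform

def build_pwm(sites, background, pseudocount=1.0):
    """Column-wise log2 odds of smoothed site frequencies vs. background."""
    n = len(sites)
    pwm = []
    for column in zip(*sites):
        scores = {}
        for base, bg in background.items():
            freq = (column.count(base) + pseudocount * bg) / (n + pseudocount)
            scores[base] = math.log2(freq / bg)
        pwm.append(scores)
    return pwm

def scan(sequence, pwm, min_score):
    """Return (offset, score, window) for every window scoring >= min_score."""
    width = len(pwm)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= min_score:
            hits.append((offset, score, window))
    return hits

pwm = build_pwm(SITES, BACKGROUND)
# Made-up upstream sequence with one known site embedded at offset 20.
upstream = "ACGT" * 5 + SITES[2] + "GGCC" * 3
for offset, score, window in scan(upstream, pwm, min_score=10.0):
    print(offset, round(score, 2), window)
```

Only the forward strand is scanned here; because the Crp motif is nearly palindromic, a fuller version would also score the reverse complement of each window.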

Figure 3.

Sequence logo of the predicted Crp binding sites (A) in comparison to the sequence logo of GlxR (B). The sequence logo models the binding site motif of Crp. It was deduced from the predicted binding sites in Table 3. The height of each letter within an individual stack represents the nucleotide's frequency relative to the particular motif position; thus, the degree of a nucleotide's conservation is indicated by the stack according to the respective position.
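As a rough illustration of how stack and letter heights in such a logo are commonly derived, the sketch below computes the per-position information content (2 bits minus the Shannon entropy) and scales each nucleotide frequency by it. The small-sample correction that logo tools usually apply is omitted, and the function name and the four toy sites are assumptions, not data from this study.

```python
import math

def logo_heights(sites):
    """Per-position letter heights in bits: frequency * information content."""
    n = len(sites)
    heights = []
    for column in zip(*sites):
        freqs = {base: column.count(base) / n for base in "ACGT"}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        information = 2.0 - entropy  # stack height of this position
        heights.append({base: f * information for base, f in freqs.items()})
    return heights

# Toy sites (hypothetical): two fully conserved and two variable positions.
toy_sites = ["TGTG", "TGTA", "TGCA", "TGCC"]
for position, column in enumerate(logo_heights(toy_sites), start=1):
    print(position, {b: round(h, 2) for b, h in column.items() if h > 0})
```

A fully conserved position yields a single letter of 2 bits, whereas a position with two equally frequent nucleotides yields two letters of 0.5 bits each.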

Table 3.

Predicted Crp binding sites

Gene ID	Gene	Motif position^d	Motif sequence	Operon
Carbohydrate metabolism
Rv0211^a	pckA	−166	TGTGAGCAGGCTTATA	–
Rv0249c	sdhCD	−104	TGTGATCTGTAACACC	Rv0249c-Rv0248c-Rv0247c^a
Rv0249c	sdhCD	−410	GGTGTCGGAGGTCACA	Rv0249c-Rv0248c-Rv0247c^a
Rv0458	adhA	−41	TGTGAGCTGTATTACA	Rv0458-Rv0459
Rv0465c	–	−167	TTTGTCGAGGCTCACG	Rv0465c^a–Rv0464c^g
Rv0467^a,g	icl	−341	TGTTACAACGCTCACA	–
Rv0896^a	gltA2	−356	GGCGTTGAACATCACC	–
Rv0951	sucC	−173	TGTGAGTTGGATCACG	Rv0951^a,f–Rv0952^a,f
Rv1099c	–	−515	GCTGATGAATCCCACG	Rv1099c–Rv1098c^a,f–Rv1097c
Rv1130	prpD2	−152	TGTGGATAAGTCCAGG	Rv1130^a–Rv1131
Rv1436	gap	−48	GGTTGTTTAGCCAACA	Rv1436^a,f–Rv1437^a,f–Rv1438^a,f
Rv1475c	acn	−462	TGTAACTGCCGACATA	Rv1475c^a,f–Rv1474c
Rv1552	frdA	−284	TGTGATCTAGGTCACG^b	Rv1552–Rv1553–Rv1554–Rv1555
Rv1837c^a	glcB	−381	AGGGATGCACTACACA	–
Rv1862^a	adhA	−227	CGTGGGGCGCCACACA	–
Rv1872c^a	lldD2	−200	GATGCCGTAGCGCACT	–
Rv2029c^a	pfkB	−410	GGTGACGAGTCGCGCA	Rv2029c–Rv2028c–Rv2027c–Rv2026c^f
Rv2967c^a,f	pca	−389	CGTGGTGGTGGTCACC	–
Rv3010c^a	pfkA	−532	GGTGATGGCGATGACC	–
Rv3316	sdhC	−386	CGAGACGTTTTCCACG	Rv3316–Rv3317–Rv3318^a–Rv3319
Rv3676^a	CRP	−538	TGTCACCTACGACAGA	–
Fatty acid metabolism
Rv0097	–	−526	TGTCACGCCGGCCACG	Rv0097–Rv0098^e–Rv0099^c–Rv0100^e–Rv0101
Rv0166	fadD5	−84	TGTGACCCAGACAACA	–
Rv0400c^a,f	fadE7	−5	AGTGATGAGCACCCCG	–
Rv1185c	fadD21	−168	CGTGACGCCCCTCACG	–
Rv1714	–	−405	GGTGACGGCGGCCACA	Rv1714^f–Rv1715^f–Rv1716–Rv1717–Rv1718
Rv2485c^c	lipQ	−91	TGTGATCCTCGACACA	–
Rv2486	echA14	−287	TGTGTCGAGGATCACA	–
Rv2524c^a,f	fas	−259	CGTTACCCACGACACG	–
Rv2930^c	fadD26	−498	TGTTAATCTCGTCACA	Rv2930–Rv2931–Rv2932^f–Rv2933–Rv2934–Rv2935–Rv2936^g–Rv2937^g–Rv2938–Rv2939
Rv3279c	birA	−38	TATCGGCTGCCGCACA	Rv3279c^a–Rv3278c^e
Rv3280	accD5	−331	CGGGACGTCGACCACA	Rv3280^a–Rv3281^e,f–Rv3282
Rv3549c	–	−67	GGTGATCGGCATTGCA	Rv3549c–Rv3548c^a
Nitrogen assimilation
Rv1538c	ansA	−187	TGTGAGCACCACCACA	–
Rv2220^a,f,g	glnA1	−1	TGTGACGGAAAAGACG	–
Rv2920c	amt	−2	AGTGGACCAATTCCCC	Rv2920c^a–Rv2919c^a,g–Rv2918c^a
Rv3859c	gltB	−398	TGCTCCGGATTTCACA	Rv3859c^a,f–Rv3858c^f
PGRS
Rv0453	PPE11	−269	GGTGACCAAACTCACG	–
Rv1386	PE15	−133	TGTGACCAAACTCACC^b	Rv1386^e–Rv1387^c,e
Rv2408	PE24	−213	GGTGATCGGCGTCACG	–
Rv2591	P_PGRS44	−38	CGTGACATGTGTCACA	–
Rv3136^c	PPE51	−16	AAGGAGCTGAGACACA	–
Rv3650	PE33	−83	TGTGATGCACTTGACA	–
Respiration
Rv1161	narG	−512	TGCGTTGAACGGCACG	Rv1161^a–Rv1162^a–Rv1163^a–Rv1164–Rv1165–Rv1166^f
Rv1623c^c	cydA	−181	CGTGGTGATCGGCACA	–
Rv1854c^a	ndh	−109	TGTGGCTGATGACACA	–
Rv2193	ctaE	−517	GGTGGATAGGTTCACC	Rv2193^a,f–Rv2194^a,f–Rv2195^a,f–Rv2196^a,f
Rv2200c	ctaC	−23	TGTGATACAGGAGGCG	Rv2200c^a,f–Rv2199c
Rv3043c	ctaD	−227	AGTGGATCGCATCCCG	Rv3043c^a,f–Rv3042c^f
Other cellular processes
Rv0019c^g	fhaB	−69	CGTGACTTTGCTGACG^b	–
Rv0079	–	−110	GGTGACACAGCCCACA	Rv0079–Rv0080
Rv0103c	ctpB	−159	TGTGACGGGCGTCACA	–
Rv0104	–	−1	TGTGACGCCCGTCACA	–
Rv0145	–	−59	AGTGATGTGCCACACA^b	Rv0145–Rv0146
Rv0188^c	–	−356	AGAGAACAACGTCGCA	–
Rv0194	–	−517	TGTCATCTAGATCACG	–
Rv0232	–	−53	CGTGATGCAGCGCACA	Rv0232–Rv0233
Rv0250c^e	–	−37	TGTGATCTGTAACACC	–
Rv0360c	–	−2	CGTGACCAAGCGCACA	–
Rv0457c	–	−43	TGTAATACAGCTCACA	–
Rv0470A	–	−212	TGTGGTGGGAATCACA	–
Rv0483	lprQ	−116	TGTGTTTGGTATCACA	–
Rv0793	–	−375	TGTGATGGTGCGCACG	–
Rv0820^a,g	phoT	−538	GGTGGTGATCCGCACC	–
Rv0867c^a	rpfA	−443	TGTGACATTACCCACA^b	–
Rv0884c^a,f	serC	−91	TGTGAGCTTGTTCACA^b	–
Rv0885	–	−133	TGTGAACAAGCTCACA	Rv0885–Rv0886
Rv0904c^a	accD3	−2	CGTGAGTCGTATCACG	–
Rv0928	pstS3	−6	ACTGAATTGAAACTCA	Rv0928^a,g–Rv0929^a,g–Rv0930^a,g
Rv0993	galU	−8	TGTGAACGATGTCACG	Rv0993^f–Rv0994^g–Rv0995^g
Rv0950c	–	−153	CGTGATCCAACTCACA^b	–
Rv0992c	–	−109	CGTGACATCGTTCACA	Rv0992–Rv0991–Rv0990
Rv1009	rpfB	−271	GGTGGCGCTCATCACC	Rv1009^a–Rv1010^a,g–Rv1011^a,f
Rv1057	–	−248	CGTGACCTAGGTAACA	–
Rv1092c^a,f	coaA	−242	TGCCACGTAGGTCACG	–
Rv1111c^e	–	−411	GGTGACATGAGTCACG	–
Rv1158c	–	−69	TGTCACTTGAGTCACA^b	Rv1158c^e–Rv1157c^e
Rv1159	pimE	−77	TGTGACTCAAGTGACA	–
Rv1230c	–	−79	GGTGATCTAGTTCACG^b	–
Rv1291c	–	−323	TGTGATCGGCGCCACC	–
Rv1314c	–	−294	GGTGATCCGGGCCACG	–
Rv1324^e	–	−104	TGTGATCTTGGTCATA	–
Rv1482c	–	−23	TGTGACTCAGCACACT	–
Rv1566c	–	−235	CGTGACTGAAATCACA	–
Rv1568	bioA	−553	TGTGATTTCAGTCACG	Rv1568–Rv1569^g–Rv1570–Rv1571
Rv1592c^c	–	−215	TGTGATAGGCGCCACG	–
Rv1757c	–	−351	TGTGACGGCGGCCACG	–
Rv1779c	–	−89	TGTGAACAACACCACA	–
Rv1780	–	−147	TGTGGTGTTGTTCACA	–
Rv1890c	–	−7	TGTGTCGTGGCCCACA	–
Rv1891^e,g	–	−63	TGTGGGCCACGACACA	Rv1891–Rv1892–Rv1893^e
Rv2145c^a,f	wag31	−463	CGTGACTGGCGTCCCA	–
Rv2172c^e	–	−2	TGTGACCCTCAACACG	–
Rv2180c	–	−304	TGTGTGGAACAACACA	–
Rv2201^a,f	asnB	−336	GCTGTCGAAGACCACG	–
Rv2258c	–	−459	GGTGACGTCGACCACG	–
Rv2362c	recO	−224	TGTGGGCTGGCTCACA	Rv2362c–Rv2361c^f–Rv2360c
Rv2377c	mbtH	−268	TGTGGTTCACCTCACT	–
Rv2406c	–	−34	TGTGAACCAGCTCACC	–
Rv2407	–	−242	GGTGAGCTGGTTCACA	–
Rv2428^c	ahpC	−93	GGTGTGATATATCACC	–
Rv2450c	rpfE	−509	TGTGGCGCAGGTCACC	–
Rv2450c	rpfE	−422	CGTGATTCGGCTCACG	–
Rv2455c	–	−237	AGTGACCAATACCACA	Rv2455c–Rv2454c–Rv2453c
Rv2650c	–	−305	CGTGAGGAGCCTCACG	–
Rv2699c	–	−116	TGTGATGTAAATCACA	–
Rv2700^e,f	–	−138	TGTGATTTACATCACA	–
Rv2712c	–	−296	GGTGAGGTAGAGCACA	–
Rv2835c	ugpA	−513	GGTGATGCCGGGCACG	Rv2835c–Rv2834c^a–Rv2833c^a,f–Rv2832c^a,f
Rv2874	dipZ	−351	TGTGGCGGAGTTCACA	–
Rv3003c	ilvB1	−335	TGTGGTGGCCACCCCA	Rv3003c^a,f–Rv3002c^a,f–Rv3001c^a,f
Rv3048c^a,f	nrdF2	−2	GGTGACTGGAAACGCA	–
Rv3053c	nrdH	−347	GGTGATCTGCGACACG	Rv3053c–Rv3052c–Rv3051c^f
Rv3217c^a,e	–	−278	TGTGGTGGCGGTCGCA	–
Rv3219^a,c	whiB1	−176	AGTGAGATAGCCCACG^b	–
Rv3613c^c	–	−458	CGTGACGAATCCCCCA	–
Rv3617	ephA	−315	TGTGACCGGTGTCACT	Rv3617–Rv3618
Rv3645	–	−179	TGTGAGCCGAATCACG	–
Rv3681c^a	whiB4	−106	TGAGATACAGGTAACA	–
Rv3729	–	−190	TGTGACCACGGCCACG	–
Rv3843c	–	−505	GGTGAGGTAAGTCACA	Rv3843c^e–Rv3842c^g
Rv3856c	–	−547	TGTGGGCTTCGTCACA	–
Rv3857c	–	−341	TGTGGGCTTCGTCACA^b	–
Consensus			TGTGANNNNNNTCACA

Crp binding sites detected by the TFBS search of the introduced pipeline and by the additional TFBS search with adopted and optimized PWMs. Bold letters indicate conserved pentamers of the motif. Codes:

^a Transferred target gene from CG.

^b Experimentally verified binding site by EMSA/ChIP/RT-PCR (21–23).

^c Gene showed altered expression in microarray studies of ΔRv3676 versus WT (24).

^d Motif position relative to the translation start site.

^e Core gene.

^f Essential gene.

^g Gene involved in virulence processes.

In total, we identified 207 putative target genes of Crp, organized in 121 transcription units (see Table 3 and Figure 4). Of this set, 17 genes belong to the mycobacterial core regulon (62) and 41 were reported as essential for M. tuberculosis (63,64). Furthermore, at least 17 genes of the suggested regulon are connected to antibiotic resistance and virulence of M. tuberculosis (65–69). Based on annotation information for M. tuberculosis (69), knowledge about orthologous C. glutamicum genes and operon structures, we attributed individual target genes to distinct functional modules.

Figure 4.

Reconstructed network of the GlxR ortholog Crp. The network reconstruction of the Crp regulon is based on the 121 transcription units presented in Table 3. It was generated by the integrated network reconstruction tool GraphVis of MycoRegNet. Transcription units relying on binding site predictions/experimental verifications that were reported previously in (22–24,60) and correspond with our findings are colored according to the appropriate publication. Arrows and gene IDs (node labels) coloured in red indicate a repressive regulation of Crp, green arrows correspond to an activating regulation.

Similar to present knowledge on GlxR, the results implicate Crp in the regulation of several functional modules such as carbohydrate metabolism (40 target genes), fatty acid metabolism (33 target genes), respiration (16 target genes) and nitrogen assimilation (7 target genes). Therefore, the position of the GlxR homolog Crp as global regulator in the transcriptional regulatory network seems to be conserved in M. tuberculosis. It is interesting to note that the suggested regulon comprises genes involved in essential functional modules, e.g. the citrate cycle, as well as genes involved in the synthesis of the cellular envelope, which plays an important role in the virulence of M. tuberculosis. Together with the supposed regulation of further virulence-associated genes, this might explain why a functional Crp is required for virulence in model systems (24).

CONCLUSIONS

With MycoRegNet, we have set up a system that allows researchers of the tuberculosis community to perform comprehensive analyses and visualizations of the gene regulatory network of MT. With its TFBS prediction, it further provides easy access to a method that helps to generate new hypotheses in silico. As the sister project to CoryneRegNet, the MycoRegNet database content was generated through our comparative genomics pipeline, which provided us with reliable transfers of gene regulatory interactions from the reference organism C. glutamicum to M. tuberculosis. With MycoRegNet, the corresponding data are publicly available and can be accessed easily through the web interface, or in a well-structured manner by using the MycoRegNet Web Service, to support the reconstruction, visualization and validation of mycobacterial regulatory networks at different hierarchical levels. Taken together, MycoRegNet is a reference resource for the tuberculosis community to gain a better understanding of the complex interrelations of transcriptional gene control. It has the potential to assist researchers in the development of new vaccines and drugs to treat and prevent tuberculosis. Although MycoRegNet was initially designed for MT, it may also serve other mycobacterial strains in the future, such as the already integrated M. tuberculosis CDC1551.

65 in total

Review 1. The mechanisms of carbon catabolite repression in bacteria.

Authors: Josef Deutscher
Journal: Curr Opin Microbiol Date: 2008-03-21 Impact factor: 7.934

2. Triple transcriptional control of the resuscitation promoting factor 2 (rpf2) gene of Corynebacterium glutamicum by the regulators of acetate metabolism RamA and RamB and the cAMP-dependent regulator GlxR.

Authors: Britta Jungwirth; Denise Emer; Iris Brune; Nicole Hansmeier; Alfred Pühler; Bernhard J Eikmanns; Andreas Tauch
Journal: FEMS Microbiol Lett Date: 2008-03-18 Impact factor: 2.742

Review 3. Carbon catabolite repression in bacteria: many ways to make the most out of nutrients.

Authors: Boris Görke; Jörg Stülke
Journal: Nat Rev Microbiol Date: 2008-08 Impact factor: 60.633

4. Genome scale portrait of cAMP-receptor protein (CRP) regulons in mycobacteria points to their role in pathogenesis.

Authors: Yusuf Akhter; Sailu Yellaboina; Aisha Farhana; Akash Ranjan; Niyaz Ahmed; Seyed E Hasnain
Journal: Gene Date: 2007-10-22 Impact factor: 3.688

5. The GlxR regulon of the amino acid producer Corynebacterium glutamicum: in silico and in vitro detection of DNA binding sites of a global transcription regulator.

Authors: Thomas A Kohl; Jan Baumbach; Britta Jungwirth; Alfred Pühler; Andreas Tauch
Journal: J Biotechnol Date: 2008-06-03 Impact factor: 3.307

6. Effect of carbon source availability and growth phase on expression of Corynebacterium glutamicum genes involved in the tricarboxylic acid cycle and glyoxylate bypass.

Authors: Sung Ok Han; Masayuki Inui; Hideaki Yukawa
Journal: Microbiology Date: 2008-10 Impact factor: 2.777

7. TB database: an integrated platform for tuberculosis research.

Authors: T B K Reddy; Robert Riley; Farrell Wymore; Phillip Montgomery; Dave DeCaprio; Reinhard Engels; Marcel Gellesch; Jeremy Hubble; Dennis Jen; Heng Jin; Michael Koehrsen; Lisa Larson; Maria Mao; Michael Nitzberg; Peter Sisk; Christian Stolte; Brian Weiner; Jared White; Zachariah K Zachariah; Gavin Sherlock; James E Galagan; Catherine A Ball; Gary K Schoolnik
Journal: Nucleic Acids Res Date: 2008-10-03 Impact factor: 16.971

8. GenBank.

Authors: Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; Eric W Sayers
Journal: Nucleic Acids Res Date: 2008-10-21 Impact factor: 16.971

9. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation.

Authors: Socorro Gama-Castro; Verónica Jiménez-Jacinto; Martín Peralta-Gil; Alberto Santos-Zavaleta; Mónica I Peñaloza-Spinola; Bruno Contreras-Moreira; Juan Segura-Salazar; Luis Muñiz-Rascado; Irma Martínez-Flores; Heladia Salgado; César Bonavides-Martínez; Cei Abreu-Goodger; Carlos Rodríguez-Penagos; Juan Miranda-Ríos; Enrique Morett; Enrique Merino; Araceli M Huerta; Luis Treviño-Quintanilla; Julio Collado-Vides
Journal: Nucleic Acids Res Date: 2007-12-23 Impact factor: 16.971

10. CoryneRegNet 4.0 - A reference database for corynebacterial gene regulatory networks.

Authors: Jan Baumbach
Journal: BMC Bioinformatics Date: 2007-11-06 Impact factor: 3.169
