iCR: a web tool to identify conserved targets of a regulatory protein across the multiple related prokaryotic species.

Sarita Ranjan, Jayshree Seshadri, Vaibhav Vindal, Sailu Yellaboina, Akash Ranjan.

Abstract

Gene regulatory circuits are often shared between closely related organisms. Our web tool iCR (identify Conserved target of a Regulon) makes use of this fact to identify conserved targets of a regulatory protein. iCR is a refined extension of our previous tool PredictRegulon, which predicts, genome-wide, the potential binding sites and target operons of a regulatory protein in a single user-selected genome. Like PredictRegulon, iCR accepts known binding sites of a regulatory protein as an ungapped multiple sequence alignment and reports the potential binding sites. The important differences are that the user can select more than one genome at a time and that the output reports the genes that are common to two or more species. To achieve this, iCR makes use of Cluster of Orthologous Group (COG) indices for the genes. The tool analyses the upstream regions of all user-selected prokaryotic genomes and organizes the output by conservation of target orthologs. iCR also reports the functional class codes, based on COG classification, for the proteins encoded by the downstream genes, which helps the user understand the nature of the co-regulated genes on the result page itself. iCR is freely accessible at http://www.cdfd.org.in/icr/.

Year:  2006        PMID: 16845075      PMCID: PMC1538900          DOI: 10.1093/nar/gkl202

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Over the last decade and a half, genomes of microorganisms have been sequenced at a highly accelerated pace. However, extracting useful information from such a large pool of genome data has become a major challenge of the post-genomic era. One approach to this problem is to organize a large and complex genome into ordered, manageable subsystems that can be tackled systematically. An important example of this approach is the study of cellular processes and the associated gene expression in terms of gene regulatory circuits. Each of these circuits contains a regulator and a list of its target sites (motifs) located upstream of the subset of genes being regulated (1–3). Such an approach helps us understand how the constituent genes of a genome come together to execute the metabolic and physiological processes of a cell in response to a given regulator. A large number of experimental and computational approaches are being used to understand how these genes act together to perform a physiological function. The experimental approaches typically include microarray analysis of the transcriptome (4,5). Once the experimental data have been gathered, computational approaches are applied to search for common regulatory motifs and promoters upstream of the up- and down-regulated genes (6). Computational tools such as PHYLONET (7), BioProspector (8,9), Compare Prospector (9,10), MDscan (9,11), Motif Regressor (12), Bio Optimizer (13) and PhyME (14) are available for this purpose, but most of these are either designed for eukaryotes or written to analyze experimental data, such as microarray data, in terms of gene regulation. An alternative approach is to first select the regulator associated with a cellular process and then use a computational approach to identify the potential targets of that regulatory protein, which can subsequently be followed up by experiments to validate the computationally identified targets.
As a first step in this direction, we previously proposed a tool called PredictRegulon, which finds targets of a regulatory protein in a genome based on a limited set of known binding motifs (15). We have successfully used this tool to identify and validate the DtxR and IdeR targets in corynebacteria and mycobacteria, respectively (16,17). However, an important limitation of PredictRegulon is that it searches one genome at a time. Searching multiple genomes simultaneously offers many advantages, the most important being the ability to reveal regulatory targets that are conserved across multiple related genomes. This increases the confidence of experimental biologists in taking up experimental validation. Further, we felt that grouping the targets by the class of genes being regulated would give an overall picture of the impact of the regulator on the physiology of the organism. We describe here iCR (identify Conserved target of a Regulon), a web server tool for the identification of conserved, high-priority targets of a regulatory protein from heterologous sequence data of prokaryotes (which include the regulatory sequences of genes and of their orthologs in other species), where the user can easily distinguish biologically important motifs from background noise based on their cross-species conservation.

PROGRAM DESCRIPTION

iCR is a CGI-based web application written in Perl and C. It uses a Shannon relative entropy based profile search method, similar to the one used in the PredictRegulon tool. The application can use the available experimental data on binding sites of a transcription regulatory protein (18–20) to identify the regulons of a given regulator in the genomes of various phylogenetically related bacterial species. iCR is composed of three parts (Figure 1): (i) a front-end web interface for submitting the block-aligned known binding motifs and for selecting the species of choice; (ii) a search engine for scanning the upstream sequences; and (iii) a classification and reporting system that renders the textual output produced by iCR into a meaningful grouping. Each of these components is discussed in detail in the help pages linked from the iCR home page. A brief description is given here.
Figure 1

Architecture of iCR. iCR is a CGI application that collects input from the user through HTML forms (A). B represents a Perl script that gathers the input from A and launches the search engine (C), which looks up genome sequences and their annotations (D) and returns the potential targets as output. This output is further classified based on COG, class or genome (E), and the classified output is returned as HTML (F).

Input submission

iCR provides a web-based form for input submission. The input form consists of two HTML pages. The first page accepts the sample motifs and the parameters defining the upstream region. On this page the known motifs can be copied either from the sample input form or from any authentic source and then pasted into the web form in a block-aligned fashion. The second page has a list of genomes organized in a taxonomically meaningful order, which makes it convenient to select multiple related species at a time. Finally, the user specifies the basis on which the predicted motifs are to be grouped or classified. The default and preferred option is Cluster of Orthologous Group (COG).
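The block-aligned input requirement can be illustrated with a minimal sketch, assuming the motifs are pasted one per line. The function name `parse_aligned_sites` and the three example sites (taken from the Site column of Table 1 purely for illustration) are not part of iCR, whose actual form handling is not described here.

```python
def parse_aligned_sites(text):
    """Parse pasted binding sites (one per line) into an ungapped block alignment.

    Hypothetical helper: it only checks the two properties the text asks for,
    namely that the sites are ungapped DNA and all of the same length.
    """
    sites = [line.strip().upper() for line in text.splitlines() if line.strip()]
    if not sites:
        raise ValueError("no binding sites supplied")
    width = len(sites[0])
    for site in sites:
        if len(site) != width:
            raise ValueError(f"site {site!r} is not {width} nt long; block is not aligned")
        if set(site) - set("ACGT"):
            raise ValueError(f"site {site!r} contains gaps or non-ACGT characters")
    return sites


# Illustrative input only: three 14-nt sites borrowed from Table 1.
pasted = """
AGAACGAGTGTTTG
CGAACAAATGTTTG
AGAACTTATGTTTG
"""
print(parse_aligned_sites(pasted))
```

Rejecting ragged or gapped input up front keeps every column of the alignment well defined, which is what a profile search needs.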

Search engine

Parameters accepted from the input forms are passed to a search engine, which uses the Shannon relative entropy based profile scan method to scan the upstream sequences for regulatory motifs. This method is described in our previous paper on PredictRegulon (15). Here, however, the analysis is carried out on multiple user-selected genomes and the results are compiled together. Since complete COG data were not available for many of the genes in the various genomes, we updated these data by running COGNITOR (21,22). Each COG selected represents the best hits to proteins from at least three lineages. The search results are classified and grouped according to one of three options: orthology, functional class code or genome. Classification based on orthology (the default option) lists all the orthologous targets of a regulator together, emphasizing that these are conserved targets of a given regulon.
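A profile scan of this general kind can be sketched as follows. This is a generic illustration, not the formula used by iCR or PredictRegulon (which is given in ref. 15): each column's relative entropy against a uniform background weights the frequency of the base observed in the window. The uniform background, the pseudocount, the cut-off taken as the lowest score among the input sites, the position convention and all function names are assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_profile(sites, pseudocount=1.0):
    """Per-column base frequencies and relative entropy (bits) vs. a uniform background."""
    profile = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        freqs = {b: (column.count(b) + pseudocount) / total for b in BASES}
        # Relative entropy of the column against a background of 0.25 per base (assumption).
        info = sum(f * math.log2(f / 0.25) for f in freqs.values())
        profile.append((freqs, info))
    return profile


def score_window(profile, window):
    """Sum over columns of (column relative entropy) x (frequency of the observed base)."""
    return sum(info * freqs.get(base, 0.0) for (freqs, info), base in zip(profile, window))


def scan_upstream(profile, upstream, cutoff):
    """Slide the profile along an upstream sequence; return (position, site, score) hits."""
    width = len(profile)
    hits = []
    for i in range(len(upstream) - width + 1):
        window = upstream[i:i + width]
        score = score_window(profile, window)
        if score >= cutoff:
            # Assumed convention: position of the site start relative to the gene start.
            hits.append((i - len(upstream), window, score))
    return hits


# Illustrative input only: three sites borrowed from Table 1, not the PRODORIC query set.
known_sites = ["AGAACGAGTGTTTG", "CGAACAAATGTTTG", "AGAACTTATGTTTG"]
profile = build_profile(known_sites)
cutoff = min(score_window(profile, s) for s in known_sites)  # assumed cut-off rule

upstream = "TTTTTTTTTT" + "CGAACAAATGTTTG" + "CCCCCC"  # toy upstream region
for position, site, score in scan_upstream(profile, upstream, cutoff):
    print(position, site, round(score, 3))
```

Running the same scan over the upstream regions of every selected genome and pooling the hits gives the compiled result that the classification step then groups.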

Output

All the predicted and classified target motifs are presented as an HTML table. The table has the following columns: COG name, functional class code, genome, motif score, motif, gene ID as given in NCBI's ptt table, ORF number and gene product. The program predicts a number of motifs; a blue background marks the high-scoring motifs above the cut-off value, and a yellow background marks motifs that exactly match the known binding sites.
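The orthology-based grouping behind this table can be sketched as below, assuming each predicted hit is a record carrying its COG, genome and score. The function name `conserved_targets`, the record layout and the `min_genomes` parameter are hypothetical; the threshold of two genomes follows the statement that the output reports genes common to two or more species. The sample rows are a small subset of Table 1.

```python
from collections import defaultdict


def conserved_targets(hits, min_genomes=2):
    """Group hits by COG and keep the COGs that were hit in at least `min_genomes` genomes."""
    by_cog = defaultdict(list)
    for hit in hits:
        by_cog[hit["cog"]].append(hit)

    conserved = {}
    for cog, rows in by_cog.items():
        genomes = {row["genome"] for row in rows}
        if len(genomes) >= min_genomes:
            # Highest-scoring sites first within each orthologous group.
            conserved[cog] = sorted(rows, key=lambda row: row["score"], reverse=True)
    return conserved


# Subset of Table 1 rows, used only to illustrate the grouping.
hits = [
    {"cog": "COG1974", "genome": "NC_004193", "score": 4.6875, "gene": "lexA"},
    {"cog": "COG1974", "genome": "NC_003030", "score": 4.77125, "gene": "lexA"},
    {"cog": "COG0468", "genome": "NC_002570", "score": 4.64423, "gene": "recA"},
    {"cog": "COG0468", "genome": "NC_003366", "score": 5.01207, "gene": "recA"},
    {"cog": "COG0457", "genome": "NC_003098", "score": 4.47855, "gene": "rggD"},
]

for cog, rows in conserved_targets(hits).items():
    print(cog, [(row["genome"], row["score"]) for row in rows])
```

In this toy subset COG0457 appears in only one genome and is therefore dropped, while COG1974 and COG0468 are kept as conserved groups.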

Example usage

To demonstrate a typical application of iCR's regulon assignments, we used known LexA-binding sites from Bacillus subtilis as the query set. These sites were collected from PRODORIC (19). We then selected several species belonging to the Firmicutes (Bacillales, Lactobacillales, Clostridia and Mollicutes) for a simultaneous search. In the result classified by COG, DNA motifs upstream of lexA (COG1974), recA (COG0468), uvrB (COG0556), dinP (COG0389), rpsE (COG0098), rpsN (COG0199), rggD (COG0457) and others were picked up in several species together, and these genes therefore qualify as conserved targets of the LexA regulon (Table 1). LexA is known to autoregulate its own expression (23). The recA gene has been shown experimentally to be part of the LexA regulon in Escherichia coli as well as in B.subtilis (23,24). Homologs of dinP have also been shown to be regulated by the LexA protein in Bdellovibrio bacteriovorus (25). The LexA protein has been reported to interact with the regulatory region of uvrB in B.subtilis (19). Taken together, these observations indicate that the program can identify significant, high-priority targets of a given regulator. The result also highlights many motifs upstream of hypothetical genes/ORFs. Experimental confirmation that LexA interacts with these motifs, followed by a functional assay based on the processes known to involve the regulator, could shed more light on the function of these hypothetical genes.
Table 1

Output of iCR showing the conserved targets of the LexA regulon in Firmicutes

COG      Class  Genome     Score    Position  Site            Gene    Synonym
COG1974  K      NC_004193  4.6875   −77       AGAACGAGTGTTTG  lexA    OB1669
COG1974  K      NC_003030  4.77125  −84       AGAACATAAGTTTG  lexA    CAC1832
COG1974  K      NC_002745  4.88271  −71       CGAACAAATGTTTG  lexA    SA1174
COG1974  K      NC_004557  4.82946  −80       AGAACATAAGTTTG  lexA    CTC01298
COG1974  K      NC_003366  4.83493  −70       AGAACATAAGTTTG  lexA    CPE1161
COG1974  K      NC_002570  4.72756  −77       AGAACTTATGTTTG  lexA    BH2356
COG1974  K      NC_000964  4.81601  −118      CGAACCTATGTTTG  lexA    BSU17850
COG1974  K      NC_003923  4.88303  −71       CGAACAAATGTTTG  lexA    MW1226
COG1974  K      NC_003212  4.82162  −79       CGAACCTTTGTTTG          LIN1340
COG1974  K      NC_002758  4.88182  −138      CGAACAAATGTTTG  lexA    SAV1339
COG1974  K      NC_003210  4.81541  −79       CGAACCTTTGTTTG          LMO1302
COG0468  L      NC_002570  4.64423  −121      CGAATAAATGTTCG  recA    BH2383
COG0468  L      NC_003212  4.67474  −138      CGAATAAATGTTCG  recA    LIN1435
COG0468  L      NC_003210  4.66915  −138      CGAATAAATGTTCG  recA    LMO1398
COG0468  L      NC_003923  4.40442  −143      AGCACGTTTGTTCG  recA    MW1168
COG0468  L      NC_002758  4.40302  −80       AGCACGTTTGTTCG  recA    SAV1285
COG0468  L      NC_003030  4.90549  −48       AGAACAAATGTTCG  recA    CAC1815
COG0468  L      NC_003366  5.01207  −34       AGAACTTATGTTCG  recA    CPE1673
COG0468  L      NC_004461  4.42484  −143      AGTACGTTTGTTCG          SE0963
COG0468  L      NC_000908  4.18494  −236      TGAACTGTTGTATG  recA    MG339
COG0468  L      NC_002745  4.40405  −143      AGCACGTTTGTTCG  recA    SA1128
COG0468  L      NC_004557  4.9426   −54       AGAACAGATGTTCG  recA    CTC01289
COG0556  L      NC_000964  4.7767   −122      CGAACTTTAGTTCG  uvrB    BSU35170
COG0556  L      NC_003923  4.8228   −105      CGAACAAACGTTTG  uvrB    MW0720
COG0556  L      NC_002745  4.82248  −105      CGAACAAACGTTTG  uvrB    SA0713
COG0556  L      NC_003030  4.93204  −29       CGAACAAATGTTTG  uvrB    CAC0502
COG0556  L      NC_002758  4.82157  −103      CGAACAAACGTTTG  uvrB    SAV0758
COG0556  L      NC_004193  4.65391  −69       CGAATACTTGTTCG          OB2488
COG0556  L      NC_003212  4.62091  −158      CGAAAATATGTTCG  uvrB    LIN2632
COG0556  L      NC_003210  4.61721  −160      CGAAAATATGTTCG  uvrB    LMO2489
COG0556  L      NC_004461  4.90087  −128      CGAACAAATGTTTG          SE0541
COG0389  L      NC_003366  4.82409  −26       TGAACATATGTTTG  dinP    CPE1566
COG0389  L      NC_003923  4.77999  −49       GGAACACGTGTTCG          MW1251
COG0389  L      NC_002758  4.33641  −6        AGAACATTTGTTCT          SAV1364
COG0389  L      NC_002745  4.81919  −49       AGAACACGTGTTCG          SA1196
COG0389  L      NC_003210  4.72978  −33       AGAACGCTTGTTCG          LMO1975
COG0389  L      NC_004461  4.32424  −75       AGAACAAATGTTCT          SE1046
COG0389  L      NC_003212  4.73647  −33       AGAACGCTTGTTCG          LIN2082
COG0389  L      NC_004557  4.82946  −40       AGAACATAAGTTTG          CTC00437
COG0389  L      NC_000964  4.37402  −68       CGAACATAAGTTCT  yqjW    BSU23710
COG0199  J      NC_004368  4.29015  −280      TGAACGTATGTACG          GBS0071
COG0199  J      NC_002662  4.9713   −280      CGAACGTATGTTCG  rpsN    L0391
COG0199  J      NC_003028  4.22998  −280      TGAACGTATGTACG          SP0222
COG0199  J      NC_003098  4.22977  −280      TGAACGTATGTACG  rpsN    SPR0202
COG0199  J      NC_002737  4.41534  −278      CGAACGTATGTACG  rpsN    SPY0064
COG0199  J      NC_003485  4.41477  −278      CGAACGTATGTACG  rpsN    SPYM18_0065
COG0199  J      NC_004432  4.29794  −140      CGAAATTGTGTATG          MYPE10040
COG0199  J      NC_004070  4.41397  −278      CGAACGTATGTACG  rpsN.1  SPYM3_0053
COG0199  J      NC_004116  4.2898   −280      TGAACGTATGTACG          SAG0071
COG1396  K      NC_003485  4.21446  −8        AGAAACCATGTTAG          SPYM18_0038
COG1396  K      NC_003923  4.32136  −263      GGAACAAGTGTACG          MW1228
COG1396  K      NC_004070  4.21434  −8        AGAAACCATGTTAG          SPYM3_0031
COG1396  K      NC_002570  4.4873   −118      GGAACGGGCGTTTG          BH0096
COG1396  K      NC_003028  4.47861  −127      TGAACAAATGTTGG          SP1115
COG1396  K      NC_002737  4.21453  −8        AGAAACCATGTTAG          SPY0037
COG1396  K      NC_004193  4.36271  −253      TGAACAGGAGTTAG          OB3501
COG1396  K      NC_003366  4.35319  −58       TGAACATTTGATTG          CPE2564
COG0098  J      NC_003028  4.38376  −109      AGAAGTGGTGTTCG          SP0227
COG0098  J      NC_004116  4.25066  −110      TGAAGTGGTGTTTG  rpsE    SAG0075
COG0098  J      NC_002737  4.23373  −110      TGAAGTGGTGTTTG  rpsE    SPY0069
COG0098  J      NC_004368  4.25082  −110      TGAAGTGGTGTTTG  rpsE    GBS0075
COG0098  J      NC_003098  4.38367  −109      AGAAGTGGTGTTCG  rpsE    SPR0206
COG0098  J      NC_004070  4.23345  −110      TGAAGTGGTGTTTG  rpsE    SPYM3_0057
COG0098  J      NC_004350  4.24208  −109      TGAAGTGGTGTTTG  rs5     SMU.2009
COG0098  J      NC_003485  4.23361  −110      TGAAGTGGTGTTTG  rpsE    SPYM18_0069
COG0457  R      NC_004557  4.34686  −240      GGAAGAAGAGTTTG          CTC02554
COG0457  R      NC_002570  4.38934  −268      CGAAGCAACGTTTG          BH3054
COG0457  R      NC_004557  4.39536  −233      AGAACAATTGTATG          CTC01089
COG0457  R      NC_002745  4.37946  −17       AGAAATGAGGTTCG          SA1448
COG0457  R      NC_003923  4.3797   −17       AGAAATGAGGTTCG          MW1570
COG0457  R      NC_003098  4.47855  −97       TGAACAAATGTTGG  rggD    SPR1022
COG0457  R      NC_002758  4.3788   −86       AGAAATGAGGTTCG          SAV1620

Note: The Gene and Synonym columns are as given in the NCBI ptt table. Class codes: K, transcription; L, DNA replication, recombination and repair; J, translation, ribosomal structure and biogenesis; and so on.

To test the sensitivity of the iCR predictions, we deleted two important known binding motifs of the LexA protein (present upstream of dinB and uvrB in B.subtilis) from the input form and selected two species of Bacillales, B.subtilis and Bacillus halodurans. Both motifs were recovered on the result page with a blue background, that is, scoring above the cut-off, which supports the reliability of the predictions. iCR results can therefore serve as a useful starting point for molecular and cellular biologists designing experiments to examine the in vitro and in vivo effects of a regulatory protein in different systems.

CONCLUSION

To summarize, iCR is a web server that permits high-throughput, detailed and fully automated prediction of potential binding targets of a regulatory protein in user-selected prokaryotic species. iCR covers 115 prokaryotic species, arranged phylogenetically on the web interface. The entries in the first column of the result page, COG, are hyperlinked to NCBI and are fully navigable, giving users easy access to related and more descriptive information. The genome column shows the genome ID, which is hyperlinked to an HTML page listing the genome names corresponding to the different IDs. For the user's convenience, the functional class code column is also linked to a page describing all the codes. iCR's strengths are its free web accessibility, the ability to choose multiple species at a time, the sorting of results by COG and class, and its interactive graphical interface.
REFERENCES (25 in total)

1.  PredictRegulon: a web server for the prediction of the regulatory protein binding sites and operons in prokaryote genomes.

Authors:  Sailu Yellaboina; Jayashree Seshadri; M Senthil Kumar; Akash Ranjan
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

2.  BioOptimizer: a Bayesian scoring function approach to motif discovery.

Authors:  Shane T Jensen; Jun S Liu
Journal:  Bioinformatics       Date:  2004-02-12       Impact factor: 6.937

3.  The COG database: a tool for genome-scale analysis of protein functions and evolution.

Authors:  R L Tatusov; M Y Galperin; D A Natale; E V Koonin
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

Review 4.  Chromosomal organization is shaped by the transcription regulatory network.

Authors:  Ruth Hershberg; Esti Yeger-Lotem; Hanah Margalit
Journal:  Trends Genet       Date:  2005-03       Impact factor: 11.639

5.  Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli.

Authors:  G Balázsi; A-L Barabási; Z N Oltvai
Journal:  Proc Natl Acad Sci U S A       Date:  2005-05-20       Impact factor: 11.205

6.  Purified lexA protein is a repressor of the recA and lexA genes.

Authors:  J W Little; D W Mount; C R Yanisch-Perron
Journal:  Proc Natl Acad Sci U S A       Date:  1981-07       Impact factor: 11.205

7.  Computational prediction and experimental verification of novel IdeR binding sites in the upstream sequences of Mycobacterium tuberculosis open reading frames.

Authors:  Prachee Prakash; Sailu Yellaboina; Akash Ranjan; Seyed E Hasnain
Journal:  Bioinformatics       Date:  2005-03-03       Impact factor: 6.937

8.  Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

Authors:  Yueyi Liu; X Shirley Liu; Liping Wei; Russ B Altman; Serafim Batzoglou
Journal:  Genome Res       Date:  2004-03       Impact factor: 9.043

9.  PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences.

Authors:  Saurabh Sinha; Mathieu Blanchette; Martin Tompa
Journal:  BMC Bioinformatics       Date:  2004-10-28       Impact factor: 3.169

10.  Prediction of DtxR regulon: identification of binding sites and operons controlled by Diphtheria toxin repressor in Corynebacterium diphtheriae.

Authors:  Sailu Yellaboina; Sarita Ranjan; Prachee Chakhaiyar; Seyed Ehtesham Hasnain; Akash Ranjan
Journal:  BMC Microbiol       Date:  2004-09-24       Impact factor: 3.605
