
Genomic arrangement of regulons in bacterial genomes.

Han Zhang, Yanbin Yin, Victor Olman, Ying Xu.

Abstract

Regulons, as groups of transcriptionally co-regulated operons, are the basic units of cellular response systems in bacterial cells. While the concept has been widely used in bacterial studies since it was first proposed in 1964, very little is known about how a regulon's component operons are arranged in a bacterial genome. We present a computational study to elucidate the organizational principles of regulons in a bacterial genome, based on the experimentally validated regulons of E. coli and B. subtilis. Our results indicate that (1) genomic locations of transcription factors (TFs) are under stronger evolutionary constraints than those of the operons they regulate, so changing a TF's genomic location will have a larger impact on the bacterium than changing the genomic position of any of its target operons; (2) operons of regulons are generally not uniformly distributed in the genome but tend to form a few closely located clusters, which generally consist of genes working in the same metabolic pathways; and (3) the global arrangement of the component operons of all the regulons in a genome tends to minimize a simple scoring function, indicating that the global arrangement of regulons follows simple organizational principles.

Year:  2012        PMID: 22235300      PMCID: PMC3250446          DOI: 10.1371/journal.pone.0029496

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Regulons are the basic units of cellular response systems in bacterial cells, and represent one of the most basic concepts in bacterial studies. A bacterial regulon is a group of operons that are transcriptionally co-regulated by the same regulatory machinery, consisting of trans regulators (transcription factors, or simply TFs) and cis regulatory binding elements in the promoters of the operons they regulate. Operationally, a regulon contains the operons regulated by the same transcription factor. Since the term regulon was first proposed in 1964 [1], 173 regulons have been fully or partially identified in E. coli K12 [2] and many more in other bacteria, e.g., B. subtilis. Loosely speaking, regulons can be categorized into two classes, local and global regulons, with the former consisting of only a few component operons and the latter having a relatively large number of operons [3]. While the functionalities of the known regulons have been well studied, very little is known about how regulons are organized in a bacterial genome. The only published related work is by Janga et al. [4], which found that for small regulons, TFs tend to be located close to their targets (TGs). We present a computational study here to elucidate the organizational principles of regulons in a bacterial genome. We have carried out our study on E. coli K12 and also on B. subtilis str. 168 to demonstrate the generality of the results. Our key findings are (1) operons of each regulon tend to form a few closely located clusters along the genome; (2) TFs are under stronger evolutionary constraints than their TGs; and (3) the global arrangement of the component operons of all the (known) regulons in a genome tends to minimize a simple scoring function.

Results and Discussion

We have examined all the 3,684 regulatory relationships between TFs and their TGs in RegulonDB [2], involving 173 TFs and 729 TGs forming 173 regulons. We assigned genes to operons based on the operon information in the DOOR database (14) (http://csbl1.bmb.uga.edu/OperonDB) of E. coli K12. Among these regulatory relationships, 105 TFs are self-regulated; 123 (71%) of the 173 TFs regulate more than one TG; 411 (56%) of the 729 TGs are regulated by more than one TF; and 131 (18%) of the TGs are also TFs, so they are regulated by upstream TFs while regulating downstream targets in the global transcription regulation network.

Operons in a regulon tend to form clusters in terms of their genomic locations

Intuitively, we would expect operons in a regulon to stay close to each other in a genome to facilitate efficient co-regulation, an argument used earlier to explain the formation of operons [5]. To test if this is indeed the case, we examined the distribution of the distances between two neighboring operons within a regulon, measured as the (smallest) number of operons between the two operons (we do not consider the orientations of operons). We noted that 523 (32%) of the 1,624 such distances in the 173 regulons are less than two (and 47% are less than 10), as shown in Figure 1A, suggesting that the component operons in a regulon tend to cluster together, although they may form multiple clusters. This remains true for all large regulons, which are defined in this study as regulons with more than 5 component operons. For example, crp, the largest regulon in E. coli, consists of 230 operons, 86 of which (37%) have distances less than two. As a control, we checked the same distance distribution calculated over 173 artificial regulons formed by randomly grouping E. coli K12 operons. Figure 1B shows this distance distribution, which is clearly very different from the one in Figure 1A. Similar observations were made when studying the regulons of B. subtilis (Figure 1C and 1D).
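The distance measure described above can be illustrated with a short Python sketch. It is not the authors' code: it assumes operons are indexed 0..n-1 in genome order, that "smallest" refers to the shorter way around a circular chromosome, and that neighboring operons are consecutive members of a regulon in sorted order. The function names and the toy regulon are hypothetical.

```python
def operons_between(i, j, n_operons):
    """Smallest number of operons separating indices i and j on a circular genome."""
    gap = abs(i - j)
    return min(gap, n_operons - gap) - 1


def neighbor_distances(positions, n_operons):
    """Distances between consecutive operons of one regulon, taken in genome order."""
    ordered = sorted(set(positions))
    return [operons_between(a, b, n_operons) for a, b in zip(ordered, ordered[1:])]


# Toy regulon on a hypothetical genome of 100 operons (illustrative data only).
toy_regulon = [3, 4, 6, 50, 51, 98]
distances = neighbor_distances(toy_regulon, 100)

# Fraction of neighbor distances below two, the statistic quoted in the text.
fraction_close = sum(d < 2 for d in distances) / len(distances)
```

Pooling these distances over all regulons, and over randomly grouped operons as a control, gives histograms of the kind compared in Figure 1.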
Figure 1

Component operons of regulons tend to be clustered.

The distance between two neighboring operons within a regulon is defined as the number of operons between the two operons. The bin width of the histogram is 1. (C) and (D) are similar to (A) and (B), respectively, but are for B. subtilis.

We hypothesize that operons in each cluster of each regulon encode enzymes participating in the same metabolic pathway. To check this, we consider a (maximal) list of operons of a regulon to form an operon cluster if the distance between each pair of neighboring operons in the list is less than five. Using this definition, we obtained 353 operon clusters (85% from large regulons, i.e., those with size >5 as defined above), 242 of which have at least two operons mapped to some SEED metabolic pathways (http://www.theseed.org/) [6]. Among them, 191 (79%) clusters have at least two operons participating in the same SEED pathway. Interestingly, among these 191 clusters, 158 (83%) have all their mapped operons participating in the same SEED pathways. These results do not change substantially when we adjust the distance cutoff used to define an operon cluster from 5 to any integer between 2 and 7 (Table S1). This suggests that each operon cluster generally consists of genes working in the same metabolic pathways. A similar observation (Table S2) was made when studying the B. subtilis regulons.
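The cluster definition above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: it treats the genome as linear for simplicity, assumes a cluster needs at least two operons (`min_size` is a hypothetical parameter), and uses an invented operon-to-pathway mapping in place of real SEED annotations.

```python
from collections import Counter


def operon_clusters(positions, cutoff=5, min_size=2):
    """Split one regulon into maximal runs whose neighbor distances are below cutoff."""
    ordered = sorted(set(positions))
    clusters, current = [], []
    for pos in ordered:
        # Number of operons between two neighbors is the index difference minus one.
        if current and pos - current[-1] - 1 >= cutoff:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_size]


def shares_pathway(cluster, pathways_of):
    """True if at least two operons of the cluster map to the same pathway."""
    counts = Counter(p for op in cluster for p in set(pathways_of.get(op, ())))
    return any(n >= 2 for n in counts.values())


# Hypothetical annotations keyed by operon index (not real SEED data).
toy_pathways = {3: {"pathway A"}, 4: {"pathway A"}, 6: {"pathway B"},
                50: {"pathway B"}, 51: {"pathway C"}}
toy_clusters = operon_clusters([3, 4, 6, 50, 51, 98])
same_pathway = [shares_pathway(c, toy_pathways) for c in toy_clusters]
```

Rerunning `operon_clusters` with `cutoff` set to each integer from 2 to 7 corresponds to the sensitivity check summarized in Table S1.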

Genomic locations of TFs are under stronger constraints than those of TGs

It has been observed that small regulons tend to have their TFs located close to their TGs [4], suggesting that the genomic location of a TF is under strong constraints from its TGs (that is, from the TGs' locations). A natural question is whether a TG is likewise under strong constraints from its TFs. We note that 56% of TGs are regulated by more than one TF in E. coli K12. From Figure 2A, we conclude that there is no significant difference (Wilcoxon test P-value = 0.31) between TGs regulated by one regulator and those regulated by multiple regulators in terms of the distances to their TFs, meaning that an operon does not have to stay close to its regulator even if it is controlled by only one regulator. This is not surprising because the regulator may control many targets, some of which might be close to the regulator while others are not. This finding contrasts with what we found for TFs controlling a small number of operons (Figure S1), suggesting that the genomic locations of TGs are generally under weaker constraints than those of TFs.
Figure 2

TFs are under stronger constraints than TGs.

(A) All TGs are categorized into two groups, TGs regulated by one TF and TGs regulated by multiple TFs. The distributions of the distances (y axis) from TGs to their TFs are shown as box-plots. (B) All TGs are categorized into two groups, TGs that regulate other TGs, and TGs that do not regulate other TGs. The distributions of the distances (y axis) from TGs to their upstream TFs are shown as box-plots. Throughout this paper, the distance between a TF-TG pair is defined as the number of operons between the two operons. (C) and (D) are similar to (A) and (B), respectively, but are for B. subtilis. P-values of Wilcoxon tests are shown between two neighboring boxes.

To study this further, we divided all TGs into two groups: TGs that are also TFs regulating downstream TGs, and TGs that are not TFs. Figure 2B shows that the first group of TGs tends to be located significantly closer (Wilcoxon test P-value = 1e−3) to their regulators than the second group, directly suggesting that TGs that are themselves TFs controlling further downstream targets are under stronger constraints from their upstream regulators than ordinary TGs. This is possibly due to the need for such TGs to have a faster reaction time in order to pass the regulatory signal down the regulatory network. We have also performed all the analyses for B. subtilis str. 168 using data from the DBTBS database [7]. The results are shown in Figure 2C and 2D and remain as significant as those observed in E. coli, strongly suggesting that the observations made above do not depend on which bacterium we use and hence may apply to bacterial organisms in general.
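The group comparisons above rely on a Wilcoxon test. The text does not state which variant or software was used, so the following Python sketch assumes the two-sample rank-sum (Mann-Whitney) form with a normal approximation, tie-corrected variance, and no continuity correction. The two distance lists are invented for illustration.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic, approximate p-value)."""
    n1, n2 = len(x), len(y)
    values = sorted(list(x) + list(y))
    rank_of, tie_term, i = {}, 0.0, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        t = j - i                                 # size of this group of ties
        rank_of[values[i]] = (i + 1 + j) / 2.0    # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    w = sum(rank_of[v] for v in x)                # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2.0                   # Mann-Whitney U for the first sample
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail


# Hypothetical TF-to-TG distances (number of operons in between).
tgs_that_are_tfs = [0, 1, 1, 3, 5]
ordinary_tgs = [4, 20, 35, 60, 90, 120]
u_stat, p_value = rank_sum_test(tgs_that_are_tfs, ordinary_tgs)
```

With only a handful of observations an exact test would be preferable; the normal approximation is more reasonable for group sizes like the several hundred TGs analyzed here.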

Global genomic arrangement of regulons

All these observations led to our main hypothesis that the genomic locations of the component operons of all the regulons encoded in a genome are determined by some global organizational principle. Specifically, based on our preliminary study, we hypothesize that the global genomic arrangement of regulons tends to minimize the following function:

$$D = \sum_{i=1}^{N} d_i$$

where N is the total number of regulons encoded in a genome and $d_i$ represents the total distance between the genomic location of the TF and all the TGs of the i-th regulon. Note that a similar formula has been used in our recent study on the genomic arrangements of metabolic pathways [8]. We have used the following procedure to demonstrate that the D value of all the known regulons encoded in the E. coli genome is significantly smaller than those of the vast majority of alternatively arranged genomes. Specifically, we have considered one million permutations of the genomic locations of X% of operons (both TFs and TGs) of E. coli K-12, for X = 10, 20, …, 100 (see Materials and Methods). Figure 3A shows the D value distributions for different percentages of reshuffled operon locations for the E. coli genome. We can clearly see that the current genomic arrangement of operons of E. coli K12 has a lower D value (the vertical dashed line) than the vast majority of the D values of the reshuffled genomes, which is also supported by statistical tests (all P-values<0.05, see Table S3). It is also interesting to see that reshuffling TFs increases D values considerably more than reshuffling the same number of TGs (Figure 3B, P-value<0.05), consistent with our observation based on Figure 2 that TFs are under stronger constraints than TGs in terms of their genomic locations.
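A minimal Python sketch of the D score and of the reshuffling experiment follows. It is not the authors' code: it assumes operons are indexed along a circular genome, that a self-regulated TF contributes zero distance, and that one selection of X% of operons is permuted several times. All names, the toy genome, and the small default counts are hypothetical stand-ins for the 10,000 selections and 100 permutations used in the study.

```python
import random


def operons_between(i, j, n_operons):
    """Smallest number of operons between indices i and j (0 when they coincide)."""
    gap = abs(i - j)
    return max(min(gap, n_operons - gap) - 1, 0)


def d_score(location, regulons, n_operons):
    """D = sum over regulons of the total TF-to-TG distance."""
    return sum(
        operons_between(location[tf], location[tg], n_operons)
        for tf, targets in regulons.items()
        for tg in targets
    )


def shuffled_d_values(location, regulons, n_operons, fraction,
                      n_selections=100, n_perms=10, seed=0):
    """D values of genomes in which a random fraction of operons swap locations."""
    rng = random.Random(seed)
    operons = list(location)
    k = round(fraction * len(operons))
    values = []
    for _ in range(n_selections):
        chosen = rng.sample(operons, k)            # step 1: choose X% of operons
        slots = [location[op] for op in chosen]
        for _ in range(n_perms):                   # step 2: permute their locations
            rng.shuffle(slots)
            trial = dict(location)
            trial.update(zip(chosen, slots))
            values.append(d_score(trial, regulons, n_operons))
    return values


# Toy genome of 20 operons with two tightly clustered regulons (illustrative only).
toy_location = {f"op{i}": i for i in range(20)}
toy_regulons = {"op0": ["op1", "op2"], "op10": ["op10", "op11", "op12"]}
observed_d = d_score(toy_location, toy_regulons, 20)
null_d = shuffled_d_values(toy_location, toy_regulons, 20, fraction=1.0)
```

Comparing `observed_d` with the distribution in `null_d`, for example by counting how many shuffled values fall at or below it, gives a simple empirical analogue of the comparison in Figure 3A; the statistical tests reported in Table S3 may differ from this.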
Figure 3

The distributions of D values calculated for the actual and reshuffled genomes.

The x-axis represents the D values, and the y-axis is the frequency (density). In (A), each curve is calculated using one million permutations of the current arrangement of the operons in a genome under a specified constraint. Ten D distributions are calculated, each obtained by randomly selecting X% of all the operons under consideration and randomly permuting them, with X = 10, 20, …, 100; the ten curves from left to right follow the order of X. The vertical dashed line shows the D value for the current arrangement of the operons in the E. coli K12 genome. We also conducted permutations in a second manner, i.e., by artificially forming regulons and calculating D values for the permuted genomes. The result is shown as a dotted curve. (B) A comparison between the D distributions when randomly permuting 100 TGs (curve on the left) versus randomly permuting 100 TFs (curve on the right) in the genome of E. coli K-12. (C) and (D) are similar to (A) and (B), respectively, but are for B. subtilis.

For each known regulon in E. coli K12, we have also arbitrarily selected the same number of operons from the pool of all operons covered by the known regulons to form an artificial regulon. Again, we see that the D value of the real genome (vertical dashed line) is significantly smaller than those of genomes with artificially formed regulons (the dotted curve in Figure 3A). To check that our observations hold for other bacterial genomes in general, we repeated the analysis in this section on all the 160 known regulons of B. subtilis, using the same procedure as for the E. coli genome. The results are shown in Figure 3C and 3D (see also Table S3), and are clearly highly similar to those shown in Figure 3A and 3B.

This work presents a systematic study of the genomic arrangement of regulons in a bacterial genome. We made a number of interesting observations related to the organizational principles of regulons in a bacterial genome, namely (1) transcription factors of regulons are under strong constraints from their regulatory targets while TGs do not seem to be under strong constraints from their TFs; (2) regulons tend to form operon clusters, each of which tends to consist of operons encoding the same metabolic pathway; and (3) the genome tends to minimize the overall distance between the TFs and their TGs across all regulons encoded in the genome. We believe that all the observations are mostly due to the need of the cell to efficiently transcribe the relevant genes. Janga et al. [4] suggested that TFs of large regulons usually have high expression levels and presumably reach their targets through diffusion, which might be the reason that they do not need to be located close to their targets. For small regulons, TFs are simply located close to their targets, which should be evolutionarily favored. For larger regulons consisting of multiple operon clusters, the three-dimensional packing of the chromosome needs to be considered. It is likely that these organizational principles, along with a few others including the genomic organization of metabolic pathways [8], the selfish operon model [9] and nucleoid compaction [10], [11], [12], [13], [14], collectively determine the local and the global organization of all bacterial genes in a genome [15].

Materials and Methods

Data sources

The genome of E. coli K-12 MG1655 was downloaded from ftp://ftp.ncbi.nih.gov as of 01/14/2009. All the predicted operons for the organism were downloaded from the DOOR [16] database at http://csbl1.bmb.uga.edu/OperonDB. All regulon data for E. coli K-12 MG1655 and for B. subtilis str. 168 were downloaded from the RegulonDB [17] and DBTBS [17] databases, respectively, as of 03/2010.

Operon shuffling

For each reshuffled genome, the D value defined in the formula was calculated for X = 10, 20, …, 100, where X is the percentage of operons to be reshuffled (i.e., whose genomic locations are permuted). The following two-step procedure was used to randomly shuffle a specified fraction (X%) of operons. We first randomly select X% of the operons of the E. coli genome, repeating this selection 10,000 times, and then randomly permute the locations of the selected operons 100 times for each of the 10,000 selections. This gives a total of one million permutations, over which we calculate the D value distribution.

Supporting Information

Figure S1

Box plots of the distance distribution of operons (TGs) to their regulators (TFs). (A) is for E. coli and (B) is for B. subtilis. P-values of Wilcoxon tests are shown between two neighboring boxes. (EPS)

Table S1

The number of operon clusters participating in the same SEED pathway under different distance cutoffs (E. coli). The first column is the distance cutoff used to define a cluster. The second column is the number of clusters having at least two operons mapped to some SEED metabolic pathways. The third column is the number of clusters having at least two operons participating in the same SEED pathway. The fourth column is the number of clusters having all their mapped operons participating in the same SEED pathways. Regulons with at least two operons are considered. (DOC)

Table S2

The number of operon clusters participating in the same SEED pathway under different distance cutoffs (B. subtilis). See the Table S1 legend for details. Note that significantly fewer operons are mapped to the SEED pathways in B. subtilis than in E. coli, which makes the numbers in Table S2 much smaller than those in Table S1. (DOC)

Table S3

Statistical tests of the curves in Figure 3. The 'skewness' and 'kurtosis' columns are calculated to test whether the curves in Figure 3 follow a normal distribution; a skewness close to 0 and a kurtosis close to 3 indicate a distribution close to normal. The 'P-value' column is calculated to test whether the curves lie significantly above the vertical dashed line, which would indicate that the permuted genomes have significantly larger D values than the actual genomes. (DOC)
References (17 in total)

Review 1.  Identifying global regulators in transcriptional regulatory networks in bacteria.

Authors:  Agustino Martínez-Antonio; Julio Collado-Vides
Journal:  Curr Opin Microbiol       Date:  2003-10       Impact factor: 7.934

2.  STUDIES ON THE MECHANISM OF REPRESSION OF ARGININE BIOSYNTHESIS IN ESCHERICHIA COLI. II. DOMINANCE OF REPRESSIBILITY IN DIPLOIDS.

Authors:  W K MAAS
Journal:  J Mol Biol       Date:  1964-03       Impact factor: 5.469

3.  Genetic regulatory mechanisms in the synthesis of proteins.

Authors:  F JACOB; J MONOD
Journal:  J Mol Biol       Date:  1961-06       Impact factor: 5.469

4.  Genomic arrangement of bacterial operons is constrained by biological pathways encoded in the genome.

Authors:  Yanbin Yin; Han Zhang; Victor Olman; Ying Xu
Journal:  Proc Natl Acad Sci U S A       Date:  2010-03-22       Impact factor: 11.205

5.  Selfish operons: horizontal transfer may drive the evolution of gene clusters.

Authors:  J G Lawrence; J R Roth
Journal:  Genetics       Date:  1996-08       Impact factor: 4.562

6.  The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.

Authors:  Ross Overbeek; Tadhg Begley; Ralph M Butler; Jomuna V Choudhuri; Han-Yu Chuang; Matthew Cohoon; Valérie de Crécy-Lagard; Naryttza Diaz; Terry Disz; Robert Edwards; Michael Fonstein; Ed D Frank; Svetlana Gerdes; Elizabeth M Glass; Alexander Goesmann; Andrew Hanson; Dirk Iwata-Reuyl; Roy Jensen; Neema Jamshidi; Lutz Krause; Michael Kubal; Niels Larsen; Burkhard Linke; Alice C McHardy; Folker Meyer; Heiko Neuweger; Gary Olsen; Robert Olson; Andrei Osterman; Vasiliy Portnoy; Gordon D Pusch; Dmitry A Rodionov; Christian Rückert; Jason Steiner; Rick Stevens; Ines Thiele; Olga Vassieva; Yuzhen Ye; Olga Zagnitko; Veronika Vonstein
Journal:  Nucleic Acids Res       Date:  2005-10-07       Impact factor: 16.971

7.  Long-range periodic patterns in microbial genomes indicate significant multi-scale chromosomal organization.

Authors:  Timothy E Allen; Nathan D Price; Andrew R Joyce; Bernhard Ø Palsson
Journal:  PLoS Comput Biol       Date:  2006-01-13       Impact factor: 4.475

8.  RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions.

Authors:  Heladia Salgado; Socorro Gama-Castro; Martín Peralta-Gil; Edgar Díaz-Peredo; Fabiola Sánchez-Solano; Alberto Santos-Zavaleta; Irma Martínez-Flores; Verónica Jiménez-Jacinto; César Bonavides-Martínez; Juan Segura-Salazar; Agustino Martínez-Antonio; Julio Collado-Vides
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

9.  Spatial patterns of transcriptional activity in the chromosome of Escherichia coli.

Authors:  Kyeong Soo Jeong; Jaeyong Ahn; Arkady B Khodursky
Journal:  Genome Biol       Date:  2004-10-27       Impact factor: 13.583

10.  Decoding the nucleoid organisation of Bacillus subtilis and Escherichia coli through gene expression data.

Authors:  Anne-Sophie Carpentier; Bruno Torrésani; Alex Grossmann; Alain Hénaut
Journal:  BMC Genomics       Date:  2005-06-06       Impact factor: 3.969

