Qin Ma, Yanbin Yin, Mark A Schell, Han Zhang, Guojun Li, Ying Xu.
Abstract
The circular chromosome of Escherichia coli has been suggested to fold into a collection of sequentially consecutive domains, genes in each of which tend to be co-expressed. It has also been suggested that such domains, forming a partition of the genome, are dynamic with respect to the physiological conditions. However, little is known about which DNA segments of the E. coli genome form these domains and what determines the boundaries of these domain segments. We present a computational model here to partition the circular genome into consecutive segments, theoretically suggestive of the physically folded supercoiled domains, along with a method for predicting such domains under specified conditions. Our model is based on a hypothesis that the genome of E. coli is partitioned into a set of folding domains so that the total number of unfoldings of these domains in the folded chromosome is minimized, where a domain is unfolded when a biological pathway, consisting of genes encoded in this DNA segment, is being activated transcriptionally. Based on this hypothesis, we have predicted seven distinct sets of such domains along the E. coli genome for seven physiological conditions, namely exponential growth, stationary growth, anaerobiosis, heat shock, oxidative stress, nitrogen limitation and SOS responses. These predicted folding domains are highly stable statistically and are generally consistent with the experimental data of DNA binding sites of the nucleoid-associated proteins that assist the folding of these domains, as well as genome-scale protein occupancy profiles, hence supporting our proposed model. Our study established for the first time a strong link between a folded E. coli chromosomal structure and the encoded biological pathways and their activation frequencies.
Year: 2013 PMID: 23599001 PMCID: PMC3675479 DOI: 10.1093/nar/gkt261
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
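The hypothesis in the abstract can be made concrete as an objective function: every time a pathway is activated, each domain that holds at least one of its genes must unfold, so a partition's cost is the activation frequency of each pathway times the number of domains its genes touch. The sketch below only evaluates that count for a fixed set of boundaries on a circular genome. It is our own reading, not the authors' implementation: the abstract does not say how the number of domains is constrained or how the search is performed (with no constraint, a single domain would trivially minimize the count), and the helper names, gene positions, pathways and frequencies are all hypothetical.

```python
from bisect import bisect_right
from typing import Mapping, Sequence


def domain_of(position: int, boundaries: Sequence[int]) -> int:
    """Index of the domain containing `position` on a circular genome.

    `boundaries` are sorted cut points; domain i (i >= 1) spans
    [boundaries[i-1], boundaries[i]), and domain 0 wraps around the origin,
    covering [boundaries[-1], genome end) plus [0, boundaries[0]).
    """
    return bisect_right(boundaries, position) % len(boundaries)


def count_unfoldings(
    boundaries: Sequence[int],
    pathway_genes: Mapping[str, Sequence[int]],
    activation_freq: Mapping[str, int],
) -> int:
    """Total unfoldings: each activation of a pathway unfolds every domain holding one of its genes."""
    cuts = sorted(boundaries)
    total = 0
    for pathway, positions in pathway_genes.items():
        touched = {domain_of(p, cuts) for p in positions}
        total += activation_freq[pathway] * len(touched)
    return total


# Hypothetical toy genome of 100 kb: gene positions (kb) and activation counts are made up.
pathways = {"A": [15, 25], "B": [35, 45], "C": [75, 5]}
freq = {"A": 5, "B": 2, "C": 1}
print(count_unfoldings([10, 40, 70], pathways, freq))  # 5*1 + 2*2 + 1*1 = 10
print(count_unfoldings([30, 60, 90], pathways, freq))  # 5*1 + 2*1 + 1*2 = 9
```

Moving the cuts so that pathway B no longer straddles a boundary lowers the count, which is the kind of trade-off the minimization is meant to capture; pathway C, whose genes flank the origin, shows why the wrap-around domain must be handled explicitly.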
Table 1. The seven classes of growth conditions, the marker genes used to identify each growth-condition class in M3D (second column, with the number of genes in parentheses), and the number of MGC datasets for each class (third column)
| Growth conditions | Marker genes (number of genes) | Number of MGC datasets |
|---|---|---|
| Exponential growth | Ribosomal proteins (54) | 45 |
| Stationary growth | Ribosomal proteins (54) | 131 |
| Heat shock | Heat shock proteins (14) | 54 |
| Oxidative stress | OxyR and SoxRS regulons (61) | 30 |
| Anaerobiosis | Partial Fnr regulon (53) | 55 |
| SOS response | LexA regulon (56) | 57 |
| Nitrogen limitation | NtrC and Nac regulons (65) | 34 |
| Random | N/A | 100 |
The ‘Random’ growth condition (the last row of Table 1) corresponds to 100 MGCs randomly selected from all MGCs available in the M3D database.
Properties of the folding-domain boundaries predicted for each MGC group
| MGC groups | Number of folding-domain boundaries | ALD (kb) | ALB (bp) | ALNB (bp) | #HEG | #NAP | #Transcription factories | #Fis |
|---|---|---|---|---|---|---|---|---|
| Exponential growth | 146 | 31.4 | 402 | 271 | 13 | 43 | 6 | 33 |
| Stationary growth | 84 | 54.9 | 351 | 276 | 10 | 24 | 3 | 16 |
| Heat shock | 116 | 39.6 | 424 | 193 | 13 | 31 | 6 | 19 |
| Oxidative stress | 94 | 48.9 | 344 | 276 | 3 | 31 | 2 | 15 |
| Anaerobiosis | 102 | 45 | 424 | 272 | 13 | 33 | 8 | 21 |
| SOS response | 114 | 40.2 | 471 | 269 | 6 | 34 | 1 | 20 |
| Nitrogen limitation | 95 | 48.5 | 344 | 276 | 4 | 26 | 1 | 18 |
ALD, average length of the predicted folding domains; ALB, average length of the inter-operonic regions that contain a predicted folding-domain boundary; ALNB, average length of the remaining inter-operonic regions; #HEG, number of highly expressed genes encoded in the predicted folding-domain boundary regions; #NAP, number of nucleoid-associated protein (NAP) binding sites in the inter-operonic regions containing a predicted folding-domain boundary; #Transcription factories, number of superstructures near predicted folding-domain boundaries formed by NAPs associated with the ribosomal RNA operons; #Fis, number of Fis binding sites in the inter-operonic regions containing a predicted folding-domain boundary.
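Two quick consistency checks can be run on the table above. Because the chromosome is circular, n boundaries cut it into exactly n domains, so the number of boundaries multiplied by ALD should come out close to the genome length; and comparing ALB with ALNB shows how much longer the boundary-containing inter-operonic regions are. The sketch below copies the values from the table. The reference length of 4,641,652 bp for E. coli K-12 MG1655 is our assumption (it is not stated in the table), and the products are expected to agree only approximately, presumably because ALD is rounded.

```python
# Values copied from the table above: (boundaries, ALD in kb, ALB in bp, ALNB in bp)
TABLE = {
    "Exponential growth": (146, 31.4, 402, 271),
    "Stationary growth": (84, 54.9, 351, 276),
    "Heat shock": (116, 39.6, 424, 193),
    "Oxidative stress": (94, 48.9, 344, 276),
    "Anaerobiosis": (102, 45.0, 424, 272),
    "SOS response": (114, 40.2, 471, 269),
    "Nitrogen limitation": (95, 48.5, 344, 276),
}

# Assumption: E. coli K-12 MG1655 reference genome length in kb; not given in the table.
GENOME_KB = 4641.652


def implied_genome_kb(n_boundaries: int, ald_kb: float) -> float:
    """On a circular chromosome, n boundaries delimit n domains, so n * ALD ~ genome length."""
    return n_boundaries * ald_kb


def boundary_length_ratio(alb_bp: int, alnb_bp: int) -> float:
    """How many times longer boundary-containing inter-operonic regions are than the rest."""
    return alb_bp / alnb_bp


for condition, (n, ald, alb, alnb) in TABLE.items():
    total = implied_genome_kb(n, ald)
    print(
        f"{condition:20s} n*ALD = {total:7.1f} kb ({total / GENOME_KB:.1%} of genome), "
        f"ALB/ALNB = {boundary_length_ratio(alb, alnb):.2f}"
    )
```

For every condition the product lands within about 2% of the assumed genome length, and ALB exceeds ALNB in all seven rows.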
Figure 1. (a) Circos plot of the predicted folding domains along the genome of E. coli K-12 during the stationary growth phase. The alternating black and white bands in the outermost ring represent the partition of the E. coli genome into folding domains. (b) An expanded view of the genomic region 0–1.2 Mb. From the inside out, the rings are numbered as follows: (1) each pair of genes involved in the same EcoCyc pathway is connected by a gray line; (2) the red histogram shows the number of pathways in which each gene is involved; (3) the orange histogram shows the number of coexpressed gene pairs; (4) each blue bar marks a highly expressed gene; (5) each green bar marks a known NAP-binding site, which is expected to fall in a domain-boundary region; and (6) the predicted folding domains, shown as alternating black-and-white bands. Because the boundaries are not visible at genome scale, two thick bars are used to separate adjacent folding domains. (c) Comparison between the numbers of coexpressed gene pairs in the flanks of the predicted domains (orange box) and in a set of randomly picked intergenic regions (gray box).
Figure 2. Boxplots showing the stability of the predicted folding domains (exponential growth and heat shock) based on the selected MGC set versus a randomly selected MGC set, as defined in the main text. The comparisons for the other five pairs of predicted domain sets are shown in the upper-left corner. Each lighter-gray box represents the distribution of distances between the domains predicted using the selected MGCs and those predicted using half of the selected MGCs; each darker-gray box is defined similarly but against domains predicted from randomly selected MGCs. The y-axis shows the distance. The Wilcoxon test P-values for each pair of distributions are shown above the boxes of the corresponding set of predicted folding domains.
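The caption does not say which Wilcoxon test was used. Because the two distance distributions come from separate resampling runs, the two-sample rank-sum form seems the natural reading, but that is our assumption. Below is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum test using the normal approximation, without tie or continuity corrections; exact or corrected versions (for example in SciPy) will give slightly different P-values. The distance values in the usage lines are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie/continuity correction).

    Returns (W, p), where W is the rank sum of sample x in the pooled sample.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of a standard normal
    return w, p


# Made-up distances: half-subsample vs. full set, and random MGCs vs. full set.
half_vs_full = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.07]
random_vs_full = [0.35, 0.41, 0.38, 0.30, 0.44, 0.39, 0.36, 0.42]
w, p = rank_sum_test(half_vs_full, random_vs_full)
print(f"W = {w:.1f}, p = {p:.2g}")
```

A small P-value here means the half-subsample distances are systematically smaller than the random-set distances, which is the pattern the boxplots are meant to display.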
Figure 3. (a) Degree of overlap between each pair of MGC groups. Node size represents the size of an MGC group, and edge width represents the number of MGCs shared by the two connected nodes. Each edge label has two values: the first is the degree of overlap between the two MGC groups, and the second is the distance between their predicted folding-domain sets. (b) Relationship between the degree of overlap among MGC groups and the distance between the corresponding folding-domain sets.
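The shared-MGC count behind the edge widths in panel (a) is a plain set intersection. The caption does not define "degree of overlap", so the sketch below uses the Jaccard index (shared MGCs divided by the union of both groups) purely as an assumed stand-in; the function name and the MGC identifiers are hypothetical.

```python
def overlap_stats(group_a: set[str], group_b: set[str]) -> tuple[int, float]:
    """Return (shared MGC count, Jaccard index) for two MGC groups.

    The shared count corresponds to the edge width described for Figure 3a;
    the Jaccard index is only an assumed stand-in for the caption's 'degree of overlap'.
    """
    shared = len(group_a & group_b)
    union = len(group_a | group_b)
    return shared, (shared / union if union else 0.0)


# Hypothetical MGC identifiers, for illustration only.
exponential = {"mgc01", "mgc02", "mgc03", "mgc04"}
heat_shock = {"mgc03", "mgc04", "mgc05"}
print(overlap_stats(exponential, heat_shock))  # (2, 0.4)
```

Normalizing by the union keeps the measure between 0 and 1 regardless of group size, which matters here because the MGC groups in Table 1 range from 30 to 131 datasets.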
Statistical significance of the correlation coefficients between predicted domain boundaries and EPODs or H-NS binding regions
| | EPODs | tsEPODs | heEPODs | H-NSs | loH-NSs | shH-NSs |
|---|---|---|---|---|---|---|
| sFDs | 3.8e-03* | 2.6e-02* | 6.4e-02 | 1.1e-02* | 4.1e-02* | 8.3e-02 |
| Random set | 4.2e-01 | 2.3e-01 | 6.1e-01 | 9.7e-02 | 9.2e-02 | 5.4e-01 |
*P < 0.05.