Luis Romero1, Sebastian Contreras-Riquelme2, Manuel Lira3, Alberto J M Martin2, Ernesto Perez-Rueda4.
Abstract
Gene regulation is a key process for all microorganisms, as it allows them to adapt to different environmental stimuli. However, despite the relevance of gene expression control, information about genome regulation is available for only a handful of organisms. In this work, we inferred the gene regulatory networks (GRNs) of bacterial and archaeal genomes by comparison with six organisms with well-known regulatory interactions: Escherichia coli K-12 MG1655, Bacillus subtilis 168, Mycobacterium tuberculosis, Pseudomonas aeruginosa PAO1, Salmonella enterica subsp. enterica serovar Typhimurium LT2, and Staphylococcus aureus N315. The inferences were made in two steps. First, the six model organisms were contrasted in an all-vs-all comparison of known interactions based on Transcription Factor (TF)-Target Gene (TG) orthology relationships and Transcription Unit (TU) assignments. Second, we used a guilt-by-association approach to infer the GRNs for 12,230 bacterial and 649 archaeal genomes based on the TF-TG orthology relationships of the six bacterial models determined in the first step. Finally, we discuss examples to show the most relevant results obtained from these inferences. A web server with all the predicted GRNs is available at https://regulatorynetworks.unam.mx/ or http://132.247.46.6/.
Keywords: genomics; orthology; regulatory modules; regulatory networks; transcription units
Year: 2022 PMID: 35928164 PMCID: PMC9344073 DOI: 10.3389/fmicb.2022.923105
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
Total new interactions per organism.
| Network contributed ↓ / Contribution source → | | | | | | | TUs | New interactions |
|---|---|---|---|---|---|---|---|---|
| | – | 395 | 34 | 255 | 206 | 286 (15.70%) | 828 | 1,821 |
| | 248 | – | 157 | 600 | 125 | 193 (11.51%) | 393 | 1,676 |
| | 139 | 1,117 (44.69%) | – | 709 | 92 | 331 (13.24%) | 679 | 2,499 |
| | 259 | 1,135 (46.95%) | 140 | – | 124 | 238 (9.84%) | 608 | 2,417 |
| | 355 | 173 (21.38%) | 8 | 109 | – | 79 | 177 | 809 |
| | 70 | 242 (31.18%) | 17 | 140 | 22 | – | 405 | 776 |
The number of interactions, the percentage contributed by each organism (row “contribution”) to the new interactions, and the extension by TU assignment are indicated. The number of interactions in the original network is indicated in brackets (first column).
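The percentages in parentheses appear to be the contribution divided by the total number of new interactions in the same row, truncated (not rounded) to two decimals; for example, 286 of 1,821 gives 15.70%. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the truncation rule is an assumption inferred from the tabulated values rather than something the source states.

```python
def contribution_percentage(contributed: int, total_new: int) -> float:
    """Percentage of new interactions contributed by one source organism.

    Assumption: the value is truncated to two decimals, which is what
    reproduces the figures shown in the table.
    """
    # Integer arithmetic avoids floating-point surprises before truncation.
    hundredths = (10_000 * contributed) // total_new
    return hundredths / 100


# Values taken from the table above (contribution, total new interactions).
first_row = contribution_percentage(286, 1821)
third_row = contribution_percentage(1117, 2499)
print(f"{first_row:.2f}% and {third_row:.2f}%")
```

Because one interaction can be derived from more than one model, the contributions in a row need not add up to the row total.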
Figure 1. Flow diagram showing the inference of the GRNs. Six bacterial models were used to infer the GRNs in 12,230 bacterial and 649 archaeal genomes. If the pair A (TF)–B (TG) in a reference genome is identified by orthology in a new genome as A′ (TF)–B′ (TG), the interaction is assigned. In addition, if the TG identified in the new genome is the first gene in its TU, the interaction is extended to the other gene(s) of that TU. One interaction in a new genome can be derived from one or more bacterial models. Finally, the reconstructed networks were evaluated in terms of their topological properties.
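The transfer rule described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function name, the data structures (a set of TF-TG pairs, a one-to-one ortholog dictionary, and TUs as ordered gene lists), and the toy gene names are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, target gene)


def transfer_interactions(
    reference_edges: Set[Edge],
    orthologs: Dict[str, str],
    transcription_units: List[List[str]],
) -> Set[Edge]:
    """Transfer TF-TG edges from a reference genome to a new genome.

    reference_edges: known TF-TG pairs in the reference genome.
    orthologs: reference gene -> orthologous gene in the new genome
        (assumed one-to-one here for simplicity).
    transcription_units: TUs of the new genome, genes in transcribed order.
    """
    # Index TUs by their first gene so the extension step is a dict lookup.
    tu_by_first_gene = {tu[0]: tu for tu in transcription_units if tu}

    inferred: Set[Edge] = set()
    for tf, tg in reference_edges:
        new_tf = orthologs.get(tf)
        new_tg = orthologs.get(tg)
        if new_tf is None or new_tg is None:
            continue  # both members of the pair must have an ortholog
        inferred.add((new_tf, new_tg))
        # If the target is the first gene of a TU, extend to the other genes.
        for downstream in tu_by_first_gene.get(new_tg, [])[1:]:
            inferred.add((new_tf, downstream))
    return inferred


# Toy example: gene C has no ortholog, so the A-C interaction is not transferred.
reference = {("A", "B"), ("A", "C")}
ortholog_map = {"A": "A'", "B": "B'"}
tus = [["B'", "D'", "E'"]]
result = transfer_interactions(reference, ortholog_map, tus)
print(sorted(result))
```

Running the same function once per reference model and taking the union of the results would correspond to one interaction being supported by one or more models.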
Single edge comparisons between the six reference networks employed in this work and their counterparts generated following our homology-based transfer approach from the other remaining networks.
| Organism | TP | FP | FN | Recall | Precision | F1 | P-value |
|---|---|---|---|---|---|---|---|
| | 254 | 499 | 2,447 | 0.094 | 0.3373 | 0.147 | 4.01e-258 |
| | 1,538 | 709 | 1,971 | 0.4383 | 0.6845 | 0.5344 | 0.0 |
| | 51 | 202 | 938 | 0.0516 | 0.2016 | 0.0822 | 9.46e-39 |
| | 1,491 | 666 | 1,394 | 0.5168 | 0.6912 | 0.5914 | 0.0 |
| | 229 | 237 | 466 | 0.3295 | 0.4914 | 0.3945 | 9.45e-229 |
| | 71 | 138 | 2,494 | 0.0277 | 0.3397 | 0.0512 | 5.99e-52 |
Precision (P), Recall (R), and F1 were calculated from the true positive (TP), false positive (FP), and false negative (FN) edges. The P-value of the G-test indicates the significance of the differences between the averaged counts of TP, FP, TN, and FN in the 10,000 randomizations of the inferred networks and the results shown in the table.
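As a worked check of these metrics, the sketch below applies the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean, to the counts in the first row of the table (TP = 254, FP = 499, FN = 2,447). The function name is hypothetical, and the assumption is that the table uses these standard definitions; under that assumption the results agree with the tabulated 0.3373, 0.094, and 0.147 to three decimal places.

```python
from typing import Tuple


def edge_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from edge-level confusion counts."""
    precision = tp / (tp + fp)  # fraction of inferred edges that are known
    recall = tp / (tp + fn)     # fraction of known edges that were inferred
    f1 = 2 * tp / (2 * tp + fp + fn)  # harmonic mean of precision and recall
    return precision, recall, f1


# Counts from the first row of the table above.
precision, recall, f1 = edge_metrics(tp=254, fp=499, fn=2447)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

True negatives do not enter any of the three metrics, which is why the edge table can omit a TN column while the G-test still uses it.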
Graphlets absence comparison between the six reference networks employed in this work and their counterparts generated following our homology-based transfer approach from the other remaining networks.
| Organism | TP | FP | TN | FN | Recall | Precision | F1 |
|---|---|---|---|---|---|---|---|
| | 2,241 | 10,008 | 622,878,341 | 145,210 | 0.0152 | 0.183 | 0.0281 |
| | 57,366 | 38,545 | 619,477,397 | 185,907 | 0.2358 | 0.5981 | 0.3383 |
| | 101 | 2,815 | 73,261,825 | 12,989 | 0.0077 | 0.0346 | 0.0126 |
| | 56,206 | 50,622 | 362,053,161 | 134,678 | 0.2945 | 0.5261 | 0.3776 |
| | 2,087 | 5,176 | 17,864,376 | 19,578 | 0.0963 | 0.2873 | 0.1443 |
| | 215 | 1,876 | 117,513,083 | 234,607 | 0.0009 | 0.1028 | 0.0018 |
Comparisons between experimentally and inferred GRNs.
| Organism | Target counts | Target counts extended_tu | TF counts | TF counts extended_tu | Node count | Node count extended_tu | Edge count | Edge count extended_tu |
|---|---|---|---|---|---|---|---|---|
| | 1,748 | 2,301 | 191 | 227 | 1,799 | 2,339 | 2,738 | 4,559 |
| | 1,618 | 2,188 | 196 | 252 | 1,670 | 2,224 | 3,616 | 5,292 |
| | 604 | 1,701 | 124 | 236 | 638 | 1,741 | 998 | 3,497 |
| | 1,640 | 2,371 | 131 | 224 | 1,670 | 2,404 | 2,969 | 5,386 |
| | 584 | 973 | 51 | 101 | 598 | 990 | 709 | 1,518 |
| | 1,405 | 1,710 | 76 | 107 | 1,431 | 1,733 | 2,637 | 3,413 |
Columns are as follows: genome name; columns 1, 3, 5, and 7 indicate the targets, TFs, nodes, and number of interactions identified in the original networks; columns 2, 4, 6, and 8 indicate the targets, TFs, nodes, and number of interactions identified in the extended networks.
Figure 2. Online interface of the “regulatory networks” server storing the publicly available database. Several options are available to the user: a description of the system, a page to download the raw data, and the core section of the website to filter a GRN (purple box). To load and visualize a network, the user selects the Gene Regulatory Network of the organism of interest in the “load a network” panel: in the Select network box, the user selects the name of the organism and clicks the Load button to display the network in the right window (red box). This action loads the graph (black box) and the node/edge properties (cyan box). Different layouts can be applied to visualize the network, and specific nodes/edges can be selected to generate a new subgraph (green box). Because the graph visualization can be modified, the user can center/fit the network (white buttons) or reset the current visualization (yellow button). Finally, the user can download the network of displayed nodes and edges (green button). In the example, the right panel shows the network associated with the transcription factor DnaA (diamond) and its target genes (circles). Edges represent the transcription direction (when available). The lower panel lists the TGs under the regulation of the TF: NCBI ID, gene name, protein ID, start and end positions, and strand. For more details on the web application, please refer to the Supplementary material.