Rafael Hernandez-Guerrero, Edgardo Galán-Vásquez, Ernesto Pérez-Rueda.
Abstract
In this work, we describe a systematic comparative genomic analysis of promiscuous domains in genomes of Bacteria and Archaea. A quantitative measure of domain promiscuity, the weighted domain architecture score (WDAS), was applied to 1317 domains in 1320 genomes of Bacteria and Archaea. A functional analysis associated with the WDAS per genome showed that 18 of 50 functional categories were significantly enriched in the promiscuous domains; in particular, small-molecule binding domains, transferase domains, DNA-binding domains (transcription factors), and signal transduction domains were identified as promiscuous. In contrast, non-promiscuous domains were associated with 6 of 50 functional categories, and the category Function unknown was enriched. In addition, the WDASs of 52 domains correlated with genome size, i.e., WDAS values decreased as the genome size increased, suggesting that the number of combinations at larger domains increases, including domains in the superfamilies Winged helix-turn-helix and P-loop-containing nucleoside triphosphate hydrolases. Finally, based on classification of the domains according to their ancestry, we determined that the set of 52 promiscuous domains is also ancient and abundant among all the genomes, in contrast to the non-promiscuous domains. In summary, we consider that the association between these two classes of protein domains (promiscuous and non-promiscuous) provides bacterial and archaeal cells with the ability to respond to diverse environmental challenges.
Year: 2019 PMID: 31856202 PMCID: PMC6922389 DOI: 10.1371/journal.pone.0226604
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The total number of non-redundant domains follows a logarithmic distribution (A), whereas promiscuous domains follow a polynomial distribution (B). The X-axis shows the genome size (in ORFs), and the Y-axis shows the total number of non-redundant (NR) domains identified. A non-linear least-squares fit was applied to the dataset.
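As an illustration of the kind of fit described for panel A, the following minimal sketch fits a logarithmic model y = a + b·ln(x) by least squares. It is not the authors' procedure: the caption mentions a non-linear least-squares routine, whereas this sketch exploits the fact that a logarithmic model is linear in its coefficients and uses the closed-form solution. The function name `fit_log_model` and the example values are hypothetical.

```python
import math

def fit_log_model(genome_sizes, nr_domain_counts):
    """Least-squares fit of y = a + b * ln(x).

    genome_sizes: genome sizes in ORFs (x values, all > 0).
    nr_domain_counts: number of non-redundant domains per genome (y values).
    Returns the coefficients (a, b).
    """
    if len(genome_sizes) != len(nr_domain_counts) or len(genome_sizes) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    log_x = [math.log(x) for x in genome_sizes]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(nr_domain_counts) / n
    # Ordinary least squares on the transformed predictor ln(x).
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, nr_domain_counts))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic (made-up) data generated from y = 100 + 50 * ln(x).
sizes = [500, 1000, 2000, 4000, 8000]
counts = [100 + 50 * math.log(s) for s in sizes]
a, b = fit_log_model(sizes, counts)
```

The polynomial trend reported for panel B would need a different model and is not covered by this sketch.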
Fig 2. Functional enrichment analysis on promiscuous and non-promiscuous domains per genome.
On the X-axis is the total number of genomes with an enriched function, and on the Y-axis are the functional categories.
Promiscuous domains in Bacteria and Archaea.
| SUPFAM ID | Function | Description | R-value | P-value | Genome distribution | Number of domains |
|---|---|---|---|---|---|---|
| 46689 | LA | Homeodomain-like | -0.447598969 | 2.11E-70 | 1206 | 63188 |
| 46785 | LA | Winged helix DNA-binding domain | -0.424153905 | 2.22E-52 | 1306 | 102676 |
| 46894 | LA | C-terminal effector domain of the bipartite response regulators | -0.550775615 | 3.17E-94 | 1114 | 22398 |
| 47336 | S | ACP-like | -0.502184217 | 3.21E-102 | 872 | 11982 |
| 47384 | T | Homodimeric domain of signal transducing histidine kinase | -0.465054433 | 1.88E-65 | 1180 | 30953 |
| 47413 | LA | lambda repressor-like DNA-binding domains | -0.43766273 | 8.42E-77 | 1231 | 27826 |
| 48179 | C | 6-phosphogluconate dehydrogenase C-terminal domain-like | -0.511977864 | 4.88E-95 | 1280 | 16142 |
| 48452 | RD | TPR-like | -0.50277309 | 1.28E-70 | 1218 | 42972 |
| 48498 | K | Tetracyclin repressor-like, C-terminal domain | -0.447532579 | 1.91E-73 | 931 | 14919 |
| 50129 | O | GroES-like | -0.538824071 | 9.75E-97 | 1116 | 13619 |
| 51182 | EA | RmlC-like cupins | -0.496006478 | 4.30E-100 | 1201 | 16674 |
| 51338 | RC | Composite domain of metallo-dependent hydrolases | -0.440319429 | 1.94E-76 | 1167 | 10765 |
| 51395 | RD | FMN-linked oxidoreductases | -0.477243565 | 6.47E-93 | 1242 | 9334 |
| 51556 | RC | Metallo-dependent hydrolases | -0.521924096 | 2.10E-101 | 1285 | 16908 |
| 51735 | HA | NAD(P)-binding Rossmann-fold domains | -0.460691707 | 2.84E-70 | 1319 | 122830 |
| 51905 | HA | FAD/NAD(P)-binding domain | -0.557529039 | 3.06E-93 | 1314 | 46246 |
| 52096 | OA | ClpP/crotonase | -0.484902387 | 4.77E-76 | 1260 | 20743 |
| 52151 | RB | FabD/lysophospholipase-like | -0.526822557 | 8.97E-88 | 967 | 6126 |
| 52172 | T | CheY-like | -0.516686547 | 2.99E-78 | 1208 | 56901 |
| 52317 | RB | Class I glutamine amidotransferase-like | -0.543988302 | 1.52E-80 | 1309 | 19566 |
| 52343 | RA | Ferredoxin reductase-like, C-terminal NADP-linked domain | -0.408337767 | 1.52E-67 | 925 | 4940 |
| 52402 | F | Adenine nucleotide alpha hydrolases-like | -0.463833236 | 2.48E-71 | 1319 | 24519 |
| 52518 | HA | Thiamin diphosphate-binding fold (THDP-binding) | -0.405125258 | 5.96E-53 | 1311 | 25285 |
| 52540 | HA | P-loop containing nucleoside triphosphate hydrolases | -0.524497666 | 4.13E-94 | 1319 | 293049 |
| 52833 | RA | Thioredoxin-like | -0.466397534 | 2.88E-77 | 1307 | 35294 |
| 53098 | F | Ribonuclease H-like | -0.436713312 | 2.92E-63 | 1317 | 22384 |
| 53187 | OA | Zn-dependent exopeptidases | -0.458998335 | 1.23E-72 | 1295 | 16486 |
| 53335 | RB | S-adenosyl-L-methionine-dependent methyltransferases | -0.551620835 | 2.47E-100 | 1317 | 60989 |
| 53383 | RB | PLP-dependent transferases | -0.578930336 | 4.92E-122 | 1313 | 38176 |
| 53474 | RC | alpha/beta-Hydrolases | -0.547861371 | 6.58E-114 | 1260 | 44812 |
| 53850 | P | Periplasmic binding protein-like II | -0.476126611 | 1.73E-75 | 1301 | 65801 |
| 53901 | RC | Thiolase-like | -0.550295571 | 1.30E-101 | 1268 | 21450 |
| 54292 | RA | 2Fe-2S ferredoxin-like | -0.456931722 | 8.47E-56 | 1096 | 8687 |
| 54427 | RF | NTF2-like | -0.482112445 | 9.35E-87 | 880 | 9935 |
| 54593 | RA | Glyoxalase/Bleomycin resistance protein/Dihydroxybiphenyl dioxygenase | -0.450965341 | 7.55E-79 | 1016 | 14244 |
| 54631 | RF | CBS-domain | -0.411347444 | 1.65E-63 | 1283 | 12925 |
| 55729 | RB | Acyl-CoA N-acyltransferases (Nat) | -0.456353703 | 1.03E-86 | 1250 | 34844 |
| 55781 | T | GAF domain-like | -0.490340591 | 8.87E-102 | 1083 | 19207 |
| 55811 | L | Nudix | -0.404505327 | 7.98E-71 | 1208 | 11668 |
| 55874 | O | ATPase domain of HSP90 chaperone/DNA topoisomerase II/histidine kinase | -0.496149089 | 3.81E-69 | 1273 | 50342 |
| 55961 | R | Bet v1-like | -0.419276746 | 1.15E-66 | 810 | 8377 |
| 56059 | H | Glutathione synthetase ATP-binding domain-like | -0.46214717 | 4.47E-50 | 1275 | 18325 |
| 56112 | OB | Protein kinase-like (PK-like) | -0.438363119 | 1.36E-76 | 1206 | 17033 |
| 56176 | HA | FAD-binding domain | -0.472306546 | 6.13E-70 | 1213 | 9295 |
| 56235 | RC | N-terminal nucleophile aminohydrolases (Ntn hydrolases) | -0.459305139 | 1.50E-57 | 1243 | 10121 |
| 56784 | RC | HAD-like | -0.416096509 | 9.25E-60 | 1302 | 23797 |
| 56801 | RC | Acetyl-CoA synthetase-like | -0.547745004 | 1.80E-109 | 1107 | 18746 |
| 63380 | H | Riboflavin synthase domain-like | -0.410258238 | 7.10E-70 | 1144 | 7771 |
| 82866 | RF | Multidrug efflux transporter AcrB transmembrane domain | -0.432469755 | 9.57E-65 | 1211 | 17967 |
| 88659 | TA | Sigma3 and sigma4 domains of RNA polymerase sigma factors | -0.523716391 | 2.02E-78 | 1248 | 20632 |
| 88946 | S | Sigma2 domain of RNA polymerase sigma factors | -0.585116314 | 1.62E-105 | 1155 | 16024 |
| 103473 | P | MFS general substrate transporter | -0.483568169 | 5.55E-84 | 1285 | 39310 |
Columns are as follows: SUPFAM ID; Function (functional category code); Description; R-value (correlation of WDAS vs. genome size); P-value; Genome distribution (number of genomes in which the SUPFAM domain was identified); and Number of domains (total number of domains in the indicated superfamily).
Functional categories. General: HA (Small molecule binding), R (General or several functions), RD (Dimerization domains). Information: K (Transcription), L (DNA replication, recombination, repair). Metabolism: C (Energy production and conversion), EA (Nitrogen metabolism), F (Nucleotide transport and metabolism), RA (Oxidation/Reduction), RB (Transferases), RC (Other enzymes). Processes_IC: O (Posttranslational modification, protein turnover, chaperones), OA (Proteases, peptidases and their inhibitors), P (Inorganic ion transport and metabolism), RF (Transport). Regulation: LA (DNA-binding transcription factors), OB (Kinases and phosphatases and inhibitors), T (Signal transduction), TA (Other regulatory function). Other: S (Function unknown).
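The R-values in the table report how a domain's WDAS varies with genome size across genomes. As a minimal sketch of that kind of calculation, the code below computes a Pearson correlation coefficient from paired per-genome values. It assumes a Pearson coefficient, which the table does not state explicitly; the function name `pearson_r` and the toy values are hypothetical, and the P-value is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Toy (made-up) values: genome size in ORFs and the WDAS of one domain.
genome_sizes = [1000, 2000, 3000, 4000]
wdas_values = [0.9, 0.7, 0.6, 0.2]
r = pearson_r(genome_sizes, wdas_values)  # negative: WDAS falls as size grows
```

A negative coefficient, as in every row of the table, means the WDAS tends to decrease as the genome size increases.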
Fig 3. The architecture of the P-loop superfamily as a function of genome size.
A) WDAS of the P-loop (R-value = -0.52; p-value = 4.13E-94); B) number of different domains associated with the P-loop (R-value = 0.77, p-value = 4.84e-264). On the X-axis of each graph, genome size ranges are displayed in 13 windows, with a range of 836 ORFs each. On the Y-axis are the WDASs. The lines shown in the boxes are the median values. The whisker caps represent the minimum and maximum values.
Fig 4. Architecture of the wHTH superfamily as a function of genome size.
A) WDAS of the wHTH superfamily (R-value = -0.42; p-value = 2.22E-52); B) number of different domains associated with the wHTH (R-value = 0.545 and p-value = 3.07e-102). On the X-axes of each graph are genome size ranges, displayed in 13 windows, each with a length of 836 ORFs. On the Y-axes are the WDASs. The horizontal lines in the boxes are the median values. The whisker caps represent the minimum and maximum values.
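The windowing used in Figs 3 and 4 (13 genome-size windows of 836 ORFs each, summarized by medians) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the starting point of the first window (`start=0`), the clamping of out-of-range sizes into the first or last window, and the function names are all assumptions.

```python
from collections import defaultdict
from statistics import median

WINDOW_WIDTH = 836  # ORFs per window, as stated in the captions
N_WINDOWS = 13      # number of windows, as stated in the captions

def window_index(n_orfs, start=0):
    """Return the 0-based window for a genome with n_orfs ORFs.

    `start` is the lower bound of the first window (assumed, not given in
    the captions). Sizes outside the covered range are clamped to the
    first or last window (also an assumption).
    """
    idx = (n_orfs - start) // WINDOW_WIDTH
    return max(0, min(idx, N_WINDOWS - 1))

def median_by_window(genome_sizes, values, start=0):
    """Group per-genome values by genome-size window and take each median."""
    groups = defaultdict(list)
    for size, value in zip(genome_sizes, values):
        groups[window_index(size, start)].append(value)
    return {i: median(vs) for i, vs in sorted(groups.items())}

# Toy (made-up) example: two genomes fall in window 0, one in window 1.
medians = median_by_window([100, 200, 900], [1.0, 3.0, 5.0])
```

The per-window minimum and maximum, which the captions say are shown by the whisker caps, could be collected the same way with `min` and `max` in place of `median`.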
Fig 5. Antiquity and abundance of structural domains.
A) Promiscuous and B) non-promiscuous domains. On the X-axes are antiquity assignments, i.e., how ancient each structural domain present in the universal enzymatic reactions is, as suggested by Caetano-Anolles et al. [15]; a score of 0 represents an ancient event, whereas 1.0 represents recent domain emergence. On the Y-axis is the abundance (frequencies) of the superfamilies. The size of a circle represents the proportion of each domain in relation to the total protein domains.