Metabolism in the Niche: a Large-Scale Genome-Based Survey Reveals Inositol Utilization To Be Widespread among Soil, Commensal, and Pathogenic Bacteria.

Michael Weber1, Thilo M Fuchs1.   

Abstract

Phytate is the main phosphorus storage molecule of plants and is therefore present in large amounts in the environment and in the diet of humans and animals. Its dephosphorylated form, the polyol myo-inositol (MI), can be used by bacteria as a sole carbon and energy source. The biochemistry and regulation of MI degradation were deciphered in Bacillus subtilis and Salmonella enterica, but a systematic survey of this catabolic pathway has been missing until now. For a comprehensive overview of the distribution of MI utilization, we analyzed 193,757 bacterial genomes, representing a total of 24,812 species, for the presence, organization, and taxonomic prevalence of inositol catabolic gene clusters (IolCatGCs). The genetic capacity for MI degradation was detected in 7,384 (29.8%) of all species for which genome sequences were available. IolCatGC-positive species were particularly found among Actinobacteria and Proteobacteria and to a much lesser extent in Bacteroidetes. IolCatGCs are very diverse in terms of gene number and functions, whereas the order of core genes is highly conserved on the phylum level. We predict that 111 animal pathogens, more than 200 commensals, and 430 plant pathogens or rhizosphere bacteria utilize MI, underscoring that IolCatGCs provide a growth benefit within distinct ecological niches. IMPORTANCE This study reveals that the capacity to utilize inositol is unexpectedly widespread among soil, commensal, and pathogenic bacteria. We assume that this yet-neglected metabolism plays a pivotal role in the microbial turnover of phytate and inositols. The bioinformatic tool established here enables predicting to which extent and genetic variance a bacterial determinant is present in all genomes sequenced so far.

Keywords:  bacterial metabolism; catabolic pathway; ecological niche; genomics; large-scale approach; myo-inositol; phytate; prevalence; virulence

Year:  2022        PMID: 35924911      PMCID: PMC9430895          DOI: 10.1128/spectrum.02013-22

Source DB:  PubMed          Journal:  Microbiol Spectr        ISSN: 2165-0497


INTRODUCTION

Inositol is the structural basis for many biomolecules, such as phosphatidylinositol, that are components of eukaryotic cell membranes. Their mono-, di-, or triphosphorylated derivatives, the phosphoinositides, are membrane constituents that act as cytosolic solutes and contribute to cell signaling, cell motility, membrane trafficking, and phagocytosis (1). Inositol phosphates (InsPs) are synthesized in animals and plants as secondary messengers. InsPs have been reported to represent 2% to 60% of the total organic phosphorus in animal waste (2), pointing to their role in eutrophication of the environment. InsP6, phytate, is the main storage molecule of phosphorus and minerals in plants. InsPs play multiple roles in eukaryotic cell functions, including growth, lipid metabolism, and insulin sensitivity (3). The most prominent example is the second messenger, inositol-1,4,5-triphosphate (InsP3). Several intestinal pathogens, such as Salmonella spp., Shigella spp., Escherichia coli serotypes, and Yersinia spp., modulate the InsP metabolism of the host in different ways (4). InsPs exist in various forms of phosphorylation with one to six phosphates and in isomeric forms such as d-chiro, scyllo, and neo, with myo-inositol hexakisphosphate (InsP6), or phytate, being the most abundant form in terrestrial and aquatic environments (5), in cells (1), and in the diet as well as in the gut (6). Phytate is present in plant tissues such as bran and seeds of legumes and cereals, including oil seeds and nuts. The dephosphorylated form of phytate is myo-inositol (MI), a polyol that is a readily available carbon and energy source for microorganisms. So far, the capacity to catabolize MI has been experimentally demonstrated for only a few bacteria, including Rhizobium leguminosarum (7), Sinorhizobium meliloti (8), Lactobacillus casei (9), Klebsiella aerogenes (10), Corynebacterium glutamicum (11), Legionella pneumophila (12), Yersinia mollaretii (13), and Citrobacter koseri (14).
The transporters, repressor, and enzymes involved in MI utilization have been investigated in detail for Bacillus subtilis (15, 16), whereas the enteropathogen Salmonella enterica was used to elucidate the complex regulatory network controlling the MI pathway (17–19). As MI is the most abundant carbon source detectable in the soil (20), we assumed that MI utilization is a widespread metabolic capability in microbial communities. However, it is largely unknown to what extent aquatic, soil, plant-associated, and gut bacteria are able to degrade MI released from phytate or cellular components. Here, we predict that MI is a substrate that can be utilized by an unexpectedly large number of bacteria as a source of carbon and energy. We determine the number and prevalence of bacterial species and genomes carrying the genetic determinant required for the degradation of dephosphorylated inositols and discuss the critical role of bacteria in the global turnover of these compounds.

RESULTS

Search for iol genes and inositol catabolic gene clusters.

To obtain an overview of the distribution of inositol catabolic gene clusters (IolCatGCs), we performed a quality selection of bacterial genome sequences available in the GenBank database by applying RefSeq exclusion criteria, resulting in a total subset of 193,757 genomes that represent 24,812 species (Fig. 1a). For 50,960 genome sequences, gene annotation files were not available in the database and were therefore generated using the annotation tool Prokka (21). Next, a basic set of 23 iol genes involved in MI metabolism was taken from Salmonella enterica serovar Typhimurium and from B. subtilis, and their respective protein sequences were queried against the NCBI nonredundant (nr) database using PSI-BLAST. The resulting sequence hits were combined into hidden Markov models (HMMs), one for each iol component. All 23 HMMs were searched against the total subset of genomes.
FIG 1

Searching for IolCatGCs. (a) The workflow depicts the bioinformatic strategy applied here to perform a comprehensive prediction of all bacteria able to degrade MI. Starting with the processing of 193,757 input genome assemblies, filtering resulted in the identification of an iol gene cluster termed IolCatGC whose prevalence was then evaluated at all taxonomic levels of bacteria. (b) A schematic pathway of inositol catabolism in Gram-positive and negative bacteria is shown. Enzymes catalyzing the degradation of MI to glyceraldehyde and acetyl-CoA are indicated. IolT, transporter; IolG, IolW, IolX, IolU, IolJ, dehydrogenases of inositol and its stereoisomers; IolE, dehydratase; IolD, 3D-(3,5/4)-trihydroxycyclohexane-1,2-dione hydrolase; IolB, isomerase; IolC, biphosphate aldolase; IolA, malonate-semialdehyde dehydrogenase; TCA, tricarboxylic acid cycle.

We defined a possibly functional IolCatGC to be present in a genome if at least three of the four core genes iolB, iolC, iolD, and iolE, which are essential for MI degradation (17) (Fig. 1b), were found in a distinct genetic determinant. Genes were assigned to an IolCatGC if they showed a maximum genetic distance of 10 kb. In 810 of 1,024 genomes with only 3 core genes, the 4th one was identified more than 10 kb apart from the IolCatGC elsewhere on the chromosome or within another contig, respectively. Possible reasons for a lack of the fourth gene are an incomplete genome sequence, assembly errors, or iol genes with frameshifts that were not considered to be functional. The genes iolG1, iolG2, iolU, and iolW encoding dehydrogenases of inositol isomers are expected to be present in genomes with a complete set of iol genes. Indeed, at least one of these dehydrogenase genes was identified in 6,502 IolCatGCs and outside the cluster in 881 species from our list.
Genes encoding inositol transporters were not categorized as core genes due to their location apart from the IolCatGC in some bacteria (16), nor were the genes encoding the malonate-semialdehyde dehydrogenase IolA and the biphosphate aldolase IolJ, which are also involved in the metabolism of valine and of fructose-1,6-biphosphate, respectively. The functions of genes associated with inositol catabolism are listed in Table S1 in the supplemental material. To avoid missing bacteria that harbor an IolCatGC, and to avoid wrongly classifying a species as unable to use MI, all genomes of strains belonging to the same species were analyzed for the presence or absence of iol genes. For each species with more than one genome available, we selected one IolCatGC that is representative in terms of cluster length and homology score (Table S2). We identified IolCatGCs within a total of 7,384 species, corresponding to 29.8% of the 24,812 species with genomes in the NCBI assembly database (Table S3). Among these are 3,651 Gram-negative and 3,733 Gram-positive species, corresponding to an almost equal distribution (14.7% versus 15%). Taken together, these findings highlight the relevance of MI degradation for many bacteria.
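The cluster definition above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function name, and the reading of "maximum genetic distance of 10 kb" as the gap between a hit and the preceding hits on the same contig are assumptions made for illustration only.

```python
from dataclasses import dataclass

CORE_GENES = {"iolB", "iolC", "iolD", "iolE"}
MAX_GAP_BP = 10_000   # 10 kb distance limit stated in the text
MIN_CORE_GENES = 3    # at least three of the four core genes


@dataclass(frozen=True)
class Hit:
    """One HMM hit on a contig (hypothetical record layout)."""
    contig: str
    start: int
    end: int
    gene: str


def call_clusters(hits):
    """Group neighbouring hits and keep groups with enough core genes.

    Assumption: two hits belong to the same group if they lie on the same
    contig and the gap between them is at most MAX_GAP_BP.
    """
    groups, current, reach = [], [], 0
    for hit in sorted(hits, key=lambda h: (h.contig, h.start)):
        new_group = (
            not current
            or hit.contig != current[-1].contig
            or hit.start - reach > MAX_GAP_BP
        )
        if new_group:
            if current:
                groups.append(current)
            current, reach = [], hit.end
        current.append(hit)
        reach = max(reach, hit.end)
    if current:
        groups.append(current)
    # A group counts as a candidate IolCatGC only with >= 3 distinct core genes.
    return [
        g for g in groups
        if len({h.gene for h in g} & CORE_GENES) >= MIN_CORE_GENES
    ]
```

In this sketch, a genome with iolB, iolC, and iolD side by side and iolE 25 kb away yields one three-gene cluster, which mirrors the case described above where the fourth core gene lies more than 10 kb from the cluster.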

Length, composition, and genetic organization of the IolCatGCs.

We observed a wide range of complexity of IolCatGCs with respect to the number of operons and genes associated with inositol degradation. The vast majority, namely, 6,689 of the 7,384 IolCatGCs investigated here, comprise 4 to 10 genes, while only 644 IolCatGCs comprise 11 to 34 genes (Fig. 2a). In addition to the core gene set, larger clusters contain genes coding for regulators, transporters, dehydrogenases, hypothetical proteins with related metabolic functions, and duplications thereof. These clusters appear to be organized as a divergon comprising at least two operons and promoters.
FIG 2

Complexity of the genetic IolCatGC organization. (a) The size of IolCatGCs in the genomes of bacterial species analyzed here is indicated by gene numbers. The species were considered that carry at least 3 and up to 34 genes within their IolCatGCs. The gene number was determined by counting genes encoding enzymes, regulators, transporters, and SrfJ, known or predicted to be involved in inositol utilization. Genes encoding transposases and hypothetical proteins or transporters without homology to putative Iol-degrading enzymes or characterized MI facilitators were not considered for defining the IolCatGC lengths. (b) Randomly selected IolCatGCs are depicted to illustrate their variability in length and composition. Regulators are stained blue; transporters, green; core genes, orange; and isomerases and dehydrogenases, gray. Numbers in brackets indicate the number of iol genes. White genes are genes of unknown function or not associated with inositol utilization.

The genome of Oceanobacillus oncorhynchi carries a 76-kb fragment with 34 genes predicted to be involved in inositol degradation. Many functions, including those of the two regulators ReiD and IolR, which play a pivotal role in the regulation of MI degradation (19, 22, 23), are encoded twice, and 12 putative dehydrogenases providing the substrate for IolE were identified (Fig. 2b). Further examples randomly chosen are the IolCatGC from Leucobacter musarum, which was isolated from a nematode; Colwellia demingiae, a psychrophilic Antarctic species; and Nesterenkonia muleiensis, a novel actinobacterium isolated from Populus euphratica. Species of the latter genus are found in soil and are characterized by a high metabolic versatility. A more compact IolCatGC organization was found in the genomes of Rhizobium loessense, a root nodule bacterium; Cronobacter sakazakii, a foodborne pathogen; and Burkholderia mallei, a pathogen causing glanders.
The IolCatGC of Salmonella enterica is more complex with respect to operon organization and gene numbers than most others. Its canonical iol gene cluster (17), together with that of B. subtilis (24), is also shown in Fig. 2b.

Percentage of IolCatGC-positive genomes per species.

To rule out that a single positive genome misrepresents a species whose genomes are mostly IolCatGC negative, we analyzed the abundance of iol genes within all genomes available for each species. Out of the 7,384 species, 1,453 are represented by at least two genome sequences (Table S3a). In this group, we observed high variability within species pangenomes with respect to the presence of iol genes. Regarding the 319 species for which at least 10 genome sequences met the quality criteria, 288 of them are represented by >0%, 246 by ≥50%, and 228 by ≥70% IolCatGC-positive genomes, respectively (Fig. S1; Table S3b). More than 1,000 genome sequences are available for 10 species (Table S3c). Among them are Listeria monocytogenes, Klebsiella pneumoniae, Burkholderia pseudomallei, and Pseudomonas viridiflava, with at least 97% iol gene-positive sequences per species. Less than 2% of all 20,632 E. coli genome sequences comprise the IolCatGC.
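The per-species tally described above can be sketched in a few lines of Python. The input layout (one `(species, has_cluster)` pair per genome) and both function names are assumptions for illustration; the actual tables are in the supplemental material and are not reproduced here.

```python
from collections import defaultdict


def positive_fraction_per_species(genomes, min_genomes=10):
    """Return {species: fraction of IolCatGC-positive genomes}.

    genomes: iterable of (species, has_cluster) pairs, one per genome.
    Species with fewer than min_genomes genomes are left out, matching
    the "at least 10 genome sequences" restriction in the text.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for species, has_cluster in genomes:
        totals[species] += 1
        positives[species] += int(has_cluster)
    return {s: positives[s] / n for s, n in totals.items() if n >= min_genomes}


def count_at_or_above(fractions, threshold):
    """Count species whose positive fraction reaches the threshold."""
    return sum(1 for f in fractions.values() if f >= threshold)
```

Calling `count_at_or_above(fractions, 0.5)` and `count_at_or_above(fractions, 0.7)` on real data would correspond to the ≥50% and ≥70% categories reported above.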

IolCatGCs encoding the activator ReiD and the putative ceramidase SrfJ.

Of particular interest are IolCatGC genes that are absent in the majority of IolCatGCs and encode accessory functions not essential for MI degradation. The most interesting ones are reiD and srfJ, which code for a regulator and a putative ceramidase, respectively. ReiD belongs to the AraC family of transcriptional regulators and was experimentally determined to activate the transcription of iolE and iolG1, the genes that encode the enzymes responsible for the initial two steps in MI degradation (19). Gene srfJ is known to be activated by the two-component system SsrAB, which controls the expression of SPI-2 genes (25, 26). Its product shows a remarkable similarity to human lysosomal glycosylceramidases (E value of 4 × 10⁻⁶²; 30.8% amino acid identity) and might be involved in the release of inositols from sphingolipids. Homologs of ReiD were found in 217 species and those of SrfJ in 10 species among the list of 7,384 genomes with IolCatGCs, indicating that both factors are not widespread among bacteria. Remarkably, 10 IolCatGCs harbor both genes (Fig. 3). The clusters of Citrobacter werkmanii and Mangrovibacter spp. are identical to that of Salmonella and nearly colinear in E. coli, Achromobacter spp., and Gibbsiella quercinecans. In comparison with S. Typhimurium, the genomes of Gibbsiella quercinecans, Achromobacter sp., and Escherichia coli ED1a are lacking the two genes iolI2 and iolH, and that of G. quercinecans a fourth transporter gene. The data suggest that additional regulatory, as well as enzymatic, determinants associated with MI degradation provide a benefit for some bacteria in their ecological niches.
FIG 3

Complex IolCatGCs that harbor both the transcriptional activator gene reiD and the putative ceramidase gene srfJ. All IolCatGCs are shown that both carry the activator gene reiD and the putative ceramidase-encoding gene srfJ. The color code is the same as in Fig. 2. Genes encode transporters (green), core factors IolBCDE (orange), regulators (blue), the malonate-semialdehyde dehydrogenase IolA (red), the fructose-1,6-biphosphate aldolase IolJ (black), SrfJ (yellow), uncharacterized isomerases and dehydrogenases (gray), as well as transposases, insertion elements, genes not associated with MI utilization, and unknown genes (white). MFS, major facilitator superfamily; csbX, gene that encodes MFS efflux pump (69); xynC, srfJ homolog that encodes a glucuronoxylanase (70); rhaS, gene that codes for an activator of the l-rhamnose operon (71); glpR, gene coding for a repressor of sugar metabolism pathways (72); gntR, gene that encodes repressor of gluconate operon (73); hyp, hypothetical gene. Slashes indicate contig ends. Strain identification numbers or names are given. The classes the strains belong to are mentioned on the left.

IolCatGC gene order.

Next, we investigated whether there are distinct patterns of IolCatGCs with respect to the genetic organization of the four core genes iolB, iolC, iolD, and iolE. Within the genomes of species belonging to the same genus, we predominantly observed collinearity of the genes involved in MI metabolism, pointing to a high degree of conservation on this taxonomic level (data not shown). Performing a systematic phyla evaluation (Fig. 4), we found that 65% of all IolCatGC-positive species belonging to Actinobacteria harbor the pattern iolD-iolB-iolC-iolE (DBCE) on their genomes, and 28% have the pattern CBDE (Table S4). The most common iol gene orders were BEDC in Alphaproteobacteria (42%) and CEBD (18%) and BCDE (14%) in Gammaproteobacteria. In Firmicutes, we mainly identified the gene orders BCDE, BCED, and BEDC, corresponding to 41%, 20%, and 14%, respectively, of all species in this phylum. The pattern BCDE was present in all genomes belonging to the classes Cytophagia and Flavobacteria (each 44%) of Bacteroidetes and was also common in Chloroflexi (data not shown). Thus, the iol gene orders BCDE and BEDC are interphylum patterns present in Gram-negative, as well as Gram-positive, bacteria.
FIG 4

Patterns of iol gene order. The Sankey diagram was generated with the R package ggsankey and shows the most common order patterns of the core genes iolB, iolC, iolD, and iolE, with the phylum and class levels as taxonomic nodes. The thickness of the connections corresponds to the relative proportion of species genomes carrying the same pattern. Genomes from five phyla were analyzed. In total, we found IolCatGCs in 6,026 species with unique pattern hits (81.6%) and in 548 (7.4%) species with multiple matches. Multiple matches are mainly due to the occurrence of multiple core genes in the cluster and could not be unequivocally assigned to one permutation type. A minimum of 10 species per type were required to draw a connection between taxonomic class and cluster type. In summary, we show connections for 80% (5,879) of the species in our results table.

To summarize, we observed a close relationship between taxonomy and pattern type, and we identified the pattern BEDC as the most abundant and widespread one. These data point to a highly conserved, ancient pathway that further evolved by the acquisition of regulatory, transporter, and nonessential enzymatic genes.
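How a four-letter order pattern such as DBCE can be derived from gene coordinates is sketched below. This is an illustrative sketch, not the authors' code: the function names and input layout are hypothetical, the pattern is read in ascending coordinate order, and no strand normalization is attempted because the text does not state how orientation was handled. Clusters with duplicated or missing core genes return `None`, analogous to the multiple-match cases that could not be assigned to one permutation type.

```python
from collections import Counter

CORE_LETTERS = {"iolB": "B", "iolC": "C", "iolD": "D", "iolE": "E"}


def core_pattern(genes):
    """Return the core-gene order of one cluster, e.g. 'DBCE'.

    genes: iterable of (start, gene_name) pairs for one cluster.
    Non-core genes are ignored. Returns None unless each of the four
    core genes occurs exactly once.
    """
    core = sorted(
        (start, CORE_LETTERS[name]) for start, name in genes if name in CORE_LETTERS
    )
    letters = "".join(letter for _, letter in core)
    if sorted(letters) != ["B", "C", "D", "E"]:
        return None
    return letters


def pattern_shares(patterns):
    """Relative share of each unambiguous pattern within a taxon."""
    counts = Counter(p for p in patterns if p is not None)
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}
```

Applying `pattern_shares` to the patterns of all species in one phylum gives per-phylum proportions of the kind reported above.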

Taxonomic prevalence of IolCatGC.

To examine the distribution of iol-positive species within different taxonomic levels of bacteria, we defined a species as positive with respect to inositol catabolism if at least one genome per species carried an IolCatGC. We found IolCatGCs in species belonging to 12 out of 21 phyla (Fig. 5a; Table S5), with substantial proportions in Actinobacteria (48% out of 5,473 species), Proteobacteria (33% of 11,297), and Firmicutes (19% of 4,686). In the next largest phylum, Bacteroidetes, IolCatGCs were found in the genomes of only 3% out of 2,227 species (Fig. 5b). These data indicate an uneven distribution of IolCatGCs at the phylum level.
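The species-level rule stated above (a species is positive if at least one of its genomes carries an IolCatGC) and the resulting phylum-level percentages can be sketched as follows. The record layout and function name are hypothetical and chosen only for illustration.

```python
from collections import Counter


def phylum_prevalence(records):
    """Return {phylum: (positive_species, total_species, percent_positive)}.

    records: iterable of (phylum, species, genome_has_cluster) tuples,
    one per genome. A species counts as positive as soon as any one of
    its genomes carries a cluster.
    """
    species_positive = {}
    for phylum, species, has_cluster in records:
        key = (phylum, species)
        species_positive[key] = species_positive.get(key, False) or has_cluster

    totals, positives = Counter(), Counter()
    for (phylum, _species), positive in species_positive.items():
        totals[phylum] += 1
        positives[phylum] += int(positive)
    return {
        p: (positives[p], totals[p], 100.0 * positives[p] / totals[p])
        for p in totals
    }
```

Because a single positive genome is enough, this rule is deliberately inclusive; the per-species genome fractions reported earlier indicate how representative such a call is.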
FIG 5

Prevalence of IolCatGCs among the bacterial kingdom. (a) Cladogram of analyzed bacterial species showing the prevalence of IolCatGCs at the family level. To filter out less relevant branches, only families with at least 10 species were considered. The taxonomic hierarchy includes phylum, class, order, and family (from inside to outside). Filled circles indicate families of which at least 10% of their species are IolCatGC positive. The outer circle heatmap indicates the percentage of species within a family that are IolCatGC positive. The figure was generated with GraPhlAn (version 1.1.4). (b) Bar plot indicating absolute and relative number of IolCatGC-positive species within selected bacterial phyla. The four largest phyla are shown in the upper part. The bar plot was generated with the help of R package ggplot2 (version 3.3.2).

On the family level, we identified IolCatGCs in 202 out of 473 bacterial families (Fig. 6). Analyzing the three largest families of the phylum Actinobacteria, we detected 87% IolCatGC positive out of 1,235 species of Streptomycetaceae dominated by the genus Streptomyces, 41% of 789 species belonging to Microbacteriaceae, and 28% of 395 species of Mycobacteriaceae. Fifty-six percent of the species belonging to Alphaproteobacteria and Gammaproteobacteria carry IolCatGCs. In particular, we identified iol genes in Pseudomonadaceae (48% of 1,221 species), Enterobacteriaceae (39% of 594 species), Phyllobacteriaceae (88% of 524 species), Burkholderiaceae (69% of 474 species), Rhizobiaceae (92% of 451 species), and Vibrionaceae (21% of 381 species). With respect to the phylum Firmicutes, we found IolCatGCs predominantly in Bacillaceae (34% of 1,078 species) and Paenibacillaceae (56% of 441 species) and, to a lesser extent, in Lactobacillaceae (8% of 392 species), Clostridiaceae (14% of 352 species), Lachnospiraceae (14% of 324 species), and Staphylococcaceae (8% of 306 species). In Bacteroidetes, the capability of utilizing MI is less prevalent. IolCatGC-positive genomes were found in Flavobacteriaceae (2% of 810 species), Weeksellaceae (3% of 215 species), and Spirosomaceae (49% of 55 species), whereas no iol gene clusters were found in 44 other families, including the large families Bacteroidaceae and Prevotellaceae.
FIG 6

Families predicted to degrade MI. Families comprising at least 2 (Actinobacteria, Bacteroidetes, and Firmicutes) or at least 10 (Proteobacteria) species are indicated. The bars in the logarithmic scale show the total number, and the heatmap shows the percentage of species within a family that are IolCatGC positive. Scales are indicated.

The 7,384 species identified above belong to 776 genera (Table S6a). To analyze the most relevant ones with respect to inositol metabolism, we selected all genera that comprise at least 10 species of which ≥30% carry the IolCatGCs (Table S6b). These criteria are fulfilled by 97 genera (Fig. 7a). For 86 of them, at least 40% of their species are positive with respect to the iol gene cluster. Examples from this group are genera with at least 200 species, such as Pseudomonas, Streptomyces, Bacillus, Mesorhizobium, Paenibacillus, Rhizobium, Burkholderia, and Rhodococcus.
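The genus selection criteria (at least 10 species, of which ≥30% are IolCatGC positive) amount to a simple filter over species-level calls. The following Python sketch assumes a hypothetical input of `(genus, species_is_positive)` pairs; the function name and defaults are illustrative, not taken from the authors' tool.

```python
from collections import Counter


def select_genera(species_calls, min_species=10, min_fraction=0.30):
    """Return the sorted list of genera that meet both selection criteria.

    species_calls: iterable of (genus, species_is_positive) pairs,
    one per species.
    """
    totals, positives = Counter(), Counter()
    for genus, positive in species_calls:
        totals[genus] += 1
        positives[genus] += int(positive)
    return sorted(
        genus
        for genus, n in totals.items()
        if n >= min_species and positives[genus] / n >= min_fraction
    )
```

Raising `min_fraction` to 0.40 in the same call would reproduce the stricter subset mentioned above.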
FIG 7

Distribution of the MI pathway at the genus level and in pathogenic bacteria. (a) Fraction of species within selected genera that carry an IolCatGC. Bacterial genera that comprise at least 10 species, of which ≥30% are IolCatGC positive, were selected. The numbers of species that carry (dark gray) or lack (light gray) the four core iol genes are shown. The site of sampling and/or typical ecological niche is indicated by colors. (b) Column plot showing the percentage of IolCatGC-positive genomes among all genomes of selected animal and human bacterial pathogens. Species for which at least 10 (insect pathogens, 5) genome sequences were available and of which at least 50% (with the exception of Bacillus cereus) were predicted as IolCatGC positive are shown.

Taken together, the high abundance of IolCatGCs within these species-rich and other genera shown in Table S3 points out the relevance of MI degradation in particular for bacteria that are found in soil, water, decaying vegetation, the rhizosphere, or in association with plants. In these environments, phytate and MI are present in large amounts as accessible carbon and energy sources for bacteria.

Plant pathogens and rhizosphere bacteria with iol genes.

From the list of IolCatGC-positive bacteria, we identified 82 species that are known as plant-pathogenic bacteria (27–29), most of them belonging to the genera Agrobacterium, Burkholderia, Dickeya, Dyadobacter, Erwinia, Pantoea, Pectobacterium, Pseudomonas, and Ralstonia (Table S7a). Among them are six of the top ten plant pathogens (30), namely A. tumefaciens, D. dadantii/solani, E. amylovora, P. carotovorum, P. syringae, and R. solanacearum (Fig. 7a). Another interesting group with respect to MI utilization are bacteria from the rhizosphere that promote plant growth by forming symbiotic root nodules. Six relevant genera from this group were therefore investigated for genomes with the capability of utilizing MI. Strikingly, IolCatGCs are highly prevalent in the genomes of 348 species (Table S7b) mainly belonging to Caballeronia (40 IolCatGC-positive genomes, corresponding to 93%), Ensifer (61 IolCatGC-positive genomes, corresponding to 100%), Mesorhizobium (459 IolCatGC-positive genomes, corresponding to 99%), Paraburkholderia (210 IolCatGC-positive genomes, corresponding to 100%), Rhizobium (829 IolCatGC-positive genomes, corresponding to 97%), and Sinorhizobium (305 IolCatGC-positive genomes, corresponding to 94%). To conclude, we hypothesize that these plant-associated bacteria gain a growth advantage by their inositol degradation capability in the phytate-rich plant environment.

Commensals capable of degrading MI.

Given that phytate and inositol derivatives contribute to the diet and are ubiquitously present in the gut, we determined which and how many members of the microbiota in the gut carry IolCatGCs. For this purpose, we compiled a list of intestinal and rumen bacteria from humans, pigs, and cattle (Table S8). Our human gut microbiota list comprises a total of 807 nonredundant bacterial species (31, 32) and 465 swine species, including data from a DSMZ list (pig intestinal bacterial collection [PiBAC]) (33–37). A cattle reference list of 485 nonredundant rumen species was composed of the Hungate collection and genome sequencing projects (38–40). Remarkably, a high percentage of these species were found to carry IolCatGCs, namely, 16.6% of the human gut species, 10.3% of those from the swine gut, and 10.9% of all ruminal bacterial species identified so far. Examples of genera with iol-positive species among these commensals are Bacillus, Blautia, Citrobacter, Clostridium, Corynebacterium, Enterobacter, Enterococcus, Klebsiella, and Paenibacillus (Fig. 7a). IolCatGCs are absent, however, in Faecalibacterium prausnitzii, in all but one Bacteroides spp., in all but three Bifidobacterium spp., in Coprococcus spp., and in Akkermansia muciniphila. From these data, we conclude that a substantial percentage of gut commensals, namely, at least 10%, are capable of degrading inositols and using them as an alternative substrate in the gut. Moreover, it might be speculated that the fraction of IolCatGC-positive commensals in the gut directly depends on the diet and the metabolic state of the host.

Vertebrate pathogens harboring IolCatGC.

To identify bacterial pathogens equipped with iol genes, we compared Table S3 with a list of 636 species belonging to biosafety level 2 and 3 groups (Table S7c), resulting in a list of 87 species that are known as vertebrate pathogens, some of which are categorized as opportunistic, rare, or emerging pathogens (Table S7d). For an overview, we selected those 41 species for which ≥ 10 genomes were available, of which ≥ 50% carried IolCatGC (Fig. 7b; Table S7e and f). The genera Brucella, Burkholderia, Clostridium, Klebsiella, Listeria, Pantoea, Providencia, Serratia, and Yersinia are represented by two to eight species fulfilling these requirements. In addition, relevant pathogenic species such as Citrobacter freundii, Enterobacter cloacae, Enterococcus faecalis, Legionella pneumophila, Mannheimia haemolytica, S. enterica, and Vibrio harveyi were identified as encoding the capacity to utilize MI. Notably, most of those are enteropathogens. While 36% of all E. faecalis genomes do not carry iol genes, the genome sets of the other 41 pathogenic species shown in Fig. 7b with a proportion of ≥90% IolCatGC-positive sequences are much more coherent with respect to MI utilization. It is worth noting that the genes required for MI utilization are missing in the genera Bartonella, Bordetella, Borrelia, Campylobacter, Chlamydia, Mycobacterium, Rickettsia, and Streptococcus and in most Staphylococcus spp., including Staphylococcus aureus, all of which do not belong to Enterobacteriaceae and, with the exception of Campylobacter, do not proliferate in the gut. To complement this survey on pathogens, we investigated the genomes of relevant bacterial insect pathogens (41–43) for the presence of IolCatGCs. We identified 20 entomopathogenic species without and 24 species with iol genes (Table S7g; Fig. 7b), among them, Bacillus thuringiensis, Photorhabdus luminescens, and Xenorhabdus nematophila, which play pivotal roles in pest control approaches. 
Paenibacillus larvae, the etiological agent of American foulbrood, colonizes the gut of honeybees and also carries an IolCatGC. The finding of six insect-pathogenic Yersinia spp. (44) indicates that the interaction of these species with invertebrates is, as yet, underinvestigated. Taken together, we identified a high number of pathogenic bacteria that are probably able to utilize MI and its derivatives. Given that the examples above infect their hosts via the gut, lung, or bloodstream, we assume that the utilization of MI, which is present in food as well as in membrane compounds, provides a fitness advantage in different compartments of host organisms.

MI utilization by archaea or fungi.

Fungi, in particular, yeasts, have been reported to utilize MI (45). For example, Cryptococcus neoformans is known to grow with inositol as a sole carbon and energy source (46). In contrast to the enzymes encoded by IolCatGC, the environmental and pathogenic yeast relies on an inositol oxygenase activity that is responsible for the conversion of inositol to d-glucuronic acid (47). However, to the best of our knowledge, there is no literature that describes a fungal MI degradation pathway with enzymatic functions similar to those in bacteria. It was reported that inositol induces the sporulation response of Beauveria bassiana and Metarhizium anisopliae (48). The genome of B. bassiana encodes proteins with similarity to IolR, IolG1, IolG2, and IolE (~25% sequence identity each); IolC (31%); IolD (45%); and IolB (32%) of Actinomyces ruminicola, but a reiterated search of these fungi-specific protein sequences did not result in increasing similarity rates. By analyzing 1,511 genomes of Archaebacteria, homologs of enzymes involved in MI degradation were identified in two genomes each of Halobacteriales archaeon and Desulfurococcaceae archaeon and in Thermocladium modestius (Table S8). Some of these sequences were derived from metagenomics, and experimental evidence for a functionality of the respective IolCatGC was not found. We therefore conclude that, so far, the capability of degrading MI via IolCatGCs is restricted to members of the kingdom Eubacteria.

DISCUSSION

The extensive genome sequence survey performed here required the public availability of hundreds of thousands of genome sequences, a well-adapted bioinformatics annotation pipeline, and the necessary parallel computing power. To our knowledge, this is one of the first comprehensive approaches to decipher the prevalence and distribution of a single metabolic pathway across 24,812 bacterial species. For comparison, a study addressing the highly complex production of the vitamin B12 family of cofactors investigated 11,000 bacterial species by comparative genomics (49). The annotation of IolCatGCs was based on stringent criteria to ensure high reliability of the results, including the definition of cluster core genes, a stringent cutoff for the HMM similarity score, and filtering for proximal localization of the annotated cluster components. The bioinformatics pipeline established here can be used to predict the prevalence of a selected bacterial pathway, genetic island, virulence factor, or any other genetic determinant at all taxonomic levels of bacteria. Moreover, the pipeline applied here and its data output can seamlessly be integrated into metagenomic studies. Inositol phosphates accumulate in terrestrial environments, where they constitute the major class of organic phosphorus compounds, and are also present to a great extent in aquatic environments (5). Thus, our finding that a large number of bacterial genomes attributed to genera mainly found in the environment are potentially able to utilize MI is consistent with the presence of phytate and inositols in the environment (Fig. 6; see Table S6 in the supplemental material). Examples are Pseudomonas, Streptomyces, Bacillus, Paenibacillus, Clostridium, and Halomonas from saline environments and genera belonging to the ubiquitous Actinobacteria (Amycolatopsis, Actinomadura, Actinomyces, Curtobacterium, Gordonia, Mycobacterium, and Rathayibacter).
Of particular interest is the microbiota of the rhizosphere for two reasons, namely, the availability of MI to promote bacterial growth and the dephosphorylation of phytate as an organic phosphorus source (50). As expected, a majority of genomes belonging to the main plant-associated bacterial genera carry the genetic information to utilize MI. A benefit of inositol degradation was demonstrated for Rhizobium leguminosarum (7). This study supports the assumption that MI catabolism plays an important role for R. leguminosarum symbiosis with plants. InsPs and phytates are widespread in organisms and in the diet, respectively, and MI is present in substantial amounts in the gut of humans and animals (51, 52). Indeed, we predicted more than 10% of commensal bacteria to be capable of utilizing MI for proliferation, indicating that IolCatGC is critical for some microbes to occupy microenvironments in the gut or to circumvent a depletion of other nutrients. In line with the substantial number of IolCatGC-positive species frequently found in the gut, a metatranscriptome approach provided evidence that the capability of utilizing MI contributes to bacterial fitness in a model of human gut microbial succession (53). When the metabolism of gut microbiota from mice was investigated by a metagenomic and metatranscriptomic approach, many iol genes were found to be differentially regulated in the presence of three antibiotics (54). These data corroborate iol gene activation in commensals and the functional role of MI metabolism in the gut. Moreover, different amounts of mineral phosphorus and microbial phytases fed to chickens shaped the composition of their microbiota (55), and vegetarians’ microbiota revealed degradation of up to 100% phytate to MI-phosphate products lower than InsP3 (56). Therefore, it might be expected that nutrition affects the microbiological profile of gut microbiota in an IolCatGC-related manner.
The two commensal species, Mitsuokella jalaludinii and Mitsuokella multacida, which are present in the porcine gastrointestinal tract as well as in the rumen of cattle (Tables S2 and S7), have been identified as antagonists of S. Typhimurium (57). As Mitsuokella spp. are able to degrade MI, it is tempting to speculate that the mechanism underlying this antagonism is based on a metabolic competition, which includes MI utilization. Several pathogenic bacteria, mainly enteropathogens, are able to utilize MI and might thus gain an adaptive growth advantage during proliferation in gut niches in which other nutrients are not available or provide less energy than the polyol. Indeed, a transposon-directed insertion site sequencing (TraDIS)-based approach that systematically tested an S. Typhimurium transposon mutant library in chickens, pigs, and calves pointed to a strong attenuation of several iol gene mutants in enteritis models of these animals (58). Experimental data demonstrate that Legionella pneumophila utilizes MI to promote its infection of amoebae and macrophages (12).

Conclusion.

Our comprehensive search for iol gene clusters exploited nearly 200,000 bacterial genomes and revealed that this metabolic capacity is more widely distributed across the bacterial kingdom than previously thought. Analysis of all bacterial taxonomic levels revealed an uneven distribution of the MI degradation pathway. Many soil and rhizosphere bacteria carry IolCatGCs in their genomes and benefit from this pathway due to the high concentration of phytate and inositol isomers in the environment. Remarkably, 10% to 16% of the human and animal microbiota members were identified as being capable of degrading MI, pointing to a metabolic niche that provides a growth advantage for gut bacteria. The presence of conserved iol gene clusters in one-quarter of all bacterial species sequenced so far strongly suggests that the MI degradation pathway plays an as yet underestimated role in the metabolism and ecology of bacteria.

MATERIALS AND METHODS

Selection of genome assemblies.

Bacterial genome assemblies were downloaded from GenBank (accessed August 2020) using the public NCBI FTP server (ftp://ftp.ncbi.nlm.nih.gov/genomes/), including the nucleotide sequence fasta (fna), the annotated feature file (gff), and the translated protein sequences (faa) for each accession number. Assemblies were discarded if they matched a RefSeq exclusion criterion (https://www.ncbi.nlm.nih.gov/assembly/help/anomnotrefseq/). Genome assemblies with missing gene annotation files (50,960) were further analyzed for protein-coding sequences using Prokka (version 1.14.5) with default parameters (21).

Genome data analysis.

The genome sequence of S. Typhimurium strain 14028 (GenBank accession no. NC_003197) was used to select the iol genes (iolA, iolB, iolC, iolD, iolE, iolG1, iolG2, iolH, iolI, iolI2, srfJ, reiD, iolR, iolT1, and iolT2) as an input (17). The genes iolX, iolW, iolU, iolJ, and iolS were taken from B. subtilis (GenBank accession no. NC_000964) (59, 60). For each gene associated with MI utilization (see Table S1 in the supplemental material), the following steps were applied. Translated protein sequences were queried against the NCBI nonredundant (nr) database using PSI-BLAST (61) with a total of 5 iterations; a cutoff of 70% percentage identity and 70% sequence coverage was chosen to select for highly similar sequences. The resulting list of orthologous protein sequences was aligned with the tool MUSCLE (62, 63), and the resulting multiple alignment was loaded into hmmbuild of package hmmer (version 3.3.2) to generate an iol gene-specific hidden Markov model (HMM) (64). In total, we obtained 23 HMMs for the gene cluster identification (Table S9).
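
The 70% identity and 70% coverage cutoff described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The record layout (`Hit`), the field names, and the definition of coverage as aligned length divided by query length are assumptions made for illustration, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One PSI-BLAST hit (hypothetical, simplified record layout)."""
    subject_id: str
    percent_identity: float  # percentage of identical positions in the alignment
    align_length: int        # number of query residues covered by the alignment
    query_length: int        # full length of the query protein


def passes_cutoff(hit, min_identity=70.0, min_coverage=70.0):
    """Return True if the hit meets both the identity and the coverage cutoff.

    Coverage is taken here as aligned length / query length (an assumption).
    """
    coverage = 100.0 * hit.align_length / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


# Invented example hits for a 300-residue query
hits = [
    Hit("seqA", 92.5, 290, 300),  # high identity, near-full coverage
    Hit("seqB", 68.0, 295, 300),  # identity below the cutoff
    Hit("seqC", 85.0, 150, 300),  # only 50% of the query covered
]

selected = [h.subject_id for h in hits if passes_cutoff(h)]
print(selected)
```

Only sequences that pass both filters would then be passed on to the multiple alignment and HMM-building steps.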

IolCatGC screening.

To implement a large-scale high-throughput search for IolCatGCs, we developed a custom R annotation pipeline which is based on a series of processing steps. For each genome, this included an import of the genome assembly file (gff) and translated protein sequences (faa) into R using packages rtracklayer and Biostrings (65). Next, hmmsearch applied the precomputed HMMs on the imported protein sequences with default parameters. Gram-positive strains were additionally queried with models iolX, iolW, iolU, iolJ, and iolS. All resulting hits were filtered by a stringent E value cutoff of 10^-10. Next, all iol gene hits were assigned to gene clusters, which were identified by a minimum distance of 10 kbp between two consecutive hit genes on the same contig sequence. One representative cluster with the largest number of iol core genes (iolC, iolB, iolD, and iolE) and the largest number of total unique iol genes was selected for the assembly. This annotation procedure was applied to all input genome assemblies by using multicore functions of R package parallel.
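
The hit filtering, cluster assignment, and representative selection described above can be sketched as follows. The original pipeline was written in R, so this Python version is only an illustration under stated assumptions. The 10-kbp criterion is read as follows: a new cluster starts wherever two consecutive hits on the same contig lie more than 10 kbp apart, measured from the end of one gene to the start of the next. The dictionary keys, function names, and example coordinates are hypothetical.

```python
from collections import defaultdict

CORE_GENES = {"iolB", "iolC", "iolD", "iolE"}
E_VALUE_CUTOFF = 1e-10
MAX_GAP_BP = 10_000  # assumed reading of the 10-kbp distance criterion


def cluster_hits(hits, max_gap=MAX_GAP_BP, e_cutoff=E_VALUE_CUTOFF):
    """Group HMM hits into clusters of neighbouring genes.

    hits: list of dicts with keys contig, start, end, gene, evalue.
    """
    by_contig = defaultdict(list)
    for h in hits:
        if h["evalue"] <= e_cutoff:  # drop weak hits first
            by_contig[h["contig"]].append(h)

    clusters = []
    for contig_hits in by_contig.values():
        contig_hits.sort(key=lambda h: h["start"])
        current = [contig_hits[0]]
        for h in contig_hits[1:]:
            if h["start"] - current[-1]["end"] <= max_gap:
                current.append(h)         # close enough: same cluster
            else:
                clusters.append(current)  # gap too large: start a new cluster
                current = [h]
        clusters.append(current)
    return clusters


def pick_representative(clusters):
    """Prefer the most core genes, then the most unique iol genes."""
    if not clusters:
        return None

    def rank(cluster):
        genes = {h["gene"] for h in cluster}
        return (len(genes & CORE_GENES), len(genes))

    return max(clusters, key=rank)


# Invented example: one compact cluster, one distant hit, one weak hit
hits = [
    {"contig": "c1", "start": 1000, "end": 2000, "gene": "iolC", "evalue": 1e-50},
    {"contig": "c1", "start": 2500, "end": 3300, "gene": "iolB", "evalue": 1e-40},
    {"contig": "c1", "start": 3400, "end": 3900, "gene": "iolT1", "evalue": 1e-5},
    {"contig": "c1", "start": 4000, "end": 5900, "gene": "iolD", "evalue": 1e-80},
    {"contig": "c1", "start": 6000, "end": 6900, "gene": "iolE", "evalue": 1e-30},
    {"contig": "c1", "start": 40000, "end": 41000, "gene": "iolG1", "evalue": 1e-60},
    {"contig": "c2", "start": 100, "end": 1600, "gene": "iolA", "evalue": 1e-70},
]

clusters = cluster_hits(hits)
representative = pick_representative(clusters)
print(sorted(h["gene"] for h in representative))
```

In this toy input, the weak iolT1 hit is discarded by the E value filter, the distant iolG1 hit forms its own cluster, and the cluster holding all four core genes is chosen as the representative.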

IolCatGC selection at the species level.

In order to select the most representative gene cluster of a species, we implemented a selection method which ranks the identified IolCatGC according to the following criteria: the most frequent IolCatGC size across all genomes per species, average HMM score, and a minimum number of three iol core genes. The top-ranked genome is considered the candidate IolCatGC representative genome for each species. The entire summary table, including information about all species, candidate IolCatGC size, and frequency of occurrence, is available as Table S2. Gene maps were generated with R package gggenes to display the composition and structure of selected clusters (66).
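
One possible reading of this ranking is sketched below in Python. The text does not specify how the three criteria are combined, so the order used here is an assumption: clusters with fewer than three core genes are excluded, the remaining clusters are ranked by how often their size occurs among all genomes of the species, and ties are broken by the average HMM score. The field names and example values are hypothetical.

```python
from collections import Counter

MIN_CORE_GENES = 3


def select_representative(candidates, min_core=MIN_CORE_GENES):
    """Pick one representative cluster for a species.

    candidates: list of dicts with keys genome, size, mean_score, n_core,
    one entry per genome of the species.
    """
    # How often each cluster size occurs across all genomes of the species
    size_counts = Counter(c["size"] for c in candidates)

    # Only clusters with enough core genes may become the representative
    eligible = [c for c in candidates if c["n_core"] >= min_core]
    if not eligible:
        return None

    # Rank by frequency of the cluster size, then by average HMM score
    return max(eligible, key=lambda c: (size_counts[c["size"]], c["mean_score"]))


# Invented example for a single species with four genomes
candidates = [
    {"genome": "g1", "size": 9, "mean_score": 310.0, "n_core": 4},
    {"genome": "g2", "size": 9, "mean_score": 325.5, "n_core": 4},
    {"genome": "g3", "size": 11, "mean_score": 400.0, "n_core": 4},
    {"genome": "g4", "size": 9, "mean_score": 500.0, "n_core": 2},
]

best = select_representative(candidates)
print(best["genome"])
```

Here the cluster size 9 is the most frequent one, genome g4 is excluded for having only two core genes, and g2 is preferred over g1 because of its higher average score.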

Taxonomic analysis.

We used the taxonomic data supplied by the NCBI taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy). The two central database files, nodes.dmp and names.dmp, were downloaded (on 25 February 2022) and processed by the R package taxonomizr (version 0.5.3; https://github.com/sherrillmix/taxonomizr). These files provide hierarchical relationships between the taxonomic identity of species and strains and the respective taxonomic levels, including genus, family, order, class, and phylum. Mapping functions of the package provided the assignment of genome species IDs to their respective taxonomy. Only genomes with taxonomic assignments on all five levels were retained. Candidate species were not considered. All phylogenetic figures were generated with GraPhlAn (version 1.1.4) (67), requiring the generation of tree and annotation files as described (https://github.com/biobakery/graphlan).

Cluster analysis.

The cluster analysis was done by comparing the core gene order in all IolCatGCs. To implement this comparison, we generated all possible permutations of the four core genes (iolB, iolC, iolD, and iolE [BCDE]) and removed reverse duplicates. As the gene sequence can be interrupted by other noncore genes, we used the LCS algorithm from R package qualV in forward and reverse directions to screen for clusters which are structured in the same order (68).
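
The comparison of core gene orders can be illustrated with a short Python sketch. The study used the LCS function of the R package qualV. Here a plain dynamic-programming longest common subsequence is substituted, and the function names and single-letter gene labels are chosen for illustration only. Removing reverse duplicates from the 24 permutations of BCDE leaves 12 distinct orders.

```python
from itertools import permutations

CORE = "BCDE"  # iolB, iolC, iolD, iolE


def core_orders(core=CORE):
    """All permutations of the core genes, keeping one of each reverse pair."""
    seen, orders = set(), []
    for p in permutations(core):
        s = "".join(p)
        if s[::-1] not in seen:  # skip if the reversed order is already kept
            seen.add(s)
            orders.append(s)
    return orders


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def matching_order(cluster_genes, orders):
    """Return the core order found in the cluster, read forward or in reverse.

    cluster_genes: gene labels in genomic order; noncore genes may be
    interspersed and must use labels other than B, C, D, and E.
    """
    for order in orders:
        forward = lcs_length(cluster_genes, order)
        reverse = lcs_length(cluster_genes[::-1], order)
        if max(forward, reverse) == len(order):
            return order
    return None


orders = core_orders()
print(len(orders))

# Core genes interrupted by noncore genes (R, X, G), encoded on the reverse strand
print(matching_order(["R", "E", "D", "X", "C", "B", "G"], orders))
# A cluster with the core order C, B, D, E and an interspersed noncore gene (A)
print(matching_order(["C", "A", "B", "D", "E"], orders))
```

Because the subsequence match tolerates interspersed noncore genes, clusters that differ only in accessory gene content are assigned to the same core order.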

Data availability.

All data are available in the tables in the supplemental material.