Daniel R Garza, F A Bastiaan von Meijenfeldt, Bram van Dijk, Annemarie Boleij, Martijn A Huynen, Bas E Dutilh.
Abstract
BACKGROUND: Microbial pan-genomes are shaped by a complex combination of stochastic and deterministic forces. Even closely related genomes exhibit extensive variation in their gene content. Understanding what drives this variation requires exploring the interactions of gene products with each other and with the organism's external environment. However, to date, conceptual models of pan-genome dynamics often represent genes as independent units and provide limited information about their mechanistic interactions.
Keywords: Gene frequency distribution; Genome-scale metabolic models; Pan-genome evolution; Prokaryote evolution; Reactomes
Year: 2022 PMID: 35974327 PMCID: PMC9382767 DOI: 10.1186/s12862-022-02052-3
Source DB: PubMed Journal: BMC Ecol Evol ISSN: 2730-7182
Fig. 1 Toy model. A Example of a metabolic reaction. Reactants and products are depicted as circles and reactions as rectangles. Reaction directionality is indicated by the arrows. B Three functional reactomes derived from the toy model, each capable of synthesizing the biomass compounds M10_i, M11_i, and M12_i from the environmental precursors (‘MX_e’ compounds depicted with green circles). C The pan-reactome network aggregates reactions from the different reactomes into a single network. The “_e” and “_i” suffixes of metabolite names denote external and internal metabolites, respectively. D An example of a panEFM. Each reaction in this network is essential, since its removal would impair the synthesis of the biomass components. E Collection of all nine possible panEFMs that can be created from the reactions in this toy pan-reactome in a rich environment. Dark squares denote the presence of reactions. The frequency of reactions across the collection of panEFMs is shown in the last row
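The last row of panel E (reaction frequency across a collection of panEFMs) can be illustrated with a minimal sketch. The presence/absence matrix and the reaction names below are hypothetical placeholders, not the nine panEFMs of the toy model, and the function name is an assumption introduced only for this illustration.

```python
# Hypothetical presence/absence data: each row is one panEFM, each column one
# reaction (1 = reaction present, 0 = absent). Illustrative values only; these
# are NOT the nine panEFMs shown in Fig. 1E.
pan_efms = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
]
reaction_ids = ["R1", "R2", "R3", "R4"]  # hypothetical reaction names


def reaction_frequencies(matrix):
    """Return, for each column, the fraction of panEFMs containing that reaction."""
    n_pan_efms = len(matrix)
    # zip(*matrix) iterates over columns, i.e. over reactions.
    return [sum(column) / n_pan_efms for column in zip(*matrix)]


freqs = dict(zip(reaction_ids, reaction_frequencies(pan_efms)))
```

In this toy input, a reaction present in every row gets a frequency of 1, which corresponds to a fully dark column in a presence matrix such as the one in panel E.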
Fig. 2 Identifying environment-reaction associations. A Residual reaction frequencies predicted from the collection of panEFMs that exist across the 208 environments that support the growth of the toy pan-reactome (Fig. 1C). The residuals are the difference between the average frequency over the collection of panEFMs defined within each environment and the average across all environments. Reactions along the x-axis are sorted by the environment-driven score (EDS; see text for details). B Metabolite-reaction association matrix defined by the pairwise correlation between the metabolites and the reactions with non-zero residuals. C Subnetworks generated from the rows of the metabolite-reaction association matrix (shown in B), showing the positive (+) and negative (−) associations between metabolite usage and reaction frequencies in panEFMs. D, E Elastic net prediction of the metabolite usage that evolved in a simulation of a Moran-like process. The evolved reaction frequencies were used to predict how the resulting strains use metabolites in their environments (y-axis), and the predictions were compared to the usage in the simulated environment (x-axis)
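The residual defined in panel A (per-environment average frequency minus the average across all environments) can be sketched as follows. This is an illustration under stated assumptions: the environment names and frequencies are made up, the cross-environment average is taken as an unweighted mean, and `residual_frequencies` is a hypothetical helper rather than the authors' implementation.

```python
def residual_frequencies(freq_by_env):
    """Subtract the cross-environment mean frequency from each environment's frequencies.

    freq_by_env maps an environment name to a list of reaction frequencies,
    one entry per reaction, in the same order for every environment.
    Assumption: every environment carries equal weight in the overall mean.
    """
    envs = list(freq_by_env)
    n_reactions = len(freq_by_env[envs[0]])
    overall_mean = [
        sum(freq_by_env[env][j] for env in envs) / len(envs)
        for j in range(n_reactions)
    ]
    return {
        env: [freq_by_env[env][j] - overall_mean[j] for j in range(n_reactions)]
        for env in envs
    }


# Hypothetical frequencies for three reactions in two made-up environments.
example = {
    "env_a": [1.0, 0.5, 0.0],
    "env_b": [1.0, 0.0, 0.5],
}
residuals = residual_frequencies(example)
```

A reaction whose frequency is the same in every environment, such as the first one here, has a residual of zero everywhere, whereas reactions favoured in particular environments show non-zero residuals of opposite sign.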
Fig. 3 Evolutionary landscape of possible pan-reactome reaction frequencies and metabolite usage profiles based on sampling panEFMs in 1000 random environments. A UMAP projection of reaction frequencies in the collection of panEFMs sampled from different prokaryotic families (Table S2). Each smaller point represents the reaction frequency distribution calculated from 1000 panEFMs sampled in one random environment. The large dots are the frequencies observed in the natural pan-reactomes. B UMAP projection of the metabolite usage profiles obtained from the same panEFMs projected in A. The large dots are the elastic net (EN) predictions of these profiles, obtained from the reaction frequencies of the natural pan-reactomes. The ENs were trained on the sampled panEFMs (Table S5)
Table 1 Variables used to compare the panEFMs and pan-reactomes of 46 prokaryote families (Fig. 4, Table S6)
| Variable | Description |
|---|---|
| NicheBreadth | Predicted niche breadth from global environmental sequencing datasets (see Methods) |
| diversity(panEFMs) | The diversity between reaction frequencies of panEFMs sampled in different virtual environments (Average squared pairwise Euclidean distance) |
| fluidity(panEFMs) | The average dissimilarity between panEFMs, independent of the random environments in which they were sampled |
| pan(panEFMs) | Total reactions that are included in at least one of the panEFMs sampled in different virtual environments |
| pan(Reactomes) | The number of reactions found in the pan-reactome of a prokaryote family |
| size(panEFMs) | The average size of panEFMs sampled in different virtual environments |
| size(Reactomes) | The average size of the natural reactomes from a prokaryote family |
| core(panEFMs) | The number of reactions that are present in at least 98% of the panEFMs sampled in different virtual environments |
| core(Reactomes) | The number of reactions present in at least 98% of the natural reactomes from a prokaryote family |
| shell(panEFMs) | The number of reactions that are present in 3 to 98% of all the panEFMs sampled in different virtual environments |
| shell(Reactomes) | The number of reactions that are present in 3 to 98% of the natural reactomes from a prokaryote family |
| cloud(panEFMs) | The number of reactions present in up to 3% of the panEFMs sampled in different virtual environments |
| cloud(Reactomes) | The number of reactions present in up to 3% of the natural reactomes from a prokaryote family |
| diversity(Metabs) | The diversity between metabolite usage profiles of panEFMs sampled in different virtual environments (Average squared pairwise Euclidean distance) |
| EnvDReacs | The number of reactions with an environment-driven score (EDS) significantly > 0 (adj. p < 0.05 on a Z-test) |
| EnvDMetabs | The number of metabolites that are significantly associated with reactions with a non-zero EDS |
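
Two of the definitions in the table lend themselves to a short sketch: the core/shell/cloud counts (thresholds at 98% and 3%) and the diversity measure (average squared pairwise Euclidean distance). The code below is a minimal illustration, not the authors' implementation. How the boundary values at exactly 3% and 98% are assigned, the exclusion of reactions with zero frequency, and the averaging over unordered pairs are assumptions made here; the function names are hypothetical.

```python
from itertools import combinations


def partition_counts(frequencies, core_min=0.98, cloud_max=0.03):
    """Count core, shell, and cloud reactions from their frequencies (0 to 1).

    Assumptions: core means frequency >= core_min, cloud means
    0 < frequency <= cloud_max, shell is everything in between, and
    reactions that never occur (frequency 0) are not counted.
    """
    counts = {"core": 0, "shell": 0, "cloud": 0}
    for f in frequencies:
        if f <= 0:
            continue
        if f >= core_min:
            counts["core"] += 1
        elif f <= cloud_max:
            counts["cloud"] += 1
        else:
            counts["shell"] += 1
    return counts


def diversity(profiles):
    """Average squared Euclidean distance over all unordered pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


# Hypothetical inputs for illustration only.
counts = partition_counts([1.0, 0.99, 0.5, 0.02, 0.0])
d = diversity([[0, 0], [3, 4]])
```

The same `diversity` function would apply to reaction frequency profiles, as in diversity(panEFMs), and to metabolite usage profiles, as in diversity(Metabs), since both are described with the same distance.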
Fig. 4 Correlation of the variables measured from panEFMs with reactomes and metagenomes across 46 prokaryotic families. Only significant values are shown. A description of the variables is available in Table 1. Detailed Pearson correlation values and adjusted p-values are available in Table S6