Francisco Zorrilla, Filip Buric, Kiran R Patil, Aleksej Zelezniak.
Abstract
Metagenomic analyses of microbial communities have revealed a large degree of interspecies and intraspecies genetic diversity through the reconstruction of metagenome-assembled genomes (MAGs). Yet metabolic modeling efforts mainly rely on reference genomes as the starting point for reconstruction and simulation of genome-scale metabolic models (GEMs), neglecting the immense intra- and interspecies diversity present in microbial communities. Here, we present metaGEM (https://github.com/franciscozorrilla/metaGEM), an end-to-end pipeline enabling metabolic modeling of multi-species communities directly from metagenomes. The pipeline automates all steps from the extraction of context-specific prokaryotic GEMs from MAGs to community-level flux balance analysis (FBA) simulations. To demonstrate the capabilities of metaGEM, we analyzed 483 samples spanning lab culture, human gut, plant-associated, soil, and ocean metagenomes, reconstructing over 14,000 GEMs. We show that GEMs reconstructed from metagenomes have fully represented metabolism comparable to isolated genomes. We demonstrate that metagenomic GEMs capture intraspecies metabolic diversity and identify potential differences in the progression of type 2 diabetes at the level of gut bacterial metabolic exchanges. Overall, metaGEM enables FBA-ready metabolic model reconstruction directly from metagenomes, provides a resource of metabolic models, and showcases community-level modeling of microbiomes associated with disease conditions, allowing generation of mechanistic hypotheses.
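For readers unfamiliar with FBA, the textbook formulation is a linear program over reaction fluxes. The abstract does not define any notation, so the symbols here are assumptions: S is the stoichiometric matrix of a GEM, v the flux vector, c the objective weights (typically selecting a biomass reaction), and v^min, v^max the flux bounds set by the medium and by reaction reversibility.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{s.t.} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The steady-state constraint S v = 0 is what makes a reconstructed model "FBA-ready": every internal metabolite must be produced and consumed at equal rates.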
Year: 2021 PMID: 34614189 PMCID: PMC8643649 DOI: 10.1093/nar/gkab815
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of the metaGEM pipeline workflow highlighting tools, inputs, and outputs. Short reads are quality filtered and adapter trimmed using fastp (27). Quality-controlled reads are assembled individually using MEGAHIT (28). Using either kallisto (31) or bwa (29), quality-controlled reads are mapped to each assembly to obtain contig coverage information across samples. Coverage information and assemblies are used by CONCOCT (8), MetaBAT2 (10) and MaxBin2 (9) to generate three bin sets for each sample. The metaWRAP (32) bin_refinement module is used to dereplicate bin sets for each sample and find the highest quality version of each bin. The metaWRAP bin_reassemble module is used to extract quality-controlled short reads mapping from the focal sample to the bin, which are used to generate two single-genome assemblies using strict and permissive parameters. The original and reassembled versions are compared for quality, and the best version is kept. Refined and reassembled MAGs are used to generate GEMs using CarveMe (33). These models can be quality checked using MEMOTE (34). Community simulations are carried out for each sample using SMETANA (17). Other features include: taxonomic classification using mOTUs2 (36) and/or GTDB-Tk (35), a custom mapping-based abundance estimation module that does not make use of marker-gene or reference-genome based approaches, growth rate estimation for high-coverage MAGs using GRiD (37), and pangenome analysis using Prokka (38) and Roary (39). EukRep (40) can be used to scan for eukaryotic contigs in the CONCOCT bin sets, which can then be processed by EukCC (41) to provide completeness, contamination, and taxonomic assignments for eukaryotic MAGs.
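The reassembly step described above, where the original, strict, and permissive versions of a bin are compared and the best one is kept, can be sketched as follows. This is a minimal illustration only: the `BinVersion` type, the example numbers, and the scoring rule (completeness minus five times contamination) are assumptions made for the sketch, not the rule metaWRAP actually applies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinVersion:
    """One candidate version of a bin (hypothetical structure)."""
    label: str            # "original", "strict", or "permissive"
    completeness: float   # percent
    contamination: float  # percent


def quality_score(version: BinVersion) -> float:
    # Assumed illustrative score: reward completeness and
    # penalise contamination five times as heavily.
    return version.completeness - 5.0 * version.contamination


def pick_best(versions: Sequence[BinVersion]) -> BinVersion:
    # max() returns the first maximal element, so listing the
    # original bin first keeps it when scores are tied.
    return max(versions, key=quality_score)


# Made-up quality values for the three versions of one bin.
candidates = [
    BinVersion("original", 91.2, 2.1),
    BinVersion("strict", 93.0, 1.4),
    BinVersion("permissive", 95.5, 3.0),
]
best = pick_best(candidates)
print(best.label)
```

With these made-up values the permissive assembly is the most complete but loses to the strict one once contamination is taken into account.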
List of tools used by metaGEM
| Tool | Task |
|---|---|
| Snakemake v5.10.0 | Workflow management |
| fastp v0.20.0 | Short read QC filtering and adapter removal |
| MEGAHIT v1.2.9 | Short read assembly |
| bwa v0.7.17 | Contig coverage |
| SAMtools v1.9 | Contig coverage |
| kallisto v0.46.1 | Contig coverage |
| CONCOCT v1.1.0 | Contig binning |
| MetaBAT2 v2.12.1 | Contig binning |
| MaxBin2 v2.2.5 | Contig binning |
| metaWRAP v1.2.3 | Bin refinement and reassembly |
| CarveMe v1.2.2 | GEM reconstruction |
| SMETANA v1.2.0 | Community GEM simulation |
| MEMOTE v0.9.13 | GEM quality report |
| GTDB-Tk v1.1.0 | MAG taxonomy assignment |
| mOTUs2 v2.5.1 | MAG taxonomy assignment |
| GRiD v1.3 | MAG growth rate estimation |
| Prokka v1.14.6 | MAG functional annotation |
| Roary v3.13.0 | Pangenome analysis |
| EukRep v0.6.6 | Identify eukaryotic MAGs |
| EukCC v0.1.4.3 | Eukaryotic MAG taxonomy and quality |
Figure 2. Abundance, quality, and diversity comparisons of reconstructions. (A) Abundance estimates generated by metaGEM using a mapping-based approach compared to the marker-gene-based approach of mOTUs2 in the small lab culture communities dataset. (B) Distribution of genes, reactions, and metabolites in genome-scale metabolic models across AGORA (15), BiGG (54), EMBL GEMs (33), KBase (49), medium quality (MQ) metaGEM and high quality (HQ) metaGEM sets. (C) Distribution of metabolic distances between a set of 200 randomly chosen reference EMBL GEMs compared to a randomly chosen set of 800 EMBL GEMs and 800 randomly chosen metaGEMs from the gut microbiome dataset. (D) Cumulative core and pan genome curves for the top 10 most commonly reconstructed gut microbiome species based on EC numbers present in the reconstructed genome-scale metabolic models. (E) Comparison of EC numbers between 165 species reconstructed from the gut microbiome dataset and also found in the AGORA collection. The inset Venn diagram shows the average number of EC numbers unique to each of the compared sets as well as their average intersection.
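Panels C and D rest on two set-based computations that are easy to sketch: a distance between two models and cumulative core/pan curves over a series of models. The caption does not state which distance metric was used, so the Jaccard distance over EC-number sets below is an assumption, and the three toy models and both function names are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def core_pan_curves(models):
    """Return cumulative (core_sizes, pan_sizes) as models are added in order.

    The core is the intersection of all EC sets seen so far,
    the pan set is their union.
    """
    core_sizes, pan_sizes = [], []
    core, pan = None, set()
    for ecs in models:
        core = set(ecs) if core is None else core & ecs
        pan |= ecs
        core_sizes.append(len(core))
        pan_sizes.append(len(pan))
    return core_sizes, pan_sizes


# Toy EC-number sets standing in for three GEMs of one species.
gems = [
    {"1.1.1.1", "2.7.1.1", "4.1.2.13"},
    {"1.1.1.1", "2.7.1.1", "5.3.1.9"},
    {"1.1.1.1", "2.7.1.2", "5.3.1.9"},
]
dist = jaccard_distance(gems[0], gems[1])
core_sizes, pan_sizes = core_pan_curves(gems)
print(dist, core_sizes, pan_sizes)
```

As more models of a species are added, the core curve can only shrink or stay flat while the pan curve can only grow, which is the shape the cumulative curves in panel D describe.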
Figure 3. SMETANA simulations uncover differences in metabolism across conditions. (A) Alluvial diagram showing the top 10 compounds exchanged with statistical significance across conditions between eight species, representing 279 interactions (NGT n = 61, IGT n = 58, T2D n = 160) across 41 samples (NGT n = 12, IGT n = 12, T2D n = 17). Line thickness is proportional to the magnitude of the SMETANA score. (B) Radar plot of average SMETANA scores based on 543 interactions (NGT n = 50, IGT n = 161, T2D n = 249) across 18 samples (NGT n = 4, IGT n = 6, T2D n = 8) grouped by metabolite class across conditions for receiver Faecalibacterium prausnitzii C. (C) Network diagrams of interactions involving Faecalibacterium prausnitzii C (centered in each subgraph) as a receiver across conditions. Line thickness is proportional to the magnitude of the SMETANA score.
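The aggregation behind panel B, averaging interaction scores per condition and metabolite class for a single receiver species, might look like the sketch below. The row layout, the class labels, the second species name, and every score are invented for illustration; they are not SMETANA's output format or data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (condition, receiver, metabolite_class, score).
interactions = [
    ("NGT", "Faecalibacterium prausnitzii C", "amino acids", 0.25),
    ("NGT", "Faecalibacterium prausnitzii C", "amino acids", 0.75),
    ("T2D", "Faecalibacterium prausnitzii C", "amino acids", 0.50),
    ("T2D", "Faecalibacterium prausnitzii C", "organic acids", 0.125),
    ("T2D", "Some other species", "amino acids", 1.0),
]


def mean_scores(rows, receiver):
    """Average score per (condition, metabolite_class) for one receiver."""
    grouped = defaultdict(list)
    for condition, rec, met_class, score in rows:
        if rec == receiver:
            grouped[(condition, met_class)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


averages = mean_scores(interactions, "Faecalibacterium prausnitzii C")
for (condition, met_class), value in sorted(averages.items()):
    print(condition, met_class, value)
```

Each (condition, metabolite class) average corresponds to one point on a radar plot of this kind, with one polygon per condition.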