Alejandro Rodríguez-Gijón1, Julia K Nuy1, Maliheh Mehrshad2, Moritz Buck2, Frederik Schulz3, Tanja Woyke3, Sarahi L Garcia1.
Abstract
Our view of genome size in Archaea and Bacteria has remained skewed because the data have been dominated by genomes of microorganisms cultivated under laboratory settings. However, the continuous effort to catalog Earth's microbiomes, propelled in particular by recent extensive work on uncultivated microorganisms, provides an opportunity to revise our perspective on genome size distribution. We present a meta-analysis that includes 26,101 representative genomes from three published genomic databases: metagenome-assembled genomes (MAGs) from GEMs and stratfreshDB, and isolates from GTDB. Aquatic and host-associated microbial genomes present on average the smallest estimated genome sizes (3.1 and 3.0 Mbp, respectively). These are followed by terrestrial microbial genomes (average 3.7 Mbp) and genomes from isolated microorganisms (average 4.3 Mbp). On the one hand, aquatic and host-associated ecosystems present smaller genome sizes in genera of phyla with genome sizes above 3 Mbp. On the other hand, estimated genome size in phyla with genomes under 3 Mbp showed no difference between ecosystems. Moreover, we observed that when using 95% average nucleotide identity (ANI) as an estimator for genetic units, only 3% of MAGs cluster together with genomes from isolated microorganisms. Although there are potential methodological limitations when assembling and binning MAGs, we found that in genome clusters containing both environmental MAGs and isolate genomes, MAGs were estimated to be on average only 3.7% smaller than isolate genomes. Even when assembly and binning methods introduce biases, the estimated genome sizes of MAGs and isolates are very similar. Finally, to better understand the ecological drivers of genome size, we discuss the known and the overlooked factors that influence genome size in different ecosystems, phylogenetic groups, and trophic strategies.
Keywords: archaea; bacteria; genome size; genomics; microbial ecology
Year: 2022 PMID: 35069467 PMCID: PMC8767057 DOI: 10.3389/fmicb.2021.761869
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Overview of the genome size distribution across Earth’s microbiomes. Genome size distributions of Archaea and Bacteria from different environmental sources (A) and across different archaeal and bacterial phyla (B) are shown for a total of 26,101 representative genomes. Isolate genomes were gathered from GTDB (release 95) and environmental MAGs were gathered from GEMs (Nayfach et al., 2020) and stratfreshDB (Buck et al., 2021a). In the plots, we use one representative genome per mOTU (defined by 95% ANI) from the union of the GEMs catalog and stratfreshDB. From the GTDB database, we selected one representative isolate genome per species cluster, circumscribed based on the ANI (≥95%) and alignment fraction [(AF) > 65%] between genomes (Parks et al., 2020). To construct the figures, we plotted the min-max estimated genome sizes, which were calculated from the genome assembly size and the completeness estimate provided. Venn diagram of the intersection between the representative environmental MAGs and the representative isolate genomes (C). The intersection was calculated using FastANI (Jain et al., 2018) with a threshold of 95%. The coding density (D) and GC content (%) (E) are shown for the archaeal and bacterial MAGs across different ecosystem categories and for isolates. Pairwise t-tests were performed on all variables in (D,E) and the results are shown in (F), where white is significant (p < 0.05) and black is not significant (p > 0.05). In (B), we only included phyla with more than five genomes.
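The caption states that estimated genome sizes were calculated from assembly size and completeness but does not give the formula. A minimal sketch of one common approach, assuming the estimate is simply the assembly size divided by the completeness fraction, could look like the following. The function name `estimate_genome_size` and the example numbers are hypothetical, and how the min-max range was derived is not specified in the caption, so it is not modeled here.

```python
def estimate_genome_size(assembly_size_bp: float, completeness_pct: float) -> float:
    """Scale an assembly size up to an estimated full genome size.

    ASSUMPTION: estimated size = assembly size / (completeness / 100).
    The source only says the estimate is based on assembly size and
    completeness; this exact formula is an illustrative guess.
    """
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness_pct must be in the interval (0, 100]")
    return assembly_size_bp / (completeness_pct / 100.0)


# Hypothetical MAG: 2.25 Mbp assembled at 75% estimated completeness.
print(estimate_genome_size(2_250_000, 75.0))  # 3000000.0
```

Under this assumption a fully complete genome is left unchanged, and less complete assemblies are scaled up in proportion to the missing fraction.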
FIGURE 2. Phylogenetic trees of archaeal (A) and bacterial (B) representative genomes show variation in genome size between and within phyla. The trees were constructed with the de novo workflow of GTDB-tk (v 1.5.0), using aligned concatenated sets of 122 and 120 single-copy marker proteins for Archaea and Bacteria, respectively (Chaumeil et al., 2020). Moreover, in this mode, GTDB-tk adds 1,672 and 30,238 backbone genomes for Archaea and Bacteria, respectively. The trees are visualized in anvi’o (Eren et al., 2021). Estimated genome size is presented on a scale from 0 to 6 Mbp for Archaea and from 0 to 14 Mbp for Bacteria. In the trees, the origin of the environmental genomes is labeled: aquatic, terrestrial, and host-associated (same MAGs as Figure 1). Highlighted phyla with more representative genomes are color-coded. Boxplots show the average estimated genome size per phylum within the archaeal and bacterial domains (C), and the average estimated size per genus within Halobacteriota (D), Thermoproteota (E), Actinobacteriota (F), Bacteroidota (G), Firmicutes A (H), Patescibacteria (I), and Proteobacteria (J). Phyla and genera are colored in gray if they contain MAGs from different ecosystem categories (non-specific ecosystem). The average estimated size per genus is also shown for genera extracted from aquatic ecosystems (K), host-associated ecosystems (L), terrestrial ecosystems (M), or non-specific ecosystems (N). Letters in boxplot panels are the result of non-parametric tests, Wilcoxon (C) or Kruskal-Wallis (D–N). Different letters indicate significant differences (p < 0.05); all statistical tests involving multiple testing were corrected with the Benjamini-Hochberg procedure.
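For readers unfamiliar with the Benjamini-Hochberg correction named in the caption, the standard step-up adjustment can be sketched in plain Python as below. This is an illustration of the general procedure only: the caption does not say which software the authors used, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value),
    then a running minimum is taken from the largest rank downward so
    that adjusted values never decrease as the raw p-values increase.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(significant)
```

A comparison would then be reported as significant when its adjusted value falls below the 0.05 threshold used in the figure.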