Abstract
Saccharibacteria (formerly TM7) have reduced genomes and a small cell size and appear to have a parasitic lifestyle dependent on a bacterial host. Although there are at least 6 major clades of Saccharibacteria inhabiting the human oral cavity, complete genomes of oral Saccharibacteria were previously limited to the G1 clade. In this study, nanopore sequencing was used to obtain three complete genome sequences from clade G6. Phylogenetic analysis suggested the presence of at least 3 to 5 distinct species within G6, with two discrete taxa represented by the 3 complete genomes. G6 Saccharibacteria were highly divergent from the more-well-studied clade G1 and had the smallest genomes and lowest GC content of all Saccharibacteria. Pangenome analysis showed that although 97% of shared pan-Saccharibacteria core genes and 89% of G1-specific core genes had putative functions, only 50% of the 244 G6-specific core genes had putative functions, highlighting the novelty of this group. Compared to G1, G6 harbored divergent metabolic pathways. G6 genomes lacked an F1Fo ATPase, the pentose phosphate pathway, and several genes involved in nucleotide metabolism, which were all core genes for G1. G6 genomes were also unique compared to those of G1 in that they encoded D-lactate dehydrogenase, adenylate cyclase, limited glycerolipid metabolism, a homolog to a lipoarabinomannan biosynthesis enzyme, and the means to degrade starch. These differences at key metabolic steps suggest a distinct lifestyle and ecological niche for clade G6, possibly with alternative hosts and/or host dependencies, which would have significant ecological, evolutionary, and likely pathogenic implications.
IMPORTANCE Saccharibacteria are ultrasmall parasitic bacteria that are common members of the oral microbiota and have been increasingly linked to disease and inflammation.
However, the lifestyle and impact on human health of Saccharibacteria remain poorly understood, especially for the clades with no complete genomes (G2 to G6) or cultured isolates (G2 and G4 to G6). Obtaining complete genomes is of particular importance for Saccharibacteria, because they lack many of the "essential" core genes used for determining draft genome completeness, and few references exist outside clade G1. In this study, complete genomes of 3 G6 strains, representing two candidate species, were obtained and analyzed. The G6 genomes were highly divergent from those of G1 and enigmatic, with 50% of the G6 core genes having no putative functions. The significant difference in encoded functional pathways is suggestive of a distinct lifestyle and ecological niche, probably with alternative hosts and/or host dependencies, which would have major implications in ecology, evolution, and pathogenesis.
Keywords: Saccharibacteria; TM7; genomics; nanopore; nanopore sequencing; oral microbiome
Year: 2021 PMID: 34378983 PMCID: PMC8386444 DOI: 10.1128/mSphere.00530-21
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
Saccharibacteria genomes improved using nanopore sequencing in this study
| New MAG | Previous MAG name | Clade | Previous no. of contigs | Previous size (bp) | Updated no. of contigs | Updated size (bp) | Updated longest contig (bp) | Complete | Near complete (longest contig >700,000 bp or <5 contigs) |
|---|---|---|---|---|---|---|---|---|---|
| JB001 | Candidatus_Nanogingivalaceae_FGB1_strain_JCVI_27_bin.3 | G6 | 67 | 704,215 | 1 | 663,355 | 663,355 | X | |
| JB002 | Candidatus_Saccharimonas_sp._strain_JCVI_32_bin.49 | G6 | 14 | 620,057 | 1 | 637,739 | 637,739 | X | |
| JB003 | Candidatus_Nanogingivalaceae_FGB1_strain_JCVI_28_bin.11 | G6 | 34 | 719,702 | 1 | 691,584 | 691,584 | X | |
| TM7c-JB | Candidatus_Nanosynbacter_TM7c_strain_JCVI_32_bin.19 | G1 | 7 | 793,808 | 1 | 793,363 | 793,363 | X | |
| None | Candidatus_Nanosynbacter_sp._TM7_MAG_III_A_2_strain_JCVI_32_bin.12 | G1 | 76 | 696,341 | 8 | 837,467 | 808,188 | | X |
| None | Candidatus_Nanosynbacter_GGB2_strain_JCVI_32_bin.57 | G1 | 32 | 1,040,784 | 6 | 1,054,499 | 762,750 | | X |
| G6_32_bin_33_unicycler | Candidatus_Nanogingivalaceae_FGB1_strain_JCVI_32_bin.33 | G6 | 97 | 521,278 | 31 | 594,688 | 77,761 | | |
| None | Candidatus_Nanosynbacteraceae_FGB1_strain_JCVI_32_bin.22 | G1 | 68 | 636,728 | 35 | 913,508 | 182,700 | | |
| None | Candidatus_Nanosynbacteraceae_FGB2_strain_JCVI_32_bin.44 | G1 | 31 | 725,781 | 15 | 819,428 | 300,554 | | |
| G3_32_bin_36_unicycler | Candidatus_Nanosyncoccus_FGB2_strain_JCVI_32_bin.36 | G3 | 32 | 667,180 | 4 | 688,219 | 265,262 | | X |
MAG, metagenome-assembled genome.
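The "near complete" criterion in the table header (longest contig >700,000 bp or <5 contigs) can be written as a simple predicate. The sketch below is illustrative only: the `Assembly` class, the function names, and the three labels are hypothetical, and treating a single-contig assembly as its own category is an assumption of this example, since contig count alone does not establish that a genome is complete. The contig counts and lengths are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """Minimal record of an updated assembly (hypothetical helper type)."""
    name: str
    n_contigs: int
    longest_contig_bp: int


def is_near_complete(a: Assembly,
                     min_longest_bp: int = 700_000,
                     max_contigs: int = 5) -> bool:
    # Table criterion: longest contig > 700,000 bp OR fewer than 5 contigs.
    return a.longest_contig_bp > min_longest_bp or a.n_contigs < max_contigs


def label(a: Assembly) -> str:
    # Assumption of this sketch: single-contig assemblies are reported
    # separately, because the table lists them under "Complete".
    if a.n_contigs == 1:
        return "single contig"
    return "near complete" if is_near_complete(a) else "fragmented"


# Values copied from the "Updated" columns of the table.
assemblies = [
    Assembly("JB001", 1, 663_355),
    Assembly("JCVI_32_bin.12", 8, 808_188),
    Assembly("JCVI_32_bin.57", 6, 762_750),
    Assembly("JCVI_32_bin.33", 31, 77_761),
    Assembly("JCVI_32_bin.36", 4, 265_262),
]

for a in assemblies:
    print(f"{a.name}: {label(a)}")
```

Note that the two conditions are joined by "or": bin.12 and bin.57 qualify through contig length alone, whereas bin.36 qualifies through contig count alone.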
FIG 1. JB001, JB002, and JB003 are clade G6 Saccharibacteria representing two distinct species. (A) Phylogenetic tree of Saccharibacteria annotated with genome data. Phylogenetic analysis of the 123 Saccharibacteria genomes listed in Table S1 in the supplemental material. Firmicutes was used as an outgroup. The bars in the innermost layer represent the number of singleton gene clusters (i.e., genes appearing in only that one genome) in each genome. The bars in the second layer represent the redundancy (likely contamination) within each genome. The bars in the third layer represent the %GC content of each genome. The bars in the fourth layer represent the total length in base pairs of each genome. The fifth layer displays the source/reference for each genome. The sixth layer displays the genomes that are complete. The outermost layer, and the color of the branches of the tree, illustrate which Saccharibacteria clade each genome is part of. Orange stars indicate genomes that were used in the full pangenome analysis (Fig. S2; Table S4). Yellow stars indicate genomes that were used in the pangenome analysis of complete genomes only (Fig. 2; Table S3) as well as the full pangenome analysis (Fig. S2; Table S4). A larger version of this figure, with the name of each genome labeled, is available in Fig. S1. Note that CP025011_1_Candidatus_Saccharibacteria_bacterium_YM_S32_TM7_50_20_chromosome_complete_genome and c_000000000001 (GCA_003516025.1_ASM351602v1_genomic.fa), the only two complete genomes in clades G3 and G5, are from environmental, not oral, samples. The raw data in the annotations of the tree are available in Table S1. A blue star indicates the genome isolated from a mammalian rumen, and red stars indicate genomes that were isolated from environmental sources. All other genomes are from human oral samples. (B) Average nucleotide identity (%ANI) of G6 genomes. Heat map of all-versus-all comparison of %ANI of all 11 G6 genomes.
The tree on the right is a scaled-up version of the G6 portion of the phylogenetic tree in panel A. Full percentage identity, which takes alignment length into account, is available in Table S2. (C) Whole-genome alignment of TM7x versus complete G6 genomes. The tree on the left is based on the whole-genome alignment itself.
FIG 2. Pangenome analysis of complete genomes in Saccharibacteria clade G1 versus clade G6 identifies core genes encoding distinct functional pathways. (A) Pangenomes of complete G1 and G6 genomes. The dendrogram in the center organizes the 2,279 gene clusters identified across the genomes represented by the innermost 7 layers: TM7x, BB001, HB001, PM004, JB003, JB001, and JB002. The data points within these 7 layers indicate the presence of a gene cluster in a given genome. From inside to outside, the next 6 layers indicate known versus unknown COG category, COG function, COG pathway, KEGG class, KEGG module, and KOfam. The next layer indicates single-copy pan-Saccharibacteria core genes. The next 6 layers indicate the combined homogeneity index, functional homogeneity index, geometric homogeneity index, maximum number of paralogs, number of genes in the gene cluster, and the number of contributing genomes. The outermost layer highlights gene clusters that correspond to the pan-Saccharibacteria core genes (found in all 7 genomes), the G1 core genes (found in all G1 genomes and no G6 genomes), and the G6 core genes (found in all G6 genomes and no G1 genomes). The pie chart adjacent to each group of core genes indicates the breakdown of COG categories of the gene clusters in the group. The 7 genome layers are ordered based on the tree of the %ANI comparison, which is displayed with the red and white heat map. The layers underneath the %ANI heat map, from top to bottom, indicate the number of gene clusters, the number of singleton gene clusters, the GC content, and the total length of each genome. The Venn diagrams in the inset show the number of overlapping and nonoverlapping genes between JB001 and JB002 and between JB001 and TM7x. The number in parentheses is the number of genes with unknown functions (UF). (B) KEGG pathways encoded by G1 and G6 core genes.
KEGG metabolic map overlaid with the pathways encoded by the pan-Saccharibacteria core genes (black), G1 core genes (green), and G6 core genes (red), as indicated by the Venn diagram key. Enzymes of interest are labeled with text and arrows. Pathways are indicated by labeled boxes; the cell wall metabolism pathway is marked with a red background to distinguish it, owing to its odd shape and its overlap with the glycolysis pathway space.
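The three core-gene groups defined in the Fig. 2 caption amount to set comparisons on a presence/absence table of gene clusters. The following is a minimal sketch of that logic, not the pipeline used in the study: the function name, the input layout (cluster ID mapped to the set of genomes containing it), and the toy cluster IDs are all hypothetical. The genome names and their clade assignments follow the caption and the table.

```python
from typing import Dict, Iterable, List, Set

G1_GENOMES = {"TM7x", "BB001", "HB001", "PM004"}
G6_GENOMES = {"JB001", "JB002", "JB003"}


def partition_core(presence: Dict[str, Set[str]],
                   g1: Iterable[str] = G1_GENOMES,
                   g6: Iterable[str] = G6_GENOMES) -> Dict[str, List[str]]:
    """Assign gene clusters to pan, G1-only, or G6-only core groups.

    presence maps a gene cluster ID to the set of genomes that carry it
    (a hypothetical input format chosen for this example).
    """
    g1, g6 = set(g1), set(g6)
    everyone = g1 | g6
    groups: Dict[str, List[str]] = {"pan_core": [], "g1_core": [], "g6_core": []}
    for cluster, genomes in presence.items():
        if genomes >= everyone:
            # Found in all 7 genomes.
            groups["pan_core"].append(cluster)
        elif genomes >= g1 and not (genomes & g6):
            # Found in all G1 genomes and no G6 genome.
            groups["g1_core"].append(cluster)
        elif genomes >= g6 and not (genomes & g1):
            # Found in all G6 genomes and no G1 genome.
            groups["g6_core"].append(cluster)
        # Anything else is accessory or singleton and is left unassigned.
    return groups


# Toy input with made-up cluster IDs, for illustration only.
toy = {
    "GC_a": G1_GENOMES | G6_GENOMES,   # present everywhere
    "GC_b": set(G1_GENOMES),           # all G1, no G6
    "GC_c": set(G6_GENOMES),           # all G6, no G1
    "GC_d": {"JB001", "JB002"},        # missing from JB003
    "GC_e": G6_GENOMES | {"TM7x"},     # all G6 plus one G1 genome
}
print(partition_core(toy))
```

Clusters such as `GC_d` and `GC_e` fall into none of the three groups, which matches the caption's strict "all of one clade and none of the other" definition.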