Literature DB >> 35604118

Phylogenomic Analyses of Snodgrassella Isolates from Honeybees and Bumblebees Reveal Taxonomic and Functional Diversity.

Luc Cornet1, Ilse Cleenwerck2, Jessy Praet2, Raphaël R Leonard3, Nicolas J Vereecken4, Denis Michez5, Guy Smagghe6, Denis Baurain3, Peter Vandamme2.   

Abstract

Snodgrassella is a genus of Betaproteobacteria that lives in the gut of honeybees (Apis spp.) and bumblebees (Bombus spp). It is part of a conserved microbiome that is composed of a few core phylotypes and is essential for bee health and metabolism. Phylogenomic analyses using whole-genome sequences of 75 Snodgrassella strains from 4 species of honeybees and 14 species of bumblebees showed that these strains formed a monophyletic lineage within the Neisseriaceae family, that Snodgrassella isolates from Asian honeybees diverged early from the other species in their evolution, that isolates from honeybees and bumblebees were well separated, and that this genus consists of at least seven species. We propose to formally name two new Snodgrassella species that were isolated from bumblebees: i.e., Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov. Possible evolutionary scenarios for 107 species- or group-specific genes revealed very limited evidence for horizontal gene transfer. Functional analyses revealed the importance of small proteins, defense mechanisms, amino acid transport and metabolism, inorganic ion transport and metabolism and carbohydrate transport and metabolism among these 107 specific genes. IMPORTANCE The microbiome of honeybees (Apis spp.) and bumblebees (Bombus spp.) is highly conserved and represented by few phylotypes. This simplicity in taxon composition makes the bee's microbiome an emergent model organism for the study of gut microbial communities. Since the description of the Snodgrassella genus, which was isolated from the gut of honeybees and bumblebees in 2013, a single species (i.e., Snodgrassella alvi), has been named. Here, we demonstrate that this genus is actually composed of at least seven species, two of which (Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov.) are formally described and named in the present publication. We also report the presence of 107 genes specific to Snodgrassella species, showing notably the importance of small proteins and defense mechanisms in this genus.

Entities:  

Keywords:  Snodgrassella; bumblebee; functional analysis; honeybee; metabolic modeling; metagenomics; microbiome; phylogenomics; species delineation

Year:  2022        PMID: 35604118      PMCID: PMC9239279          DOI: 10.1128/msystems.01500-21

Source DB:  PubMed          Journal:  mSystems        ISSN: 2379-5077            Impact factor:   7.324


INTRODUCTION

Honeybees (Apis spp.) and bumblebees (Bombus spp.) harbor a gut microbiome that is important in health and metabolism (1–3). This microbiome is highly conserved, with 95% of the gut microbionts falling within a few phylotypes that include Actinobacteria (Bifidobacterium, Bombiscardovia), Bacteroidetes (Apibacter), Firmicutes (Lactobacillus [the so-called Firm-5 or Lacto-1 taxon], Bombilactobacillus [Firm-4, Lacto-2], and Apilactobacillus [Lacto-3]), Alphaproteobacteria (Bartonella, Bombella, Commensalibacter), Betaproteobacteria (Snodgrassella), and Gammaproteobacteria (Frischella, Gilliamella) (2, 4–18). The genera Bifidobacterium, Lactobacillus, Bombilactobacillus, Gilliamella, and Snodgrassella are generally considered the core microbionts of honeybees and bumblebees (10, 19, 20). These gut-related organisms coevolved within their hosts during the last 80 million years (9, 10, 21) and contribute to carbohydrate digestion (2, 18, 22) and pathogen defense (23–26). The importance of bees for ecosystem integrity, the contribution of the bee gut microbiota to its hosts’ health, and the relative simplicity of the taxonomic composition of the bee gut microbiota, along with their mode of transmission, which is mainly vertical, have made bees an emerging model organism for the study of gut-related microbial communities (1, 18, 27–29). Snodgrassella and Gilliamella are physically closely associated within the hindgut: the former grows in contact with the ileum epithelium, while the latter forms a dense biofilm on top of the Snodgrassella layer and is in contact with the gut lumen (9, 11, 30). As a result of intricate coevolution of these two bacteria, horizontal gene transfers (HGTs) have been reported between these two taxa (2, 31). Among others, they share Rhs toxin proteins, which are involved in type VI secretion system (T6SS)-mediated competition (2, 32–34). The formal description and naming of Snodgrassella alvi were based on three strains: one from a honeybee gut sample (wkB2T from Apis mellifera) and two from bumblebee gut samples (wkB12 from Bombus bimaculatus and wkB29 from Bombus vagans) (21). All three strains shared more than 99.1% of their small subunit (SSU [16S]) rRNA gene sequences, which allocated them within the Neisseriaceae family. Many additional strains have been reported since, and some phylogenetic analyses revealed a clear separation between Snodgrassella isolates of Apis spp. and those of Bombus spp. (1, 32–35), whereas others reported that the two groups were not monophyletic (10, 22). In the present study, we used core gene phylogenomics to demonstrate the separation of Snodgrassella isolates into three clades: one colonizing Bombus spp. and two colonizing Apis spp. We report an early divergence of Snodgrassella strains from Asian honeybees, which implies that Apis-colonizing strains are paraphyletic. We used average nucleotide identity (ANI) analyses to demonstrate that the genus Snodgrassella consists of at least seven species. We further analyzed differential gene contents within these species and examined evolutionary scenarios for the emergence of 107 specific genes. We finally used our own Snodgrassella isolates to describe and formally name two of these novel species from bumblebees as Snodgrassella gandavensis, with LMG 30236 (=CECT 30450) as the type strain, and Snodgrassella communis, with LMG 28360 (=CECT 30451) as the type strain.

RESULTS AND DISCUSSION

Species delimitation within the Snodgrassella genus.

We studied the diversity within the genus Snodgrassella using extensive phylogenomic and average nucleotide identity (ANI) analyses (Fig. 1). Highly conserved genes were selected to perform the phylogenomic analyses. Multiple events of HGT affecting Snodgrassella genomes have been reported (2, 31, 32), and such events may be damaging for inferring species phylogenies (36–38). To minimize interference of HGT, we used shared Neisseriaceae core genes only, by incorporating 35 non-Snodgrassella bacteria from this family into our data set. We further imposed a strict unicopy presence onto core genes in order to avoid artifacts linked to paralogous sequences, which also have been reported to be deleterious to phylogenomic studies (39, 40). Finally, we enforced high geometric and functional indices (0.8 for both), resulting in the selection of genes with less than 20% gap insertions and 20% substitutions, respectively. Our phylogenomic data set was thus composed of 254 genes that were structurally and functionally conserved within the family Neisseriaceae. Our analyses of the Neisseriaceae phylogeny confirmed with 100% bootstrap support that the genus Snodgrassella is monophyletic within the family Neisseriaceae (see Fig. S1 in the supplemental material) (21, 41). Its nearest-neighbor taxon was Populibacter corticis (40). Prior to the description of P. corticis in 2017 (41), the lineage formed by Stenoxybacter acetivorans and Snodgrassella spp. was suggested to represent a gut-specific clade within the Betaproteobacteria, based on the observation that also S. acetivorans was isolated from an insect (i.e., termite gut sample) (21). The description of P. corticis, an organism isolated from bark tissue of poplar canker, appeared to disrupt this image, yet, more should be known about the latter organism before the hypothesis of a gut-specific clade within the Betaproteobacteria is abandoned. In the rest of the present study, we used P. corticis CFCC 13594T as an outgroup for studying Snodgrassella phylogeny.
FIG 1

Graphical abstract. We used 9 newly sequenced Snodgrassella genomes and 67 (including the outgroup) public genome assemblies to reconstruct the phylogeny of the genus Snodgrassella with 254 core genes, all functionally and structurally conserved in the Neisseriaceae family. We used independent phylogenetic methods (bootstrapping, leave-one-out analysis, and taxon sampling) to assess the robustness of our phylogenomic trees. Combined with an average nucleotide identity analysis, our results indicated new species delimitations within this genus. We further investigated specific gene contents within the species groups and their functional importance.

Graphical abstract. We used 9 newly sequenced Snodgrassella genomes and 67 (including the outgroup) public genome assemblies to reconstruct the phylogeny of the genus Snodgrassella with 254 core genes, all functionally and structurally conserved in the Neisseriaceae family. We used independent phylogenetic methods (bootstrapping, leave-one-out analysis, and taxon sampling) to assess the robustness of our phylogenomic trees. Combined with an average nucleotide identity analysis, our results indicated new species delimitations within this genus. We further investigated specific gene contents within the species groups and their functional importance. Neisseriaceae phylogeny. The maximum likelihood tree was inferred from 254 core genes under the PROTGAMMALGF model with RAxML from a supermatrix of 111 organisms by 86,654 unambiguously aligned amino acid positions. Bootstrap support values are shown at the nodes. Download FIG S1, PDF file, 0.02 MB. Our amino acid- and nucleotide-based phylogenomic and leave-one-out analyses all revealed a clear separation between Snodgrassella isolates from A. mellifera and Bombus spp., with full support values in bootstrap and leave-one-out analyses (Fig. S2 and S3). The 75 Snodgrassella strains formed three clades. Those isolated from Asian honeybees, referred to as Apis group1, formed a monophyletic group separated from the two other clades with full support in both bootstrap and leave-one-out analyses (Fig. S2 and S3). The same three clades were reported by Powell et al. (42) using minD as a single phylogenetic marker gene. Yet, the latter study did not use an outgroup to root the obtained phylogeny. Using 254 core genes and P. corticis CFCC 13594T as an outgroup, we demonstrated that Apis group1 is the first diverging group of the Snodgrassella phylogeny (Fig. S2 and S3). The separation into three lineages was confirmed by removing P. corticis CFCC 13594T from the phylogenomic analyses in order to check the presence of a long-branch attraction artifact (Fig. S4). The Snodgrassella phylogeny showed shorter branch lengths in Snodgrassella isolates from Apis mellifera (here referred to as Apis group2) than those in isolates from Bombus spp. (Fig. 2). The latter represented five lineages (Fig. 2 and Table 1). The first, Bombus group1, was composed of four isolates from Bombus pensylvanicus (all collected in the United States) and was fully supported by the bootstrap and leave-one-out analyses (Fig. S2 and S3). The second, Bombus group2, consisted of a single isolate from Bombus nevadensis (United States). The three remaining groups were fully supported in the bootstrap and leave-one-out analyses as well and were referred to as Bombus group3, which was composed of three isolates from Bombus appositus (United States), Bombus group4, which was composed of two isolates from Bombus pascuorum and Bombus lapidarius (Belgium), and Bombus group5, which comprised 17 isolates collected from 11 Bombus species in Belgium and the United States (Fig. 2 and Table 1) and which included the isolates wkB12 and wkB29, reported by Kwong and Moran (21). This topology was confirmed with sparser taxon sampling, after removing 40 strains with dRep (43) and computing another large phylogenomic analysis (Fig. 3). The branching patterns of the seven groups were the same in all amino acid- and nucleotide-based trees inferred, except for Bombus group2. Bombus group2 appeared as a sister group of Bombus group3 and Bombus group4 in all analyses (with a bootstrap of <70), except in the nucleotide-based phylogenomic analysis, in which Bombus group2 branched with Bombus group1 (Fig. S3).
FIG 2

Snodgrassella phylogeny and ANI comparison. A maximum likelihood tree was inferred from 254 core gene sequences under the PROTGAMMALGF model with RAxML (71) from a supermatrix of 76 organisms by 86,654 unambiguously aligned amino acid positions. Only bootstrap values below 100 are shown at the nodes. (A) Host. Host species were taken from biosample metadata in the NCBI portal, except for the 9 newly sequenced genomes, for which the metadata were taken from reference 15. (B) ANI heat map. ANI values were computed with fastANI (74). A triangular matrix was constructed according to pairwise distances. Colors associated with ANI values are given in the fastANI key. (C) GTDB hits. The blue squares on the right were determined using GTDBtk on the genomes of the associated strains (77).

TABLE 1

Details of the Snodgrassella strains and public assemblies

Genome accession no.Strain no.bSpeciesGroupIsolation sourceGeographic originCompleteness (%)Contamination (%)LengthNo. of contigs (nt) N 50
GCF_002777775.1wkB237*Snodgrassella sp. nov.Apis group1 A. andreniformis Singapore99.570.852,321,01221327,688
GCF_002777855.1wkB298*Snodgrassella sp. nov.Apis group1 A. cerana Singapore99.570.852,338,27142300,195
GCF_002777655.1wkB273*Snodgrassella sp. nov.Apis group1 A. florea Singapore99.570.852,317,51131335,346
GCF_013344955.1ESL0324 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,599,15121554,030
GCF_013344995.1ESL0304 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,427,36223239,422
GCF_013346865.1ESL0323 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,665,38737227,440
GCF_016100435.1W8158 Snodgrassella alvi Apis group2 A. mellifera USA99.571.502,572,92722817,650
GCF_016100485.1W6238H14 Snodgrassella alvi Apis group2 A. mellifera USA99.571.072,493,31533676,521
GCF_016100525.1W6238H11 Snodgrassella alvi Apis group2 A. mellifera USA99.571.072,488,14929680,931
GCF_016100575.1M0351 Snodgrassella alvi Apis group2 A. mellifera USA99.571.072,445,778241,728,374
GCF_016100865.1W8132 Snodgrassella alvi Apis group2 A. mellifera USA99.571.072,402,21217168,5768
GCF_016100875.1W8124* Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,374,073131,777,221
GCF_016101225.1W8135 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,403,944181,368,577
GCF_016101245.1W8134 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,404,985181,368,577
GCF_016101465.1M0110 Snodgrassella alvi Apis group2 A. mellifera USA99.570.642,542,13649358,478
GCF_016101495.1M0112 Snodgrassella alvi Apis group2 A. mellifera USA99.570.642,542,58347358,478
GCF_016101535.1M0118 Snodgrassella alvi Apis group2 A. mellifera USA99.570.642,544,96753358,478
GCF_000600005.1wkB2* Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,527,97812,527,978
GCF_002088655.1N-23 Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,421,69312838,667
GCF_002088575.1N9* Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,403,33512942,232
GCF_002088755.1N-S1 Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,420,8739850,074
GCF_002088595.1N-S2 Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,421,2297361,427
GCF_002088635.1N-S3* Snodgrassella alvi Apis group2 A. mellifera Norway99.570.432,463,5187970,149
GCF_002088665.1N-S4* Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,421,48638133,936
GCF_002088765.1N-S5 Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,417,6157756,721
GCF_002088525.1N-W4 Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,421,2517555,544
GCF_002088555.1N-W7* Snodgrassella alvi Apis group2 A. mellifera Norway99.570.852,423,3186270,113
GCF_002777865.1PEB0171* Snodgrassella alvi Apis group2 A. mellifera USA99.571.282,520,62277117,138
GCF_002777875.1PEB0178 Snodgrassella alvi Apis group2 A. mellifera USA99.151.372,524,93213543,076
GCF_002777735.1wkB9 Snodgrassella alvi Apis group2 A. mellifera USA99.570.882,548,53515663,898
GCF_002088405.1A-10-12 Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,501,6556365,004
GCF_002089015.1A11 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,430,26512242,222
GCF_002088695.1A-11-12 Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,500,9859053,637
GCF_002088735.1A-1-12* Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,502,2865891,493
GCF_002088675.1A12* Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,399,91921419,825
GCF_002088395.1A2* Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,425,1868456,386
GCF_002088585.1A-2-12 Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,502,6827564,388
GCF_002088455.1A3 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,428,73110944,108
GCF_002088475.1A5 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,430,37612040,275
GCF_002088465.1A-5-24 Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,490,74317228,499
GCF_002088515.1A-9-24 Snodgrassella alvi Apis group2 A. mellifera USA99.570.432,501,1076283,544
GCF_002088415.1Aw-18 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,497,1118855,786
GCF_002019415.1Aw-20* Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,498,4976575,805
GCF_003202885.1ESL0196* Snodgrassella alvi Apis group2 A. mellifera Switzerland99.570.432,446,304151,281,809
GCF_002777925.1MS1-3 Snodgrassella alvi Apis group2 A. mellifera USA99.570.852,500,6639350,239
GCF_002777695.1wkB332* Snodgrassella alvi Apis group2 A. mellifera Malaysia99.571.282,487,41930502,922
GCF_002777815.1wkB339 Snodgrassella alvi Apis group2 A. mellifera Malaysia99.570.852,499,91727431,105
GCF_002406645.1E1* Snodgrassella alvi Apis group2 A. mellifera USA98.240.852,388,32215647,816
GCF_002777705.1HK3*Snodgrassella sp. nov.Bombus group1 B. pensylvanicus USA99.570.852,609,1767492,612
GCF_002777795.1HK9x*Snodgrassella sp. nov.Bombus group1 B. pensylvanicus USA99.570.902,568,638110120,216
GCF_002777325.1Pens2-2-5Snodgrassella sp. nov.Bombus group1 B. pensylvanicus USA99.570.432,486,143132102,952
GCF_002777745.1WF3-3*Snodgrassella sp. nov.Bombus group1 B. pensylvanicus USA99.570.852,603,50367133,888
GCF_002777825.1Nev4-2*Snodgrassella sp. nov.Bombus group2 B. nevadensis USA91.451.282,582,9669095,658
GCF_002777315.1App2-2*Snodgrassella sp. nov.Bombus group3 B. appositus USA99.150.002,483,89211947,667
GCF_002777425.1App4-8Snodgrassella sp. nov.Bombus group3 B. appositus USA99.570.432,577,72411571,597
GCF_002777465.1App6-4*Snodgrassella sp. nov.Bombus group3 B. appositus USA99.570.852,623,32010897,352
GCA_914768095R-53680*Snodgrassella sp. nov.Bombus group4 B. lapidarius Belgium99.570.432,526,95146155,676
GCA_914768025LMG 30236*Snodgrassella sp. nov.Bombus group4 B. pascuorum Belgium99.570.002,507,15745132,469
GCF_000695565.1wkB12*Snodgrassella sp. nov.Bombus group5 B. bimaculatus USA99.570.432,438,49734337,275
GCF_002777415.1Fer1-2*Snodgrassella sp. nov.Bombus group5 B. fervidus USA99.570.852,442,70866153,029
GCF_002777485.1Fer2-2*Snodgrassella sp. nov.Bombus group5 B. fervidus USA99.570.852,358,79653132,767
GCF_002777335.1Gris2-3-4Snodgrassella sp. nov.Bombus group5 B. griseocollis USA99.570.432,417,52062183,509
GCF_002777525.1Gris1-3Snodgrassella sp. nov.Bombus group5 B. griseocollis USA99.570.482,441,97258128,979
GCF_002777615.1Gris1-6Snodgrassella sp. nov.Bombus group5 B. griseocollis USA99.570.432,459,69050149,104
GCF_002777595.1Gris3-4Snodgrassella sp. nov.Bombus group5 B. griseocollis USA99.570.432,453,20848122,377
GCA_914768055R-53528*Snodgrassella sp. nov.Bombus group5 B. hypnorum Belgium99.570.432,287,01343179,794
GCF_002777345.1Snod2-1-5*Snodgrassella sp. nov.Bombus group5 B. impatiens USA99.570.432,355,70762146,579
GCA_914768015R-54863*Snodgrassella sp. nov.Bombus group5 B. lapidarius Belgium99.570.432,434,13760124,074
GCA_914768085R-54236Snodgrassella sp. nov.Bombus group5 B. lucorum Belgium99.570.432,314,67026184,956
GCA_914768035R-54678*Snodgrassella sp. nov.Bombus group5 B. lucorum Belgium99.570.432,265,68532179,671
GCF_002777635.1Occ4-2*Snodgrassella sp. nov.Bombus group5 B. occidentalis USA99.570.432,487,51954151,012
GCA_914768065R-53633*Snodgrassella sp. nov.Bombus group5 B. pascuorum Belgium99.570.432,402,46442196,721
GCA_914768075R-54841Snodgrassella sp. nov.Bombus group5 B. terrestris Belgium99.570.432,400,98440170,243
GCA_914768045LMG 28360*Snodgrassella sp. nov.Bombus group5 B. terrestris Belgium99.570.432,310,39228184,956
GCF_000695545.1wkB29Snodgrassella sp. nov.Bombus group5 B. vagans USA99.570.432,398,20688184,072
GCF_001590725.1CFCC 13594* Populibacter corticis Outgroup P. canker USA98.720.002,372,91413195,649

Completeness and contamination values/metrics were estimated with CheckM (61). Assembly statistics were computed with QUAST (62).

Asterisks indicate the strains retained by dRep after dereplication.

FIG 3

Snodgrassella phylogeny, after dereplication of highly similar genomes, and metabolic analysis per species group. (A) Maximum likelihood tree inferred on 254 core genes under the PROTGAMMALGF model with RAxML (71) from a supermatrix of 36 organisms by 86,654 unambiguously aligned amino acid positions. Bootstrap support values are shown at the nodes. (B) Functional analyses were performed using COG (68) and Mantis (53). Numbers indicated in the pie charts correspond to absolute numbers of OGs identified in the respective Snodgrassella subgroups. Specific genes were computed for entire groups before dereplication.

Snodgrassella phylogeny and ANI comparison. A maximum likelihood tree was inferred from 254 core gene sequences under the PROTGAMMALGF model with RAxML (71) from a supermatrix of 76 organisms by 86,654 unambiguously aligned amino acid positions. Only bootstrap values below 100 are shown at the nodes. (A) Host. Host species were taken from biosample metadata in the NCBI portal, except for the 9 newly sequenced genomes, for which the metadata were taken from reference 15. (B) ANI heat map. ANI values were computed with fastANI (74). A triangular matrix was constructed according to pairwise distances. Colors associated with ANI values are given in the fastANI key. (C) GTDB hits. The blue squares on the right were determined using GTDBtk on the genomes of the associated strains (77). Snodgrassella phylogeny, after dereplication of highly similar genomes, and metabolic analysis per species group. (A) Maximum likelihood tree inferred on 254 core genes under the PROTGAMMALGF model with RAxML (71) from a supermatrix of 36 organisms by 86,654 unambiguously aligned amino acid positions. Bootstrap support values are shown at the nodes. (B) Functional analyses were performed using COG (68) and Mantis (53). Numbers indicated in the pie charts correspond to absolute numbers of OGs identified in the respective Snodgrassella subgroups. Specific genes were computed for entire groups before dereplication. Details of the Snodgrassella strains and public assemblies Completeness and contamination values/metrics were estimated with CheckM (61). Assembly statistics were computed with QUAST (62). Asterisks indicate the strains retained by dRep after dereplication. Comparison of the Snodgrassella bootstrap protein phylogeny and the leave-one-out protein phylogeny. The maximum likelihood tree (right panel) was inferred on 254 core genes under the PROTGAMMALGF model with RAxML from a supermatrix of 76 organisms by 86,654 unambiguously aligned amino acid positions. One hundred concatenations of 70,000-aa-long subsamples of the core genes were generated, and the corresponding trees were computed with RAxML. Bootstrap and leave-one-out support values are shown at the nodes. The leave-one-out tree (left panel) is a cladogram without informative branch lengths. Download FIG S2, PDF file, 0.05 MB. Comparison of the Snodgrassella bootstrap DNA phylogeny and the leave-one-out DNA phylogeny. (A) The maximum likelihood tree was inferred on 254 core genes under the GTRGAMMA model with RAxML from a supermatrix of 76 organisms by 264,981 aligned nucleotides, using two different partitions for codon positions 1&2 and position 3. One hundred concatenations of 100,000 nucleotides of the core genes were generated, and the corresponding trees were computed with RAxML. Trees were inferred using only codon positions 1&2 (B) and using two different partitions for codon positions 1&2 and position 3 (C). Bootstrap and leave-one-out support values are shown at the nodes. The leave-one-out trees are cladograms without informative branch lengths. Download FIG S3, PDF file, 0.08 MB. Unrooted Snodgrassella phylogeny. The maximum likelihood tree was inferred from 254 core genes under the PROTGAMMALGF model with RAxML from a supermatrix of 75 organisms (without P. corticis) by 86,654 unambiguously aligned amino acid positions, after removal of the outgroup. Bootstrap support values are shown at the nodes. Download FIG S4, PDF file, 0.02 MB. To quantify the taxonomic divergence between the observed phylogenomic groups, we performed ANI analyses. The latter are generally used for species delimitation in bacterial taxonomy (44), with ANI values of about 95 to 96% corresponding to the species delineation threshold (45, 46). The ANI analysis (Fig. 2, part ANI) revealed values within each of the phylogenomic groups (except Bombus group2, for which only a single genome was available) that were consistently above 95% ANI, while ANI values between genome sequences of different groups were consistently below 87%. Therefore, each of the seven phylogenomic groups corresponded to a distinct Snodgrassella species in modern bacterial taxonomy. These seven species also corresponded with the seven Snodgrassella species defined in the GTDB database (https://gtdb.ecogenomic.org/searches?s=al&q=Snodgrassella) (47) (Fig. 2, GTDB part). Additional Snodgrassella species likely exist as gut symbionts in stingless bees and Bombus species from as yet poorly examined continents, as suggested by their SSU rRNA phylogeny (10). The S. alvi strain wkB2T clustered within Apis group2, and by taxonomic convention, the name S. alvi should therefore be restricted to organisms from Apis group2. All other phylogenomic groups detected in the present and in earlier studies represent novel Snodgrassella species. Below we propose to formally name Bombus group4 as Snodgrassella gandavensis, with LMG 30236 (=CECT 30450) as the type strain, and Bombus group5 as Snodgrassella communis, with LMG 28360 (=CECT 30451) as the type strain. The formal naming of the remaining phylogenomic groups (i.e., Apis group1, Bombus group1, Bombus group2, and Bombus group3) can be done upon characterization and public deposit of reference cultures to conform to standard practices in bacterial taxonomy (44). It has been suggested that the transmission mode of Snodgrassella between workers and larvae is mainly vertical within a colony (27, 32, 48). Although for most of the groups (i.e., Snodgrassella species) detected in the present study, only a limited number of isolates or genomes were available, our data showed that indeed several Snodgrassella species occur in more than one Bombus species, and therefore this rejects strict cospeciation. This result is in agreement with data reported by Powel et al. (42), who demonstrated that Bombus group5 Snodgrassella (i.e., S. communis) was detected in several Bombus species, and with the experimental demonstration of host jumps in Bombus species (2). The close phylogenetic relationship between Snodgrassella species associated with bumblebees and Snodgrassella species associated with the Western honeybee additionally supports the hypothesis of horizontal transfer of symbionts between bee clades. The origin of extant bumblebees is estimated to the Miocene, as based on fossil records and molecular phylogeny (49, 50). The origin of this clade is associated with a global cooling of the Palearctic region (51). Ancestors of bumblebees were likely sharing host plants with the Western honeybee ancestor clade, which is also associated with a temperate climate, and not with the Apis cerana honeybee clade, which is associated with a subtropical to tropical warmer climate. Overall, the similarity of microbiota among corbiculate bees seems to be related to both phylogeny and sharing of common habitats/climates.

Specific gene analysis.

Snodgrassella has been coevolving with other bacteria in the gut of honeybees and bumblebees for 80 million years (9, 10, 21). Interestingly the age of the common ancestor of the clade of the hosts (i.e., corbiculate bees, which include both bumblebees and honeybees) is also estimated round ~85 million years (52). Different species and genotypes have evolved and may have developed different functions in their hosts. We investigated the presence of group-specific genes (i.e., genes that are exclusively present in one group [or species] and that are shared by all members of this group) and determined their functionalities. The rationale was that a conserved gene probably confers a benefit to members of that group. We detected 107 specific genes: 19 were specific for Apis group1, 12 for Apis group2 (i.e., S. alvi), 10 for Bombus group1, 12 for Bombus group3, 24 for Bombus group4 (S. gandavensis), and 1 for Bombus group5 (S. communis) (Fig. 3). No specific genes were found for Bombus group2. In a next step, genes shared by multiple groups or species were examined. Only two group combinations presented specific genes: Apis group1 and Apis group2 shared 4 genes that were absent in all others, and all Snodgrassella spp. together shared 25 genes (Fig. 3). Of these 107 genes, 37 did not correspond to proteins known in COG or Mantis (53). The latter is the most recent and comprehensive tool for protein annotation and contains Pfam, eggnog, NPFM, TIGRfams, and Kofam family data for the three domains of life (53). These 37 proteins were relatively short (median, 53 amino acids [aa]; interquartile range [IQR], 46 aa) compared to the rest of the 107 genes (median, 197 aa; IQR, 202.25), with only six proteins above 100 aa (Table 2). Small proteins in bacteria can have roles in transport and signal transduction, can act as chaperones, can be involved in stress responses or virulence (54, 55), and can also be used as bacteriocins (56). Unknown genes are a recurrent issue in metagenomic studies, where they may represent up to 40% of the genes (57). Besides these genes without a match, 24 genes encoded proteins with unknown function but with hits in COG or Mantis, while six genes had a general predicted function only and could not be affiliated with a metabolic pathway. Together, 67 of 107 specific genes (62.6%) and their putative benefits remained unknown (Table 2 and Fig. 3).
TABLE 2

Functional analysis of the specific genes

GroupGene IDbLength (aa)Gene productMapping toolCOG pathwayFunction
Apis group1GC_00003784100  Unknown 
GC_00004036330AfsAMantisDefense mechanisms2-Oxo-3-(phosphooxy)propyl 3-oxoalkanoate synthase
GC_00004009235GlnQCOGAmino acid transport and metabolismABC-type polar amino acid transport system, ATPase component (GlnQ) (PDB no. 4YMS)
GC_0000380686  Unknown 
GC_0000394743  Unknown 
GC_0000395083 MantisDefense mechanismsPermeases of drug metabolite transporter (DMT) superfamily
GC_0000386032  Unknown 
GC_0000392032  Unknown 
GC_00003767110  Unknown 
GC_0000400586  Unknown 
GC_0000402878  Unknown 
GC_0000377433 MantisFunction unknown 
GC_0000376898SoxRCOGTranscriptionDNA-binding transcriptional regulator, MerR family (SoxR) (PDB no. 2VZ4)
GC_0000384653  Unknown 
GC_0000404369  Unknown 
GC_0000401139vWA-MoxRMantisDefense mechanismsvWA-MoxR-associated protein middle region (VMAP-M) 1—sensing of invasive entities (EMBL)
GC_0000388753  Unknown 
GC_0000465672  Unknown 
GC_0000391647  Unknown 
 
Apis group2GC_00001836*230 COGFunction unknownUncharacterized protein, contains DUF2461 domain
GC_00001864244 MantisFunction unknownUncharacterized protein
GC_0000181375YozGCOGTranscriptionDNA-binding transcriptional regulator, XRE family (YozG) (PDB no. 3TYR)
GC_00001812*213PspECOGInorganic ion transport and metabolismRhodanese-related sulfurtransferase (PspE) (PDB no. 1TQ1)
GC_00001818*598LepACOGTranslation, ribosomal structure and biogenesisTranslation elongation factor EF-4, membrane-bound GTPase (LepA) (PDB no. 3DEG)
GC_00001830114 MantisFunction unknown 
GC_00001846394AraJCOGCarbohydrate transport and metabolismPredicted arabinose efflux permease AraJ, MFS family (AraJ) (PDB no. 4LDS)
GC_00001829101MdaBCOGGeneral function prediction onlyPutative NADPH-quinone reductase (modulator of drug activity B) (MdaB) (PDB no. 1D4A)
GC_00001849289 MantisFunction unknown 
GC_00001848355Lys9COGAmino acid transport and metabolismSaccharopine dehydrogenase, NADP-dependent (Lys9) (PDB no. 1E5L)
GC_00001833272FTR1COGInorganic ion transport and metabolismHigh-affinity Fe2+/Pb2+ permease (FTR1)
GC_00001824*180 MantisFunction unknown 
 
Apis group1&2GC_00001778262YaaAMantisStress responsePeroxide stress protein YaaA
GC_00001768183  Unknown 
GC_00001766167PadCCOGSecondary metabolites biosynthesis, transport and catabolismPhenolic acid decarboxylase (PadC) (PDB no. 2GC9)
GC_00001776328 MantisGeneral function prediction onlyShort C-terminal domain
 
Bombus group1GC_00003663110MSP7_CMantisGeneral function prediction onlyMSP7-like protein C-terminal domain
GC_0000359276 MantisFunction unknown 
GC_0000353531  Unknown 
GC_0000349170GmrSDCOGDefense mechanismsDNAse/DNA nickase specific for phosphorothioated or glycosylated phage DNA, GmrSD/DndB/SspE family
GC_0000364862 MantisFunction unknown 
GC_0000348929  Unknown 
GC_0000362329  Unknown 
GC_0000349335  Unknown 
GC_0000354069  Unknown 
GC_0000350241 MantisFunction unknown 
 
Bombus group3GC_00004090277 MantisDefense mechanismsRhs family protein
GC_0000388986DHADMantisAmino acid transport and metabolismDihydroxy-acid dehydratase
GC_00003888115  Unknown 
GC_0000388262 MantisFunction unknown 
GC_0000378369  Unknown 
GC_00004096680 MantisDefense mechanismsRhs family protein
GC_0000394552  Unknown 
GC_00003718266 MantisDefense mechanismsRhs family protein
GC_0000380294  Unknown 
GC_00004066361 MantisDefense mechanismsRhs family protein
GC_0000377883 MantisFunction unknown 
GC_0000379671 MantisGeneral function prediction onlyDeoxyhypusine monooxygenase activity
 
Bombus group4GC_00004382166 MantisFunction unknown 
GC_00004653110 MantisFunction unknown 
GC_0000444238  Unknown 
GC_0000462947  Unknown 
GC_0000448936  Function unknown 
GC_00004467254MMPLCOGDefense mechanismsPredicted exporter protein, RND superfamily (MMPL) (PDB no. 5KHN) drug exporter
GC_00004294*395AraJCOGCarbohydrate transport and metabolismPredicted arabinose efflux permease AraJ, MFS family (AraJ) (PDB no. 4LDS)
GC_00004680160  Unknown 
GC_0000472534  Unknown 
GC_0000430469Fic_NCOGTranscriptionFic family protein (PDB no. 3CUC)
GC_0000418646  Unknown 
GC_00004560*62HIS2COGAmino acid transport and metabolismHistidinol phosphatase or related hydrolase of the PHP family (HIS2) (PDB no. 1M65)
GC_0000463133  Unknown 
GC_0000435780  Unknown 
GC_0000474264  Unknown 
GC_0000473530  Unknown 
GC_00004255104 MantisFunction unknown 
GC_0000445061  Unknown 
GC_0000430872 MantisFunction unknown 
GC_0000424938 MantisFunction unknown 
GC_0000434039  Unknown 
GC_0000471634  Unknown 
GC_0000444633  Unknown 
GC_0000472866 MantisFunction unknown 
 
Bombus group5GC_00002418120  Unknown 
 
Apis-Bombus groupsGC_00001509376 MantisFunction unknown 
GC_0000137168 MantisFunction unknown 
GC_00001428*270NlpACOGInorganic ion transport and metabolismABC-type metal ion transport system, periplasmic component/surface antigen (NlpA) (PDB no. 1P99)
GC_00001520548DAK1COGCarbohydrate transport and metabolismDihydroxyacetone kinase (DAK1) (PDB no. 1UN8)
GC_00001461309DnaJCOGPosttranslational modification, protein turnover, chaperonesDnaJ-class molecular chaperone with C-terminal Zn finger domain (DnaJ) (PDB no. 1BQ0)
GC_00001484273YvaKCOGSecondary metabolites biosynthesis, transport and catabolismEsterase/lipase (YvaK) (PDB no. 4DIU)
GC_00001382118 MantisFunction unknown 
GC_00001454299YjjUCOGLipid transport and metabolismPredicted phospholipase, patatin/cPLA2 family (YjjU)
GC_00001527359 MantisFunction unknown 
GC_00001467142 MantisFunction unknown 
GC_00001482268NosYCOGPosttranslational modification, protein turnover, chaperonesABC-type transport system involved in multicopper enzyme maturation, permease component (NosY)
GC_00001439381 MantisFunction unknown 
GC_00001398208 MantisFunction unknown 
GC_00001393135EmrECOGDefense mechanismsMultidrug transporter EmrE and related cation transporters (EmrE) (PDB no. 2I68)
GC_00001469144ElaACOGGeneral function prediction onlyPredicted N-acyltransferase, GNAT family (ElaA) (PDB no. 1XEB
GC_00001463296SpeBCOGAmino acid transport and metabolismArginase/agmatinase family enzyme (SpeB) (PDB no. 1CEV)
GC_00001414118EmrECOGDefense mechanismsMultidrug transporter EmrE and related cation transporters (EmrE) (PDB no. 2I68)
GC_00001408*277NlpACOGInorganic ion transport and metabolismABC-type metal ion transport system, periplasmic component/surface antigen (NlpA) (PDB no. 1P99)
GC_00001441142 MantisFunction unknown 
GC_00001474186 MantisGeneral function prediction onlyPeptidase activity
GC_00001539248MtnXCOGAmino acid transport and metabolism2-Hydroxy-3-keto-5-methylthiopentenyl-1-phosphate phosphatase (methionine salvage) (MtnX) (PDB no. 2FEA)
GC_00001507251 COGMobilome: prophages, transposonsPhage repressor protein C, contains Cro/C1-type HTH and peptidase s24 domains
GC_00001374530AspBCOGAmino acid transport and metabolismAspartate/methionine/tyrosine aminotransferase (AspB) (PDB no. 2O0R)
GC_00001419471ArgDCOGAmino acid transport and metabolismAcetylornithine aminotransferase/4-aminobutyrate aminotransferase (ArgD) (PDB no. 1SF2)
GC_00001464225YaeF/YiiXMantisAmino acid transport and metabolismPermuted papain-like amidase enzyme, YaeF/YiiX, C92 family

Functional analyses were performed with anvi’o (67), using COGs (68) and Mantis (53). Pathways correspond to COG pathways as indicated by anvi’o. Unknown genes correspond to genes without hits (unseen in databases).

Asterisks indicate putative HGT events.

Functional analysis of the specific genes Functional analyses were performed with anvi’o (67), using COGs (68) and Mantis (53). Pathways correspond to COG pathways as indicated by anvi’o. Unknown genes correspond to genes without hits (unseen in databases). Asterisks indicate putative HGT events. Eleven of the 40 identified specific genes represented defense mechanisms, which are important for gut colonization in bees (14) (Table 2 and Fig. 3). The two most conserved defense mechanisms (4 genes each) corresponded to drug exporter and Rhs proteins (Table 2). These two defense mechanisms generate similar benefits and can provide competitive advantages. Steele and Moran (34) recently demonstrated that Rhs proteins represent different toxins, which are injected in neighboring cells through a type VI secretion system (T6SS) (2, 32–34). The four Rhs proteins found in the present study were specific genes of Bombus group3, a gut symbiont thus far detected only in B. appositus. The presence of specific toxins in these three strains might explain the specificity for their host by T6SS mediated-competition. The last three defense mechanisms included proteins for glycosylation of phage DNA (Bombus group1), sensing of invasive viruses, and antibiotic production (Apis group1) (Table 2). Apis group1 possessed a specific protein closely related to the 2-oxo-3-(phosphooxy)propyl 3-oxoalkanoate synthase, which triggers the production of the antibiotic virginiamycin in Streptomyces virginiae (58). Among the remaining identified specific genes, 9 genes belonged to the metabolic category amino acid transport and metabolism, 4 genes belonged to inorganic ion transport and metabolism, and 4 genes belonged to carbohydrate transport and metabolism (Fig. 3). Because host gut environments are scarce in iron and amino acids, de novo biosynthetic pathways and iron importers have been reported to be essential for gut host colonization (59). Specific genes linked to carbohydrate metabolism might provide a benefit linked to a carbohydrate-rich diet (1), even if such genes are more commonly found in Gilliamella than Snodgrassella genomes (30). The 12 remaining identified specific genes represented different metabolic pathways with unclear function in, or benefit to, the host (Table 2).

Horizontal gene transfer.

Horizontal gene transfer is a process that plays a major role in bacterial evolution. Because of the spatial proximity of Snodgrassella and other gut bacteria during 80 million years of coevolution, HGT events may have impacted the specific gene complement. We investigated the potential presence of such events by using MetaCHIP (60). We included all bee gut symbiont genomes available in RefSeq (561 genomes: i.e., 75 of Snodgrassella plus 486 of representatives of other bee gut symbionts) to maximize the odds of HGT detection and detected only 57 events of HGT in which Snodgrassella spp. were a donor or recipient (Fig. S5 and Table S3). This is less than the 87 events reported by Kwong et al. (2). MetaCHIP is designed to infer HGT from genomes assembled from microbial communities (60) and detects potential HGT events using a BLAST best-hit approach and then validates these hits by performing a duplication-transfer-loss (DTL) reconciliation between a species tree and an individual gene tree. The DTL filter used by MetaCHIP is more stringent than the BLAST E value threshold of 1e−50 used by Kwong et al. (2), which may explain this difference. Gene flow between phylotype genomes of RefSeq bee gut symbionts. HGT events were inferred with MetaCHIP on the 561 phylotype genomes (75 Snodgrassella plus 486 other bacterial bee gut phylotypes) present in RefSeq. Taxonomy of the genomes was determined by GTDBtk. Bands connect donors and recipients of HGT events (as determined using MetaCHIP); colors correspond to the donor of the genes. Widths of bands correspond to the number of HGTs, also indicated on the border of the circular plot. Download FIG S5, PDF file, 0.03 MB. HGT events reported between the 486 phylotype genomes of RefSeq and Snodgrassella. Functional analyses were performed with Mantis. Download Table S3, PDF file, 0.4 MB. None of the 107 specific genes described above was present among the 57 HGT-affected genes detected by MetaCHIP using BLASTP and a filter on 98% identity (Table S3). We investigated the absence of HGT in the group of 107 specific genes further by enriching the comparison with orthologous sequences taken from the 561 gut symbiont genomes used above and with orthologous sequences from the 247 genomes unrelated to the bee gut ecosystem, thereby also covering bacterial and archaeal diversity (Fig. 4). This allowed us to analyze the taxonomic diversity of the specific genes and revealed that 68 genes were unique to Snodgrassella, while 39 genes were found in other bacteria too (27 genes in multiple other bacteria and 12 genes in only one other bacterium) (Fig. 4). The evolutionary history of the 68 genes unique to Snodgrassella is explained more easily by duplication in Snodgrassella than by acquisition by HGT or gene loss in all other bacteria. In contrast, the 27 specific genes found in other bee gut symbionts or in other bacteria unrelated to the bee gut microbiome suggest that gene loss in other Neisseriaceae was the most plausible evolutionary path of the genes. We computed individual gene trees for these 27 OGs (available at https://github.com/Lcornet/SNOD) and compared them manually to species trees (Fig. S6). For only 8 genes out of the 27 (indicated in Table 2), Snodgrassella sequences clustered with those of other bee gut symbionts, which may suggest HGT events. For 12 genes (out of 107), no convincing evidence of HGT was detected because these genes were only present in two species, making the comparison to the species tree impossible. Ten of the latter genes were detected in Gilliamella and Snodgrassella only, and two were detected in Snodgrassella and Frischella only. These genes included three out of four specific Rhs proteins of Bombus group3, which were shared with Gilliamella genomes.
FIG 4

Orthologous enrichment of group-specific genes. A total of 107 specific genes from bee gut bacteria (including 75 Snodgrassella genomes) were enriched with sequences from 486 genomes from bee gut bacteria and 247 genomes from bacteria not related to the bee gut ecosystem, using Forty-Two (75, 76). The y axis represents the absolute number of sequences, and the x axis represents individual genes. Genes were grouped by species group: A.G.1 for Apis group1 (19 genes), A.G.2 for Apis group2 (12 genes), A.G.1&2 for the two Apis groups together (4 genes), B.G.1 for Bombus group1 (10 genes), B.G.3 for Bombus group3 (12 genes), B.G.4 for Bombus group4 (24 genes), B.G.5 for Bombus group5 (1 gene), and A.B.G for Apis and Bombus groups together (25 genes).

Orthologous enrichment of group-specific genes. A total of 107 specific genes from bee gut bacteria (including 75 Snodgrassella genomes) were enriched with sequences from 486 genomes from bee gut bacteria and 247 genomes from bacteria not related to the bee gut ecosystem, using Forty-Two (75, 76). The y axis represents the absolute number of sequences, and the x axis represents individual genes. Genes were grouped by species group: A.G.1 for Apis group1 (19 genes), A.G.2 for Apis group2 (12 genes), A.G.1&2 for the two Apis groups together (4 genes), B.G.1 for Bombus group1 (10 genes), B.G.3 for Bombus group3 (12 genes), B.G.4 for Bombus group4 (24 genes), B.G.5 for Bombus group5 (1 gene), and A.B.G for Apis and Bombus groups together (25 genes). Gut bacterium phylogenomic tree, used as a species tree for HGT inference. The maximum likelihood tree was inferred on 76 core genes (unicopy genes determined by CheckM) under the PROTGAMMALGF model with RAxML from a supermatrix of 843 organisms by 12,930 unambiguously aligned amino acid positions. Bootstrap support values are shown at the nodes. Download FIG S6, PDF file, 0.02 MB.

Conclusion.

In the present study, we used highly conserved Neisseriaceae core genes to perform multiple phylogenomic analyses. We demonstrated monophyly of Snodgrassella strains isolated from bumblebees and paraphyly of strains isolated from honeybees. We also demonstrated that Snodgrassella strains from Asian honeybees are an early diverging group. Combined with ANI analyses, our data further indicated that this genus comprises at least seven species. Below we describe and formally name two new Snodgrassella species from bumblebees: i.e., S. gandavensis sp. nov. for Bombus group4 (with LMG 30236 as the type strain) and S. communis sp. nov. for Bombus group5 (with LMG 28360 as the type strain). We detected 107 specific genes among these seven species. For the large majority of these 107 genes, there was no evidence for HGT. Functional analyses revealed the importance of small proteins, defense mechanisms, amino acid transport and metabolism, inorganic ion transport and metabolism, and carbohydrate transport and metabolism among these specific genes.

Description of Snodgrassella gandavensis sp. nov.

Snodgrassella gandavensis (gan.da.ven′sis. M.L. masc. adj. gandavensis, of Gandavum, the Latin name for Ghent, referring to the place where these bacteria were first isolated). Cells are nonmotile, Gram stain-negative rods about 1.2 μm long and 0.8 μm wide that occur singly or in pairs. Optimal growth is on rich agar medium at 37°C in a CO2-enriched atmosphere. S. gandavensis grows under anaerobic conditions when supplemented with 10 mM KNO3, but not under aerobic conditions. The bacterium is positive for catalase activity and nitrate reduction and negative for oxidase activity. The type strain is LMG 30236 (=CECT 30450), which was isolated in 2015 from the gut of Bombus pascuorum sampled in 2015 in Wetteren, Belgium. The whole-genome sequence of LMG 30236T has a size of 2.51 Mbp. The DNA G+C content is 43.76 mol%. The whole-genome sequence is publicly available under accession no. GCA_914768025.1. The 16S rRNA gene sequence is publicly available under accession no. OU943324. Additional information is provided in Table 3 and the supplemental material.
TABLE 3

Differential characteristics between Snodgrassella alvi, Snodgrassella gandavensis, and Snodgrassella communis

CharacteristicResult fora:
S. alvi NCIMB 14803TS. gandavensis (Bombus group4)
S. communis (Bombus group5)
LMG 30236TR-53680LMG 28360TR-53528R-54236
Growth on BHI agar++++
Growth on AC agar with 2% NaCl++ww++++
Urease production+++
Growth on DNase agar+
Hydrolysis of Tween 20++

+, present; −, absent; w, weak reaction.

Differential characteristics between Snodgrassella alvi, Snodgrassella gandavensis, and Snodgrassella communis +, present; −, absent; w, weak reaction.

Description of Snodgrassella communis sp. nov.

Snodgrassella communis (com.mu’nis. L. fem. adj. communis, common, because of its wide host range). Cells are nonmotile, Gram stain-negative rods about 1.2 μm long and 0.8 μm wide that occur singly or in pairs. Optimal growth is on rich agar medium at 37°C in a CO2-enriched atmosphere. S. communis grows under anaerobic conditions when supplemented with 10 mM KNO3. Growth under aerobic conditions is strain dependent. The organism is positive for catalase activity and nitrate reduction and negative for oxidase activity. The type strain is LMG 28360 (=CECT 30451), which was isolated in 2013 from the gut of Bombus terrestris sampled in 2013 in Ghent, Belgium. The whole-genome sequence of LMG 28360T has a size of 2.31 Mbp. The DNA G+C content is 43.26 mol%. The whole-genome sequence is publicly available under accession no. GCA_914068745.1. The 16S rRNA gene sequence is publicly available under accession no. OU943323. Additional information is provided in Table 3 and the supplemental material.

MATERIALS AND METHODS

All custom scripts specifically developed for this study are available at https://github.com/Lcornet/SNOD.

Snodgrassella whole-genome sequences.

Whole-genome sequences of 75 Snodgrassella isolates were analyzed. To this end, all Snodgrassella genomes from the Reference Sequence database of NCBI (RefSeq) (61, 62) were downloaded on 25 June 2021. CheckM v1.1.3 (63) with the lineage_wf option was used to identify genomes with contamination levels below 5% and completeness above 95%, which yielded 66 RefSeq genomes. In addition, we determined whole-genome sequences of nine bumblebee isolates from an earlier study (15). Together, the genomes of 48 isolates originated from honeybee gut samples (from 4 species) and 27 from bumblebee gut samples (from 14 species). Assembly quality metrics for all 75 genomes were obtained with QUAST v5.0.2 (64). The isolates, along with their geographic origin, accession numbers, and CheckM and QUAST parameters, are listed in Table 1.

DNA extraction.

Genomic DNA was isolated using a Maxwell 16 tissue DNA purification kit (catalog no. AS1030) and a Maxwell 16 instrument (catalog no. AS2000). The integrity and purity of the DNA were evaluated on a 1.0% (wt/vol) agarose gel and by spectrophotometric measurements at 234, 260, and 280 nm. A Quantus fluorometer and a QuantiFluor ONE double-stranded DNA (dsDNA) system (Promega Corporation, Madison, WI, USA) were used to estimate the DNA concentration.

Library construction and genome sequencing.

Library preparation and whole-genome sequencing were performed by the Oxford Genomics Centre (University of Oxford, United Kingdom). Library preparation was performed using an adapted protocol of the NEB prep kit. Paired-end sequence reads (PE150) were generated using an Illumina NovaSeq 6000 platform (Illumina, Inc., USA).

Genome assembly.

Quality checking and trimming of the raw sequence reads and de novo genome assembly were performed using the Shovill v1.1.0 pipeline (https://github.com/tseemann/shovill), which uses SPAdes v3.14.0 (65) as its core and which subsamples reads to a sequencing depth of 150×. Contigs shorter than 500 bp were excluded from the final assembly. Quality checking of the assembly was performed using QUAST v5.0.2 (64) and CheckM v1.1.3 (63).

Core gene analysis.

ToRQuEMaDA (66), 10 June 2021 version, was used to select genomes representing the diversity of other Neisseriaceae (i.e., bacteria not belonging to the genus Snodgrassella). We selected 35 Neisseriaceae genomes using the following options of tqmd_cluster.pl: dist-threshold of 0.90, kmer-size of 12 and max-round of 10 and CheckM v1.1.3 activated (see Table S1 in the supplemental material). The 110 genomes were then imported into the pangenomic workflow of anvi’o v7.1 (67). The genomes were first loaded individually into anvi’o using the anvi-gen-contigs-database script with default options, and NCBI COGs (68) were associated with each genome database using the anvi-run-ncbi-cogs script with default options. The consolidated database was constructed using the anvi-gen-genomes-storage script with default options and anvi-pan-genome script with the following options: min-occurrence of 2 and mcl-inflation of 10. The protein sequences of all genomes, along with their COG annotations, were extracted from the anvi’o database using the anvi-get-sequences-for-gene-clusters script with default options and the anvi-summarize script with default options, respectively. Functional and geometric indices of anvi’o were computed using the anvi-compute-gene-cluster-homogeneity script with default options. Orthologous groups (OGs) were reconstructed from anvi’o output files using the custom script anvio_pan-to-OGs.py with default options. Finally, core genes were selected using the custom script anvio_OGs-filtration.py with the following options: pfilter set to yes, fraction set to 1, unwanted orgs limit set to 0, cfilter set to yes, maxcopy set to 1, hfilter set to 1, maxfunctional index set to 0.8 and maxgeometricindex set to 0.8. These settings resulted in the selection of 254 core genes out of 11,185 OGs, present in 100% of the 110 Neisseriaceae genomes. Information about the 35 Neisseriaceae genomes used in the study. Download Table S1, PDF file, 0.4 MB.

Phylogenomic analyses. (i) Neisseriaceae phylogeny.

The 254 protein OGs were aligned using the anvi’o workflow. Conserved sites were selected using BMGE v1.12 (69) with moderately severe settings (entropy cut-off = 0.5, gap cut-off = 0.2). A supermatrix of 110 organisms by 85,654 unambiguously aligned amino acid positions (0.15% missing character states) was generated using SCaFoS v1.30k (70), with default settings. A Neisseriaceae phylogenomic analysis was inferred using RAxML v8.1.17 (71) with 100 bootstrap replicates under the PROTGAMMALGF model.

(ii) Snodgrassella phylogeny.

Populibacter corticis (GCF_001590725.1) was selected as an outgroup for the Snodgrassella phylogenomic analysis. The concatenated sequences of 75 Snodgrassella and P. corticis genomes were extracted from the 254-core-gene supermatrix of Neisseriaceae to produce a supermatrix of 76 organisms by 85,654 unambiguously aligned amino acid positions (0.04% missing character states), from which a phylogenomic tree was inferred with RAxML as described above. A leave-one-out analysis was then performed by randomly deleting ≈20% of genes present in our data set. To this end, the corresponding alignments were also reduced to 76 organisms and used to construct 100 data sets of about 70,000 conserved positions by randomly combining alignment files using the script jack-ali-dir.pl from Bio-MUST-Core (available at https://metacpan.org/dist/Bio-MUST-Core). The 100 supermatrices were assembled using SCaFoS v1.30k (70) with default settings. Trees were inferred using RAxML v8.1.17 (71) using the fast experimental tree search method and the PROTGAMMALGF model. A consensus tree was built from the set of 100 trees using the program consense v3.695 (from the PHYLIP package [72], but modified to handle long sequence names), with default settings. In order to test the possible influence of a long branch attraction artifact, the outgroup was eliminated from the supermatrix, and a new phylogenomic analysis was performed with the same protocol as described above but on a matrix of 75 organisms by 85,654 unambiguously aligned amino acid positions (0.03% missing character states). We checked the effect of taxon sampling reduction by deleting 40 Snodgrassella strains from our supermatrix, while conserving their diversity and the outgroup, using dRep v2.2.3 (43) with default settings. A phylogenetic analysis was inferred with the same protocol described above but using a matrix of 36 organisms by 85,654 unambiguously aligned amino acid positions (0.22% missing character states). Subsequently, nucleotide-based phylogenomic trees were inferred from the 254 core genes. Protein sequence alignments were back-translated by capturing and aligning the corresponding DNA sequences with the program leel (available at https://metacpan.org/dist/Bio-MUST-Apps-FortyTwo). A supermatrix of 76 organisms by 264,981 aligned nucleotides (1.59% of missing character states) was generated using SCaFoS v1.30k (70). A large phylogenomic analysis was inferred using RAxML v8.1.17 (71) with 100 bootstrap replicates under the GTRGAMMA model and using two different partitions for codon positions 1 and 2 together (here, “1&2”) and position 3. As for the protein trees, leave-one-out analyses were then performed by random selection of genes to construct 100 data sets of about 100,000 aligned nucleotides. The consensus trees were produced with the same protocol described above, but using the GTRGAMMA model of RAxML v8.1.17 (71). Two leave-one-out analyses were generated: one using only codon positions 1&2 and one using two different partitions for codon positions 1&2 and 3.

Gut bacterial phylogeny.

A total of 486 genomes representing the main phylotypes of bacteria found in the gut of Apis spp. and Bombus spp. (excluding Snodgrassella) were downloaded from RefSeq, prior to filtering the genomes with CheckM v1.1.3 (63) as described above. Also, 247 additional genomes covering a broad diversity of bacteria and archaea were selected with ToRQuEMaDA (66) using the following options: dist-threshold of 0.90, kmer-size of 12 and max-round of 10, and CheckM activated (Table S2). To perform a large phylogenomic analysis, we used core genes extracted by CheckM. The CheckM taxon set option was first run to produce a lineage set file. Then the CheckM analyze option was run, using the bacterial domain set while providing 843 genomes as input (i.e., 75 Snodgrassella genomes, 35 Neisseriaceae genomes, and the 733 genomes mentioned above). The CheckM qa option was then used to produce the marker files. The custom script Checkm-to-OGs.py allowed us to reconstruct OGs from CheckM marker files, selecting only unicopy OGs, thereby producing 76 OGs. The OGs were aligned using MAFFT v7.453 (73), run with the anysymbol, auto and reorder parameters. Conserved sites were selected using BMGE v1.12 (69) with moderately severe settings (entropy cut-off = 0.5, gap cut-off = 0.2). A matrix of 843 organisms by 12,930 unambiguously aligned amino-acid positions (6.72% missing character states) was produced using SCaFoS v1.30k (70), with default settings. A phylogenomic tree was produced with the same protocol as above for large phylogenomics. Information about the 486 bacterial bee gut phylotypes (without Snodgrassella) and 247 other bacterial genomes (without the Neisseriaceae) used in the study. Download Table S2, PDF file, 0.6 MB.

Average nucleotide identity analysis.

Pairwise average nucleotide identities were determined for 75 Snodgrassella genomes and the P. corticis (GCF_001590725.1) genome using fastANI v1.32 (74) with default settings. A distinct anvi’o pangenomic data set using 75 genomes of Snodgrassella and P. corticis (GCF_001590725.1) was constructed using the same protocol described above. The custom script anvio_OGs-filtration.py was used to determine which genes occurred specifically in a given group of organisms. This specific gene analysis was performed on a set of 4,685 orthologous groups. We used the same options as for the core genes, with the exception that the number of representatives of a group was set to 0.6 (60%). The inflation number was set to 10 in anvi’o (67) with the value of 60% recovered orthologous groups, even if the inflation parameter was too strict for some of them. All members of a group had to comprise an OG before it was considered specific. An orthologous enrichment was then performed to compensate for the potential lack of sensitivity in orthology inference. In addition, to enrich the OGs with potentially absent members of the group due to the 60% threshold, the enrichment could also add any Snodgrassella from other groups to control that the added sequences were indeed specific and not the result of a false orthology inference. We performed the orthologous enrichment with Forty-Two v0210570 (75, 76) (available at https://metacpan.org/dist/Bio-MUST-Apps-FortyTwo). We used the 843 genomes that included the 75 Snodgrassella genomes to perform the enrichment. The custom script Confirm-OGs.py was used to validate the specific OGs based on the criteria that all Snodgrassella of the considered group, but no Snodgrassella from other groups, must be present in the OGs after enrichment. We performed this approach on all groups defined in the present study, as well as all pairwise combinations of them, including one with all groups together (i.e., all Snodgrassella strains). All nodes of the tree have also been tested to assess the presence of specific genes shared between groups defined in this study. A total of 107 individual gene phylogenies were computed by aligning the sequences using MAFFT v7.453 (73) with the same options described above, selecting the conserved sites with BMGE v1.12 (69) with the same options described above, and inferring the trees with RAxML v8.1.17 (71) with 100 bootstrap replicates under the PROTGAMMALGF model.

Metabolic analyses.

The functions of the group-specific OGs were first determined using NCBI COGs (68) if the information was available as a product of the anvi’o workflow. COG pathways and COG functions were used to label the metabolic functions. When COG analyses yielded no results, Mantis (53) was used with default settings, and the information from the consensus annotation file was used to label function. OGs without any hit in COG or Mantis analyses were considered unknown genes, while OGs having hits with proteins of unknown function were classified as “Unknown function.” GTDBtk (77) was used to infer the GTDB taxonomy (47) of the 561 phylotype genomes (75 Snodgrassella and 486 representatives of the other bacterial bee gut phylotypes [described above]). MetaCHIP v1.10.15 (60) with default settings was then used to infer HGT. The group-specific genes were then searched in the MetaCHIP results using BLASTp v2.2.28 (78) with a threshold of 98% of identity on 95% of query length. Twenty-seven out of 107 single-gene phylogenies, containing Snodgrassella and at least one additional bacterial genus, generated during the specific gene analysis were manually inspected for HGT.

Biochemical characterization.

Biochemical characteristics were determined for the bumblebee isolates LMG 28360T, R-53528, R-54236, LMG 30236T, and R-53680, and for the type strain of S. alvi, NCIMB 14803. Growth was tested on nutrient agar (Oxoid), tryptic soy agar (TSA) (Oxoid), brain heart infusion (BHI) agar (BD Difco), Columbia agar (Oxoid) supplemented with 5% sheep blood, and All Culture (AC) agar (2% tryptose, 0.3% beef extract, 0.3% yeast extract, 0.3% malt extract, 0.5% dextrose, 0.02% ascorbic acid, 1.5% agar [all wt/vol]) after 2, 3, and 4 days of incubation at 37°C in a CO2-enriched atmosphere (6% CO2, 15% O2) provided by CO2Gen Compact sachets (Oxoid). Hemolysis of sheep blood was checked. Growth was also tested at 37°C in ambient atmosphere (0.04% CO2, 20.95% O2) on AC agar and in an anaerobic atmosphere (9 to 13% CO2, <1% O2) provided by AnaeroGen sachets (Oxoid) on AC agar and on AC agar supplemented with 10 mM KNO3. For the tests mentioned below, cultures were incubated in a CO2-enriched atmosphere. The temperature growth range was tested after 2 days of incubation on AC agar at 4, 15, 20, 28, 37, 40, and 45°C. Cell and colony morphology, motility, oxidase and catalase activities and Gram stain reaction were assessed on cultures grown for 2 days on AC agar at 37°C. Motility was determined by examining wet mounts in broth by phase-contrast microscopy. Oxidase and catalase activities and Gram staining were tested using conventional procedures (79). The effect of NaCl on growth was investigated in AC broth supplemented with different concentrations of NaCl (0 to 10% with 1% intervals [wt/vol]) after 3 days of incubation. The pH range for growth was evaluated after 3 days of incubation in AC broth buffered at pH 4.0 to 9.0 at intervals of 1 pH unit using the following buffer systems: acetate buffer (pH 4.0 to 5.0), phosphate buffer (pH 6.0 to 8.0), and Tris-HCl (pH 9.0). Nitrate reduction and denitrification and urease, indole and H2S production were tested using standard microbiological procedures (79) after 72 h of incubation. Cells were grown on DNase agar (Sigma-Aldrich), AC agar with 0.8% (wt/vol) gelatin (Merck), AC agar with 0.8% (wt/vol) soluble starch, AC agar with 1.3% (w/vol) dried skim milk (Oxoid), and AC agar with 1% (vol/vol) Tween 20 or 80 and were checked for growth after 72 of incubation and for DNase activity, gelatinase activity, and hydrolysis of starch, casein, and Tween 20 and 80, respectively.

Data availability.

L. Cornet and P. Vandamme have submitted data to the European Nucleotide Archive (ENA) under project accession no. PRJEB47378, read accession no. SAMEA9570070 to SAMEA9570078, and genome accession no. GCA_914768015, GCA_914768025, GCA_914768035, GCA_914768045, GCA_914768055, GCA_914768065, GCA_914768075, GCA_914768085, and GCA_914768095. These database records are publicly available via the ENA website: https://www.ebi.ac.uk/ena/browser/home. Detailed species descriptions. Download Text S1, PDF file, 0.7 MB.
  75 in total

1.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

2.  Establishment of characteristic gut bacteria during development of the honeybee worker.

Authors:  Vincent G Martinson; Jamie Moy; Nancy A Moran
Journal:  Appl Environ Microbiol       Date:  2012-02-03       Impact factor: 4.792

Review 3.  Small bacterial and phagic proteins: an updated view on a rapidly moving field.

Authors:  Mélodie Duval; Pascale Cossart
Journal:  Curr Opin Microbiol       Date:  2017-10-27       Impact factor: 7.934

4.  Genomics of the honey bee microbiome.

Authors:  Nancy A Moran
Journal:  Curr Opin Insect Sci       Date:  2015-08-01       Impact factor: 5.186

5.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

Review 6.  Gut microbial communities of social bees.

Authors:  Waldan K Kwong; Nancy A Moran
Journal:  Nat Rev Microbiol       Date:  2016-05-03       Impact factor: 60.633

7.  Morphometric analysis of fossil bumble bees (Hymenoptera, Apidae, Bombini) reveals their taxonomic affinities.

Authors:  Manuel Dehon; Michael S Engel; Maxence Gérard; A Murat Aytekin; Guillaume Ghisbain; Paul H Williams; Pierre Rasmont; Denis Michez
Journal:  Zookeys       Date:  2019-11-21       Impact factor: 1.546

8.  UniProt: the universal protein knowledgebase in 2021.

Authors: 
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

9.  Phylotranscriptomic consolidation of the jawed vertebrate timetree.

Authors:  Iker Irisarri; Denis Baurain; Henner Brinkmann; Frédéric Delsuc; Jean-Yves Sire; Alexander Kupfer; Jörn Petersen; Michael Jarek; Axel Meyer; Miguel Vences; Hervé Philippe
Journal:  Nat Ecol Evol       Date:  2017-07-24       Impact factor: 15.460

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.