The verrucomicrobial subdivision 2 class Spartobacteria is one of the most abundant bacterial lineages in soil and has recently also been found to be ubiquitous in aquatic environments. A 16S rRNA gene study from samples spanning the entire salinity range of the Baltic Sea indicated that, in the pelagic brackish water, a phylotype of the Spartobacteria is one of the dominating bacteria during summer. Phylogenetic analyses of related 16S rRNA genes indicate that a purely aquatic lineage within the Spartobacteria exists. Since no aquatic representative from the Spartobacteria has been cultured or sequenced, the metabolic capacity and ecological role of this lineage are yet unknown. In this study, we reconstructed the genome and metabolic potential of the abundant Baltic Sea Spartobacteria phylotype by metagenomics. Binning of genome fragments by nucleotide composition and a self-organizing map recovered the near-complete genome of the organism, the gene content of which suggests an aerobic heterotrophic metabolism. Notably, we found 23 glycoside hydrolases that likely allow the use of a variety of carbohydrates, like cellulose, mannan, xylan, chitin, and starch, as carbon sources. In addition, a complete pathway for sulfate utilization was found, indicating catabolic processing of sulfated polysaccharides, commonly found in aquatic phytoplankton. The high frequency of glycoside hydrolase genes implies an important role of this organism in the aquatic carbon cycle. Spatiotemporal data of the phylotype's distribution within the Baltic Sea indicate a connection to Cyanobacteria that may be the main source of the polysaccharide substrates.
Representatives of the bacterial phylum Verrucomicrobia are morphologically diverse and are present in various terrestrial and aquatic habitats, including oligotrophic, eutrophic, extreme, polluted, and manmade ones (1). The few existing isolates have been collected from diverse ecological niches, including soil, freshwater, marine habitats, and feces, and display aerobic, facultatively anaerobic, or obligately anaerobic heterotrophic lifestyles. These cultivated Verrucomicrobia can utilize various carbon compounds, such as plant polymers (e.g., cellulose, xylan, and pectin) (2, 3), sugars, and methane (4, 5). In addition, Verrucomicrobia have been found to be involved in nitrogen fixation in termites (6) and soil (7).
A recent 16S rRNA study of 181 soils, sampled across different soil types and continents, found Verrucomicrobia to be highly abundant, averaging 23% of the 16S rRNA gene sequences per sample (8). This by far exceeded previous estimates, a discrepancy that was attributed to mismatches in commonly used 16S rRNA primers (8). In most of these samples, the Verrucomicrobia were dominated (92% in total) by sequences belonging to the subdivision 2 class Spartobacteria (8). This class comprises one of the primary lineages in the phylum Verrucomicrobia (9) but currently has only one cultivated representative, Chthoniobacter flavus, an aerobic heterotrophic bacterium isolated from pasture soil that is able to grow on carbohydrate components of plant biomass (10). Although the majority of the Spartobacteria appear to inhabit soil (9), they have also been detected as endosymbionts of nematode worms (“Xiphinematobacter” [11]) and in aquatic environments (e.g., see references 12 to 15). In a global investigation of the distribution and diversity of marine Verrucomicrobia, Spartobacteria were negatively correlated with salinity and were the dominant group in most areas with low salinities (16).
The authors speculated that aquatic Verrucomicrobia play a significant role in aquatic ecosystems. Although bacteria are well known to be the main consumers of organic matter in aquatic environments, the specific roles of individual organisms are not well understood (17). Key enzymes for the degradation of polysaccharides derived from plant and algal biomass are glycoside hydrolases (GHs), which contain single or multiple catalytic modules frequently attached to one or more accessory noncatalytic carbohydrate binding modules (CBMs) (18, 19). This class of enzymes is well represented in soil-isolated Verrucomicrobia, including Chthoniobacter flavus (20).
We recently performed a 16S rRNA gene sequence analysis of 213 samples collected in summer that spanned the entire salinity range of the Baltic Sea (21). This revealed pronounced shifts in the bacterial community at different phylogenetic levels along the salinity gradient. The pyrosequencing reads in the brackish water environment (between salinities 5 and 8) were dominated (>10% of reads in many samples) by an operational taxonomic unit (OTU) belonging to the Spartobacteria. Despite elaborate attempts, no aquatic Spartobacteria have been isolated (e.g., see reference 22), and no genome has been sequenced so far. To better understand the physiology and ecological role of this organism, and of aquatic Spartobacteria in general, we applied shotgun metagenomics to a sample with high abundance of the Spartobacteria OTU, resulting in the first reconstruction of an aquatic Spartobacteria genome. The metagenome analysis revealed a rich repertoire of glycoside hydrolases, and the spatiotemporal distribution of the OTU suggests a connection to phytoplankton-derived polysaccharides.
RESULTS AND DISCUSSION
Metagenome of “Spartobacteria baltica” bin.
Metagenomic sequencing of the surface water sample yielded 37,658,923 bp of 454 pyrosequencing data, which were assembled into 58,176 contigs. To isolate contigs belonging to the target genome, we used an emerging self-organizing map (ESOM) approach that clusters genome fragments into phylogenetic groups based on tetranucleotide frequency distributions (23, 24). Since our previous 16S rRNA gene sequencing indicated that the Verrucomicrobia in this sample are strongly dominated by a single operational taxonomic unit (OTU) (21), the risk of extensive coclustering of other verrucomicrobial genomes was considered low. The resulting ESOM contained a distinct region highly enriched in contigs with best BLAST matches to Verrucomicrobia (Fig. 1). A ridge, indicating large differences in tetranucleotide frequencies, separated this region from surrounding areas containing Actinobacteria, Bacteroidetes, Proteobacteria, and Cyanobacteria and from a heterogeneous area containing a mixture of clades. The contigs within the region were assigned to a “Spartobacteria baltica” bin.
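The ESOM input is one tetranucleotide frequency vector per contig. The sketch below shows one way to compute such a vector in Python, assuming that each tetramer is pooled with its reverse complement (giving 136 canonical tetramers) and that windows containing ambiguous bases are skipped. It is an illustration only; the exact input generation and normalization of Dick et al. (23) may differ and are not reproduced here, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_tetramers() -> list[str]:
    """The 136 strand-independent tetramers (each 4-mer pooled with its reverse complement)."""
    all_kmers = ("".join(p) for p in product(BASES, repeat=4))
    return sorted({min(k, revcomp(k)) for k in all_kmers})


def tetranucleotide_profile(seq: str) -> dict[str, float]:
    """Relative frequencies of canonical tetramers in one contig.

    Sliding windows of length 4 that contain non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    valid = set(BASES)
    counts = dict.fromkeys(canonical_tetramers(), 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= valid:
            counts[min(kmer, revcomp(kmer))] += 1
            total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


profile = tetranucleotide_profile("ACGTACGTTTGCAN")
print(len(profile))  # 136 features per contig
```

Pooling reverse complements makes the vector independent of which strand a contig was assembled from, which matters because contigs from the same genome can come out in either orientation.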
FIG 1
Emerging self-organizing map of the metagenome contigs. Pixels are colored according to the taxonomic annotation of the contig(s) occupying each pixel. The background color represents the distance in data space between neighboring pixels; hence, the white ridges represent borders between regions of highly dissimilar tetranucleotide frequency distributions.
The “Spartobacteria baltica” metagenome bin contained 334 contigs with an average length of 5.4 kb (7.0× mean coverage) and a total of 1,811,214 bp (Table 1). The contigs had a relatively high GC content (62%), which is in the range of currently sequenced Verrucomicrobia genomes (25), with the exception of the methanotrophic thermophile “Candidatus Methylacidiphilum infernorum.” The “Spartobacteria baltica” bin carries 2,226 predicted genes, of which 2,189 (98%) encode proteins. Only one contig with a 16S rRNA gene was found. Since the sequence coverage of the 16S rRNA gene (9×) was not significantly above the average of the “Spartobacteria baltica” bin (7×), the genome likely encodes a single rRNA operon. The V3-V4 region of the 16S rRNA gene was identical to the sequence of the Spartobacteria OTU that we previously detected in high abundance in this sample (21). The 16S rRNA gene also had high similarity (97 to 99% identity) to cloned 16S rRNA sequences obtained earlier in the central Baltic Sea (22). Thirty-three genes encoding tRNAs for 17 standard amino acids were found (see Table S1 in the supplemental material). Of the protein-coding genes, 1,533 (69%) were functionally predicted, 621 (28%) were assigned to KEGG maps, 1,404 (63%) were assigned to clusters of orthologous groups (COGs), and 1,443 (65%) were assigned to specific domains in the Pfam database. Compared with Chthoniobacter flavus, “Spartobacteria baltica” has a higher overall functional assignment of the predicted proteins (Table 1).
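The single-operon argument compares the read coverage of the 16S rRNA gene with the mean coverage of the bin: identical operon copies collapse into one contig during assembly, so their coverage should scale roughly with copy number. A minimal sketch of that reasoning, assuming uniform read coverage (the function name is hypothetical, and the significance test implied in the text is not reproduced):

```python
def estimate_copy_number(gene_coverage: float, bin_mean_coverage: float) -> int:
    """Rough copy-number estimate for a collapsed repeat such as an rRNA operon.

    Assumes reads are distributed uniformly over the genome, so that a region
    present in k identical copies is assembled once but receives about k times
    the mean coverage. Real coverage is noisy, so ratios near x.5 are ambiguous.
    """
    if bin_mean_coverage <= 0:
        raise ValueError("bin mean coverage must be positive")
    return max(1, round(gene_coverage / bin_mean_coverage))


# Values from the text: 16S rRNA gene at 9x, bin mean at 7x (ratio about 1.29).
print(estimate_copy_number(9.0, 7.0))  # prints 1
```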
TABLE 1
Comparison of metagenome/genome properties from the “Spartobacteria baltica” bin and Chthoniobacter flavus (56), the closest relative that has been genome sequenced
| Property | “Spartobacteria baltica” | Chthoniobacter flavus |
|---|---|---|
| No. of genes | 2,226 | 6,778 |
| No. of bases | 1,811,214 | 7,848,700 |
| No. of coding bases | 1,657,652 | 6,925,094 |
| GC (%) | 62 | 61 |
| No. of DNA scaffolds | 334 | 62 |
| No. of RNAs | 37 | 62 |
| No. of rRNAs | 3 | 4 |
| 5S | 1 | 2 |
| 16S | 1 | 1 |
| 23S | 1 | 1 |
| No. of tRNAs | 34 | 58 |
| No. of genes with function prediction (%) | 1,533 (69) | 3,584 (53) |
| No. of genes with Pfam assignment (%) | 1,443 (65) | 4,074 (60) |
| No. of genes with COG assignment (%) | 1,541 (69) | 3,658 (54) |
| Total no. of COG IDs[a] | 964 | 1,426 |
| No. of shared COG IDs[b] | 831 | 831 |
| No. of unique COG IDs[b] | 133 | 595 |
| No. of contigs | 334 | |
| Avg coverage (×) | 7.0 | |
| N50/L50 | 904,350/6,763 | |
| N75/L75 | 1,358,441/3,933 | |

[a] IDs, identifications.
[b] Comparing Chthoniobacter flavus and “Spartobacteria baltica.”
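Table 1 lists N50/L50 and N75/L75 values for the bin. Conventions for these statistics differ between tools and the table does not state which one was used, so the sketch below (hypothetical helper `nx_lx`) follows one common definition and is not intended to reproduce the tabulated values:

```python
def nx_lx(lengths: list[int], fraction: float = 0.5) -> tuple[int, int]:
    """Return (Nx, Lx) for an assembly under a common definition.

    Nx: length of the shortest contig among the longest contigs that together
        reach `fraction` of the total assembly length.
    Lx: the number of contigs in that set.
    """
    if not lengths:
        raise ValueError("no contigs given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable for fraction <= 1")


contigs = [10_000, 8_000, 6_000, 4_000, 2_000]  # hypothetical contig lengths (bp)
print(nx_lx(contigs))        # (8000, 2): N50 and L50
print(nx_lx(contigs, 0.75))  # (6000, 3): N75 and L75
```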
To assess the completeness and purity of the “Spartobacteria baltica” bin, we used a set of 40 housekeeping genes known to normally occur in single copies (see Table S2 in the supplemental material) (26, 27). Thirty-eight of the 40 genes were found, distributed over 17 contigs. Three were found in multiple copies, but for two of these the copies were adjacent on the same contig, indicating that sequencing errors had split single genes. Placing these 17 contigs in a reference phylogenetic tree based on their housekeeping genes (26) showed that all of them belong to the Verrucomicrobia (see Fig. S1 in the supplemental material). Given that we recaptured all but two of the housekeeping genes and that these, with few exceptions, occur in single copies, the “Spartobacteria baltica” bin likely represents a substantial fraction of a single genome (28), although some fragments are likely missing because of incomplete coverage (see Materials and Methods). The binning approach may also fail to group contigs with markedly different tetranucleotide compositions, such as recently acquired genome fragments of distant phylogenetic origin (29). However, redoing the binning after spiking the metagenome with artificial contigs of the closest sequenced relative (Chthoniobacter flavus Ellin428) generated a cohesive cluster for this genome (see Fig. S2 in the supplemental material), suggesting that the method should also be accurate for “Spartobacteria baltica” in this metagenomic context.
An advantage of metagenomics over isolate sequencing is that it gives direct insight into population heterogeneity (e.g., see references 30 and 31).
Manual inspection of patterns of single nucleotide polymorphisms on aligned reads in a random selection of “Spartobacteria baltica” contigs suggested that the metagenome bin represents a population of closely related strains (similar enough to coassemble) that undergo recombination (see Fig. S3 in the supplemental material for an example). Deeper genomic coverage is needed to assess the population structure in detail.
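The completeness and purity check described above amounts to counting how many single-copy housekeeping genes are recovered and how many occur more than once. A minimal sketch of that bookkeeping with hypothetical marker names; it assumes split genes have already been merged (as was done for the two adjacent duplicates) and that the one remaining duplicated marker has two copies, which the text does not specify:

```python
from collections import Counter


def marker_stats(hits: list[str], markers: set[str]) -> tuple[float, float]:
    """Return (completeness, redundancy) from single-copy marker gene hits.

    completeness: fraction of markers detected at least once.
    redundancy:   surplus copies (beyond one per marker) divided by the number
                  of markers; 0.0 means every detected marker is single-copy.
    """
    counts = Counter(h for h in hits if h in markers)
    completeness = len(counts) / len(markers)
    surplus = sum(n - 1 for n in counts.values())
    return completeness, surplus / len(markers)


# Hypothetical marker names standing in for the 40 housekeeping genes.
markers = {f"marker{i:02d}" for i in range(1, 41)}
hits = [f"marker{i:02d}" for i in range(1, 39)]  # 38 of 40 detected
hits.append("marker01")  # assumed: one remaining genuine duplicate
completeness, redundancy = marker_stats(hits, markers)
print(f"completeness={completeness:.1%}, redundancy={redundancy:.1%}")
```

Such counts only bound how complete and how mixed a bin is; the phylogenetic placement of the marker-bearing contigs, as done in the text, is what ties them to a single lineage.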
Phylogenetic analysis of “Spartobacteria baltica.”
A phylogenetic analysis of a concatenation of 31 of the single-copy genes confirmed our previous 16S rRNA-based placement of “Spartobacteria baltica” within the class Spartobacteria (21) in the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum (Fig. 2). The class Spartobacteria currently comprises one validly described order (Chthoniobacterales) and family (Chthoniobacteriaceae) (9). The pasture soil isolate Chthoniobacter flavus Ellin428 (10, 20) is the closest cultured and sequenced relative (Fig. 2) but is still phylogenetically distant from “Spartobacteria baltica” (branch length distance of 0.48 in Fig. 2). Interestingly, C. flavus has a nearly three-times-larger genome (Table 1), which is consistent with an earlier metagenome-based estimate indicating considerably larger bacterial genomes in soil than in marine environments (32). Based on the placement of its 16S rRNA gene in the Silva 111 SSU Ref NR tree (33), “Spartobacteria baltica” belongs to the lineage “LD29.” Detailed phylogenetic analysis of 16S rRNA genes shows that this lineage contains only environmental sequences derived from brackish, freshwater, and wastewater environments (Fig. 3; see also Fig. S4 in the supplemental material). The most similar 16S rRNA sequences (99% identity) are from a freshwater lake in the Netherlands (GenBank accession number AF009975) and short sequences from the Baltic Sea (GenBank accession number EF627955). The lineage “LD29” was among the first 16S rRNA sequences of the Spartobacteria found by Zwart et al. (15). Therefore, we named the first genomically characterized phylotype of this lineage “Spartobacteria baltica.”
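The text does not describe how the 31 single-gene alignments were joined for the maximum likelihood analysis. A common approach is to build a supermatrix by concatenating per-gene alignments taxon by taxon and padding taxa that lack a gene with gaps. A minimal sketch, assuming each gene alignment is a mapping from taxon name to aligned sequence (all names are hypothetical):

```python
def concatenate_alignments(
    per_gene: list[dict[str, str]], taxa: list[str], gap: str = "-"
) -> dict[str, str]:
    """Join per-gene alignments into one supermatrix row per taxon.

    A taxon missing from a gene alignment gets a run of gaps as long as that
    alignment, so all rows end up with the same total width. Taxa not listed
    in `taxa` are ignored.
    """
    rows: dict[str, list[str]] = {t: [] for t in taxa}
    for aln in per_gene:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment needs non-empty, equally long sequences")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Two toy "gene" alignments; taxon B lacks gene 2.
gene1 = {"A": "MKV", "B": "MRV"}
gene2 = {"A": "LL"}
print(concatenate_alignments([gene1, gene2], ["A", "B"]))  # {'A': 'MKVLL', 'B': 'MRV--'}
```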
FIG 2
Maximum likelihood tree based on a concatenated alignment of 31 conserved genes of “Spartobacteria baltica” and representative genomes of Verrucomicrobia (10 representatives), Chlamydiae (5 representatives), Planctomycetes (8 representatives), and Spirochaetes/Actinobacteria (21 representatives). The tree was rooted using the Spirochaetes/Actinobacteria as an outgroup. All groups except the Verrucomicrobia have been grouped into wedges for clarity. Dots indicate bootstrap values of >98%.
FIG 3
Phylogenetic tree of nonredundant sequences of >1,200 bp in the Spartobacteria class obtained from the Silva database 111 SSU Ref NR. “Candidatus Methylacidiphilum infernorum” was used to root the tree. The tree was calculated using the RAxML algorithm with rapid bootstrap analysis (1,000 bootstraps). Only nodes supported by high bootstrap values are marked (filled circles, >95%). The origins of the sequences are indicated by the accession number, the isolation source, and the length of the sequence in bp.
Metabolism of “Spartobacteria baltica.”
A reconstruction of the energy metabolism by manual annotation of the metagenome bin revealed that “Spartobacteria baltica” uses a set of pathways typical of many aerobic heterotrophic organisms (see Table S3 in the supplemental material). Glucose can be converted to glucose 6-phosphate and degraded to pyruvate via the Embden-Meyerhof pathway (EMP). Pyruvate is further oxidized to acetyl coenzyme A (acetyl-CoA), which enters the tricarboxylic acid (TCA) cycle. The presence of fructose 1,6-bisphosphatase indicates the possibility of gluconeogenesis via the EMP, and the presence of genes coding for 2-oxoglutarate dehydrogenase, succinate dehydrogenase, and succinyl-CoA synthetase indicates a complete TCA cycle. The products of the TCA cycle and the EMP are precursors of several amino acids. The pathways for the formation of l-alanine, l-valine, l-leucine, l-isoleucine, l-serine, and glycine from EMP intermediates are fully represented by the corresponding genes (Table S3). Biosynthetic pathways for the formation of l-aspartate, l-glutamate, l-glutamine, l-proline, l-threonine, l-lysine, and l-histidine from TCA cycle precursors were also found. The biosynthetic pathways for l-arginine, l-methionine, and l-cysteine are not complete. Although we found the genes for a complete pentose phosphate pathway, which regenerates NADPH and also supplies precursors for l-tryptophan, l-phenylalanine, and l-tyrosine biosynthesis, a few genes required for the complete biosynthesis of these three amino acids are missing. However, such complex pathways may lack single enzymes because of incomplete genome coverage. The same applies to the biosynthesis of purine and pyrimidine nucleotides and to the genes for lipopolysaccharide and peptidoglycan biosynthesis (Table S3).
Alternatively, although Verrucomicrobia are described as having a Gram-negative cell wall, the class Opitutae has been reported to lack peptidoglycan (34), suggesting that Spartobacteria may also lack the corresponding genes.
Although the organism seems to be capable of amino acid biosynthesis, the genes essential for using N2, NO3−, or NO2− to generate nitrogen precursors were not found. Their absence may reflect incomplete genome coverage, but the genome encodes many ABC-type transporters involved in spermidine/putrescine (potABCD), peptide (oppABCDF), and branched-chain amino acid (livKHMGF) uptake, which may indicate uptake of organic nitrogen and recycling of the acquired ammonia groups (see Table S3 in the supplemental material). Moreover, chitin, a putative substrate of “Spartobacteria baltica” (see below), can support the nitrogen requirements of the organism (35). The “Spartobacteria baltica” metagenome bin also contains a complete pstABSC transporter system putatively involved in phosphate uptake, as well as ABC transporters for iron (fhuDBC) and zinc (znuABC).
The sulfur metabolism is almost complete in the “Spartobacteria baltica” metagenome. Genes involved in the reduction of sulfate to hydrogen sulfide (via adenylylsulfate, 3′-phosphoadenylylsulfate, and sulfite), which is a precursor for l-cysteine biosynthesis by a cysteine synthase, were predicted. Sulfur metabolism plays an important role in the degradation of phytoplankton-derived polysaccharides, since sulfated polysaccharides are frequently found in algae and Cyanobacteria (36). The metagenome bin also contains genes for a sulfate permease that may facilitate sulfate uptake.
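Judging whether a pathway is complete amounts to asking whether its end product can be reached from the start substrate using only annotated enzymes. The sketch below encodes assimilatory sulfate reduction as described above (sulfate to adenylylsulfate, 3′-phosphoadenylylsulfate, sulfite, hydrogen sulfide, and cysteine) and checks reachability with a breadth-first search. The enzyme names and EC numbers are standard textbook assignments used for illustration, not the annotation of the “Spartobacteria baltica” bin; co-substrates such as ATP, NADPH, and O-acetylserine are ignored, and the function names are hypothetical.

```python
from collections import deque

# (substrate, product, enzyme) for assimilatory sulfate reduction.
# APS = adenylylsulfate, PAPS = 3'-phosphoadenylylsulfate.
# Illustrative textbook assignments; co-substrates are omitted.
SULFATE_ASSIMILATION = [
    ("sulfate", "APS", "sulfate adenylyltransferase (EC 2.7.7.4)"),
    ("APS", "PAPS", "adenylylsulfate kinase (EC 2.7.1.25)"),
    ("PAPS", "sulfite", "phosphoadenylylsulfate reductase (EC 1.8.4.8)"),
    ("sulfite", "hydrogen sulfide", "sulfite reductase (EC 1.8.1.2)"),
    ("hydrogen sulfide", "cysteine", "cysteine synthase (EC 2.5.1.47)"),
]


def reachable(start: str, steps, present: set[str]) -> set[str]:
    """Metabolites reachable from `start` using only steps whose enzyme is present."""
    graph: dict[str, list[str]] = {}
    for substrate, product, enzyme in steps:
        if enzyme in present:
            graph.setdefault(substrate, []).append(product)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def missing_enzymes(steps, present: set[str]) -> list[str]:
    """Pathway enzymes absent from the annotation, in pathway order."""
    return [enzyme for _, _, enzyme in steps if enzyme not in present]


all_enzymes = {enzyme for _, _, enzyme in SULFATE_ASSIMILATION}
print("cysteine" in reachable("sulfate", SULFATE_ASSIMILATION, all_enzymes))  # True
```

A single missing step breaks reachability of every downstream metabolite, which is why one unannotated enzyme can make an otherwise intact pathway look incomplete in a draft genome.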
Polysaccharide-degrading enzymes.
The aerobic heterotrophic metabolism described above requires a carbon source for energy generation and as a substrate for anabolism. Interestingly, “Spartobacteria baltica” contains several genes encoding glycoside hydrolases (GHs), key enzymes in the degradation of polysaccharides. In total, 23 GHs representing 13 different GH families, as defined in the carbohydrate-active enzymes database (CAZy) (18), were detected, suggesting the use of several different substrates, such as cellulose, mannan, xylan, chitin, and starch (Table 2).
TABLE 2
Comparison of CAZyme distributions in “Spartobacteria baltica” and the aquatic subdivision 1 Verrucomicrobia “Verrucomicrobium AAA168-F10” (46)
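| CAZy family | No. in “Spartobacteria baltica” | No. in “Verrucomicrobium AAA168-F10” |
|---|---|---|
| GH1 | 4 | 2 |
| GH2 | 3 | |
| GH3 | 3 | 5 |
| GH5 | 2 | 3 |
| GH9 | 1 | 3 |
| GH10 | 1 | 3 |
| GH13 | | 8 |
| GH16 | 1 | 1 |
| GH17 | 1 | 1 |
| GH18 | 2 | |
| GH26 | 1 | |
| GH30 | 2 | |
| GH31 | | 1 |
| GH43 | 1 | 3 |
| GH57 | | 1 |
| GH77 | | 1 |
| GH78 | | 2 |
| GH81 | | 5 |
| GH109 | | 19 |
| GH119 | 1 | |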
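As a consistency check on Table 2, the per-genome counts can be summed: the “Spartobacteria baltica” column should give the 23 GHs in 13 families stated in the text, and the “Verrucomicrobium AAA168-F10” column the 58 GHs reported for that genome (42). A small sketch of this check (variable and function names are illustrative):

```python
# GH family counts from Table 2; families absent from a genome are omitted.
SPARTOBACTERIA_BALTICA = {
    "GH1": 4, "GH2": 3, "GH3": 3, "GH5": 2, "GH9": 1, "GH10": 1, "GH16": 1,
    "GH17": 1, "GH18": 2, "GH26": 1, "GH30": 2, "GH43": 1, "GH119": 1,
}
VERRUCOMICROBIUM_AAA168_F10 = {
    "GH1": 2, "GH3": 5, "GH5": 3, "GH9": 3, "GH10": 3, "GH13": 8, "GH16": 1,
    "GH17": 1, "GH31": 1, "GH43": 3, "GH57": 1, "GH77": 1, "GH78": 2,
    "GH81": 5, "GH109": 19,
}


def totals(counts: dict[str, int]) -> tuple[int, int]:
    """(number of GH genes, number of GH families) for one genome."""
    return sum(counts.values()), len(counts)


def family_number(family: str) -> int:
    """Numeric part of a CAZy GH family label, e.g. 'GH119' -> 119."""
    return int(family[2:])


shared = sorted(
    SPARTOBACTERIA_BALTICA.keys() & VERRUCOMICROBIUM_AAA168_F10.keys(),
    key=family_number,
)
print(totals(SPARTOBACTERIA_BALTICA))       # (23, 13)
print(totals(VERRUCOMICROBIUM_AAA168_F10))  # (58, 15)
print(shared)  # families found in both genomes
```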
(i) Genes relevant for cellulose degradation.
Three of the identified GH-encoding genes may be relevant for cellulose degradation; two encode GH5 members, and one encodes a member of family GH9. One of the predicted GH5 proteins has a single GH5 catalytic module, whereas the other is supplemented with additional modules, including a family 6 carbohydrate binding module (CBM6) (Fig. 4A). The first GH5 protein sequence cannot be assigned to any subfamily, although it is distantly related to subfamilies GH5_7 and GH5_41 (Fig. 4B). Its closest relatives are GH5 enzymes from Stackebrandtia nassauensis, a cellulolytic member of the Actinobacteria (37), and Lentisphaera araneosa, an exopolymer-producing bacterium (38). The second, modular GH5 member can be assigned to the recently described subfamily GH5_46 (39) (Fig. 4C), which is poorly characterized biochemically. Carboxymethyl cellulose (CMC) activity has been described for a GH5_46 enzyme from cow rumen (40), currently the only characterized enzyme in this subfamily (see Fig. S5 in the supplemental material). Moreover, the appended CBM6 module is known to bind various β-glycans (41). Notably, the genome of Chthoniobacter flavus Ellin428 also contains two genes coding for GH5 proteins. However, these two, and a second GH5 representative from the “Verrucomicrobium AAA168-F10” genome (42), do not cluster with the “Spartobacteria baltica” genes in the phylogenetic analysis, and none of them can currently be assigned to any GH5 subfamily (data not shown).
FIG 4
Analyses of “Spartobacteria baltica” GH5 sequences. (A) Modular structure of the GH5 protein sequence (gene id 2119805716 in Table S3). (B) A maximum likelihood tree of selected bacterial GH5 catalytic module sequences, including “Spartobacteria baltica” (gene id 2119806690 in Table S3). (C) A maximum likelihood tree of the subfamily GH5_46, including “Spartobacteria baltica” (gene id 2119805716 in Table S3), and selected bacterial sequences from related GH5 subfamilies. The phylogenetic analysis was restricted to the catalytic module.
(ii) Genes relevant for chitin degradation.
The “Spartobacteria baltica” metagenome bin also contains two genes encoding candidate chitinolytic proteins of family GH18. One of these GH18 proteins contains two CBM2 modules and a CBM33 module. Another identified CBM33 module is standalone, i.e., not connected to any catalytic module. Interestingly, members of CBM33 were recently shown to have enzymatic activity on insoluble substrates such as chitin and cellulose (43, 44). Via a mechanism involving hydrolysis and oxidation, CBM33 enzymes boost the degradation of chitin and cellulose by making crystalline polysaccharide regions accessible to cleavage by GHs. The products of the three identified GH3 genes may provide the chitobiase activity required for complete hydrolysis of chitin; a bacterial GH3 protein with N-acetylhexosaminidase activity has previously been reported to function in a chitin utilization system (45).
(iii) Other polysaccharide-degrading enzymes.
The genes encoding members of families GH1, GH2, GH3, GH10, GH30, and GH43 are candidates for the hydrolysis of noncellulosic poly- and oligosaccharides and of the side branches of hemicelluloses and pectins. For instance, endo-1,4-β-xylanase activity has been described in families GH10, GH30, and GH43. However, the “Spartobacteria baltica” GH30 sequences cannot be classified into any of the defined GH30 subfamilies, and the top BLAST hit for the “Spartobacteria baltica” GH43 protein is a β-xylosidase (GenBank accession number ACE82692) from Cellvibrio japonicus. Since almost all characterized enzymes in GH10 are endo-1,4-β-xylanases, the sequence assigned to GH10 in our study plausibly has this activity. The identified GH16 and GH17 sequences are most likely involved in the degradation of laminarin, a β-1,3-glucan found mainly in brown algae (46). Of note, we discovered a family GH119 gene in the “Spartobacteria baltica” bin. Currently, GH119 contains only six members in the CAZy database, and the only biochemically characterized enzyme is an α-amylase (47).
Representatives of the Verrucomicrobia have been shown to degrade polysaccharides in soil (subdivision 4 [2] and subdivision 2 [10]) and in termites (subdivision 4 [6]). Recently, Martinez-Garcia et al. (42) identified coastal and freshwater Verrucomicrobia as polysaccharide degraders using fluorescently labeled laminarin and xylan in combination with single-cell genomics. The coastal “Verrucomicrobium AAA168-F10” from the family Verrucomicrobiaceae (subdivision 1) contains 58 glycoside hydrolases putatively involved in the degradation of mucopolysaccharides, glycoproteins, peptidoglycan, celluloses, hemicelluloses, and glycogen. One of the three GH5 sequences identified in AAA168-F10 also falls within subfamily GH5_46 and, although truncated, shows high similarity to the GH5_46 member of “Spartobacteria baltica” (see Fig. S5 in the supplemental material).
Ecological role of “Spartobacteria baltica.”
Utilization of polysaccharides by bacteria has been demonstrated in aquatic environments (17), but the identity and specific roles of the microbes performing this process are still elusive (48). “Spartobacteria baltica” has the genetic potential to use a variety of polysaccharides as carbon, nitrogen, and sulfur sources. In the marine environment, phytoplankton is a major source of such substrates, and bacteria associated with the decay of a phytoplankton bloom have been shown to express a multitude of hydrolytic enzymes and sulfatases (49). Previous studies reported a link between the dynamics of phytoplankton biomass and Spartobacteria in freshwater lakes (50, 51); moreover, Arnds et al. (12) found Spartobacteria cells attached to filamentous algae in a humic freshwater lake.
In the central Baltic Sea, pronounced phytoplankton blooms occur seasonally, with spring blooms dominated by eukaryotic phytoplankton and summer blooms dominated by Cyanobacteria (52). In a previous study, the seasonal dynamics of surface water microbial communities in the central Baltic Sea (at the Landsort Deep) were investigated by 454 sequencing of amplicons of the V6 region of 16S rRNA genes (53). One of the most abundant OTUs in this data set is identical to the V6 region of the “Spartobacteria baltica” 16S rRNA gene. In the temporal study, the OTU displayed pronounced seasonal dynamics and peaked in July (with 5% of the reads) (see Fig. S6 in the supplemental material). This coincided roughly with blooms of filamentous Cyanobacteria (22), but the limited number of samples (n = 8) does not allow meaningful statistics. Instead, we used the 16S data from 213 samples of the Baltic Sea transect study (21) to search for spatial correlations between the “Spartobacteria baltica” OTU and other OTUs.
Interestingly, the most highly correlated OTU was a picocyanobacterium (identical to Synechococcus/Cyanobium sequences from freshwater [54] and from the Baltic Sea [55]), displaying a Spearman rank abundance correlation of 0.80 (P < 10^−16) (see Fig. S7 in the supplemental material). Hence, the spatial data indicate a connection to picocyanobacteria, but it should be noted that filamentous Cyanobacteria were not accurately quantified in the study, and correlations to these may therefore have been missed. Moreover, the genomic findings indicate that substrates such as chitin may additionally originate from eukaryotic phytoplankton. Besides crustaceans such as copepods, blooms of the diatoms Thalassiosira and Skeletonema are considered important sources of chitin, since these diatoms produce chitin strands to increase their buoyancy (56). These genera are highly abundant during spring phytoplankton blooms in the Baltic Sea (57) and may therefore provide the substrate during this period.
In summary, we have performed a genomic analysis of the first aquatic representative of the Spartobacteria, one of the most abundant heterotrophic bacteria in the brackish Baltic Sea and other aquatic environments. The genome reveals a rich repertoire of polysaccharide-degrading genes, and the spatiotemporal data indicate ecological connections to phytoplankton. Further studies investigating the seasonality and local distribution of microorganisms in the Baltic Sea will give more details on the interaction between aquatic Spartobacteria and phytoplankton; moreover, enzymatic characterization of the glycoside hydrolases can give insight into their mode of action and substrate specificity.
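The spatial analysis of the “Spartobacteria baltica” OTU relies on Spearman rank correlation between OTU abundances across samples. Spearman's ρ is the Pearson correlation of the ranks, with tied values given their average rank. A dependency-free sketch follows; it returns ρ only (the P value reported in the text is not computed here), the abundances in the usage example are hypothetical, and in practice a library routine such as scipy.stats.spearmanr would typically be used.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative abundances of two OTUs in five samples.
otu_a = [0.12, 0.30, 0.05, 0.22, 0.18]
otu_b = [0.02, 0.09, 0.01, 0.04, 0.05]
print(spearman(otu_a, otu_b))  # prints 0.9
```

Working on ranks makes ρ insensitive to the strongly skewed abundance distributions typical of amplicon data, since only the ordering of samples matters.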
MATERIALS AND METHODS
Sampling, DNA preparation, and sequencing.
The water sample was obtained on a research cruise (MSM0803) of the RV Maria S. Merian in June and July 2008 at 59°47.88′N, 24°46.75′E (see Herlemann et al. [21] for details). Water samples for DNA analysis were filtered (0.22-µm-pore-size white polycarbonate filters), and DNA was extracted according to Weinbauer et al. (58). The sample was sequenced at the Swedish Institute for Communicable Disease Control using 454 pyrosequencing (Roche) and a library preparation protocol suitable for minute amounts of sample DNA (59).
Metagenome assembly, binning, and annotation.
454 pyrosequencing reads were assembled using the Newbler assembler (Roche) with default parameter settings, except that the “large” flag was used. Contigs of ≥2 kb were subjected to phylogenetic binning by an emerging self-organizing map using the ESOM analyzer (60), based on tetranucleotide frequency distributions of contigs (23). The same parameter settings and initial data normalization as those used by Dick et al. (23) were applied, but a 50- by 80-pixel grid was used. Projecting contigs in the size range of 1 to 2 kb onto the ESOM map already generated with the longer contigs placed an additional 168 contigs (231,753 bp in total) in the “Spartobacteria baltica” region, indicating that a fraction of the genome was missing among the ≥2-kb contigs. However, since the approach is unreliable for contigs of <2 kb (23, 24), we restricted the analysis to contigs of ≥2 kb to minimize the risk of assigning foreign contigs to the genome. For the spiked metagenome, the draft genome of Chthoniobacter flavus Ellin428 was downloaded from NCBI, split into 5-kb-long “contigs,” and added to the metagenome. When the ESOM analyzer was run on this data set, an 80- by 110-pixel grid was used, with other settings as described above. A Perl program for generating input to the ESOM analyzer can be downloaded at https://github.com/tetramerFreqs/Binning. To color the contigs in the map according to their (probable) phylum affiliation, contigs were searched with BLASTx (61) against the NCBI nr database, and MEGAN (62) was used to extract phylum-level annotations from the BLAST output.
All contig sequences were annotated with the IMG/M metagenome analysis pipeline (see Table S3 in the supplemental material) (63). Automatic annotations with functional predictions were also improved manually with the annotation platform provided by Integrated Microbial Genomes (64). Metabolic pathways were reconstructed using MetaCyc (65) as a reference data set.
Detailed information about the automatic genome annotation can be obtained from the JGI IMG website (http://img.jgi.doe.gov/w/doc/about_index.html).
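The ESOM binning described at the start of this section uses tetranucleotide frequency profiles as input. The Python sketch below shows how such profiles could be computed for contigs of at least 2 kb. It also shows how a reference genome could be split into 5-kb pseudo-contigs for spiking. The sketch makes two assumptions. First, each tetranucleotide is pooled with its reverse complement, giving 136 features. Second, the last, shorter piece of a split sequence is kept. This is not the Perl program from the repository cited above, and all function names are hypothetical. The data normalization of Dick et al. (23) and the ESOM training are not shown.

```python
from itertools import product

# Complement table for uppercase DNA
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_tetramers() -> list[str]:
    """List the 256 tetramers pooled with their reverse complements (136 features)."""
    features: list[str] = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=4)):
        canon = min(kmer, revcomp(kmer))  # lexicographically smaller strand form
        if canon not in features:
            features.append(canon)
    return features


FEATURES = canonical_tetramers()
_INDEX = {kmer: i for i, kmer in enumerate(FEATURES)}


def tetra_freqs(seq: str) -> list[float]:
    """Relative frequency of each canonical tetramer; windows with non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = [0] * len(FEATURES)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in "ACGT" for base in kmer):
            counts[_INDEX[min(kmer, revcomp(kmer))]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(FEATURES)
    return [c / total for c in counts]


def split_into_pieces(seq: str, size: int = 5000) -> list[str]:
    """Cut a sequence into consecutive pieces of `size` bp (the last piece may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def profiles_for_map(contigs: dict[str, str], min_len: int = 2000) -> dict[str, list[float]]:
    """Tetranucleotide profiles for contigs of at least `min_len` bp."""
    return {name: tetra_freqs(s) for name, s in contigs.items() if len(s) >= min_len}


# Usage with a hypothetical 12-kb reference contig: pieces of 5,000, 5,000 and 2,000 bp
pieces = split_into_pieces("ACGT" * 3000)
spiked = {f"ref_piece_{i}": p for i, p in enumerate(pieces)}
profiles = profiles_for_map(spiked)  # all three pieces pass the 2-kb cutoff
```

Pooling reverse complements makes a contig's profile independent of the strand on which it was assembled. This is why both orientations of the same sequence give identical feature vectors.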
Construction of the 16S rRNA gene tree.
The metagenome contained the complete 16S rRNA gene of “Spartobacteria baltica,” which was used for phylogenetic analysis. The 16S rRNA gene tree was constructed using the ARB program suite (66). All spartobacterial 16S rRNA sequences available in Silva release 111 NR (33) were downloaded from the Silva browser (631 sequences in total). The full-length sequence of “Spartobacteria baltica” was added, and “Candidatus Methylacidiphilum infernorum” was used as an outgroup. A core tree was estimated from 1,012 unambiguously aligned sequence positions of all nearly full-length (>1,200 bp) sequences (633 sequences). The analysis used maximum likelihood (RAxML) with rapid bootstrapping (1,000 replicates) and the GTRMIXI rate distribution model provided in the ARB package (Fig. 3). A total of 435 short sequences (>300 bp), positionally filtered by base frequency (50%), were then added with the ARB parsimony tool without changing the global tree topology (data not shown). Based on these results, a phylogenetic tree was extracted that contained all sequences of >300 bp from the “LD29” lineage (168 sequences in total), with Chthoniobacter flavus as a reference and “Xiphinematobacteraceae” as an outgroup (see Fig. S4 in the supplemental material). Phylogenetic trees were graphically processed using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
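The sequence selection for the 16S rRNA gene tree can be sketched in two steps. First, the aligned sequences are partitioned by length: >1,200 bp for the core tree and >300 bp for parsimony insertion. Second, a column mask is applied. The sketch assumes that the 50% base frequency filter keeps alignment columns in which the most common nucleotide occurs in at least half of all sequences, with gaps counting against the column. This is one plausible reading, not a reimplementation of ARB's filter, and all names are hypothetical.

```python
from collections import Counter

GAP_CHARS = set("-.")  # gap symbols assumed for ARB/Silva-style alignments


def unaligned_length(aligned_seq: str) -> int:
    """Number of nucleotides in an aligned sequence, ignoring gaps."""
    return sum(1 for ch in aligned_seq if ch not in GAP_CHARS)


def partition_by_length(aln: dict[str, str], core_min: int = 1200, short_min: int = 300):
    """Split sequences into a core set (> core_min bp) and a short set (> short_min bp)."""
    core, short = {}, {}
    for name, seq in aln.items():
        n = unaligned_length(seq)
        if n > core_min:
            core[name] = seq
        elif n > short_min:
            short[name] = seq
    return core, short


def base_frequency_mask(aln: dict[str, str], threshold: float = 0.5) -> list[bool]:
    """Keep a column if its most common base occurs in >= threshold of all sequences (assumed rule)."""
    seqs = list(aln.values())
    mask = []
    for col in range(len(seqs[0])):
        bases = Counter(s[col].upper() for s in seqs if s[col] not in GAP_CHARS)
        top = bases.most_common(1)[0][1] if bases else 0
        mask.append(top / len(seqs) >= threshold)
    return mask


def apply_mask(aligned_seq: str, mask: list[bool]) -> str:
    """Return only the columns of an aligned sequence that pass the mask."""
    return "".join(ch for ch, keep in zip(aligned_seq, mask) if keep)
```

Under this rule, a column that is mostly gaps is dropped even when every non-gap base in it agrees.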
Glycoside hydrolase identification and annotation.
The domain structures of automatically annotated glycoside hydrolases were manually curated using SMART (67), Pfam (68), and the Conserved Domain Database (69). Glycoside hydrolase family annotations were revised by comparison to the carbohydrate-active enzymes database (http://www.cazy.org) (18). For the phylogenetic analysis, sequences of GH5 catalytic domains were aligned using MUSCLE (70), and the phylogenetic trees were generated using PhyML (71). Bootstrap support was calculated using 100 replicates. Subfamilies GH5_1 and GH5_4 were used as outgroups in the phylogenetic analysis.
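Bootstrap support for the GH5 tree was calculated from 100 replicates. The hypothetical Python functions below illustrate what a nonparametric bootstrap replicate is. They resample alignment columns with replacement and report the percentage of replicate trees that contain a given clade. PhyML performs the resampling and tree inference internally, and tree building is not shown here. Replicate trees are assumed to be supplied as sets of clades, each a frozenset of sequence names.

```python
import random


def bootstrap_replicates(aln: dict[str, str], n_reps: int = 100, seed: int = 1) -> list[dict[str, str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement."""
    rng = random.Random(seed)
    names = list(aln)
    width = len(aln[names[0]])
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(width) for _ in range(width)]  # same width as the original
        replicates.append({n: "".join(aln[n][c] for c in cols) for n in names})
    return replicates


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (given as sets of clades) that contain `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

Each replicate keeps whole columns together, so the characters of all sequences at a given position are resampled jointly. This preserves the site-wise homology that tree inference depends on.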
Phylogenomic analysis.
A phylogeny was estimated using a set of 31 conserved single-copy phylogenetic marker proteins. Their profile HMMs for HMMER3 (http://hmmer.janelia.org) were downloaded from Pfam 26.0 (68) (PF00163, PF00203, PF00281, PF00347, PF00416, PF00828, PF03118, PF11987, PF00164, PF00237, PF00318, PF00366, PF00572, PF01000, PF03588, PF13393, PF00181, PF00238, PF00333, PF00410, PF00573, PF01193, PF03947, PF13603, PF00189, PF00252, PF00344, PF00411, PF00750, PF02403, PF10458). “Spartobacteria baltica” contigs were six-frame translated and searched with the Pfam HMM profiles, as were the complete protein sets of the reference genomes. Marker proteins were identified in the “Spartobacteria baltica” bin contigs and in 44 microbial reference genomes selected on the basis of the 2009 GEBA tree (72). In the “Spartobacteria baltica” bin, the marker proteins were found on 12 different contigs after six-frame translation. The sequences were aligned with Probcons (73) and analyzed with Zorro (74). Positions with a Zorro score of ≥6 were selected, and the individual alignments were concatenated, producing an alignment with 7,597 well-aligned sites. A maximum likelihood tree was calculated with RAxML 7.2.8 using the LG substitution matrix (75) and a gamma model of rate heterogeneity (PROTGAMMALGF).
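The construction of the concatenated alignment can be sketched in two steps. Columns of each marker alignment are kept only if their Zorro score is ≥6, and the filtered alignments are then joined taxon by taxon. The sketch assumes that per-column Zorro scores are already available as a list of numbers. It pads taxa missing from a marker with gaps, which is a common convention but an assumption here. Function names are hypothetical.

```python
def filter_by_score(aln: dict[str, str], scores: list[float], min_score: float = 6.0) -> dict[str, str]:
    """Keep only alignment columns whose Zorro score is >= min_score."""
    keep = [i for i, s in enumerate(scores) if s >= min_score]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in aln.items()}


def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-marker alignments into one supermatrix; absent taxa are padded with gaps."""
    taxa = sorted(set().union(*alignments))
    parts: dict[str, list[str]] = {t: [] for t in taxa}
    for aln in alignments:
        width = len(next(iter(aln.values())))
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(p) for t, p in parts.items()}
```

Applied to the 31 marker alignments, the width of the resulting supermatrix equals the number of retained sites (7,597 in this study).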
Nucleotide sequence accession number.
The complete metagenome (all sequence reads) of the sample has been deposited in the European Nucleotide Archive under accession number ERP002583.
Supplemental material.
FIG S1 (PDF file, 0.1 MB). MLTreeMap analysis of all contigs in the “Spartobacteria baltica” ESOM bin. MLTreeMap (M. S. Stark et al., BMC Genomics 11:461, 2010) searches for phylogenetic markers and places them in a maximum likelihood phylogeny, in this case the GEBA phylogeny (D. Wu et al., Nature 462:1056–1060, 2009).
FIG S2 (PDF file, 0.3 MB). Emergent self-organizing map of the metagenome supplemented with artificial C. flavus contigs, which were generated by splitting the Chthoniobacter flavus genome into 1,539 5-kb pieces. Pixels are colored according to the taxonomic annotation of the contig(s) occupying the pixel. Background color represents the distance in data space between neighboring pixels. Hence, the white ridges represent borders between regions of highly dissimilar tetranucleotide frequency distributions. The C. flavus contigs form a cohesive cluster next to a cluster of metagenome contigs enriched in BLAST matches to Verrucomicrobia (representing the “Spartobacteria baltica” bin of Fig. 1). Only a few non-C. flavus contigs reside in the C. flavus region, and all but one of these are binned as “Spartobacteria baltica” in Fig. 1.
FIG S3 (PDF file, 0.1 MB). A 500-bp subset (red box) of a 25-kb contig (Contig06127) viewed in Strainer. White horizontal bars represent 454 reads, and colored vertical lines represent single nucleotide polymorphisms relative to the consensus (white) sequence. Reads A to C are 91 to 94% identical to the consensus sequence. Reads A and C appear to derive from a strain different from that represented by the consensus sequence, while read B appears to represent a recombinant of the two strains. Other contig regions display more complex patterns, involving more strains and recombinants thereof, while yet other regions are purely clonal.
FIG S4 (PDF file, 0.4 MB). Phylogenetic 16S rRNA gene tree of all “LD29” 16S rRNA sequences of >300 bp available in Silva Parc release 111. A core tree was calculated from 1,012 unambiguously aligned sequence positions of the full-length sequences of the “LD29” lineage (14 sequences) (Fig. 3). These included “Spartobacteria baltica” and Chthoniobacter flavus as references and “Xiphinematobacteraceae” as the root (GenBank accession numbers AF217460, AF217461, and AF217462). The core tree was calculated by maximum likelihood analysis (RAxML) with rapid bootstrapping (1,000 replicates). A total of 150 short sequences (>300 bp), affiliated with the “LD29” cluster and positionally filtered by base frequency (50%), were added with the ARB parsimony tool without changing the global tree topology.
FIG S5 (PDF file, 0.2 MB). Protein sequence alignment (MUSCLE) of the catalytic region of selected GH5_46 members. The aligned sequences are VbGH5_46A (“Spartobacteria baltica” gene ID 2119805716 in Table S3), a sequence from Verrucomicrobia SAG AAA168-F10, and “Cow rumen GH5” (GenBank accession number ADX05696). “Cow rumen GH5” is currently the only characterized GH5_46 enzyme. The two catalytic glutamate residues are marked with asterisks.
FIG S6 (PDF file, 0.1 MB). Seasonal dynamics of “Spartobacteria baltica” based on data from the Landsort study (A. F. Andersson et al., ISME J 4:171–181, 2009). The representative sequence of one of the most abundant OTUs of this study matches the V6 region of the “Spartobacteria baltica” 16S rRNA gene perfectly. The seasonal dynamics of this OTU are shown as its proportion of the total number of reads per sample. In the study, an average of 20,200 pyrosequencing reads of the V6 region of the 16S rRNA gene were obtained from each of eight surface water samples collected from May to October 2003 and in May 2004.
FIG S7 (PDF file, 0.2 MB). Spatial correlation between the “Spartobacteria baltica” OTU and a picocyanobacterial OTU based on data from the Baltic Sea transect study (D. P. Herlemann et al., ISME J 5:1571–1579, 2011). For each sample, the relative abundance of “Spartobacteria baltica” is shown on the x axis and the relative abundance of the picocyanobacterium on the y axis (in log scale). Samples are colored according to salinity and sized according to depth. The Spearman rank order correlation ρ is 0.80, and the Pearson correlation r is 0.57 (P < 10⁻¹⁶ for both). The sequence of the picocyanobacterial OTU is:
AATCCCTTTCGCTCCCCTAGCTTTCGTCCATGAGCGTCAGTTATGGCCCAGCAGAGCGCCTTCGCCACTGGTGTTCTTCCCGATATCTACGCATTTCACCGCTACACCGGGAATTCCCTCTGCCCCTACCACACTCTAGTCTTACAGTTTCCATCGCCGAAATGGAGTTGAGCTCCACGTTTTAACGACAGACTTGTAAAACCGCCTGCGGACGCTTTACGCCCAATAATTCCGGATAACGCTTGCCACTCCCGTATTACCGCGGCTGCTGGCACGGAATTAGCCGTGGCTTATTCATCAAGTACCGTCAGATCTTCTTCCTTGATAAAAGAGGTTTACAGCCCAGAGGCCTTCATCCCTCACGCGGCGTTGCTCCGTC
TABLE S1 (PDF file, 0.1 MB). tRNA genes in the “Spartobacteria baltica” bin.
TABLE S2 (PDF file, 0.1 MB). Housekeeping genes used for evaluation and their copy numbers in the “Spartobacteria baltica” bin.
TABLE S3 (PDF file, 0.5 MB). Annotated genes of the “Spartobacteria baltica” bin.