Rapidly thawing permafrost harbors ∼30 to 50% of global soil carbon, and the fate of this carbon remains unknown. Microorganisms will play a central role in its fate, and their viruses could modulate that impact via induced mortality and metabolic controls. Because of the challenges of recovering viruses from soils, little is known about soil viruses or their role(s) in microbial biogeochemical cycling. Here, we describe 53 viral populations (viral operational taxonomic units [vOTUs]) recovered from seven quantitatively derived (i.e., not multiple-displacement-amplified) viral-particle metagenomes (viromes) along a permafrost thaw gradient at the Stordalen Mire field site in northern Sweden. Only 15% of these vOTUs had genetic similarity to publicly available viruses in the RefSeq database, and ∼30% of the genes could be annotated, supporting the concept of soils as reservoirs of substantial undescribed viral genetic diversity. The vOTUs exhibited distinct ecology, with different distributions along the thaw gradient habitats, and a shift from soil-virus-like assemblages in the dry palsas to aquatic-virus-like assemblages in the inundated fen. Seventeen vOTUs were linked to microbial hosts (in silico), implicating viruses in infecting abundant microbial lineages from Acidobacteria, Verrucomicrobia, and Deltaproteobacteria, including those encoding key biogeochemical functions such as organic matter degradation. Thirty auxiliary metabolic genes (AMGs) were identified and suggested virus-mediated modulation of central carbon metabolism, soil organic matter degradation, polysaccharide binding, and regulation of sporulation. Together, these findings suggest that these soil viruses have distinct ecology, impact host-mediated biogeochemistry, and likely impact ecosystem function in the rapidly changing Arctic. IMPORTANCE This work is part of a 10-year project to examine thawing permafrost peatlands and is the first virome-particle-based approach to characterize viruses in these systems. This method yielded >2-fold-more viral populations (vOTUs) per gigabase of metagenome than vOTUs derived from bulk-soil metagenomes from the same site (J. B. Emerson, S. Roux, J. R. Brum, B. Bolduc, et al., Nat Microbiol 3:870-880, 2018, https://doi.org/10.1038/s41564-018-0190-y). We compared the ecology of the recovered vOTUs along a permafrost thaw gradient and found (i) habitat specificity, (ii) a shift in viral community identity from soil-like to aquatic-like viruses, (iii) infection of dominant microbial hosts, and (iv) carriage of host metabolic genes. These vOTUs can impact ecosystem carbon processing via top-down (inferred from lysing dominant microbial hosts) and bottom-up (inferred from carriage of auxiliary metabolic genes) controls. This work serves as a foundation which future studies can build upon to increase our understanding of the soil virosphere and how viruses affect soil ecosystem services.
Rapidly thawing permafrost harbors ∼30 to 50% of global soil carbon, and the fate of this carbon remains unknown. Microorganisms will play a central role in its fate, and their viruses could modulate that impact via induced mortality and metabolic controls. Because of the challenges of recovering viruses from soils, little is known about soil viruses or their role(s) in microbial biogeochemical cycling. Here, we describe 53 viral populations (viral operational taxonomic units [vOTUs]) recovered from seven quantitatively derived (i.e., not multiple-displacement-amplified) viral-particle metagenomes (viromes) along a permafrost thaw gradient at the Stordalen Mire field site in northern Sweden. Only 15% of these vOTUs had genetic similarity to publicly available viruses in the RefSeq database, and ∼30% of the genes could be annotated, supporting the concept of soils as reservoirs of substantial undescribed viral genetic diversity. The vOTUs exhibited distinct ecology, with different distributions along the thaw gradient habitats, and a shift from soil-virus-like assemblages in the dry palsas to aquatic-virus-like assemblages in the inundated fen. Seventeen vOTUs were linked to microbial hosts (in silico), implicating viruses in infecting abundant microbial lineages from Acidobacteria, Verrucomicrobia, and Deltaproteobacteria, including those encoding key biogeochemical functions such as organic matter degradation. Thirty auxiliary metabolic genes (AMGs) were identified and suggested virus-mediated modulation of central carbon metabolism, soil organic matter degradation, polysaccharide binding, and regulation of sporulation. Together, these findings suggest that these soil viruses have distinct ecology, impact host-mediated biogeochemistry, and likely impact ecosystem function in the rapidly changing Arctic. IMPORTANCE This work is part of a 10-year project to examine thawing permafrost peatlands and is the first virome-particle-based approach to characterize viruses in these systems. This method yielded >2-fold-more viral populations (vOTUs) per gigabase of metagenome than vOTUs derived from bulk-soil metagenomes from the same site (J. B. Emerson, S. Roux, J. R. Brum, B. Bolduc, et al., Nat Microbiol 3:870-880, 2018, https://doi.org/10.1038/s41564-018-0190-y). We compared the ecology of the recovered vOTUs along a permafrost thaw gradient and found (i) habitat specificity, (ii) a shift in viral community identity from soil-like to aquatic-like viruses, (iii) infection of dominant microbial hosts, and (iv) carriage of host metabolic genes. These vOTUs can impact ecosystem carbon processing via top-down (inferred from lysing dominant microbial hosts) and bottom-up (inferred from carriage of auxiliary metabolic genes) controls. This work serves as a foundation which future studies can build upon to increase our understanding of the soil virosphere and how viruses affect soil ecosystem services.
Anthropogenic climate change is elevating global temperatures, most rapidly at the poles (1). High-latitude perennially frozen ground, i.e., permafrost, stores 30 to 50% of global soil carbon (C; ∼1,300 Pg) (2, 3) and is thawing at a rate of ≥1 cm of depth yr−1 (4, 5). Climate feedbacks from permafrost habitats are poorly constrained in global climate change models (1, 6), due to the uncertainty of the magnitude and nature of carbon dioxide (CO2) or methane (CH4) release (7). A model ecosystem for studying the impacts of thaw in a high-C peatland setting is Stordalen Mire, in Arctic Sweden, which is at the southern edge of current permafrost extent (8). The Mire contains a mosaic of thaw stages (9), from intact permafrost palsas, to partially thawed moss-dominated bogs, to fully thawed sedge-dominated fens (10–13). Thaw shifts hydrology (14), altering plant communities (13) and shifting belowground organic matter (OM) toward more labile forms (11, 13), with concomitant shifts in microbiota (15–17) and C gas release (8, 10, 18–20). Of particular note is the thaw-associated increase in emissions of CH4, due to its ~25-times-greater climate forcing potential than CO2 (per kg, at a 100-year time scale [20, 157]), and the associated shifts in key methanogens. These include novel methanogenic lineages (15) with high predictive value for the character of the emitted CH4 (12). More finely resolving the drivers of C cycling, including microbiota, in these dynamically changing habitats can increase model accuracy (21) to allow a better prediction of greenhouse gas emissions in the future.Given the central role of microbes to C processing in these systems, it is likely that viruses infecting these microbes impact C cycling, as has been robustly observed in marine systems (22–27). Marine viruses lyse approximately one-third of ocean microorganisms per day, liberating C and nutrients at the global scale (22–24, 28), and viruses have been identified as one of the top predictors of C flux to the deep ocean (29). Viruses can also impact C cycling by metabolically reprogramming their hosts, via the expression of virus-carried auxiliary metabolic genes (AMGs) (28, 30). AMG classification is still in its infancy, with clear definitions still being established (31), but generally these genes are not involved in viral replication and instead allow viruses to directly manipulate host metabolism during infection. This metabolic manipulation potentially affects biogeochemistry, including marine C processing (31–35). In contrast, very little is known about soil virus roles in C processing, or indeed about soil viruses generally. Soils’ heterogeneity in texture, mineral composition, and OM content results in significant inconsistency of yields from standard virus capture methods (36–39). While many soils contain large numbers of viral particles (107 to 109 virus particles per gram of soil [37, 40–42]), knowledge of soil viral ecology has come mainly from the fraction that desorbs easily from soils (<10% in reference 43) and the much smaller subset that has been isolated (44).One approach to studying soil viruses has been to bypass the separation of viral particles, by identifying viruses from bulk-soil metagenomes; these are commonly referred to as microbial metagenomes but contain sequences of diverse origin, including proviruses and infecting viruses. Using this approach, several recent studies have powerfully expanded our knowledge of soil viruses and have highlighted the magnitude of genetic novelty that these entities may represent. An analysis of 3,042 publicly available assembled metagenomes spanning 10 ecotypes (19% from soils) increased by 16-fold the total number of known viral genes, doubled the number of microbial phyla with evidence of viral infection, and revealed that the vast majority of viruses appeared to be habitat specific (45). This approach was also applied to 178 metagenomes from the thawing permafrost gradient of Stordalen Mire (46), where viral linkages to potential hosts were appreciably advanced by the parallel recovery of 1,529 microbial metagenome-assembled genomes (MAGs) (17). This effort recovered ∼2,000 thaw-gradient viruses, more than doubling the known viral genera in RefSeq; identified linkages to abundant microbial hosts encoding important C-processing metabolisms such as methanogenesis; and demonstrated that CH4 dynamics was best predicted by viruses of methanogens and methanotrophs (46). Viral analyses of bulk-soil metagenomes have, thus, powerfully expanded knowledge of soil viruses and highlighted the large amount of genetic novelty that they represent. However, this approach is by nature inefficient at capturing viral signal, with typically <2% of reads identified as viral (46, 47). The small amount of viral DNA present in bulk-soil extracts can lead to poor or no assembly of viral sequences in the resulting metagenomes and omission from downstream analyses (discussed further in references 37, 39, 48, and 49). In addition, viruses that are captured in bulk-soil metagenomes likely represent a subset of the viral community, since >90% of free viruses adsorb to soil (43), and so, depending on the specific soil, communities, and extraction conditions, bulk-soil metagenomes are likely to be depleted for some free viruses and enriched for actively reproducing and temperate viruses.Examination of free viruses, while potentially a more efficient and comprehensive approach to soil viral ecology, requires optimized methods to resuspend them (50). Researchers have pursued optimized viral resuspension methods for specific soil types and metagenomically sequenced the recovered viral particles, generating viromes (reviewed in references 40, 42, and 51). In marine systems, viral ecology has relied heavily on viromes, since the leading viral particle capture method is broadly applicable, highly efficient, and relatively inexpensive (52), with now relatively well established downstream pipelines for quantitative sample-to-sequence (53) and sequence-to-ecological-inference (54–56) processing, collectively resulting in great advances in marine viromics (57). Due to the requirement of habitat-specific resuspension optimization, soil viromics is in its early stages. In addition, because particle yields are typically low, most soil virome studies have amplified extracted viral DNA using multiple displacement amplification, which renders the data sets both stochastically and systematically biased and nonquantitative (54, 58–63). The few polar soil viromes have been from Antarctic soils and further demonstrated the genetic novelty of this gene pool while suggesting that resident viral communities were dominated by tailed viruses, had high habitat specificity, and were structured by pH (51, 64, 65).Having previously optimized viral resuspension methods for the active layer of the permafrost thaw gradient in the Stordalen Mire (41), here we sequenced and analyzed a portion of the viruses recovered from that optimization effort, with no amplification beyond that minor, quantitative form inherent to sequencing library preparation. The seven resulting viromes yielded 378 genuine viral contigs, 53 of which could be classified as viral populations (virus operational taxonomic units [vOTUs]; approximately representing species-level taxonomy) (66–68). The goal of this effort was to efficiently target viral particle genomes via viromes from Stordalen Mire, investigate their ecology and potential impacts on C processing using a variety of approaches, and compare the findings to that of viral analyses of bulk-soil metagenomes from the work of Emerson et al. (46).
RESULTS AND DISCUSSION
Viruses in complex soils.
Using recently developed bioinformatics tools to characterize viruses from three different habitats along a permafrost thaw gradient, viral particles were purified from active-layer soil samples (i.e., samples from the upper, unfrozen portion of the soil column) via a previously optimized method tailored for these soils (41) (Fig. 1). DNA from viral particles was extracted and sequenced, to produce seven Stordalen Mire viromes (Table 1), spanning palsas (underlain by intact permafrost), bogs (here underlain by partially thawed permafrost), and fens (where permafrost has thawed entirely). The viromes ranged in size from 2 to 26 million reads, with an average of 19% (range, 5 to 32%) of the reads assembling into 28,025 putative soil viral contigs that clustered into 27,675 unique viral contigs (clustered at 95% average nucleotide identity [ANI] across 80% of the contig length) (2 clustered from 3 to 10 kb and 348 clustered at <3 kb; see Fig. S1 in the supplemental material). These contigs are putative soil viruses because they passed through an 0.45-μm filter and remained in the viral fractions for the CsCl densities. Among these, VirSorter predicted that 393 contigs were possible viruses (VirSorter categories 1, 2, and 3 [per reference 69]; see Materials and Methods and Table S1). After manual inspection, three putative plasmids were identified and removed (i.e., contigs 5, 394, and 3167 [Table S1]), along with two putatively archaeal viruses (vOTUs 165 and 225; later determined to be bacterial genome fragments; see Text S1 in the supplemental material). Finally, 10 additional contigs that did not meet our threshold for read recruitment of 90% ANI across 75% of contig covered were removed, resulting in 378 probable virus sequences (Table S1). Of these, 53 bacteriophages (phage) were considered well-sampled viral populations (55, 56, 66–68), also known as viral operational taxonomic units (vOTUs), as they had contig lengths of ≥10 kb (average, 19.6 kb; range, 10.3 to 129.6 kb), were most robustly viral (VirSorter category 1 or 2 [69]), were clustered at 95%, and were relatively well-covered contigs (average 74× coverage [Table 1]). These 53 vOTUs accounted for 10% of the assembled reads and 54% of the reads that recruited to contigs and are the basis for the analyses in this paper due to their genome sizes, which allowed for more reliable taxonomic, functional, and host assignments and fragment recruitment.
FIG 1
Overview of sample-to-ecology method pipeline. Sampling of the permafrost thaw chronosequence at Stordalen Mire (68°21 N, 19°03 E, 359 m above sea level). Three cores were collected in July 2014; coring locations are indicated on this aerial image of the site (taken in June 2011 by Scott Saleska), and representative photos (taken by Gary Trubl) of cores are shown below. Viruses were resuspended as previously described in the work of Trubl et al. (41), bulk-soil-derived viromes were described in the work of Emerson et al. (46), and metagenome-assembled genomes (MAGs) were described in the work of Woodcroft et al. (17). QC, quality control.
TABLE 1
Soil virome read information
Sample
BioSampleaccession no.
DNA quantity(nM)
No. of reads
Avg coverageof 53 vOTUs
Total
Total assembled
All putative viral
53 vOTU
Palsa chilledreplicate A
SAMN08784142
8.09
10,216,080
540,430
45,972
26,562
15×
Palsa frozenreplicate A
SAMN08784143
19.41
26,000,204
1,279,210
70,200
59,800
13×
Bog frozenreplicate A
SAMN08784152
9.09
14,499,010
2,611,272
102,944
44,948
14×
Bog frozenreplicate B
SAMN08784154
0.8
4,446,734
466,018
53,806
27,126
50×
Bog chilledreplicate B
SAMN08784153
2.95
15,578,086
4,210,756
1,190,166
532,770
52×
Fen chilledreplicate A
SAMN08784163
0.69
2,108,484
665,226
242,898
182,174
194×
Fen chilledreplicate B
SAMN08784165
0.37
2,001,976
649,040
231,428
168,566
160×
The seven viromes are provided along with their DNA quantity, total number of reads, total number of assembled reads, the number of reads that mapped to soil viral contigs, the number of reads that mapped to the 53 vOTUs, and the average adjusted coverage. Adjusted coverage was calculated by mapping reads back to this nonredundant set of contigs to estimate their relative abundance, calculated as number of base pairs mapped to each read normalized by the length of the contig and the total number of base pairs sequenced in the metagenome. For a read to be mapped, it had to have ≥90% average nucleotide identity between the read and the contig, and then for a contig to be considered detected, reads had to cover ≥75% of the contig.
Overview of sample-to-ecology method pipeline. Sampling of the permafrost thaw chronosequence at Stordalen Mire (68°21 N, 19°03 E, 359 m above sea level). Three cores were collected in July 2014; coring locations are indicated on this aerial image of the site (taken in June 2011 by Scott Saleska), and representative photos (taken by Gary Trubl) of cores are shown below. Viruses were resuspended as previously described in the work of Trubl et al. (41), bulk-soil-derived viromes were described in the work of Emerson et al. (46), and metagenome-assembled genomes (MAGs) were described in the work of Woodcroft et al. (17). QC, quality control.Soil virome read informationThe seven viromes are provided along with their DNA quantity, total number of reads, total number of assembled reads, the number of reads that mapped to soil viral contigs, the number of reads that mapped to the 53 vOTUs, and the average adjusted coverage. Adjusted coverage was calculated by mapping reads back to this nonredundant set of contigs to estimate their relative abundance, calculated as number of base pairs mapped to each read normalized by the length of the contig and the total number of base pairs sequenced in the metagenome. For a read to be mapped, it had to have ≥90% average nucleotide identity between the read and the contig, and then for a contig to be considered detected, reads had to cover ≥75% of the contig.Overview of predicting viruses from contigs. Unique contigs are plotted by their length in kilobases (A) and log10 transformation (B). Contigs predicted to be viral based on VirSorter (69) are plotted by their length in kilobases (C) and log10 transformation (D). Dashed lines are provided to show size ranges. Download FIG S1, TIF file, 0.3 MB.Soil viruses’ bioinformatics information. All 393 possible soil viruses are listed. For the vOTUs, the virome(s) from which they originated, their genomic information, and their coverage are provided. For the other putative soil viral contigs, the origin virome(s) and contig length are provided. Additionally, the three mobile genetic elements and 10 viral contigs with no coverage are reported with their virome(s) of origin (if applicable) and contig length. A dagger denotes that the contig did not meet our threshold for read mapping and therefore could not be counted as detected. Download Table S1, XLSX file, 0.0 MB.Supplemental materials and methods. Download Text S1, PDF file, 0.3 MB.There is no universal marker gene (analogous to the 16S rRNA gene in microbes) to provide taxonomic information for viruses. Therefore, we applied a gene-sharing network where nodes were genomes and edges between nodes indicated the gene content similarities and accommodating fragmented genomes of various sizes (70–75). In such networks, viruses sharing a high number of genes localize into viral clusters (VCs) which represent approximately genus-level taxonomy (74, 75). We represented relationships across the 53 vOTUs with 2,010 known bacterial and archaeal viruses (RefSeq, version 75) as a weighted network (Fig. 2A). Only 15% of the Mire vOTUs had similarity to RefSeq viruses (Fig. 2A and B). Four vOTUs fell into 4 VCs comprised of viruses belonging to the Felixounavirinae and Vequintavirinae (VC10); Tevenvirinae and Eucampyvirinae (VC4); Ap22virus (VC9); and the Bcep22virus, F116virus, and Kpp25virus (VC3) (Fig. 2A). Corroborating its taxonomic assignment by clustering, vOTU_4 contained two marker genes (i.e., major capsid protein and baseplate protein) specific for the Felixounavirinae and Vequintavirinae viruses (76), phylogenetic analysis of which indicated a close relationship of vOTU_4 to the Cr3virus within the Vequintavirinae (Fig. S2). The other five populations that clustered with RefSeq viruses were each found in different clusters with taxonomically unclassified viruses (Fig. 2A). Viruses derived from the dry palsa clustered with soil-derived RefSeq viruses, while those from the bog clustered with a mixture of soil and aquatic RefSeq viruses and those from the fen clustered mainly with aquatic viruses (Fig. 2A and C). Though of limited power due to small numbers, this suggests some conservation of habitat preference within genotypic clusters, which has also been observed in marine viruses with only ∼4% of VCs being globally ubiquitous (73). Most (∼85%) of the Mire vOTUs were unlinked to RefSeq viruses, with 41 vOTUs having no close relatives (i.e., singletons) and the remaining 4 vOTUs clustering in doubletons. This separation between a large fraction of the Mire vOTUs and known viruses is due to a limited number of common genes between them, i.e., ∼70% of the total proteins in these viromes are unique (Fig. 2B), reflecting the relative novelty of these viruses and the undersampling of soil viruses (39).
FIG 2
Relating Stordalen Mire viruses to known viral sequence space. (A) Clustering of recovered vOTUs with all RefSeq (v 75) viral genomes or genome fragments with genetic connectivity to these data. Shapes indicate major viral families, and RefSeq sequences only indirectly linked to these data are in gray. The contig numbers are shown within circles. Each node is depicted as a different shape, representing viruses belonging to Myoviridae (rectangle), Podoviridae (diamond), Siphoviridae (hexagon), or uncharacterized viruses (triangle) and viral contigs (circle). Edges (lines) between nodes indicate statistically weighted pairwise similarity scores (see Materials and Methods) of ≥1. Color denotes habitat of origin, with “other” encompassing wastewater, sewage, feces, and plant material. Contig-encompassing viral clusters are encircled by a solid line. (B) The pie chart represents the number of the Stordalen Mire (SM) viral proteins that are recovered by protein clusters (PCs) (yellow and red) and singletons (gray). (C) Percentage of vOTUs that link to those from palsa, bog, and fen as well as RefSeq viruses.
Relating Stordalen Mire viruses to known viral sequence space. (A) Clustering of recovered vOTUs with all RefSeq (v 75) viral genomes or genome fragments with genetic connectivity to these data. Shapes indicate major viral families, and RefSeq sequences only indirectly linked to these data are in gray. The contig numbers are shown within circles. Each node is depicted as a different shape, representing viruses belonging to Myoviridae (rectangle), Podoviridae (diamond), Siphoviridae (hexagon), or uncharacterized viruses (triangle) and viral contigs (circle). Edges (lines) between nodes indicate statistically weighted pairwise similarity scores (see Materials and Methods) of ≥1. Color denotes habitat of origin, with “other” encompassing wastewater, sewage, feces, and plant material. Contig-encompassing viral clusters are encircled by a solid line. (B) The pie chart represents the number of the Stordalen Mire (SM) viral proteins that are recovered by protein clusters (PCs) (yellow and red) and singletons (gray). (C) Percentage of vOTUs that link to those from palsa, bog, and fen as well as RefSeq viruses.Phylogenetic analysis of vOTU_4. Phylogenetic relationships between vOTU_4 and its related viruses. A maximum-likelihood tree was constructed upon a concatenation of two structural proteins (major capsid protein and baseplate protein) that are common to the Felixounavirinae and Vequintavirinae viruses. The numbers at the branch represent the bootstrapping probabilities from 1,000 replicates. Edges with bootstrap values above 75% are represented. The scale bar indicates the number of substitutions per site. Download FIG S2, TIF file, 0.6 MB.Annotation of the 53 vOTUs resulted in only ∼30% of the genes being annotated, which is not atypical; >60% of genes contained in uncultivated viruses have typically been classified as unknown in other studies (46, 69, 77–81). Of genes with annotations, we first considered those involved in lysogeny, to provide insight into the viruses’ replication cycle. Only three viruses carried an integrase gene (other characteristic lysogeny genes were not detected) (82, 83) (Table S2), suggesting that they could be temperate viruses, two of which were from the bog habitat. It had been proposed that since soils are structured and considered harsh environments, a majority of soil viruses would be temperate viruses (84). Although our data set is small, a dominance of temperate viruses is not observed here. We hypothesize that the low encounter rate produced by the highly structured soil environment could, rather than selecting for temperate phage, select for efficient virulent viruses (concept derived from references 85 to 87). Recent analyses of the viral signal mined from bulk-soil metagenomes from this site provide more evidence for our hypothesis of efficient virulent viruses because >50% of the identified viruses were likely not temperate (based on the fact that they were not detected as prophage [46]). As a more comprehensive portrait of soil viruses grows, spanning various habitats, this hypothesis can be further tested. Beyond integrase genes, the remaining annotated genes spanned known viral genes and host-like genes. Viral genes included those involved in structure and replication, and their taxonomic affiliations were unknown or highly variable, supporting the quite limited affiliation of these vOTUs with known viruses. Host-like genes included AMGs, which are described in greater detail in the next section.Virally encoded auxiliary metabolic genes and other genes of interest. Genes were annotated, and AMGs were identified by running assembled contigs through a pipeline developed by the Wrighton lab at The Ohio State University previously described in the work of Daly et al. (23). The habitat from which the vOTU was derived is listed. Predicted genes that are AMGs are boldface, and unannotated genes are not present. Additionally, the PhoH-like protein is boldface due to its highly debated function as a phosphate starvation gene (reviewed in reference 24). Download Table S2, XLSX file, 0.0 MB.
Host-linked viruses are predicted to infect key C cycling microbes.
In order to examine these viruses’ impacts on the Mire’s resident microbial communities and processes, we sought to link them to their hosts via emerging standard in silico host prediction methods, significantly empowered by the recent recovery of 1,529 MAGs from the site (508 from palsa, 588 from bog, and 433 from fen [17]). Tentative bacterial hosts were identified for 17 of the 53 vOTUs (Fig. 3; Table S3): these hosts spanned four genera among three phyla (Verrucomicrobia: Pedosphaera, Acidobacteria: Acidobacterium and “Candidatus Solibacter,” and Deltaproteobacteria: Smithella). Eight viruses were linked to more than one host but always within the same species. The four predicted microbial hosts are among the most abundant in the microbial communities and have notable roles in C cycling (16, 17). Three are acidophilic, obligately aerobic chemoorganoheterotrophs and include the Mire’s dominant polysaccharide-degrading lineage (Acidobacteria), and the fourth is an obligate anaerobe shown to be syntrophic with methanogens (Smithella). Acidobacterium is a highly abundant, diverse, and ubiquitous soil microbe (88–90) and a member of the most abundant phylum in Stordalen Mire. The relative abundance of this phylum peaked in the bog at 29% but still had a considerably high relative abundance in the other two habitats (5% in palsa and 3% in fen) (17). It is a versatile carbohydrate utilizer, has recently been identified as the primary degrader of large polysaccharides in the palsa and bog habitats in the Mire, and is also an acetogen (17). Seven vOTUs were inferred to infect Acidobacterium, implicating these viruses in indirectly modulating a key stage of soil organic matter decomposition. The second identified acidobacterial host was in the newly proposed species “Candidatus Solibacter usitatus,” another carbohydrate degrader (91). The third predicted host was Pedosphaera parvula, within the phylum Verrucomicrobia, which is ubiquitous in soil, is abundant across our soils (∼3% in palsa and ∼7% in bog and fen habitats, based on metagenomic relative abundance [17]), and utilizes cellulose and sugars (92–96), and in this habitat, this organism could be acetogenic (17). Last, vOTU_28 was linked to the deltaproteobacterium Smithella sp. strain SDB, another acidophilic chemoorganoheterotroph but an obligate anaerobe, with a known syntrophic relationship with methanogens (97, 98). Collectively, these virus-host linkages provide evidence for the Mire’s viruses to be impacting the C cycle via population control of relevant C-cycling hosts, consistent with previous results in this system (46) and other wetlands (99).
FIG 3
Viral-host linkages between vOTUs and MAGs. Seventeen vOTUs were linked to 4 host lineages by multiple lines of evidence, with 15 linked by CRISPRs (solid line) and 2 linked by BLAST (dashed line). Node shape denotes organism (oval for microbe and triangle for virus), and number references vOTU or bin (see Table S2). Viral nodes are color coded by habitat of origin (green for bog and blue for fen).
Viral-host linkages between vOTUs and MAGs. Seventeen vOTUs were linked to 4 host lineages by multiple lines of evidence, with 15 linked by CRISPRs (solid line) and 2 linked by BLAST (dashed line). Node shape denotes organism (oval for microbe and triangle for virus), and number references vOTU or bin (see Table S2). Viral nodes are color coded by habitat of origin (green for bog and blue for fen).Viral-host linkage supporting information. NCBI BLAST linkages were determined based on queries, and CRISPR information was provided using Crass software. Host genome IDs were assigned from the Joint Genome Institute’s Integrated Microbial Genomes Database. Microbial bins were pulled from the work of Woodcroft et al. (1). Download Table S3, XLSX file, 0.0 MB.We next sought to examine viral AMGs for connections to C cycling. To more robustly identify AMGs than by the standard protein family-based search approach, we used a custom-built in-house pipeline previously described in the work of Daly et al. (100) and further tailored to identify putative AMGs based on the metabolisms described in the 1,529 MAGs recently reported from these same soils (17). From this, we identified 30 AMGs from 13 vOTUs (Fig. 4; Tables S2 and S4), encompassing C acquisition and processing (three involved in polysaccharide binding, one involved in polysaccharide degradation, and 23 involved in central C metabolism) and sporulation. Glycoside hydrolases that help break down complex OM are abundant in resident microbiota (17) and may be especially useful in this high-OM environment; notably, they have been found in soil (at our site [46]), rumen (101, 158) and marine systems (albeit scarcely [73]). In addition, central C metabolism genes in viruses may increase nucleotide and energy production during infection and have been increasingly observed as AMGs (25–35). Finally, two different AMGs were found in regulating endospore formation, spoVS and whiB, which aid in formation of the septum and coat assembly, respectively, improving spores’ heat resistance (102, 103). A WhiB-like protein has been previously identified in mycobacteriophage TM4 (WhiBTM4) and experimentally shown to not only transcriptionally regulate host septation but also cause superinfection exclusion (i.e., exclusion of secondary viral infections [104]). While these two sporulation genes have been found only in Firmicutes and Actinobacteria, the only vOTU to have whiB was linked to an acidobacterial host (vOTU_178 [Fig. 4]). A phylogenic analysis of the whiB AMG grouped it with actinobacterial versions and, more distantly, with another mycobacteriophage (Fig. 4), suggesting either (i) misidentification of host (unlikely, as it was linked to three different acidobacterial hosts, each with zero mismatches of the CRISPR spacer), (ii) that the virus could infect hosts spanning the two phyla (unlikely, as only ∼1% of identified virus-host relationships span phyla [45]), or (iii) that the gene was horizontally transferred into the Acidobacteria. Identification of these 30 diverse AMGs (carried by 25% of the vOTUs) suggests a viral modulation of host metabolisms across these dynamic environments and supports the findings from bulk metagenome-derived viruses of Emerson et al. (46) at this site. That study’s AMGs spanned the same categories as those reported here, except for whiB, which was not found, but the study did not discuss them, other than the glycoside hydrolases, one of which was experimentally validated.
FIG 4
Characterization of select AMGs. FastTree phylogenies were constructed for select AMGs (one from each group), and their structures and those of their nearest neighbors were predicted using I-TASSER (detailed in Table S3). Tree lineages are shaded blue for bacteria and red for viruses. “vOTU” sequences are from the 53-vOTU virome-derived data set, while “SoilVir contig” represents homologs from the 378 probable viral contigs. The predicted protein structure for each AMG is labeled (A, B, C, or D) to match the location in the corresponding phylogenetic tree. The first predicted model for each soil virus is shown and was used for the TM-align comparison. The scale bar indicates the number of substitutions per site.
Characterization of select AMGs. FastTree phylogenies were constructed for select AMGs (one from each group), and their structures and those of their nearest neighbors were predicted using I-TASSER (detailed in Table S3). Tree lineages are shaded blue for bacteria and red for viruses. “vOTU” sequences are from the 53-vOTU virome-derived data set, while “SoilVir contig” represents homologs from the 378 probable viral contigs. The predicted protein structure for each AMG is labeled (A, B, C, or D) to match the location in the corresponding phylogenetic tree. The first predicted model for each soil virus is shown and was used for the TM-align comparison. The scale bar indicates the number of substitutions per site.Structural comparison between select AMGs and phylogenetic neighbors. Predicted structures for AMGs and neighbors were determined, and a comparison of the first models of their predicted structures was performed using TM-align. Structural similarity between two proteins is rated on a scale of 0.0 to 1.0, with TM-scores of <0.30 suggesting random structural similarity, scores of 0.5 suggesting similar folds, and scores near 1 suggesting a perfect match between two structures. Download Table S4, XLSX file, 0.0 MB.Thus far, the limited studies of soil viruses have identified few AMGs relative to studies of marine environments. This may be due to undersampling or to difficulties in identifying AMGs; since AMGs are homologs of host genes, they can be mistaken for microbial contamination (105) and thus are more difficult to discern in bulk-soil metagenomes (whereas marine virology has been dominated by viromes); also, microbial gene function is more poorly understood in soils (106). Alternately, soil viruses could indeed carry fewer AMGs. One could speculate on a link between host lifestyle and the usefulness of carrying AMGs; most known AMGs are for photo- and chemoautotrophs (73, 107, 108), although this may be due to more studies of these metabolisms or phage-host systems. Thus far, soils are described as dominated by heterotrophic bacteria (109–113), and if AMGs were indeed less useful for viruses infecting heterotrophs, that could explain their limited detection in soil viruses. However, a deeper and broader survey of soil viruses will be required to explore this hypothesis.
Evaluating sample storage on vOTU recovery.
While our previous research demonstrated that differing storage conditions (frozen versus chilled) of these Arctic soils did not yield different viral abundances (by direct counts [41]), the impact of storage method on viral community structure was unknown. Here, we examined that in the palsa and bog habitats for which viromes were successfully recovered from both storage conditions. Storage impacted recovered community structure only in the bog habitat, with separation among the viromes’ reads (Fig. 5A and B) and a broader recovery of vOTUs from the chilled sample (Fig. 5C and D), leading to higher diversity metrics (Fig. S3) and appreciable separation of the recovered chilled-versus-frozen bog vOTU profiles in ordination (Fig. 5E). The greater recovery of vOTUs from the chilled sample was likely partly due to higher DNA input and sequencing depth, which was 107-fold more than bog frozen replicate A (BFA) and 350-fold more than bog frozen replicate B (BFB). This led to 1.6- to 9-fold more reads assembling into contigs (compared to viromes BFA and BFB, respectively [Table 1]) and 3.5- to 9-fold more distinct contigs; while one might expect that as the number of reads increased, a portion would assemble into already-established contigs, that was not observed. This higher proportional diversity in the chilled bog virome than in the two frozen ones could have several potential causes. Freezing might have decreased viral diversity by damaging viral particles, although these viruses regularly undergo freezing (albeit not with the rapidity of liquid nitrogen). Alternatively, there could be a persistent metabolically active microbial community under the chilled conditions with ongoing viral infections, distinct from those in the field community. Finally, there could have been bog-specific induction of temperate viruses under chilled conditions (since this difference was not seen in the palsa samples). The bog habit is very acidic (pH ∼4 versus ∼6 in palsa and fen [11, 46]), with a dynamic water table, and both of these have been hypothesized or demonstrated to increase selection for temperate viruses (80, 114–118). In addition, galacturonic acid, a compound produced by the Sphagnum moss that dominates the bog habitats and is present at our site, is a known inhibitor of microbial activity and inferred to shape microbial communities (reviewed in references 119 and 120) and is an important bog chemical at this site (11, 121). In addition, of the 19 vOTUs shared between this study and the bulk-soil metagenome study of Emerson et al. (46) (in which the soil was likely to be enriched for temperate viruses based on its majority sampling of microbial DNA), 13 were unique to the bog, and of those, 10 were present in only the chilled rather than frozen viromes and the remaining 3 were enriched in the chilled viromes.
FIG 5
Viral community structure across the thaw gradient. (A and B) Social network analyses of the total reads from the seven viromes (A) and all the reads mapped to the 53 vOTUs (B). Dots in the social networks represent statistical samples taken from the marginal posterior distributions (Bayesian method), and each habitat is denoted by a black circle. (C) Euler diagram relating the seven viromes and their 53 vOTUs. (D) The relative abundance of vOTUs (columns) in the seven viromes (rows). Reads were mapped to this nonredundant set of contigs to estimate their relative abundance. “Chilled” and “Frozen” indicate sample storage at 4°C or flash-freezing in liquid nitrogen and storage at −80°C. “A” and “B” denote technical replicates. Dots after contig names indicate membership in a viral cluster, and fill color indicates habitat specificity. (E) Principal-coordinate analysis of the viromes by normalized relative abundance of the 53 vOTUs.
Viral community structure across the thaw gradient. (A and B) Social network analyses of the total reads from the seven viromes (A) and all the reads mapped to the 53 vOTUs (B). Dots in the social networks represent statistical samples taken from the marginal posterior distributions (Bayesian method), and each habitat is denoted by a black circle. (C) Euler diagram relating the seven viromes and their 53 vOTUs. (D) The relative abundance of vOTUs (columns) in the seven viromes (rows). Reads were mapped to this nonredundant set of contigs to estimate their relative abundance. “Chilled” and “Frozen” indicate sample storage at 4°C or flash-freezing in liquid nitrogen and storage at −80°C. “A” and “B” denote technical replicates. Dots after contig names indicate membership in a viral cluster, and fill color indicates habitat specificity. (E) Principal-coordinate analysis of the viromes by normalized relative abundance of the 53 vOTUs.Viral biodiversity increases with permafrost thaw. Richness, Shannon’s diversity index, and Pielou’s evenness index were calculated for each virome, and the viromes were plotted by habitat. Chilled samples are denoted with a lighter color, and frozen samples are denoted with a darker color. (A) Diversity indices for all seven viromes. (B) Diversity indices of six viromes (bog chilled B was removed). Download FIG S3, TIF file, 0.6 MB.Finally, while the chilled bog sample was an outlier to all other viromes (Fig. 5E), a social network analysis (also known as k-mer analysis) of the total reads (Fig. 5A) and of the reads that mapped to the viromes (Fig. 5B) indicated that habitat remained the primary driver of recovered communities Because of this, the diversity analyses were redone with the chilled bog sample taken out (Fig. S3B) instead of subsampling the reads (in addition to the virome normalization that we had already done, called “total-sum scaling” [described in references 26, 56, and 74]), because this is a smaller data set (subsampling smaller data sets described further in reference 122) and the storage effect was observed only for the bog.
Habitat specificity of the 53 vOTUs along the thaw gradient.
We explored the ecology of the recovered vOTUs across the thaw gradients, by fragment recruitment mapping against (i) the viromes and (ii) bulk-soil metagenomes. Virome mapping revealed that the relative abundance of each habitat’s vOTUs increased along the thaw gradient; relative to the palsa vOTU abundances, bog vOTUs were 3-fold more abundant and fen vOTUs were 12-fold more abundant (Fig. 5D). This is consistent with overall increases in virus-like particles with thaw observed previously at the site via direct counts (41). Only a minority (11%) of the vOTUs occurred in more than one habitat, and none were shared between the palsa and fen (Fig. 5C). Consistent with this, principal-coordinate analyses (PCoA; using a Bray-Curtis dissimilarity metric) separated the vOTU-derived community profiles according to habitat type, which also explained ∼75% of the variation in the data set (Fig. 5E). Mapping of the 214 bulk-soil metagenomes from the three habitats (17) revealed that a majority (41; 77%) of the vOTUs were present in the bulk-soil metagenomes (Fig. 6), collectively occurring in 62% (123) of them. Of the 41 vOTUs present, most derived from the bog, and their distribution among the 133 metagenomes reflected this, peaking quite dramatically in the bog (Fig. S4). This strong bog signal in the bulk-soil metagenomes—both in the proportion of bog-derived vOTUs present in the bulk metagenomes and in the abundance of all vOTUs in the bog samples—is consistent with the hypothesized higher abundance of temperate viruses in the bog, suggested by the chilled-versus-frozen storage results above. Overall, vOTU abundances in larger and longer-duration bulk-soil metagenomes indicated less vOTU habitat specificity than in the seven viromes: 10% were unique to one habitat, 22% of vOTUs were present in all habitats, 22% were shared between palsa and bog, 27% were shared between palsa and fen, and 68% were shared between bog and fen (Fig. 6). The difference in observations from vOTU read recruitment of viromes and bulk-soil metagenomes could be due to many actual and potential differences, arising from their different source material (but from the same sites) and different methodology, including vOTUs’ actual abundances (they derive from different samples), infection rates, temperate versus lytic states, burst size, and/or virion stability and extractability.
FIG 6
vOTU abundance in 133 bulk-soil metagenomes. The heat map represents abundance of vOTUs (rows) in the bulk-soil metagenomes (columns); metagenome reads were mapped to the nonredundant set of contigs to estimate their relative abundances. Only the 41 vOTUs present in the metagenomes (out of 53) and the 63 bulk-soil metagenomes (out of 214) that contained matches to the vOTUs are shown. Metagenome names denote source: habitat of origin (P, palsa; S, bog; E, fen), soil core replicate (1, 2, or 3), depth (3-cm intervals denoted with respect to geochemical transitions; generally, S = 1 to 4 cm, M = 5 to 14 cm, D = 11 to 33 cm, and X = 30 to 50 cm), month collected (5 to 10 for May to October, respectively), and year collected (2010, 2011, or 2012).
vOTU abundance in 133 bulk-soil metagenomes. The heat map represents abundance of vOTUs (rows) in the bulk-soil metagenomes (columns); metagenome reads were mapped to the nonredundant set of contigs to estimate their relative abundances. Only the 41 vOTUs present in the metagenomes (out of 53) and the 63 bulk-soil metagenomes (out of 214) that contained matches to the vOTUs are shown. Metagenome names denote source: habitat of origin (P, palsa; S, bog; E, fen), soil core replicate (1, 2, or 3), depth (3-cm intervals denoted with respect to geochemical transitions; generally, S = 1 to 4 cm, M = 5 to 14 cm, D = 11 to 33 cm, and X = 30 to 50 cm), month collected (5 to 10 for May to October, respectively), and year collected (2010, 2011, or 2012).Identified viral signal in the MAGs. (A) The stacked bar chart shows the percentage of viral signal occurrences from the 53 vOTUs collected in 2014 in the 133 bulk-soil metagenomes that had a signal collected from 2010 to 2012. In 2010, only fen samples were collected for microbial metagenomes. Viral signal occurrences were normalized by the number of viromes constructed for each habitat and the number of metagenomes for each habitat. The total number of occurrences for each year is italicized. (B) The number of occurrences (presented as a percentage) of a “viral signal” in a bulk-soil metagenome partitioned by the origin of the bulk-soil metagenome and the vOTU. Download FIG S4, TIF file, 0.4 MB.The vOTUs’ habitat preferences observed in both read data sets are consistent with our network analytics (Fig. 2), with the numerous documented physicochemical and biological shifts along the thaw gradient, and with observations of viral habitat specificity at other terrestrial sites. Changes in physicochemistry are known to impact viral morphology (reviewed in references 37, 124, and 125) and replication strategy (36, 37). In addition, at Stordalen Mire (and at other, similar sites [112]), microbiota are strongly differentiated by thaw-stage habitat, with some limited overlap among “dry” communities (i.e., those above the water table, the palsa and shallow bog) and among “wet” ones (those below the water table, the deeper bog and fen) (15–17). These shifting microbial hosts likely impact viral community structure, creating distinctive viral communities across the palsa, bog, and fen habitats. Expanding from the 53 vOTUs examined here, the recent analysis by Emerson et al. (46) of nearly 2,000 vOTUs recovered from the bulk-soil metagenomes also showed strong habitat specificity among the recovered vOTUs (only 0.1% were shared among all habitats, with <4.5% shared between any two habitats). These findings are also consistent with observations of distinct viral communities from desert, prairie, and rainforests (126) and from grasslands and arctic soils (45). In contrast, an emerging paradigm in the marine field is “seascape ecology” (127), where the majority of taxa are detected across broad geographical areas, as are marine viruses (7, 26). This important difference in habitat specificity between soils and oceans may be due to the greater physical structuring of soil habitats.Although vOTU richness and diversity appeared to increase along the thaw gradient (roughly equivalent in palsa and bog and ∼2-fold higher in fen, omitting the chilled bog sample [Fig. S3B]), this data set captured only a small fraction of the viral diversity (based on a collector’s curve comparison [Fig. S5]), and therefore, the undersampling prevents diversity inferences. Intriguingly, while our virome-derived vOTU richness was lowest in the palsa, the much greater sampling by Emerson et al. (46) recovered the most vOTUs in the palsa, more than double that in the fen (42% versus 18.9% of total vOTUs). This major difference could potentially be due to the known increase in microbial alpha diversity along the thaw gradient (16, 17), causing increased difficulty of viral genome reconstruction in the bulk-soil metagenomes; specifically, this could be due to poorer assembly of temperate phages within an increasingly diverse microbiota or of lytic or free viruses due to concomitantly increasing viral diversity (which is consistent with the increased vOTU richness with thaw in our virome data set). Notably, these diversity inferences are limited to double-stranded DNA (dsDNA) viruses, because our methods did not capture single-stranded DNA (ssDNA) or RNA viruses, a known part of the soil virosphere (128–130).Overview of Stordalen Mire viral sequences identified in bulk-soil metagenomes compared to viromes. Collector curves of vOTUs from viromes (A) and mined from bulk-soil metagenomes (B) (adapted from reference 46). Average (red) of 50 (A) or 200 (B) randomizations of sample order (teal). Download FIG S5, TIF file, 0.6 MB.
Challenges in characterizing the soil virosphere.
The low yield of viral contigs given the relatively large sequencing depth of the viromes reflects several factors that currently challenge soil viromics. First, resuspending viruses from soils is a challenge due to their adsorption to the soil matrix (43). Second, yields of viral DNA are often very low (due to both low input biomass and potentially low extraction efficiency), requiring amplification; this leads to biases (54, 58–65) or poor assembly and few viral contigs (described further in reference 131). Third, viral contig identification requires a reference database, yet soil viruses are underrepresented in current databases; for example, a majority (85%) of our sequence space was unknown. Fourth, nonviral DNA may coextract (a common feature among marine viromes, as described in reference 34). Last, the optimal approach to identifying ecological units within viral sequence space is unclear.In this study, DNA yields (and sequencing inputs) decreased along the thaw gradient, as did total reads, but counterintuitively, viral reads increased (Table 1) (the fen had ∼5-fold more viral reads than the palsa). This may have been partly due to the shift to a more aquatic-type habitat, for which viruses are better represented in the databases, or to an actual increase in viral DNA (as a portion of total) concomitant with known viral abundance increases (41). A large portion of the assembled reads were nonviral (Table 1), representing either microbial contamination or gene transfer agents (GTAs), i.e., virus-like capsids that package microbial DNA (reviewed in reference 123). Since the viral particle purification protocol involved an 0.45-µm filter followed by CsCl density gradient-based separation of the viral particles (removing free genomic DNA), contamination by microbial DNA seems unlikely. While ultrasmall microbial cells have been found in our soils (46) and other permafrost soils (reviewed in references 132 and 133) and may have passed through the 0.45-µm filters, they would be expected to be removed in the CsCl gradient since their density is similar to that of larger microbial cells and not viruses (reviewed in references 133 to 135). Therefore, to identify GTAs we searched our contigs for 16S rRNA genes and for known GTAs. We found six contigs that had 16S rRNA matches to multiple microbes (136) and 94 contigs with matches to known GTAs (123), together accounting for ∼25% of the assembled reads. GTAs may, thus, represent an appreciable and unavoidable contaminant in soil viromes, as has been observed in marine systems (reviewed in reference 123). Against this backdrop of potential contaminant DNA and a preponderance of unknown genes in viral sequence space, identifying ecological units in soil viromes is a challenge. We performed a sensitivity analysis on three ways to characterize the ecological units in our data set: read characterization, contigs, and as vOTUs. While all three methods have validity, there is a higher probability for inclusion of contaminants that can dramatically impact conclusions from the first two approaches. We, therefore, erred on the side of caution and reported our findings in the context of identified vOTUs.This study’s virome-based approach contrasts with that used in the work of Emerson et al. (46), which recovered vOTUs from bulk-soil metagenomes from the same site but with different years, months, depths, and preservation methods (Fig. 7). While the viromes derived from separated viral particles, the bulk-soil metagenomes captured viruses within hosts—i.e., those engaged in active infection and those integrated into hosts—as well as free viruses successfully extracted with the general extraction protocol. This study generated 18 Gb of sequence from 7 viromes, while Emerson et al. (46) analyzed 178 Gb from 190 metagenomes (178 bulk-soil and 12 size-fractioned metagenomes), and based on rarefaction, neither approach captured the total viral diversity in these soils (Fig. S5). The efficiency of vOTU recovery was >2-fold higher using the virome approach (2.93 vOTUs/Gbp of virome versus 1.30 vOTUs/Gbp of bulk-soil metagenome), suggesting that equivalent virome-focused sequencing effort could yield >4,300 vOTUs (although dsDNA viral diversity would likely saturate below that). The efficiency difference is caused not only by enrichment for viral DNA in the viromes but conversely by the higher stringency for vOTU identification required in the complex bulk-soil metagenomes; Emerson et al. (46) required ≥95% nucleotide identity across ≥90% of each read, and each contig had to have ≥70% of the contig covered with ≥1× coverage depth, while in these viromes we required ≥90% ANI and each contig had to have ≥75% of the contig covered with ≥1× coverage depth. Of the 19 vOTUs that were shared between the two data sets, the longer, virome-derived sequences defined them. These findings suggest that viromes (which greatly enrich for viral particles) and bulk-soil metagenomes (which are less methodologically intensive and provide simultaneous information on both viruses and microbes) offer complementary views of viral communities in soils, and if only one method can be applied, its selection will depend on the goal of the study.
FIG 7
Contrasting Stordalen Mire viruses derived from viromes and bulk-soil metagenomes. Currently, two data sets exist describing Stordalen Mire (SM) archaeal and bacterial viruses. Emerson et al. (46) characterized the viral signal in bulk-soil metagenomes (described in reference 17), while here we characterize viruses from viromes, derived from separated viral particles. There are three possible stages of the viral life cycle at which to capture viruses: proviruses (those integrated into a host genome; blue), active infections (viruses undergoing lytic infection; red), and free viruses (viruses not currently infecting a host; purple). The largest oval represents all the theoretical SM viruses (gray). The next largest oval represents the vOTUs reported in the work of Emerson et al. (46) (orange). Within that oval are the vOTUs derived from bulk-soil metagenomes (green) and from size-fractioned bulk-soil metagenomes also used in that study (blue). The final oval represents the vOTUs identified in this study (yellow circle). An asterisk indicates that the viral signal was mined from bulk-soil metagenomes. A dagger indicates that viruses were resuspended from the soils using a previously optimized protocol (41). A numeral sign indicates the vOTU yield normalized per gigabase-pair of metagenome. The active viruses or proviruses detected in the size-fractioned bulk-soil metagenomes are only those that infect microbial hosts that could pass through the reduced-pore-size filters (more sample information is in reference 46).
Contrasting Stordalen Mire viruses derived from viromes and bulk-soil metagenomes. Currently, two data sets exist describing Stordalen Mire (SM) archaeal and bacterial viruses. Emerson et al. (46) characterized the viral signal in bulk-soil metagenomes (described in reference 17), while here we characterize viruses from viromes, derived from separated viral particles. There are three possible stages of the viral life cycle at which to capture viruses: proviruses (those integrated into a host genome; blue), active infections (viruses undergoing lytic infection; red), and free viruses (viruses not currently infecting a host; purple). The largest oval represents all the theoretical SM viruses (gray). The next largest oval represents the vOTUs reported in the work of Emerson et al. (46) (orange). Within that oval are the vOTUs derived from bulk-soil metagenomes (green) and from size-fractioned bulk-soil metagenomes also used in that study (blue). The final oval represents the vOTUs identified in this study (yellow circle). An asterisk indicates that the viral signal was mined from bulk-soil metagenomes. A dagger indicates that viruses were resuspended from the soils using a previously optimized protocol (41). A numeral sign indicates the vOTU yield normalized per gigabase-pair of metagenome. The active viruses or proviruses detected in the size-fractioned bulk-soil metagenomes are only those that infect microbial hosts that could pass through the reduced-pore-size filters (more sample information is in reference 46).Over the last 2 decades, viruses have been revealed to be ubiquitous, abundant, and diverse in many habitats, but their role in soils has been underexplored. The observations made here from virome-derived viruses in a model permafrost-thaw ecosystem show that these vOTUs are primarily novel, change with permafrost thaw, and infect hosts highly relevant to C cycling. The next important step is to more comprehensively characterize these viral communities (from more diverse samples and including ssDNA and RNA viruses) and begin quantifying their direct and indirect impacts on C cycling in this changing landscape. This should encompass the complementary information present in virome, bulk metagenomes, and the viral signal from MAGs, analyzed in the context of the abundant metadata available. With increasing characterization of soil viruses and their mechanistic interactions with hosts and quantification of their biogeochemical impacts, soil viral ecology may significantly advance our understanding of terrestrial ecosystem biogeochemical cycling, as has marine viral ecology in the oceans.
MATERIALS AND METHODS
Sample collection.
Samples were collected from 16 to 19 July 2014 from peatland cores in the Stordalen Mire field site near Abisko, Sweden (Fig. 1; more site information in references 7 to 13). The soils derived from palsa (one stored chilled and the other stored frozen), bog (one stored chilled and two stored frozen), and fen (both stored chilled) habitats along the Stordalen Mire permafrost thaw gradient. These three subhabitats are common to northern wetlands and together cover ∼98% of Stordalen Mire’s nonlake surface (9). The sampled palsa, bog, and fen are directly adjacent, such that all cores were collected within a 120-m total radius. For this work, the cores were subsampled at 36 to 40 cm, and material from each was divided into two sets. Set 1 was chilled and stored at 4°C, and set 2 was flash-frozen in liquid nitrogen and stored at −80°C as described in the work of Trubl et al. (41). Both sets were processed using a viral resuspension method optimized for these soils (41). Briefly, 10 ml of a 1% potassium citrate resuspension buffer amended with 10% phosphate-buffered saline, 5 mM EDTA, and 150 mM magnesium sulfate was added; viruses were physically dispersed using vortexing for 1 min with manual shaking for 30 s (done 3 times) and then shaking of the tubes at 400 rpm for 15 min at 4°C; and finally, the tubes were centrifuged for 20 min at 15,000 × g at 4°C to pellet debris, and the supernatant was filtered through an 0.45-µm celluloseacetate vacuum filter (Corning, Corning, NY, USA) into a new 50-ml tube. Three washes were done on each sample. For CsCl density gradient purification of the particles, CsCl density layers of rho 1.2, 1.4, 1.5, and 1.65 were used to establish the gradient; we included a 1.2-g/cm3 CsCl layer to try to remove any small microbial cells that might have come through the 0.45-μm filter (for microbial cell densities, see references 137 and 138; for viral particle densities, see reference 50). We then collected the 1.4- to 1.52-g/cm3 range from the gradient for DNA extraction, to target the dsDNA range (according to reference 50). The viral DNA was extracted using Wizard columns (Promega, Madison, WI; products A7181 and A7211), cleaned up with AMPure beads (Beckman Coulter, Brea, CA; product A63881), and quantified using a Qubit fluorometer (Invitrogen). DNA libraries were prepared using the Nextera XT DNA library preparation kit (Illumina, San Diego, CA; product FC-131-1024) and sequenced using an Illumina MiSeq (V3; 600 cycles, 6 samples/run, 150-bp paired end) at the University of Arizona Genetics Core facility. Seventeen viral contigs were previously described in the work of Emerson et al. (46) (Fig. 7).The 214 bulk-soil metagenomes and associated recovered MAGs used here for analyses were described previously (17) and derive from the same sampling sites from 2010 to 2012 and 5-cm increments from 1- to 85-cm depths. They were extracted using a modification of the PowerSoil kit (Qiagen, Hilden, Germany) and sequenced via TruSeq Nano (Illumina) library preparation, or for low-concentration DNA samples, libraries were created using the Nextera XT DNA sample preparation kit (Illumina).
vOTU recovery.
Eight viromes were prepared, and seven samples were successfully sequenced (2 palsa, one chilled and one frozen; 3 bog, one chilled and two frozen; and 2 fen, both chilled). The sequences were quality controlled using Trimmomatic (139) (adaptors were removed, reads were trimmed as soon as the per-base quality dropped below 20 on average on 4-nucleotide [nt] sliding windows, and reads shorter than 50 bp were discarded) and then assembled separately with IDBA-UD (140), and contigs were processed with VirSorter to distinguish viral from microbial contigs (virome decontamination mode [69]). The same contigs were also compared by BLAST to a pool of putative laboratory contaminants (i.e., phages cultivated in the lab: Enterobacteria phage PhiX17, Alpha3, M13, Cellulophaga baltica phages, and Pseudoalteromonas phages) that we cultured. All contigs matching these genomes at more than 95% ANI were removed. VirSorted contigs were manually inspected by observing the key features of the viral contigs that VirSorter evaluates (e.g., the presence of a viral hallmark gene places the contigs in VirSorter category 1 or 2, but further inspection is needed to confirm that it is a genuine viral contig and not a GTA or plasmid). To identify GTAs, we searched through all of our contigs assembled by IDBA-UD for (i) taxa related to the 5 types of GTAs (keyword searches were done on Rhodobacterales, Desulfovibrio, Brachyspira, Methanococcus, and Bartonella) and (ii) microbial DNA from the SILVA rRNA database (release 128 [136]), with all the assembled contigs with ≥95% ANI. The percentage of reads that mapped to these contigs is described in Text S1 in the supplemental material.After verification that the VirSorted contigs were genuine viruses, quality-controlled reads from the seven viromes were pooled and assembled together with IDBA-UD to generate a nonredundant set of contigs. Resulting contigs were rescreened as described above, removing all identifiable contamination. The contigs then underwent further quality checks by (i) removing all contigs of <10 kb and (ii) using only contigs from VirSorter categories 1 and 2.To detect putative archaeal viruses, the VirSorter output was used as an input for MArVD (with default settings [141]). The output putative archaeal virus sequences were then filtered to include only those contigs of ≥10 kb in size, resulting in the set of putative archaeal vOTUs described here.Viral genes were annotated using a pipeline described in the work of Daly et al. (100). Briefly, for each contig, open reading frames (ORFs) were freshly predicted using MetaProdigal (142), and sequences were compared to KEGG (143) and UniRef and InterProScan (144) using USEARCH (145), with single and reverse best-hit matches with a bit score greater than 60. AMGs were identified by manual inspection of the protein annotations guided by known resident microbial metabolic functions (identified in reference 17). To determine confidence in functional assignment, representatives for each AMG underwent phylogenetic analyses. First, each sequence was used for a BLAST search and the top 100 hits were investigated to identify main taxon groups. An alignment with the hits and the matching viral sequence (MUSCLE with default parameters [146]) was done with manual curation to refine the alignment (e.g., regions of very low conservation from the beginning or end were removed). FastTree (default parameters with 1,000 bootstraps [147]) was used to make the phylogeny, and iTOL (148) was used to visualize and edit the tree (any distance sequences were removed). To see if this AMG was widespread across the putative soil viruses, a BLASTp search (default settings) of each AMG against all putative viral proteins from our viromes was done. The sequences from identified homologs (based on a bit score of >70 and an E value of 10−4) were used with the AMG of interest to construct a new phylogenic tree (same methods as used before). Finally, structures were predicted using I-TASSER (149) for our AMGs of interest and their neighbors. To assess correct structural predictions, AMGs of interest and their neighbors’ structures were compared with TM-align (TM-score normalized by the length of the reference protein [150]).
Gene-sharing network construction, analysis, and clustering of viral genomes (fragments).
We built a gene-sharing network where the viral genomes and contigs are represented by nodes and significant similarities as edges (74, 75). We downloaded 198,556 protein sequences representing the genomes of 1,999 bacterial and archaeal viruses from NCBI RefSeq (v 75 [151]). Including protein sequences from the 53 Stordalen Mire viral contigs, a total of 199,613 protein sequences were subjected to all-to-all BLASTp searches (default parameters, an E value threshold of 10−4, and a bit score of 50) and defined as protein clusters (PCs) in the same manner as previously described (70). The resulting output was parsed in the form of a matrix comprising the genomes (vOTUs) and PCs. We then determined the similarities between genomes by calculating the probability of finding the number of PCs shared between the genomes and/or vOTUs, based on the hypergeometric formula as previously described (74, 75). A similarity score was obtained by taking the negative logarithm (base 10) of the hypergeometric P value multiplied by the total number of pairwise genome (vOTUs) comparisons (i.e., 2,052 × 2,051). Genome (vOTU) pairs with a similarity score of ≥1 were previously shown to be significantly similar through permutation test of PCs and/or singletons (proteins that do not have close relatives) between genomes (vOTUs). The resulting network comprising 1,772 viral genomes, including 53 vOTUs and 58,201 edges, was visualized with Cytoscape (version 3.1.1; http://cytoscape.org/), using an edge-weighted spring-embedded model, which places the genomes or fragments sharing more PCs closer to each other. There were 398 RefSeq viruses not showing significant similarity to vOTUs that were excluded for clarity. To gain detailed insights into the genetic connections, the network was decomposed into a series of coherent groups of nodes (also known as VCs [70, 72, 73]), with an optimal inflation factor of 1.6. Thus, the discontinuous network structure of individual components, together with the vOTUs, indicates their distinct gene pools (71). To assign vOTUs into VCs, PCs needed to include ≥2 genomes and/or genome fragments, then the Markov clustering algorithm was used, and the optimal inflation factor was calculated by exploring values ranging from 1.0 to 5 by steps of 0.2. The taxonomic affiliation was taken from the NCBI taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy) and the International Committee on Taxonomy of Viruses (ICTV) taxonomy (ICTV Master Species List v1.3, as of February 2018).
vOTU ecology.
Virome reads were mapped back to the nonredundant set of contigs to estimate their coverage, calculated as number of base pairs mapped to each read normalized by the length of the contig, and by the total number of base pairs sequenced in the metagenome in order to be comparable between samples (Bowtie 2, threshold of 90% ANI on the read mapping, and 75% of contig covered to be considered detected [58, 152]). The heat map of the vOTU’s relative abundances across the seven viromes, as inferred by read mapping, was constructed in R (CRAN 1.0.8 package pheatmap).The 214 bulk-soil metagenomes and 1,529 associated recovered MAGs used here for analyses were described in the work of Woodcroft et al. (17). The paired MAG reads were mapped to the viral contigs with Bowtie 2 (as described above for the virome reads). The heat map of the vOTU’s relative abundances across the 214 bulk-soil metagenomes, as inferred by read mapping, was constructed in R (CRAN 1.0.8 package pheatmap); only microbial metagenomes with a viral signal were shown.
Viral-host methodologies.
We used two different approaches to predict putative hosts for the vOTUs: one relying on CRISPR spacer matches (45, 46, 100, 153) and one relying on direct sequence similarity between virus and host genomes (154). For CRISPR linkages, Crass (v0.3.6, default parameters), a program that searches through raw metagenomic reads for CRISPRs, was used (further information in Table S3) (155). For BLAST, the vOTU nucleotide sequences were compared to the MAGs (17) as described in the work of Emerson et al. (46). Any viral sequences with a bit score of 50, E value threshold of 10−3, and ≥70% ANI across ≥2,500 bp were considered for host prediction (described in reference 154).
Phylogenetic analyses to resolve taxonomy.
Two phylogenies were constructed. The first had the alignment of the protein sequences that are common to all Felixounavirinae and Vequintavirinae as well as vOTU_4, and the second had an alignment of select sequences from PC_03881, including vOTU_165. These alignments were generated using the ClustalW implementation in MEGA5 (version 5.2.1; http://www.megasoftware.net/). We excluded noninformative positions with the BMGE software package (156). The alignments were then concatenated into a FASTA file, and the maximum likelihood tree was built with MEGA5 using a JTT (Jones-Taylor-Thornton) model for each tree. A bootstrap analysis with 1,000 replications was conducted with uniform rates and a partial depletion of gaps for a 95% site coverage cutoff score.
Data processing and availability.
All data (sequences, site information, and supplemental tables and files) are available as a data bundle at the IsoGenie project database under data downloads at https://isogenie.osu.edu/. Additionally, viromes were deposited under BioProject identifier (ID) PRJNA445426 and SRA SUB3893166, with the following BioSample accession numbers: SAMN08784142 for palsa chilled replicate A, SAMN08784143 for palsa frozen replicate A, SAMN08784152 for bog frozen replicate A, SAMN08784154 for bog frozen replicate B, SAMN08784153 for bog chilled replicate B, SAMN08784163 for fen chilled replicate A, and SAMN08784165 for fen chilled replicate B. Data were processed using either The Ohio Supercomputer Center services (Columbus, OH) or the CyVerse app (74).Codon usage frequency for the linked viruses and their microbial hosts. Principal-coordinate analysis of the codon usage frequency of microbial hosts and their linked viruses, using the Bray-Curtis dissimilarity metric. Microbial hosts are denoted by circles and colored by phylum (A) or genus/species (B). The associated viruses have a color matching their host and are denoted with a square. (C) The average dissimilarity metric between the viral contigs linked to potential microbial hosts is plotted against each virus’s contig length (103). Average dissimilarity distance was used with viral contigs with multiple hosts. Download FIG S6, TIF file, 0.5 MB.
Authors: Sharath Srinivasiah; Jaysheel Bhavsar; Kanika Thapar; Mark Liles; Tom Schoenfeld; K Eric Wommack Journal: Res Microbiol Date: 2008-05-08 Impact factor: 3.992
Authors: David Paez-Espino; Emiley A Eloe-Fadrosh; Georgios A Pavlopoulos; Alex D Thomas; Marcel Huntemann; Natalia Mikhailova; Edward Rubin; Natalia N Ivanova; Nikos C Kyrpides Journal: Nature Date: 2016-08-17 Impact factor: 49.962
Authors: Manuel Delgado-Baquerizo; Angela M Oliverio; Tess E Brewer; Alberto Benavent-González; David J Eldridge; Richard D Bardgett; Fernando T Maestre; Brajesh K Singh; Noah Fierer Journal: Science Date: 2018-01-19 Impact factor: 47.728
Authors: Boris Wawrik; Christopher R Marks; Irene A Davidova; Michael J McInerney; Shane Pruitt; Kathleen E Duncan; Joseph M Suflita; Amy V Callaghan Journal: Environ Microbiol Date: 2016-06-27 Impact factor: 5.491
Authors: Anna M Kielak; Cristine C Barreto; George A Kowalchuk; Johannes A van Veen; Eiko E Kuramae Journal: Front Microbiol Date: 2016-05-31 Impact factor: 5.640
Authors: David A Pearce; Kevin K Newsham; Michael A S Thorne; Leo Calvo-Bado; Martin Krsek; Paris Laskaris; Andy Hodson; Elizabeth M Wellington Journal: Front Microbiol Date: 2012-12-05 Impact factor: 5.640
Authors: Regina L Wilpiszeski; Jayde A Aufrecht; Scott T Retterer; Matthew B Sullivan; David E Graham; Eric M Pierce; Olivier D Zablocki; Anthony V Palumbo; Dwayne A Elias Journal: Appl Environ Microbiol Date: 2019-07-01 Impact factor: 4.792
Authors: Germán Bonilla-Rosso; Théodora Steiner; Fabienne Wichmann; Evan Bexkens; Philipp Engel Journal: Proc Natl Acad Sci U S A Date: 2020-03-16 Impact factor: 11.205
Authors: Peter F Chuckran; Viacheslav Fofanov; Bruce A Hungate; Ember M Morrissey; Egbert Schwartz; Jeth Walkup; Paul Dijkstra Journal: mSystems Date: 2021-05-11 Impact factor: 6.496
Authors: Kristopher Kieft; Zhichao Zhou; Rika E Anderson; Alison Buchan; Barbara J Campbell; Steven J Hallam; Matthias Hess; Matthew B Sullivan; David A Walsh; Simon Roux; Karthik Anantharaman Journal: Nat Commun Date: 2021-06-09 Impact factor: 14.919
Authors: M Consuelo Gazitúa; Dean R Vik; Simon Roux; Ann C Gregory; Benjamin Bolduc; Brittany Widner; Margaret R Mulholland; Steven J Hallam; Osvaldo Ulloa; Matthew B Sullivan Journal: ISME J Date: 2020-11-16 Impact factor: 10.302