Yunha Hwang1, Peter R Girguis1. 1. Department of Organismic and Evolutionary Biology, Harvard Universitygrid.38142.3c, Cambridge, Massachusetts, USA.
Abstract
Some marine microbes are seemingly "ubiquitous," thriving across a wide range of environmental conditions. While the increased depth in metagenomic sequencing has led to a growing body of research on within-population heterogeneity in environmental microbial populations, there have been fewer systematic comparisons and characterizations of population-level genetic diversity over broader expanses of time and space. Here, we investigated the factors that govern the diversification of ubiquitous microbial taxa found within and between ocean basins. Specifically, we use mapped metagenomic paired reads to examine the genetic diversity of ammonia-oxidizing archaeal ("Candidatus Nitrosopelagicus brevis") populations in the Pacific (Hawaii Ocean Time-series [HOT]) and Atlantic (Bermuda Atlantic Time Series [BATS]) Oceans sampled over 2 years. We observed higher nucleotide diversity in "Ca. N. brevis" at HOT, driven by a higher rate of homologous recombination. In contrast, "Ca. N. brevis" at BATS featured a more open pangenome with a larger set of genes that were specific to BATS, suggesting a history of dynamic gene gain and loss events. Furthermore, we identified highly differentiated genes that were regulatory in function, some of which exhibited evidence of recent selective sweeps. These findings indicate that different modes of genetic diversification likely incur specific adaptive advantages depending on the selective pressures that they are under. Within-population diversity generated by the environment-specific strategies of genetic diversification is likely key to the ecological success of "Ca. N. brevis." IMPORTANCE Ammonia-oxidizing archaea (AOA) are one of the most abundant chemolithoautotrophic microbes in the marine water column and are major contributors to global carbon and nitrogen cycling. Despite their ecological importance and geographical pervasiveness, there have been limited systematic comparisons and characterizations of their population-level genetic diversity over time and space. Here, we use metagenomic time series from two ocean observatories to address the fundamental questions of how abiotic and biotic factors shape the population-level genetic diversity and how natural microbial populations adapt across diverse habitats. We show that the marine AOA "Candidatus Nitrosopelagicus brevis" in different ocean basins exhibits distinct modes of genetic diversification in response to their selective regimes shaped by nutrient availability and patterns of environmental fluctuations. Our findings specific to "Ca. N. brevis" have broader implications, particularly in understanding the population-level responses to the changing climate and predicting its impact on biogeochemical cycles.
Some marine microbes are seemingly "ubiquitous," thriving across a wide range of environmental conditions. While the increased depth in metagenomic sequencing has led to a growing body of research on within-population heterogeneity in environmental microbial populations, there have been fewer systematic comparisons and characterizations of population-level genetic diversity over broader expanses of time and space. Here, we investigated the factors that govern the diversification of ubiquitous microbial taxa found within and between ocean basins. Specifically, we use mapped metagenomic paired reads to examine the genetic diversity of ammonia-oxidizing archaeal ("Candidatus Nitrosopelagicus brevis") populations in the Pacific (Hawaii Ocean Time-series [HOT]) and Atlantic (Bermuda Atlantic Time Series [BATS]) Oceans sampled over 2 years. We observed higher nucleotide diversity in "Ca. N. brevis" at HOT, driven by a higher rate of homologous recombination. In contrast, "Ca. N. brevis" at BATS featured a more open pangenome with a larger set of genes that were specific to BATS, suggesting a history of dynamic gene gain and loss events. Furthermore, we identified highly differentiated genes that were regulatory in function, some of which exhibited evidence of recent selective sweeps. These findings indicate that different modes of genetic diversification likely incur specific adaptive advantages depending on the selective pressures that they are under. Within-population diversity generated by the environment-specific strategies of genetic diversification is likely key to the ecological success of "Ca. N. brevis." IMPORTANCE Ammonia-oxidizing archaea (AOA) are one of the most abundant chemolithoautotrophic microbes in the marine water column and are major contributors to global carbon and nitrogen cycling. Despite their ecological importance and geographical pervasiveness, there have been limited systematic comparisons and characterizations of their population-level genetic diversity over time and space. Here, we use metagenomic time series from two ocean observatories to address the fundamental questions of how abiotic and biotic factors shape the population-level genetic diversity and how natural microbial populations adapt across diverse habitats. We show that the marine AOA "Candidatus Nitrosopelagicus brevis" in different ocean basins exhibits distinct modes of genetic diversification in response to their selective regimes shaped by nutrient availability and patterns of environmental fluctuations. Our findings specific to "Ca. N. brevis" have broader implications, particularly in understanding the population-level responses to the changing climate and predicting its impact on biogeochemical cycles.
Entities:
Keywords:
accessory genes; ammonia-oxidizing archaea; auxiliary genes; comparative genomics; comparative studies; metagenomics; microdiversity; pangenome; phosphorus limitation; population genetics; strain-level diversity; time series analysis
Marine microbial populations are shaped by a complex interplay of dispersal, drift, and selection (1). While currents connect global oceans, there exists heterogeneity in the nutrient availability and environmental variables across the marine water column, forming distinct ecosystems with differential selective pressures across spatial scales (2). Previous global surveys (3, 4) of the marine water column microbiome revealed both ubiquitous and site-specific microbial populations that inhabit geographically distant waters. In particular, much research has been done on the globally widespread and abundant marine microbial taxa (e.g., Prochlorococcus [5] and SAR11 [6]), also referred to as “keystone taxa” (7), that often occupy important functional niches in the marine ecosystem. With increased resolution in both sampling and genome sequencing, there is growing evidence that marine microbial populations with massive population sizes consist of highly diverse strains and subpopulations with flexible genomes, representing populations with high within-population diversity (8–11). Therefore, there is an increasing research interest in understanding how these ubiquitous microbial taxa adapt to different selective regimes in diverse and dynamic ocean environments and how microbial adaptation and natural selection are encoded in their genomic diversity.Ammonia-oxidizing archaea (AOA) belonging to the phylum “Candidatus Thaumarchaeota” (or the GTDB phylum Thermoproteota and class Nitrososphaeria, according to the standardization of archaeal taxonomy proposed recently by Rinke et al. [12]) are another example of ubiquitous and abundant microbial taxa that play an important role in the carbon and nitrogen turnover of the marine ecosystem. AOA fix carbon and occupy a functional niche by generating energy through nitrification. AOA are particularly abundant in the upper ocean (13, 14) as well as the mesopelagic zones (15) and have been shown to dominate the nitrification process in the global oceans (16, 17). “Candidatus Nitrosopelagicus brevis” (18) is an AOA species found to be abundant across the lower euphotic zone in the oligotrophic ocean (19) and characterized by a highly streamlined genome of ~1.2 Mbp. Despite its small genome, “Ca. N. brevis” is one of the most ecologically successful archaeal species in the ocean, being found across wide geochemical and environmental gradients, with evidence of strain-level diversity resulting in physiological differentiation in nitrogen source (e.g., urea) utilization (20, 21). Previous studies of AOA abundance and function in the upper ocean have revealed significant seasonality, with decreased abundance and activity in summers (22–25), indicating population-level responses to environmental fluctuations. Due to their ecological importance, there has been increasing research interest in temporal variations in the population structure and adaptive strategies of AOA populations in different environments (21, 26–29). However, there have been limited systematic surveys of the within-population diversity of AOA, and we know little about the mechanisms of diversity generation and maintenance in natural AOA populations.In order to understand how the within-population diversity of AOA varies over space and time, we compared “Ca. N. brevis” populations over 2 years between two well-studied sites, Hawaii Ocean Time-series (HOT) station ALOHA (30) in the Pacific Ocean and the Bermuda Atlantic Time Series (BATS) station (31) in the Atlantic Ocean. These sites are comparable in that they are both oligotrophic with similar levels of primary production (8). However, there are key ecological distinctions between the two sites in both nutrient availability and seasonal variations in water column stratification. For example, BATS experiences stronger seasonal fluctuations in light, temperature, and nutrient concentrations than HOT, with deeper mixing (~150 to 200 m) in winters (31, 32). Additionally, they differ in their geochemistry, with notably lower inorganic phosphorus concentrations at BATS than at HOT (33–35) and higher inputs of iron and other metals at HOT than at BATS (36). Previous surveys of the rates of nitrification and ammonia oxidation in these two sites revealed high degrees of temporal variability (37, 38), suggesting that AOA populations in these locations may experience dynamicity over time.To better understand how an ecologically “successful” species of microbe adapts across diverse environmental gradients, we compared the “Ca. N. brevis” populations of HOT and BATS to address the following questions. (i) Do “Ca. N. brevis” populations differ in their levels of nucleotide diversity in different selective regimes? (ii) Do “Ca. N. brevis” gene contents differ between the ocean basins? (iii) Are “Ca. N. brevis” populations shaped by genome-wide or gene-specific selective sweeps? We first investigated the genetic diversity of “Ca. N. brevis” populations in metagenomes from the HOT and BATS sites over a 2-year time series and contextualized our findings by comparing them with those for other abundant community members in the ecosystem. We then used mapped metagenomic reads to calculate the nucleotide diversity and identify single nucleotide variants (SNVs) and their linkage in each sample. We conducted pangenomic analyses to identify basin-specific genes in “Ca. N. brevis” populations and then analyzed the shared fraction of the genome to determine highly differentiated alleles between the two sites. SNV profiles were compared over time to delineate temporal fluctuations in population structure and their correlation with environmental variables.
RESULTS
“Ca. N. brevis” is abundant at both HOT and BATS but more diverse at HOT.
Across 130 metagenomes from HOT and BATS (see Fig. 1a for sampling locations; see also Table S1 in the supplemental material for metagenome metadata), we binned 891 above-medium-quality (>70% completeness and <10% contamination) (39) metagenome-assembled genomes (MAGs), which were dereplicated into 170 representative MAGs (rMAGs). Of these rMAGs, 76 were present (>5× coverage and >0.5 breadth using mapped reads) at both sites, 61 were specific to BATS, 31 were specific to HOT, and 2 could not be detected in either sample with sufficient abundance and confidence (for further details on the rMAGs, see Table S2). The community composition and structure stayed stable over time, with statistically significant seasonal fluctuations being observed in the surface water (SW) samples at BATS (P = 0.001 [by permutational multivariate analysis of variance {PERMANOVA}]) (for the changes in the community profile over time, see Fig. S1 at https://doi.org/10.6084/m9.figshare.19357958). An rMAG belonging to the “Ca. Nitrosopelagicus brevis” species (rMAG_Nbrevis) was the second most abundant rMAG (after Prochlorococcus) across all samples and the most abundant in the samples below the euphotic zone (BEZ samples) at both sites (Fig. 1b; see also Fig. S2 at https://doi.org/10.6084/m9.figshare.19357964 for the relative abundances of the top six most abundant rMAGs across samples), with relative abundances ranging between 5.7 and 84.1%. rMAG_Nbrevis was 1.13 Mbp, smaller than the reference genome of strain CN25 (1.23 Mbp) and missing the two putative genomic islands identified previously (19). We predicted 1,408 genes in rMAG_Nbrevis and estimated it to be 99.27% complete, with no contamination. Interestingly, we found “Ca. N. brevis” MAGs to feature high codon usage bias (CUB), and we estimated “Ca. N. brevis” to have the shortest minimal doubling time (3.5 ± 2.2 h) (n = 79 medium- and high-quality “Ca. N. brevis” MAGs assembled across all sampling depths) out of the eight most frequently detected populations (see Fig. S3 at https://doi.org/10.6084/m9.figshare.19357982). This is surprising and inconsistent with the findings from previous culture studies of “Ca. N. brevis” C25 and U25, with reported doubling times of 7 days (20), which are significantly longer than the observed doubling time of 1 day for Prochlorococcus (40). It is possible that the in situ doubling time of the “Ca. N. brevis” population is much shorter and/or that there are other evolutionary processes at play that led to the observed particularly prominent CUB in “Ca. N. brevis” genomes. Notably, we detected three other much less abundant thaumarchaeal rMAGs: an AOA belonging to an unidentified species of the “Ca. Nitrosopelagicus” genus and two non-ammonia-oxidizing heterotrophic Thaumarchaeota (UBA57) members (41) (see Fig. S4 at https://doi.org/10.6084/m9.figshare.19357970). Our subsequent analyses focused on the populations of rMAG_Nbrevis in the BEZ samples because of their high abundance (read mapping coverage of >5×, the minimum threshold to reliably detect minor alleles [42]) throughout the 2-year sampling period.
FIG 1
Microbial populations in the below-euphotic zones (BEZ) of the Hawaii ALOHA Time-series (HOT) and the Bermuda Atlantic Time Series (BATS). (a) Sampling locations. (b) Microbiome structures of the BEZ samples in both sites over an ~2-year sampling period. The top five most abundant species are highlighted. Sampling points are labeled with color-coded triangles along the x axis. (c) pN/pS ratios and nucleotide diversity of the top six most abundant species. Each point on the plot depicts a species-level population in a sample, detected with at least 10× coverage and 0.8 breadth. Data points are color-coded using the same designation as the ones in panel b. “Ca. N. brevis” population clusters are highlighted with arrows. (d) Change in the nucleotide diversity of “Ca. N. brevis” populations in BEZ samples at HOT and BATS over the sampling period. Statistically significant differences between the two sites are noted with asterisks (P < 1E−6 [by a paired t test]).
Microbial populations in the below-euphotic zones (BEZ) of the Hawaii ALOHA Time-series (HOT) and the Bermuda Atlantic Time Series (BATS). (a) Sampling locations. (b) Microbiome structures of the BEZ samples in both sites over an ~2-year sampling period. The top five most abundant species are highlighted. Sampling points are labeled with color-coded triangles along the x axis. (c) pN/pS ratios and nucleotide diversity of the top six most abundant species. Each point on the plot depicts a species-level population in a sample, detected with at least 10× coverage and 0.8 breadth. Data points are color-coded using the same designation as the ones in panel b. “Ca. N. brevis” population clusters are highlighted with arrows. (d) Change in the nucleotide diversity of “Ca. N. brevis” populations in BEZ samples at HOT and BATS over the sampling period. Statistically significant differences between the two sites are noted with asterisks (P < 1E−6 [by a paired t test]).Sample metadata. All samples used in this study and the metadata (date, sampling depth, temperature [degrees Celsius], salinity [parts per trillion], library size, NCBI SRA accession numbers for reads and contigs, and assembly methods) are shown. Download Table S1, CSV file, 0.03 MB.rMAG metadata. Quality and statistics (contamination, completeness, strain heterogeneity, size, N50, broad taxonomic assignments, and ocean specificity) of dereplicated rMAGs that were used to delineate genetically defined populations in this study are shown. Download Table S2, CSV file, 0.03 MB.We detected polymorphisms (or single nucleotide variants [SNVs]) in 5.28% ± 1.1% of the sites across rMAG_Nbrevis. SNVs were spread evenly across the genome (see Fig. S5 at https://doi.org/10.6084/m9.figshare.19357991): 74.8% ± 1.5% of the SNVs were synonymous, 15.8% ± 0.9% were nonsynonymous, and 5.0% ± 0.3% were intergenic. Such a low ratio of nonsynonymous to synonymous SNVs (pN/pS ratio) (0.068 [±0.007]) signifies that these populations have undergone purifying selection (43). We calculated the nucleotide diversity and pN/pS ratios of all rMAGs across samples and found that “Ca. N. brevis” populations have the lowest pN/pS ratio (0.068 ± 0.007) at both sites and a medium level of nucleotide diversity (π, 0.020 ± 0.002) (Fig. 1c visualizes the five most abundant species). We estimated the lower bound of the effective population size (N) for rMAG_Nbrevis to be ~2.3 × 109 based on the nucleotide diversity (πneutral = 0.284 ± 0.037) of nonconserved third codon positions (n = 25,839 ± 5,296). This is within the same order of magnitude as what has been described (N = ~1.5 × 109) for the prochlorococcal ecotype using single-cell genomes (9) and similar to the estimation (N = ~2.6 × 109) for the most abundant Prochlorococcus rMAG in our data set (Cyanobacteriia_123_1).“Ca. N. brevis” populations at HOT had statistically higher nucleotide diversity than those at BATS (P < 1E−6 [by a paired t test]; t = 7.5). In addition, we observed only minor fluctuations in nucleotide diversity in “Ca. N. brevis” populations, with no statistically significant difference in variance between the sites (P > 0.05 [by an F test]) over the 2-year sampling period (Fig. 1d). As established previously (42), nucleotide diversity was not correlated with coverage in these samples (see Fig. S6 at https://doi.org/10.6084/m9.figshare.19358003). Statistically significant differences (P < 0.05 [by Welch’s t test]) in nucleotide diversity between the sites were observed in 35 out of 75 non-“Ca. N. brevis” rMAGs that were present at both sites. The majority (31 out of 35) of the populations were more diverse at HOT than at BATS.
“Ca. N. brevis” within-population diversity is maintained by a high recombination-to-mutation ratio.
Homologous recombination plays an important role in generating and maintaining the genetic diversity of microbial populations. In order to estimate the rate of recombination, we first examined the linkage of SNVs across the genome. Using the paired-end reads (2× 150 bp) with a median insert size of ~250 bp, we calculated the linkage of SNV pairs that were found up to 420 bp apart. A decay in linkage disequilibrium, which is characteristic of recombination, was observed in the “Ca. N. brevis” populations at both sites, as signified by the decrease in the r2 metric of linkage with increasing distance between SNVs (Fig. 2a). We also observed more linkage between nonsynonymous SNVs (N-N linkage) than between synonymous SNVs (S-S) or between nonsynonymous and synonymous SNVs (N-S) (Fig. 2a). This pattern of linkage can result from selective sweeps of nonsynonymous alleles and has previously been identified in genetically diverse populations of soil bacteria (44), in populations of Neisseria gonorrhoeae undergoing balancing selection and adaptive horizontal gene transfer (HGT) (45), and in phylogroup-specific genes of Listeria species (46). The average normalized r2 values across the genome (mean r2) for each population varied significantly between different taxa and were highly negatively correlated with nucleotide diversity (Spearman’s rho = −0.87; P < 1E−10) (Fig. 2b). We estimated the ratio of the recombination rate to the mutation rate (gamma/mu) of “Ca. N. brevis” to be 24.6 (±4.1), approaching levels similar to those of previously described quasisexual microbial populations of Prochlorococcus (47), Synechococcus (48), and Vibrio (49, 50). We estimated the recombination rates of the four other most abundant populations at SW and deep chlorophyll maximum (DCM) sampling depths (Fig. 2c) and found gamma/mu values similar to those of “Ca. N. brevis.” Interestingly, for populations present at both sites (“Ca. N. brevis” and Cyanobacteriia_123_1 [Prochlorococcus spp.]), we detected statistically significant differences in gamma/mu values (P < 0.05 [by Welch’s t test]) between the sites (Fig. 2c), with higher recombination rates estimated for the “Ca. N. brevis” populations at HOT (26.7 ± 3.45) than for those at BATS (22.2 ± 3.48) and higher recombination rates for the Prochlorococcus populations at BATS (52.5 ± 24.1) than for those at HOT (31.8 ± 15.0). This suggests the possibility of site-specific environmental influences on the recombination rate of populations of the same species.
FIG 2
Recombining populations of “Ca. N. brevis.” (a) Decreasing normalized r2 metric with increasing distance between pairs of SNVs in protein-coding sequences. Pairs of SNVs were categorized based on the resulting amino acid change, where N-N denotes a pair of nonsynonymous SNVs, N-S denotes a pair of synonymous and nonsynonymous SNVs, and S-S denotes a pair of synonymous SNVs. The number of SNV pairs included to calculate the average r2 metric in a distance interval is visualized by the size of the data point. (b) Populations across the entire data set (SW, BEZ, and DCM) with >30× coverage displaying a diverse range of linkages between SNVs. Each point on the plot depicts a species-level population in a sample, and the shape of the point denotes the site from which it was derived. The top five most frequently observed species-level populations are color-coded. (c) Recombination-to-mutation ratio (gamma/mu) calculated for the top five most abundant populations with 50× coverage. Statistically significant differences between the gamma/mu values of the two sites are noted with asterisks (P < 0.05 [by Welch’s t test]).
Recombining populations of “Ca. N. brevis.” (a) Decreasing normalized r2 metric with increasing distance between pairs of SNVs in protein-coding sequences. Pairs of SNVs were categorized based on the resulting amino acid change, where N-N denotes a pair of nonsynonymous SNVs, N-S denotes a pair of synonymous and nonsynonymous SNVs, and S-S denotes a pair of synonymous SNVs. The number of SNV pairs included to calculate the average r2 metric in a distance interval is visualized by the size of the data point. (b) Populations across the entire data set (SW, BEZ, and DCM) with >30× coverage displaying a diverse range of linkages between SNVs. Each point on the plot depicts a species-level population in a sample, and the shape of the point denotes the site from which it was derived. The top five most frequently observed species-level populations are color-coded. (c) Recombination-to-mutation ratio (gamma/mu) calculated for the top five most abundant populations with 50× coverage. Statistically significant differences between the gamma/mu values of the two sites are noted with asterisks (P < 0.05 [by Welch’s t test]).
“Ca. N. brevis” at BATS has a larger pangenome with basin-specific accessory genes.
We observed a high variation in gene content among the “Ca. N. brevis” MAGs across samples. We compared the pangenome openness with those of three other abundant bacteria found in the same data set (Prochlorococcus, SAR11, and SAR324) (see Fig. S7 at https://doi.org/10.6084/m9.figshare.19358024) and found that “Ca. N. brevis” featured a level of pangenome openness similar to those of Prochlorococcus and SAR324, while SAR11 exhibited a much more closed pangenome. Using the 43 “Ca. N. brevis” MAGs binned (>98% average nucleotide identity [ANI] with each other), we evaluated the “Ca. N. brevis” pangenome across HOT and BATS and examined the differential coverage of genes between the BEZ samples of the two sites. The pangenome consisted of 3,316 genes, with 3,099 of these being sufficiently long for read mapping. We removed 258 genes that were mapped at a coverage higher than three times the average coverage of rMAG_Nbrevis in the corresponding sample, as they likely represented genes shared with other species or binning errors. The coverage of pangenome genes relative to the average genome coverage varied significantly among genes in a sample, indicating within-population variation in gene content across multiple strains (see Fig. S8 at https://doi.org/10.6084/m9.figshare.19358027). In addition, we found distinct gene contents between sites; we identified 616 genes that were differentially abundant between the sites (51) (adjusted P value of <0.001 by analysis of variance [ANOVA]) (Fig. 3a). Of the differentially abundant genes, we further identified 182 genes that were basin specific, defined as detected (coverage relative to the average genome coverage of >10%) at one site but not detected in any sample from the other site. Interestingly, 158 of the basin-specific genes were specific to BATS, indicating a larger pangenome of “Ca. N. brevis” populations at BATS (see Fig. 5a). Levels of pangenome openness were compared between the sites using 22 “Ca. N. brevis” MAGs from HOT BEZ and 21 from BATS BEZ, and we estimated “Ca. N. brevis” to have a more open pangenome by approximately 2-fold in BATS BEZ (γ = 0.44) than in HOT BEZ (γ = 0.18) (see Fig. S9 at https://doi.org/10.6084/m9.figshare.19358030). In order to confirm that this finding is likely not an artifact of the higher completeness of “Ca. N. brevis” MAGs binned from BATS, we compared the completenesses of the 43 BEZ “Ca. N. brevis” MAGs included in the pangenome analysis and found that those binned from HOT featured statistically higher completeness (P < 0.01 [by Welch’s t test]) (see Fig. S10 at https://doi.org/10.6084/m9.figshare.19358033), suggesting that the difference in the actual pangenome sizes between the two sites may be even larger than that observed in this metagenomic analysis.
FIG 3
Larger pangenome of “Ca. N. brevis” populations at BATS. (a) Differentially present genes with statistical significance (adjusted P value of <0.001 [by ANOVA]) and their abundances relative to the “Ca. N. brevis” populations (estimated by the genome coverage) over time. The first row-side colors denote frequently observed functions, and the second row-side colors highlight basin-specific genes, defined as those not present in the other ocean at >10% of the expected abundance in any of the samples across the 2-year period. (b) Basin-specific genes in rMAG_Nbrevis with elevated linkage (r2) values and lower nucleotide diversity. rMAG_Nbrevis, which was binned from a HOT sample, contained two HOT-specific genes, encoding a phosphorus transporter and a transport regulator.
FIG 5
Temperature-correlated variation in allele frequencies of the “Ca. N. brevis” population at HOT. (a) Principal-coordinate analysis (PCoA) using pairwise F as a distance metric for BEZ samples at HOT and BATS. “Ca. N. brevis” populations in samples collected at different time points are depicted as data points, sized proportionally to the raw coverage. Data points with low coverage (<20×) were excluded from this analysis, and only loci that were detected with a coverage of >20× and within 2 standard deviations from the sample mean across all samples were included in the calculation of pairwise F. The shape and color of the data points correspond to the site and temperature of the corresponding sample, respectively. (b) Frequency fluctuation of temperature-correlated SNVs over the sampling period at HOT. The top panel shows the temperature fluctuation over time, and the bottom panel shows the fluctuation in allele frequencies of all 94 nonsynonymous SNVs (with each color depicting a unique SNV) with frequencies correlated with temperature with statistical significance (adjusted P value of <0.05 [Spearman’s correlation]).
Larger pangenome of “Ca. N. brevis” populations at BATS. (a) Differentially present genes with statistical significance (adjusted P value of <0.001 [by ANOVA]) and their abundances relative to the “Ca. N. brevis” populations (estimated by the genome coverage) over time. The first row-side colors denote frequently observed functions, and the second row-side colors highlight basin-specific genes, defined as those not present in the other ocean at >10% of the expected abundance in any of the samples across the 2-year period. (b) Basin-specific genes in rMAG_Nbrevis with elevated linkage (r2) values and lower nucleotide diversity. rMAG_Nbrevis, which was binned from a HOT sample, contained two HOT-specific genes, encoding a phosphorus transporter and a transport regulator.We examined the putative functions of the differentially present accessory genes and found 19 genes that were involved in phosphate transport. Interestingly, the high-affinity and low-velocity phosphate transporter complex PstSABC (52) (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042) was approximately 7-fold more abundant at BATS, while the phosphonate transport system PhnCDE (53) was specific to BATS (see Fig. S11b at https://doi.org/10.6084/m9.figshare.19358042). In contrast, the low-affinity and high-velocity inorganic phosphate transport (Pit) (52) system was specific to HOT (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042). Interestingly, the two phosphate systems (PstSABC and Pit) of differing affinity levels were found to substitute for each other at the same location in the genome, adjacent to a phoU gene (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042). Furthermore, we identified phylogenetic diversity in the Pit system (a phosphate transport regulator [DUF47] family protein and pit) (see Fig. S12 at https://doi.org/10.6084/m9.figshare.19358048) detected in “Ca. N. brevis” MAGs assembled from HOT. This indicates that this locus (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042) may be undergoing frequent substitutions of gene cassettes involved in phosphate transport. The diversity of phosphate transport systems was previously identified by Qin et al. (21); however, the basin-specific preference of the systems had not been identified to date. Basin-specific genes were functionally enriched in membrane transport, redox balance, and transcriptional regulation (Fig. 4a; see also Text S1 in the supplemental material for further details). rMAG_Nbrevis, derived from a HOT sample, contained two genes (DUF47 and pit) specific to HOT, for which we could calculate the r2 and nucleotide diversity values in the context of the rest of the genome. We found significantly higher linkage and lower nucleotide diversity values for these two genes than for the rest of the genome, suggesting the possibility of a recent selective sweep and/or a high level of purifying selection acting on these loci (Fig. 3b).
FIG 4
Differentiated alleles in the shared gene set between HOT and BATS “Ca. N. brevis” populations. (a) Differentiated genes (marked in red) with elevated F values with statistical significance (adjusted P value of <0.05 [by a Z test]). Abbreviations for putative gene functions are as follows: PhoU, phosphate-specific transport system accessory protein; TR, putative transcription regulator; ASNS, asparagine synthase; AlbA, DNA/RNA-binding protein AlbA; TARS, threonine-tRNA ligase; WHTH, winged-helix-turn-helix DNA-binding protein; MIP, MIP channel protein; RpS5, 30S ribosomal protein S5; RpL27, 50S ribosomal protein L27; HTH, HTH_45 domain-containing protein; PII, putative nitrogen regulatory protein P-II. (b) Predicted structure of the PhoU protein and nucleotide substitutions between HOT (pink) and BATS (light blue). Two putative pockets are depicted in dark blue and green rods. Substitutions are highlighted with red spheres, and the corresponding amino acid changes are labeled. (c) F values between all pairs of HOT and BATS populations of the top three most abundant populations in the BEZ besides “Ca. N. brevis.” (d) Elevated linkage (normalized mean r2) for highly differentiated genes (statistically high F) (adjusted P value of <0.05).
Differentiated alleles in the shared gene set between HOT and BATS “Ca. N. brevis” populations. (a) Differentiated genes (marked in red) with elevated F values with statistical significance (adjusted P value of <0.05 [by a Z test]). Abbreviations for putative gene functions are as follows: PhoU, phosphate-specific transport system accessory protein; TR, putative transcription regulator; ASNS, asparagine synthase; AlbA, DNA/RNA-binding protein AlbA; TARS, threonine-tRNA ligase; WHTH, winged-helix-turn-helix DNA-binding protein; MIP, MIP channel protein; RpS5, 30S ribosomal protein S5; RpL27, 50S ribosomal protein L27; HTH, HTH_45 domain-containing protein; PII, putative nitrogen regulatory protein P-II. (b) Predicted structure of the PhoU protein and nucleotide substitutions between HOT (pink) and BATS (light blue). Two putative pockets are depicted in dark blue and green rods. Substitutions are highlighted with red spheres, and the corresponding amino acid changes are labeled. (c) F values between all pairs of HOT and BATS populations of the top three most abundant populations in the BEZ besides “Ca. N. brevis.” (d) Elevated linkage (normalized mean r2) for highly differentiated genes (statistically high F) (adjusted P value of <0.05).Supplemental results on ocean-specific genes. Download Text S1, DOCX file, 0.01 MB.
Regulatory genes are key to the differentiation of “Ca. N. brevis” populations at HOT and BATS.
In order to understand how “Ca. N. brevis” populations differ between HOT and BATS in the shared gene content, we calculated the fixation indices (F) of each shared gene in rMAG_Nbrevis between the two sites. The average F values across all shared genes between the HOT and BATS populations were relatively low, at 3.8% ± 3.2%, indicating that the alleles of most genes were dispersed between the two populations. However, we identified 14 genes with statistically high F values ranging up to 44.2% (adjusted P value of <0.05 [by a right-tailed test]) (Fig. 4a). Notably, these genes with high F values were found colocalized, possibly due to genetic hitchhiking in selective sweep events. We were able to predict the functions of 12 out of 18 genes with high F values and found that the majority of these genes have functions associated with transcriptional regulation. For instance, the gene with the highest F was one of the three phoU genes in rMAG_Nbrevis. Notably, this phoU gene was found adjacent to the Pst and Pit systems (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042). We identified eight nonsynonymous substitutions in this phoU sequence at BATS and observed that these substitutions were enriched in or near the two putative pockets (Fig. 4b). Other high-F genes involved in transcription regulation were DNA/RNA-binding protein AlbA (AlbA), a winged-helix-turn-helix DNA-binding protein (WHTH), a putative nitrogen regulatory protein (PII), an HTH_45 domain-containing protein (HTH), and a putative transcription regulator (TR). Additionally, another large fraction of the high-F genes consisted of housekeeping genes, such as 30S ribosomal protein S5 (RpS5), DNA-directed RNA polymerase subunit K (RpoK), asparagine synthetase (ASNS), and threonine-tRNA ligase (TARS), and genes involved in transport, such as MIP channel protein (MIP) and sodium-dependent bicarbonate transport domain-containing protein (SBT). The phoU gene at BATS had significantly lower nucleotide diversity and higher mean r2 values, suggesting higher selective pressure on phoU at BATS than at HOT (Fig. 4c). Similar patterns in the mean r2 and nucleotide diversity characteristic of selection at BATS were observed for WHTH and MIP (see Fig. S13 at https://doi.org/10.6084/m9.figshare.19358054), both of which were located near phoU (see Fig. S11a at https://doi.org/10.6084/m9.figshare.19358042). Using consensus SNVs in neutral third positions in the codon, we estimated that “Ca. N. brevis” populations at the two sites diverged at least hundreds of thousands of years ago (see Text S2 in the supplemental material and Fig. S14 at https://doi.org/10.6084/m9.figshare.19358066 for more details).Supplemental results on the time of divergence between HOT and BATS populations. Download Text S2, DOCX file, 0.01 MB.
HOT “Ca. N. brevis” populations exhibit fluctuations in population structure that correlate with temperature.
We observed greater fluctuations in the temperature and salinity of the BEZ samples from the HOT site than for those from the BATS site (see Fig. S15 at https://doi.org/10.6084/m9.figshare.19358069). In order to quantify the differences in allele frequencies between “Ca. N. brevis” populations in different samples, we calculated the F scores between every pair of samples in alleles that were sequenced with a coverage of >20× and within 2 standard deviations from the mean. We observed higher F values between HOT samples than between BATS samples (t = 4.7; P < 1E−5 [by Welch’s t test]) (see Fig. S16 at https://doi.org/10.6084/m9.figshare.19358090). Using the pairwise mean F as a distance metric, we conducted a principal-coordinate analysis (PCoA) and identified clustering of “Ca. N. brevis” populations at HOT samples with increasing temperature (Fig. 5a). We identified 94 nonsynonymous SNVs with frequency variation correlated with the temperature fluctuations at HOT with statistical significance (adjusted P value of <0.05 [Spearman’s correlation]) (Fig. 5b). The temperature-correlated SNVs were found across 71 genes; the gene harboring the largest number of temperature-correlated SNVs encoded a prohibitin (PHB) domain-containing protein, with 7 out of 16 nonsynonymous SNVs fluctuating in frequencies with correlation with temperature. PHBs are a family of membrane proteins with proposed functions involved in cellular scaffolding and signaling, with lipid- and protein-binding properties (54). Other genes with more than one temperature-correlated SNV included genes with putative regulatory domains, such as a KH (RNA-binding) domain and an HTH (DNA-binding helix-turn-helix) domain, and genes with predicted redox properties, such as vitamin B12-dependent ribonucleotide reductase, ATP-dependent desthiobiotin synthetase, and NADH-quinone oxidoreductase subunit L.Temperature-correlated variation in allele frequencies of the “Ca. N. brevis” population at HOT. (a) Principal-coordinate analysis (PCoA) using pairwise F as a distance metric for BEZ samples at HOT and BATS. “Ca. N. brevis” populations in samples collected at different time points are depicted as data points, sized proportionally to the raw coverage. Data points with low coverage (<20×) were excluded from this analysis, and only loci that were detected with a coverage of >20× and within 2 standard deviations from the sample mean across all samples were included in the calculation of pairwise F. The shape and color of the data points correspond to the site and temperature of the corresponding sample, respectively. (b) Frequency fluctuation of temperature-correlated SNVs over the sampling period at HOT. The top panel shows the temperature fluctuation over time, and the bottom panel shows the fluctuation in allele frequencies of all 94 nonsynonymous SNVs (with each color depicting a unique SNV) with frequencies correlated with temperature with statistical significance (adjusted P value of <0.05 [Spearman’s correlation]).
DISCUSSION
Population-level responses to differentiated selective pressures in the Pacific and Atlantic Oceans.
Characterizing the variation in the genetic diversity of natural microbial populations over space and time is critical for understanding the adaptive strategies and population-level responses to the changing environment. Here, we examined the genetic diversity in populations of “Ca. N. brevis,” one of the most abundant archaeal species, with key ecological functions in global carbon and nitrogen cycling. We contextualize our findings with other abundant members of the community and found that some microbial populations exhibit more genetic differentiation between environments. For instance, among the microbial populations surveyed in the BEZ of HOT and BATS (Fig. 1c), two other archaeal populations, Poseidoniia_95_1 and Nitrososphaeria_111_0, showed statistically significant differences (P < 0.01 [by a t test]) in both nucleotide diversity and pN/pS ratios between the sites, while no significant differences were detected in the other bacterial populations, Gammaproteobacteria_112_0, SAR324_101_1, and Acidimicrobiia_57_1. Future research should investigate the relationship between the genetic diversity of microbial populations and their ecological roles (55), as well as their genomic features (i.e., natural competence genes [56]), to understand how specific microbial populations evolve across environmental gradients.
Standing genetic diversity and environmental fluctuations.
How microbial populations develop resilience against environmental fluctuations is an active area of research (57–59) with important environmental as well as biotechnological implications. One of the proposed mechanisms of population-level adaptation to random fluctuations is standing genetic variation (60), in which preexisting genetic variation within the population can allow rapid adaptation to random changes in the environment. In contrast, environmental changes with periodicity, such as seasonality, have also been proposed to decrease genetic diversity by shortening the exclusion times of subpopulations (61). In this study, we observed higher nucleotide diversity in the “Ca. N. brevis” population at HOT, where more stochastic environmental fluctuations (as proxied by temperature) were observed. Furthermore, we show population structure clustering with temperature at HOT and identified alleles with frequencies that are highly correlated with temperature. We therefore propose that the higher genetic diversity at HOT is linked to the fluctuating conditions in the below-euphotic zone and that it may provide the standing variation needed for resilience against environmental instability at HOT. Correlations between in situ temperatures and sequence heterogeneity, as well as temperature-correlated allele frequency trajectories, have also been observed in marine SAR11 populations (62), suggesting that environmentally mediated fine-scale selection may be a prevalent evolutionary process across cosmopolitan marine microbial populations. Our results can also be compared with those from previous work on the populations of Prochlorococcus at HOT and BATS (8), which revealed a similar pattern of higher diversity at HOT and basin-specific genomic islands. Although both “Ca. N. brevis” and Prochlorococcus populations are more diverse at HOT, the depths at which these populations are found experience very different environmental fluctuation regimes. In shallower depths (SW and DCM), investigated by Kashtan et al. (8), BATS experiences larger seasonal variations, while at the BEZ, HOT experiences more stochastic fluctuations (see Fig. S15 at https://doi.org/10.6084/m9.figshare.19358069). Kashtan et al. (8) attributed the pattern of lower diversity in Prochlorococcus at BATS to the shorter exclusion times as a result of stronger seasonality. Our results, in juxtaposition with those of Prochlorococcus, suggest that the frequency, periodicity, and range of environmental fluctuation may be critical in determining the evolutionary outcome of genetic diversity (63).
Environmental control of the pangenome size.
Microbial populations exhibit high levels of gene content variability; however, how the environment controls the evolution of the pangenome remains enigmatic. One hypothesis is that microbial populations that occupy a more diverse set of environments could evolve more open pangenomes in order to rapidly adapt to various selective pressures (46). Here, we show that “Ca. N. brevis” at BATS has a larger pangenome than at HOT, with more basin-specific genes. For instance, the “Ca. N. brevis” populations at BATS gained different types of high-efficiency phosphorus import systems and lost low-efficiency inorganic phosphorus transport systems, likely in response to the constant phosphorus limitation at BATS (33). We also posit that for “Ca. N. brevis,” BATS is likely an environment with more selective constraints than HOT, based on the more frequent detection of signatures of gene-level selective sweeps in “Ca. N. brevis” populations at BATS than at HOT. A recent study by Liao et al. (46) investigated the pangenome of the Listeria genus across various environments and found that phylogroups that occupy a more diverse range of habitats had more open pangenomes. Our results expand upon this finding by showing that species-level populations occupying environments with differential selective pressures exhibit different sizes of the pangenome and postulating that a larger pangenome is an adaptive response to a more diverse set of selective pressures in an environment. The mechanism of pangenome evolution is an active field of research accelerated by the high-throughput discovery of mobile genetic elements (MGEs). Interestingly, in contrast to Prochlorococcus, for which basin-specific genes were organized into genomic islands that are recombined (9, 64), we did not detect a strong organization of basin-specific genes in “Ca. N. brevis,” suggesting that there may be a disparity in the mechanisms of gene gain and loss between the two species. However, the cooccurrence of sets of basin-specific genes in “Ca. N. brevis” should be further investigated with single-cell sequencing, as shotgun metagenomic sequencing cannot resolve the linkage of gene presence over longer distances in a genome. Importantly, in accordance with a previous survey of various MGEs in Thaumarchaeota (65), we did not detect any identifiable MGEs in “Ca. N. brevis” MAGs, and as such, “Ca. N. brevis” may be employing yet-to-be-characterized mechanisms of genetic material exchange.
Importance of regulatory genes in adaptation across environmental gradients.
Previous surveys of diversity and ecotyping of microbial populations have utilized functional genes due to the ease and cost-effectiveness of amplicon sequencing of conserved genes of known ecological functions (e.g., see references 29 and 66–68). However, we find that most differentiated alleles between HOT and BATS “Ca. N. brevis” populations are enriched in regulatory genes, highlighting the importance of regulatory genes in understanding microbial evolution across diverse environments. Furthermore, many of the alleles with fluctuating frequencies correlating with temperature over time were in genes involved in regulation, and a significant fraction of the basin-specific genes were predicted to have a regulatory function. Our results provide lines of evidence that genetic variation in regulatory genes may be as important to microbial habitat expansion and resilience to temporal environmental fluctuations as modifications in key metabolic enzymes. With the decreasing cost of sequencing, characterization of genetic diversity in these regulatory genes, as well as genes of unknown function, will be critical to our understanding of microbial evolution and ecology.
Conclusions.
The differences in the “Ca. N. brevis” populations of the two oceans exist across multiple dimensions, from population-level diversity to differentiated alleles to basin-specific genes. The key finding in this study is that two strategies of genetic diversification, homologous recombination and gene content variation, may be employed differentially within a population, even at the species level, depending on the environment. “Ca. N. brevis” populations at the HOT observatory were characterized by higher genetic diversity, stronger signatures of recombination, and greater fluctuations in population-wide allele frequencies over time. In contrast, “Ca. N. brevis” populations at the BATS observatory were characterized by a larger pangenome, with auxiliary genes involved in the transport of phosphorus and other nutrients, redox balance, and transcriptional regulation. Our results further highlight the importance of genetic variation among regulatory genes in “Ca. N. brevis” and the potential role that variation plays in ecological success across different locales. This finding prompts future research to expand in scope beyond metabolic genes to include regulatory genes in understanding microbial adaptation and evolution. Furthermore, our results highlight the role of genetic material exchange in thaumarchaeal genome diversification and illustrate our knowledge gap in the lesser-known mechanisms and controls of genetic material exchange in Thaumarchaeota. Current and future research on marine Thaumarchaeota and other cosmopolitan microbial populations should consider how differential modes of genetic diversification might have shaped genomic heterogeneity and flexibility.
MATERIALS AND METHODS
Metagenomic data set and binning of metagenome-assembled genomes.
We used the time series metagenome data set of reads and assembled contigs of HOT (22°45′N, 158°00′W) and BATS (31°50′N, 64°10′W) previously reported by Biller et al. (69). The sampling procedure, library preparation, sequencing, quality filtering, and assembly are described in the original publication (69). We used 64 HOT and 62 BATS metagenomic samples collected approximately monthly between 2003 and 2004 from surface water (SW), the deep chlorophyll maximum (DCM), and the bottom of the euphotic zone (BEZ). Sample metadata can be found in Table S1 in the supplemental material. Assembled contigs with lengths of >1 kbp were used for metagenome-assembled genome (MAG) binning. First, differential coverages of contigs were calculated by all-versus-all mapping of reads against contigs across all samples from each site using Bowtie2 v2.3.2 in sensitive mode (70), which were then filtered at 97% sequence identity. Differential coverages across all 124 samples were used as the inputs for the binning tools CONCOCT (71), maxBin2 (72), ABAWACA (https://github.com/CK7/abawaca), and metaBAT2 (73), and the resulting bins were consolidated using DAS Tool (74). The quality of MAGs was estimated using CheckM v1.1.3 (75).
Dereplication and annotation of MAGs.
Binned MAGs from both sites were quality filtered at >70% completeness and <10% contamination thresholds and dereplicated using dRep (76) at a 96% average nucleotide identity (ANI) threshold. Dereplicated MAGs were designated representative MAGs (rMAGs). Broad taxonomic classifications of rMAGs were made using GTDB-Tk v1.5.0 classify_wf (77). Of the four thaumarchaeal rMAGs, one was determined to belong to “Ca. N. brevis” based on an ANI threshold of >97% using fastANI (78) against the reference genome of “Ca. N. brevis” (19). This thaumarchaeal rMAG (rMAG_Nbrevis) and all other MAGs that were clustered with it by dRep were annotated by first predicting the genes using Prodigal v2.6.3 (79) and then aligning the genes using Diamond v2.0.7.145 (80) against the UniRef100 database (81) with an E value cutoff of 1E−5.
Read mapping, SNV calling, calculation of nucleotide diversity, pN/pS ratio (ratio of nonsynonymous to synonymous polymorphisms), linkage disequilibrium, and estimations of the recombination rate, maximal growth rate, and effective population size.
All rMAGs were combined to create a reference genome database against which paired-end reads (150 bp by 2) from each sample were mapped with Bowtie2 in sensitive mode (70). Genome-wide inStrain (42) profiles were created with default settings, except for minimum percent identity filtering at 94%. Relative abundances of rMAGs were determined using the genome-wide average read mapping coverage. Statistical significances of the seasonality (determined by the sample collection month) in the community compositions for each site and depth were determined via PERMANOVA using the adonis function with 999 permutations on the bray distance matrices calculated using the Vegan package in R. Only rMAGs with an average coverage of >5× and breadth (fraction of the rMAG covered by at least one read) of >0.5 were included for relative abundance calculation and further analyses. inStrain profile (42) was used to call SNVs and calculate nucleotide diversity. pN/pS ratios of genes were calculated for each rMAG in a sample by inStrain profile (42) and were averaged for the calculation of the genome-wide pN/pS ratio. Linkage disequilibrium (r2) was calculated for all pairs of SNVs with at least 20 pairs of connecting reads and normalized by rarefying to 20 read pairs using inStrain (42). The recombination rate was inferred using the mcorr package (82) for all populations with >50× coverage, and gamma/mu values above 100 were discarded. The estimated minimal doubling time based on codon usage patterns was calculated using gRodon (83). The lower bounds of the effective population size (N) of “Ca. N. brevis” were estimated by calculating πneutral in nonconserved third codon positions as previously described by Kashtan et al. (9) In short, the formula N = 1.5πneutral/(μ(3−4πneutral)) (84) was used to estimate N, assuming a mutation rate (μ) of 10−10 mutations per bp per generation (9, 85). Note that this is a lower bound of N because not all third codon positions are neutral, and neutral sites are likely saturated with mutations, thereby underestimating πneutral.
Estimations of pangenome sizes and openness and analyses of basin-specific genes.
Pangenome sizes were compared for HOT and BATS “Ca. N. brevis” populations found in the BEZ habitat. We chose to focus on the BEZ samples because of the consistently high abundances of “Ca. N. brevis” at this sampling depth over the entire sampling period at both sites. For pangenome construction, we used Roary v3.13.0 (86) with flag -s on quality-filtered (completeness of >70% and contamination of <10%) MAGs binned from BEZ samples and clustered with rMAG_Nbrevis at >96% ANI. All genes in the pangenome were extracted to create a reference database to which all reads across all BEZ samples from HOT and BATS were mapped using Bowtie2 v2.3.2 in sensitive mode (70) and filtering for a maximum of 15 mismatches (>90% sequence identity) per read. Relative abundances of genes were determined by normalizing the coverages of genes by the average coverage of rMAG_Nbrevis in each sample, and genes with a relative abundance of >3 were discarded as genes likely also present in populations other than rMAG_Nbrevis. Variation in relative gene coverage in BEZ samples from HOT versus BATS was assessed via ANOVA (51), and the resulting P values were adjusted for multiple-hypothesis tests using the Benjamini-Hochberg procedure (87). Genes with an adjusted P value of <0.01 were classified as basin specific if present at coverages of <10% of the average coverage of rMAG_Nbrevis across all time points in either HOT or BATS. Genetic neighborhoods of phosphate and phosphonate import systems were visualized using easyfig v2.2.2 (88) with a BLASTn E value threshold of 1E−5. Phylogenetic trees of the proteins involved in the inorganic phosphate transport (Pit) system were constructed as follows: relevant proteins were clustered based on UniRef100 annotations, and the top 100 BLAST results of the representative sequences against the NCBI nr database were used for alignments. Alignments were made using MUSCLE (89) followed by BMGE v1.12 trimming (90), and the tree was calculated using IQ-TREE v2.1.2 (91) with flags -m MFP -bb 1000 and visualized using iTOL (92). We also calculated the pangenome openness (γ) using a method previously described by Liao et al. (46) for populations for which >15 medium- to high-quality MAGs could be binned across all samples (all depths and sites).
F calculation and estimation of divergence time between HOT and BATS backbone populations.
The F (fixation index) (measure of differential allele frequencies between populations) value for each “Ca. N. brevis” gene was calculated between two metagenomic samples using a method previously described by Crits-Christoph et al. (44). The mean F was calculated for each BEZ sample pair for which rMAG_Nbrevis was detected at >20× coverage, by averaging the F values of genes that were mapped at >20× coverage in all queried samples. The PhoU protein structure was predicted using Phyre2 (http://www.sbg.bio.ic.ac.uk/phyre2) (93) in intensive mode. Pockets were determined using fpocket2 (94) and visualized using PyMOL (Schrödinger, LLC). Pairwise mean F values were used to create a distance matrix between samples, which was then used for principal-coordinate analysis (PCoA) using the pcoa function in R with cailliez correction. The lower bound for the time of divergence between the HOT and BATS backbone populations was estimated by first identifying the SNVs where the consensus alleles of the two populations differ (here, consensus single nucleotide polymorphisms [SNPs]) using inStrain compare (42). We used the method used previously for Prochlorococcus by Kashtan et al. (9) to calculate the time of divergence based on the number of consensus SNPs in the third base codons by assuming a constant mutation rate of 10−10 mutations per bp per generation and a generation time of 7 days (20).
Statistical analyses and visualization.
All statistical tests were performed using R v4.0.2 (95) and visualized using ggplot2 (96).
Data availability.
Metagenomic reads and contigs used in this study can be found under NCBI BioProject accession number PRJNA385855. rMAG_Nbrevis binned in this study is available in Data Set S1 in the supplemental material.Fasta file of rMAG_Nbrevis. Download Data Set S1, TXT file, 1.1 MB.
Authors: Ashley Shade; Jordan S Read; Nicholas D Youngblut; Noah Fierer; Rob Knight; Timothy K Kratz; Noah R Lottig; Eric E Roden; Emily H Stanley; Jesse Stombaugh; Rachel J Whitaker; Chin H Wu; Katherine D McMahon Journal: ISME J Date: 2012-06-28 Impact factor: 10.302
Authors: Matthew R Olm; Alexander Crits-Christoph; Keith Bouma-Gregson; Brian A Firek; Michael J Morowitz; Jillian F Banfield Journal: Nat Biotechnol Date: 2021-01-18 Impact factor: 68.164
Authors: Mart Krupovic; Kira S Makarova; Yuri I Wolf; Sofia Medvedeva; David Prangishvili; Patrick Forterre; Eugene V Koonin Journal: Environ Microbiol Date: 2019-03-18 Impact factor: 5.491
Authors: Ashley Shade; Hannes Peter; Steven D Allison; Didier L Baho; Mercè Berga; Helmut Bürgmann; David H Huber; Silke Langenheder; Jay T Lennon; Jennifer B H Martiny; Kristin L Matulich; Thomas M Schmidt; Jo Handelsman Journal: Front Microbiol Date: 2012-12-19 Impact factor: 5.640