Brandon Kieft1, Zhou Li2,3, Samuel Bryson4, Byron C Crump5, Robert Hettich2,3, Chongle Pan2,3, Xavier Mayali6, Ryan S Mueller1. 1. Department of Microbiology, Oregon State University, Corvallis, OR, United States. 2. Oak Ridge National Laboratory, Oak Ridge, TN, United States. 3. Graduate School of Genome Science and Technology, The University of Tennessee, Knoxville, Knoxville, TN, United States. 4. Department of Civil and Environmental Engineering, The University of Washington, Seattle, WA, United States. 5. College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States. 6. Lawrence Livermore National Laboratory, U.S. Department of Energy, Livermore, CA, United States.
Abstract
Linking microbial community structure to ecological processes requires understanding of the functional roles among individual populations and the factors that influence their distributions. These structure-function relationships are particularly difficult to disentangle in estuaries, due to highly variable physico-chemical conditions. Yet, examining microbe-mediated turnover of resources in these "bioreactor" ecosystems is critical for understanding estuarine ecology. In this study, a combined metagenomics and metaproteomics approach was used to show that the unequal distribution of microbial populations across the Yaquina Bay estuary led to a habitat-specific taxonomic and functional structure and a clear spatial distribution in microbe-mediated capacities for cycling of carbon and nitrogen. For example, size-fractionation revealed that communities inhabiting suspended particulate material encoded more diverse types of metabolisms (e.g., fermentation and denitrification) than those with a planktonic lifestyle, suggesting that the metabolic reactions can differ between size fractions of the same parcel of an estuarine water column. Similarly, communities inhabiting oligotrophic conditions in the lower estuary were enriched in genes involved in central carbon metabolism (e.g., TCA cycle), while communities in the upper estuary were enriched in genes typical of copiotrophic populations (e.g., cell growth, cell division). Integrating gene and protein data revealed that abundant populations of Flavobacteriales and Rhodobacterales encoded similar genomic functions, yet differed significantly in protein expression, dedicating a large proportion of their respective proteomes to rapid growth and division versus metabolic versatility and resource acquisition. This suggested potentially distinct life-strategies between these two co-occurring lineages and was concomitant with differing patterns of positive evolutionary selection on their encoded genes. Microbial communities and their functions across Yaquina Bay appear to be structured by population-level habitat preferences, resulting in spatially distinct elemental cycling, while within each community, forces such as competitive exclusion and evolutionary selection influence species life-strategies and may help maintain microbial diversity.
Linking microbial community structure to ecological processes requires understanding of the functional roles among individual populations and the factors that influence their distributions. These structure-function relationships are particularly difficult to disentangle in estuaries, due to highly variable phyn class="Chemical">sico-chemical conpan>ditionpan>s. Yet, examining microbe-mediated turnover of resources in these "bioreactor" ecosystems is critical for understanpan>ding pan> class="Chemical">estuarine ecology. In this study, a combined metagenomics and metaproteomics approach was used to show that the unequal distribution of microbial populations across the Yaquina Bay estuary led to a habitat-specific taxonomic and functional structure and a clear spatial distribution in microbe-mediated capacities for cycling of carbon and nitrogen. For example, size-fractionation revealed that communities inhabiting suspended particulate material encoded more diverse types of metabolisms (e.g., fermentation and denitrification) than those with a planktonic lifestyle, suggesting that the metabolic reactions can differ between size fractions of the same parcel of an estuarinewater column. Similarly, communities inhabiting oligotrophic conditions in the lower estuary were enriched in genes involved in central carbon metabolism (e.g., TCA cycle), while communities in the upper estuary were enriched in genes typical of copiotrophic populations (e.g., cell growth, cell division). Integrating gene and protein data revealed that abundant populations of Flavobacteriales and Rhodobacterales encoded similar genomic functions, yet differed significantly in protein expression, dedicating a large proportion of their respective proteomes to rapid growth and division versus metabolic versatility and resource acquisition. This suggested potentially distinct life-strategies between these two co-occurring lineages and was concomitant with differing patterns of positive evolutionary selection on their encoded genes. Microbial communities and their functions across Yaquina Bay appear to be structured by population-level habitat preferences, resulting in spatially distinct elemental cycling, while within each community, forces such as competitive exclusion and evolutionary selection influence species life-strategies and may help maintain microbial diversity.
Estuaries host communities of microorganisms that influence the exchange of nutrients between terrestrial, freshn class="Chemical">water, anpan>d marine biomes (Cole anpan>d Caraco, 2001; Bauer et al., 2013). These boundary ecosystems oftenpan> exhibit stronpan>g spatiotemporal gradienpan>ts in salinity, turbidity, anpan>d resource availability, resulting in microbial communities with highly variable characteristics across these conpan>ditionpan>s (Crump et al., 2004; Jeffries et al., 2016). Biotic anpan>d abiotic dynamism is particularly evidenpan>t in estuaries during winter alonpan>g the Oregonpan> coast whenpan> heavy overlanpan>d precipitationpan> anpan>d steep nearshore topography cause coastal river flooding evenpan>ts that tranpan>sport pan> class="Chemical">significant amounts of suspended particulate matter into estuarinewater columns (Hickey and Banas, 2003; Hastings et al., 2012; Goñi et al., 2013).
Microbes colonizing such particles represent an important component of aquatic biogeochemical cycles by acting to liberate particulate n class="Chemical">carbon anpan>d nutrienpan>ts to the planpan>ktonpan>ic microbial loop through decompopan> class="Chemical">sition processes, and by supplementing higher trophic levels of the food web through resulting biomass production (Goulder, 1977; Wainright, 1990; Brown and Ozretich, 2009; Stocker, 2012). Studies examining microbial communities in Oregon estuaries have observed significant differences in the phylogenetic composition and respiratory activity between particle-associated and free-living microbial communities and across marine–estuarine–riverine gradients (Crump et al., 1998, 1999, 2017; Smith et al., 2013). Beyond phylogenetic characterization, however, the functional roles of microbial communities and their relative influence on ecological processes occurring in different habitats within heterogeneous estuarine systems have seldom been examined (Simon et al., 2014).
Applying a combination of high-throughput n class="Chemical">DNA sequenpan>cing anpan>d metaproteomics technpan>iques, we investigated the phylogenpan>etic anpan>d functionpan>al structure of microbial communities with two conpan>trasting, operationpan>ally defined lifestyles (particle-associated anpan>d free-living) in two spatially separated locationpan>s (upper anpan>d lower) of Yaquina Bay, Oregonpan>, Unpan>ited States. Upan> class="Chemical">sing these data, we estimated how the mechanisms of microbe-mediated carbon and nitrogen turnover differed between the four habitats. This was achieved using previously identified metabolic marker genes and proteins, whose relative abundances provide a proxy for different components of aquatic microbe-mediated biogeochemical cycles (Lauro et al., 2011; Llorens-Marès et al., 2015). We hypothesized that habitat preference (i.e., a distribution biased toward a specific habitat) at the population level would lead to communities with distinct taxonomic and functional structures, corresponding with habitat-specific carbon and nitrogen cycling on both large (kilometers, between locations) and small (microns, between particle-attached and free-living) spatial scales (Hyndes et al., 2014) in the Yaquina Bay estuary.
To understand the ecology of successful microorganisms in this ecosystem, we also reconstructed the encoded genomes and expressed proteomes at the time of sampling from 15 populations in two dominant lineages (Flavobacteriales and Rhodobacterales) to determine whether their life-strategy – i.e., the set of traits used for survival, growth, and reproduction (Barnard et al., 2013) – contributed to their pervan class="Chemical">sivenpan>ess across habitats. To determine if life-strategy of the two taxa was conpan>sequenpan>tial for their coexistenpan>ce anpan>d codominanpan>ce (i.e., whether they displayed overlapping or divergenpan>t life-strategies anpan>d, thus, occupied a pan> class="Chemical">similar or different niche space), we compared the relative abundances of their expressed proteins and their protein-coding genes. Similarly, we examined positive evolutionary selection in these two lineages using classical analyses of synonymous and non-synonymous mutation rates on protein-coding genes to determine whether evolutionary pressure on each lineage was related to its life-strategy.
Materials and Methods
Sample Collection and Processing
Surface n class="Chemical">water samples were collected onpan> November 22, 2014 from two pan> class="Chemical">sites in Yaquina Bay (OR, United States). Samples were taken from the top 1 m of the water column at a tidal height of ∼1.8 m near slack current (-0.1 m/s) in both the riverine end-member of the estuary (hereafter, “upper”: 44.58°N, -123.99°W) and the coastal estuary mouth (hereafter, “lower”: 44.62°N, -124.04°W), which are separated by 8 km (Supplementary Figure S1). Along with tidal influence, Yaquina River flow was ∼17 m3/s (Oregon Water Resources Department, Station ID 14306030), resulting in upper and lower estuary sampling sites that represent differing salinity and nutrient zones during our sampling period (Brown and Ozretich, 2009; Shafer et al., 2016). Triplicate 12 L water samples were collected in acid-washed carboys and filtered through in-line 3-μm and 0.22-μm PES membranes (Pall Corporation, Ann Arbor, MI, United States) to recover operationally defined particle-associated and free-living microbial communities, respectively. Collected biomass was stored at -80°C until processing.
Community DNA was extracted using the CTAB protocol described in (Doherty et al., 2017). Total protein was extracted using SDS lysis method detailed in (Bryson et al., 2016, 2017). Extractions from 1 and 11 L of water yielded sufficient quantities of analytes for further analyses (1–5 μg DNA and 150–300 μg of protein, respectively).
16S Amplicon Library Construction and Analysis
The v4 locus of the 16S rRNA gene was amplified and sequenced on the Illumina MiSeq (Illumina, Inc., San Diego, CA, United States) un class="Chemical">sing the two-step PCR protocol of the Nextera XT index kit. First-step PCRs were performed with 30 cycles upan> class="Chemical">sing ∼50 ng of DNA, AccuStart II PCR ToughMix Polymerase (QuantaBio, Beverly, MA, United States) following manufacturer instructions, and primers encoding universal prokaryotic v4-complement sequences (515F and 806R) (Takahashi et al., 2014). AMPure XP beads were used for product purification. Second-step PCRs were performed per Nextera XT kit instructions.
Libraries were normalized by concentration, pooled, and sequenced un class="Chemical">sing anpan> Illumina MiSeq v2 kit (251 bp, paired-enpan>d reads), genpan>erating >17,000 reads per library. Reads were trimmed anpan>d quality filtered upan> class="Chemical">sing FastQC and Sickle (Joshi and Fass, 2011). Low quality (<25 phred scores across 15 base sliding windows) sequences within reads were trimmed and >200 bp reads were retained. The “mothur” program was used to remove adapter regions and for read-pair assembly, unsupervised OTU clustering (97% identity), taxonomic assignment with the RDP classifier, rarefaction, relative abundance calculations, and OTU table generation (Schloss et al., 2009).
Metagenome Sequencing, Assembly, and Annotation
n class="Species">Metagenome libraries were prepared upan> class="Chemical">sing Nextera XT (Illumina, Inc., San Diego, CA, United States) and Wafergen (Wafergen Bio-systems, Fremont, CA, United States) kits. Libraries were quantified with Bioanalyzer HS-DNA Chips, normalized by concentration, and pooled. Metagenome sequencing was performed on Illumina HiSeq 3000 using 151 bp long, paired-end reads and yielded >16 Gb per library (>32 M reads). Metagenome reads were filtered as defined for 16S amplicon reads, with the exception that the length threshold was 100 bases. IDBA-UD (Peng et al., 2013) was used for de novo metagenome assembly of the combined read sets from all three replicate community DNA samples in each of the four habitats (i.e., upper and lower estuary communities with both particle-associated and free-living lifestyles). All 16S-amplicon and metagenome sequences are available from NCBI (BioSample SAMN04917373, BioProject PRJNA320136, short read archive SRS1422236 with accession numbers SRX1738750-90 for 16S-amplicon reads, and SRX1738728-32 for metagenome reads).
Predicted coding sequences (CDS) on contigs were determined un class="Chemical">sing Prodigal (Hyatt et al., 2012). Taxonpan>omic aspan> class="Chemical">signment of CDS was based on best hits to the RefSeq database (downloaded June 14, 2014) using the Diamond algorithm in “blastp” mode (Buchfink et al., 2015). Diamond blastp was chosen for annotation to the Class taxonomic level based on speed and sensitivity tests comparing the algorithm with traditional blastp and Diamond blastp in sensitive mode (Supplementary Figure S2). The GhostKoala server was used to functionally annotate each CDS by assigning a KEGG Orthology (KO) number (Kanehisa et al., 2016). Additional functional assignments of CDS were based on best rpsBLAST hits to protein families from the Pfam (Finn et al., 2016) and cluster of orthologous genes (COG) (Huerta-Cepas et al., 2016) databases.
Read depths of contigs/scaffolds and of CDS were defined by separately aligning reads from all libraries against each of the four resulting assembles un class="Chemical">sing the Bowtie2 program (Lanpan>gmead anpan>d Salzberg, 2012). The relative abundanpan>ces of CDS were approximated upan> class="Chemical">sing the reads per kilobase of genome equivalents (RPKG) metric, which was calculated with “MicrobeCensus.” This metric uses CDS read-depth and a genome equivalent metric based on metagenome library size and single-copy marker genes in order to normalize relative abundances of CDS across metagenomes (Nayfach and Pollard, 2015).
Contigs from each n class="Species">metagenome were automatically binned into organpan>ism-specific sets upan> class="Chemical">sing the “maxbin2.1” program, which uses contig read-depths and tetranucleotide (TN) frequency data for bin assignment (Wu et al., 2016). More than 600 total bins were auto-generated. The lineage-specific workflow of “CheckM” (Parks et al., 2015) was used to assess initial bin quality by calculating completeness and contamination metrics. Bins with >85% completeness (83 bins; bolded typeface in Supplementary Table S1) were manually curated using “Vizbin” (Laczny et al., 2015) and “mmgenome” (Karst et al., 2016). Outlying contigs were identified in plots of GC content, TN frequency, and read-depths and removed. Resulting bins were reassessed in “CheckM,” and those with >85% completeness and <10% contamination were analyzed further (40 bins; bolded and italicized typeface in Supplementary Table S1). Taxonomy of manually curated bins was determined by BLAST-based annotation of binned CDS, by comparison to the “amphora2” marker gene set (Wu and Scott, 2012), and with “CheckM” rerun with the taxon-specific workflow. Thirty-nine bins with >85% completeness and <10% contamination were retained for further population-level analyses.
Peptide Mass Spectrometry, Database Construction, and Protein Identification
Extracted proteins were digested with trypn class="Chemical">sin at room temperature (enpan>zyme:substrate ratio of 1:100 w:w). Twenpan>ty-five micrograms of pan> class="Chemical">peptides were analyzed for each of the 12 metaproteomes using 2D-Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) with previously described conditions (Bryson et al., 2016). LC-MS/MS measurements were performed on an LTQ Orbitrap Elite mass spectrometer (Thermo Scientific, Waltham, MA, United States) (Washburn et al., 2001). MS and MS/MS scans were acquired with a resolution of 30,000 and 15,000, respectively, and the 10 most abundant precursor ions were selected for MS/MS analysis by higher-energy collisional dissociation.
n class="Chemical">Sipros Enpan>semble (Guo et al., 2017) was used to search all MS spectra against a peptide database conpan>structed from all pan> class="Species">metagenome CDS clustered at 100% identity using cd-hit (Li and Godzik, 2006), and with reverse sequences of all peptides added as decoys to calculate false discovery rate (FDR) (Bryson et al., 2016). Searches considered 7–60 residue peptides with a maximum of two missed tryptic cuts. The two-peptide rule – one unique plus one shared peptide, or two unique peptides – was used to determine confident protein identifications, and a 1% FDR at the peptide level was implemented as in (Bryson et al., 2016). Normalized balanced spectral counts (NBSC) of peptides for identified proteins were used as measures of protein relative abundances across all samples (Supplementary Table S2). The proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Vizcaíno et al., 2016) partner repository with the dataset identifier PXD008093.
DNA Sequence Analyses
The relative abundances of KO, Pfam, and COG protein families within each metagenome were approximated based on the cumulative RPKG of all CDS assigned to a given protein family. The relative abundance of COG categories was calculated by summing RPKG for all CDS assigned to COG numbers in each COG category.To infer n class="Chemical">carbon anpan>d pan> class="Chemical">nitrogen cycling potential of each microbial community, relative abundances of metabolic marker genes (KO numbers) defined in previous studies (Lauro et al., 2011; Llorens-Marès et al., 2015; Vavourakis et al., 2016) were calculated using marker KO RPKG ratios. A total of 40 marker genes were used in the analysis, representing 10 microbe-mediated elemental cycling reactions (Supplementary Table S3). Metabolic potential for each metagenome was calculated as in (Lauro et al., 2011) and averaged across replicates. The relative abundance of proteins (in NBSC) encoded by marker KO genes were similarly analyzed. Statistically significant differences in the relative abundances of element cycling processes between habitats were assessed using ANOVA with the Tukey–Kramer post hoc test and Benjamin-Hochberg FDR controlled at 5% using the “STAMP” software package (Parks et al., 2014).
OTU relative abundances based on 16S amplicon sequencing data were summed at Order-level taxonomic groupings for community divern class="Chemical">sity anpan>alyses. Biplots of principal componpan>enpan>t anpan>alyses (PCA) examining the beta-diverpan> class="Chemical">sity of communities were produced using the “pca3d” library in R (Weiner, 2017). Weighted Unifrac calculations, PCA, Mantel tests, Procrustes analyses, and permutational ANOVA tests were performed in R using the “vegan” package (Oksanen et al., 2017). Faith’s (1992) phylogenetic alpha-diversity was calculated using the “picante” package and significant differences between samples were defined using pairwise Welch’s t-tests. R scripts and input data for statistical calculations can be found in Supplementary Table S2.
n class="Chemical">Significanpan>tly discriminating taxonpan>omic anpan>d functionpan>al features betweenpan> communities were idenpan>tified upan> class="Chemical">sing the linear discriminant analysis (LDA) method implemented in “LEfSe” (Segata et al., 2011). “Enrichment” of a feature refers to one with an LDA score >2.0 (log 10) using the strict all-against-all multi-class analysis and a p < 0.05 cutoff for the among- and between-class LEfSe tests. LEfSe, by default, performs significance testing on normalized relative abundances. Thus, a correction variable was added to input files when the feature abundance was in RPKG, which is based on a unique genome equivalent metric calculated for each sample (Nayfach, personal communication).
Positive Selection Detection on Groups of Orthologous Proteins
ProteinOrtho was used to predict orthologous protein groups (OGs) shared between populations (Lechner et al., 2011). OGs were scanned for evidence of horizontal transfer un class="Chemical">sing GENECONV, anpan>d those with pan> class="Chemical">significant hits were not considered in further analyses (Sawyer, 1989; Posada and Crandall, 2001). The “ETE3” toolkit (Huerta-Cepas et al., 2016) was used to discover evidence of positive selection on OGs using Codeml and PAML. Clustal Omega with trimAL and RaxML were implemented for multiple codon sequence alignments and tree-building, respectively (Sievers et al., 2011; Stamatakis, 2014). Trees were visualized with the Interactive Tree of Life server (Letunic and Bork, 2016).
Models M1 vs. M2, M7 vs. M8, and M8 vs. M8a were used as sets of null and alternative models, respectively, for defining a threshold of n class="Chemical">significanpan>ce for popan> class="Chemical">sitive selection (Yang et al., 2000). A significant difference in dN/dS ratio (omega) based on likelihood ratio tests (LRT) of model outcomes with the BEB posterior probability method was used to infer the best model fit for a given OG alignment (Yang et al., 2005). LRT p-values from all tests were corrected by controlling at a FDR of 5%. If an OG had a corrected q-value < 0.05 under any of the three models, this gene group was considered to be under putative positive selection (e.g., as in Tang and Zhang, 2007).
Results
Community Taxonomic Structure and Diversity
Amplicon sequencing of 16S v4 loci and clustering at 97% identity yielded 6,792 OTUs from ∼240,000 total paired-end reads (>17,000 paired-end reads per replicate) across all four sampled habitat communities (Supplementary Table S4). The OTU-level Unifrac distance matrix and Order-level Euclidean distance matrix calculated from OTU counts yielded highly similar community beta-diverpan> class="Chemical">sity patterns (Mantel’s test, Pearson r = 0.962, p < 0.001), and PCA likewise resulted in similar data shape between these two metrics (Procrustes symmetric correlation = 0.982, p < 0.001). Therefore, community structure analysis was conducted using Euclidean PCA and biplotting of Order-level taxa relative abundances across samples to determine which groups most strongly drove differences in community structure (Figure ). PCA indicated that 23.4% of community structure variance was explained by sampling location and 20.1% by lifestyle (Figure ). However, only location significantly differentiated overall sample grouping, while lifestyle did not (Permutational ANOVA, perms = 999; df = 11; location: p = 0.002, R2 = 0.54; lifestyle: p = 0.14, R2 = 0.85). In contrast, community alpha-diversity was not different between the upper and lower estuary communities (p = 0.45). Rather, communities in the particle-associated lifestyle were significantly more diverse than their free-living counterparts both overall (t-test, p < 0.0001), and within each location (each p < 0.01; Supplementary Figure S3).Euclidean distance PCA with feature biplots of taxonomic community structure. Plotted data are relative abundance of 16S OTUs grouped at Order-level taxa. Sample triplicates per habitat (points) are plotted alongn class="Chemical">side a set of the most importanpan>t Orders driving sample separationpan> (vectors). Arrow directionpan> anpan>d magnitude represenpan>t both the covarianpan>ce betweenpan> features (Orders) anpan>d their effect onpan> PC loadings. pan> class="Species">Ellipses represent 95% confidence intervals for centroid positions. FL, free-living; PA, particle-attached.
Population-Level Habitat Preferences
Of the 6,792 OTUs recovered, 157 (2.3%) were overrepresented in one of the four estuary habitats based onn class="Chemical">significanpan>t relative abundanpan>ce chanpan>ges across samples upan> class="Chemical">sing linear discriminant analysis (LDA score >2). Despite being a minor subset of all OTUs, these populations represented 75.5% of all sampled 16S reads, indicating that relatively high-abundance populations exhibited habitat preference (Supplementary Table S5).
The three most abundant Orders in all samples were Pelagibacterales, Flavobacteriales, and Rhodobacterales, collectively representing 39–54% of total 16S reads (Supplementary Figure S4). The relative habitat preferences of several populations in these three lineages influenced divergent habitat taxonomic structures (Figure ). For example, the SAR11 clade member HTCC1062 (Pelagibacterales) was cumulatively the most abundant Yaquina Bay organism (OTUs 1, 14, 55) and exhibited clear habitat preference for the free-living fraction of the marine-influenced lower estuary (each OTU LDA score >3.2; Supplementary Table S5). In the Flavobacteriales and Rhodobacterales lineages, notable populations with n class="Chemical">significanpan>t habitat preferenpan>ce included a Cellulophaga (OTU 2; Flavobacteriales) in the free-living lifestyle of the upper estuary (LDA = 4.8) anpan>d a Thalassobius (OTU 4; Rhodobacterales) in the particulate-associated lifestyle of the upper estuary (LDA = 4.5).
An archaeal population of Nitrosopumilales (OTU 30), which represented >98% of estuary Thaumarchaeota, was strongly associated with the free-living fraction of the lower estuary (LDA score = 3.6), caun class="Chemical">sing this group to be as influenpan>tial as Pelagibacterales in driving unique taxonpan>omic structure of the habitat despite its lower relative abundanpan>ce (Figure ). Likewise, OTUs belonpan>ging to lineages that typically inhabit anpan>aerobic or microaerophilic enpan>vironpan>menpan>ts (e.g., Desulfovibrionpan>ales) were relatively rare community members (cumulatively 0.5–2.5%; Supplemenpan>tary Figure S5), but their enpan>richmenpan>t in the particle-associated habitats drove distinct alpha-diverpan> class="Chemical">sity (Supplementary Figure S4) and beta-diversity structure (Figure ) of this lifestyle in each estuary location.
Community Functional Structure
Assembly of ∼62 Gbp shotgunn class="Species">metagenome pan> class="Chemical">DNA sequence yielded ∼1 M contigs (1.8 Gbp) and ∼2.5 M predicted CDS across all four sampled habitats (Supplementary Table S4). Between 71 and 78% of CDS were assigned to a taxonomic group and between 32 and 58% were annotated to COG, KEGG, or Pfam protein families. Functional structure based on KEGG-annotated CDS relative abundance was significantly correlated to taxonomic structure inferred from 16S analyses (Mantel’s test, Pearson r = 0.832; p < 0.0001), leading to the similar ordination pattern (Figures , ).
Euclidean distance PCA of functional community structure. Plotted data are KO RPKG relative abundance across samples based onn class="Species">metagenome CDS anpan>notationpan>. pan> class="Species">Ellipses represent 95% confidence intervals for centroid positions. FL, free-living; PA, particle-attached.
Of the 4,764 KEGG protein families identified across all n class="Species">metagenomes, 432 (9.1%) were biased in their distributionpan> betweenpan> samples (Supplemenpan>tary Table S6). Several functionpan>al pathways showed clear patterns of habitat enpan>richmenpan>t based onpan> the distributionpan> of KEGG families (Figure anpan>d Supplemenpan>tary Figure S6), including photosynthepan> class="Chemical">sis in the particulate-associated fractions (Supplementary Figure S6A), ABC transport in the free-living fractions, flagellar motility in the upper estuary (Supplementary Figure S6C), TCA cycle and glycolysis in the lower estuary (Supplementary Figures S6D,E), and ribosomal and transcription in the upper estuary (Supplementary Figures S6F,G).
Community Carbon and Nitrogen Cycling Capacities
Out of 40 marker genes for 13 biogeochemical flux processes, 31 genes and 10 processes were identified in our n class="Species">metagenome assemblies (Supplemenpan>tary Table S3). Most mispan> class="Chemical">sing genes were involved in processes not expected to occur at detectable levels in aerobic estuarine surface water (anammox and methanogenesis). Of the 31 metagenome-encoded marker genes, 21 proteins were identified in metaproteomes (Supplementary Table S3). The relative abundance of marker genes and marker proteins within each habitat were correlated, indicating that the gene abundance for C- and N-cycling processes was generally a useful indicator of protein abundance within our samples (Figure ).
Major processes within the (A) C- or (B) N-cycles and relative abundance of biological marker genes that catalyze fluxes. Arrow n class="Chemical">sizes show relative abundanpan>ce of total marker genpan>e RPKG for each process. Bar plots show marker genpan>es (left-axis) anpan>d lines plots show marker proteins (right-axis). Dotted arrows represenpan>t a process with no recovered marker genpan>es or proteins. Axis labels anpan>d scales are equivalenpan>t for all plots anpan>d onpan>ly drawn onpan> onpan>e plot for clarity. pan> class="Chemical">Significantly non-equal potential across habitats was determined with a FDR-corrected ANOVA: q < 0.05). Error bars represent standard error of the mean for each set of replicate samples. MP, metaproteome; MG, metagenome in ANOVA results.
As expected, the encoded genetic capacity (n class="Species">metagenome) for aerobic C respirationpan> is likely the dominanpan>t pathway of estuary C flux anpan>d was equal across all habitats (FDR-corrected ANOVA, q = 0.18; Figure ). Aerobic C fixationpan> potenpan>tial, onpan> the other hanpan>d, was pan> class="Chemical">significantly higher in the particle-associated fractions of both locations, and marker gene relative abundance for anaerobic C fixation and fermentation were overrepresented in particle-attached community of the lower estuary (Figure ). Notably, carbon monoxide (CO) oxidation genes were highly enriched in upper estuary samples, and over 68% of CO oxidation capacity was annotated to the Roseobacter clade (e.g., Roseovarius, Ruegeria).
With regard to N-cycling processes, the encoded capacity for N asn class="Chemical">similationpan> is likely the dominanpan>t pathway of estuary N flux anpan>d was equal across all habitats (FDR-corrected ANOVA, q = 0.13; Figure ). Importanpan>tly, though, enpan>coded community capacity for N mineralizationpan> appeared to be highest in the particle-attached lifestyles in each locationpan>. Marker genpan>e anpan>d marker protein relative abundanpan>ces suggested enpan>richmenpan>t in ammonpan>ificationpan> anpan>d denpan>itrificationpan> potenpan>tial in the marine-influenpan>ced lower estuary, particularly in the particle-associated communities (Figure ). Onpan> the other hanpan>d, marker genpan>e abundanpan>ce for N fixationpan> was pan> class="Chemical">significantly enriched in the upper estuary, though detected proteins did not show this trend.
A comparison of the taxonomic distribution of 16S rRNA reads, n class="Species">metagenome CDS, anpan>d the N or C marker genpan>es showed that manpan>y taxa appear to conpan>tribute disproportionpan>ately to community biogeochemical cycling capacity compared to their relative abundanpan>ce in the community (Supplemenpan>tary Figure S7). These taxa anpan>d processes included ammonpan>ificationpan> in Eppan> class="Chemical">silon-proteobacteria (ammonification: 85%, 16S: 0.4%), fermentation in Aquificae and Planctomycetia (fermentation: 63%, 16S: 0.02%; fermentation: 9%, 16S: 0.7%), N fixation in Alpha-proteobacteria (N fixation: 81.3%, 16S: 15.3%), and aerobic C fixation in Eukaryota (C fixation: 10%, 16S: 1.4%).
Population-Level Life-Strategies
Flavobacteriales and Rhodobacterales were ubiquitous, though unevenly distributed (Figure ), across all samples; their combined pangenome (>450,000 CDS) and panproteome (>4,500 proteins) represented 24.7 and 31.4% of total sampled genes and proteins, respectively (Figure ). Fifteen well-curated population Flavobacteriales and Rhodobacterales genomes were defined from the n class="Species">metagenome conpan>tig binning anpan>d curationpan> process, while bins from the other relatively abundanpan>t taxonpan>, Pelagibacterales, could not be eapan> class="Chemical">sily resolved (data not shown). This was reflected in 16S v4 diversity; of OTUs over 0.1% relative abundance, three were in Pelagibacterales, while 18 were Flavobacteriales and 10 were Rhodobacterales. The robust dataset of genes, proteins, and curated genomes made Flavobacteriales and Rhodobacterales lineages good candidates to test whether co-dominance across the estuary was corollary to divergent strategies for growth and resource acquisition, which is a hypothesis that has been explored for co-dominant microbial populations in other ecosystems (e.g., Violle et al., 2011).
Comparison of (A) pangenome and (B) panproteome content of Rhodobacterales (Rb) and Flavobacteriales (Fb). Plots show cumulative relative abundance of lineage-specific (A) CDS and (B) proteins annotated to n class="Chemical">COG numbers anpan>d grouped at pan> class="Chemical">COG category level (represented by letters as points on each plot). Input and results of correlation analysis from genomic and proteomic data are shown above each plot. The inset in (B) is an expansion of the origin region (0–0.02 on x-axis and 0–0.015 on the y-axis). COG categories are colored according to their classification: blue, information storage and processing; green, metabolism; red, cellular processes and signaling; black, unclassified. Dotted lines indicate a 1:1 relationship. COG categories shown but not labeled are: D, cell cycle control, cell division, chromosome partitioning; M, cell wall/membrane/envelope biogenesis; N, cell motility; O, post-translational modification, protein turnover, and chaperones; T, signal transduction mechanisms; U, intracellular trafficking, secretion, and vesicular transport; V, defense mechanisms; B, chromatin structure and dynamics; L, replication, recombination and repair; C, energy production and conversion; F, nucleotide transport and metabolism; I, lipid transport and metabolism; R, general function prediction only; S, function unknown.
While the Flavobacteriales and Rhodobacterales pangenomes, as determined by the relative abundance of all n class="Species">metagenome CDS anpan>notated to each group, were correlated at the conpan>servative functionpan>al claspan> class="Chemical">sification level of COG category (Pearson’s ρ = 0.81), their panproteomes, as determined by the relative abundance of all proteins annotated to each group, were weakly correlated at this broad resolution (ρ = -0.07), indicating that proteins of different functions were being expressed by each group during sampling (Figure ).
The panproteome of Rhodobacterales populations was relatively enriched for metabolism functions, with over 50% of proteins annotated to amino acid (E), n class="Chemical">carbohydrate (G), nucleotide (F), anpan>d seconpan>dary metabolite (Q) tranpan>sport anpan>d metabolism pan> class="Chemical">COG categories (Figure ). In contrast, Flavobacteriales populations devoted just 6.8% of proteins to these four COG categories and were instead enriched in translation (J, 27.4%) and transcription (K, 16.7%) functions relative to Rhodobacterales (7.4 and 2.1%, respectively). Flavobacteriales inorganic ion transport and metabolism proteins was also highly disproportionate (P, 27.0%) relative to Rhodobacterales (3.1%), though the majority of these proteins (>93%, predominately SusC and SusD) are involved in the Bacteroidetes starch utilization system (Reintjes et al., 2017).
Evolutionary Pressure on Life-Strategies
The 10 Flavobacteriales and five Rhodobacterales population genomes recovered by n class="Species">metagenome conpan>tig binning had anpan> average completenpan>ess of 93.84% anpan>d anpan> average conpan>taminationpan> of 2.68% (Supplemenpan>tary Table S7). Inpan> total, 73 anpan>d 138 pan> class="Chemical">single-copy, orthologous gene groups (OGs) were shared in the Flavobacteriales and Rhodobacterales populations, respectively. A concatenated amino acid alignment of these OGs was used to estimate the phylogenetic relationship between populations in each lineage (Supplementary Figure S8), and positive selection on OGs was determined based on the classic dN/dS substitution ratio (Yang et al., 2000). Positive selection was observed in seven (9.6%) Flavobacteriales and 10 (7.2%) Rhodobacterales OGs (Table ). Model statistics are reported in Supplementary Tables S8, S9. Similar rates have been reported in other non-pathogenic, or host-dependent, microbial populations (e.g., 4.5% in Nandi et al., 2010; 9% in Bulgarelli et al., 2015).
Genes in the Rhodobacterales and Flavobacteriales lineages that are under putative pon class="Chemical">sitive selection.
In Rhodobacterales, over half of genes under selection were related to resource transport and metabolism (Table ). The pon class="Chemical">sitively selected genpan>es glnG, pan> class="Gene">mgtE, phoU, and aapM code for proteins that regulate or directly facilitate the import and utilization of metabolic substrates. In addition, gene sequences of the metabolic enzymes chlN and argF, as well as a predicted sulfurtransferase, thioesterase, and amidase, each were under positive selection.
In contrast, pon class="Chemical">sitively selected genpan>es in the Flavobacteriales were typically aspan> class="Chemical">signed to cell growth, repair, and replication functions (Table ). A histidyl-tRNA synthetase gene (hisS), a DNA replication and repair gene (recR), and two outer-membrane biogenesis and protein folding chaperones (hlpA and ftsI) were each under positive selection. The positively selected gene purB encodes the enzyme adenylosuccinate synthetase, which is involved in purine nucleotide biosynthesis during cellular replication and division.
Discussion
The taxonomic structure of n class="Chemical">estuarine bacterioplanpan>ktonpan> communities is knpan>own to vary across salinity gradienpan>ts (e.g., Crump et al., 1999; Smith et al., 2013) anpan>d betweenpan> particle-attached anpan>d free-living fractionpan>s (e.g., D’Ambropan> class="Chemical">sio et al., 2014; Zhang et al., 2016), but little is known about the physiological capabilities that results from this variability. Our study expands on this previous work by quantifying the functional capabilities and activities of taxa present under four distinct spatially separated and size-fractioned habitat conditions in the Yaquina Bay estuary.
Un class="Chemical">sing ampliconpan> sequenpan>cing of the v4 regionpan> of the 16S genpan>e, we determined that microbial community-level taxonpan>omic structure in Yaquina Bay was distinct betweenpan> the upper anpan>d lower estuary margins anpan>d the free-living anpan>d particle-attached lifestyles in each locationpan>. Ordinationpan>s of these patterns in bacterial diverpan> class="Chemical">sity based on 16S amplicon sequencing changed little using OTU-level or Order-level annotations and were each similar to taxonomic structure as determined by metagenome CDS sequence annotation (Supplementary Figure S9). Overall, we found patterns in the distributions of microbial populations that were similar to those in other temperate estuaries (e.g., Bouvier and del Giorgio, 2002; Ortmann and Santos, 2016).
The Relative Abundance of Key Taxonomic Groups Distinguished the Four Sampled Habitats
Two highly abundant lineages, the Flavobacteriales and Rhodobacterales, contained 11 and 19 OTU-level populations that exhibited distinct habitat preferences. Both lineages had multiple OTUs enriched in each of the four habitats, resulting inno clear consensus of Order-level habitat preference. However, their two most abundant populations, a Cellulophaga (Flavobacteriales) and a Thalassobius (Rhodobacterales), were each relatively enriched in the upper estuary, leading to the n class="Chemical">significanpan>t influenpan>ce of Flavobacteriales anpan>d Rhodobacterales in driving upper estuary communities apart from those inhabiting the lower estuary.
Pelagibacterales were also highly abundant in the estuary, but unlike Flavobacteriales and Rhodobacterales, the three Pelagibacterales OTUs exhibited strong habitat preferences, each being highly enriched in the planktonic fraction of the marine-influenced lower estuary margin and being most depleted in the particle-associated lifestyle of the upper estuary. These populations, which were all clasn class="Chemical">sified as SAR11 clade members, were most closely related to coastal (Ortmanpan>n anpan>d Sanpan>tos, 2016), rather thanpan> brackish ecotypes (Smith et al., 2013), suggesting that they may have enpan>tered the estuary by tidal intrupan> class="Chemical">sion. This distribution of Pelagibacterales is consistent with the well-characterized planktonic lifestyle of SAR11 Subclade I (Giovannoni et al., 1990) and with its observed decrease in its relative abundance across another marine-estuary-river continuum (Ortmann and Santos, 2016).
Nitrosopumilales enrichment in the marine-influenced lower estuary (Hewson et al., 2014; Hugerth et al., 2015) and Burkholderiales in the riverine-influenced upper estuary (Bouvier and del Giorgio, 2002; n class="Chemical">Silveira et al., 2011) also corroborate previous findings of microbial populationpan> habitat ranpan>ge. Similarly, the high relative abundance of anaerobic or microaerophilic populations on suspended particulates is a common observation in estuarine environments (Crump et al., 1999; Waidner and Kirchman, 2007; Zhang et al., 2016), and may be the result of oxygen-depleted niches in particle biofilms (Dang et al., 2011), or suspension of colonized grains from the anoxic sediment by physical mixing (Crump and Baross, 1996; Baker et al., 2015). The specialist populations in particle-associated communities contributed to a significantly higher phylogenetic and metabolic diversity in the community associated with these samples, which is consistent with studies of suspended particulate material in other aquatic ecosystems (Smith et al., 2013; Dang and Lovell, 2015; Yung et al., 2016).
Overall, the results from community taxonomic structure analyses substantiate those from previous studies showing that both lateral gradients (Crump et al., 2004; Henriques et al., 2004) and filtration fraction of the n class="Chemical">water-column (Crump et al., 1999; Zhanpan>g et al., 2016) are pan> class="Chemical">significant predictors of microbial community structure in estuaries.
Distinct Taxonomic Structure of Communities in Each Habitat Was Correlated With Distinct Functional Structure
We hypothen class="Chemical">sized that the distinct community taxonpan>omic structure of each habitat would corresponpan>d to distinct functionpan>al structure, leading to differenpan>ces in the capacity of each community to perform ecological processes such as pan> class="Chemical">carbon and nitrogen turnover. If supported, this habitat-specific ecological function may indicate how biogeochemical cycling in the estuarinewater-column is spatially organized and provide a more complete understanding of how resources move through this ecosystem. If no significant functional differences were evident between habitats, it would suggest that communities encode functional redundancy despite their distinct taxonomic structure, leading to a uniform spatial distribution of biogeochemical cycling capacity and a dissociation between microbial population diversity and functional diversity (Allison and Martiny, 2008; Delgado-Baquerizo et al., 2016).
To determine community functional structure, we sequenced n class="Species">metagenomes of all samples anpan>d measured relative abundanpan>ces of protein-coding genpan>e families. Community functionpan>al structure was pan> class="Chemical">significantly correlated with taxonomic structure, leading to similar ordination shapes of OTU and KEGG relative abundance data. Lower estuary communities were enriched in central carbon metabolism functions (i.e., glycolysis and TCA cycle), while upper estuary communities were enriched in cell growth, transcription, and translation genes (Supplementary Figure S6). This result tracked with the distribution of microbial taxa with different trophic strategies: genomes of copiotrophic populations, such as those in the Rhodobacterales and Flavobacteriales lineages that were enriched in the upper estuary, typically encode relatively more transcriptional and ribosomal genes in their larger genomes, whereas oligotrophs, such as Pelagibacterales that were enriched in the lower estuary, often encode streamlined genomes with higher proportions of genes involved in resource transport and metabolism (Cottrell and Kirchman, 2016). Similar observations have been made across physico-chemical gradients in the brackish Baltic Sea, where key pathways and core metabolic processes were organized spatially by salinity gradients (Dupont et al., 2014).
n class="Chemical">Similarly, the relative enpan>richmenpan>t of photosynthetic genpan>es in particle-associated communities was related to the increased relative abundanpan>ce of Cyanpan>obacteria anpan>d chloroplast-like 16S v4 rRNA sequenpan>ces in this pan> class="Chemical">size-fraction. In contrast, ABC transporters were overrepresented in free-living communities, which may be related to the enrichment of planktonic prokaryotes (e.g., SAR11, whose genome has a high density of transporter functions) in this water-column filtration fraction.
Enrichment for n class="Disease">flagellar motility genpan>es in the upper estuary differenpan>tiated the functionpan>al structure of communities in this margin from those in the lower estuary, which again may reflect the biased distributionpan> of copiotrophic anpan>d oligotrophic lineages. The pan> class="Chemical">single flagellar motility gene that was overrepresented in the lower rather than upper estuary was an archaeal flagella subunit (K07332). This was expected considering that all 7 archaeal OTUs with significant habitat preferences were enriched in the marine-influenced lower estuary margin (OTUs 6, 30, 35, 42, 78, 161, 256).
Communities Differed in Capacities for Carbon and Nitrogen Transformation
Based on these n class="Chemical">significanpan>t biases in the distributionpan>s of microbial community functionpan> across habitats, we hypothepan> class="Chemical">sized that microbe-mediated biogeochemical flux would be habitat-specific, leading to the spatial distribution of carbon- (C) and nitrogen- (N) cycling processes in the estuary. To test this hypothesis, we used metagenome and metaproteome data to estimate the relative abundance of marker genes and proteins involved in central C and N utilization pathways within each habitat community. Relative abundance ratios of these 31 functional markers were used to infer their capacity for catalyzing 10 major C- and N-cycling processes.
Out of the 31 n class="Species">metagenome marker genpan>es, we detected 21 marker proteins in our samples, reflecting the difficulty in capturing total protein diverpan> class="Chemical">sity from complex communities and the low recovery of protein fragments such as membrane-spanning domains (Williams and Cavicchioli, 2014). Regardless, all 10 C- and N-cycling processes we examined were represented by at least one marker protein, and the relative abundances of marker proteins generally tracked those of the marker genes that encoded them.
The two most abundant metabolic pathways, aerobic C respiration and N asn class="Chemical">similationpan>, had equal capacity across all habitat pan> class="Species">metagenomes and metaproteomes, suggesting that differences in taxonomic composition did not affect the capacity of the community to perform these functions. On the other hand, the relative abundance of markers for processes requiring specific environmental conditions was not conserved across habitats. For example, denitrification and fermentation genes and proteins were relatively more abundant in the particle-associated communities, especially in the lower estuary margin. The marker genes for these processes were predominately annotated to facultative or strictly anaerobic OTUs, which correspondingly were enriched in the particle-attached communities of both the upper and lower estuary, indicating that these metabolic specialists could be biogeochemically important members of the particle-associated communities (Jørgensen, 1977; Etcheber et al., 2007; Garnier et al., 2010; Zhang et al., 2016).
The relative abundance of encoded marker genes and expressed marker proteins for CO oxidation suggested that this process has the potential to contribute n class="Chemical">significanpan>tly to the C cycle of Yaquina Bay. CO oxidationpan> capacity was highest in the upper estuary communities, anpan>d the majority (68%) of marker genpan>es for this process were anpan>notated to Roseobacter populationpan>s. Manpan>y OTUs in this clade were pan> class="Chemical">significantly enriched in the upper estuary (e.g., OTU 15, Litoreibacter; OTU 4, Thalassobius), and indeed Roseobacters have been proposed to be the major drivers of biotic CO oxidation in coastal ecosystems (Moran et al., 2004; Tolli et al., 2006). Estuarine CO is primarily produced through the photolysis or oxidation of particulate matter (Miller et al., 2002; Song et al., 2015), and the Yaquina Bay estuary and Yaquina river are known to be supersaturated in CO throughout the year (Butler et al., 1987). Thus, we hypothesize that a steady supply of CO from decaying particulate organic matter entering Yaquina Bay by riverine transport is consumed by Roseobacters in the upper estuary. Given the relative magnitude of this biogeochemical process based on marker gene and protein abundance, CO oxidation may be a key characteristic of upper estuary ecology, and changes in Roseobacter abundance or CO production may significantly change the flux of C from Yaquina Bay.
Unexpectedly, we did not detect KO marker genes for nitrification (K10944, K10945, or K10946), despite finding that the Nitrosopumilales lineage represented 0.6% of all 16S amplicons. When conn class="Chemical">sidering sample pan> class="Species">metagenomes, however, the relative abundance of genes annotated to these nitrifiers was an order of magnitude lower than 16S relative abundance (0.06%) and most recovered Nitrosopumilales genes were of housekeeping or unknown function (data not shown). This result indicates that the absence of nitrification markers in our samples may have been due to under-sampling of the genetic diversity in the system, rather true biological absence.
Importantly, our results highlight potential keystone organisms in the microbial food webs of the estuary surface n class="Chemical">water. For example, populationpan>s of Eppan> class="Chemical">silon-proteobacteria (predominately of Sulfurospirillum) represented an ammonification marker gene abundance that was highly disproportional to their relative community abundance. Significant contribution to biogeochemical cycling by such populations in low-abundance lineages has been supported by studies examining rare biosphere activity (Campbell et al., 2011).
Microbial Populations Appear to Use Varying Strategies to Succeed in the Estuarine Environment
To understand what characteristics may be allowing microorganisms to coexist across these four distinct habitats, we examined the pangenome and panproteome of the Flavobacteriales and Rhodobacterales lineages. The co-dominance of these lineages is a common observation in n class="Chemical">estuarine ecosystems (Donpan>g et al., 2014; Colatrianpan>o et al., 2015), anpan>d they have beenpan> shown in experimenpan>tal anpan>d observationpan>al studies to grow anpan>d acquire resources upan> class="Chemical">sing different strategies (Alonso-Sáez and Gasol, 2007; Poretsky et al., 2010; Bryson et al., 2016, 2017). Thus, we hypothesized that their ability to coexist across the estuary may be due to the use of different strategies for growth and reproduction (Hibbing et al., 2010). This hypothesis is based on the phylogenetic limiting theory, which states that when populations minimize their overlap in life-strategy, deleterious interactions can be reduced between them, curtailing forces such as competitive exclusion that drive species with similar life-strategies out of a single habitat (Violle et al., 2011).
Our results supported the divergence in life-strategies of Flavobacteriales and Rhodobacterales, which we inferred based on the n class="Chemical">significanpan>tly differenpan>t types of proteins expressed by each lineage at the time of sampling. Rhodobacterales populationpan>s dedicated a relatively larger part of their proteome to metabolic versatility anpan>d resource acquipan> class="Chemical">sition, while Flavobacteriales were enriched in transcription/translation and specialized starch transport functions. Similar patterns of protein-level expression and de-novo protein production have been recorded previously in these lineages using stable-isotope-probing experiments in Monterey Bay, in which Rhodobacterales accounted for the highest total amount of substrate assimilation (glucose, amino acids, peptides), while Flavobacteriales bloomed under all substrates but were significantly isotope-enriched only in a starch treatment (Bryson et al., 2017).
Interestingly, the types of functions under pon class="Chemical">sitive evolutionpan>ary selectionpan> in each lineage, which we determined upan> class="Chemical">sing high-quality population genomes acquired from metagenome binning, appeared to be those that were related their respective life-strategies. This suggests that evolutionary selection may be a mechanism that maintains lineage life-strategies over time (Violle et al., 2011), allowing for the stable coexistence of Flavobacteriales and Rhodobacterales across the habitats we sampled. Indeed, the relationship between life-strategy and evolutionary selection has been proposed to contribute significantly to present-day observations of microbial function and community assembly (Luo and Moran, 2015). However, because our sampling was designed to focus on spatial rather than temporal investigation, it will be necessary to verify our interpretations over time (e.g., seasonal) or in other estuarine or coastal ecosystems.
Taken together, results from the investigation of Order-level pangenomes and panproteomes showed that the highly abundant and co-occurring Flavobacteriales and Rhodobacterales lineages were differentiated from one another not by broad functional groups that were encoded at the genome level, but by functions they expressed at the protein level. Results from the examination of evolutionary selection on genes from highly resolved, population-level genomes in each lineage showed that pon class="Chemical">sitive selectionpan> onpan> OGs enpan>coded in Flavobacteriales anpan>d Rhodobacterales appeared to be acting onpan> genpan>e sequenpan>ces with the types of functionpan>s that conpan>fer the life-strategy of both groups (i.e., genpan>es in Table were biased toward growth anpan>d replicationpan> functionpan>s for Flavobacteriales anpan>d metabolism-related functionpan>s in Rhodobacterales). This is the first such example of a conpan>nectionpan> betweenpan> life-strategy anpan>d evolutionpan>ary selectionpan> in pan> class="Chemical">estuarine microorganisms, though a link between functions under positive selection and functions that are important in defining the life-strategy of microbial populations has been observed previously, including studies of deep sea bacteria (Campanaro et al., 2008) and extremophilic archaea (Gunbin et al., 2009).
Data Availability
All 16S-amplicon and metagenome sequences are available from NCBI (BioSample SAMN04917373, BioProject PRJNA320136, short read archive SRS1422236 with accession numbers SRX1738750-90 for 16S-amplicon reads and SRX1738728-32 for metagenome reads). The metaproteome protein data are available from the PRIDE archive under ID PXD008093.
Author Contributions
BK, SB, XM, RH, CP, and RM den class="Chemical">signed the experimenpan>ts. BK, SB, XM, anpan>d RM coordinated anpan>d carried out the sampling procedures. BK, ZL, RH, CP, anpan>d RM performed the anpan>alytical procedures anpan>d acquired data. All authors conpan>tributed to discuspan> class="Chemical">sion of results. ZL, RH, CP, and RM contributed reagents, materials, and technology. BK and RM designed and prepared figures and tables and drafted the original manuscript. All authors contributed critical revisions during the editing process, and all authors approved the final manuscript draft for submission.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Table 1
Genes in the Rhodobacterales and Flavobacteriales lineages that are under putative positive selection.
Authors: Davide Bulgarelli; Ruben Garrido-Oter; Philipp C Münch; Aaron Weiman; Johannes Dröge; Yao Pan; Alice C McHardy; Paul Schulze-Lefert Journal: Cell Host Microbe Date: 2015-02-26 Impact factor: 21.023
Authors: Juan Antonio Vizcaíno; Attila Csordas; Noemi del-Toro; José A Dianes; Johannes Griss; Ilias Lavidas; Gerhard Mayer; Yasset Perez-Riverol; Florian Reisinger; Tobias Ternent; Qing-Wei Xu; Rui Wang; Henning Hermjakob Journal: Nucleic Acids Res Date: 2015-11-02 Impact factor: 16.971
Authors: Kristina D Baker; Colleen T E Kellogg; James W McClelland; Kenneth H Dunton; Byron C Crump Journal: Front Microbiol Date: 2021-02-12 Impact factor: 5.640
Authors: Brandon Kieft; Zhou Li; Samuel Bryson; Robert L Hettich; Chongle Pan; Xavier Mayali; Ryan S Mueller Journal: Proc Natl Acad Sci U S A Date: 2021-10-12 Impact factor: 11.205
Authors: Megan M Morris; Jeffrey A Kimbrel; Haifeng Geng; Mary Bao Tran-Gyamfi; Eizadora T Yu; Kenneth L Sale; Todd W Lane; Xavier Mayali Journal: mSphere Date: 2022-06-22 Impact factor: 5.029
Authors: J E Ruprecht; W C Glamore; K A Dafforn; F Wemheuer; S L Crane; J van Dorst; E L Johnston; S M Mitrovic; I L Turner; B C Ferrari; S C Birrer Journal: Sci Data Date: 2020-05-27 Impact factor: 6.444