Literature DB >> 26784354

Tuning fresh: radiation through rewiring of central metabolism in streamlined bacteria.

Alexander Eiler¹, Rhiannon Mondav¹, Lucas Sinclair¹, Leyden Fernandez-Vidal¹, Douglas G Scofield², Patrick Schwientek³, Manuel Martinez-Garcia⁴, David Torrents^5,6, Katherine D McMahon^7,8, Siv Ge Andersson⁹, Ramunas Stepanauskas⁴, Tanja Woyke³, Stefan Bertilsson¹.

Abstract

Most free-living planktonic cells are streamlined and in spite of their limitations in functional flexibility, their vast populations have radiated into a wide range of aquatic habitats. Here we compared the metabolic potential of subgroups in the Alphaproteobacteria lineage SAR11 adapted to marine and freshwater habitats. Our results suggest that the successful leap from marine to freshwaters in SAR11 was accompanied by a loss of several carbon degradation pathways and a rewiring of the central metabolism. Examples for these are C1 and methylated compounds degradation pathways, the Entner-Doudouroff pathway, the glyoxylate shunt and anapleuretic carbon fixation being absent from the freshwater genomes. Evolutionary reconstructions further suggest that the metabolic modules making up these important freshwater metabolic traits were already present in the gene pool of ancestral marine SAR11 populations. The loss of the glyoxylate shunt had already occurred in the common ancestor of the freshwater subgroup and its closest marine relatives, suggesting that the adaptation to freshwater was a gradual process. Furthermore, our results indicate rapid evolution of TRAP transporters in the freshwater clade involved in the uptake of low molecular weight carboxylic acids. We propose that such gradual tuning of metabolic pathways and transporters toward locally available organic substrates is linked to the formation of subgroups within the SAR11 clade and that this process was critical for the freshwater clade to find and fix an adaptive phenotype.

Entities: CellLine Chemical Disease Species

Mesh：

Substances：
Carbon

Year: 2016 PMID： 26784354 PMCID： PMC5029164 DOI： 10.1038/ismej.2015.260

Source DB: PubMed Journal: ISME J ISSN： 1751-7362 Impact factor: 10.302

Introduction

Recent genome sequencing suggest that bacteria and archaea with streamlined genomes are often numerically dominant in nature, with marine and freshwater bacteria as prominent examples (Hahn ; Garcia ; Swan ; Eiler ; Giovannoni , 2005b). Streamlined genomes are characterized by small size (<1.5 Mb), low guanine cytosine (GC) content (<40%) and short intergenic regions framing a restricted set of genes (Giovannoni ). The few and typically short intergenic regions, combined with a reduction in complexity of regulatory circuits, has posed the question about the capability of such streamlined cells and their populations to respond and adapt to environmental cues. Such genome minimalism has also been linked to unusual nutritional requirements, including auxotrophy for some vitamins and essential amino acids (Giovannoni ; Tripp , 2009; Schwalbach ; Carini , 2014). Other apparent features are scarcity or absence of gene duplications, extrachromosomal elements and remnants of genes and phages (Giovannoni ). The proposed selective drive for this minimalism is efficient use of limiting resources (for example, C, N, P) in nutrient-poor systems (Giovannoni ), which can cause genome reduction when combined with large effective population sizes (Lynch, 2007). These features, particularly the reduced genetic repertoire of individual cells, raises the question of how such organisms can maintain sufficient metabolic flexibility to radiate or possibly cross biome boundaries. A potential explanation is provided by recent genomic analyses of Prochlorococcus where bacterial populations with streamlined genomes were shown to contain considerable sequence variation in shared genes and to harbor a high level of heterogeneity in gene content (Kashtan ). A combination of point mutations and homologous recombination were suggested to be responsible for most of the allelic diversity, whereas homologous and non-homologous recombination were implicated in the loss and gain of genes. The resulting combined diversity of streamlined genomes in these large populations thus could provide the means to cross biome boundaries, such as moving from oceans to freshwaters, with dramatic differences in solutes, salinity and resources, thereby forming a strong ecological and evolutionary barrier. Similar to larger organisms, there is little overlap in the taxonomic composition of marine and freshwater microbial assemblages (Suzuki and DeLong, 2002; Logares ; Newton ). Nevertheless, phylogenetic analyses have uncovered some surprisingly close relatives among marine and freshwater microbiota (Logares ; Newton ). Both marine and freshwater bacterioplankton communities are frequently dominated by alphaproteobacteria of the SAR11 lineage (Morris ; Rusch ; Eiler ; Newton ; Salcher ; Heinrich ). Members of the marine SAR11 clades are characterized as slow growing oligotrophs (Giovannoni ), and they are believed to have a central role in the global cycling of both carbon, sulfur and associated elements (Malmstrom ). Among their adaptations to oligotrophic marine environments are proteorhodopsin-mediated phototrophic potential (Giovannoni ), reliance on externally supplied reduced sulfur (Tripp ), auxotrophy for essential amino acids (Tripp ) and lineage-defined metabolic restrictions for a wide range of labile low molecular weight organic metabolites, including C1 compounds (Schwalbach ; Sun ; Carini ). Freshwater SAR11 are phylogenetically distinct from their marine relatives and group into the uncultivated clade named ‘LD12' (Zwart ). LD12 is recognized as a globally distributed freshwater group (Newton ) that can make up 20–40% of the total bacterioplankton in lakes (Salcher ; Heinrich ), and isotope tracer experiments have verified that they can grow heterotrophically (Salcher ). Here we compared the metabolic potential of 10 sequenced single amplified LD12 genomes obtained from freshwater lakes (Zaremba-Niedzwiedzka ) with 18 marine SAR11 genomes, 4 single amplified genomes and 14 from isolates and originating from different coastal and oceanic habitats (Giovannoni ; Tripp ; Grote ; Viklund ). Previous comparisons of these genomes have revealed lower recombination frequencies in the freshwater LD12 genomes (Zaremba-Niedzwiedzka ) but not detailed the metabolic changes associated with the transition from marine to freshwater environments. Our analyses reveal a reduction of metabolic diversity and lower pathway redundancy in freshwater LD12 compared with marine SAR11, combined with a rewiring of the central metabolism. We propose that the process leading up to the marine–freshwater transition in SAR11 was more likely facilitated by recombination in the large and diverse populations of the marine ancestors than by gene family expansion or horizontal acquisition of metabolic modules from non-SAR11 lineages. Thus our results unveil an unknown chapter in the evolutionary history of these ubiquitous and successful bacterial groups in aquatic environments.

Methods

Genome data sets

The draft assemblies of the single amplified genomes (SAGs) and genome-sequenced SAR11 isolates were obtained from intergrated microbial genome (IMG)/ER (Markowitz ). Protein sequences for the recently described pathway of dimethylsulfonipropionate (DMSP) demethylation were retrieved for Ruegeria pomeroyi and HTCC1062 as identified by Reisch . These were used as queries in BLASTP searches against the 18 marine and 10 freshwater SAR11 genomes.

Genome completeness

Genome size and completeness were estimated using a conserved single-copy gene (CSCG) set of SAR11 isolates (Markowitz ). The set consists of 357 CSCGs that were found to occur only once in the genomes by analysis of an abundance matrix based on hits to the protein family (Pfam) database (Punta ). Hidden Markov models of the identified Pfams were used to search all single amplified genome assemblies by means of the HMMER3 software (Eddy, 2011). Resulting best hits above the trusted cutoff (TC field as provided in the HMM files from Pfam) were counted, and the completeness was estimated as the ratio of found CSCG to total CSCGs in the set. The estimated complete genome size was then calculated by dividing the estimated genome coverage by the total assembly size.

Phylogenetic reconstructions

Orthologous genes were identified by running a BLASTP similarity search using the set of all marine and freshwater genes as the query and the database. Subsequently, we applied the Markov cluster algorithm (Enright ) to the pairwise matrix of bit scores. Out of the 3598, we selected the clusters that had at least one gene pertaining to each of the seven families (‘Ia.1', ‘II', ‘Ia.2', ‘V', ‘IIIa', ‘Ic', ‘IIIb/LD12') and no more than 30 genes in them, leaving us with 518 clusters. Genes were then aligned with muscle (Edgar, 2004), masked with gblocks (Castresana, 2000) and trees were built with RaxML (Stamatakis, 2006) using the ‘PROTGAMMAJTT' model with an automatically determined number of quick bootstraps, thus resulting in one tree per cluster. At this point, we removed one cluster as it contained only gaps. For the master ribosomal protein tree, we extracted 50 ribosomal proteins and subsequently selected only those that had at least one representative in each SAR11 subgroup (N=40). After alignment of randomly selected representatives from each SAR11 subgroup, we concatenated the alignments to form the so-called ‘master ribosomal alignment'. A RAxML tree was constructed using the ‘PROTGAMMAJTT' model with an automatically determined number of quick bootstraps. The code produced for this analysis is available under an MIT license at http://github.com/limno/ld12.

Closest hits and gene family expansions

The first hit for every freshwater gene is obviously itself (though some genes have no other hits). Searching for the first hit that is not from any of the freshwater genomes, we extracted its taxonomy. If this top-hit-non-fresh is a marine SAR11 genome, we resolve it to one of the six marine clades; if outside, we resolve it to broad taxonomic groups. Duplication events (gene family expansions) were defined by two hits associated with the same freshwater genome and ranked prior to the first non-freshwater SAR11 hit.

Results

General features and SAR11 phylogeny

The freshwater and marine SAR11 genomes were extracted from public databases (Table 1). The genomes are in varying stages of completion, with some corresponding to a single circular chromosome whereas others are incomplete and consist of 50–150 contigs. The completeness of the genomes amplified from single cells was estimated to 24–86% using a conserved single-copy gene set of SAR11 genomes (Markowitz ), with estimated genome size of the freshwater genomes ranging from 1.1 to 1.4 Mb compared with marine genomes ranging from 1.1 to 1.6 Mb. In addition to these small differences, the sampling sizes for subgroups II and IIIa are too small to determine with certainty whether differences in genome size are associated with subgroups and habitats. Unifying features are low GC (29–30%), a shared amino-acid usage and high percentage of predicted coding nucleotides (Table 1).

Table 1

Description of the genome data

Name	IMG accession no.	Group	Biome	Assembled nucleotides	Com (fraction)	EGS	Gene count	Estimated gene count	Scaffold count	GC (fraction)	Coding bases %	RNA count
HTCC1002	638341056	1a.1	Marine	1327604	0.99	1346462	1428	1448	5	0.3	95.9	35
HTCC1013	2511231042	1a.1	Marine	1302704	1.00	1302704	1411	1411	22	0.3	96.6	36
HTCC1062	637000058	1a.1	Marine	1308759	1.00	1308759	1394	1394	1	0.3	96.1	40
HTCC9565	2503283022	1a.1	Marine	1279674	0.99	1294174	1386	1402	3	0.29	96.2	34
HIMB083	2510461079	1a.2	Marine	1395997	1.00	1395997	1505	1505	1	0.29	96.8	37
HIMB140	2503754000	1a.2	Marine	1437930	1.00	1437930	1535	1535	1	0.29	97.0	31
HIMB5	2503982040	1a.2	Marine	1343202	1.00	1343202	1467	1467	1	0.29	96.8	36
HTCC7211	2503283017	1a.2	Marine	1456888	1.00	1456888	1481	1481	1	0.29	96.7	34
HTCC8051	2511231043	1a.2	Marine	1395115	0.99	1410923	1498	1515	2	0.29	96.8	36
HTCC9022	2511231041	1a.2	Marine	1362422	0.99	1381774	1464	1485	24	0.3	96.5	37
SCGCAAA288-G21	2236347014	1c	Marine	909786	0.64	1421541	1103	1723	139	0.3	94.4	29
SCGCAAA288-N07	2236347020	1c	Marine	954664	0.66	1446461	1110	1682	81	0.29	94.8	27
SCGCAAA240-E13	2236661011	1c	Marine	1401625	0.86	1629797	1621	1885	151	0.29	93.6	40
SCGCAAA288-E13	2236661012	1c	Marine	812863	0.55	1477933	948	1724	106	0.29	93.6	25
HIMB058	2265129005	2	Marine	1115049	0.99	1130887	1269	1287	59	0.3	96.9	39
HIMB114	2503283019	3a	Marine	1237371	1.00	1237371	1357	1357	1	0.3	97.0	36
IMCC9063	650716017	3a	Marine	1284727	1.00	1284727	1482	1482	1	0.32	94.9	35
SCGCAAA023-L09	2236661000	3b/LD12	Freshw	774923	0.68	1147914	921	1364	76	0.29	96.0	29
SCGCAAA024-N17	2236876027	3b/LD12	Freshw	328144	0.24	1394612	397	1687	45	0.3	95.5	14
SCGCAAA027-C06	2264265094	3b/LD12	Freshw	775384	0.72	1072915	936	1295	90	0.3	95.9	20
SCGCAAA027-J10	2236876030	3b/LD12	Freshw	792980	0.73	1080510	952	1297	82	0.3	96.0	31
SCGCAAA027-L15	2236876031	3b/LD12	Freshw	719587	0.63	1141744	840	1333	56	0.29	96.5	19
SCGCAAA028-C07	2236661008	3b/LD12	Freshw	846566	0.76	1111117	974	1278	32	0.3	96.3	19
SCGCAAA028-D10	2236347069	3b/LD12	Freshw	925141	0.78	1192329	1091	1406	57	0.3	95.6	34
SCGCAAA280-B11	2236876032	3b/LD12	Freshw	674250	0.54	1260247	815	1523	47	0.3	95.8	18
SCGCAAA280-P20	2236876029	3b/LD12	Freshw	720523	0.68	1062920	838	1236	65	0.3	95.6	24
SCGCAAA487-M09	2236347068	3b/LD12	Freshw	627365	0.61	1027382	800	1310	97	0.29	96.1	24
HIMB59	2503982041	5	Marine	1410127	1.00	1410127	1532	1532	1	0.32	96.8	39

Abbreviations: Com, completeness; EGS, estimated genome size; Freshw, freshwater. Genomes including 'SCGCAAA' in the names are single amplified genomes while others are derived from isolates.

Phylogenetic analysis of the selected genomes using concatenated alignment of 40 ribosomal proteins revealed 7 clades (Figure 1), consistent with previous analyses (Grote ; Viklund ; Zaremba-Niedzwiedzka ; Thrash ). In this study, we have not included any other alphaproteobacterial species, which prevents a discussion about the position of subgroup V, represented by genome HIMB59, in the alphaproteobacterial tree. The disagreement about its phylogenetic placement was discussed in quite some detail in a previous publication (Viklund ), but this does not have any implications for our conclusions. Suffice it to conclude that HIMB59 is used as an outgroup to the other SAR11 genomes examined here. By using HIMB59 as an outgroup, we inferred phylogenetic trees from 518 orthologous protein clusters that had at least one representative in each of the 7 subclades. Out of these, only a minor proportion (N=88) of the protein trees are strictly coherent with regards to collapsing the 7 main clades (that is, all genes from a given branch are monophyletic). For a third of these (N=33), the branching patterns agree with the ribosomal protein tree, and the branching pattern between the freshwater LD12 (SAR11 clade IIIb) and clade IIIa is observed in 388 out of the 518 trees, albeit with different statistical significance (representing 75% of all trees as indicated in Figure 1).

Figure 1

Evolutionary relationships between marine and freshwater SAR11 genomes. Unrooted maximum likelihood phylogenetic tree of concatenated ribosomal protein sequences from single-cell genomes and isolate genomes of the SAR11 clade. In addition to bootstrap values as inferred by maximum likelihood, the second (large and bold) number at the branching points show proportion of protein trees with corresponding branching pattern. These proportions were generated by using the Robinson–Foulds metric summarizing 518 orthologous protein trees that had at least one representative in each of the seven subclades (for details, see Supplementary Table S1). Freshwater SAR11 (subgroup IIIb/LD12) are indicated in red, whereas marine SAR11 subgroups in other colors.

Comparing this with the number of cases where the bootstrap support was >80% at this particular branch, freshwater LD12 branched closest with IIIb in 109 trees, whereas in 9 trees LD12 was closest with another SAR11 subgroup. These nine trees were inferred from individual protein alignments involved in amino-acid biosynthesis and conversion (Figures 2a–c), chemoorganotrophic energy acquisition (Figures 2d–f), carboxylic acid membrane transport (Figures 2g and h) and DNA methylation (Figure 2i). This proportion of non-matching tree topology, either 9% or 25% depending on bootstrap cutoffs (Figure 1; see a report detailing the mismatches and the Robinson–Foulds metric in Supplementary Table S1), implies alternative evolutionary histories of various gene families in SAR11.

Figure 2

Nine protein trees with evolutionary inconsistency when compared with the ribosomal tree. (a–i) Unrooted phylogenetic trees of orthologous protein clusters with alternative tree topologies (that is, sister clade to IIIb/LD12 is not IIIa) when compared with the ribosomal tree (Figure 1).

Two trees are largely consistent with the ribosomal protein tree with the exception of the branching orders of subgroups II and III (Figures 2b and d). Another three trees have more serious inconsistencies, such that subgroups Ia and Ic are not sister clades and that the two members of subgroup IIIa do not cluster (Figures 2a, c, e, g and i). The most dramatic inconsistency is observed for the tree inferred from the tripartite ATP-independent periplasmic transporters (TRAP transporters) (Figures 2h and i). In these trees, all members of subgroup LD12 and two genomes of subgroup Ia are separated from all other genomes by long branches. This could be indicative of paralogous gene families or increased evolutionary rates for this gene in LD12. This is potentially interesting as TRAP transporters may be involved in the metabolite flux into the tricarboxylic acid (TCA) cycle through the import of malate (Adnan ). Apart from the incongruence of protein trees, patterns in gene synteny (Supplementary Figure S1) and functional annotations (Supplementary Figure S2) suggest that the analyzed SAR11s group into six (plus outgroup V) distinct clades, basically confirming shared gene phylogenies (Grote ; Viklund ; Zaremba-Niedzwiedzka ; Thrash ). The distinction in gene sets between marine and freshwater genomes was shown by permutational multivariate analysis of variance (P<0.001) for four annotation subsystems (pfam, tigrfam, COG, KO) and also by dendrograms based on bit scores retrieved from pairwise blastn searches (Supplementary Figure S3). In addition, each of these subgroups is preferentially retrieved in a distinct habitat, indicating that they correspond to ecological coherent units (Rusch ; Andersson ; Carlsson ; Newton ; Heinrich ; Thrash ).

Energy and carbon metabolism

We identified genes encoding a complete TCA cycle (Figure 3) and a proteorhodopsin when combining the 10 partial freshwater SAR11 genomes. This led us to hypothesize that, analogously to their marine siblings, freshwater SAR11 are heterotrophs capable of supplementing heterotrophic growth on organic carbon with light-mediated ATP production (Martinez-Garcia ) when carbon starvation limits respiration (Steindler ). Still, there are conspicuous and consistent variations in gene content related to aerobic chemoorganoheterotrophy as already highlighted by previous metabolic reconstruction restricted to marine SAR11 (Tripp , 2009; Schwalbach ; Sun ; Grote ; Carini , 2014). Here we expand on these previous observations by showing that freshwater genomes encode all genes of the Embden–Meyerhof–Parnas (EMP) pathway, whereas most marine SAR11 representatives encode all genes of the Entner–Doudoroff (ED) pathway (Figure 3).

Figure 3

Central carbon metabolism and other relevant metabolic pathways identified in SAR11 genomes. Glycogenesis, TCA cycle and glyoxylate shunt are shown in the center of the plot with adjunction pathways to either side such as the pentose phosphate and ED pathway. The color of the arrows indicates genes encoded in at least one genome of each SAR11 subgroup (see legend for details). The presence and absence of genes was determined within the IMG system based on automated and manual annotations. Detailed results are in Supplementary Tables S6–S9.

Another striking difference is the absence of the glyoxylate shunt (isocitrate lyase (aceA), malate synthase (malG)) and anapleurotic enzymes (phosphoenol pyruvate carboxylase (ppc); see Figure 3) in subgroups IIIa and IIIb, pointing to major differences in carbon and energy-acquisition strategies for these SAR11 clades. The glyoxylate shunt bypasses two steps in the TCA cycle where carbon dioxide is released concomitant with a production of ATP and NADH but instead enable economic use of carbon atoms at the expense of energy acquisition. It essentially allows microorganisms to use simple carbon compounds as a carbon source when complex nutrients such as glucose are not available. In the absence of available carbohydrates, the glyoxylate cycle would, for example, enable synthesis of carbohydrates needed for cell-wall assembly from lipids via acetate supplemented with inorganic carbon fixed via anapleurotic reactions (Moran and Miller, 2007). There is evidence that marine SAR11 can oxidize several small organic carbon compounds that are potentially very abundant in marine systems, in order to satisfy cellular energy demands (Tripp ; Schwalbach ; Sun ). Several genes and pathways involved in the oxidation of C1 and methylated compounds to CO2, previously identified in marine SAR11 genomes (Sun ), appear to be missing in the freshwater single amplified genomes (see Supplementary Material for details). At the same time, the ability to use other common low molecular weight carbon substrates are maintained throughout the entire SAR11 lineage, one example being glycolate oxidase (glcDEF) which enable metabolic use of the abundant exudate glycolate, which is released in large amounts by nutrient-limited phytoplankton as an energy overflow mechanism (Bertilsson and Jones, 2003). While inspecting the completeness of vitamin biosynthesis pathways among the freshwater genomes, we found no major differences when comparing our results with marine genomes (for details on the presence of vitamin biosynthesis homologs, see Supplementary Material). Confirming previous findings (Tripp et al., 2008), we could not identify any sulfite reductase in marine SAR11 genomes nor could this gene be detected in any of the freshwater representatives, implying that freshwater SAR11 also rely on external supply of reduced sulfur similar to their marine siblings. This auxotrophy for reduced sulfur seems to be a common feature in abundant freshwater bacteria as revealed by genome reconstructions (Garcia ). To access reduced sulfur, marine SAR11 are capable of demethylating the abundant marine molecule DMSP. However, homologs of the DMSP pathway genes could not be identified in the 10 freshwater (LD12) or clade IIIa genomes. In addition, we could not identify any homologs to adenylylsulfate reductase, sulfate permease and sulfate exporter in the freshwater single amplified genomes, making sulfur-containing amino acids the most likely reduced sulfur source. Osmolytes, such as DMSP, glycine betaine, proline, mannitol, taurine and spermidine, are produced in high amounts by phytoplankton and other marine microplankton, which use these compounds to maintain their intracellular osmotic balance. These organic compounds can build up intracellular concentrations as high as 100–530 mM in some phytoplankton species (Keller ; Sunda ). Analyses of marine SAR11 genomes (Giovannoni ) have revealed numerous ABC transporters for organic osmolytes that are common in the marine environment (see Supplementary Material for detail on transporters). An ability to use glycine betaine and sulfur-containing osmolytes may also release organisms from an inability to synthesize certain amino acids and reduced (organic) sulfur compounds de novo (Giovannoni ). Although glycine and serine biosynthesis is enabled by 3-phosphoglycerate dehydrogenase (serA) in freshwater SAR11 (for more detail on amino-acid biosynthesis, see Supplementary Material), organosulfur acquisition seems to be restricted to amino acids, such as methionine and cysteine. This is not surprising considering that osmolyte diversity is restricted to non-sulfur compounds such as sucrose and trehalose in freshwater phytoplankton (Batterton and van Baalen, 1971; Joset ; Page-Sharp ). Still, no known hydrolases to use sucrose and trehalose could be identified in any of the inspected freshwater SAR11 genomes. As indicated by the presence of TRAP system, a feature shared with subgroup IIIa, freshwater SAR11 most likely use low-molecular-weight carboxylic acids. Interestingly, two TRAP system-related orthologous gene clusters showed phylogenetic patterns highly divergent from the ribosomal protein trees (Figures 2g and h), suggesting that this transporter system has been widely exchanged within the SAR11 clade.

The marine and freshwater SAR11 pangenome

In addition to phylogenies (see Figures 1 and 2), we explored gene content variations among the SAR11 genomes. Orthologous protein clusters were formed by Markov cluster algorithm (Enright ) from pairwise BLASTP searches using 23 545 predicted SAR11 proteins. Of the 3589 total orthologous clusters identified in the 28 SAR11 genomes, 1467 were present in at least one freshwater SAG, and 368 of these were exclusive to freshwater genomes. Most of the 8332 predicted genes from the freshwater SAGs were most closely related to marine subgroup IIIa (Figure 4a). Out of the 368 orthologous clusters (including 720 genes) exclusive to freshwater SAR11, 59 clusters (including 86 genes) had no hit against bacterial and archaeal subject sequences in refseq. Altogether, 94 of the freshwater-exclusive genes (26%) were most closely matching alphaproteobacteria, whereas only 7 (2%) were most closely related to genes from marine SAR11 (Figure 4b). Although around a sixth of the predicted genes from the freshwater SAGs were most closely related to non-marine SAR11, it cannot be concluded that these genes were acquired horizontally from outside the SAR11 clade. The reason is that the origin and flux of these genes are uncertain when considering that protein diversity was highly undersampled in both the marine and freshwater branch of SAR11, highlighted by the rarefaction analysis (Figure 4c). The large diversity in protein families is further imposed by 6273 protein families in the marine and 2320 in the freshwater SAR11 pangenome as estimated by non-parametric diversity approximations (Chao, 1987).

Figure 4

Closest blast hits against refseq and rarefaction curves of orthologous SAR11 protein clusters. (a) Bar charts revealing top hits from BLASTP against all available genomes in refseq. Hits are classified based on their taxonomic affiliation with higher taxonomic levels removed from lower taxonomic levels (that is, SAR11 hits are removed from the alphaproteobacterial bin, with alphaproteobacterial hits removed from the proteobacterial bin and so on). Black bars indicate hits of genes specific to freshwaters as inferred by orthologous clustering. (b) Detailed representation of freshwater-specific orthologous protein clusters and their closest hits classified on their taxonomic affiliation with higher taxonomic levels removed from lower taxonomic levels. (c) Results from a rarefaction analysis of orthologous protein clusters after normalization to account for the partial single-cell genomes.

BLASTP searches were also used to infer gene family expansions unique to the freshwater SAR11 population by applying the following criteria: two hits associated with the same genome were ranked prior to the first non-freshwater SAR11 protein. This identified at most four orthologous clusters per freshwater genome, thus likely originating from recent duplication events as they were also adjacent to each other in the genome. These orthologous clusters were exclusively annotated as hypotheticals or cell communication systems (for details, see Supplementary Table S5) and are likely located in hypervariable regions. Such hypervariable regions are apparent in a composite freshwater SAR11 genome and seem to host genes for the biosynthesis of lipopolysaccharides (Zaremba-Niedzwiedzka ). The comparative features of these 28 genomes point to the existence of large gene frequency variations in SAR11 populations.

Discussion

As shown previously (Zaremba-Niedzwiedzka ), phylogenetic reconstructions strongly suggest that the freshwater SAR11 clade originates from a population of marine ancestors with streamlined genomes considering the branching pattern leading to LD12. However, we cannot preclude with certainty that IIIa resembles a marine ancestor of LD12 or that IIIa originated from a freshwater ancestor. Nevertheless, we favor the former scenario as a freshwater origin would require two independent transitions, whereas one is sufficient for a marine SAR11 ancestor. In addition, the larger estimated marine gene set when compared with the freshwater equivalents would have required an acquisition of genes independent from the freshwater SAR11. The likelihood for such a vast gene acquisitions is challenged by large population sizes that provide low probabilities of new genes arising and being fixed in streamlined genomes. Likewise, a common streamlined ancestor is implied by freshwater and marine SAR11 clades sharing many typical features of streamlined cells, such as the small cell size (Giovannoni ; Salcher ), auxotrophy, small genome size and low GC content. With no known exceptions, this argues against non-streamlined ancestoral populations at the main branching points. Besides the shared reductionism, both marine and freshwater SAR11 are predicted to host a typical electron transport chain, a complete TCA cycle and a proteorhodopsin system for phototrophic energy acquisition (Giovannoni ; Martinez-Garcia ). Still, there are conspicuous variations in gene content related to aerobic chemoorganoheterotrophy between marine and freshwater SAR11, emphasizing functional adaptations beyond osmoregulation as a necessity to overcome the marine–freshwater boundaries. This is in agreement with recent findings from shotgun metagenomic analyses, where quantitative differences related to respiratory and glycolytic pathways were observed in the gene pools of marine and freshwater communities (Oh ; Dupont ; Eiler ). In particular, the changes in glycolysis combined with the apparent loss of the glyxoylate shunt and the capacity to oxidize a broad array of C1 compounds point to differentiation in heterotrophic substrate use between marine and freshwater SAR11, with the marine representatives more extensively relying on phytoplankton-derived osmolytes, such as proline, betaine glycine and DMSP. Instead, freshwater SAR11 feature a complete EMP, enabling efficient energy acquisition from C6 and C5 sugars combined with the TRAP system to use low-molecular-weight carboxylic acids known to be produced in abundance by photochemical degradation of humic substances in freshwaters (Bertilsson and Tranvik, 2000). Such tuning toward locally available organic substrates is likely linked to observed differences in abundance patterns in relation to environmental conditions in the respective biomes (Eiler ; Salcher ) and the formation of ecotypes within the SAR11 clade (Carlsson ; Vergin ). The reliance on either EMP or ED can be predicted to change the energy status and the redox balance of the cell. The ED pathway (Entner and Doudoroff, 1952; Conway, 1992) has a net yield of 1 ATP, 1 NADH and 1 NADPH for each glucose molecule, whereas the EMP has a net yield of 2 ATP and 2 NADH for each processed glucose molecule. Besides the distinct yields of ATP and NAD(P)H, the EMP and ED pathways result in a range of different metabolic intermediates for each molecule of sugar consumed (Neidhardt ). It is remarkable that, while some bacterial genomes possess complete and coexisting EMP and ED pathways (suggested for HIMB59), others have one or the other, and there are also SAR11 genomes that only seem to host incomplete pathways (Grote ). A flux of genes involved in central metabolism is further implied by the incongruence in tree topology of the glyceraldehyde 3-phosphate dehydrogenase (Figure 2d), the enzyme that catalyzes the sixth step of glycolysis, a phosphorylation reaction coupled to oxidation. In addition to the metabolic distinction between marine and freshwater SAR11, our analyses revealed that the gene pool of marine and freshwater SAR11 are highly diverse and undersampled. The notion that marine SAR11 populations appear to have rather diverse genomes was previously observed in marine metagenomes (Wilhelm ) and was also shown for Prochlorococcus as another abundant and widespread marine bacterium with small genomes (Kashtan ). Overall, we show that SAR11 subgroups share a core set of genes comprising approximately half of the orthologous genes in each genome, independent of whether orthologous genes are defined based on annotations or clustering using sequence similarity. As expected, the core genes encode functions defining the SAR11 clade as an organo-heterotrophic group with varying potential for photo-autotrophy, whereas the flexible genome provides functions that confer selective advantages under very different conditions. This modular structure of genes that confer the ability to take up and metabolize specific organic molecules imply metabolic flexibility and an adaptive tuning to resources. This functional tuning within SAR11 is facilitated by high rates of recombination (Vergin ) among even distantly related marine groups fostering a global SAR11 population where multiple ecotypes with high genetic linkages coexist. A dynamic gene pool in SAR11 is implied by gene content variations, alternative tree topologies of the orthologous clusters shared among all SAR11 subgroups and, as shown previously, by high ratios of homologous recombination to mutation rate in marine SAR11 (Vergin ). Even if extremely high recombination has been suggested for marine SAR11 (Vergin ; Zaremba-Niedzwiedzka ), we argue that the exchange of genetic material seems to be mainly restricted to members within the SAR11 clade, as suggested by a common SAR11 ancestor for the orthologous genes that were present in all marine and freshwater SAR11 genomes. As the marine gene pool is undersampled, we argue that one cannot make any inferences on horizontally acquired genes from outside the SAR11 clade to facilitate the marine–freshwater transition. A likely scenario is that, as SAR11 gradually became adapted to life at low salinities, recombination events with divergent SAR11 in the former marine environment became less likely and eventually mutations overwhelmed recombination, as indicated previously (Zaremba-Niedzwiedzka ). Resulting gene content differences that affect biosynthesis and transport functions are not restricted to specific regions but are instead embedded in different genomic contexts, as implied by the large-scale synteny variations in the SAR11 genomes. Besides these ecotype-defining and broad-scale genomic variations, hypervariable regions containing genes for the biosynthesis of lipopolysaccharides were previously identified in both marine and freshwater SAR11 clades (Grote ; Zaremba-Niedzwiedzka ). This fits the previously proposed concept of low frequency genes being implicated in evolutionary responses to local and variable biotic interactions, such as competition and phage predation (Cordero and Polz, 2014). Adaptive selection that can lead to niche partitioning and the formation of novel ecotypes in bacterial populations is assumed to depend on a wide array of traits, including large population size (Levin and Bergstrom, 2000), a dynamic gene pool facilitated by genetic exchange (Cohan, 2001; Popa ) and gene family expansion by duplications (Alm ; Serres ), the small size of recombined fragments (Zawadzki and Cohan, 1995) and simplicity and modularity of metabolic features (Doyle ; Lawrence, 1999). In the long run, these mechanisms can allow lineages to take major leaps into novel environments or even new biomes, even when populations are made up of individuals with streamlined genomes. For streamlined bacteria in particular, we argue that reshuffling of metabolic modules related to energy and carbon metabolism seem to facilitate such adaptive radiation, as shown in the case of metabolic tuning during the marine–freshwater transition. Following such expansion, we imply from our and previous genome reconstructions (Zaremba-Niedzwiedzka ) that gene exchange barriers arise, which fix only a small fraction of the original pangenome in the novel biome. As such, freshwater SAR11 seem to be less prone to adaptive radiation, at least if we assume that the genetic structure of global populations controls selection efficacy and ability to adapt in free-living streamlined bacteria.

60 in total

1. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors: J Castresana
Journal: Mol Biol Evol Date: 2000-04 Impact factor: 16.240

2. An H-NS-like stealth protein aids horizontal DNA transmission in bacteria.

Authors: Marie Doyle; Maria Fookes; Al Ivens; Michael W Mangan; John Wain; Charles J Dorman
Journal: Science Date: 2007-01-12 Impact factor: 47.728

3. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors: Alexandros Stamatakis
Journal: Bioinformatics Date: 2006-08-23 Impact factor: 6.937

Review 4. Resourceful heterotrophs make the most of light in the coastal ocean.

Authors: Mary Ann Moran; William L Miller
Journal: Nat Rev Microbiol Date: 2007-10 Impact factor: 60.633

5. Auxotrophy and intrapopulation complementary in the 'interactome' of a cultivated freshwater model community.

Authors: Sarahi L Garcia; Moritz Buck; Katherine D McMahon; Hans-Peter Grossart; Alexander Eiler; Falk Warnecke
Journal: Mol Ecol Date: 2015-08-08 Impact factor: 6.185

6. Estimating the population size for capture-recapture data with unequal catchability.

Authors: A Chao
Journal: Biometrics Date: 1987-12 Impact factor: 2.571

7. SAR11 marine bacteria require exogenous reduced sulphur for growth.

Authors: H James Tripp; Joshua B Kitner; Michael S Schwalbach; John W H Dacey; Larry J Wilhelm; Stephen J Giovannoni
Journal: Nature Date: 2008-03-12 Impact factor: 49.962

8. Streamlining and core genome conservation among highly divergent members of the SAR11 clade.

Authors: Jana Grote; J Cameron Thrash; Megan J Huggett; Zachary C Landry; Paul Carini; Stephen J Giovannoni; Michael S Rappé
Journal: MBio Date: 2012-09-18 Impact factor: 7.867

9. One carbon metabolism in SAR11 pelagic marine bacteria.

Authors: Jing Sun; Laura Steindler; J Cameron Thrash; Kimberly H Halsey; Daniel P Smith; Amy E Carter; Zachary C Landry; Stephen J Giovannoni
Journal: PLoS One Date: 2011-08-23 Impact factor: 3.240

10. The Pfam protein families database.

Authors: Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal: Nucleic Acids Res Date: 2011-11-29 Impact factor: 16.971

18 in total

1. Metabolic evolution and the self-organization of ecosystems.

Authors: Rogier Braakman; Michael J Follows; Sallie W Chisholm
Journal: Proc Natl Acad Sci U S A Date: 2017-03-27 Impact factor: 11.205

2. Marine Proteobacteria metabolize glycolate via the β-hydroxyaspartate cycle.

Authors: Lennart Schada von Borzyskowski; Francesca Severi; Karen Krüger; Lucas Hermann; Alexandre Gilardet; Felix Sippel; Bianca Pommerenke; Peter Claus; Niña Socorro Cortina; Timo Glatter; Stefan Zauner; Jan Zarzycki; Bernhard M Fuchs; Erhard Bremer; Uwe G Maier; Rudolf I Amann; Tobias J Erb
Journal: Nature Date: 2019-11-13 Impact factor: 49.962

3. Contrasting patterns of genome-level diversity across distinct co-occurring bacterial populations.

Authors: Sarahi L Garcia; Sarah L R Stevens; Benjamin Crary; Manuel Martinez-Garcia; Ramunas Stepanauskas; Tanja Woyke; Susannah G Tringe; Siv G E Andersson; Stefan Bertilsson; Rex R Malmstrom; Katherine D McMahon
Journal: ISME J Date: 2017-12-08 Impact factor: 10.302

4. Model Communities Hint at Promiscuous Metabolic Linkages between Ubiquitous Free-Living Freshwater Bacteria.

Authors: Sarahi L Garcia; Moritz Buck; Joshua J Hamilton; Christian Wurzbacher; Hans-Peter Grossart; Katherine D McMahon; Alexander Eiler
Journal: mSphere Date: 2018-05-30 Impact factor: 4.389

5. Cultivation and genomics of the first freshwater SAR11 (LD12) isolate.

Authors: Michael W Henson; V Celeste Lanclos; Brant C Faircloth; J Cameron Thrash
Journal: ISME J Date: 2018-03-29 Impact factor: 10.302

6. Reevaluating the Salty Divide: Phylogenetic Specificity of Transitions between Marine and Freshwater Systems.

Authors: Sara F Paver; Daniel Muratore; Ryan J Newton; Maureen L Coleman
Journal: mSystems Date: 2018-11-13 Impact factor: 6.496

7. Genomes from uncultivated prokaryotes: a comparison of metagenome-assembled and single-amplified genomes.

Authors: Johannes Alneberg; Christofer M G Karlsson; Anna-Maria Divne; Claudia Bergin; Felix Homa; Markus V Lindh; Luisa W Hugerth; Thijs J G Ettema; Stefan Bertilsson; Anders F Andersson; Jarone Pinhassi
Journal: Microbiome Date: 2018-09-28 Impact factor: 14.650