Plankton networks driving carbon export in the oligotrophic ocean.

Lionel Guidi^1,2, Samuel Chaffron^3,4,5, Lucie Bittner^6,7,8, Damien Eveillard⁹, Abdelhalim Larhlimi⁹, Simon Roux¹⁰, Youssef Darzi^3,4, Stephane Audic⁸, Léo Berline¹, Jennifer Brum¹⁰, Luis Pedro Coelho¹¹, Julio Cesar Ignacio Espinoza¹⁰, Shruti Malviya⁷, Shinichi Sunagawa¹¹, Céline Dimier⁸, Stefanie Kandels-Lewis^11,12, Marc Picheral¹, Julie Poulain¹³, Sarah Searson^1,2, Lars Stemmann¹, Fabrice Not⁸, Pascal Hingamp¹⁴, Sabrina Speich¹⁵, Mick Follows¹⁶, Lee Karp-Boss¹⁷, Emmanuel Boss¹⁷, Hiroyuki Ogata¹⁸, Stephane Pesant^19,20, Jean Weissenbach^13,21,22, Patrick Wincker^13,21,22, Silvia G Acinas²³, Peer Bork^11,24, Colomban de Vargas⁸, Daniele Iudicone²⁵, Matthew B Sullivan¹⁰, Jeroen Raes^3,4,5, Eric Karsenti^7,12, Chris Bowler⁷, Gabriel Gorsky¹.

Abstract

The biological carbon pump is the process by which CO2 is transformed to organic carbon via photosynthesis, exported through sinking particles, and finally sequestered in the deep ocean. While the intensity of the pump correlates with plankton community composition, the underlying ecosystem structure driving the process remains largely uncharacterized. Here we use environmental and metagenomic data gathered during the Tara Oceans expedition to improve our understanding of carbon export in the oligotrophic ocean. We show that specific plankton communities, from the surface and deep chlorophyll maximum, correlate with carbon export at 150 m and highlight unexpected taxa such as Radiolaria and alveolate parasites, as well as Synechococcus and their phages, as lineages most strongly associated with carbon export in the subtropical, nutrient-depleted, oligotrophic ocean. Additionally, we show that the relative abundance of a few bacterial and viral genes can predict a significant fraction of the variability in carbon export in these regions.

Entities: Chemical

Substances:
Chlorophyll
Carbon

Year: 2016 PMID: 26863193 PMCID: PMC4851848 DOI: 10.1038/nature16942

Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 49.962

Marine planktonic photosynthetic organisms are responsible for approximately fifty percent of Earth’s primary production and fuel the global ocean biological carbon pump[1]. The intensity of the pump is correlated with plankton community composition[2,3], and controlled by the relative rates of primary production and carbon remineralisation[4]. About 10% of this newly produced organic carbon in the surface ocean is exported through gravitational sinking of particles. Finally, after multiple transformations, a fraction of the exported material reaches the deep ocean where it is sequestered over thousand-year timescales[5]. Like most biological systems, marine ecosystems in the sunlit upper layer of the ocean (denoted the euphotic zone) are complex[6,7], characterised by a wide range of biotic and abiotic interactions[8-10] and in constant balance between carbon production, transfer to higher trophic levels, remineralisation, and export to the deep layers[11]. The marine ecosystem structure and its taxonomic and functional composition likely evolved to comply with this loss of energy by modifying organism turnover times and by the establishment of complex feedbacks between them[6] and the substrates they can exploit for metabolism[12]. Decades of groundbreaking research have focused on independently identifying the key players involved in the biological carbon pump. Among autotrophs, diatoms are commonly considered important for carbon flux because of their large size and fast sinking rates[13-15], while small autotrophic picoplankton may contribute directly through subduction of surface water[16] or indirectly by aggregating with larger settling particles or through consumption by organisms at higher trophic levels[17]. Among heterotrophs, zooplankton such as crustaceans impact carbon flux via production of fast-sinking fecal pellets while migrating hundreds of meters in the water column[18,19].
These observations, focusing on just a few components of the marine ecosystem, highlight that carbon export results from multiple biotic interactions and that a better understanding of the mechanisms involved in its regulation will require an analysis of the entire planktonic ecosystem. Advanced sequencing technologies offer the opportunity to simultaneously survey whole planktonic communities and associated molecular functions in unprecedented detail. Such a holistic approach may allow the identification of community- or gene-based biomarkers that could be used to monitor and predict ecosystem functions, e.g., related to the biogeochemistry of the ocean[20-22]. Here, we leverage global-scale ocean genomics datasets from the euphotic zone[10,23-25] and associated environmental data to assess the coupling between ecosystem structure, functional repertoire, and carbon export at 150 m.

Carbon export and plankton community composition

The Tara Oceans global circumnavigation crossed diverse ocean ecosystems and sampled plankton at an unprecedented scale[20,26] (see Methods). Hydrographic data were measured in situ or in seawater samples at all stations, as well as nutrients, oxygen and photosynthetic pigments (see Methods). Net Primary Production (NPP) was derived from satellite measurements (see Methods). In addition, particle size distributions (100 μm to a few mm) and concentrations were measured using an Underwater Vision Profiler (UVP) from which carbon export, corresponding to the carbon flux (Fig. 1a) at 150 m, was calculated to range from 0.014 to 18.3 mg.m−2.d−1 using methods previously described (see Methods). One should keep in mind that fluxes are calculated from images of particles. These estimates are derived from an approximation of Stokes’ law relating the equivalent spherical diameter of particles to carbon flux (see Methods). This exponential approximation is reasonable assuming similar particle composition across all sizes, as highlighted by the standard deviations of parameters in Eq. 5 (see Methods). Furthermore, because of instrument and method limitations, particles <250 μm were not used, which may underestimate total carbon fluxes. Finally, these fluxes are instantaneous because they do not integrate space and time as sediment traps would. However, the approach allowed us to assemble the largest homogeneous carbon export dataset during a single expedition, corresponding to more than 600 profiles over 150 stations. This dataset is of similar magnitude to the body of historical data available in the literature that includes the 134 deep sediment trap-based carbon flux time-series[27] from the JGOFS program and the 419 thorium-derived particulate organic carbon (POC) export measurements[28].
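The flux calculation described above can be illustrated with a minimal sketch. It assumes a power-law relation between particle diameter and flux of the form F = Σ nᵢ · A · dᵢᵇ, summed over size classes; the exact relation is Eq. 5 of the Methods, which is not reproduced in this section. The function name, the default values of `A` and `b`, and the toy profile are illustrative assumptions, not values taken from this text. Only the 250 μm lower cutoff comes from the paragraph above.

```python
def carbon_flux(size_classes, A=12.5, b=3.81, min_diameter_mm=0.25):
    """Sum n_i * A * d_i**b over particle size classes.

    size_classes: iterable of (diameter_mm, concentration) pairs.
    A, b: empirical power-law parameters (placeholder values, assumed here).
    min_diameter_mm: particles below this size are ignored (250 um cutoff).
    """
    total = 0.0
    for diameter_mm, concentration in size_classes:
        if diameter_mm < min_diameter_mm:
            continue  # below the size range considered reliable
        total += concentration * A * diameter_mm ** b
    return total


# Hypothetical size spectrum at one depth: (diameter in mm, particles per litre)
profile = [(0.1, 500.0), (0.5, 20.0), (1.0, 2.0), (2.0, 0.1)]
flux = carbon_flux(profile)
```

Because the exponent is large, the few largest particles dominate the sum, which is why the assumption of similar composition across sizes matters for the estimate.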

Figure 1

Global view of carbon fluxes along the Tara Oceans circumnavigation route and associated eukaryotic lineages

a, Carbon flux in mg.m−2.d−1 and carbon export at 150 m estimated from particle size distribution and abundance measured with the Underwater Vision Profiler 5 (UVP5). Stations at which environmental data are available (Supplementary Table 9) are depicted by white dots. Stations at which eukaryotic samples are available are colored in red (Supplementary Tables 10 and 12). b, Eukaryotic lineages associated with carbon export as revealed by standard methods for regression-based modeling (sPLS analysis). Correlations between lineages and environmental parameters are depicted as a clustered heatmap and lineages with a correlation to carbon export higher than 0.2 are highlighted (detailed results in Supplementary Table 1).

From 68 globally distributed sites, a total of 7.2 Tb of metagenomics data, representing ~40 million non-redundant genes, around 35,000 Operational Taxonomic Units (OTUs) of prokaryotes (Bacteria and Archaea) and numerous mainly uncharacterized viruses and picoeukaryotes, have been described recently[23,25]. In addition, a set of 2.3 million eukaryotic 18S rDNA ribotypes was generated from a subset of 47 sampling sites corresponding to approximately 130,000 OTUs[24] (Fig. 1a). Finally, 5,476 viral “populations” were identified at 43 sites from viral metagenomic contigs, only 39 (<0.1%) of which had been previously observed[25] (see Methods). These genomics data combined across all domains of life and viruses together with carbon export estimates and other environmental parameters were used to explore the relationships between marine biogeochemistry and euphotic plankton communities (see Methods) in the top 150 m of the oligotrophic open ocean. Our study did not include high latitude areas due to the current lack of available molecular data and results should not be extrapolated to deeper depths. Using a method for regression-based modeling of high multidimensional data in biology (specifically a sparse Partial Least Square analysis - sPLS[29], Extended data Fig. 1), we detected several plankton lineages for which relative sequence abundance correlated with carbon export and other environmental parameters, most notably with NPP, as expected (Fig. 1b and see Supplementary Table 1). These included diatoms, dinoflagellates and metazoa (zooplankton), lineages classically identified as key contributors to carbon export.

Overview of analytical methods used in the manuscript. a, Depiction of a standard pairwise analysis that considers a sequence relative abundance matrix for s samples (s × OTUs (Operational Taxonomic Units)) and its corresponding environmental matrix (s × p (parameters)). sPLS results emphasize the OTU(s) that are the most correlated to environmental parameters. b, Depiction of a graph-based approach. Using only a relative abundance matrix (s × OTUs), WGCNA builds a graph where nodes are OTUs and edges represent significant co-occurrence. Co-occurrence scores between nodes are weights allocated to corresponding edges. These weights are magnified by a power-law function until the graph becomes scale-free. The graph is then decomposed into subnetworks (groups of OTUs) that are analyzed separately. One subnetwork (group of OTUs) is considered of interest when its topology is related to the trait of interest; in the current case carbon export. For each subnetwork (for instance the subnetwork related to carbon export), each OTU is placed within a feature space that plots each OTU based on its membership to the subnetwork (x-axis) and its correlation to the environmental trait of interest (i.e., carbon export). A good regression of all OTUs emphasizes the putative relation of the subnetwork topology and the carbon export trait (i.e. the more a given OTU defines the subnetwork topology, the more it is correlated to carbon export). c, Depiction of the machine learning (PLS) approach that was applied following subnetwork identification and selection. Greater VIP scores (i.e. larger circles) emphasize the most important OTUs. VIP refers to Variable Importance in Projection and reflects the relative predictive power of a given OTU. OTUs with a VIP score greater than 1 are considered important in the predictive model and their selection does not alter the overall predictive power.
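To make the VIP criterion concrete, here is a minimal sketch under a strong simplifying assumption: a PLS model with a single response and a single latent component. In that special case the VIP of feature j reduces to √p · |wⱼ|, where w is the unit-length weight vector and p is the number of features, so the mean of the squared VIP scores is exactly 1 and "VIP > 1" means "more important than average". The function name and toy data are hypothetical; the analysis described in this paper may use more components and a different implementation.

```python
import math


def vip_single_component(X, y):
    """VIP scores for a one-component PLS1 model (simplified illustration).

    X: list of samples, each a list of p feature abundances.
    y: list of trait values (e.g. carbon export), one per sample.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    weights = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        # Standardise each feature; fall back to 1.0 for constant columns
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)) or 1.0
        z = [(v - mean) / sd for v in col]
        # PLS weight is proportional to the covariance with the trait
        weights.append(sum(a * b for a, b in zip(z, y_c)))
    norm = math.sqrt(sum(w * w for w in weights))
    return [math.sqrt(p) * abs(w) / norm for w in weights]


# Hypothetical abundances of three OTUs in four samples, and a trait
abundance = [[1.0, 5.0, 2.0], [2.0, 3.0, 2.5], [3.0, 6.0, 1.5], [4.0, 4.0, 2.2]]
export = [1.0, 2.0, 3.0, 4.0]
vip = vip_single_component(abundance, export)
important = [j for j, score in enumerate(vip) if score > 1]
```

In this toy example only the first OTU, which tracks the trait exactly, passes the VIP > 1 threshold.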

Plankton community networks associated with carbon export

While the analysis presented in Fig. 1b supports previous findings about key organisms involved in carbon export from the euphotic zone[14,15,17-19], it is not able to capture how the intrinsic structure of the planktonic community relates to this biogeochemical process. Conversely, although other recent holistic approaches[10,30,31] used species co-occurrence networks to reveal potential biotic interactions, they do not provide a robust description of sub-communities driven by abiotic interactions. To overcome these issues, we applied a systems biology approach known as Weighted Gene Correlation Network Analysis (WGCNA[32,33]) to detect significant associations between the Tara Oceans genomics data and carbon export. This method delineates communities in the euphotic zone that are the most associated with carbon export rather than predicting organisms associated with sinking particles. In brief, the WGCNA approach builds a network in which nodes are features (in this case plankton lineages or gene functions) and links are evaluated by the robustness of co-occurrence scores. WGCNA then clusters the network into modules (hereafter denoted subnetworks) that can be examined to find significant subnetwork-trait relationships. We then filtered each subnetwork using a Partial Least Square (PLS) analysis that emphasizes key nodes (based on the Variable Importance in Projection (VIP) scores; see Methods and Extended data Fig. 1). These particular nodes are mandatory to summarize a subnetwork (or community) related to carbon export. In particular, they are of interest for evaluating (i) subnetwork robustness and (ii) predictive power for a given trait (see Methods and Extended data Fig. 1). We applied WGCNA to the relative abundance tables of eukaryotic, prokaryotic and viral lineages[23-25] and identified unique subnetworks significantly associated with carbon export within each dataset (see Methods and Supplementary Tables 2, 3, 4). 
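The network-building step can be sketched as follows. This is only an illustration of soft thresholding, in which each pairwise co-occurrence score (here an absolute Pearson correlation) is raised to a power β so that strong links stand out and weak ones shrink towards zero. The WGCNA package itself chooses β from a scale-free topology criterion and then clusters the network into modules, neither of which is shown. The exponent `beta=6`, the function names and the toy profiles are assumptions for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=6):
    """Edge weight |cor|**beta for every pair of features.

    profiles: dict mapping a feature name to its abundance across samples.
    beta: soft-thresholding power (hypothetical value).
    """
    names = list(profiles)
    adjacency = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            adjacency[(a, b)] = abs(pearson(profiles[a], profiles[b])) ** beta
    return adjacency


# Hypothetical relative abundances of three lineages across five samples
profiles = {
    "otu_a": [1, 2, 3, 4, 5],
    "otu_b": [2, 4, 6, 8, 10],
    "otu_c": [5, 1, 4, 2, 3],
}
adjacency = soft_threshold_adjacency(profiles)
```

Here `otu_a` and `otu_b` co-vary perfectly and keep a weight close to 1, while the weak correlation between `otu_a` and `otu_c` (|r| = 0.3) is reduced to well below 0.01.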
The eukaryotic subnetwork (subnetwork-trait relationship to carbon export, Pearson cor. = 0.81, p = 5e−15) contained 49 lineages (Extended data Fig. 2a and Supplementary Table 2) among which 20% represented photosynthetic organisms (Fig. 2a and Supplementary Table 2). Surprisingly, this small subnetwork’s structure correlates very strongly to carbon export (Pearson cor. = 0.87, p = 5e−16, Extended data Fig. 2d) and it predicts as much as 69% (Leave-One-Out Cross-Validated (LOOCV), R2 = 0.69) of the variability in carbon export (Extended data Fig. 2g). Only ~6% of the subnetwork nodes correspond to diatoms and they show lower VIP scores than dinoflagellates (Supplementary Table 2). This is likely because our samples are not from silicate replete conditions where diatoms were blooming. Furthermore, our analysis did not incorporate data from high latitudes, where diatoms are known to be particularly important for carbon export, so this result suggests that dinoflagellates have a heretofore unrecognized role in carbon export processes in subtropical oligotrophic ‘type’ ecosystems. More precisely four of the five highest VIP scoring eukaryotic lineages that correlated with carbon export at 150 m were heterotrophs such as Metazoa (copepods), non-photosynthetic Dinophyceae, and Rhizaria (Fig. 2a and Supplementary Table 2). These results corroborate recent metagenomics analysis of microbial communities from sediment traps in the oligotrophic North Pacific subtropical gyre[34]. Consistently, in situ imaging surveys have revealed Rhizarian lineages, made up of large fragile organisms such as the Collodaria, to represent an until now under-appreciated component of global plankton biomass[35], which here also appear to be of relevance for carbon export. Another 14% of lineages from the subnetwork correspond to parasitic organisms, a largely under-explored component of planktonic ecosystems when studying carbon export.
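The leave-one-out cross-validated R2 quoted above can be illustrated with a short sketch. To stay self-contained it swaps in ordinary least squares with a single predictor in place of the PLS regression used in the study, and it computes R2 as 1 − PRESS / SS_total; whether the paper uses this definition or the squared correlation between predicted and observed values is not stated in this section, so treat the choice as an assumption. The function name and data are hypothetical.

```python
def loocv_r2(x, y):
    """Leave-one-out cross-validated R2 for a one-predictor linear fit."""
    n = len(x)
    predictions = []
    for i in range(n):
        # Fit on all samples except sample i
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        mx = sum(x_train) / len(x_train)
        my = sum(y_train) / len(y_train)
        slope = (sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
                 / sum((a - mx) ** 2 for a in x_train))
        intercept = my - slope * mx
        # Predict the held-out sample
        predictions.append(intercept + slope * x[i])
    y_mean = sum(y) / n
    press = sum((p - o) ** 2 for p, o in zip(predictions, y))
    ss_total = sum((o - y_mean) ** 2 for o in y)
    return 1 - press / ss_total


# Hypothetical predictor (e.g. a subnetwork summary) and carbon export values
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
export = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
r2 = loocv_r2(predictor, export)
```

Because every prediction is made for a sample the model has not seen, this score is a more conservative measure of predictive power than the R2 of a fit to all samples.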

Lineage ecological subnetworks associated with environmental parameters and their structures correlating with carbon export. a,b,c, Global ecological networks were built using the WGCNA methodology (see Methods) and correlated to classical oceanographic parameters as well as carbon export (estimated at 150 m from particle size distribution and abundance). Each domain-specific global network is decomposed into smaller coherent subnetworks (depicted by distinct colours on the y-axis) and their eigenvector is correlated to all environmental parameters. Similar to a correlation at the network scale, this approach directly links subnetworks to environmental parameters (i.e. the more the taxa contribute to the subnetwork structure, the more their abundances are correlated to the parameter). a, A single eukaryotic subnetwork (n=58, N=1′870) is strongly associated with carbon export (Pearson cor. 0.81, p = 5e−15). b, A single prokaryotic subnetwork (n=109, N=1′527) is moderately associated with carbon export (Pearson cor. 0.32, p = 9e−03). c, A single viral subnetwork (n=277, N=5′476) is strongly associated with carbon export (Pearson cor. 0.93, p = 2e−15). d,e,f, The WGCNA approach directly links subnetworks to environmental parameters, i.e. the more the features contribute to the subnetwork structure (topology), the more their abundances are correlated to the parameter. This measure identifies subnetworks for which the overall structure, summarized as the eigenvector of the subnetwork, is related to carbon export. d, The eukaryotic subnetwork structure correlates to carbon export (Pearson cor. = 0.87, p = 5e−16). e, The prokaryotic subnetwork structure correlates to carbon export (Pearson cor. = 0.47, p = 5e−06). f, The viral population subnetwork structure correlates to carbon export (Pearson cor. = 0.88, p = 6e−93). g,h,i, Lineage subnetworks predict carbon export. PLS regression was used to predict carbon export using lineage abundances in selected subnetworks. LOOCV was performed and VIP scores computed for each lineage. g, The eukaryotic subnetwork predicts carbon export with an R2 of 0.69. h, The prokaryotic subnetwork predicts carbon export with an R2 of 0.60. i, The viral population subnetwork predicts carbon export with an R2 of 0.89. j,k,l, Synechococcus (rather than Prochlorococcus) absolute cell counts correlate well to carbon export. j, Prochlorococcus cell counts estimated by flow cytometry do not correlate to carbon export (mean carbon flux at 150 m, Pearson cor. = −0.13, p = 0.27). k, Synechococcus cell counts estimated by flow cytometry correlate significantly to carbon export (Pearson cor. = 0.64, p = 4.0e−10). l, The Synechococcus / Prochlorococcus cell count ratio correlates significantly to carbon export (Pearson cor. = 0.54, p = 4.0e−07).

Figure 2

Ecological networks reveal key lineages associated with carbon export at 150 m at global scale

The relative abundances of taxa in selected subnetworks were used to estimate carbon export and to identify key lineages associated with the process. a, The selected eukaryotic subnetwork (n=49, see Supplementary Table 2) can predict carbon export with high accuracy (PLS regression, LOOCV, R2=0.69, see Extended data Fig. 2g). Lineages with the highest VIP score (dots size is proportional to the VIP score in the scatter plot) in the PLS are depicted as red dots corresponding to three Rhizaria (Collodaria, Collozoum inerme and Sticholonche sp.), one copepod (Oithona sp.), one siphonophore (Lilyopsis), three Dinophyceae and one ciliate (Spirotontonia turbinata). b, The selected prokaryotic subnetwork (n=109, see Supplementary Table 3) can predict carbon export with good accuracy (PLS regression, LOOCV, R2=0.60, see Extended data Fig. 2h). c, The selected viral population subnetwork (n=277, see Supplementary Table 4) can predict carbon export with high accuracy (PLS regression, LOOCV, R2=0.89, see Extended data Fig. 2i). Two viral populations with a high VIP score (red dots) are predicted as Synechococcus phages (see Supplementary Table 4).

The prokaryotic subnetwork that associated most significantly with carbon export at 150 m (subnetwork-trait relationship to carbon export, Pearson cor. = 0.32, p = 9e−03) contained 109 OTUs (Extended data Fig. 2b and Supplementary Table 3). Its structure correlated well with carbon export (Pearson cor. = 0.47, p = 5e−06, Extended data Fig. 2e) and it could predict as much as 60% of the variability in carbon export (LOOCV, R2 = 0.60) (Extended data Fig. 2h). By far the highest VIP score within this community was assigned to Synechococcus, followed by Cobetia, Pseudoalteromonas and Idiomarina, as well as Vibrio and Arcobacter (Fig. 2b and Supplementary Table 3). Notably, the genus Prochlorococcus and the SAR11 clade fall outside this community, while the significance of Synechococcus for carbon export could be validated using absolute cell counts estimated by flow cytometry (Pearson cor. = 0.64, p = 4e−10, Extended data Fig. 2k). Moreover, Prochlorococcus cell counts did not correlate with carbon export (Pearson cor. = −0.13, p = 0.27, Extended data Fig. 2j) whereas the Synechococcus to Prochlorococcus cell count ratio correlated positively and significantly (Pearson cor. = 0.54, p = 4e−07, Extended data Fig. 2l), suggesting the relevance of Synechococcus, rather than Prochlorococcus, to carbon export. Interestingly, Pseudoalteromonas, Idiomarina, Vibrio and Arcobacter (of which several species are known to be associated with eukaryotes[36]) have also been observed in live and poisoned sediment traps[34] and display very high VIP scores in the subnetwork associated with carbon export. Additional genera reported as being enriched in poisoned traps (and also known to be associated with eukaryotes) include Enterovibrio and Campylobacter, which are likewise present in the carbon export associated subnetwork. Interestingly, the viral subnetwork (involving 277 populations) most related to carbon export at 150 m (Pearson cor. = 0.93, p = 2e−15, Extended data Fig. 2c) contained particularly high VIP scores for two Synechococcus phages (Fig. 2c and Supplementary Table 4), which represented a 16-fold enrichment (Fisher’s exact test p = 6.4e−09). Its structure also correlated with carbon export (Pearson cor. = 0.88, p = 6e−93, Extended data Fig. 2f) and could predict up to 89% of the variability of carbon export (LOOCV, R2 = 0.89) (Extended data Fig. 2i). The significance of these convergent results is reinforced by the fact that sequences from these datasets are derived from organisms collected on independent size filters (see Methods), and the results further point to the importance of top-down processes in carbon export. With the aim of integrating eukaryotic, prokaryotic, and viral communities in the euphotic zone with carbon export at 150 m, we synthesized their respective subnetworks using a single global co-occurrence network established previously[10]. The resulting network focused on key lineages and their predicted co-occurrences (Fig. 3). Lineages with high VIP values (such as Synechococcus) are revealed as hubs of the co-occurrence network[10], illustrating the potentially strategic key roles within the integrated network of lineages under-appreciated by conventional methods to study carbon export. Associations between the hub lineages are mostly mutually exclusive, which may explain the relatively weak correlation of some of these lineages with carbon export when using standard correlation analyses as shown in Fig. 1b.
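The fold enrichment and Fisher's exact test reported for Synechococcus phages can be illustrated with a sketch of the one-sided test, which is the upper tail of a hypergeometric distribution. The contingency counts behind the published values are not given in this section, so the numbers below are purely hypothetical and do not reproduce the reported 16-fold enrichment or p-value. The function names are also assumptions.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Ratio of the hit frequency in the selection to that in the background.

    k: hits in the selection, n: selection size,
    K: hits in the whole set, N: size of the whole set.
    """
    return (k / n) / (K / N)


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact p-value: P(X >= k) for X ~ Hypergeometric."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: 100 viral populations, 4 of them annotated as phages
# of one host; 10 populations are selected and 2 of those are such phages.
fold = fold_enrichment(2, 10, 4, 100)
p_value = fisher_upper_tail(2, 10, 4, 100)
```

With these made-up counts the selected set contains the phages five times more often than expected from the background frequency.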

Figure 3

Integrated plankton community network built from eukaryotic, prokaryotic and viral subnetworks related to carbon export at 150 m

Major lineages were selected within the three subnetworks (VIP > 1) (Supplementary Tables 2, 3 and 4). Co-occurrences between all lineages of interest were extracted, if present, from a previously established global co-occurrence network (see methods). Only lineages discussed within the study are pinpointed. The resulting graph is composed of 329 nodes, 467 edges, with a diameter of 7, and average weighted degree of 4.6.
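The summary statistics quoted for the integrated network (node count, edge count, diameter and average weighted degree) can be computed as in the sketch below. It assumes an undirected graph, takes the diameter as the longest shortest path counted in edges, and takes the weighted degree of a node as the sum of its incident edge weights. The tiny edge list and its weights are hypothetical and only borrow lineage names from the text.

```python
from collections import deque


def graph_summary(edges):
    """Basic statistics for an undirected weighted graph.

    edges: list of (node_u, node_v, weight) tuples.
    """
    neighbours, strength = {}, {}
    for u, v, w in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w

    def eccentricity(source):
        # Breadth-first search: longest shortest path from source (in edges)
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in neighbours[node]:
                if nb not in distance:
                    distance[nb] = distance[node] + 1
                    queue.append(nb)
        return max(distance.values())

    return {
        "nodes": len(neighbours),
        "edges": len(edges),
        "diameter": max(eccentricity(node) for node in neighbours),
        "average_weighted_degree": sum(strength.values()) / len(neighbours),
    }


# Hypothetical co-occurrence edges between lineages named in the text
toy_edges = [
    ("Synechococcus", "phage_1", 1.0),
    ("Synechococcus", "Collodaria", 0.5),
    ("Collodaria", "Oithona", 0.5),
]
summary = graph_summary(toy_edges)
```

For a disconnected graph this sketch reports the largest diameter found within any connected component.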

Gene functions associated with carbon export

Given the potential importance of prokaryotic processes influencing the biological carbon pump[22], we used the same analytical approaches to examine the prokaryotic genomic functions associated with carbon export at 150 m in the annotated Ocean Microbial Reference Gene Catalogue from Tara Oceans[23]. We built a global co-occurrence network for functions (i.e., Orthologous Groups of genes or OGs) from the euphotic zone and identified two subnetworks of functions that are significantly associated with carbon export (light and dark green subnetworks; FNET1 and FNET2, respectively, see Extended data Fig. 3a, 3b and 3c).

Prokaryotic function subnetworks associated with environmental parameters and their structures correlating with carbon export. a,b,c, Global ecological networks were built for the prokaryotic functions using the WGCNA methodology (see Methods) and correlated to classical oceanographic parameters as well as carbon export. a, Two bacterial functional subnetworks (n=441 and n=220, N=37′832) are associated with carbon export (Pearson cor. 0.54, p = 1e−07 and 0.42, p = 1e−04). b, The WGCNA approach directly links subnetworks to environmental parameters, i.e. the more the features contribute to the subnetwork structure (topology), the more their abundances are correlated to the parameter. This measure identifies subnetworks for which the overall structure, summarized as the eigenvector of the subnetwork, is related to carbon export. The bacterial function subnetwork structures correlate to carbon export (FNET1 Pearson cor. = 0.68, p = 3e−61, and FNET2 Pearson cor. = 0.47, p = 6e−13). c, Two functional subnetworks (light and dark green, FNET1 (n=220) and FNET2 (n=441), respectively) are significantly associated with carbon export (FNET1: Pearson cor. 0.42, p = 4e−09 and FNET2: 0.54, p = 7e−06). The highest VIP score functions from top to bottom correspond to red dots from right to left. d, Bacterial function subnetworks predict carbon export. PLS regression was used to predict carbon export using abundances of functions (OGs) in selected subnetworks. LOOCV was performed and VIP scores computed for each function. Light green subnetwork (FNET1) functions predict carbon export with an R2 of 0.41. Dark green subnetwork (FNET2) functions predict carbon export with an R2 of 0.48. e, Cumulative abundance of genus-level taxonomic annotations of genes encoding functions from the FNET1 and FNET2 subnetworks. Genes contributing to the relative abundance of FNET1 and FNET2 subnetwork functions were taxonomically annotated by homology searches against a non-redundant gene reference database using a last common ancestor (LCA) approach (see Methods).

The majority of functions in FNET1 and FNET2 correlate well with carbon export (FNET1: mean Pearson cor. = 0.45, s.d. 0.09 and FNET2: mean Pearson cor. = 0.34, s.d. 0.10). Interestingly, FNET2 functions (n=220) encode mostly (83%) core functions (i.e., functions observed in all euphotic samples, see Methods) while the majority of FNET1 functions (n=441) are non-core (85%) (see Supplementary Tables 5, 6), highlighting both essential and adaptive ecological functions associated with carbon export. Top VIP scoring functions in the FNET1 subnetwork are membrane proteins such as ABC-type sugar transporters (Extended data Fig. 3c). This subnetwork also contains many functions specific to the Synechococcus accessory photosynthetic apparatus (e.g., relating to phycobilisomes, phycocyanin and phycoerythrin; see Supplementary Table 5), which is consistent with the major role of this genus for carbon export inferred from the prokaryotic subnetwork (Fig. 2b). In addition, functions related to carbohydrates, inorganic ion transport and metabolism, as well as transcription, are also well represented (Fig. 4), suggesting overall a subnetwork of functions dedicated to photosynthesis and growth.
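The core versus non-core distinction used above (a core function is one observed in all euphotic samples) can be expressed as a short sketch. Treating "observed" as an abundance strictly greater than zero is an assumption here, and the OG identifiers and abundance values are hypothetical.

```python
def split_core(abundance_by_function):
    """Separate functions seen in every sample from those missing somewhere.

    abundance_by_function: dict mapping a function (OG) identifier to its
    relative abundance in each sample.
    """
    core, non_core = [], []
    for function_id, abundances in abundance_by_function.items():
        if all(value > 0 for value in abundances):
            core.append(function_id)
        else:
            non_core.append(function_id)
    return core, non_core


# Hypothetical relative abundances of three OGs across four samples
toy_table = {
    "OG_a": [0.2, 0.1, 0.3, 0.4],
    "OG_b": [0.0, 0.5, 0.1, 0.0],
    "OG_c": [0.1, 0.1, 0.1, 0.1],
}
core, non_core = split_core(toy_table)
core_fraction = len(core) / len(toy_table)
```

In this toy table two of the three functions are core, so `core_fraction` is 2/3.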

Figure 4

Key bacterial functional categories associated with carbon export at 150 m at global scale

A bacterial functional network was built based on Orthologous Group/Gene (OG) relative abundances using the WGCNA methodology (see Methods) and correlated to classical oceanographic parameters. Two functional subnetworks (FNET1 (n=220) and FNET2 (n=441), respectively, Extended data Fig. 3a) are significantly associated with carbon export (FNET1: Pearson cor. 0.42, p = 4e−09 and FNET2: 0.54, p = 7e−06, see Extended data Fig. 3b). Higher functional categories are depicted for functions with a VIP score >1 (PLS regression, LOOCV, FNET1 R2=0.41 and FNET2 R2=0.48, see Extended data Fig. 3d) in both subnetworks.

The FNET2 subnetwork contains several functions encoded by genes taxonomically assigned to Candidatus Pelagibacter and Prochlorococcus, which are known to occupy oceanic regions similar to those of Synechococcus, but overall most of its relative abundance (74%) is taxonomically unclassified (Extended data Fig. 3e). Top VIP scoring functions in FNET2 are also membrane proteins and ABC-type sugar transporters, as well as functions involved in carbohydrate breakdown such as a chitinase (Extended data Fig. 3c). These features highlight the potential roles of bacteria in the formation and degradation of marine aggregates[37]. Strikingly, 77% and 58% of OGs with a VIP score > 1 in FNET1 and FNET2, respectively, are functionally uncharacterized[38,39] (Fig. 4), pointing to the strong need for future molecular work to explore these functions (see Supplementary Tables 5, 6). The relevance of the identified bacterial functions to predict carbon export was also confirmed by PLS regression (Extended data Fig. 3d). As proposed for plankton communities, the functional subnetworks predict 41% and 48% of carbon export variability (LOOCV, R2 = 0.41 and 0.48 for FNET1 and FNET2, respectively) with a minimal number of functions (Fig. 4, 123 and 54 functions with a VIP score > 1 for FNET1 and FNET2, respectively). Finally, higher predictive power was obtained using subnetworks of viral protein clusters (Extended data Fig. 4a, 4b and 4c), predicting 55% and 89% of carbon export variability (LOOCV R2 = 0.55 and 0.89 for VNET1 and VNET2, respectively; Extended data Fig. 4d, Supplementary Tables 7, 8), suggesting a key role not only of bacteria but also of their phages in processes sustaining carbon export at a global level.

Viral protein cluster networks reveal potential marker genes for carbon export prediction at global scale. a, A viral protein cluster (PC) network was built using abundances of PCs predicted from viral population contigs associated to carbon export (Fig. 2c) using the WGCNA methodology (see methods) and correlated to classical oceanographic parameters. Two viral PC subnetworks (n=1′879 and n=2′147, N=4′678, light and dark orange, VNET1 and VNET2, left and right panel respectively) are strongly associated to carbon export (VNET1: Pearson cor. 0.75, p = 3e−07 and VNET2: 0.91, p = 3e−14). b, The viral PC subnetwork structures correlate to carbon export (VNET1 Pearson cor. = 0.91, p < 1e−200, and VNET2 Pearson cor. = 0.96, p < 1e−200). c, Size of dots is proportional to the VIP score computed for the PLS regression. d, Viral PC subnetworks predict carbon export. PLS regression was used to predict carbon export using abundances of viral protein clusters (PCs) in selected subnetworks. LOOCV was performed and VIP scores computed for each PC. Light orange subnetwork (VNET1, left panel) PCs predict carbon export with a R2 of 0.55. Dark orange subnetwork (VNET2, right panel) PCs predict carbon export with a R2 of 0.89.

Discussion

In this report we reveal the potential contribution of unexpected components of plankton communities, and confirm that prokaryotes and viruses correlate strongly with carbon export in the nutrient-depleted oligotrophic ocean. Carbon export at 150 m has been estimated from particle size distribution in a global dataset, but should be taken with caution, as the estimates do not account for particle composition. In addition, these export estimates evaluate how much carbon leaves the euphotic zone, but they are not related and should not be extrapolated to sequestration, which occurs after remineralisation, deeper in the water column, and over longer timescales. Nonetheless, the use of the UVP was the only realistic method to evaluate carbon flux over the 3-year expedition because deployment of sediment traps at all stations would have been impossible. While our findings are consistent with the numerous previous studies that have highlighted the central role of copepods and diatoms in carbon export[14,15,17-19], they place them in an ecosystem context and reveal hypothetical processes correlating with the intensity of export, such as parasitism and predation. For example, while viruses are commonly assumed to lyse cells and maintain fixed organic carbon in surface waters, thereby reducing the intensity of the biological carbon pump[40], there are hints that viral lysis may increase carbon export through the production of colloidal particles and aggregate formation[41]. Our current study suggests that these latter roles may be more ubiquitous than currently appreciated. The importance of aggregation and cell stickiness as inferred from gene network analysis should be further explored mechanistically to investigate the biological significance of these findings. The future evolution of the oceanic carbon sink remains uncertain because of poorly constrained processes, particularly those associated with the biological pump.
With current trends in climate change, the size and biodiversity of phytoplankton are predicted to decrease globally[42,43]. Furthermore, in spite of the potential importance of viruses revealed in this study, they have largely been ignored because of limitations in sampling technologies. Consequently, as oligotrophic gyres expand and global mean NPP decreases[44], the field is currently unable to predict the consequences for carbon export from the ocean’s euphotic zone. By pinpointing key lineages and key microbial functions that correlate with carbon export at 150 m in these areas, this study provides a framework to address this critical bottleneck. However, the associations presented do not necessarily imply a causal effect on carbon export; establishing causality will require further investigation. One of the grand challenges in the life sciences is to link genes to ecosystems[45], based on the premise that genes can have predictable ecological footprints at community and ecosystem levels[46-48]. The Tara Oceans data sets have allowed us to predict as much as 89% of the variability in carbon export from the oligotrophic surface ocean with just a small number of genes, largely with unknown functions, encoded by prokaryotes and viruses. These findings can be used as a basis to include biological complexity and guide experimental work designed to inform climate modeling of the global carbon cycle. Such statistical analyses, scaling from genes to ecosystems, may open the way to the development of a new conceptual and methodological framework to better understand the mechanisms underpinning key ecological processes.

Methods

Environmental data collection

From 2009 to 2013, environmental data (Supplementary Table 9) were collected across all major oceanic provinces in the context of the Tara Oceans expeditions[20]. Sampling stations were selected to represent distinct marine ecosystems at a global scale[49]. Note that Southern Ocean stations were not examined herein because they were ranked as outliers due to their exceptional environmental characteristics and biota[23,24]. Environmental data were obtained from vertical profiles of a sampling package[50,51]. It consisted of conductivity and temperature sensors, chlorophyll and CDOM fluorometers, a light transmissometer (WetLabs C-Star, 25 cm), a backscatter sensor (WetLabs ECO BB), a nitrate sensor (SATLANTIC ISUS) and a Hydroptic Underwater Vision Profiler (UVP; Hydroptics)[52]. Nitrate concentrations, the conversion of fluorescence to chlorophyll concentration, and salinity were calibrated against water samples collected with Niskin bottles[50]. Net Primary Production (NPP) data were extracted from 8-day composites of the Vertically Generalized Production Model (VGPM[53]) for the week of sampling[54]. Carbon fluxes and carbon export, the latter corresponding to the carbon flux at 150 m, were estimated from particle concentrations and size distributions obtained from the UVP[51]; details are presented below.

From particle size distribution to carbon export estimation

Previous research has shown that the distribution of particle size follows a power law over the μm to the mm size range[3,55,56]. This Junge-type distribution translates into the following mathematical equation, whose parameters can be retrieved from UVP images:

$$n(d) = a\,d^{k} \quad (1)$$

where d is the particle diameter, a is a constant, and the exponent k is defined as the slope of the number spectrum when equation (1) is log transformed. This slope is commonly used as a descriptor of the shape of the aggregate size distribution. The carbon-based particle size approach relies on the assumption that the total carbon flux of particles (F) corresponds to the flux spectrum integrated over all particle sizes:

$$F = \int_{0}^{\infty} n(d)\, m(d)\, w(d)\, \mathrm{d}d \quad (2)$$

where n(d) is the particle size spectrum, i.e., equation (1), and m(d) is the mass (here carbon content) of a spherical particle described as:

$$m(d) = \alpha\, d^{3} \quad (3)$$

where α = πρ/6, ρ is the average density of the particle, and w(d) is the settling rate calculated using Stokes Law:

$$w(d) = \beta\, d^{2} \quad (4)$$

where β = g(ρ − ρ0)(18νρ0)⁻¹, g is the gravitational acceleration, ρ0 the fluid density, and ν the kinematic viscosity. In addition, the mass and settling rate of particles, m(d) and w(d), respectively, are often described as power law functions of their diameter obtained by fitting observed data, m(d)·w(d) = A d^B. The particle carbon flux can then be estimated using an approximation of Eq. 2 over a finite number (x) of small logarithmic intervals for diameter d spanning from 250 μm to 1.5 mm (particles <250 μm and >1.5 mm are not considered, consistent with the method presented by Guidi et al., [2008][57]) such that

$$F = \sum_{i=1}^{x} n_{i}\, A\, d_{i}^{B}\, \Delta d_{i} \quad (5)$$

where A = 12.5 ± 3.40 and B = 3.81 ± 0.70 have been estimated using a global dataset that compared particle fluxes in sediment traps and particle size distributions from the UVP images.
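The discrete flux approximation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: diameters are taken to be in millimetres, each logarithmic interval is represented by its geometric-mean diameter, and the function names (`log_spaced_bins`, `power_law_spectrum`, `carbon_flux`) are hypothetical and not part of the published workflow. Only the values of A and B come from the text.

```python
import math

# Coefficients reported in the text; their units follow the cited calibration.
A = 12.5
B = 3.81


def log_spaced_bins(d_min, d_max, n_bins):
    """Split [d_min, d_max] into n_bins logarithmic intervals.

    Returns a list of (representative diameter, interval width) pairs.
    Using the geometric mean as the representative diameter is an assumption.
    """
    ratio = (d_max / d_min) ** (1.0 / n_bins)
    bins = []
    for i in range(n_bins):
        lower = d_min * ratio ** i
        upper = lower * ratio
        bins.append((math.sqrt(lower * upper), upper - lower))
    return bins


def power_law_spectrum(bins, coeff, k):
    """Evaluate a Junge-type number spectrum n(d) = coeff * d**k at each bin."""
    return [coeff * d ** k for d, _ in bins]


def carbon_flux(spectrum, bins, a=A, b=B):
    """Discrete flux: sum over bins of n_i * A * d_i**B * delta_d_i."""
    return sum(n * a * d ** b * width for n, (d, width) in zip(spectrum, bins))
```

For example, `bins = log_spaced_bins(0.25, 1.5, 20)` covers the 250 μm to 1.5 mm range used in the text, and `carbon_flux(power_law_spectrum(bins, 10.0, -4.0), bins)` returns the flux for a hypothetical spectrum with coefficient 10 and slope -4.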

Genomic data collection

For the sake of consistency between all available datasets from the Tara Oceans expeditions, we considered subsets of the data recently published in Science[23-25]. In brief, one sample corresponds to data collected at one depth (surface (SRF) or Deep Chlorophyll Maximum (DCM), determined from the chlorophyll fluorometer profile) and at one station. To study the eukaryotic community in our current manuscript, we selected stations at which we had environmental data, carbon export estimated at 150 m with the UVP, and all size fractions. Consequently, a subset of 33 stations (corresponding to 56 samples) was created from the 47 stations analyzed in de Vargas et al. [2015]. A similar procedure was applied to the prokaryotic and viral datasets, reducing the Sunagawa et al. [2015] prokaryotic dataset to a subset of 104 samples from 62 stations and the Brum et al. [2015] viral dataset to a subset of 37 samples from 22 stations (see Supplementary Table 10). In addition, a detailed table summarizes which samples (depth and station) are available for each domain (Supplementary Table 11).

Eukaryotic taxa profiling

Photic-zone eukaryotic plankton diversity has been investigated through millions of environmental Illumina reads. Sequences of the 18S ribosomal RNA gene V9 region were obtained by PCR amplification, and a stringent quality-check pipeline was applied to remove potential chimeras and rare sequences (details on data cleaning in de Vargas et al. [2015][24]). For 47 stations, and if possible at two depths (SRF and DCM), eukaryotic communities were sampled in the piconano- (0.8-5 μm), micro- (20-180 μm) and meso-plankton (180-2000 μm) fractions (a detailed list of these samples is given in Supplementary Table 12). In the framework of the carbon export study, sequences from all size fractions were pooled in order to get the most accurate and statistically reliable dataset of the eukaryotic community. The 2.3 million eukaryotic ribotypes were assigned to known eukaryotic taxonomic entities by global alignment to a curated database[24]. To get the most accurate picture of the eukaryotic community, sequences showing less than 97% identity with reference sequences were excluded. The final eukaryotic relative abundance matrix used in our analyses included 1,750 lineages (taxonomic assignment used a last common ancestor methodology and was thus performed down to species level when possible) in 56 samples from 33 stations. The pooled abundance (number of V9 sequences) of each lineage was normalized by the total number of sequences in each sample.

Prokaryotic taxa profiling

To investigate the prokaryotic lineages, communities were sampled in the pico-plankton. Two filter sizes were used along the Tara Oceans transect: up to station #52, prokaryotic fractions correspond to a 0.22-1.6 μm size fraction, and from station #56, prokaryotic fractions correspond to a 0.22-3 μm size fraction. Prokaryotic taxonomic profiling was performed using 16S rRNA gene tags directly identified in Illumina-sequenced metagenomes (mitags) as described in Logares et al., [2014][58]. 16S mitags were mapped to cluster centroids of taxonomically annotated 16S reference sequences from the SILVA database[59] (release 115: SSU Ref NR 99) that had been clustered at 97% sequence identity using USEARCH v6.0.307[60]. 16S mitag counts were normalized by the total read count in each sample (further details in Sunagawa et al. [2015][23]). The photic-zone prokaryotic relative abundance matrix used in our analyses included 3,253,962 mitags corresponding to 1,328 genera in 104 samples from 62 stations.

Prokaryotic functional profiling

For each prokaryotic sample, gene relative abundance profiles were generated by mapping reads to the OM-RGC using the MOCAT pipeline[61]. The relative abundance of each reference gene was calculated as gene length-normalized base counts, and functional abundances were calculated as the sum of the relative abundances of the reference genes annotated to each OG functional group. In our analyses, we used the subset of the OM-RGC that was annotated to Bacteria or Archaea (24.4 M genes). Using a rarefied (to 33 M inserts) gene count table, an OG was considered to be part of the ocean microbial core if at least one insert from each sample was mapped to a gene annotated to that OG. For further details on the prokaryotic profiling please refer to Sunagawa et al. [2015][23]. The final prokaryotic functional relative abundance matrix used in our analyses included 37,832 OGs or functions in 104 samples from 62 stations. Genes from functions of the FNET1 and FNET2 subnetworks were taxonomically annotated using a modified dual BLAST-based last common ancestor (2bLCA) approach[62]. We used RAPsearch2[63] rather than BLAST to efficiently process the large data volume, together with a database of non-redundant protein sequences from UniProt (version: UniRef_2013_07) and eukaryotic transcriptome data not represented in UniRef (see Supplementary Tables 5 and 6 for full annotations).
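As a rough illustration of the two bookkeeping rules in this paragraph (summing gene relative abundances per OG, and calling an OG "core" when every sample contributes at least one insert), here is a minimal Python sketch. The function names, the dictionary layout and the OG identifiers used in the example are hypothetical; the actual profiles were produced with MOCAT as stated above.

```python
def functional_abundance(gene_abundance, gene_to_og):
    """Sum gene relative abundances over the OG each gene is annotated to.

    gene_abundance: {gene id: relative abundance in one sample}
    gene_to_og:     {gene id: OG id}; unannotated genes are skipped.
    """
    totals = {}
    for gene, value in gene_abundance.items():
        og = gene_to_og.get(gene)
        if og is not None:
            totals[og] = totals.get(og, 0.0) + value
    return totals


def core_ogs(insert_counts):
    """Return OGs with at least one mapped insert in every sample.

    insert_counts: {OG id: [insert count in sample 1, sample 2, ...]}
    taken from a rarefied count table.
    """
    return {
        og
        for og, counts in insert_counts.items()
        if counts and all(count >= 1 for count in counts)
    }
```

For instance, `core_ogs({"OG_A": [3, 1, 7], "OG_B": [0, 5, 2]})` keeps only the hypothetical `OG_A`, because `OG_B` has no insert in the first sample.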

Enumeration of prokaryotes by flow cytometry

For prokaryote enumeration by flow cytometry, three aliquots of 1 ml of seawater (pre-filtered through a 200-μm mesh) were collected from both SRF and DCM. The samples were fixed immediately using cold 25% glutaraldehyde (final concentration 0.125%), left in the dark for 10 min at room temperature, flash-frozen and kept in liquid nitrogen on board and then stored at −80°C on land. Two subsamples were taken to obtain separate counts of heterotrophic prokaryotes (not shown herein) and phototrophic picoplankton. For heterotrophic prokaryote determination, 400 μl of sample was added to a diluted SYTO-13 (Molecular Probes Inc., Eugene, OR, USA) stock (10:1) at 2.5 μmol l⁻¹ final concentration, left for about 10 min in the dark to complete the staining and run in the flow cytometer. We used a FacsCalibur (Becton & Dickinson) flow cytometer equipped with a 15 mW Argon-ion laser (488 nm emission). At least 30,000 events were acquired for each subsample (usually 100,000 events). Fluorescent beads (1 μm, Fluoresbrite carboxylate microspheres, Polysciences Inc., Warrington, PA) were added at a known density as internal standards. The bead standard concentration was determined by epifluorescence microscopy. For phototrophic picoplankton, we used the same procedure as for heterotrophic prokaryotes, but without addition of SYTO-13. Data analysis was performed with FlowJo software (Tree Star, Inc.).

Profiling of viral populations

In order to associate viruses with carbon export we used viral populations as defined in Brum et al. [2015][25] using a set of 43 Tara Oceans viromes. Briefly, viral populations were defined from large contigs (>10 predicted genes and >10 kb) identified as most likely originating from bacterial or archaeal viruses. The 6,322 contigs meeting these criteria were clustered into populations if they shared more than 80% of their genes at >95% nucleotide identity. This resulted in 5,477 ‘populations’ from the 6,322 contigs, where as many as 12 contigs were included per population. For each population, the longest contig was chosen as the ‘seed’ representative sequence. To compute relative abundances, all quality-controlled reads were mapped to the set of 5,477 non-redundant populations with Bowtie2[64], considering only mapping quality scores greater than 1. The relative abundance of a population in a sample was computed as the number of base pairs recruited to the contig, normalized to the total number of base pairs available in the virome and to the contig length, if more than 75% of the reference sequence was covered by virome reads, and set to 0 otherwise (see Brum et al. [2015][25] for further details). The final viral population abundance matrix used in our analyses included 5,291 viral population contigs in 37 samples from 22 stations.
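The coverage-gated abundance rule described above can be written compactly. The following Python sketch assumes the per-contig quantities (recruited base pairs, covered fraction) have already been extracted from the read mapping; the function name and argument names are hypothetical and not taken from the published pipeline.

```python
def population_abundance(bp_recruited, contig_length, virome_total_bp,
                         covered_fraction, min_coverage=0.75):
    """Relative abundance of one viral population in one virome.

    Base pairs recruited to the seed contig are normalized by the total
    base pairs in the virome and by the contig length. The value is set
    to 0 unless more than `min_coverage` of the contig is covered by reads.
    """
    if covered_fraction <= min_coverage:
        return 0.0
    return bp_recruited / (virome_total_bp * contig_length)
```

The strict inequality mirrors the wording "more than 75%"; a contig covered at exactly 75% is therefore scored as absent in this sketch.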

Viral host predictions

The longest contig in a population was defined as the seed sequence and considered the best estimate of that population’s origin. These seed sequences were used to assess taxonomic affiliation of each viral population. Cases where >50% of the genes were affiliated to a specific reference genome from RefSeq Virus (based on a BLASTp comparison with thresholds of 50 for bit score and 10⁻⁵ for e-value) with an identity percentage of at least 75% (at the protein sequence level) were considered as confident affiliations to the corresponding reference virus. The viral population host group was then estimated based on these confident affiliations (see Supplementary Table 13 for host affiliation of viral population contigs associated to carbon export).

Viral protein clusters

Viral protein clusters (PCs) correspond to ORFs initially mapped to existing clusters (POV, GOS and phage genomes). The remaining, unmapped ORFs were self-clustered, using cd-hit as described in Brum et al. [2015][25]. Only PCs with more than two ORFs were considered bona fide and were used for subsequent analyses. To compute PC relative abundance for statistical analyses, reads were mapped back to predicted ORFs in the contigs dataset using Mosaik as described in Brum et al. [2015][25]. Read counts to PCs were normalized by sequencing depth of each virome. Importantly, we restricted our analyses to 4,294 PCs associated to the 277 viral population contigs significantly associated to carbon export in 37 samples from 22 stations.

Sparse Partial Least Squares analysis

In order to directly associate eukaryotic lineages with carbon export and other environmental traits (Fig. 1b), we used sparse Partial Least Squares (sPLS[65]) as implemented in the R package mixOmics[29]. We applied sPLS in regression mode, which models a directional relationship from the lineages to the environmental traits, i.e. PLS predicts environmental traits (e.g. carbon export) from lineage abundances. This approach enabled us to identify high correlations (see Supplementary Table 1) between certain lineages and carbon export, but without taking into account the global structure of the planktonic community.

Co-occurrence network model analysis

Weighted correlation network analysis (WGCNA) was performed to delineate feature (lineages, viral populations, PCs or functions) subnetworks based on their relative abundance[66,67]. A signed adjacency measure for each pair of features was calculated by raising the absolute value of their Pearson correlation coefficient to the power of a parameter p. The default value p=6 was used for each global network, except for the prokaryotic functional network, where p had to be lowered to 4 in order to optimize the scale-free topology network fit. Indeed, this power allows the weighted correlation network to show a scale-free topology in which key nodes are highly connected with others. The resulting adjacency matrix was then used to calculate the topological overlap measure (TOM), which, for each pair of features, takes into account their weighted pairwise correlation (direct relationships) and their weighted correlations with other features in the network (indirect relationships). To identify subnetworks, a hierarchical clustering was performed using a distance based on the TOM. This resulted in the definition of several subnetworks, each represented by its first principal component. These characteristic components play a key role in weighted correlation network analysis. On the one hand, the closeness of each feature to its cluster, referred to as the subnetwork membership, is measured by correlating its relative abundance with the first principal component of the subnetwork. On the other hand, the association between the subnetworks and a given trait is measured by the pairwise Pearson correlation coefficients between the considered environmental trait and their respective principal components. The same protocol was applied to the eukaryotic relative abundance matrix, the prokaryotic relative abundance matrix, the prokaryotic functions relative abundance matrix and the viral population and PC relative abundance matrices.
All procedures were applied to Hellinger-transformed, log-scaled abundances. Notably, the protocol is not sensitive to copy number variation as observed across different eukaryotic species, because the association between two species relies on a correlation score between relative abundance measurements. Computations were carried out using the R package WGCNA[33]. Given the nature of the eukaryotic dataset (three distinct size fractions), the sampling process may lead to the loss of size fractions. In particular, samples #1, #3, #17, #37, #39, #43, #48, #53, #54, #55 and #66 may be biased by such a loss (Supplementary Table 12). A complementary WGCNA analysis was performed with the addition of these samples to evaluate the robustness of our protocol to missing size fractions. The composition of the eukaryotic subnetwork built with an extended dataset (i.e., 67 samples from 37 stations, in which size fractions were missing for 11 samples) was compared to the subnetwork as presented above (i.e., 56 samples from 33 stations). The two subnetworks showed a 75% overlap in lineages, and four of the top five VIP lineages with the extended dataset (see Extended Data Fig. 5 for details) are found among the top six VIP lineages of the above subnetwork (Supplementary Table 2), indicating highly similar results and a low sensitivity to size fraction loss.
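The adjacency and topological-overlap steps described in this section can be sketched in a few lines of Python. This is only a toy illustration: the published computations used the R package WGCNA, the adjacency follows the description given here (absolute Pearson correlation raised to the power p), and the topological overlap uses the commonly cited unsigned formula, which is an assumption about the exact variant applied. The Hellinger transformation, hierarchical clustering and module detection are omitted, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def adjacency_matrix(features, power=6):
    """Adjacency a_ij = |cor(feature_i, feature_j)| ** power.

    features: list of abundance profiles, one list per feature, each
    holding that feature's (transformed) abundance across samples.
    """
    m = len(features)
    adj = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            a = abs(pearson(features[i], features[j])) ** power
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    The sum runs over nodes u other than i and j, and k_i is the
    connectivity of node i (sum of its adjacencies to all other nodes).
    """
    m = len(adj)
    k = [sum(adj[i][u] for u in range(m) if u != i) for i in range(m)]
    tom = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(m) if u != i and u != j)
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom
```

A clustering step would then operate on the dissimilarity 1 − TOM. Raising correlations to a power of 6 strongly suppresses weak associations (a correlation of 0.8 becomes about 0.26), which is what pushes the weighted graph toward the scale-free topology mentioned in the text.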

WGCNA and PLS regression analyses for the full Eukaryotic dataset. a, A single eukaryotic subnetwork (n=58) is strongly associated to carbon export (Pearson cor. 0.79, p = 3e−14). b, The eukaryotic subnetwork structure correlates to carbon export (Pearson cor. = 0.94, p = 4e−27). c, The eukaryotic subnetwork predicts carbon export with an R² of 0.76. d, Lineages with the highest VIP score (dot size is proportional to the VIP score in the scatter plot) in the PLS are depicted as red dots corresponding to two rhizaria (Collodaria), one copepod (Euchaeta), and three dinophyceae (Noctiluca scintillans, Gonyaulax polygramma and Gonyaulax sp. (clade 4)).

Extraction of subnetworks related to carbon export

For each subnetwork (called a module within WGCNA) extracted from each global network, pairwise Pearson correlation coefficients between the subnetwork principal components and the carbon export estimates were computed, together with the corresponding p-values corrected for multiple testing using the Benjamini & Hochberg FDR procedure. The subnetworks showing the highest correlation scores are of interest and were investigated. One subnetwork (49 nodes) was significant within the eukaryotic network; one subnetwork (109 nodes) was significant for the prokaryotic network; one subnetwork (277 nodes) was significant within the virus network; two subnetworks (441 and 220 nodes) were significant within the prokaryotic functional network, and two subnetworks (1,879 and 2,147 nodes) were significant within the viral PCs network.
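For readers unfamiliar with the multiple-testing correction named above, the following Python sketch shows the standard Benjamini-Hochberg step-up adjustment applied to a list of p-values. The function name is hypothetical and the sketch is not the authors' code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), and
    monotonicity is enforced by taking a running minimum from the
    largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A subnetwork would be retained when its adjusted p-value falls below the chosen FDR threshold; the threshold itself is not stated in this section.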

Partial Least Squares regression

In addition to the network analyses, we asked whether the identified subnetworks can be used as predictors for the carbon export estimates. To answer this question, we used Partial Least Squares (PLS) regression, which is a dimensionality-reduction method that aims at determining predictor combinations with maximum covariance with the response variable. The identified combinations, called latent variables, are used to predict the response variable. The predictive power of the model is assessed by correlating the predicted vector with the measured values. The significance of the predictive power was evaluated by permuting the data 10,000 times. For each permutation, a PLS model was built to predict the randomized response variable, and a Pearson correlation was calculated between the permuted response variable and the Leave-One-Out Cross-Validation (LOOCV) predicted values. The 10,000 random correlations were compared to the performance of the PLS model that was used to predict the true response variable. In addition, the predictors were ranked according to their variable importance in projection (VIP)[68]. The VIP measure of a predictor estimates its contribution to the PLS regression. Predictors with high VIP values are assumed to be important for the PLS prediction of the response variable. The VIP values of the prokaryotic functional subnetworks are provided in Supplementary Tables 5 and 6. For the sake of illustration, only lineages or functions with VIP > 1[68] are discussed and pictured in Figures 2 and 4. Our computations were carried out using the R package pls[69]. All programs are available under the GPL Licence.
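The evaluation scheme described above (PLS prediction, LOOCV, permutation test, VIP ranking) can be illustrated with a deliberately simplified Python sketch. It is not the published implementation, which relied on the R package pls: here the model is restricted to a single latent variable, the permutation count defaults to 1,000 rather than 10,000 to keep the example fast, and the empirical p-value formula (hits + 1) / (permutations + 1) is an assumption. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))
    return sxy / denom if denom > 0 else 0.0


def fit_pls1(X, y):
    """Fit a one-latent-variable PLS1 model; return (predict function, weights)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: unit-norm direction of maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    ss = sum(t * t for t in scores)
    q = sum(t * v for t, v in zip(scores, yc)) / ss if ss > 0 else 0.0

    def predict(row):
        t_new = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * t_new

    return predict, w


def loocv_predictions(X, y):
    """Predict each sample from a model fitted on all other samples."""
    predictions = []
    for i in range(len(X)):
        predict, _ = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(X[i]))
    return predictions


def permutation_test(X, y, n_perm=1000, seed=0):
    """Return (observed LOOCV correlation, empirical p-value)."""
    rng = random.Random(seed)
    observed = pearson(y, loocv_predictions(X, y))
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if pearson(y_perm, loocv_predictions(X, y_perm)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def vip_scores(w):
    """VIP for a single-component model: sqrt(p) * |w_j| for unit-norm w."""
    return [math.sqrt(len(w)) * abs(v) for v in w]
```

With several latent variables, the VIP of a predictor additionally weights each component by the response variance it explains; the single-component form above is the special case in which that weighting cancels out. By construction the squared VIP scores average to 1, which is why VIP > 1 is a common cut-off for "more important than average" predictors.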

Subnetwork representations

Nodes of the subnetworks represent either lineages (eukaryotic, prokaryotic or viral) or functions (prokaryotic or viral). Subnetworks related to carbon export are represented in two distinct formats. Scatter plots place each node according to its Pearson correlation to carbon export and its centrality within the subnetwork. Centrality was recomputed using significant Spearman correlations above 0.3 (>0.9 for viral PCs) as edges; this was done for visualization purposes, since WGCNA subnetworks (based on the Topological Overlap Measure (TOM) between nodes) are hyper-connected. Node size is proportional to the VIP score after PLS. The hive plots depict the same subnetworks by focusing on two main features: the x-axis and y-axis depict nodes of subnetworks ranked by their VIP scores and Pearson correlation to carbon export, respectively.

Overview of analytical methods used in the manuscript. a, Depiction of a standard pairwise analysis that considers a sequence relative abundance matrix for s samples (s × OTUs (Operational Taxonomic Units)) and its corresponding environmental matrix (s × p (parameters)). sPLS results emphasize the OTU(s) that are the most correlated to environmental parameters. b, Depiction of a graph-based approach. Using only a relative abundance matrix (s × OTUs), WGCNA builds a graph where nodes are OTUs and edges represent significant co-occurrence. Co-occurrence scores between nodes are weights allocated to the corresponding edges. These weights are magnified by a power-law function until the graph becomes scale-free. The graph is then decomposed into subnetworks (groups of OTUs) that are analyzed separately. A subnetwork (group of OTUs) is considered of interest when its topology is related to the trait of interest, in the current case carbon export. For each subnetwork (for instance the subnetwork related to carbon export), each OTU is placed in a feature space according to its membership to the subnetwork (x-axis) and its correlation to the environmental trait of interest (i.e., carbon export). A good regression across all OTUs emphasizes the putative relation between the subnetwork topology and the carbon export trait (i.e. the more a given OTU defines the subnetwork topology, the more it is correlated to carbon export). c, Depiction of the machine learning (PLS) approach that was applied following subnetwork identification and selection. Greater VIP scores (i.e. larger circles) indicate the most important OTUs. VIP refers to Variable Importance in Projection and reflects the relative predictive power of a given OTU. OTUs with a VIP score greater than 1 are considered important in the predictive model, and their selection does not alter the overall predictive power.

Lineage ecological subnetworks associated to environmental parameters and their structures correlating to carbon export. a,b,c, Global ecological networks were built using the WGCNA methodology (see methods) and correlated to classical oceanographic parameters as well as carbon export (estimated at 150 m from particle size distribution and abundance). Each domain-specific global network is decomposed into smaller coherent subnetworks (depicted by distinct colours on the y-axis) and their eigenvector is correlated to all environmental parameters. Similar to a correlation at the network scale, this approach directly links subnetworks to environmental parameters (i.e. the more the taxa contribute to the subnetwork structure, the more their abundances are correlated to the parameter). a, A single eukaryotic subnetwork (n=58, N=1′870) is strongly associated to carbon export (Pearson cor. 0.81, p = 5e−15). b, A single prokaryotic subnetwork (n=109, N=1′527) is moderately associated to carbon export (Pearson cor. 0.32, p = 9e−03). c, A single viral subnetwork (n=277, N=5′476) is strongly associated to carbon export (Pearson cor. 0.93, p = 2e−15). d,e,f, The WGCNA approach directly links subnetworks to environmental parameters, i.e. the more the features contribute to the subnetwork structure (topology), the more their abundances are correlated to the parameter. This measure allows the identification of subnetworks for which the overall structure, summarized as the eigenvector of the subnetwork, is related to carbon export. d, The eukaryotic subnetwork structure correlates to carbon export (Pearson cor. = 0.87, p = 5e−16). e, The prokaryotic subnetwork structure correlates to carbon export (Pearson cor. = 0.47, p = 5e−06). f, The viral population subnetwork structure correlates to carbon export (Pearson cor. = 0.88, p = 6e−93). g,h,i, Lineage subnetworks predict carbon export. PLS regression was used to predict carbon export using lineage abundances in selected subnetworks. LOOCV was performed and VIP scores computed for each lineage. g, The eukaryotic subnetwork predicts carbon export with an R² of 0.69. h, The prokaryotic subnetwork predicts carbon export with an R² of 0.60. i, The viral population subnetwork predicts carbon export with an R² of 0.89. j,k,l, Synechococcus (rather than Prochlorococcus) absolute cell counts correlate well to carbon export. j, Prochlorococcus cell counts estimated by flow cytometry do not correlate to carbon export (mean carbon flux at 150 m, Pearson cor. = −0.13, p = 0.27). k, Synechococcus cell counts estimated by flow cytometry correlate significantly to carbon export (Pearson cor. = 0.64, p = 4.0e−10). l, The Synechococcus / Prochlorococcus cell count ratio correlates significantly to carbon export (Pearson cor. = 0.54, p = 4.0e−07).

Prokaryotic function subnetworks associated to environmental parameters and their structure correlate to carbon export. a,b,c, Global ecological networks were built for the prokaryotic functions using the WGCNA methodology (see methods) and correlated to classical oceanographic parameters as well as carbon export. a, Two bacterial functional subnetworks (n=441 and n=220, N=37′832) are associated to carbon export (Pearson cor. 0.54, p = 1e−07 and 0.42, p = 1e−04). b, The WGCNA approach directly links subnetworks to environmental parameters, i.e. the more the features contribute to the subnetwork structure (topology), the more their abundances are correlated to the parameter. This measure allows the identification of subnetworks for which the overall structure, summarized as the eigenvector of the subnetwork, is related to carbon export. The bacterial function subnetwork structures correlate to carbon export (FNET1 Pearson cor. = 0.68, p = 3e−61, and FNET2 Pearson cor. = 0.47, p = 6e−13). c, Two functional subnetworks (light and dark green, FNET1 (n=220) and FNET2 (n=441), respectively) are significantly associated with carbon export (FNET1: Pearson cor. 0.42, p = 4e−09 and FNET2: 0.54, p = 7e−06). The highest VIP score functions from top to bottom correspond to red dots from right to left. d, PLS regression was used to predict carbon export using abundances of functions (OGs) in selected subnetworks. LOOCV was performed and VIP scores computed for each function. Light green subnetwork (FNET1) functions predict carbon export with an R² of 0.41. Dark green subnetwork (FNET2) functions predict carbon export with an R² of 0.48. e, Cumulative abundance of genus-level taxonomic annotations of genes encoding functions from FNET1 and FNET2 subnetworks and Bacterial function subnetworks predict carbon export. Genes contributing to the relative abundance of FNET1 and FNET2 subnetwork functions were taxonomically annotated by homology searches against a non-redundant gene reference database using a last common ancestor (LCA) approach (see methods).
