Yoann Seeleuthner, Samuel Mondy, Vincent Lombard, Quentin Carradec, Eric Pelletier, Marc Wessner, Jade Leconte, Jean-François Mangot, Julie Poulain, Karine Labadie, Ramiro Logares, Shinichi Sunagawa, Véronique de Berardinis, Marcel Salanoubat, Céline Dimier, Stefanie Kandels-Lewis, Marc Picheral, Sarah Searson, Stephane Pesant, Nicole Poulton, Ramunas Stepanauskas, Peer Bork, Chris Bowler, Pascal Hingamp, Matthew B Sullivan, Daniele Iudicone, Ramon Massana, Jean-Marc Aury, Bernard Henrissat, Eric Karsenti, Olivier Jaillon, Mike Sieracki, Colomban de Vargas, Patrick Wincker.
Abstract
Single-celled eukaryotes (protists) are critical players in global biogeochemical cycling of nutrients and energy in the oceans. While their roles as primary producers and grazers are well appreciated, other aspects of their life histories remain obscure due to challenges in culturing and sequencing their natural diversity. Here, we exploit single-cell genomics and metagenomics data from the circumglobal Tara Oceans expedition to analyze the genome content and apparent oceanic distribution of seven prevalent lineages of uncultured heterotrophic stramenopiles. Based on the available data, each sequenced genome or genotype appears to have a specific oceanic distribution, principally correlated with water temperature and depth. The genome content provides hypotheses for specialization in terms of cell motility, food spectra, and trophic stages, including the potential impact on their lifestyles of horizontal gene transfer from prokaryotes. Our results support the idea that prominent heterotrophic marine protists perform diverse functions in ocean ecology.
Year: 2018 PMID: 29358710 PMCID: PMC5778133 DOI: 10.1038/s41467-017-02235-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
SAG assembly and annotation summary
| Name | Number of cells | Raw assembly size (Mbp) | Cross-SAG sequences (Mbp) | Outlier sequences (Mbp) | Final assembly size (Mbp) | N50 (bp) | BUSCO v2 complete genes (%) | Number of predicted genes |
|---|---|---|---|---|---|---|---|---|
| Chrysophyte H1 | 8 | 16.7 | 0.1 | 0.6 | 15.9 | 25,581 | 57 | 3050 |
| Chrysophyte H2 | 3 | 14.3 | 1.1 | 0.3 | 10.6 | 10,194 | 27 | 1637 |
| MAST-3A | 4 | 20.0 | 0 | 1.0 | 18.9 | 6223 | 53 | 3289 |
| MAST-3F | 2 | 21.5 | 0 | 0.3 | 21.1 | 7132 | 37 | 2694 |
| MAST-4A1 | 6 | 33.4 | 0 | 1.0 | 31.8 | 10,950 | 59 | 8018 |
| MAST-4A2 | 4 | 37.1 | 3.0 | 1.1 | 32.8 | 11,577 | 64 | 8537 |
| MAST-4C | 4 | 31.2 | 0 | 0.9 | 30.0 | 8097 | 54 | 5478 |
| MAST-4E | 9 | 30.3 | 0.2 | 1.4 | 28.4 | 9788 | 61 | 4652 |
SAG: single amplified genome; N50: length of the shortest scaffold in the minimal set of scaffolds that together account for 50% of the assembly size; BUSCO v2: percentage of the conserved single-copy reference genes found complete by the BUSCO program (Benchmarking Universal Single-Copy Orthologs).
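The N50 definition in the footnote can be made concrete with a short computation. The following is a minimal Python sketch under simple assumptions: the input is a plain list of scaffold lengths in bp, and the function name `n50` is illustrative, not taken from the authors' assembly pipeline. The toy lengths in the usage line are invented.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (bp).

    Scaffolds are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the scaffold at which the running
    total first reaches at least half of the total assembly size, i.e. the
    shortest scaffold in the minimal set covering 50% of the assembly.
    """
    if not lengths:
        raise ValueError("no scaffolds given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached 50% of the assembly size
            return length


# Hypothetical scaffold lengths, not values from the table above.
print(n50([100, 80, 60, 40, 20]))
```

With these toy lengths the total is 300 bp; the two longest scaffolds already cover 180 bp (at least 150 bp), so the N50 is 80 bp.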
Fig. 1 Marine heterotrophic SAG lineages form a functional group distinct from autotrophs and other heterotrophs. a Non-metric multidimensional scaling (NMDS) projection of a Bray–Curtis distance matrix computed from Pfam motif occurrences in various stramenopile genomes. Because the genome sequences are incomplete, the Pfam counts of each genome were rarefied to 1400 motifs; ten independent rarefied samples per genome were used for the NMDS. Ellipses (95% confidence limit) were drawn using the ‘ordiellipse’ function of the vegan package in R, with groups defined by life history mode (indicated by the numbers at top right). Letters mark the mean coordinates of the 10 rarefied Pfam samples for each organism. The analysis included 19 stramenopile genomes, among them MAST-4D [8]. The marine heterotrophic stramenopiles from this study form a large but coherent group (Group 3), distinct from both autotrophic species and heterotrophic species from other environments. b Maximum-likelihood phylogenetic tree based on 160 conserved eukaryotic proteins. Protein sequences of Incisomonas marina from ref. [44] are included. Numbers indicate life history mode as in panel a. Bootstrap values are shown on internal nodes. Branch lengths represent the mean number of substitutions per site.
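The rarefy-then-compare procedure in panel a can be sketched as follows. This is a minimal illustration, not the authors' R code. It assumes each genome is summarized as a table of Pfam motif counts, subsamples occurrences without replacement to a fixed depth (1400 in the study; a small depth here), and computes Bray–Curtis dissimilarities between all rarefied samples. The helper names `rarefy` and `bray_curtis` and the toy genomes are hypothetical. The NMDS ordination itself (for example with vegan's `metaMDS` in R) is not shown.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample a {pfam_id: count} table to exactly `depth` motif
    occurrences, drawn without replacement."""
    pool = [pfam for pfam, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the number of occurrences")
    return Counter(rng.sample(pool, depth))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count tables:
    sum |a_k - b_k| / sum (a_k + b_k); 0 = identical, 1 = no shared motifs."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    den = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return num / den if den else 0.0


# Hypothetical toy Pfam counts; not data from the study.
genomes = {
    "A": {"PF00001": 30, "PF00002": 10, "PF00003": 5},
    "B": {"PF00001": 5, "PF00002": 20, "PF00004": 25},
    "C": {"PF00003": 20, "PF00004": 20, "PF00005": 10},
}
DEPTH = 40        # toy depth; the study rarefied to 1400 motifs per genome
REPLICATES = 10   # ten independent rarefied samples per genome, as in the caption

rng = random.Random(42)
samples = [
    (name, rarefy(counts, DEPTH, rng))
    for name, counts in genomes.items()
    for _ in range(REPLICATES)
]

# Square dissimilarity matrix over all rarefied samples (30 x 30 here),
# which would then be handed to an NMDS routine.
matrix = [[bray_curtis(s1, s2) for _, s2 in samples] for _, s1 in samples]
```

Rarefying every genome to the same depth keeps the Bray–Curtis denominator constant, so differences in assembly completeness do not by themselves inflate the distances.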
Fig. 2 SAG lineage glycoside hydrolases (GHs). GH families are numbered (right) according to the CAZy database. Potential substrates are indicated on the left. The numbers shown give the number of genes in each genome predicted to belong to each GH family. Colors indicate the number of predicted GH genes per family, from low (red) to high (green).
Fig. 3 Biogeographic distribution of the SAG lineages based on metagenomic read recruitment, shown separately for the deep chlorophyll maximum and surface layers. Global maps show the presence of each SAG lineage at each Tara Oceans station, based on metagenomic read mapping, either as a black dot (no signal detected) or as a circle whose diameter indicates the relative abundance of the lineage. Abundance in deep chlorophyll maximum samples (DCM, left panel) often differs from that in surface samples (SRF, right panel); only MAST-4A shows the same pattern in DCM and SRF samples (a). The color inside each circle gives the median percentage similarity of the reads to the reference. The station from which each SAG originates is indicated by its number. a MAST-4A; b MAST-4C; c MAST-4D; d MAST-4E; e Chrysophyte H1; f Chrysophyte H2; g MAST-3A; and h MAST-3F.
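How a station ends up as a black dot or as a sized, colored circle can be sketched in a few lines. This Python sketch makes several assumptions that the caption does not state: relative abundance is taken as the fraction of a station's reads that map to the lineage reference, reads are kept only above a hypothetical `min_identity` cutoff, and the circle color corresponds to the median percent identity of the kept reads. The function and field names, the cutoff value, and the example numbers are all illustrative.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class StationSignal:
    station: str
    relative_abundance: float         # mapped reads / total reads (assumed normalization)
    median_identity: Optional[float]  # None means "no signal" (drawn as a black dot)


def summarize_station(station: str, identities: List[float],
                      total_reads: int, min_identity: float) -> StationSignal:
    """Summarize reads recruited to one reference genome at one station.

    `identities` holds the percent identity of each mapped read; reads below
    the hypothetical `min_identity` cutoff are discarded.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    kept = [x for x in identities if x >= min_identity]
    if not kept:
        return StationSignal(station, 0.0, None)
    return StationSignal(station, len(kept) / total_reads, median(kept))


# Illustrative numbers only (not Tara Oceans data); the 90% cutoff is arbitrary.
example = {
    "station_A": ([99.1, 98.7, 97.5, 88.0], 1_000_000),
    "station_B": ([85.2, 86.0], 800_000),
}
for name, (ids, total) in example.items():
    print(summarize_station(name, ids, total, min_identity=90.0))
```

In this toy run, station_A would be drawn as a small circle colored by a median identity of 98.7%, while station_B, with no read above the cutoff, would be drawn as a black dot.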
Fig. 4 Water temperature and distribution of the heterotrophic protists. a The x-axis shows the composite genome of each lineage, and the y-axis shows surface water temperature (°C) at each sampling location. Circle size represents relative abundance (one circle per station and depth at which the genome was detected); the scale for each column is given below the lineage name. b Distribution of MAST-3A abundance with respect to temperature. The distributions represented by the different line types differ (Wilcoxon test, p < 2 × 10⁻²). c Distribution of MAST-4E abundance with respect to temperature. The distributions differ significantly (Wilcoxon test, p < 3 × 10⁻⁴). In b and c, line type indicates median sequence similarity to the reference genome assembly.
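The Wilcoxon rank-sum test used in panels b and c compares two samples through their ranks rather than their raw values. The sketch below implements the two-sided test with the usual normal approximation and a tie-corrected variance, in plain Python. It is an illustration only: exactly which values the authors compared, and whether they used an exact test or a continuity correction (as R's `wilcox.test` applies by default in its normal approximation), is not stated in the caption. The function name `rank_sum_test` and the temperatures in the usage example are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie-corrected variance and no
    continuity correction. Returns (U for x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Invented temperatures (°C) for two hypothetical groups of stations.
warm_group = [24.1, 25.3, 26.0, 27.2, 28.4, 26.7]
cool_group = [18.2, 19.5, 21.0, 22.3, 20.8, 23.9]
u, z, p = rank_sum_test(warm_group, cool_group)
print(f"U = {u}, z = {z:.2f}, p = {p:.3g}")
```

For small samples without ties, an exact test (as offered by R's `wilcox.test` or `scipy.stats.mannwhitneyu`) gives somewhat different p-values than this normal approximation.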