Jean-François Mangot1, Ramiro Logares1, Pablo Sánchez1, Fran Latorre1, Yoann Seeleuthner2,3,4, Samuel Mondy2,3,4, Michael E Sieracki5,6, Olivier Jaillon2,3,4, Patrick Wincker2,3,4, Colomban de Vargas7,8, Ramon Massana1.
Abstract
Pico-sized eukaryotes play key roles in the functioning of marine ecosystems, but we still have limited knowledge of their ecology and evolution. The MAST-4 lineage is of particular interest, since it is widespread in surface oceans, presents ecotypic differentiation and has so far defied culturing efforts. Single-cell genomics (SCG) is a promising tool to retrieve genomic information from these uncultured organisms. However, SCG is based on whole-genome amplification, which normally introduces amplification biases that limit the amount of genomic data retrieved from a single cell. Here, we increase the recovery of genomic information from two MAST-4 lineages by co-assembling short reads from multiple Single Amplified Genomes (SAGs) belonging to evolutionarily closely related cells. We found that complementary genomic information is retrieved from different SAGs, generating co-assemblies that feature >74% genome recovery, against about 20% when SAGs are assembled individually. Even though this approach is not aimed at generating high-quality draft genomes, it gives access to the genomic information of microbes that would otherwise remain unreachable. Since most picoeukaryotes remain uncultured, our work serves as a proof-of-concept that can be applied to other taxa in order to extract genomic data and address new ecological and evolutionary questions.
Year: 2017 PMID: 28128359 PMCID: PMC5269757 DOI: 10.1038/srep41498
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. General characteristics of the draft genomes obtained from individual SAGs.
Box plots capture the variation in assembly size (a), number of contigs (b), N50 (c) and GC content (d) among MAST-4A (n = 14) and MAST-4E (n = 9) SAGs.
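Two of the statistics in Figure 1, N50 and GC content, can be computed directly from the contigs of an assembly. The sketch below is illustrative only and is not the authors' pipeline; the function names `n50` and `gc_content` are assumptions, and N50 is taken in its usual sense (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly size).

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_content(contigs):
    """Return the GC percentage over all unambiguous (A/C/G/T) bases."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0
```

Ambiguous bases such as N are ignored in the GC calculation here; other tools may count them in the denominator, which gives slightly lower values.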
Figure 2. Genome recovery of SAGs, estimated by CEGMA, in relation to the sequencing effort.
(a) Genome recovery of the 23 SAGs in relation to their sequencing depth. (b) Genome recovery at different sequencing depths in two selected SAGs (those with the largest genome in each clade). Each point represents the mean recovery after 5 separate subsamplings (at 17%, 33%, 50%, 67%, and 83%) of the total number of reads.
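The subsampling design in panel (b) can be sketched as follows: for each fraction of the reads, draw several random subsamples, estimate genome recovery on each, and report the mean. This is a minimal sketch under stated assumptions. `estimate_recovery` is a hypothetical callable standing in for whatever turns a set of reads into a recovery estimate (in the study, assembly followed by CEGMA), and reads are treated as independent items, so paired-end handling is not modelled.

```python
import random
from statistics import mean

# Fractions of the total number of reads, as given in the caption.
FRACTIONS = (0.17, 0.33, 0.50, 0.67, 0.83)


def subsample(reads, fraction, rng):
    """Draw a random subset containing `fraction` of the reads, without replacement."""
    k = round(len(reads) * fraction)
    return rng.sample(reads, k)


def recovery_curve(reads, estimate_recovery, replicates=5, seed=0):
    """Mean recovery per fraction over `replicates` independent subsamplings.

    `estimate_recovery` is a hypothetical function: list of reads -> number.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = {}
    for fraction in FRACTIONS:
        values = [
            estimate_recovery(subsample(reads, fraction, rng))
            for _ in range(replicates)
        ]
        curve[fraction] = mean(values)
    return curve
```

A curve that keeps rising at the largest fractions would suggest that additional sequencing could still recover more of the genome, whereas a plateau would suggest that amplification bias, not sequencing depth, is the limit.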
Figure 3. Comparison of tetranucleotide frequencies of SAGs in an ESOM map.
Each contig (2.5–5 kbp in size) is represented by a point placed on the map by relatedness and colored according to its provenance from SAGs of MAST-4A (bluish) or MAST-4E (reddish). Note that the map is continuous from top to bottom and side to side. Large differences in tetranucleotide frequencies (black borders) represent natural divisions between taxonomic groups. Two clusters (a and b) were identified and taxonomically assigned (see text).
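The input to such a map is one tetranucleotide frequency vector per contig. The following is a minimal sketch of how a vector of this kind could be computed; it is not the authors' implementation. The name `tetranucleotide_frequencies` is an assumption, all 256 tetramers are counted on the given strand only (some tools merge reverse complements, which is not done here), and windows containing ambiguous bases are skipped.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order, with their vector index.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative tetranucleotide frequencies."""
    counts = [0] * len(KMERS)
    s = seq.upper()
    # Slide a 4-bp window along the sequence, one base at a time.
    for i in range(len(s) - 3):
        idx = INDEX.get(s[i:i + 4])
        if idx is not None:  # skip windows containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]
```

Contigs with similar vectors end up close together on the map, which is why contaminant sequences with a different composition tend to form separate clusters.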
MAST-4A and MAST-4E assembly properties in comparison to complete published genomes of other small phototrophic and heterotrophic protists.
| Group | Class | Raw assembly size (Mbp) | CEGMA completeness (%) | Number of genes | Mean gene size (bp) | Mean intron density (introns per gene) | Mean intron length (bp) | Number of KOs or GOs* | Number of KOGs† | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| Stramenopiles | Mar Stram | 48.1 | 74.2 | 19,909 | 1,657 | 0.56 | 260 | 2,733 | 2,115 | This study |
| | | 32.3 | 68.2 | 11,850 | 1,723 | 0.36 | 332 | 2,210 | 1,878 | This study |
| | Bacil | 34.5 | 92.7 | 11,242 | 992 | 1.4 | | 5,473 | 8,113 | |
| | Oomyc | 95 | 96.0 | 19,027 | — | — | — | 8,714 | 3,891 | |
| | | 65 | 95.2 | 15,743 | — | — | — | 7,633 | 3,830 | |
| Opist | Choan | 42 | 92.7 | 9,196 | 3,004 | 6.6 | 174 | 1,843 | 3,389 | |
| Chlorophyta | Mamiell | 21.9 | 83.5 | 10,575 | 1,557 | 0.9 | 187 | 4,787 | 7,086 | |
| | | 20.9 | 87.1 | 10,056 | 1,587 | 0.57 | 163 | 4,911 | 6,554 | |
| | | 15 | 87.5 | 7,847 | — | — | — | 3,597 | — | |
| | | 12.5 | 80.6 | 8,116 | 1,257 | 0.39 | 187 | 3,603 | 5,320 | |
| | Treb | 46.2 | 77.8 | 9,791 | 2,928 | — | 209 | 5,372 | 7,938 | |
| | Chlor | 121 | 77.8 | 15,143 | 4,312 | 0.92 | 373 | 6,733 | 9,435 | |
Mar Stram, Marine Stramenopiles. Bacil, Bacillariophyceae. Oomyc, Oomycetes. Opist, Opisthokonta. Choan, Choanoflagellates. Mamiell, Mamiellophyceae. Treb, Trebouxiophyceae. Chlor, Chlorophyceae.
Assembly features of MAST-4A and MAST-4E have been calculated on contigs longer than 1 kb.
Assembly features of published genomes were retrieved from their respective publications or, when missing, from the JGI genome portal (http://genome.jgi.doe.gov). Additionally, their CEGMA completeness (contigs > 1 kb) was also calculated here.
Missing data are shown by the symbol (—).
*KOs, KEGG Orthology. GOs, Gene Ontology.
†KOGs, Eukaryotic Orthologous Groups.
Figure 4. Fractions of the co-assembled genomes of MAST-4A and MAST-4E shared among their respective SAGs (from 1 to 14 cells).
The contribution of each SAG was determined through a fragment recruitment analysis of their reads towards the final co-assembly.
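One way to read this figure is as a histogram: for each number of SAGs k, the fraction of the co-assembly to which reads from exactly k SAGs were recruited. The sketch below illustrates that bookkeeping under loud assumptions. The input format (for each SAG, the set of co-assembly positions covered by its recruited reads) and the name `shared_fractions` are hypothetical, and the actual read mapping step is not shown.

```python
from collections import Counter


def shared_fractions(covered_by_sag, assembly_length):
    """Fraction of the co-assembly covered by exactly k SAGs, for k = 1..n.

    `covered_by_sag` maps a SAG name to the positions of the co-assembly
    covered by its recruited reads (hypothetical input format).
    """
    # Number of distinct SAGs covering each position.
    depth = Counter()
    for positions in covered_by_sag.values():
        depth.update(set(positions))  # each SAG counts once per position
    # Number of positions covered by exactly k SAGs.
    histogram = Counter(depth.values())
    n_sags = len(covered_by_sag)
    return {k: histogram[k] / assembly_length for k in range(1, n_sags + 1)}
```

For real assemblies of tens of Mbp, per-position sets would be memory-hungry; interval arithmetic or coverage arrays would be a more practical choice for the same calculation.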
Figure 5. Cumulative genome size (a) and genome recovery (b) calculated when increasing the number of SAGs used for co-assembly.
Figure 6. Identification of the 34 CEGs coding for proteins involved in translation, ribosomal structure and biogenesis within SAGs and co-assemblies of both lineages.
CEGs present in both the SAGs and the co-assembly (light grey), solely in the SAGs (dark grey) or solely in the co-assembly (black) are shown.
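The three categories used in this figure amount to simple set operations on the CEGs detected in the individual SAGs and in the co-assembly. The following is a minimal sketch of that classification; the function name `partition_cegs` and the CEG identifiers in the usage example are placeholders, not values from the study.

```python
def partition_cegs(cegs_per_sag, cegs_in_coassembly):
    """Split CEGs into those found in both, solely in SAGs, or solely in the co-assembly.

    `cegs_per_sag` maps a SAG name to the CEG identifiers detected in it.
    """
    # A CEG counts as "in SAGs" if at least one individual SAG contains it.
    in_sags = set().union(*cegs_per_sag.values())
    in_coassembly = set(cegs_in_coassembly)
    return {
        "both": in_sags & in_coassembly,
        "solely_sags": in_sags - in_coassembly,
        "solely_coassembly": in_coassembly - in_sags,
    }


# Usage with placeholder identifiers:
parts = partition_cegs(
    {"sag1": {"CEG_a", "CEG_b"}, "sag2": {"CEG_b", "CEG_c"}},
    {"CEG_b", "CEG_c", "CEG_d"},
)
```

CEGs that appear solely in the co-assembly are the ones whose detection depended on combining reads from several cells.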
Summary of the 248 CEGMA eukaryotic core genes (CEGs) determined in SAGs and co-assemblies of both MAST lineages.
| Lineage | CEGs in SAGs and co-assembly: ≧ | CEGs in SAGs and co-assembly: < | CEGs in SAGs and co-assembly: total | CEGs solely in SAGs | CEGs solely in co-assembly |
|---|---|---|---|---|---|
| MAST-4A | | | 166 | 33 | 18 |
| MAST-4E | | | 163 | 1 | 6 |
*Mean amino acid sequence identity of CEGs found in several SAGs.
☥NA: Not applicable, since these CEGs are found in only one SAG.
Figure 7. Alignment of the ITS1 (a) and ITS2 (b) regions of individual SAGs and the co-assembly in MAST-4A.
Conserved nucleotides in helices II and III of the two regions are highlighted according to ITS secondary structure models in MAST-4. Differences from a consensus sequence (not shown) are colored red (A positions), green (T), blue (C), and yellow (G).