Adriana Alberti, Julie Poulain, Stefan Engelen, Karine Labadie, Sarah Romac, Isabel Ferrera, Guillaume Albini, Jean-Marc Aury, Caroline Belser, Alexis Bertrand, Corinne Cruaud, Corinne Da Silva, Carole Dossat, Frédérick Gavory, Shahinaz Gas, Julie Guy, Maud Haquelle, E'krame Jacoby, Olivier Jaillon, Arnaud Lemainque, Eric Pelletier, Gaëlle Samson, Mark Wessner, Silvia G Acinas, Marta Royo-Llonch, Francisco M Cornejo-Castillo, Ramiro Logares, Beatriz Fernández-Gómez, Chris Bowler, Guy Cochrane, Clara Amid, Petra Ten Hoopen, Colomban De Vargas, Nigel Grimsley, Elodie Desgranges, Stefanie Kandels-Lewis, Hiroyuki Ogata, Nicole Poulton, Michael E Sieracki, Ramunas Stepanauskas, Matthew B Sullivan, Jennifer R Brum, Melissa B Duhaime, Bonnie T Poulos, Bonnie L Hurwitz, Stéphane Pesant, Eric Karsenti, Patrick Wincker.
Abstract
A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acid extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata with the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.
Year: 2017 PMID: 28763055 PMCID: PMC5538240 DOI: 10.1038/sdata.2017.93
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Overview of the -omics analysis strategy applied to Tara Oceans samples.
| Size fraction (μm) | Target organisms | Analysis | Nucleic acid extraction lab | Library preparation and sequencing lab | Extraction protocol (section) | PCR protocol (section) | Library protocol (section) |
|---|---|---|---|---|---|---|---|
| <0.2 | Viruses | Metagenomics | M. Sullivan lab (University of Arizona, AZ, US) | CEA, Genoscope, France | Virus_DNA_ext (2.4) | | MetaG_virus (4.2) |
| 0.2–1.6, 0.1–0.2, 0.45–0.8, 0.2–0.45 | Giruses | Metagenomics | N. Grimsley lab (CNRS, Banyuls-sur-Mer, France) | CEA, Genoscope, France | Girus_DNA_ext (2.5) | | MetaG (4.1) |
| 0.2–1.6, 0.2–3 | Viruses, giruses, prokaryotes, small eukaryotes | 16S metabarcoding | S.G. Acinas lab (ICM-CSIC, Barcelona, Spain) | CEA, Genoscope, France | Acinas_Prok_DNA_ext (2.2) | 16S_PCR (3.2) | MetaBar_16S (4.6) |
| | | Metagenomics | | | Acinas_Prok_DNA_ext (2.2) | | MetaG (4.1) |
| | | Metatranscriptomics by random priming | | | Acinas_Prok_RNA_ext, Genoscope_Prok_RNA_ext (2.2) | | RiboZero_SMART_strand (4.4) |
| 0.8-inf, 3-inf, 0.8–5 (0.8–3), 5–20 (3–20), 20–180, 180–2,000 | Protists and metazoa | 18S metabarcoding | C. De Vargas lab (CNRS/UPMC, Roscoff, France) | CEA, Genoscope, France | | 18S_PCR (3.1) | MetaBar_18S (4.5) |
| | | 16S metabarcoding | | | | 16S_PCR (3.2) | MetaBar_16S (4.6) |
| | | Metagenomics | P. Wincker lab (CEA, Genoscope, France) | | Euk_DNA_RNA_ext (2.1) | | MetaG (4.1) |
| | | Metatranscriptomics on poly(A)+ RNA | | | Euk_DNA_RNA_ext (2.1) | | TS_RNA (4.4), TS_strand (4.4), SMART_dT (4.4) |
| Samples for SAGs | Protists | | N. Poulton lab (Bigelow lab, ME, US) | CEA, Genoscope, France | SAGs_amplif (2.6) | | MetaG_SAGs (4.3) |
Summary of libraries generated from Tara Oceans DNA and RNA samples and sequencing experiments performed on each type of library.
| Library type | Insert size (bp) | Sequencer | Read length (bp) | Number of libraries* | |
|---|---|---|---|---|---|
| Metagenomics from size-fractionated filters (Section 4.1) | 180 | HS2000 | 101 | 855 | 160 |
| Metagenomics from viral samples (Section 4.2) | 150–900 | HS2000 | 101 | 90 | 50 |
| SAGs (Section 4.3) | 150–900 | HS2000 | 101 | 49 | 20 |
| Metatranscriptomic libraries (Section 4.4) | 100–600 | HS2000 | 101 | 467 | 160 |
| 18S metabarcode libraries (Section 4.5) | 160 | GAIIx | 151 | 884 | 1.5 |
| 16S metabarcode libraries (Section 4.6) | 400 | HiSeq2500 | 251 | In progress | ND |
*Number of libraries with available readsets in public databases at the date of publication of the paper.
Figure 2. Data processing flowchart.
Figure 3. Overview of the experimental pipeline from nucleic acids to sequences.
Red crosses highlight QC steps where experiments can be stopped.
Figure 4. Agilent Bioanalyzer profiles of amplified libraries.
(a) Example of an electropherogram obtained with the metagenomic library preparation protocol described in Section 4.1. The size distribution of this kind of library is very narrow because of the size selection step used to generate overlapping paired-end reads. (b) Example of a metatranscriptomic library generated with the TS_RNA protocol.
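The link between insert size and overlapping paired-end reads comes down to simple arithmetic: two reads of a pair share bases only when the insert is shorter than both reads combined. The sketch below is illustrative only. It assumes that the values 180 and 101 given for metagenomic libraries in the summary table are the insert size and read length in bp, and that the insert size excludes adapters. The function name `expected_overlap` is hypothetical.

```python
def expected_overlap(read_length: int, insert_size: int) -> int:
    """Return the number of bases shared by the two reads of a pair.

    A positive value means the reads overlap and could be merged;
    zero or a negative value means there is a gap between them.
    """
    return 2 * read_length - insert_size


# Assumed values: 101 bp reads and a 180 bp insert.
overlap = expected_overlap(read_length=101, insert_size=180)
print(overlap)  # 22
```

Under these assumptions the reads of a pair would share about 22 bases, which is why a narrow size selection matters: a noticeably longer insert would leave too little overlap for merging.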
Figure 5. Representative examples of tabulated data reports generated by the LIMS for multiple datasets.
(a) Example of a sequencing report for metagenomic libraries. It displays metrics that are particularly useful for evaluating the quality of this type of data, such as the percentage of merged reads, the median length and the estimated insert size. (b) Example of a report for metatranscriptomic libraries from poly(A)+ RNA. Quality control of these libraries focuses on the duplication rate and on potential contamination by bacteria and fungi, whose percentages are easy to read from the report.
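To make the duplication-rate metric concrete, here is a minimal sketch. How the LIMS actually computes this value is not described here, so the definition below is an assumption: the fraction of reads whose exact sequence has already been seen. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def duplication_rate(reads):
    """Fraction of reads that repeat a sequence already seen (0.0 to 1.0).

    Assumption: only exact sequence matches count as duplicates.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    unique_sequences = len(Counter(reads))
    return (len(reads) - unique_sequences) / len(reads)


# Toy data: "ACGT" appears three times, so two of the five reads are duplicates.
example_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]
print(f"{duplication_rate(example_reads):.0%}")  # 40%
```

Production tools usually estimate duplication from a subsample or from mapping positions rather than from exact string matches, so values from this sketch are not directly comparable with those in a LIMS report.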
Figure 6. Representative examples of key data reports generated by the LIMS for individual datasets.
(a) Quality score box plot of 100-bp Illumina reads. This plot summarizes the quality per position over all reads: it shows one box plot per read position, with the smoothed average as a black line. (b) Nucleotide distribution per read position: left, before trimming of adapters and low-quality reads; right, after trimming. In the left plot, a non-random distribution in the first 12 bases is typical of metatranscriptomic libraries generated with the SMART-dT protocol, which leaves SMARTer adapter sequence at the beginning of the cDNA insert. (c) Graphical representation of known overrepresented sequences (primers and adapters used for library preparation) before (left panel) and after (right panel) adapter trimming. Again, the overrepresentation of the SMARTer adapter is easily seen in the left panel (red bar), and it disappears after trimming (right panel). (d) Report of taxonomic assignment by organism (left), by division (middle) and by keyword (right). Bacterial and fungal percentages below 5% are highlighted in green to facilitate manual validation of the dataset. (e) Report of rRNA sequence detection and trimming, detailing the percentage of each rRNA species. (f) Krona chart of the same taxonomic assignment reported in (d). (g) Length distribution of the reads obtained after merging the paired reads generated by sequencing a metagenomic library.
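The per-position nucleotide distribution in panel (b) can be illustrated with a short sketch. This is not the LIMS implementation; the function name and the toy reads are hypothetical. The reads share an identical 4-base prefix to mimic an adapter remnant at the start of the insert.

```python
from collections import Counter


def base_composition(reads, n_positions):
    """Return, for each of the first n_positions, the fraction of A, C, G and T.

    Reads shorter than a given position are skipped for that position.
    """
    composition = []
    for pos in range(n_positions):
        counts = Counter(read[pos] for read in reads if len(read) > pos)
        total = sum(counts.values())
        composition.append(
            {base: counts[base] / total for base in "ACGT"} if total else {}
        )
    return composition


# Toy reads: positions 1-4 are identical, positions 5-6 are evenly mixed.
toy_reads = ["AAGCTTGA", "AAGCCATG", "AAGCGGAC", "AAGCACTT"]
for pos, freqs in enumerate(base_composition(toy_reads, 6), start=1):
    print(pos, freqs)
```

In this toy input the first four positions are dominated by a single base, while positions 5 and 6 show each base at 25%. A skewed start followed by a flat profile is the pattern described for the untrimmed SMART-dT libraries.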