Alex Mitchell, Francois Bucchini, Guy Cochrane, Hubert Denise, Petra ten Hoopen, Matthew Fraser, Sebastien Pesseat, Simon Potter, Maxim Scheremetjew, Peter Sterk, Robert D Finn.
Abstract
EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with a more than five-fold increase in the number of processed samples, and it consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web-based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher-profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans.
Year: 2015 PMID: 26582919 PMCID: PMC4702853 DOI: 10.1093/nar/gkv1195
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of the analysis pipeline. Processes/components are indicated as circles and inputs/outputs are represented by rectangles. The structure of pipeline v2.0 is similar to that of v1.0. Following input file preparation and a QC stage (to remove short/low quality reads), the pipeline branches into two parts: one performing taxonomic classification (based on 16S rRNA) and the other providing functional annotation (based on pCDS matches to a subset of the InterPro databases). A full description of the steps, tools and reference libraries used is provided on the EMG website at https://www.ebi.ac.uk/metagenomics/pipelines/2.0.
Updated tools and algorithms used in analysis pipeline version 2.0
| Component | Previous version | New version | Function |
|---|---|---|---|
| QIIME/GreenGenes | 1.50/12.10 | 1.90/13.8 | 16S taxonomic classification |
| rRNASelector | 1.0.0 | 1.0.1 | Identification of rRNA fragments |
| InterPro/InterProScan | 31.0/5-beta | 50.0/5.9 | Functional annotation |
Figure 2. EMG analysis pipeline throughput for the Tara Oceans project, based on analyses completed each month. With relatively static compute resources available, the upward trend is a result of pipeline improvements. The highest value gives an indication of our current expected peak processing capacity.
Figure 3. Biome icons and search-by-biome functionality. (A) Biomes for projects are indicated by icons on the EMG website. The numbers under the icons represent the number of projects belonging to each biome. (B) The biome filter allows users to select a biome of interest (for example marine) and returns matching projects.
Figure 4. Examples of different data visualizations available via the online comparison tool. The high level GO terms assigned to runs within a project can be compared to each other via bar charts, stacked columns, PCA plots and heatmaps.
Figure 5. Analysis summary files at the project level are available to download in TSV format. In this example, pCDS matching InterPro entries (rows) for all runs in a project (columns) are provided.
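The layout described for Figure 5 (InterPro entries as rows, runs as columns) can be read with a few lines of standard-library Python. The following is a minimal sketch only: the header name `interpro_entry`, the run labels `RUN_A` and `RUN_B`, and all counts are made-up placeholders, and the real summary files may carry additional columns or different headers.

```python
import csv
import io

# Hypothetical excerpt of a project-level summary file. The actual column
# headers, run accessions and counts in a downloaded file will differ.
EXAMPLE_TSV = (
    "interpro_entry\tRUN_A\tRUN_B\n"
    "IPR000001\t12\t3\n"
    "IPR000002\t0\t7\n"
)


def load_counts(handle):
    """Return {run: {interpro_entry: count}} from a rows-by-runs TSV."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    runs = header[1:]  # first column holds the InterPro entry identifier
    counts = {run: {} for run in runs}
    for row in reader:
        if not row:
            continue  # skip blank lines
        entry = row[0]
        for run, value in zip(runs, row[1:]):
            counts[run][entry] = int(value)
    return counts


def total_per_run(counts):
    """Sum the pCDS match counts over all InterPro entries for each run."""
    return {run: sum(entries.values()) for run, entries in counts.items()}


counts = load_counts(io.StringIO(EXAMPLE_TSV))
totals = total_per_run(counts)
print(totals)
```

To use this on a downloaded file, pass an open file handle (for example `open(path, newline="")`) to `load_counts` in place of the in-memory example.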