Andreas Wilke1, Jared Bischof1, Travis Harrison1, Tom Brettin2, Mark D'Souza1, Wolfgang Gerlach1, Hunter Matthews1, Tobias Paczian1, Jared Wilkening1, Elizabeth M Glass1, Narayan Desai1, Folker Meyer1.
Abstract
Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
Year: 2015 PMID: 25569221 PMCID: PMC4287624 DOI: 10.1371/journal.pcbi.1004008
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Different stages of the MG-RAST automated pipeline.
In the annotation mapping stage, functions and taxonomic units from the M5nr are mapped to the MD5 identifiers found in the similarity search.
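The annotation mapping stage can be pictured as a dictionary lookup keyed by sequence checksum. The following is a minimal sketch, not the pipeline's actual code. It assumes that each MD5 identifier is the MD5 hex digest of a protein sequence; the normalization step, the toy table `M5NR_SKETCH`, its annotation strings, and the function names are all hypothetical stand-ins for the real database.

```python
import hashlib


def md5_id(sequence: str) -> str:
    """Return the MD5 hex digest of a protein sequence.

    Assumption: whitespace is stripped and residues are upper-cased
    before hashing; the real normalization rules may differ.
    """
    normalized = sequence.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Hypothetical toy sequence and annotations, for illustration only.
TOY_SEQ = "MKTAYIAKQR"
M5NR_SKETCH = {
    md5_id(TOY_SEQ): {
        "RefSeq": ["hypothetical protein"],
        "SEED": ["example function"],
    }
}


def map_annotations(hit_md5s, source):
    """Map MD5 identifiers from a similarity search to one source's annotations.

    Identifiers that are unknown, or that have no entry for `source`,
    are left out of the result.
    """
    result = {}
    for md5 in hit_md5s:
        entry = M5NR_SKETCH.get(md5)
        if entry is not None and source in entry:
            result[md5] = entry[source]
    return result
```

Because the lookup key depends only on the sequence, one similarity hit can be translated into annotations from any of the sources without repeating the search.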
Current annotation sources available in MG-RAST via the M5nr mechanism.
| Database | Source | Type | #IDs |
| GenBank | NCBI | protein | 20,977,345 |
| IMG | JGI | protein | 11,306,919 |
| InterPro | EBI | protein | 22,666 |
| KEGG | KEGG | protein | 6,071,792 |
| PATRIC | VBI | protein | 13,612,238 |
| Phantome | Phantome | protein | 67,876 |
| RefSeq | NCBI | protein | 14,875,735 |
| SEED | SEED | protein | 15,822,645 |
| SwissProt | UniProt | protein | 535,248 |
| TrEMBL | UniProt | protein | 20,639,311 |
| COG | eggNOG | functional hierarchy | 7,321 |
| GO | GO | functional hierarchy | 19,849 |
| KO | KEGG | functional hierarchy | 13,584 |
| NOG | eggNOG | functional hierarchy | 37,941 |
| Subsystems | SEED | functional hierarchy | 13,912 |
Top-level resources available through the MG-RAST-API.
| Resource/Object | Description |
| annotation | Taxonomic and functional annotations made by comparison with the M5nr database. |
| compute | Resource to compute PCoA, heatmap, and normalization for a set of input metagenomes. |
| download | Download results of the MG-RAST pipeline. |
| inbox | Upload and listing of data in the staging area prior to pipeline execution. |
| library | Library information for uploaded metagenome provided by the user. |
| matrix | Abundance profiles in BIOM (5) format for a list of metagenomes. |
| M5nr | Access to the M5 nonredundant protein database used for sequence annotation. |
| metadata | Creation, export, and validation of metadata templates and spreadsheets. |
| metagenome | Container for sample, library, project, and precomputed data for an uploaded metagenomic sequence file. |
| project | Project summary for metagenome provided by the user. |
| sample | Sample information provided by the user. |
| search | Search MG-RAST by MG-ID, metadata, function, or taxonomy; or implement a more complex search. |
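As an illustration of how a client might address these top-level resources, the sketch below builds request URLs from a resource name, an optional identifier, and query parameters. Several details are assumptions not stated in this record: the base URL, the `/<resource>/<identifier>` path layout, the lower-casing of resource names, the placeholder identifier `mgm0000001.3`, and the `verbosity` parameter. The actual API documentation should be consulted before relying on any of them.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of the service; it is not given in the text above.
BASE_URL = "https://api.mg-rast.org"

# Top-level resource names taken from the table above, lower-cased.
RESOURCES = {
    "annotation", "compute", "download", "inbox", "library", "matrix",
    "m5nr", "metadata", "metagenome", "project", "sample", "search",
}


def build_url(resource, identifier=None, **params):
    """Build a request URL for one top-level resource.

    The path layout and parameter handling are illustrative assumptions.
    """
    name = resource.lower()
    if name not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    parts = [BASE_URL, name]
    if identifier is not None:
        # Percent-encode the identifier so it is safe inside a URL path.
        parts.append(quote(str(identifier), safe=""))
    url = "/".join(parts)
    if params:
        # Sort parameters so the same call always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


# Hypothetical usage: placeholder identifier and parameter name.
example = build_url("metagenome", "mgm0000001.3", verbosity="full")
```

Since the API returns JSON objects, a response fetched from such a URL (for example with `urllib.request.urlopen`) could then be decoded with `json.load`.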