Automated assembly of species metabolomes through data submission into a public repository.

Reza M Salek1, Pablo Conesa1, Keeva Cochrane1, Kenneth Haug1, Mark Williams1, Namrata Kale1, Pablo Moreno1, Kalai Vanii Jayaseelan1, Jose Ramon Macias1, Venkata Chandrasekhar Nainala1, Robert D Hall2, Laura K Reed3, Mark R Viant4, Claire O'Donovan1, Christoph Steinbeck1,5.   

Abstract

Following similar global efforts to exchange genomic and other biomedical data, global databases in metabolomics have now been established. MetaboLights, the first general purpose, publicly available, cross-species, cross-application database in metabolomics, has become the fastest growing data repository at the European Bioinformatics Institute in terms of data volume. Here we present the automated assembly of species metabolomes in MetaboLights, a crucial reference for chemical biology, which is growing through user submissions.
© The Authors 2017. Published by Oxford University Press.

Keywords:  curation; databases; metabolomics; species metabolomes

Year:  2017        PMID: 28830114      PMCID: PMC5737527          DOI: 10.1093/gigascience/gix062

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Background

Following data standardization efforts in the 1990s and the success of global efforts to exchange genomic [1, 2], proteomic [3], gene expression [4], and other biomedical data, we have now witnessed the emergence of global databases in metabolomics. In 2012, the European Bioinformatics Institute launched MetaboLights (RRID:SCR_014663) [5, 6], the first general purpose, cross-species, cross-application database in metabolomics, aiming at a similar growth in this remaining large pillar of 'omics sciences [7]. Within the first 2 years of its inception, MetaboLights became the fastest growing data repository at the European Bioinformatics Institute (EMBL-EBI) in terms of data volume. Here we present the automated assembly and growth of species metabolomes in the MetaboLights reference layer, which is largely driven by user submissions. Journals already demand or recommend the deposition of metabolomics studies in MetaboLights. These include Nature, EMBO, PLoS, BioMed Central, Frontiers, Metabolomics, and MDPI Metabolites. To the best of our knowledge, MetaboLights is the only global, general purpose repository that systematically requires the submission of metabolite assignments, a requirement fundamental for the process described here.

Findings

A fundamental, unsolved problem in Metabolomics is the availability of exhaustive model organism metabolomes. The newly formed Model Organism Metabolomes task group of the International Metabolomics Society has issued a call to arms to identify and map all metabolites onto metabolic pathways and to relate these pathways across multiple species within the context of evolutionary metabolomics (or phylometabolomics) [8]. The scale of this endeavour means that the group has prioritized the deep investigation of established model organism metabolomes in microbial, plant, and animal biology, promising an avalanche of new metabolic data. Exponential growth is observed in biological databases, and MetaboLights is no exception (Fig. 1).
Figure 1:

Growth in data repositories at the EMBL-EBI. The graph shows the data volume in each of the repositories over time on a logarithmic scale. Shown are repositories for controlled access human data, raw sequencing data, microarray, proteomics, and metabolomics data. Archives were started at different moments in history. Metabolomics shows the steepest growth of all repositories at the EMBL-EBI.

Metabolomics datasets submitted to MetaboLights contain lists of metabolites that have been identified in the respective studies for a given species in a given biological context. This steady stream of assigned metabolites, together with species and organism part information, is leading to an evidence-based assembly of metabolomes for species, with more complete annotations for the model organisms under investigation worldwide. We believe that this submission-driven assembly, backed by automated and manual quality control, is the only sustainable model for large-scale species metabolome assembly and that it will lead to an indispensable knowledge base for chemical biology research. This common framework is also essential to provide the crystallization point to initiate cross-species to cross-division metabolic analysis of commonality and uniqueness.

Studies in MetaboLights are created by researchers in ISA-Tab format, by either automatically creating datasets from in-house laboratory information management systems (rare) or manually creating ISA-Tab archives with the help of the ISA tools suite (common). Naturally, the species coverage of studies follows the preferences for model species around the globe (Fig. 2).
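The submission-driven assembly described above can be illustrated with a minimal sketch. It assumes each submitted study contributes records of the form (study, species, organism part, metabolite); the `Assignment` record, the `assemble_metabolomes` function, and the study identifiers are hypothetical names chosen for illustration and do not reflect the actual MetaboLights data model or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One metabolite reported in one study (hypothetical record layout)."""
    study_id: str       # placeholder identifier, not a real accession
    species: str
    organism_part: str
    metabolite: str


def assemble_metabolomes(assignments):
    """Group assignments into species -> metabolite -> supporting study IDs."""
    metabolomes = defaultdict(lambda: defaultdict(set))
    for a in assignments:
        # Each study that reports the metabolite counts as a piece of evidence.
        metabolomes[a.species][a.metabolite].add(a.study_id)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {species: dict(mets) for species, mets in metabolomes.items()}


# Made-up example records, for illustration only.
assignments = [
    Assignment("STUDY_A", "Homo sapiens", "blood plasma", "glucose"),
    Assignment("STUDY_B", "Homo sapiens", "urine", "glucose"),
    Assignment("STUDY_B", "Homo sapiens", "urine", "citrate"),
    Assignment("STUDY_C", "Arabidopsis thaliana", "leaf", "sucrose"),
]

metabolomes = assemble_metabolomes(assignments)
for species, metabolites in sorted(metabolomes.items()):
    for name, studies in sorted(metabolites.items()):
        print(species, name, sorted(studies))
```

Keeping the set of supporting studies per metabolite, rather than a bare list of names, mirrors the evidence-based character of the assembly: every entry in a species metabolome can be traced back to the submissions that reported it.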
Figure 2:

Bar chart distribution of the number of studies in MetaboLights by species. The distribution reflects the most used model species in biological and biomedical research.

The key to this process is the application of online ontologies from BioPortal, combined with local controlled vocabularies to ensure correct terms are used to describe biological samples and experimental factors. Assignment of identified metabolites is done in Metabolite Identification Files, a bespoke extension to the ISA suite. When the submitter has completed the annotations and the study satisfies all mandatory validations, the study is flagged as ready for curation [9]. At this stage, the curation team makes any required changes, and the study is ready for review. Journal reviewers are then given a unique URL to access the complete study. When the journal review process is complete, the study can be made publicly accessible.

The traditional model of curated chemical databases scales linearly, both with time and the number of curators involved in the database assembly. In contrast, the MetaboLights reference layer grows via a 2-tiered approach for assembling metabolome information. First, we collect historical information about metabolites found in species from the primary literature and link it with the experimental annotations submitted to MetaboLights, thus generating many of the data points for common and rarer species. As in the traditional model, this manually curated process grows linearly with the number of curators working on it. The second source for metabolome information is submissions from the community, triggered by their commitment to provide open access data, and/or the requirement from funders and publishers to deposit data in an open and accessible manner. The sustainability and efficiency of this second tier are the key argument of this article.

Figure 3 shows the current distribution of metabolites per species in the MetaboLights reference layer. Sorted by frequency, this shows a typical long-tail distribution; a few model species are well covered, while only a few metabolites are available for most species. These data include both metabolites reported in studies and those manually curated by MetaboLights and Chemical Entities of Biological Interest (ChEBI) from the literature.
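A ranking of this kind can be derived from species-metabolite pairs as in the following sketch. The function names, the placeholder species labels, and the input pairs are assumptions made for illustration; they are not taken from the MetaboLights reference layer.

```python
from collections import Counter


def metabolites_per_species(pairs):
    """Count distinct metabolites per species from (species, metabolite) pairs."""
    counts = Counter()
    # set() drops duplicates, e.g. the same metabolite reported by two studies.
    for species, _metabolite in set(pairs):
        counts[species] += 1
    return counts


def ranked(counts, top=None):
    """Sort species by descending count (ties by name); optionally truncate."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered if top is None else ordered[:top]


# Made-up pairs with placeholder species labels, for illustration only.
pairs = [
    ("species_a", "glucose"), ("species_a", "citrate"),
    ("species_a", "lactate"), ("species_a", "glucose"),
    ("species_b", "glucose"), ("species_b", "taurine"),
    ("species_c", "sucrose"),
]

counts = metabolites_per_species(pairs)
full = ranked(counts)         # every species, most annotated first
head = ranked(counts, top=2)  # truncated view of the best-covered species
print(full)
print(head)
```

The `top` argument gives a truncated view of the best-covered species alongside the full ranking, in the same spirit as showing only the most annotated species next to the complete distribution.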
Figure 3:

Long-tail distribution of metabolites per species in the MetaboLights reference layer. A few model species are covered very well, while for the majority of more than 1600 species, only a few metabolites were reported. These data cover both metabolites reported in studies and those manually added from the literature by MetaboLights and ChEBI curators. a) Truncated version with the 30 most annotated species. b) Full graph.

Conclusion

We have established a model in which information about metabolites in species metabolomes grows dynamically through submissions to public archives such as the MetaboLights database. For the first time, this will automatically provide both the information about which metabolites are found in which species and the supporting evidence—the primary spectroscopic data and supporting meta-data—in a community-driven way. In turn, this will provide up-to-date knowledge bases for fields such as chemical biology, metabolomics, and biomedicine.

Availability of data and materials

Data underlying the analysis presented here are available without restrictions in the MetaboLights database (RRID:SCR_014663) [5].

Abbreviations

ChEBI: Chemical Entities of Biological Interest; EMBL-EBI: European Molecular Biology Laboratory-European Bioinformatics Institute.

Competing interests

None of the authors has any competing interests.

Funding

The development of MetaboLights was funded by the Biotechnology and Biological Sciences Research Council (BBSRC; http://dx.doi.org/10.13039/501100000268, grant numbers BB/I000933/1 and BB/L024152/1).

Author contributions

R.M.S., P.C., K.C., K.H., M.W., N.K., P.M., K.J., J.R.M., and V.C.N. developed and curated the MetaboLights database. R.D.H., L.K., M.R.V., C.O.D., and C.S. conceived this study and performed the analysis. All authors have read and approved the manuscript.

References

1.  Content discovery and retrieval services at the European Nucleotide Archive.

Authors:  Nicole Silvester; Blaise Alako; Clara Amid; Ana Cerdeño-Tárraga; Iain Cleland; Richard Gibson; Neil Goodgame; Petra Ten Hoopen; Simon Kay; Rasko Leinonen; Weizhong Li; Xin Liu; Rodrigo Lopez; Nima Pakseresht; Swapna Pallreddy; Sheila Plaister; Rajesh Radhakrishnan; Marc Rossello; Alexander Senf; Dmitriy Smirnov; Ana Luisa Toribio; Daniel Vaughan; Vadim Zalunin; Guy Cochrane
Journal:  Nucleic Acids Res       Date:  2014-11-17       Impact factor: 16.971

2.  ArrayExpress update--simplifying data submissions.

Authors:  Nikolay Kolesnikov; Emma Hastings; Maria Keays; Olga Melnichuk; Y Amy Tang; Eleanor Williams; Miroslaw Dylag; Natalja Kurbatova; Marco Brandizi; Tony Burdett; Karyn Megy; Ekaterina Pilicheva; Gabriella Rustici; Andrew Tikhonov; Helen Parkinson; Robert Petryszak; Ugis Sarkans; Alvis Brazma
Journal:  Nucleic Acids Res       Date:  2014-10-31       Impact factor: 16.971

3.  The Time Is Right to Focus on Model Organism Metabolomes.

Authors:  Arthur S Edison; Robert D Hall; Christophe Junot; Peter D Karp; Irwin J Kurland; Robert Mistrik; Laura K Reed; Kazuki Saito; Reza M Salek; Christoph Steinbeck; Lloyd W Sumner; Mark R Viant
Journal:  Metabolites       Date:  2016-02-15

4.  MetaboLights--an open-access general-purpose repository for metabolomics studies and associated meta-data.

Authors:  Kenneth Haug; Reza M Salek; Pablo Conesa; Janna Hastings; Paula de Matos; Mark Rijnbeek; Tejasvi Mahendraker; Mark Williams; Steffen Neumann; Philippe Rocca-Serra; Eamonn Maguire; Alejandra González-Beltrán; Susanna-Assunta Sansone; Julian L Griffin; Christoph Steinbeck
Journal:  Nucleic Acids Res       Date:  2012-10-29       Impact factor: 16.971

5.  The MetaboLights repository: curation challenges in metabolomics.

Authors:  Reza M Salek; Kenneth Haug; Pablo Conesa; Janna Hastings; Mark Williams; Tejasvi Mahendraker; Eamonn Maguire; Alejandra N González-Beltrán; Philippe Rocca-Serra; Susanna-Assunta Sansone; Christoph Steinbeck
Journal:  Database (Oxford)       Date:  2013-04-29       Impact factor: 3.451

6.  MetaboLights: towards a new COSMOS of metabolomics data management.

Authors:  Christoph Steinbeck; Pablo Conesa; Kenneth Haug; Tejasvi Mahendraker; Mark Williams; Eamonn Maguire; Philippe Rocca-Serra; Susanna-Assunta Sansone; Reza M Salek; Julian L Griffin
Journal:  Metabolomics       Date:  2012-09-25       Impact factor: 4.290

7.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; Eric W Sayers
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

8.  ProteomeXchange provides globally coordinated proteomics data submission and dissemination.

Authors:  Juan A Vizcaíno; Eric W Deutsch; Rui Wang; Attila Csordas; Florian Reisinger; Daniel Ríos; José A Dianes; Zhi Sun; Terry Farrah; Nuno Bandeira; Pierre-Alain Binz; Ioannis Xenarios; Martin Eisenacher; Gerhard Mayer; Laurent Gatto; Alex Campos; Robert J Chalkley; Hans-Joachim Kraus; Juan Pablo Albar; Salvador Martinez-Bartolomé; Rolf Apweiler; Gilbert S Omenn; Lennart Martens; Andrew R Jones; Henning Hermjakob
Journal:  Nat Biotechnol       Date:  2014-03       Impact factor: 54.908
