Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

Franck Giacomoni, Gildas Le Corguillé, Misharl Monsoor, Marion Landi, Pierre Pericard, Mélanie Pétéra, Christophe Duperier, Marie Tremblay-Franco, Jean-François Martin, Daniel Jacob, Sophie Goulitquer, Etienne A Thévenot, Christophe Caron.

Abstract

SUMMARY: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation.
AVAILABILITY AND IMPLEMENTATION: The http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB).
CONTACT: contact@workflow4metabolomics.org.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25527831      PMCID: PMC4410648          DOI: 10.1093/bioinformatics/btu813

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Metabolomics, the high-throughput analysis of small molecules in biological samples, depends heavily on data pre-processing, statistical analysis, and chemical and biological annotation, which are complex, transdisciplinary processes involving both computation and interpretation (Holmes et al., 2008). Because analytical technologies and protocols evolve rapidly, computational metabolomics is a field of intensive methodological research. The result is a large volume of proposed algorithms written in various languages, which makes it difficult for the bioinformatics community (including reviewers) to evaluate them and for experimenters to chain them within ad hoc workflows (Smith et al., 2013). A few user-oriented online platforms for metabolomics data pre-processing and analysis have been described recently, including MeltDB (Neuweger et al., 2008), MetaboAnalyst (Xia et al., 2009) and XCMS Online (Tautenhahn ). There is, however, an unmet need for an open-source, open-development infrastructure that would enable developers to readily integrate new modules, compare their performance on reference datasets, or download and modify existing ones for their own research. We have thus developed a collaborative online research resource, Workflow4Metabolomics (W4M), for comprehensive metabolomics data pre-processing, statistical analysis and interpretation. W4M is a fully open-source virtual research environment (VRE; Carusi and Reimer, 2010) built upon the Galaxy environment (Goecks ) for bioinformatics developers and metabolomics users, with a minimal burden for wrapping algorithms into modules and user-friendly functionalities for workflow management. Moreover, W4M includes unique computational modules for data normalization (signal drift and batch-effect correction), multivariate analysis (orthogonal partial least-squares) and annotation (via queries to multiple databases).

2 Features

2.1 Framework

The VRE integrates several digital resources across many layers (hardware, software, user interfaces, documentation, tools and workflows), and is based on a High Performance Computing environment (600 cores, 100 TB). The light-weight runner technology has been added to enhance interoperability and the integration of components from heterogeneous environments (Linux, Windows, etc.). Multiple workflows can be run in parallel and users can rapidly analyse large datasets: for example, the full pre-processing, statistical analysis and annotation of a cohort dataset (184 raw files, 11 GB) from liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) was performed in 5 h, using 1% of the computational resources. Modules include wrappers for existing open-source code, and innovative tools developed with standard languages (e.g. R, Perl, Python or Java). Each tool has a web page that includes working examples. In addition, a help desk is provided for both users and developers. An ‘app-store’ based on the Galaxy native toolshed, ToolShed4Metabolomics, fosters the local deployment of modules, facilitates the management of developer contributions, provides wrapper templates and promotes best-practice guidelines. Finally, a full W4M portable virtual machine is distributed for local installation (e.g. for development prototyping).

2.2 Computational tools, workflows and services

W4M currently contains 19 modules covering all steps of LC-HRMS data analysis:

Format conversion: raw data can be converted from commercial formats (e.g. Thermo Fisher .RAW) to open formats (including mzXML, Pedrioli et al., 2004, and mzML, Deutsch, 2008), via a recently developed toolshed wrapper implementation of the ProteoWizard software (Kessner et al., 2008).

Pre-processing: all wrappers of the reference XCMS (Smith ) and CAMERA (Kuhl ) functions are available to perform peak extraction, retention time alignment and annotation of isotopes and adducts.

Normalization: signal drift and batch effects, which are two major sources of bias in MS data (van der Kloet et al., 2009), can be corrected by fitting linear or local polynomial regression models to quality control samples.

Statistical analysis: in addition to parametric and non-parametric univariate tests, W4M offers unique functionalities for multivariate modelling, including orthogonal partial least-squares (Trygg and Wold, 2002), with all numerical and graphical results and diagnostics (optimal number of components estimated by cross-validation, variable importance in projection, model significance by permutation testing, outlier detection).

Annotation: a formula generator based on the HiRes (High Resolution) algorithm (Kind and Fiehn, 2007) is provided, in addition to several modules for public database query which allow the user to define specific annotation strategies (e.g. by searching from general to more specialized resources).

Metabolomics scientists can access W4M with a simple web browser, upload their data, select analysis parameters or choose the default settings, and run their workflows in batch mode. In addition, W4M provides functionalities for creating interactive web-based documents showing the results of the analyses, and for sharing them with collaborators directly on W4M.
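
The normalization step described above (fitting a regression model to quality control samples) can be illustrated with a minimal Python sketch. It is not the W4M module itself: the function name `correct_drift_linear`, its arguments and the rescaling convention are assumptions made for illustration, and only the linear variant is shown for a single feature within one batch.

```python
import numpy as np


def correct_drift_linear(intensity, injection_order, is_qc):
    """Correct signal drift for one feature using a line fitted to QC samples.

    Hypothetical helper (not part of W4M): all names are illustrative.
    Assumes the fitted trend stays strictly positive over the whole run.
    """
    intensity = np.asarray(intensity, dtype=float)
    injection_order = np.asarray(injection_order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    if is_qc.sum() < 2:
        raise ValueError("at least two QC samples are needed to fit a line")

    # Fit intensity ~ slope * order + intercept on the QC injections only.
    slope, intercept = np.polyfit(injection_order[is_qc], intensity[is_qc], deg=1)

    # Evaluate the fitted drift trend at every injection (QC and samples).
    trend = slope * injection_order + intercept

    # Rescale each injection by the ratio of the mean QC intensity to the trend.
    reference = intensity[is_qc].mean()
    return intensity * reference / trend
```

In this sketch the QC injections define the drift trend, and every injection is divided by that trend and multiplied by the mean QC intensity. A local polynomial (e.g. loess) fit, which the text also mentions, would replace the `np.polyfit` call; batch-effect correction would apply the same idea separately within each batch.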
To get started easily, pre-configured workflows and their corresponding histories are publicly shared for pre-processing, statistical analysis and annotation. A real LC-HRMS dataset (Roux ) is provided as a reference for evaluating new modules and workflows.
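
The permutation testing listed among the multivariate diagnostics in this section can be sketched in a few lines of Python. The example below is only an illustration of the principle: it swaps the orthogonal partial least-squares model for an ordinary least-squares fit scored by R², and the function names (`r_squared`, `permutation_p_value`) and parameters are hypothetical rather than taken from W4M.

```python
import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an ordinary least-squares fit of y on X."""
    # Add an intercept column; works for 1-D or 2-D X.
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def permutation_p_value(X, y, n_permutations=999, seed=0):
    """Estimate model significance by permuting the response.

    Illustrative stand-in for OPLS diagnostics: the statistic here is the
    R-squared of a least-squares model, and y is assumed not to be constant.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = r_squared(X, y)

    # Count permuted responses that fit at least as well as the real one.
    exceed = 0
    for _ in range(n_permutations):
        if r_squared(X, rng.permutation(y)) >= observed:
            exceed += 1

    # Include the observed labelling itself so the p-value is never zero.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

Shuffling the response breaks any real link between the variables and the response, so the permuted scores approximate what the model achieves by chance. A small p-value indicates that the observed fit is unlikely under that null distribution.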

3 Conclusion

The W4M infrastructure enables both experimental users with no specific programming skills and advanced developers to perform cutting-edge and reproducible computational analyses from raw data to metabolite annotation. W4M can be further extended to integrate external workflows running on desktop platforms (e.g. Taverna, KNIME), or acquisition instruments. The statistical modules from W4M can be used to analyse other ‘omics’ data, or can be combined with existing Galaxy workflows (e.g. in transcriptomics), thus enabling multi-omics analyses in a global systems-biology approach. In the coming months, modules for NMR data pre-processing will be integrated into W4M and the infrastructure will be connected to MetExplore (Cottret et al., 2010) for genome-scale network analysis. W4M is therefore an innovative open-source computational VRE bridging the data-intensive bioinformatics and metabolomics communities.

Funding

This work was supported by Biogenouest®, Lifegrid (Auvergne), and by the IDEALG project [ANR-10-BTBR-04], IFB [ANR-11-INBS-0013] and MetaboHUB [ANR-11-INBS-0010] grants.

Conflict of Interest: none declared.

References

1.  mzML: a single, unifying data format for mass spectrometer output.

Authors:  Eric Deutsch
Journal:  Proteomics       Date:  2008-07       Impact factor: 3.984

2.  MeltDB: a software platform for the analysis and integration of metabolomics experiment data.

Authors:  Heiko Neuweger; Stefan P Albaum; Michael Dondrup; Marcus Persicke; Tony Watt; Karsten Niehaus; Jens Stoye; Alexander Goesmann
Journal:  Bioinformatics       Date:  2008-09-02       Impact factor: 6.937

3.  Analytical error reduction using single point calibration for accurate and precise metabolomic phenotyping.

Authors:  Frans M van der Kloet; Ivana Bobeldijk; Elwin R Verheij; Renger H Jellema
Journal:  J Proteome Res       Date:  2009-11       Impact factor: 4.466

4.  Novel algorithms and the benefits of comparative validation.

Authors:  Robert Smith; Dan Ventura; John T Prince
Journal:  Bioinformatics       Date:  2013-04-14       Impact factor: 6.937

5.  ProteoWizard: open source software for rapid proteomics tools development.

Authors:  Darren Kessner; Matt Chambers; Robert Burke; David Agus; Parag Mallick
Journal:  Bioinformatics       Date:  2008-07-07       Impact factor: 6.937

6.  MetExplore: a web server to link metabolomic experiments and genome-scale metabolic networks.

Authors:  Ludovic Cottret; David Wildridge; Florence Vinson; Michael P Barrett; Hubert Charles; Marie-France Sagot; Fabien Jourdan
Journal:  Nucleic Acids Res       Date:  2010-05-05       Impact factor: 16.971

7.  Human metabolic phenotype diversity and its association with diet and blood pressure.

Authors:  Elaine Holmes; Ruey Leng Loo; Jeremiah Stamler; Magda Bictash; Ivan K S Yap; Queenie Chan; Tim Ebbels; Maria De Iorio; Ian J Brown; Kirill A Veselkov; Martha L Daviglus; Hugo Kesteloot; Hirotsugu Ueshima; Liancheng Zhao; Jeremy K Nicholson; Paul Elliott
Journal:  Nature       Date:  2008-04-20       Impact factor: 49.962

8.  Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry.

Authors:  Tobias Kind; Oliver Fiehn
Journal:  BMC Bioinformatics       Date:  2007-03-27       Impact factor: 3.169

9.  A common open representation of mass spectrometry data and its application to proteomics research.

Authors:  Patrick G A Pedrioli; Jimmy K Eng; Robert Hubley; Mathijs Vogelzang; Eric W Deutsch; Brian Raught; Brian Pratt; Erik Nilsson; Ruth H Angeletti; Rolf Apweiler; Kei Cheung; Catherine E Costello; Henning Hermjakob; Sequin Huang; Randall K Julian; Eugene Kapp; Mark E McComb; Stephen G Oliver; Gilbert Omenn; Norman W Paton; Richard Simpson; Richard Smith; Chris F Taylor; Weimin Zhu; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2004-11       Impact factor: 54.908

10.  MetaboAnalyst: a web server for metabolomic data analysis and interpretation.

Authors:  Jianguo Xia; Nick Psychogios; Nelson Young; David S Wishart
Journal:  Nucleic Acids Res       Date:  2009-05-08       Impact factor: 16.971
