
eMZed: an open source framework in Python for rapid and interactive development of LC/MS data analysis workflows.

Patrick Kiefer, Uwe Schmitt, Julia A Vorholt

Abstract

SUMMARY: The Python-based, open-source eMZed framework was developed for mass spectrometry (MS) users to create tailored workflows for liquid chromatography (LC)/MS data analysis. The goal was to establish a unique framework with comprehensive basic functionalities that are easy to apply and allow for the extension and modification of the framework in a straightforward manner. eMZed supports the iterative development and prototyping of individual evaluation strategies by providing a computing environment and tools for inspecting and modifying underlying LC/MS data. The framework specifically addresses non-expert programmers, as it requires only basic knowledge of Python and relies largely on existing successful open-source software, e.g. OpenMS.

AVAILABILITY: The framework eMZed and its documentation are freely available at http://emzed.biol.ethz.ch/. eMZed is published under the GPL 3.0 license, and an online discussion group is available at https://groups.google.com/group/emzed-users.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2013        PMID: 23418185      PMCID: PMC3605603          DOI: 10.1093/bioinformatics/btt080

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Liquid chromatography/mass spectrometry (LC/MS) data analysis generally requires flexible software tools. Although a number of solutions for specific or multiple applications currently exist, many of these belong to one of two extremes. The first group includes frameworks that are highly flexible but have been developed in languages (e.g. C++) that require advanced programming skills, e.g. OpenMS (Sturm et al., 2008). Although applications built on such frameworks run fast, the testing of new workflows and concepts is cumbersome because programming requirements are high and edit-compile cycles are slow. The second group includes closed black-box solutions with graphical user interfaces that are easy to use but inherently non-transparent and inflexible, e.g. Maven (Melamud et al., 2010) and MZmine 2 (Pluskal et al., 2010). Note that libraries such as the R-based XCMS (Smith et al., 2006) or the Matlab-based Bioinformatics Toolbox (Mathworks, Natick, MA, USA) lie between these extremes. The motivation to develop eMZed was to provide an open-source framework for establishing transparent and flexible workflows for high-end data treatment while requiring only basic programming skills from the user. To this end, we combined the powerful and easy-to-learn programming language Python, a comprehensive library of elementary building blocks, and an integrated development environment.

2 RESULTS

2.1 Technical aspects

The eMZed framework is implemented in the Python programming language, which is well established in scientific computing (Oliphant, 2007) and bioinformatics in particular (Cock et al., 2009). Compared with R and Matlab, Python’s standard library is more extensive and enables rapid application development by various means; e.g. Python supports easy access to online services such as PubChem or Metlin, which are of great interest for metabolomics data analyses. We used the Python libraries PyQt4, spyderlib and guiqwt to build the workbench and graphical explorers and used numpy and scipy for the numerical data structures and algorithms. One central concept in the development of eMZed was the integration of previously established algorithms into a single platform that minimizes error-prone import and export steps. Therefore, we integrated functionalities from the libraries XCMS and OpenMS. To call functionalities from XCMS, we built a bridge to R that enables eMZed to use the centWave feature detector (Tautenhahn et al., 2008) and the ‘matched filter’ method (Smith et al., 2006). Enabling access to a subset of OpenMS functionalities for fast I/O and providing clustering-based retention time alignment (Lange et al., 2007) represented a major obstacle that was overcome by developing a code generator. This generator is hosted at https://github.com/uweschmitt/pyOpenMS and uses Cython for invoking C/C++ functions. The current version of eMZed was developed and tested using 32-bit Windows 7 and was further tested using 64-bit Ubuntu 12.04 Linux. A 64-bit version for Windows is currently being developed.

2.2 Functionalities

eMZed provides simple and readily usable building blocks for rapid workflow development. In addition to data inspection, peak detection, alignment and integration, the current version possesses several dedicated helper modules that support the building of graphical dialogues, statistical analyses and chemical data examinations, such as mass and isotope abundance analyses and the manipulation of molecular formulas. For peak identification, access to the chemical compound database PubChem (Bolton et al.) and the Metlin online service (Smith et al., 2005) is provided. LC/MS data are handled using the PeakMap and Spectrum data structures, and interactive explorer tools are linked to these data structures for visual data inspection. Table is a comprehensive data structure supporting SQL-like operations. Table plays a key role in eMZed workflows because it provides easy handling of peaks or chemical data and supports the identification and integration of MS level 1 and level 2 peaks. Note that chromatographic peaks and spectra can also be directly visualized within the Table structure (Fig. 1). In addition, Table can be edited, thereby allowing for the modification of peak and integration limits or the deletion and duplication of rows. PeakMap and Table are available in the workspace variable explorer, and interactive inspection can be integrated into workflows to validate intermediate or final results. A complete overview of all features can be found at the eMZed homepage.
Fig. 1. Screenshot of the eMZed workbench showing the editor, variable explorer, IPython console and interactive table explorer. The table explorer shows the results of a coenzyme A ester identification workflow (see Supplementary Material). Peaks of the parent ion and integrated peaks of two fragment ions are depicted in the left plot. The right plot shows a corresponding MS level 2 spectrum that includes information on selected m/z peaks (red dots).
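The mass analyses and molecular formula manipulations mentioned in Section 2.2 can be illustrated with a minimal sketch in plain Python. It does not use the eMZed API: the names `parse_formula`, `monoisotopic_mass` and `format_formula` are hypothetical helpers introduced here for illustration, and the mass table is restricted to the elements C, H, N, O, P and S.

```python
import re
from collections import Counter

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737616,
    "S": 31.9720710,
}


def parse_formula(formula):
    """Hypothetical helper: turn a string such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def monoisotopic_mass(counts):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONO_MASS[element] * n for element, n in counts.items())


def format_formula(counts):
    """Write element counts back as a string, in the fixed order C, H, N, O, P, S."""
    parts = []
    for element in "CHNOPS":
        n = counts.get(element, 0)
        if n > 0:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)


# Formula arithmetic: remove one water from glucose.
glucose = parse_formula("C6H12O6")
water = parse_formula("H2O")
dehydrated = glucose - water  # Counter subtraction keeps only positive counts

print(format_formula(dehydrated), round(monoisotopic_mass(dehydrated), 4))
```

Representing a formula as a mapping from element to count keeps addition and subtraction of formulas trivial, which is the kind of operation needed when fragment or adduct formulas are derived from a parent formula.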

2.3 Example application

To demonstrate the comprehensive functionalities of eMZed, we implemented a tailored workflow for the database-independent identification of coenzyme A thioesters from MS level 1 and level 2 spectra. The workflow can be subdivided into four steps:

(i) Creation of a coenzyme A ester solution space from a restricted recombination of the chemical elements C, H, N, O, P and S.
(ii) Detection of high-resolution MS level 1 peaks using the centWave feature detector and identification of candidates using the Table join operation.
(iii) Evaluation of candidates by comparing the m/z values of measured MS level 2 peaks with the values of specific fragment ions calculated from the assigned molecular formulas.
(iv) Visualization of a result table for inspection.

The given example demonstrates that even complex operations can be encoded easily owing to the multitude of functionalities that are available. A more detailed description of the workflow, the Python code and example data are provided in the Supplementary Material.
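The logic of steps (i) to (iii) can be sketched in plain Python under strong simplifying assumptions. This is not the workflow from the Supplementary Material and it does not call eMZed, centWave or the Table join: the functions `solution_space`, `join_by_mz` and `fragments_found`, the element ranges, the [M+H]+ ion assumption, the water neutral loss and all peak values are hypothetical toy choices made only to show the pattern of enumerating formulas, joining them to measured peaks within an m/z tolerance, and checking fragment ions.

```python
from itertools import product

# Monoisotopic masses (in u); PROTON is the proton mass used for [M+H]+ ions.
MONO_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
             "O": 15.9949146, "P": 30.9737616, "S": 31.9720710}
PROTON = 1.00727647


def solution_space(ranges):
    """Step (i): enumerate every element-count combination within the ranges."""
    elements = list(ranges)
    count_ranges = [range(lo, hi + 1) for lo, hi in ranges.values()]
    for counts in product(*count_ranges):
        formula = dict(zip(elements, counts))
        mass = sum(MONO_MASS[el] * n for el, n in formula.items())
        yield formula, mass


def join_by_mz(peaks, candidates, ppm=5.0):
    """Step (ii): tolerance join of measured peaks with calculated [M+H]+ m/z."""
    hits = []
    for peak in peaks:
        for formula, mass in candidates:
            mz_calc = mass + PROTON
            if abs(peak["mz"] - mz_calc) <= mz_calc * ppm * 1e-6:
                hits.append({**peak, "formula": formula, "mz_calc": mz_calc})
    return hits


def fragments_found(ms2_mzs, expected_mzs, tol=0.005):
    """Step (iii): True if every expected fragment m/z occurs in the MS2 peaks."""
    return all(any(abs(m - e) <= tol for m in ms2_mzs) for e in expected_mzs)


# Toy inputs (assumptions, not data from the paper).
ranges = {"C": (1, 3), "H": (2, 8), "O": (0, 2)}
candidates = list(solution_space(ranges))
peaks = [{"id": 1, "mz": 47.0491, "rt": 62.0}]  # stand-in for detected MS1 peaks
ms2_mzs = [29.0386, 47.0491]                    # stand-in for an MS level 2 spectrum

hits = join_by_mz(peaks, candidates)

# Hypothetical fragment rule: the parent ion loses one water molecule.
water_mass = 2 * MONO_MASS["H"] + MONO_MASS["O"]
expected = [hits[0]["mz_calc"] - water_mass]

for hit in hits:
    print(hit["id"], hit["formula"], round(hit["mz_calc"], 4),
          fragments_found(ms2_mzs, expected))
```

The nested loop in `join_by_mz` is quadratic in the number of peaks and candidates; for a realistic solution space, sorting the candidates by mass and using a binary search would be a natural refinement.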

3 DISCUSSION

Metabolomics and related fields are rapidly progressing and require the continuous development and modification of workflows and analytical strategies. In this context, the speed of data analysis routines is an important factor, but the effort required to implement and test new solutions is equally important. To this end, eMZed provides a workspace and the capability to inspect and visualize interim results at each step of data processing. In addition, eMZed provides a common base for developing individual applications and supports interchangeable individual solutions. This approach may help to simplify the current landscape of existing LC/MS software, which is fragmented and often laboratory specific.

4 OUTLOOK

Future work will be directed towards the implementation of new features, e.g. enhanced MS level 2 data handling, a port of eMZed to the 64-bit Windows 7 operating system, better support of R and faster analysis through multi-core support. These enhancements will be available in forthcoming versions of eMZed.

Funding: This project was supported by ETH Zurich, Department of Biology, within the frame of an IT-strategy initiative. Complementary funding was obtained via the Swiss Initiative in Systems Biology SystemsX.ch, BattleX.

Conflict of interest: none declared.

REFERENCES

1.  Colin A Smith, Elizabeth J Want, Grace O'Maille, Ruben Abagyan, Gary Siuzdak. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem. 2006-02-01.

2.  Eva Lange, Clemens Gröpl, Ole Schulz-Trieglaff, Andreas Leinenbach, Christian Huber, Knut Reinert. A geometric approach for the alignment of liquid chromatography-mass spectrometry data. Bioinformatics. 2007-07-01.

3.  Eugene Melamud, Livia Vastag, Joshua D Rabinowitz. Metabolomic analysis and visualization engine for LC-MS data. Anal Chem. 2010-11-04.

4.  Colin A Smith, Grace O'Maille, Elizabeth J Want, Chuan Qin, Sunia A Trauger, Theodore R Brandon, Darlene E Custodio, Ruben Abagyan, Gary Siuzdak. METLIN: a metabolite mass spectral database. Ther Drug Monit. 2005-12.

5.  Peter J A Cock, Tiago Antao, Jeffrey T Chang, Brad A Chapman, Cymon J Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J L de Hoon. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009-03-20.

6.  Tomás Pluskal, Sandra Castillo, Alejandro Villar-Briones, Matej Oresic. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics. 2010-07-23.

7.  Ralf Tautenhahn, Christoph Böttcher, Steffen Neumann. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics. 2008-11-28.

8.  Marc Sturm, Andreas Bertsch, Clemens Gröpl, Andreas Hildebrandt, Rene Hussong, Eva Lange, Nico Pfeifer, Ole Schulz-Trieglaff, Alexandra Zerck, Knut Reinert, Oliver Kohlbacher. OpenMS - an open-source software framework for mass spectrometry. BMC Bioinformatics. 2008-03-26.
