
MARMoSET - Extracting Publication-ready Mass Spectrometry Metadata from RAW Files.

Marina Kiweler, Mario Looso, Johannes Graumann.

Abstract

In the context of publishing data sets acquired by mass spectrometry or works based on such molecular screens, metadata documenting the instrument settings are of central importance to the evaluation and reproduction of results. A single experiment may be linked to hundreds of data acquisitions, which are frequently stored in proprietary file formats. Together with community-, repository-, and publisher-specific reporting standards, this state of affairs frequently leads to manual (and thus error-prone) metadata extraction and formatting. Data extracted from a single file also often stand in for an entire file set, which risks unreported parameter divergence. To support quality control and data reporting, the C# application MARMoSET extracts and reduces publication-relevant metadata from Thermo Fisher Scientific RAW files. It is integrated with an R package for easy reporting. The tool is expected to be particularly useful in high-throughput environments such as service facilities with large project numbers and/or sizes.
© 2019 Kiweler et al.

Keywords:  Bioinformatics; Bioinformatics software; Data standards; Mass Spectrometry; Quality control and metrics

Year:  2019        PMID: 31097673      PMCID: PMC6683000          DOI: 10.1074/mcp.TIR119.001505

Source DB:  PubMed          Journal:  Mol Cell Proteomics        ISSN: 1535-9476            Impact factor:   5.911


Aiming for evaluability and reproducibility of mass spectrometry-based research, and in parallel to the maturation of the field toward the acquisition of ever larger data sets, initiatives to standardize the reporting of instrument settings and other relevant metadata have arisen from the community itself (1, 2), from the deposition requirements of public data repositories (3, 4), and from publishers and editors involved in the dissemination of mass spectrometric experiments (5, 6). The extraction and reporting of the required metadata remains a tedious process, especially given that OMICS experiments frequently involve hundreds of data files and that data often reside in binary and/or proprietary file formats, which offer an excellent information-to-storage-space ratio yet limit ease of access. One such example is the RAW file format produced by Thermo Fisher Scientific's (Bremen, Germany) mass spectrometers. Beyond the acquired spectral data, RAW files also contain instrument settings as metadata, which are required to evaluate and reproduce the results. An obvious and common approach to extracting this data is to manually open individual RAW files using the vendor-specific Xcalibur software and copy the required information. The tedious and error-prone nature of manual interaction with individual files, however, frequently leads to metadata from a single file being used to describe an entire data set. When the data set encompasses hundreds of files acquired over a potentially long period, this implies a potential for undetected parameter drift with implications for laboratory-internal quality control, reporting, and publication. In this context, core facility laboratories carry a particularly large burden, as the sheer number of projects they handle further compounds the data access problem.
Because publication often requires metadata reporting years after data delivery to customers, the difficulty of extracting parameters frequently implies “data archeology” in deep archives. To the best of our knowledge, no software exists to date that addresses the need for both simple reporting from large numbers of RAW files and metadata reduction to a consensus set of parameters. Using the vendor-provided application programming interface (API) RawFileReader (7), we created such a tool along with R (8) based infrastructure for the generation of tabular representations suitable for intra-laboratory quality control, data reporting, and supplemental material in publications.

MATERIALS AND METHODS

System Requirements

The C# (9) command line tool MARMoSET runs as 64-bit code on Microsoft Windows only. It was compiled in Visual Studio Community (Version 15.8.7, .net 4.7.03056) (10) for the .NET Framework 4.6.1. The accompanying R package is agnostic with respect to operating system and only requires a functional R installation, as well as the package dependencies assertive (11), jsonlite (12), pathological (13), rlist (14), stringi (15), and magrittr (16).

Implementation

The C# (9) application MARMoSET (https://github.molgen.mpg.de/loosolab/MARMoSET_C) extracts publication-relevant metadata from Thermo Fisher Scientific RAW files as a JSON (17) file. The RAW file format as accessed through the RawFileReader API provides multiple levels of metadata, dependent on the system the file was acquired on. A fixed header contains information such as date, original filename, and sample information. The header is followed by a list containing the instrument modules used and their respective methods as strings. The API additionally provides separate entry points for detector-associated data (such as ultraviolet spectrophotometry or mass spectrometry). MARMoSET currently only implements access to MS data, which beyond the acquired spectral data includes further instrument parameters and logs. The structure, string formatting, and location of relevant metadata in the data structure are specific to instrument classes. Using the “IRawDataPlus” interface from the RawFileReader API, MARMoSET collects all relevant metadata from the divergent data structure. In the context of liquid chromatography/mass spectrometry (LC/MS) using EASY-nLC ultra-high-pressure liquid chromatography instruments (Thermo Fisher Scientific), LC parameters are available in the method strings and are extracted and parsed by MARMoSET. Where chromatography instrument parameters beyond the EASY-nLC series are retrievable from the RAW file, future inclusion into the reporting infrastructure is expected to be straightforward, and support will be added as such instruments are encountered. Depending on whether it is provided with the path to a single RAW file or to a directory, MARMoSET either acts on a single file or iterates over a collection of RAW files in a directory, making use of parallelization dependent on the hardware resources available. In a first step, the metadata is gathered for each RAW file individually.
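The single-file versus directory behaviour described above can be illustrated with a minimal R sketch. This is not MARMoSET's C# code: the helper name collect_raw_files() and the assumption that RAW files carry a ".raw" extension are introduced here purely for illustration.

```r
# Hypothetical helper (not part of MARMoSET): resolve an input path to the
# RAW files that would be processed.
collect_raw_files <- function(path) {
  if (dir.exists(path)) {
    # Directory: take every file ending in .raw (assumed extension).
    list.files(path, pattern = "\\.raw$", ignore.case = TRUE, full.names = TRUE)
  } else if (file.exists(path)) {
    # Single file: process just this one.
    path
  } else {
    stop("No such file or directory: ", path)
  }
}

# Usage with a throwaway directory holding empty placeholder files.
demo_dir <- file.path(tempdir(), "marmoset_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("run_01.raw", "run_02.raw", "notes.txt"))
))

raw_files <- collect_raw_files(demo_dir)
print(basename(raw_files))
```

The per-file extraction step would then be applied to each element of the returned vector, sequentially or in parallel.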
In order to reduce data from many files into a minimal set of parameters describing the entire collection, the resulting data structures are hash code evaluated and sorted in a dictionary. This information is then used to sort RAW files into groups that share all relevant parameters. Finally, a JSON file is written, comprising a minimal set of parameter groups linked to the corresponding RAW file names (see Fig. 1).
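The grouping step can be sketched in a few lines of R. This is an illustration of the idea only, under stated assumptions: the parameter names and values are invented, and a canonical string key stands in for the hash codes that the C# tool stores in its dictionary.

```r
# Hypothetical per-file metadata keyed by RAW file name (values are made up).
per_file <- list(
  "run_01.raw" = list(resolution = 70000, agc_target = 3e6),
  "run_02.raw" = list(agc_target = 3e6, resolution = 70000),
  "run_03.raw" = list(resolution = 35000, agc_target = 3e6)
)

# Build a canonical key per file: sort parameters by name, then serialize.
# A plain string key stands in here for the hash code used by MARMoSET.
make_key <- function(params) {
  params <- params[order(names(params))]
  values <- vapply(params, format, character(1))
  paste(names(params), values, sep = "=", collapse = ";")
}

keys <- vapply(per_file, make_key, character(1))

# Files with identical keys share all parameters and form one group.
groups <- unname(split(names(per_file), keys))
print(groups)
```

Sorting the parameters before serializing makes the key independent of the order in which parameters were read, so run_01.raw and run_02.raw land in the same group while run_03.raw, which differs in resolution, forms its own.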
Fig. 1. Schematic representation of data processing by MARMoSET including excerpts of the data representations involved.

To provide easy and intuitive handling of the structured data in the JSON file, we provide an R (8) package also named MARMoSET (https://github.molgen.mpg.de/loosolab/MARMoSET). It is designed to create compact and clear tables following predefined journal requirements such as MIAPE or JPR. Moreover, it supports individual selections of parameter sets in a few easy steps. On Microsoft Windows, the included function “generate_json()” directly runs the C# command line tool from within R and captures its output into memory. Alternatively, externally generated JSON files may be read as well. In-memory data is reformatted by list flattening into a “data.frame” (function “flatten_json()”). Based on a term-matching table included in the package, the function “match_terms()” extracts and arranges a subset of parameters from the total metadata set in the flattened JSON file for all given parameter groups and generates a table reduced to journal-specific requirements. These tables may then be exported, ready for upload, as tab delimited txt files and MS Excel tables by using the “save_all_groups()” function. The included term-matching table can easily be adapted by the user to include further metadata entities or to design individual reporting styles. It is noteworthy that the same toolkit employed here has also been chosen by Trachsel et al. (2018) (17) to facilitate analysis of spectrum-level metadata.
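The flattening and term-matching ideas can be illustrated with base R alone. The sketch below does not use the package's own functions. The nested list, the parameter names, and the two-column term-matching table are all hypothetical stand-ins for the structures that flatten_json() and match_terms() operate on.

```r
# Hypothetical excerpt of one parameter group as it might look after the
# JSON has been read into R (names and values are made up).
group <- list(
  raw_files = c("run_01.raw", "run_02.raw"),
  ms = list(resolution = 17500, polarity = "positive"),
  lc = list(flow_rate = 300)
)

# List flattening: unlist() collapses the nesting and joins names with ".".
# Mixed types are coerced to character.
flat <- unlist(group)
flat_df <- data.frame(
  parameter = names(flat),
  value = unname(flat),
  stringsAsFactors = FALSE
)

# Hypothetical term-matching table: maps internal parameter names to the
# labels a reporting guideline asks for.
term_table <- data.frame(
  parameter = c("ms.resolution", "ms.polarity"),
  label = c("Resolution", "Polarity"),
  stringsAsFactors = FALSE
)

# Keep only the parameters listed in the term-matching table.
report <- merge(term_table, flat_df, by = "parameter")[, c("label", "value")]
print(report)
```

Editing the term-matching table, rather than the code, is what changes which parameters are reported and how they are labeled.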

Exemplary Workflow (Windows)

In a first step and from within a functional R environment (see https://cloud.r-project.org/ for installation instructions), the MARMoSET R package is installed from the GitHub repository using the package remotes (18), which may be achieved by calling “install.packages("remotes"); remotes::install_github("loosolab/MARMoSET", host = "github.molgen.mpg.de/api/v3")”. After loading MARMoSET with “library(MARMoSET)”, a JSON object containing the metadata of grouped RAW files may be created by executing “json <- generate_json("")” and prepared for downstream processing using “flat_json <- flatten_json(json)”. A reporting guideline- (“MIAPE” in this example) and instrument-specific filter is generated by calling “term_matching_table <- create_term_matching_table(instrument_list = c("Thermo EASY-nLC", "Q Exactive - Orbitrap_MS"), origin_key = "miape")” and applied to the JSON object using “vector_of_group_tables <- match_terms(flat_json, term_matching_table)”. Finally, a tab delimited text file representation may be saved using “save_all_groups(vector_of_group_tables, output_path)”. Further use cases and more detailed instructions may be found on the top-level README page of the GitHub repository (https://github.molgen.mpg.de/loosolab/MARMoSET), as well as through R's help system (e.g. by calling “?generate_json”) (19).

RESULTS AND DISCUSSION

A combination of an ever-increasing number of raw data files per mass spectrometry-based experiment and a strong push for the standardized reporting of metadata for evaluation and reproducibility has rendered metadata extraction and its reduction into the smallest common parameter set a common need within the community. To the best of our knowledge, however, no tool simplifying metadata extraction currently exists. With the MARMoSET suite of tools presented here, we fill that gap for mass spectrometric data acquired by Thermo Fisher Scientific's instruments, combining outputs geared toward machine readability (JSON) and human consumption (tab delimited text and MS Excel). The resulting information is suited for documentation and reporting, publication, and operations oversight, and is expected to be particularly helpful in environments with high-throughput acquisition of mass spectrometric data, such as core facility laboratories. In conclusion, MARMoSET offers tools for the simple extraction of metadata from RAW files. With the intent to particularly serve high-throughput data acquisition environments, the tool enables the straightforward generation of small and clear tables containing just the metadata or parameter information needed. MARMoSET is designed for flexible adaptation to individual laboratories' needs.
REFERENCES

1.  Reporting protein identification data: the next generation of guidelines.

Authors:  Ralph A Bradshaw; Alma L Burlingame; Steven Carr; Ruedi Aebersold
Journal:  Mol Cell Proteomics       Date:  2006-05       Impact factor: 5.911

2.  The minimum information about a proteomics experiment (MIAPE).

Authors:  Chris F Taylor; Norman W Paton; Kathryn S Lilley; Pierre-Alain Binz; Randall K Julian; Andrew R Jones; Weimin Zhu; Rolf Apweiler; Ruedi Aebersold; Eric W Deutsch; Michael J Dunn; Albert J R Heck; Alexander Leitner; Marcus Macht; Matthias Mann; Lennart Martens; Thomas A Neubert; Scott D Patterson; Peipei Ping; Sean L Seymour; Puneet Souda; Akira Tsugita; Joel Vandekerckhove; Thomas M Vondriska; Julian P Whitelegge; Marc R Wilkins; Ioannis Xenarios; John R Yates; Henning Hermjakob
Journal:  Nat Biotechnol       Date:  2007-08       Impact factor: 54.908

3.  Guidelines for reporting the use of mass spectrometry in proteomics.

Authors:  Chris F Taylor; Pierre-Alain Binz; Ruedi Aebersold; Michel Affolter; Robert Barkovich; Eric W Deutsch; David M Horn; Andreas Hühmer; Martin Kussmann; Kathryn Lilley; Marcus Macht; Matthias Mann; Dieter Müller; Thomas A Neubert; Janice Nickson; Scott D Patterson; Roberto Raso; Kathryn Resing; Sean L Seymour; Akira Tsugita; Ioannis Xenarios; Rong Zeng; Randall K Julian
Journal:  Nat Biotechnol       Date:  2008-08       Impact factor: 54.908

4.  rawDiag: An R Package Supporting Rational LC-MS Method Optimization for Bottom-up Proteomics.

Authors:  Christian Trachsel; Christian Panse; Tobias Kockmann; Witold E Wolski; Jonas Grossmann; Ralph Schlapbach
Journal:  J Proteome Res       Date:  2018-07-24       Impact factor: 4.466

5.  ProteomeXchange provides globally coordinated proteomics data submission and dissemination.

Authors:  Juan A Vizcaíno; Eric W Deutsch; Rui Wang; Attila Csordas; Florian Reisinger; Daniel Ríos; José A Dianes; Zhi Sun; Terry Farrah; Nuno Bandeira; Pierre-Alain Binz; Ioannis Xenarios; Martin Eisenacher; Gerhard Mayer; Laurent Gatto; Alex Campos; Robert J Chalkley; Hans-Joachim Kraus; Juan Pablo Albar; Salvador Martinez-Bartolomé; Rolf Apweiler; Gilbert S Omenn; Lennart Martens; Andrew R Jones; Henning Hermjakob
Journal:  Nat Biotechnol       Date:  2014-03       Impact factor: 54.908
