Martin Larralde, Thomas N Lawson, Ralf J M Weber, Pablo Moreno, Kenneth Haug, Philippe Rocca-Serra, Mark R Viant, Christoph Steinbeck, Reza M Salek.
Abstract
SUMMARY: Submission to the MetaboLights repository for metabolomics data currently requires users to report instrument and acquisition parameters manually in ISA-Tab format, a process that is time-consuming and prone to input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing the mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA and nmrML2ISA substantially reduces the time needed to capture metadata (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines, and facilitates more finely grained data exploration and querying of datasets.
Year: 2017 PMID: 28402395 PMCID: PMC5870861 DOI: 10.1093/bioinformatics/btx169
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic diagram and workflow of the mzML2ISA & nmrML2ISA software suite. 1) Experimental vendor raw files are converted into an open source XML equivalent. 2) A user provides experimental metadata (minimally a study identifier). 3) The additional metadata and open source XML raw files are submitted to the mzML2ISA & nmrML2ISA software through a CLI, API, GUI or Galaxy interface. The time to complete this step depends on the extent of metadata provided. 4) Metadata is extracted from the XML files. 5) An ISA-Tab structure is generated with a large number of fields automatically populated. Steps 4 and 5 take approximately 45 seconds for 50 XML files. 6) The remaining fields are then populated manually using the standard ISAcreator software. 7) The completed ISA-Tab structure can then be submitted to MetaboLights. 8) Additionally, the parsing components of mzML2ISA & nmrML2ISA can be used as standalone Python packages to extract metadata as either a Python dictionary or JSON for integration in other analysis pipelines.
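The idea behind steps 4 and 8 (reading controlled-vocabulary parameters out of an mzML file and returning them as a Python dictionary or JSON) can be illustrated with a minimal sketch. This is not the mzML2ISA API: it uses only the Python standard library, the function name extract_instrument_metadata is hypothetical, and the embedded mzML fragment is a hand-written, heavily truncated example whose accessions are illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by mzML 1.1 documents.
MZML_URI = "http://psi.hupo.org/ms/mzml"
MZML_NS = {"mz": MZML_URI}

# Hand-written, truncated mzML fragment for illustration only (assumption:
# a real file carries many more sections and parameters than shown here).
SAMPLE_MZML = """
<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <instrumentConfigurationList count="1">
    <instrumentConfiguration id="IC1">
      <cvParam cvRef="MS" accession="MS:1000449" name="LTQ Orbitrap" value=""/>
      <componentList count="1">
        <source order="1">
          <cvParam cvRef="MS" accession="MS:1000073"
                   name="electrospray ionization" value=""/>
        </source>
      </componentList>
    </instrumentConfiguration>
  </instrumentConfigurationList>
</mzML>
"""


def extract_instrument_metadata(mzml_text):
    """Collect every cvParam under each instrumentConfiguration element.

    Hypothetical helper: returns a dict mapping the configuration id to a
    list of {"accession", "name", "value"} records in document order.
    """
    root = ET.fromstring(mzml_text)
    metadata = {}
    for config in root.iterfind(".//mz:instrumentConfiguration", MZML_NS):
        metadata[config.get("id")] = [
            {
                "accession": param.get("accession"),
                "name": param.get("name"),
                "value": param.get("value", ""),
            }
            for param in config.iter("{%s}cvParam" % MZML_URI)
        ]
    return metadata


if __name__ == "__main__":
    # Serialise the dictionary as JSON for use in another pipeline.
    print(json.dumps(extract_instrument_metadata(SAMPLE_MZML), indent=2))
```

A dictionary of this shape can be passed directly to other Python code or serialised with json.dumps; the actual packages extract a far wider set of fields and map them onto the ISA-Tab structure.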