Nils Hoffmann, Gerhard Mayer, Canan Has, Dominik Kopczynski, Fadi Al Machot, Dominik Schwudke, Robert Ahrends, Katrin Marcus, Martin Eisenacher, Michael Turewicz.
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography-mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high-throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Keywords: FAIR; bioinformatics; data format; database; lipidomics; mass spectrometry; standardization
Year: 2022 PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Data flow for different lipidomics software tools. From top to bottom, the supported ‘data format paths’ of selected lipidomics software tools that support at least one PSI standard data format are highlighted with a solid white outline. Those with planned or upcoming support are highlighted with a dashed white outline. The various software tools are represented by blue rectangles with the tool name in white. Data formats are represented by gray rectangles. On the right, the theoretically possible data flow between the PSI standard data formats, mzML and mzTab-M, is depicted (orange background) via supporting software and repositories/databases (white rectangle outline). In this figure, ‘Vendor Format’ stands for various proprietary, vendor-specific raw data formats (i.e., Thermo Fisher .raw, Agilent .d, ABSciex .wiff and Waters .raw), which can be converted to mzML (among other formats) using msConvert. (1) ‘csv/xlsx’ represents non-standardized output formats such as Microsoft XLSX, comma/tab separated text file formats and HTML. (2) ‘mgf/msp/ms2’ are text-based formats that encode mass spectral data but generally do not have a strongly defined metadata schema. (3) ‘mzXML/mzData’ represent the legacy raw data and peak list formats mzXML and mzData. (4) ISA-Tab (used by MetaboLights) and mwTab (used by Metabolomics Workbench) are text-based, tabular data formats based on a defined metadata model, which simplifies validation and tooling. (5) Only some tools support quantification. See Table 1 for details.
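
To make the role of mzTab-M as a tabular exchange format more concrete, the following is a minimal Python sketch of how a downstream tool might split an mzTab-M file into its metadata and small-molecule summary rows using the line prefixes defined by the format (`MTD`, `SMH`/`SML`, `SFH`/`SMF`, `SEH`/`SME`, `COM`). The `EXAMPLE` content and the function name `read_sections` are hypothetical and chosen for illustration only; the example is heavily abbreviated and omits many mandatory fields, so it is not a valid mzTab-M document, and the sketch is not a substitute for a validating parser.

```python
from collections import defaultdict

# Hypothetical, heavily abbreviated content for illustration only.
# A real mzTab-M file contains many more mandatory metadata fields and columns.
EXAMPLE = (
    "COM\tillustrative example, not a valid mzTab-M file\n"
    "MTD\tmzTab-version\t2.0.0-M\n"
    "MTD\tmzTab-ID\tEXAMPLE-1\n"
    "SMH\tSML_ID\tchemical_name\tadduct_ions\n"
    "SML\t1\tPC 34:1\t[M+H]1+\n"
    "SML\t2\tPE 36:2\t[M+H]1+\n"
)

# Each header prefix announces the column names of one row type.
HEADER_TO_ROW = {"SMH": "SML", "SFH": "SMF", "SEH": "SME"}


def read_sections(text):
    """Split mzTab-M text into a metadata dict and per-section row lists."""
    metadata = {}
    headers = {}
    rows = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if not line.strip() or prefix == "COM":
            continue  # skip blank lines and comments
        if prefix == "MTD":
            # Metadata lines are key/value pairs.
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            # Pair each value with the column name from the section header.
            rows[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(rows)


metadata, rows = read_sections(EXAMPLE)
for row in rows["SML"]:
    print(row["SML_ID"], row["chemical_name"], row["adduct_ions"])
```

Because every row is keyed by its section header, a consumer can look up columns by name rather than by position, which is the kind of interoperability the standardized formats are meant to provide.
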
Overview of software for lipid identification from mass spectrometry. Abbreviations: U: Untargeted, T: Targeted, C: Chromatography, CE: Capillary Electrophoresis, IM: Ion Mobility, DI: Direct Infusion (Shotgun), I: Imaging. $: targeted includes Selected Reaction and Multiple Reaction Monitoring (MRM), untargeted includes DDA and DIA approaches. *: Only the most important ones relevant to this review. All tools use some form of configuration file format, e.g., text-based (TXT) or other formats for libraries or fragmentation rules. Workflow assignment designates the primary workflow a tool was designed for, as stated by the authors; others may be available. We use direct infusion as a more generic synonym for what is usually referred to as ‘shotgun lipidomics’. Comma-separated values (CSV) is a tabular, spreadsheet-like format. If tab characters are used as separators, the format is TSV. Hypertext markup language (HTML) is a format viewable with an internet browser. XLSX: Microsoft Office XML-based spreadsheet format. MSP: NIST mass spectral library format. MGF: Mascot Generic Format. BLIB: Binary mass spectral library format. PDF: Portable Document Format. #: rule-based validation often includes spectral scores, ratios and thresholds; scores denote spectral similarity functions, such as the commonly used dot product/cosine variants. Remarks: (1) The software is no longer available. (2) Lipid class separation chromatography, e.g., HILIC or supercritical fluid chromatography. (3) Software is provided as a web application without further information. (4) Supports phospholipids only. (5) XCMS input recommended, LIPID MAPS class assignment of suspect ions. (6) After release 3.0, LipidMatch is available as LipidMatch Flow (latest version 3.5, but without source code). (7) Supports oxidized phospholipids only. (8) Identification and quantification use other tools’ methods. (9) The source code is provided for download, but no code license is defined.
| Workflow $ | Name | Handling | MS * | Identification # | Quant | Input | Output | Last Release | Open-Source | License | Programming Language |
|---|---|---|---|---|---|---|---|---|---|---|---|
| T | LIMSA | C, DI | MS1, MS2 | Compound/Fragment library | yes | XLSX, CSV, HTML | NA | 2006 | NA (1) | GPL v3 | C++, VBA, Excel |
| T | LipidomeDB | DI, C | MS1, MS2 | | yes | XLSX | XLSX, HTML | 2019 | no | NA | Java |
| T | LipidQuant | C (2) | MS1 | | yes | TXT | XLSX | 2021 | yes | CC-BY 4 | VBA, Excel |
| U | ALEX and ALEX 123 | DI | MS1, MS2, MS3 | Manual | no | manual input of parameters | HTML | 2017 | no | NA | NA (3) |
| U | Greazy (4) | C, DI | MS1, MS2 | Fragment/Spectral Library + score | no | vendor, mzML | mzTab (via LipidLama) | 2022 | yes | Apache v2 | C# |
| U | LDA2 | C | MS1, MS2 | Rule-based | yes | mzML, TXT | XLSX, mzTab-M | 2021 | yes | GPL v3 | Java |
| U | LipidBlast | C | MS1, MS2 | Spectral Library + score | no | MSP, MGF, XLSX | MGF, XLSX | 2014 | yes | CC-BY | Excel |
| U | LipiDex | C | MS1, MS2 | Spectral Library + rule-based | yes | MGF, mzXML, CSV | CSV | 2018 | yes | MIT | Java |
| U | LipidFinder | C | MS1 | Rule-based, LMSD | no | CSV, JSON (5) | | 2021 | yes | MIT | Python |
| U | LipidHunter (4) | C, DI | MS1, MS2 | Rule-based | yes | mzML, XLSX, TXT | XLSX, HTML, TXT | 2020 | yes | GPL v2, Proprietary | Python |
| U | LipidIMMS | C, IM | MS1 + CCS, MS2 | CCS Library + Spectral Library + score | no | MSP, MGF | CSV, HTML | 2020 | no | NA | NA (3) |
| U | LipidMatch (6) | C, I, DI | MS1, MS2, MSE/DIA | Compound/Fragment library + rule-based | yes | CSV, MS2 (ProteoWizard) | CSV | 2020 | yes | CC BY 4.0 | R |
| U | LipidMiner | C | MS1, MS2 | Compound/Fragment library + rule-based | yes | raw | XLSX, CSV | 2014 | no | NA | C#, Python |
| U | LipidMS | C | MS1, MS2, MSE/DIA | Compound/Fragment library + rule-based | yes | mzXML, CSV | CSV | 2022 | yes | GPL v3 | R |
| U | Lipid-Pro | C | MSE/DIA | Compound/Fragment library | yes | CSV | XLSX, TXT | 2015 | no | Proprietary | C# |
| U | LipidXplorer | DI | MS1, MS2, MS3 | Rule-based | no | mzML | CSV, HTML | 2019 | yes | GPL v2 | Python |
| U | LiPydomics | C, IM | MS1 | CCS Library + | yes | CSV | XLSX | 2021 | yes | MIT | Python |
| U | LIQUID | C | MS1, MS2 | Spectral Library + rule-based | yes | RAW, mzML | TSV, mzTab, MSP | 2021 | yes | Apache v2 | C# |
| U | LOBSTAHS | C | MS1 | Spectral Library + rule-based | yes | mzML, mzXML, mzData, CSV | XLSX, CSV | 2021 | yes | GPL v3 | R |
| U | LPPTiger (7) | C | MS1, MS2 | Spectral Library + score | yes | mzML, XLSX, TXT | XLSX, HTML | 2021 | yes | GPL v2, Proprietary | Python |
| U | MassPix | I | MS1 | | no | imzML | CSV | 2017 | yes | NA | R |
| U | MS-DIAL 4 | C, CE, IM | MS1, MS2, MSE/DIA | Spectral Library + rule-based | yes | vendor, mzML | CSV, mzTab-M, XLSX | 2022 | yes | GPL v3 | C# |
| U | MZmine 2 | C | MS1, MS2 | Spectral Library + rule-based | yes | vendor, mzML, mzXML, mzData, CSV, mzTab, XML | CSV, mzTab, XML | 2019 | yes | GPL v2 | Java |
| U | XCMS | C | MS1, MS2 | Spectral Library + score | yes | mzML, mzXML, netCDF | CSV | 2021 | yes | GPL v2 | R, C |
| T + U | LipidCreator and Skyline | C | MS1, MS2, MSE/DIA | Fragment/Spectral Library + score (8) | yes (8) | vendor, mzML (MS1 + MS2) | XLSX, CSV, BLIB | 2021 | yes | MIT | C# |
| T + U | LipidPioneer | C | MS1, MS2 | Compound/ | yes (8) | XLSX | XLSX | 2017 | yes (9) | NA | VBA, Excel |
| T + U | LipidQA | DI | MS1, MS2 | Spectral Library + score | yes | vendor (Thermo, Waters) | CSV | 2007 | NA (1) | NA | Visual C++ |
| T + U | LipoStar | C, IM | MS1, MS2, MSE/DIA | Compound/Fragment library + rule-based validation | yes | vendor | CSV | 2022 | no | Proprietary | C# |
| T + U | LipoStarMSI | DI, I | MS1, MS2 | Spectral Library + rule-based | yes | vendor (Bruker, Waters), imzML | CSV | 2020 | no | Proprietary | C# |
| T + U | SmartPeak | C | MS1, MS2 | Transitions + rule-based | yes | mzML, CSV | mzTab, XML, CSV | 2022 | yes | MIT | C++, Python |
| T + U | Smfinder | C | MS1, MS2 | Spectral Library + score | yes | mzML, mzXML | XLSX, TXT | 2020 | yes (9) | NA | Python, R, C++ |
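
Several tools in the table above rely on a spectral similarity score, described in the caption as dot product/cosine variants. As a rough illustration of the idea, the following Python sketch computes an unweighted cosine score between two centroided peak lists, pairing peaks greedily within an m/z tolerance. The function name `cosine_score`, the tolerance value and the example peak lists are assumptions made for this sketch; the tools listed here typically use their own weighted or otherwise modified variants, which are not reproduced.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.01):
    """Unweighted cosine similarity between two peak lists.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired
    one-to-one if their m/z values differ by at most `tol` (assumed unit: Da),
    preferring pairs with the largest intensity product.
    """
    # Collect all candidate peak pairs that fall within the tolerance.
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, i, j))

    # Greedy assignment: highest intensity product first, each peak used once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            dot += product
            used_a.add(i)
            used_b.add(j)

    norm_a = math.sqrt(sum(intensity ** 2 for _, intensity in spec_a))
    norm_b = math.sqrt(sum(intensity ** 2 for _, intensity in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero spectrum matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical peak lists for illustration only.
query = [(104.11, 20.0), (184.07, 100.0)]
library = [(104.11, 25.0), (184.07, 100.0), (86.10, 5.0)]
print(round(cosine_score(query, library), 3))
```

The score ranges from 0 (no shared peaks) to 1 (identical relative intensities); rule-based tools usually combine such a score with additional thresholds and fragment ratios, as noted in the caption.
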
Libraries and web applications for pathway analysis, ontology mapping/classification, enrichment analysis, post-processing, visualization and statistical analysis. Remarks: (1) Library Rodin is used by the web application. (2) From molecular formulas. (3) Figshare id. (4) Based on lipid structural features. (5) Part of Bioconductor release 3.14. (6) Only the R package is open-source. (7) R package MetaboAnalystR 3.2 (2021). (8) Part of MZmine 2. (9) MZmine 3 release is planned for 2022.
| Category | Name | Type | Open Source | License | Programming Language | Last Release | Version |
|---|---|---|---|---|---|---|---|
| Ontology, Enrichment | Lipid Mini-On | Web application, Library (1) | yes | BSD 2-Clause | R | 2019 | 0.1.43 |
| Ontology, Enrichment | LION/web | Web application | yes | GPL v3 | R | 2020 | NA |
| Ontology, Enrichment | LipiDisease | Web application | no | NA | R | 2021 | NA |
| Ontology, Classification (2) | SMIRFE | Library | yes | NA | Python | 2020 | 187eb261983b6d0aca1c (3) |
| Ontology, Classification (4) | Lipid Classifier | Library | yes | A-GPL v3 | Ruby | 2014 | 0.0.0.1 |
| Ontology, Enrichment, Pathway Analysis | BioPAN | Web application | no | GPL v3 | PHP, R, HTML, JavaScript | 2020 | NA |
| Post-Processing | Goslin | Web application, Library | yes | MIT, Apache v2 | C++, C#, Java, Python, R | 2022 | 2.0 |
| Post-Processing | LipidLynxX | Web application, Library | yes | GPL v3 | Python | 2020 | 0.9.24 |
| Post-Processing | RefMet | Web application | no | NA | PHP, R | 2021 | NA |
| Post-Processing | LICAR | Web application | yes | MIT | R | 2021 | 1.0 |
| Statistical Analysis, Visualization | lipidr | Library | yes | MIT | R | 2021 | 2.8.1 (5) |
| Statistical Analysis, Visualization | LipidSuite | Web application | no | NA | R | 2021 | 1 |
| Statistical Analysis, Visualization | liputils | Library | yes | GPL v3 | Python | 2021 | 0.16.2 |
| Statistical Analysis, Visualization | MetaboAnalyst | Web application, Library | no (6) | GPL v2 | Java, R (7) | 2021 | 5.0 |
| Visualization | Kendrick mass-defect plots | Library (8) | yes | GPL v2 | Java | 2019 (9) | 2.53 |
| Statistical Analysis, Visualization | LUX Score | Web application, application | yes | Apache v2 | Perl, R, Python | 2018 | 1.0.1 |
Overview of databases and resources for lipidomics grouped by classification, specific support for lipids, general availability of lipid structures, support for different levels of structural resolution (shorthand notation), main type of lipid ontology supported, availability of mass spectral data, availability and cross-linking of biochemical reaction data and curation model. Remarks: (1) kingdom, superclass, class, subclass. (2) internal and through MassIVE. (3) via integration with multiple tools. (4) via MassIVE and other public repositories. (5) via search. (6) local and linked via SPLASH [119] to MoNA, MassBank. (7) via search and shorthand abbreviation. (8) Original and Liebisch 2020. (9) via Metabolomics Workbench. (10) GP and GL only. (11) based on the submission format (MassBank format). (12) via reference to spectral data. (13) based on submission format ISA-Tab. (14) based on submission format mwTab. (15) others are available, e.g., LIPID MAPS, SwissLipids. (16) metabolic pathways of lipid mediators. (17) not necessarily machine readable.
| Category | Name | Main Purpose | Lipid Specific | Lipid Structures | Structural Levels | Ontology | Spectral Data | Biochemical Reactions | Curation |
|---|---|---|---|---|---|---|---|---|---|
| Database | CCS-Compendium | Compendium of experimentally acquired Collisional Cross Section (Ion Mobility) data from molecular standards acquired on drift tube instruments | yes | yes | yes (1) | ClassyFire/ChemOnt | no | no | manual |
| Database | Panomics CCS | Collisional Cross Section (Ion Mobility) Database for Metabolites and Xenobiotics acquired on drift tube instruments | no | yes | no | no | no | yes | manual |
| Database | GNPS | Knowledge base for raw, processed or annotated fragmentation mass spectrometry data | no | yes | no | - | yes (2) | yes (3) | no (4) |
| Database | HMDB | Curated database of small molecule metabolites found in the human body | no | yes | yes (5) | ClassyFire/ChemOnt | yes (6) | yes | manual |
| Database | LIPID MAPS | Curated portal for LIPID MAPS lipid classification, experimentally determined structures, in-silico combinatorial structures and other lipid resources | yes | yes | yes (7) | LIPID MAPS (8) | yes (9) | yes | manual |
| Database | LipidHome | In-silico generated theoretical lipid structures | yes | yes (10) | no | Liebisch 2013 | no | no | manual |
| Database | SwissLipids | Curated database of lipid structures with experimental evidence and integration with biological knowledge and models | yes | yes | yes | Liebisch 2013 | no | yes | manual |
| Repository | MassBank | Curated database of mass spectrometry reference spectra | no | no | no | - | yes | no | manual (11) |
| Repository | MetaboLights | Repository for metabolomics data (MS and Nuclear Magnetic Resonance (NMR)) and metadata | no | yes | no | ChEBI | yes (12) | no | manual (13) |
| Repository | Metabolomics Workbench | Repository for metabolomics data (MS and NMR) and metadata | no | yes | yes | RefMet | yes | no | manual (14) |
| Repository | Metabolonote | Wiki-based repository for metabolomics metadata | no | no | no | - | yes (12) | no | manual |
| Repository | MetabolomeXchange | Aggregator of metabolomics metadata from MetaboLights, Metabolomics Workbench, Metabolonote and Metabolomic Repository Bordeaux | no | no | no | - | no | no | no |
| Repository | METASPACE | Repository for imaging mass spectrometry for metabolomics | no | yes | no | HMDB/ClassyFire/ChemOnt (15) | yes | no | manual |
| Resource | LimeMap | Curated CellDesigner XML and Vanted GML graph of lipid mediator pathways | yes | no | no (15) | - | no | yes (16) | manual |
| Resource | LipidWeb | Literature review and biochemistry of lipids | yes | yes (17) | no | - | yes (17) | yes (17) | manual |