Konstantinos Tzanakis, Tim W Nattkemper, Karsten Niehaus, Stefan P Albaum.
Abstract
BACKGROUND: Modern mass spectrometry has revolutionized the detection and analysis of metabolites, but it has also caused data volumes to skyrocket, with metabolomics repositories filling up with thousands of datasets. While many software tools exist for analyzing individual experiments with a few to dozens of chromatograms, we see a demand for a contemporary software solution capable of processing and analyzing hundreds or even thousands of experiments in an integrative manner with standardized workflows.
Keywords: Distributed analysis; Distributed storage; Large-scale metabolomics; Mass spectrometry data; Parallel processing
Year: 2022 PMID: 35804309 PMCID: PMC9270834 DOI: 10.1186/s12859-022-04793-w
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Data flow during the processing and analysis steps in the combination of Apache Spark, Apache Cassandra and KNIME
Fig. 2 Steps of two of the workflows currently implemented in MetHoS. a Identification by exact mass, b Identification by spectral matching
Fig. 3 Conceptual data model of the Cassandra database
Fig. 4 Scalability of MetHoS with 1, 2, 4, 8 and 16 Spark workers, measured as the time taken to process 200 experiments. Processing was performed 3 times on the same 200 experiments for each number of workers
Fig. 5 a Web interface of a project. b K-means clustering on all 4827 experiments (re-scaled). c PCA of 144 experiments originating from whole blood, blood plasma and erythrocyte samples and 57 experiments originating from urine samples. d Pearson correlation of 90 experiments on 112 compounds
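The idea behind the "Identification by exact mass" workflow named in Fig. 2a can be illustrated with a minimal Python sketch. This is not the MetHoS implementation, which the record above does not describe in detail. It only shows the general technique of matching an observed m/z against a compound list within a ppm tolerance. The `Compound` class, the `match_by_exact_mass` function, the two-entry `LIBRARY`, the restriction to the [M+H]+ adduct and the 5 ppm default are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Mass of a proton in Da, used here to model the [M+H]+ adduct only (assumption).
PROTON_MASS = 1.007276


@dataclass(frozen=True)
class Compound:
    """Hypothetical library entry: a name and a neutral monoisotopic mass in Da."""
    name: str
    monoisotopic_mass: float


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of `observed` relative to `theoretical`, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_by_exact_mass(observed_mz, compounds, tolerance_ppm=5.0):
    """Return (name, ppm error) for compounds whose [M+H]+ m/z lies within the tolerance.

    Results are sorted by absolute ppm error, best match first.
    """
    hits = []
    for compound in compounds:
        theoretical_mz = compound.monoisotopic_mass + PROTON_MASS
        error = ppm_error(observed_mz, theoretical_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((compound.name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Illustrative two-entry library (assumption, not taken from the source).
LIBRARY = [
    Compound("glucose", 180.06339),
    Compound("caffeine", 194.08038),
]

if __name__ == "__main__":
    for name, error in match_by_exact_mass(181.0707, LIBRARY):
        print(f"{name}: {error:+.2f} ppm")
```

In a distributed setting such as the Spark-based processing shown in Fig. 1, a function of this kind could in principle be applied per detected peak across many experiments, but how MetHoS actually distributes this step is not stated here.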