Jan Stanstrup, Corey D Broeckling, Rick Helmus, Nils Hoffmann, Ewy Mathé, Thomas Naake, Luca Nicolotti, Kristian Peters, Johannes Rainer, Reza M Salek, Tobias Schulze, Emma L Schymanski, Michael A Stravs, Etienne A Thévenot, Hendrik Treutler, Ralf J M Weber, Egon Willighagen, Michael Witting, Steffen Neumann.
Abstract
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages primarily available on CRAN, Bioconductor and GitHub.
Keywords: CRAN; NMR spectroscopy; R; Bioconductor; compound identification; data integration; feature selection; lipidomics; mass spectrometry; metabolite networks; metabolomics; signal processing; statistical data analysis
Year: 2019 PMID: 31548506 PMCID: PMC6835268 DOI: 10.3390/metabo9100200
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Overview of typical tasks in metabolomics workflows, ranging from metabolite profiling (left, green) via metabolite annotation (center, purple) to data analysis using statistics and metabolite networks (right, red).
Figure 2. Dependency network of R packages. Shown in blue are packages mentioned in the review. Edges connect to packages that depend on another package, as long as they are in CRAN or BioC. Green nodes correspond to packages in CRAN or BioC not covered in the review. The inset shows the neighbourhood of the ptw package. Not shown are (1) infrastructure packages, e.g., rJava, Rcpp; (2) packages from the review without reverse dependencies; and (3) data packages. Some packages from the review are not in current versions of CRAN or BioC. An interactive version of this figure is also available online (rformassspectrometry.github.io/metaRbolomics-book, Appendix 2) and as Supplemental File S2.
R packages for mass spectrometry data handling and (pre-)processing.
| Functionalities | Package | Reference | Repo |
|---|---|---|---|
| Parser for common file formats: mzXML, mzData, mzML and netCDF. Usually not used directly by the end user, but provides functions to read raw data for other packages. | | | BioC |
| Infrastructure to manipulate, process and visualise MS and proteomics data, ranging from raw to quantitative and annotated data. | | | BioC |
| Export and import of processed metabolomics MS results to and from the mzTab-M for metabolomics data format. | | | GitHub |
| Converts MRM-MS (.mzML) files to LC-MS style .mzML. | | | GitHub |
| Infrastructure for import, handling, representation and analysis of chromatographic MS data. | | | GitHub |
| Infrastructure for import, handling, representation and analysis of MS spectra. | | | GitHub |
| Pre-processing and visualisation for (LC/GC-) MS data. Includes visualisation and simple statistics. | | | BioC |
| Automatic optimisation of | | | BioC |
| Parameter tuning algorithm for | | | BioC |
| Pre-processing and visualisation for (LC/GC-) MS data. Includes visualisation and simple statistics. | | | BioC |
| Peak picking with | | | SF |
| Pre-processing and alignment of LC-MS data without assuming a parametric peak shape model allowing maximum flexibility. It utilises the knowledge of known metabolites, as well as robust machine learning. | | | SF |
| Peak detection using chromatogram subregion detection, consensus integration bound determination and accurate missing value integration. Outputs in | | | GitHub |
| Peak picking for (LC/GC-) MS data, improving the detection of low abundance signals via a master map of m/z/RT space before peak detection. Results are | | | BioC |
| | | | SF |
| (GC/LC)-MS data analysis for environmental science, including raw data processing, analysis of molecular isotope ratios, matrix effects, and short-chain chlorinated paraffins. | | | CRAN |
| Sequential partitioning, clustering and peak detection of centroided LC-MS mass spectrometry data (.mzXML), with interactive result and raw data plot. | | | CRAN |
| Peak picking with | | | GitHub |
| KPIC2 extracts pure ion chromatograms (PIC) via K-means clustering of ions in regions of interest, performs grouping and alignment, grouping of isotopic and adduct features, peak filling and Random Forest classification. | | | GitHub |
| Analysis of untargeted LC/MS data from stable isotope-labeling experiments. Also uses | | | GitHub |
| Correction of MS and MS/MS data from stable isotope labeling (any tracer isotope) experiments for natural isotope abundance and tracer impurity. Separate GUI available in IsoCorrectoRGUI. | | | BioC |
| Extension of | | | |
| Analysis of isotopic patterns in isotopically labeled MS data. Estimates the isotopic abundance of the stable isotope (either 2H or 13C) within specified compounds. | | | GitHub |
| Finding the dual (or multiple) isotope labeled analytes using dual labeling of metabolites for metabolome analysis (DLEMMA) approach, described in Liron [ | | | CRAN |
| Peak picking using peak apex intensities for selected masses. Reference library matching, RT/RI conversion plus metabolite identification using multiple correlated masses. Includes GUI. | | | BioC |
| Pre-processing for targeted (SIM) GC-MS data. Guided selection of appropriate fragments for the targets of interest by using an optimisation algorithm based on user provided library. | | | BioC |
| Deconvolution of SWATH-MS experiments to MRM transitions. | | | |
| Automatic analysis of large-scale MRM experiments. | | | |
| Tailors peak detection for targeted metabolites through iterative user interface. It automatically integrates peak areas for all isotopologues and outputs extracted ion chromatograms (EICs). | | | GitHub |
| Targeted peak picking and annotation. Includes | | | GitHub |
| Toolkit for working with Selective Reaction Monitoring (SRM) MS data and other variants of targeted LC-MS data. | | | GitHub |
| Deconvolution of SWATH-MS data. | | | GitHub |
| Targeted peak picking and annotation. All functions through | | | GitHub |
| Unsupervised data mining on GC-MS. Clustering of mass spectra to detect compound spectra. The output can be searched in NIST and ARISTO [ | | | CRAN |
| Pre-processing for GC/MS, MassBank search, NIST format export. | | | CRAN |
| Pre-processing using AMDIS [ | | | BioC |
| Deconvolution of GC-MS and GC×GC-MS unit resolution data using orthogonal signal deconvolution (OSD), independent component regression (ICR) and multivariate curve resolution (MCR-ALS). | | | CRAN |
| Corrects overloaded signals directly in raw data (from GC-APCI-MS) automatically by using a Gaussian or isotopic-ratio approach. | | | CRAN |
| Alignment of GC data. Also GC-FID or any single channel data since it works directly on peak lists. | | | CRAN |
| GC-MS data processing and compound annotation pipeline. Includes the building, validating, and query of in-house databases. | | | BioC |
| Peak picking for GC×GC-MS using Bayes factor and mixture probability models. | | | SF |
| Peak alignment for GC×GC-MS data with homogeneous peaks based on mixture similarity measures. | | | SF |
| Peak alignment for GC×GC-MS data with homogeneous and/or heterogeneous peaks based on mixture similarity measures. | | | SF |
| Chemometrics analysis of GC×GC-MS data: baseline correction, smoothing, COW peak alignment and multiway PCA are incorporated. | | | CRAN |
| Retention time and mass spectra similarity threshold-free alignments, seamlessly integrates retention time standards for universally reproducible alignments, performs common ion filtering, and provides compatibility with multiple peak quantification methods. | | | GitHub |
| Pre-processing of data from Flow Injection Analysis (FIA) coupled to High-Resolution Mass Spectrometry (HRMS). | | | BioC |
| Flow Injection Electrospray Mass Spectrometry Processing: data processing, classification modelling and variable selection in metabolite fingerprinting. | | | GitHub |
| Processing of mass spectrometry spectra using a wavelet-based algorithm. Can be used for direct infusion experiments. | | | BioC |
| Filtering of features originating from artifactual interference. Based on the analysis of an extract of E. coli grown in 13C-enriched media. | | | GitHub |
| Wrappers for | | | BioC |
| Processing of peaktables from AMDIS, | | | BioC |
| Parametric Time Warping (RT correction) for both DAD and LC-MS. | | | CRAN |
| R wrapper for X!Tandem software for protein identification. | | | BioC |
| Building, validation, and statistical analysis of extended assay libraries for SWATH proteomics data. | | | BioC |
| Split a data set into a set of likely true metabolites and likely measurement artifacts by comparing missing rates of pooled plasma samples and biological samples. | | | CRAN |
| Quality of LC-MS and direct infusion MS data. Generates a report that contains a comprehensive set of quality control metrics and charts. | | | GitHub |
R packages for ion species grouping, annotation, molecular formula generation and accurate mass lookup.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Uses GenForm for molecular formula generation on mass accuracy, isotope and/or MS/MS fragments, as well as performing MS/MS subformula annotation. | | | GitHub |
| Calculation of isotope fine patterns. Also adduct calculations and molecular formula parsing. Web version available at | | | CRAN |
| Molecular formula assignment, mass recalibration, signal-to-noise evaluation, and unambiguous formula selections are provided. | | | GitHub |
| Checking element isotopes, calculating (isotope labelled) exact monoisotopic mass, | | | CRAN |
| Simulation of and decomposition of Isotopic Patterns. | | | BioC |
| Grouping of correlated features into pseudo compound spectra using correlation across samples and similarity of peak shape. Annotation of isotopes and adducts. Works as an add-on to | | | BioC |
| Grouping of features based on similarity between coelution profiles. | | | CRAN |
| Cluster-based feature grouping for non-targeted GC or LC-MS data. | | | CRAN |
| Deconvolution of MS/MS spectra obtained with wide isolation windows. | | | |
| Uses dynamic block summarisation to group features belonging to the same compound. Correction for peak misalignments and isotopic pattern validation. | | | SF |
| Isotope & adduct peak grouping, homologous series detection. | | | CRAN |
| Bayesian approach for grouping peaks originating from the same compound. | | | |
| Combination of data from positive and negative ionisation mode finding common molecular entities. | | | CRAN |
| Grouping of correlated features into pseudo compound spectra using correlation across samples. Annotation of isotopes and adducts. Can work directly with the | | | |
| Navigation of high-resolution MS/MS data in a GUI based on mass spectral similarity. | | | BioC |
| Bayesian probabilistic annotation. | | | GitHub |
| Isotope & adduct peak grouping, unsupervised homologous series detection. | | | CRAN |
| Automatic interpretation of fragments and adducts in MS spectra. Molecular formula prediction based on fragmentation. | | | CRAN |
| Automated annotation using MS/MS data or databases and retention time. Calculation of spectral and chemical networks. | | | GitHub |
| Screening, annotation, and putative identification of mass spectral features in lipidomics. Default databases contain ~25,000 compounds. | | | BioC |
| Automated annotation of fragments from MS and MS/MS and putative identification against simulated library fragments of ~500,000 lipid species across ~60 lipid types. | | | GitHub |
| Annotation of lipid type and acyl groups on independent acquisition-mass spectrometry lipidomics based on fragmentation and intensity rules. | | | CRAN |
| Accurate mass and/or retention time and/or collisional cross section matching. | | | GitHub |
| Downloads KEGG compounds orthology data and wraps the | | | CRAN |
| Paired mass distance analysis to find independent peaks in | | | CRAN |
| Pre-processing ( | | | GitHub |
| Putative annotation of unknowns in MS1 data. | | | SF |
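
Several entries in the table above rely on accurate mass lookup, i.e. matching a measured m/z against candidate ion masses within a tolerance. The following is a minimal, language-agnostic sketch of that idea, written in Python for illustration and not taken from any of the listed R packages. The function names, the 5 ppm default tolerance and the two-entry candidate table are assumptions made for this example.

```python
# Illustrative [M+H]+ m/z values: monoisotopic mass plus one proton (1.007276).
# This two-entry table is a stand-in for a real compound database.
CANDIDATES = {
    "glucose [M+H]+": 181.070665,
    "caffeine [M+H]+": 195.087652,
}


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def match_mz(measured_mz, candidates=CANDIDATES, tol_ppm=5.0):
    """Return (name, ppm error) pairs within tol_ppm, best match first."""
    hits = []
    for name, theoretical_mz in candidates.items():
        error = ppm_error(measured_mz, theoretical_mz)
        if abs(error) <= tol_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    print(match_mz(181.0710))
```

A relative (ppm) tolerance is used here rather than an absolute one in Da because mass accuracy is typically specified in ppm for high-resolution instruments.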
R packages for MS/MS data.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Tools for processing raw data to database ready cleaned spectra with metadata. | | | BioC |
| From RT- | | | GitHub |
| Creating MS libraries from LC-MS data using | | | GitHub |
| Assess precursor contribution to fragment spectrum acquired or anticipated isolation windows using “precursor purity” for both LC-MS(/MS) and DI-MS(/MS) data. Spectral matching against a SQLite database of library spectra. | | | BioC |
| Automated quantification of metabolites by targeting mass spectral/retention time libraries into full scan-acquired GC-MS chromatograms. | | | CRAN |
| MS/MS spectra similarity and unsupervised statistical methods. Workflow from raw data to visualisations and is interfaceable with | | | BioC |
| Import of spectra from different file formats such as NIST msp, mgf (mascot generic format), and library (Bruker) to | | | GitHub |
| Multi-purpose mass spectrometry package. Contains many different functions e.g., isotope pattern calculation, spectrum similarity, chromatogram plotting, reading of msp files and peptide related functions. | | | CRAN |
| Annotation of LC-MS data based on a database of fragments. | | | CRAN |
| | | | GitHub |
| SOLUTIONS for High ReSOLUTION Mass Spectrometry including several functions to interact with | | | GitHub |
| Uses | | | GitHub |
| Retention time prediction based on compound structure descriptors. Five different machine learning algorithms are available to build models. Plotting available to explore chemical space and model quality assessment. | | | GitHub |
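
Spectral similarity, mentioned in several rows of the table above, is commonly scored with a cosine measure over matched fragment peaks. The sketch below shows one simple variant in Python, purely for illustration: peaks are paired greedily by closest m/z within an absolute tolerance. The function name, the 0.01 Da default and the greedy pairing strategy are assumptions for this example and do not reproduce the algorithm of any specific package listed here.

```python
import math


def cosine_similarity(spec_a, spec_b, tol=0.01):
    """Cosine score between two centroided spectra.

    Each spectrum is a list of (mz, intensity) pairs. A peak in spec_a is
    paired with the closest unused peak in spec_b whose m/z differs by at
    most `tol` (in Da); unpaired peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        best_j, best_delta = None, None
        for j, (mz_b, _) in enumerate(spec_b):
            if j in used:
                continue
            delta = abs(mz_a - mz_b)
            if delta <= tol and (best_delta is None or delta < best_delta):
                best_j, best_delta = j, delta
        if best_j is not None:
            used.add(best_j)
            dot += int_a * spec_b[best_j][1]

    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Greedy pairing is the simplest choice; an optimal one-to-one assignment of peaks could give slightly different scores when several peaks fall within the tolerance window.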
R packages for NMR data handling, (pre-)processing and analysis.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Interactive environment based on | | | Bitbucket |
| Basic processing and statistical analysis steps including several spectral quality assessment as well as pre-processing to multivariate analysis statistics functions. | | | GitHub |
| A tool for processing of 1H NMR data including apodisation, baseline correction, bucketing, Fourier transformation, warping and phase correction. Bruker FID can be directly imported. | | | GitHub |
| Spectra alignment, peak picking-based processing, quantitative analysis and visualisations for 1D NMR. | | | CRAN |
| Identification and quantification of metabolites in complex 1D 1H NMR spectra. | | | BioC |
| Bayesian automated metabolite analyser for 1D NMR spectra. Deconvolution of NMR spectra and automatic metabolite quantification. Also identification based on chemical shift lists. | | | RF |
| Pre-processing and identification in an R-based GUI for 1D NMR. | | | GitHub |
| Analysis of 1D and 2D NMR spectra using a ROI-based approach. Export to MMCD or uploaded to BMRB for identification. | | | |
| | | | CRAN |
| Handles hyperspectral data, i.e., spectra plus further information such as spatial information, time, concentrations, etc. Such data are frequently encountered in Raman, IR, NIR, UV/VIS, NMR, MS, etc. | | | CRAN |
| | | | BioC |
| An Integrated Suite for Genetic Mapping of Quantitative Variations of 1H NMR-Based Metabolic Profiles. mQTL.NMR provides a complete metabotype quantitative trait locus (mQTL) mapping analysis pipeline for metabolomic data. | | | BioC |
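
Bucketing, listed among the 1H NMR processing steps in the table above, reduces a spectrum to summed intensities over chemical shift intervals. The following Python sketch illustrates equal-width bucketing only. The function name and the 0.04 ppm default width are assumptions for this example, and the listed packages may use different or adaptive bucketing schemes.

```python
import math


def bucket_spectrum(ppm, intensity, width=0.04):
    """Sum intensities into equal-width chemical shift buckets.

    Returns a dict mapping each bucket centre (in ppm) to the summed
    intensity of all points falling into that bucket.
    """
    sums = {}
    for shift, value in zip(ppm, intensity):
        index = math.floor(shift / width)  # bucket number along the ppm axis
        sums[index] = sums.get(index, 0.0) + value
    return {round((index + 0.5) * width, 6): total
            for index, total in sorted(sums.items())}
```

Buckets without any data points are simply absent from the result, which keeps the sketch short; a fixed grid covering the full spectral window would be needed to compare samples column by column.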
R packages for UV data handling and (pre-)processing.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| DAD | | | |
| Multivariate Curve Resolution (Alternating Least Squares) for DAD data. | | | GitHub |
| Collection of baseline correction algorithms, along with a GUI for optimising baseline algorithm parameters. | | | CRAN |
| Handles hyperspectral data, i.e., spectra plus further information such as spatial information, time, concentrations, etc. Such data are frequently encountered in Raman, IR, NIR, UV/VIS, NMR, MS, etc. | | | CRAN |
| Projection-based methods for pre-processing, exploring and analysis of multivariate data. | | | CRAN |
| Parametric Time Warping (RT correction) for both DAD and LC-MS. | | | CRAN |
R packages for statistical analysis of metabolomics data.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Estimate sample sizes for metabolomics experiments (NMR and targeted approaches supported). | | | CRAN |
| Within and between batch correction of LC-MS metabolomics data using either QC samples or all samples. | | | GitLab |
| Drift correction using QC samples or all study samples. | | | GitHub |
| Cross-contribution robust multiple standard normalisation. Normalisation using internal standards. | | | CRAN |
| Normalisation using a singular value decomposition. | | | SF |
| Functions for drift removal and data normalisation based on component correction, median fold change, ComBat or common PCA. | | | |
| Normalisation for low concentration metabolites. Mixed model with simultaneous estimation of a correlation matrix. | | | SF |
| Multiple fitting models to correct intra- and inter-batch effects. | | | CRAN |
| Normalisation based on removing unwanted variation [ | | | CRAN |
| Collection of functions designed to implement, assess, and choose a suitable normalisation method for a given metabolomics study. | | | CRAN |
| Support Vector Regression based normalisation and integration for large-scale metabolomics data. | | | GitHub |
| A collection of data distribution normalisation methods. | | | |
| Signal and Batch Correction for Mass Spectrometry. | | | GitHub |
| Chemometric analysis of NMR, IR or Raman spectroscopy data. It includes functions for spectral visualisation, peak alignment, HCA, PCA and model-based clustering. | | | BioC |
| Joint analysis of MS and MS/MS data, where hierarchical cluster analysis is applied to MS/MS data to annotate metabolite families and principal component analysis is applied to MS data to discover regulated metabolite families. | | | GitHub |
| A large number of methods available for PCA. | | | BioC |
| Estimate tail area-based false discovery rates (FDR) as well as local false discovery rates (fdr) for a variety of null models (p-values, z-scores, correlation coefficients, t-scores). | | | CRAN |
| GUI for statistical analysis using linear mixed models to normalise data and ANOVA to test for treatment effects. | | | RF |
| Many methods for corrections for multiple testing. | | | BioC |
| Derives stable estimates of the metabolome-wide significance level within a univariate approach based on a permutation procedure, which effectively controls the maximum overall type I error rate at the α level. | | | GitHub |
| Find Biomarkers in two class discrimination problems with variable selection methods provided for several classification methods (LASSO, Elastic Net, PC-LDA, PLS-DA, and | | | CRAN |
| Recursive feature elimination approach that selects features, which significantly contribute to the performance of PLS-DA, Random Forest or SVM classifiers. | | | BioC |
| General framework for building regression and classification models. | | | CRAN |
| Linear and non-linear Discriminant Analysis methods (e.g., LDA), stepwise selection and classification methods useful for feature selection. | | | CRAN |
| Unsupervised feature extraction specifically designed for analysing noisy and high-dimensional datasets. | | | CRAN |
| Various additions to PCA like PPCA, PPCCA, MPPCA. | | | CRAN |
| ANOVA-simultaneous component analysis (ASCA), figure of merit, PCA, Goeman’s global test for metabolomic pathways (Q-stat), Penalised Jacobian method (for calculating network connections), time-lagged correlation method and zero slopes method. It also includes centering and scaling functions. | | | CRAN |
| Performs variable selection in a multivariate linear model by estimating the covariance matrix of the residuals, using it to remove the dependence that may exist among the responses, and finally performing variable selection with the Lasso criterion. | | | CRAN |
| Fits multi-way component models via alternating least squares algorithms with optional constraints: orthogonal, non-negative, unimodal, monotonic, periodic, smooth, or structure. Fit models include InDScal, PARAFAC, PARAFAC2, SCA, Tucker. | | | CRAN |
| Predictive multivariate modelling using PLS and Random Forest Data. Repeated double cross unbiased validation and variable selection. | | | GitLab |
| Probabilistic PLS-DA, Random Forest, SVM, GBM, GLMNET, PAM models for spectral data. | | | BioC |
| Performs the O2PLS data integration method for two datasets yielding joint and data-specific parts for each dataset. | | | CRAN |
| Package for performing Partial Least Squares regression (PLS). | | | CRAN |
| Variable selection methods for PLS, including significance multivariate correlation, selectivity ratio, variable importance in projections (VIP), loading weights, and regression coefficients. It also contains some other modelling methods. | | | CRAN |
| Decompose a tensor of any order, as a generalisation of SVD also supporting non-identity metrics and penalisations. 2-way SVD is also available. Also includes PCAn (Tucker-n) and PARAFAC/CANDECOMP. | | | CRAN |
| Non-parametric method for identifying differentially expressed features based on the estimated percentage of false predictions. | | | BioC |
| RF for the construction, optimisation and validation of classification models with the aim of identifying biomarkers. Also includes functionality for normalisation, scaling, PCA, MDS. | | | CRAN |
| Various multivariate methods to analyse metabolomics datasets. Main methods include PCA, Partial Least Squares regression, and extensions like PLS-DA and the orthogonal variants OPLS(-DA). | | | BioC |
| Fits multi-way component models via alternating least squares algorithms with optional constraints. Fit models include Individual Differences Scaling, Multiway Covariates Regression, PARAFAC (1 and 2), SCA, and Tucker Factor Analysis. | | | CRAN |
| Contains ordination methods such as Redundancy Analysis (RDA), (Canonical or Detrended) Correspondence Analysis (CCA, DCA for binary explanatory variables), (Non-metric) MDS and other univariate and multivariate methods. Originally developed for vegetation ecologists, many functions are also applicable to metabolomics. | | | CRAN |
| Biomarker validation for predicting survival. Cross validation methods to validate and select biomarkers when the outcome of interest is survival. | | | CRAN |
| Pre-treatment, classification, feature selection and correlation analyses of metabolomics data. | | | GitHub |
| Components search, optimal model components number search, optimal model validity test by permutation tests, observed values evaluation of optimal model parameters and predicted categories, bootstrap values evaluation of optimal model parameters and predicted cross-validated categories. | | | CRAN |
| Robust identification of time intervals that are significantly different between groups. | | | BioC |
| Identifies analyte-analyte (e.g., gene-metabolite) pairs whose relationship differs by phenotype (e.g., positive correlation in one phenotype, negative or no correlation in another). The software is also accessible as a user-friendly interface at | | | GitHub |
| Statistical framework supporting many different types of multivariate analyses, e.g., PCA, CCA, (sparse)PLS(-DA). | | | CRAN |
| Multi-omics base classes integrable with commonly used R Bioconductor objects for omics data; container that holds omics results. | | | BioC |
| Multiple co-inertia analysis of omics datasets (MCIA) is a multivariate approach for visualisation and integration of multi-omics datasets. The MCIA method is not dependent on feature annotation therefore it can extract important features even when they are not present across all datasets. | | | BioC |
| | | | BioC |
| STatistics in R Using Class Templates: classes for building statistical workflows using methods, models and validation objects. | | | GitHub |
| Integration of omics data using multivariate methods such as PLS. Performs community detection and network analysis to allow visualisation of positive or negative associations between different datasets generated using samples from the same individuals. Also available as a | | | GitHub |
| Joint metabolic model-based analysis of metabolomics measurements and taxonomic composition from microbial communities. | | | GitHub |
| Mixture-model for accounting for data missingness. | | | BioC |
| Kernel-Based Metabolite Differential Analysis provides a kernel-based score test to cluster metabolites between treatment groups, in order to handle missing values. | | | CRAN |
| Visualisation and imputation of missing values. VIM provides methods for the evaluation and visualisation of the type and patterns of missing data. The included imputation approaches are kNN, Hot-Deck, iterative robust model-based imputation, fast matching/imputation based on categorical variables and regression imputation. | | | CRAN |
| GUI for VIM. | | | CRAN |
| kNN-based imputation for microarray data. | | | BioC |
| Bootstrap-based algorithm and diagnostics for fast and robust multiple imputation for cross sectional, time series or combined cross sectional and time series data. | | | CRAN |
| Algorithms and diagnostics for the univariate imputation of time series data. | | | CRAN |
| Methods for the imputation of incomplete continuous or categorical datasets. | | | CRAN |
| Random forest-based missing data imputation for mixed-type, nonparametric data. An out-of-bag (OOB) error estimate is used for model optimisation. | | | CRAN |
| Multivariate imputation by chained equations using fully conditional specifications for categorical, continuous and binary datasets. It includes various diagnostic plots for the evaluation of the imputation quality. | | | CRAN |
| Missing data imputation using an approximate Bayesian framework. Diagnostic algorithms are included to analyse the models, the assumptions of the imputation algorithm and the multiply imputed datasets. | | | CRAN |
| Iterative Gibbs sampler-based left-censored missing value imputation. | | | GitHub |
| Missing value imputation, filtering, normalisation and averaging of technical replications. | | | SF |
| HCA, Fold change analysis, heat maps, linear models (ordinary and empirical Bayes), PCA and volcano plots. Also includes functionality for log transformation, missing value replacement and methods for normalisation. Cross-contribution compensating multiple internal standard normalisation and remove unwanted variation. | | | CRAN |
| Data processing, normalisation, statistical analysis, metabolite set enrichment analysis, metabolic pathway analysis, and biomarker analysis. | | | GitHub |
| Pipeline for metabolomics data pre-processing, with particular focus on data representation using univariate and multivariate statistics. Built on already published functions. | | | GitHub |
| Framework for multi-omics experiments. Identifies sources of variability in the experiment and performs additional analysis (identification of subgroups, data imputation, outlier detection). | | | BioC |
| Performs entry-level differential analysis on metabolomics data. | | | GitHub |
| STRUCT wrappers (see above) for filtering, normalisation, missing value imputation, glog transform, HCA, PCA, PLS-DA, PLSR, | | | GitHub |
| Data transformation, filtering of feature and/or samples and data normalisation. Quality control processing, statistical analysis and visualisation of MS data. | | | GitHub |
| Quality control, signal drift and batch correction, transformation, univariate hypothesis testing. | | | GitHub |
| Missing value filtering and imputation, zero value filtering, data normalisation, data integration, data quality assessment, univariate statistical analysis, multivariate statistical analysis such as PCA and PLS-DA and potential marker selection. | | | GitHub |
| Univariate analysis (linear model), PCA, clustered heatmap, and partial correlation network analysis. Based on classes from the | | | GitHub |
| Outlier detection, PCA, drift correction, visualisation, missing value imputation, classification. | | | CRAN |
| Pre-processing, differential compound identification and grouping, pharmacokinetic parameter calculation, multivariate statistical analysis, correlations, cluster analyses and visualisation. | | | CRAN |
1 See also http://www.strimmerlab.org/notes/fdr.html for an exhaustive list of available FDR methods.
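
Because metabolomics studies test many features at once, correction for multiple testing recurs throughout the table above. As a hedged illustration of one widely used FDR procedure, the Python sketch below computes Benjamini-Hochberg adjusted p-values. The function name is an assumption for this example, and the packages listed above offer this and many other methods with their own interfaces.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by n / rank (rank 1 = smallest p-value), then a
    running minimum from the largest rank downwards enforces monotonicity.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Features whose adjusted p-value falls below the chosen FDR level (for example 0.05) would then be reported as significant.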
R packages for molecule structures and chemical structure databases.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Structure Representation and Manipulation | | | |
| Subset of functions from the Chemistry Development Kit. Provide a computer readable representation of molecular structures and provide functions to import structures from different molecule structure description formats, manipulate structures, visualise structures and calculate properties and molecular fingerprints. | | | CRAN |
| Similar to | | | BioC |
| Provides conversion of structure representation through OpenBabel. | | | BioC |
| Exposes functionalities of the RDKit library, including reading and writing of SF files and calculating a few physicochemical properties. | | | GitHub |
| Read and write InChI and InChIKey from and to | | | GitHub |
| Maximum Common Substructure Searching using | | | BioC |
| Basic cheminformatics functions tailored for mass spectrometry applications, enhancing functionality available in other packages like | | | GitHub |
| Provides fingerprinting methods for | | | CRAN |
| Calculation of molecular properties. | | | GitHub |
| Querying information from PubChem. | | | CRAN |
| Querying information from various web services (CACTUS, CTS, PubChem, ChemSpider) as part of compound list generation. | | | BioC |
| Querying information from a large number of databases. | | | CRAN |
| R Interface to the ClassyFire REST API. | | | CRAN |
| Allows mapping of identifiers from one database to another, for metabolites, genes, proteins, and interactions. | | | BioC |
| Define utilities for exploration of human metabolome database, including functions to retrieve specific metabolite entries and data snapshots with pairwise associations. | | | BioC |
| Parsers for many compound databases including HMDB, MetaCyc, ChEBI, FooDB, Wikidata, WikiPathways, RIKEN respect, MaConDa, T3DB, KEGG, Drugbank, LipidMaps, MetaboLights, Phenol-Explorer, MassBank. | | | GitHub |
| Functionality to create and use compound databases generated from (mostly publicly) available resources such as HMDB, ChEBI and PubChem. | | | GitHub |
| Standardised and extensible framework to query chemical and biological databases. | | | GitHub |
R packages for network analysis and biochemical pathways.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Infrastructure for representation of networks, analysis and visualisation. | | | CRAN |
| Infrastructure for representation of networks, analysis and visualisation. | | | CRAN |
| Infrastructure for representation of networks, analysis and visualisation. | | | CRAN |
| Interactive visualisation and manipulation of networks. | | | BioC |
| Comparison of correlation networks from two experiments. | | | CRAN |
| Correlation-based networks from metabolomics data and analysis tools. | | | BioC |
| Putative annotation of unknowns in MS1 data. | | | BioC |
| Putative annotation of unknowns in MS1 data. | | | SF |
| Putative annotation of unknowns using MS1 and MS/MS data. | | | GitHub |
| Visualisation of spectral similarity networks, putative annotation of unknowns using MS/MS data. | | | BioC |
| Putative annotation of unknowns using MS/MS data, clustering of MS/MS data. | | | BioC |
| Putative annotation of unknowns using MS/MS data. | | | GitHub |
| Biochemical reaction networks, spectral and structural similarity networks. | | | GitHub |
| Correlation-based networks, structural similarity networks. | | | GitHub |
| Targeted metabolome-wide association studies. | | | SF |
| Generation of scale-free correlation-based networks. | | | CRAN |
| Analysis of -omics data, pathway, transcription factor and target gene identification. | | | BioC |
| Metabolite set enrichment analysis (MSEA) with factor loading in principal component analysis. | | | CRAN |
| Enrichment analysis of a list of affected metabolites. | | | CRAN |
| Network-based enrichment analysis of a list of affected metabolites. | | | BioC |
| Pathway-based enrichment analysis of a list of affected metabolites. | | | CRAN |
| Differential analysis, modules/sub-pathway identification using networks. | | | GitHub |
| Integrates metabolic networks and RNA-seq data to construct condition-specific series of metabolic sub-networks and applies them to gene set enrichment analysis. | | | CRAN |
| Differential analysis. | | | BioC |
| Biomarker identification. | | | CRAN |
| Biomarker identification. | | | BioC |
| Biomarker identification. | | | GitHub |
| Pathway activity profiling. | | | BioC |
| Pathway activity profiling. | | | BioC |
| Flux balance analysis. | | | BioC |
| Flux balance analysis. | | | CRAN |
| Flux balance analysis. | | | CRAN |
| Flux balance analysis. | | | CRAN |
| Identification of affected pathway from phenotype data (interface with | | | BioC |
| Identification of affected pathway from phenotype data (interface with | | | BioC |
| Interface to PathVisio and WikiPathways and pathway analysis and enrichment. | | | GitHub |
| Enrichment analysis of a list of genes and metabolites. | | | GitHub |
| Simulation of longitudinal metabolomics data based on an underlying biological network. | | | CRAN |
| BioPAX parser and representation in R. | | | BioC |
| Interface to KEGG, Biocarta, Reactome, NCI/Nature Pathway Interaction Database, HumanCyc, Panther, SMPDB and PharmGKB. | | | BioC |
| Interface to NCI Pathways Database. | | | BioC |
| Interface to KEGG. | | | BioC |
| Interface to KEGG. | | | BioC |
| Interface to systems biology markup language (SBML). | | | BioC |
| Interface to systems biology markup language (SBML). | | | BioC |
| Interface to Gaggle-enabled software (Cytoscape, Firegoose, Gaggle Genome browser). | | | BioC |
| Interface to molecular interaction databases. | | | BioC |
| Interface to KEGG REST server. | | | BioC |
| Interface to BioPAX OWL files and the Pathway Commons (PW) molecular interaction database. | | | BioC |
| Interface to WikiPathways. | | | BioC |
| Database that integrates metabolite and gene biological pathways from HMDB, KEGG, Reactome, and WikiPathways. Includes user-friendly R | | | GitHub |
R packages with multifunctional workflows.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Convenience wrapper for pre-processing tools ( | | | BioC |
| Pre-processing ( | | | GitHub |
| | | | GitHub |
| Performs simultaneous raw data to mzXML conversion ( | | | GitHub |
| Pre-processing of large LC-MS datasets. Performs automatic PCA with iterative automatic outlier removal, clustering analysis and biomarker discovery. | | | GitHub |
| Workflow for the systematic analysis of 1H NMR metabolomics datasets in quantitative genetics. Performs pre-processing, mQTL mapping and metabolite structural assignment, and offers data visualisation tools. | | | BioC |
| Workflow for pre-processing, quality control, annotation and statistical data analysis of LC-MS and GC-MS-based metabolomics data to be submitted to public repositories. | | | GitHub |
| Framework mainly built on several already published packages. It supports data processing from different analytical platforms (LC-MS, GC-MS, NMR, IR, UV-Vis). | | | GitHub |
| Common interface for several different MS-based data processing software packages. It covers various aspects, such as data preparation and data extraction, formula calculation, compound identification and reporting. | | | GitHub |
| Processing of high-resolution LC-MS data for environmental trend analysis. | | | Zenodo |
| Workflow for pre-processing of LC-HRMS data, suspect screening, screening for transformation products using combinatorial prediction, and interactive filtering based on ratios between sample groups. | | | GitHub |
| Workflow to perform pre-processing, statistical analysis and metabolite identifications based on database search of detected spectra. | | | GitHub |
| | | | GitHub |
| | | | GitHub |
| Pre-processing and visualising LC-MS data, as well as statistical analyses, mainly based on univariate linear models. | | | GitHub |
Categorisation of approaches to creating and sharing R code and data analysis functionality. Symbols indicate strengths (+, ++), weaknesses (-, --) or a neutral assessment (o).
| Framework | Implementation simplicity (low to high) | User-friendliness (low to high) | Interactivity | Example URLs |
|---|---|---|---|---|
| R script | ++ | -- | - | |
| R Markdown vignette | o | o | -- | |
| Jupyter Notebook | o | + | + | |
| | - | ++ | + | |
| | -- | ++ | ++ | |
Packages to interface R with other languages and workflow environments.
| Functionalities | Package | Reference | Repos |
|---|---|---|---|
| Given an R function and its manual page, make the documented function available in Galaxy. | | | BioC |
| Integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents. | | | CRAN |
| Low-Level R to Java Interface. | | | CRAN |
| Interface to ‘Python’ modules, classes, and functions and translation between R and Python objects. | | | CRAN |
Metabolomics data sets packaged as R packages.
| Content | Package | Reference | Repos |
|---|---|---|---|
| 12 HPLC-MS NetCDF files (Agilent 1100 LC-MSD SL). | | | BioC |
| 16 UPLC-MS mzData files (Bruker microTOFq). | | | BioC |
| 12 UPLC-MS mzML files (AB Sciex TripleTOF 5600, SWATH mode). | | | GitHub |
| Different raw MS files (LTQ, TripleQ, FTICR, Orbitrap, QTOF), some in different formats (mzML, mzXML, mzData, mzData.gz, NetCDF, mz5). Also mzid format from proteomics. | | | BioC |
| Metadata and DDA MS/MS spectra of 15 narcotics standards (LTQ Orbitrap XL). | | | BioC |
| 183 × 109 peak table. | | | BioC |
| 69 × 5501 peak table. | | | BioC |
| 40 × 1632 peak table. | | | CRAN |
| Raw MS files from a set of blanks and standards that contain common environmental contaminants (acquired with Bruker maXis 4G). | | | GitHub |
| Proteomics, metabolomics GC-MS and lipidomics data from Calu-3 cell culture; 3 mockulum treated and 9 MERS-CoV treated; time point 18 h; from MassIVE dataset IDs MSV000079152, MSV000079153, MSV000079154. | | | GitHub |
| 6 mzML files (human plasma spiked with 40 compounds acquired in positive mode on an Orbitrap Fusion). | | | BioC |
| mzML files (Thermo Exactive) from comparison of leaf tissue from 4 | | | GitHub |
| 52 × 154 peak table. | | | BioC |
| 18 × 189 peak table. | | | CRAN |
| 33 × 164 peak table. | | | CRAN |
| ASICSdata: 1D NMR spectra for ASICS. | | | BioC |