Johannes Rainer, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M Badia, Steffen Neumann, Michael A Stravs, Vinicius Verri Hernandes, Laurent Gatto, Sebastian Gibb, Michael Witting.
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility to measure novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, resulting in non-standardized datasets that demand customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data from a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML, netCDF as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure allows users to build reproducible annotation workflows tailored for and adapted to most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, also facilitating its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented and are available on GitHub and through Bioconductor.
Keywords: R programming; annotation; metabolomics; reproducible research; small-compound databases; untargeted analysis
Year: 2022 PMID: 35208247 PMCID: PMC8878271 DOI: 10.3390/metabo12020173
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. R-package ecosystem for MS1- and MS2-based annotations. Functionality from the various R packages is combined for specific annotation tasks. The MetaboAnnotation package represents the main interface to the end user, while other packages such as MsCoreUtils, MetaboCoreUtils or Spectra provide the base functionality, which can also be easily integrated into other R packages or R-based workflows. A variety of input and output formats are supported, also enabling integration with other analysis software.
Listing of core utility functions for metabolite annotation. The first panel contains functions to work with chemical formulas followed by a panel with various utility functions. The last panel contains functions to calculate established spectra similarity scores.
| Function | Description | Package |
|---|---|---|
| | Counts elements in chemical formulas. | |
| | Converts element counts to chemical formulas. | |
| | Removes elements from chemical formulas. | |
| | Adds elements to chemical formulas. | |
| | Standardizes formulas according to the Hill notation. | |
| | Calculates exact masses from chemical formulas. | |
| | Converts between masses and m/z. | |
| | Groups potential isotopologue peaks in MS1 data. | |
| | Matches numeric values accepting differences. | |
| | Normalized dot product. | |
| | Normalized Euclidean distance. | |
| | Normalized absolute values distance. | |
| | Normalized spectra angle. | |
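The formula-related operations listed above (counting elements, writing a formula in Hill notation, calculating an exact mass, converting a mass to m/z) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the API of the R packages; all function names, the small mass table and the restriction to simple formulas without brackets or isotopes are assumptions made for this example.

```python
import re

# Monoisotopic masses (Da) for a few common elements (illustrative subset).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276466  # Da, used here for a hypothetical [M+H]+ adduct


def count_elements(formula):
    """Count elements in a simple formula such as 'C8H10N4O2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def hill_formula(counts):
    """Write element counts as a formula in Hill order.

    With carbon present: C first, H second, the rest alphabetically.
    Without carbon: all elements alphabetically.
    """
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


def monoisotopic_mass(formula):
    """Sum monoisotopic element masses for a formula."""
    return sum(MONOISOTOPIC_MASS[e] * n for e, n in count_elements(formula).items())


def mass_to_mz(mass, adduct_mass=PROTON_MASS, charge=1):
    """Convert a neutral mass to the m/z of an ion with the given added mass."""
    return (mass + adduct_mass) / abs(charge)


caffeine_mass = monoisotopic_mass("C8H10N4O2")
caffeine_mz = mass_to_mz(caffeine_mass)  # [M+H]+ under the assumptions above
```

A production implementation would also need to handle isotopes, charges in the formula and a complete element table, which this sketch deliberately leaves out.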
High-level functions to perform MS1 annotation. The algorithm used by matchMz can be selected and configured with a parameter object. Supported input objects are at present numeric, data.frame, SummarizedExperiment, CompDb and IonDb.
| Function | Parameter Object | Description |
|---|---|---|
| | | Performs |
| | | Matches |
| | | Performs |
| | | Matches |
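MS1 annotation by m/z, as described in the abstract, comes down to comparing measured m/z values with theoretical ion m/z values while accepting a small deviation. The following Python sketch illustrates that idea under stated assumptions: the function name `match_mz`, its parameters, the combination of an absolute and a ppm tolerance, and the single [M+H]+ adduct are hypothetical choices for this example and do not reproduce the `matchMz` interface or its parameter objects.

```python
PROTON_MASS = 1.007276466  # Da, mass shift assumed for an [M+H]+ ion


def match_mz(query_mz, target_masses, adduct_shift=PROTON_MASS,
             ppm=10.0, tolerance=0.0):
    """Match measured m/z values against theoretical ion m/z values.

    query_mz:      list of measured m/z values
    target_masses: dict mapping compound name -> neutral monoisotopic mass
    A match is accepted if the absolute difference is at most
    `tolerance + ppm * target_mz / 1e6`.
    Returns a list of (query index, compound name, ppm error) tuples.
    """
    hits = []
    for qi, mz in enumerate(query_mz):
        for name, mass in target_masses.items():
            target_mz = mass + adduct_shift
            allowed = tolerance + ppm * target_mz * 1e-6
            if abs(mz - target_mz) <= allowed:
                ppm_error = (mz - target_mz) / target_mz * 1e6
                hits.append((qi, name, ppm_error))
    return hits


# Hypothetical reference masses and measured features.
reference = {"caffeine": 194.080376, "glucose": 180.063388}
hits = match_mz([195.0877, 300.0], reference, ppm=10.0)
```

Matching on retention time in addition to m/z would add a second, independent tolerance check to the same loop.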
Functions for MS2-based annotation. The first panel contains functions to map peaks between compared spectra. The second panel contains high-level functions to perform spectra similarity calculations.
| Function | Parameter Object | Description |
|---|---|---|
| | - | Maps peaks between two spectra accepting differences between the peaks’ m/z values. |
| | - | Hybrid search approach. |
| | - | Calculates pairwise similarity scores between two spectra objects. |
| | | Identifies spectra with a similarity score above a user-defined threshold. |
| | | Identifies spectra with a similarity score above a user-defined threshold and additionally calculates the reverse score. |
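The similarity-based matching summarized in the table can be sketched as a score calculation followed by a threshold filter. The Python example below uses a plain cosine similarity on intensity vectors that are assumed to be already aligned peak by peak (missing peaks set to 0). It is a simplified stand-in: the function names are hypothetical, and any intensity or m/z weighting applied by the packages' own normalized dot product is not reproduced here.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity between two aligned intensity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def filter_matches(query, library, threshold=0.7):
    """Return library entries whose score against `query` is at least `threshold`.

    library: dict mapping reference name -> aligned intensity vector
    """
    scores = {name: cosine_similarity(query, ref) for name, ref in library.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


# Hypothetical aligned intensities for one query and two reference spectra.
query = [100.0, 50.0, 0.0]
library = {"ref_a": [90.0, 55.0, 0.0], "ref_b": [0.0, 0.0, 80.0]}
matches = filter_matches(query, library, threshold=0.7)
```

In this toy input only `ref_a` shares peaks with the query, so it is the only entry that passes the threshold.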
Figure 2. Peak mapping strategies of the joinPeaks() function. Peaks returned by the joining strategy are highlighted in yellow, those not considered in red. Mapping strategies are named according to the join terminology in SQL. Top left: the outer join option reports all peaks from both spectra. Top right: the inner join reports only matching peaks from both spectra. Bottom left: the left join option includes all peaks from the query spectrum and matching peaks from the target spectrum. Bottom right: the right join includes all peaks from the target spectrum and only matching peaks from the query spectrum.
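The four joining strategies in the caption above can be made concrete with a small sketch. The Python function below is a hypothetical illustration, not the `joinPeaks()` implementation: it pairs each query peak with the closest unused target peak within an absolute m/z tolerance (a simple greedy rule assumed for this example) and then reports rows according to the requested join type. Each row is a pair of indices `(query index, target index)`, with `None` marking a peak that has no partner.

```python
def join_peaks(x, y, tolerance=0.01, how="outer"):
    """Map peaks between query m/z values `x` and target m/z values `y`."""
    if how not in ("outer", "inner", "left", "right"):
        raise ValueError("how must be 'outer', 'inner', 'left' or 'right'")

    # Greedy pairing: closest unused target peak within the tolerance.
    matches = {}
    used = set()
    for i, mz in enumerate(x):
        best = None
        for j, ref in enumerate(y):
            if j in used:
                continue
            diff = abs(mz - ref)
            if diff <= tolerance and (best is None or diff < abs(mz - y[best])):
                best = j
        if best is not None:
            matches[i] = best
            used.add(best)

    if how in ("outer", "left"):
        # All query peaks, matched or not.
        rows = [(i, matches.get(i)) for i in range(len(x))]
    else:
        # Only query peaks that have a partner.
        rows = list(matches.items())
    if how in ("outer", "right"):
        # Add target peaks without a partner.
        rows += [(None, j) for j in range(len(y)) if j not in used]

    # Order rows by m/z, using the query value where available.
    return sorted(rows, key=lambda r: x[r[0]] if r[0] is not None else y[r[1]])


query_mz = [100.0, 150.0, 200.0]
target_mz = [100.005, 175.0, 200.0]
outer = join_peaks(query_mz, target_mz, how="outer")
inner = join_peaks(query_mz, target_mz, how="inner")
```

With these toy inputs, the query peak at 150.0 and the target peak at 175.0 have no partner, so they appear only in the join types that keep unmatched peaks from their respective spectrum.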
Figure 3. Mirror plot created with plotSpectraMirror for visual inspection of MS2 annotation results. The upper panel shows an experimental MS2 spectrum and the lower panel a reference spectrum for caffeine from HMDB. Matching peaks are highlighted in blue.