Bioinformatics: the next frontier of metabolomics.

Caroline H Johnson, Julijana Ivanisevic, H Paul Benton, Gary Siuzdak.

Abstract

Year: 2014 PMID: 25389922 PMCID: PMC4287838 DOI: 10.1021/ac5040693

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Bioinformatic tools are required to carry out essential functions such as statistical analyses and database functionalities. Now, they are also needed for one of the most difficult tasks, helping researchers decide which metabolites are the most biologically meaningful. This can be achieved through aiding the identification process, reducing feature redundancy, putting forward better candidates for tandem mass spectrometry (MS/MS), speeding up or automating the workflow, deconvolving the feature list through meta-analysis or multigroup analysis, or using stable isotopes and pathway mapping. This review thus focuses on the most recent and innovative bioinformatic advancements for identifying metabolites. A primary objective of metabolomics beyond biomarker discovery is to identify the most meaningful metabolites that correlate with disease pathogenesis or other perturbations of metabolism. Metabolites play important roles in biological pathways; their flux or differential regulation (dysregulation) can reveal novel insights into disease and environmental influences. Therefore, one of the most important goals of metabolomic analysis has been to assign metabolite identity so they can be used for further statistical and informed pathway analysis.[1,2] Over the past few years, technologies for analyzing metabolites by untargeted or targeted metabolomics have undergone extensive improvements. Strides to establish the most efficient protocols for experimental design, sample extraction techniques, and data acquisition have paid off providing robust complex data sets.[3−9] As more is being required of these data sets such as assigning identity and biological meaning to the features, bioinformatics is the area of metabolomics which is currently undergoing the most needed growth. It is often the case that metabolomic analysis results in a list of metabolites with low specificity for the disease or stimulus being studied (Figure 1). 
Some of these metabolites, such as acylcarnitines[10−13] and fatty acids,[14−17] seem to be dysregulated in a variety of diseases. They may be more indicative of a perturbed systemic cause (appetite, physical activity, diurnal rhythm changes, etc.), sample contamination, or instrumental/bioinformatic noise than of a specific biomarker of disease. An example of this can be seen in the analysis of urinary biomarkers of ionizing radiation, where dicarboxylic acids were downregulated in the rat after radiation exposure. It was proven that this observation was actually caused by a decreased appetite after radiation exposure perturbing the β-oxidation pathway and not by radiation-induced cellular changes.[18,19] Furthermore, dicarboxylic acids can leach out from plastics during the extraction process, further adding to the ambiguity of their role in ionizing radiation.[20]

Figure 1

Biomarkers that have high vs low disease specificity.

As well as identifying the correct source of the biomarkers, it is also important to identify their physiological role and how to utilize them as therapeutic targets. This first has to start with the identification of the metabolite, which is determined by filtering thresholds set by the user and is therefore intrinsically biased. These thresholds include those for fold change and p-value, which are highly dependent on the experiment; in vitro experiments would exhibit lower variation between biological replicates than in vivo experiments. The ease of identifying the metabolite is also determined by its concentration in the sample and previous annotation in metabolite databases. Filtering thresholds for metabolite intensity that are set too high may omit biologically meaningful metabolites rather than noise. Furthermore, a metabolite that is novel or not curated in a database may not be taken into consideration, depending on the chemical knowledge of the researcher and what they deem meaningful. In order to transform the complex list of identified metabolites into markers of disease, or assign what role they play, bioinformatic tools can aid in identifying the potential pathways that the metabolite may belong to. It is then that the researcher can use this knowledge surrounding the biology of the metabolite to probe the mechanism of the disease. Untargeted metabolomics has already been used in such a manner to find the source of neuropathic pain.[21] N,N-Dimethylsphingosine was dysregulated in a rat model of neuropathic pain; furthermore, when dosed to control rats, it induced mechanical hypersensitivity. This metabolite implicated the sphingomyelin-ceramide pathway as a potential therapeutic target. Antimetabolite inhibitors of enzymes in this pathway were tested and were able to ameliorate neuropathic pain (unpublished data). 
This study holds promise for other metabolomic studies to maximize the potential information contained within the data for finding therapeutics of disease rather than only providing lists of dysregulated metabolites.

Streamlining Data Acquisition: The First Step

Data Alignment

One of the most important developments for untargeted liquid chromatography/mass spectrometry (LC/MS)-based metabolomics was nonlinear retention time (RT) alignment with XCMS using endogenous metabolites.[22] This realignment for untargeted analysis is important to match peaks representing the same analytes from different samples for comparative analysis; these peaks naturally drift between sample runs due to sample buildup on the columns and physical changes to the column from the mobile phase, both of which change the nature of the sample-stationary phase interaction. An often-used technique involved spiking internal standards into samples prior to acquisition, but this was based on the assumption that RT deviations were linear.[23] Thus, XCMS was particularly significant, as there had been no options for carrying out nonlinear realignment in untargeted LC/MS-based metabolomics (Figure 2). Since XCMS, a number of notable alignment algorithms have been developed, including MZmine2.[24,25] New developments in columns and LC systems have reduced RT drift considerably, producing more reliable data.
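The idea of nonlinear RT correction can be illustrated with a minimal sketch. It assumes that a set of anchor peaks has already been matched between a sample and a reference, and it fits the RT deviation with a locally weighted straight line (a loess-like smoother). The function names, the `span` parameter, and the synthetic drift are illustrative assumptions and are not the XCMS implementation.

```python
import math


def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x0, xs, ys, span):
    """Evaluate a locally weighted straight-line fit of ys on xs at x0."""
    weights = [tricube((x - x0) / span) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no anchor peaks within the span of x0")
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return mean_y + slope * (x0 - mean_x)


def correct_rt(rt, anchors, span=120.0):
    """Map a sample RT (seconds) onto the reference time axis.

    anchors: (sample_rt, reference_rt) pairs for peaks matched between
    this sample and the reference (hypothetical input format).
    """
    xs = [sample for sample, _ in anchors]
    deviations = [reference - sample for sample, reference in anchors]
    return rt + local_linear_fit(rt, xs, deviations, span)


# Synthetic example: the sample drifts by a smooth, nonlinear amount.
reference_rts = [60.0 * i for i in range(1, 11)]
anchors = [(ref + 2.0 * math.sin(ref / 200.0), ref) for ref in reference_rts]
observed = 330.0 + 2.0 * math.sin(330.0 / 200.0)  # true reference RT is 330 s
corrected = correct_rt(observed, anchors)
```

A single global offset or straight line cannot follow a drift of this shape, whereas the local fit follows it wherever anchors are dense enough.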

Figure 2

Nonlinear retention time alignment by XCMS allows for untargeted metabolomic analysis.

Automated Metabolomics

The feature identification process is the most time-consuming and complex part of the metabolomic workflow. After annotation of peaks and statistical analyses, MS/MS data need to be acquired for the feature of interest. This data is then compared to metabolite databases and commercial standards for definitive identification and validation, a process that can take weeks to carry out but can be potentially shortened through integration of metabolite profiling and identification into a single autonomous mass spectrometry method. During the typical metabolomic workflow, MS1 data is acquired for each of the samples and a feature list table is produced displaying the results of the statistical analysis. The investigator will search through the table and pick out the features of interest that need to be identified, then manually search metabolite databases for putative identification. To confirm the identification, further mass spectrometry setup and run time is needed for the subsequent MS/MS analysis. XCMS Online already speeds up this process through its integration to METLIN,[26] the world’s largest metabolite database, which enables each feature, when possible, to have putative metabolite identification based on its mass-to-charge ratio (m/z) match. In addition METLIN has automated MS/MS matching to help confirm identity. RAMClust also has a semiautomated workflow where indiscriminant MS/MS (idMS/MS) data is used for automatic database searching.[27] Both the XCMS Online and RAMClust programs aid in the identification process but still require informed manual interpretation. A simple but effective way to shorten this acquisition-to-identification process from weeks to hours is to simultaneously acquire MS1 and MS/MS data over the duration of a LC/MS run, using an autonomous untargeted metabolomic workflow[26,28] (Figure 3). During this workflow MS1 data are preprocessed by XCMS; features are extracted, realigned for RT correction, and undergo statistical analysis. 
The MS/MS data are acquired automatically using data dependent acquisition (DDA); the MS1 data are scanned for precursor ions selected by predefined parameters. Automatic spectral matching of the MS/MS data to databases containing MS/MS spectra, such as METLIN, Human Metabolome Database (HMDB),[29] and MassBank,[30] aids in putative identification. This autonomous approach has been optimized to achieve the correct balance of acquired MS1 and MS/MS data. A high scan speed can allow for greater MS/MS spectral coverage, but a scan speed that is too high can affect the quality of the spectra due to the lack of data points for each extracted ion chromatogram (EIC). In both cases adequate time is required to obtain high-quality spectra, a constraint that is mitigated by using quadrupole time-of-flight instruments (QTOFs) with fast scan rates and high sensitivity. The autonomous workflow has many benefits for untargeted metabolomics; it saves weeks of mass spectrometry time as well as inspection time for manually picking out significant peaks. It also saves sample, as repeat injections are not needed. However, its drawback is its reliance on metabolite databases, which are not yet fully comprehensive. METLIN has the largest number of MS/MS spectra (>12 000 metabolites with high-resolution MS/MS data) and is growing, which will improve the likelihood of successful matches with this workflow.

Figure 3

Overview of autonomous metabolomics. MS1 and MS/MS spectra acquisition, relative comparative analysis, and metabolite identification are carried out simultaneously.

XCMS has been utilized in another semiautomated acquisition and processing workflow along with RT prediction to aid in metabolite identification.[31] This workflow performs feature detection, RT correction, gap filling, feature annotation, in silico fragmentation, and spectral matching to databases.[31] A nearline (nearly online) DDA and MS/MS processing step using MetShot (an R package) is also incorporated; MS/MS experiments are automatically generated from a ranked list of interesting precursor features within the same analysis, using defined filters that result in the acquisition of only relevant spectra.[32] The filters include sorting and prioritizing features by p-value or fold change, selecting features related to quasi-molecular ions, and the removal of ions whose intensity would be too low for MS/MS analysis. It also separates features from coeluting and cofragmenting compounds and places them into separate MS/MS files. To further aid in metabolite identification, the spectra can be compared to mass spectral databases. In silico prediction tools for fragmentation such as MetFrag can further aid putative identification to help overcome the incompleteness of the databases.[33] More recently, MetFusion has been developed, which combines MassBank, METLIN, or the Golm Metabolome Database[34] libraries with MetFrag to improve the rank of the correct candidate.[35] For each candidate metabolite returned by MetFrag, RT prediction further limits the number of potential candidates based on physicochemical properties (lipophilicity) that may aid in metabolite identification.[36−39] Automated workflows that incorporate data upload, processing, identification, and pathway analyses will thus markedly improve the efficiency of the metabolomic workflow.
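The precursor filtering and separation steps described for the nearline DDA workflow can be sketched roughly as follows. This is not the MetShot API: the dictionary keys, thresholds, and the greedy run assignment are all assumptions chosen for illustration.

```python
import math


def select_precursors(features, max_p=0.01, min_abs_log2_fc=1.0,
                      min_intensity=1e4, allowed_ions=("[M+H]+",)):
    """Filter features and rank the survivors by ascending p-value."""
    kept = [
        f for f in features
        if f["p"] <= max_p
        and abs(math.log2(f["fold_change"])) >= min_abs_log2_fc
        and f["intensity"] >= min_intensity
        and f["ion"] in allowed_ions
    ]
    return sorted(kept, key=lambda f: f["p"])


def split_coeluting(ranked, rt_window=10.0):
    """Greedily assign precursors to runs so no two in a run coelute."""
    runs = []
    for feature in ranked:
        for run in runs:
            if all(abs(feature["rt"] - other["rt"]) > rt_window
                   for other in run):
                run.append(feature)
                break
        else:
            # No existing run can take this precursor, so open a new one.
            runs.append([feature])
    return runs


# Hypothetical feature table (m/z, RT in seconds, statistics, ion annotation).
features = [
    {"mz": 181.0707, "rt": 120.0, "p": 0.001, "fold_change": 4.0,
     "intensity": 5e5, "ion": "[M+H]+"},
    {"mz": 203.0526, "rt": 120.5, "p": 0.002, "fold_change": 4.1,
     "intensity": 2e5, "ion": "[M+Na]+"},   # adduct, not a target
    {"mz": 148.0604, "rt": 124.0, "p": 0.0005, "fold_change": 0.25,
     "intensity": 8e4, "ion": "[M+H]+"},
    {"mz": 300.1000, "rt": 300.0, "p": 0.2, "fold_change": 1.1,
     "intensity": 9e5, "ion": "[M+H]+"},    # not significant
    {"mz": 250.2000, "rt": 400.0, "p": 0.004, "fold_change": 3.0,
     "intensity": 2e3, "ion": "[M+H]+"},    # too weak for MS/MS
    {"mz": 132.1019, "rt": 310.0, "p": 0.008, "fold_change": 2.5,
     "intensity": 6e4, "ion": "[M+H]+"},
]
ranked = select_precursors(features)
runs = split_coeluting(ranked)
```

Each inner list in `runs` would correspond to one MS/MS acquisition file in which no two precursors elute within `rt_window` seconds of each other.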

Data Streaming For Cloud-Based Metabolomics

To overcome the challenges involved in uploading metabolomic data files to servers, a streaming approach was developed.[40] Data upload can sometimes take more than 1 day due to limitations set by the speed of data transfer over the Internet. The software developed allows the acquired data to be uploaded from the instrument computer workstation directly to XCMS Online where they are converted and processed. This concept of data streaming reduces the mean wait time for complete data processing from 20 h to fewer than 3 h (Figure 4). The upload speed is dependent on data size and the Internet connection, but there is a marked improvement in the efficiency of the untargeted metabolomic workflow, as the data upload occurs parallel to the data acquisition. In addition, simultaneous MS/MS data are only acquired for the features of interest based on statistical thresholds and matches to metabolite databases from the processed files already uploaded and analyzed by XCMS Online.

Figure 4

XCMS-based data streaming workflow (top left) allows data upload and processing after each LC/MS run is performed, dramatically reducing the processing time after the data are acquired for the final sample (top right). A thousand XCMS Online data sets were examined for their average processing time without streaming. For low-resolution data (∼1.4 GB) and high-resolution data (∼14.0 GB), over 10 and 20 h, respectively, were required after the final LC/MS analysis was performed. Streaming allowed a 7-fold decrease in average processing time after data acquisition. Reprinted from ref (40). Copyright 2014 Nature America, Inc.

Feature Analysis

Mass Spectral Annotations

In order to accurately identify metabolites from LC/MS data, it is first essential to elucidate which features are isotopologue ions, adduct ions ([M + Na]+, [M + Cl]−), multiply charged ions, or fragment ions ([M + H – H2O]+). These all arise from the ionization process.[41,42] Through correct annotation, the complexity of the data sets can be reduced and features of biological interest identified. LC/MS data generated from untargeted metabolomics experiments are typically processed using freely available bioinformatics software such as XCMS,[22,43] OpenMS,[44] apLCMS,[45] xMSanalyzer,[46] mzMatch,[47] or MZmine.[24] These platforms create, in different manners, grouped feature lists across multiple samples and classes of samples in which only a percentage of the ions are unique; a feature is defined as a two-dimensional bounded signal, a chromatographic peak (RT), and a mass spectral peak (m/z).[48] Recently introduced annotation software uses clustering principles to deconvolve the data, based on the fact that isotopic or adduct ions will coelute, and improves confidence in assigning identity. The software uses within-sample, sample-to-sample, or Bayesian “priors” correlation to find clusters. A Bioconductor package in R called Collection of Algorithms for Metabolite pRofile Annotation (CAMERA) was recently designed to be used in conjunction with XCMS.[48] It facilitates the annotation of isotopic peaks, adducts, and fragment ions in peak lists using RT-based grouping, peak shape integration/analysis, and intensity correlation. 
CAMERA can reduce the number of redundant features in the processed LC/MS data by approximately 50%, decreasing the number of false identifications and unnecessary MS/MS experiments that would potentially be carried out.[49,50] One of the disadvantages of CAMERA is that it is biased toward the most abundant features,[27] but even so, very low abundant peaks would anyway have a low likelihood of being identified due to the concentration constraints of MS/MS analysis. Another spectral matching-based annotation software, RAMclust,[27] works on the basis that two features resulting from the same metabolite will have similar RTs and abundances across different samples within a sample set. A similarity function allows the generation of spectra through grouping features from a single metabolite into a single cluster. Another aspect of RAMclust is the feature finding parameter, which carries idMS/MS, enabling manual interpretation and automatic database searching of feature clusters. Daly et al.[51] use a Bayesian “priors” approach called MetAssign. They point out that the level of confidence in annotations varies across metabolites and data sets. To improve confidence in peak annotation, probabilistic estimates of the presence/absence of metabolites are provided based on integration of information from multiple peaks. MetAssign is also based on the principle that several peaks of an isotopic series at the same RT should provide a higher confidence in a putative identification than a single noisy peak. When validated on an experimental data set, this software performed better than CAMERA and mzMatch[47] (another annotation software) for peak annotation. It also provided a measure of confidence for its putative annotations. 
Fernandez-Albert et al.[52] showed that peak aggregation (clustering) improved the statistical power of LC/MS data when using analysis of variance (ANOVA) to select features and multivariate methods such as partial least-squares discriminant analysis (PLS-DA)[53] and support vector machines (SVM).[54] They used four peak aggregation methods to take advantage of and solve high variable collinearity. Two of the clustering methods, “Non-Negative Matrix Factorization (NMF) Reduction” and “Principal Component Analysis (PCA) Decomposition”, significantly improved the detection of significant features. The recent surge of papers on improving peak annotation shows the importance of reducing redundant features, and integrating a confidence level for the annotations may aid in more efficient identification.
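The principle shared by these annotation tools, that isotopic, adduct, and fragment ions of one metabolite coelute and sit at predictable m/z spacings, can be shown with a deliberately small sketch. It uses only RT and m/z; the real tools also use peak shape and intensity correlation. The function name, tolerances, and feature list are assumptions, and the three mass differences are standard monoisotopic values.

```python
# Mass differences (Da) between related singly charged ions of one metabolite.
KNOWN_DIFFERENCES = {
    "13C isotope": 1.003355,          # 13C minus 12C
    "[M+Na]+ vs [M+H]+": 21.981944,   # Na replaces H as the charge carrier
    "water loss": 18.010565,          # [M+H]+ vs [M+H-H2O]+
}


def annotate_related(features, rt_tol=2.0, mz_tol=0.005):
    """Return (lighter_index, heavier_index, label) for coeluting pairs
    whose m/z difference matches a known relationship.

    features: list of (mz, rt_seconds) tuples (hypothetical input format).
    """
    annotations = []
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    for position, i in enumerate(order):
        for j in order[position + 1:]:
            mz_i, rt_i = features[i]
            mz_j, rt_j = features[j]
            if abs(rt_i - rt_j) > rt_tol:
                continue  # not coeluting, so not the same metabolite
            diff = mz_j - mz_i
            for label, expected in KNOWN_DIFFERENCES.items():
                if abs(diff - expected) <= mz_tol:
                    annotations.append((i, j, label))
    return annotations


# Five features; the first four coelute and derive from one hexose-like ion.
features = [
    (181.0707, 120.0),  # [M+H]+
    (182.0740, 120.1),  # 13C isotopologue of [M+H]+
    (203.0526, 119.9),  # [M+Na]+
    (163.0601, 120.2),  # [M+H-H2O]+
    (181.0710, 300.0),  # same m/z, different RT: a different compound
]
related = annotate_related(features)
```

Here four of the five features collapse onto one putative metabolite, which is the kind of redundancy reduction the annotation step aims for.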

Credentialing

Even though these tools can aid in annotating multiple features for a single metabolite and thus reduce the redundancy of many of the metabolite features, a new tool introduced by Mahieu et al.[55] aids in identifying features, which are of biological origin only. The idea is based on the principle that untargeted metabolomic analysis results in thousands of biological and artifactual features. Omitting these artifacts that can arise from contaminants (from sample extraction or carryover from previous experiments), chemical/background noise detected by the MS, and bioinformatic noise (misannotation during processing) will allow those more important biological features to be assessed. This process of distinguishing features of biological origin from artifactual ones is called “credentialing”. To assess the efficacy of the credentialing algorithm, Escherichia coli (E. coli) was grown in media containing natural-abundance 12C glucose or media containing U–13C glucose. The cultures were mixed together in defined ratios and analyzed by untargeted LC/MS metabolomics. The credentialing algorithm identified and credentialed features based on isotope-intensity ratios; intensities from coeluting isotopologue pairs compared to the values expected in the culture volume ratios. Signals of biological origin could thus be distinguished from artifactual ones. This tool allowed the authors to optimize the bioinformatic analysis, reducing overall noise features by 15% and increasing biological feature detection by 20%. Another advantage to this tool, like the others aforementioned that annotate unwanted feature peaks, is that the list of features for MS/MS is dramatically reduced; when credentialing was applied to the E. coli data set it reduced the number of candidates from 23 567 to 2 912. 
Even if all these metabolites cannot be correctly identified, knowing that the ones targeted for analysis are of biological origin effectively improves the metabolomic workflow and moves toward finding those that are meaningful. Similarly, others have used stable isotopes for peak annotation, but these methods do not provide enough specificity to remove all spurious peaks.[56−59] Unlike in these methods, in credentialing the 13C and 12C samples are run together to reduce RT variation, and the absolute mass differences of U–13C and U–12C metabolites are filtered rather than using predicted molecular formulas. Therefore, the credentialing approach limits the amount of noise and enhances the annotation of biologically relevant peaks, whereas the other workflows are better for improving formula annotation, which would be useful for identification, and have a lower false discovery rate.

Calculating Mass Measurement Errors

Metabolite identification can also be problematic in high-throughput or large-scale LC/MS runs. During these long run times the mass accuracy suffers and the number of incorrectly assigned or redundant peaks dramatically increases. The mass accuracy is crucial for matching experimental accurate masses to those found in databases; an increase of 10 parts per million (ppm) in the mass accuracy window results in a 10-fold increase in database hits.[60] The major factor in maintaining a high accuracy window of less than 5 ppm is the intensity of the ion signal.[61−64] This can be demonstrated when measuring the mass error of the lock mass signal; its two isotopic peaks, which are at lower concentrations, often have a larger ppm error than the parent ion. Conversely, when a sample is too concentrated and peaks are saturated, the mass accuracy can also suffer. When the mass accuracy shifts, the mass error window needs to be widened to perhaps 10–22 ppm, which greatly increases the false positive rate. Methods to correct the mass accuracy while data are being acquired include using a reference mix of known ions which the mass spectrometer uses to calibrate during the run. Ions that occur naturally in most of the experimental runs can also be used. There are also prediction models that have been designed to estimate the mass measurement errors. These models do not change the acquired data but aid in reducing the false positive rate. One such model by Shahaf et al.[60] uses XCMS and CAMERA to process the data and annotate peaks. Mass measurement errors are estimated using an annotated library of reference metabolites which are obtained from multiple runs on the same MS instrument, containing information on peaks of related features such as isotopes and adducts. The data are grouped into bins that cover the available mass and intensity range, and a prediction model is applied to the data which takes into account the one-sided confidence levels of all the mass measurement errors. 
The model predicts the error for an ion peak’s mass-intensity pair within a range and is specific for the instrument on which it was carried out, making it a reusable model for the next experiment. A reduction in the false positive rate of 21% on a small data set was seen and could be very effective for larger high-throughput studies given that the prediction models are optimized specifically for an instrument.
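The effect of the mass error window on database hits follows directly from the definition of ppm error, (observed − theoretical) / theoretical × 10^6. The sketch below is a minimal illustration with a made-up four-compound database; the compound names and m/z values are hypothetical, and it does not reproduce the prediction model of Shahaf et al.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_database(observed_mz, database, tol_ppm):
    """Names of database entries within +/- tol_ppm of the observed m/z."""
    return [
        name for name, mz in database.items()
        if abs(ppm_error(observed_mz, mz)) <= tol_ppm
    ]


# Hypothetical database of theoretical ion m/z values.
database = {
    "compound A": 200.0000,
    "compound B": 200.0030,
    "compound C": 199.9960,
    "compound D": 200.0500,
}
observed = 200.0010

narrow_hits = match_database(observed, database, tol_ppm=6.0)
wide_hits = match_database(observed, database, tol_ppm=20.0)
```

With the narrow window only one candidate remains, whereas the wider window admits a second, which is how a degraded mass accuracy inflates the number of putative identifications that need to be checked.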

Statistical Analysis: Visualization and Deconvolution

Multivariate and univariate statistical analyses can further aid in finding meaningful metabolites from the hundreds of filtered features postannotation. However, choosing the correct test is challenging for those without a background in bioinformatics. Recently, journals have become more stringent in making sure that articles submitted for publication use the correct statistical tests. Indeed Nature requires a statistical checklist to ensure articles have statistical adequacy. Gowda et al.[65] highlight the appropriate use of different statistical tests, such as when to use parametric vs nonparametric tests and paired vs unpaired tests for metabolomic analysis. Paired tests can be especially useful in human studies where there is high interindividual variability, comparing differences between two measurements (e.g., metabolic response before and after drug treatment) for each subject across the sample set. Developments in visualizing results and filtering features have facilitated characterization and structural identification in untargeted metabolomics. MetaboAnalyst[66] and XCMS Online both provide comprehensive statistical analysis tools which include univariate, multivariate, high-dimensional feature selection, clustering, and supervised classification analysis. Most recently the interactive cloud plot, interactive PCA, and interactive heat map were introduced to XCMS Online.[65,67] These interactive statistical read-outs let the user customize the display and choose the most valuable features. The cloud plot allows for instant visualization of each feature processed from the untargeted data.[67] The interactive version of this plot can be manipulated to remove isotopes, change thresholds and ion-intensity ranges, and zooms in on areas of feature overlap.[65] By clicking on a feature it is possible to instantly obtain information about the feature; p-value, q-value, m/z, RT, intensity, and peak group. 
One of its strengths is that coeluting features and the hydrophilicity/hydrophobicity can be easily observed. An example of an interactive cloud plot can be seen in Figure 5. PCA and heatmaps have been used widely in metabolomic research; however, the interactive versions of these tools allow for instant modification by user definitions of criteria and visualization of underlying information (Figure 6).
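Because thousands of features are tested at once, a p-value is usually accompanied by a q-value that accounts for multiple testing. One common way to obtain q-values is the Benjamini-Hochberg adjustment, sketched below; whether any particular platform mentioned here uses exactly this procedure is an assumption, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Four features with raw p-values from a univariate test.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Filtering a feature table on the q-value rather than the raw p-value controls the expected proportion of false discoveries among the features that pass.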

Figure 5

Interactive multigroup cloud plot. Metabolite features whose level varies significantly (p < 0.01) across wild-type and mutant bacteria are projected on the cloud plot depending on their RT (x-axis) and m/z (y-axis). Each metabolite feature is represented by a bubble. Statistical significance (p-value) is represented by the bubble’s color intensity. The size of the bubble denotes feature intensity. When the user scrolls the mouse over a bubble, feature assignments are displayed in a pop-up window (m/z, RT, p-value, fold change). When a bubble is selected by a “mouse click”, the EIC, Box–Whisker plot, Posthoc, and METLIN hits appear on the main panel. Each bubble is linked to the METLIN database to provide putative identifications based on accurate m/z. The variation pattern of glutamic acid (m/z 146.0468, MS/MS METLIN match) across different mutants is shown by a box–whisker plot. Reprinted from ref (65). Copyright 2014 American Chemical Society.

Figure 6

Interactive heatmap with metabolomic data visualization. Each row represents a metabolite feature, and each column represents a sample. The Z-scale of each feature is plotted on the red-green color scale. When a feature annotation tile (m/z, RT, or p-value) is selected, its Box–Whisker plot, EIC (extracted ion chromatogram), MS spectrum, and putative METLIN matches appear.

Meta-analyses and multigroup comparisons are among the most useful tools for finding meaningful metabolites. With these analyses it is possible to observe shared metabolic patterns across multiple experiments and metabolite variation patterns across multiple data groups. Meta-analysis can prioritize interesting features by integrating data from multiple studies and help identify shared homologous patterns of metabolic variation across the results of multiple different experiments. 
Multigroup analysis, on the other hand, is an extension of the two-group/pairwise analysis that allows for the comparison of multiple classes and identifies features whose variation pattern is statistically significant across them. This type of analysis is particularly useful for a time-course experiment. An example of multigroup analysis can be seen in Figure 5, where the metabolic response to stress was analyzed across different types of bacteria.

Targeted Validation

While not an aspect of the bioinformatic solution per se, it is worth noting that all untargeted metabolomic analyses require further validation to remove false positives and provide an additional level of confidence in the follow-up biological experiments. To accomplish this, quantitative targeted analyses are typically performed by triple quadrupole mass spectrometry (QqQ-MS) using multiple reaction monitoring (MRM). Because false positives can occur during untargeted analysis, carrying out targeted QqQ-MS with standards for each metabolite can provide assurance that an accurate fold change and p-value are being reported.

Pathway Analysis: Putting Metabolomic Data into a Biological Context

On the quest to find meaningful metabolites in metabolomic data, the ability to relate the identified metabolites to the biological question at hand is imperative. Pathway/network tools can aid in elucidating the roles the metabolites play in multiple pathways. However, it is often the case that metabolomic analysis results in a number of metabolites that are not related to each other, i.e., they are not in the same pathways, or only thin coverage is provided for one particular pathway; this can be due to some metabolites being in flux, while others may be at a steady concentration and would not be differentially regulated. Furthermore, some metabolites cannot be mapped to any pathways. Therefore, it is far from trivial to visualize metabolites with respect to their presence and interactions in biochemical networks. A number of recently developed programs are designed to help with biological pathway and network analyses. As part of the aforementioned streaming approach, simultaneous MS/MS data are acquired for features of interest based on statistical thresholds and matches to metabolite databases from files already uploaded and analyzed by XCMS Online. When two or more putatively assigned metabolites are observed in the same pathway, the MS1 data are mined for other metabolites in that pathway, which are then targeted for MS/MS analysis, essentially aiding pathway and network analysis. Thus, streaming allows for biology-dependent data acquisition (BDDA) to take place. BDDA differs from DDA in that MS/MS is not triggered on the basis of ion intensity.[68] The BDDA concept was validated in the analysis of tumor samples, where four metabolites were found belonging to the same pathway (Figure 7).
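The pathway-driven trigger described for BDDA can be sketched as a simple lookup: once two or more putatively assigned metabolites fall in the same pathway, the remaining members of that pathway become MS/MS targets. The function name, data layout, pathway name, and m/z values below are hypothetical and only illustrate the decision rule, not the XCMS Online implementation.

```python
def bdda_targets(putative_hits, pathways, min_hits=2):
    """Choose MS/MS targets from pathways with enough putative hits.

    putative_hits: set of metabolite names putatively assigned to
    statistically significant features.
    pathways: dict mapping pathway name -> {metabolite name: expected m/z}.
    Returns {pathway: [(metabolite, mz), ...]} for members not yet observed.
    """
    targets = {}
    for pathway, members in pathways.items():
        observed = putative_hits & set(members)
        if len(observed) >= min_hits:
            targets[pathway] = sorted(
                (name, mz) for name, mz in members.items()
                if name not in observed
            )
    return targets


# Hypothetical pathway table and putative assignments.
pathways = {
    "pathway X": {"metabolite A": 150.0500, "metabolite B": 166.0450,
                  "metabolite C": 182.0400, "metabolite D": 198.0350},
    "pathway Y": {"metabolite E": 120.0650, "metabolite F": 136.0600},
}
hits = {"metabolite A", "metabolite C", "metabolite E"}
targets = bdda_targets(hits, pathways)
```

In this example pathway X has two putative hits, so its two unobserved members are queued for MS/MS, while pathway Y with a single hit triggers nothing.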

Figure 7

Biology-dependent data acquisition relies on statistics generated after each sample run for mass spectrometry data acquisition decision making. The representative example shows a decreasing P value for a feature of interest over the time-course of data streaming. When the P value for the features reaches 0.001, MS/MS is performed. A two-tailed Wilcoxon signed-rank test was used to calculate the statistical significance for n = 28. Box and whisker plots display the full range of variation (whiskers, median with minimum–maximum; boxes, interquartile range). Reprinted from ref (40). Copyright 2014 Nature America, Inc.

Another notable program is mummichog, which can aid in finding biological pathway activity.[69] Instead of identifying the metabolites before carrying out pathway/network analysis, mummichog predicts biological activity using the m/z values of both the statistically dysregulated and unchanged features. Genome-scale human metabolic networks are then mined for enriched pathways. These metabolic networks include KEGG,[70] Recon1,[71] and the Edinburgh human metabolic network.[72] Mummichog can incorrectly assign metabolites, since one m/z could account for several different metabolites, but the program is designed to find networks rather than individual metabolites. Indeed, one of the major benefits of this program is that one can bypass the initial laborious metabolite identification process and move directly on to investigating whole pathways that are disrupted and can be targeted in future studies. It also eliminates user bias, where features are traditionally picked for identification based on interest and biochemical relevance.[50] The successes of this technology will likely progress as metabolic models improve and as metabolomic data become integrated into genome-scale metabolic models. 
Another approach aids visualization of metabolites in related pathways through the creation of “MetaMapp” network graphs in Cytoscape.[73,74] These graphs integrate biochemical pathway and chemical relationships using the KEGG reactant pair database,[70] Tanimoto chemical similarity scores, and NIST mass spectral similarity scores.[75] Differential expression of metabolite nodes is superimposed onto the network graphs to show their involvement in metabolic networks and facilitate biological interpretation. To overcome incomplete mapping, and to include metabolites that lack any putative identification or network mapping in current reaction databases, metabolites are associated with each other based on their chemical similarity, as chemically similar compounds should in theory be metabolized to each other. This can result in a loss of overall biochemical clarity but still allows visualization of their involvement in pathways, which would otherwise not be possible. Other advantages of this program are that it can be used on any type of acquired metabolomic data (MS or nuclear magnetic resonance spectroscopy), allowing integration between multiple analytical platforms; that multiple species can be mapped at the same time, as it is not constrained by genomic information; and that the maps can be automatically updated as additional information regarding the identification of the metabolites is gathered.
Another notable method for network analysis is Metabolite Set Enrichment Analysis (MSEA), part of the MetaboAnalyst program.[66,76,77] MSEA is based on gene set enrichment analysis (GSEA) and is used to investigate the enrichment of predefined groups of functionally related metabolites rather than individual metabolites.
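The chemical-similarity linking described above for MetaMapp rests on the Tanimoto coefficient, which for two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either. The sketch below is illustrative only: the fingerprints are represented as plain Python sets, and the function names and the 0.7 default cutoff are assumptions, not MetaMapp's code or settings.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_edges(fingerprints, threshold=0.7):
    """Return (u, v, score) edges for metabolite pairs at or above the threshold.

    fingerprints: dict of metabolite name -> set of fingerprint bit indices.
    """
    names = sorted(fingerprints)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            score = tanimoto(fingerprints[u], fingerprints[v])
            if score >= threshold:
                edges.append((u, v, round(score, 3)))
    return edges
```

Edges produced this way can connect metabolites that have no entry in a reaction database, which is how unmapped compounds can still be placed on the network graph.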
This program was developed through the creation of one encompassing metabolite library using HMDB, PubChem,[78] ChEBI,[79] KEGG, BiGG,[80] METLIN, BioCyc,[81] Reactome,[82] and Wikipedia, and importantly includes reference concentrations for many metabolites. The main limitation of MSEA is that it is biased toward mammalian studies; therefore, metabolomic experiments using other species will require separate metabolite sets. However, MSEA can automatically direct the investigator to biologically important pathways and remove the need for manual searching of pathway databases. When a list of compound names is entered into MetaboAnalyst, Over Representation Analysis (ORA) is performed to evaluate whether a metabolite set is represented more than expected by chance within the given compound list. Figure 8 shows an example of the output from an MSEA analysis of metabolites that were dysregulated after exposure of rats to ionizing radiation. A number of metabolic pathways were dysregulated; specifically, two metabolites, taurine and isethionic acid, were mapped onto the taurine metabolism pathway.[19]
MeltDB, like MetaboAnalyst, is a software platform that processes raw data and carries out peak picking, statistical analysis, and visualization. It was first introduced in 2008 but has recently undergone a number of improvements, including pathway mapping via ProMeTra.[83,84] ProMeTra allows visualization and mapping of metabolite, transcript, and protein ratios onto metabolic pathway maps defined by the user.
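The over-representation analysis described above is commonly evaluated with a hypergeometric test, which the Figure 8 caption also names. The following is a minimal standard-library sketch of that upper-tail probability; the function and parameter names are hypothetical, and it is not MetaboAnalyst's code.

```python
from math import comb


def ora_p_value(population, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of metabolites in the reference library
    set_size:   number of library metabolites in the pathway (metabolite set)
    list_size:  number of metabolites in the submitted compound list
    overlap:    number of submitted metabolites that fall in the pathway
    """
    upper = min(set_size, list_size)
    total = comb(population, list_size)
    tail = sum(
        comb(set_size, k) * comb(population - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small P value indicates that the pathway contributes more members to the compound list than would be expected from drawing the same number of metabolites at random from the library. In practice the P values for all tested sets would also be corrected for multiple testing, which this sketch leaves out.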

Figure 8

An example of metabolite set enrichment analysis (MSEA) using MetaboAnalyst.[66] (A) Enrichment of metabolic pathways after hypergeometric test to evaluate whether urinary metabolites upregulated in rats after radiation exposure are represented more than expected by chance within a compound list. (B) Taurine (Tau) and isethionic acid (IseThio) were found to be involved in the same pathway: taurine metabolism. Other metabolites in the taurine pathway are not changed: taurocyamine (TauCyam), sulfoacetaldehyde (SulfoAcet), hypotaurine (HypTau), and taurocholic acid (TauCho).

The new bioinformatic tools developed for pathway analysis use a number of different methodologies to achieve the same end goal but have different advantages over each other. For example, some, such as mummichog, do not require the features to be identified first and instead find metabolic pathways based on m/z. Some, such as MetaMapp, require feature identification but can fill in gaps for metabolites not found in any pathways.[74] Others use the network analysis function as part of their workflow, as in the autonomous/BDDA approach and in the MetaboAnalyst software package.[66]

The Past and the Future: Stable Isotopes in Metabolic Analysis

Stable isotopes have been used in targeted MS-based metabolomics for the past 60 years, both as internal standards to validate the identity of a metabolite and for elucidating and tracing the involvement of metabolites in pathways.[85−94] Stable isotope-labeled metabolites have the same physicochemical properties as their natural abundance counterparts but contain at least one atom that is one mass unit heavier, so the labeled form can be distinguished from the natural abundance form. A known metabolite such as glutamine, for example, can be introduced into an in vitro or in vivo experiment with at least one of its atoms labeled with a stable isotope, such as [1-13C] or [5-13C]. The incorporation of the 13C-labeled atom(s) into other metabolites can show the potential impact of the labeled metabolite in pathways and on the biology of the system being studied. This is exemplified in a study where labeled 13C was transferred from [5-13C]glutamine to acetyl coenzyme A for lipid biosynthesis during hypoxia. The conversion of glutamine was traced by reductive flux, catalyzed by isocitrate dehydrogenase (IDH).[95] This study revealed a novel biological role for glutamine in lipogenesis during hypoxia. A follow-up study using a quantitative flux model with [U–13C] glutamine and glucose showed that fatty acid labeling from glutamine does occur, but simultaneously with oxidative flux, and not by net reductive IDH flux.[96]
A logical progression from the stable isotope targeted metabolomic technology is to follow the full conversion of the labeled isotopes in an untargeted, unbiased manner, allowing observation of the metabolic fate of the metabolites. Metabolites such as glucose, glutamine, or acetyl coenzyme A, which are involved in multiple pathways, can be infused into the experiment as surrogate markers to assess widespread metabolic flux within a perturbed system (influenced by a disease, environmental factor, genetic change, microbial influence, or xenobiotic use).
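The mass shift that makes a labeled metabolite distinguishable can be worked through numerically. Each 13C that replaces a 12C adds about 1.003355 Da, so a metabolite with n carbons has n + 1 possible 13C isotopologues, M+0 through M+n. The sketch below assumes neutral monoisotopic masses and uses a hypothetical function name.

```python
C13_SHIFT = 1.003355  # Da, mass difference between 13C and 12C


def isotopologue_masses(monoisotopic_mass, n_carbons):
    """Neutral masses of the M+0 ... M+n 13C isotopologues of a metabolite."""
    return [
        round(monoisotopic_mass + k * C13_SHIFT, 6)
        for k in range(n_carbons + 1)
    ]


# Glutamine (C5H10N2O3), monoisotopic mass about 146.069142 Da.
glutamine = isotopologue_masses(146.069142, 5)
```

For glutamine, the singly labeled forms [1-13C] and [5-13C] both fall at M+1: they are isotopomers of the same isotopologue, so they share a mass and differ only in the position of the label. Uniformly labeled [U-13C] glutamine is the M+5 entry.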
However, infusing these precursors will also result in precursor-related metabolic perturbations. Furthermore, it is somewhat problematic to introduce labeled precursors at biologically relevant concentrations. In rodent models, most of these metabolites will be completely metabolized within a few minutes and will be undetectable in biofluids by LC/MS. Designing an experiment that shows the direct involvement of a metabolite in the outcome of the disease pathogenesis may be more useful. For example, a metabolite shown through conventional untargeted metabolomics to be downregulated in a disease could be administered in stable isotope-labeled form and might reverse the disease phenotype. This would allow its metabolic fate to be revealed, which may not have been seen by targeted analysis (because only a small number of metabolites are targeted) or by conventional untargeted analysis (because of the complexity of the data set). The simplest experiment would use a fully labeled metabolite, where all the 13C atoms would be transferred to its products and cofactors. More complicated experiments arise when the precursor metabolite is partially labeled, since the metabolite may split and fragments carrying only unlabeled atoms would not yield a labeled isotopomer. Glucose, for example, splits into glyceraldehyde 3-phosphate and dihydroxyacetone phosphate during glycolysis; these metabolites are converted into pyruvate or used for lipid biosynthesis, respectively. Partial labeling of glucose would allow observation of how these atoms are distributed during glycolysis. Another issue is synchronizing metabolism. According to Bar-Joseph et al.,[97] cells need to be arrested so that they start at the same phase, and even then they may lose their synchronization after a while. Therefore, determining what phase the cells are in will help decide which time point is best for sampling; this, however, is largely impractical for in vivo experiments but useful for in vitro validation studies.
Similar to conventional untargeted metabolomics, which ultimately provides a list of dysregulated features, stable isotope untargeted metabolomics would provide the same list of metabolites, each accompanied by a counterpart containing a metabolically derived stable isotope that is perturbed to the same extent (fold change and statistical significance) as the natural abundance metabolite. This would allow direct comparison of the paired isotopologues. This is a concept that many metabolomic researchers would readily adopt, but it has been technologically difficult to set up because of several constraints: algorithms capable of detecting metabolite isotopologues are needed, and databases with MS/MS fragmentation data are required for identifying the isotopologues and isotopomers. Huang et al.[98] have made great strides in providing the bioinformatic tools to make this possible. They have extended the capabilities of the XCMS platform to identify isotopologue groups that correspond to isotopically labeled compounds. The aptly named X13CMS program can thus be used to track the metabolism of isotopically labeled substrates in an untargeted manner, revealing valuable insights into metabolic mechanisms. Another major advancement in stable isotope untargeted metabolomics has been the recent development of a database containing thousands of metabolite isotopes. This isotope-based database, isoMETLIN, will allow users to find all possible isotopologues derived from METLIN and obtain MS/MS fragmentation data on isotopomers[99] (Figure 9).
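The core idea behind detecting isotopologue groups can be sketched as a search for co-eluting features whose m/z values differ by a whole number of 13C mass shifts. This is a deliberately simplified illustration and not the X13CMS algorithm, which is more involved; the function name, the feature tuples, and the tolerances are assumptions, and only singly charged ions are handled.

```python
C13_SHIFT = 1.003355  # Da, mass difference between 13C and 12C


def find_isotopologue_pairs(features, max_label=6, ppm=10.0, rt_tol=5.0):
    """Pair features that look like 13C isotopologues of one another.

    features: list of (feature_id, mz, rt_seconds) for singly charged ions.
    Returns (base_id, labeled_id, n_13C) tuples.
    """
    pairs = []
    for base_id, base_mz, base_rt in features:
        for other_id, other_mz, other_rt in features:
            # Isotopologues co-elute, so require matching retention times,
            # and only look at heavier features than the base.
            if other_mz <= base_mz or abs(other_rt - base_rt) > rt_tol:
                continue
            # Nearest whole number of 13C substitutions for this m/z gap.
            n = round((other_mz - base_mz) / C13_SHIFT)
            if 1 <= n <= max_label:
                expected = base_mz + n * C13_SHIFT
                if abs(other_mz - expected) / expected * 1e6 <= ppm:
                    pairs.append((base_id, other_id, n))
    return pairs
```

A feature list processed this way yields candidate natural abundance and labeled pairs whose fold changes can then be compared directly. Confirming which isotopomer is present would still require MS/MS fragmentation data of the kind a resource such as isoMETLIN is intended to provide.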

Figure 9

Stable isotope untargeted and targeted metabolomics.

Conclusions

Recent developments in bioinformatic tools have enhanced the untargeted metabolomic workflow, enabling researchers to identify metabolite features from LC/MS data and assign their biological roles by identifying their involvement in chemical pathways. Experiments carried out on high-resolution mass spectrometers result in thousands of dysregulated features. One of the biggest obstacles has been the deconvolution and identification of these features, the latter requiring highly biased manual interpretation by the researcher. Automated multistep workflows have eased this process by incorporating functions that remove redundant features and enhance the efficiency and efficacy of metabolite identification. The novel use of stable isotopes for untargeted metabolomics and feature annotation has further enhanced the ability of the investigator to recover biologically relevant metabolites. Another major advancement has been the development of metabolite databases and in silico fragmentation tools to help identify these metabolites and place them in biological pathways. Indeed, the area of growth for bioinformatics in metabolomic research will be in finding the role of these metabolites, rather than creating lists of biomarkers without mechanistic implications. This is somewhat dependent on the further curation of metabolite MS/MS fragmentation data in metabolite databases as well as the development of network mapping tools.

93 in total

1. Metabolomics and first-trimester prediction of early-onset preeclampsia.

Authors: Ray O Bahado-Singh; Ranjit Akolekar; Rupasri Mandal; Edison Dong; Jianguo Xia; Michael Kruger; David S Wishart; Kypros Nicolaides
Journal: J Matern Fetal Neonatal Med Date: 2012-04-28

Review 2. Systems level studies of mammalian metabolomes: the roles of mass spectrometry and nuclear magnetic resonance spectroscopy.

Authors: Warwick B Dunn; David I Broadhurst; Helen J Atherton; Royston Goodacre; Julian L Griffin
Journal: Chem Soc Rev Date: 2010-08-17 Impact factor: 54.564

3. MetFusion: integration of compound identification strategies.

Authors: Michael Gerlich; Steffen Neumann
Journal: J Mass Spectrom Date: 2013-03 Impact factor: 1.982

4. In vivo 15N-enrichment of metabolites in suspension cultured cells and its application to metabolomics.

Authors: Kazuo Harada; Eiichiro Fukusaki; Takeshi Bamba; Fumihiko Sato; Akio Kobayashi
Journal: Biotechnol Prog Date: 2006 Jul-Aug

5. Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GC-MS.

Authors: Eliane Fischer; Uwe Sauer
Journal: Eur J Biochem Date: 2003-03

6. An accelerated workflow for untargeted metabolomics using the METLIN database.

Authors: Ralf Tautenhahn; Kevin Cho; Winnie Uritboonthai; Zhengjiang Zhu; Gary J Patti; Gary Siuzdak
Journal: Nat Biotechnol Date: 2012-09 Impact factor: 54.908

7. Quantification and mass isotopomer profiling of α-keto acids in central carbon metabolism.

Authors: Michael Zimmermann; Uwe Sauer; Nicola Zamboni
Journal: Anal Chem Date: 2014-02-24 Impact factor: 6.986

8. Reactome knowledgebase of human biological pathways and processes.

Authors: Lisa Matthews; Gopal Gopinath; Marc Gillespie; Michael Caudy; David Croft; Bernard de Bono; Phani Garapati; Jill Hemish; Henning Hermjakob; Bijay Jassal; Alex Kanapin; Suzanna Lewis; Shahana Mahajan; Bruce May; Esther Schmidt; Imre Vastrik; Guanming Wu; Ewan Birney; Lincoln Stein; Peter D'Eustachio
Journal: Nucleic Acids Res Date: 2008-11-03 Impact factor: 16.971

9. Visualizing post genomics data-sets on customized pathway maps by ProMeTra-aeration-dependent gene expression and metabolism of Corynebacterium glutamicum as an example.

Authors: Heiko Neuweger; Marcus Persicke; Stefan P Albaum; Thomas Bekel; Michael Dondrup; Andrea T Hüser; Jörn Winnebald; Jessica Schneider; Jörn Kalinowski; Alexander Goesmann
Journal: BMC Syst Biol Date: 2009-08-23

10. The Edinburgh human metabolic network reconstruction and its functional analysis.

Authors: Hongwu Ma; Anatoly Sorokin; Alexander Mazein; Alex Selkov; Evgeni Selkov; Oleg Demin; Igor Goryanin
Journal: Mol Syst Biol Date: 2007-09-18 Impact factor: 11.429
