Julijana Ivanisevic1, Elizabeth J Want2.
Abstract
Untargeted metabolomics (including lipidomics) is a holistic approach to biomarker discovery and mechanistic insights into disease onset and progression, and response to intervention. Each step of the analytical and statistical pipeline is crucial for the generation of high-quality, robust data. Metabolite identification remains the bottleneck in these studies; therefore, confidence in the data produced is paramount in order to maximize the biological output. Here, we outline the key steps of the metabolomics workflow and provide details on important parameters and considerations. Studies should be designed carefully to ensure appropriate statistical power and adequate controls. Subsequent sample handling and preparation should avoid the introduction of bias, which can significantly affect downstream data interpretation. It is not possible to cover the entire metabolome with a single platform; therefore, the analytical platform should reflect the biological sample under investigation and the question(s) under consideration. The large, complex datasets produced need to be pre-processed in order to extract meaningful information. Finally, the most time-consuming steps are metabolite identification, as well as metabolic pathway and network analysis. Here we discuss some widely used tools and the pitfalls of each step of the workflow, with the ultimate aim of guiding the reader towards the most efficient pipeline for their metabolomics studies.
Keywords: data processing; experimental design; liquid chromatography–mass spectrometry (LC-MS); metabolic pathway and network analysis; metabolism; metabolite identification; sample preparation; univariate and multivariate statistics; untargeted metabolomics
Year: 2019 PMID: 31861212 PMCID: PMC6950334 DOI: 10.3390/metabo9120308
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. Common experimental designs. (A) Cross-over design involving a large patient cohort. Two drugs are administered sequentially to each patient, with a crucial washout period between the two drugs so that the effects of each drug can be elucidated. (B) Factorial design, where both the gender of the subject and the effect of the drug are being studied. (C) Common cross-sectional design in metabolomics studies, comparing controls and two drug dose levels in both genders.
Figure 2. Setting up the data acquisition worklist to facilitate metabolite quantification and identification. Prior to the batch run, the instrument should be conditioned (or “passivated”) using the pooled quality control (QC) of biological samples. During conditioning, high-quality MS/MS data can be acquired in data-dependent acquisition (DDA) mode by taking advantage of iterative injections with PC-driven exclusion of ions for which MS/MS data have already been acquired. This maximizes the amount of high-quality MS/MS data acquired. The batch run can start (and end) with the analysis of a diluted QC series, which serves to remove features whose response is not linear; however, this removal should be performed carefully by evaluating low-abundance features and those with saturation issues. Finally, samples should be run in a randomized order (considering the most important confounding factors, such as disease, sex, and age, depending on the experiment) with pooled QCs every 4–10 samples (depending on the size of the batch). Extracted blanks can be analyzed after the sample run and used to remove background (chemical and informatic) noise. Abbreviations: MS/MS data, fragmentation pattern; HRMS, high-resolution mass spectrometry; DDA, data-dependent acquisition; DIA, data-independent acquisition; AIF, all ion fragmentation (on Agilent or Thermo systems); MSE, all ion fragmentation on Waters systems; SWATH, sequential window acquisition of all theoretical mass spectra, a DIA strategy on Sciex systems; SONAR, scanning quadrupole DIA, a DIA strategy on Waters systems.
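The worklist layout described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `build_worklist`, all injection labels, the number of conditioning injections, the dilution levels, and the number of blanks are assumptions made for the example. The sketch also uses simple randomization and does not balance confounding factors such as disease, sex, or age, which a real design would need to consider.

```python
import random

def build_worklist(samples, qc_every=5, n_conditioning=5,
                   dilutions=("1:8", "1:4", "1:2", "1:1"), n_blanks=3, seed=42):
    """Return injection labels in run order (all labels are hypothetical)."""
    rng = random.Random(seed)          # fixed seed so the order is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)              # simple randomization, no stratification

    # 1. Conditioning injections of the pooled QC before the batch starts.
    worklist = [f"conditioning_QC_{i + 1}" for i in range(n_conditioning)]
    # 2. Diluted QC series at the start of the batch.
    worklist += [f"diluted_QC_{d}_start" for d in dilutions]
    # 3. Randomized samples with a pooled QC every `qc_every` samples.
    qc_count = 1
    worklist.append(f"pooled_QC_{qc_count}")
    for i, sample in enumerate(shuffled, start=1):
        worklist.append(sample)
        if i % qc_every == 0 or i == len(shuffled):
            qc_count += 1
            worklist.append(f"pooled_QC_{qc_count}")
    # 4. Diluted QC series again at the end, then the extracted blanks.
    worklist += [f"diluted_QC_{d}_end" for d in dilutions]
    worklist += [f"extracted_blank_{i + 1}" for i in range(n_blanks)]
    return worklist

worklist = build_worklist([f"sample_{i:02d}" for i in range(1, 13)])
```

Setting `qc_every` anywhere between 4 and 10 matches the spacing given in the caption; the default of 5 is an arbitrary choice within that range.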
MS/MS data acquisition modes with their advantages and disadvantages.
| MS/MS Data Acquisition Mode | Selection of Precursor Ions | Advantage | Pitfall |
|---|---|---|---|
| Selective or targeted MS/MS | Only selected ions specified on an inclusion list will be targeted | Highest quality MS/MS data | A posteriori acquisition, in a separate batch of analyses |
| Data-Dependent Acquisition (DDA) | Ions are selected for MS/MS acquisition in real-time based on threshold intensity: Top «n» ions are «picked» in each scan | High-quality MS/MS data and established link between precursor and product ions | High acquisition rates required. Selection of the most highly abundant ions each time, across multiple scans, resulting in low MS/MS coverage |
| Data-Independent Acquisition (DIA) | All fragment ions for all precursors are acquired simultaneously: All-ion-fragmentation (Q1 transmits the full mass range, 50–1700 Da of precursor ions in the collision cell: AIF, MSE) or with sequential mass windows (Q1 transmits several increments of 20–50 amu across the mass range in the collision cell: SWATH, SONAR, BASIC DIA—see | Improved coverage for low abundant precursor ions | High acquisition rates required. Difficulty of MS/MS data deconvolution to re-establish the link between the precursor and product ions |
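
The top-n selection used in DDA, and the way iterative injections with an exclusion list extend MS/MS coverage to lower-abundance ions, can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: the same survey scan is reused for every injection, the m/z tolerance of 0.01 is arbitrary, and the function names `top_n_precursors` and `iterative_dda` are hypothetical rather than part of any vendor software.

```python
def top_n_precursors(scan, n, excluded, mz_tol=0.01):
    """Pick the n most intense ions whose m/z is not on the exclusion list.

    scan: list of (mz, intensity) tuples from one survey scan.
    """
    candidates = [
        (mz, intensity) for mz, intensity in scan
        if all(abs(mz - ex) > mz_tol for ex in excluded)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]

def iterative_dda(scan, n, n_injections):
    """Simulate repeated injections, excluding ions already fragmented."""
    excluded = []
    picked_per_injection = []
    for _ in range(n_injections):
        picked = top_n_precursors(scan, n, excluded)
        picked_per_injection.append(picked)
        excluded.extend(picked)        # do not fragment these ions again
    return picked_per_injection

# Toy survey scan: six ions spanning two orders of magnitude in intensity.
scan = [(100.05, 9e5), (200.10, 5e5), (300.15, 1e5),
        (400.20, 5e4), (500.25, 1e4), (600.30, 5e3)]
rounds = iterative_dda(scan, n=2, n_injections=3)
```

Without the exclusion list, every injection would select the same two most abundant ions, which is the low-coverage pitfall listed for DDA in the table.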
Figure 3. Overview of lipidomic data analysis (acquired by DDA) using MS-DIAL, the open-access software designed for simultaneous metabolite quantification and identification. Displayed are the MS/MS matched peaks (each lipid class is differently colored) with the example of phosphatidylcholine annotation using MS/MS matching against LipidBlast.
Criteria for feature filtering using QC and blank samples in order to reduce data complexity and remove redundancy.
| Parameter | Criteria | Outcome | Notes |
|---|---|---|---|
| Coefficient of variation (CV) | Choose threshold of variation, e.g., of metabolite peak area in repeated injections of QC sample | Remove metabolite features, e.g., with CV > 30% in QC samples * | CV cut-off values may be dependent on sample type, chromatography, or instrument parameters |
| Presence in study samples | Metabolite feature/peak must be present in a certain proportion of the study samples (and/or QCs) | Remove metabolite features present in only a low proportion of study samples | Certain peaks may only be present in one class of samples—adjust threshold accordingly |
| Presence in blank samples | Metabolite feature/peak must be absent from blank samples, or present only at very low levels | Remove metabolite features present in blank samples | Some metabolite features may be present in blank samples due to carryover; ensure multiple blanks have been run to address this |
| Response to dilution | Metabolite feature/peak must respond to dilution series with r² > 0.8 ** | Remove metabolite features with r² < 0.8 ** | Some metabolite features may be saturated at higher concentrations and so do not behave linearly; check raw data |
* Some groups recommend a lower cut-off, e.g., 20% [97]; ** this removal should be performed carefully by evaluating the features whose response may not be linear due to their low abundance.
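The four criteria in this table can be combined into a single per-feature check. The following is a minimal sketch using only the Python standard library. The 30% CV and r² of 0.8 thresholds come from the table; the minimum presence of 50% and the maximum blank-to-sample ratio of 5% are assumptions chosen for illustration, since the table does not specify them. The function names and the example peak areas are hypothetical.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (%) of peak areas in repeated QC injections."""
    m = mean(values)
    return float("inf") if m == 0 else 100.0 * stdev(values) / m

def r_squared(x, y):
    """Squared Pearson correlation between dilution factor and peak area."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def keep_feature(qc, samples, blanks, dilution_factors, dilution_areas,
                 max_cv=30.0, min_presence=0.5, max_blank_ratio=0.05, min_r2=0.8):
    """Return True if the feature passes all four filters."""
    if cv_percent(qc) > max_cv:                        # CV in QC injections
        return False
    presence = sum(1 for v in samples if v > 0) / len(samples)
    if presence < min_presence:                        # presence in study samples
        return False
    if mean(blanks) > max_blank_ratio * mean(samples): # presence in blanks
        return False
    if r_squared(dilution_factors, dilution_areas) < min_r2:  # dilution response
        return False
    return True

dilution = [0.125, 0.25, 0.5, 1.0]
stable = keep_feature(qc=[100, 105, 98, 102], samples=[90, 120, 0, 110, 95],
                      blanks=[1, 0, 2], dilution_factors=dilution,
                      dilution_areas=[13, 26, 51, 100])
noisy = keep_feature(qc=[100, 180, 40, 130], samples=[90, 120, 0, 110, 95],
                     blanks=[1, 0, 2], dilution_factors=dilution,
                     dilution_areas=[13, 26, 51, 100])
```

As the footnote cautions, features failing the dilution criterion because of low abundance or saturation deserve a manual look at the raw data before removal; an automatic filter like this one cannot make that distinction.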
Figure 4. Simplified overview of PCA and OPLS-DA showing (A) good separation on PCA and OPLS-DA scores plots. High R² and Q² values indicate good model robustness and predictive capability. Permutation test indicates a valid model. (B) No separation on the PCA scores plot of PC1 vs. PC2, but separation is still achieved using OPLS-DA. In this instance, the model could be overfitted and unreliable. It is advisable to check for separation in other components, e.g., PC2 vs. PC3, as well as to assess R² and Q² and perform permutation tests. CV-ANOVA can also be used to assess model validity (not shown).
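The logic behind a permutation test can be shown with a deliberately simplified stand-in. In a full validation, the OPLS-DA model is refitted on each set of permuted class labels and the resulting R² and Q² values are compared with those of the original model. The sketch below does not fit any model: it assumes one score per sample (for example, a score on a single component) and uses the absolute difference between group means as the separation statistic. The function names and the example scores are hypothetical.

```python
import random
from statistics import mean

def separation(scores, labels):
    """Absolute difference between the mean scores of class 0 and class 1."""
    group0 = [s for s, lab in zip(scores, labels) if lab == 0]
    group1 = [s for s, lab in zip(scores, labels) if lab == 1]
    return abs(mean(group0) - mean(group1))

def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations separating the classes at least as well."""
    rng = random.Random(seed)
    observed = separation(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break the link between score and class
        if separation(scores, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

scores = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 3.0, 3.2, 2.9, 3.1, 3.3, 2.8]
p_separated = permutation_p_value(scores, [0] * 6 + [1] * 6)
p_mixed = permutation_p_value(scores, [0, 1] * 6)
```

If randomly relabelled data separate as well as the real classes, the apparent separation is likely an artifact of overfitting, which is the situation panel (B) warns about.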
Major problems and solutions associated with metabolite identification in metabolomic datasets. The references for different tools are cited in the main text.
| Bottleneck and cause | Solutions |
|---|---|
| Isomers or metabolites with identical mass (and molecular formula) but different structures | Chromatographic resolution (i.e., separation by RT, chiral columns for stereoisomers); Ion mobility MS (IMS and/or collision cross section, CCS, values); MS/MS fragmentation pattern matching against experimentally acquired or in silico generated MS/MS databases (e.g., METLIN, mzCloud, NIST, MassBank, LipidBlast, LipidMaps, GNPS) |
| Isobars or compounds of similar molecular weight produce interferences | MS resolution (HRMS using TOF or Orbitrap mass analyzer); Chromatographic resolution (i.e., separation by RT); MS/MS fragmentation pattern matching as specified above; Ion mobility MS (IMS and/or collision cross section, CCS, values) |
| In-source fragments, due to production of ions (by loss of H2O, CO2, H3PO4) that have the same mass and/or structure as the molecular ions of other metabolites | Chromatographic resolution (i.e., separation by RT); MS source with reduced in-source fragmentation |
| “ | In silico fragmentation tools and derived databases (e.g., CSI:FingerID coupled to Sirius, MetFrag, iMet, MS2LDA, MS-FINDER) and similarity matching (of experimentally acquired and in silico generated MS/MS) and network analysis (e.g., GNPS); RT prediction models (limited to specific columns and LC conditions); CCS prediction models and databases (e.g., MetCCS, LipidCCS); Multiple-stage tandem MS (MSn) |
| “ | Metabolite isolation and NMR analysis for structural elucidation; LC-MS/MS analysis (RT, accurate mass, MS/MS) combined with above indicated tools for “; Multiple-stage tandem MS (MSn) |
List of selected open access web servers for interactive pathway visualization, metabolite mapping, and visualization in the context of pathways and metabolic networks, and metabolite set enrichment and overrepresentation analysis (MSEA, ORA).
| Web server | Functionalities |
|---|---|
| MetExplore web server [ | Metabolite mapping on metabolic pathways and networks; Visualizing networks; Mining and editing networks based on data and network structure (identify sub-networks connecting identified metabolites); Pathway enrichment analysis; Mapping polyomics data; Computing fluxes |
| Pathvisio [ | Metabolite mapping on the pathways; Pathway editing, drawing, and analysis; Overrepresentation analysis |
| iPath, Interactive Pathways Explorer [ | Metabolite mapping on the pathways; Pathway editing and analysis |
| MetaboAnalyst* web server [ | Metabolite ID conversion; Enrichment analysis (ORA, MSEA); Pathway topology analysis; Joint pathway analysis (genes and metabolites); MS peaks to pathways |
| PathBank [ | Interactive database for visualizing metabolic pathways in different model organisms; Metabolite (as well as gene, protein, drug) search and mapping; Detailed description and references are provided for each pathway from energy metabolism, associated with metabolic diseases, drug-action pathways, drug metabolism pathways, signaling pathways |
| LION/web [ | Web platform for lipid ontology enrichment analysis; Lipid classification by chemical data (LIPIDMAPS), biophysical data, lipid functions and organelle associations |
| XCMS online* [ | Activity network analysis, i.e., “MS peaks to metabolic network” (integrated; Integrated pathway analysis (using genome and proteome data, in addition to metabolome data) |
* Only features relevant to pathway and network analysis are listed here; the MetaboAnalyst and XCMS online servers provide many other functionalities related to data processing and analysis.
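Overrepresentation analysis (ORA), offered by several of the servers listed above, is commonly based on a hypergeometric (one-sided Fisher) test: given a list of significant metabolites, it asks how likely it is to see at least the observed number of them in a given pathway by chance. The sketch below shows that calculation only. The function name `ora_p_value` and the example counts are hypothetical, and individual servers may use different background sets, test variants, and multiple-testing corrections.

```python
from math import comb

def ora_p_value(hits, query_size, pathway_size, background_size):
    """Hypergeometric upper-tail probability P(X >= hits).

    hits:            significant metabolites that fall in the pathway
    query_size:      total number of significant metabolites
    pathway_size:    number of metabolites annotated to the pathway
    background_size: number of metabolites in the reference set
    """
    upper = min(query_size, pathway_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Example with made-up counts: 5 of 20 significant metabolites map to a
# pathway of 30 metabolites, against a background of 1000 metabolites.
p_enriched = ora_p_value(hits=5, query_size=20, pathway_size=30, background_size=1000)
p_single_hit = ora_p_value(hits=1, query_size=20, pathway_size=30, background_size=1000)
```

When many pathways are tested at once, the resulting p-values need correction for multiple testing before any pathway is reported as enriched.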
Figure 5. Metabolite mapping on the metabolic networks: an overview of MetExplore network Viz functionalities. The projected network has been created from the list of chemical reactions (in the cart on the right side of the figure), derived from the list of identified metabolites whose levels varied significantly (as a result of brain cell profiling). The extent of each pathway has been encircled and colored for visualization. Alanine, aspartate and glutamate metabolism, and arginine biosynthesis have been highlighted as enriched (using integrated ORA).
List of open access knowledge databases (used in the above listed web servers). Some databases have been extended into pathway browsers for interactive metabolite mapping. Although some databases are gene-centric, all of them are searchable for metabolites and represent a great source of biochemical knowledge for metabolite data interpretation.
| Database | Functionalities |
|---|---|
| KEGG database and pathway browser [ | Metabolite mapping on metabolic pathways (with annotation of the direction of changes) |
| Reactome database and pathway browser [ | Visualization of known biological processes and pathways from intermediary metabolism, signaling, transcriptional regulation, apoptosis, disease; Metabolite mapping and pathway and network visualization and analysis; Pathway enrichment analysis |
| Cyc databases (EcoCyc, HumanCyc, MetaCyc, BioCyc) [ | Curated database of experimentally elucidated metabolic pathways from many different model organisms; Metabolite, protein, reaction, and pathway search; Comparison of specific pathway and metabolic networks of different organisms |
| Recon database [ | Largest database of human and gut microbiome metabolism; Searchable by metabolic reaction, metabolites and genes, by microorganism species, by disease, and by diet; Organelle maps |
| WikiPathways database [ | Pathway database maintained by scientific community; Pathway browsing and editing |