Zhiqiang Pang1, Jasmine Chong1, Shuzhao Li2, Jianguo Xia1,3.
Abstract
Liquid chromatography coupled to high-resolution mass spectrometry platforms are increasingly employed to comprehensively measure metabolome changes in systems biology and complex diseases. Over the past decade, several powerful computational pipelines have been developed for spectral processing, annotation, and analysis. However, significant obstacles remain with regard to parameter settings, computational efficiencies, batch effects, and functional interpretations. Here, we introduce MetaboAnalystR 3.0, a significantly improved pipeline with three key new features: (1) efficient parameter optimization for peak picking; (2) automated batch effect correction; and (3) more accurate pathway activity prediction. Our benchmark studies showed that this workflow was 20 to 100 times faster than other well-established workflows and produced more biologically meaningful results. In summary, MetaboAnalystR 3.0 offers an efficient pipeline to support high-throughput global metabolomics in the open-source R environment.
Keywords: batch effects; global metabolomics; pathway activity prediction; peak detection
Year: 2020 PMID: 32392884 PMCID: PMC7281575 DOI: 10.3390/metabo10050186
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
Figure 1. MetaboAnalystR 3.0 provides an optimized workflow for global metabolomics: optimized peak picking, automated batch effect correction, and improved pathway activity prediction.
Figure 2. Time consumed by One Variable at a Time (OVAT), Isotopologue Parameter Optimization (IPO), MetaboAnalystR, and AutoTuner for parameter optimization on three different datasets. The evaluations were performed on a desktop computer (Ubuntu 18.04.3 with an Intel® Core™ i7-4790 CPU and 32 GB of memory).
Qualitative peak picking results of the different tools using different settings.
| Methods | Total Peaks | True Peaks | Quantified Consensus | Gaussian Peak Ratio |
|---|---|---|---|---|
| Default | 16,896 | 382 | 350 | 47.8% |
| IPO | 24,346 | 744 | 663 | 52.0% |
| AutoTuner | 25,517 | 664 | 603 | 40.5% |
| MetaboAnalystR 3.0 | 18,044 | 799 | 754 | 64.4% |
True Peaks are peaks that match the targeted metabolomics results with an m/z error < 10 ppm and an RT difference < 0.3 min. Quantified Consensus refers to the peaks for which the relative error of the intensity ratio between the two groups is less than 50% compared with the actual concentration. Gaussian Peak Ratio is the proportion of peaks whose shapes follow a Gaussian distribution (cor > 0.9 and p < 0.05).
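The matching criteria in this footnote can be illustrated with a minimal Python sketch. It is not MetaboAnalystR code. The `Peak` class, the function names, and the definition of relative error as |measured − expected| / expected are assumptions made for illustration. Only the thresholds (10 ppm, 0.3 min, 50%) come from the footnote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time in minutes


def ppm_error(observed_mz, reference_mz):
    """Absolute m/z deviation expressed in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def is_true_peak(peak, references, ppm_tol=10.0, rt_tol=0.3):
    """A detected peak counts as true if any targeted reference peak
    lies within both the m/z (ppm) and RT (minutes) tolerances."""
    return any(
        ppm_error(peak.mz, ref.mz) < ppm_tol and abs(peak.rt - ref.rt) < rt_tol
        for ref in references
    )


def count_true_peaks(detected, references, ppm_tol=10.0, rt_tol=0.3):
    """Number of detected peaks that match at least one reference peak."""
    return sum(is_true_peak(p, references, ppm_tol, rt_tol) for p in detected)


def meets_consensus(measured_ratio, expected_ratio, max_rel_error=0.5):
    """Assumed definition: the relative error of the measured group
    intensity ratio versus the known concentration ratio is below 50%."""
    return abs(measured_ratio - expected_ratio) / expected_ratio < max_rel_error
```

For example, a detected peak at m/z 100.0005 and RT 5.1 min matches a reference at m/z 100.0 and RT 5.0 min (about 5 ppm and 0.1 min apart), whereas a peak at m/z 100.002 (about 20 ppm) does not.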
Figure 3. Assessment of the performance of different tools utilizing the NIST 1950 serum dilution series. (A) Reliability Index (RI) vs. processing speed for three optimization strategies compared to the default. (B) A bar graph showing the number of peaks with good linearity (p < 0.001).
Figure 4. Performance evaluation using Inflammatory Bowel Disease (IBD) data. Principal Component Analysis (PCA) of peaks profiled with (A) default parameters and (B) optimized parameters. (C) Performance of batch effect correction by different strategies. Among them, EigenMS performed best (indicated by *). (D) PCA of the optimized and batch-corrected data.
The pathway enrichment results (top 20, Crohn’s disease vs. non-IBD) generated by mummichog v1.0.8 and v2.0. Insignificant pathways (p value > 0.05) are shown in grey text.
| Pathways (Mummichog v1.0.8) | p value | Pathways (Mummichog v2.0) | p value |
|---|---|---|---|
| Bile acid biosynthesis | 0.017199 | Bile acid biosynthesis | 0.011283 |
| Vitamin D3 (cholecalciferol) metabolism | 0.017526 | Vitamin E metabolism | 0.011321 |
| Vitamin E metabolism | 0.017966 | Vitamin D3 (cholecalciferol) metabolism | 0.014207 |
| Carnitine shuttle | 0.018084 | Galactose metabolism | 0.016026 |
| Glycosphingolipid metabolism | 0.021048 | Glycerophospholipid metabolism | 0.020464 |
| De novo fatty acid biosynthesis | 0.026554 | Carnitine shuttle | 0.021085 |
| Keratan sulfate degradation | 0.031317 | Chondroitin sulfate degradation | 0.025739 |
| Fatty Acid Metabolism | 0.032132 | Vitamin B2 (riboflavin) metabolism | 0.025739 |
| N-Glycan Degradation | 0.043912 | Vitamin H (biotin) metabolism | 0.025739 |
| Phosphatidylinositol phosphate metabolism | 0.053756 | Fatty acid oxidation | 0.025739 |
| Hexose phosphorylation | 0.069236 | Omega-6 fatty acid metabolism | 0.025739 |
| Fatty acid activation | 0.075044 | Glycosphingolipid metabolism | 0.041115 |
| Limonene and pinene degradation | 0.078492 | Phosphatidylinositol phosphate metabolism | 0.043604 |
| Chondroitin sulfate degradation | 0.082534 | Hyaluronan Metabolism | 0.04815 |
| Glycosphingolipid biosynthesis - globoseries | 0.082534 | Putative anti-Inflammatory metabolites formation from EPA | 0.04815 |
| Saturated fatty acids beta-oxidation | 0.082534 | Electron transport chain | 0.04815 |
| Heparan sulfate degradation | 0.082534 | Heparan sulfate degradation | 0.04815 |
| Glycerophospholipid metabolism | 0.09418 | Sialic acid metabolism | 0.061564 |
| Starch and Sucrose Metabolism | 0.13566 | Vitamin A (retinol) metabolism | 0.061564 |
| Ascorbate (Vitamin C) and Aldarate Metabolism | 0.14503 | Saturated fatty acids beta-oxidation | 0.061564 |
Figure 5. The selection process of regions of interest (ROIs) that are enriched for true peak signals. (A) Red dashes mark the bin boundaries within which the sliding windows move to capture the most signal points. The whole spectrum is divided evenly into four bins. Four m/z windows (light red areas), one per bin, slide in parallel, and in each bin the window with the highest summed scan intensity is retained. (B) The RT window (light red area) slides across the entire RT dimension to find the retention time regions with the highest scan signal intensity. (C) The intersected MS scan signals from both the m/z and RT dimensions, containing four ROIs. (D) The zoomed-in view of the ROIs (note that low-intensity peaks are still abundant).
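The sliding-window selection described in this caption can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MetaboAnalystR implementation. The function names, the window widths (`mz_frac`, `rt_frac`), the number of window positions (`n_steps`), and the use of a single RT window are hypothetical choices. Signals are assumed to be `(mz, rt, intensity)` tuples.

```python
def best_window(points, lo, hi, width, n_steps=20):
    """Slide a window of the given width across [lo, hi] and return the
    (start, end) position with the highest summed intensity.
    `points` is a list of (coordinate, intensity) pairs."""
    best_start, best_sum = lo, float("-inf")
    stride = (hi - lo - width) / (n_steps - 1)
    for k in range(n_steps):
        start = lo + k * stride
        total = sum(i for c, i in points if start <= c <= start + width)
        if total > best_sum:  # keep the first window reaching the maximum
            best_start, best_sum = start, total
    return best_start, best_start + width


def select_rois(signals, n_bins=4, mz_frac=0.25, rt_frac=0.2, n_steps=20):
    """Pick one m/z window per bin and one RT window, then keep the
    signals that fall inside both (the intersected regions)."""
    mz_lo, mz_hi = min(s[0] for s in signals), max(s[0] for s in signals)
    rt_lo, rt_hi = min(s[1] for s in signals), max(s[1] for s in signals)
    mz_points = [(mz, inten) for mz, _, inten in signals]
    rt_points = [(rt, inten) for _, rt, inten in signals]

    # Divide the m/z range evenly into bins; one sliding window per bin.
    bin_width = (mz_hi - mz_lo) / n_bins
    mz_windows = []
    for b in range(n_bins):
        lo = mz_lo + b * bin_width
        mz_windows.append(
            best_window(mz_points, lo, lo + bin_width, bin_width * mz_frac, n_steps)
        )

    # One window slides across the whole RT dimension.
    rt_window = best_window(
        rt_points, rt_lo, rt_hi, (rt_hi - rt_lo) * rt_frac, n_steps
    )

    rois = [
        (mz, rt, inten)
        for mz, rt, inten in signals
        if rt_window[0] <= rt <= rt_window[1]
        and any(lo <= mz <= hi for lo, hi in mz_windows)
    ]
    return mz_windows, rt_window, rois
```

In this sketch each bin contributes exactly one m/z window, so with four bins the intersection with the RT window yields four candidate regions, mirroring the four ROIs shown in panel (C).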
Batch effect correction methods available in MetaboAnalystR 3.0.
| Categories | Methods |
|---|---|
| QC Sample Independent | Combat [ |
| QC Sample Dependent | QC-RLSC [ |
| QC Metabolite Dependent | RUV-random [ |
| Internal Standards Dependent | NOMIS [ |