Ahmed Mohamed (1), Michelle M Hill (1,2)
(1) Precision & Systems Biomedicine Laboratory, QIMR Berghofer Medical Research Institute, Herston, QLD 4006, Australia.
(2) Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Herston, QLD 4006, Australia.
Abstract
Advances in mass spectrometry have enabled high-throughput profiling of lipids, but differential analysis and biological interpretation of lipidomics datasets remain challenging. To overcome this barrier, we present LipidSuite, an end-to-end differential lipidomics data analysis server. LipidSuite offers a step-by-step workflow for preprocessing, exploration, differential analysis and enrichment analysis of both untargeted and targeted lipidomics data. Three lipidomics data formats are accepted for upload: mwTab files from Metabolomics Workbench, Skyline CSV exports, and numerical matrices. Experimental variables to be used in the analysis are uploaded in a separate file. Conventional lipid names are automatically parsed to enable lipid class and chain length analyses. Users can interactively explore their data, choose subsets based on sample type, lipid class or other lipid characteristics, and conduct univariate, multivariate and unsupervised analyses. For complex experimental designs and clinical cohorts, LipidSuite offers adjustment for confounding variables. Finally, data tables and plots can be viewed interactively or downloaded for publications or reports. Overall, we anticipate that this free, user-friendly webserver will facilitate differential analysis and re-analysis of lipidomics data, and help researchers fully harness lipidomics datasets for biological interpretation. LipidSuite is freely available at http://suite.lipidr.org.
Biological lipids play important roles in cell and organismal function, ranging from cellular membrane structure and dynamics, to intercellular communication in the form of hormones and lipid mediators, to energy storage. These varied functions are enabled by a diversity of lipid classes and structures. Dysregulation of specific lipid structures has been implicated in the pathogenesis of a range of conditions, including metabolic diseases, neurodegeneration and cancers. Advances in mass spectrometry instrumentation and in methods for lipid structure elucidation have enabled high-throughput profiling of hundreds to thousands of lipid species in a single analysis. Mass spectrometry-based lipidomics has become a vital part of biomedical and life science research (1), complementing information from genomics, proteomics and metabolomics.
In a generic lipidomics workflow, lipids extracted from biological sources are analysed by mass spectrometry, with or without prior liquid chromatography separation. Internal standards should be included to control for differences in extraction efficiency and mass spectrometry loading, and to enable absolute quantitation in targeted workflows. The relative level of each observed lipid species is measured by its peak intensity or area under the peak. In untargeted workflows, lipids are annotated by matching tandem mass spectra (MS/MS) to spectral libraries; software tools for lipid annotation were recently reviewed by Züllig and Köfeler (2). Most of these tools require some coding proficiency, but a few web servers make lipidomics data analysis accessible to a wider range of researchers (Table 1). Metabolomics web servers such as MetaboAnalyst (3), Workflow4metabolomics (4) and MetaBox (5) offer comprehensive statistical analyses but lack lipid-specific features such as lipid name parsing, enrichment and chain length analyses.
On the other hand, the lipidomics web servers goslin (6) and LipidLynxX (7) address lipid name parsing, while Lipid Mini-On (8) and LION/web (9) also offer ontology enrichment and lipid chain length analyses (Table 1). However, none of them integrates lipid class enrichment and chain length analyses with a statistical workflow, including normalization, to provide lipid-specific biological interpretation of differential analysis results.
Table 1.
Comparison of features of web servers for lipidomics data analysis
Here, we present an integrated webserver for differential lipidomics that offers tools for each step of the analysis workflow. Starting from annotated lipidomics data and the experimental groupings, LipidSuite offers options for all steps of lipidomics data analytics, from data quality control (QC), normalization and imputation to differential analysis. Uniquely, LipidSuite allows users to interactively process and visualize subsets of their data; users can choose or exclude specific samples or lipid molecules. The data analytics components of LipidSuite use the lipidr R package (10), which provides univariate and multivariate statistics, lipid set enrichment and chain trend analysis. LipidSuite is freely accessible at http://suite.lipidr.org.
Overview of LipidSuite web server
LipidSuite is a user-friendly web application for advanced, interactive lipidomics analysis. The app's main interface displays the analysis steps in an expandable menu on the left-hand side (Figure 1), with a guided help section on the right. The analysis steps are: Input, Data QC, Preprocessing, Data Exploration, Differential Analysis and Heatmap. The complete analysis workflow is demonstrated with two example datasets available on LipidSuite, as well as in a detailed tutorial video available on the landing page. Each step and its available options are described in detail below.
Figure 1.
LipidSuite web interface.
Input
LipidSuite analysis begins with the upload of two files, containing the lipidomics measurements and the experimental sample annotation, respectively (Figure 2A). LipidSuite accepts three lipidomics data formats: (i) a Skyline CSV export, useful for targeted lipidomics, (ii) a numerical matrix in which lipids are rows and samples are columns, and (iii) an mwTab file from Metabolomics Workbench (11). Notably, LipidSuite can use additional measures exported by Skyline: users can include retention time, peak height or background in addition to the default peak area. Instructions for each file type are displayed when the file type is selected, and example files are available for download. Lipid names should follow the LIPID MAPS convention or the ‘Class XX:YY’ format (12) so that LipidSuite can automatically extract class and chain information from lipid molecules.
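To illustrate how class and chain information can be read from such names, the following Python sketch splits a name into its class abbreviation, total number of carbons and total number of double bonds. It is a simplified, hypothetical parser (`parse_lipid_name` is not a LipidSuite or lipidr function) that assumes the class abbreviation is followed by a space or an opening bracket, and it handles only a few common patterns: sum compositions (‘PC 34:1’), resolved chains (‘PC(16:0/18:1)’) and simple linkage or backbone prefixes (‘PE O-36:2’, ‘Cer d18:1/16:0’).

```python
import re

# One chain such as "18:1", optionally preceded by a linkage/backbone
# marker like "O-", "P-", "d" or "t" (e.g. "O-36:2", "d18:1").
CHAIN = re.compile(r"(?:[dtOP]-?)?(\d+):(\d+)")

# Class abbreviation followed by a space or "(" (e.g. "PC 34:1", "PC(16:0/18:1)").
CLASS = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)[\s(]")


def parse_lipid_name(name):
    """Return (class, total_carbons, total_double_bonds), or None if unparseable."""
    m = CLASS.match(name)
    if m is None:
        return None
    chains = CHAIN.findall(name[m.end():])
    if not chains:
        return None
    carbons = sum(int(c) for c, _ in chains)
    double_bonds = sum(int(db) for _, db in chains)
    return m.group(1), carbons, double_bonds


for n in ["PC 34:1", "PC(16:0/18:1)", "Cer d18:1/16:0", "PE O-36:2", "unknown"]:
    print(n, "->", parse_lipid_name(n))
```

Names that cannot be parsed return `None`, loosely analogous to the lipid name errors that LipidSuite highlights for interactive correction.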
Figure 2.
Dataset input process in LipidSuite. (A) Two files required as input, a file of lipidomics measurements (in Skyline, numerical matrix or mwTab format), and a file with experimental sample annotations (in CSV format). (B) Overview of uploaded datasets, showing lipid class distribution and lipid name errors. (C) Interactive editing of mismatching lipid names.
The experimental sample annotation file should be provided in CSV format, with the first column matching the sample names in the lipidomics dataset. Additional columns represent sample groupings or annotations that can subsequently be used in plots and statistical analyses.
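The pairing between the two files can be sketched as a simple consistency check. This is a minimal Python illustration, assuming a numerical matrix CSV whose first column holds lipid names and whose remaining header cells are sample names, plus an annotation CSV whose first column holds sample names; the function name and toy data are hypothetical. It reports samples present in one file but not the other, the kind of mismatch that would prevent annotations from being used in plots and statistics.

```python
import csv
import io


def check_sample_annotation(matrix_csv, annotation_csv):
    """Compare sample names in a lipid x sample matrix with those in an annotation table."""
    matrix_header = next(csv.reader(io.StringIO(matrix_csv)))
    matrix_samples = matrix_header[1:]  # first column holds lipid names
    annot_rows = list(csv.reader(io.StringIO(annotation_csv)))
    annot_samples = [row[0] for row in annot_rows[1:] if row]  # skip header row
    return {
        "missing_annotation": sorted(set(matrix_samples) - set(annot_samples)),
        "missing_measurements": sorted(set(annot_samples) - set(matrix_samples)),
    }


matrix = "Lipid,S1,S2,S3\nPC 34:1,1200,1350,980\nTG 52:2,400,380,510\n"
annotation = "Sample,Group,Age\nS1,Control,54\nS2,Case,61\nS4,Case,58\n"
print(check_sample_annotation(matrix, annotation))
# {'missing_annotation': ['S3'], 'missing_measurements': ['S4']}
```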
Data QC
LipidSuite implements three functionalities for thorough data inspection. Overview gives the user an opportunity to correct any parsing errors, while Samples and Lipids allow data inspection through interactive plotting and data subsetting focused on particular samples or lipids.
Overview
The total number of samples, lipids and lipid classes is summarized, and the distribution of lipid classes is displayed as a pie chart (Figure 2B). Errors in lipid name parsing are highlighted; they can be corrected by navigating to the ‘Lipids’ tab and using the Search function to locate the misnamed lipid molecule (Figure 2C). Sample annotations can be edited interactively in the same way from the ‘Samples’ tab.
Samples
Interactive plots allow visual inspection of sample quality, and the ‘Data Subset’ tab lets users choose a subset of the data, e.g. to inspect QC samples only. Data subsetting (Figure 3) can also be used to explore the intensity distribution of specific lipid classes across samples. Users can customize the displayed plots by choosing the plot type, measure and color annotation. Plots are interactive and can also be downloaded for record keeping.
Figure 3.
Interactive user-driven analysis workflow. LipidSuite allows users to try out different processing and analysis options, and directly assess the results with interactive visualization. Then, with the novel data subsetting functionality, users can focus on, or exclude subsets of their datasets, and redo the analysis and visualization on sliced datasets. Once done, static plots can be exported as images for publication and reports.
Lipids
Similarly, lipid QC plots allow visual inspection of lipid molecules or classes to identify data quality issues. By default, plots show the variability (%CV) of each lipid molecule across all samples. Using the data subsetting functionality, users can explore the technical variability in intensity or retention time of lipid molecules by focusing on QC samples only. Low-quality lipids or lipid classes can be removed from the analysis if required.
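For reference, %CV is the standard deviation divided by the mean, multiplied by 100. The sketch below computes it per lipid, optionally restricted to QC samples to mirror the subsetting described above. The matrix layout (lipids as rows, samples as columns), the ‘QC’ label and the 20% cut-off are illustrative assumptions, not LipidSuite defaults.

```python
import numpy as np


def percent_cv(intensities, sample_types=None, keep="QC"):
    """Per-lipid %CV (100 * SD / mean); rows are lipids, columns are samples."""
    x = np.asarray(intensities, dtype=float)
    if sample_types is not None:
        x = x[:, np.asarray(sample_types) == keep]  # keep only e.g. QC samples
    mean = np.nanmean(x, axis=1)
    sd = np.nanstd(x, axis=1, ddof=1)  # sample standard deviation
    return 100 * sd / mean


data = np.array([
    [100.0, 110.0, 90.0, 400.0],  # stable across QCs, different in the study sample
    [50.0, 80.0, 20.0, 55.0],     # technically noisy lipid
])
types = ["QC", "QC", "QC", "Sample"]
cv = percent_cv(data, types)
print(cv)       # 10.0 and 60.0 for the two lipids
print(cv > 20)  # flag lipids above an illustrative 20% threshold
```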
Preprocessing
Before differential analysis, LipidSuite requires three preprocessing steps: summarization, imputation and normalization. When a Skyline export file is used as input, the preprocessing steps can be applied to any of the exported measures. Although this functionality is retained for versatility, it is usually most sensible to apply preprocessing only to the measure representing lipid concentration.
Summarization
This step is essential only for targeted lipidomics data that report multiple transitions per lipid molecule; it summarizes all transitions into a single intensity value for each molecule. Where summarization is irrelevant, users can choose the ‘Don’t Summarize’ option from the drop-down list to skip this step. LipidSuite displays the number of lipid measurements before and after summarization.
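A minimal sketch of transition summarization, assuming each transition arrives as a (lipid name, per-sample intensities) pair and that one of two simple rules is used: keep the strongest transition, or average the transitions. The function name `summarize_by_lipid` and both rules are illustrative choices here, not a description of LipidSuite's exact options.

```python
from collections import defaultdict

import numpy as np


def summarize_by_lipid(rows, method="max"):
    """Collapse several transitions per lipid into one intensity per sample.

    rows: iterable of (lipid_name, intensities), one entry per transition,
    where intensities is a list with one value per sample.
    method: "max" keeps the strongest transition, "mean" averages them.
    """
    grouped = defaultdict(list)
    for lipid, values in rows:
        grouped[lipid].append(values)
    aggregate = {"max": np.nanmax, "mean": np.nanmean}[method]
    return {
        lipid: aggregate(np.array(vals, dtype=float), axis=0)
        for lipid, vals in grouped.items()
    }


transitions = [
    ("PC 34:1", [1000, 1200]),  # first transition of PC 34:1
    ("PC 34:1", [400, 500]),    # second transition of PC 34:1
    ("TG 52:2", [300, 310]),    # single transition
]
summary = summarize_by_lipid(transitions, method="max")
print(len(transitions), "measurements ->", len(summary), "lipids")  # 3 measurements -> 2 lipids
```

Reporting the counts before and after, as in the last line, is the same check LipidSuite displays after this step.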
Imputation
Since dealing with missing values is an essential step in analyzing lipidomics datasets, LipidSuite implements a wide range of imputation methods addressing different types of missingness (13). If values are missing because of low abundance (missing not at random), users can apply the QRILC (Quantile Regression Imputation of Left-Censored data) method, impute missing values with minimum deterministic or probabilistic values, or replace them with zeros (14). If, on the other hand, values are missing at random, for example due to ion interference, users can apply K-nearest neighbours or singular value decomposition (SVD) imputation (13). Following imputation, LipidSuite displays a histogram of imputed versus measured intensities, allowing users to assess the effect of imputation on the data distribution.
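The difference between the two families of methods can be sketched with simplified versions of two of them. In the sketch below, `impute_min_det` fills each sample's missing values with a low quantile of that sample's observed intensities (a minimum deterministic strategy for left-censored data), and `impute_knn` fills a lipid's missing values with the mean of its k most similar lipids. Both functions, the quantile `q = 0.01` and `k = 2` are illustrative assumptions; the implementations used by LipidSuite (13,14) may differ in detail.

```python
import numpy as np


def impute_min_det(x, q=0.01):
    """MNAR-style imputation: fill each sample's (column's) missing values
    with a low quantile of that sample's observed intensities."""
    x = np.array(x, dtype=float)  # copy, input is left unchanged
    for j in range(x.shape[1]):
        col = x[:, j]  # view into x
        col[np.isnan(col)] = np.nanquantile(col, q)
    return x


def impute_knn(x, k=2):
    """MAR-style imputation: fill a lipid's missing values with the mean of its
    k nearest lipids (RMS distance over samples observed in both)."""
    x = np.array(x, dtype=float)
    out = x.copy()
    for i in np.where(np.isnan(x).any(axis=1))[0]:
        dists = []
        for j in range(x.shape[0]):
            if j == i:
                continue
            shared = ~np.isnan(x[i]) & ~np.isnan(x[j])
            if shared.any():
                d = np.sqrt(np.mean((x[i, shared] - x[j, shared]) ** 2))
                dists.append((d, j))
        neighbours = [j for _, j in sorted(dists)[:k]]
        for s in np.where(np.isnan(x[i]))[0]:
            vals = [x[j, s] for j in neighbours if not np.isnan(x[j, s])]
            if vals:
                out[i, s] = np.mean(vals)
    return out


nan = np.nan
x = np.array([
    [10.0, 12.0, nan],
    [11.0, 13.0, 14.0],
    [10.0, 12.0, 13.0],
    [100.0, 90.0, 95.0],
])
print(impute_knn(x, k=2)[0, 2])  # 13.5, mean of the two closest lipids
print(impute_min_det(x)[0, 2])   # a value just above the third sample's minimum
```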
Normalization
To prepare lipidomics datasets for comparative analyses, LipidSuite offers Probabilistic Quotient Normalization (15,16), a widely used method for metabolomics data that corrects for dilution differences across samples. Alternatively, users can normalize each lipid class against its spiked-in internal standards. To prevent low-intensity samples from skewing the normalization, LipidSuite automatically detects ‘Blank’ samples and excludes them from the process. Similarly, users can exclude additional samples, such as outliers, from normalization. If the uploaded data are already normalized, this step can be bypassed by selecting ‘Dataset is already normalized’. Users can easily assess the effect of normalization on their datasets through boxplots of intensities before and after normalization.
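The core of Probabilistic Quotient Normalization can be sketched in a few lines: a reference profile is built from the median of each lipid across samples, each sample's dilution factor is the median of its lipid-wise quotients against that reference, and the sample is divided by that factor. The `exclude` mask below is an assumption meant to mimic leaving blanks or outliers out of the reference; the original formulation (15) also applies an integral normalization first, which this sketch omits.

```python
import numpy as np


def pqn_normalize(x, exclude=None):
    """Probabilistic Quotient Normalization of a lipids x samples matrix.

    exclude: optional boolean mask of samples (e.g. blanks or outliers) that are
    left out when building the reference profile; they are still normalized.
    Returns the normalized matrix and the per-sample dilution factors.
    """
    x = np.asarray(x, dtype=float)
    if exclude is None:
        use = np.ones(x.shape[1], dtype=bool)
    else:
        use = ~np.asarray(exclude, dtype=bool)
    reference = np.nanmedian(x[:, use], axis=1)  # median profile of each lipid
    quotients = x / reference[:, None]           # ratio of every value to the reference
    dilution = np.nanmedian(quotients, axis=0)   # most probable quotient per sample
    return x / dilution[None, :], dilution


base = np.array([100.0, 200.0, 50.0])  # lipid profile of an undiluted sample
x = np.column_stack([base, 2 * base, 0.5 * base, [1.0, 1.0, 1.0]])  # last column: blank
normalized, factors = pqn_normalize(x, exclude=[False, False, False, True])
print(factors[:3])  # dilution factors 1, 2 and 0.5 for the three study samples
```

Excluding the blank matters here: with it included, the near-zero blank would pull the reference profile down and distort every factor.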
Data exploration
Users can explore their datasets in an unsupervised or supervised manner, using Principal Component Analysis (PCA) or Orthogonal Partial Least-Squares Discriminant Analysis (OPLS-DA), respectively. PCA offers a simple visual check for separation between sample groups, and samples can be colored on the plot by any annotation. Data subsetting enables users to replot the PCA without outlier samples, or to focus on certain lipid classes to show how much they contribute to the separation between sample groups. OPLS-DA, on the other hand, reveals which lipids discriminate between two sample groups (17); the top associated lipids can be extracted from the loading plot.
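As a sketch of what the PCA view computes, the function below log2-transforms a lipids x samples matrix, centers each lipid, and obtains sample scores from a singular value decomposition. The log2 transform, centering without scaling and the simulated data are assumptions for illustration; LipidSuite's exact preprocessing for PCA may differ. Running it on a subset of rows (e.g. one lipid class) corresponds to the class-focused replotting described above.

```python
import numpy as np


def pca_scores(x, n_components=2):
    """PCA of a lipids x samples intensity matrix via SVD.

    Returns sample scores (samples x components) and the fraction of
    variance explained by each returned component.
    """
    data = np.log2(np.asarray(x, dtype=float)).T  # samples as rows, log2 scale
    centered = data - data.mean(axis=0)           # center each lipid
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
x = 2 ** rng.normal(10, 0.1, size=(20, 6))  # 20 lipids, 6 samples (simulated)
x[:5, 3:] *= 8                              # 5 lipids raised 8-fold in samples 4-6
scores, explained = pca_scores(x)
print(np.round(scores[:, 0], 2))            # PC1 separates samples 1-3 from 4-6
```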
Differential analysis
LipidSuite supports both two-group and multi-group differential analysis, using a moderated Student's t-test and a moderated ANOVA, respectively (18). Results from the univariate analysis are used for lipid set enrichment and chain trend analysis, so the univariate analysis must be conducted first.
Univariate analysis
The groups to be compared are selected using tickboxes. For two-group comparisons, a moderated Student's t-test is applied, and the outputs include a volcano plot showing log2 fold changes (LogFC) and adjusted P-values, as well as a results table. For multi-group comparisons, a moderated ANOVA is applied and LipidSuite displays a modified volcano plot with the average intensity of each lipid in place of LogFC, since an ANOVA does not yield a single mean difference between groups. Furthermore, LipidSuite allows users to adjust for confounding factors, such as age, using the ‘Adjust by’ dropdown menu.
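The idea behind a moderated t-test with confounder adjustment can be sketched as follows: each lipid is fitted with a linear model whose design matrix contains an intercept, the group indicator and the confounder (here age), and each lipid's residual variance is shrunk towards a common prior before the t-statistic is formed, in the style of limma. In this simplified sketch the prior degrees of freedom `d0` and prior variance `s0_sq` are fixed, hypothetical values, whereas limma estimates them from all lipids; the multi-group moderated F-test is not shown. P-values are then adjusted with the Benjamini-Hochberg procedure.

```python
import numpy as np
from scipy import stats


def moderated_t(y, design, coef, d0=4.0, s0_sq=0.05):
    """Per-lipid linear model with a limma-style moderated t-statistic.

    y: lipids x samples matrix of log2 intensities.
    design: samples x p design matrix (intercept, group indicator, covariates).
    coef: column of the design matrix to test (e.g. the group indicator).
    d0, s0_sq: prior degrees of freedom and prior variance, fixed here for
    illustration (limma estimates them from the data).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(design, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = y @ X @ xtx_inv                       # OLS coefficients, lipids x p
    resid = y - beta @ X.T
    d = n - p                                    # residual degrees of freedom
    s_sq = np.sum(resid ** 2, axis=1) / d        # residual variance per lipid
    s_post = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrink variances towards the prior
    se = np.sqrt(s_post * xtx_inv[coef, coef])
    t = beta[:, coef] / se
    p_val = 2 * stats.t.sf(np.abs(t), df=d + d0)
    return beta[:, coef], t, p_val


def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


rng = np.random.default_rng(1)
group = np.array([0, 0, 0, 1, 1, 1])      # control vs case
age = np.array([50, 62, 55, 58, 49, 60])  # confounder to adjust for
design = np.column_stack([np.ones(6), group, age])
y = rng.normal(10, 0.2, size=(5, 6))      # 5 lipids, log2 scale (simulated)
y[0] += 1.5 * group                       # lipid 0 truly differs between groups
logfc, t, p = moderated_t(y, design, coef=1)
print(np.round(logfc, 2), np.round(bh_adjust(p), 3))
```

Because age is a column of the design matrix, the group coefficient is an age-adjusted LogFC, which is the general principle behind adjusting for a confounder in a linear model.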
Lipid set enrichment
To assess which lipid sets are preferentially altered in the chosen comparison, individual lipids are ranked by LogFC and an enrichment score is calculated for each lipid set, as previously described (10). The enrichment plot can be displayed in interactive or static mode, and the raw results table can be exported as CSV or Excel.
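A rank-based enrichment score of this kind can be sketched with a GSEA-style running sum: lipids are ordered from most increased to most decreased, the sum steps up at each member of the set (weighted by |LogFC|) and steps down at each non-member, and the score is the maximum deviation from zero. This is a simplified illustration under those assumptions; the method described in (10) also assesses significance, which is omitted here, and the set used below is hypothetical.

```python
import numpy as np


def enrichment_score(logfc, in_set, weight=1.0):
    """GSEA-style running-sum enrichment score for one lipid set.

    logfc: fold changes for all lipids; in_set: boolean mask of set members.
    Members add |logFC|**weight (normalized to sum to 1); non-members subtract
    equal steps that also sum to 1.
    """
    logfc = np.asarray(logfc, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    order = np.argsort(-logfc)  # rank from most increased to most decreased
    hits = in_set[order]
    w = np.abs(logfc[order]) ** weight * hits
    step_hit = w / w.sum()
    step_miss = (~hits) / (~hits).sum()
    running = np.cumsum(step_hit - step_miss)
    return running[np.argmax(np.abs(running))]  # maximum deviation from zero


logfc = np.array([2.0, 1.5, 0.3, -0.2, -1.0, -1.8])
set_up = np.array([True, True, False, False, False, False])    # hypothetical set
set_down = np.array([False, False, False, False, True, True])  # hypothetical set
print(enrichment_score(logfc, set_up))    # close to 1: members at the top
print(enrichment_score(logfc, set_down))  # close to -1: members at the bottom
```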
Chain trend analysis
To investigate the relationship between chain characteristics and differential regulation, LipidSuite displays trend plots of the LogFC obtained in the univariate analysis against total chain length or total unsaturation (number of double bonds).
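The numbers behind such a trend plot can be sketched as the mean LogFC at each total chain length within one lipid class, optionally summarized by a least-squares slope. The slope is an extra summary assumed here for illustration and is not claimed to be part of LipidSuite's output; the chain lengths and LogFC values are made up.

```python
import numpy as np


def chain_trend(carbons, logfc):
    """Mean LogFC per total chain length, plus a least-squares slope of LogFC on chain length."""
    carbons = np.asarray(carbons)
    logfc = np.asarray(logfc, dtype=float)
    lengths = np.unique(carbons)
    means = np.array([logfc[carbons == c].mean() for c in lengths])
    slope, _intercept = np.polyfit(carbons, logfc, 1)
    return dict(zip(lengths.tolist(), means.tolist())), slope


carbons = [48, 50, 52, 54, 56, 56]        # total chain length of TG species
logfc = [-0.4, -0.2, 0.1, 0.3, 0.6, 0.4]  # illustrative values
means, slope = chain_trend(carbons, logfc)
print(means)  # mean LogFC per chain length
print(slope)  # positive: longer chains tend to be increased
```

The same function applies to unsaturation by passing the number of double bonds instead of the number of carbons.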
Heatmap
The final data visualization offered by LipidSuite is an interactive heatmap of the selected dataset. Hierarchical clustering with customizable sample and lipid annotations allows users to generate a heatmap showing co-clustering of samples and lipids.
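The ordering step behind a clustered heatmap can be sketched with SciPy's hierarchical clustering: rows are z-scored so that lipids cluster by pattern rather than abundance, rows and columns are clustered separately, and the matrix is reordered by the dendrogram leaf order. Average linkage, Euclidean distance and row z-scoring are assumptions chosen for illustration, not necessarily the settings used by LipidSuite.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(x, method="average", metric="euclidean"):
    """Reorder a lipids x samples matrix by hierarchical clustering of rows and columns."""
    x = np.asarray(x, dtype=float)
    # Row z-scores so lipids cluster by pattern rather than abundance;
    # rows with zero variance would give NaN and should be removed first.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    row_order = leaves_list(linkage(z, method=method, metric=metric))
    col_order = leaves_list(linkage(z.T, method=method, metric=metric))
    return z[np.ix_(row_order, col_order)], row_order, col_order


x = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same pattern as row 0, different scale
    [4.0, 3.0, 2.0, 1.0],  # opposite pattern
])
ordered, rows, cols = cluster_order(x)
print(rows)  # rows 0 and 1 end up next to each other
```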
CONCLUSION
The LipidSuite webserver is designed to facilitate differential lipidomics data analysis and data mining after lipid annotation has been performed with other tools. With a focus on biological interpretation, LipidSuite provides multiple options for data quality control, imputation, normalization, data subsetting and confounder adjustment, in addition to univariate statistical analysis and OPLS-DA multivariate analysis. Finally, a customizable heatmap from unsupervised clustering is generated. Together, LipidSuite offers a convenient interface for life science researchers to conduct sophisticated lipidomics data analytics with biological interpretation. We anticipate that this new, free webserver will accelerate lipidomics research.
DATA AVAILABILITY
All relevant links are provided in the manuscript.
Geremy Clair, Sarah Reehl, Kelly G Stratton, Matthew E Monroe, Malak M Tfaily, Charles Ansong, Jennifer E Kyle. Bioinformatics, 2019-11-01.
Matthew E Ritchie, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, Gordon K Smyth. Nucleic Acids Res, 2015-01-20.
Manish Sud, Eoin Fahy, Dawn Cotter, Alex Brown, Edward A Dennis, Christopher K Glass, Alfred H Merrill, Robert C Murphy, Christian R H Raetz, David W Russell, Shankar Subramaniam. Nucleic Acids Res, 2006-11-10.
Manish Sud, Eoin Fahy, Dawn Cotter, Kenan Azam, Ilango Vadivelu, Charles Burant, Arthur Edison, Oliver Fiehn, Richard Higashi, K Sreekumaran Nair, Susan Sumner, Shankar Subramaniam. Nucleic Acids Res, 2015-10-13.
Nils Hoffmann, Gerhard Mayer, Canan Has, Dominik Kopczynski, Fadi Al Machot, Dominik Schwudke, Robert Ahrends, Katrin Marcus, Martin Eisenacher, Michael Turewicz. Metabolites, 2022-06-23.