Elisa Benedetti, Nathalie Gerstner, Maja Pučić-Baković, Toma Keser, Karli R Reiding, L Renee Ruhaak, Tamara Štambuk, Maurice H J Selman, Igor Rudan, Ozren Polašek, Caroline Hayward, Marian Beekman, Eline Slagboom, Manfred Wuhrer, Malcolm G Dunlop, Gordan Lauc, Jan Krumsiek.
Abstract
Glycomics measurements, like all other high-throughput technologies, are subject to technical variation due to fluctuations in the experimental conditions. The removal of this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, a systematic evaluation of normalization options for glycomics data has not been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with an innovative approach. It has been shown previously that Gaussian Graphical Models (GGMs) inferred from glycomics data are able to identify enzymatic steps in the glycan synthesis pathways in a data-driven fashion. Based on this finding, we quantify the quality of a given normalization method according to how well a GGM inferred from the respective normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore exploits a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography - ElectroSpray Ionization - Mass Spectrometry (LC-ESI-MS), Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix Assisted Laser Desorption Ionization - Fourier Transform Ion Cyclotron Resonance - Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the 'Probabilistic Quotient' method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis, where we ranked normalization methods based on their statistical associations with age, a factor known to associate with glycomics measurements.
Keywords: data normalization; gaussian graphical models; glycomics
Year: 2020 PMID: 32630764 PMCID: PMC7408386 DOI: 10.3390/metabo10070271
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
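The normalization recommended in the abstract, Probabilistic Quotient followed by log-transformation, can be illustrated with a minimal Python sketch. The reference profile (per-glycan median across all samples), the natural logarithm, and the function name `pqn_log` are assumptions made for illustration; the abstract does not specify these details.

```python
import math
from statistics import median


def pqn_log(data):
    """Probabilistic quotient normalization followed by a natural log.

    data: list of samples, each a list of positive glycan intensities.
    The choice of reference profile below is an assumption.
    """
    n_peaks = len(data[0])
    # Reference profile: per-glycan median across all samples.
    reference = [median(sample[j] for sample in data) for j in range(n_peaks)]
    normalized = []
    for sample in data:
        # Quotient of each glycan against the reference profile.
        quotients = [x / r for x, r in zip(sample, reference)]
        # The median quotient estimates the sample-wide dilution factor.
        dilution = median(quotients)
        normalized.append([math.log(x / dilution) for x in sample])
    return normalized


# Toy data: sample 2 is sample 1 measured at twice the overall intensity.
toy = [[10.0, 20.0, 40.0], [20.0, 40.0, 80.0], [12.0, 18.0, 41.0]]
result = pqn_log(toy)
```

In this toy example the first two samples differ only by a global factor of two, so they should map to the same normalized profile, which is the kind of technical variation the method is meant to remove.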
Figure 1. Pipeline for the evaluation of different normalization methods for glycomics data. First, data are normalized with various approaches. From each processed dataset, a Gaussian Graphical Model (GGM) is inferred and compared to the known biochemical pathway of glycan synthesis. The result of this comparison is a quantitative overlap value that describes how well the estimated GGM represents known synthesis reactions. This overlap is then used to evaluate the normalization approach, where higher overlap corresponds to a better data normalization.
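The comparison step of this pipeline can be sketched as a Fisher's exact test on glycan pairs, the test named in the later figure captions. The sketch below assumes the GGM edges have already been inferred and are passed in as a list of pairs; the one-sided test direction, the glycan names, and the function name `fisher_overlap_pvalue` are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def fisher_overlap_pvalue(nodes, ggm_edges, pathway_edges):
    """One-sided Fisher's exact test: are pathway reactions enriched among GGM edges?"""
    all_pairs = {frozenset(p) for p in combinations(nodes, 2)}
    ggm = {frozenset(e) for e in ggm_edges}
    pathway = {frozenset(e) for e in pathway_edges}
    total = len(all_pairs)        # all possible glycan pairs
    k = len(ggm & pathway)        # pairs that are both GGM edges and known reactions
    n_ggm, n_path = len(ggm), len(pathway)
    # Hypergeometric tail: P(overlap >= k) if the GGM edges were placed at random.
    upper = min(n_ggm, n_path)
    tail = sum(
        comb(n_path, i) * comb(total - n_path, n_ggm - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(total, n_ggm)


# Hypothetical four-glycan chain with three known single-sugar reactions.
nodes = ["G0F", "G1F", "G2F", "G2FS1"]
pathway = [("G0F", "G1F"), ("G1F", "G2F"), ("G2F", "G2FS1")]
p_good = fisher_overlap_pvalue(nodes, pathway, pathway)
p_poor = fisher_overlap_pvalue(
    nodes, [("G0F", "G2F"), ("G0F", "G2FS1"), ("G1F", "G2FS1")], pathway
)
```

A GGM that recovers exactly the three reactions gives the smallest possible p-value for this toy network, whereas a GGM sharing no edge with the pathway gives a p-value of 1.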
Summary of datasets.
| Platform | LC-ESI-MS | LC-ESI-MS | LC-ESI-MS | LC-ESI-MS | UHPLC-FLD | MALDI-FTICR-MS |
|---|---|---|---|---|---|---|
| Dataset Name | Korčula 2013 | Korčula 2010 | Split | Vis | CRC Controls | LLS |
| Glycans measured | IgG Fc | IgG Fc | IgG Fc | IgG Fc | IgG total | Total plasma |
| Number of peaks | 50 | 50 | 50 | 50 | 24 | 61 |
| Number of samples for analysis | 669 | 504 | 980 | 395 | 535 | 2056 |
| Age range | 18–88 | 18–98 | 18–85 | 18–91 | 21–74 | 30–80 |
LC-ESI-MS: Liquid Chromatography-ElectroSpray Ionization-Mass Spectrometry; UHPLC-FLD: Ultra High Performance Liquid Chromatography with Fluorescence Detection; MALDI-FTICR-MS: Matrix Assisted Laser Desorption Ionization-Fourier Transform Ion Cyclotron Resonance-Mass Spectrometry; CRC: Colorectal cancer; LLS: Leiden Longevity Study; IgG: Immunoglobulin G; Fc: Fragment crystallizable; SD: standard deviation.
Evaluated normalization methods.
| Normalization | Label | Group |
|---|---|---|
| Raw | Raw | Basic Normalizations |
| Quantile per glycan | Quantile | Basic Normalizations |
| Rank per glycan | Rank | Basic Normalizations |
| Total Area | TA | Basic Normalizations |
| Median Centering | Median | Basic Normalizations |
| Probabilistic Quotient | Quotient | Basic Normalizations |
| Total Area + Probabilistic Quotient | TAQuotient | Basic Normalizations |
| log(Raw) | Raw log | Logarithm |
| log(Quantile per glycan) | Quantile log | Logarithm |
| log(Rank per glycan) | Rank log | Logarithm |
| log(Total Area) | TA log | Logarithm |
| log(Probabilistic Quotient) | Quotient log | Logarithm |
| log(Total Area + Probabilistic Quotient) | TAQuotient log | Logarithm |
| (Quantile per glycan) per IgG subclass | Quantile subclass | Per Subclass |
| (Rank per glycan) per IgG subclass | Rank subclass | Per Subclass |
| (Total Area) per IgG subclass | TA subclass | Per Subclass |
| (Probabilistic Quotient) per IgG subclass | Quotient subclass | Per Subclass |
| (Total Area + Probabilistic Quotient) per IgG subclass | TAQuotient subclass | Per Subclass |
| (log(Quantile per glycan)) per IgG subclass | Quantile log subclass | Per Subclass |
| (log(Rank per glycan)) per IgG subclass | Rank log subclass | Per Subclass |
| (log(Total Area)) per IgG subclass | TA log subclass | Per Subclass |
| (log(Probabilistic Quotient)) per IgG subclass | Quotient log subclass | Per Subclass |
| (log(Total Area + Probabilistic Quotient)) per IgG subclass | TAQuotient log subclass | Per Subclass |
IgG: Immunoglobulin G; log: logarithm.
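To make the distinction between the basic and the per-subclass variants concrete, the following sketch applies Total Area scaling once across all peaks of a sample and once separately within each IgG subclass. It is a minimal illustration under assumptions: peaks are scaled to sum to 1, and the peak names, the `subclass_of` mapping, and both function names are hypothetical.

```python
def total_area(sample):
    """Scale one sample so that its peak intensities sum to 1."""
    total = sum(sample.values())
    return {peak: value / total for peak, value in sample.items()}


def total_area_per_subclass(sample, subclass_of):
    """Apply total-area scaling separately within each IgG subclass."""
    groups = {}
    for peak, value in sample.items():
        # Collect the peaks that belong to the same subclass.
        groups.setdefault(subclass_of[peak], {})[peak] = value
    scaled = {}
    for peaks in groups.values():
        scaled.update(total_area(peaks))
    return scaled


# Hypothetical sample with two peaks from each of two subclasses.
sample = {"IgG1_G0F": 30.0, "IgG1_G1F": 10.0, "IgG2_G0F": 45.0, "IgG2_G1F": 15.0}
subclass_of = {peak: peak.split("_")[0] for peak in sample}

ta_global = total_area(sample)
ta_subclass = total_area_per_subclass(sample, subclass_of)
```

With global scaling all four peaks sum to 1, whereas with per-subclass scaling the peaks of each subclass sum to 1 on their own, so differences in overall subclass abundance no longer influence the normalized values.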
Figure 2. Reference pathway for Immunoglobulin G (IgG) Liquid Chromatography-ElectroSpray Ionization-Mass Spectrometry (LC-ESI-MS) data. IgG glycans include monosaccharides such as mannose, N-acetylglucosamine, galactose, fucose, and sialic acid, and are synthesized by the incremental addition of single monosaccharides.
Figure 3. LC-ESI-MS normalization analysis results (Korčula 2013 cohort). Results in the panels are colored according to type of normalization (left), log-transformation (center), or normalization per IgG subclass or total IgG (right). Bars represent the median of the Fisher’s exact test p-values over 1000 bootstrap samples, and error bars indicate the corresponding 95% confidence intervals.
Figure 4. Ultra High Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD) normalization analysis results (Colorectal cancer controls cohort). Results in the panels are colored according to type of normalization (left) or log-transformation (right). Bars represent the median of the Fisher’s exact test p-values over 1000 bootstrap samples, and error bars indicate the corresponding 95% confidence intervals.
Figure 5. Matrix Assisted Laser Desorption Ionization-Fourier Transform Ion Cyclotron Resonance-Mass Spectrometry (MALDI-FTICR-MS) normalization analysis results (Leiden Longevity Study cohort). Results in the panels are colored according to type of normalization (left) or log-transformation (right). Bars represent the median of the Fisher’s exact test p-values over 1000 bootstrap samples, and error bars indicate the corresponding 95% confidence intervals.
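The bars and error bars described in the captions of Figures 3 to 5 summarize a statistic over bootstrap resamples of the cohort. A minimal sketch of that summary is given below. In the study the statistic would be the Fisher's exact test p-value obtained after normalizing the resampled data and inferring a GGM; here a simple mean stands in for it, and the percentile-based 95% interval, the seed, and the function name `bootstrap_summary` are assumptions.

```python
import random
from statistics import mean, median


def bootstrap_summary(samples, statistic, n_boot=1000, seed=0):
    """Return the median and a 95% percentile interval of a bootstrapped statistic."""
    rng = random.Random(seed)
    n = len(samples)
    values = []
    for _ in range(n_boot):
        # Resample the cohort with replacement, keeping the original size.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    values.sort()
    # Percentile interval: 2.5th and 97.5th percentiles of the bootstrap values.
    lower = values[int(0.025 * (n_boot - 1))]
    upper = values[int(0.975 * (n_boot - 1))]
    return median(values), (lower, upper)


# Placeholder data and statistic, used only to exercise the function.
cohort = [float(x) for x in range(1, 11)]
center, (low, high) = bootstrap_summary(cohort, mean)
```

The same function can wrap any pipeline that maps a resampled cohort to a single number, which is all the bar-and-error-bar summary requires.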
Fraction of glycans significantly associated with age (False Discovery Rate 0.01). Normalization approaches are sorted by decreasing average fraction of significant associations.
| Normalization | Korčula 2013 (LC-ESI-MS) | Korčula 2010 (LC-ESI-MS) | Split (LC-ESI-MS) | Vis (LC-ESI-MS) | LC-ESI-MS average | CRC Controls (UHPLC-FLD) | LLS (MALDI-FTICR-MS) | Weighted Average across Platforms |
|---|---|---|---|---|---|---|---|---|
| TAQuotient log | 0.340 | 0.680 | 0.700 | 0.740 | 0.615 | 0.625 | 0.590 | 0.610 |
| Quotient log | 0.340 | 0.660 | 0.700 | 0.740 | 0.610 | 0.625 | 0.590 | 0.608 |
| Quotient | 0.320 | 0.660 | 0.740 | 0.680 | 0.600 | 0.583 | 0.574 | 0.586 |
| TAQuotient | 0.320 | 0.660 | 0.740 | 0.660 | 0.595 | 0.583 | 0.574 | 0.584 |
| TA log | 0.360 | 0.700 | 0.760 | 0.700 | 0.630 | 0.542 | 0.475 | 0.549 |
| TA | 0.300 | 0.720 | 0.780 | 0.720 | 0.630 | 0.500 | 0.475 | 0.535 |
| Quantile | 0.220 | 0.600 | 0.700 | 0.640 | 0.540 | 0.000 | 0.279 | 0.273 |
| Raw | 0.180 | 0.560 | 0.700 | 0.620 | 0.515 | 0.000 | 0.279 | 0.265 |
| Rank | 0.220 | 0.520 | 0.700 | 0.580 | 0.505 | 0.000 | 0.262 | 0.256 |
| Quantile log | 0.220 | 0.560 | 0.700 | 0.580 | 0.515 | 0.000 | 0.246 | 0.254 |
| Median | 0.220 | 0.520 | 0.560 | 0.640 | 0.485 | 0.000 | 0.246 | 0.244 |
| Raw log | 0.220 | 0.560 | 0.680 | 0.540 | 0.500 | 0.000 | 0.213 | 0.238 |
| Rank log | 0.140 | 0.400 | 0.620 | 0.540 | 0.425 | 0.000 | 0.115 | 0.180 |
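The entries in this table are fractions of glycans that pass a False Discovery Rate threshold of 0.01, averaged first within the LC-ESI-MS cohorts and then across platforms. The sketch below illustrates both steps. The Benjamini-Hochberg procedure is an assumption, since the source states only the threshold, and the p-values are made up. The averaging uses the TAQuotient log row, and the tabulated values are consistent with an equal-weight mean over the three platforms.

```python
def fraction_significant(p_values, fdr=0.01):
    """Fraction of tests rejected by the Benjamini-Hochberg procedure (assumed here)."""
    m = len(p_values)
    n_rejected = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Reject all hypotheses up to the largest rank meeting the BH bound.
        if p <= fdr * rank / m:
            n_rejected = rank
    return n_rejected / m


# Made-up p-values for four glycans; three pass the threshold.
toy_fraction = fraction_significant([0.0001, 0.002, 0.004, 0.5])

# TAQuotient log row: four LC-ESI-MS cohorts, then one cohort per other platform.
lc_esi_ms = [0.340, 0.680, 0.700, 0.740]
uhplc_fld, maldi_fticr_ms = 0.625, 0.590

lc_mean = sum(lc_esi_ms) / len(lc_esi_ms)
across_platforms = (lc_mean + uhplc_fld + maldi_fticr_ms) / 3
```

Averaging within LC-ESI-MS first keeps the four cohorts measured on that platform from dominating the cross-platform value.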