Sandra Waaijenborg, Oksana Korobko, Ko Willems van Dijk, Mirjam Lips, Thomas Hankemeier, Tom F Wilderjans, Age K Smilde, Johan A Westerhuis.
Abstract
Combining different metabolomics platforms can contribute significantly to the discovery of complementary processes expressed under different conditions. However, analysis of the fused data may be hampered by differences in data quality between platforms. In metabolomics data, one often observes that measurement errors increase with increasing measurement level and that different platforms have different measurement error variances. In this paper we compare three approaches to correct for this measurement error heterogeneity: transformation of the raw data, weighted filtering before modelling, and a modelling approach using a weighted sum of residuals. To illustrate these approaches we analyse data from healthy obese and diabetic obese individuals, obtained from two metabolomics platforms. In conclusion, the filtering and modelling approaches, which both estimate a model of the measurement error, did not outperform the data transformation approaches for this application. This is probably due to the limited difference in measurement error and to the fact that estimation of measurement error models is unstable when only a small number of repeats is available. A transformation of the data improves the classification of the two groups.
Year: 2018 PMID: 29698490 PMCID: PMC5919515 DOI: 10.1371/journal.pone.0195939
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Average number of misclassifications.
| Method | LV = 3 | LV = 5 | LV = 7 |
|---|---|---|---|
Average number of misclassifications using (W)SCDA methods with different measurement error variance stabilization methods.
Fig 1. Measurement error variance.
Measurement error variance (Y-axis) as a function of the mean ratio (X-axis) for the amines and three lipid groups (TG, PE, LPC), for the raw data and after SQRT, LOG and GLOG transformation, together with the Rocke-Lorenzato (RL) estimates of the measurement error variance (in black) and the median errors. Within each column, each color represents a metabolite; circles of the same color come from different samples, with the error variance of each sample estimated from its replicated analyses.
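The transformations named in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the GLOG form `ln(x + sqrt(x^2 + lam))` and the two-component variance function are the commonly cited textbook forms, and the names `lam`, `sd_additive` and `sd_multiplicative` as well as the example levels are assumptions introduced here.

```python
import numpy as np

def sqrt_transform(x):
    """Square root (SQRT) transformation."""
    return np.sqrt(x)

def log_transform(x):
    """Natural log (LOG) transformation; requires x > 0."""
    return np.log(x)

def glog_transform(x, lam):
    """Generalized log (GLOG): ln(x + sqrt(x^2 + lam)).

    lam >= 0 is a tuning constant (hypothetical name); for x much larger
    than sqrt(lam) the result approaches ln(2x).
    """
    return np.log(x + np.sqrt(x ** 2 + lam))

def rl_variance(mean, sd_additive, sd_multiplicative):
    """Two-component error variance for y = mean * exp(eta) + eps.

    eta ~ N(0, sd_multiplicative^2) and eps ~ N(0, sd_additive^2), so the
    variance is constant near zero and grows with mean^2 at high levels.
    """
    s2 = sd_multiplicative ** 2
    return sd_additive ** 2 + mean ** 2 * np.exp(s2) * (np.exp(s2) - 1.0)

# Hypothetical measurement levels, only to show the variance growing with level.
levels = np.array([1.0, 10.0, 100.0])
print(rl_variance(levels, sd_additive=0.5, sd_multiplicative=0.1))
```

A variance-stabilizing transformation is useful here because the modelled variance rises with the measurement level, which is the heterogeneity the abstract describes.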
Fig 2. Amine and lipid levels.
Amine levels and lipid levels for 15 diabetic obese individuals and 16 healthy obese individuals. Each color represents a different metabolite. The amine with large values for both groups is L-Glutamine.
Fig 3. SCA loadings.
SCA loading 1 (X-axis) versus SCA loading 2 (Y-axis) for centered data (left column) or autoscaled data (right column), additionally block-scaled, after square root (SQRT) transformation (top row), LOG transformation (middle row) and GLOG transformation (bottom row). The amino acids are shown in red and the lipids in black. L-Glutamine, the amine with large values, is indicated in all plots.
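The preprocessing steps named in this caption (centering, autoscaling, block scaling) can be sketched as below. This is only an illustration under assumptions: block scaling is taken here to mean dividing each block by its Frobenius norm so that both platforms contribute equal total sum of squares, which is one common convention and may differ from the authors' choice. The data are random placeholders with 31 rows (15 + 16 individuals) and arbitrary numbers of variables.

```python
import numpy as np

def center(X):
    """Subtract the column means."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Center each column and divide by its sample standard deviation."""
    Xc = center(X)
    return Xc / Xc.std(axis=0, ddof=1)

def block_scale(X):
    """Scale a whole block to unit Frobenius norm (assumed convention)."""
    return X / np.linalg.norm(X)

# Placeholder data: 31 individuals, two hypothetical platform blocks.
rng = np.random.default_rng(0)
amines = rng.lognormal(size=(31, 5))
lipids = rng.lognormal(size=(31, 8))

# SQRT transformation, then autoscaling, then block scaling, then fusion.
blocks = [block_scale(autoscale(np.sqrt(b))) for b in (amines, lipids)]
fused = np.hstack(blocks)
```

Without the block-scaling step, the block with more variables (or larger values, when only centering is used) would dominate the simultaneous component model.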
Fig 4. Effect of measurement error on SCA loadings after MALS and for the weighted SCA model.
The loading of PC1 is on the X-axis and the loading of PC2 on the Y-axis. After MALS, centering (top row) or autoscaling (second row) is applied. The bottom row shows the loadings of the weighted SCA model. Amine loadings are shown in red and lipid loadings in black. Both the median error model (left column) and the Rocke-Lorenzato error model (right column) are explored. L-Glutamine, the amine with large values, is indicated in all plots.
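The idea of a weighted sum of residuals can be illustrated with a small sketch. It only evaluates such a criterion for a given low-rank fit and does not reproduce the weighted SCA fitting procedure; weighting each squared residual by the inverse of its error variance is an assumption made for this example, and all names and data are hypothetical.

```python
import numpy as np

def weighted_ssq(X, scores, loadings, error_var):
    """Sum of squared residuals of X - scores @ loadings.T,
    each residual divided by its (assumed known) error variance."""
    residuals = X - scores @ loadings.T
    return float(np.sum(residuals ** 2 / error_var))

# Placeholder data and an ordinary rank-2 fit from a truncated SVD.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :2] * s[:2]
loadings = Vt[:2].T

# Equal error variances reduce to the ordinary residual sum of squares.
uniform = weighted_ssq(X, scores, loadings, np.ones_like(X))

# A variable with a tenfold larger error variance counts less in the criterion.
noisy_last = np.ones_like(X)
noisy_last[:, -1] = 10.0
downweighted = weighted_ssq(X, scores, loadings, noisy_last)
```

A model fitted by minimizing such a criterion is pulled less towards variables with large measurement error, which is the intent of the modelling approach described in the abstract.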
Selected variables.
| Raw + center | Raw + autoscaled | Sqrt + center | Sqrt + autoscaling | Log + center | Log + autoscaling | Glog + center | Glog + autoscaling |
|---|---|---|---|---|---|---|---|
| PC.36.2 | L-leucine | Glycine | L-Leucine | 3-Methyl histidine | L-Leucine | L-glutamic acid | L-Leucine |
| PC.34.2 | Glycine | L-proline | Glycine | L-Glutamic acid | Glycine | L-alpha-aminobutyric acid | Glycine |
| PC.36.3 | DL3aminoisobutyric acid | L-arginine | N6N6N6trimethyllysine | Citrulline | N6N6N6trimethyllysine | 3-Methyl histidine | N6N6N6trimethyllysine |
| TG.52.2. | L-arginine | L.glutamicacid | DL3aminoisobutyric acid | L-pipecolic acid | L-arginine | Glycine | L-arginine |
| L-lysine | L2-aminoadipicacid | L-Threonine | L-Arginine | Beta alanine | L2-aminoadipicacid | Citrulline | L2-aminoadipicacid |
| TG.50.1 | Epinephrine | PC.36.2 | L2-aminoadipicacid | Glycine | 3-Methyl histidine | L-arginine | 3-Methyl histidine |
| L-Arginine | N6N6N6trimethyllysine | 3-Methylhistidine | 3-methylhistidine | L-methionine sulfoxide | DL3aminoisobutyric acid | TG.58.1 | DL3aminoisobutyric |
| L-Proline | L-glutamic acid | PC.36.3 | epinephrine | L-alpha-aminobutyric acid | epinephrine | DL3aminoisobutyric acid | L.glutamicacid |
| SM.d18.1.16.0. | 3-Methylhistidine | Citrulline | L-glutamic acid | L-Arginine | L.glutamicacid | L2-aminoadipicacid | epinephrine |
| TG.52.1 | Beta-alanine | L-glutamine | Beta-alanine | LPE2.26 | L-valine | TG.48.4 | L-valine |
Top 10 selected variables in the 25 cross-validation models, with 5 principal components, for the transformation methods.
Selected variables for MALS methods.
| MALS RL + center | MALS RL + autoscaled | MALS median + center | MALS median + autoscaled | Modeling (median) | Modeling (RL) |
|---|---|---|---|---|---|
| TG5.22 | Taurine | L-proline | Taurine | PC.36.2 | PC.36.2 |
| PC.34.2 | DL3aminoisobutyric acid | TG.52.2 | 3-methylhistidine | PC.34.2 | PC.34.2 |
| PC.36.2 | L-leucine | TG.52.3 | N6N6N6trimethyllysine | L-proline | L-proline |
| PC.36.3 | N6N6N6trimethyllysine | PC.36.4 | PCO.36.2 | PC.36.3 | PC.36.3 |
| TG.52.3. | L-arginine | TG.50.1 | TG.55.1 | TG.52.2 | TG.52.2 |
| L-proline | Glycine | Glycine | TG.52.1 | PC.36.4 | PC.36.4 |
| PC.34.1 | L-Valine | PC.50.2 | TG.57.1 | SM.d18.12.31 | SM.d18.12.31 |
| TG.50.1 | L-alpha-aminobutyric acid | L-arginine | SM.d18.12.31 | TG.50.2 | TG.50.2 |
| TG.50.2 | L-Threonine | L-serine | CE.18.3 | TG.52.3 | PC.38.4 |
| SM.d18.1.16.0. | L-kynurenine | L-threonine | L-Isoleucine | PC.38.4 | TG.52.3 |
Table 3: Top 10 selected variables in the 25 cross-validation models, with 5 principal components, for the filtering (MALS) and modeling methods.
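A top-10 list of this kind can be assembled by tallying how often each variable is selected across the cross-validation models. The sketch below assumes that ranking is by selection frequency, which the captions do not state explicitly; the function name and the toy selections are hypothetical.

```python
from collections import Counter

def top_selected(selections, n=10):
    """Rank variables by the number of models in which they were selected.

    selections: one list of variable names per cross-validation model.
    Each variable counts at most once per model.
    """
    counts = Counter(v for model in selections for v in set(model))
    return counts.most_common(n)

# Toy example with three models instead of 25.
selections = [
    ["PC.36.2", "Glycine"],
    ["PC.36.2", "L-proline"],
    ["PC.36.2", "Glycine"],
]
ranking = top_selected(selections, n=2)
```

With the 25 cross-validation models of the study, `selections` would hold 25 lists and `n=10` would give one column of the tables above.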