Miriam Pérez-Cova, Stefan Platikanov, Dwight R Stoll, Romà Tauler, Joaquim Jaumot.
Abstract
Chemometric methods based on the analysis of variance (ANOVA) allow evaluation of the statistical significance of the experimental factors used in a study. However, classical multivariate ANOVA (MANOVA) has a number of requirements that make it impractical for dealing with metabolomics data. For this reason, in recent years, different options have appeared that overcome these limitations. In this work, we evaluate the performance of three of these multivariate ANOVA-based methods, namely ANOVA simultaneous component analysis (ASCA), regularized MANOVA (rMANOVA), and group-wise ANOVA-simultaneous component analysis (GASCA), in the framework of metabolomics studies. Our main goals are to compare these ANOVA-based approaches and to evaluate their performance on experimentally designed metabolomic studies, both in finding the significant factors and in identifying the most relevant variables (potential markers) from the obtained results. To evaluate the statistical approaches, two experimental data sets with different design complexity were generated using liquid chromatography coupled to mass spectrometry (LC-MS). Results show that the three considered ANOVA-based methods perform similarly in detecting statistically significant factors. However, the relevant variables identified by GASCA seem to be more reliable, as they are strongly similar to the variables detected by the widely used partial least squares discriminant analysis (PLS-DA) method.
Keywords: ANOVA; ASCA; GASCA; biomarkers; feature detection; metabolomics; rMANOVA
Year: 2022 PMID: 35630781 PMCID: PMC9147242 DOI: 10.3390/molecules27103304
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Summary of the statistical assessment study for the considered datasets showing obtained p-values for the different ANOVA-based approaches.
| Data matrix | Dataset | Experimental factor | ASCA | rMANOVA | GASCA |
|---|---|---|---|---|---|
| TIC | Yeast–MS negative ionization mode | Type of lipid extraction | 0.0001 (0.0007 *) | 0.0001 | 0.039 * |
| | Yeast–MS positive ionization mode | Type of lipid extraction | 0.0001 | 0.0001 | 0.001 * |
| Features | Yeast–MS negative ionization mode | Type of lipid extraction | 0.0001 | 0.0001 | 0.002 * |
| | Yeast–MS positive ionization mode | Type of lipid extraction | 0.0001 | 0.0001 | 0.002 * |
| | Zebrafish embryos–BPA exposure | Exposure concentration (Control vs. Low) | 0.0001 | 0.0001 | 0.09 |
| | | Exposure concentration (Control vs. High) | 0.0001 | 0.0001 | 0.10 |
| | | Exposure concentration (Control vs. Low vs. High) | 0.0001 | 0.0001 | 0.01 |
| | Zebrafish embryos–E2 exposure | Exposure concentration (Control vs. Low) | 0.4472 | 0.0001 | 0.47 |
| | | Exposure concentration (Control vs. High) | 0.0001 | 0.0001 | 0.22 |
| | | Exposure concentration (Control vs. Low vs. High) | 0.0093 | 0.0001 | 0.35 |
* Balanced data (a sample was eliminated from the set).
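The record does not state how these p-values were computed. ASCA significance is commonly assessed with a permutation test, and the sketch below assumes that approach for a single factor: it builds the effect matrix from level means, applies PCA to it, and compares the observed effect sum of squares with the values obtained after shuffling the labels. The function name `asca_one_factor` and its parameters are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def asca_one_factor(X, labels, n_perm=999, seed=0):
    """One-factor ASCA sketch (assumed permutation test, not the authors' code).

    X      : (samples x variables) data matrix
    labels : factor level of each sample
    Returns PCA scores and loadings of the effect matrix and a permutation p-value.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)  # remove the grand mean of every variable

    def effect_matrix(lab):
        # Replace each sample by the mean profile of its factor level
        E = np.zeros_like(Xc)
        for level in np.unique(lab):
            mask = lab == level
            E[mask] = Xc[mask].mean(axis=0)
        return E

    E = effect_matrix(labels)
    ssq_obs = np.sum(E ** 2)  # observed effect sum of squares

    # Simultaneous component analysis step: PCA of the effect matrix via SVD
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    scores, loadings = U * s, Vt.T

    # Permutation test: how often does a random labelling explain as much?
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(labels)
        if np.sum(effect_matrix(permuted) ** 2) >= ssq_obs:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return scores, loadings, p_value
```

With this convention the smallest attainable p-value is 1/(n_perm + 1), so the number of permutations sets the resolution of the reported significance.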
Figure 1. Exploration of the scores generated by the ANOVA-based methods: values of the first component. (A) TICs yeast positive. Samples are coloured according to the factor studied: green bars, extraction A (phospholipids); blue bars, extraction B (sphingolipids). (B) Zebrafish embryos exposed to low-dose BPA. Samples are coloured according to the factor studied: green bars, control samples; blue bars, low-dose BPA treatment.
Figure 2. Comparison of the loadings obtained by PLS-DA variable selection methods and ANOVA-based methods. (A) TICs yeast positive PLS-DA profiles: VIP scores and selectivity ratio; (B) TICs yeast positive ANOVA-based approaches: ASCA, rMANOVA, and GASCA loadings; (C) Zebrafish embryos exposed to low-dose BPA PLS-DA profiles: VIP scores and selectivity ratio; (D) Zebrafish embryos exposed to low-dose BPA ANOVA-based approaches: ASCA, rMANOVA, and GASCA profiles. In each plot, profiles were normalized to an equal area so that they can be represented on the same scale. Shaded boxes mark regions with a high number of relevant variables.
Figure 3. Venn diagrams summarizing the relationships among the 50 selected variables detected for each data set. (A) TICs matrix for yeast positive MS ionization mode; (B) Features matrix for yeast positive MS ionization mode; (C) Zebrafish embryos exposed to low-dose BPA; and (D) Zebrafish embryos exposed to high-dose BPA.
Figure 4. Comparison of the number of coincident variables detected by PLS-DA and the considered ANOVA-based methods. The maximum possible number of coincidences is 50.
Figure 5. Workflow of the data analysis strategy from the MS raw data acquisition to the statistical assessment.
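The coincidence count in Figure 4 can be illustrated with a short sketch. It assumes that each method yields one importance value per variable (for example VIP scores for PLS-DA, or loadings for an ANOVA-based method) and that the 50 variables with the largest absolute values are selected. The helper names `top_k_variables` and `coincident_count` are hypothetical, and the authors' actual selection rule may differ.

```python
import numpy as np

def top_k_variables(importance, k=50):
    """Indices of the k variables with the largest absolute importance."""
    magnitude = np.abs(np.asarray(importance, dtype=float))
    order = np.argsort(magnitude)[::-1]  # largest magnitude first
    return set(order[:k].tolist())

def coincident_count(reference, other, k=50):
    """Number of variables shared by the top-k sets of two methods."""
    return len(top_k_variables(reference, k) & top_k_variables(other, k))
```

Because both sets contain k variables, the count ranges from 0 (no shared variables) to k (identical selections).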
Summary of the main advantages, limitations, and opportunities of the considered ANOVA-based methods.
| | ASCA | rMANOVA | GASCA |
|---|---|---|---|
| Advantages | Widespread use in metabolomics (reference multivariate statistical method) | Best of both worlds (the model falls between MANOVA and ASCA, depending on the data) | A good option for sparse data (i.e., metabolomic datasets) |
| Limitations | Most dissimilar matches with VIPs from PLS-DA when identifying significant variables | Dissimilar matches with VIPs from PLS-DA in the selection of relevant variables | Very strict for determination of significant factors (only factors with very low |
| Opportunities | Good choice when combined with PLS-DA (VIPs) for the determination of the significant variables | Good choice when aiming for a single method for both statistical analysis and selection of relevant variables (but further validation of the variables is desirable) | Good option for assessing the significance of variables and factors when big effects are encountered (very significant factors in the DOE) |