Maria Tufariello, Sandra Pati, Lorenzo Palombi, Francesco Grieco, Ilario Losito.
Abstract
This review takes a snapshot of the main multivariate statistical techniques and methods used to process data on the concentrations of wine volatile molecules extracted by solid-phase microextraction and analyzed using GC-MS. Hypothesis testing, exploratory analysis, regression models, and unsupervised and supervised pattern recognition methods are illustrated and discussed. Several applications in the wine volatolomic sector are described to highlight different interactions among the various matrix components and volatiles. In addition, the use of artificial intelligence-based methods is discussed as an innovative class of methods for validating wine varietal authenticity and geographical traceability.
Keywords: HS-SPME-GC-MS; artificial intelligence; multivariate statistical analysis; volatile compounds; wine
Year: 2022 PMID: 35406997 PMCID: PMC8997410 DOI: 10.3390/foods11070910
Source DB: PubMed Journal: Foods ISSN: 2304-8158
Figure 1. Number of published articles between 1998 and 2022 related to HS-SPME/wine/volatile.
Figure 2. Schematic diagram of different objectives and multivariate statistical analysis techniques used for HS-SPME/GC-MS data.
Overview of the statistical techniques, with their pros, cons, and applications to HS-SPME/GC-MS data.
| Name | Scope | Pros | Cons | Applications |
|---|---|---|---|---|
| M-ANOVA | Hypothesis testing | M-ANOVA allows a deeper analysis than ANOVA in determining changes introduced by a given factor. | It requires a larger number of samples than the number of variables. The extension of the analysis to N factors is more complex. Results can be misleading if the working assumptions are not respected. | Determination of significant differences between wine varieties [ |
| PCA | Hypothesis testing; Exploratory analysis; Unsupervised classification | Explains multivariate variance with a limited number of factors. It does not suffer from multi-collinearity between variables; on the contrary, it exploits it. It makes it possible to visualize both the similarity and dissimilarity between samples and the correlation and influence of variables. | Highly dependent on the pre-treatment of the data, e.g., standardization. Sensitive to outliers. The detection of orthogonal (uncorrelated) factors can lead to a misinterpretation of the true cause-effect relationship. Only Euclidean metrics can be considered. | Assessment of the authenticity of wines [ |
| PLS | Linear regression | It can be used in cases where the number of variables is greater than the number of samples. It handles multi-collinearity between variables well. | The interpretation of the results is more complex than that of the results of a simple multilinear regression. Results can be poor in the case of non-linear relationships between variables. | Correlation between VOCs and wine ageing [ |
| ANN | Non-linear regression; Supervised classification; Unsupervised classification | Capable of handling strong non-linearity in the underlying model. They are robust to the presence of noise and outliers. They are unaffected by, and indeed exploit, multi-collinearity between variables. | A large number of samples is required. Results are more difficult to interpret. Validation of results is necessary to exclude overfitting. | Authenticity and traceability assessment [ |
| LDA | Supervised classification | Interpretability of results is straightforward. | It cannot be used if the number of variables exceeds the number of samples. Affected by multi-collinearity between variables. Results can be poor when groups are not linearly separable. | Varietal differentiation from volatile profiles [ |
| PLS-DA | Supervised classification | It can be used in cases where the number of variables is greater than the number of samples. It handles multi-collinearity between variables well. | The interpretation of the results is more complex than that of the results of a simple LDA. | Discrimination of selected wines with different geographical origin and type [ |
| HCA | Unsupervised classification | Straightforward interpretation. It allows different levels of clustering to be evaluated. It allows the use of metrics other than Euclidean to assess similarity and dissimilarity between samples. | The results are highly dependent on the pre-treatment of the data, e.g., whether or not standardization is applied. | Classification of high-quality wines according to their brand based on their volatile fingerprint [ |
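
The table notes that PCA (like HCA) is highly dependent on data pre-treatment such as standardization. The following is a minimal sketch of that effect, not the procedure used in any of the cited studies. The concentration values, the two compounds, and the helper names (`center`, `autoscale`, `covariance`, `first_pc`) are all hypothetical and were invented for illustration. The first principal component is obtained by simple power iteration on the covariance matrix, using only the Python standard library.

```python
import math

# Hypothetical concentrations (ug/L) for six wines; these values are invented
# for illustration and do not come from the review.
# Column 0: an abundant ester, column 1: a trace terpene.
X = [
    [52000.0, 12.0],
    [48000.0, 30.0],
    [61000.0, 18.0],
    [45000.0, 41.0],
    [57000.0, 25.0],
    [50000.0, 35.0],
]

def center(X):
    """Subtract the column mean from every value (mean-centering only)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def autoscale(X):
    """Mean-center each column, then divide by its sample standard deviation."""
    n = len(X)
    Xc = center(X)
    sds = [math.sqrt(sum(v * v for v in col) / (n - 1)) for col in zip(*Xc)]
    return [[v / s for v, s in zip(row, sds)] for row in Xc]

def covariance(Z):
    """Sample covariance matrix of data that is already mean-centered."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_pc(C, iters=200):
    """Leading eigenvector (PC1 loadings) of a symmetric matrix, by power iteration.

    The asymmetric start vector is an arbitrary choice meant to avoid starting
    orthogonal to the leading eigenvector in this small example.
    """
    p = len(C)
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# PCA on centered data works on the covariance matrix; PCA on autoscaled
# data works on the correlation matrix.
raw_pc1 = first_pc(covariance(center(X)))
scaled_pc1 = first_pc(covariance(autoscale(X)))

print("PC1 loadings, centered only:", [round(v, 4) for v in raw_pc1])
print("PC1 loadings, autoscaled:   ", [round(v, 4) for v in scaled_pc1])
```

With centering alone, PC1 is dominated almost entirely by the abundant compound, simply because its variance is numerically larger. After autoscaling, both compounds contribute with equal magnitude, which is always the case when only two standardized variables are considered. Which pre-treatment is appropriate depends on whether differences in absolute abundance are meant to carry weight in the analysis.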