Guillaume Laurent Erny, Elsa Brito, Ana Bárbara Pereira, Andreia Bento-Silva, Maria Carlota Vaz Patto, Maria Rosario Bronze.
Abstract
Latent variables are used in chemometrics to reduce the dimension of the data. This is a crucial step with spectroscopic data, where the number of explanatory variables can be very high. Principal component analysis (PCA) and partial least squares (PLS) are the most common approaches. However, the resulting latent variables are mathematical constructs that do not always have a physicochemical interpretation. A new data reduction strategy, named projection to latent correlative structures (PLCS), is introduced in this manuscript. This approach requires a set of model spectra that are used as references. Each latent variable is the relative similarity of a given spectrum to a pair of reference spectra. The latent structure is obtained using every possible pairing of the references. The approach has been validated using more than 500 FTIR-ATR spectra from cool-season culinary grain legumes assembled from germplasm banks and breeders' working collections. PLCS has been combined with soft discriminant analysis to detect outliers that could be particularly suitable for a deeper analysis. This journal is © The Royal Society of Chemistry.
Year: 2021 PMID: 35479572 PMCID: PMC9040593 DOI: 10.1039/d1ra03359j
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 (A) Projection of all spectra on the latent plane obtained by measuring the dissimilarity with the reference spectra of lentil and faba bean; (B) projection onto the latent axis lentil <=> faba bean. The black line in (A) is the vector passing through the dissimilarity indexes of the two model spectra, that is, the latent axis faba bean <=> lentil.
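The projection described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: the dissimilarity is taken as 1 minus the Pearson correlation (the record does not state the measure used), the score is scaled so that the first reference sits at 0 and the second at 1, and all function names and the toy four-point "spectra" are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def dissimilarity(x, y):
    """Assumed dissimilarity index: 1 - Pearson correlation."""
    return 1.0 - pearson(x, y)


def latent_score(spectrum, ref_a, ref_b):
    """Project a spectrum onto the axis ref_a <=> ref_b (0 at ref_a, 1 at ref_b)."""
    d_ab = dissimilarity(ref_a, ref_b)
    # Coordinates on the latent plane: dissimilarity to each reference.
    point = (dissimilarity(spectrum, ref_a), dissimilarity(spectrum, ref_b))
    a = (0.0, d_ab)  # where ref_a itself falls on the plane
    b = (d_ab, 0.0)  # where ref_b itself falls on the plane
    axis = (b[0] - a[0], b[1] - a[1])
    rel = (point[0] - a[0], point[1] - a[1])
    return (rel[0] * axis[0] + rel[1] * axis[1]) / (axis[0] ** 2 + axis[1] ** 2)


def plcs_scores(spectrum, references):
    """One latent variable for every possible pair of reference spectra."""
    return {
        (na, nb): latent_score(spectrum, references[na], references[nb])
        for na, nb in combinations(sorted(references), 2)
    }


# Synthetic toy values, not real FTIR-ATR data.
references = {
    "lentil": [0.1, 0.5, 0.9, 0.4],
    "faba bean": [0.2, 0.8, 0.3, 0.6],
    "grass pea": [0.7, 0.2, 0.4, 0.9],
}
scores = plcs_scores([0.15, 0.6, 0.7, 0.5], references)
```

With three references the sketch yields three latent variables, one per pair; any two of them give a two-dimensional representation.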
Fig. 2 Two-dimensional PLCS of the beans FTIR-ATR spectra with the lentil <=> faba bean axis and the grass pea <=> faba bean axis.
Fig. 3 (A) PLCS and superimposed soft discriminant analysis using all the data, with the axes lentil <=> grass pea and chickpea <=> grass pea; (B) PLCS and superimposed discriminant analysis after removal of the outliers. The coloured surfaces represent, for each cluster, the zone where d ≤ dcrit. Crosses in (A) indicate outliers.
PLCS-softDA performances with different spectra pre-treatment
| Derivative of the spectra | None | 1st | 2nd | 3rd |
|---|---|---|---|---|
| ms | 3 | 3 | 0 | 3 |
| nm | 60 | 34 | 7 | 6 |
| no | 6 | 4 | 6 | 9 |
None: no derivative, no smoothing.
1st: Savitzky-Golay (SG) 1st derivative with polynomial order of 2 and frame length of 9.
2nd: SG 2nd derivative with polynomial order of 2 and frame length of 9.
3rd: SG 3rd derivative with polynomial order of 3 and frame length of 15.
ms, nm, ns and no are the numbers of samples assigned at d ≤ dcrit (ms: false positives, nm: multinomial classification, ns: true positives and no: unclassified).
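The SG (Savitzky-Golay) pre-treatments listed in these notes could be reproduced roughly as follows. This is a sketch that assumes SciPy's `savgol_filter`, with the "frame length" mapped to `window_length`; the record does not say which software the authors used, and the helper `sg_derivative`, the `SETTINGS` table and the toy signal are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

# Derivative order -> (frame length, polynomial order), as given in the table notes.
SETTINGS = {1: (9, 2), 2: (9, 2), 3: (15, 3)}


def sg_derivative(spectrum, order, frame_length, poly_order, spacing=1.0):
    """Savitzky-Golay derivative of a 1-D spectrum sampled at constant spacing."""
    return savgol_filter(
        spectrum,
        window_length=frame_length,
        polyorder=poly_order,
        deriv=order,
        delta=spacing,
    )


# Toy signal (not a real spectrum): a parabola, whose 2nd derivative is 2 everywhere.
x = np.arange(50, dtype=float)
toy = x ** 2
second = sg_derivative(toy, 2, *SETTINGS[2])
```

Because the filter fits a local polynomial, a polynomial order of 2 recovers the second derivative of a parabola exactly; on real spectra the frame length trades noise suppression against band distortion.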
Comparison of PCA, PLS and PLCS with softDA
| | PCA-softDA | PLS-softDA | PLCS-softDA |
|---|---|---|---|
| ms | 2 | 1 | 0 |
| nm | 84 | 57 | 7 |
| no | 4 | 1 | 6 |
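The counts ms, nm, ns and no can be tallied as in the sketch below. It rests on several assumptions that the record does not confirm: d is taken as the Euclidean distance to a class centroid, a single dcrit is shared by all classes, "multinomial classification" is read as a sample falling within dcrit of more than one class, and "false positive" as a sample falling within dcrit of one wrong class only. The function names and the toy points are hypothetical.

```python
from math import dist


def soft_assign(point, centroids, d_crit):
    """Return every class whose centroid lies within d_crit of the point."""
    return [name for name, c in centroids.items() if dist(point, c) <= d_crit]


def tally(points, labels, centroids, d_crit):
    """Count true positives (ns), false positives (ms), multiple (nm), unclassified (no)."""
    counts = {"ns": 0, "ms": 0, "nm": 0, "no": 0}
    for point, true_label in zip(points, labels):
        hits = soft_assign(point, centroids, d_crit)
        if not hits:
            counts["no"] += 1   # outside every class zone
        elif len(hits) > 1:
            counts["nm"] += 1   # inside more than one class zone
        elif hits[0] == true_label:
            counts["ns"] += 1   # inside the correct zone only
        else:
            counts["ms"] += 1   # inside a single, wrong zone
    return counts


# Toy two-class example in a two-dimensional latent space (synthetic values).
centroids = {"lentil": (0.0, 0.0), "faba bean": (1.0, 0.0)}
points = [(0.1, 0.0), (0.5, 0.0), (0.9, 0.0), (0.5, 2.0)]
labels = ["lentil", "lentil", "lentil", "faba bean"]
counts = tally(points, labels, centroids, d_crit=0.6)
```

Under this reading, samples counted in no are the natural outlier candidates, since they fall outside every class zone.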
Fig. 4 (A) PCA, (B) PLS and (C) PLCS two-dimensional representations of the second derivative of the FTIR-ATR spectra.
Validation
| | | PLCS 2 LV | PLCS 3 LV | PLCS 4 LV | PLS 2 LV | PLS 3 LV | PLS 4 LV | PLS 5 LV |
|---|---|---|---|---|---|---|---|---|
| Hard | True | 117 | 116 | 118 | 98 | 116 | 118 | 121 |
| | False | 5 | 6 | 4 | 24 | 6 | 4 | 1 |
| Soft | True | 109 | 111 | 107 | 56 | 101 | 103 | 101 |
| | False | 4 | 1 | 1 | 2 | 3 | 2 | 0 |
True: number of true positives in hard and soft classification.
False: number of false positives in hard and soft classification.
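The hard-classification rows of the validation table are easier to compare as fractions. In every column the hard true and false counts sum to 122, which suggests (an inference, not something stated in this record) that the validation set held 122 spectra. The short calculation below uses only the table values; the variable names are illustrative.

```python
# (true positives, false positives) for hard classification, copied from the table.
hard = {
    "PLCS 2 LV": (117, 5),
    "PLCS 3 LV": (116, 6),
    "PLCS 4 LV": (118, 4),
    "PLS 2 LV": (98, 24),
    "PLS 3 LV": (116, 6),
    "PLS 4 LV": (118, 4),
    "PLS 5 LV": (121, 1),
}

# Fraction of validation samples assigned to the correct class.
hard_rate = {model: t / (t + f) for model, (t, f) in hard.items()}

for model, rate in hard_rate.items():
    print(f"{model}: {rate:.3f}")
```

On these numbers PLCS with two latent variables classifies about 96% of the samples correctly, against about 80% for PLS with two latent variables; PLS needs three or more latent variables to reach a comparable rate.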