José Luis P Calle, Marta Barea-Sepúlveda, Ana Ruiz-Rodríguez, José Ángel Álvarez, Marta Ferreiro-González, Miguel Palma.
Abstract
Fruit juice production is one of the most important sectors of the beverage industry, and adulteration with cheaper juices is very common. This study presents a methodology that combines machine learning models with near-infrared spectroscopy to detect and quantify juice-to-juice adulteration. We evaluated 100% squeezed apple, pineapple, and orange juices adulterated with grape juice at different percentages (5%, 10%, 15%, 20%, 30%, 40%, and 50%). The spectroscopic data were combined with different machine learning tools to develop predictive models for juice quality control. Unsupervised techniques, specifically model-based clustering, revealed a grouping trend of the samples according to the type of juice. Supervised techniques such as random forest and linear discriminant analysis models detected the adulterated samples with an accuracy of 98% in the test set. In addition, a Boruta algorithm selected 89 variables as significant for adulterant quantification, and support vector regression achieved a regression coefficient of 0.989 and a root mean squared error of 1.683 in the test set. These results show the suitability of machine learning tools combined with spectroscopic data as a screening method for the quality control of fruit juices. A prototype application has also been developed to share the models with other users and facilitate the detection and quantification of adulteration in juices.
Keywords: adulteration; classification; fruit juices; machine learning; near-infrared spectroscopy; regression
Year: 2022 PMID: 35632260 PMCID: PMC9145498 DOI: 10.3390/s22103852
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. BIC value obtained as a function of the number of clusters per Gaussian mixture model resulting from the model-based clustering analysis. The spectroscopic data matrix of the unadulterated samples was used for the analysis, i.e., D (76 × 4190).
Figure 2. Representation of the samples as a function of the first two components using the spectroscopic data matrix D (76 × 4190). The samples are colored and symbolized according to the group obtained by the model-based clustering (VVI distribution); the centroid of each group is represented by the respective larger symbol and its distribution is shown as an ellipse.
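As a rough illustration of the quantity plotted in Figure 1, the sketch below computes a BIC score for a diagonal Gaussian mixture. It assumes the sign convention of the R package mclust (BIC = 2 ln L − m ln n, so larger values are preferred) and that "VVI" follows mclust's naming for a diagonal covariance with varying volume and shape; neither assumption is confirmed by the text above. The function names, the two-feature setting, and the log-likelihood values are hypothetical.

```python
import math


def vvi_param_count(n_components: int, n_features: int) -> int:
    """Free parameters of a diagonal GMM with per-component variances.

    Assumed breakdown (mclust-style "VVI"): G - 1 mixing proportions,
    G * d means, and G * d diagonal variances.
    """
    if n_components < 1 or n_features < 1:
        raise ValueError("n_components and n_features must be positive")
    return (n_components - 1) + 2 * n_components * n_features


def bic_score(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC in the assumed 'larger is better' form: 2 ln L - m ln n."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    return 2.0 * log_likelihood - n_params * math.log(n_samples)


# Hypothetical log-likelihoods for 1 to 4 clusters on 76 samples described
# by 2 features (e.g., two component scores). These values are made up for
# illustration and are not taken from the study.
N_SAMPLES, N_FEATURES = 76, 2
hypothetical_loglik = {1: -420.0, 2: -350.0, 3: -300.0, 4: -295.0}

bic_by_k = {
    k: bic_score(ll, vvi_param_count(k, N_FEATURES), N_SAMPLES)
    for k, ll in hypothetical_loglik.items()
}
# Under the assumed convention, the preferred model has the highest BIC.
best_k = max(bic_by_k, key=bic_by_k.get)
print(bic_by_k, best_k)
```

With these made-up inputs the penalty term outweighs the small likelihood gain from a fourth cluster, which is the kind of trade-off a BIC curve is meant to expose.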
Accuracy and kappa results for the different classification algorithms applied to the complete spectroscopic data matrix, D (184 × 4190).
| Model | Hyperparameters (employed value) | Training Set Accuracy (%) | Training Set Kappa | Test Set Accuracy (%) | Test Set Kappa |
|---|---|---|---|---|---|
| | - | 100 | 1 | 97.67 | 0.9648 |
| | | 100 | 1 | 88.37 | 0.8139 |
| | | 100 | 1 | 97.67 | 0.9648 |
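The two metrics reported in this table can be sketched as follows. This is a minimal, standard-library illustration of accuracy and Cohen's kappa, assuming the "kappa" above is Cohen's kappa (the source does not say which variant was used). The class labels and predictions are hypothetical and do not reproduce the study's test set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sources pick the same
    # class if they were independent with these marginal frequencies.
    p_expected = sum(
        true_counts[c] * pred_counts[c] for c in true_counts
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels: 4 pure and 6 adulterated samples, with one
# adulterated sample misclassified as pure.
y_true = ["pure"] * 4 + ["adulterated"] * 6
y_pred = ["pure"] * 5 + ["adulterated"] * 5

toy_accuracy = accuracy(y_true, y_pred)
toy_kappa = cohen_kappa(y_true, y_pred)
print(toy_accuracy, toy_kappa)
```

Kappa is lower than accuracy here because part of the agreement would be expected by chance given the class frequencies, which is why the two columns are reported side by side.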
Results obtained for each regression method applied to the quantification of the global adulterant using the spectroscopic data matrix of all adulterated juice samples, D (96 × 89).
| Model | Hyperparameter | Training Set Performance | Test Set Performance |
|---|---|---|---|
| | 11 principal components | RMSE = 3.644 | RMSE = 4.388 |
| Support vector regression | | RMSE = 1.446 | RMSE = 1.683 |
| | | RMSE = 2.571 | RMSE = 7.223 |
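For reference, the RMSE values in this table can be read as errors in percentage points of adulterant. The sketch below shows how RMSE and a coefficient of determination could be computed. It assumes that the "regression coefficient" mentioned in the abstract refers to R², which the source does not state. The true levels reuse the adulteration percentages listed in the abstract; the predictions are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Adulteration levels from the abstract (% grape juice) and hypothetical
# model predictions that are each off by one percentage point.
true_levels = [5, 10, 15, 20, 30, 40, 50]
predicted_levels = [6, 9, 16, 19, 31, 39, 51]

toy_rmse = rmse(true_levels, predicted_levels)
toy_r2 = r_squared(true_levels, predicted_levels)
print(toy_rmse, toy_r2)
```

A large gap between training and test RMSE, as in the last row of the table, is usually read as a sign of overfitting.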