Exploration of Spanish Olive Oil Quality with a Miniaturized Low-Cost Fluorescence Sensor and Machine Learning Techniques.

Francesca Venturini¹,², Michela Sperti³, Umberto Michelucci²,⁴, Ivo Herzig¹, Michael Baumgartner¹, Josep Palau Caballero⁵, Arturo Jimenez⁵, Marco Agostino Deriu³.

Abstract

Extra virgin olive oil (EVOO) is the highest quality of olive oil and is characterized by highly beneficial nutritional properties. The large increase in both consumption and fraud, for example through adulteration, creates new challenges and an increasing demand for developing new quality assessment methodologies that are easier and cheaper to perform. As of today, the determination of olive oil quality is performed by producers through chemical analysis and organoleptic evaluation. The chemical analysis requires advanced equipment and chemical knowledge of certified laboratories, and has therefore limited accessibility. In this work a minimalist, portable, and low-cost sensor is presented, which can perform olive oil quality assessment using fluorescence spectroscopy. The potential of the proposed technology is explored by analyzing several olive oils of different quality levels, EVOO, virgin olive oil (VOO), and lampante olive oil (LOO). The spectral data were analyzed using a large number of machine learning methods, including artificial neural networks. The analysis performed in this work demonstrates the possibility of performing the classification of olive oil in the three mentioned classes with an accuracy of 100%. These results confirm that this minimalist low-cost sensor has the potential to substitute expensive and complex chemical analysis.

Keywords: artificial neural networks; fluorescence sensor; fluorescence spectroscopy; machine learning; olive oil; quality control

Year: 2021 PMID: 34066453 PMCID: PMC8148140 DOI: 10.3390/foods10051010

Source DB: PubMed Journal: Foods ISSN: 2304-8158

1. Introduction

Olive oil is an important commodity worldwide, and demand for it has grown substantially in recent years. Interest in the highest quality grade, extra virgin olive oil (EVOO), is due to its high nutritional value, its richness in bioactive molecules, and its health benefits, which stem from its content of anti-inflammatory and antioxidant substances [1]. The increased demand has, however, led to an increase in fraudulent activities such as adulteration. As a result, quality assessment of edible olive oil has become increasingly important. To develop a trusted means of control, the European Economic Community (EEC) has created regulations that define the categorization of olive oils according to several chemical properties, obtainable by accredited laboratories, and an organoleptic evaluation, obtainable by accredited panels, to guarantee quality [2]. For these reasons, quality control is complex and costly, and cannot easily be carried out at any desired moment during the product life cycle. An inexpensive tool for accessible analysis would boost consumers’ trust in the product and dramatically decrease production costs, while at the same time reducing the opportunities for fraud. Fluorescence spectroscopy has attracted a lot of interest in recent years as a fast, cost-efficient, and sensitive method to study the properties of vegetable oils, particularly olive oils [3,4]. This is because olive oils contain several naturally fluorescent molecules: pigments, such as chlorophyll and beta-carotene, phenolic compounds, such as tocopherol, and their oxidation products. The most frequently used techniques are the acquisition of excitation emission matrices (EEMs) and synchronous scanning [5]. Both take advantage of the multidimensional character of fluorescence spectroscopy to create a fingerprint that uniquely identifies and characterizes virgin olive oils [6,7].
Applications of these methods include discrimination of different quality grades of olive oils [8,9], detection of adulteration [10,11,12], monitoring of oxidation processes [13,14,15], shelf-life monitoring [16], and geographical origin authentication [17,18,19]. Extracting the information of interest from the spectral data can be a difficult task, depending on the type of data acquired, which may range from a single spectrum to the more complex EEMs, and on the specificity of the application. Several multivariate analysis techniques and classification methods have been successfully employed, for example principal component analysis (PCA), partial least squares regression (PLS) and PLS discriminant analysis (PLS-DA), linear discriminant analysis (LDA), k-nearest neighbors (k-NN), and random forest (RF), to mention only the most widely used. More recently, artificial neural networks (ANN) have proven to be a useful tool, particularly because they do not require pre-processing of the data or dimensionality reduction [20]. Complete overviews of the mentioned statistical and machine learning methods, including ANN, can be found in [4,21,22,23,24]. The acquisition of high-quality data, particularly of EEMs, and the necessary data post-processing require special instrumentation and knowledge, thus limiting the accessibility of these methods. To the best of the authors’ knowledge, no portable and low-cost fluorescence spectroscopy sensors for quality assessment of olive oils are available so far. This work presents a minimalist sensor for olive oil quality assessment based on fluorescence spectroscopy and shows how it can be used, with several machine learning methods, to perform classification without any sample preparation and without any pre-processing of the acquired data. This paper makes three main contributions. Firstly, a new miniaturized and low-cost fluorescence sensor is presented.
The sensor is demonstrated by using it to produce data (fluorescence spectra) that can be used to successfully classify olive oil samples into three quality classes. Secondly, eight different machine learning methods are applied to the data acquired with the sensor, to demonstrate that the data are extremely effective in allowing machine learning models to learn to predict olive oil’s quality almost perfectly. A detailed comparison of the models used is discussed. Finally, the performance of ANNs is analyzed in detail. The study of ANNs’ performance is an important contribution since ANNs allow the application of explainability techniques to better understand how olive oil quality is linked to its chemical properties. This has the potential of completely superseding the classical chemical analysis.

2. Materials and Methods

2.1. Olive Oil Samples

All samples were obtained from the 2019–2020 harvest and provided by the producer Conde de Benalúa, Granada, Spain. In total, 27 olive oil samples were analyzed in this study, divided into 12 EVOO, 8 VOO, and 7 LOO (Table 1). The quality assessment of all the olive oils was performed by the producer according to the current European regulation for the commercial classification into EVOO, VOO, and LOO categories [2]. The quality is determined by both chemical parameters, such as acidity and peroxide index, and sensory parameters, such as the fruity median and the median defect.

Table 1

The number of olive oil samples in each quality class. EVOO: Extra virgin olive oil, VOO: Virgin olive oil, and LOO: Lampante olive oil.

Quality	Number of Samples
EVOO	12
VOO	8
LOO	7

All oils were stored in the dark at 20 °C for the entire duration of the measurements. For data acquisition, the samples were placed into commercial transparent 4-mL glass vials, taking care that no headspace was present, to reduce oxidation [25]. For a few selected oils, two samples were prepared from the same olive oil bottle to check the variability of the samples, and no difference was observed. All the measurements in this work were done on undiluted samples. It is well known that fluorescence in olive oil is subject to the inner filter effect [5], which includes both the attenuation of the excitation light due to strong absorption by the sample and the re-absorption of the fluorescence light by the sample itself, due to the overlap of the excitation and emission spectra. However, for the technology described in this work, this effect does not pose a problem. In fact, the fluorescence is intense enough that the strong absorption does not compromise the signal-to-noise ratio, and possible sample-dependent effects are learned and compensated for by the machine learning models.

2.2. Miniaturized Low-Cost Fluorescence Sensor

The sensor was designed to have as few elements as possible, to minimize complexity and cost. For the first time, the sensor itself contains neither the optical filters typical in fluorescence spectroscopy [26] nor lenses or other optical components. The minimalist sensor is shown schematically in Figure 1.

Figure 1

Schematics of the minimalist fluorescence sensor. Blue: Excitation light, red: Fluorescence light.

The excitation light was provided by a UV LED with emission at 395 nm (Kingbright Electronic Co, New Taipei City, Taiwan), driven by a current driver (MIC4801, Micrel Inc., San Jose, CA, USA) which allows regulating the current and, therefore, the illumination intensity. This excitation wavelength is advantageous because it is close to an absorption maximum in the absorption band of the different pigments present in olive oil, mainly chlorophylls and carotenoids [27,28,29]. The fluorescence was collected by a miniature spectrometer (STS-Vis, Ocean Optics, Dunedin, FL, USA) with a 1024-element CCD array, which acquires the entire spectrum in a single measurement. The resolution of the spectrometer was 16 nm. The spectrometer was placed at 90° with respect to the LED to prevent the LED light transmitted by the sample from reaching the spectrometer. Both the LED driver and the spectrometer are controlled by a Raspberry Pi. The optomechanics of the sensor are designed to minimize the amount of stray light from the excitation LED that is collected by the spectrometer. The LED current was chosen to give a good signal-to-noise ratio for a single spectrum with a short integration time while avoiding heating of the sample by the LED light. The sensor has a recess where standard 4-mL clear glass vials can be inserted. The sensor has a very small footprint of 12.5 cm × 12.5 cm × 5 cm and is shown in Figure 2.

Figure 2

Photo of the minimalist fluorescence sensor with olive oil samples in the glass vials and a bottle of olive oil.

2.3. Dataset Preparation and Description

All measurements were taken under ambient conditions on a single day so that differences in the aging of the olive oils would not influence the results. The samples are described in Section 2.1. For each olive oil sample, 20 measurements were performed, so that 27 samples × 20 measurements produced 540 spectra. The dataset, therefore, consists of 540 arrays, each having 1024 values (the number of pixels), whose elements are the measured intensities at the different pixel positions after background subtraction, normalized to have an average of zero and a standard deviation of one. This normalization is very common with neural networks, as it keeps the input data small enough to avoid numerical problems during the training phase [20]. The dataset contains one row for each repeated acquisition of each oil, with the spectrum points as features and the corresponding quality label.
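The dataset layout described above can be sketched in a few lines of NumPy. This is only an illustration on synthetic numbers, not the measured spectra. The text does not say whether the normalization is applied per spectrum or per pixel; the sketch assumes per spectrum, and the names `raw`, `background`, and `standardize` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the measurements: 27 oils x 20 repetitions = 540 spectra,
# each with 1024 pixel intensities (the real spectra are not reproduced here).
n_oils, n_reps, n_pixels = 27, 20, 1024
raw = rng.uniform(100.0, 4000.0, size=(n_oils * n_reps, n_pixels))
background = rng.uniform(90.0, 110.0, size=n_pixels)  # hypothetical dark spectrum


def standardize(spectra, background):
    """Subtract the background, then scale each spectrum to mean 0 and std 1.

    Assumption: the normalization is done per spectrum (row-wise).
    """
    corrected = spectra - background
    mean = corrected.mean(axis=1, keepdims=True)
    std = corrected.std(axis=1, keepdims=True)
    return (corrected - mean) / std


X = standardize(raw, background)

# Labels: 12 EVOO, 8 VOO, and 7 LOO oils (Table 1), repeated for the 20 measurements.
labels_per_oil = np.array(["EVOO"] * 12 + ["VOO"] * 8 + ["LOO"] * 7)
y = np.repeat(labels_per_oil, n_reps)

print(X.shape, y.shape)
```

Each row of `X` is one acquisition and `y` holds the matching quality label, which is the row-per-repetition layout described in the text.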

2.4. Machine Learning Classifiers

The quality of the data acquired with the sensor and the feasibility of using them for quality control were tested by applying different machine learning methods. The goal was to classify the oils into three categories: EVOO, VOO, and LOO. The performance of the sensor as a tool for quality control can be defined as its ability to generate data that allow a classification with an accuracy as close to 100% as possible. The following eight machine learning algorithms were tested: support vector machines (SVM), naïve Bayes (NB), multinomial logistic regression (MLR), PCA combined with LDA, decision tree (DT), ANN, RF, and k-NN. The implementation parameters and the references describing the methods are listed in Table 2. The methods were implemented using the Python library scikit-learn [30]. A detailed description of each algorithm goes beyond the scope of this paper, and the interested reader is referred to the listed references. The details of the ANN implementation are described in Section 2.5.

Table 2

List of the machine learning methods used, with implementation parameters and references to the methods description: Support vector machine (SVM), naïve Bayes (NB), multinomial logistic regression (MLR), principal component analysis (PCA) and linear discriminant analysis (LDA), decision tree (DT), random forest (RF), and k-nearest neighbor (k-NN).

Algorithm	Implementation Details	References
SVM	Regularization parameter C=1.0, kernel = radial basis function	[31,32,33]
NB	None	[32,34,35,36]
MLR	regularization penalty = l2, solver algorithm = Newton conjugate gradient	[32,37,38,39]
PCA + LDA	Number of components used with LDA: 2, 3, 4, 5, 10, 15, 20, 25, and 30	[32,40,41]
DT	Split quality criterion used = Gini impurity	[32,42,43,44,45]
RF	Number of trees = 100, split quality criterion used = Gini impurity	[32,42,46,47,48]
k-NN	Number of neighbors k = 3	[32,49,50]
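As a rough illustration, the settings in Table 2 could be expressed with scikit-learn estimators as follows. This is a sketch under assumptions, not the authors' code: Table 2 does not name the naïve Bayes variant, so `GaussianNB` is a guess; `random_state` is added only for reproducibility; and the data are synthetic. PCA combined with LDA is left out here because it requires a pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Settings taken from Table 2; all other parameters are scikit-learn defaults.
models = {
    "SVM": SVC(C=1.0, kernel="rbf"),
    "NB": GaussianNB(),  # assumption: the NB variant is not stated in Table 2
    "MLR": LogisticRegression(solver="newton-cg"),  # the L2 penalty is the default
    "DT": DecisionTreeClassifier(criterion="gini", random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
}

# Synthetic three-class data standing in for the spectra (not the paper's data).
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 50))
y = np.repeat(np.arange(3), 60)
X = centers[y] + 0.5 * rng.normal(size=(180, 50))

# Fit every model and record its accuracy on the data it was trained on.
train_accuracy = {}
for name, model in models.items():
    model.fit(X, y)
    train_accuracy[name] = model.score(X, y)

for name, acc in train_accuracy.items():
    print(f"{name}: {acc:.2f}")
```

The accuracies printed here are training accuracies on toy data and say nothing about the results reported for the olive oil spectra.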

2.5. Artificial Neural Network-Based Classifiers

For oil classification, a feed-forward neural network architecture was used. To find the best parameters of the neural network model (NNM), namely the number of layers, the number of neurons in each layer, and the number of epochs, a hyperparameter optimization was performed with a grid-search approach [20]. The number of layers was varied from one to three and the number of neurons in each layer from 2 to 32; the numbers of epochs tested were 350, 600, and 1000. The activation function of the hidden layers’ neurons is the rectified linear unit (ReLU) [20]:

$$\mathrm{ReLU}(x) = \max(0, x),$$

while for the output layer the softmax function [20] was used. The loss function used is the cross-entropy [20]:

$$L = -\sum_{i} \sum_{j=1}^{3} y_{ij} \log \hat{y}_{ij},$$

where the sum over $i$ is performed over all the observations in the mini-batch extracted from the training dataset used for the weight update, $y_{ij}$ assumes the value of one if observation $i$ is of class $j$ and zero otherwise, and $\hat{y}_{ij}$ is the predicted probability of observation $i$ being of the $j$-th class. The index $j = 1, 2, 3$ indicates the three expected classes: EVOO, VOO, and LOO. The NNM was trained using the optimizer Adaptive Moment Estimation (Adam) [51] with a mini-batch size of 32. The implementation was performed using the TensorFlow Python library. As will be discussed in the results section, the NNM that gave the best performance had three layers with 32 neurons each and was trained for 1000 epochs. Model performance was measured by the accuracy, calculated as the number of correctly classified oils divided by the total number of oils. All the models were trained with backpropagation.
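To make the activation and loss functions named above concrete, the following NumPy sketch writes out ReLU, softmax, and the cross-entropy summed over a mini-batch, together with a forward pass through a network of the best-performing size (three hidden layers of 32 neurons, three outputs). It is not the TensorFlow implementation used in the work: the weights are random, no training is performed, and all function names are illustrative.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def softmax(z):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Sum of -y_ij * log(p_ij) over mini-batch observations i and classes j."""
    return -np.sum(y_onehot * np.log(y_prob + eps))


def forward(x, weights, biases):
    """Feed-forward pass: ReLU hidden layers followed by a softmax output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return softmax(a @ weights[-1] + biases[-1])


rng = np.random.default_rng(0)

# 1024 input pixels, three hidden layers of 32 neurons, three output classes.
sizes = [1024, 32, 32, 32, 3]
weights = [rng.normal(scale=0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

# One synthetic mini-batch of 32 spectra with random one-hot labels.
x_batch = rng.normal(size=(32, 1024))
y_batch = np.eye(3)[rng.integers(0, 3, size=32)]

p = forward(x_batch, weights, biases)
loss = cross_entropy(y_batch, p)
print(p.shape, float(loss))
```

In practice, a library such as TensorFlow provides these pieces along with the Adam optimizer and backpropagation; the sketch only shows what the forward pass and the loss compute.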

2.6. External Validation of Models

To assess the performance of the machine learning models, these need to be applied to data not used during training and the resulting prediction tested against the expected results. For this purpose, the dataset was split into 80% used as training dataset, and 20% used for validation [20,32]. All the results reported in this work were obtained on the validation portion of the dataset. The accuracy is defined as the percentage of the olive oils of the validation dataset which are correctly classified. Since variation in the accuracy may arise from the specific split which was performed, the split and train process needs to be repeated several times [52]. In this work, the split and train process was repeated 100 times for all algorithms. Then, for all the methods, the average and standard deviation of the accuracy over 100 splits were calculated. These are the results described in Section 3.2.
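The repeated split-and-train procedure can be sketched as follows. This is a minimal illustration on synthetic data, assuming a plain random 80/20 split (the text does not say whether the split was stratified) and using k-NN with k = 3 as the example model; `repeated_holdout` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def repeated_holdout(make_model, X, y, n_repeats=100, test_size=0.2):
    """Repeat a random train/validation split and return mean and std of the accuracy."""
    accuracies = []
    for seed in range(n_repeats):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = make_model()  # a fresh, untrained model for every split
        model.fit(X_train, y_train)
        accuracies.append(model.score(X_val, y_val))
    accuracies = np.asarray(accuracies)
    return accuracies.mean(), accuracies.std()


# Synthetic three-class data standing in for the 540 spectra (not the paper's data).
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 64))
y = np.repeat(np.arange(3), 180)
X = centers[y] + 0.5 * rng.normal(size=(540, 64))

mean_acc, std_acc = repeated_holdout(lambda: KNeighborsClassifier(n_neighbors=3), X, y)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Creating a new model inside the loop keeps each of the 100 runs independent of the previous ones.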

3. Results and Discussion

In this section, firstly, the results of the measurements are presented. Then, the results of the classification using the different techniques are reported.

3.1. Spectral Response of the Olive Oils

The raw fluorescence spectra of selected EVOOs, VOOs, and LOOs are shown in Figure 3. In all the figures, each curve is a single spectrum with the background subtracted, without averaging or smoothing. The integration time is 1 s.

Figure 3

Fluorescence emission spectra of selected olive oils. Panel (A) five EVOOs, panel (B) five VOOs, and panel (C) five LOOs. Each curve shows a single spectrum without averaging or smoothing.

Figure 3A shows the fluorescence spectra of EVOOs. For clarity, the spectra of only 5 of the 12 oils are plotted. The spectra are characterized by a strong signal in the region between 650 nm and 750 nm, with an intense peak at ca. 678 nm and a weaker, broader one at ca. 720 nm, typical of chlorophyll and pheophytins [13,14,15,53]. The stronger peak does not always have the same spectral position and intensity, while the broader one varies only weakly between the samples. These observations are consistent with those previously reported and are attributed to the inner filter effect [28]. Notably, the spectra below 650 nm do not show any significant fluorescence intensity. This spectral region is usually attributed to other chemical constituents, such as vitamin E, and to hydrolysis and oxidation products [3,15]. The lack of a significant fluorescence signal in this region is due to the choice of the excitation wavelength: these compounds absorb in the UV, well below the 395 nm excitation peak used here. Depending on the purpose of the sensor, including an additional UV LED to also acquire the UV-excited fluorescence contributions to the spectrum could increase the performance by providing additional specific information. For the problem studied in this work, the strong fluorescence contribution between 650 nm and 750 nm proved to be enough to achieve 100% classification accuracy. For comparison, the fluorescence spectra of VOOs and LOOs are shown in Figure 3B,C. The VOOs show emission spectra that are similar to those of the EVOOs, with a stronger variability, particularly in the intensity of the broader shoulder at 720 nm. The variability between the spectra increases further in the LOOs. The fluorescence from EVOO and VOO samples is generally stronger than that from LOO samples, which is consistent with previously reported observations for LOOs obtained with synchronous fluorescence spectroscopy [9].

3.2. Classification with Machine Learning Methods

The results of the classification with all the machine learning methods are summarized in Table 3. The results are given as the average of the accuracy over 100 different splits and the standard deviation of the accuracy, as described in Section 2.6.

Table 3

Summary of results of the classification given by the average of the accuracy ā and its standard deviation σ; machine learning methods: Support vector machine (SVM), naïve Bayes (NB), multinomial logistic regression (MLR), principal component analysis (PCA) and linear discriminant analysis (LDA), decision tree (DT), random forest (RF), and k-nearest neighbor (k-NN).

Algorithm	Average Accuracy ā	Standard Deviation σ
SVM	0.51	0.07
NB	0.64	0.05
MLR	0.88	0.03
PCA + LDA (10 PCA Components)	0.93	0.02
DT	0.99	0.01
ANN	0.99	0.04
PCA + LDA (30 PCA Components)	0.999	0.006
RF	1.0	0.0
k-NN	1.0	0.0

As seen from Table 3, several methods reach an average accuracy of 99% or higher without any pre-processing, namely DT, ANN, PCA combined with LDA (30 components), RF, and k-NN. These results are better than those previously reported for the classification between VOO and LOO with Hierarchical Cluster Analysis (HCA) on EEMs and similar to what is obtained with PCA [9]. Unsurprisingly, the results obtained with SVM are poorer than those obtained with the other methods, as pre-processing is typically a key part of the analysis with this algorithm. In fact, in previous work, SVM was applied after pre-processing the data, for example with PCA, to obtain a good accuracy [54]. PCA with LDA was studied using an increasing number of PCA components: 2, 3, 4, 5, 10, 15, 20, 25, and 30. By using only 10 PCA components, LDA was able to reach an accuracy over 90%. With 30 components, the accuracy was over 99%. It is important to note that each spectrum (input to the PCA) consists of 1024 values (the pixels of the CCD of the spectrometer), so using 30 PCA components is equivalent to using only 2.9% of the number of features in the original spectra. To find the optimal architecture for the ANN, hyperparameter tuning was performed as described in Section 2.5. The evolution of the average accuracy and its standard deviation with increasing ANN complexity is shown in Figure 4. The vertical bars indicate the standard deviation calculated from the 100 different splits. Only the results obtained with 1000 epochs are shown. Increasing the number of epochs from 350 to 1000 improved the accuracy and reduced its standard deviation. Above 1000 epochs, the performance increase is smaller than what is obtained by changing the number of layers. Since for moderately complex networks the accuracy was 100% with 1000 epochs, the training was not performed for a larger number of epochs.
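The PCA plus LDA scan over the number of components can be sketched with a scikit-learn pipeline. The example below runs on synthetic 1024-feature data and a single 80/20 split, so its accuracies are not comparable with Table 3. The pipeline layout and the `random_state` values are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for 540 spectra with 1024 pixels each (not the paper's data).
rng = np.random.default_rng(0)
n_pixels = 1024
centers = rng.normal(size=(3, n_pixels))
y = np.repeat(np.arange(3), 180)
X = centers[y] + 2.0 * rng.normal(size=(540, n_pixels))

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Numbers of PCA components listed in the text.
component_counts = (2, 3, 4, 5, 10, 15, 20, 25, 30)
accuracy = {}
for n in component_counts:
    # PCA reduces each spectrum to n values; LDA classifies in that reduced space.
    clf = make_pipeline(PCA(n_components=n, random_state=0), LinearDiscriminantAnalysis())
    clf.fit(X_train, y_train)
    accuracy[n] = clf.score(X_val, y_val)

# 30 components out of 1024 pixels is about 2.9% of the original feature count.
feature_fraction = 30 / n_pixels
print(accuracy)
print(f"{feature_fraction:.1%}")
```

Wrapping both steps in one pipeline ensures that the PCA projection is fitted on the training portion only and then applied unchanged to the validation portion.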

Figure 4

Evolution of the average of the accuracy and its standard deviation with increasing ANN complexity. For each architecture the points indicate the average of the accuracy of 100 split and train runs, and the error lines indicate the standard deviation.

For very simple architectures, with only two neurons, the accuracy is below 60%. The use of eight neurons already improves the accuracy to above 80%. When using 32 neurons the accuracy is always above 90%, and increases to above 99% when using two layers. The increase from two to three layers does not affect the results significantly, as the accuracy is already at approximately 100%. This means that the ANN can always correctly identify the three classes of olive oil quality (EVOO, VOO, and LOO). The goal of this work is to demonstrate that the fluorescence sensor is able to generate data that can be used without any pre-processing or manual feature engineering to make the classification process as easy and automatic as possible. As seen from Table 3 this is the case. These results indicate without any doubt that the data acquired with this very simple and low-cost spectrometer contain sufficient information to allow the correct discrimination between the three quality classes with almost perfect accuracy.

4. Conclusions

This work presented a new type of compact and low-cost fluorescence sensor which allows high-quality data acquisition that can be reliably used for data processing or inference for classification purposes. The sensor is simple and was conceived to minimize size and cost so as to allow portability. The results demonstrate that a minimalist optical sensor based on fluorescence spectroscopy, combined with machine learning methods, can reliably distinguish between different qualities of olive oil: EVOO, VOO, and LOO. This new sensor has the advantage of being a portable, easy-to-use, and low-cost device, which works with undiluted samples, without any handling of the olive oils, such as dilution, and without any pre-processing of the data, thus simplifying the analysis to the maximum degree possible. Problems like strong absorption and the inner filter effect do not affect performance because they are learned and compensated for by the machine learning methods. Among the methods, the use of ANN is particularly important because it does not require pre-processing of data and allows the use of flexible explainability techniques to better optimize and understand the classification process. The problem investigated here is just one example of the many possible applications. The sensor can be used to solve other classification and regression problems. The details of the machine learning models are expected to be specific to the problem to be addressed.
