
Rapid detection of adulteration in powder of ginger (Zingiber officinale Roscoe) by FT-NIR spectroscopy combined with chemometrics.

Dai-Xin Yu¹, Sheng Guo¹, Xia Zhang², Hui Yan¹, Zhen-Yu Zhang¹, Xin Chen¹, Jiang-Yan Chen¹, Shan-Jie Jin², Jian Yang³, Jin-Ao Duan¹.

Abstract

Ginger powder (GP) is a popular spice worldwide. Due to its nutritional value, GP is an attractive target for adulteration, which is not easily detected. In this study, chromaticity analysis and Fourier transform near-infrared (FT-NIR) spectroscopy combined with chemometrics were used to identify and quantify adulterants in GP. The results showed that authentic and adulterated GPs cannot be completely distinguished by chromaticity analysis, whereas the optimized NIR spectra accurately distinguished authentic GPs from adulterated samples. Random forest and gradient boosting algorithms achieved the highest classification accuracy (100%). Moreover, quantitative models were successfully established to predict the adulteration level in GP. The optimal ratios of prediction to deviation (RPD) were 8.92, 13.68, 14.61, and 4.30 for pure GP and for GP adulterated with soybean flour, wheat flour, and corn starch, respectively. Overall, FT-NIR spectroscopy is a promising tool for quickly identifying potential adulteration in GP and tracking the types of adulterants.

Entities: Chemical

Keywords: Adulteration; Chemometrics; Ginger powder; NIR spectroscopy; Quantitative analysis

Year: 2022 PMID: 36211746 PMCID: PMC9532869 DOI: 10.1016/j.fochx.2022.100450

Source DB: PubMed Journal: Food Chem X ISSN: 2590-1575

Introduction

In recent years, food authenticity has become a common concern among consumers, covering several aspects including falsification of origin, adulteration, variety mixing, and mislabeling (Barreto et al., 2018, Yu et al., 2022). With the globalization and growing complexity of the food supply chain, authenticity problems have given rise to food fraud for economic gain in the production, manufacturing, processing, and distribution segments (Horn et al., 2021). Food fraud usually takes the form of partial substitution, addition, or tampering of food ingredients, or false labeling of geographical origin (Khodabakhshian et al., 2021). These practices not only pose potential health and safety risks to consumers but also create a credibility crisis for the food industry. As an important part of the food supply chain and the human diet, spices are widely used in food, beverages, medicines, cosmetics, and other products (Yan et al., 2021). Market statistics suggest that the global spices and herbs market is estimated at approximately US$ 79 billion in 2022 and that the global spice market is likely to expand to about US$ 126 billion by the end of 2023 (https://www.statista.com/statistics/876234/global-seasoning-and-spices-market-size/). For ease of use and portability, spices are currently circulated mostly as powders, which gives unscrupulous merchants opportunities for illegal profit. Adulterating a spice with similar powders while preserving its original form, color, and odor, so as to reduce costs and increase revenue, is a common form of food fraud (Yu et al., 2022). In other words, mixing any other substance into spices is considered food falsification, and this practice has attracted great concern from industry, governments, and standard-setting organizations (Jahanbakhshi et al., 2021). 
Ginger (Zingiber officinale Roscoe) is a spicy condiment that is often eaten fresh or made into dried slices and powder for flavoring (Yu et al., 2022). Owing to its unique flavor, high nutritional value, and medicinal value, ginger is extensively consumed as a flavoring agent, dietary supplement, and herbal medicine, and as a raw material for many desserts and beverages, such as ginger tea, ginger candies, and ginger beer (Srinivasan, 2017). Brown sugar ginger tea made from ginger slices or powder is widely consumed in traditional Chinese medicine to dispel cold and improve immunity. Modern studies have found that gingerols, shogaols, terpenes, and sugars are the key bioactive constituents of ginger (Zhang et al., 2021), and these compounds have been reported to have beneficial effects on human health, including antioxidant, anti-inflammatory, antimicrobial, and immunomodulatory activities (Kiyama, 2020). China is the world's major ginger-producing and -exporting country; besides being sold fresh, ginger there is also processed into ginger powder (GP) in large quantities. According to the UN International Trade Database, the average annual export volume of GP from China was 67,928 tons from 2017 to 2021, far exceeding that of other countries (https://comtrade.un.org/data/). With the substantial growth of the GP market at home and abroad, adulteration has become easy to achieve. Cheaper or lower-quality edible powders are commonly used as adulterants. Once added, they cannot be identified by visual examination; they reduce the nutritional and medicinal value of GP and pose a high risk to consumers and to normal market activity. Hence, it is vital to establish a rapid and effective method to distinguish authentic GPs from adulterated ones. 
To date, multiple targeted methods, such as microscopy (Kiani et al., 2019), liquid chromatography (Campmajó et al., 2021), mass spectrometry (Mohamed et al., 2021), and elemental fingerprint analysis (Fiamegos et al., 2021), have been successfully applied to identify adulteration of spices or herbs. Despite their high precision and stability, these analytical methods are time-consuming, laborious, and environmentally unfriendly. As a common spectroscopic technique, near-infrared spectroscopy (NIRs) has been widely used in quality control and origin tracing of foods and herbs (Chang et al., 2020). Because it is fast and non-destructive, NIRs typically provides characteristic spectral information about a sample within a few seconds and indirectly reflects its chemical composition (Wu et al., 2022). With the development of chemometrics, NIRs coupled with multivariate statistical analysis has been successfully applied to qualitative and quantitative analyses of food, agricultural products, and herbal medicines (Xue et al., 2021). Various studies have demonstrated the potential of NIRs to detect adulteration of spices and food powders, such as turmeric powder adulterated with corn flour (Kar et al., 2019), green banana flour adulterated with wheat flour (Ndlovu et al., 2021), and paprika powder mixed with potato starch and acacia gum (Oliveira et al., 2020). Compared with Fourier transform infrared spectroscopy (FT-IR), NIRs offers even better quantitative performance. Furthermore, the NIR region is dominated by weak overtones, which result in lower molar absorptivity and deeper penetration of NIR radiation into the sample, making NIRs more suitable for analyzing heterogeneous samples such as adulterated powders (Nagy et al., 2022). 
Although a recent study showed that chickpea adulteration in ginger powder can be identified using image recognition (Jahanbakhshi et al., 2021), that study did not build a quantitative model of the extent of GP adulteration. NIRs can achieve simultaneous characterization and quantification and could therefore fill this gap in quantitative studies of adulterated GP. Moreover, few reports have combined NIR spectroscopy with machine learning algorithms to discriminate GPs from their adulterated counterparts. Hence, the main objective of this study was to investigate the feasibility of using NIR spectroscopy combined with chemometrics to identify and quantify three common adulterants in GP: corn (Zea mays L.) starch, wheat (Triticum aestivum L.) flour, and soybean (Glycine max (L.) Merr.) flour. The specific aims were to 1) establish classification models using supervised and unsupervised pattern recognition methods to distinguish authentic from adulterated GP samples, and 2) develop and optimize quantitative calibration models using partial least squares regression (PLSR) on NIR spectra to predict the concentrations of GP and its adulterants in the mixtures.

Materials and methods

Materials

All fresh ginger samples were collected from Luoping County, Yunnan Province, China, and were identified as the fresh rhizomes of Zingiber officinale Roscoe by Prof. Jin-ao Duan of Nanjing University of Chinese Medicine. In terms of market sales and brand value, GP produced in Yunnan Province has the highest market share and is often counterfeited by partial adulteration. The corn starch (CS), soybean flour (SF), and wheat flour (WF) used as adulterants were purchased from Qingdao Wugu-kang Food and Nutrition Technology Co., Ltd. (Qingdao, China), and all passed food quality inspection to ensure their authenticity and reliability.

Preparation of adulterated samples

Fresh ginger was washed, sliced, and dried at a constant 55 °C for 24 h in an electric thermostatic drying oven (DHA-9070A, Shanghai Jinghong Experimental Equipment Co., Ltd., Shanghai, China). The dried ginger was ground into powder with a high-speed grinder (FW-80, Tianjin Taisite Instrument Co., Tianjin, China), and all powder was passed through a 50-mesh sieve (355 μm ± 13 μm). Adulterated GPs were then prepared by adding CS, SF, or WF to pure GP at five levels: 10 %, 20 %, 30 %, 40 %, and 50 %. Beyond 50 %, adulteration becomes obvious and can typically be detected by sight or taste. All samples were placed in centrifuge tubes and mixed uniformly for 3 min on a vortex shaker. Six samples were prepared for each adulterant at each level, giving a total of 90 adulterated samples; representative samples are shown in Fig. 1A. In addition, pure ginger powder (GP, n = 12), pure corn starch (PCS, n = 3), pure soybean flour (PSF, n = 3), and pure wheat flour (PWF, n = 3) were used as controls for comparison with the adulterated samples.
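The sample design described above (three adulterants × five levels × six replicates = 90 adulterated samples) can be enumerated as a quick consistency check. This is a minimal sketch that assumes the levels are mass fractions (w/w) and uses a hypothetical 10 g batch mass; neither the basis of the percentages nor the batch mass is stated in the text.

```python
from itertools import product

# Adulterants and levels from the sample-preparation description.
ADULTERANTS = ("CS", "SF", "WF")
LEVELS = (0.10, 0.20, 0.30, 0.40, 0.50)
REPLICATES = 6


def adulteration_design(batch_mass_g=10.0):
    """Enumerate every adulterated sample with its component masses.

    batch_mass_g is a hypothetical batch size, and the levels are treated
    as mass fractions (w/w); both are assumptions for illustration.
    """
    design = []
    for adulterant, level, rep in product(ADULTERANTS, LEVELS, range(1, REPLICATES + 1)):
        adulterant_g = round(level * batch_mass_g, 4)
        design.append({
            "adulterant": adulterant,
            "level": level,
            "replicate": rep,
            "adulterant_g": adulterant_g,
            "ginger_g": round(batch_mass_g - adulterant_g, 4),
        })
    return design


samples = adulteration_design()
print(len(samples))  # 90
```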

Fig. 1

Ginger powder samples (A) with different adulteration levels (10%, 20%, 30%, 40%, and 50%), A-CS: adulterated with corn starch, A-SF: adulterated with soybean flour, A-WF: adulterated with wheat flour, GP: pure ginger powder, PSF: pure soybean flour, PWF: pure wheat flour, PCS: pure corn starch; The L* values (B), a* values (C), b* values (D), and ΔE values (E) with different adulteration levels (10%, 20%, 30%, 40%, and 50%) by chromaticity analysis; Score plots of the PCA model of pure GP and adulterated GPs (F).

Color measurements

The color characteristics of the pure and adulterated powders were measured with a chroma analyzer (CM-5, KONICA MINOLTA, Tokyo, Japan). The spectrophotometer was calibrated with a white plate, and each powder was spread evenly in the dedicated sample dish so that all samples had the same weight and thickness. Following the CIELAB color space of the International Commission on Illumination, the L*, a*, and b* values were recorded (Biancolillo et al., 2022). The total color difference (ΔE) was used to describe the color change at different adulteration levels:

$$\Delta E = \sqrt{(L^* - L_0^*)^2 + (a^* - a_0^*)^2 + (b^* - b_0^*)^2}$$

where $L_0^*$, $a_0^*$, and $b_0^*$ are the values measured for pure ginger powder.
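The ΔE calculation in this section, assuming the standard CIE76 Euclidean form with pure GP as the reference, can be sketched as follows. The readings in the example are hypothetical and are not the measured values in Table S1.

```python
import math


def delta_e(lab, lab_ref):
    """CIE76 total color difference between a sample and a reference.

    lab and lab_ref are (L*, a*, b*) tuples; in this study the reference
    would be pure ginger powder.
    """
    return math.sqrt(sum((x - x0) ** 2 for x, x0 in zip(lab, lab_ref)))


# Hypothetical readings, not values from Table S1.
pure_gp = (70.0, 4.0, 30.0)
adulterated = (73.0, 4.0, 26.0)
print(delta_e(adulterated, pure_gp))  # 5.0
```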

Acquisition of FT-NIR spectra

An Antaris™ II FT-NIR spectrophotometer (Thermo Fisher Scientific Co., USA) equipped with a rotating sample-cup spinner, an extended InGaAs detector, and a tungsten halogen light source was used to collect the NIR spectra of the samples. Result Software (Antaris™ II System, Thermo Fisher Scientific Co., USA) was used for data acquisition. Each spectrum was the average of 32 scans over 10000–4000 cm−1 at a resolution of 8 cm−1. All analyses were performed at room temperature (18–25 °C) and a relative humidity of 30 %. A desiccant was used to keep the instrument dry and eliminate the effect of moisture, improving the reliability of the NIR spectra. The NIR spectrum of each sample was recorded in triplicate, and the average spectrum was used for further analysis.
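Two small operations implied by this acquisition setup are converting the wavenumber axis to wavelength (λ in nm = 10⁷ / ν in cm−1, so 10000–4000 cm−1 corresponds to 1000–2500 nm) and averaging the triplicate spectra. The sketch below assumes the replicates are stored as a (samples, replicates, points) array; that layout and the example data are hypothetical.

```python
import numpy as np


def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert wavenumber (cm^-1) to wavelength (nm): lambda = 1e7 / nu."""
    return 1e7 / np.asarray(wavenumber_cm1, dtype=float)


def average_replicates(spectra):
    """Average replicate scans.

    spectra has the assumed shape (n_samples, n_replicates, n_points);
    the result has shape (n_samples, n_points).
    """
    return np.asarray(spectra, dtype=float).mean(axis=1)


print(wavenumber_to_wavelength_nm([10000, 4000]))  # [1000. 2500.]

# Hypothetical data: 2 samples x 3 replicates x 5 spectral points.
rng = np.random.default_rng(0)
replicates = rng.normal(loc=0.5, scale=0.01, size=(2, 3, 5))
print(average_replicates(replicates).shape)  # (2, 5)
```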

FT-NIR spectra processing

All adulterated GPs and pure powders were divided into a calibration set and a prediction set at a 2:1 ratio using the SPXY algorithm, for model development and performance evaluation, respectively. Several preprocessing methods were employed to remove spectral variation unrelated to chemical information, such as baseline drift, instrument background noise, and light scattering. Standard normal variate (SNV) transformation and multiplicative scatter correction (MSC) were used to correct for light scattering and particle size effects. First and second derivatives (1st Der and 2nd Der) were used to eliminate baseline drift and to resolve broad, overlapping NIR bands. In addition, Savitzky–Golay (SG) smoothing with an 11-point window was applied to reduce instrument background noise (Oliveira et al., 2020, Zhang et al., 2021). Twelve pretreatment combinations, namely 1st Der, 1st Der + SG, 2nd Der, 2nd Der + SG, MSC + 1st Der, MSC + 1st Der + SG, MSC + 2nd Der, MSC + 2nd Der + SG, SNV + 1st Der, SNV + 1st Der + SG, SNV + 2nd Der, and SNV + 2nd Der + SG, were applied to the raw NIR spectra and compared to select the most suitable combination for subsequent modeling.
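A minimal NumPy/SciPy sketch of the SNV, MSC, derivative, and SG steps, chained as MSC + 1st Der + SG, is shown below. It is not the TQ Analyst implementation: the derivative is a simple finite difference along the spectral axis (`np.gradient` with unit spacing), SG smoothing is applied after the derivative to mirror the order of the names, and the polynomial order of 2 is an assumption, since only the 11-point window is stated.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(X, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        # Fit spectrum = intercept + slope * ref, then undo the offset and scaling.
        slope, intercept = np.polyfit(ref, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def first_derivative(X):
    """Finite-difference first derivative along the spectral axis (unit spacing)."""
    return np.gradient(np.asarray(X, dtype=float), axis=1)


def sg_smooth(X, window=11, polyorder=2):
    """Savitzky-Golay smoothing; polyorder=2 is an assumed value."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, axis=1)


def msc_1st_der_sg(X):
    """MSC, then first derivative, then SG smoothing."""
    return sg_smooth(first_derivative(msc(X)))


# Hypothetical spectra: 10 samples x 300 points with random offset/scale effects.
rng = np.random.default_rng(0)
base = 2.0 + np.sin(np.linspace(0.0, 3.0, 300))
spectra = (rng.uniform(-0.1, 0.1, (10, 1)) + rng.uniform(0.8, 1.2, (10, 1)) * base
           + rng.normal(0.0, 0.005, (10, 300)))
print(msc_1st_der_sg(spectra).shape)  # (10, 300)
```

MSC removes the per-sample offset and scaling, so spectra that differ only by those effects collapse onto the reference.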

Chemometrics

The classification models

Chemometric methods, including unsupervised and supervised pattern recognition, were used to identify the adulterants in GP samples. Principal component analysis (PCA), an unsupervised method based on dimensionality reduction (Yan et al., 2021), was used to visualize the distribution of pure and adulterated GP samples. Supervised models suitable for classification were also developed, including partial least squares-discriminant analysis (PLS-DA) and three machine learning algorithms, support vector machine (SVM), gradient boosting (GB), and random forest (RF), mainly to discriminate precisely between GP and adulterated GP samples.

The quantitative models

PLSR was applied to quantify the adulterant content of the GP samples. Through multivariate calibration, the PLSR model establishes a linear relationship between the independent variables X (spectral data) and the dependent variable Y (adulterant concentration). To assess the effect of data preprocessing and model performance, the following parameters were calculated: the coefficients of determination for calibration (R²c), prediction (R²p), and cross-validation (R²cv); the root mean square errors of calibration (RMSEC), prediction (RMSEP), and cross-validation (RMSECV); and the ratio of prediction to deviation (RPD, the ratio of the standard deviation of the reference values to RMSEP). A good calibration model should have high R²c and R²p values and low RMSEC and RMSEP values (Ndlovu et al., 2021, Ye et al., 2018). RMSECV was obtained by 7-fold cross-validation and was mainly used to assess the modeling performance of the PLSR models. The RPD value reflects the overall predictive capability of a PLSR model, and an RPD greater than 3 indicates excellent performance (Ndlovu et al., 2021).

Software

The raw NIR spectra were preprocessed and optimized in TQ Analyst 9.0 software (Thermo Fisher Scientific Co., USA). PCA, PLS-DA, and PLSR were performed in SIMCA-P (Version 14.1, Umetrics, Sweden). The machine learning algorithms (SVM, GB, and RF) were implemented in Python (version 3.8, Python Software Foundation, Delaware, USA) with the scikit-learn machine learning library (version 1.0.2) in Jupyter Notebook. The area under the curve (AUC), precision, recall, and F1-score used to evaluate identification performance were obtained with the sklearn.metrics module, including its confusion matrix display function. The SPXY algorithm was also implemented in Python. OriginPro 2021 (Northampton, MA, USA) was used to draw line charts.
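SPXY (sample set partitioning based on joint X–y distances) extends Kennard–Stone selection by adding normalized distances in the response to those in the spectra, then repeatedly picking the candidate farthest from the already-selected set. The sketch below is one common formulation written with NumPy and SciPy; the authors' own Python implementation is not described, and using the adulteration level as y with random stand-in spectra is an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist


def spxy_split(X, y, n_cal):
    """Return (calibration_indices, prediction_indices) chosen by SPXY."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    dx, dy = cdist(X, X), cdist(y, y)
    # Normalize each distance matrix by its maximum so X and y weigh equally.
    d = dx / (dx.max() or 1.0) + dy / (dy.max() or 1.0)

    # Start from the two most distant samples.
    first, second = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(X)) if i not in selected]

    while len(selected) < n_cal:
        # For each candidate, its distance to the nearest already-selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected), np.array(remaining)


# Hypothetical data: 102 spectra (50 points) with adulteration levels as y.
rng = np.random.default_rng(0)
X = rng.normal(size=(102, 50))
y = rng.choice([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], size=102)
cal, pred = spxy_split(X, y, n_cal=68)
print(len(cal), len(pred))  # 68 34
```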

Results and discussion

Color analysis

The appearance and color characteristics between GPs and adulterants

The appearance of GP, adulterated GPs, PCS, PSF, and PWF is shown in Fig. 1A. On visual inspection, the color of GP clearly differs from those of PCS, PWF, and PSF. However, authentic GP and the GPs at different adulteration levels (10 %, 20 %, 30 %, 40 %, and 50 %) are very similar in color and cannot be distinguished by the naked eye. From another perspective, at low adulteration levels the color of GP tends to mask that of the adulterants, making the adulterated GPs difficult to identify visually. To further investigate the color changes of the adulterated GPs, the main color information of pure and adulterated powders was determined by chromaticity analysis, and the color properties of the different powders were quantified and characterized in the CIELAB color space. The L*, a*, b*, and ΔE values are listed in Table S1. The L*, a*, and b* values represent lightness, redness–greenness, and yellowness–blueness, respectively. All adulterated GP samples had higher L* values than pure GP, indicating that adulteration increased lightness to some extent. As the adulteration level increased, the L* values of the adulterated GPs gradually increased, but the change was not significant (Fig. 1B). The a* and b* values followed similar trends with rising adulteration level (Fig. 1C-D), increasing for GP adulterated with CS and decreasing for GP adulterated with SF and WF. Compared with GP, all adulterated GP samples had lower b* values, suggesting that their color had shifted away from the original yellow of pure ginger powder. 
ΔE is an important indicator of the color change of GP after adulteration. As shown in Fig. 1E, the ΔE values of GP adulterated with CS and SF varied little relative to pure GP, whereas those of GP adulterated with WF varied more, indicating that the color profile of GP changed significantly upon adulteration with WF. This may be attributed to the marked differences in a* and b* between pure GP and the adulterants (PCS, PSF, and PWF).

PCA model between the pure and adulterated GPs

To further visualize the difference between authentic and adulterated GPs based on the chromaticity data, a PCA model was established. As an unsupervised pattern recognition method, PCA reduces the dimensionality of complex data by projecting the variables onto the first few principal components, giving an objective view of the grouping among samples. In this study, the first two PCs explained 50.4 % and 39.3 % of the total variance, with R2X = 0.897 and Q2 (cum) = 0.616, indicating that the variation was well explained and reasonably well predicted. As shown in the score plot (Fig. 1F), GPs were clearly separated from the three pure adulterants (PCS, PSF, and PWF), indicating that these pure powders differ markedly in color properties. The adulterated GP samples and pure GP formed a close cluster in the PCA model. In addition, there was no significant difference between some adulterated GPs and pure GP (Table S1), suggesting that adulterated GPs cannot be completely distinguished from authentic powder by chromaticity measurement. Nevertheless, color digitization reflects the color information of the adulterated samples accurately and objectively and provides better classification than visual observation.
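A PCA of chromaticity data like this one can be sketched with scikit-learn as below. The sum of the explained-variance ratios of the retained components plays a role analogous to SIMCA's cumulative R2X. The choice of variables (L*, a*, b*, ΔE), unit-variance scaling, and the random example table are assumptions; the text does not state which variables or scaling were used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def color_pca(color_table, n_components=2):
    """Unit-variance-scaled PCA; returns scores and explained-variance ratios."""
    scaled = StandardScaler().fit_transform(np.asarray(color_table, dtype=float))
    pca = PCA(n_components=n_components).fit(scaled)
    return pca.transform(scaled), pca.explained_variance_ratio_


# Hypothetical (L*, a*, b*, dE) rows, not the measured Table S1 values.
rng = np.random.default_rng(2)
table = rng.normal(loc=[70.0, 4.0, 30.0, 3.0], scale=[2.0, 0.5, 1.5, 1.0], size=(30, 4))
scores, ratios = color_pca(table)
print(scores.shape, ratios.sum())  # ratios.sum() is the cumulative explained variance
```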

Qualitative analyses based on FT-NIR spectroscopy

NIR spectra and optimal pretreatment

The raw NIR spectra (10000–4000 cm−1) of GP, the adulterated GPs at different levels, PCS, PSF, and PWF are shown in Fig. 2A. The spectra of all samples were similar in the 10000–6000 cm−1 region and differed markedly in the 6000–4000 cm−1 region. These average NIR spectra reflect valuable chemical information about the authentic and adulterated powders. Common absorption peaks are clearly visible around 8350, 6930, 6352, 5762, 5179, 4381, and 4312 cm−1 in GP and the adulterated GPs. The peaks around 8350 cm−1 and 5762 cm−1 are attributed to the second and first overtones of C–H stretching, respectively (Hong et al., 2019, Zhao et al., 2020); the peaks at 6930 cm−1 and 6352 cm−1 are assigned to first-overtone O–H or N–H stretching vibrations (Hong et al., 2019); and the band at 5179 cm−1 belongs to a combination of O–H and C–O stretching (Zhang et al., 2021). As shown in Fig. 2B, the spectra of GP, PCS, and PWF have similar absorption bands. However, the absorption bands of GP and PSF differ, most notably in the prominent absorption peaks of PSF at 4852, 4751, and 4605 cm−1. These differences in NIR absorption may be important for discriminating authentic from adulterated powders.

Fig. 2

Average raw NIR spectra of all powder samples (A) and of the individual pure powders (B); optimized NIR spectra after MSC + 1st Der + SG pretreatment of all powder samples (C) and of the individual pure powders (D).

Although the spectra of GP and the pure adulterants differ, the spectral differences between GP and the adulterated GP samples are subtle (Fig. S1) and cannot be identified visually, especially at low adulteration levels. It is therefore necessary to preprocess the spectra of the adulterated samples to improve discrimination accuracy. The NIR dataset comprised 68 calibration samples and 34 prediction samples. A total of 19 preprocessing methods were compared in TQ Analyst 9.0, with prediction accuracy and performance index used to evaluate model quality; the highest prediction accuracy and performance index indicate the best pretreatment. As shown in Table S2, MSC + 1st Der + SG gave the best prediction accuracy (100 %) and performance index (93.0) and was therefore used in subsequent modeling. The optimized NIR spectra are shown in Fig. 2C-D.

Discriminant analyses by PCA and PLS-DA model

An unsupervised PCA model of GP and adulterated GPs was established from the NIR data, with all spectra preprocessed by the optimal combination of MSC + 1st Der + SG. Twelve principal components were fitted, and PC1 and PC2 explained 49.0 % and 34.5 % of the total variance, respectively. As shown in Fig. 3A, GP samples were clearly separated from the three pure adulterants, suggesting that the chemical composition of GP differs significantly from that of the pure adulterants. The score points of the adulterated GP samples lay close to pure GP and far from the pure adulterants, indicating that the adulterated samples were spectrally similar to GP. GPs adulterated with SF formed a single cluster, whereas GPs adulterated with CS and WF partially overlapped in the same quadrant (Fig. 3B). Recent studies have reported that gingerols, shogaols, and terpenes are the main constituents of ginger and are considered the material basis of its flavor (Yu et al., 2022, Zhang et al., 2021). These components contain C–H, O–H, and N–H groups with strong overtone and combination absorptions in the NIR region (7000–4000 cm−1) (Nagy et al., 2022). CS, WF, and SF lack these characteristic components, which may be the main reason GPs can be discriminated from their adulterated counterparts. Although GP and adulterated GPs shared similar bands, the pattern recognition models amplified these differences and made the classification visible. In addition, the distribution of GPs adulterated with CS, SF, and WF followed the adulteration level: samples with low adulteration levels lay close to GP and those with high levels lay far from GP, a trend that still needed to be validated by quantitative analysis. 
These results indicated that the authentic and adulterated GPs can be effectively distinguished by optimized NIR spectral information.

Fig. 3

Score plots of the PCA model (A) of GP, the three pure adulterants, and adulterated GPs at different adulteration levels (10%, 20%, 30%, 40%, and 50%); score plots of the PCA model (B) of the authentic and adulterated GPs, A-CS: adulterated with corn starch, A-SF: adulterated with soybean flour, A-WF: adulterated with wheat flour; cross-validation results of a permutation test with 200 permutations (C); score plots of the PLS-DA model (D) of the authentic and adulterated GPs.

Beyond the PCA model, a supervised PLS-DA model was built to discriminate GP from its adulterated samples. R2X and Q2 (cum) were 0.894 and 0.820, respectively, indicating that the model had strong explanatory and predictive ability in classifying GP samples with different adulterants. To check for overfitting, a permutation test with 200 permutations was conducted. As shown in Fig. 3C, the intercepts of R2 and Q2 were below 0.3 and 0.05, respectively, supporting the robustness of the PLS-DA model. The PLS-DA score plot (Fig. 3D) shows a separation similar to that of the PCA model. Although pure GP can be discriminated from adulterated GPs, samples with different adulterants and different concentrations were not clearly distinguished; for example, 10 %–30 % CS and 10 %–50 % SF overlapped markedly in both the PCA and PLS-DA models, which may be attributed to the limitations of PCA and PLS-DA in extracting the spectral information.

Discriminant analyses by machine learning algorithms

To verify these results and obtain a more accurate classification of authentic and adulterated GPs, three pattern recognition algorithms, SVM, RF, and GB, were developed for in-depth analysis. SVM classifies data by finding the separating hyperplane that maximizes the margin between classes, as defined by the nearest data points (the support vectors) (Amirvaresi & Parastar, 2021). RF and GB are ensemble algorithms based on decision trees that reduce the influence of outliers and the risk of overfitting, thereby improving discrimination accuracy (Han et al., 2021, Sun et al., 2021). All models were built on the optimal NIR spectra processed by MSC + 1st Der + SG. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to evaluate the reliability of these models; the ROC curves and AUC values of the three algorithms are shown in Fig. S2. The AUC values of the SVM, RF, and GB models all reached 1.00. An AUC close to 1.00 indicates a near-ideal classifier (Lyu et al., 2021), so all three models performed optimally in this work. Accuracy is another performance metric; typically, 20–30 % of the samples are held out as a test set to validate the model trained on the training set. In this study, the predictive accuracies of the SVM, RF, and GB models were 87 %, 100 %, and 100 %, respectively, indicating that the three classifiers resolved the differences between authentic and adulterated spectra and had strong explanatory ability. The confusion matrices of the three algorithms are shown in Fig. 4A: no GP or adulterated GP samples were misclassified by the RF and GB classifiers. 
However, the SVM classifier misclassified some GPs adulterated with CS and SF as WF-adulterated samples. Precision and recall are two metrics used to assess classification accuracy, and the F1-score, the harmonic mean of precision and recall, combines them into a single index whose best value is 1. For the RF and GB classifiers, all of these metrics were 1.000 (Fig. 4B), suggesting the high adaptability of these models. Our results indicate that NIRs combined with machine learning algorithms performed better in classification than the other models (PCA and PLS-DA), consistent with previous reports (de Santana et al., 2019, Li et al., 2021). In summary, machine learning algorithms were successfully developed to identify authentic and adulterated GPs in this study, and the RF and GB classifiers performed best.
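The SVM/RF/GB comparison and the evaluation metrics described above can be sketched with scikit-learn as follows, using a stratified hold-out split and macro-averaged precision, recall, and F1-score. Hyperparameters are not reported in the text, so the settings here are illustrative assumptions, and `make_classification` data stand in for the preprocessed spectra. The SVM is wrapped with standardization because margin-based classifiers are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, test_size=0.3, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    models = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "RF": RandomForestClassifier(n_estimators=200, random_state=seed),
        "GB": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_te, y_hat, average="macro", zero_division=0)
        results[name] = {
            "accuracy": accuracy_score(y_te, y_hat),
            "precision": precision, "recall": recall, "f1": f1,
            "confusion": confusion_matrix(y_te, y_hat),
        }
    return results


# Synthetic stand-in: 102 samples, 4 classes (e.g. GP, A-CS, A-SF, A-WF).
X, y = make_classification(n_samples=102, n_features=50, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
for name, res in compare_classifiers(X, y).items():
    print(name, round(res["accuracy"], 3), round(res["f1"], 3))
```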

Fig. 4

The confusion matrices for SVM, RF, and GB classifiers (A); The algorithm evaluation metrics (precision, recall, and F1-scores) for SVM, RF, and GB classifiers (B).

Quantitative analyses based on FT-NIR spectroscopy

Spectral pretreatment based on PLSR model

The presence of soybean flour, corn starch, and wheat flour in the adulterated GPs was successfully identified and differentiated by PCA, PLS-DA, and machine learning algorithms. Subsequently, PLSR calibration models were developed to predict the concentration of the adulterants in GP samples. As in the qualitative analysis, the spectral preprocessing methods were optimized to improve the performance of the PLSR models. Quantitative models were constructed to predict the proportion of each adulterant (CS, SF, and WF) and the content of pure GP. As shown in Table 1, the best pretreatment for each scenario, selected by comparing the different metrics, is marked in bold. For the adulterant proportions, 2nd Der + SG, 1st Der + SG, and SNV + 2nd Der + SG gave the highest RPD values (13.68, 14.61, and 4.30) and low RMSEP values (0.0109, 0.0102, and 0.0347) in validation for the SF, WF, and CS models, respectively, indicating that these pretreatments gave the PLSR models their best predictive performance. In addition, the coefficients of determination for calibration (R²c), prediction (R²p), and cross-validation (R²cv) are close to one another, indicating that the models were neither underfitted nor overfitted (Pandiselvam et al., 2022). Hence, these preprocessing methods were selected to build the calibration models for the three adulterants. We also predicted the actual content of pure GP in the different adulterated GPs. Although MSC + 2nd Der + SG and SNV + 2nd Der + SG gave the same RPD (8.92) and R² (0.9939), MSC + 2nd Der + SG had a slightly lower RMSECV (0.0176 vs. 0.0177). Thus, NIR spectra processed by MSC + 2nd Der + SG were used to predict the content of pure GP. 
Furthermore, the models built with the selected pretreatments had higher RPD values than the models built on raw spectra. Overall, the spectral pretreatment methods screened in this study improved the performance of the PLSR models for quantifying GP and its adulterants.
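The pretreatment selection described in this section, favoring the highest RPD and breaking ties by the lower RMSECV, can be expressed as a short ranking rule. This is an interpretation of the text rather than a procedure the authors state formally; the wheat-flour rows are a subset of Table 1, and the pure-GP values are those quoted in the text.

```python
def select_pretreatment(candidates):
    """Pick the pretreatment with the highest RPD; ties go to the lowest RMSECV.

    candidates: iterable of (name, rpd, rmsecv) tuples.
    """
    best = max(candidates, key=lambda row: (row[1], -row[2]))
    return best[0]


# (name, RPD, RMSECV) for GP adulterated with WF, taken from Table 1.
wheat_flour = [
    ("RAW", 3.29, 0.0879),
    ("1st Der", 11.04, 0.0363),
    ("1st Der + SG", 14.61, 0.0215),
    ("MSC + 1st Der + SG", 13.68, 0.0153),
    ("SNV + 2nd Der + SG", 13.31, 0.0667),
]
# Pure GP: equal RPD, RMSECV values as quoted in the text.
pure_gp = [
    ("SNV + 2nd Der + SG", 8.92, 0.0177),
    ("MSC + 2nd Der + SG", 8.92, 0.0176),
]

print(select_pretreatment(wheat_flour))  # 1st Der + SG
print(select_pretreatment(pure_gp))      # MSC + 2nd Der + SG
```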

Table 1

Parameters of PLSR models for determining the concentrations of the adulterants (SF, WF, and CS) and the purity of ginger powder using FT-NIR with different pretreatment methods.

Adulteration Scenario	Processing methods	LV^a	R²_c	RMSEC	R²_p	RMSEP	R²_cv	RMSECV	RPD
Adulterated with SF	RAW	2	0.9584	0.0404	0.9780	0.0324	0.9793	0.0481	4.60
	1st Der	1	0.9971	0.0108	0.9971	0.0110	0.9933	0.0250	13.55
	1st Der + SG	1	0.9971	0.0108	0.9971	0.0110	0.9933	0.0279	13.55
	2nd Der	1	0.9997	0.0033	0.9967	0.0130	0.9078	0.1160	11.47
	2nd Der + SG	1	0.9966	0.0116	0.9971	0.0109	0.9937	0.0307	13.68
	MSC + 1st Der	1	0.9960	0.0126	0.9958	0.0133	0.9908	0.0333	11.21
	MSC + 1st Der + SG	1	0.9960	0.0126	0.9958	0.0133	0.9909	0.0369	11.21
	MSC + 2nd Der	2	0.9995	0.0043	0.9955	0.0152	0.8583	0.1210	9.81
	MSC + 2nd Der + SG	1	0.9956	0.0133	0.9961	0.0126	0.9902	0.0337	11.83
	SNV + 1st Der	1	0.9960	0.0126	0.9958	0.0133	0.9908	0.0331	11.21
	SNV + 1st Der + SG	1	0.9960	0.0126	0.9958	0.0133	0.9909	0.0367	11.21
	SNV + 2nd Der	2	0.9995	0.0043	0.9955	0.0152	0.8580	0.1210	9.81
	SNV + 2nd Der + SG	1	0.9956	0.0133	0.9961	0.0126	0.9902	0.0337	11.83
Adulterated with WF	RAW	2	0.9623	0.0358	0.9657	0.0453	0.8829	0.0879	3.29
	1st Der	2	0.9984	0.0078	0.9961	0.0135	0.9958	0.0363	11.04
	1st Der + SG	3	0.9993	0.0053	0.9975	0.0102	0.9978	0.0215	14.61
	2nd Der	1	0.9895	0.0205	0.9168	0.1010	0.8444	0.2080	1.48
	2nd Der + SG	2	0.9982	0.0085	0.9973	0.0120	0.9869	0.0702	12.42
	MSC + 1st Der	1	0.9984	0.0078	0.9971	0.0111	0.9969	0.0324	13.43
	MSC + 1st Der + SG	1	0.9985	0.0078	0.9971	0.0109	0.9981	0.0153	13.68
	MSC + 2nd Der	1	0.9874	0.0224	0.9234	0.1030	0.8434	0.2470	1.45
	MSC + 2nd Der + SG	1	0.9976	0.0098	0.9975	0.0113	0.9886	0.0667	13.19
	SNV + 1st Der	1	0.9984	0.0078	0.9971	0.0111	0.9969	0.0324	13.43
	SNV + 1st Der + SG	1	0.9985	0.0078	0.9971	0.0109	0.9981	0.0153	13.68
	SNV + 2nd Der	1	0.9874	0.0224	0.9234	0.1030	0.8434	0.2470	1.45
	SNV + 2nd Der + SG	1	0.9976	0.0098	0.9975	0.0112	0.9886	0.0667	13.31
Adulterated with CS	RAW	2	0.9623	0.0385	0.9617	0.0392	0.9508	0.0753	3.80
	1st Der	2	0.9938	0.0157	0.9641	0.0397	0.9873	0.0595	3.75
	1st Der + SG	3	0.9989	0.0067	0.9521	0.0455	0.9929	0.0272	3.28
	2nd Der	2	0.9980	0.0089	0.9238	0.0579	0.4579	0.1750	2.57
	2nd Der + SG	3	0.9986	0.0073	0.9723	0.0351	0.9756	0.0826	4.25
	MSC + 1st Der	2	0.9983	0.0081	0.9666	0.0405	0.9939	0.0222	3.68
	MSC + 1st Der + SG	1	0.9904	0.0154	0.9696	0.0369	0.9905	0.0460	4.04
	MSC + 2nd Der	1	0.9205	0.0552	0.8928	0.0653	0.4183	0.1390	2.28
	MSC + 2nd Der + SG	2	0.9973	0.0104	0.9760	0.0348	0.9815	0.0664	4.28
	SNV + 1st Der	2	0.9984	0.0081	0.9666	0.0404	0.9938	0.0224	3.69
	SNV + 1st Der + SG	2	0.9940	0.0154	0.9696	0.0369	0.9903	0.0468	4.04
	SNV + 2nd Der	1	0.9204	0.0553	0.8927	0.0653	0.4164	0.1390	2.28
	SNV + 2nd Der + SG	3	0.9974	0.0103	0.9759	0.0347	0.9810	0.0672	4.30
GP proportion	RAW	4	0.8923	0.0742	0.8917	0.0748	0.8555	0.0857	2.23
	1st Der	4	0.9959	0.0148	0.9888	0.0247	0.9806	0.0325	6.75
	1st Der + SG	4	0.9965	0.0137	0.9921	0.0210	0.9894	0.0250	7.94
	2nd Der	3	0.9752	0.0364	0.9656	0.0610	0.7939	0.1210	2.73
	2nd Der + SG	3	0.9879	0.0225	0.9904	0.0251	0.9669	0.0425	6.64
	MSC + 1st Der	5	0.9974	0.0119	0.9908	0.0225	0.9911	0.0222	7.41
	MSC + 1st Der + SG	4	0.9967	0.0134	0.9917	0.0215	0.9934	0.0196	7.76
	MSC + 2nd Der	4	0.9988	0.0082	0.9893	0.0519	0.7432	0.1260	3.21
	MSC + 2nd Der + SG	4	0.9969	0.0130	0.9939	0.0187	0.9945	0.0176	8.92
	SNV + 1st Der	5	0.9974	0.0119	0.9908	0.0225	0.9909	0.0224	7.41
	SNV + 1st Der + SG	4	0.9966	0.0135	0.9918	0.0214	0.9934	0.0196	7.79
	SNV + 2nd Der	4	0.9987	0.0082	0.9893	0.0519	0.7439	0.1260	3.21
	SNV + 2nd Der + SG	4	0.9968	0.0130	0.9939	0.0187	0.9945	0.0177	8.92

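Notes: ^a LV, number of latent variables. R²_c, R²_p, R²_cv: coefficients of determination for calibration, prediction and cross-validation; RMSEC, RMSEP, RMSECV: root mean square errors of calibration, prediction and cross-validation.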
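The selection rule applied to Table 1 (highest RPD, with RMSECV as the tie-breaker, and a check that R²_c and R²_cv stay close) can be written directly over rows of the table. The sketch below uses rows transcribed from the "GP proportion" block of Table 1. The 0.05 threshold on the gap between R²_c and R²_cv is an illustrative assumption, not a criterion stated by the authors, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlsrResult:
    method: str
    r2_c: float
    r2_cv: float
    rmsecv: float
    rpd: float

def select_pretreatment(rows, max_gap=0.05):
    """Pick the pretreatment with the highest RPD, breaking ties by lower RMSECV.

    Rows whose R²_c exceeds R²_cv by more than `max_gap` (an assumed
    threshold) are excluded as likely overfitted.
    """
    candidates = [r for r in rows if r.r2_c - r.r2_cv <= max_gap]
    return max(candidates, key=lambda r: (r.rpd, -r.rmsecv))

# Rows copied from the "GP proportion" block of Table 1
gp_rows = [
    PlsrResult("RAW", 0.8923, 0.8555, 0.0857, 2.23),
    PlsrResult("MSC + 1st Der + SG", 0.9967, 0.9934, 0.0196, 7.76),
    PlsrResult("MSC + 2nd Der", 0.9988, 0.7432, 0.1260, 3.21),
    PlsrResult("MSC + 2nd Der + SG", 0.9969, 0.9945, 0.0176, 8.92),
    PlsrResult("SNV + 2nd Der + SG", 0.9968, 0.9945, 0.0177, 8.92),
]
print(select_pretreatment(gp_rows).method)  # MSC + 2nd Der + SG
```

The gap check flags rows such as "MSC + 2nd Der" (R²_c 0.9988 vs R²_cv 0.7432), where a near-perfect calibration fit does not carry over to cross-validation.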


Construction of regression curves

Using the selected pretreatment methods, the optimized spectra of the GP samples were obtained, and quantitative regression curves were constructed against the actual contents of GP and the adulterants. The regression curves of the PLSR models for adulteration with SF, WF, and CS are shown in Fig. 5. In such a plot, the diagonal line (predicted = actual) represents ideal prediction, and the closer the scatter points lie to it, the better the model (Sun et al., 2021, Yang et al., 2017). In this study, the data points of both the calibration set and the prediction set cluster tightly around the diagonal, suggesting that the established PLSR models predict the adulterant content well. Fig. 5 also shows that the quantitative model successfully predicts the actual concentration of GP, which further supports the reliability of PLSR models based on NIR spectra for content prediction.

Fig. 5

Regression curves of actual versus predicted adulteration levels of SF, WF and CS, and of the actual concentration of GP, obtained with the PLSR models.
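The predicted-versus-actual plots in Fig. 5 come from PLSR models with a small number of latent variables (the LV column of Table 1). As a hedged illustration of how such predictions and an RMSECV value can be produced, the sketch below implements single-response PLS (PLS1, NIPALS algorithm) in NumPy and runs k-fold cross-validation on synthetic two-component "spectra". The data generator, the fold count and the function names are hypothetical; the software and validation scheme used in the study are not described in this section.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit single-response PLS with NIPALS; return a function that predicts y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                         # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                  # X loadings
        c = f @ t / tt                    # y loading
        E = E - np.outer(t, p)            # deflate X
        f = f - c * t                     # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original X space
    return lambda X_new: (X_new - x_mean) @ B + y_mean

def cross_val_predict_pls(X, y, n_lv, k=5, seed=0):
    """k-fold CV: each sample is predicted by a model that never saw it."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty_like(y, dtype=float)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = pls1_fit(X[train], y[train], n_lv)
        y_cv[fold] = model(X[fold])
    return y_cv

# Synthetic spectra: two hypothetical Gaussian bands mixed with known fractions
rng = np.random.default_rng(1)
wn = np.linspace(0, 1, 150)
pure = np.exp(-((wn - 0.3) / 0.05) ** 2)
adulterant = np.exp(-((wn - 0.7) / 0.08) ** 2)
y = rng.uniform(0.0, 0.5, size=60)                  # adulteration fraction
X = np.outer(1 - y, pure) + np.outer(y, adulterant) + rng.normal(0, 0.002, (60, 150))

y_cv = cross_val_predict_pls(X, y, n_lv=2)
rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
print(f"RMSECV = {rmsecv:.4f}")
```

Plotting `y_cv` against `y` gives a cross-validated predicted-versus-actual plot of the kind shown in Fig. 5. In practice the number of latent variables would typically be chosen by comparing RMSECV across candidate values.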

Conclusion

In the present study, NIR spectroscopy combined with chemometrics was first used to identify and quantify GPs adulterated with three types of adulterants. Chromaticity analysis showed that color properties alone could not completely distinguish adulterated from non-adulterated GPs. Further analysis was therefore performed using NIR spectroscopy. Both PCA and PLS-DA models achieved an initial separation between pure and adulterated GPs. Machine learning algorithms (SVM, RF, and GB) classified the GP adulteration types better, and RF and GB reached the highest accuracy of 100 %. Subsequently, PLSR models were built to determine the adulteration levels. Four content prediction models, for pure GP and for GPs adulterated with SF, WF and CS, were fitted using the optimal pretreatment methods (MSC + 2nd Der + SG, 2nd Der + SG, 1st Der + SG, and SNV + 2nd Der + SG), with RPD values of 8.92, 13.68, 14.61, and 4.30, respectively. The regression curves also showed that the four PLSR models had good linearity and precision, with low RMSECV values (0.0176, 0.0307, 0.0215, and 0.0672). Overall, NIR spectroscopy combined with chemometrics was found to be a useful tool for classifying and quantifying GP and its adulterants. This method can rapidly trace the adulteration of GP, helping to ensure its authenticity and maintain the stability of the consumer market.

CRediT authorship contribution statement

Dai-xin Yu: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Sheng Guo: Supervision, Project administration, Writing – review & editing. Xia Zhang: Software, Methodology. Hui Yan: Methodology, Resources. Zhen-yu Zhang: Software, Methodology. Xin Chen: Investigation. Jiang-yan Chen: Investigation. Shan-jie Jin: Software. Jian Yang: Resources, Funding acquisition. Jin-ao Duan: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

23 in total

1. FT-NIR spectroscopy coupled with multivariate analysis for detection of starch adulteration in turmeric powder.

Authors: Saumita Kar; Bipan Tudu; Arun Jana; Rajib Bandyopadhyay
Journal: Food Addit Contam Part A Chem Anal Control Expo Risk Assess Date: 2019-04-29

2. Random forest as one-class classifier and infrared spectroscopy for food adulteration detection.

Authors: Felipe Bachion de Santana; Waldomiro Borges Neto; Ronei J Poppi
Journal: Food Chem Date: 2019-04-27 Impact factor: 7.514

3. Assessment of paprika geographical origin fraud by high-performance liquid chromatography with fluorescence detection (HPLC-FLD) fingerprinting.

Authors: Guillem Campmajó; Luis R Rodríguez-Javier; Javier Saurina; Oscar Núñez
Journal: Food Chem Date: 2021-02-23 Impact factor: 7.514

4. External parameter orthogonalization-support vector machine for processing of attenuated total reflectance-mid-infrared spectra: A solution for saffron authenticity problem.

Authors: Arian Amirvaresi; Hadi Parastar
Journal: Anal Chim Acta Date: 2021-02-11 Impact factor: 6.558

5. Potential of smartphone-coupled micro NIR spectroscopy for quality control of green tea.

Authors: Luqing Li; Shanshan Jin; Yujie Wang; Ying Liu; Shanshan Shen; Menghui Li; Zhiyu Ma; Jingming Ning; Zhengzhu Zhang
Journal: Spectrochim Acta A Mol Biomol Spectrosc Date: 2020-10-24 Impact factor: 4.098

6. A field trials-based authentication study of conventionally and organically grown Chinese yams using light stable isotopes and multi-elemental analysis combined with machine learning algorithms.

Authors: Chaogeng Lyu; Jian Yang; Tielin Wang; Chuanzhi Kang; Sheng Wang; Hongyang Wang; Xiufu Wan; Li Zhou; Wenjin Zhang; Luqi Huang; Lanping Guo
Journal: Food Chem Date: 2020-10-31 Impact factor: 7.514

7. Rapid and practical qualitative and quantitative evaluation of non-fumigated ginger and sulfur-fumigated ginger via Fourier-transform infrared spectroscopy and chemometric methods.

Authors: Hui Yan; Peng-Hui Li; Gui-Sheng Zhou; Ying-Jun Wang; Bei-Hua Bao; Qi-Nan Wu; Shen-Liang Huang
Journal: Food Chem Date: 2020-10-01 Impact factor: 7.514

8. Headspace GC/MS and fast GC e-nose combined with chemometric analysis to identify the varieties and geographical origins of ginger (Zingiber officinale Roscoe).

Authors: Dai-Xin Yu; Xia Zhang; Sheng Guo; Hui Yan; Jie-Mei Wang; Jia-Qi Zhou; Jian Yang; Jin-Ao Duan
Journal: Food Chem Date: 2022-07-12 Impact factor: 9.231

9. Authentication of PDO paprika powder (Pimentón de la Vera) by multivariate analysis of the elemental fingerprint determined by ED-XRF. A feasibility study.

Authors: Yiannis Fiamegos; Catalina Dumitrascu; Sergej Papoci; Maria Beatriz de la Calle
Journal: Food Control Date: 2021-02 Impact factor: 5.548