Dai-Xin Yu1, Sheng Guo1, Xia Zhang2, Hui Yan1, Zhen-Yu Zhang1, Xin Chen1, Jiang-Yan Chen1, Shan-Jie Jin2, Jian Yang3, Jin-Ao Duan1. 1. National and Local Collaborative Engineering Center of Chinese Medicinal Resources Industrialization and Formulae Innovative Medicine, Jiangsu Collaborative Innovation Center of Chinese Medicinal Resources Industrialization, Nanjing University of Chinese Medicine, Nanjing 210023, China. 2. College of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing 210023, China. 3. State Key Laboratory of Dao-di Herbs Breeding Base, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China.
Abstract
Ginger powder (GP) is a popular spice worldwide. Due to its nutritional value, GP is an attractive target for adulteration, which is not easily detected. In this study, chromaticity analysis and Fourier transform near-infrared (FT-NIR) spectroscopy combined with chemometrics were used to identify and quantify GP and its adulterants. The results showed that pure and adulterated GPs cannot be completely distinguished by chromaticity analysis, whereas the optimized NIR spectra accurately distinguished authentic GPs from adulterated samples. Random forest and gradient boosting algorithms exhibited the highest classification accuracy (100%). Moreover, quantitative models were successfully established to predict the adulteration level in GP. The optimal ratios of prediction to deviation (RPD) were 8.92, 13.68, 14.61, and 4.30 for pure GP and for GP adulterated with soybean flour, wheat flour, and corn starch, respectively. Overall, FT-NIR spectroscopy is a promising tool that can quickly identify potential adulteration in GP and track the types of adulterants.
In recent years, food authenticity has become a common concern among consumers. It covers several aspects, including origin falsification, adulteration, variety mixing, and mislabeling (Barreto et al., 2018, Yu et al., 2022). With the globalization and growing complexity of the food supply chain, food fraud for economic gain has emerged in the production, manufacturing, processing, and distribution segments (Horn et al., 2021). Usually, food fraud occurs through partial substitution, addition, tampering of food ingredients, and false labeling of geographical origins (Khodabakhshian et al., 2021). These practices not only pose potential health and safety problems to consumers, but also create a credibility crisis for the food industry.
As an important part of the food supply chain and human diet, spices are widely used in food, beverages, medicines, cosmetics, and other products (Yan et al., 2021). Statistics suggest that the global spices and herbs market was approximately US$ 79 billion in 2022 and is likely to expand to about US$ 126 billion by the end of 2023 (https://www.statista.com/statistics/876234/global-seasoning-and-spices-market-size/). For ease of use and portability, spices currently circulate mostly in powder form, which gives unscrupulous merchants opportunities to make illegal profits. Adulterating a spice with a similar powder while preserving its original form, color, and odor, in order to reduce costs and increase revenue, is a common instance of food fraud (Yu et al., 2022). 
In other words, mixing spices with any other substance is considered food falsification, and it has attracted great concern from industry, governments, and standard-setting organizations (Jahanbakhshi et al., 2021).
Ginger (Zingiber officinale Roscoe) is a spicy condiment that is often eaten fresh or made into dried slices and powder for flavoring (Yu et al., 2022). Due to its unique flavor, high nutritional content, and medicinal value, ginger is extensively consumed as a flavoring agent, dietary supplement, and herbal medicine, and as a raw material of many desserts and beverages, such as ginger tea, ginger candies, and ginger beer (Srinivasan, 2017). Brown sugar and ginger tea made from ginger slices or powder is widely consumed in traditional Chinese medicine to dispel cold and improve immunity. Modern studies have found that gingerols, shogaols, terpenes, and sugars are the key bioactive constituents of ginger (Zhang et al., 2021). Moreover, these compounds have been reported to benefit human health through antioxidant, anti-inflammatory, antimicrobial, and immunomodulatory activities (Kiyama, 2020). China is the major ginger-producing and -exporting country in the world; besides being sold fresh, ginger is also processed into ginger powder (GP) in large quantities. According to the UN International Trade Database, the annual average export volume of GP from China was 67,928 tons from 2017 to 2021, which far exceeds that of other countries (https://comtrade.un.org/data/). With the substantial increase in demand for GP on both domestic and international markets, adulteration is easy to achieve. Generally, cheaper or lower-quality edible powders are used as adulterants. Once added, they cannot be identified by visual examination; they reduce the original nutritional and medicinal value of GP and pose a high risk to consumers and normal market activities. 
Hence, it is vital to establish a rapid and effective method to distinguish authentic GPs from adulterated ones.
To date, multiple targeted methods, such as microscopic techniques (Kiani et al., 2019), liquid chromatography (Campmajó et al., 2021), mass spectrometry (Mohamed et al., 2021), and elemental fingerprint analysis (Fiamegos et al., 2021), have been successfully applied to identify adulteration of spices or herbs. Despite their high precision and stability, these analytical methods are time-consuming, laborious, and environmentally unfriendly. As a common spectroscopic technique, near-infrared spectroscopy (NIRs) has been widely used in quality control and origin tracing for foods and herbs (Chang et al., 2020). Because detection is fast and non-destructive, NIRs usually yields spectral information about a sample within a few seconds and indirectly reflects its chemical composition (Wu et al., 2022). With the development of chemometrics, NIRs coupled with multivariate statistical analysis has been successfully applied to qualitative and quantitative analyses in the food, agricultural, and herbal medicine areas (Xue et al., 2021). Various studies have revealed the great potential of NIRs for detecting adulteration of spices and food powders, such as turmeric powder adulterated with corn flour (Kar et al., 2019), green banana flour adulterated with wheat flour (Ndlovu et al., 2021), and paprika powder mixed with potato starch and acacia gum (Oliveira et al., 2020). Compared to Fourier transform-infrared spectroscopy (FT-IR), the quantitative performance of NIRs is even more prominent. 
Furthermore, the NIR region is dominated by weak overtones, resulting in lower molar absorption and deeper penetration of NIR radiation into the samples, which makes it more suitable for the analysis of heterogeneous samples, such as adulterated powders (Nagy et al., 2022).
Although a recent study revealed that chickpea adulteration in ginger powder can be successfully identified using image recognition technology (Jahanbakhshi et al., 2021), that research did not include a quantitative model of the extent of GP adulteration. NIRs can achieve simultaneous characterization and quantification, which can fill the gap in the quantitative study of GP adulterants. Moreover, there are few reports on the combination of NIR spectroscopy and machine learning algorithms to discriminate GP from its adulterants.
Hence, the main objective of this study was to investigate the feasibility of using NIR spectroscopy combined with chemometrics to identify and quantify common adulterants, namely corn starch (Zea mays L.) and the flours of wheat (Triticum aestivum L.) and soybean (Glycine max (L.) Merr.), in GPs. The specific aims were to 1) establish classification models using supervised and unsupervised pattern recognition methods to identify authentic and adulterated GP samples, and 2) develop and optimize a quantitative calibration model using the partial least squares regression (PLSR) method based on NIR spectra to predict the concentrations of GP and its adulterants.
Materials and methods
Materials
All fresh ginger samples were collected from Luoping County, Yunnan Province, China, and were identified as the fresh rhizomes of Zingiber officinale Roscoe by Prof. Jin-ao Duan of Nanjing University of Chinese Medicine. In terms of market sales and brand value, GP produced in Yunnan Province has the highest market share and is often counterfeited by partial adulteration. The corn starch (CS), soybean flour (SF), and wheat flour (WF) used as adulterants were purchased from Qingdao Wugu-kang Food and Nutrition Technology Co., Ltd., Qingdao, China, and all passed food quality inspection to ensure their authenticity and reliability.
Preparation of adulterated samples
Fresh ginger samples were washed, sliced, and dried at a constant temperature of 55 °C for 24 h in an electric thermostatic drying oven (DHA-9070A, Shanghai Jinghong Experimental Equipment Co., Ltd., Shanghai, China). After drying, the ginger samples were crushed into powder with a high-speed grinder (FW-80, Tianjin Taisite Instrument Co., Tianjin, China), and all of the powder was passed through a 50-mesh sieve (355 μm ± 13 μm). The adulterated GPs were then prepared by adding CS, SF, or WF to pure GP at five concentrations: 10 %, 20 %, 30 %, 40 %, and 50 %. Beyond 50 %, adulteration becomes obvious and can typically be identified by the naked eye or by taste. All samples were placed in centrifuge tubes and mixed uniformly for 3 min using a vortex shaker. Six samples were prepared for each adulterant at each level, giving a total of 90 adulterated samples (3 adulterants × 5 levels × 6 replicates); representative adulterated samples are shown in Fig. 1A. In addition, control samples of pure ginger powder (GP, n = 12), pure corn starch (PCS, n = 3), pure soybean flour (PSF, n = 3), and pure wheat flour (PWF, n = 3) were used for comparison with the adulterated samples.
Fig. 1
Ginger powder samples (A) with different adulteration levels (10%, 20%, 30%, 40%, and 50%), A-CS: adulterated with corn starch, A-SF: adulterated with soybean flour, A-WF: adulterated with wheat flour, GP: pure ginger powder, PSF: pure soybean flour, PWF: pure wheat flour, PCS: pure corn starch; The L* values (B), a* values (C), b* values (D), and ΔE values (E) with different adulteration levels (10%, 20%, 30%, 40%, and 50%) by chromaticity analysis; Score plots of the PCA model of pure GP and adulterated GPs (F).
Color measurements
The color characteristics of the pure and adulterated powders were measured with a chroma analyzer (CM-5, KONICA MINOLTA, Tokyo, Japan). The spectrophotometer was calibrated with a white plate, and the powder was spread evenly in the sample dish so that all samples had the same weight and thickness. According to the CIELAB color space of the International Commission on Illumination, the L*, a*, and b* values were collected (Biancolillo et al., 2022). The total color variation (ΔE) was used to describe the color change at different adulteration levels with the following equation:
$$\Delta E = \sqrt{(L^{*} - L_{0}^{*})^{2} + (a^{*} - a_{0}^{*})^{2} + (b^{*} - b_{0}^{*})^{2}}$$
where the reference values $L_{0}^{*}$, $a_{0}^{*}$, and $b_{0}^{*}$ were measured on pure ginger powder.
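The total color variation is commonly computed as the Euclidean distance between two points in CIELAB space (the CIE76 definition). The sketch below assumes that this is the form intended here, with pure GP as the reference; the function name `delta_e` and the readings are hypothetical and are not taken from Table S1.

```python
import math


def delta_e(sample, reference):
    """CIE76 color difference between two (L*, a*, b*) triples."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sample, reference)))


# Hypothetical readings for illustration only (not measured values).
pure_gp = (70.0, 5.0, 25.0)       # reference: pure ginger powder
adulterated = (73.0, 4.0, 23.0)   # an adulterated sample

print(f"Delta E relative to pure GP: {delta_e(adulterated, pure_gp):.3f}")
```

Because the three coordinates are weighted equally, a large ΔE can come from a shift in any of L*, a*, or b*, which is why the individual coordinates are also reported.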
Acquisition of FT-NIR spectra
An Antaris™ II FT-NIR spectrophotometer (Thermo Fisher Scientific Co., USA), equipped with a rotating sample-cup spinner, an extended InGaAs detector, and a tungsten halogen lamp as the light source, was used to collect NIR spectra of the powder samples. Result Software (Antaris™ II System, Thermo Fisher Scientific Co., USA) was used for the acquisition of NIR data. The spectral data were acquired as the average of 32 scans in the range of 10000–4000 cm−1 at a resolution of 8 cm−1. All analyses were performed at a room temperature of 18–25 °C and a relative humidity of 30 %. To keep the instrument dry and improve the reliability of the NIR spectra, a desiccant was used to eliminate the effect of moisture. Each NIR spectrum was recorded in triplicate, and the average spectrum was used for further analysis.
FT-NIR spectra processing
All the adulterated GPs and pure powders were divided into a calibration set and a prediction set in a 2:1 ratio using the SPXY algorithm, for model development and performance evaluation, respectively. To remove information unrelated to chemical composition from the NIR spectra, including baseline drift, instrument background noise, and light scattering effects, several preprocessing methods were employed. Standard normal variate transformation (SNV) and multiplicative scatter correction (MSC) were adopted to correct for light scattering and particle size effects. First and second derivatives (1st Der and 2nd Der) were used to eliminate baseline drift and to separate broad, overlapping NIR bands. Moreover, Savitzky–Golay (SG) smoothing with 11 points was performed to reduce instrument background noise (Oliveira et al., 2020, Zhang et al., 2021). Twelve different pretreatments, including 1st Der, 1st Der + SG, 2nd Der, 2nd Der + SG, MSC + 1st Der, MSC + 1st Der + SG, MSC + 2nd Der, MSC + 2nd Der + SG, SNV + 1st Der, SNV + 1st Der + SG, SNV + 2nd Der, and SNV + 2nd Der + SG, were applied to the raw NIR spectra and compared. These preprocessing methods were combined to find the most suitable combination for further modeling analyses.
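To make the scatter-correction and derivative steps concrete, here is a minimal, dependency-free sketch of SNV, MSC, and a first derivative. It is an illustration, not the TQ Analyst implementation: the function names are hypothetical, the derivative is a plain central difference standing in for a Savitzky–Golay derivative, and the toy spectra are invented.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit: spectrum ~ intercept + slope * reference.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def first_derivative(spectrum):
    """Central finite difference; the two end points are dropped."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / 2.0
            for i in range(1, len(spectrum) - 1)]


# Toy data: the second spectrum is the first with an additive offset and a
# multiplicative gain, which is the kind of distortion scatter produces.
base = [0.1, 0.4, 0.9, 0.3, 0.2]
scattered = [2 * x + 0.5 for x in base]

print(snv(base))
print(snv(scattered))                      # matches snv(base) up to rounding
print(first_derivative(msc([base, scattered])[0]))
```

Both corrections remove offset and gain differences between spectra; SNV works on each spectrum alone, whereas MSC needs a reference (here the mean spectrum), so its output depends on which samples are in the set.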
Chemometrics
The classification models
Chemometrics, including unsupervised and supervised pattern recognition methods, were used to identify the adulterants in GP samples. Principal component analysis (PCA), an unsupervised model based on data dimensionality reduction (Yan et al., 2021), was used to visualize the distribution trends of the pure GP and adulterated GP samples. Supervised discriminative models suited to classification were also developed, including partial least squares-discriminant analysis (PLS-DA) and several machine learning algorithms: support vector machine (SVM), gradient boosting (GB), and random forest (RF). These were mainly applied to the precise discrimination between GP and adulterated GP samples.
The quantitative models
PLSR was applied to quantify the adulterant content in different GP samples. Through multivariate calibration, the PLSR model establishes a linear relationship between the independent variables X (spectral data) and the dependent variable Y (concentration of adulterants). To assess the success of data preprocessing and model performance, the following parameters were calculated: coefficient of determination for calibration (R2c), coefficient of determination for prediction (R2p), coefficient of determination for cross-validation (R2cv), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and the ratio of prediction to deviation (RPD, the ratio of the standard deviation to RMSEP). A good calibration model should have high values of R2c and R2p and low values of RMSEC and RMSEP (Ndlovu et al., 2021, Ye et al., 2018). RMSECV was obtained by a 7-fold cross-validation procedure and was mainly used to assess the modeling performance of the PLSR models. The RPD values reflect the overall predictive capability of the PLSR models, and performance is considered excellent when the RPD value is greater than 3 (Ndlovu et al., 2021).
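The evaluation metrics can be sketched in a few lines of Python. This is an illustration under stated assumptions: RPD is taken as the sample standard deviation of the reference values divided by the RMSE (whether the sample or population form was used is not stated), the function names are hypothetical, and the numbers are invented rather than taken from Table 1.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV, or RMSEP depending on the data set)."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Ratio of the standard deviation of the reference values to the RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)


# Hypothetical adulteration fractions and model predictions (illustration only).
reference = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
predicted = [0.01, 0.09, 0.21, 0.29, 0.41, 0.49]

print(f"RMSEP = {rmse(reference, predicted):.4f}")
print(f"R2p   = {r_squared(reference, predicted):.4f}")
print(f"RPD   = {rpd(reference, predicted):.2f}")
```

Since RPD scales the prediction error by the spread of the reference values, it allows models for different adulterants to be compared even when their concentration ranges differ.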
Software
The raw NIR spectra were optimized in TQ Analyst 9.0 software (Thermo Fisher Scientific Co., USA). PCA, PLS-DA, and PLSR were performed in SIMCA-P software (Version 14.1, Umetrics, Sweden). The machine learning algorithms, including SVM, GB, and RF, were implemented in Python (version 3.8, Python Software Foundation, Delaware, USA) with the machine learning library scikit-learn (version 1.0.2) in Jupyter Notebook. The area under the curve (AUC), precision, recall, and F1-scores were obtained with the sklearn.metrics module, including its confusion matrix display function, to evaluate identification performance. The SPXY algorithm was also implemented in Python. ORIGIN 2021 pro (Northampton, MA, USA) was used for drawing line charts.
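The authors' SPXY code is not given. As a rough illustration of the idea, the sketch below follows the usual description of SPXY: a Kennard–Stone style selection on a joint distance that adds the normalized distance between spectra (X) to the normalized distance between reference values (y). The function name `spxy_split` and the toy data are hypothetical.

```python
import math


def spxy_split(X, y, n_cal):
    """Return (calibration_indices, prediction_indices) chosen by an SPXY-style rule.

    Assumes at least two distinct samples in both X and y, so that the
    maximum distances used for normalization are non-zero.
    """
    n = len(X)
    if not 2 <= n_cal <= n:
        raise ValueError("n_cal must be between 2 and the number of samples")

    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    max_dx = max(max(row) for row in dx)
    max_dy = max(max(row) for row in dy)
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]

    # Start with the two samples that are farthest apart in the joint distance.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: d[pair[0]][pair[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbor is farthest away.
    while len(selected) < n_cal:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


# Toy example: six one-variable "spectra" with their adulteration fractions.
X_demo = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_demo = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
cal_idx, pred_idx = spxy_split(X_demo, y_demo, n_cal=4)
print("calibration:", cal_idx, "prediction:", pred_idx)
```

For a 2:1 split, `n_cal` would be set to two thirds of the sample count. Because the calibration set is chosen to span both the spectral and the concentration space, the extreme samples always end up in calibration rather than prediction.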
Results and discussion
Color analysis
The appearance and color characteristics between GPs and adulterants
The appearance characteristics of GP, adulterated GPs, PCS, PSF, and PWF are shown in Fig. 1A. Visual observation shows that the color of GP clearly differs from that of PCS, PWF, and PSF. In contrast, the colors of authentic GP and of GPs at different adulteration levels (10 %, 20 %, 30 %, 40 %, and 50 %) are very similar and cannot be distinguished by the naked eye. In other words, at low adulteration levels the color of GP tends to mask the color of the adulterants, which makes it difficult to identify adulterated GPs by visual inspection.
To further investigate the color changes of adulterated GPs, the main color information of the pure and adulterated powders was determined by chromaticity analysis, and the color properties of the different powders were quantified and characterized according to the CIELAB color space. The chroma data, including L*, a*, b*, and ΔE values, are listed in Table S1. The L*, a*, and b* values represent the lightness, redness to greenness, and yellowness to blueness of the powder samples, respectively. The L* values of all adulterated GP samples are higher than that of pure GP, which indicates that the brightness of the adulterated samples increases to some extent. As the adulterated proportion rises, the L* values of adulterated GPs gradually increase, but the magnitude of change is not significant (Fig. 1B). The a* and b* values follow similar trends as the adulterated percentage rises (Fig. 1C-D), with an increase for GP adulterated with CS and a decrease for GP adulterated with SF and WF. Compared to GP, the b* values of the adulterated GP samples are all reduced, suggesting that the color of the adulterated samples shifts away from the original yellow of pure ginger powder. 
ΔE is an important indicator of the color change of GP before and after adulteration. As shown in Fig. 1E, the ΔE values of GP adulterated with CS and SF varied less relative to pure GP, whereas those of GP adulterated with WF varied more, indicating that the color profile of GP adulterated with WF changed significantly. This may be attributed to the remarkable differences between pure GP and the adulterants (PCS, PSF, and PWF) in the a* and b* parameters.
PCA model between the pure and adulterated GPs
To further visualize the difference between authentic and adulterated GPs based on chromaticity analysis, a PCA model was established. As an unsupervised pattern recognition method in multivariate statistical analysis, PCA reduces the dimensionality of complex data by projecting the variables of the dataset onto the first few components and offers an objective grouping of samples. In this study, the first two PCs explained 50.4 % and 39.3 % of the total variance, with R2X = 0.897 and Q2 (cum) = 0.616, indicating that the total variation could be well explained and predicted, respectively. As shown in the score plot (Fig. 1F), the GPs could be clearly distinguished from the three pure adulterants (PCS, PSF, and PWF), indicating that the different types of pure powder differed remarkably in color properties. The adulterated GP samples and pure GP fell into one close cluster in the PCA model. In addition, there was no significant difference between some adulterated GPs and pure GP (Table S1), suggesting that adulterated GPs cannot be completely distinguished from authentic powder based on chromaticity measurement. However, color digitization reflects accurate and objective information about adulterants and provides a better classification than traditional observation by the naked eye.
Qualitative analyses based on FT-NIR spectroscopy
NIR spectra and optimal pretreatment
The raw NIR spectra (10000–4000 cm−1) of GP, the adulterated GPs at different levels, PCS, PSF, and PWF are shown in Fig. 2A. The spectra of the samples were similar in the wavenumber range of 10000–6000 cm−1 and differed significantly in the range of 6000–4000 cm−1. These average NIR spectra reflect valuable chemical information about the authentic and adulterated powders. Common absorption peaks can be clearly seen around 8350 cm−1, 6930 cm−1, 6352 cm−1, 5762 cm−1, 5179 cm−1, 4381 cm−1, and 4312 cm−1 in GP and adulterated GPs. Generally, the peak around 8350 cm−1 is induced by the second overtone of C-H stretching, and the peaks at 6930 cm−1 and 6352 cm−1 are assigned to the first overtone of O-H or N-H stretching vibrations (Hong et al., 2019). The band at 5762 cm−1 is related to the first overtone of C-H stretching vibrations (Zhao et al., 2020), and the band at 5179 cm−1 belongs to the combination of O-H and C-O stretching (Zhang et al., 2021). As shown in Fig. 2B, the spectral signatures of GP, PCS, and PWF are similar in absorption bands. However, the absorption bands of GP and PSF differ, which is reflected specifically in the remarkable absorption peaks at 4852 cm−1, 4751 cm−1, and 4605 cm−1 in PSF. These differences in NIR absorption peaks may be important factors for discriminating authentic from adulterated powders.
Fig. 2
Average raw NIR spectra of all powder samples (A) and independent pure powder (B); Optimized NIR spectra after MSC + 1st Der + SG pretreatment of all powder samples (C) and independent pure powder (D).
Although the spectral characteristics of GP and the pure adulterants are different, the spectral discrepancies between GP and adulterated GP samples are subtle (Fig. S1) and cannot be identified by visual observation, especially at low adulteration levels. Therefore, it is necessary to preprocess the spectra of adulterated samples to improve the accuracy of discrimination. The NIR dataset includes 68 calibration samples and 34 prediction samples. A total of 19 preprocessing methods for the NIR spectra were compared in TQ Analyst 9.0 software, and the prediction accuracy and performance index were used to evaluate the quality of the model. Generally, the highest prediction accuracy and performance index indicate the best pretreatment method. As shown in Table S2, the combination of MSC + 1st Der + SG gave the best prediction accuracy (100 %) and performance index (93.0) and was therefore used in the subsequent modeling for identification. The optimized NIR spectra are shown in Fig. 2C-D.
Discriminant analyses by PCA and PLS-DA model
An unsupervised PCA model of GP and adulterated GPs was established based on the NIR information. All NIR spectra were preprocessed by the optimal combination of MSC + 1st Der + SG. In this model, twelve principal components were fitted, and PC1 and PC2 explained 49.0 % and 34.5 % of the total data variance, respectively. As shown in Fig. 3A, GP samples were clearly separated from the three pure adulterants, suggesting that the chemical compositions of GP and the pure adulterants may be significantly different. The score points of the adulterated GP samples were close to pure GP and far from the pure adulterants, which indicates that the adulterated GP samples were similar to the GP samples in spectral profile. GPs adulterated with SF are clearly grouped into one cluster, whereas GPs adulterated with CS and WF partially overlap in the same quadrant (Fig. 3B). Recently, many studies have reported that gingerols, shogaols, and terpenes are the main constituents of ginger and are also considered the material basis of its flavor (Yu et al., 2022, Zhang et al., 2021). These components contain C-H, O-H, and N-H groups with strong overtone and combination absorption in the NIR region (7000–4000 cm−1) (Nagy et al., 2022). However, CS, WF, and SF do not contain these characteristic components, which may be the main reason GPs can be discriminated from their adulterated counterparts. Although GP and adulterated GPs shared similar bands in the spectra, the classification algorithms amplified the discrepancies and made the classification visible. In addition, the GPs adulterated with CS, SF, and WF were distributed according to adulteration concentration: samples with a low percentage of adulteration were close to GP, and samples with a high percentage were far from GP, which still needs to be validated by further quantitative analyses. 
These results indicated that the authentic and adulterated GPs can be effectively distinguished by optimized NIR spectral information.
Fig. 3
Score plots of PCA Model (A) of GP, three pure adulterants and adulterated GPs with different adulteration levels (10%, 20%, 30%, 40%, and 50%); score plots of PCA Model (B) of the authentic and adulterated GPs, A-CS: adulterated with corn starch, A-SF: adulterated with soybean flour, A-WF: adulterated with wheat flour; cross-validation results with 200 times of calculations using a permutation test (C); score plots of PLS-DA Model (D) of the authentic and adulterated GPs.
Following the PCA model, a supervised PLS-DA model was built to discriminate GP from adulterated GPs. The main parameters R2X and Q2 (cum) were 0.894 and 0.820, respectively, indicating that the model has strong explanatory and predictive ability in classifying GP samples with different adulterants. To determine whether the model was overfitted, a permutation test with 200 permutations was conducted. As shown in Fig. 3C, the intercepts of R2 and Q2 were less than 0.3 and 0.05, respectively, which supports the robustness of the PLS-DA model. Fig. 3D shows the score plot of the PLS-DA model, which exhibited a separation similar to that of the PCA model. Although pure GP can be discriminated from adulterated GPs, the adulterated samples with different types and levels of adulterants were not clearly distinguished; for example, 10 %–30 % CS and 10 %–50 % SF overlapped significantly in both the PCA and PLS-DA models, which may be attributed to the limitations of PCA and PLS-DA in processing spectral information.
Discriminant analyses by machine learning algorithms
To verify the accuracy and reliability of the above results and to obtain a more accurate classification of authentic and adulterated GPs, three pattern recognition algorithms, SVM, RF, and GB, were developed for in-depth analysis. Generally, SVM classifies data by finding the separating hyperplane that maximizes the distance (margin) to the nearest data points (support vectors) (Amirvaresi & Parastar, 2021). RF and GB are two ensemble algorithms based on decision trees, which reduce the impact of outliers and the possibility of model overfitting, thus improving the accuracy of discrimination (Han et al., 2021, Sun et al., 2021). All models were built on the optimal NIR spectra processed by MSC + 1st Der + SG. The receiver operating characteristic (ROC) curves and area under the curve (AUC) were used to evaluate the reliability of these models. The ROC curves and AUC values of the three algorithms are shown in Fig. S2. Specifically, the AUC values of the SVM, RF, and GB models all reached 1.00. For an ideal classifier, the AUC value is close to 1.00 (Lyu et al., 2021), indicating that the three models performed optimally in this work. Accuracy is another metric used to evaluate the performance of these models; usually, 20–30 % of the total samples are held out as test samples to validate the model fitted on the training set and to derive the discriminative accuracy. In this study, the predictive accuracies of the SVM, RF, and GB models were 87 %, 100 %, and 100 %, respectively, which indicates that the three classifiers successfully resolved the differences between authentic and adulterated spectra and exhibited powerful explanatory ability. Confusion matrices of the three algorithms are shown in Fig. 4A. No GP or adulterated GP samples were misclassified by the RF and GB classifiers. 
However, some GPs adulterated with CS and SF were misclassified as GPs adulterated with WF by the SVM classifier. Precision and recall are two metrics used to assess the accuracy of different algorithms. The F1-score is the harmonic mean of precision and recall, and the best result is close to 1. For the RF and GB classifiers, all recognition rates were 1.000 (Fig. 4B), suggesting the high adaptability of these models. Our results indicate that NIRs combined with machine learning algorithms performed better in classification than the other models (PCA and PLS-DA), and similar results have been reported previously (de Santana et al., 2019, Li et al., 2021). In summary, machine learning algorithms were successfully developed to identify authentic and adulterated GPs in this study, and the RF and GB classifiers exhibited the best performance.
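For readers who want to see how precision, recall, and F1-score follow from a confusion matrix, a minimal sketch is given below. The matrix is hypothetical (it only mimics the pattern of CS- and SF-adulterated samples being assigned to the WF class and is not the matrix in Fig. 4A), and the function names are illustrative; rows are taken as true classes and columns as predicted classes, which is the convention used by scikit-learn.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall, F1) tuple for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Hypothetical 4-class result; class order: GP, A-CS, A-SF, A-WF.
labels = ["GP", "A-CS", "A-SF", "A-WF"]
cm = [
    [4, 0, 0, 0],
    [0, 8, 0, 2],   # two A-CS samples predicted as A-WF
    [0, 0, 9, 1],   # one A-SF sample predicted as A-WF
    [0, 0, 0, 10],
]

for name, (p, r, f1) in zip(labels, per_class_metrics(cm)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
print(f"accuracy={accuracy(cm):.3f}")
```

In this pattern the misassigned samples lower the recall of the classes they came from and the precision of the class that received them, so the per-class metrics show where a classifier fails, which overall accuracy alone does not.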
Fig. 4
The confusion matrices for SVM, RF, and GB classifiers (A); The algorithm evaluation metrics (precision, recall, and F1-scores) for SVM, RF, and GB classifiers (B).
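The classification workflow described above (hold out part of the samples, train SVM, RF, and GB, then report accuracy, AUC, the confusion matrix, and precision/recall/F1) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the data set, the 30 % split, and all hyperparameters are assumptions made for the example and are not the settings or results of this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for preprocessed spectra (NOT the study's data):
# 4 classes standing for pure GP and GP adulterated with SF, WF, or CS.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Hold out 30 % of the samples as the test set (the text mentions 20-30 %).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hyperparameters are illustrative defaults, not the paper's settings.
models = {
    "SVM": make_pipeline(StandardScaler(),
                         SVC(kernel="rbf", probability=True, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)  # class probabilities for the AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # One-vs-rest AUC averaged over the four classes.
        "auc": roc_auc_score(y_test, y_prob, multi_class="ovr"),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, y_pred),
    }
    print(name, results[name]["accuracy"], results[name]["auc"])
    print(results[name]["confusion"])
    # Per-class precision, recall, and F1-score.
    print(classification_report(y_test, y_pred, digits=3))
```

Off-diagonal entries of each confusion matrix show which classes are confused with which, the kind of information used above to see that the SVM errors were assigned to the WF class.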
Quantitative analyses based on FT-NIR spectroscopy
Spectral pretreatment based on PLSR model
The presence of soybean flour, corn starch, and wheat flour in the adulterated GPs was successfully identified and differentiated by PCA, PLS-DA, and machine learning algorithms. Subsequently, quantitative calibration models based on PLSR were developed to predict the concentration of the adulterants in GP samples. As in the qualitative analysis, the spectral preprocessing methods were optimized to improve the performance of the PLSR models. The models were constructed to predict the proportion of each adulterant (CS, SF, and WF) and the concentration of pure GP. The metrics obtained with the different pretreatments are compared in Table 1. For the prediction of the adulterated proportion, 2nd Der + SG, 1st Der + SG, and SNV + 2nd Der + SG gave the highest RPD values (13.68, 14.61, and 4.30) and the lowest RMSEP values (0.0109, 0.0102, and 0.0347) in validation for the SF, WF, and CS models, respectively. These results indicate that the PLSR models predict best after these pretreatments. In addition, the coefficients of determination for calibration (R²c), prediction (R²p), and cross-validation (R²cv) are close to one another, which indicates that the established models are neither underfitted nor overfitted (Pandiselvam et al., 2022). Hence, the above preprocessing methods were selected to construct the calibration models for the three adulterants. We also predicted the true concentration of pure GP in the different adulterated GPs. Although MSC + 2nd Der + SG and SNV + 2nd Der + SG gave the same RPD (8.92) and R²p (0.9939), the RMSECV of MSC + 2nd Der + SG (0.0176) was slightly lower than that of SNV + 2nd Der + SG (0.0177). Thus, the NIR spectra processed by MSC + 2nd Der + SG were used for the prediction of pure GP.
Furthermore, the RPD values of the quantitative models established after the selected spectral pretreatments were higher than those of the models without preprocessing. Overall, the spectral pretreatment methods screened in this study improved the performance of the PLSR models for quantitative analysis of GP and adulterant concentrations.
Table 1
Parameters of PLSR models for the determination of the concentrations of adulterations (SF, WF, and CS) and the purity of ginger powder using FT-NIR based on different pretreated methods.
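| Adulteration scenario | Processing method | LV | R²c | RMSEC | R²p | RMSEP | R²cv | RMSECV | RPD |
|---|---|---|---|---|---|---|---|---|---|
| Adulterated with SF | RAW | 2 | 0.9584 | 0.0404 | 0.9780 | 0.0324 | 0.9793 | 0.0481 | 4.60 |
| Adulterated with SF | 1st Der | 1 | 0.9971 | 0.0108 | 0.9971 | 0.0110 | 0.9933 | 0.0250 | 13.55 |
| Adulterated with SF | 1st Der + SG | 1 | 0.9971 | 0.0108 | 0.9971 | 0.0110 | 0.9933 | 0.0279 | 13.55 |
| Adulterated with SF | 2nd Der | 1 | 0.9997 | 0.0033 | 0.9967 | 0.0130 | 0.9078 | 0.1160 | 11.47 |
| Adulterated with SF | 2nd Der + SG | 1 | 0.9966 | 0.0116 | 0.9971 | 0.0109 | 0.9937 | 0.0307 | 13.68 |
| Adulterated with SF | MSC + 1st Der | 1 | 0.9960 | 0.0126 | 0.9958 | 0.0133 | 0.9908 | 0.0333 | 11.21 |
| Adulterated with SF | MSC + 1st Der + SG | 1 | 0.9960 | 0.0126 | 0.9958 | 0.0133 | 0.9909 | 0.0369 | 11.21 |
| Adulterated with SF | MSC + 2nd Der | 2 | 0.9995 | 0.0043 | 0.9955 | 0.0152 | 0.8583 | 0.1210 | 9.81 |
| Adulterated with SF | MSC + 2nd Der + SG | 1 | 0.9956 | 0.0133 | 0.9961 | 0.0126 | 0.9902 | 0.0337 | 11.83 |
| Adulterated with SF | SNV + 1st Der | 1 | 0.9960 | 0.0126 | 0.9958 | 0.0133 | 0.9908 | 0.0331 | 11.21 |
| Adulterated with SF | SNV + 1st Der + SG | 1 | 0.9960 | 0.0126 | 0.9958 | 0.0133 | 0.9909 | 0.0367 | 11.21 |
| Adulterated with SF | SNV + 2nd Der | 2 | 0.9995 | 0.0043 | 0.9955 | 0.0152 | 0.8580 | 0.1210 | 9.81 |
| Adulterated with SF | SNV + 2nd Der + SG | 1 | 0.9956 | 0.0133 | 0.9961 | 0.0126 | 0.9902 | 0.0337 | 11.83 |
| Adulterated with WF | RAW | 2 | 0.9623 | 0.0358 | 0.9657 | 0.0453 | 0.8829 | 0.0879 | 3.29 |
| Adulterated with WF | 1st Der | 2 | 0.9984 | 0.0078 | 0.9961 | 0.0135 | 0.9958 | 0.0363 | 11.04 |
| Adulterated with WF | 1st Der + SG | 3 | 0.9993 | 0.0053 | 0.9975 | 0.0102 | 0.9978 | 0.0215 | 14.61 |
| Adulterated with WF | 2nd Der | 1 | 0.9895 | 0.0205 | 0.9168 | 0.1010 | 0.8444 | 0.2080 | 1.48 |
| Adulterated with WF | 2nd Der + SG | 2 | 0.9982 | 0.0085 | 0.9973 | 0.0120 | 0.9869 | 0.0702 | 12.42 |
| Adulterated with WF | MSC + 1st Der | 1 | 0.9984 | 0.0078 | 0.9971 | 0.0111 | 0.9969 | 0.0324 | 13.43 |
| Adulterated with WF | MSC + 1st Der + SG | 1 | 0.9985 | 0.0078 | 0.9971 | 0.0109 | 0.9981 | 0.0153 | 13.68 |
| Adulterated with WF | MSC + 2nd Der | 1 | 0.9874 | 0.0224 | 0.9234 | 0.1030 | 0.8434 | 0.2470 | 1.45 |
| Adulterated with WF | MSC + 2nd Der + SG | 1 | 0.9976 | 0.0098 | 0.9975 | 0.0113 | 0.9886 | 0.0667 | 13.19 |
| Adulterated with WF | SNV + 1st Der | 1 | 0.9984 | 0.0078 | 0.9971 | 0.0111 | 0.9969 | 0.0324 | 13.43 |
| Adulterated with WF | SNV + 1st Der + SG | 1 | 0.9985 | 0.0078 | 0.9971 | 0.0109 | 0.9981 | 0.0153 | 13.68 |
| Adulterated with WF | SNV + 2nd Der | 1 | 0.9874 | 0.0224 | 0.9234 | 0.1030 | 0.8434 | 0.2470 | 1.45 |
| Adulterated with WF | SNV + 2nd Der + SG | 1 | 0.9976 | 0.0098 | 0.9975 | 0.0112 | 0.9886 | 0.0667 | 13.31 |
| Adulterated with CS | RAW | 2 | 0.9623 | 0.0385 | 0.9617 | 0.0392 | 0.9508 | 0.0753 | 3.80 |
| Adulterated with CS | 1st Der | 2 | 0.9938 | 0.0157 | 0.9641 | 0.0397 | 0.9873 | 0.0595 | 3.75 |
| Adulterated with CS | 1st Der + SG | 3 | 0.9989 | 0.0067 | 0.9521 | 0.0455 | 0.9929 | 0.0272 | 3.28 |
| Adulterated with CS | 2nd Der | 2 | 0.9980 | 0.0089 | 0.9238 | 0.0579 | 0.4579 | 0.1750 | 2.57 |
| Adulterated with CS | 2nd Der + SG | 3 | 0.9986 | 0.0073 | 0.9723 | 0.0351 | 0.9756 | 0.0826 | 4.25 |
| Adulterated with CS | MSC + 1st Der | 2 | 0.9983 | 0.0081 | 0.9666 | 0.0405 | 0.9939 | 0.0222 | 3.68 |
| Adulterated with CS | MSC + 1st Der + SG | 1 | 0.9904 | 0.0154 | 0.9696 | 0.0369 | 0.9905 | 0.0460 | 4.04 |
| Adulterated with CS | MSC + 2nd Der | 1 | 0.9205 | 0.0552 | 0.8928 | 0.0653 | 0.4183 | 0.1390 | 2.28 |
| Adulterated with CS | MSC + 2nd Der + SG | 2 | 0.9973 | 0.0104 | 0.9760 | 0.0348 | 0.9815 | 0.0664 | 4.28 |
| Adulterated with CS | SNV + 1st Der | 2 | 0.9984 | 0.0081 | 0.9666 | 0.0404 | 0.9938 | 0.0224 | 3.69 |
| Adulterated with CS | SNV + 1st Der + SG | 2 | 0.9940 | 0.0154 | 0.9696 | 0.0369 | 0.9903 | 0.0468 | 4.04 |
| Adulterated with CS | SNV + 2nd Der | 1 | 0.9204 | 0.0553 | 0.8927 | 0.0653 | 0.4164 | 0.1390 | 2.28 |
| Adulterated with CS | SNV + 2nd Der + SG | 3 | 0.9974 | 0.0103 | 0.9759 | 0.0347 | 0.9810 | 0.0672 | 4.30 |
| GP proportion | RAW | 4 | 0.8923 | 0.0742 | 0.8917 | 0.0748 | 0.8555 | 0.0857 | 2.23 |
| GP proportion | 1st Der | 4 | 0.9959 | 0.0148 | 0.9888 | 0.0247 | 0.9806 | 0.0325 | 6.75 |
| GP proportion | 1st Der + SG | 4 | 0.9965 | 0.0137 | 0.9921 | 0.0210 | 0.9894 | 0.0250 | 7.94 |
| GP proportion | 2nd Der | 3 | 0.9752 | 0.0364 | 0.9656 | 0.0610 | 0.7939 | 0.1210 | 2.73 |
| GP proportion | 2nd Der + SG | 3 | 0.9879 | 0.0225 | 0.9904 | 0.0251 | 0.9669 | 0.0425 | 6.64 |
| GP proportion | MSC + 1st Der | 5 | 0.9974 | 0.0119 | 0.9908 | 0.0225 | 0.9911 | 0.0222 | 7.41 |
| GP proportion | MSC + 1st Der + SG | 4 | 0.9967 | 0.0134 | 0.9917 | 0.0215 | 0.9934 | 0.0196 | 7.76 |
| GP proportion | MSC + 2nd Der | 4 | 0.9988 | 0.0082 | 0.9893 | 0.0519 | 0.7432 | 0.126 | 3.21 |
| GP proportion | MSC + 2nd Der + SG | 4 | 0.9969 | 0.0130 | 0.9939 | 0.0187 | 0.9945 | 0.0176 | 8.92 |
| GP proportion | SNV + 1st Der | 5 | 0.9974 | 0.0119 | 0.9908 | 0.0225 | 0.9909 | 0.0224 | 7.41 |
| GP proportion | SNV + 1st Der + SG | 4 | 0.9966 | 0.0135 | 0.9918 | 0.0214 | 0.9934 | 0.0196 | 7.79 |
| GP proportion | SNV + 2nd Der | 4 | 0.9987 | 0.0082 | 0.9893 | 0.0519 | 0.7439 | 0.1260 | 3.21 |
| GP proportion | SNV + 2nd Der + SG | 4 | 0.9968 | 0.0130 | 0.9939 | 0.0187 | 0.9945 | 0.0177 | 8.92 |

Notes: LV: latent variable.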
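The choice between pretreatments with near-identical metrics can be made explicit with a small ranking sketch. The values below are copied from the "GP proportion" rows of Table 1 (only a subset of rows is included); the ranking rule itself (highest RPD first, then highest R²p, then lowest RMSECV) is an assumption based on how the text breaks the tie between MSC + 2nd Der + SG and SNV + 2nd Der + SG, not a procedure stated by the authors.

```python
from typing import NamedTuple


class PlsrResult(NamedTuple):
    method: str
    r2p: float
    rmsep: float
    rmsecv: float
    rpd: float


# Subset of the "GP proportion" rows of Table 1.
gp_rows = [
    PlsrResult("RAW", 0.8917, 0.0748, 0.0857, 2.23),
    PlsrResult("1st Der + SG", 0.9921, 0.0210, 0.0250, 7.94),
    PlsrResult("MSC + 1st Der + SG", 0.9917, 0.0215, 0.0196, 7.76),
    PlsrResult("MSC + 2nd Der + SG", 0.9939, 0.0187, 0.0176, 8.92),
    PlsrResult("SNV + 1st Der + SG", 0.9918, 0.0214, 0.0196, 7.79),
    PlsrResult("SNV + 2nd Der + SG", 0.9939, 0.0187, 0.0177, 8.92),
]


def rank_key(row: PlsrResult):
    # Assumed rule: maximize RPD, then R2p; break remaining ties
    # by the lowest RMSECV (hence the minus sign).
    return (row.rpd, row.r2p, -row.rmsecv)


best = max(gp_rows, key=rank_key)
print(best.method)  # MSC + 2nd Der + SG
```

The two second-derivative pretreatments tie on RPD (8.92) and R²p (0.9939), so the decision falls to RMSECV (0.0176 versus 0.0177), matching the selection described in the text.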
Regression curves building
Using the pretreatment methods selected above, the optimized spectra of the GP samples were obtained, and quantitative regression curves were constructed against the actual contents of GP and adulterants. The regression curves of the PLSR models for adulteration with SF, WF, and CS are shown in Fig. 5. Generally, the regression line represents the ideal result of a quantitative model, and the closer the scatter points lie to this line, the better the model (Sun et al., 2021, Yang et al., 2017). In this study, all data points in the calibration and prediction sets are tightly clustered around the diagonal lines, suggesting that the established PLSR models perform very well in predicting adulterant content. Moreover, Fig. 5 also shows that the quantitative model successfully predicted the actual concentration of GP, which further supports the reliability of PLSR models for content prediction based on NIR spectra.
Fig. 5
Regression curves of actual and predicted adulteration levels of SF, WF and CS, and the real concentration of GP by PLSR model.
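How closely the points of a predicted-versus-actual plot follow the diagonal can be summarized numerically. The sketch below fits a straight line to predicted against actual values and reports slope, intercept, bias, RMSE, and R²; the function name and the example values are hypothetical and are not taken from Fig. 5.

```python
import numpy as np


def regression_line_summary(actual, predicted):
    """Summarize agreement between actual and predicted concentrations."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Least-squares line predicted = slope * actual + intercept;
    # a perfect model gives slope 1 and intercept 0 (the diagonal).
    slope, intercept = np.polyfit(actual, predicted, deg=1)
    residuals = predicted - actual
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "bias": residuals.mean(),
        "rmse": np.sqrt(np.mean(residuals ** 2)),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical adulteration levels (fractions), for illustration only.
actual = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50]
predicted = [0.01, 0.09, 0.21, 0.29, 0.41, 0.49]
summary = regression_line_summary(actual, predicted)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

A slope near 1, an intercept and bias near 0, and a small RMSE together correspond to points clustered tightly around the diagonal.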
Conclusion
In the present study, NIR spectroscopy combined with chemometrics was first used to identify and quantify GPs adulterated with three types of adulterants. Chromaticity analysis showed that color properties alone could not completely distinguish adulterated from non-adulterated GPs. Further analysis was therefore performed using NIR spectroscopy. Both PCA and PLS-DA models achieved an initial separation between pure GP and adulterated GPs. Machine learning algorithms (SVM, RF, and GB) classified the adulterated GPs better, and the RF and GB algorithms reached the highest accuracy of 100 %. Subsequently, PLSR models were built to determine the adulteration levels. Four content prediction models, for pure GP and for GPs adulterated with SF, WF, and CS, were fitted based on the optimal pretreatment methods (MSC + 2nd Der + SG, 2nd Der + SG, 1st Der + SG, and SNV + 2nd Der + SG), with RPD values of 8.92, 13.68, 14.61, and 4.30, respectively. The regression curves also showed that the four PLSR models exhibited good linearity and precision, with low RMSECV values (0.0176, 0.0307, 0.0215, and 0.0672). Overall, NIR spectroscopy combined with chemometrics proved to be a useful tool for classifying and quantifying GP and its adulterants. This method can quickly track the adulteration of GP, helping to ensure the authenticity of GP and maintain the stability of the consumer market.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.