Tao Shen, Hong Yu, Yuan-Zhong Wang.
Abstract
Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical diversity and bioactive compounds differ among species, it is crucial to identify authentic Gentiana species accurately. This paper studied the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000-4000 cm⁻¹) and Fourier transform mid-infrared (FT-MIR: 4000-600 cm⁻¹) spectroscopy. Firstly, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Secondly, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, namely VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the number of variables for modeling. As a result, 101 NIR and 73 FT-MIR bands were selected as the feature variables. Thirdly, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectra models. RF and SVM as base learners, combined with an SVM meta-classifier, was the optimal stacked generalization strategy.
For the SG-Ven-MIR-SVM model, the accuracy (ACC) on the calibration set and on the validation set was 100% in both cases. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen's kappa coefficient (K) were all 1, which showed that this model had the best authenticity identification performance. These parameters indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
Keywords: FT-MIR; Gentiana; NIR; chemometrics; feature selection; species identification; stacked generalization
Year: 2020 PMID: 32210010 PMCID: PMC7144467 DOI: 10.3390/molecules25061442
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Raw near-infrared (NIR) (A) and Fourier transform mid-infrared (FT-MIR) (B) spectra of 180 samples of G. rigescens and its related species.
Figure 2. Averaged NIR spectra of 18 species of Gentiana (A), (B) and Tripterospermum species (C).
Figure 3. Averaged FT-MIR spectra of 18 Gentiana (A), (B), and Tripterospermum species (C).
Figure 4. PCA score plots for the 180 samples using NIR spectra after pretreatment: (A) PC1 vs. PC2, (B) PC1 vs. PC3. The meaning of the codes (1–18) can be found in the sample information.
Figure 5. PCA score plots for the 180 samples using FT-MIR spectra after pretreatment: (A) PC1 vs. PC2, (B) PC1 vs. PC3. The meaning of the codes (1–18) can be found in the sample information.
The major parameters of random forests (RF) model based on NIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
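The per-class columns in these tables can be reproduced from one-vs-rest confusion counts. The sketch below is a minimal illustration, not the authors' code. It assumes that EFF is the geometric mean of SE and SP (which is consistent with the tabulated values) and that the validation set holds 72 samples, 4 per class (which is consistent with percentages such as 97.22 = 70/72). The function name and the counts are hypothetical.

```python
import math

def one_vs_rest_metrics(tp, fn, fp, tn):
    """Per-class metrics from one-vs-rest confusion counts."""
    total = tp + fn + fp + tn
    acc = 100.0 * (tp + tn) / total   # accuracy in percent
    se = tp / (tp + fn)               # sensitivity (true-positive rate)
    sp = tn / (tn + fp)               # specificity (true-negative rate)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    eff = math.sqrt(se * sp)          # efficiency, ASSUMED to be sqrt(SE * SP)
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc, "EFF": eff}

# Hypothetical counts: 72 validation samples, 4 in the class of interest;
# one member is missed and one outsider is wrongly assigned to the class.
m = one_vs_rest_metrics(tp=3, fn=1, fp=1, tn=67)
print({name: round(value, 2) for name, value in m.items()})
```

With these counts the rounded values are ACC 97.22, SE 0.75, SP 0.99, MCC 0.74, and EFF 0.86, the same pattern as the validation entries for class 5 above.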
The major parameters of RF model based on FT-MIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.50 | 1.00 | 0.70 | 0.71 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The major parameters of SVM model based on NIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
The major parameters of SVM model based on FT-MIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
The major parameters of K-nearest neighbors (KNN) model based on NIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.50 | 1.00 | 0.70 | 0.71 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 95.83 | 0.75 | 0.97 | 0.65 | 0.85 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 0.75 | 1.00 | 0.86 | 0.87 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 95.83 | 0.75 | 0.97 | 0.65 | 0.85 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.75 | 0.99 | 0.74 | 0.86 |
The major parameters of KNN model based on FT-MIR full spectra data.
| Class | Calibration ACC (%) | Calibration SE | Calibration SP | Calibration MCC | Calibration EFF | Validation ACC (%) | Validation SE | Validation SP | Validation MCC | Validation EFF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 5 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 6 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 7 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 1.00 | 0.97 | 0.80 | 0.99 |
| 9 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.50 | 1.00 | 0.70 | 0.71 |
| 10 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 11 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 98.61 | 1.00 | 0.99 | 0.89 | 0.99 |
| 12 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 13 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 14 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 15 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 16 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 97.22 | 0.50 | 1.00 | 0.70 | 0.71 |
| 17 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 | 100.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Figure 6. Feature selection strategies in the study.
Figure 7. Size of feature variables of the four algorithms: (A) feature selection of NIR spectroscopy, (B) feature selection of FT-MIR spectroscopy.
Figure 8. Venn diagram representing the overlap of the feature variables selected by the variable importance in projection (VIP), Boruta, genetic algorithm combined with random forest (GARF), and genetic algorithm combined with support vector machine (GASVM) algorithms: (A) Venn diagram calculated from the feature selection results for the NIR variables, (B) Venn diagram calculated from the feature selection results for the FT-MIR variables.
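The Venn diagram calculation amounts to set operations on the variables chosen by each selector. The sketch below illustrates the idea with made-up variable indices; it assumes the overlap of interest is the four-way intersection, which the source does not spell out, and the dictionary contents are hypothetical rather than the study's data.

```python
# Hypothetical variable indices chosen by each selector (not the study's data).
selected = {
    "VIP":    {3, 7, 12, 18, 25, 31},
    "Boruta": {3, 7, 12, 20, 25, 40},
    "GARF":   {1, 3, 7, 12, 25, 33},
    "GASVM":  {3, 5, 7, 12, 25, 31},
}

# Variables in the central region of the four-set Venn diagram
common = set.intersection(*selected.values())
# Variables picked by at least one method
union = set.union(*selected.values())

print(sorted(common))
print(len(union))
```

Keeping only variables that several independent selectors agree on is one way to discard method-specific picks; whether a stricter or looser overlap rule suits a given dataset is a judgment call.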
The major parameters (accuracy and kappa) of classification models based on different NIR feature variables.
| Model | Calibration Set Total ACC (%) | Validation Set Total ACC (%) | Validation Set K | Hyperparameters |
|---|---|---|---|---|
| VIP-NIR-RF | 100 | 97.22 | 0.97 | |
| Bor-NIR-RF | 100 | 91.67 | 0.91 | |
| GARF-NIR-RF | 100 | 91.67 | 0.91 | |
| GASVM-NIR-RF | 100 | 91.67 | 0.91 | |
| Ven-NIR-RF | 100 | 94.44 | 0.94 | |
| VIP-NIR-SVM | 100 | 97.22 | 0.97 | |
| Bor-NIR-SVM | 100 | 98.61 | 0.99 | |
| GARF-NIR-SVM | 100 | 93.06 | 0.93 | |
| GASVM-NIR-SVM | 100 | 91.67 | 0.91 | |
| Ven-NIR-SVM | 100 | 98.61 | 0.99 | |
| VIP-NIR-KNN | 100 | 95.83 | 0.96 | |
| Bor-NIR-KNN | 100 | 94.44 | 0.94 | |
| GARF-NIR-KNN | 100 | 87.50 | 0.87 | |
| GASVM-NIR-KNN | 100 | 88.89 | 0.88 | |
| Ven-NIR-KNN | 100 | 94.44 | 0.94 | |
Note: VIP-NIR, Bor-NIR, GARF-NIR, GASVM-NIR, and Ven-NIR are the NIR feature subsets extracted by VIP, Boruta, GARF, GASVM, and their common overlapping variables, respectively.
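The K column tracks the validation accuracy closely, and the relationship can be checked by computing Cohen's kappa from a confusion matrix. The sketch below is illustrative only: it assumes a balanced validation set of 18 classes with 4 samples each (an inference from the reported percentages, not a stated fact), and the two misclassifications are invented.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical validation set: 18 classes, 4 samples per class, two errors.
n_classes, per_class = 18, 4
cm = [[per_class if i == j else 0 for j in range(n_classes)]
      for i in range(n_classes)]
cm[4][4] -= 1
cm[4][11] += 1    # one class-5 sample predicted as class 12
cm[15][15] -= 1
cm[15][17] += 1   # one class-16 sample predicted as class 18

accuracy = 100 * sum(cm[i][i] for i in range(n_classes)) / (n_classes * per_class)
kappa = cohen_kappa(cm)
print(round(accuracy, 2), round(kappa, 2))
```

Under this balanced-class assumption the chance agreement is 1/18 regardless of where the errors fall, so 70 correct out of 72 gives an accuracy of 97.22% and a kappa of 0.97, matching the pairing seen in the table.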
The major parameters (accuracy and kappa) of classification models based on different FT-MIR feature variables.
| Model | Calibration Set Total ACC (%) | Validation Set Total ACC (%) | Validation Set K | Hyperparameters |
|---|---|---|---|---|
| VIP-MIR-RF | 100 | 97.22 | 0.97 | |
| Bor-MIR-RF | 100 | 95.83 | 0.96 | |
| GARF-MIR-RF | 100 | 95.83 | 0.96 | |
| GASVM-MIR-RF | 100 | 94.44 | 0.94 | |
| Ven-MIR-RF | 100 | 98.61 | 0.99 | |
| VIP-MIR-SVM | 100 | 100 | 1.00 | |
| Bor-MIR-SVM | 100 | 100 | 1.00 | |
| GARF-MIR-SVM | 100 | 100 | 1.00 | |
| GASVM-MIR-SVM | 100 | 100 | 1.00 | |
| Ven-MIR-SVM | 100 | 98.61 | 0.99 | |
| VIP-MIR-KNN | 100 | 98.61 | 0.99 | |
| Bor-MIR-KNN | 100 | 97.22 | 0.97 | |
| GARF-MIR-KNN | 100 | 95.83 | 0.96 | |
| GASVM-MIR-KNN | 100 | 94.44 | 0.94 | |
| Ven-MIR-KNN | 100 | 97.22 | 0.97 | |
Note: VIP-MIR, Bor-MIR, GARF-MIR, GASVM-MIR, and Ven-MIR are the FT-MIR feature subsets extracted by VIP, Boruta, GARF, GASVM, and their common overlapping variables, respectively.
The major parameters (accuracy and kappa) of the stacking models.
| Scenario | Data Set | Model | Level 1 | Calibration Set Total ACC (%) | Validation Set Total ACC (%) | Validation Set K |
|---|---|---|---|---|---|---|
| A | Ven-NIR | SG-Ven-NIR-RF | RF | 100.00 | 98.61 | 0.99 |
| B | Ven-NIR | SG-Ven-NIR-SVM | SVM | 100.00 | 97.22 | 0.97 |
| C | Ven-NIR | SG-Ven-NIR-KNN | KNN | 100.00 | 95.83 | 0.96 |
| D | Ven-MIR | SG-Ven-MIR-RF | RF | 100.00 | 94.44 | 0.94 |
| E | Ven-MIR | SG-Ven-MIR-SVM | SVM | 100.00 | 100.00 | 1.00 |
| F | Ven-MIR | SG-Ven-MIR-KNN | KNN | 100.00 | 90.28 | 0.90 |
Note: the base learners (level 0) of all stacking models were RF and SVM models.
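The two-level arrangement in this table (RF and SVM at level 0, a single meta-classifier at level 1) can be sketched with scikit-learn's `StackingClassifier`. This is a stand-in under stated assumptions, not the authors' implementation: the source does not name the software used, the data here are synthetic blobs rather than spectra, and every hyperparameter shown is a placeholder.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for selected spectral variables (not the study's data).
X, y = make_blobs(n_samples=180, centers=3, n_features=20, random_state=0)
X_cal, X_val, y_cal, y_val = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Level 0: RF and SVM base learners; level 1: an SVM meta-classifier.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(kernel="rbf")),
    ],
    final_estimator=SVC(kernel="linear"),
    cv=5,  # out-of-fold base predictions are used to train the meta-classifier
)
stack.fit(X_cal, y_cal)
val_accuracy = stack.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.2%}")
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is what keeps the second level from simply memorizing the calibration set.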
Figure 9. Stacked generalization in the study.
Figure 10. The low-level and mid-level data fusion strategies in the study.
The major parameters (accuracy and kappa) of the data fusion models.
| Data Fusion Strategy | Number of Variables | Model | Calibration Set Total ACC (%) | Validation Set Total ACC (%) | Validation Set K |
|---|---|---|---|---|---|
| Low-level fusion | 2701 | Low-RF | 100.00 | 97.22 | 0.97 |
| Low-level fusion | 2701 | Low-SVM | 100.00 | 100.00 | 1.00 |
| Low-level fusion | 2701 | Low-KNN | 100.00 | 97.22 | 0.97 |
| Mid-level fusion | 174 | Mid-RF | 100.00 | 100.00 | 1.00 |
| Mid-level fusion | 174 | Mid-SVM | 100.00 | 100.00 | 1.00 |
| Mid-level fusion | 174 | Mid-KNN | 100.00 | 100.00 | 1.00 |
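The variable counts in this table follow from simple concatenation: 1487 + 1214 = 2701 for low-level fusion of the full spectra, and 101 + 73 = 174 for mid-level fusion of the selected bands. The sketch below illustrates that bookkeeping with placeholder spectra; the function names and the choice of which indices count as "selected" are hypothetical.

```python
def low_level_fusion(nir, mir):
    """Concatenate the full NIR and FT-MIR spectra sample by sample."""
    return [a + b for a, b in zip(nir, mir)]

def mid_level_fusion(nir, mir, nir_idx, mir_idx):
    """Concatenate only the selected variables from each block."""
    return [[a[i] for i in nir_idx] + [b[j] for j in mir_idx]
            for a, b in zip(nir, mir)]

# Placeholder spectra with the variable counts quoted in the abstract
n_samples, n_nir, n_mir = 3, 1487, 1214
nir = [[0.0] * n_nir for _ in range(n_samples)]
mir = [[0.0] * n_mir for _ in range(n_samples)]

# Hypothetical selections: the first 101 NIR and first 73 FT-MIR variables
nir_idx, mir_idx = range(101), range(73)

low = low_level_fusion(nir, mir)
mid = mid_level_fusion(nir, mir, nir_idx, mir_idx)
print(len(low[0]), len(mid[0]))
```

In practice the two blocks are usually scaled before concatenation so that neither spectral range dominates; that step is omitted here.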
Figure 11. Medicinal Gentiana and its relatives in the study.
Source of the 180 Gentiana and Tripterospermum samples.
| Class | Genus | Species | Geographical Location |
|---|---|---|---|
| 1 | | | Yongde, Lincang, Yunnan, China |
| 2 | | | Xuyong, Luzhou, Sichuan, China |
| 3 | | | Jianghua, Yongzhou, Hunan, China |
| 4 | | | Songpan, Aba, Sichuan, China |
| 5 | | | Songpan, Aba, Sichuan, China |
| 6 | | | Lanping, Nujiang, Yunnan, China |
| 7 | | | Jianghua, Yongzhou, Hunan, China |
| 8 | | | Liping, QianDong-nan, Guizhou, China |
| 9 | | | Liping, QianDong-nan, Guizhou, China |
| 10 | | | Ningqiang, Hanzhong, Shaanxi, China |
| 11 | | | Songpan, Aba, Sichuan, China |
| 12 | | | Songpan, Aba, Sichuan, China |
| 13 | | | Xianfeng, Enshi, Hubei, China |
| 14 | | | Wufeng, Yichang, Hubei, China |
| 15 | | | Nayong, Bijie, Guizhou, China |
| 16 | | | Songpan, Aba, Sichuan, China |
| 17 | | | Tonggu, Yichun, Jiangxi, China |
| 18 | | | Tonggu, Yichun, Jiangxi, China |
Sample information including their application in southwest of China.
| Species | Chinese Name | Disease | Ch.P. |
|---|---|---|---|
| | Dian Longdan | heat-clearing, liver protection, icterohepatitis, Japanese encephalitis, cephalalgia, swelling and pain of eye | listed (2015 edition) |
| | Tou hua Longdan | heat-clearing, icterohepatitis | unlisted |
| | Wu ling Longdan | heat-clearing, urinary tract infection, conjunctivitis | unlisted |
| | Xian ye Longdan | trachitis, cough, smallpox | unlisted |
| | Shi e Longdan | none reported | unlisted |
| | Cu jing qin jiao | heat-clearing, icterohepatitis, hematochezia, rheumatism | listed (2015 edition) |
| | Hua nan Longdan | heat-clearing, icterohepatitis, diarrhea, swelling and pain of eye | unlisted |
| | Fu gen Longdan | none reported | unlisted |
| | Cao dian Longdan | heat-clearing, detumescence analgesic | unlisted |
| | Shan nan Longdan | none reported | unlisted |
| | Lin ye Longdan | heat-clearing, acute appendicitis, swelling and pain of eye | unlisted |
| | Jia lin ye Longdan | none reported | unlisted |
| | Shen hong Longdan | dyspepsia, bone fracture, snakebite, diminish inflammation | unlisted |
| | Xiao fan lu ye Longdan | none reported | unlisted |
| | Hong hua Longdan | heat-clearing, diminish inflammation, urinary tract infection, cold, icterohepatitis, diarrhea, scald | listed (2015 edition) |
| | Tiao wen Longdan | none reported | unlisted |
| | Shuang hudie | heat-clearing, phthisis, pulmonary abscess, irregular menstruation | unlisted |
| | E mei Shuang hudie | bone fracture | unlisted |
Figure 12. ZnSe ATR accessory (left) and the metal O-ring (right) in the study.