Yi-Fei Pei, Zhi-Tian Zuo, Qing-Zhi Zhang, Yuan-Zhong Wang.
Abstract
Origin traceability is important for controlling the efficacy of Chinese medicinal materials and Chinese patent medicines. Paris polyphylla var. yunnanensis is widely distributed and well known around the world. In this study, two spectroscopic techniques, Fourier transform mid-infrared (FT-MIR) and near-infrared (NIR) spectroscopy, were combined with low-, mid-, and high-level data fusion strategies to trace the geographical origin of 196 wild P. yunnanensis samples. Partial least squares discriminant analysis (PLS-DA) and random forest (RF) were used to establish classification models. Feature variable extraction (principal component analysis, PCA) and important variable selection (recursive feature elimination and Boruta) were applied for geographical origin traceability; models built on the extracted feature variables classified better than models built on the selected variables. FT-MIR spectra are considered to contribute more than NIR spectra. In addition, high-level data fusion based on principal component (PC) feature extraction gave a satisfactory result, with an accuracy of 100%. Hence, data fusion of FT-MIR and NIR signals can effectively identify the geographical origin of wild P. yunnanensis.
Keywords: Fourier transform mid-infrared spectroscopy; Paris polyphylla var. yunnanensis; data fusion; near-infrared spectroscopy; origin traceability
Year: 2019 PMID: 31337084 PMCID: PMC6680555 DOI: 10.3390/molecules24142559
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Averaged spectra of P. yunnanensis samples collected from five regions: (a) Fourier transform mid-infrared (FT-MIR) spectra; (b) near-infrared (NIR) spectra.
Peak assignments on the FT-MIR and NIR spectra of wild P. yunnanensis.
| Spectral Type | Wavenumber (cm⁻¹) | Base Group and Vibration Mode | Contribution |
|---|---|---|---|
| NIR | 8347 | C–H, N–H and O–H stretching vibration mode | CH2, saccharides, and glycosides |
| NIR | 7256 | C–H stretching and deformation vibration mode | CH2 |
| NIR | 6950 | C–H, N–H and O–H stretching vibration mode | CH2, saccharides, and glycosides |
| NIR | 6324 | C–H, N–H and O–H stretching vibration mode | CH2, saccharides, and glycosides |
| NIR | 5686 | C–H, N–H and O–H stretching vibration mode | CH2, saccharides, and glycosides |
| NIR | 5169 | C–H, N–H and O–H and hydrogen bond stretching vibration mode | CH2, saccharides, glycosides, and water molecule |
| FT-MIR | 3382 | O–H asymmetric and hydrogen bond stretching vibration mode | Saccharides, glycosides, and water molecule |
| FT-MIR | 3334 | O–H asymmetric and hydrogen bond stretching vibration mode | Saccharides, glycosides, and water molecule |
| FT-MIR | 2930 | C–H asymmetric stretching vibration mode | CH2 and CH3 |
| FT-MIR | 1743 | C═O stretching vibration mode | Free carboxyl groups of pectins and/or fatty acids |
| FT-MIR | 1653 | Asymmetric stretching vibrations of carboxyl groups participating in the hydrogen bonds and hydrogen bond scissoring vibration mode | Flavonoids, saccharides, steroid saponin, and water molecules |
| FT-MIR | 1610 | COO symmetric normal vibrations mode | The carboxyl group present in pectin |
| FT-MIR | 1456 | CH3 asymmetric deformation and CH2 scissoring vibration | CH2 and CH3 |
| FT-MIR | 1414 | C–H symmetric bending vibration mode and OH–O in-plane bending mode | CH2 |
| FT-MIR | 1370 | C–H symmetric deformation vibration mode | CH3 |
| FT-MIR | 1242 | C–O stretching vibration mode | Saccharides and oils |
| FT-MIR | 1150 | C–C and C–O stretching and C–OH bending vibration mode | Saccharides and glycosides |
| FT-MIR | 1078 | C–C and C–O stretching and C–OH bending vibration mode | Saccharides and glycosides |
| FT-MIR | 1020 | C–C and C–O stretching and C–OH bending vibration mode | Saccharides and glycosides |
| FT-MIR | 922 | Sugar skeleton vibration mode | Saccharides |
The classification efficiency values and total accuracy of independent decision making with partial least squares discriminant analysis (PLS-DA) and random forest (RF) models. RFE: recursive feature elimination.
| Dataset | Model | Calibration Class1 | Calibration Class2 | Calibration Class3 | Calibration Class4 | Calibration Class5 | Calibration Accuracy | Validation Class1 | Validation Class2 | Validation Class3 | Validation Class4 | Validation Class5 | Validation Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FT-MIR | PLS-DA | 0.961 | 1.000 | 0.995 | 0.981 | 0.990 | 97.66% | 1.000 | 0.991 | 0.913 | 1.000 | 0.991 | 97.06% |
| FT-MIR | RF | 0.772 | 0.888 | 0.801 | 0.829 | 0.790 | 71.88% | 0.886 | 0.964 | 0.973 | 1.000 | 0.946 | 92.65% |
| NIR | PLS-DA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 100% | 0.870 | 0.964 | 0.940 | 0.936 | 0.964 | 89.71% |
| NIR | RF | 0.803 | 0.854 | 0.775 | 0.837 | 0.834 | 72.66% | 0.813 | 0.917 | 0.491 | 0.923 | 0.955 | 76.47% |
| FT-MIR (RFE) | PLS-DA | 0.911 | 0.990 | 0.881 | 0.961 | 0.975 | 91.41% | 0.794 | 1.000 | 0.694 | 0.955 | 0.923 | 82.35% |
| FT-MIR (RFE) | RF | 0.947 | 0.951 | 0.853 | 0.876 | 0.942 | 86.72% | 0.845 | 0.917 | 0.964 | 0.964 | 0.936 | 88.24% |
| FT-MIR (Bo) | PLS-DA | 0.951 | 0.961 | 0.995 | 0.961 | 0.985 | 95.31% | 0.886 | 0.991 | 0.905 | 0.964 | 0.962 | 91.18% |
| FT-MIR (Bo) | RF | 0.890 | 0.942 | 0.829 | 0.881 | 0.922 | 83.59% | 0.886 | 0.964 | 0.973 | 1.000 | 0.946 | 92.65% |
| FT-MIR (PCs) | PLS-DA | 0.906 | 0.911 | 0.868 | 0.927 | 0.863 | 83.59% | 0.926 | 0.991 | 0.843 | 0.926 | 0.972 | 89.71% |
| FT-MIR (PCs) | RF | 0.780 | 0.922 | 0.730 | 0.764 | 0.772 | 68.75% | 0.964 | 1.000 | 0.991 | 0.917 | 0.991 | 95.59% |
| NIR (RFE) | PLS-DA | 0.807 | 0.922 | 0.926 | 0.902 | 0.966 | 85.16% | 0.779 | 0.845 | 0.675 | 0.891 | 0.953 | 75% |
| NIR (RFE) | RF | 0.733 | 0.888 | 0.791 | 0.888 | 0.942 | 77.34% | 0.813 | 0.964 | 0.567 | 0.889 | 0.955 | 77.94% |
| NIR (Bo) | PLS-DA | 0.807 | 0.922 | 0.858 | 0.906 | 0.947 | 82.81% | 0.779 | 0.837 | 0.551 | 0.962 | 0.962 | 75% |
| NIR (Bo) | RF | 0.729 | 0.878 | 0.764 | 0.893 | 0.922 | 75.78% | 0.772 | 0.878 | 0.486 | 0.870 | 0.955 | 72.06% |
| NIR (PCs) | PLS-DA | 0.860 | 0.937 | 0.974 | 0.915 | 0.990 | 89.84% | 0.927 | 0.926 | 0.991 | 0.878 | 0.955 | 90% |
| NIR (PCs) | RF | 0.745 | 0.922 | 0.881 | 0.881 | 0.951 | 81.25% | 0.955 | 0.964 | 1.000 | 1.000 | 0.991 | 97.06% |
Bo: Boruta, PCs: Principal components.
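The record does not define how the per-class efficiency values are computed. As a minimal sketch, the Python below assumes the definition commonly used with PLS-DA models, the geometric mean of sensitivity and specificity, and derives it together with total accuracy from a confusion matrix. The function names and the toy confusion matrix are hypothetical and are not taken from the study.

```python
from math import sqrt


def per_class_efficiency(confusion):
    """Return one efficiency value per class.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j. Efficiency is ASSUMED here to be
    sqrt(sensitivity * specificity); the source does not state its formula.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    efficiencies = []
    for k in range(n_classes):
        tp = confusion[k][k]                                     # true positives
        fn = sum(confusion[k]) - tp                              # missed samples of class k
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp # wrongly assigned to class k
        tn = total - tp - fn - fp                                # everything else
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        efficiencies.append(sqrt(sensitivity * specificity))
    return efficiencies


def total_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Made-up three-class example, for illustration only.
toy_confusion = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 0, 10],
]

if __name__ == "__main__":
    print([round(e, 3) for e in per_class_efficiency(toy_confusion)])
    print(f"{total_accuracy(toy_confusion):.2%}")
```

Under this assumed definition, a class with no correctly identified samples has a sensitivity of zero and therefore an efficiency of zero, regardless of its specificity.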
Figure 2. The parameter optimization of random forest models of independent decision making: (a) number of trees (ntree) of the FT-MIR dataset; (b) number of variables (mtry) of the FT-MIR dataset; (c) ntree of the NIR dataset; (d) mtry of the NIR dataset.
Figure 3. The 10-fold cross-validation error rates of the random forest (RF) model (sequentially reduced every five variables) based on total P. yunnanensis samples: (a) FT-MIR dataset; (b) NIR dataset.
Figure 4. The important variables of the Boruta algorithm and RFE algorithm of random forest models based on total P. yunnanensis samples: (a,b) the FT-MIR dataset; (c,d) the NIR dataset. RFE: Recursive feature elimination.
The classification efficiency values and total accuracy of the low-, mid-, and high-level data fusion strategies with PLS-DA and RF models. RFE: recursive feature elimination; Bo: Boruta; PCs: principal components; VIP: variable importance in the projection.
| Fusion Strategy | Model | Calibration Class1 | Calibration Class2 | Calibration Class3 | Calibration Class4 | Calibration Class5 | Calibration Accuracy | Validation Class1 | Validation Class2 | Validation Class3 | Validation Class4 | Validation Class5 | Validation Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low-level | PLS-DA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 100% | 0.926 | 1.000 | 0.949 | 1.000 | 0.981 | 95.59% |
| Low-level | RF | 0.872 | 0.885 | 0.858 | 0.897 | 0.932 | 82.81% | 0.886 | 1.000 | 0.905 | 0.972 | 0.991 | 92.65% |
| Low-level (VIP) | PLS-DA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 100% | 0.926 | 1.000 | 0.991 | 1.000 | 0.991 | 97.06% |
| Low-level (VIP) | RF | 0.927 | 0.927 | 0.849 | 0.922 | 0.947 | 86.72% | 0.926 | 1.000 | 0.991 | 1.000 | 0.991 | 97.06% |
| Mid-level (RFE) | PLS-DA | 0.946 | 0.712 | 0.764 | 0.864 | 0.878 | 75% | 0.705 | 0.794 | 0.000 | 0.727 | 0.798 | 55.88% |
| Mid-level (RFE) | RF | 0.966 | 0.888 | 0.868 | 0.894 | 0.881 | 84.38% | 0.764 | 0.861 | 0.491 | 0.900 | 0.854 | 69.12% |
| Mid-level (Bo) | PLS-DA | 0.961 | 1.000 | 1.000 | 0.995 | 0.995 | 98.44% | 0.926 | 1.000 | 0.991 | 0.991 | 1.000 | 97.06% |
| Mid-level (Bo) | RF | 0.947 | 0.942 | 0.885 | 0.922 | 0.951 | 89.06% | 0.926 | 1.000 | 0.949 | 0.991 | 0.991 | 95.59% |
| Mid-level (PCs) | PLS-DA | 0.951 | 0.961 | 0.974 | 0.995 | 0.995 | 96.09% | 1.000 | 1.000 | 0.957 | 0.991 | 1.000 | 98.53% |
| Mid-level (PCs) | RF | 0.927 | 0.951 | 0.922 | 0.922 | 0.981 | 90.63% | 0.886 | 1.000 | 0.991 | 0.981 | 0.955 | 94.12% |
| High-level (RFE) | PLS-DA | 0.951 | 0.981 | 0.953 | 1.000 | 0.990 | 96.09% | 0.870 | 1.000 | 0.802 | 0.991 | 0.981 | 89.71% |
| High-level (RFE) | RF | 0.976 | 0.976 | 0.904 | 0.922 | 0.995 | 91.21% | 0.926 | 1.000 | 0.991 | 1.000 | 0.991 | 97.06% |
| High-level (Bo) | PLS-DA | 0.976 | 0.981 | 0.979 | 1.000 | 0.990 | 97.66% | 0.926 | 0.991 | 0.850 | 1.000 | 0.981 | 92.65% |
| High-level (Bo) | RF | 0.966 | 0.976 | 0.904 | 0.902 | 0.951 | 90.63% | 0.917 | 0.964 | 0.850 | 0.972 | 1.000 | 91.18% |
| High-level (PCs) | PLS-DA | 0.981 | 1.000 | 0.990 | 0.981 | 1.000 | 98.44% | 0.964 | 1.000 | 1.000 | 0.991 | 1.000 | 98.53% |
| High-level (PCs) | RF | 0.881 | 0.971 | 0.872 | 0.911 | 0.961 | 87.5% | 0.966 | 1.000 | 1.000 | 0.991 | 1.000 | 100% |
Figure 5. Location distribution of wild P. yunnanensis samples in central, western, northwest, southeast, and southwest areas, Yunnan Province.
Figure 6. Scheme of the low-, mid-, and high-level data fusion approaches used to combine the FT-MIR signals and NIR signals.
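The three fusion levels named in the Figure 6 caption can be illustrated with a small Python sketch. It follows the usual meaning of the terms: low-level fusion concatenates the full data blocks, mid-level fusion concatenates features selected or extracted from each block, and high-level fusion combines the outputs of separately built models. The function names, the toy numbers, and in particular the score-averaging rule used for high-level fusion are assumptions made for illustration; the record does not state how the authors combined the model decisions.

```python
def low_level_fusion(mir, nir):
    """Concatenate the full FT-MIR and NIR vectors of each sample."""
    return [m + n for m, n in zip(mir, nir)]


def mid_level_fusion(mir, nir, mir_idx, nir_idx):
    """Concatenate only chosen variables from each block.

    mir_idx and nir_idx stand in for the output of a selection or extraction
    step (for example RFE, Boruta, or PCA scores); here they are plain
    column indices.
    """
    return [
        [m[i] for i in mir_idx] + [n[j] for j in nir_idx]
        for m, n in zip(mir, nir)
    ]


def high_level_fusion(mir_scores, nir_scores):
    """Fuse the per-class scores of two separately trained models.

    ASSUMPTION: scores are averaged and the top-scoring class wins.
    Other decision rules (voting, weighting) are equally plausible.
    """
    fused = []
    for ms, ns in zip(mir_scores, nir_scores):
        averaged = [(a + b) / 2 for a, b in zip(ms, ns)]
        fused.append(averaged.index(max(averaged)))
    return fused


# Made-up data: two samples, three FT-MIR variables, two NIR variables.
toy_mir = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
toy_nir = [[1.0, 2.0], [3.0, 4.0]]

# Made-up per-class scores from an FT-MIR model and an NIR model (two classes).
toy_mir_scores = [[0.6, 0.4], [0.3, 0.7]]
toy_nir_scores = [[0.5, 0.5], [0.1, 0.9]]

if __name__ == "__main__":
    print(low_level_fusion(toy_mir, toy_nir))
    print(mid_level_fusion(toy_mir, toy_nir, [0, 2], [1]))
    print(high_level_fusion(toy_mir_scores, toy_nir_scores))
```

In practice the fused matrices from the low- and mid-level functions would be passed to a classifier such as PLS-DA or RF, whereas the high-level function operates on the outputs of classifiers that have already been fitted to each block.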